Screening of the shared pathogenic genes of ulcerative colitis and colorectal cancer by integrated bioinformatics analysis

Abstract Ulcerative colitis (UC) is one of the high‐risk pathogenic factors for colorectal cancer (CRC). However, the shared gene and signalling mechanisms between UC and CRC remain unclear. The goal of this study was to delve more into the probable causal relationship between UC and CRC. CRC and UC datasets were downloaded from the Gene Expression Omnibus database. Using R software and Perl, differentially expressed genes (DEGs) in both UC and CRC tissues were re‐annotated and screened. The biological activities and signalling pathways involved in DEGs were investigated using Gene Ontology and Kyoto Encyclopedia of Genes and Genomes enrichment analyses. The STRING database and Cytoscape software were used to construct the gene interaction network. A total of 384 DEGs were selected for further investigation, and functional analysis revealed that inflammatory and immunological responses were crucial in the development of the two diseases. Moreover, the top 15 key genes involved in the UC and CRC were screened using cytoHubba, including IL1B, CXCL10, CCL20, MMP9, ICAM1, CCL4, CXCR1, MMP3, TLR2, PTGS2, IL1RN, IL6, COL1A2, TIMP1 and CXCL1. The identification of these genes in the present study may provide a novel perspective for the prediction, prevention and personalized medicine of UC and CRC patients.

2][23] UC stimulates the growth of CRC via the inflammation-cancer pathway, which is distinct from the typical adenoma-adenocarcinoma pathway.
However, the precise molecular mechanism through which UC promotes the development of CRC remains unknown.Thus, a comprehensive analysis of the molecular mechanism by which UC promotes the incidence of CRC is critical for the early detection and precise treatment of CRC.
The present study examined UC and CRC data from a highthroughput sequencing (HTS) chip.A total of 384 genes (197 upregulated and 187 down-regulated genes in both UC and CRC) involved in biological processes and signalling pathways were subjected to a gene ontology (GO) and Kyoto Encyclopedia of Genes and Genomes (KEGG) enrichment analyses.Various protein interaction networks were built to investigate the connections between these genes.Furthermore, key genes that may play crucial roles in UC and CRC were identified, and their functions and relationships were explored.This study provides new insights into the development of UC and CRC, as well as novel biomarkers for the early detection and treatment of CRC.

| Identification of differentially expressed genes (DEGs)
Perl was used to re-annotate data from GSE87211 and GSE87466 datasets.The reannotated data was subjected to differential expression analysis using the R packages limma, pheatmap, and ggplot2, and the results were presented as heat maps and volcano maps.

| Identification of DEGs in UC and CRC
The R package VennDiagram was used to further investigate the DEGs in each GEO dataset.A total of 197 genes were up-regulated and 187 genes were down-regulated in both GSE87211 and GSE87466.

| GO and KEGG enrichment analysis
R packages colorspace, stringi, ggplot2, circlize, RcolorBrewer and ggpubr were used for GO and KEGG enrichment analysis of the target genes to investigate the biological activities and signalling pathways associated with target genes.

| Construction of protein-protein interaction (PPI) network
Using the STRING database (https://cn.string-db.org/), a PPI network containing 384 genes was predicted and created.The PPI network was subsequently tested and the 3D bubble model was generated based on the screening criteria (minimum necessary interaction score = 0.700 and conceal unconnected nodes in the network).

| Construction of co-expression network
The co-expression relationships of the 384 genes were analysed using Perl, and their co-expression networks were visualized using Cytoscape software.Furthermore, the plugin MCODE in Cytoscape software was utilized to further examine the protein interaction sub-network, and the screening settings were degree cut-off = 2 and node score cutoff = 0.2.

| Identification of key genes
The plugin cytoHubba in Cytoscape was used to score nodes of DEGs, and key genes with the highest node score were selected.

| GeneMANIA analysis
The GeneMANIA database (http://genem ania.org/) was utilized to further investigate the biological roles of each key gene and to forecast the genes that interact most closely with the key genes.LightCycler (Roche, USA) were used to perform real-time fluorescence quantitative PCR.GAPDH was employed as a reference control.The supplement file contains primer sequences (Table S1).

| Statistical analysis
All statistical analyses were conducted utilizing R software.P-values ≤0.05 were considered statistically significant.

| Identification of DEGs in UC and CRC
The RNA-seq data were downloaded from GSE87211 for re- up-regulated and 1499 down-regulated genes were identified in CRC tissues (Figure 1A).In addition, the sequencing data were downloaded from GSE87466 for re-annotation in the GEO database, including 21 normal tissues and 87 UC tissues, and 19,918 genes were identified.Then, differential expression analysis was performed based on |log2 FC| ≥ 1 and P ≤ 0.05 filter parameters, 575 up-regulated and 335 down-regulated genes were identified in UC tissues (Figure 1B).
The up-regulated genes in GSE87211 and GSE87466 datasets were intersected, and 197 genes were obtained (Figure 1C).The downregulated genes in GSE87211 and GSE87466 datasets were intersected, and 187 genes were obtained (Figure 1D).Finally, 384 DEGs were identified in both UC and CRC tissues (Supplementary file 1).
F I G U R E 2 Protein-protein interaction network based on the STRING database.(Figure 1F).

| Construction of the co-expression network
The STRING database was utilized to build the PPI network be-  between these proteins.The interacting proteins were screened and the PPI network was established with the minimum required interaction score ≥0.700 (Figure 2).Furthermore, the PPI network was visualized using Cytoscape software (Figure 3).The MCODE plugin in Cytoscape was used to identify the clusters in the PPI network based on the screening settings (degree cutoff = 2, node score cutoff = 0.2) (Figure 4).

| Identification of key genes in UC and CRC
The Cytoscape plugin cytoHubba was applied to score the nodes of interaction network map of the key genes to further investigate their relationship with potential biological activities (Figure 5B).The results revealed that the top 15 key genes were involved in multiple biological functions, such as cytokine receptor binding, leukocyte migration, chemokine receptor binding and regulation of inflammatory response.

| GO and KEGG enrichment analysis of the top 15 key genes
GO and KEGG enrichment analyses were performed to identify the biological roles and signalling pathways related to the top 15 key genes.
GO analysis showed that the top 15 key genes were enriched in leukocyte migration, cytokine-mediated signalling pathway, cytokine receptor binding, signalling receptor activator activity, etc. (Figure 5C).KEGG analysis revealed that the top 15 key genes were primarily enriched in TNF signalling pathway, IL-17 signalling pathway, Toll-like receptor (TLR) signalling pathway, NF-κB signalling pathway, etc. (Figure 5D).

| Validation of key genes
To verify whether the key genes screened in the experimental sets (GSE87211 and GSE87466) were generalized, RNA-seq data for CRC (GSE44076) and UC (GSE11223) were retrieved from the GEO database as validation sets.Differential expression analysis of the validation sets revealed that the top 15 key genes were significantly expressed in both UC and CRC tissues compared with normal intestinal mucosa (Figure 6 and Figure S1).Additionally, using qRT-PCR, we examined the expression levels of 15 core genes in the colorectal tissues of individuals with a history of UC.We found that, as compared to normal tissues, all 15 core genes were substantially strongly expressed in cancer tissues (Figure 7).CRC is the third most common cancer type worldwide.8][29] Therefore, early diagnosis and treatment of CRC are essential to improve the overall survival rate for CRC patients.Several factors are associated with CRC pathogenesis.For instance, UC is a well-known high-risk factor for CRC.It is a chronic non-specific intestinal inflammatory illness characterized by the infiltration of a significant number of immune cells at the lesion site, which results in changes in the immunological microenvironment. 19,30,31It is well established that people with long-standing UC have a higher risk for CRC.Therefore, a better understanding of the transition from UC and CRC may provide insights into the shared molecular mechanisms to identify novel biomarkers for UC and CRC.
These top 15 key genes may play essential roles in UC and CRC.
0][41] It has been shown that members of CXC F I G U R E 6 Validation of key genes.(A-H) Key genes were significantly up-regulated in UC tissues (GSE11223).(I-P) Key genes were significantly up-regulated in CRC tissues (GSE44076).3][44][45][46][47][48] The aberrant expression of IL-1B, IL-1RN, and IL-6 indicates that cellular inflammation is related to UC and CRC.The matrix metalloproteinases (MMP) family includes MMP3 and MMP9, 49 and tissue inhibitor matrix metalloproteinase 1 (TIMP1) is an MMP family natural inhibitor. 50,513][54][55] The present study discovered the aberrant expression of MMP family members in the UC and CRC transition, indicating that MMP family members may be crucial molecules in the inflammation-cancer transformation process.Intercellular adhesion molecule 1 (ICAM1) is a glycoprotein that is usually expressed on endothelial cells and immune system cells. 56,57TLR2 belongs to the TLR family, which is involved in pathogen identification and innate immune activation. 58Prostaglandin endoperoxide synthase isozyme 2 (PTGS2) has been found to induce inflammation. 59The collagen type I alpha 2 chain (COL1A2) gene encodes the pro-alpha2 chain of type I collagen and has been linked to the onset and progression of UC and CRC. 60These essential genes are strongly associated with inflammatory and immunological responses, implying that these responses may play a crucial role in the process of UC and CRC transition.Although some studies have shown that these key genes contribute to the occurrence and progression of UC or CRC, their exact function and underlying molecular mechanism in the UC and CRC transition remain elusive and are worthy of investigation.
However, this study has several limitations.First, our findings are based only on bioinformatics analysis and lack experimental validation.Second, we merely hypothesized the biological roles and signalling pathways that could be implicated in abnormally expressed genes in UC and CRC, but the underlying mechanisms were not F I G U R E 7 qRT-PCR was performed to evaluate the expression of 15 key genes in the tissues of colorectal patients with ulcerative colitis.elucidated.Finally, the differential expression of key genes was only verified in several HTS chips from the GEO database, and therefore more clinical data are needed for further validation.
In summary, the current study investigated key molecules and examined biological activities and signalling pathways associated with the UC and CRC.This study did not focus on the single-disease analysis of CRC but explored essential genes involved in the doubledisease transformation process.Firstly, bioinformatics analysis was employed to screen the pathogenic genes involved in UC and CRC.
The findings of this study provide new scientific insights into the investigation of potential molecular pathways between UC and CRC, which is useful for the practice of PPPM in UC and CRC.Second, identifying key genes in the course of UC and CRC is useful for early prognosis and prevention of CRC.Our findings may help determine normal reference values and diagnostic thresholds of key genes in CRC patients in clinical practice to aid in the prevention of UC and CRC.Dynamic monitoring of core gene expression levels in UC patients may aid in the detection of CRC at an early stage for timely and accurate treatment.Furthermore, DEGs were strongly associated with inflammatory and immunological responses, which might aid in the prevention and treatment of UC and CRC.Finally, targeted drug therapy based on core genes is a highly promising approach.
Our findings revealed novel biomarkers for the early prevention and treatment of UC and CRC.

2. 9 || 3 of 12 SHI
Tissue collection, extraction and quantitative real-time fluorescence PCR Cancer tissues and associated normal tissues were taken from colon cancer patients with a history of UC at The Second Affiliated Hospital of Nanjing Medical University, and each patient completed an informed permission form.The collection of human specimens was approved by the Ethics Committee of The Second Affiliated Hospital of Nanjing Medical University (approval no.[2019]-KY-121).The total RNA was isolated from tissues using Trizol (Life Technologies, Thermo Fisher, USA).HiScript (Vazyme, R323-01, China) is used for reverse transcription.SYBR GREEN (Vazyme, Q321-02, China) and et al.
annotation in the GEO database, including 160 normal intestinal mucosa tissues and 203 CRC tissues, and 21,713 genes were identified.Then, differential expression analysis performed based on the |log2 fold change (FC)| ≥ 1 and p ≤ 0.05 filter parameters, and 1398 F I G U R E 1 Identification of differentially expressed genes (DEGs) in GSE87211 and GSE87466 datasets.(A, B) DEGs in colorectal cancer (CRC) tissues (GSE87211) and ulcerative colitis (UC) tissues (GSE87466).(C, D) Significantly up-regulated or down-regulated DEGs in both GSE87211 and GSE87466 datasets.(E) GO functional analysis of DEGs.(F) KEGG pathway enrichment analysis of DEGs.
GO and KEGG enrichment analyses were performed to identify the biological roles and signalling pathways associated with the 384 DEGs.GO analysis revealed that most DEGs were enriched in leukocyte migration, granulocyte migration, receptor-ligand activity, myeloid leukocyte migration, etc. (Figure1E).KEGG enrichment analysis showed that most DEGs were primarily enriched in the interleukin 17 (IL-17) signalling pathway, cytokine-cytokine receptor interaction, phosphoinositide 3-kinase (PI3K)-serine/threonine kinase (Akt) signalling pathway, tumour necrosis factor (TNF) signalling pathway, nuclear factor-kappa B (NF-κB) signalling pathway, among others.
tween the 384 DEGs to further investigate their connection.The STRING database was used to calculate the interaction score F I G U R E 3 Gene co-expression network analysis.

SHI
et al.

F I G U R E 5
Identification of key genes in UC and CRC.(A) Co-expression of key genes based on the Cytoscape plugin cytoHubba.(B) Coexpression of key genes based on the GeneMANIA database.(C) GO functional analysis.(D) KEGG pathway enrichment analysis.