Integrated bioinformatics analysis of key genes involved in progress of colon cancer

Abstract Background Colon cancer is one of the most malignant cancers worldwide. Nearly 20% of patients already have metastasis when colon cancer is diagnosed. However, its pathogenesis remains poorly understood, which hampers research on the disease. Methods In this study, we acquired high-throughput expression data from the GEO database and performed an integrated bioinformatic analysis, including differentially expressed gene analysis, gene ontology and Kyoto Encyclopedia of Genes and Genomes pathway analysis, protein–protein interaction network analysis, and survival analysis, to study the development of colon cancer. Results Comparing colon cancer tissues with normal colon tissues identified 109 dysregulated genes, of which 83 were downregulated and 26 were upregulated. Two clusters were found using the STRING database and the MCODE plugin of Cytoscape. Six genes with prognostic value were then identified using the UALCAN website. Conclusion We found that SPP1, VIP, COL11A1, CA2, ADAM12, and INHBA may provide significant prognostic value for colon cancer.


| INTRODUCTION
Colon cancer, one of the most malignant cancers worldwide, causes more than 50,000 deaths per year (Haggar & Boushey, 2009). Because of its characteristics, such as insidious onset, rapid progression, and a tendency to develop resistance to chemotherapy (He et al., 2017; Marin et al., 2012), it imposes a serious social and medical burden and has aroused public concern.
Although large-scale studies have been carried out to investigate early diagnostic biomarkers and the mechanisms of colon cancer, treatment of the disease still lacks clear guidance. Owing to second-generation gene sequencing (Kamps et al., 2017), it has become much easier to uncover the causes and pathogenesis of colon cancer and to identify novel biomarkers with prognostic value.
In this study, we performed an integrated analysis, including differentially expressed gene analysis, gene ontology (GO) analysis, KEGG pathway analysis, and survival analysis, to identify a panel of key candidate genes involved in colon cancer.

| MATERIALS AND METHODS

| Ethical compliance
The clinical information and sequence data were acquired according to the requirements of GEO and TCGA databanks. Thus, no ethics committee approval or consent procedure was needed.

| Data source
Expression data for GSE62932 (platform GPL570, Affymetrix Human Genome U133 Plus 2.0 Array) were collected from the GEO database; as of 03 June 2018, the dataset included 68 colon tissue samples. As GEO is a publicly available database, no ethics approval is required. The samples were divided into two groups by sample type: 64 colon cancer tissues and 4 normal colon tissues were used in the following analysis.

| Differentially expressed genes in colon cancer
Before identifying differentially expressed genes (DEGs) in colon cancer, the expression data were normalized using RMA (Robust Multichip Average). DEG analysis was then performed with the limma package, using cutoffs of p-value < 0.05 and |logFC| ≥ 2 (Robinson, McCarthy, & Smyth, 2010). A heatmap of the DEG expression values was drawn with the pheatmap R package. To better understand how the DEGs are involved in biological processes and signal transduction, GO and Kyoto Encyclopedia of Genes and Genomes (KEGG) pathway analyses were carried out with the clusterProfiler R package (Yu, Wang, Han, & He, 2012); a p-value < 0.05 was considered significant.
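The thresholding step can be sketched as follows. This is a minimal Python illustration, not the limma workflow used in the study: the `GeneStat` record, the `classify_deg` helper, and the toy values are hypothetical, and only the two cutoffs (p-value < 0.05, |logFC| ≥ 2) come from the text.

```python
from typing import NamedTuple

class GeneStat(NamedTuple):
    gene: str
    log_fc: float   # log fold change, tumor vs. normal (log2 in limma output)
    p_value: float

P_CUTOFF = 0.05      # cutoff stated in the text
LOGFC_CUTOFF = 2.0   # cutoff stated in the text

def classify_deg(stat: GeneStat) -> str:
    """Return 'up', 'down', or 'ns' (not significant) for one gene."""
    if stat.p_value >= P_CUTOFF or abs(stat.log_fc) < LOGFC_CUTOFF:
        return "ns"
    return "up" if stat.log_fc > 0 else "down"

# Hypothetical toy values, for illustration only (not results from the study)
toy_stats = [
    GeneStat("GENE_A", 2.7, 0.001),
    GeneStat("GENE_B", -3.1, 0.004),
    GeneStat("GENE_C", 1.2, 0.0001),  # small p-value but fold change too small
    GeneStat("GENE_D", -2.5, 0.20),   # large fold change but p-value too high
]
labels = {s.gene: classify_deg(s) for s in toy_stats}
```

Both conditions must hold for a gene to count as dysregulated, which is why `GENE_C` and `GENE_D` are labeled `ns` in this toy example.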

| Protein-protein network analysis
Because genes interact with each other, the STRING database was used to construct a gene interaction network and identify central genes (Szklarczyk et al., 2017). Cytoscape was used to visualize the relationships between genes (Shannon et al., 2003). Following the protein-protein network analysis, the MCODE plugin was used to identify clusters within the network with k-core = 2.
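To make the k-core parameter concrete: the k-core of a network is the largest subgraph in which every node keeps at least k neighbours. The sketch below computes it by iterative pruning. It illustrates only this filtering idea, not MCODE itself (which also weights vertices and grows complexes); the `k_core` function and the toy edge list are hypothetical.

```python
from collections import defaultdict

def k_core(edges, k):
    """Return the node set of the k-core: the largest subgraph in which
    every node has at least k neighbours inside the subgraph."""
    adj = defaultdict(set)
    for u, v in edges:
        if u != v:  # ignore self-loops
            adj[u].add(v)
            adj[v].add(u)
    # Repeatedly remove nodes whose degree has dropped below k
    queue = [n for n in adj if len(adj[n]) < k]
    removed = set()
    while queue:
        node = queue.pop()
        if node in removed:
            continue
        removed.add(node)
        for nb in adj[node]:
            adj[nb].discard(node)
            if nb not in removed and len(adj[nb]) < k:
                queue.append(nb)
        adj[node].clear()
    return {n for n in adj if n not in removed}

# Hypothetical toy network: a triangle (A, B, C) with a two-node tail (D, E)
toy_edges = [("A", "B"), ("B", "C"), ("A", "C"), ("C", "D"), ("D", "E")]
core_nodes = k_core(toy_edges, 2)
```

With k = 2, the tail nodes are pruned one after another and only the triangle remains, which is the kind of densely connected region the clustering step looks for.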

| Survival analysis to screen the candidate genes
Survival analysis was carried out on the UALCAN website (Chandrashekar et al., 2017), a portal for survival analysis based on the TCGA dataset. The colon cancer samples were divided into two groups according to gene expression: high expression (transcripts per million [TPM] values above the median) and low/median expression (TPM values below the median). The Kaplan-Meier method was then used to identify candidate genes with significant prognostic value (p-value < 0.05).
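The grouping and the Kaplan-Meier estimate can be sketched in a few lines of Python. This is an illustration under stated assumptions, not UALCAN's implementation: the function names are hypothetical, samples exactly at the median are assigned to the low/median group (the text does not specify tie handling), and the significance test between the two curves is not shown.

```python
from statistics import median

def split_by_median(tpm_by_sample):
    """Split sample IDs into a high group (TPM above the median) and a
    low/median group (TPM at or below the median; tie rule is an assumption)."""
    cutoff = median(tpm_by_sample.values())
    high = [s for s, v in tpm_by_sample.items() if v > cutoff]
    low_median = [s for s, v in tpm_by_sample.items() if v <= cutoff]
    return high, low_median

def kaplan_meier(times, events):
    """Kaplan-Meier estimate. times: follow-up times; events: 1 = death,
    0 = censored. Returns (time, survival probability) at each death time."""
    data = sorted(zip(times, events))
    at_risk = len(data)
    surv = 1.0
    curve = []
    i = 0
    while i < len(data):
        t = data[i][0]
        deaths = 0
        leaving = 0
        # Collect everyone whose follow-up ends at time t
        while i < len(data) and data[i][0] == t:
            deaths += data[i][1]
            leaving += 1
            i += 1
        if deaths:
            surv *= 1 - deaths / at_risk
            curve.append((t, surv))
        at_risk -= leaving
    return curve
```

In practice one curve would be estimated per expression group and the two curves compared; censored samples lower the number at risk without producing a step in the curve.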

Protein Atlas
For validation, the candidate genes were assessed at the RNA level using the TCGA data portal and at the protein level using The Human Protein Atlas database. The GEPIA website was used to compare relative RNA expression between colon cancer and normal colon tissues, while The Human Protein Atlas database was used to map protein expression in the tissues (Tang et al., 2017; Uhlén et al., 2015).

| Differentially expressed genes involved in colon cancer
Samples were first grouped by pathology into colon cancer tissues and normal colon tissues. Differential expression analysis was then performed with cutoffs of p-value < 0.05 and |logFC| ≥ 2. A total of 109 genes were dysregulated, of which 83 were downregulated and 26 were upregulated. To further evaluate the functions of these genes, we performed GO analysis and Kyoto Encyclopedia of Genes and Genomes (KEGG) analysis. The upregulated genes were mainly enriched in C-X-C chemokine receptor (CXCR) binding; cytokine activity; chemokine activity; chemokine receptor binding; G-protein coupled receptor binding; IL-17 signaling pathway; rheumatoid arthritis; cytokine-cytokine receptor interaction; chemokine signaling pathway; and Toll-like receptor signaling pathway. The downregulated genes were mostly enriched in oxidoreductase activity, acting on the CH-OH group of donors, nicotinamide adenine dinucleotide (NAD) or nicotinamide adenine dinucleotide phosphate (NADP) as acceptor; carbonate dehydratase activity; chloride channel activity; inorganic anion transmembrane transporter activity; oxidoreductase activity, acting on CH-OH group of donors; retinol metabolism; pentose and glucuronate interconversions; bile secretion; drug metabolism - cytochrome P450; and metabolism of xenobiotics by cytochrome P450 (Figure 1).
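Enrichment of a GO term or KEGG pathway in a gene list is commonly assessed with an over-representation (hypergeometric) test. The sketch below shows that calculation in Python as an illustration of the idea; it is not the clusterProfiler code, the `enrichment_p_value` function and its parameter names are hypothetical, and the numbers in the example are toy values rather than results from this study.

```python
from math import comb

def enrichment_p_value(n_background, n_term, n_deg, n_overlap):
    """Upper-tail hypergeometric probability P(X >= n_overlap): the chance
    of seeing at least n_overlap term-annotated genes when n_deg genes are
    drawn from n_background genes, of which n_term carry the annotation."""
    total = comb(n_background, n_deg)
    upper = min(n_term, n_deg)
    tail = sum(
        comb(n_term, x) * comb(n_background - n_term, n_deg - x)
        for x in range(n_overlap, upper + 1)
    )
    return tail / total

# Toy example: 10 background genes, 4 annotated to the term,
# 3 genes in the list, 2 of which carry the annotation
toy_p = enrichment_p_value(10, 4, 3, 2)
```

A small p-value means the term appears in the gene list more often than expected by chance; when many terms are tested, the p-values are usually adjusted for multiple testing before applying a significance cutoff.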

| Protein-protein network analysis
To identify the key genes involved in the development of colon cancer, we used the STRING website to estimate the interactions between genes. A network of 1,326 interaction pairs involving 229 proteins was constructed in Cytoscape (Figure 2). We then used the MCODE plugin to find densely connected regions based on topology, and two dense clusters were discovered. Cluster 1 involved 23 genes and 113 connec-

| Survival analysis of key genes
To identify candidate genes that may influence survival outcome, we performed survival analysis on the key genes. A total of six candidate genes were screened and were found to have an impact on overall survival days, which are

Protein Atlas
The RNA expression levels of SPP1, VIP, COL11A1, CA2, ADAM12, and INHBA were validated in the TCGA dataset. Because COL11A1 and INHBA information is lacking in The Human Protein Atlas dataset, their protein expression levels were not evaluated (Figure 4). The results also showed that SPP1, COL11A1, ADAM12, and INHBA expression was significantly higher in colon cancer tissues than in normal tissues, in accordance with our previous study.

| DISCUSSION
In this study, we performed several bioinformatics analyses to identify key genes involved in the development of colon cancer. First, 109 dysregulated genes were found by comparing normal colon tissues with colon cancer tissues. Protein-protein network analysis was then used to study the interactions between the differentially expressed genes, followed by cluster analysis. We next performed survival analysis on the key genes to search for genes with prognostic value and found six: SPP1, VIP, COL11A1, CA2, ADAM12, and INHBA. We further validated them at both the RNA and protein expression levels, and the results are in accordance with our previous study.
SPP1, a secreted phosphoprotein that contains an RGD domain, was first isolated from bone matrix as an extracellular matrix protein by Herring (Heinegård, Hultenby, Oldberg, Reinholt, & Wendel, 1989; Oldberg, Franzén, & Heinegård, 1986). It is vital in bone reconstruction, anti-inflammation, arteriosclerosis, and immunomodulation. A variety of cell types, including osteoclasts, macrophages, epithelial cells, T cells, and endothelial cells, can secrete SPP1 (Saitoh, Kuratsu, Takeshima, Yamamoto, & Ushio, 1995). In addition, many studies suggest that SPP1 participates in the development and metastasis of malignant tumors. It is upregulated in gastric cancer (Imano et al., 2009), esophageal cancer (Lin et al., 2015), glioma (Ellert-Miklaszewska et al., 2016), breast cancer (Rodrigues, Teixeira, Schmitt, Paulsson, & Lindmark-Mänsson, 2007), and lung cancer (Chambers et al., 1996), and might serve as a biomarker. It can also promote ovarian cancer proliferation, migration, and invasion in vitro by activating the Integrin β1/FAK/AKT signaling pathway (Zeng, Zhou, Wu, & Xiong, 2018). A recent study also showed that SPP1 can mediate macrophage polarization and lung cancer evasion, and could be a promising drug target (Zhang, Du, Chen, & Xiang, 2017).
FIGURE 3 Survival analysis of key genes. The red plots represent individuals with high expression, while the blue plots represent individuals with median/low expression. SPP1: NC_000004.12; VIP: NC_000006.12; COL11A1: NC_000001.11; CA2: NC_000008.11; ADAM12: NC_000010.11; INHBA: NC_000007.14
COL11A1 encodes one of the two alpha chains of type XI collagen, a minor fibrillar collagen. It has been widely studied in many cancers. It is overexpressed in both adenocarcinoma and squamous cell carcinoma of the lung compared with the corresponding non-neoplastic lung tissues (Wang et al., 2002), and in metastatic oral cavity/pharynx squamous cell carcinoma (Schmalbach et al., 2004). It is also involved in lymph node metastasis in breast cancer (Feng et al., 2007) and could be used as a potential biomarker to distinguish malignant from premalignant lesions in stomach and pancreatic cancer (Kleinert et al., 2015; Zhao et al., 2009). It is localized in the Golgi apparatus of normal human colon goblet cells (Bowen et al., 2008). COL11A1 may be associated with the APC/beta-catenin pathway in FAP and sporadic colon cancer (Fischer et al., 2001). In lung cancer, it correlates positively with pathological stage, poor prognosis, and lymph node metastasis (Chong et al., 2006). It promotes ovarian cancer progression and chemoresistance to cisplatin and paclitaxel by activating NF-κB, and it promotes the expression of TWIST1 and MCL1 (Wu, Huang, Chang, & Chou, 2017). COL11A1 has also been reported to be upregulated in gastric cancer and non-small cell lung cancer, where it can enhance malignant behavior in vitro (Li, Li, Lin, Zhuo, & Si, 2017; Shen et al., 2016). COL11A1 could be used as a promising biomarker for predicting malignant relapse of breast intraductal papilloma (Freire et al., 2015).
Carbonic anhydrase (CA) II is a member of the carbonic anhydrases, a ubiquitous group of zinc-bound metalloenzymes that catalyze the reversible hydration of carbon dioxide. Carbonic anhydrase II (hCAII) has important functions in physiological and pathological processes. CA II is highly expressed in various normal organs, but its expression is inhibited in cancer cells (Li, Xie et al., 2012; Sheng, Dong, Zhou, Li, & Dong, 2013). CA II is also associated with osteopetrosis and renal tubular acidosis (Borthwick et al., 2003). CA II helps maintain the balance between carbon dioxide and bicarbonate and controls the pH level in cells. Because the balance between carbon dioxide and bicarbonate is fundamental to life and can influence various cell behaviors, low expression of CA II may play an important role in tumor progression and development.
FIGURE 4 Validation of key genes in TCGA dataset and The Human Protein Atlas dataset
ADAM12 (A Disintegrin and Metalloproteinase domain-containing protein 12) encodes a member of a family of proteins that play an important role in a variety of biological processes involving cell-cell and cell-matrix interactions (Roy, Wewer, Zurakowski, Pories, & Moses, 2004). ADAM12 has different isoforms: the shorter isoforms are secreted, while the longer isoforms are membrane-bound. ADAM12 takes part in the regulation of physiological and pathological processes, including muscle development, neurogenesis, and fertilization. ADAM12 is upregulated in various cancers, including breast, prostate, ovarian, skin, stomach, lung, and brain cancers (Li, Duhachek-Muggy et al., 2012; Shao et al., 2014). ADAM12 contributes to tumor progression and metastasis by promoting tumor cell proliferation, migration, invasion, and apoptosis resistance.
INHBA is a member of the TGF-beta (transforming growth factor-beta) superfamily. The INHBA gene is overexpressed in different kinds of cancer, such as colorectal, pancreatic, and lung cancer, and promotes cell proliferation, invasion, metastasis, and chemoresistance in cancer cells (Okano et al., 2013; Oshima et al., 2014). INHBA also takes part in the development of the eye, tooth, and testis. INHBA can form different protein complexes that either activate or inhibit follicle-stimulating hormone secretion from the pituitary gland.
In conclusion, we performed an integrated analysis to discover the differentially expressed genes involved in the development of colon cancer and identified a panel of genes with prognostic value that may help evaluate the outcome of colon cancer patients. We found that SPP1, VIP, COL11A1, CA2, ADAM12, and INHBA exhibited significant prognostic value. More in-depth studies are needed to determine the biological functions and mechanisms through which these genes affect malignant cell behavior. The expression pattern of these genes may also be a promising target for therapy in colon cancer.

CONFLICT OF INTEREST
None declared.