Identification of hub genes associated with neutrophils infiltration in colorectal cancer

Abstract Colorectal cancer (CRC) is the leading cause of cancer‐related mortality in the world. Accumulating evidence indicate that tumour infiltrating immune cells participated in cancer progression. Among them, tumour infiltrating neutrophils (TINs) are reported to play crucial role in various cancers. In this study, we used CIBERSORTx, a digital cytometry tool to evaluate the neutrophils infiltration in CRC based on gene expression data of CRC tissues from GSE39582 data set and The Cancer Genome Atlas data set (TCGA‐COAD and TCGA‐READ). Weighted gene co‐expression network analysis (WGCNA) was conducted in GSE39582 data set to identify hub genes associated with neutrophil infiltration. The association of hub gene and neutrophils was then validated in TCGA cohorts and an independent RJ cohort. Functional analysis was performed to investigate the molecular mechanisms of the interested hub gene. We found that neutrophil infiltration is elevated in CRC tissues, and it is related to a poorer prognosis. A total of 18 gene modules are identified by WGCNA in GSE39582 data set, among which lightcyan module is significantly correlated with neutrophils infiltration. Furthermore, Superoxide Dismutase 2 (SOD2) in lightcyan module was proved to correlated with neutrophils infiltration in various cancer types. In addition, SOD2 expression is highly associated with several chemokines, including CXCL8, a neutrophils‐related attractant, and functional analysis revealed that SOD2 is involved in neutrophils recruitment biological process. These results indicate that an ‘SOD2‐CXCL8‐neutrophil recruitment’ axis plays a potential role in colorectal cancer progression.


| INTRODUC TI ON
Accounting for 1 in 10 people suffering from cancer, colorectal cancer (CRC) ranks as the third in terms of cancer incidence and second of in terms of mortality in the world. In 2018, Over 1.8 million new cases are identified, and approximately 881,000 people died of CRC. 1 Despite the improvements of treatment methods, mainly surgical resection and chemotherapy, patients with advanced CRC are still haunted by the recurrence after surgery, which compromise life quality and expectance. 2 In this context, a better understanding of CRC mechanism is needed to develop novel cures.
In recent years, the success of checkpoint-blockade therapy in clinical trials shed light on the critical role of tumour infiltrating immune cells, especially T cells. 3 As a historical milestone of cancer treatment, checkpoint blockade immunotherapies proved that our immune system can be harnessed to antagonize malignant tumour. However, despite these substantial advancements in clinical care, the majority of patients do not respond to immune checkpoint inhibitor. 4 Increasing evidence suggests that tumorigenesis is often triggered in a tumour microenvironment (TME), which is composed of extracellular matrix, blood or lymphatic vessels, fibroblasts, immune cells and inflammatory cells. Importantly, the role of the crosstalk between tumour cells and the stromal cells in the process of tumour progression, drug resistance and clinical outcomes has been validated by multiple studies. 5 As shown, various types of myeloid cells, such as tumour-associated macrophages (TAMs), tumour-infiltrated neutrophils (TINs) and T cells, play important roles in the above biological processes. Infiltration of cytotoxic CD8 + T cells has been associated with a favourable prognosis of CRC patients, mainly because of the anti-tumour activity of CD8 + T cells. Gastric cancer patients with higher TINs were prone to overall survival benefit from postoperative adjuvant chemotherapy. A number of studies have shown that high levels of circulating neutrophils and neutrophil-tolymphocyte ratio in peripheral blood are associated with poor prognosis in several types of cancers. However, the clinical significance of TINs in tumour stromal tissues is controversial. 6 Activated neutrophils can release multiple toxic components, including MPO, H2O2, cationic proteins defensins and NET to kill the invaders. However, in the context of tumour, neutrophils can be manipulated by tumour cells and switch into pro-tumour status. 7 Emerging evidence indicates that neutrophil plays a contradictory role in TME. Other than the cytotoxicity mentioned above, it can degrade extracellular matrix by secretion of MMPs and elastase, facilitating tumour cells dissemination. It also promotes angiogenesis mediated by releasing VEGF, HGF and MMP9. Moreover, neutrophil mediates immune suppression via the secretion of ROS and Arginase 1 to limit T cell-dependent anti-tumour immunity. 7,8 These pro-tumour effects might be the potential therapeutic target in the treatment of cancer.
Plenty of studies described the phenomenon of neutrophils recruitment to inflammation or invasion spot by robust chemotactic molecules, most importantly CXCL1, CXCL2, CXCL5 and CXCL8. 9 Tumours orchestrate neutrophils recruitment by releasing neutrophil-related chemokines. For example, TME hypoxia induced CXCL8 overexpression through hypoxia inducible factor 1(HIF1) signalling pathway. 10 However, different tumours may allure neutrophils in various ways, and little research work has been done to investigate the molecular mechanisms of interaction between colorectal cancer and neutrophils. In this research, we use bioinformatic tools to identify the genes potentially associated with neutrophil infiltration in colorectal cancer. Moreover, the interested hub gene was validated in CRC cases from TCGA cohort and real-world validation cohort. Tumor Immune Estimation Resource (TIMER) was used for validation of our testing results. 6,14,15 Additionally, formalin-fixed, paraffin-embedded specimen arrays of 79 paired CRC tumour and normal tissues from our centre were used in this study (RJ cohort).

| Patients and samples
Written informed consent was obtained from every enrolled patient.
The study protocol was approved by the Ethical Committee of Ruijin Hospital, Shanghai Jiao Tong University Medical School.

| Evaluation of neutrophil infiltration
In this study, we use CIBERSORTx 16 to evaluate the amount of infiltrated neutrophils in tumour tissue. CIBERSORTx is a digital cytometry tool. The algorithm enables researchers to estimate the faction of many kinds of immune cells inside tumour tissue, just by utilizing the gene expression data from bulk tissue, without running an actual flow cytometry. In this research, we evaluated the neutrophils infiltration in CRC tissue using data from TCGA and GSE39582. The calculation results come with a p-value for each sample. Samples with p-values < 0.01 were selected for further analysis.

| Tumour purity estimation
Tumour purity was evaluated using R package Estimate. 17 ESTIMATE is a R software for evaluation of tumour purity, stromal cells proportion and the infiltration level of immune cells in tumour tissues based on expression data. In this research, we use ESTIMATE to evaluate the tumour purity of samples in GSE39582 data set.

| Co-expression network analysis and visualization
The WGCNA R software package was used to construct the coexpression network of the genes in the GSE39582 data set. We se- The module eigengenes (MEs) were considered to be a representation of the gene expression profile in the module. Correlation and P-value between the module and neutrophils scores were calculated using MEs and neutrophils score. 18,19 The gene module highly correlated with the neutrophils was selected for visualization using Cytoscape. 20

| Differentially expressed genes (DEGs) and Gene Set Enrichment Analysis (GSEA)
The expression data of TCGA colorectal cancer patient were downloaded and processed using R package DESeq2 21 to calculate differentially expressed genes. The genes were then ordered according to log2(Foldchange), and the gene list was subjected to gseGO function from clusterProfiler 22 package for GSEA analysis.

| Immunohistochemical staining and histochemistry scoring
CD66b served as specific biomarker for neutrophils in TME. Tissue Microarray sections were deparaffinized and dehydrated, and then treated with 3% H2O2 at room temperature for 10 min to block endogenous peroxidase activity. Next, the tissue sections were incubated with citrate buffer for the retrieval of the antigen. Then, the tissues were blocked with 3% BSA at room temperature for 30 min, followed by incubating with anti-CD66b(ab197678, rabbit; 1:200, Abcam, Cambridge, UK), anti-SOD2 antibody (ab13533, Rabbit, 1:100, Abcam, Cambridge, UK) and anti-CXCL8 antibody(ab18672, Mouse, Abcam, Cambridge, UK) at room temperature for 50min.
Finally, the tissue slides were counterstained with haematoxylin. In validation phase, slides were then scanned (Pannoramic MIDI, 3D HISTECH), and intensity of three proteins was evaluated using Quant centre software. The expression level of each genes was quantified as 1, 2 and 3, representing weak, moderate and strong expression, respectively, and then, the histochemistry scores(H-Score) of SOD2 and CXCL8 of calculated using the method mentioned previously. 23 Intratumoral neutrophils are identified as CD66b-positive cells with segmented nuclear. When calculating the correlation coefficient of neutrophil infiltration and genes, the amounts of infiltrated neutrophils were evaluated by the mean density of strong-positive pixels of CD66b.

| Statistics and data visualization
All statistical analyses were performed with R software (version 3.6.2). Wilcoxon test was used to compare the neutrophil scores/ counts in different sample groups. Fisher's exact test was used in the categorical clinical traits comparation between the high or low neutrophil group. The Kaplan-Meier method and log-rank were used to present and evaluate the differences of survival between the high or low-neutrophil groups. The cutpoint to determine TINs high/low groups was calculated using 'surv_cutpoint' function from R package, survminer. 24 Data visualization was performed based on ggplot2 25 and ggpubr 26 packages. F I G U R E 1 Main design of the study. GSE39682 data set was used to identify hub genes associated with TINs. TCGA COREAD cohort were used for validation, DEGs calculation and GSEA analysis. GTEx, CCLE and TIMER database were used for further validation. RJ cohort were utilized for immunohistochemistry

| Study design
The design of the study was shown in Figure 1. Gene expression data from GSE39582 (n = 566) and TCGA colorectal patients (n = 603) were subjected to CIBERSORTx to get neutrophils infiltration scores. After identifying hub genes associated with using WGCNA in GSE39582 data set, the correlation between hub genes and neutrophils infiltration was then validated using database including TIMER, GTEx and CCLE. Samples of RJ Cohort from our hospital were used for immunohistochemical validation. In RJ cohort, a total of 79 patients were enrolled, about 59% of the patients were aged over 60 years, 56% of the patients were male (Table S1).

| Higher TIN levels indicating poorer prognosis of CRC patients
As shown, neutrophils score calculated by CIBERSORTx, representing amount of neutrophil infiltration, is significantly higher in tumour tissue compared with normal tissue, indicating an increasing neutrophil recruitment in CRC tumour tissue ( Figure 2A). Moreover, higher neutrophil score indicates poorer prognosis in both GSE39582 data set ( Figure 2B-2C) and TCGA data set( Figure 2D-2F). These findings highlight the clinical relevance of TINs and the necessity to investigate detailed mechanism of neutrophil-tumour interaction in colorectal cancer.

| Neutrophil associated Hub gene identification
Gene expression data from GSE39582 data set were subjected to CIBERSORTx and ESTIMATE algorithm to evaluate neutrophil infiltration and tumour purity. We discovered that tumour purity varies dramatically across samples, arranging from 0.238 to 0.972. As tumour purity might have a great influence on the neutrophil score cal-

| Validation of SOD2 associated TME neutrophil infiltration in TCGA and RJ cohort
In order to validate the association of SOD2 and neutrophil infiltration, we firstly tested our findings in TCGA cohort using TIMER. As shown, the expression of SOD2 in CRC tumour tissue is significantly correlated with neutrophils infiltration in CRC tissue ( Figure 4A).
We further explored the association in 32 different kinds of solid tumours. Intriguingly, SOD2 is related to TINs in most of the cancer types( Figure 4B). This finding indicated a universal role SOD2 in neutrophils infiltration in TME. Intriguingly, SOD1 and SOD3 are not substantially correlated with TINs ( Figure S2).
For further verifying, we conducted immunohistochemistry analysis in RJ cohort, which contains 79 cases of CRC patients. As shown ( Figure S3), cells positive for CD66b antibody and with segmented nuclear are identified as neutrophils. As expected, neutrophil count in tumour tissue are significantly higher than that in normal colon tissue( Figure 5A). Expression levels of SOD2 are positively correlated with CXCL8 ( Figure 5B, Pearson correlation, R = 0.51, P <.001) and neutrophil infiltration ( Figure 5C, Pearson correlation, R = 0.44, P <.001), as shown in representative images ( Figure 5D-5I).

| Functional investigation of downstream signalling associated with SOD2 expression
TCGA colorectal cohort was used to further investigate the SOD2 related biological activity. Based on the normalized expression level F I G U R E 5 Validation of SOD2-CXCL8-Neutrophil infiltration association in RJ cohort. A, Neutrophils counts in tumour tissue and normal tissue in RJ cohort. Y-axis represents log10(neutrophil count + 0.01). B, Pearson correlation of SOD2-CXCL8(IL8) association in RJ cohort (expression levels are quantified by H-score). C, Pearson correlation of SOD2-CD66b association in RJ cohort (expression levels are quantified by H-score). D-F, Representative images of low SOD2 expression, low CXCL8 expression and low neutrophil infiltration in the same tissue region, respectively. G-I, Representative images of high SOD2 expression, high CXCL8 expression and high neutrophil infiltration in the same tissue region, respectively samples in SOD2 high and low groups, respectively. Differentially expressed genes were calculated using DESeq2 package. As shown in the volcano plot ( Figure 6A), we found that CXCL6 and CXCL8 expression levels are significantly higher in SOD2 high group comparing to low group. Gene Set Enrichment Analysis (GSEA) of GO biological process indicates that immunological differences concerning neutrophils recruitment are significantly higher in SOD2 high group ( Figure 6B). These findings indicate that a SOD2 involved neutrophil recruitment regulating axis might exist.
As CXCL1, CXCL2, CXCL5, and CXCL8 are frequently reported chemokines to recruit neutrophils, we set our course to explores the expressional correlation between SOD2 and these chemokines.
The correlation of SOD2 and the four chemokines were evaluated in CCLE, GTEx and TCGA gene expression data. As shown, the SOD2 are significantly correlated with these chemokines, especially CXCL8, in most of the cancer types and normal tissues ( Figure 6C- Figure S4), indicating a universal association between SOD2 and CXCL8.

| D ISCUSS I ON
In fact, transforming growth factor beta (TGFβ) promote neutrophil differentiation into N2 type, whereas interferon type I (IFN-I) induces N1 type. In a word, the signalling context in the tumour microenvironment can affect the phenotype of neutrophils profoundly.
The increase of neutrophil count in peripheral circulation are found to related to poorer prognosis in CRC patients. 28 Several experiments demonstrated that CRC cells recruit neutrophils by secreting chemokines including CCL-9, CCL15, CXCL8. 32,33 Neutrophils also promote the hepatic metastasis of CRC, whereas depletion of neutrophils in murine decreased the vascular density and branching in metastatic site and inhibited the tumour growth. 34 Intriguingly, in CRC patients receiving anti-VEGF (bevacizumab) therapy, the present of CD177 + neutrophils in tumour is related to reduced survival, and depletion of neutrophils restores the sensitivity of anti-VEGF treatment. 35 Tohme  38  Altogether, we propose that a 'SOD2-CXCL8-neutrophil recruitment' axis might exist in CRC cells. Our analysis provided insights into the molecular mechanism of crosstalk between neutrophils and tumour, which may facilitate further investigation of TINs in the near future.

| CON CLUS ION
In summary, based on the comprehensive analysis of transcriptome data using Cibersort and WGCNA algorism, we discovered a list of genes highly associated with TINs. Among which, SOD2 significantly correlated with neutrophil infiltration and neutrophil-related chemokines in CRC, indicating a possible existence of 'SOD2-CXCL8-neutrophil recruitment' in CRC. The findings of our study may facilitate the further research in crosstalk between neutrophil and colorectal cancer cells.

ACK N OWLED G EM ENTS
This work was supported by the Natural Science Foundation of Shanghai (18ZR1424100).

CO N FLI C T O F I NTE R E S T
The authors declare that there is no conflict of interest.

E TH I C A L S TATEM ENT
Written consent was obtained from all enrolled patients. The study was conducted in accordance with the Declaration of Helsinki, and the Ethic Committee of RuiJin Hospital, Shanghai Jiao Tong University approved the protocol of the study (GCQN-2019-A07).

DATA AVA I L A B I L I T Y S TAT E M E N T
The RNA sequencing data of CRC patients were downloaded from Xena (http://xena.ucsc.edu/). GSE39582 and GSE36133 were obtained from GEO database (https://www.ncbi.nlm.nih.gov/gds/).
Other data analysed during this study are included in this published article and the supplementary material.