Prognostic value of tumour microenvironment‐related genes by TCGA database in rectal cancer

Abstract Rectal cancer is a common malignant tumour and the progression is highly affected by the tumour microenvironment (TME). This study intended to assess the relationship between TME and prognosis, and explore prognostic genes of rectal cancer. The gene expression profile of rectal cancer was obtained from TCGA and immune/stromal scores were calculated by Estimation of Stromal and Immune cells in Malignant Tumors using Expression data (ESTIMATE) algorithm. The correlation between immune/stromal scores and survival time as well as clinical characteristics were evaluated. Differentially expressed genes (DEGs) were identified according to the stromal/immune scores, and the functional enrichment analyses were conducted to explore functions and pathways of DEGs. The survival analyses were conducted to clarify the DEGs with prognostic value, and the protein‐protein interaction (PPI) network was performed to explore the interrelation of prognostic DEGs. Finally, we validated prognostic DEGs using data from the Gene Expression Omnibus (GEO) database by PrognoScan, and we verified these genes at the protein levels using the Human Protein Atlas (HPA) databases. We downloaded gene expression profiles of 83 rectal cancer patients from The Cancer Genome Atlas (TCGA) database. The Kaplan‐Meier plot demonstrated that low‐immune score was associated with worse clinical outcome (P = .034), metastasis (M1 vs. M0, P = .031) and lymphatic invasion (+ vs. ‐, P < .001). A total of 540 genes were screened as DEGs with 539 up‐regulated genes and 1 down‐regulated gene. In addition, 60 DEGs were identified associated with overall survival. Functional enrichment analyses and PPI networks showed that the DEGs are mainly participated in immune process, and cytokine‐cytokine receptor interaction. Finally, 19 prognostic genes were verified by GSE17536 and GSE17537 from GEO, and five genes (ADAM23, ARHGAP20, ICOS, IRF4, MMRN1) were significantly different in tumour tissues compared with normal tissues at the protein level. In summary, our study demonstrated the associations between TME and prognosis as well as clinical characteristics of rectal cancer. Moreover, we explored and verified microenvironment‐related genes, which may be the potential key prognostic genes of rectal cancer. Further clinical samples and functional studies are needed to validate this finding.

overall survival. Functional enrichment analyses and PPI networks showed that the DEGs are mainly participated in immune process, and cytokine-cytokine receptor interaction. Finally, 19 prognostic genes were verified by GSE17536 and GSE17537 from GEO, and five genes (ADAM23, ARHGAP20, ICOS, IRF4, MMRN1) were significantly different in tumour tissues compared with normal tissues at the protein level.
In summary, our study demonstrated the associations between TME and prognosis as well as clinical characteristics of rectal cancer. Moreover, we explored and verified microenvironment-related genes, which may be the potential key prognostic genes of rectal cancer. Further clinical samples and functional studies are needed to validate this finding.

| INTRODUC TI ON
Rectal cancer is a common malignant tumour of gastrointestinal tract which occurs in the lower part of the colon. Rectal cancer and colon cancer are often grouped as 'colorectal cancer'(CRC), which is the third leading cause of cancer-related deaths in the world. There were 1.4 million new CRC cases every year, and this figure will rise in the future.
It is expected to increase to 2.2 million CRC cases and 1.1 million deaths in ten years, 1 of which 35% were rectal cancer. 2 In China, the rate of rectal cancer was 11.45 and 8.28 per 100,000 in men and females, respectively. 3 Over the past 30 years, effective screening measures and multimodal therapies had depressed the incidence and mortality rate and improved long-term survival rates. However, there were 80% of CRC patients show recurrence in the first 3 years. Thus, identifying reliable prognostic biomarkers to select rectal cancer patients at high risk for recurrence is important for improving the survival rate. TME consists of tumour cells, stromal cells, immune cells and extracellular matrix, which influences cancer growth and development significantly. Tumour cells in the TME can invade tissues directly or through blood and lymphatic vessels, and the infiltrated cells can induce the immune response by releasing cytokines, cytokine receptors and other factors, which influenced the progression of tumour. 4 In recent years, new studies revealed that TME significantly affect the progression of tumours, and have shown a potential predictive value for cancer prognosis, 5-11 including CRC. 5,12 With the rapid development of precision medicine, researchers are increasingly exploring new diagnosis and the treatment targets using statistical algorithms. TCGA provided genomic profiles and clinical information, making it possible to investigate the correlation between genomic features and clinical as well as prognostic characteristics. 13 ESTIMATE is an algorithm which was raised to evaluate the role of stromal and immune cells in cancer biology. ESTIMATE algorithm is a tool assessing stromal score, immune score and estimate score (that infers tumour purity) in tumour tissues by using gene expression data.
TCGA database and the ESTIMATE algorithm has been widely applied to investigate cancer prognosis prediction. Recent studies showed ESTIMATE has good precision in hepatocellular carcinoma, renal cell carcinoma and glioblastoma, 14-18 but it has not been applied for rectal cancer. Therefore, we intend to identify TME-related genes that significantly affect rectal cancer prognosis by ESTIMATE and TCGA database.

| Data source
We downloaded gene expression profile and survival information as well as clinical features of rectal cancer patients from TCGA database (https://tcga-data.nci.nih.gov/tcga). The clinical features include age, gender, race, TNM status, survival status, values of carcinoembryonic antigen (CEA), venous invasion, lymphatic invasion and perineural invasion condition. We downloaded all the data from TCGA, and performed data acquirement and application following TCGA guidelines.

| Survival analysis and DEGs identification
To explore the correlation between stromal or immune scores and prognosis of rectal cancer patients, we performed survival analysis

| PPI network construction
To understand the interactions between prognostic DEGs, we constructed the PPI network by an opensource software platform Search Tool for the Retrieval of Interacting Genes (STRING, https://strin g-db.org/). Then the modular analysis was performed by CytoHubba plug-in in the Cytoscape software (version 3.7.1), and the most significant modules were identified based on the score and node number.

| Verification of DEGs using GEO database and clinical tissue samples
To verify the prognostic DEGs from TCGA, we downloaded the gene expression and prognostic data from GSE17536 and GSE17537 data sets using PrognoScan online tool. 19 To further confirm the reliability of the prognostic DEGs, we detected the antibody-based protein expression data in normal tissues and tumour tissues from The Human Protein Atlas database (HPA, www. prote inatl as.org.).

K E Y W O R D S
ESTIMATE algorithm, overall survival, prognosis, rectal cancer, tumour microenvironment

| Expression of hub genes in cell types
The Single Cell Type Atlas part in HPA showed the expression of protein-coding genes in single human cell types, and the number of genes detected in cell types. The mRNA and protein levels of hub genes expression in cell types were evaluated using this tool.

| Statistical analysis
The Kaplan-Meier survival analyses was used to illuminate correlations between expression of DEGs and the OS of rectal cancer, and identify prognostic DEGs in overall survival. Univariate analyses were performed between clinical characteristics and stromal/ immune scores. All statistical tests were done with R (version 3.6.2). P < .05 was regarded as statistically significant.

| Stromal and Immune Scores of the Patients
The gene expression profiles and clinical information of 83 rectal cancer patients were downloaded from TCGA database. Based on the ESTIMATE algorithm, stromal score ranges from −1979.57 to 1,522.96, and immune score ranges from −656.67 to 2,102.23 (Table 1).
To explore the association between OS and immune or stromal scores, we classified the 83 rectal cancer patients into high-and lowscore groups based on the median of stromal scores (−636.30) and immune scores (268.92). Kaplan-Meier analysis was performed and the survival curves showed that patients in the high-immune score group had a better prognosis than those in the low-immune score group (P = .034, Figure 1A). However, there were no statistical differences between high-stromal score group and low-stromal score group (P = .316, Figure 1B).
In addition, we analysed the relationship between immune or stromal scores and clinical characteristics. We found that lowimmune score was associated with M1 (vs. M0, P = .031, Figure 2A), and lymphatic invasion (+ vs. -, P < .001, Figure 2B), indicating that lower immune scores indicated the advanced rectal cancer stage.
However, there were no evidence to support significant correlation between stromal/ immune scores and T status, N status, CEA value, venous or perineural invasion. ( Figure S1, P >.05).

| Identification of DEGs
To determine the DEGs associated with TME of rectal cancer, we

TA B L E 1 (Continued)
F I G U R E 1 Overall survival curves obtained by the Kaplan-Meier method describing the correlation between immune scores or stromal scores and prognosis of patients. A, Immune score was significantly associated with overall survival (P = .034). B, There was no significantly correlation between stromal score and overall survival (P = .316). Horizontal and vertical axes represent survival times and survival rates, respectively. Red and blue curves represent high and low score group, respectively

| GO function and KEGG pathway enrichment analyses
The ( Figure 5A). The KEGG analysis indicated that these DEGs were enriched in cytokine-cytokine receptor interaction and chemokine signalling pathways. (Figure 5B).

| Survival analysis of the DEGs
To explore the prognostic value of 540 DEGs, we performed the Kaplan-Meier survival analysis. Among the 540 DEGs, a total of 60 DEGs (P < .05, Table S1) were significantly associated with OS ( Figure 6), and all the genes were up-regulated DEGs.

| Module analysis from the PPI network
To further explore the interaction of the prognostic DEGs and the mechanisms underlying the rectal cancer development, we utilized the STRING online database and Cytoscape software to analyse these DEGs and construct a PPI network, which contains 40 nodes and 166

| Validation of prognostic DEGs using the GEO database
To verify the DEGs from TCGA database, we downloaded the gene expression and prognostic information from the GSE17536 and prognostic genes were verified, which may have potential value for diagnosis and treatment of rectum cancer (Figure 8).

| Verification of Prognostic DEGs using clinical tissue samples
To verify the reliability of the DEGs with prognostic values, we detected the protein expression of 19 genes in normal tissues and tumour tissues by HPA. The results showed that 5 proteins (ADAM23, ARHGAP20, ICOS, IRF4, MMRN1) were significantly different in tumour tissues compared with normal tissues ( Figure 9).

| Expression of hub genes in cell types
The mRNA and protein levels of the 5 hub genes expression in cell types were evaluated, and the results showed that expression was predominantly found in blood and immune cells, mesenchymal cells, endocrine and germ cells cell types ( Figure 10). Flow diagram of this study was listed in Figure S2.

| D ISCUSS I ON
In this study, we explored the level of stromal/immune cells and the relation between stromal/immune cells and overall survival of rectal cancer by the ESTIMATE algorithm and TCGA database.

F I G U R E 1 0 The expression of hub genes in cell types
Firstly, we found patients with high-immune score had a better OS, which may due to the positive correlation between the higher immune score and less tumour cell. In addition, the lower immune score was significantly related to the M1 stage and lymphatic invasion, indicating that lower immune scores indicated the advanced cancer stage and worse prognosis. These results were similar to previous studies, which have demonstrated that lower immune scores were significantly associated with poor overall survival in adrenocortical carcinoma, 27 osteosarcoma 32 and gastric cancer. 33 However, some studies found that a higher immune score indicated a worse OS in clear cell renal cell cancer, 16,17,25 and acute myeloid leukaemia. 34 The underlying mechanism was not clear and required more explorations.
Then A disintegrin and metalloprotease 23 (ADAM23), a member of the ADAM family, is considered a possible tumour suppressor gene, 36  There are some limitations in this study. Firstly, the selection bias could not be excluded because all data were gathered from TCGA and GEO databases. Secondly, there was no experimental research to examine the functions of DEGs. Thus, further validation is needed to testify the discovery of this research.
In summary, we found that stromal and immune scores were highly associated with the clinical outcome of rectal cancer by ESTIMATE algorithm and TCGA database. In addition, we identified 5 microenvironment-related genes which could be useful for outlining the prognosis of rectal cancer. The results could contribute to the search for rectal cancer biomarkers and therapeutic targets.

CO N FLI C T S O F I NTE R E S T
There are no conflicts of interest.

DATA AVA I L A B I L I T Y S TAT E M E N T
The data that support the findings of this study are available from the corresponding author upon reasonable request.