Characterization of the gene signature correlated with favorable response to chemoradiotherapy in rectal cancer: A hypothesis‐generating study

Abstract Purpose This study aimed to define the gene signature associated with response to neoadjuvant chemoradiotherapy (nCRT), or chemoradiosensitivity (CRS) signature, in rectal cancer, and investigate the correlation between the CRS signature and characteristics of tumor. Materials and Methods Three public microarray datasets of pre‐nCRT rectal cancer were used to discover and validate the CRS signature, and the pathway analysis of the CRS signature was performed. Patients in The Cancer Genome Atlas (TCGA) dataset were stratified according to the CRS signature enrichment score, and mutational profile and proportions of infiltrated immune cells were compared. Results In the discovery dataset (GSE53781), 95 genes were upregulated in complete responders compared to non‐complete responders and defined as the CRS signature. Pathways regarding DNA replication and repair processes as well as inflammatory response were enriched in the CRS signature. In the validation datasets (GSE35452 and GSE45404), patients with favorable response to nCRT exhibited higher enrichment score of the CRS. In TCGA‐READ cohort, patients with high CRS signature harbored KRAS mutation in lower frequency than those with low CRS signature. In addition, proportions of proinflammatory immune cells were higher, but proportion of immunosuppressive M2 macrophages was lower in patients with high CRS signature than those with low CRS signature. Conclusions The current integrative bioinformatic analysis suggests the CRS signature and showed that the CRS signature is associated with dissimilar mutational profile and increased immune response. The discovered CRS signature and related characteristics may serve as candidate of stratification factor in upcoming studies for rectal cancer.


| INTRODUCTION
Neoadjuvant chemoradiotherapy (nCRT) followed by surgical resection is the mainstay of treatment for locally advanced rectal cancer (LARC). 1 Pathologic response of tumor to nCRT is closely associated with the prognosis of patients; 2 poor responders to nCRT demonstrate significantly higher rate of locoregional relapse and distant metastasis compared to good responders. On contrary, most patients with complete response to nCRT are free from recurrence, and studies showed that omission of surgical resection may be safe in terms of oncologic outcomes for the patients with clinical complete response. 3,4 In this context, biomarkers of response to nCRT in rectal cancer are under active investigation.
Several studies reported gene signatures of tumor that are associated with response to nCRT. [5][6][7][8][9] However, the results are not consistent, leaving the characteristics of nCRT-sensitive or -resistant rectal cancer obscure. Furthermore, most of the previous studies have suggested gene signatures including only a handful of genes to elevate the applicability of the signatures, so that it was difficult to draw biological implication from the gene signatures. Besides gene expression, mutation profiles of tumor are also associated with radiosensitivity. 10,11 In addition, accumulating evidence implies that tumor microenvironment, including infiltrating immune cells, is another determinant of radiation response. [12][13][14][15] Although the association between the response to nCRT and various characteristics of tumor or its microenvironment has been extensively explored, none of the features are being used as biomarkers in clinical practice. We hypothesized that studying the relationship between various biological features of tumor in association with response to nCRT would reveal novel insight into radiosensitivity of rectal cancer that would lead to the clinical application.
Here, we analyzed the public datasets of rectal cancer patients who underwent nCRT. We identified the chemoradiosensitivity (CRS) signature and validated the signature in the independent datasets. Additionally, we characterized the tumor mutation profiles and immune cell infiltration according to the enrichment of the CRS signature and identified novel features of nCRT-sensitive rectal cancer.

| Public datasets
Publicly available microarray datasets, of which processed matrices were available, were selected for analysis in this study. Preprocessed matrices of three datasets (GSE53781, 8 GSE35452, 16 and GSE45404 17 ) were downloaded from the Gene Expression Omnibus database (http://www.ncbi. nlm.nih.gov/geo). The preprocessed matrices included the normalized expression values for each probe. Then, the probe sets of CodeLink Human Whole Genome Array and Affymetrix Human Genome U133 Plus 2.0 Array were mapped to gene symbols using R packages hwgcodSYM-BOL and hgu133plus2SYMBOL, respectively. For probe sets representing a single gene, the probes with the highest expression value were selected. Additional correction was not used because the two microarray platforms were reported to be highly correlated. 18 All patients in the datasets underwent concurrent 5-fluorouracil (5-FU)-based chemotherapy with radiotherapy. The characteristics of the three datasets are detailed in Table 1. Additionally, mRNA-sequencing (mRNA-seq) data of rectal adenocarcinoma in The Cancer Genome Atlas (TCGA-READ) database was also analyzed. The clinical information and RSEM values of total transcripts were downloaded from the Broad Institute GDAC Firehose (https://gdac.broad insti tute.org).

| Definition and characterization of gene signature
Based on the normalized expression, the empirical Bayes method was used to perform differential expression analysis in an R package limma. Differentially expressed T A B L E 1 Details of public microarray datasets analyzed in the study genes (DEGs) were determined as log2 fold change >2 and false discovery rate <0.1. The genes upregulated in tumor of complete responders were defined as the gene signature of complete responder, or CRS signature. Enrichment score of the gene signature in each sample was calculated using gene set variation analysis (GSVA), with the combined z-score method and Gaussian kernel. The pathway analysis of the gene signatures was performed using gene list enrichment analysis tool, Enrichr; the Kyoto Encyclopedia of Genes and Genomes (KEGG) pathways and Gene ontology pathways were analyzed. The combined score was calculated as a parameter of enrichment as the log (p-value) multiplied by the z-score from the Fisher exact test.

| TCGA dataset analysis
Among the patients in TCGA-READ dataset, patients without mRNA-seq data or mutation profiles were excluded from the analysis. Additionally, unlike the microarray datasets, patients who received neoadjuvant therapy were excluded since neoadjuvant therapy may alter the baseline characteristics of tumor. Finally, 85 patients were available for the analysis. The patients were divided into two groups, patients with low and high CRS signature, dichotomized using the median enrichment score. The data regarding tumor mutational burden were downloaded from the previous publication. 19 Composition of tumorinfiltrating immune cells was estimated using a deconvolution algorithm CIBERSORT. 20

| Statistical analysis
The continuous variables and categorical variables between two groups were compared using Student's t-test and chi-squared test, respectively. AUC values of the CRS signature scores were calculated using an R package pROC. The Kaplan-Meier curves of survival were compared using Cox-proportional hazard model. p-value <0.05 was considered statistically significant. All statistical analyses were performed with R software v.4.1.0 (https://www.r-proje ct.org).

| Discovery and biological characterization of the CRS signature
We first discovered the gene signature of good response in the dataset of GSE53781 that stratified patients using 5-tier tumor regression grade (TRG). Among the spectrum of tumor response, it is believed that tumors showing complete response would exhibit the features of good responding tumors in the greatest extent. Therefore, we compared the gene expression profiles between complete responders (TRG = 1) and non-complete responders (TRG ≥ 2). We identified 95 upregulated DEGs in complete responders ( Figure 1A and Table S1) and defined them as the CRS signature. When we broadened the definition of responder (TRG ≤ 2), enrichment score of the CRS signature was significantly higher in the responders than nonresponders (p = 0.027; Figure 1B). Furthermore, the CRS signature score predicted the response to nCRT with AUC value of 0.794 (p = 0.002; Figure S1). To characterize the biological process associated with good response, we carried out pathway analysis of and KEGG pathways in the Gene ontology ( Figure 1C; Tables S2 and S3). We found that DNA replication and repair processes were highly enriched in the CRS signature. Signaling pathways associated with inflammatory response, including IL-17 (p = 0.010) and TNF signaling pathways (p = 0.016), were also enriched in the CRS signature. In addition, Notch signaling pathway (p = 0.033) was over-represented in the CRS signature. Collectively, the CRS signature provided insights into the biological features of rectal cancer that exhibits good response to nCRT.

| Validation of the CRS signature in independent datasets
To validate the predictive role of the CRS signature in independent datasets, we used two public microarray datasets (GSE35452 and GSE45404). In GSE35452, the responders had significantly higher enrichment score of the CRS signature compared to non-responders (p = 0.032, Figure 2A). Similarly, responders showed greater enrichment of the CRS signature than non-responders with marginal statistical significance in GSE45404 (p = 0.061; Figure 2B). Moreover, the AUC values for predictability of response using enrichment score of CRS signature were 0.674 (p = 0.03) and 0.675 (p = 0.04) in GSE35452 and GSE45404 datasets, respectively. Together, these results indicate that the discovered CRS signature may work as a universal gene signature to predict response of rectal cancer to nCRT.

| Association between the CRS signature and mutation profiles of cancer
To obtain more insight into the characteristics of tumor according to the enrichment of the CRS signature, we analyzed the 85 patients in TCGA-READ dataset who did not undergo neoadjuvant therapy. Patients were stratified by the enrichment score of the CRS signature. The baseline characteristics of patients are detailed in Table 2. Patients with high CRS signature were significantly younger than those with low CRS signature (60.2 vs. 65.7 years, p = 0.042). Stage of disease was not significantly different between the two groups. The progressionfree survival (p = 0.27) and overall survival (p = 0.07) were not significantly different according to the CRS signature ( Figure S2), suggesting that the signature is predictive rather than prognostic.
We found that total mutation count (466.1 vs. 141.7, p = 0.23) and tumor mutational burden (15.4 vs. 4.7 per 10 6 bp, p = 0.23) were numerically higher in patients with high CRS signature compared to those with low CRS Next, we compared the mutation profiles of rectal cancer according to the enrichment of the CRS signature in TCGA database. We analyzed the 30 most commonly mutated genes in the dataset, including APC, TP53, KRAS, PI3KCA, SMAD4, and NRAS. Among the analyzed genes, KRAS mutation was less frequent in patients with high CRS signature (23.8% vs. 48.8%, p = 0.030) than those with low CRS signature. In patients with high CRS signature, G13D mutation (3/10) was the most common, followed by G12V mutation (2/10). On the other hand, G12V mutation (8/21) was the most common oncogenic mutation in the patients with low CRS signature, followed by G12D (5/21) and G13D (4/21) mutations. No patients in the high CRS signature group harbored KRAS G12D mutation. In addition, patients with high CRS signature were more likely to harbor APC mutation (95.2% vs. 79.1%, p = 0.058) with marginal statistical significance. Mutation of other genes were not significantly different between the two groups. Comparisons on somatic mutation profiles between the two groups are summarized on Table 3.

| Association between the CRS signature and immune cell infiltration into cancer
We next examined the proportions of tumor-infiltrating immune cells by deconvolution of RNA-seq data in TCGA dataset (Table 4). Activated NK cells were significantly more frequent in tumor with high CRS signature compared to low CRS signature (2.80% vs. 1.48%, p = 0.038; Figure 3A). The proportion of activated memory CD4 + T cells was marginally higher in tumor with high CRS signature compared to low CRS signature (2.29% vs. 1.26%, p = 0.075; Figure 3B). Activated dendritic cells were also significantly more abundant in tumor with high CRS signature than low CRS signature (1.20% vs. 0.40%, p = 0.011; Figure 3C). On contrary, the proportion of immunosuppressive M2 macrophages was significantly lower in tumor with high CRS signature than low CRS signature (17.91% vs. 22.38%, p = 0.026; Figure 3D). Other immune cell subsets, including cytotoxic CD8 + T cells and regulatory CD4 + T cells, populated similarly in the two groups.

| DISCUSSION
This bioinformatic analysis of public database showed the correlation between the gene signature associated with response to nCRT in multiple datasets and various biological characteristics of rectal cancer. We identified that somatic mutation of specific genes and proportion of infiltrating immune cells are associated with response to nCRT in rectal cancer. To the best of our knowledge, this is the first study to comprehensively analyze the factors, including expression of specific pathways, somatic mutations of cancer, and infiltration of immune cells, associated with the response of rectal cancer to nCRT. Whether the response of rectal cancer to nCRT is associated with the somatic mutations of KRAS or APC and infiltration of activated CD4 + T cells, NK cells, dendritic cells, or M2 macrophages should be verified in future investigations. T A B L E 3 Mutation profile of patients in TCGA-READ database with high and low CRS signature Our CRS signature included several genes that were reported in the previous studies. Along with GNG4, MYC, and RRM1, which were included in the four-gene signature derived from GSE53781, the discovery dataset of our study, BRCA1, which was included in the fourgene signature in the previous study, 21 was included the CRS signature. The characterization of the signature validated in independent datasets showed that nCRTsensitive tumors are enriched for pathways associated with DNA replication and repair processes. It is well accepted that DNA repair pathways are associated with cancer therapeutic resistance, including radiation and 5-FU, 22,23 which is opposite to our findings. However, it can be postulated that downregulation of DNA repair  pathways may confer increased somatic mutations, some of which are associated with sensitivity to nCRT, supported by the numerically higher mutation load in patients with high CRS signature. In addition, the genes involved in Notch signaling pathway were enriched in radiosensitive rectal cancer. We have previously shown the association between the genomic alteration of Notch signaling pathway-associated genes and response to radiation in various cancer types. 24,25 Furthermore, the association between sensitivity to 5-FU and Notch signaling pathway has been also proposed. 26 Further studies are needed to examine the association between various molecular pathways and sensitivity to nCRT of rectal cancer.
Our results showed that somatic mutation of KRAS may be associated with low likelihood of good response to nCRT consistent with the results from previous studies. 10,11 The KRAS G12D mutation was exclusively observed in the low CRS signature group. Moreover, the most common mutation type was KRAS G13D in high CRS signature group, while it was KRAS G12V in low CRS signature group. Accumulating evidence indicates the distinct therapeutic responses of tumor with specific KRAS mutation 27 ; for example, KRAS G13D mutation is associated with sensitivity to anti-EGFR therapies. 28 The association between KRAS G12D mutation and low CRS signature in current study may have stemmed from the limited response to chemotherapy in KRAS G12D mutation harboring colorectal cancer, 29 although the relationship between specific KRAS mutation status and response to radiotherapy is largely unknown. The previous works also showed that mutation of TP53 was associated with radioresistance of rectal cancer, which was not demonstrated in TCGA dataset. Additionally, somatic mutation of APC was marginally associated with chemoradiosensitivity of rectal cancer. No preclinical or human studies have shown the association between somatic mutation of APC in cancer cells and their sensitivity to nCRT. The effect of different KRAS mutations and APC mutation on sensitivity to nCRT should be investigated in the future studies.
There have been several reports that showed the association between infiltrating immune cells and response to nCRT in rectal cancer. Previous works showed that increased infiltrating of T cells, especially CD8 + T cells, is associated with favorable response to nCRT. [13][14][15] Our results did not exhibit significant difference of CD8 + T cell infiltration in patients with high and low CRS signature; however, higher proportions of activated memory CD4 + T cells, activated NK cells, and activated dendritic cells, which are thought to play anti-tumor role in tumor microenvironment, were observed in patients with high CRS signature. Conversely, immunosuppressive M2 macrophages were less frequent in patients with high CRS signature, indicating that anti-tumor response against tumor cells may be associated with sensitivity of rectal cancer to nCRT. Enrichment of inflammatory pathways in CRS signature further supports these findings. CD4 + T cells consist of various subsets, and a previous study showed that higher Th1 subset and lower Th17 subset are associated with favorable prognosis in colorectal cancer. 30 Therefore, deeper characterization of the immune cells in regard to tumor response is necessary.
Current analysis is not free of innate limitations. Because patients in all three microarray datasets underwent concurrent 5-FU-based chemotherapy, we were unable to separate the effect of 5-FU and radiotherapy. Furthermore, although enrichment of the CRS signature was validated in two independent datasets, the signature was derived from the upregulated genes in complete responders; however, the validation datasets did not provide the information regarding the complete response. Since the signature may be enriched only in tumor with remarkable chemoradiosensitivity, the signature should be compared in patients who did and did not experience complete response. In addition, the deconvolution method has limited ability to estimate the proportion of immune cells, because it is based on the reference gene expression profiles, which could deviate in cells within tumor microenvironment. 20 Accurate enumeration of immune cells should be conducted to validate the results of current study. The stratification of patients in TCGA database was performed according to the CRS signature, not the actual response to nCRT. Although the signature was validated in the independent datasets, the stratification based on the signature may bring about potential bias; thus, the results should be carefully interpreted. Therefore, it might be helpful to verify the CRS signature and its association with other characteristics using primary multiomic data.
In conclusion, this integrative bioinformatic analysis provided novel biological insight into the response of rectal cancer to nCRT. We believe that the new features may be utilized in combination with other previously described biomarkers to predict the response. Validation of these findings and investigation on the mechanisms underlying the association should be conducted in future works.