A mutational signature for colorectal cancer prognosis prediction: Associated with immune cell infiltration

Dear Editor,
Colorectal cancer (CRC) is considered a genetic disease that arises from the stepwise accumulation of genetic and epigenetic alterations.1,2 We found a novel mutational signature (MS) that could help clinicians select patients who are more suitable for immunotherapy; the risk score (RS) combined with pathological TNM stage could provide comprehensive and precise prognostic information for CRC patients. To explore the genomic basis of tumor variability in the tumor microenvironment of CRC, we integrated single nucleotide variation (SNV) and transcriptome data and collected information from 1133 and 588 CRC patients of the Memorial Sloan Kettering Cancer Center (MSKCC) and The Cancer Genome Atlas (TCGA) databases, respectively. In the training (MSKCC) cohort, we identified an MS consisting of 27 genomic variant genes and generated a prognostic model (Figures 1A and 1B). The data showed that the high-risk group had poorer overall survival (OS), which was verified in both the MSKCC and TCGA cohorts (Figures 1C and 1D). Kaplan-Meier survival curves and ROC curves were used to evaluate the predictive power of the model with the R packages "survival" and "survivalROC." The ROC curve indicated that the classifier had good predictive ability (Figure 1E). Univariate and multivariate analyses revealed that the MS is an independent, unfavorable prognostic factor for CRC (Figures 1F and 1G; Table S1).
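To make the survival comparison concrete, the Kaplan-Meier estimate behind curves such as those in Figures 1C and 1D can be sketched in a few lines. This is a minimal illustration in plain Python rather than the R "survival" package used in the study; the function name `kaplan_meier` and the follow-up times are hypothetical toy values, not data from the MSKCC or TCGA cohorts.

```python
from collections import Counter

def kaplan_meier(times, events):
    """Return [(t, S(t))] at each distinct time with an observed death.

    times  -- follow-up time per patient
    events -- 1 if death was observed, 0 if the patient was censored
    """
    deaths = Counter(t for t, e in zip(times, events) if e == 1)
    exits = Counter(times)   # patients leaving the risk set at each time
    at_risk = len(times)
    survival = 1.0
    curve = []
    for t in sorted(exits):
        d = deaths.get(t, 0)
        if d > 0:
            # Product-limit step: multiply by the fraction surviving time t.
            survival *= 1.0 - d / at_risk
            curve.append((t, survival))
        at_risk -= exits[t]  # deaths and censored patients both leave
    return curve

# Hypothetical toy follow-up data in months (assumption, not study data).
high_risk = kaplan_meier([5, 8, 12, 12, 20], [1, 1, 1, 0, 1])
low_risk = kaplan_meier([10, 24, 30, 36, 40], [0, 1, 0, 0, 1])

for label, curve in (("high-risk", high_risk), ("low-risk", low_risk)):
    print(label, [(t, round(s, 2)) for t, s in curve])
```

In this toy example the high-risk curve drops earlier than the low-risk curve, which is the qualitative pattern the letter reports; a formal comparison would additionally need a log-rank test.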
A flowchart is shown in Figure S1. The coefficients of the 27 mutated genes are shown in Table S4. In the training cohort, a nomogram was generated to predict the OS of CRC patients (Figure S2A). The predictors included tumor location, M stage, TNM stage, and RS, among which the RS had the highest C-index (Figures S2B and S2C). The clinical characteristics of CRC patients are listed in Tables S2 and S3.
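The following sketch shows one common way a mutation-based RS and its C-index could be computed. It assumes that the RS is a Cox-style linear sum of the coefficients of the signature genes mutated in a patient and that patients are split at the median score; the letter does not state either detail. The gene names, coefficients, and patient records are hypothetical placeholders, since the actual coefficients are in Table S4.

```python
from statistics import median

# Hypothetical placeholder coefficients (assumption; see Table S4 for real ones).
COEFFICIENTS = {"GENE_A": -0.4, "GENE_B": 0.7, "GENE_C": -0.2}

def risk_score(mutated_genes, coefficients=COEFFICIENTS):
    """Sum the coefficients of the signature genes mutated in one patient."""
    return sum(c for gene, c in coefficients.items() if gene in mutated_genes)

def split_by_median(scores):
    """Label each patient high or low risk (median cutoff is an assumption)."""
    cutoff = median(scores.values())
    return {pid: "high" if s > cutoff else "low" for pid, s in scores.items()}

def concordance_index(scores, times, events):
    """Harrell's C: share of comparable pairs ranked correctly by the score."""
    concordant, comparable = 0.0, 0
    for i in range(len(scores)):
        for j in range(len(scores)):
            # A pair is comparable when patient i died before time j.
            if events[i] == 1 and times[i] < times[j]:
                comparable += 1
                if scores[i] > scores[j]:
                    concordant += 1.0
                elif scores[i] == scores[j]:
                    concordant += 0.5
    return concordant / comparable

# Hypothetical patients: mutated signature genes, follow-up (months), event flag.
patients = {
    "P1": ({"GENE_A", "GENE_C"}, 40, 0),
    "P2": ({"GENE_B"}, 6, 1),
    "P3": (set(), 10, 1),
    "P4": ({"GENE_A", "GENE_B"}, 12, 1),
}
scores = {pid: risk_score(genes) for pid, (genes, _, _) in patients.items()}
groups = split_by_median(scores)
c_index = concordance_index(
    [scores[p] for p in patients],
    [patients[p][1] for p in patients],
    [patients[p][2] for p in patients],
)
print(groups, round(c_index, 3))
```

A C-index of 0.5 corresponds to random ranking and 1.0 to perfect ranking, which is the scale on which the RS is compared with the other nomogram predictors.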
To explore the differences in genomic alterations between these two groups, we analyzed somatic mutation data from the TCGA database. First, the analysis revealed a significant enrichment of different mutations between the low- and high-risk groups (Figure 2A). The data showed that more than 90% of CRC in the low-risk group had more [...] (Figure 2B), which means that most of the genes carried more mutations in the low-risk group. In addition, the low- and high-risk groups had different distributions of the top 10 mutated genes (Figure 2C). Significant enrichment of oncogenic alterations in genes such as BRAF, ZFHX3, and MTOR was found in right-sided tumors and in patients with microsatellite instability (MSI), while oncogenic alteration of APC (adenomatous polyposis coli) was primarily found in left-sided tumors and in patients with microsatellite stability (MSS) (Figure 2D). All results were consistent in the validation cohort (Figures 2E-2H; Figure S5).
MSI status is critical when considering immunotherapy and chemotherapeutic drugs as options for CRC patients.3,4 The RS was significantly associated with MSI/dMMR status and other clinical features (Figures 3A and 3B; Figure S6). In line with previous observations, MSI was more common in the low-risk group (Figure 3C). Furthermore, we observed that the low-risk group exhibited a higher number of mutations (Figures 3D-3F). Because of this hypermutation or high mutational load, these patients might have more neoantigens, leading to increased immune infiltration, and thus might be more sensitive to immunotherapy. We speculate that there is a potential connection between the MS model and the immune environment.
To further clarify the relationship between the MS and immune phenotyping, we analyzed SNV and transcriptome data in the TCGA database. Immune activity was determined by analyzing 29 immune-related gene sets using ssGSEA.5 A heatmap of the infiltration levels and scores of each sample of immune cells in the three subtypes is shown in Figure 4A. A higher expression level of the PD-L1 gene was found in the immunity-H cluster, and the immunity-H cluster was correlated with better survival outcome (Figures 4B and S5A-S5D). The immunity-H cluster was significantly enriched in the low-risk group (Figure 4E). To reconfirm these findings, we also performed consensus molecular subgroups (CMS) classification,6 which offers a more profound biological insight into immunity typing and has a strong prognostic effect. PD-L1 was highly expressed in CMS1, which is defined by upregulation of immune genes and is associated with MSI-H. The data suggested that the CMS1 cluster was significantly enriched in the low-risk group (Figures 4D and 4F) and that the CMS1 cluster is more likely to benefit from PD-L1 inhibitor treatment.
To further investigate the potential predictive value of the MS for immune status, we examined the possible associations between immune status and the RS. The TMB score was negatively correlated with the RS, whereas TMB was positively correlated with the immune score (Figures S6A and S6B). Therefore, we postulated that the RS was negatively associated with the immune score. We found that a low immune score was related to worse OS (Figure S3C), and the low-risk group may also be associated with better survival. To determine the composition of infiltrating immune cells in the defined risk groups, we analyzed the expression signature matrix of 22 infiltrating immune cell types in tumor samples using CIBERSORT (Figures S3D-S3G). Regarding tumor-infiltrating immune cells in the CRC microenvironment, the number of CD4 memory T cells decreased and the number of M0 macrophages increased in the high-risk group (Figures S6D-S6G).
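The correlation reasoning above can be illustrated with a short rank-correlation sketch. The letter does not say which correlation coefficient was used, so Spearman's rank correlation is an assumption here, and the TMB, RS, and immune score values are hypothetical toy numbers, not study data.

```python
def ranks(values):
    """Return 1-based ranks; tied values share their mean rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    result = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        mean_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            result[order[k]] = mean_rank
        i = j + 1
    return result

def pearson(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    return cov / (var_x * var_y) ** 0.5

def spearman(x, y):
    """Spearman's rho: Pearson correlation computed on the ranks."""
    return pearson(ranks(x), ranks(y))

# Hypothetical per-patient values (assumption, for illustration only).
tmb = [2, 5, 9, 40, 55]
rs = [0.8, 0.5, 0.6, -0.3, -0.9]
immune_score = [100, 300, 250, 900, 1200]

rho_tmb_rs = spearman(tmb, rs)
rho_tmb_immune = spearman(tmb, immune_score)
rho_rs_immune = spearman(rs, immune_score)
print(round(rho_tmb_rs, 2), round(rho_tmb_immune, 2), round(rho_rs_immune, 2))
```

Computing the RS versus immune score correlation directly, as in the last call, is a simple way to check the postulated negative association on real data instead of inferring it from the two TMB correlations alone.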
Immunotherapy has emerged as a novel, effective treatment for CRC; however, current guidelines based only on the TNM stage cannot reflect the host immune response.7,8 In the clinic, MSI-H is an especially good indicator for checkpoint blockade immunotherapy in CRC, but only about 45% of MSI-H CRC patients benefit from immunotherapy.9 In our prediction model, we identified a novel MS that yields a prognostic tool to effectively classify CRC patients with different OS risks. Moreover, the MS classifier can be used to predict which patients are more suitable for immunotherapy, and a nomogram comprising the MS could help medical staff direct personalized treatment selection for CRC patients.

CONFLICT OF INTEREST
The authors declare that there is no conflict of interest that could be perceived as prejudicing the impartiality of the research reported.

AUTHOR CONTRIBUTIONS
Study design: Ping Lan, Xiaosheng He, and Liang Xu. Literature research: Yanyun Lin and Xijie Chen. Data acquisition: Liang Xu, Yanyun Lin, and Xijie Chen. Data analysis/interpretation: Yanyun Lin and Guanman Li. Statistical analysis: Xijie Chen and Zengjie Chi. Manuscript preparation: Liang Xu and Bin Zheng. Manuscript definition of intellectual content: Lisheng Zheng and Bin Zheng. Manuscript editing: Yufeng Cheng, Jiancong Hu, Shuang Guo, and Danling Liu.

DATA ACCESS, RESPONSIBILITY, AND ANALYSIS
All data generated or analyzed during the present study are available via the corresponding author on reasonable request.