Establishment and experimental validation of an immune miRNA signature for assessing prognosis and immune landscape of patients with colorectal cancer

Abstract As essential regulators of gene expression, miRNAs are engaged in the initiation and progression of colorectal cancer (CRC), including antitumour immune response. In this study, we proposed an integrated algorithm, ImmuMiRNA, for identifying miRNA modulators of immune‐associated pathways. Based on these immune‐associated miRNAs, we applied the LASSO algorithm to develop a reliable and individualized signature for evaluating overall survival (OS) and inflammatory landscape of CRC patients. An external public data set and qRT‐PCR data from 40 samples were further utilized to validate this signature. As a result, an immune‐associated miRNA prognostic signature (IAMIPS) consisting of three miRNAs (miR‐194‐3P, miR‐216a‐5p and miR‐3677‐3p) was established and validated. Patients in the high‐risk group possessed worse OS. After stratification for clinical factors, the signature remained a powerful independent predictor for OS. IAMIPS displayed much better accuracy than the traditional clinical stage in assessing the prognosis of CRC. Further analysis revealed that patients in the high‐risk group were characterized by inflammatory response, abundance immune cell infiltration, and higher immune checkpoint profiles and tumour mutation burden (TMB). In conclusion, the IAMIPS is highly predictive of OS in patients with CRC, which may serve as a powerful prognostic tool to further optimize immunotherapies for cancer.

radiotherapy can benefit inoperable patients in clinical practice. 2 Though recent advances in various treatments, the overall survival (OS) of CRC remains unsatisfactory. 3 More complicated, owing to the high heterogeneity of CRC, it is also challenging to predict prognosis and make clinical decision individually.
Over the past decade, immunotherapy has illustrated tremendous sensation owing to its remarkable efficacy in the treatment of solid tumours, such as melanoma, non-small-cell lung cancer, gastric carcinoma, head and neck squamous cell carcinoma, renal cell carcinoma and CRC. 4,5 Immune checkpoint inhibitors (ICIs) aim to help the immune system recognize and attack cancer cells by acting on the primary targets including programmed death-ligand 1 (PD-L1), programmed death 1 (PD-1) and cytotoxic T-lymphocyteassociated protein 4 (CTLA-4). 6 In CRC, ICI therapy was approved in 2017 for the treatment of patients with DNA mismatch repair deficient (dMMR) or advanced microsatellite instability (MSI).
However, only a subset of patients with CRC could benefit from immunotherapy. 7 Therefore, it is imperative to pursue reliable biomarkers that are competent to accurately assess immunotherapy response.
MiRNAs are a series of small non-coding single-stranded RNA suggests that miRNA-PD-L1 axis might be a therapeutic target for CRC. 11 Therefore, miRNAs have great implications for the prognosis and immunotherapy.
Considering the significant implications of miRNAs on the prognosis and immunotherapy of CRC, we proposed an integrated algorithm, ImmuMiRNA, for identifying miRNA modulators of immune-associated pathways. Based on these identified immuneassociated miRNAs, the LASSO algorithm was applied to establish a risk signature for evaluating OS of CRC patients. As a result, an immune-associated miRNA prognostic signature (IAMIPS) consisting of three miRNAs was established and further validated in an external public data set and qRT-PCR data from 40 samples. We also investigated the immune landscape, immune checkpoint profiles and tumour mutation burden (TMB) of the signature. Initial construction of an IAMIPS for patients with CRC will facilitate the complex underlying mechanisms between immune-associated miRNAs and prognosis of CRC and may advance optimize immunotherapies for patients with CRC.

| Public data set collection
The overall workflow of our study is displayed in Figure 1A. The CRC data (n = 689) were enrolled from The Cancer Genome Atlas (TCGA) cohorts TCGA-COAD (colon adenocarcinoma) and TCGA-READ (rectum adenocarcinoma). 'Level 3' transcriptome profile (RNA-Seq raw read count) and clinical information were retrieved from TCGA data portal (https://portal.gdc.cancer.gov/). Patients from TCGA were defined as TCGA-CRC cohort. Human microRNA array GSE29622 including 65 CRC patients was extracted from Gene Expression Omnibus (GEO, http://www.ncbi.nlm.nih.gov/geo/). The normalized matrix file including miRNA expression profile and clinical information was directly downloaded. Patients were excluded if they 1) lacked mRNA or miRNA sequencing data; 2) did not have prognostic information; and 3) received neo-adjuvant therapy. Detailed baseline data of CRC patients are displayed in Table S1.

| Genome-wide mRNA and miRNA expression in TCGA
The mRNA and miRNA raw read count from TCGA database was converted to transcripts per kilobase million (TPM) and reads per kilobase million (RPM), respectively. A further log2 transformation was performed because RNA-seq data are often heavily right-skewed in the linear scale. The GENCODE (https://www.genco degen es.org/) and miRBase (http://www.mirba se.org/) were utilized to mRNA and miRNA annotations, respectively. The mRNAs and miRNAs with zero reads >50% of the samples were further excluded. In total, 16 985 mRNAs and 674 miRNAs were encompassed.

| ImmuMiRNA: identification of immuneassociated miRNAs in CRC
To identify the latent miRNA modulators of immune-associated pathways, we introduced an integrated algorithm that combines miRNA and gene expression data similar to ImmLnc. 15 In short, all mRNAs were ranked by their correlation with a specific miRNA. The For each specific miRNA, we first ranked all mRNAs based on the correlation of their expression with this miRNA. The expression of miRNA i and gene j across n patients was labelled as M(i) = (m 1 , m 2 , …, m n ) and G(j) = (g 1 , g2, …, g n ), respectively. The tumour purity scores across n patients were labelled as P = (p 1 , p 2 , …, p n ). We first calculated the partial correlation coefficient (PCC) between the expression of miRNA i and gene j by controlling the tumour purity as a covariable, where R MG , R MP and R GP are the correlation coefficients between the expression of miRNA i and protein-coding gene j, the expression of miRNA i and tumour purity, and the expression of gene j and tumour purity, respectively. Moreover, we obtained the P-value of the PCC, labelled as P(ij), for each miRNA-gene pair, and the rank score (RS) was calculated as follows: All genes were ranked based on RS indexes and further subjected to gene set enrichment analysis (GSEA). We mapped the genes of each immune-associated pathway to the ranked gene list.
For miRNA i and pathway k, we obtained the enrichment score (ES) and P-value (adjusted by false discovery rate (FDR)) based on GSEA.
Furthermore, following a previous study, 16 the P-value and the ES were combined to a miRES score, that is where ES(ik) is the ES score between miRNA i and immune pathway k. Thus, the miRES score ranged from −1 to 1. We considered the miRNA-pathway pairs with the absolute miRES >0.995 and FDR <0.05 as significant ones. To implement the pipeline described above, we developed a R package termed 'ImmuMiRNA' (https://github.com/Zaoqu -Liu/ImmuM iRNA).

| Construction and validation of the IAMIPS in public data sets
Before building the IAMIPS model, we transformed miRNA expression into z-score in both TCGA-CRC and GSE29622 data sets, which enhanced the comparability between different data sets. The TCGA-CRC cohort served as the modelling set, and the GSE29622 served as the external validation set.
According to the immune-associated miRNAs extracted above, we first performed univariate Cox regression analysis to select miRNAs that were significantly related to OS in TCGA-CRC cohort.
Given the aim to screen candidate miRNAs that were highly associated with OS and that the strictness of multiple testing correction might filter out some of these potential miRNAs, we included miR-NAs with unadjusted P-value <0.05 in the development of IAMIPS.
A LASSO Cox regression approach was employed to determine candidates for the IAMIPS using 'glmnet' R package. The LASSO algorithm is a prevalent machine-learning method, which is widely applied to the Cox proportional hazard regression models for prognostic analysis. 17,18 To determine the optimal values of lambda, we used 10-fold cross-validations with the 1-standard error (SE) criteria, 18 and the optimal lambda is the largest value for which the partial likelihood deviance is within one SE of the smallest value of partial likelihood deviance. Subsequently, based on this lambda value, the miRNAs with non-zero coefficients were selected to construct the prediction model. The risk score for each patient was calculated with the LASSO model weighting coefficient as follows: where n is the number of key miRNAs, Exp i is the expression of miRNA i, and Coef i is the LASSO coefficient of miRNA i. The optimal risk score cut-off value was determined by 'survminer' R package in the TCGA-CRC cohort. Using this cut-off value, the patients were divided into high-risk and low-risk groups. Human microRNA array GSE29622 served as external validation.  Table S1.

| Validated the IAMIPS in vitro experiment
Total RNA was isolated from CRC tissues and paired adjacent nontumour tissues with RNAiso Plus reagent (Takara, Dalian, China) according to the manufacturer's instructions. RNA quality was evaluated using a NanoDrop One C, and RNA integrity was assessed using agarose gel electrophoresis. An aliquot of 1 µg of total we further validated the IAMIPS in our CRC cohort.

| Gene set enrichment analysis
To explore the potential molecular mechanisms underlying the

| Prediction the clinical chemotherapeutic response
To assess the drug response in TCGA-CRC cohort, we downloaded the imputed tumour response to 138 anticancer drugs in CRC patients from a previous study. 20 Drug sensitivity was quantified by half-maximal inhibitory concentration (IC50); the lower the IC50, the more sensitive the drug. We identified tumour drugs with specific sensitivity in high-risk and low-risk groups using the following criteria: 1) a Pearson correlation was calculated for each drug's IC50 and risk score. Drugs with absolute correlation coefficient >0.3 and FDR <0.05 were retained; 2) t test was performed to compare the sensitivity difference between high-risk and low-risk groups. Drugs with absolute log2 fold change value >0.5 and FDR <0.05 were included; and 3) IAMIPS-related drugs were determined by the intersection of the above two results.

| Evaluation of the immunotherapy response
The Tumor Immune Dysfunction and Exclusion (TIDE) algorithm was employed to predict the immunotherapy response of each patient. 21 TIDE algorithm was a computational method to model two primary mechanisms of tumour immune evasion: the induction of T cell dysfunction in tumours with high infiltration of cytotoxic T lymphocytes (CTL) and the prevention of T cell infiltration in tumours with low CTL level. Next, the Subclass Mapping (SubMap) method was utilized to evaluate the similarity between the risk groups and the patients on immunotherapy. 22 The SubMap employs GSEA algorithm to deduce the extent of commonality of the two groups. Adjusted P-values were used to assess the similarity, and the lower adjusted P-values suggested the higher similarity.

| Identification of immune-associated miRNAs in CRC
To identify miRNAs that were related to immune-associated pathways, we developed a three-step integrated algorithm framework termed ImmuMiRNA ( Figure 1B). ImmuMiRNA systematically de- By virtue of the ImmuMiRNA pipeline, we identified a total of 97 immune-associated miRNAs, which accounted for 10% of all miR-NAs in the TCGA-CRC cohort (Table S2). A higher number of miRNAs were correlated with the 'T cell receptor signalling', 'natural killer cell cytotoxicity', 'cytokine receptors' and 'antigen processing and presentation' pathways ( Figure 2A). Currently, restoring or enhancing the activity of T cells and natural killer cells is considered to be the mainstay of immunotherapy. 23 These miRNA regulators will be a resource for dissecting the immune regulation in CRC. Univariate

Cox regression analysis between each of the 97 miRNAs and OS
is shown in Table S3. A total of 11 miRNAs significantly correlated with OS were identified, of which 8 were protective factors and 3 were risk factors (P < 0.05; Figure 2B). These OS-associated miR-NAs demonstrated correlations with a variety of immune-associated pathways, which suggested that activation and inhibition of various immune pathways were significantly correlated with OS in patients ( Figure 2C).

| Construction and evaluation of the IAMIPS
The 11 OS-associated miRNAs were selected to construct an IAMIPS.
We employed a LASSO Cox regression model and identified three miRNAs that were strongly predictive of OS, including miR-216a-5p, miR-194-3p and miR-3677-3p ( Figure 3A,B). The three miRNAs also demonstrated significant differences between tumours and normal tissues (P < 0.05; Figure S1A). Next, a risk score for IAMIPS was calculated using a formula that including the three miRNAs weighted by their regression coefficients in a penalized Cox model as follows: Risk score =0.015 × the expression of miR-216a-5p -0.035 × the expression of miR-194-3p -0.124 × the expression of miR-3677-3p.
Using the optimal cut-off value (0.05) for the IAMIPS, we divided 635 patients in the TCGA-CRC cohort and 65 patients in the GSE29622 into high-risk and low-risk groups, respectively. Patients in the high-risk group had a shorter OS than the low-risk group (logrank test, both P < 0.05; Figure 3C,D). We questioned the generality of IAMIPS for various clinicopathological characteristics in CRC; thus, the stratified survival analysis was further performed. After stratification for age, gender, clinical stage and microsatellite instability, the signature remained a powerful independent predictor for OS and was suitable to the vast majority of CRC patients (log-rank test, P < 0.05; Figure S1B). The ROC results demonstrated the IAMIPS displayed better performance than each miRNA at predicting 1-, 3-and 5-year OS in two cohorts (P < 0.05; Figure S2A,B). Traditional clinical stage is currently the main method to assess the prognosis of CRC patients. Therefore, we next evaluated the prognostic performance of AJCC stage versus that of the IAMIPS. The IAMIPS also displayed better accuracy than traditional AJCC stage in two cohorts (both P < 0.05; Figure S2A,B).

| Validation of the IAMIPS in our cohort
We enrolled 40 CRC tissues and 40 paired non-tumour tissues from the First Affiliated Hospital of Zhengzhou University. Follow-up was concluded three years after surgery. Table S1 shows their clinical characteristics. qRT-PCR assay was performed in 40 pairs of CRC tissues and matched adjacent non-tumour tissues. In line with above results, the three miRNAs displayed significantly expression difference in tumour relative to normal tissues (P < 0.05; Figure 4A).
We determined another optimal cut-off value (0.008) for qRT-

| Inflammatory profiles and immune checkpoint landscape of IAMIPS
GSEA was performed to better understand the potential molecular mechanisms underlying the IAMIPS. As shown in Figure 5A,B, the high-risk group was mainly associated with tumour proliferation such as cell cycle, spliceosome and mRNA processing. The low-risk group was mainly associated with immunology such as antigen processing and presentation, cytokine-cytokine receptor interaction and adaptive immune response. These results explained the worse prognosis of patients in the high-risk group. Intriguingly, a large number of immune-associated pathways were enriched in the low-risk group, which indicated patients in the low-risk group had favourable immune infiltration status. Therefore, we further adopted the ssGSEA algorithm to assess the relative infiltration abundance of 28 immune cell types. Consistent with the above results, the abundance of immune cell infiltration in the low-risk group was significantly higher than the high-risk group, such as B cells, activated CD4+/CD8+ T cells, dendritic cells and natural killer cells (P < 0.05; Figure 5C, Figure S3). Overall, the high-risk group was significantly correlated with tumour proliferation and presented inferior immune TMB between the two groups. As expected, the low-risk groups all displayed higher expression level of these indicators compared with the high-risk groups (P < 0.05; Figure 5D), which suggested that patients in the low-risk group was more likely to benefit from available immunotherapeutic drugs such as atezolizumab, pembrolizumab and ipilimumab. 5

| Implications of IAMIPS on CRC chemotherapy
We further identify several IAMIPS-related antineoplastic drugs.
As shown in Figure 6A, we observed that patients in the low-risk group were more sensitive to BMS-536924, bortezomib, dasatinib, GW843682X, paclitaxel, PD-0325901 and WH-4-023 and patients in the high-risk group were more sensitive to PAC-1 (P < 0.05). These drugs provided a resource for precision chemotherapy in two groups.
Since we identified 'immune-hot' and 'immune-cold' phenotypes in two groups, further immunotherapy evaluations were performed.
Using the TIDE tool at Harvard University, we observed patients in the low-risk group had more immunotherapy response rate than the high-risk group (38% vs. 17%; P < 0.05; Figure 6B). In addition, SubMap analysis indicated the low-risk group displayed high similarity with patients who responded to anti-PD-1 therapy (P < 0.05; Figure 6C). These results further proved patients in the low-risk group could benefit more from immunotherapy, particularly anti-PD-1 therapy.

| D ISCUSS I ON
Accumulating evidence suggests that miRNAs are critical for im- or enhancing the activity of T cells and natural killer cells is currently the mainstay of immunotherapy, 23 and these miRNA regulators will be a resource for dissecting the immune regulation in CRC.
Immune-associated miRNAs also displayed significant impacts on the prognosis of patients with CRC. In our study, eleven miRNAs that dramatically related to OS were further identified.
An ideal machine-learning model should have fewer variables and achieve better efficacy. 25 Hence, we applied the LASSO algorithm, which was known to select key variables to avoid overfit of model. 17 to have better immunotherapy response. 26 Hence, patients in the lowrisk group might benefit more from immunotherapy. Bioinformatics algorithms including TIDE and SubMap methods further validated this conclusion. However, the limitation of our study is evaluating the immunotherapy response using bioinformatics algorithms rather than conducting large-scale immunotherapy clinical trials. In spite of this, the above results were highly consistent in terms of functional analysis and predictive results, which indicates that our results are relatively reliable. Moreover, we identified latent antitumour drugs significantly associated with IAMIPS, hoping to provide additional reference for antitumour therapies of patients with different IAMIPS risk.
In summary, we proposed a novel algorithm, ImmuMiRNA, which can systematically identify the miRNA regulators that latently regulate immune-associated pathways in CRC. The IAMIPS will facilitate the complex underlying mechanisms between immune-associated miRNAs and prognosis of CRC and may advance optimize immunotherapies for patients with CRC.

ACK N OWLED G EM ENTS
This study was supported by Henan Province Young and Middle-Aged Health Science and Technology Innovation Talent Project (YXKC2020037) and Henan Provincial Health Commission Joint Youth Project (SB201902014).

CO N FLI C T O F I NTE R E S T
The authors declare that they have no conflicts of interest.

E TH I C S A PPROVA L A N D CO N S E NT TO PA RTI CI PATE
The human cancer tissues used in this study were approved by

DATA AVA I L A B I L I T Y S TAT E M E N T
The data used to support the findings of this study are available from the corresponding author upon request.