A signature of 18 immune‐related gene pairs to predict the prognosis of pancreatic cancer patients

Abstract Pancreatic cancer is one of the most lethal malignancies. With the promising prospects conveyed by immunotherapy in cancers, we aimed to construct an immune‐related gene pairs (IRGPs) signature to predict the prognosis of pancreatic cancer patients. We downloaded clinical and transcriptional data of pancreatic cancer patients from The Cancer Genome Atlas data set as the training group and GSE57495 data set as the verification group. We filtered immune‐related transcriptional data by IMMPORT. With the assistance of lasso penalized Cox regression, we constructed our prognostic IRGPs signature and divided all samples into high‐/low‐risk groups by receiver operating characteristic curve for further comparisons. The comparisons between high‐ and low‐risk groups including survival rate, multivariate, and univariate Cox proportional‐hazards analysis, infiltration of immune cells, and Gene Set Enrichment Analysis (GSEA). Gene Ontology (GO), Kyoto Encyclopedia of Genes and Genomes (KEGG) are facilitated to analyze the proceedings in which our IRGPs signature may involve in. The results revealed that 18 IRGPs were defined as our prognostic signature. The prognostic value of this IRGPs signature was verified from the GSE57495 data set. We further demonstrated the independent prognostic value of this IRGPs signature. The contents of six immune cells between high‐/low‐risk groups were different, which was associated with the progression of diverse cancers. Results from GO, KEGG, and GSEA revealed that this IRGPs signature was involved in extracellular space, immune response, cancer pathways, cation channel, and gated channel activities. Evidently, this IRGPs signature will provide remarkable value for the therapy of pancreatic cancer patients.


| INTRODUCTION
Pancreatic cancer (PC) is considered to be one of the most lethal malignant tumors, and its global incidence is expected to rise to approximately 420,000 cases by 2020, unfortunately, the 5-year survival rate for PC patients is less than 10%. 1,2 Currently, an increasing number of people are confronting obesity and diabetes diseases, which are tightly correlated with the development of PC. 2 The anatomical location of the pancreas is deep in the abdomen thus leads to the difficulty of diagnosis at early stages. Unfortunately, surgical treatment for patients with advanced PC is not feasible, therefore, other treatment procedures including immunotherapy, chemotherapy, and radiotherapy were urged to be applied to PC. 3 Recently, dysregulation of the immune system has been reported to correlate with the development of malignant tumors. Therefore, immunotherapy has become a crucial strategy for the treatment of various of cancers. [4][5][6] Previous studies also implied the potential validity to apply immunotherapy into PC patients. Peripheral blood analysis revealed that the contents of CD8 + T cells were significantly lower in PC patients than in healthy controls and higher infiltration of CD4 + /CD8 + T cells corresponds to better survival in PC patients. 7,8 Th1 cells and Th2 cells are originated from the differentiation of Naive CD4 + T cells, the diversion from Th1 to Th2 cells associated with poor survival in PC patients. 9 CD226 and CD96 were reported to regulate the functions of natural killer (NK) cells, the contents of CD226 + and CD96 + NK cells were lower in PC patients comparing to healthy groups. Moreover, the reduction of CD226 + and CD96 + NK cells is correlate with tumor histological grade and lymph node metastasis, and the decreased percentages of CD226 + and CD96 + NK cells could cause tumor immune escape in PC patients. 10 Above all, it is considerable to apply immunotherapy into PC patients.
To date, none has applied immune-related genes (IRGs) into the therapy of PC patients. To provide an original method into the treatment of PC, we performed bioinformatic methods to construct a prognostic IRGPs signature. Data of PC patients were downloaded from The Cancer Genome Atlas (TCGA) and GEO database (GSE57495) and we further employed IMMPORT to filter the IRGs of our transcriptional data. Ultimately, we constructed and validated the prognostic value of our IRG pairs (IRGPs) signature. Taken together, this study will facilitate the application of immunotherapy into PC.

| Summary
This is a study based on TCGA data and GEO data to perform construction of immune-related prognostic signature. The TCGA group was deemed as the training group, and GSE57495 data set was employed for the verification group. RNA-seq expression data and clinical data of PC patients of the two data sets were downloaded. With the combination of Immport database, we kept the immune-related transcriptional expression data and further utilize the data with survival time to dig out prognostic-relevant IRGPs.
A well-balanced model that contains 18 IRGPs was constructed by lasso penalized Cox regression. In the meanwhile, we divided all samples into high-/low-risk groups by the optimal cut-off in ROC curve analysis. This IRGPs model was validation by the overall survival difference in the verification group. We also performed univariate/multivariate Cox proportionalhazards analysis in the training group to identify our model as an independent prognostic factor. We further compared the infiltration of immune cells between high-/low-risk groups and performed Gene Set Enrichment Analysis (GSEA) of the two groups. Gene Ontology (GO), Kyoto Encyclopedia of Genes and Genomes (KEGG) analysis were performed on DAVID database. Expression profile, clinical relevance, and mutational analysis of the 18 IRGPs in PC was performed on GEPIA2 and cBioportal platform.

| Data preprocessing
When patients appear in the database more than once, we average their expression profile data. When a target gene matches multiple probes, we average the probes to represent the expression level of the gene.

| IRGs extraction
Immport (https://immport.niaid.nih.gov) was one of the largest open repositories of human immunological data. 11 We downloaded a list of 2498 IRGs from Immport. Then we discard the immune-unrelated genes from our transcriptional data.

| Construction of the IRGPs signature
Each IRG was paired with each other and each IRGP (IRG pair) has a specific score. Detailly, in the pairwise comparison by R, the output is 1 if the expression of the first immune gene in a specific sample is more than the following one and 0 for the reversed order. Subsequently, we deleted the IRGPs if the score of which were 0 or 1 in more than 80% of the samples, the remaining IRGPs were deemed as initial candidate IRGPs. We performed logrank test in our training group to obtain prognosticrelated IRGPs (p < .0001), moreover, we performed lasso penalized Cox regression (iteration = 1000) to obtain a well-balanced prognostic model by R (glmnet package). Ultimately, the most stable model which contains 18 IRGPs was deemed as our final IRGPs signature. To classify patients into high-/low-risk groups, we performed time-dependent receiver operating characteristic (ROC) curve analysis at 1 year in the training group for overall survival and obtained optimal cutoff. Patients with higher risk score than the optimal cutoff will be counted in high-risk group, patients with lower risk score will be counted in low-risk group.

| Validation of the IRGPs signature
We employed R (survival package) to obtain the Kaplan-Meier curve for comparing the overall survival difference between high-/low-risk groups in TCGA and GSE57495 PC samples. Moreover, to validate our model to be an independent prognostic factor, we performed univariate and multivariate Cox proportional-hazards analysis for the training group to assess our prognostic model with other clinical factors.

| Comparison of the infiltration of immune cells between high-/low-risk groups
CIBERSORT (http://cibersort.stanford.edu/) has been widely used for analyzing the cellular composition of a tissue from its gene expression profile, especially for analyzing the composition of immune cells in tumors. 12 We compared the infiltration of 22 immune cells between high-/low-risk groups in TCGA PC samples. Results of p < .05 was regarded as statistically significant.

| GSEA
To compare the differential enrichment of gene sets between the high-/low-risk groups, we employed R (Bioconductor package fgsea) to perform GSEA and uncover potential biological mechanisms that our prognostic model may involve in. GO gene sets (c5.all.v7.0.symbols.gmt) in GSEA database were downloaded for our study. Gene sets with the results of FDR value < 0.05 were deemed to be statistically valuable.

| GO and KEGG analysis
DAVID database is used for analyze the functions of genomic statistics and further classify the data. 13 We employed DAVID to perform functional enrichment analysis including biological process, molecular function, cell component, and KEGG pathway of the genes in the IRGPs signature.

| GEPIA2
GEPIA2 (http://gepia2.cancer-pku.cn/) is a multifunctional molecular analysis platform based on TCGA data and GETx data which contains 179 PC samples and 171 normal samples altogether. 14 It was employed to comparing the expression profile of each IRGP between PC and adjacent normal tissues. The results of p value were generated by Student's t test. Differential expression of an IRGP in the two sets of tissues was said to occur only if the results of p < .05 and |Log 2 FC| > 1. Moreover, we investigated the relationship between the expression of each IRGP and the PC stage. It was said an IRGP was correlated with tumor stage only when both genes in an IRGP are related to the stage of PC (p < .05).

| cBioportal
cBioportal (https://www.cbioportal.org/) is an online database used for exploring, visualizing, and analyzing different cancer genomics data. 15 This tool was employed to analyze genetic alterations including alteration rate and detailed categories of genetic alterations of each IRGP in PC.

| Summary
We constructed an IRGPs signature to predict the prognosis of PC patients. This signature was correlated with the infiltration of immune cells and cancer-related pathways.

| The construction of IRGPs signature
We selected TCGA PC samples (n = 177) with transcriptional data and clinical data as our test group and data from GEO PC samples GSE57495 (n = 63) as the verification group. The clinical data of these two data sets are shown in Table 1. 2498 IRGs from IMMPORT database are downloaded for our data screening, and we ultimately keep IRGs in our transcriptional data.
Moreover, we paired our IRG with each other, and we deleted the IRGPs if the score of which were 0 or 1 in more than 80% of the samples, in that case, 26524 IRGPs were remained. We performed a log-rank test in our training group and further obtaining 33 prognostic IRGPs (p < .0001). Subsequently, we performed lasso penalized Cox regression (iteration = 1000) to define an index of each IRGPs, the risk score of each patient was calculated by these indexes. We further obtained a wellbalanced prognostic model, the 18 IRGPs (Table 2), were selected to construct the signature. To divide all the samples into high-/low-risk groups, we performed timedependent ROC curve analysis at 1 year in the training group for overall survival and we obtained the optimal cutoff of 1.057 ( Figure 1A). Patients with higher risk score than 1.057 will be counted in high-risk group, patients with lower risk score than 1.057 will be counted in low-risk group. The classification of high-/low-risk groups of patients in TCGA data set was shown in Figure 2A, THE classification of high-/low-risk groups of patients in GSE57495 data set was shown in Figure 2E. The area under receiver operating characteristic curve (AUC) of 1-year survival rate was 0.843, which demonstrated the validity of our prognostic signature ( Figure 1B).

| Differences of the infiltration of immune cells between high-/low-risk groups
Infiltration of immune cells was correlated with cancer development and prognosis. We employed CI-BERSORT to compare the differential contents of 22   Figure 3C) and T cells CD8 (p = .010) ( Figure 3G) were lower in the high-risk group.

| GSEA
The intention of GSEA was to infer the potential biological mechanisms which the patients in the high-risk group may involve in. The results of GSEA revealed that CA-TION_CHANNEL_ACTIVITY (Enrichment score: 0.579, FDR value = 0.028) ( Figure 4A Figure 4E) and VOLTAGE_GATED_CATION_CH-ANNEL_ACTIVITY (Enrichment score: 0.631, FDR value = 0.037) ( Figure 4F) were significantly altered between high-and low-risk groups.

| GO and KEGG
GO analysis revealed that the genes in our IRGPs signature mainly enriched in extracellular space and immune response ( Figure 4G). KEGG analysis revealed that these genes mainly involved in cancer pathways ( Figure 4H).

| Analysis of genetic alterations
Results from cBioportal revealed that 112 (67%) of the 168 PC patients have genetic alterations of at least one gene in the IRGPs signature and the most common genetic alteration category was messenger RNA (mRNA) high ( Figure 7D). CD1D_DKK1 has the highest genetic alteration rate (15%), the most common genetic alteration category was mRNA high, besides, nine patients have CD1D_DKK1 genetic amplification, which is the highest among all IRGPs ( Figure 7A). The mutation rate of IRF3_MET and ZYX_PLAU were 14% and 12%, respectively, which were the highest except CD1D_DKK1. The most common genetic alteration category of these two IRGPs was mRNA high (Figure 7B,C).

| DISCUSSION
PC is one of the most lethal malignancies. Due to its components of plentiful desmoplastic stroma which could handicap the infiltration of effector T-cell and further facilitate the immunosuppressive microenvironment, and the traits of poor immunogenicity, it was fairly intricate to accomplish immunotherapy into pancreatic cancer. 16 Despite it was a gigantic challenge, previous literatures have implied the role of immune system in PC and supplied the potential feasibility to accomplish this target. [7][8][9] As yet, the role of IRGs in PC is still unclear. In our study, we constructed an IRGPs signature to correlate with the prognosis of PC patients. Unlike the traditional prognostic model, the pairwise comparison of each IRGP and the calculation of the score were absolutely based on the genetic expressions in the same patient thus it is not a necessity in our IRGPs signature to standardize the genetic expression profiles from different sequencing platforms. Previous studies have demonstrated the validity of this method. 17,18 In our study, we construct an IRGPs signature to predict the prognosis of PC patients. The AUC of this model was 0.843, which demonstrated the validity of our prognostic signature. This signature contains 18 IRGPs, dividing all patients into high/low immune risk groups. Survival rate analysis and univariate/multivariate Cox proportional-hazards analysis have not only demonstrated the prognostic value but also certified our IRGPs signature as an independent prognostic factor.
The contents of memory B cells, macrophages M0, macrophages M1 and NK cells activated were higher in the high-risk group. The contents of naïve B cells and CD8 T cells were lower in the high-risk group. Naïve B cells and CD8 T cells were demonstrated to be anticancer immune cells. Naïve B cells and CD8 T cells were gathered in CD31 high tumors, and the higher expression of CD31 implied more vascular endothelial cells, which F I G U R E 6 Correlation between the expression profile of each IRGP with pancreatic cancer stage. (A-C) Results revealed that mRNA expressions of AGT_OAS1, ICAM1_MET, and CHGA_IL22RA1 correlated with pancreatic cancer stage. IRGP, immune-related gene pair; mRNA, messenger RNA was correlated with better prognosis in PC patients. 19 Previous studies demonstrated that high density of tumor infiltrating naïve B cells is correlated with higher survival rates in hepatocellular carcinoma patients. It was also confirmed to be an independent prognostic factor in liver cancer. High density of naïve B cells was also validated to associate with tiny tumor size and good differentiation. 20 Tumor-associated macrophages in the tumor microenvironment usually promote cancer cell proliferation, immunosuppression and angiogenesis, thereby supporting tumor growth and metastasis. Moreover, the abundance of tumor-associated macrophages was correlated with poor prognosis of patients. 21 Macrophages contains different subtypes, including M0, M1, and M2. M0 macrophages is an inactive subtype which were not capable have inflammatory and tumor-related functions. M1 macrophages and M2 macrophages were participated in diverse immune regulations and could derived from M0 macrophages. 22 High contents of M0 macrophages was demonstrated to correlate with poor prognosis of PC patients. 23 The contents of M0 and M1 macrophages were higher in colorectal cancer tissues than paired normal tissues. Moreover, abundance of M1 macrophages was correlated with poor prognosis of colorectal cancer patients. 24 Coculture of macrophages and PC cells in vitro remarkably enhanced the expression of CD163 and programmed death-ligand 1 (PD-L1), which were two risk factors and correlated with poor prognosis of PC patients. 25 Memory B cells are produced in the germinal center response during the T-cell-dependent immune response. Several B-cell malignancies including chronic lymphocytic leukemia, hairy cell leukemia and marginal zone lymphomas were demonstrated to derived from memory B cells. The activation of memory B cells correlated with the progression of these malignancies. 26 The results of GSEA revealed our IRGPs signature mainly involved in ion channel activity and gated channel activity, especially in potassium and voltage gating channels. Ion channels have been shown to play an important role in the occurrence and development of diverse cancers. Potassium channels including four categories, which are voltage gate (Kv), calcium dependent (KCa), two-hole domain group (K2p) and inward rectification (Kir). 27 KCa3.1 is a K + channel activated by Ca 2+ , which was overexpressed in PC and correlated with poor prognosis. It was demonstrated to functioned in the progression of many cancers and is related to the migration and proliferation of PC cells. 28 Kv11.1 was demonstrated to be overexpressed in PC, overexpression of Kv11.1 was correlated with poor differentiation and larger tumor size. 29 Results revealed that 12 IRGPs were overexpressed in PC. Expressions of AGT_OAS1, IACM1_MET, and CHGA_IL22RA1 were associated with PC stage. Mutational analysis revealed that the genetic alteration rate of CD1D_DKK1, IRF3_MET, ZYX_PLAU were high in PC and the most common alteration category were mRNA high. Above results illustrated the correlation between our model with PC. Moreover, KEGG analysis revealed that our IRGPs signature was correlated with cancer pathways. Previous literatures also demonstrated the correlation between our IRGPs signature with the development of cancers including PC. DKK1 is a secreted protein that prohibited the β-catenin-dependent pathway in Wnt signaling by binding to Wnt receptors. Dysregulation of DKK1 was reported to correlated with the prognosis and progression of various cancers including PC. [30][31][32] DKK1 was overexpressed in PC and Dkk1-CKAP4-PI3K/AKT signaling pathway affected PC cells proliferation. 32 IL22RA1 is a member of the class II cytokine receptor family. Previous study uncovered IL22RA1 was overexpressed in PC and the expression was correlated with poor prognosis. IL22RA1/STAT3 signaling promoted stemness and tumorigenicity in PC. 33 ICAM1 has been reported to associated with cancer metastasis, including PC. The activation of IACAM1 is activated by interleukin-35 through the GP130-STAT1 signaling pathway. 34 PLAU is a urokinase plasminogen activator which could promote the proteolytic cascade. It has been demonstrated to associated with the invasion and metastasis of cancers. 35,36 Upregulation of PLAU was correlated with lymph node metastasis and poor prognosis of PC. Low-expression of PLAU prohibited proliferation and migration of PC cells. 37 Moreover, the expression of CD1D, ERAP2, SSTR1, CXCL9, CXCL11, IL1A, EREG also certainly influenced the development of PC. [38][39][40][41][42][43][44][45][46] Hence, these studies demonstrated the correlation between our IRGPs signature with the progression of PC.
The aim of this study was to apply IRGs into the treatment of PC. Despite we have demonstrated the validity of our IRGPs signature and verified the correlation between our model and immune cells, we also acknowledge the limitation in our study. We expect further experiments including western blot or immunohistochemistry could be performed in our study.

| CONCLUSION
In conclusion, our IRGPs signature could predict the prognosis of PC patients, and this prognostic signature will facilitate the application of immunotherapy in PC.