Development and verification of an immune‐related gene pairs prognostic signature in ovarian cancer

Abstract Ovarian cancer (OV) is the most common gynaecological cancer worldwide. Immunotherapy has recently been proven to be an effective treatment strategy. The work here attempts to produce a prognostic immune‐related gene pair (IRGP) signature to estimate OV patient survival. The Gene Expression Omnibus (GEO) and Cancer Genome Atlas (TCGA) databases provided the genetic expression profiles and clinical data of OV patients. Based on the InnateDB database and the least absolute shrinkage and selection operator (LASSO) regression model, we first identified a 17‐IRGP signature associated with survival. The average area under the curve (AUC) values of the training, validation, and all TCGA sets were 0.869, 0.712, and 0.778, respectively. The 17‐IRGP signature noticeably split patients into high‐ and low‐risk groups with different prognostic outcomes. As suggested by a functional study, some biological pathways, including the Toll‐like receptor and chemokine signalling pathways, were significantly negatively correlated with risk scores; however, pathways such as the p53 and apoptosis signalling pathways had a positive correlation. Moreover, tumour stage III, IV, grade G1/G2, and G3/G4 samples had significant differences in risk scores. In conclusion, an effective 17‐IRGP signature was produced to predict prognostic outcomes in OV, providing new insights into immunological biomarkers.

function of major immune cell subsets (including bone marrow cells, macrophages, dendritic cells and T cells) in the OV microenvironment have been reported in response to immunotherapy. 6 Pre-clinical studies have also been conducted in the past decade, with most emphasis on the use of set protein 1 (PD-1) or its ligand (PD-L1) to induce cell death. Research based on tumour biology is exploring new and more effective immunotherapies. There have been several combination therapies, such as checkpoint inhibitors, anti-VEGF therapy, PARP inhibitors and adoptive immunotherapies in OV treatments. 7 Additionally, targeting other immunosuppressive pathways may become a way to enhance responses to immunotherapy.
Recently, based on microarray and RNA-sequencing methods, there have been increasing studies on immune-related prognostic signatures in human cancers. For example, using a cohort of glioma samples with expression information from whole-genome microarrays from the Chinese Glioma Genome Atlas and TCGA databases, researchers constructed a local risk signature associated with immunity that can independently identify patients with a high risk. 8 Wang et al 9 used the TCGA and ImmPort databases to build a 15-gene prognostic model in renal papillary cell carcinoma. Their signature was associated with tumour staging and tumour type.
Other immune-related signatures have also been reported in cancers, including gastric cancer, 10 anaplastic gliomas, 11 breast cancer 12 and pancreatic cancer. 13 However, there have been no reports of immune-related gene signatures in OV.
In the present study, we used the genetic expression profiles and clinical data of OV cases harvested according to the TCGA and GEO databases, which were divided into training, validation and testing sets. Based on the InnateDB database and LASSO regression model, we identified a 17-IRGP signature that was significantly associated with survival. When compared with four other signatures in OV, our 17-IRGP signature had better prognostic prediction performance. In conclusion, an effective 17-IRGP signature was produced to predict prognostic outcome in OV, providing new insights into immunological biomarkers.

| Gene expression data source
The OV datasets in this study were derived from the TCGA 14 16 and GSE26712 (n = 195 samples). 17 They were both performed on the GPL96 platform (Affymetrix Human Genome U133A Array).

| Data processing
To ensure the analysis consistency in different datasets, we downloaded the raw data of GSE14764 and GSE26712 and used the robust multiarray average (RMA) method 18 for homogenization.
Because both GEO datasets were performed by GPL96 platform, we merged them into an independent external validation set for subsequent analysis and performed batch correction to eliminate batch effects. Before constructing the prognostic signature, the original data were pre-processed by 1) removing tumour samples without clinical information and overall survival (OS) was 0 day; 2) removing normal samples; 3) removing genes with low expression (gene expression was missing or 0 in more than half of all samples); and 4) retaining only the expression profiles of immune-related genes. The clinical information of patients in two GEO datasets was shown in Table S7. The pre-processed dataset ultimately contained a total of 594 OV samples.

| Computation of immune-related gene pairs
First, we downloaded the genes associated with immune from the InnateDB dbase (https://www.innat edb.com/). 19 This database records the innate immune-related genes of multiple species that are supported by the literature and manually corrected. Here, we obtained endogenous human IRGs. After sorting (removing genes with duplicate symbols), there were a total of 1039 IRGs (Table S1). IRGPs were constructed based on 1,039 IRGs with the calculation rules as described previously. 20 In brief: If Expr IRG1 < Expr IRG2 , IRGP = 1, else: IRGP = 0.
Expr IRG1 is the expression value of immune gene 1, and Expr IRG2 is the expression value of immune gene 2. According to the above rules, the IRGP value of each dataset was calculated separately, and a total of (1,039 * 1,038)/2 IRGPs were obtained. After removing samples in all datasets with a gene pair value of 0 or 1, we left the residual IRGPs and subsequently selected them as first candidate IRGPs to conduct further analysis ( Figure S1).

| Construction of prognostic IRGP signatures
First, 334 samples were divided into the training or validation set in the TCGA dataset. To avoid the influence of random assignment bias on the stability of the subsequent modelling, all samples were put back into random groupings 100 times. The samples were selected according to a ratio of 1:1 in the training set and validation set. Here, the training set and validation set with no significant differences in the distributions of age, tumour stage, follow-up time and patient survival status were selected for signature construction.
The resulting training set and validation set samples are available in Table 1. The prognostic signature was constructed in two steps as follows: 1) a univariate Cox proportional hazards regression model was employed for the calculation of the relationship between each IRGP and patients' prognosis with log-rank test P-value < 0.05; 2) we used the LASSO regression to further filter the above IRGPs to reduce the numbers in the risk model.

| Functional enrichment analysis
We used the clusterProfiler R package 21 to analyse the IRGPs for molecular function (MF), cellular component (CC), biology procedure (BP) enrichment by studying the Gene Ontology (GO) terms. A q-value <0.05 was set as the threshold for significant enrichment. The dot plot of clusterProfiler displayed the enrichment results. We took the genes with differential expression (DEGs) in groups of high and low risk with the use of the rank test with false discovery rate (FDR) < 0.05, and the Kyoto Encyclopedia of Genes and Genomes (KEGG) pathway enrichment of DEGs was carried out using the GSVA R package.

| Evaluation of IRGP prognostic signature
Based on the 17 IRGPs obtained by LASSO regression and its regression coefficient, we obtained the risk score of each respective OV patient by: Here, the IRGP was the immune gene pair, and the coefficient was the regression coefficient. We split all OV samples into a low-risk group (Risk-L) or a high-risk group (Risk-H) in line with the median risk score. We used the performance of this signature to plot the receiver operating characteristic (ROC) curves.

| Comparison of IRGP risk signatures with other models
To evaluate the performance of the 17-IRGP risk model and other existing prognostic models, we selected models that were also constructed based on gene expression data for comparison. After consulting the literature, four prognostic risk models were selected: Liu (7-gene signature, 2018, based on GSE32062 for Japan cohort in GEO, GSE63885 for Poland cohort in GEO, and E-MTAB-386 for USA cohort in arrayexpress), 22 Liu (5-gene signature, 2016, based on TCGA database), 23 Hou (6-gene signature, 2018, based on n TCGA database with records of Taxol treatment), 24 and Zhang (2-gene signature, 2018, based on TCGA database). 25 To make the model more comparable, we used the same method for the calculation of the risk score of each OV sample and evaluated the AUC value of each model. According to the median risk score, the samples were also split into Risk-H and Risk-L groups, and the difference in overall survival between the two groups was calculated by log-rank test. Using the concordance index (C-index) and restricted mean survival (RMS), we assessed the prognosis precision of the signature. 26

| Statistical analysis
With the use of R (version 3.6.1) software, we carried out statistical analyses. We used the survival R package for univariate and multivariate risk regression analysis, the glmnet package for LASSO Cox regression model, and the survivalROC package to evaluate the model's ROC curve and calculated AUC values. Moreover, the KMsurv R package was performed to show the K-M curves of grouped samples.
The C-index calculation used the survcomp R package, and RMS cure and RMS time calculation were performed by using the survival and survRM2 packages. In terms of each test, a p-value < 0.05 suggested a significant difference. *P-value <0.05, ** p-value < 0.01, and ***Pvalue <0.001 express statistically significant characteristics.

| Construction of IRGP signatures
In this study, in accordance with the gene expression profiles of OV from the TCGA database, we performed a series of bioinformatics analyses to build a model for prognostic evaluation of IRGPs in OV patients ( Figure 1 Figure S2).

| The evaluation and validation of IRGP signatures for survival prediction
Based on the above 17 IRGPs, we constructed a prognostic risk model for OV patients. Since the overall survival time of patients was distributed over more than 2 years ( Figure S3), the predictive effect of this model on datasets for 1, 3 and 5 years was evaluated.
Next, the IRGPs were adopted for the calculation of the risk score for the respective case in the TCGA training group. The average AUC of the training set was 0.869, and the average AUC of the validation set was 0.712 (Table S3). In addition, the average AUC of all TCGA datasets was 0.778, and the average AUC of the independent testing set was 0.73 ( Figure 2A-D). By dividing OV patients into low-(Risk-L) and high-risk groups (Risk-H) based on the median risk score, we observed that the Risk-H group exhibited noticeably poorer prognosis than the Risk-L group on the training set, validation set, all TCGA set and independent GEO testing set (log-rank test P-value <0.05, Figure 3A-D).

| Functional analysis of immune-related gene pairs
In our signature, the 17 IRGPs contained a total of 29 immune genes.
These genes were significantly enriched in the 'caspase' and 'interferon receptor' families (P-value <0.01, Table 3). The GO annotation results of 29 genes suggested that 4 genes were significantly enriched in interleukin-1 secretion and inflammatory bowel disease biological processes ( Figure 4A). Seven of the 17 IRGPs showed a negative correlation with risk scores, and ten showed a positive correlation ( Figure 4B, Table 4). Here, IRGPs reflected the level of relative expression between two genes. We found that genes The workflow of this study. Production and verification of a gene pair prognostic signature related to immunity in ovarian cancer The prognostic accuracy of signature was estimated using the C-index and RMS related to immune activation/response (P2RY14, CXCL11, CXCR3, MST1R and NOD2) showed relatively low expression levels ( Table   S4). The relatively highly expressed genes (CASP6, CASP7, IL1B, THBS1 and TPST1) were mainly related to the apoptotic process and inflammation response (Table S5). Further enrichment analysis of DEGs in the high-and low-risk groups was performed (Table   S6). The results suggested that the immune response-related pathways (Toll-like receptor and chemokine signal pathway, and others)

An immune-related gene pairs prognostic signature in ovarian cancer
were significantly negatively correlated with risk scores; however, the pathways such as p53 signalling pathway and apoptosis had a positive correlation with the risk scores ( Figure 4C), which seems to indicate that samples from the low-risk group may have higher immune activity (or immune activation status).

| Relationship between prognostic risk signature and clinical features
Using clinical information such as age, tumour stage and grade from the TCGA database, we analysed the relationship between the 17-IRGP risk signature and clinical characteristics. Here, samples from the tumour stage II, III and IV groups showed significant differences in risk scores ( Figure 5A), but no significant difference was observed for prognosis of the high-and low-risk group samples in stage II (Pvalue >0.05). However, significant differences were shown in stages III and IV, indicating that the model may be more suitable for stage III/IV OV patients ( Figure 5B-D). For tumour grades, G1/G2 and G3/ G4 samples had no significant difference in risk scores ( Figure 5E).
However, the prognosis of the high-and low-risk group samples showed significant differences in G1/G2 and G3/G4 ( Figure 5F-G).
We also did not observe a significant correlation between age and risk scores ( Figure 5H).

| The performance of prognostic risk signature in OV subtypes
The TCGA project revealed that surviving gene expression characteristics can predict clinical outcomes and divide OV patients into four transcription subtypes, including differentiated, immunoreactive, mesenchymal and proliferative. 27 We next compared the prognostic performance of our model on these four molecular subtypes.
Low-and high-risk groups of the four subtypes were identified to have significant prognostic differences ( Figure 6A-D). In addition, we also found the best prognosis in immunoreactive subtype, Risk-L samples, while the worst prognosis mesenchymal subtype, Risk-H samples. Moreover, OV was divided into four immune subtypes (C1-C4) 28 based on immune molecular tags. We further compared the model's performance on different immune subtypes (the C3 immune subtype had only 3 samples and was not added to the analysis).
Among the above three immune subtypes, there were also different survival outcomes between the high-and low-risk groups in both the C1 and C2 immune subtypes ( Figure 6E-G).

| Comparison of our prognostic risk signature with other models
Finally, using the same method, we evaluated the AUC values of four existing OV prognostic models at 1, 3 and 5 years. The average AUC  Moreover, we examined using GSE14764 and GSE26712, separately. As shown in Figure S4, we first used the same method in   and TLR5 were reported to be highly expressed in normal and neoplastic ovarian epithelium. 36 However, pathways such as the p53 signalling and apoptosis pathways had a positive correlation with the risk scores. Thus, the above signalling pathways were shown to be closely related to the risk scores of our signature and may be involved in the immune response to OV.

| D ISCUSS I ON
The prognostic signature in our study consists of 29 unique IRGs. CXCL11 (CXC chemokine ligand 11) is a chemokine involved in the progression of various cancers. CXCL11 is overexpressed in CRC tissues and cell lines. Repression of CXCL11 significantly inhibited cell migration, invasion and epithelial-mesenchymal transition (EMT). 37 It was also reported that its down-regulation can inhibit tumour angiogenesis in epithelial OV. 38 High CXCL11 expression was determined to predict worse OS in high-grade serous OV. 39 STAT4 (signal transducer and activator of transcription 4) is a member of the STAT family. Its overexpression was shown to be associated with poor prognosis in OV patients. 40 It has also been reported to  45 Gene pairs are more reliable prognostic markers than single genes because they can be found even in gene expression profiling where no significant prognostic genes are present. 46,47 In our study, we identified an effective 17-IRGP signature for OV patients, which have a better prognostic assessment ability than .
There are also some limitations to our study. First, the robustness of IRGPs was based on the gene expression profiles produced

ACK N OWLED G EM ENTS
We thank the Department of Obstetrics and Gynecology in Shengjing Hospital of China Medical University for technical advice. We also gratefully thank American Journal Experts (https://www.aje.cn/) for editing the present paper (F7C8-397B-2647-877F-4324).

CO N FLI C T O F I NTE R E S T
The authors declare no conflicts of interest.

AUTH O R CO NTR I B UTI O N S
BZ and XCN designed experiments. XXM, SW and JL contributed to the literature review. BZ wrote the initial draft of the manuscript.
SKW designed the study and edited the paper. All authors have approved the final version of the manuscript.

DATA AVA I L A B I L I T Y S TAT E M E N T
All data generated or analysed during this study are included in this published article and its supplementary information files.