Identification of a novel six‐gene signature with potential prognostic and therapeutic value in cervical cancer

Abstract Introduction Cervical cancer has high mortality, high recurrence and poor prognosis. Although prognostic biomarkers such as clinicopathological features have been proposed, their accuracy and precision are far from satisfactory. Therefore, novel biomarkers are urgently needed for disease surveillance, prognosis prediction and treatment selection. Materials Differentially expressed genes (DEGs) between cervical cancer and normal tissues from three microarray datasets extracted from the Gene Expression Omnibus platform were identified and screened. Based on these DEGs, a six‐gene prognostic signature was constructed using cervical squamous cell carcinoma and endocervical adenocarcinoma data from The Cancer Genome Atlas. Next, the molecular functions and related pathways of the six genes were investigated through gene set enrichment analysis and co‐expression analysis. Additionally, immunophenoscore analysis and the QuartataWeb Server were employed to explore the therapeutic value of the six‐gene signature. Results We discovered 178 overlapping DEGs in three microarray datasets and established a six‐gene (APOC1, GLTP, ISG20, SPP1, SLC24A3 and UPP1) prognostic signature with stable and excellent performance in predicting overall survival in different subgroups. Intriguingly, the six‐gene signature was closely associated with the immune response and tumour immune microenvironment. The six‐gene signature might be used for predicting response to immune checkpoint inhibitors (ICIs) and the six genes may serve as new drug targets for cervical cancer. Conclusion Our study established a novel six‐gene (APOC1, GLTP, ISG20, SPP1, SLC24A3 and UPP1) signature that was closely associated with the immune response and tumour immune microenvironment. The six‐gene signature was indicative of aggressive features of cervical cancer and therefore might serve as a promising biomarker for predicting not only overall survival but also ICI treatment effectiveness. Moreover, three genes (UPP1, ISG20 and GLTP) within the six‐gene signature have the potential to become novel drug targets.


| INTRODUCTION
Cervical cancer ranks fourth in both incidence and mortality in women worldwide, with approximately 500,000 new cases and 250,000 deaths occurring annually. 1 Despite continuous advancements in treatment, including surgery, radiotherapy, chemotherapy and targeted or immunotherapy, 2-4 the 5-year overall survival for cervical cancer is reported to be approximately 50%~65%, 5,6 with a high recurrence rate and poor prognosis. 7 It is known that clinicopathological features, such as International Federation of Gynecology and Obstetrics stage, lymph node metastasis, lymphovascular space invasion (LVSI), deep stromal infiltration and parametrial involvement, are traditional prognostic factors. However, these characteristics are not precise enough or temporally dynamic, 8 and they can only be obtained postoperatively.
With the rapid development of molecular biology techniques, cancer biomarkers for prognosis prediction are increasingly emerging. In cervical cancer, although squamous cell carcinoma antigen (SCCA) 9 has been applied in clinical practice, its specificity and sensitivity are far from satisfactory. 10 For example, SCCA also increases in other types of squamous cell carcinoma [11][12][13] and even in nontumour diseases. 14 In addition, it is not good for identifying and monitoring cervical adenocarcinoma. Therefore, the identification of novel biomarkers and a deeper understanding of the molecular mechanisms of cervical cancer are essential for disease surveillance, prognosis prediction and treatment selection.
High-throughput microarray and RNA sequencing (RNAseq) technologies integrated with bioinformatics analyses have emerged as promising tools for detecting genetic alterations in the processes of tumour formation, recurrence and metastasis. Gene Expression Omnibus (GEO, https://www.ncbi.nlm.nih. gov/geo/), 15 a large database of gene chip data maintained by the National Center of Biotechnology and The Cancer Genome Atlas (TCGA, https://portal.gdc.cancer.gov/) 16 offering comprehensive cancer genomic expression, clinicopathological and survival information, are two commonly used public databases. In-depth exploration of such databases assists researchers in screening, identifying and validating cancer biomarkers, providing strong support for diagnosis, prognostication and individualised treatment selection for different types of tumours; for example, SETD7 was identified as a diagnostic biomarker for colorectal cancer, 17 TFAP2B was identified as a prognostic predictor for endometrial cancer 18 and phospho-STAT1 was identified as a biomarker for patient selection for immunotherapy in breast cancer. 19 However, to the best of our knowledge, previous studies on prognostic biomarkers for cervical cancer have been either based on individual microarray dataset or database, 20,21 thus failed to present convincing results, or failed to identify the function and clinical value of the studied genes. 22 Therefore, more comprehensive research uniting different data sources and delving further into gene function in cervical cancer is urgently needed.
In the current study, to explore potential prognostic biomarkers for cervical cancer, we first analysed three mRNA microarray datasets from the GEO platform and obtained differentially expressed genes (DEGs) between cervical cancer tissues and normal tissues. Subsequently, DEGs with prognostic value were identified with the data from TCGA. Notably, we established a novel six-gene (APOC1, GLTP, ISG20, SPP1, SLC24A3 and UPP1) prognostic signature and discovered for the first time that this signature was associated not only with intrinsic aggressive features but also with the tumour immune microenvironment. Furthermore, we also investigated the therapeutic value of the six-gene signature and found that it could help predict the response to immune checkpoint inhibitors (ICIs) and that three genes (GLTP, ISG20 and UPP1) in the six-gene signature might serve as potential drug targets for cervical cancer.

| Data collection and preprocessing
Microarray datasets (GSE7803, GSE9750, GSE13 8080 and GSE12 7265) were obtained from GEO (https://www.ncbi. nlm.nih.gov/geo/). 15 Detailed information is summarised in Table S1. Gene probes were converted into corresponding gene symbols according to the annotation information on the platform. The Robust Multi-array Average method was used to normalise and log2-transform data, which were prepared for the next-step screening to identify DEGs. The cervical squamous cell carcinoma and endocervical adenocarcinoma (CESC) dataset of the TCGA database were downloaded from Xena (https://xena.ucsc.edu/) including normalised gene expression RNA-seq fragments per kilobase of exon model per million mapped fragments (FPKM) data, clinicopathological data and survival data.
biomarker for predicting not only overall survival but also ICI treatment effectiveness. Moreover, three genes (UPP1, ISG20 and GLTP) within the six-gene signature have the potential to become novel drug targets.

K E Y W O R D S
cervical cancer, gene signature, prognosis, tumour immune microenvironment | 6883 QU et al.

| DEG screening in GEO
The 'Limma' package of R software (version 3.6.2) was applied to identify DEGs in three microarray datasets (GSE7803, GSE9750 and GSE13 8080) on the GEO platform with thresholds of |log 2 fold change| >1 and adjusted p value <0.05. Three volcano plots demonstrating the DEGs and a Venn diagram indicating overlapping genes were generated with R software.

| Construction and evaluation of the six-gene prognostic signature
We first applied univariate Cox regression analysis to identify DEGs with prognostic value in the TCGA CESC dataset. Then, a multivariate Cox regression analysis was conducted with the 'step Akaike information criterion (AIC)' method. Genes with statistical significance (p < 0.05) were selected to establish a multiple-gene prognostic signature. The risk score predicting overall survival was calculated as follows: ∑ coefficient * gene expression. To assess the accuracy of the prognostic signature, time-dependent receiver operating characteristic (ROC) curves predicting survival at different time points were drawn and the area under the curve (AUC) value was calculated in R software using the 'survival ROC' package. Additionally, the patients were divided into a high-risk group and a low-risk group according to the median risk score. Kaplan-Meier curves were generated and the log-rank test was performed between these two groups with the 'survival' package of R software. To examine whether the six-gene signature we constructed was an independent risk factor, we performed univariate and multivariate Cox regression analyses employing both the six-gene signature and clinicopathological features. A nomogram was used to visualise the multivariate Cox model and calibration curves were generated to test the concordance of the prediction model at 1, 3, 5 and 10 years. In addition, to evaluate the robustness of the six-gene prognostic signature, we performed an extra subgroup analysis and the results were presented as a forest plot. The nomogram, calibration curves and forest plot were generated with R software with the packages 'survival', 'rms' and 'forestplot', respectively.

| Gene set enrichment analysis
Gene set enrichment analysis (GSEA) is a computational method that determines whether a defined set of genes exhibit statistical and concordant differences between two phenotypes. We used GSEA software 23,24 (version 4.0.3) to explore the enrichment of hallmark gene sets in patients with high-risk and low-risk according to the six-gene signature. Furthermore, pathways related to high expression and low expression patterns of the six studied genes were also investigated individually.

| Analysis of the correlations between the prognostic genes and tumour-infiltrating immune cells
RNA-seq data were transformed into abundances of immune cells via CIBERSORT 25 in R software. We analysed the correlation between the six-gene signature risk score as well as single gene expression values and the abundances of immune cells. Scatter plots were generated in R software when a correlation (R > 0.3 and p < 0.001) was detected. Associations between gene expression and important immune markers of a variety of immune cells, including monocytes, tumourassociated macrophages, M1 macrophages, M2 macrophages,

| Immunophenoscore analysis
The immunophenoscore (IPS) was derived with machine learning algorithms from the representative gene expression in four major categories of factors related to immunogenicity, including effector cells, immunosuppressive cells, major histocompatibility complex (MHC) molecules and immunomodulators. The IPS was calculated based on the z-scores of gene expression using a scale ranging from 0 to 10. It has been validated that higher IPSs are associated with stronger immunogenicity, indicating a better response to ICIs. 28 The IPSs of CESC patients were acquired from The Cancer Immunome Atlas (https://tcia.at/home).

| Searching the QuartataWeb Server for potential drugs
The QuartataWeb Server 29 (http://quart ata.csb.pitt.edu.) is a user-friendly website designed for polypharmacological and chemogenomics analyses, whose original data sources are DrugBank 30 and STITCH. 31 Users can conveniently obtain access to information on experimentally verified protein-drug/chemical interactions and computationally predicted interactions. With the help of the QuartataWeb Server, we identified the potential drugs targeting the proteins encoded by the genes we were interested in.

| Statistical analysis
Continuous variables are described as the mean ± SE or the median, and categorical variables are presented as the frequency (n) and proportion (%). Differences in the variables between groups were tested using t tests, nonparametric tests, chi-square tests, or ANOVA tests. All hypothetical tests were two-sided and a p value less than 0.05 was considered statistically significant. Statistical analyses were performed with R software (version 3.6.2), SPSS (version 25) and GraphPad Prism (version 8.2.1).

| A total of 178 overlapping DEGs were identified from 3 GEO datasets of cervical cancer
The whole study was conducted according to the flow chart ( Figure 1). We analysed 3 cervical cancer datasets (GSE7803, GSE9750 and GSE13 8080) on the GEO platform, identifying 543 (293 upregulated and 250 downregulated), 1602 (396 upregulated and 1206 downregulated) and 914 (380 upregulated and 534 downregulated) DEGs, respectively. Notably, 178 DEGs overlapped amongst the three datasets, including 64 upregulated genes and 114 downregulated genes (Figure 2A-D). Whether these DEGs play significant roles in the development of cervical cancer and offer great prognostic value were the next questions we sought to answer.
Up-regulated genes Down-regulated genes

ROC of the six-gene signature
1-Specificity 3.2 | A promising six-gene prognostic signature was constructed with TCGA data.
To explore prognostic biomarkers, we performed survival analysis using the CESC dataset from the TCGA database, which includes 306 cervical cancer samples with RNA-seq data, survival data and clinicopathological data (Table 1). First, we analysed the 178 DEGs by univariate Cox regression analysis and found 36 DEGs with potential prognostic value. Then, multivariate Cox regression analysis was performed with the 'step AIC' method. Of note, we found that six genes (APOC1, GLTP, ISG20, SPP1, SLC24A3 and UPP1) had remarkable statistical significance (p < 0.05) which were subsequently included in the overall survival prediction model (Table 2). Notably, we established a six-gene signature that could predict overall survival for the first time and the risk score was calculated as follows: 0.3322*UPP1-0.6835*GLTP + 0.2347*SLC24A3+0.2777*SPP1-0.4078*ISG20-0.3020*APOC1 (concordance = 0.764, SE = 0.028). Additionally, we divided 306 patients into a high-risk group and a low-risk group using the median risk score as the cut-off value. Kaplan-Meier curves and log-rank tests were employed to compare the overall survival between high-risk and low-risk patients ( Figure 2E). The results showed that a higher risk score predicted a shorter overall survival. Time-dependent ROC curves for the six-gene signature were generated ( Figure 2F) and the AUC values for the 1-, 3-, 5-and 10-year survival predictions were 0.786, 0.777, 0.808 and 0.748, respectively. Collectively, these results suggested that the novel six-gene (APOC1, GLTP, ISG20, SPP1, SLC24A3 and UPP1) panel we constructed might serve as a promising prognostic biomarker.
3.3 | The six-gene signature was an independent prognostic factor.
To further examine whether the six-gene signature was an independent prognostic factor, we performed not only univariate but also multivariate Cox regression analysis (Table 3). Clinical stage (stage III and IV vs. stage I and II), lymph node status (N1 vs. N0), lymphovascular infiltration (present vs. absent), metastasis (M1 vs. M0) and the six-gene signature proved to be prognostic factors according to the univariate model, while the multivariate model included LVSI and the six-gene signature (concordance = 0.86, SE = 0.033), demonstrating that the six-gene signature could serve as an independent prognostic factor. Then a nomogram was constructed based on LVSI and the six-gene signature to predict the overall survival of cervical cancer patients and the calibration curves predicting 1, 3, 5, 10-year overall survival (OS) were illustrated (Figure 3). To evaluate the stability and reliability of the six-gene signature, we conducted a subgroup analysis ( Figure 4A). A higher risk score predicted poorer overall survival in most subgroups, with a p value less than 0.05. Overall, we proved that the six-gene panel can be an independent prognostic biomarker.

| The six-gene signature we established was related to aggressive features.
To further analyse whether the six-gene signature was related to clinicopathological features, we conducted the statistical analysis of the differences between the six-gene signature low-risk group and high-risk group (Table S2). We noticed that a high-risk score was linked with lymph node metastasis (p = 0.022) and showed a trend for being related to the advanced clinical stage (p = 0.076). In addition, we compared the risk scores in different subgroups ( Figure 4B,C). Discrepancies existed between the early-stage group and advanced-stage group (p = 0.005), further confirming the association between the six-gene panel and clinical stage of cervical cancer. But risk scores had no differences between the N1 group and N0 group (p = 0.3253), indicating that the correlation between the six-gene signature and lymph node metastasis was not robust and convincing enough. The statistical difference in Table S2 (p = 0.022) was probably due to the missing data. Additionally, by analysing another cervical cancer dataset on the GEO platform (GSE 127265), the association between the six-gene signature risk score and more advanced clinical stage was confirmed ( Figure S1). In a word, a high six-gene signature risk score was linked to the advanced clinical stage representing the aggressive features of cervical cancer. Therefore, we wondered how these six genes enact their ability to promote tumour aggressiveness.  3.5 | The six-gene signature was found to be closely associated with immune-related pathways.
Subsequently, we deeply examined the gene functions and pathways related to the signature with GSEA ( Figure 5). We found that epithelial-mesenchymal transition, TGFbeta pathway and hypoxia were enriched in the high-risk group, while oxidative phosphorylation, interferon-alpha response and interferon-gamma response were enriched in the low-risk group. These results indicated that the poor prognosis of the high-risk score group was related to the activation of common carcinogenesis pathways; in contrast, the better prognosis of the low-risk score group was linked to a positive immune response. Additionally, we also studied the functional pathways correlated with the six genes individually. Interestingly, APOC1, ISG20, SPP1 and UPP1 were found to be closely connected with the immune response and inflammation-related pathways. Collectively, these results implied that the six genes might regulate the immune microenvironment or mediate immune reactions, thus influencing tumour aggressiveness and disease prognosis. To further confirm whether the six-gene signature was indicative of the immune microenvironment, we explored the association between the six-gene signature and tumour immune microenvironment in terms of various tumour-infiltrating immune cells (Table 4) through CIBERSORT and correlation analysis. We observed that the six-gene risk score was positively correlated with the abundances of resting CD4 + memory T cells, M0 macrophages and activated mast cells but negatively associated with the abundances of CD8 + T cells and resting mast cells. We also investigated the correlations between the six genes and the abundances of immune cells individually ( Figure S2). Notably, we found that APOC1 had strong links with M2 macrophages, activated DCs, CD4 + resting memory T cells and CD8 + T cells. To further test whether the six genes potentially regulated immune cells, we conducted co-expression analysis between the six genes and immune markers demonstrating the state and function of immune cells ( Figure 6). We discovered that APOC1 had a strong correlation with immune markers pf monocytes, M2 macrophages, DCs and T cells, especially exhausted T cells. Of note, APOC1 had the strongest association (R = 0.744) with HAVCR2 (also named TIM-3), a marker of exhausted T cells. Additionally, ISG20 exhibited a similar expression pattern as APOC1, while SPP1 was positively associated with M2 macrophages. Overall, we confirmed that the novel sixgene signature we established was closely associated with the tumour immune microenvironment.

| The six-gene signature exhibited potential therapeutic value
We sought to determine whether the six-gene signature could provide therapeutic value. First, we used the IPS to predict ICI effectiveness within patients classified as high-risk and low-risk according to the six-gene signature ( Figure 7A-D). The low-risk group exhibited higher IPSs, as well as higher IPS-CTLA4 and IPS-PD1/PDL1/PDL2+CTLA4 scores, than the high-risk group, representing higher immunogenicity, thus predicting a better response to ICIs including CTLA4 blockade and PDL1 blockade. In summary, the six-gene signature might assist in predicting ICI treatment effectiveness. Second, we searched the QuartataWeb Server for potential drugs targeting the proteins encoded by the six prognostic genes. We identified 1, 5 and 2 known small molecule drugs targeting ISG20, GLTP and UPP1, respectively (Table S3).
Notably, fostamatinib and phenethyl isothiocyanate targeted both UPP1 and ISG20, and lactose interacted with ISG20 and GLTP ( Figure 7E). Overall, the six-gene signature exhibited an appreciable therapeutic value in predicting ICI treatment effectiveness and highlighting potential drug targets (GLTP, UPP1 and ISG20).

| DISCUSSION
Despite the rapid development of molecular biology and sequencing technologies, previous studies exploring novel biomarkers [32][33][34] in cervical cancer were mostly based on a single microarray dataset or database with limited validity 20,21 or lacked a clear description of molecular function. 22 In the present study, by comprehensively integrating three GEO datasets and CESC data from the TCGA database, we established a novel six-gene (APOC1, ISG20, SPP1, UPP1, GLTP and SLC24A3) prognostic signature. The six-gene signature was related to the clinical stage and proved to be an independent prognostic factor for cervical cancer with stable performance in different subgroups. Subsequently, a strong link between tumour immune environment and the six genes had been discovered. Intriguingly, the six-gene risk score was positively correlated with the abundances of resting CD4 + T cells and activated mast cells but negatively correlated with the abundances of CD8 + T cells, activated CD4 + T cells and resting mast cells. That is, a lower risk score represented higher abundances of CD8 + T cells and activated CD4 + T cells, which have been widely reported to be effector cells [35][36][37][38] in the tumour microenvironment generally leading to a better prognosis. In addition, significant associations between vessel density and mast cell abundance have been found in cervical carcinoma, 39,40 providing an explanation for the link between high six-gene risk score and poor prognosis: high activated mast cell abundance and increased angiogenesis.
To determine the therapeutic value of the six-gene signature, The absolute value of the correlation was more than 0.3 and p value was less than 0.001 in the rows with bold texts implying a relatively robust relation with statistical significance (in bold).

T A B L E 4
Correlations between the sixgene signature and 22 types of immune cells we performed IPS analysis and concluded that a lower sixgene risk score predicted a better response to ICIs than a higher six-gene risk score. Additionally, we identified drugs targeting GLTP, UPP1 and ISG20, offering new directions for drug development.
Gene functions and mechanisms are of vital importance in determining clinical value and applications. Hence, we placed much emphasis on investigating how the six genes from our signature function and what pathways they mediate (Table S4). In the field of cervical cancer, a previous study found that SPP1 was upregulated in cancer tissues compared with normal tissues. 41 Experiments proved that inhibiting SPP1 could inhibit proliferation, induce apoptosis and improve the chemosensitivity of cervical cancer cells. In addition, ISG20 was also reported to be differentially expressed in cervical cancer tissues. 42 However, the other four genes within F I G U R E 6 Heatmap of the six prognostic genes expression correlated with immune markers colours representing the correlation coefficients (R) and '*'standing for p value. ****p < 0.0001; ***p < 0.001; **p < 0.01; *p < 0.05. The correlations were adjusted by tumour purity the signature have rarely been studied in cervical cancer. In our current study, through GSEA, we found that protective genes for overall survival, especially APOC1 and ISG20, had positive correlations with immune-related pathways, including the interferon-alpha pathway, interferon-gamma pathway and complement pathway, implying that an effective immune response might lead to good prognosis. Moreover, the risk genes predicting poor survival, particularly SPP1 and UPP1, were linked with common carcinogenesis pathways as well as immune-related pathways such as the inflammatory response and TNF signalling via NFKB. In summary, our study discovered that the six prognostic genes, especially APOC1, ISG20, SPP1 and UPP1, had the potential to regulate immune response thus contributing to cervical cancer aggressiveness. These findings greatly expanded our understanding of the function of these genes in cervical cancer.
We also studied whether the expression of the six prognostic genes reflected the immune microenvironment or immune response. Through co-expression analyses, we verified that APOC1, ISG20 and SPP1 had notably strong correlations with tumour-infiltrating immune cells, highlighting new directions for further research. (1) APOC1, as a protective factor for overall survival, was found to be closely related to the abundances of multiple tumour-infiltrating immune cells, such as M2 macrophages, activated DCs and CD8 + T cells. Correspondingly, APOC1 had strong associations with 18 immune markers. Of note, APOC1 had an extremely strong relationship with HAVCR2 (also named TIM-3), a marker of exhausted T cells. In addition, APOC1 was positively correlated with the abundance of M2 macrophages but negatively correlated with PTGS2, a marker of M1 macrophages, indicating a potential association with F I G U R E 7 (A) IPS, (B) IPS-CTLA4, (C) IPS-PD1/PDL1/PDL2 and (D) IPS-PD1 PD1/PDL1/PDL2+CTLA4 in the six-gene signature high-risk group and low-risk group. (E) The known (grey thick lines) and predicted (red thin lines) drugs targeting UPP1, ISG20 and GLTP. IPS, immunophenoscore macrophage polarisation, which is critical in the antitumour response. 43,44 Hence, we suggested that APOC1 might affect tumour malignancy and cancer prognosis by regulating T cell exhaustion and macrophage polarisation. (2) Additionally, we observed that ISG20, as a protective gene for survival, was correlated with T cells and DCs. As ISG20 participates in the antiviral response, we hypothesised that ISG20 might participate in the anti-HPV immune response by activating T cells or DCs and thus influence the overall survival of cervical cancer patients. (3) Moreover, we discovered that SPP1 expression was positively correlated with the abundance of M2 macrophages as well as the expression of corresponding immune markers (CD163 and VSIG4). M2 macrophages exert anti-inflammatory and pro-tumour effects, promoting the progression and metastasis of a variety of tumours, such as breast cancer and gastric cancer. 45,46 Therefore, we hypothesised that SPP1 might contribute to tumour aggressiveness and influence cervical cancer prognosis by regulating M2 macrophages, but this hypothesis requires further investigation and validation.
The present study had several limitations. First, extra clinical cohorts with large-scale samples were not recruited to accomplish validation of the newly constructed six-gene signature. Second, the value of the six-gene signature in immunotherapy prediction was not shown in clinical cohorts. Finally, experiments to verify the hypothesis we proposed regarding the function and mechanisms of the six prognostic genes, as well as the predicted drug-protein interactions, were not performed yet. In the future, clinical validation and experiments would be conducted to verify our current conclusions.
In conclusion, our current study represented the first attempt to construct a novel six-gene (APOC1, GLTP, ISG20, SPP1, SLC24A3 and UPP1) signature that comprehensively integrated GEO and TCGA data. With a deeper investigation of the gene functions and pathways, we found that the sixgene signature was closely associated with the immune response and tumour immune microenvironment. The six-gene signature was indicative of aggressive features of cervical cancer and could serve as a promising biomarker for predicting not only overall survival but also ICI treatment effectiveness. Moreover, three genes (UPP1, ISG20 and GLTP) within the signature have the potential to become novel drug targets.