The identification of six risk genes for ovarian cancer platinum response based on a global network algorithm and verification analysis

Abstract Ovarian cancer is the most lethal gynaecological cancer, and resistance to platinum-based chemotherapy is the main reason for treatment failure. The aim of the present study was to identify candidate genes involved in ovarian cancer platinum response by analysing genes from the homologous recombination and Fanconi anaemia pathways. We explored the associations between these two functional gene sets and performed a random walk algorithm on a reconstructed gene-gene network that included protein-protein interaction and co-expression relations. Following the random walk, all genes were ranked, and GSEA showed that the biological functions focused primarily on autophagy, histone modification and gluconeogenesis. For each of three types of seed nodes, the top two genes were taken as examples, giving a total of six candidate genes (FANCA, FANCG, POLD1, KDM1A, BLM and BRCA1) for subsequent verification. The six candidate genes were validated as significant in three independent ovarian cancer data sets with platinum-resistant and platinum-sensitive information. To explore the correlation between biomarkers and clinical prognostic factors, we performed differential analysis and multivariate clinical subgroup analysis for the six candidate genes at both mRNA and protein levels. Each of the six candidate genes, together with its neighbouring genes with a mutation rate greater than 10%, was also analysed by network construction and functional enrichment analysis. In addition, survival analysis was performed for platinum-treated patients. Finally, an RT-qPCR assay was used to determine the performance of the candidate genes in ovarian cancer platinum response. Taken together, this research demonstrated that comprehensive bioinformatics methods can help to understand the molecular mechanism of platinum response and provide new strategies for overcoming platinum resistance in ovarian cancer treatment.


KEYWORDS
Fanconi anaemia, homologous recombination, ovarian cancer, platinum treatment

| INTRODUCTION
Ovarian cancer is the leading cause of death from gynaecological malignancy in women worldwide. Despite debulking surgery and platinum-based chemotherapy, drug resistance remains the main obstacle to successful ovarian cancer therapy. 1 Patients whose disease progresses during primary therapy or within a treatment-free interval of less than 6 months are considered platinum resistant, whereas those who relapse or progress after a treatment-free interval exceeding 6 months are considered platinum sensitive. 2 After standard treatment, more than 70% of patients respond to cisplatin-based chemotherapy, but many relapse and develop drug resistance within two years, and the survival rate is about 40%. 3 Hence, exploring the molecular mechanisms underlying chemoresistance and identifying risk signatures are key strategies for advancing ovarian cancer therapy. 4 Resistance to platinum-based treatment can be intrinsic or acquired and arises from a variety of mechanisms in ovarian cancer. 5 Researchers have performed many low-throughput experiments to analyse resistance mechanisms and identify potential biomarkers for overcoming platinum resistance in ovarian cancer. For example, Wu et al tested the Akt inhibitor SC66 in a NOD-SCID xenograft mouse model and a panel of eight ovarian cancer cell lines. They found that SC66 regulated collagen type XI alpha 1 chain by inhibiting Akt/mTOR signalling, enhancing drug sensitivity and inhibiting cell proliferation and invasion. 6 In another study, Hu et al found that interleukin 17 receptor B (CRL4) was significantly upregulated in cisplatin-resistant ovarian cancer cells, and that knocking down CRL4 with shRNA reversed cisplatin resistance in these cells.
CRL4 was shown to play an important role in apoptosis and drug resistance by targeting baculoviral IAP repeat containing 3 (BIRC3) in ovarian cancer cells. 7 However, these low-throughput studies each focused on a single gene signature.
Moreover, it remains unclear whether these mechanisms are universal across patients, because the studies used small numbers of tissue samples. 8,9 Therefore, it is necessary to analyse large numbers of tissue samples and to integrate multiple data sets.
Homologous recombination (HR) and Fanconi anaemia (FA) are two of the major DNA repair pathways, and several studies have indicated that both are related to platinum resistance in ovarian cancer. The main target of platinum agents is DNA: platinum-induced DNA damage activates the DNA damage response. 10 If the damage is not repaired, tumourigenesis or cell death can follow, and alterations in these repair pathways can make tumours sensitive or resistant to platinum agents. 11 HR is an error-free DNA repair system that is activated by DNA double-strand breaks. 12 In the past few years, more than 50% of patients with high-grade serous ovarian cancer have been shown to have HR repair defects, and because of this defect, this tumour type is highly sensitive to platinum agents. 13 16 However, the internal associations between the HR and FA pathways, and the key risk genes within them, have not been explored.
Recently, many studies have used global networks to explore the complex biological mechanisms of ovarian cancer. Using public databases, Wang et al obtained lncRNAs, mRNAs and miRNAs that were differentially expressed between epithelial ovarian cancer and normal ovarian tissue. They predicted lncRNA-miRNA-mRNA interactions bioinformatically and then built a LINC00284-related ceRNA network. Based on biological function analysis, they found that this network was related to epithelial ovarian cancer carcinogenesis, and finally confirmed that LINC00284 was a potential new prognostic biomarker for epithelial ovarian cancer. 17 In another study, the authors downloaded three sets of expression profiles containing ovarian cancer and normal tissues from the Gene Expression Omnibus (GEO) database and identified a total of 190 differentially expressed genes, from which they constructed a protein-protein interaction (PPI) network.
Ultimately, the study identified the 17 most closely related genes among the differentially expressed genes in the PPI network. 18 A network-based random walk algorithm was developed to identify candidate genes using a global network distance measure. 19 This algorithm not only improves risk gene selection but also provides a framework for integrating core seed genes when exploring global mechanisms.
In this study, we first obtained the HR-related pathway genes and the FA-related FANC-BRCA pathway genes from the Molecular Signatures Database (MSigDB) and classified them into three types: HR-only genes (HO-G), HR/FA common genes (HFC-G) and FA-only genes (FO-G). Secondly, using each type of gene as seed nodes, we performed random walks on the complex disease-specific gene-gene network to prioritize risk genes. According to the random walk results, the top two genes for each type of seed node were selected as candidate genes for verification analysis. Finally, the six candidate genes were analysed and verified to different degrees in multiple databases. In addition, a quantitative real-time polymerase chain reaction (RT-qPCR) assay was performed to verify differential mRNA expression between cisplatin-resistant and cisplatin-sensitive ovarian cancer cell lines.

| Publicly available expression data sets and signature genes
The expression data sets were downloaded from the public data-

| The integrated complex network
We integrated the PPI network and ovarian cancer co-expression network to form the complex network. The data of PPI networks 21 were

| Random walk algorithm based on integrated network
Based on the global integrated network containing both PPI interactions and co-expression relations, we performed a global risk impact analysis to prioritize mRNAs using the random walk algorithm. Random walk algorithms have been developed and applied to analyse many types of disease mechanisms and have shown advantages for identifying risk or prognostic genes on the basis of global networks. 19,22,23 In the reconstructed network described above, the functional genes from the HR/FA pathways were regarded as seed nodes. Considering the difference between these two functions, we annotated each of the three types of genes (HO-G, HFC-G and FO-G) onto the global network separately, and the corresponding annotated genes were treated as seed nodes. The random walk algorithm was then used to evaluate the global risk impact of the seed nodes on each network component as follows:

$$P^{t+1} = (1 - r)\,W P^{t} + r P^{0}$$

where $W$ is the column-normalized form of the binary (0/1) adjacency matrix of the global integrated network, and $P^{t}$ is a vector whose $i$-th element holds the probability that the walker is at node $i$ at step $t$. The initial probability vector $P^{0}$ assigned equal probabilities to all seed nodes, summing to 1. At each step, the walker restarted from the seed nodes with probability $r$ ($r$ = .7). When the difference between $P^{t}$ and $P^{t+1}$ fell below $10^{-6}$, the probabilities were considered to have reached a steady state. Finally, each gene in the global network was scored by its value in the steady-state probability vector $P^{\infty}$. The random walk was performed three times, once for each type of seed node, yielding three different rankings of all genes.
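The update rule above can be sketched in plain Python. This is a minimal illustration under stated assumptions, not the authors' code: the network is stored as a dictionary of neighbour sets (which applies the column-normalized 0/1 adjacency matrix $W$ implicitly), the convergence check uses the L1 norm (the text does not name the norm), and the function name `random_walk_with_restart` is hypothetical.

```python
def random_walk_with_restart(neighbours, seeds, r=0.7, tol=1e-6, max_iter=10_000):
    """Steady-state random walk with restart on an undirected, unweighted network.

    neighbours : dict mapping each gene to the set of genes it is linked to
                 (assumed symmetric: if b in neighbours[a], then a in neighbours[b])
    seeds      : genes used as seed nodes; each gets equal initial probability
    r          : restart probability (0.7 in the text)
    tol        : threshold on the L1 change between steps (assumed norm)
    """
    nodes = list(neighbours)
    seeds = set(seeds) & set(nodes)
    if not seeds:
        raise ValueError("no seed gene is present in the network")
    # P^0: equal probability on every seed, summing to 1.
    p0 = {n: (1.0 / len(seeds) if n in seeds else 0.0) for n in nodes}
    p = dict(p0)
    for _ in range(max_iter):
        # Column normalisation of the 0/1 adjacency matrix: node j passes
        # p[j] / deg(j) to each neighbour, so (W P)_i sums this over j in N(i).
        share = {j: (p[j] / len(neighbours[j]) if neighbours[j] else 0.0) for j in nodes}
        # P^{t+1} = (1 - r) W P^t + r P^0
        p_next = {
            i: (1 - r) * sum(share[j] for j in neighbours[i]) + r * p0[i]
            for i in nodes
        }
        if sum(abs(p_next[n] - p[n]) for n in nodes) < tol:
            return p_next
        p = p_next
    return p


# Toy path network A-B-C-D seeded at A: scores fall with distance from the seed.
toy = {"A": {"B"}, "B": {"A", "C"}, "C": {"B", "D"}, "D": {"C"}}
scores = random_walk_with_restart(toy, {"A"})
print(sorted(scores, key=scores.get, reverse=True))  # ['A', 'B', 'C', 'D']
```

Running the function once per seed set (HO-G, HFC-G and FO-G) and sorting genes by the returned probabilities would give three rankings of the kind described above. A sparse neighbour-set representation is used because gene networks are large and sparse; a dense NumPy matrix would implement the same equation.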

| Enrichment analysis
Gene Ontology (GO) and Kyoto Encyclopedia of Genes and Genomes (KEGG) enrichment analyses were performed with the GSEA function in the clusterProfiler R package. 24 The ordered gene list produced by the random walk algorithm, which reflects the comprehensive impact of the seed nodes based on network topology, was used as the input.
False discovery rate (FDR)-adjusted P values were calculated using the Benjamini-Hochberg method.
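To make the input and the multiple-testing step concrete, the sketch below builds a ranked list of the kind used here (genes with a score of 0 dropped, the rest sorted in descending order), computes a weighted running-sum enrichment score in the style of GSEA (weight exponent 1) for one gene set, and applies Benjamini-Hochberg adjustment. It is an illustrative re-implementation, not clusterProfiler's code; the permutation procedure that turns an enrichment score into a P value is omitted, and all function names are hypothetical.

```python
def ranked_list(scores):
    """Drop genes with a zero random-walk score and sort the rest in descending order."""
    kept = [(gene, s) for gene, s in scores.items() if s > 0]
    return sorted(kept, key=lambda item: item[1], reverse=True)


def enrichment_score(ranked, gene_set, weight=1.0):
    """GSEA-style running-sum enrichment score of gene_set along a ranked list.

    Hits step up by |score|**weight normalised over all hits; misses step down
    by 1 / (number of misses). The ES is the signed maximum deviation from zero.
    """
    hits = [gene in gene_set for gene, _ in ranked]
    n_miss = len(ranked) - sum(hits)
    hit_norm = sum(abs(s) ** weight for (_, s), h in zip(ranked, hits) if h)
    if hit_norm == 0 or n_miss == 0:
        raise ValueError("gene set must partly, but not fully, overlap the ranked list")
    running, es = 0.0, 0.0
    for (_, s), h in zip(ranked, hits):
        running += abs(s) ** weight / hit_norm if h else -1.0 / n_miss
        if abs(running) > abs(es):
            es = running
    return es


def benjamini_hochberg(pvals):
    """Benjamini-Hochberg FDR-adjusted P values, returned in the input order."""
    m = len(pvals)
    order = sorted(range(m), key=lambda i: pvals[i])
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest P value down, enforcing monotonicity of adjusted values.
    for rank in range(m, 0, -1):
        i = order[rank - 1]
        running_min = min(running_min, pvals[i] * m / rank)
        adjusted[i] = running_min
    return adjusted


ranked = ranked_list({"A": 0.5, "B": 0.3, "C": 0.1, "D": 0.05, "E": 0.0})
print([g for g, _ in ranked])  # ['A', 'B', 'C', 'D']
```

A gene set concentrated at the top of the list (here A and B) gives a positive score near 1, while one concentrated at the bottom gives a negative score.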

| Oncomine analysis
Oncomine (www.oncomine.org) is a database with powerful functions for analysing expression differences; it includes 715 gene expression data sets from 86 733 cancer and normal tissue samples. 25 In this study, the TCGA, Yoshihara, Adib, Bonome, Welsh, Hendrix and Lu Ovarian data sets 26-32 were used to analyse the differential expression of the six candidate genes. The raw data downloaded from the Oncomine database were plotted as box plots using GraphPad Prism 7, and P < .05 was considered statistically significant.

| cBio Cancer Genomics Portal analysis
The cBio Cancer Genomics Portal (c-BioPortal) (http://cbioportal.org) is an open-access resource for interactive exploration of multiple cancer genomics data sets. 34 In the TCGA PanCancer Atlas study, we queried mutations, copy number variation (CNV) and mRNA expression and then performed an in-depth analysis of the six candidate genes. The Network tab was used to visualize the interaction network of the six candidate genes and their neighbouring genes. We also selected the neighbouring genes with a mutation frequency greater than 10% for enrichment analysis using the DAVID web tool (https://david.ncifcrf.gov/home.jsp), considering GO biological process (BP) and KEGG terms. Cytoscape (version 3.5.1) was used to build and visualize gene-function networks from the DAVID enrichment results.
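The two steps described above (keeping neighbours with a mutation frequency above 10%, then testing them for over-represented terms) can be sketched as follows. This is an assumption-laden illustration: DAVID itself reports an EASE score, a conservative variant of Fisher's exact test, whereas the sketch uses a plain one-sided hypergeometric tail instead; the function names and inputs are hypothetical.

```python
from math import comb


def frequent_neighbours(neighbours, mutation_freq, threshold=0.10):
    """Neighbouring genes whose mutation frequency is strictly above the threshold."""
    return sorted(g for g in neighbours if mutation_freq.get(g, 0.0) > threshold)


def hypergeom_enrichment_p(k, n, K, N):
    """One-sided over-representation P value, P(X >= k).

    X ~ Hypergeometric: n genes drawn (the filtered neighbours) from a background
    of N genes, K of which carry the annotation (a GO BP term or KEGG pathway);
    k of the drawn genes are annotated.
    """
    total = comb(N, n)
    # comb() returns 0 when the lower index exceeds the upper, so impossible terms vanish.
    return sum(comb(K, i) * comb(N - K, n - i) for i in range(k, min(n, K) + 1)) / total


print(frequent_neighbours({"A", "B", "C"}, {"A": 0.12, "B": 0.10, "C": 0.30}))  # ['A', 'C']
```

A gene at exactly 10% is excluded, matching "greater than 10%" in the text.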

| KM-Plotter database
Kaplan-Meier Plotter (KM-Plotter) (http://kmplot.com/analysis/) is widely used to assess the effect of individual genes on survival in different cancer types, mainly for meta-analysis-based discovery and validation of survival biomarkers. 35 We used the 'ovarian cancer' data in the database to analyse the effects of the six candidate genes on both overall survival and disease-free survival. Notably, we compared the survival results for all patients with those for patients who had received platinum-based chemotherapy. In all survival analyses, P < .05 was considered significant.
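KM-Plotter performs its survival calculations internally; the pure-Python sketch below only illustrates the Kaplan-Meier estimator and the two-group log-rank test on which such comparisons are typically based. It assumes patients have already been split into high- and low-expression groups (for example, at the median), and the function names are hypothetical.

```python
import math


def kaplan_meier(times, events):
    """Return [(t, S(t))] at each distinct event time; events: 1 = event, 0 = censored."""
    data = sorted(zip(times, events))
    at_risk, surv, curve, i = len(data), 1.0, [], 0
    while i < len(data):
        t, deaths, removed = data[i][0], 0, 0
        # Group all patients (events and censorings) that share this time point.
        while i < len(data) and data[i][0] == t:
            deaths += data[i][1]
            removed += 1
            i += 1
        if deaths:
            surv *= 1 - deaths / at_risk
            curve.append((t, surv))
        at_risk -= removed
    return curve


def logrank_test(times_a, events_a, times_b, events_b):
    """Two-group log-rank test; returns (chi-square statistic, P value with 1 df)."""
    pooled = [(t, e, 0) for t, e in zip(times_a, events_a)] + \
             [(t, e, 1) for t, e in zip(times_b, events_b)]
    obs_minus_exp, variance = 0.0, 0.0
    for t in sorted({t for t, e, _ in pooled if e}):
        n_a = sum(1 for tt, _, g in pooled if g == 0 and tt >= t)
        n_b = sum(1 for tt, _, g in pooled if g == 1 and tt >= t)
        d_a = sum(1 for tt, e, g in pooled if g == 0 and tt == t and e)
        d = sum(1 for tt, e, _ in pooled if tt == t and e)
        n = n_a + n_b
        obs_minus_exp += d_a - d * n_a / n          # observed minus expected in group A
        if n > 1:
            variance += d * (n_a / n) * (n_b / n) * (n - d) / (n - 1)
    if variance == 0:
        raise ValueError("log-rank variance is zero; groups cannot be compared")
    chi2 = obs_minus_exp ** 2 / variance
    # Upper tail of chi-square with 1 df: P(X > x) = erfc(sqrt(x / 2)).
    return chi2, math.erfc(math.sqrt(chi2 / 2))
```

In the sketch, a group whose patients all have events early (for example, times 1 to 3) compared with one whose events occur late (times 10 to 12) yields a log-rank P value below .05, while two identical groups yield a statistic of 0 and P = 1.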

| Cancer cell lines and culture
The human ovarian cancer cell line A2780 and the cisplatin-resistant cell line A2780DDP were purchased from Shanghai Chuanqiu Biotechnology Co., Ltd. Both cell lines were cultured in RPMI 1640 (Beijing Labgic Technology Co., Ltd) supplemented with 10% foetal bovine serum in a humidified incubator containing 5% CO2 at 37°C. Table S1. We calculated the expression levels of the six candidate genes in the cells using the $2^{-\Delta\Delta C_T}$ method.
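The $2^{-\Delta\Delta C_T}$ calculation can be written out as a short function. This is a sketch under assumptions: the reference (housekeeping) gene is not named in this section, A2780 is treated as the calibrator, replicate Ct values are simply averaged, and the example Ct values are invented for illustration, not measurements from this study.

```python
from statistics import mean


def relative_expression(target_sample, ref_sample, target_control, ref_control):
    """Fold change of a target gene by the 2^-ddCt method.

    Each argument is a list of replicate Ct values; replicates are averaged first.
    dCt = Ct(target) - Ct(reference gene); ddCt = dCt(sample) - dCt(calibrator).
    """
    delta_sample = mean(target_sample) - mean(ref_sample)
    delta_control = mean(target_control) - mean(ref_control)
    delta_delta = delta_sample - delta_control
    return 2.0 ** (-delta_delta)


# Invented Ct values: target gene in A2780DDP (sample) vs A2780 (calibrator).
fold = relative_expression([26.0, 26.0], [18.0, 18.0], [25.0, 25.0], [18.0, 18.0])
print(fold)  # 0.5
```

With these made-up values the target needs one more cycle in A2780DDP after normalisation, so its relative expression is half that in A2780, which is the direction the text reports for the candidate genes in the resistant line.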

| Optimization of candidate genes based on HR/FA function and integrated network
To investigate platinum resistance in ovarian cancer, 56 FA-associated genes and 53 HR-associated genes were extracted and intersected. Fifteen genes were shared by FA and HR and were regarded as HR/FA common genes (HFC-G), indicating a close relationship between these two functions. Forty-one genes were associated only with FA (FO-G), and 38 genes only with HR (HO-G). Furthermore, we integrated a PPI network from 12 databases with a gene co-expression network calculated from the TCGA data set (see Section 2). To improve the robustness of the PPI interactions, the integrated network retained only interactions supported by at least two databases.
Finally, these three types of genes were separately regarded as seed genes and annotated onto the integrated network. The random walk algorithm was performed three times to select the candidate genes (see Section 2), and detailed random walk results are provided in Table S2. The overall workflow of this study is shown in Figure 1.
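The gene-set bookkeeping and the edge filter described above are simple set operations. The sketch below is illustrative only: gene names are placeholders, edges are treated as undirected pairs, the function names are hypothetical, and because this section does not state how the co-expression edges were derived from TCGA (correlation measure and cutoff), they are passed in as given.

```python
from collections import Counter


def partition_seed_genes(hr_genes, fa_genes):
    """Split HR and FA pathway genes into HR-only, shared and FA-only seed sets."""
    hr, fa = set(hr_genes), set(fa_genes)
    return {"HO-G": hr - fa, "HFC-G": hr & fa, "FO-G": fa - hr}


def integrate_network(ppi_edges_by_db, coexpr_edges, min_db=2):
    """Union of PPI edges seen in >= min_db databases and the co-expression edges.

    Edges are stored as frozensets so (a, b) and (b, a) count as the same edge;
    each database contributes at most one vote per edge.
    """
    support = Counter()
    for edges in ppi_edges_by_db.values():
        support.update({frozenset(e) for e in edges if e[0] != e[1]})
    ppi = {edge for edge, n in support.items() if n >= min_db}
    coexpr = {frozenset(e) for e in coexpr_edges if e[0] != e[1]}
    return ppi | coexpr


# Placeholder names: 53 HR genes and 56 FA genes with 15 in common.
hr = {f"G{i}" for i in range(53)}
fa = {f"G{i}" for i in range(38, 94)}
print({k: len(v) for k, v in partition_seed_genes(hr, fa).items()})
# {'HO-G': 38, 'HFC-G': 15, 'FO-G': 41}
```

With 53 HR genes, 56 FA genes and 15 genes in common, the partition reproduces the 38/15/41 split reported above.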

| GSEA functional analysis of candidate genes
After the random walk, each gene in the network received a score between 0 and 1, reflecting its association with the seed functions. To further understand the biological functions driven by the seed nodes, all genes were ranked in descending order of score, and genes with a score of 0 were removed. GSEA of BP and KEGG terms was then performed on the ordered gene list.
As shown in Figure 2A, the three types of ordered genes were all enriched in many important BP terms, including autophagy, process utilizing autophagic mechanism, histone modification and positive regulation of catabolic process. These biological functions have been linked to platinum resistance in ovarian cancer. 36 The KEGG results (Figure 2B) showed many significant pathways for FO-G, such as inositol phosphate metabolism, glycolysis/gluconeogenesis, and cysteine and methionine metabolism. Previous studies of platinum resistance in ovarian cancer from the perspective of glucose metabolism found that the glucose metabolism pathway has an important effect on overcoming platinum resistance. 37 Similar KEGG results were observed for HFC-G and HO-G (see Figure S1).

| Independent validation of candidate genes in GEO database
Three independent GEO validation sets with available platinum response information (see Section 2) were used to test the performance of the candidate genes from the three types of seed nodes. In each data set, we calculated the mean -log(P) of the candidate genes across the three seed types, and then averaged these means across the three data sets. Figure S3 shows that the top ten genes all performed well: the genes ranked 1, 2, 4 and 8 were significant, and the scores of the 5th and 6th genes approached significance.
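The two-level averaging is described only briefly, so the sketch below encodes one plausible reading: for each rank position, -log10(P) is averaged over the three seed types within each data set, and those per-data-set means are then averaged across data sets. The base-10 logarithm, the skipping of genes missing from a data set (as happens for FANCA and FANCG in GSE15622), the function name and the toy inputs are all assumptions.

```python
import math
from statistics import mean


def rank_position_scores(ranked_by_type, pvals_by_dataset, top_k=10):
    """Mean -log10(P) per rank position, averaged over seed types then data sets.

    ranked_by_type   : {seed type: [gene ranked 1, gene ranked 2, ...]}
    pvals_by_dataset : {data set: {gene: P value}}; genes absent from a data set are skipped
    """
    scores = []
    for k in range(top_k):
        per_dataset = []
        for pvals in pvals_by_dataset.values():
            # Average over the seed types whose k-th gene was measured in this data set.
            vals = [-math.log10(pvals[genes[k]])
                    for genes in ranked_by_type.values() if genes[k] in pvals]
            if vals:
                per_dataset.append(mean(vals))
        scores.append(mean(per_dataset) if per_dataset else float("nan"))
    return scores


ranked = {"HO-G": ["A", "B"], "FO-G": ["C", "D"]}
pvals = {"ds1": {"A": 0.01, "B": 0.1, "C": 0.001, "D": 0.1},
         "ds2": {"A": 0.01, "B": 1.0, "C": 0.01, "D": 1.0}}
print(rank_position_scores(ranked, pvals, top_k=2))  # approximately [2.25, 0.5]
```

Each list in `ranked_by_type` must contain at least `top_k` genes. On a -log10 scale, a score above about 1.3 corresponds to P < .05.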
Because the top two genes gave good results, we took the top two genes from each type of seed node as examples for the subsequent verification of candidate genes. Compared with the resistant group, the expression levels of the six candidate genes were increased in the sensitive group (Figure 2C-E), indicating that these candidate genes were platinum-sensitive genes. Additionally, we plotted the direct interaction networks of two genes, FANCA and POLD1, as examples. As shown in Figure 2F,G, most of the genes interacting with FANCA and POLD1 were also seed genes, suggesting that the seed genes strongly influence each other through both co-expression relations and PPI interactions.
FIGURE 1 The overall workflow

| mRNA- and protein-level differential expression analysis of candidate genes
Differential expression analysis of the six candidate genes was performed using the Oncomine and UALCAN databases. Seven studies were available in the Oncomine database, and different studies were available for different genes; all results are shown in Table 1. In detail, BLM was overexpressed in ovarian cancer tissues in all seven studies (see Figure 3). BRCA1 showed significant differences between ovarian cancer tissues and normal tissues in TCGA, Bonome Figure 4. Taking KDM1A as an example, the protein expression level of KDM1A in ovarian cancer tissues was significantly higher than that in normal tissues (Figure 4C). As shown in Figure 4A, KDM1A expression differed significantly among the three stage groups. Figure 4B shows that the two groups in the ethnicity comparison differed significantly. Compared with the 80-100-year-old group, KDM1A mRNA expression was decreased in the 21-40-year-old group (Figure 4D). In conclusion, the differential expression analysis at both mRNA and protein levels indicated that these candidate genes may be important biomarkers for predicting unfavourable biological behaviour in ovarian cancer formation and development.

| Mutation-driven network and survival analysis of candidate genes
Next, we investigated the interaction networks and survival associations of the six candidate genes. The 'network' function of the c-BioPortal database was used to identify neighbouring genes of the six candidate genes with mutation frequencies above 10%, and BP and KEGG enrichment analyses of these genes were performed with DAVID (Tables S3 and S4).
The results showed that three of the candidate genes (BLM, BRCA1 and KDM1A) were mainly associated with the nucleosome and were mainly involved in chromatin assembly, chromatin assembly or disassembly, and nucleosome assembly, whereas FANCA, FANCG and POLD1 were mainly involved in DNA metabolic processes (see Figure 5A).
In the KEGG pathway analysis, more than three genes were enriched in each of the mismatch repair, DNA replication, Fanconi anaemia, systemic lupus erythematosus and alcoholism pathways (see Figure 5B).
To assess the association between the candidate genes and the prognosis of ovarian cancer patients, we used the KM-Plotter database to analyse all ovarian cancer patients and the subgroup of patients treated with platinum. As shown in Figure 5, we observed that patients with

| Low-throughput RT-qPCR analysis of candidate genes
Finally, to further verify the differential expression of the six candidate genes in ovarian cancer platinum response, we measured their mRNA levels in the ovarian cancer cell lines A2780 and A2780DDP by RT-qPCR (see Section 2). As shown in Figure 6,

| DISCUSSION
Platinum-based chemotherapy is the main treatment for advanced ovarian cancer; however, drug resistance still affects most patients and challenges clinicians. Therefore, it is urgent to overcome platinum-based chemotherapy resistance in ovarian cancer and identify platinum sively quested to be associated with ovarian cancer platinum response mechanisms, we utilized these two pathways as the starting point.
These genes were further classified into three groups for subsequent network optimization. Here, we applied a high-throughput network optimization algorithm to study platinum response in ovarian cancer.
We performed the random walk algorithm with these three types of genes as seed genes on the integrated network (including PPI and co-expression relationships) and established three scoring matrices for candidate gene selection. GSEA was then used for enrichment analysis of the three types of genes. Encouragingly, the GSEA enrichment results (BP and KEGG) for the three random walk results were consistent and were related to platinum response in ovarian cancer.
Using the scoring matrices, we selected candidate genes for each of the three types of seed nodes and tested their differential expression in three GEO data sets with ovarian cancer platinum response information. The top ten genes gave reliable results to varying degrees. Figure S3 shows that the curve rose over the first and second genes, whereas the third gene was less significant than the top two and the curve declined. Therefore, we used the top two genes in each of the three scoring matrices as examples, giving six candidate genes for verification and analysis. The expression levels of the six candidate genes differed in most validation sets; however, FANCA and FANCG could not be verified in GSE15622 because that data set lacks expression values for them.
Many researchers have studied the six candidate genes to varying degrees in ovarian cancer. BRCA1, which functions in HR repair of DNA double-strand breaks, is one of the most common ovarian cancer genes. 38 A growing number of studies have demonstrated that BRCA1 mutations increase the risk of ovarian cancer. 39 One study reported that serous ovarian cancer was sensitive to platinum because of a functional defect caused by insufficient BRCA1 levels: patients lacking BRCA1 had a better chemotherapy response, whereas reactivation of BRCA1 might underlie platinum resistance in end-stage patients. 40 Genetic and functional evidence has suggested that BRCA1 is the major determi- 77
We applied equal weighting and the random walk algorithm to the complex disease-specific network to limit bias in the analysis. The six candidate genes were verified in multiple validation sets, making these results more robust. Further exploration showed that the mRNA and protein expression levels of the six candidate genes differed significantly across clinicopathological factors. Notably, these genes were also significantly associated with both overall survival and disease-free survival. The RT-qPCR assay further confirmed that the expression levels of the six candidate genes were higher in the platinum-sensitive ovarian cancer cell line than in the platinum-resistant cell line. Together, these results suggest that our analytical approach is valuable for identifying platinum response biomarkers in ovarian cancer.
However, our study has some limitations. RT-qPCR is a sensitive and accurate method that quantifies gene expression at the mRNA level. 78 Western blotting is the most widely used technique for analysing protein expression; its advantages include strong specificity, high sensitivity and ease of operation. 79 In this study, we measured mRNA expression but did not verify the protein expression levels of the six candidate genes in cisplatin-resistant and cisplatin-sensitive ovarian cancer cell lines. In addition, only two types of functional genes were used as seed nodes, which limits the scope of the analysis. In subsequent studies, we will increase the variety of functional genes used as seed nodes and examine the protein expression levels of the six candidate genes in cell lines. Tissues from ovarian cancer patients with platinum response information will also be collected as research samples. We will strive to provide further insight into the mechanism of platinum response in ovarian cancer.

| CONCLUSION
We applied the random walk algorithm to a reconstructed integrated network and analysed the global impact of genes from the HR and FA functions. In addition, GSEA was performed to evaluate the functions of the three types of functional genes. The candidate genes were identified and further verified in three data sets from the GEO database. We also performed differential expression analysis, clinicopathological multivariate analysis, functional evaluation of mutated neighbouring genes and survival analysis for the six candidate genes. Finally, the RT-qPCR assay further supported these findings. In conclusion, our research provides new insight into the mechanism of platinum response in ovarian cancer and identifies candidate genes with potential clinical use.

ACKNOWLEDGEMENTS
We Writing-review & editing (supporting).

ETHICS APPROVAL AND CONSENT TO PARTICIPATE
Not applicable.

CONSENT FOR PUBLICATION
All authors agree to publication.

COMPETING INTERESTS
The authors declare that they have no competing interests.

DATA AVAILABILITY STATEMENT
The data used and analysed during this study are available from the corresponding author on request.