Systematic prediction of key genes for ovarian cancer by co‐expression network analysis

Abstract Ovarian cancer (OC) is the most lethal gynaecological malignancy, characterized by high recurrence and mortality. However, the mechanisms of its pathogenesis remain largely unknown, hindering the investigation of the functional roles. This study sought to identify key hub genes that may serve as biomarkers correlated with prognosis. Here, we conduct an integrated analysis using the weighted gene co‐expression network analysis (WGCNA) to explore the clinically significant gene sets and identify candidate hub genes associated with OC clinical phenotypes. The gene expression profiles were obtained from the MERAV database. Validations of candidate hub genes were performed with RNASeqV2 data and the corresponding clinical information available from The Cancer Genome Atlas (TCGA) database. In addition, we examined the candidate genes in ovarian cancer cells. Totally, 19 modules were identified and 26 hub genes were extracted from the most significant module (R 2 = .53) in clinical stages. Through the validation of TCGA data, we found that five hub genes (COL1A1, DCN, LUM, POSTN and THBS2) predicted poor prognosis. Receiver operating characteristic (ROC) curves demonstrated that these five genes exhibited diagnostic efficiency for early‐stage and advanced‐stage cancer. The protein expression of these five genes in tumour tissues was significantly higher than that in normal tissues. Besides, the expression of COL1A1 was associated with the TAX resistance of tumours and could be affected by the autophagy level in OC cell line. In conclusion, our findings identified five genes could serve as biomarkers related to the prognosis of OC and may be helpful for revealing pathogenic mechanism and developing further research.


| INTRODUC TI ON
Ovarian cancer (OC), the most fatal gynaecological malignancy, is characterized by early diagnosis difficulty, rapid metastasis and high recurrence rate. 1 Despite good therapeutic response in early stage, most patients are diagnosed in advanced stage with poor overall survival. 2 These features are related to the biological mechanism, which is the key determinant of outcome. 3 Current therapeutic strategies of OC have improved significantly. The outcome depend on numbers of factors, including the patient's age, physical condition and stage at presentation. 4 However, the 5-year survival rate for advanced OC is much lower than in early stage, and about 70% of patients relapse. 5 Previous study has linked OC recurrence to histological type, FIGO stage and chemotherapy regimens. 6 Numerous genetic variations occur during the development of the tumour. With the help of large-scale screening and bioinformatics, hundreds of genes alterations have been revealed to be closely related to the development and prognosis of tumours. 7 However, studies of hub genes remain lacking. Therefore, more meaningful biomarkers need to be explored.
The weighted gene co-expression network analysis (WGCNA) is a powerful technique developed by Langfelder and Horvath. 8 It is widely used as a systematic biology method to describe the correlation patterns between genes and clinical features. Thus, in the present study, we focused on the association between the gene sets and common OC phenotypes and identified novel biomarkers for better OC prognostic investigation.
In many cases, recurrence leads to adverse reactions to chemotherapy. 9 And tumour recurrence is closely related to drug resistance. In fact, studies have exhibited that drug-resistant cancer cells augmented activation of autophagy, which seems to be a major obstacle in chemotherapy. 10,11 In this context, we validated the candidate hub gene COL1A1 which was regulated by autophagy and affected the drug sensitivity of tumour cells. This indicates that our screening candidates demonstrate the prognostic value of OC.

| Data procession
The gene expression profiles of OC were extracted from the MERAV (MERAV, http://merav.wi.mit.edu) database. Data adjustments included data filtering, log2 transformation and normalization. All data were summarized using 'Affy' package from Bioconductor (http:// www.bioco nduct or.org/). Then, the top 25% most variant genes were selected for subsequent WGCNA.

| Establishment of weighted coexpression network
The chosen variant genes were constructed to an approximate scalefree fundamental gene co-expression network using the R package 'WGCNA'. 8 In order to calculate the connection strength between each pair of genes, the adjacency matrix a ij was defined as follows: a ij encoded the adjacent network connection strength between gene i and gene j, x i and x j were vectors of expression value for gene i and j, and s ij represented Pearson's correlation coefficient between gene i and gene j. For network generation, genes with a high correlation coefficient were clustered. The network modules were generated using the topological overlap measure (TOM), 12 which was calculated using the adjacency matrix.
The adjacency matrix A ij was defined as follows: According to the TOM-based dissimilarity measure, the network was built with the function blockwiseModules, where the minimum module size (minModuleSize) was 30. Average linkage hierarchical clustering was conducted to classify genes with high correlation coefficients into the same modules. DynamicTreeCut algorithm was used to identify the co-expression module, where MergeCutHeight = 0.25 merges modules with a similarity of 0.75.

| Clinical significant relevant modules identify
After obtaining the gene modules, we combined the clinical information with the genes in modules to analyse gene significance (GS) and the modular membership (MM), which were used to measure the correlation between the sample traits and the gene modules. Next, the targeted module genes were visualized with Cytoscape 3.6.1 software. 13

| Functional enrichment annotation
The targeted gene module was annotated, visualized and ana-

| Hub gene identification and validation
Hub genes are the genes that rank high in connectivity in a module. They are located at the central hub, which can represent the characteristics of the module. Compared with all genes in the network, the hub genes in the module have greater biological significance. In this study, the protein-protein interaction (PPI) information was extracted from the STRING database (http://strin g-db.org/) and visualized using Cytoscape software. Subsequently, the plug-in molecular complex detection (MCODE) of Cytoscape was used to construct the subnetwork for further validation.
Kaplan-Meier plotter (www. kmplot.com) was used to perform progression-free survival (PFS) analyses of hub genes. 14 RNA sequencing data and the corresponding clinical information of OC were extracted from The Cancer Genome Atlas Project (TCGA, https://cance rgeno me.nih.gov/) database and normalized using edgeR package. The expression of hub genes was measured by the immunohistochemistry using the Human Protein Atlas (http://www.prote inatlas. org).

| Western blot analysis
Cells in different groups were collected, respectively, and were lysed in radioimmunoprecipitation assay (RIPA) lysis buffer (Beyotime Biotechnology) containing 1% protease inhibitor cocktail. Protein quantitative processing was performed with the BCA Kit (Sigma). Protein samples were separated on 10% SDS-polyacrylamide gel and blotted to polyvinylidene difluoride (PVDF) membranes. Immunoblotting was incubated with the primary antibody (1:1,000 dilutions) at 4°C overnight and the secondary antibody at room temperature for 2 hours. Immunostaining was performed with ECL Western blotting detection reagent (Thermo).

| RNA extraction and qRT-PCR
For qRT-PCR analyses, total RNA was extracted using an RNA extraction kit (Takara Bio) according to the manufacturer's instructions. cDNA syntheses were performed with PrimeScript F I G U R E 1 Clustering dendrogram of 23 samples RT Master Mix (Takara Bio) and amplified with SYBR pre-mix EX Taq (Takara Bio). The qRT-PCR analyses were performed on a 7500 PCR system (Applied Biosystems) with the primers shown in Table S1.

| Statistical analysis
The sensitivity and specificity were depicted by receiver operating characteristic (ROC) curves using GraphPad Prism software. The log-rank test was used to compare survival curves. Student's t test was used to compare the differences between groups, and P < .05 was considered statistically significant. Statistical analyses were conducted using R software (version 3.5.0).

| Construction of weighted co-expression network and identification of key modules
After data pre-processing, the top 25% most variant genes were selected for subsequent analyses. The samples were clustered using average linkage method and Pearson's correlation analysis ( Figure 1).
In this study, to ensure a scale-free network, the power of β = 8 (R 2 = .919) was chosen for the soft-threshold parameter (Figure 2A- . Nineteen modules were identified based on the average linkage hierarchical clustering ( Figure 2C). The relationships between the modules and clinical traits were analysed. Finally, the blue module was found significantly correlated with clinical stages and pathological T factor (P < .01, Figure 2D). Total genes of the blue module are shown in Table S2.

| Functional enrichment analysis and proteinprotein network construction
The genes in blue module were categorized into four groups, includ-

| Validation of hub genes
To further validate the prognostic value of the hub genes, we con-

| Construction of tax-resistant cells and experimental validation of COL1A1
To further test the importance of the hub genes, COL1A1 was selected as we also previously reported. 15  to the A2780R cells, COL1A1 expression was significantly lower compared with GAPDH, indicating that COL1A1 was affected by autophagy in TAX-resistant OC cells ( Figure 6B). In order to explore the relationship between autophagy and TAX resistance, we adopted CCK-8 method to detect cell activity at different TAX concentrations. The difference in TAX resistance among groups was then determined by half-maximal inhibitory concentration (IC50). In A2780R group, IC50 was significantly higher than the parental A2780 group (22.42 ± 1.007 μmol/L vs 5.109 ± 1.478 μmol/L, P < .05, Figure 6C

| D ISCUSS I ON
Ovarian cancer (OC), one of the three common gynaecological malignancies, ranks seventh among the tumours in women. 18 There are numerous factors that affect the prognosis of OC, and the mechanism is complicated. Most OC patients will undergo clinical standardized treatment, but still develop tumour recurrence in 6-18 months after treatment; however, advanced OC has a worse prognosis. 19 Therefore, the analyses of OC's clinical stages and drug-resistance have important references value. It is necessary to conduct in-depth analyses in clinical and basic researches to find out the biomarkers for the prognosis and explore their mechanisms.
In the current exploratory research, it is very important to construct a gene co-expression network, which can help us identify genes related to diseases. 20 Gene sets of weak effect are difficult for traditional analysis, but the WGCNA system is a good supplement, and modules can integrate weakly affecting genes. WGCNA has been successfully applied in the study of disease pathogenesis, classification, diagnosis and prognosis. After the power function processing, WGCNA will not make strong correlation relationships affected; however, the weak correlation relationships decrease significantly, Autophagy plays a complex role in human cancer, which is influenced by tumour micro-environment, carcinogenic mutation type and other factors. It can effectively inhibit tumour growth in early stage. However, when cancer has suffered a long-term stimulation, autophagy could degrade lipids and proteins to produce ATP, which promotes the development and growth of cancer. 17 Among the five candidate biomarkers, the collagen type I alpha 1 chain (COL1A1) was found to be closely related to the development of OC, which was consistent with our previous study. 15 Therefore, we constructed the A2780 TAX resistance cell line and can interact with a variety of cytokines or membrane receptors and participate in the regulation of collagen fibre formation. 24,25 Studies have found that DCN can inhibit the proliferation and migration of a variety of tumour cells in vitro, such as liver cancer, kidney cancer and breast cancer. [26][27][28] In this study, it was found that DCN expression in tumour tissues was significantly higher than that in normal tissues, but the difference in advanced and early OC was not significant, suggesting that DCN is an oncogene in tumorigenesis, but it has no predictive effect on tumour development. In breast cancer, high expression of LUM indicates poor prognosis. 31 However, LUM can induce proteoglycan composition changes and regulate the cell cycle to participate in tumour development. 32 In the validation data of TCGA, our results indicated that LUM was significantly up-regulated in OC tissues, and even higher in advanced stage than in early stage.
Periostin (POSTN) is a cellular adhesive protein. As an ECM protein, POSTN has been found to be highly expressed in a variety of malignancies and is often associated with recurrence, metastasis and poor prognosis. [33][34][35] In OC, study has found that POSTN expression was substantially higher with chemotherapy resistance specimens than in those with chemotherapy-sensitive patients. 36 In our analysis, POSTN expression is negatively correlated with PFS, and the expression is more significantly up-regulated in advanced OC.
As an important member of the thrombospondins (THBS) family, thrombospondin-2 (THBS2) participates in a variety of cellular biological processes by binding ECM proteins and cell-surface receptors. 37 Studies have shown that THBS2 plays an important role in tumours and is related to the degree of malignancy. [38][39][40][41][42] In this study, we found that THBS2 may be related to the clinical stage of OC and may be used as a biomarker for the prognosis. . E, Quantitative real-time PCR (qRT-PCR) was performed to analyse the mRNA levels of COL1A1 and autophagy (BECN1 and ATG5) genes (n = 3). *P < .05; **P < .01; and ***P < .001. Student's t tests were used to evaluate the statistical significance of differences. BECN1: Beclin 1; ATG5: autophagy-related 5 All five biomarkers we screened are closely related to the formation of ECM. This may shed light on the pathophysiological mechanism of OC and contribute to potential targeted therapies.
In conclusion, our WGCNA identified candidate biomarkers for OC.
Meanwhile, further studies were needed to make clear the underlying molecular mechanisms.