Identification of castration‐resistant prostate cancer‐related hub genes using weighted gene co‐expression network analysis

Abstract Prostate cancer is the most common malignancy in urinary system and brings heavy burdens in men. We downloaded gene expression profile of mRNA and related clinical data of GSE70768 data set from public database. Weighted gene co‐expression network analysis (WGCNA) was used to identify the relationships between gene modules and clinical features, as well as the candidate genes. Kyoto Encyclopedia of Genes and Genomes (KEGG) and Gene Ontology (GO) analyses were developed to investigate the potential functions of related hub genes. Importantly, basic experiments were performed to verify the relationship between hub genes and the phenotype previously identified. Lastly, copy number variation (CNV) analysis was conducted to explore the genetical alteration. WGCNA identified that black module was the most relevant module which was tightly related to castration‐resistant prostate cancer (CRPC) phenotype. KEGG and GO analysis results revealed genes in black module were mainly related to RNA splicing. Additionally, 9 genes were chosen as hub genes and heterogeneous nuclear ribonucleoprotein A2/B1 (HNRNPA2B1), golgin A8 family member B (GOLGA8B) and mitogen‐activated protein kinase 8 interacting protein 3 (MAPK8IP3) were identified to be associated with PCa progression and prognosis. Moreover, all above three genes were highly expressed in CRPC‐like cells and their suppression led to hindered cell proliferation in vitro. Finally, CNV analysis found that amplification was the main type of alteration of the 3 hub genes. Our study found that HNRNPA2B1, GOLGA8B and MAPK8IP3 were identified to be tightly associated with tumour progression and prognosis, and further researches are needed before clinical application.


| INTRODUC TI ON
Prostate cancer (PCa) is the most common malignancy in urinary system, leading to the fifth cause of death in male worldwide. In 2018, the global estimated new cases and deaths were 1 276 106 and 358 989, respectively. 1 Localized PCa patients can be treated effectively via various methods including active surveillance, watchful waiting, surgery and radiation. However, management for advanced diseases remains to be a difficult problem for urologists. Generally, androgen deprivation therapy (ADT), raised by Charles Huggins 2 firstly in the 1940s, is the standard treatment for advanced PCa patients. Types of ADT forms mainly include medical castration (luteinizing hormone-releasing hormone (LHRH) agonist) and surgical castration (bilateral orchiectomy). Unfortunately, ADT therapy for these castration-resistant prostate cancer (CRPC) patients is not curative. As a result, progressing to CRPC condition is inevitable within two years. [3][4][5] Taken together, previous data indicated that 10%-20% of all PCa patients develop CRPC within approximately 5 years from initial diagnosis. More than 84% of CRPC patients had metastases at diagnosis with bone as the most preferential site. 6 For those who had no metastases at diagnosis of CRPC, 33% of patients were estimated to develop metastases within 2 years. 7 Radionuclide therapy, radiotherapy, bisphosphonates and opioids were the most common treatment types for those patients with metastatic CRPC and skeletal symptoms. A systematic review showed that only about one-third of CRPC patients received chemotherapy while the rest could only receive supportive care or steroids. 8 Prognosis of CRPC patients is poor, especially for those patients with metastatic CRPC. Study conducted by Emmanuel S. Antonarakis et al showed that the median survival from CRPC initial diagnosis was 14 months while the overall survival of patients with metastatic CRPC was only 9-13 months. 8,9 Nowadays, the specific mechanisms underlying the development of castration resistance have not been well discussed and strategies of CRPC treatment are limited. Study conducted by Wu et al 10 showed that PCa patients with variant allele of HSD3B1 increased the risk of progressing to CRPC, but it was not associated with biochemical recurrence, mortality or overall survival. Joshua W.
Russo et al 11 proposed that down-regulation of dipeptidyl peptidase 4 promoted progression to CRPC. With the development of DNA microarray and high-throughput sequencing, it is convenient to explore hub genes related to tumour process or progression benefited from bioinformatics techniques and big data integration. Weighted gene co-expression network analysis (WGCNA) is a powerful biological method which can systematically investigate highly synergistically altered gene modules. It can thoroughly study the relationship between gene sets and clinical characteristics, and identify possible biomarkers for further researches. In this study, we aimed to find and validate differentially expressed genes (DEGs) which were associated with castration resistance in PCa by WGCNA. In addition, Kyoto Encyclopedia of Genes and Genomes (KEGG) and Gene Ontology (GO) analyses were developed to investigate the potential functions of hub genes in the key module.

| Data processing
A flow chart of this study was shown in Figure 1. Gene expression profile of mRNA and related clinical data of PCa patients were downloaded from Gene Expression Omnibus (GEO) database (http:// www.ncbi.nlm.nih.gov/geo/). GSE70768 included 16 331 genes and 125 tumour tissues, which was used to construct WGCNA for this F I G U R E 1 Flow chart of data preparation, processing, analysis and validation study. Among them, 13 were CRPC tissues and 112 were hormonesensitive prostate cancer (HSPC) tissues. The Cancer Genome Atlas (TCGA) data portal (https://portal.gdc.cancer.gov/) was used for validation of hub genes. The data of mRNA expression were transformed to values in transcripts per million (TPM).

| Construction of the co-expression network
'WGCNA' R package was applied to find clinical traits-related modules and hub genes. 12 The formulas used to calculate the interaction coefficient between genes and transform the connection coefficient into a weighted coefficient were specifically described in previous studies. 13 The top 5000 most variant genes by analysis of variance were retained for WGCNA analysis.
Firstly, we use pickSoftThreshold function to find a soft threshold power β in accordance with standard scale-free networks.
Second, the adjacencies between all filtered genes were calculated using the power adjacent function to Pearson's correlation matrix to transform data into a topological overlap matrix (TOM), and the corresponding dissimilarity (1-TOM) was calculated. Then, the dissimilarity of module eigengenes was calculated to further analyse the module. Finally, we set soft-thresholding power of 9, cut-off height of 0.25 and minimal module size of 30 to identify key modules.

| Identification of clinically significant modules
The module eigengene (ME) was defined as the first principal

| Pathway enrichment and GO analysis
DAVID (http://david.abcc.ncifc rf.gov/) is a database for annotation, visualization and integrated discovery. KEGG pathway and GO analysis of genes in the module with the highest correlation with clinical traits were carried out using DAVID (version 6.8) online tools: functional annotation. Enriched GO terms and KEGG pathways were identified according to the cut-off criterion of P < 0.1.

| Hub genes detection and validation
Hub genes were defined as those with cor.geneTraitSignificance > 0.2 and cor.geneModuleMembership > 0.85. Then, TCGA-PRAD data were applied to perform survival analysis to validate and select genes with clinical significance. Then, we compared the expression of genes between tumours and normal tissues along with expression between tumours with different clinical status in TCGA database.
Two weeks later, colonies were stained via 0.5% crystal violet and those with more than 50 cells were counted after imaged. For EdU assay, EdU staining kit (Ribobio) was used based on the protocol. In brief, cells in 96-well plates were processed with 50 μmol/L EdU medium (100 μL) for 3 hours. After being fixed and permeabilized, cells were stained via DAPI and then observed and imaged under a fluorescence microscope (Olympus, Tokyo, Japan).

| Cell apoptosis analysis
Cell apoptosis was examined by TUNEL (terminal dexynucleotidyl transferase-mediated dUTP nick end labelling) assay and flow cytometry analysis. In regard to TUNEL assay, cells were permeated with 0.1% Triton X-100, rinsed in phosphate buffer saline (PBS) and then treated with TUNEL reaction for 1 hour according to the protocol of the TUNEL staining kit. Finally, apoptotic cells were observed and pictured by Olympus fluorescence microscope. As for flow cytometry analysis, cells were treated with 15 minutes of staining of FITC-Annexin V and propidium iodide (PI), followed by analysis via flow cytometer (BD Biosciences, Franklin Lakes, NJ, USA).

| Genetical alteration of hub genes and proteinprotein interaction (PPI) analysis
The cBioPortal for Cancer Genomics (http://www.cbiop ortal.org/) is a large-scale cancer genomics database. We select four data sets associated with metastasis (SU2C, MICH, SU2C219 and MPC Project) to explore genetic alterations connected with the 9 hub genes. Next, copy number variation (CNV) data from TCGA database were used to conduct CNV analysis. Samples were classified into three groups: amplification, wild-type and deletion according to the CNV data, and the expression levels of genes were compared among these groups.

| Statistical analyses
Kaplan-Meier curves and the log-rank tests were used to assess association between mRNA expression and overall survival of PCa patients. Relationships between mRNA expression and clinical or CNV features were assessed by the Wilcoxon rank-sum test for comparison between two groups and ANOVA for comparison between more than two groups. All statistical analyses were performed using R (version 3.6.1). All statistical tests were 2-sided, and P < 0.05 was considered statistically significant.

| Weighted co-expression network and Key Modules
Pearson's correlation method and average linkage method were used to cluster the samples from GSE70768 data set ( Figure S1). Then, co-expression network was constructed using co-expression analysis. Under the soft-thresholding power of 9, cut height of 0.25 and minimal module size as 30, we identified 13 modules ( Figure 2). Later, the heatmaps of the correlations confirmed the independence of these 13 modules and gene expression in each module ( Figure 3A-B). After module-trait relationship analysis, we observed that black module had the highest association with castration resistance ( Figure 3C). In addition, a strong relationship was detected between genes in black module and CRPC (P = 6.1 × 10 −9 , Figure 3D). Consequently, we focused on black module and the clinical characteristic of CRPC in our further study.

| Pathway enrichment and gene ontology analysis
KEGG pathway enrichment and GO analysis were applied to perform gene annotation. Figure 4A   genes was shown in Figure S2.

| Hub genes identification and validation
We further evaluated the relationship between these 3 genes and PCa risk. As shown in Figure 5D-F, dramatic differences were found in the expression of these 3 genes between tumours and normal tissues

| Genetical alteration of hub genes and CNV analysis
The genetical alteration of the 9 hub genes was analysed using data concerning metastasis from cBioPortal database. The alteration frequency of each hub gene was shown in Figure 7A. It was shown that FAM156B, HNRNPA2B1 and MAPK8IP3 altered most (8%, 7% and 7%, respectively, Figure 7A). In addition, we found that amplification was the main type of alteration in all the four subjects ( Figure 7B). The CNV analysis of gene HNRNPA2B1, GOLGA8B and MAPK8IP3 confirmed the result that there existed abundant of amplification among this group of genes ( Figure 7C-E). Figure S5 showed the relationship of the genes in the black module and the other frequently altered neighbour genes. HNRNPA2B1 and RBM25 were significantly important in the network.

| D ISCUSS I ON
PCa is a heterogeneous disease, and its pathogenesis remains unclear. Many studies have explored novel biomarkers that related to oncogenesis, tumour progression and prognosis using RNA sequencing and gene chip. However, huge inconsistencies existed between the DEGs found in these studies. 14  Eventually, HNRNPA2B1, GOLGA8B and MAPK8IP3 were picked F I G U R E 6 Silencing HNRNPA2B1 impaired CRPC cell proliferation and accelerated cell apoptosis in vitro. A, Expression trend of HNRNPA2B1 in CRPC-like cells compared to androgen-dependent LNCaP cells. B, Knockdown efficiencies of sh-HNRNPA2B1#1/2 in LNCaP-AI and C4-2 cells. C-D, Colony formation and EdU assays were carried out to evaluate cell proliferation under HNRNPA2B1 silence. E-F, Cell apoptosis in two CRPC-like cells with or without HNRNPA2B1 suppression was detected via TUNEL assay and flow cytometry analysis. **P < 0.01 F I G U R E 7 Genetic alterations associated with 9 hub genes. A, A visual summary of Genetic alterations (data from SU2C, MICH, SU2C219 and MPC Project) shows the genetic alteration of 9 hub genes. B, The total alteration frequency of 9 hub genes is illustrated. C-E, Expression difference among amplification, wild-type and deletion groups of HNRNPA2B1, GOLGA8B and MAPK8IP3 in the TCGA-PRAD data set | 8015 CHENG Et al.
out to be significantly associated with tumour prognosis. In order to further investigate potential mechanism of 9 hub genes, KEGG pathway analyses showed that genes in black module were mainly enriched in spliceosome pathway, RNA transport, surveillance and degradation pathways. Furthermore, GO analysis revealed that the hub genes were mainly associated with RNA splicing, mRNA processing, cilium morphogenesis and apoptotic process. Among the significantly enriched pathways, RNA transport and spliceosome pathway have been proven to be associated with tumour progression and prognosis in some cancer types, such as colorectal cancer, 21 hepatocellular carcinoma, 22 15 we also uncovered that the hub genes were associated with PCa progression. Nevertheless, there also existed differences between our present study and theirs. Song et al 15 identified a module significantly related to Gleason score. In our study, we identified a module that could distinguish CRPC from HSPC and the hub genes were related to PCa prognosis and survival. And we aimed at finding out biomarkers for CRPC and validating their effectiveness.
Co-expression analysis is a useful technique for multigene analysis of large data sets. Based on these findings, we can apply the hub genes into clinical use in the future. However, some limitations in this study should be taken into consideration. First of all, it is a retrospective research, and data of this research were extracted from public database. Further large-cohort, multicentre and prospective studies are needed to validate these hub genes that may be related to tumour progression and prognosis. Secondly, this current study is lack of the in vivo experiments of investigating the molecular mechanisms to provide clinical guidance.

| CON CLUS ION
Our study picked out and characterized 9 DEGs and significant gene module closely related to CRPC phenotype using WGCNA.
Additionally, HNRNPA2B1, GOLGA8B and MAPK8IP3 were identified to be tightly associated with tumour progression and prognosis.
Further in vivo experiments are needed before clinical application.

CO N FLI C T O F I NTE R E S T
The authors declare no conflicts of interests.

DATA AVA I L A B I L I T Y S TAT E M E N T
The data that support the findings of this study are openly available in the TCGA data set (https://portal.gdc.cancer.gov/) and GEO data set (http://www.ncbi. nlm.nih.gov/geo/).