Joint analysis of WES and RNA‐Seq identify signature genes related to metastasis in prostate cancer

Abstract Prostate cancer (PCa) has a certain degree of heritability, and metastasis occurs as cancer progresses. However, its underlying mechanism remains largely unknown. We sequenced four cases of cancer without metastasis, four metastatic cancer, and four benign hyperplasia tissues as controls. A total of 1839 damaging mutations were identified. Pathway analysis, gene clustering, and weighted gene co‐expression network analysis were employed to find characteristics associated with metastasis. Chr19 had the most mutation density and 1p36 had the highest mutation frequency across the genome. These mutations occurred in 1630 genes, including the most frequently mutated genes TTN and PLEC, and dozens of metastasis‐related genes, such as FOXA1, NCOA1, CD34, and BRCA2. Ras signalling and arachidonic acid metabolism were uniquely enriched in metastatic cancer. Gene programmes 10 and 11 showed the signatures indicating the occurrence of metastasis better. A module (135 genes) was specifically associated with metastasis. Of them, 67.41% reoccurred in program 10, with 26 genes further retained as the signature genes related to PCa metastasis, including AGR3, RAPH1, SOX14, DPEP1, and UBL4A. Our study provides new molecular perspectives on PCa metastasis. The signature genes and pathways could be served as potential therapeutic targets for metastasis or cancer progression.


| INTRODUC TI ON
With nearly 1.4 million new cases and 375,000 deaths worldwide, prostate cancer (PCa) is the second most frequent cancer, the fifth leading cause of cancer death among men, and the fourth incidence of new cancer cases among both sexes in 2020. 1 PCa shows extensive intra-and inter-tumour heterogeneity and diverse clinical behaviour, with some patients dying of metastasis within 2-3 years after diagnosis while others can live for more than 10 years, indicating the potential role of genomic diversity. 2 Tumour metastasis is a process by which cancer cells disperse from a primary site to distant organs or tissues. In addition to sharing some key driver mutations with the primary tumour from which they arise, metastatic tumours often develop additional mutations as they evolve during metastasis and treatment. 3,4 With the cancer progress, PCa tends to metastasize, especially bone. Studies have identified several genes or pathways implicated in metastasis. For instance, PMEPA1 induced by TGFβ1 was identified to suppress PCa metastasis to the bone by blocking TGFβ signalling. 5 MAOA was demonstrated to involve in the EMT process 6 and promote bone metastasis through activation of paracrine Shh signalling in osteoblasts. 7 Activated RANK-RANKL signalling in PCa cells is implicated in colonizing cancer cells in the bone. 8 The Wnt signalling pathway also plays a role in PCa cells by promoting osteoblast differentiation. 9 Moreover, alterations in androgen signalling, DNA repair, and phosphoinositide 3-kinase (PI3K) signalling, as well as recurrent mutations in some genes such as FOXA1 and IDH1, can be frequently found in PCa metastasis. [10][11][12][13][14] In recent years, next-generation sequencing-based analysis has greatly improved our understanding of the genomic definition and molecular characteristics of cancer. Large cohorts like The Cancer Genome Atlas (TCGA), Chinese Prostate Cancer Genome and Epigenome Atlas, 10 and MET500 4 have profiled the molecular signatures and provided insights into the PCa. However, systemically characterizing the PCa metastasis with a multi-omics approach was merely reported.
In this study, PCa and PCa metastasis samples were deployed for whole exome sequencing (WES) and transcriptome sequencing.
Both genomic and transcriptomic landscapes for PCa metastasis were characterized. More importantly, signature genes related to PCa metastasis were further identified.

| Variants filtration
The variants were filtered with the following criteria: (1) variants classified as stop-gain or stop-loss, (2) variants classified as frameshift and nonsynonymous in cancer samples, (3) variants with minor allele frequency less than 1% in the human population (1000G, gno-mAD and esp6500), (4) variants predicted to be deleterious in at least three of five in silicon analysis (Mutation Taster, PROVEAN, Polyphen2, SIFT, and CADD), of which the criterion for CADD was greater than 20.

| RNA sequencing and quantification
Total RNA was extracted using the TRIzol reagent according to the manufacturer's protocol. RNA purity and quantification were evaluated using the NanoDrop 2000 spectrophotometer (Thermo Scientific). RNA integrity was assessed using the Agilent 2100 Bioanalyzer (Agilent Technologies). Then the libraries were con- Raw reads were first processed using Trimmomatic (v0.36), 20 and the low-quality reads were removed to obtain clean data. About 45M clean reads for each sample were retained. The clean reads were then mapped to the human genome (GRCh38) using HISAT2 (v2.2.1.0). 21 The read count of each gene was obtained by HTSeq-count (v0.9.1), 22 and the fragments per kilobase of transcript per million fragments mapped (FPKM) 23 of each gene was calculated using Cufflinks. 24

| Clustering analysis
Genes were divided into 11 clusters with K-means in R (v4.1.2). The average expression of each gene (FPKM value) in each group was calculated. The Gene cluster trend module in Hiplot (https://hiplo t-acade mic.com/basic/ gene-trend) was used to classify the different expression patterns.

| Weighted gene co-expression analysis
We first removed genes whose expression value (FPKM) is equal to 0 in all samples. The filtered data was log-transformed with log2(x + 1) and used as input for the Weighted Gene Co-expression Network Analysis (WGCNA). 27 The unsigned soft threshold power β of 10 was determined according to the standard scale-free network. Then an unsigned co-expression network was constructed based on the Topological Overlap Matrix and Hierarchical clustering algorithm.
For each module, the summary profile (eigengene) was defined as the first principal component of the given module. The correlations between the eigengenes and the external traits were used for evaluating the module-trait relationships.

| RE SULTS
WES and RNA-Seq were performed to profile gene mutation and expression from four prostatic hyperplasia patients (as control group P), four PCa patients (PC group), and four PCas with multiple metastases (M_PC group). None of the cancer patients received any therapy before their diagnosis. We generated over 1.22 billion raw reads and an average of 100 million per sample from WES. The coverage of the sequencing depth ranged from 82.42 to 113.20X, with an average depth of 96.24X. We acquired 556.92 million raw reads from RNA sequencing in total and an average of 46.41 million for each sample.
The WES and RNA sequencing statistics are summarized in Table S1.

| Damaging mutations across the PCa and metastasis
SNPs and InDels were filtered by limiting function, mutation type, frequency, and in silico prediction ( Figure 1A). 1839 damaging mutations were identified within 1630 reference genes (Data S1), with the vast majority (83.74%) of them classified as nonsynonymous variants followed by stop gain (8.16%), frameshift deletion (4.57%), frameshift insertion (2.94%) and stop loss (0.60%) ( Figure 1B). The mutation ratio of stop codons in M_PC and PC groups were almost the same, with 67.08% and 67.70% in contrast to 20.50% in the control group ( Figure 1C). In addition, the mutations of C>T and G>A were predominant compared with others ( Figure 1D). Chromosome 1 (Chr1) harboured the most damaging mutations (201 out of 1839), especially at 1p36. We considered the unequal length of each chromosome to better reveal the distribution of mutations and found Chr19 had the most mutation density ( Figure 1A). The regions of major mutations occurred only in M_PC group were at 1p31-p36, 1q32, 19q13, 19p13, 11p11-p15, 11q11-q14, while in PC group were 1p34-p36, 11p11-p15, 11q12-q14, 17p11-p13, 17q21-q25, 19p13, 19q13.
In alignment with the previous findings, 10 TTN and PLEC harboured the most damaging mutations in our study. TTN consists of 364 exons, transcribes an mRNA over 100 kb long, and encodes the largest known protein titin, which acts in critical structural, developmental, mechanical, and regulatory roles in cardiac and skeletal muscles. 28 Besides, this gene owned a high correlation coefficient between TTN-tumour mutation burden and tumour mutation burden-WES for PCa. 29 PLEC, a member of the Plakin family of proteins and a cytoskeletal linker, modulates protein kinase C signalling and mitogen-activated protein (MAP) kinases involved in cellular stress responses and migration in cancer. High levels of PLEC are associated with PCa and metastasis. 30 Some PCa genes were also identified as having damaging mutations, such as FOXA1, NCOA1, BRCA1, EPHA6, SPOP, etc. (Data S1). As a pioneer transcription factor, FOXA1 targets the androgen receptor signalling pathway and acts as an oncogene in PCa. 31 The mutation rate of FOXA1 was much higher in the Chinese cohort (41%) than in TCGA PCa (4%). 31 The pace was slightly higher in metastatic PCa (12%-13%) than in primary PCa (8%-9%) in other cohorts. 10 Four frameshift deletions (Chr14: 38061131-38061177) in FOXA1 were identified in a single M_PC patient. One mutation in CD34 (rs28362497:C>A) was identified in the M_PC group. CD34 is observed as one of the hallmarks indicating tumour progression and metastasis. 32 Increased expression of CD34 confers tumour progression and aggressiveness on PCa accompanied by higher prostate-specific antigen level, Gleason score, and the possibility of tumour recurrence. 33

| Transcriptomic architectures of PCa and metastasis
We employed PCA and found the samples were clustered into three groups except for two outliers, PC2 and M_PC4 (Figure 2A), which were then removed for subsequent analysis. DEGs (adjusted p < 0.05 and |log 2 FC| > 1) were then identified using the DESeq2 package, with a total of 4005, 2925, and 102 yielded from each comparison (Data S2).
Of these, 1207 genes were up-regulated and 2798 genes were downregulated in the PC group compared to the P group ( Figure 2B); 1066 genes were up-regulated and 1859 genes were down-regulated in the M_PC group compared to the P group ( Figure 2C); accordingly, 69 genes were up-regulated and 33 genes were down-regulated in the M_PC group compared to the PC group ( Figure 2D). The over-representation analysis was used to estimate the functions of the DEGs. With the threshold of FDR <0.05, 64 terms were significantly over-represented for the DEGs identified between the PC and P groups, and 22 terms were enriched for the DEGs between the M_PC and P groups. However, no statistically significant term was enriched in M_PC and PC groups. Among these terms, several PCa-related pathways were enriched ( Figure 2E,F), including "pathways in cancer", "ECM-receptor interaction", "cell adhesion molecules", "focal adhesion", "cAMP signaling pathway", "MAPK signaling pathway", and "PI3K-Akt signaling pathway". Additionally, 20 terms commonly occurred in both comparisons (PC vs. P and M_PC vs. invasion, causing bone metastasis observed in bone marrow coculture PC3 and bone marrow adipocyte. 35

| Gene expression clustering of PCa and metastasis
We next performed gene clustering analysis to identify the signature genes associated with metastasis. By applying the K-means method, a total of 11 programmes were generated ( Figure 3A In program 10, 5 KEGG pathways, including mismatch repair, mitophagy, melanoma, gastric cancer, and breast cancer, were enriched ( Figure 3C). Genes related to three other cancers (melanoma, gastric, and breast cancer) suggested the potential function in PCa.

| Signature genes associated with the PCa metastasis
It has been widely recognized that co-expressed genes are commonly involved in similar biological pathways or processes. Therefore, we constructed gene co-expression networks using WGCNA to identify the genes that played a similar role in the process of metastasis. A total of 18,262 genes were parsed into 34 co-expression modules.
Test of associations between the metastasis state and the module eigengenes was performed and the white-coloured module was shown to have the most positive association with the metastasis (r = 0.72; p = 0.02, Figure 4A).
The module harboured 135 genes. We further looked up the distributions of those genes in the 11 gene expression clusters. The result demonstrated 67.41% genes reoccurred in the program 10 ( Figure 4B). Of them, 26 genes were further retained, as their expression levels showed significant differences between the M_PC and P groups. Among the 26 genes, 18 genes showed significant expression differences between the M_PC and PC groups (Table S2). Their exomic and transcriptomic profiles were shown in Figure 4C. It is worth noting that these genes were specifically highly expressed in M_PC, and only two genes contained damaging mutations (FAM222A: rs199694375; OTOGL: rs78377084, Chr12: 80747138, C>T).
We further explored the 26 genes under the specific phenotypes and diseases (keywords: prostate cancer and metastasis) in the Phenolyzer (https://pheno lyzer.wglab.org) 40 ( Figure 4D) and found NOS3 played a relatively crucial role in these selected genes. NOS3 (endothelial nitric oxide synthase or eNOS) was highly expressed in metastatic PCa and was targeted by microRNA to suppress bone metastasis. 41 In addition, other genes have also been associated with cancer metastasis. CST2 and PGM2L1 functioned in the cell migration and invasion of PCa. 42,43 The function of AGR3 in cancer development remains unclear. However, AGR3 could bind to metastasis-related proteins C4.4a and DAG-1, and co-expression analysis depicted some AGR3 partners linking AGR3 with cell adhesion, tight junction, motility, metastasis, and regulation of cell cycle. 44

| Landscape of mutations and expression level changes for essential genes
Previous research has reported some essential molecules in PCa. 14,50 Therefore, we focused on 62 genes that have key roles in several aspects, including ETS, R.B., EPI, AR signalling pathway, PI3K signalling pathway, Wnt signalling pathway, and DNA repair (other unclassified genes were grouped into others).
Only 21 genes harboured damaging mutations, with TP53 and

| DISCUSS ION
With the development of PCa, metastasis will eventually happen to patients, and metastases are the final result of a multi-stage process.
This process depends on the specific phenotypic characteristics of tumour cells and the interaction with the host microenvironment and immune system. Genetic alterations in primary and metastatic tumours may contribute to these phenotypes and host interactions. 3 In this study, we conducted a comprehensive analysis by comparing patients without cancer, patients with primary cancer and metastatic cancer. Comparing the genetic similarities and differences between primary and metastatic tumours may provide new insights into the biology of metastasis.
With the assistance of WES, we discovered that some chromosomal regions were highly mutated. Chr1 had the highest frequency within the filtered 1839 variants. WNT4 located in 1p36 and the induction of WNT4/TCF7L1 resulted in increased malignancy in prostate cancer that was linked to dysregulation of androgen receptor signalling and activation of the IL-8/CXCR2 pathway. 51 Furthermore, several essential genes were located in highly mutated regions such as CCND1 (11q13). CCND1 was over-expressed in more aggressive prostate cancer phenotypes, and its expression was regulated by oestrogens via ERβ and might contribute to the progression and pathogenesis of prostate cancer. 52 Previous studies found that genes like KMT2D and BRCA1 mutation frequency were higher in metastatic samples. 3 Although non-damaging mutations were iden-  metastasis. In PCa, the Ras signalling network activated NF-κB and MAPK members in the process of metastasis. 34 The Ras/Raf/ MEK/ERK/Elk-1 signalling cascade was the central backbone in regulating bone metastasis. 55 Among those genes enriched in Ras signalling, several genes belong to fibroblast growth factors (FGF) and relevant receptors. FGF signalling mediates cell secretion to influence PCa metastasis. 56 To clarify gene expression change in different groups, we performed clustering analysis, and part programmes presented high expression in M_PC. There was an upward tendency in program F I G U R E 5 The landscape of alterations of crucial genes in prostate cancer (PCa). The top and the first bar graph on the right of the heatmap represented the number of genes mutated in one sample and the number of samples where one gene mutated. Two bar graphs on the right represented the differences in expression levels of critical genes between the two groups of cancer and the control group, respectively. *p < 0.05. 11, and pathway analysis suggested that genes in program 11 were related to DNA replication and cell proliferation. Besides, we used WGCNA to combine phenotype and expression and connected with clustering results to narrow programmes. Twenty-six genes were found by selecting the pattern and module that best matched the feature of metastasis. Among them, several genes had been identified to play a role in the progression and metastasis of PCa, and the remaining were associated with other cancers.

CO N FLI C T O F I NTE R E S T S TATE M E NT
The authors confirm that there are no conflicts of interest.

DATA S U B M I S S I O N
The sequencing data have been already submitted to Sequence Read Archive (SRA) of NCBI and were available at PRJNA883010.

S U PP O RTI N G I N FO R M ATI O N
Additional supporting information can be found online in the Supporting Information section at the end of this article.