Transcriptome‐wide association study identifies multiple genes and pathways associated with pancreatic cancer

Abstract Aim: To identify novel candidate genes for pancreatic cancer (PC). Methods: We performed a transcriptome-wide association study (TWAS) of PC. GWAS summary data were derived from published studies of PC, involving a total of 558 542 SNPs in 1896 individuals with pancreatic cancer and 1939 healthy controls. FUSION software was applied to the PC GWAS summary data for tissue-related TWAS analysis, covering whole blood, peripheral blood, adipose, and pancreas. The functional relevance of the identified genes to PC was further validated with the Oncomine, STRING, and CluePedia tools. Results: TWAS identified 19 genes significantly associated with PC, such as LRP5L (P value = 5.21 × 10⁻⁵), SOX4 (P value = 3.2 × 10⁻⁴), and EGLN3 (P value = 6.2 × 10⁻³). KEGG pathway enrichment analysis detected several PC-associated pathways, such as One carbon pool by folate (P value = 1.60 × 10⁻¹⁶), Cell cycle (P value = 1.27 × 10⁻⁷), and TGF-beta signaling pathway (P value = 4.64 × 10⁻⁶). Comparing the 19 genes with genes previously found to be overexpressed in PC patients revealed one overlapping gene, SOX4. Conclusion: We identified several novel candidate genes and pathways associated with PC. Our results provide novel clues for studies of the genetic mechanism of pancreatic cancer.

Genome-wide association studies (GWAS) are an efficient tool for studying the genetic mechanisms of complex diseases. Expression quantitative trait loci (eQTLs) are an important group of regulatory loci that regulate gene expression levels. 4 In recent years, integrative analysis of GWAS data and eQTL annotation has rapidly become a standard approach for exploring the genetic basis of disease susceptibility. 5 More recently, transcriptome-wide association study (TWAS) analysis has been proposed to make use of the extensive published GWAS data. TWAS combines pre-computed gene expression weights with disease GWAS summary statistics to estimate the association of each gene with disease. 6,7 TWAS has shown high efficiency in identifying novel causal genes for complex diseases. 8-11 In this study, we conducted a tissue-related TWAS of pancreatic cancer (PC), considering whole blood, peripheral blood, adipose, and pancreas. TWAS was first applied to large-scale GWAS summary data to detect novel susceptibility genes associated with PC. The functional relevance of the identified genes to PC was then validated with the Oncomine, STRING, and CluePedia tools.

| GWAS summary datasets of PC
Large-scale GWAS summary data for pancreatic cancer were used in this study. 7 Briefly, this GWAS comprised 1896 individuals with pancreatic cancer and 1939 controls from 12 prospective cohorts and a hospital-based case-control study. Samples were excluded based on low completion rates (<98%) and unexpected inter- or intra-study duplicates. The commercial Illumina Hap500 Infinium genotyping assay was used for genome-wide SNP genotyping. GLU (Genotyping Library and Utilities) was used for data analysis and management. The association between each single SNP and pancreatic cancer was tested by logistic regression. SNPs were excluded based on low call rates (<90%). Detailed information on the cohorts, genotyping, and quality control approaches can be found in the published study. 7
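The two quality-control thresholds described above can be illustrated with a minimal sketch. It is not the GLU implementation: the genotype matrix is a hypothetical toy, the function names are invented for illustration, and the order of the filters (samples first, then SNP call rates computed over the retained samples) is an assumption that the source does not specify.

```python
# Minimal sketch of completion-rate and call-rate filtering (not GLU itself).
# Rows are samples, columns are SNPs; None marks a missing genotype call.

def call_rate(calls):
    """Fraction of non-missing genotype calls in a list."""
    return sum(c is not None for c in calls) / len(calls)

def filter_genotypes(genotypes, sample_min=0.98, snp_min=0.90):
    """Drop low-completion samples, then flag SNPs with low call rates.

    Assumption: SNP call rates are computed over the retained samples only.
    Returns (retained sample rows, indices of retained SNP columns).
    """
    kept_samples = [row for row in genotypes if call_rate(row) >= sample_min]
    n_snps = len(genotypes[0])
    kept_snps = [
        j for j in range(n_snps)
        if kept_samples and call_rate([row[j] for row in kept_samples]) >= snp_min
    ]
    return kept_samples, kept_snps

# Hypothetical toy data; thresholds are loosened because the matrix is tiny.
toy = [
    [0, 1, None, 2, 1],
    [1, 1, None, 0, 0],
    [2, 0, None, 1, 1],
    [None, None, None, None, 1],
]
samples, snps = filter_genotypes(toy, sample_min=0.5, snp_min=0.9)
print(len(samples), snps)
```

With real data, the defaults of 0.98 and 0.90 correspond to the thresholds quoted in the text.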

| Gene expression datasets
Genes overexpressed in PC patients were derived from the Oncomine database. 12 Oncomine (https://www.oncomine.org) is a cancer microarray database and web-based data-mining platform that facilitates discovery from genome-wide expression analyses. Differential gene expression was identified by comparing the major cancer types with their respective normal tissues. 12,13

| TWAS of pancreatic cancer
FUSION software was applied to the PC GWAS summary data for tissue-related TWAS analysis, covering whole blood, peripheral blood, adipose, and pancreas. TWAS analysis combines pre-computed gene expression weights with disease GWAS summary statistics to calculate the association of each gene with disease. 14 The genetic values of expression were computed one probe set at a time using SNP genotype data located within 500 kb on either side of the gene boundary. For this study, the gene expression weights for whole blood, peripheral blood, adipose, and pancreas were obtained from the FUSION website (https://gusevlab.org/projects/fusion/). 14 Genes with significant association signals were identified at P value < 3.73 × 10⁻⁶ after strict Bonferroni correction, and genes with suggestive association signals at P value < 0.05.
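As a rough illustration of how a summary-statistic TWAS combines expression weights with GWAS Z-scores, the sketch below computes the commonly used test statistic z = (w · z_GWAS) / sqrt(wᵀ Σ w), where Σ is the SNP correlation (LD) matrix, together with a Bonferroni threshold. This is a simplified sketch, not the FUSION code: the weights, Z-scores, and LD values are hypothetical toy numbers, and the figure of 13 405 tests is an inference from 0.05 / 3.73 × 10⁻⁶, not a number stated in the text.

```python
import math

def twas_z(weights, gwas_z, ld):
    """TWAS Z-score: weighted sum of GWAS Z-scores, scaled by its LD-aware SD."""
    n = len(weights)
    numerator = sum(w * z for w, z in zip(weights, gwas_z))
    variance = sum(weights[i] * ld[i][j] * weights[j]
                   for i in range(n) for j in range(n))
    return numerator / math.sqrt(variance)

def two_sided_p(z):
    """Two-sided P value of a Z-score under the standard normal distribution."""
    return math.erfc(abs(z) / math.sqrt(2))

def bonferroni_threshold(alpha, n_tests):
    """Per-test significance threshold under Bonferroni correction."""
    return alpha / n_tests

# Hypothetical toy inputs for one gene with three cis-SNPs.
weights = [0.4, -0.2, 0.1]            # expression weights (assumed values)
gwas_z = [3.1, -2.4, 0.5]             # GWAS Z-scores (assumed values)
ld = [[1.0, 0.3, 0.0],
      [0.3, 1.0, 0.2],
      [0.0, 0.2, 1.0]]                # SNP correlation matrix (assumed values)

z = twas_z(weights, gwas_z, ld)
p = two_sided_p(z)
# Assumption: about 13 405 gene-tissue tests, inferred from the stated threshold.
threshold = bonferroni_threshold(0.05, 13405)
print(f"z = {z:.3f}, p = {p:.3g}, significant = {p < threshold}")
```

Under this scheme a gene is called significant when its P value falls below the Bonferroni threshold and suggestive when it falls below 0.05.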

| PPI network and pathway enrichment analysis
The functional relevance of the identified genes to PC was further validated with the Oncomine, STRING, and CluePedia tools. STRING (Search Tool for the Retrieval of Interacting Genes) (https://string-db.org/cgi/input.pl) is an online tool designed to evaluate protein-protein interaction (PPI) networks. 15,16 CluePedia, a plugin for the Cytoscape software, is a tool for searching for potential genes associated with a given signaling pathway by calculating linear and nonlinear statistical dependencies from experimental data. 17,18 The PPI network of the significant genes identified by TWAS was constructed with STRING. We also analyzed the signaling pathways of these significant genes with STRING, and then verified and visualized them with CluePedia. Candidate pathways were identified at P value < 0.05.
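Pathway enrichment of a gene list is often framed as an over-representation test. The sketch below shows one generic form, a one-sided hypergeometric test, purely as an illustration of the idea; it is not a description of the statistics used internally by STRING or CluePedia, which the text does not specify. The function name and the toy counts are hypothetical.

```python
from math import comb

def enrichment_p(k, n, K, N):
    """One-sided hypergeometric P value for pathway over-representation.

    k: genes in the list that belong to the pathway
    n: size of the gene list
    K: number of pathway genes in the background
    N: size of the background gene set
    Returns P(X >= k) when n genes are drawn without replacement.
    """
    total = comb(N, n)
    upper = min(n, K)
    tail = sum(comb(K, i) * comb(N - K, n - i) for i in range(k, upper + 1))
    return tail / total

# Hypothetical toy counts: 3 of 19 listed genes fall in a 120-gene pathway,
# against an assumed background of 20 000 genes.
p_value = enrichment_p(3, 19, 120, 20000)
print(f"enrichment P value = {p_value:.3g}, candidate = {p_value < 0.05}")
```

The final comparison mirrors the P value < 0.05 cutoff used for candidate pathways.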

| Identification of overexpressed genes in PC
We identified overexpressed and underexpressed genes in PC by comparing PC samples with normal tissues in the Oncomine online database. The top 1% of genes overexpressed and underexpressed in 78 PC samples are shown in Appendix S1. After comparing the genes identified by TWAS with the overexpressed and underexpressed genes detected by Oncomine, we found one overlapping gene, SOX4, which was overexpressed in PC patients. Furthermore, the significant SNP rs12530233 of the SOX4 gene was also an eQTL, suggesting a functional role for rs12530233 in the development of PC.
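The comparison described above amounts to intersecting two gene lists. A minimal sketch follows. The 19 TWAS gene symbols are those named in the Discussion; the Oncomine list is a hypothetical placeholder, since the actual top 1% list is given in Appendix S1 and is not reproduced here, and the helper name is invented for illustration.

```python
# The 19 TWAS-identified genes as named in the Discussion.
twas_genes = [
    "PPP2R2A", "E2F3", "KCNK5", "SOX4", "CHSY1", "EGLN3",
    "RP11-65J3.1", "PPFIBP2", "GEMIN4", "NIPA2", "RNASEH2B", "FARS2",
    "MTHFD1L", "F2R", "TXNDC15", "NDUFA3", "CRISPLD2", "IQSEC3", "LRP5L",
]

# Hypothetical placeholder for the Oncomine top 1% overexpressed genes
# (the real list is in Appendix S1); only SOX4 is taken from the text.
oncomine_overexpressed = ["sox4", "PLACEHOLDER_GENE_1", "PLACEHOLDER_GENE_2"]

def overlapping_genes(first, second):
    """Return genes from `first` also present in `second`, ignoring case."""
    second_normalized = {gene.strip().upper() for gene in second}
    return sorted(g for g in first if g.strip().upper() in second_normalized)

shared = overlapping_genes(twas_genes, oncomine_overexpressed)
print(shared)
```

Normalizing case and whitespace before comparison guards against trivial mismatches between gene symbol lists exported from different tools.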

| DISCUSSION
Pancreatic cancer is a leading cause of cancer death worldwide. The cause of pancreatic cancer is complex and multifactorial. Smoking, advanced age, and a family history of chronic pancreatitis are major risk factors for PC. 1 Most patients with pancreatic cancer remain asymptomatic until the tumor metastasizes to other tissues and organs. 19 As there is no standard program for screening patients with early-stage PC, it is necessary to identify more effective susceptibility genes for PC prevention.
Consistent with the TWAS results, previous studies have reported that 4 of the 19 genes (PPP2R2A, E2F3, KCNK5, and SOX4) play important roles in the development of PC, 20-23 and 2 of the 19 genes (CHSY1 and EGLN3) have been linked to PC by bioinformatics methods. 25,26 SOX4 is significantly associated with PC and overexpressed in PC patients. Interestingly, the significant SNP rs12530233 of the SOX4 gene is an eQTL, suggesting that rs12530233 may contribute to the dysregulation of SOX4 expression during the development of PC. Previous research has verified that SOX4 is expressed in the early stages of PC tumorigenesis and suggested that SOX4 might function as a master transcription factor in PC formation. 24 Further studies, such as fine mapping and RNA sequencing of SOX4, are needed to confirm our finding and clarify the potential mechanism by which SOX4 is involved in the development of PC. Another susceptibility gene, E2F3, belongs to a transcription factor family and plays an important role in cellular proliferation, apoptosis, and differentiation. 27 MiR-210 is induced by hypoxia and expressed during the development of PC. Chen et al 28 have suggested that E2F3 may be a potential miR-210 target in PC. A systems-level analysis of the scale-free GMCs network by Rajamani et al 29 identified that E2F3 is associated with PC progression.
PPP2R2A is known to encode an alpha isoform of the regulatory subunit B55 subfamily (B55α) and can selectively control Akt phosphorylation. 30 Intriguingly, Hein et al 21 have reported that PPP2R2A promotes PC development by maintaining hyperactivity of multiple oncogenic signaling pathways, including ERK, Wnt, and AKT. In addition, Shen et al 22 found that PPP2R2A was expressed at significantly higher levels in SH-PAN cells than in DT-PCa cells and showed that decreased expression of PPP2R2A inhibited the development and progression of PC.
KCNK5 (also known as TASK-2 or K2P5.1) has been shown to be the volume-sensitive K⁺ channel in cells. 31 KCNK5 is expressed in the kidney, liver, stomach, small intestine, colon, and pancreatic acinus. An electrophysiological study indicated that KCNK5 was expressed in a human pancreatic ductal adenocarcinoma cell line, and the pH-sensitive K2P subunits encoded by KCNK5 were shown to be expressed in the pancreas. 20
TWAS also identified several novel candidate genes for PC, such as RP11-65J3.1, PPFIBP2, GEMIN4, NIPA2, RNASEH2B, FARS2, MTHFD1L, F2R, TXNDC15, NDUFA3, CRISPLD2, IQSEC3, and LRP5L. To our knowledge, few studies have investigated their potential roles in the formation of PC. Further biological experiments are required to confirm our findings and clarify the potential roles of these novel candidate genes in the development of PC.
Pathway enrichment analysis detected several candidate pathways for PC, some of which have been reported to be implicated in the development of PC directly or indirectly. For instance, it has been demonstrated that the TGF-beta signaling pathway in pancreatic cancer can be exploited in clinical trials of targeted therapy. 32 Another interesting pathway is one carbon pool by folate, which has been associated with PC incidence. 33 Researchers have also shown that aminoacyl-tRNA biosynthesis, metabolic pathways, cell cycle, tight junction, purine metabolism, and the mRNA surveillance pathway are active in PC cells and related to the occurrence and development of PC. 33-38 The remaining pathways relate directly to tumors, including bladder cancer, lung cancer, glioma, melanoma, and chronic myeloid leukemia, as well as PC.
In summary, we conducted a tissue-related TWAS analysis and identified some novel candidate genes and pathways associated with PC. Our results provide novel clues for clarifying the genetic mechanism of PC. Further biological studies are warranted to confirm our findings and reveal the potential mechanism of identified genes and pathways involved in the development of PC.