Evaluation of polymorphisms in microRNA‐binding sites and pancreatic cancer risk in Chinese population

Abstract As promising biomarkers and therapy targets, microRNAs (miRNAs) are involved in various physiological and tumorigenic processes. Genetic variants in miRNA‐binding sites can lead to dysfunction of miRNAs and contribute to disease. However, systematic investigation of the miRNA‐related single nucleotide polymorphisms (SNPs) for pancreatic cancer (PC) risk remains elusive. We performed integrative bioinformatics analyses to select 31 SNPs located in miRNA‐target binding sites using the miRNASNP v2.0, a solid database providing miRNA‐related SNPs for genetic research, and investigated their associations with risk of PC in two large case‐control studies totally including 1847 cases and 5713 controls. We observed that the SNP rs3802266 is significantly associated with increased risk of PC (odds ratio (OR) = 1.21, 95% confidence intervals (CI) = 1.11‐1.31, P = 1.29E‐05). Following luciferase reporter gene assays show that rs3802266‐G creates a stronger binding site for miR‐181a‐2‐3p in 3′ untranslated region (3′UTR) of the gene ZHX2. Expression quantitative trait loci (eQTL) analysis suggests that ZHX2 expression is lower in individuals carrying rs3802266‐G with increased PC risk. In conclusion, our findings highlight the involvement of miRNA‐binding SNPs in PC susceptibility and provide new clues for PC carcinogenesis.


| INTRODUC TI ON
Pancreatic cancer (PC) is the twelfth most common cancer around the world 1 and is one of the most lethal human cancers with a rather low 5-year survival rate of 5%. 2,3 In China, there are estimated over 90 000 new cases and nearly 80 000 related deaths in 2015. 4 Chronic pancreatitis, type 2 diabetes, obesity and cigarette smoking have been established as PC risk factors. Meanwhile, the genetic pathogenesis of PC is still unclear. 5,6 Over the last decade, genome-wide association studies (GWASs) have identified multiple PC-associated chromosome loci. 7 For Chinese population, the first GWAS located 6 susceptibility loci (5p13.1, 10q26.11, 13q22.1, 21q21.3, 21q22.3 and 22q13.32) 8 and a subsequent investigation further indicated another locus 17q24.3 containing LINC00673 contributing to PC risk. 9 Recently, our group performed a exome-wide association study (EWAS) and discovered three new associated regions (19p13.12, 8p21.3 and 2p24.1). 10 However, the single nucleotide polymorphisms (SNPs) identified by GWASs and EWASs could only explain a minor portion of heritability, 11,12 and the missing heritability of PC remains to be dissected. 13 MicroRNAs (miRNAs) are endogenous small non-coding RNAs constituting of about 22 nucleotides, 14,15 and they could inhibit mRNA translation and induce mRNA decay via base-pairing binding to the 3′ UTR of target mRNAs. 16,17 As valuable biomarkers and promising therapeutic targets, miRNAs are found involved in multiple physiological and tumorigenic processes. 18 Meanwhile, SNPs in miRNAs' target binding sites could impact the interactions between miRNAs and target genes, functioning as regulatory variants for various phenotypes and diseases. 19 Previously, we upgraded a widely used online database called miRNASNP v2.0, 20 and successfully applied it to locate a functional SNP rs1062044 affecting miR-423-5p binding to the gene LAMC1 in colorectal cancer susceptibility locus 1q25.3. 21 Still, systematic investigation on the miRNA-binding SNPs for PC risk is absent.
Pancreatic cancer EWAS data-mining should help us to perform a genome-wide evaluation of the miRNA-binding polymorphisms. The exome chip (Illumina Human Exome Beadchip) we previously used was one platform that primarily focused on variations in the exon regions of genes. Based on EWAS data, the genotypes of un-assayed SNPs in 3′ untranslated regions (3′ UTRs) could be accurately imputed, in addition to those in the flanking exon and intron regions.
In this study, we integrated the data from the database miRNASNP v2.0 and our PC EWAS to genome-widely screen miRNA-binding polymorphisms for PC risk. Followed by an independent case-control study and corresponding functional assays, we identified a PC-associated variant rs3802266 affecting miR-181a-2-3p binding to the gene ZHX2.

| Study participants
A two-stage case-control study was conducted to assess the associations between SNPs and PC risk. In Discovery Stage, we conducted the first-phase association study according to the EWAS imputation data of 943 cases and 3908 controls. The enrolment and characteristics of EWAS population were described previously. 10 Replication Stage consisted of 904 cases and 1805 controls that were recruited from different hospitals at Wuhan area. All subjects were unrelated Han Chinese. All patients were histopathologically confirmed PC without previous chemotherapy or radiotherapy, while cancer-free controls came from community nutritional surveys in the same area during the same time period. With a written informed consent, about 1-mL venous blood was obtained from each participant, part of which was included in our previous studies. [22][23][24][25] This study was carried out under the approval from the ethics committee of Tongji Medical College, Huazhong University of Science and Technology.

| Selection of candidate SNPs
Through a systematic bioinformatics analysis, we screened out the most promisingly functional polymorphisms in miRNA-binding sites for PC risk on a genome-wide scale. Totally, there were 236 241 lossof-function SNPs (Loss SNPs) and 263 696 gain-of-function SNPs (Gain SNPs) that predictably impact miRNA-target interactions in the credible database miRNASNP v2.0 we previously built. 20 First, we narrowed down candidate microRNA-related polymorphisms according to the transcriptome data from The Cancer Genome Atlas (TCGA) pancreatic ductal adenocarcinoma (PAAD) data sets. We picked the SNPs whose affecting miRNAs that authentically expressed in PC tissues, with an expression level more than 100 reads per million mapped reads (RPM) according to the RNA-seq data from TCGA database. We further exclude the SNPs with corresponding target genes that were not expressed in PC tissues. Second, given that the energy change (ΔΔG) represented the impact of a SNP on the binding between miRNA and mRNA, we selected SNPs with altered energy bigger than median ΔΔG (17.6 kcal/mol for Loss SNPs and 16.6 kcal/ mol for Gain SNPs). Third, common variants with minor allele frequencies (MAFs) larger than 0.05 in Han Chinese in Beijing (CHB) were chosen for next-step screening on the basis of data from Ensembl database (version 90: http://e90.ensem bl.org/Homo_sapiens). Fourth, we sorted out the SNPs that were eQTLs (P < .01) whose gain-offunction allele was associated with lower expression of target gene or loss-of-function allele was associated with higher expression according to The Genotype-Tissue Expression (GTEx) database, 26 in the light of inhibition and decay effect of miRNA on mRNA. Finally, promising SNPs were submitted to a previous PC EWAS data set which was well genotype-imputed as mentioned before. 10 The process of the integrative bioinformatics analysis is summarized in Figure 1

| Genotyping
DNA were extracted from blood samples using RelaxGene Blood System DP319-02 (Tiangen). In Replication Stage, the genotypes of promising SNP were determined through TaqMan SNP Genotyping Assays on the platform of 7900HT Fast Real-Time PCR (Applied Biosystems). For quality control, we randomly selected 5% samples to assess the reproducibility and found out a concordance rate of 100%.

PANC-1 and BxPC-3 cells were obtained from China Center for Type
Culture Collection. Two cell lines were periodically authenticated by short tandem repeat (STR) and were checked for the absence of mycoplasma contamination (MycoAlert). Cells were cultured into Dulbecco's Modified Eagle's Medium (DMEM; Gibco), with addition of 10% foetal bovine serum (FBS; Gibco) and 1% antibiotics (100 U/ mL penicillin and 100 μg/mL streptomycin), under the environment with an atmosphere of 5% CO 2 and 37°C.
For assays, after PC cells were seeded into 96-well plates, 50 ng constructed plasmids and 6 pmol miR-181a-2-3p mimics and/or inhibitors were cotransfected using Lipofectamine 3000 Reagent (Invitrogen) following manufacturer's instruction. Forty-eight hours later, Firefly and Renilla luciferase activities were detected by Dual Luciferase Reporter Assay System (Promega). The ratio of Firefly to Renilla luciferase activity was calculated as relative luciferase activity for each sample. Three independent assays were performed, while each assay was conducted in triplicate.

| Statistical analysis
The distribution differences of gender, age and genotype between cases and controls were estimated by t test or chi-square test whenever appropriate. The odds ratios (ORs) and corresponding 95% confidence intervals (95%CIs) were calculated through multivariate logistic regression, adjusted by gender and age. For statistical analyses, PLINK software was used in the discovery stage and SPSS Software v20.0 (SPSS, USA) was used in the replication stage. P values were two-sided and the criterion of P < .05 was considered as statistically significant.

| Population characteristics
Characteristics of the participants from Replication Stage were summarized in Table 1. In Replication Stage, 904 cases and 1805 controls were recruited, while cases and controls were well matched in terms of gender and age group (P > .05). We then combined the two data sets from both discovery and replication stage, and a total of 1847 cases and 5713 controls were included in the combined analyses (Table S1).

| Selection of candidate SNPs and association analyses
As the results of the bioinformatics analysis on the data from miR-NASNP v2.0, TCGA, Ensembl and GTEx, 31 common SNPs were selected as candidate SNPs for the first-stage case-control study which was our previous EWAS ( Figure 1, Table S2). Table 2 Note: All the P values were adjusted by gender and age group. The nominal significant results were in bold.
After two stages were pooled together, significant association between rs3802266 and PC risk was consistent in Combined Study (

| Dual luciferase reporter gene assays
We conducted dual luciferase reporter assays to investigate the effect of rs3802266 on the binding between miR-181a-2-3p and

| Expression quantitative trait loci (eQTL) analyses
To further investigate whether the functional variant rs3802266 affect ZHX2 expression, we performed cis-eQTL analysis using newly released data from GTEx (GTEx Analysis Release V7). 26 It showed that individuals carrying the rs3802266-G genotype had significantly lower ZHX2 mRNA levels in pancreas tissue samples compared with those carrying the rs3802266-A genotype (Figure 3). On the other side, a huge-scale eQTL analysis containing 31-684 blood samples, named eQTLGen, also indicated rs3802266 as a significant cis-eQTL of ZHX2 (P = 1.72E-118). 27

| D ISCUSS I ON
In this study, we systematically evaluated the associations be-

CO N FLI C T O F I NTE R E S T
All authors declare no conflicts of interest.

DATA AVA I L A B I L I T Y S TAT E M E N T
The data sets during the current study are available from the corresponding author on reasonable request.