Genetic variations in miR‐125 family and the survival of non‐small cell lung cancer in Chinese population

Abstract To investigate the associations between functional single nucleotide polymorphisms (SNPs) in the miR‐125 family and the survival of non‐small cell lung cancer (NSCLC) patients, we systematically selected six functional SNPs located in or upstream of three pre‐miRNAs (miR‐125a, miR‐125b‐1, miR‐125b‐2). Cox proportional hazards regression analyses were conducted to estimate crude and adjusted hazard ratios (HRs) and their 95% confidence intervals (CIs). A luciferase reporter gene assay was performed to examine the relationship between the SNPs and the transcriptional activity of the miRNAs, and miRNA expression in different cell lines was measured by quantitative real‐time PCR. We found that rs2241490 (upstream of miR‐125b‐1, G > A; dominant model: adjusted HR = 1.24, 95%CI = 1.05‐1.48, P = 0.014; additive model: adjusted HR = 1.18, 95%CI = 1.03‐1.35, P = 0.014), rs512932 (upstream of miR‐125b‐1, A > G; dominant model: adjusted HR = 1.25, 95%CI = 1.05‐1.48, P = 0.013) and rs8111742 (upstream of miR‐125a, G > A; dominant model: adjusted HR = 0.84, 95%CI = 0.71‐1.00, P = 0.047) were associated with the prognosis of 1001 Chinese NSCLC patients. Combined analysis of the three SNPs showed that the number of risk alleles (rs2241490‐A, rs512932‐G and rs8111742‐G) was related to the risk of death from NSCLC in a locus‐dosage manner (P for trend <0.001). Furthermore, the luciferase reporter gene assay showed significantly higher luciferase activity with the rs512932 G allele than with the A allele in 293T, SPC‐A1 and A549 cell lines. In addition, miR‐125b was more highly expressed in lung cancer cells than in normal lung cells. Our study indicates that genetic variations in the miR‐125 family are implicated in the survival of NSCLC patients. Larger population‐based and functional studies are needed to verify these findings.


| INTRODUCTION
Lung cancer is one of the leading causes of cancer-related death globally, with a 5-year survival rate generally below 20% worldwide. 1 Non-small cell lung cancer (NSCLC) accounts for about 80% of all lung cancer cases. The tumor-node-metastasis (TNM) staging system has long been used extensively to evaluate prognosis and to determine the most rational management of cancer patients. However, recurrence and survival differ markedly among patients within the same stage group, indicating that the staging system cannot fully account for this heterogeneity. Therefore, specific prognostic biomarkers are needed to help improve cancer detection, treatment and prognosis. 2 MicroRNAs (miRNAs) are endogenous, small, noncoding, single-stranded RNAs of 19-25 nucleotides. They bind predominantly to the 3' untranslated region (UTR) or the 5' UTR of target mRNAs, thereby regulating mRNA abundance and the expression of the corresponding proteins. 3 A single miRNA can bind hundreds of mRNA targets and is thereby implicated in crucial biological processes. [4][5][6] Studies have found significantly different miRNA profiles in tumors, indicating that miRNAs play substantial roles in the development, progression and survival of cancers, 7 including lung cancer. [8][9][10] The miR-125 family, the human homolog of lin-4 (the first miRNA discovered, in C. elegans development), 11 consists of hsa-miR-125a, hsa-miR-125b-1 and hsa-miR-125b-2. Members of this family play crucial roles in tumorigenesis and tumor progression, mainly by regulating tumor immunity and tumor cell proliferation, apoptosis, and invasion or metastasis. [12][13][14][15] Findings on the expression of the different miR-125 family members in lung cancer remain controversial. 15,16
Germline sequence variants such as single nucleotide polymorphisms (SNPs) in miRNA genes can affect the transcription of primary transcripts (pri-miRNAs), the processing and maturation of precursor miRNAs (pre-miRNAs), or miRNA-mRNA interactions. [17][18][19] For instance, the miR-30c-1 rs928508 G allele was associated with significantly decreased expression of pre- and mature miR-30c-1 through modulation of pri-miRNA processing, resulting in better NSCLC survival. 20 Because each miRNA has many targets, inherited variants that alter miRNA expression may substantially influence the expression of various protein-coding genes implicated in malignant transformation.
Given the important roles of the miR-125 family and of SNPs in cancer development, we hypothesized that SNPs in the miR-125 family genes (hsa-miR-125a, hsa-miR-125b-1 and hsa-miR-125b-2) may affect clinical outcomes of NSCLC patients, possibly by influencing the expression of miR-125a/b. We selected six potentially functional SNPs located in, or within 10 kb upstream of, the three pre-miRNAs and tested the associations between these SNPs and survival of Chinese NSCLC patients.

| Study population
We recruited newly diagnosed patients with histopathologically or cytologically confirmed NSCLC and no history of other cancers from the Cancer Hospital of Jiangsu Province and the First Affiliated Hospital of Nanjing Medical University (NMU) (Nanjing, China). Each diagnosis was reviewed by at least two pathologists. A standard questionnaire was administered in face-to-face interviews to obtain demographic data and lifestyle factors, including sex, age, and cigarette smoking. Patients who had smoked at least one cigarette per day for more than one year during their lifetime were defined as smokers; all others were considered nonsmokers. Each patient donated 5 mL of fasting venous blood and was followed up every 3 months to collect information on treatment and progression. From July 2003 to August 2013, a total of 1341 NSCLC patients were prospectively recruited, of whom 1001 (74.6%) had complete demographic and follow-up information and sufficient DNA specimens; the median survival time (MST) of these patients was 26.0 months. 21 The median follow-up time was 18.8 months. The study was approved by the Institutional Review Board of NMU, and all participants signed informed consent before taking part.
for the Development of Jiangsu Higher Education Institutions, Grant/Award Number: Public Health and Preventive Medicine

KEYWORDS
miR-125, miRNAs, non-small cell lung cancer, single nucleotide polymorphisms, survival

| SNP selection and genotyping
We focused on potentially functional SNPs of the selected pre-miRNAs. SNPs located in, or within 10 kb upstream of, hsa-miR-125a, hsa-miR-125b-1 and hsa-miR-125b-2 were extracted from the HapMap database (phase II+III, Feb 09, NCBI B36 assembly, dbSNP b126). SNPs meeting the following criteria were included: (a) minor allele frequency (MAF) ≥ 0.05 in the Han Chinese in Beijing (CHB) population; (b) consistency with Hardy-Weinberg equilibrium (HWE) (P ≥ 0.05); and (c) genotyping rate ≥ 90%. SNPs were further annotated using the SNPinfo Web Server (http://snpinfo.niehs.nih.gov/). When multiple SNPs were in strong linkage disequilibrium (r² ≥ 0.8), only one was retained. As a result, six SNPs were selected. Genomic DNA was isolated from venous blood leukocytes by proteinase K digestion followed by phenol-chloroform extraction. Genotyping was performed on the Illumina Infinium® BeadChip (Illumina Inc). Details of assay conditions, primers and probes are available on request. Quality control procedures (ie, one blank well and three repeated samples) were strictly followed, as described in our previous studies. 21 All six SNPs were successfully genotyped, with genotyping rates above 95%.
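The inclusion criteria above (MAF ≥ 0.05, HWE P ≥ 0.05, genotyping rate ≥ 90%, and one SNP kept per cluster with r² ≥ 0.8) can be illustrated with a short Python sketch. It is not the authors' pipeline: the criteria were applied to HapMap CHB data, whereas this sketch assumes a hypothetical list of two-letter genotype strings per biallelic SNP (None for a failed call) and approximates LD r² as the squared correlation of allele dosages, a common genotype-based proxy when phased haplotypes are unavailable.

```python
import math
from collections import Counter

def call_rate(genotypes):
    """Fraction of samples with a genotype call (None marks a failed call)."""
    return sum(g is not None for g in genotypes) / len(genotypes)

def allele_counts(genotypes):
    """Count alleles across called genotypes written as two-letter strings, e.g. 'AG'."""
    counts = Counter()
    for g in genotypes:
        if g is not None:
            counts.update(g)
    return counts

def minor_allele_frequency(genotypes):
    counts = allele_counts(genotypes)
    if len(counts) < 2:
        return 0.0  # monomorphic SNP
    return min(counts.values()) / sum(counts.values())

def hwe_pvalue(genotypes):
    """Goodness-of-fit chi-square test for HWE at a biallelic SNP (1 degree of freedom)."""
    calls = [g for g in genotypes if g is not None]
    counts = allele_counts(calls)
    if len(counts) < 2:
        return 1.0  # nothing to test for a monomorphic SNP
    a, b = sorted(counts)  # assumes exactly two alleles
    n = len(calls)
    p = counts[a] / (2 * n)
    q = 1 - p
    observed = Counter("".join(sorted(g)) for g in calls)  # 'GA' and 'AG' pooled
    expected = {a + a: n * p * p, a + b: 2 * n * p * q, b + b: n * q * q}
    chi2 = sum((observed[k] - e) ** 2 / e for k, e in expected.items())
    # For 1 df, P(chi-square > x) = erfc(sqrt(x / 2)).
    return math.erfc(math.sqrt(chi2 / 2))

def passes_qc(genotypes, min_maf=0.05, min_hwe_p=0.05, min_call_rate=0.90):
    """Apply the three inclusion thresholds described in the text."""
    return (minor_allele_frequency(genotypes) >= min_maf
            and hwe_pvalue(genotypes) >= min_hwe_p
            and call_rate(genotypes) >= min_call_rate)

def dosage_r2(x, y):
    """Squared Pearson correlation of allele dosages (0/1/2) between two SNPs."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return 0.0 if sxx == 0 or syy == 0 else sxy * sxy / (sxx * syy)
```

For example, `passes_qc(["AA"] * 36 + ["AG"] * 48 + ["GG"] * 16)` returns True: the MAF is 0.4 and the genotype counts match HWE expectations exactly. A pair of SNPs with `dosage_r2` ≥ 0.8 would then be reduced to one representative.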

| Cell culture
The human lung cancer cell lines (A549, SPC-A1, H460 and PC9), 16 human bronchial epithelial cells (16HBE) and human embryonic kidney cells (293T) were purchased from the Shanghai Institute of Biochemistry and Cell Biology, Chinese Academy of Sciences (Shanghai, China). The cells were cultured in DMEM supplemented with 10% heat-inactivated fetal bovine serum (Gibco, USA) and 100 µg/mL streptomycin (Gibco, USA) in a 37 °C incubator with 5% CO2. All cell lines were authenticated, and neither mycoplasma contamination nor cross-contamination between cell lines was detected.

| Reporter gene luciferase assay
The 5'-flanking region of the pre-miR-125b-1 gene was obtained from Homo sapiens chromosome 11 (NC_000011.10) by BLAST search. The fragments from −1000 bp to the transcription start site (TSS) of pre-miR-125b-1 (including rs2241490) and from −3590 to −2590 bp (including rs512932) were synthesized separately and cloned into the pGL3-Enhancer vector (Promega, Madison, WI, USA) by Generay Company (Shanghai, China). All plasmids were verified by DNA sequencing.
For transfection, cells were seeded into 24-well plates and transfected with 0.5 μg of each luciferase reporter plasmid described above using Lipofectamine 2000 transfection reagent. pRL-SV40 was transiently co-transfected as an internal control to normalize for transfection efficiency. Twenty-four hours after transfection, cells were washed with PBS and lysed with 1× passive lysis buffer. Luciferase activity was measured with the Dual-Luciferase Reporter Assay System (Promega, Madison, WI) according to the manufacturer's protocol. Each cell line was used in 3 independent transfection experiments, each performed in quadruplicate.
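The dual-luciferase readout described above is usually summarized as a firefly/Renilla ratio, so that the pRL-SV40 signal corrects for transfection efficiency. The Python sketch below is one plausible way to reduce the 3 × 4 design to a mean ± SD; it assumes that quadruplicate wells are averaged within each transfection, that each experiment mean is divided by a reference ratio (`reference_ratio` is a hypothetical parameter, since the text does not state which construct, if any, served as the reference), and that mean and SD are then taken across the three independent experiments.

```python
from statistics import mean, stdev

def well_ratios(wells):
    """wells: (firefly, renilla) luminescence pairs; returns firefly/Renilla per well."""
    return [firefly / renilla for firefly, renilla in wells]

def relative_activity(experiments, reference_ratio=1.0):
    """
    experiments: one list of (firefly, renilla) pairs per independent transfection.
    Replicate wells are averaged within each experiment, each experiment mean is
    divided by reference_ratio (hypothetical scaling), and the mean and SD are
    then computed across experiments.
    """
    per_experiment = [mean(well_ratios(wells)) / reference_ratio for wells in experiments]
    sd = stdev(per_experiment) if len(per_experiment) > 1 else 0.0
    return mean(per_experiment), sd
```

Averaging within each transfection first keeps the three independent experiments, rather than the twelve wells, as the units of replication; other summaries are equally defensible.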

| RNA isolation and quantitative real-time PCR assay
Total RNA from four cell types was extracted using the miRNeasy Mini Kit (Qiagen, Germany) and reverse-transcribed into complementary DNA with the TaqMan miRNA RT Kit and stem-loop RT primers (Applied Biosystems, USA). miRNA expression was measured with the TaqMan PCR kit on the ABI 7900 Real-Time PCR System (Applied Biosystems, USA) and normalized to the threshold cycle (Ct) of U6. All reactions were performed in triplicate.
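The text states only that miRNA levels were normalized to the U6 Ct. A common way to do this is the 2^-ΔΔCt method, sketched below in Python under that assumption, with 16HBE cells as the calibrator; the function names are illustrative and are not taken from the study.

```python
from statistics import mean

def delta_ct(target_ct, u6_ct):
    """Delta Ct: mean Ct of the miRNA minus mean Ct of U6 (triplicate wells each)."""
    return mean(target_ct) - mean(u6_ct)

def relative_expression(sample_target, sample_u6, calibrator_target, calibrator_u6):
    """2^-ddCt fold change of a sample relative to the calibrator (e.g. 16HBE = 1)."""
    ddct = delta_ct(sample_target, sample_u6) - delta_ct(calibrator_target, calibrator_u6)
    return 2.0 ** (-ddct)
```

For instance, a lung cancer line whose miR-125b Ct is 2 cycles lower (relative to U6) than that of 16HBE gives a fold change of 2² = 4. The method assumes near-100% and similar amplification efficiencies for the miRNA and U6 assays.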

| Statistical analyses
Deviation of the genotype distribution of each SNP from HWE was assessed with a goodness-of-fit χ² test. Survival time was calculated from the date of lung cancer diagnosis to the date of death or last follow-up. The Kaplan-Meier method and log-rank test were used to compare survival time among subgroups defined by patient characteristics, clinical features and genotypes. Univariate and multivariate Cox proportional hazards regression analyses were conducted to estimate crude and adjusted HRs and their 95% CIs. Covariates in the adjusted models included age, sex, cigarette smoking, clinical stage, histology, lung cancer surgery, and chemotherapy or radiotherapy status. Heterogeneity among subgroups was examined with the χ²-based Q-test. Assuming homogeneity of variance, one-way ANOVA and the Scheffé method were used to compare luciferase activity among the three groups and between each pair of groups, respectively. miRNA levels in lung cancer cell lines were compared with those in 16HBE cells using Student's t-test. All tests were two-sided with a significance level of 0.05, and analyses were performed with R software (version 3.3.2; the R Foundation for Statistical Computing). STATA (version 12.0; Stata Corp., College Station, TX) was used to plot survival curves of patients with different allele combinations, and histograms of relative luciferase activity and miR-125b expression levels were plotted using MS Office Excel 2013.
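The survival curves and MSTs reported here come from the Kaplan-Meier (product-limit) estimator, which the authors computed in R and STATA. As an illustration only, the estimator and the median survival time (the earliest time at which estimated survival falls to 0.5 or below) can be written in plain Python; input lists of times and event indicators are assumed.

```python
def kaplan_meier(times, events):
    """
    Product-limit estimate of S(t).
    times: follow-up times; events: 1 = death, 0 = censored.
    Returns [(t, S(t))] at each distinct death time.
    """
    data = sorted(zip(times, events))
    n_at_risk = len(data)
    surv = 1.0
    curve = []
    i = 0
    while i < len(data):
        t = data[i][0]
        deaths = censored = 0
        # Group all observations tied at time t; patients censored at t are
        # still counted as at risk for deaths occurring at t.
        while i < len(data) and data[i][0] == t:
            if data[i][1]:
                deaths += 1
            else:
                censored += 1
            i += 1
        if deaths:
            surv *= 1 - deaths / n_at_risk
            curve.append((t, surv))
        n_at_risk -= deaths + censored
    return curve

def median_survival(curve):
    """Earliest time with S(t) <= 0.5, or None if the curve never reaches 0.5."""
    for t, s in curve:
        if s <= 0.5:
            return t
    return None
```

For six hypothetical patients with times `[5, 8, 8, 12, 15, 20]` and events `[1, 1, 0, 1, 0, 1]`, survival drops to 5/6, 2/3 and 4/9 at 5, 8 and 12 months, so the median survival time is 12 months.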
As shown in Table 1, 1001 patients were included in the final analysis, and 545 deaths were observed during follow-up. The median age at diagnosis was 62 years, and 69.4% (n = 695) of patients were male. There were 657 (65.6%) adenocarcinomas and 344 (34.4%) squamous cell carcinomas. Cigarette smoking, advanced clinical stage, and chemotherapy or radiotherapy were significantly associated with shorter survival time (log-rank P < 0.05, Table 1), whereas female sex and surgical resection were associated with significantly better prognosis (log-rank P < 0.05, Table 1).
We then evaluated the combined effects of the risk alleles of the three SNPs (rs2241490-A, rs512932-G and rs8111742-G) on NSCLC survival. The more risk alleles patients carried, the shorter their MST, suggesting a significant locus-dosage effect of risk alleles on NSCLC survival (P for trend <0.001). Compared with patients carrying 0-1 risk alleles (MST = 39.4 months), those carrying two or more risk alleles had shorter MSTs (25.9, 25.0 and 24.0 months for 2, 3 and ≥4 risk alleles, respectively) and higher HRs of 1.26 (95%CI = 0.95-1.67), 1.42 (1.08-1.88) and 1.65 (1.22-2.24), respectively (Table 3, Figure 1). To further characterize the relationship of the three SNPs with NSCLC survival, we conducted analyses in the dominant model stratified by age, sex, cigarette smoking, surgery, stage, histology, and chemotherapy or radiotherapy. As shown in Table 4, rs2241490 had a stronger risk effect on NSCLC survival in female patients and in those with stage I/II disease (P for heterogeneity = 0.036 and 0.024, respectively). Similarly, rs512932 had a stronger risk effect in patients with stage I/II disease (P for heterogeneity = 0.002). In further interaction analysis, statistically significant multiplicative interactions with stage were found for both rs512932 and rs2241490 (both P for interaction < 0.001) (Table 5).
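Two calculations underlie the combined and stratified analyses above: counting each patient's risk alleles across the three SNPs, and testing whether HRs differ between subgroups. The Python sketch below assumes genotypes stored as two-letter strings and treats the Q-test as Cochran's Q computed from log HRs, with standard errors recovered from the reported 95% CIs; the authors' exact implementation is not described, and the p-value helper covers only the two-subgroup (1 df) case.

```python
import math

# Risk alleles as defined in the text.
RISK_ALLELES = {"rs2241490": "A", "rs512932": "G", "rs8111742": "G"}

def risk_allele_count(genotypes):
    """genotypes: dict mapping SNP ID to a two-letter genotype, e.g. {'rs512932': 'AG', ...}."""
    return sum(genotypes[snp].count(allele) for snp, allele in RISK_ALLELES.items())

def risk_group(count):
    """Collapse counts into the groups used for Table 3: '0-1', '2', '3', '>=4'."""
    if count <= 1:
        return "0-1"
    if count >= 4:
        return ">=4"
    return str(count)

def log_hr_se(hr, lower, upper, z=1.959964):
    """log(HR) and its SE, assuming the 95% CI is symmetric on the log scale."""
    return math.log(hr), (math.log(upper) - math.log(lower)) / (2 * z)

def cochran_q(estimates):
    """Cochran's Q for subgroup estimates given as (hr, lower, upper); returns (Q, df)."""
    logs = [log_hr_se(*e) for e in estimates]
    weights = [1 / se ** 2 for _, se in logs]  # inverse-variance weights
    pooled = sum(w * b for w, (b, _) in zip(weights, logs)) / sum(weights)
    q = sum(w * (b - pooled) ** 2 for w, (b, _) in zip(weights, logs))
    return q, len(estimates) - 1

def chi2_sf_1df(x):
    """Upper-tail chi-square probability with 1 df."""
    return math.erfc(math.sqrt(x / 2))
```

A patient with genotypes GA, GG and AG at rs2241490, rs512932 and rs8111742 carries 1 + 2 + 1 = 4 risk alleles and falls in the ">=4" group. For a two-level stratifier such as stage I/II versus III/IV, `chi2_sf_1df(cochran_q(...)[0])` gives the heterogeneity P value.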
According to SNPinfo, rs2241490 and rs512932 might modulate transcription factor binding. We therefore hypothesized that rs2241490-A and rs512932-G might influence hsa-miR-125b-1 expression. We generated four luciferase reporter gene plasmids (rs2241490 G and A alleles; rs512932 A and G alleles) and used the pRL-SV40 plasmid to normalize for transfection efficiency. Luciferase activity was significantly higher for the reporter vector carrying the rs512932 G allele than for that carrying the A allele in 293T, SPC-A1 and A549 cells (7.810 vs 1.009, P = 0.002; 9.119 vs 0.831, P = 0.002; 8.206 vs 0.691, P < 0.001, respectively; Figure 2). However, luciferase activity did not differ significantly between plasmids carrying the rs2241490 G and A alleles (293T: 0.939 vs 0.756, P = 0.140; SPC-A1: 0.980 vs 0.826, P = 0.100; A549: 0.946 vs 0.866, P = 0.250; Figure 2). These results suggest that rs512932 A > G might upregulate miR-125b-1 expression by increasing transcriptional activity. In addition, real-time PCR showed that miR-125b was expressed at significantly higher levels in lung cancer cells than in 16HBE cells (P < 0.001) (Figure 3).

| DISCUSSION
In this study, we focused on six potentially functional SNPs in the miR-125 family (miR-125a, miR-125b-1, miR-125b-2) and found that rs2241490, rs512932 and rs8111742 were associated with the prognosis of NSCLC patients in a Chinese population; combined analysis of the three SNPs showed a significant locus-dosage effect of the number of risk alleles (rs2241490-A, rs512932-G and rs8111742-G) on NSCLC survival. Furthermore, the luciferase reporter gene assay showed that rs512932-G could increase the transcriptional activity of miR-125b-1. To the best of our knowledge, this is the first clinical follow-up study to evaluate the association between germline genetic variants in the miR-125 family and NSCLC survival in a Chinese population.

TABLE 2 Distributions of the six SNPs in the NSCLC patients and their associations with survival
Many studies have highlighted the roles of germline variants in miRNAs or their regulatory elements in cancer. 18,22 In this study, we found that rs2241490 (228 bp upstream of pre-miR-125b-1), rs512932 (2989 bp upstream of pre-miR-125b-1) and rs8111742 (1033 bp upstream of pre-miR-125a) were significantly associated with the survival of NSCLC patients, and that the G allele of rs512932 increased transcriptional activity. According to HaploReg, 23 rs512932 is marked as an enhancer in A549 cells (H3K4me1 24 and the chromatin states 25-state model), and in normal human lung fibroblasts (NHLF) it is marked as both an enhancer (H3K4me1, H3K27ac) and a promoter (H3K4me3) and as 5' preferentially transcribed. The regulatory motif possibly altered by the SNP is CDP. 25 In addition, according to GTEx V7, 26 rs512932 is an eQTL of the lncRNA RP11-166D19.1 (ENSG00000255248.2, slope = 0.085, P = 0.013). RP11-166D19.1 is an isoform of MIR100HG, a leukemia-related oncogene 27 that hosts three miRNAs (let-7a, miR-100 and miR-125b-1) as a cluster in its introns. 28 We found no evidence that rs512932 is an eQTL for hsa-mir-125b-1, hsa-miR-125b-5p or hsa-miR-125b-1-3p in GTEx or in TCGA lung adenocarcinoma (LUAD) and lung squamous cell carcinoma (LUSC) samples. However, rs512932 might still modify miR-125b-1 transcription in other cellular contexts, and further functional studies are needed to reveal the relationships among the SNP, miR-125b-1, MIR100HG and RP11-166D19.1. miR-125b is transcribed from two loci, on chromosomes 11q24 and 21q21, which give rise to hsa-miR-125b-1 and hsa-miR-125b-2, respectively. 16 Studies have shown that miR-125b acts as an oncogene, is highly expressed in lung cancer cells, tissues and serum, and is associated with poor outcomes in NSCLC patients. [29][30][31][32][33] Our results suggest that the rs512932 G allele might influence the development and prognosis of NSCLC by affecting transcription factor binding and thereby altering miR-125b expression. In contrast, rs2241490 was not associated with the transcriptional activity of miR-125b-1. Because the chromatin states 25-state model predicts promoter activity at rs2241490 in A549 cells, this SNP may be associated with NSCLC survival through other mechanisms. Further functional studies are warranted to verify our findings.

TABLE 3 The associations between the three positive SNPs in the miR-125 family and survival of the NSCLC patients. The combined genotypes were the sum of risk alleles carried (rs2241490-A, rs512932-G and rs8111742-G).

FIGURE 1 Kaplan-Meier plots of survival according to combined risk alleles of the three SNPs (rs2241490-A, rs512932-G and rs8111742-G) in the Chinese NSCLC patients. SNP, single nucleotide polymorphism.
Most studies of miR-125a-3p/5p have found that it acts as a tumor suppressor and is downregulated in lung cancer, and higher miR-125a expression may predict better survival in NSCLC patients. 14,34-37 Wang et al reported that miR-125a, a metastasis suppressor in lung cancer cells that is activated by epidermal growth factor receptor (EGFR) signaling, inhibits tumorigenesis and tube formation. 38 In our study, the A allele of rs8111742, located 1033 bp upstream of miR-125a, was associated with better survival of NSCLC patients. According to HaploReg, this SNP is marked in A549 cells by both enhancer (H3K4me1 and H3K27ac) and promoter (H3K4me3 and H3K9ac) histone marks, indicating that the region is an active regulatory element. rs8111742 might therefore alter the activity of the regulatory element that harbors it and thereby change the expression of miR-125a, which may in turn affect NSCLC survival.
Several limitations of our study should be noted. First, the relatively small sample size may have limited statistical power, especially in the interaction analysis, and larger population-based studies are needed to confirm the reliability of our results. Second, as this was a hospital-based study, selection bias cannot be entirely excluded. Third, after false discovery rate (FDR) correction for multiple comparisons, rs2241490 and rs512932 remained significant in the dominant model (P adj = 0.023 for both), whereas rs8111742 did not (P adj = 0.056). Finally, although higher luciferase activity was observed for reporter plasmids carrying the rs512932 G allele in three cell lines, evidence from lung cancer tissues of the same patients whose blood specimens were analyzed was limited, and we were unable to clarify the actual biological effects of the allelic difference. Further functional studies in cell lines or tissues may help confirm and extend our findings. Nevertheless, this is the first study to examine the association between polymorphisms in the miR-125 family and the prognosis of NSCLC, and it provides valuable information for future research and clinical practice.

In conclusion, this study indicated that rs2241490, rs512932 and rs8111742 in the miR-125 family were associated with the prognosis of NSCLC patients in a Chinese population. Larger population-based and functional studies are needed to verify these findings.

FIGURE 2 Different levels of luciferase activity of the regions harboring rs512932 and rs2241490 in 293T, SPC-A1 and A549 cell lines. All constructs were co-transfected with pRL-SV40 to standardize transfection efficiency. Data are presented as mean ± SD. Each cell line was used in 3 independent transfection experiments, and each experiment was performed in quadruplicate. *** represents P < 0.001.
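The FDR correction mentioned among the limitations is most often done with the Benjamini-Hochberg step-up procedure. The text does not name the exact procedure or the full set of tests that were corrected, so the Python sketch below is a generic, assumed implementation rather than a reproduction of the reported P adj values.

```python
def benjamini_hochberg(pvalues):
    """Benjamini-Hochberg adjusted P values, returned in the original order."""
    m = len(pvalues)
    order = sorted(range(m), key=lambda i: pvalues[i])  # indices by ascending P
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest P value down, enforcing monotonicity of adjusted values.
    for rank in range(m, 0, -1):
        i = order[rank - 1]
        running_min = min(running_min, pvalues[i] * m / rank)
        adjusted[i] = running_min
    return adjusted
```

With illustrative input `[0.01, 0.04, 0.03, 0.005]`, the adjusted values are approximately `[0.02, 0.04, 0.04, 0.02]`. The result depends on how many tests are included in the family (for example, all six SNPs or only the dominant-model tests), which is one reason reported adjusted values can be hard to reproduce from the raw P values alone.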