Gene polymorphism of cytochrome P450 significantly affects lung cancer susceptibility

Abstract Background Cytochrome P450 enzymes (CYPs) are heme proteins involved in the metabolism of a variety of endogenous and exogenous substances, and they play an important role in carcinogenesis driven by environmental and hereditary factors. The objective of this study was to investigate how polymorphisms of CYPs correlate with lung cancer (LC) susceptibility. Methods Six single nucleotide polymorphisms (SNPs) were genotyped in this study. The chi-square test and an unconditional logistic regression model were used to evaluate the correlation between SNPs and LC susceptibility. Expression and survival data for the genes in patients with LC were mined from the Oncomine and Kaplan-Meier Plotter databases. Results Four SNPs were significantly associated with the risk of LC development (P < 0.05). The most significant correlation was that the A allele and AA genotype of CYP2D6 rs1065852 were associated with increased risk of LC development (adjusted odds ratio [OR] = 1.35, 95% confidence interval [95%CI] = 1.13-1.60, P = 9.04e-4; OR = 1.83, 95%CI = 1.29-2.59, P = 0.001, respectively). A similar association of this variant was also found in the subgroups of male patients, stage III-IV cases, lymph node-positive cases, squamous cell carcinomas and adenocarcinomas, whereas rs1065852 appeared to be a protective factor in females (adjusted OR = 0.33, 95%CI = 0.16-0.70, P = 0.004). In stratified analyses, the association of the CYP24A1 rs2762934, CYP24A1 rs6068816 and CYP20A1 rs2043449 polymorphisms with LC risk appeared stronger in some subgroups. CYP2D6, CYP24A1 and CYP20A1 are overexpressed in some pathological types of LC (P < 0.05), and high levels of CYP2D6 and CYP20A1 indicate poor and good prognosis of LC, respectively. Conclusion This study revealed that rs1065852, rs2043449, rs2762934, and rs6068816 of CYPs were associated with LC susceptibility in the Northwestern Chinese Han population; CYP2D6 and CYP20A1 were overexpressed and correlated with prognosis of LC.


| INTRODUCTION
Cancer constitutes a burden all over the world. 1 It was estimated that nearly half of the new cases and more than half of the cancer deaths in the world would occur in Asia in 2018. 2 Lung cancer (LC) is the most common cancer, accounting for 11.6% of cancer cases, and is the leading cause of cancer death among male patients, accounting for 18.4% of cancer deaths, especially in East Asia and Polynesia. 2,3 Cigarette use remains the primary causal agent of LC. 4 However, other susceptibility factors, such as ionizing radiation, air pollution, and exposure to occupational and environmental carcinogens such as radon and formaldehyde, could also increase the incidence of LC.
An increasing number of studies show a strong link between genetic factors and carcinogenesis. [5][6][7] Genome-wide association studies (GWAS) have identified several cancer susceptibility loci in European populations, including CHRNA3/5, CHRNB4, BRCA2, CHEK2 and TERT, but these loci explain only a small part of the genetic susceptibility to LC, and most have not been systematically verified in Asian populations. [8][9][10] Since the Chinese Han are the ethnic group with the largest population in East Asia, it is crucial to explore the relationship between genetic polymorphisms and susceptibility to LC in this population.
The cytochrome P450 superfamily (CYPs), located primarily in the liver, small intestine and kidney, 11 is a large superfamily of conserved integral membrane proteins present in animals, plants, and microorganisms, 12 which play a crucial role in the metabolism and activation of carcinogens. 13 These activated carcinogens can bind to DNA and form DNA adducts, which are capable of inducing mutations and initiating tumorigenesis. Genetic polymorphisms of CYPs have been reported to be associated with various diseases and adverse drug reactions among different populations by affecting the enzyme catalytic activity. 14, 15 Kiyohara found that CYP genetic polymorphisms are related to susceptibility to colorectal cancer. 16 Maurya et al reported that polymorphisms of drug-metabolizing CYPs showed modest associations with head and neck squamous cell carcinoma risk. 17 Genetic polymorphisms have been reported for CYPs involved in the metabolic activation of polycyclic aromatic hydrocarbons (PAHs) and tobacco-specific nitrosamines, 18,19 both of which are widespread environmental procarcinogens that induce LC and skin carcinoma. [20][21][22] However, Kiyohara et al found no significant association between genetic polymorphisms of enzymes involved in xenobiotic metabolism and the risk of LC. 23 In summary, the reported correlation between CYP polymorphisms and LC risk is contradictory and inconclusive, owing to differences in ethnicity and sample size among study groups. To validate the association between genetic polymorphisms of CYPs and susceptibility to LC in the Northwest Chinese Han population, we conducted a case-control study and selected six cancer-associated SNPs from the target enzyme system. We genotyped these SNPs and evaluated the impact of CYP genetic polymorphisms on the risk of LC development overall and in subgroups defined by gender, tumor stage, lymph node status, and pathology. The gene expression and the relationship between expression level and prognosis of LC were further analyzed using the Oncomine and Kaplan-Meier Plotter databases.

| Subject and ethics statement
Five hundred and ten pathologically confirmed LC patients hospitalized in the First Affiliated Hospital of Xi'an Jiaotong University, Shaanxi, China, were included in this study (both small cell lung cancer [SCLC] and non-small cell lung cancer [NSCLC] were included). Tumor stages and pathological classifications were based on the 8th edition of the TNM staging system published by the Union for International Cancer Control and on pathological results, respectively. 24 Relevant information was extracted from medical files. Patients with other tumors or communication problems were excluded. Five hundred and four healthy subjects were recruited into the cancer-free control group in the same hospital over the same period; none of them had any history of cancer or of severe endocrine or autoimmune disease. We confirmed that there was no genetic relationship between the cases and the control subjects, in order to minimize the environmental, hereditary and therapeutic factors affecting genetic susceptibility to LC. This study complies strictly with the Declaration of Helsinki of the World Medical Association. All cases and control subjects provided consent, and the research was approved by the Ethics Committee of The First Affiliated Hospital of Xi'an Jiaotong University.

| SNPs selection and primer design
Six SNPs from three genes of CYPs associated with LC were selected for analysis in this study based on the 1000 Genomes Project. Each of them met the criterion of a minor allele frequency (MAF) of more than 5% in the HapMap Chinese Han Beijing population. All primers were designed using ASSAY DESIGN SUITE V2.0 (http://agenacx.com/online-tools, Table 1).

KEYWORDS
cytochrome P450 (CYP450), genetic polymorphism, lung cancer, susceptibility

| SNPs genotyping and haplotype analysis
Genomic DNA was extracted from peripheral blood using GoldMag-Mini Whole Blood Genomic DNA Purification Kits (GoldMag Co. Ltd., Xi'an City, China), and quantified with a spectrophotometer (NanoDrop 2000; Thermo Fisher Scientific, Waltham, MA, United States). To obtain sufficient DNA for further reactions, polymerase chain reaction (PCR) was applied to each sample. Shrimp alkaline phosphatase (SAP) purification was then performed to remove the remaining dNTPs and amplification primers from the PCR products. Using a MassARRAY Nanodispenser (Agena Bioscience, San Diego, CA), standardized genotyping reactions were dispensed onto a 384-well SpectroCHIP. Repeated control samples were included in every genotyping plate, and the concordance was more than 99%. Genotyping of these SNPs was carried out on the MassARRAY iPLEX platform (Agena Bioscience, San Diego, CA) using allele-specific matrix-assisted laser desorption ionization-time of flight mass spectrometry (MALDI-TOF-MS). Genotyping results were output by Agena Bioscience TYPER software, version 4.0. The Haploview software package (version 4.2) was used to analyze linkage disequilibrium (LD), haplotype construction and genetic association at polymorphism loci; haplotype blocks were defined according to the criteria laid out by Gabriel and others. 25
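The pairwise LD measures that tools such as Haploview report can be illustrated with a minimal Python sketch. The function name `ld_stats` and the input frequencies are hypothetical and chosen only for illustration; this is the textbook definition of D, D' and r² for two biallelic loci, not a reproduction of Haploview's implementation or of the data in this study.

```python
def ld_stats(p_ab, p_a, p_b):
    """Return (D, D', r^2) for two biallelic loci.

    p_ab: frequency of the haplotype carrying allele A (locus 1) and B (locus 2)
    p_a, p_b: frequencies of allele A and allele B (each strictly between 0 and 1)
    """
    d = p_ab - p_a * p_b  # raw disequilibrium coefficient
    if d >= 0:
        d_max = min(p_a * (1 - p_b), (1 - p_a) * p_b)
    else:
        d_max = min(p_a * p_b, (1 - p_a) * (1 - p_b))
    d_prime = abs(d) / d_max if d_max > 0 else 0.0  # normalized |D|
    r2 = d * d / (p_a * (1 - p_a) * p_b * (1 - p_b))  # squared correlation
    return d, d_prime, r2


# Hypothetical frequencies: allele A always travels with allele B.
d, d_prime, r2 = ld_stats(p_ab=0.3, p_a=0.3, p_b=0.3)
```

With these illustrative inputs the two alleles are perfectly correlated, so both D' and r² come out at (numerically) 1; independent loci would give values near 0.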

| Statistical analysis
Data were analyzed using SPSS 18.0 statistical software (SPSS Inc, Chicago, IL) and Microsoft Excel (Microsoft Corp., Redmond, WA). All continuous data are expressed as means ± standard deviations (SDs). Pearson's χ² test and the t test were used to compare the distributions of categorical variables and continuous variables, respectively, between the cases and controls.
The lower-frequency allele at each locus was coded as the minor allele. Genotype frequencies of all SNPs in both the case and control groups were tested for Hardy-Weinberg Equilibrium (HWE). Three genetic models (dominant, recessive and additive) were applied using PLINK software (http://www.cog-genomics.org/plink2/) to assess the association of single SNPs with the risk of LC development. The odds ratios (ORs) and 95% confidence intervals (95% CIs) were calculated using logistic regression analysis and were adjusted for age and gender. To explore the possibility that the effect of a genetic polymorphism in the candidate genes is biologically active only in specific subgroups, we conducted stratified analyses investigating the effect of genotype by gender, lymph node status, tumor stage and histological subtype based on medical reports. Exploratory analyses examining the effect of genetic polymorphisms within the histological subtypes based on pathology reports were also conducted. Statistical significance was set at P ≤ 0.05 (two-sided). Power and Sample Size (PS) Calculation software (http://biostat.mc.vanderbilt.edu/wiki/Main/PowerSampleSize) was used to calculate the power for the significant differences. 26 We estimated the association of haplotypes with susceptibility to LC using PLINK software (http://www.cog-genomics.org/plink2/). The ORs and 95% CIs were also calculated using unconditional logistic regression analyses adjusted for age and sex.
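As an illustration of two steps described above, the following Python sketch shows a chi-square goodness-of-fit test for HWE and the usual way a genotype is coded under the additive, dominant and recessive models before it enters a logistic regression. The function names and the genotype counts are hypothetical; the study itself used PLINK, and this sketch is not a reproduction of PLINK's implementation.

```python
import math


def hwe_chi2(n_aa, n_ab, n_bb):
    """Chi-square goodness-of-fit test for HWE from genotype counts (1 df).

    Assumes both alleles are present, so all expected counts are positive.
    """
    n = n_aa + n_ab + n_bb
    p = (2 * n_aa + n_ab) / (2 * n)  # frequency of allele A
    q = 1 - p
    expected = (n * p * p, 2 * n * p * q, n * q * q)
    observed = (n_aa, n_ab, n_bb)
    chi2 = sum((o - e) ** 2 / e for o, e in zip(observed, expected))
    # Survival function of a chi-square variable with 1 degree of freedom.
    p_value = math.erfc(math.sqrt(chi2 / 2))
    return chi2, p_value


def code_genotype(minor_allele_count, model):
    """Map a minor-allele count (0, 1 or 2) to a regression covariate."""
    if model == "additive":   # each minor allele adds one unit
        return minor_allele_count
    if model == "dominant":   # at least one minor allele
        return int(minor_allele_count >= 1)
    if model == "recessive":  # two minor alleles required
        return int(minor_allele_count == 2)
    raise ValueError(f"unknown model: {model}")


# Hypothetical control-group counts for one SNP (AA, AG, GG).
chi2, p_hwe = hwe_chi2(60, 220, 224)
```

A large `p_hwe` indicates no detectable deviation from HWE; the coded covariate would then be entered into a logistic model alongside age and gender.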

| Gene expression and survival analysis
Expression and survival data of candidate genes in LC patients were mined from the ONCOMINE (https://www.oncomine.org/resource/login.html) and Kaplan-Meier Plotter (http://kmplot.com/analysis/) databases. The Kaplan-Meier method and Cox regression were used to construct survival curves and estimate hazard ratios (HRs) in order to assess the relationship between risk gene expression and prognosis of LC.
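The Kaplan-Meier method mentioned above can be sketched in a few lines of Python. This is a minimal product-limit estimator on made-up follow-up data; the function name `kaplan_meier` and the sample values are assumptions for illustration, and the sketch does not reproduce the Kaplan-Meier Plotter tool or the Cox regression step.

```python
def kaplan_meier(times, events):
    """Product-limit survival estimate.

    times:  follow-up time for each patient
    events: 1 if death was observed at that time, 0 if the patient was censored
    Returns a list of (time, survival probability) at each distinct event time.
    """
    data = sorted(zip(times, events))
    at_risk = len(data)
    survival = 1.0
    curve = []
    i = 0
    while i < len(data):
        t = data[i][0]
        deaths = 0
        removed = 0
        # Collect everyone whose follow-up ends at time t.
        while i < len(data) and data[i][0] == t:
            deaths += data[i][1]
            removed += 1
            i += 1
        if deaths > 0:
            survival *= 1 - deaths / at_risk
            curve.append((t, survival))
        at_risk -= removed  # deaths and censored patients leave the risk set
    return curve


# Hypothetical follow-up (months); the second patient is censored.
curve = kaplan_meier([1, 2, 3, 4], [1, 0, 1, 1])
```

Censored patients contribute to the risk set until their last follow-up but never trigger a drop in the curve, which is what distinguishes this estimator from a simple proportion of survivors.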

| Baseline characteristics
A total of 1014 participants were included in the study: 510 patients with LC in the case group (384 males and 126 females; average age: 58.08 ± 10.55 years) and 504 healthy subjects in the control group (381 males and 123 females; average age: 57.27 ± 10.85 years). Characteristics of the patients in the case group and the subjects in the control group are listed in Table 2. There was no significant difference in the distribution of gender or age between the two groups (P = 0.911 and P = 0.227, respectively).
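The two baseline comparisons can be checked from the summary figures given above. The sketch below applies Pearson's χ² test (without continuity correction) to the 2 × 2 gender table and a two-sample test to the age means and SDs; it assumes a normal approximation to the t distribution, which is reasonable for groups of about 500, and the function names are illustrative rather than taken from SPSS.

```python
import math


def chi2_2x2(a, b, c, d):
    """Pearson chi-square (no continuity correction) for the table [[a, b], [c, d]]."""
    n = a + b + c + d
    chi2 = n * (a * d - b * c) ** 2 / ((a + b) * (c + d) * (a + c) * (b + d))
    p_value = math.erfc(math.sqrt(chi2 / 2))  # chi-square survival function, 1 df
    return chi2, p_value


def two_sample_z(mean1, sd1, n1, mean2, sd2, n2):
    """Two-sided test for a difference in means, normal approximation."""
    se = math.sqrt(sd1 ** 2 / n1 + sd2 ** 2 / n2)
    z = (mean1 - mean2) / se
    p_value = math.erfc(abs(z) / math.sqrt(2))
    return z, p_value


# Gender: cases 384 male / 126 female, controls 381 male / 123 female.
chi2_gender, p_gender = chi2_2x2(384, 126, 381, 123)

# Age: cases 58.08 +/- 10.55 (n = 510), controls 57.27 +/- 10.85 (n = 504).
z_age, p_age = two_sample_z(58.08, 10.55, 510, 57.27, 10.85, 504)
```

Under these assumptions both P values land close to the reported 0.911 and 0.227; small differences in the last digit are expected because the inputs are rounded summary statistics.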

| Linkage between candidate gene polymorphisms and LC
Six SNPs of CYPs were genotyped. The genotyping success rate was >99.40% for all SNPs. Primary information on the candidate SNPs is shown in Table 3. No significant deviation of genotype frequencies from HWE was found in either group (Table 3). The A allele of rs1065852 in CYP2D6 was associated with a 1.35-fold increased risk of LC development in the allelic model analysis, with a power value of 0.937 (adjusted OR = 1.35, 95%CI = 1.13-1.60, P = 9.04e-4).
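To make the reported effect size concrete, the sketch below recovers an approximate standard error, z statistic and P value from the published OR and its 95% CI. It assumes a Wald-type interval on the log-odds scale, which is standard for logistic regression but is not stated explicitly in the text; the function name is illustrative.

```python
import math


def z_and_p_from_or_ci(or_value, ci_low, ci_high, z_crit=1.96):
    """Back out SE(log OR), the Wald z statistic and a two-sided P value."""
    se = (math.log(ci_high) - math.log(ci_low)) / (2 * z_crit)
    z = math.log(or_value) / se
    p_value = math.erfc(abs(z) / math.sqrt(2))
    return se, z, p_value


# Reported allelic result for CYP2D6 rs1065852.
se, z, p = z_and_p_from_or_ci(1.35, 1.13, 1.60)

# An OR of 1.35 corresponds to 35% higher odds of LC per A allele.
percent_increase = (1.35 - 1) * 100
```

The recovered P value is of the same order as the reported 9.04e-4; an exact match is not expected because the OR and CI limits are rounded to two decimals.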

| Linkage between candidate SNPs and LC development in genetic models and Haplotype analysis
We further conducted logistic regression analysis tests to analyze model associations. For SNP rs1065852 in CYP2D6, the genotype frequency distributions were different between the case group and control group (P = 0.003, CI = 1.13-1.60, P = 0.001, Table 5), with power values of 0.868, 0.997 and 0.879 respectively. We also found that the AG genotype of CYP24A1 rs2762934 was associated with decreased risk of LC development (adjusted OR = 0.71, 95% CI = 0.51-1.00, P = 0.048). No statistically significant difference in the haplotype distributions between the case group and control group was observed for CYP24A1 (P > 0.05, Figure S1, Table S1).

| Linkage between candidate gene polymorphisms and LC in stratification analysis
As shown in were also found in some genetic models of part of subgroups ( Table 6). As shown in Table 7, the stratified analyses showed that the AG genotype of CYP24A1 rs2762934 was associated with decreased LC risk in males (adjusted OR = 0.68; 95%CI = 0.46-0.99, P = 0.046). A similar result was observed in recessive model of males (adjusted OR = 0.68, 95%CI = 0.47-0.99, P = 0.044). We also identified that TC genotype of CYP24A1 rs6068816 has potential effect on reducing the susceptibility to LC of the type of small cell lung cancer (SCLC, adjusted OR = 0.58, 95%CI = 0.36-0.94, P = 0.026, Table 8).  Table 9).
Power calculations indicated that the sample size was large enough to detect differences between cases and controls for the candidate SNPs: the power values exceeded 0.8 except for some genetic models in the stratified analysis (Tables 3-8). However, no significant association was observed between the other SNPs and LC in the stratification analysis (Tables S2 and S3).

| The expression and prognostic value of candidate genes in LC patients
As shown in Figure 1, the expression of CYP2D6 was significantly up-regulated in large cell lung carcinoma, adenocarcinoma (AC) and squamous cell carcinoma (SCC) patients compared with normal samples (P < 0.05), and CYP24A1 and CYP20A1 were overexpressed in AC (P < 0.05). Kaplan-Meier curves and log-rank tests revealed that an increased CYP2D6 level and a decreased CYP20A1 level were significantly associated with poor overall survival (OS) in all LC patients (HR = 1.42, 95%CI = 1.25-1.62, P = 1.1e-07; HR = 0.72, 95%CI = 0.63-0.82, P = 4.2e-7, respectively; Figure 2). There was no significant association between expression of CYP24A1 and OS of LC (P = 0.098).

| DISCUSSION
Based on the metabolic characteristics of CYPs, we hypothesized that their polymorphisms were related to the risk of LC development. This study validated the potential relationship between four SNPs in three CYPs and the risk of LC development. We found that rs2762934 and rs6068816 in CYP24A1 decreased the risk of LC development in males and in SCLC, respectively, and CYP20A1 rs2043449 was identified as a risk factor for LC development in the male, stage III-IV, and SCLC subgroups. The most significant finding is that the "A" allele and "AA" genotype of CYP2D6 rs1065852 confer risk of LC, especially in stage III-IV, AC, SCC, lymph node-positive and male cases. These results suggest that susceptibility to LC may in part be determined by an individual's genetic background of CYPs.
CYP24A1, which is considered a proto-oncogene, encodes 24-hydroxylase, the rate-limiting enzyme that catalyzes the inactivation of 1,25(OH)2D3 (1,25-D3). 27 High 1,25-D3 levels have antidifferentiation and antiproliferation activities in human LC cell lines. 28 Earlier studies reported that the gene copy number of CYP24A1 is aberrantly amplified in several cancers, 29,30 and spontaneous upregulation of CYP24A1 is a negative prognosticator of survival in lung, breast, ovarian and colon cancer. 31 might promote the progression of colon cancer. 34 Wu et al found that the mutated homozygous genotype of CYP24A1 rs6068816 was significantly related to a decreased risk of non-small cell lung cancer (NSCLC) development among Chinese people. 35 Liu et al found that CYP24A1 rs2762934 contributed to the risk of food hypersensitivity and breast cancer. 36,37 In the present study, we found that rs2762934 and rs6068816 in CYP24A1 are protective factors against LC in males and in SCLC, respectively. Furthermore, CYP24A1 was significantly upregulated in LC. Ramnath et al revealed that promoter DNA hypermethylation of CYP24A1 is a key mechanism regulating CYP24A1 expression in LC. 38 CYP24A1 has a promoter region that is rich in CpG islands, and transcriptional silencing of the CYP24A1 gene caused by promoter hypermethylation would be conducive to the antiproliferative effects of 1,25-D3 in LC. Because rs6068816 is a synonymous polymorphism, it does not alter the amino acid sequence of CYP24A1; such SNPs, when located in splicing silencers or enhancers, may affect the phenotype by influencing the efficiency of mRNA splicing. rs2762934 is annotated as an intron variant and a 3'UTR variant. RNA-binding proteins bind cis-acting elements in the 3'UTR to regulate protein synthesis by influencing mRNA abundance. 39 Both variation in the 3'UTR sequence and abnormal expression of trans-acting factors can significantly influence the transcription and expression of target genes. A possible reason for the association of rs6068816 and rs2762934 in CYP24A1 with decreased risk of SCLC and of LC development in males is the alteration of post-transcriptional processes and dysfunction of the proteins. To our knowledge, this is the first clinical study to estimate the relationship between rs2762934 in CYP24A1 and LC susceptibility.
CYP2D6 is a member of the CYP450 superfamily of enzymes involved in the metabolism of therapeutic drugs and is a potential susceptibility factor for certain environmental agent-induced diseases. [40][41][42] It plays an important role particularly in the metabolism of PAHs, nicotine and other carcinogens related to LC. Several studies have shown that genetic polymorphisms of CYP2D6 increase susceptibility to numerous cancers. Polymorphisms of CYP2D6 have been reported to confer an increased risk of breast cancer and esophageal squamous cell carcinoma in people with a family history of cancer. 43, 44 Zienolddiny et al found that CYP2D6 and CYP1B1 increased genetic susceptibility to NSCLC. 45 In addition, Lee et al showed that hydroxychloroquine metabolism was related to CYP2D6 rs1065852 polymorphisms. 46 It has been confirmed that CYP2D6 participates in the metabolism of tobacco, nitrosamine, nicotine-derived nitrosamine ketone, nicotine and cotinine, as well as the activation of nitrosamine, all of which are common carcinogenic agents of LC. 47,48 In China, the proportion of smoking and tobacco-attributed mortality is much higher in males than in females. 49 SCC is one of the most common pathological types of smoking-related LC. 50 Therefore, the significantly increased risk of LC conferred by rs1065852 in males and SCC patients may be caused by the accumulation of smoking-related genetic damage. Meanwhile, a high level of CYP2D6 was found in SCC and AC, and survival analysis also indicated that high CYP2D6 expression was associated with poor prognosis of LC. CYP2D6 rs1065852 is located in the intron region of the CYP2D6 gene and is involved in intron mutation. Introns are important for RNA stability, regulation of gene expression and alternative splicing. Misregulation of alternative splicing is known to contribute to tumorigenesis, 51 and a missense variant at a base near the splice site could lead to protein and amino acid changes through aberrant splicing. We therefore speculate that the rs1065852 polymorphism may be involved in the development of LC by influencing mRNA splicing and the biological function of the gene products. To our knowledge, no previous study has validated the association between CYP2D6 rs1065852 and LC susceptibility, and the present study is the first to report the correlation between CYP2D6 rs1065852 and the increased risk of LC in Asians.
Stratified analysis also revealed significant associations between CYP20A1 rs2043449 and increased risk of LC in the male, stage III-IV, and SCLC subgroups. Although a high level of CYP20A1 predicted a better prognosis in the survival analysis, the conflict between these two outcomes might be due to sample size and to regional and ethnic differences. Previous studies showed that CYP20A1 is expressed in the human hippocampus and substantia nigra, suggesting its involvement in brain and early development. 52 As far as we know, CYP20A1 is considered an "orphan" CYP with no functional information. 53 Therefore, the mechanism by which rs2043449 affects tumor susceptibility in these subgroups remains unclear; further functional analysis of CYP20A1 in these subgroups may help to clarify the relevant genetic effects in LC pathogenesis.
TABLE 5 Analysis of association between candidate SNPs and the risk of lung cancer in genetic model. Abbreviations: AC, adenocarcinoma; CI, confidence interval; ORs, odds ratios; SCC, squamous cell carcinoma; SCLC, small cell lung cancer; TNM, tumor-lymph node-metastasis.
In this study, we identified four novel loci in three genes that show a significant association with LC development, and we examined the expression of the candidate genes in LC and the relationship between two of these genes and LC prognosis. Although the results showed strong statistical significance, the present research has several potential limitations. First, LC is a very heterogeneous disease with many other risk factors, and more genes need to be included in follow-up studies. Second, the study was conducted only among Chinese Han people in Northwest China, so further investigations are needed to confirm these associations in other populations. Third, the sample size was not large enough to support some genetic models in the stratified analyses. Finally, smoking data were not collected for the samples, and further study is needed to address the deficiencies of this research.

| CONCLUSION
In this study, we systematically evaluated the association between candidate genes and LC risk in a case-control study including 510 cases and 504 healthy controls. We found significant relationships of CYP2D6 rs1065852, CYP20A1 rs2043449, CYP24A1 rs2762934, and CYP24A1 rs6068816 with susceptibility to LC. In addition, we explored the overexpression of the candidate genes in LC and estimated the relationship between LC prognosis and gene expression level in a survival analysis using the Oncomine and Kaplan-Meier Plotter databases. These findings could help to elucidate the etiology of LC and might serve as diagnostic and prognostic molecular markers for LC in the Northwest Chinese Han population.

ACKNOWLEDGMENTS
The authors thank all the researchers and patients who participated in this study.