Association between sex hormones regulation‐related SNP rs12233719 and lung cancer risk among never‐smoking Chinese women

Abstract Background The mechanism behind the rapid increase in non‐small cell lung cancer (NSCLC) among never‐smoking Chinese women has not been elucidated. Ovarian sex steroid hormones have been suggested to counteract lung cancer development, and sex hormone‐binding globulin (SHBG) is essential in sex hormones regulation. This study aimed to explore single nucleotide polymorphisms (SNPs) in genomic regions associated with SHBG concentrations that contribute to never‐smoking female NSCLC. Methods Candidate genes were selected using a genome‐wide association study (GWAS) meta‐analysis and gene expression profiles of never‐smoking NSCLC in Chinese women. Candidate SNPs were limited to missense variants with a common minor allele frequency (MAF) and an ethnically heterogeneous distribution, and were genotyped using the TaqMan method. A two‐stage case‐control design was adopted for exploration and validation of associations between candidate SNPs and risk of NSCLC. All participants were never‐smoking Chinese women. The chi‐square test and multivariate logistic regression were applied. Results Beginning with 12 genomic regions associated with circulating SHBG concentrations and gene expression profiles from never‐smoking NSCLC in Chinese women, the candidate SNPs rs12233719 and rs7439366, both located in the candidate gene UGT2B7, which may be related to circulating SHBG concentrations and cancer risk, were identified. A two‐stage case‐control study was conducted in Shenyang and Tianjin as the training stage and validation stage, respectively. Under the dominant model, compared to individuals with the wild G/G genotype, the adjusted OR of those with the T allele was 1.58 (95% CI: 1.15–2.16) in the Shenyang training set and 1.49 (95% CI: 1.02–2.18) in the Tianjin validation set, both accompanied by a consistently significant trend. UGT2B7 was upregulated in female NSCLC patients’ tumor tissues and was associated with a poor prognosis in NSCLC.
Conclusion Our findings indicated that a sex hormones regulation‐related SNP rs12233719 was associated with never‐smoking female lung cancer risk, which might partially explain NSCLC‐susceptibility in Chinese women.


| INTRODUCTION
Lung cancer is both the most common cancer and the leading cause of cancer death in China and worldwide. 1,2 Research has shown that genetic factors, lifestyle, and environment act together in the development of non-small cell lung cancer (NSCLC), including genetic variations and alterations, tobacco, air pollution, history of lung disease, nutrition, and occupational exposure to radon, asbestos, and radiation. 3 The incidence of lung cancer in Chinese women, who have a low smoking prevalence, is unexplainably high. 4 Previous studies reported that a younger age at menopause was associated with an increased risk of lung cancer compared with an older age at menopause. 5,6 In another prospective epidemiologic study of never-smoking female patients with lung adenocarcinoma, premenopausal women had a poor prognosis, 7 postulating beneficial effects of ovarian sex steroid hormones. Women have better lung cancer survival than men, and recently both cancer-specific survival and overall survival of women with late-stage NSCLC were reported to be significantly improved by estrogen monotherapy. A meta-analysis of five prospective cohorts found a protective role of hormone therapy use in the prognosis of lung cancer, with a pooled hazard ratio (HR) for mortality of 0.80 (95% CI: 0.69-0.92). 8 Accumulated evidence has suggested a potential role of ovarian sex steroid hormones in female NSCLC.
Sex hormone-binding globulin (SHBG) is a glycoprotein that binds 17 beta-hydroxysteroid hormones, including testosterone and estradiol, with high affinity. Its concentration regulates the balance between bound and free hormone, so that it acts as a transport carrier and regulates the biological activities of sex hormones. 9 Previous studies have identified genetic variations of SHBG that contribute to hormone-sensitive cancer risk, 10 such as prostate 11 and breast cancer. 12 However, the impact of genetic variation related to SHBG concentrations on never-smoking female NSCLC risk has not been evaluated before. We aimed to fill this gap by conducting a two-stage case-control study to explore never-smoking female NSCLC susceptibility and 12 genomic regions associated with plasma SHBG concentrations, including 1p13. 3

| Study design
In this study, we adopted a two-stage case-control design 14 with participants from Shenyang (stage 1) and Tianjin (stage 2), China. In stage 1, we aimed to obtain a crude measure of association between candidate SNPs and lung cancer risk. In stage 2, after propensity score matching, the association between SNP and lung cancer risk was validated in an independent population with adjustment for other potential confounders.
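As a rough illustration of the crude measure of association mentioned for stage 1, the sketch below computes an unadjusted odds ratio with a Wald 95% confidence interval under a dominant model (T-allele carriers versus G/G). The function name and the genotype counts are hypothetical and are not taken from the study; the adjusted estimates reported in this paper come from logistic regression, which this sketch does not reproduce.

```python
import math


def dominant_model_or(case_counts, control_counts, z=1.96):
    """Crude odds ratio for carriers (G/T + T/T) versus G/G with a Wald CI.

    Each argument is a (GG, GT, TT) tuple of genotype counts.
    Returns (odds_ratio, lower_limit, upper_limit).
    """
    case_gg, case_gt, case_tt = case_counts
    ctrl_gg, ctrl_gt, ctrl_tt = control_counts

    a = case_gt + case_tt  # carrier cases
    b = case_gg            # non-carrier cases
    c = ctrl_gt + ctrl_tt  # carrier controls
    d = ctrl_gg            # non-carrier controls
    if min(a, b, c, d) == 0:
        raise ValueError("A zero cell makes the Wald interval undefined.")

    odds_ratio = (a * d) / (b * c)
    se_log_or = math.sqrt(1 / a + 1 / b + 1 / c + 1 / d)
    lower = math.exp(math.log(odds_ratio) - z * se_log_or)
    upper = math.exp(math.log(odds_ratio) + z * se_log_or)
    return odds_ratio, lower, upper


# Hypothetical genotype counts (GG, GT, TT), for illustration only.
crude_or, lower, upper = dominant_model_or((80, 18, 2), (90, 9, 1))
print(f"crude OR = {crude_or:.2f} (95% CI: {lower:.2f}-{upper:.2f})")
```

Collapsing G/T and T/T into one carrier group is what makes this a dominant-model comparison; an additive (trend) model would instead score genotypes as 0, 1, and 2 copies of the T allele.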

| Study subjects
In stage 1, the Shenyang training stage, 417 cases and 368 controls were recruited from April 2011 to July 2015 in Shenyang, China. Cases were never-smoking women aged over 18 years with newly diagnosed, histologically confirmed NSCLC who had not received radiotherapy or chemotherapy. Controls were healthy, cancer-free participants recruited from residents of the same or nearby communities. Stage 2, the Tianjin validation stage, was conducted at Tianjin Medical University Cancer Hospital, included 282 cases and 282 controls, and was approved by ethical review. Participants were recruited in the same way as described above from January 2006 to May 2011. All subjects were genetically unrelated Han Chinese. Demographic characteristics were collected with structured questionnaires through in-person interviews, and a peripheral blood sample (10 ml) was collected from each participant in an ethylenediaminetetraacetic acid (EDTA) vacutainer tube. This study was approved by the Medical Ethics Committees of Human Studies at Tianjin Medical University Cancer Hospital, and written informed consent was signed by each participant.

| RNA extraction and transcriptome sequencing
Eleven never-smoking female NSCLC patients with paired tumor and adjacent tissues were used for RNA extraction and transcriptome sequencing. Differentially expressed genes were identified with cuffdiff 2.2.1 based on the criterion of significant alteration (false discovery rate p < 0.05 and absolute fold change >1.5). Sequencing data were processed in general pipelines. 15

KEYWORDS
Chinese women, lung cancer, never-smoking, sex hormones regulation, SNP

| SNP selection and genotyping
SNPs were selected using information from NCBI dbSNP and the 1000 Genomes Project 16 according to the following criteria: (a) located in the 12 genomic regions associated with plasma SHBG concentrations; (b) located in genes differentially expressed between tumor and adjacent tissues in never-smoking female NSCLC; (c) located in genes with functions related to SHBG concentration and sex hormones regulation; (d) SNP function class limited to missense; (e) MAF > 0.05 in CHB according to the 1000 Genomes Project; (f) ethnically heterogeneous distribution between Chinese and Caucasian populations. The selected polymorphisms were genotyped using the TaqMan method as reported previously. 17
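The SNP-level criteria (d) to (f) can be sketched as a simple filter, assuming the gene-level criteria (a) to (c) have already been applied. The record fields, the SNP identifiers, the allele frequencies, and in particular the 0.10 frequency-difference threshold used for "ethnic heterogeneity" are hypothetical; the text does not state how heterogeneity between populations was quantified.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class CandidateSnp:
    snp_id: str
    function_class: str  # e.g. "missense", "synonymous", "intron"
    maf_chb: float       # minor allele frequency in Han Chinese (CHB)
    maf_ceu: float       # minor allele frequency in a Caucasian panel


MIN_MAF = 0.05             # criterion (e) from the text
MIN_MAF_DIFFERENCE = 0.10  # hypothetical stand-in for criterion (f)


def passes_filters(snp: CandidateSnp) -> bool:
    """Apply criteria (d), (e), and (f) to one SNP record."""
    is_missense = snp.function_class == "missense"
    is_common = snp.maf_chb > MIN_MAF
    is_heterogeneous = abs(snp.maf_chb - snp.maf_ceu) >= MIN_MAF_DIFFERENCE
    return is_missense and is_common and is_heterogeneous


# Hypothetical records, not real allele frequencies.
snps = [
    CandidateSnp("snp_A", "missense", 0.15, 0.01),    # passes all three
    CandidateSnp("snp_B", "synonymous", 0.30, 0.05),  # fails (d)
    CandidateSnp("snp_C", "missense", 0.02, 0.40),    # fails (e)
    CandidateSnp("snp_D", "missense", 0.20, 0.22),    # fails (f)
]
selected = [snp.snp_id for snp in snps if passes_filters(snp)]
print(selected)
```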

| Bioinformatics analysis
Bioinformatics analysis was performed on publicly available data sets. The combined data from GSE32863 and GSE37764 after batch normalization were used to explore the UGT2B7 expression level in tumor and paired adjacent normal samples. GSE32863 is a gene expression profiling data set of paired samples from lung adenocarcinoma patients, including 16 paired samples from Asian never-smoking women. GSE37764 contains high-throughput multidimensional sequencing data of primary non-small cell lung adenocarcinoma tumors and adjacent normal tissues from six never-smoking Korean female patients. Meanwhile, GSE11969 and GSE13213 were obtained from the Gene Expression Omnibus (GEO) database and contained 17 and 7 never-smoking female Japanese adenocarcinoma patients with EGFR mutations, respectively. The combined data from GSE11969 and GSE13213 after batch normalization were used to explore the relationship between UGT2B7 expression and the prognosis of lung cancer.

| Statistical analysis
Continuous variables with a normal distribution are presented as the mean ± SD, and comparisons between cases and controls were conducted using the independent-sample t-test or the Mann-Whitney U test, as appropriate. Propensity score matching (PSM) analysis was conducted using the exact matching method, and the standardized mean difference was used to examine the balance of covariate distribution between cases and controls. Hardy-Weinberg equilibrium was tested in each control sample. Odds ratios (ORs) and corresponding 95% confidence intervals (95% CIs) were calculated to evaluate the associations between SNPs and NSCLC risk. To explore the independent effect of SNPs on the risk of NSCLC, we adjusted for potential confounders, including age and family history of cancer, using a multivariable conditional logistic regression model. The HR was calculated to evaluate the association between UGT2B7 expression and the prognosis of never-smoking female NSCLC. Statistical significance was defined as a two-sided p-value < 0.05. SAS version 8.2 (SAS Institute Inc.), R software (version 4.0.2), and GraphPad Prism 6.0 (GraphPad Software Inc.) were used in the analysis.

Figure 1 shows how candidate genes were selected. Beginning with 12 genomic regions associated with circulating SHBG concentrations, 465 genes were identified. Meanwhile, the transcriptome sequencing analysis of never-smoking Chinese female NSCLC patients showed that UGT2B7 mRNA was upregulated in tumor tissues (data not shown), and 4098 genes were identified. Finally, 91 candidate genes that may be related to circulating SHBG concentrations and cancer risk were selected. Considering gene function, only UGT2B7 was included, as it has been reported to be related to SHBG concentrations and sex hormones regulation (Figure 1).
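One way to read the gene selection step is as an overlap between the genes in the SHBG-associated regions and the differentially expressed genes. The sketch below assumes that the candidate genes are exactly this intersection, which the text does not state explicitly. Apart from UGT2B7, the gene symbols are placeholders, and the function name is hypothetical.

```python
def select_candidate_genes(shbg_region_genes, differentially_expressed_genes):
    """Return the genes present in both collections, sorted alphabetically."""
    overlap = set(shbg_region_genes) & set(differentially_expressed_genes)
    return sorted(overlap)


# Placeholder gene lists; only UGT2B7 is named in the source text.
shbg_region_genes = {"UGT2B7", "GENE_A", "GENE_B"}
differentially_expressed_genes = {"UGT2B7", "GENE_B", "GENE_C"}

candidates = select_candidate_genes(shbg_region_genes,
                                    differentially_expressed_genes)
print(candidates)
```

A later functional review, as described in the text, would then narrow such a list to genes with known roles in SHBG and sex hormone regulation.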

| Selection of candidate SNPs
Starting from the candidate gene UGT2B7 and combining information from the 1000 Genomes Project, Figure 1B shows how candidate SNPs were selected. Finally, two SNPs, rs12233719 and rs7439366, both common missense variants, were identified in our study (Figure 2). The Hardy-Weinberg equilibrium test is shown in Table S1.
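For readers unfamiliar with the Hardy-Weinberg equilibrium check, a minimal sketch of the usual one-degree-of-freedom chi-square test on control genotype counts follows. The function name and counts are hypothetical; whether the study used this chi-square form or an exact test is not stated, so treat this as an assumption.

```python
import math


def hwe_chi_square(n_gg, n_gt, n_tt):
    """Chi-square goodness-of-fit test for Hardy-Weinberg equilibrium.

    Returns (chi_square, p_value) with one degree of freedom.
    """
    n = n_gg + n_gt + n_tt
    if n == 0:
        raise ValueError("At least one genotype is required.")

    p = (2 * n_gg + n_gt) / (2 * n)  # frequency of the G allele
    q = 1.0 - p                      # frequency of the T allele
    observed = (n_gg, n_gt, n_tt)
    expected = (n * p * p, 2 * n * p * q, n * q * q)

    chi_square = sum((o - e) ** 2 / e
                     for o, e in zip(observed, expected) if e > 0)
    # Upper tail of a chi-square distribution with 1 degree of freedom.
    p_value = math.erfc(math.sqrt(chi_square / 2))
    return chi_square, p_value


# Hypothetical control genotype counts (GG, GT, TT).
chi_square, p_value = hwe_chi_square(81, 18, 1)
print(f"chi-square = {chi_square:.3f}, p = {p_value:.3f}")
```

A p-value above the chosen threshold in controls is conventionally read as no evidence of departure from equilibrium, which helps rule out gross genotyping error.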

| Association between rs12233719 and never-smoking NSCLC risk in Chinese women
FIGURE 2 The location information of the candidate gene and SNPs. UGT2B7 is located at 4q13.2; SNPs rs12233719 and rs7439366 are annotated by red and blue tags, respectively.

We conducted a two-stage case-control study in Shenyang (417 cases and 368 controls) and Tianjin (282 cases and 282 controls) as the training and validation stages, respectively. Propensity score matching (PSM) analysis was conducted to balance the distribution of age in the Tianjin validation set (Table S2). Under the dominant model, compared to individuals with the wild G/G genotype of rs12233719, the adjusted odds ratio (OR) of those with the T allele was 1.58 (95% CI: 1.15-2.16) in the Shenyang training set and 1.49 (95% CI: 1.02-2.18) in the Tianjin validation set. The trend analysis showed a significant relationship in both data sets (Table 1). In the meta-analysis of the combined training and validation sets, the T allele of rs12233719 was associated with increased never-smoking lung cancer risk in Chinese women, and the combined OR under the dominant model was 1.54 (95% CI: 1.21-1.96) (Figure 3). For rs7439366, no significant association was found (Table S3).
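The combined estimate can be approximately reproduced from the two stage-specific results with a fixed-effect, inverse-variance meta-analysis on the log odds ratio scale. This is a sketch under the assumption that a fixed-effect model was used, which the text does not specify. Standard errors are back-calculated from the published, rounded confidence limits, and the function names are illustrative.

```python
import math


def log_or_and_se(odds_ratio, lower, upper, z=1.96):
    """Recover log(OR) and its standard error from a reported 95% CI."""
    se = (math.log(upper) - math.log(lower)) / (2 * z)
    return math.log(odds_ratio), se


def pool_fixed_effect(estimates, z=1.96):
    """Inverse-variance fixed-effect pooling of (OR, lower, upper) tuples."""
    weights = []
    weighted_log_ors = []
    for odds_ratio, lower, upper in estimates:
        log_or, se = log_or_and_se(odds_ratio, lower, upper, z)
        weight = 1.0 / se ** 2
        weights.append(weight)
        weighted_log_ors.append(weight * log_or)

    pooled_log_or = sum(weighted_log_ors) / sum(weights)
    pooled_se = math.sqrt(1.0 / sum(weights))
    return (math.exp(pooled_log_or),
            math.exp(pooled_log_or - z * pooled_se),
            math.exp(pooled_log_or + z * pooled_se))


# Stage-specific adjusted ORs reported in the text (dominant model).
shenyang = (1.58, 1.15, 2.16)
tianjin = (1.49, 1.02, 2.18)

pooled_or, pooled_lower, pooled_upper = pool_fixed_effect([shenyang, tianjin])
print(f"pooled OR = {pooled_or:.2f} "
      f"(95% CI: {pooled_lower:.2f}-{pooled_upper:.2f})")
```

With these inputs the pooled OR comes out at about 1.54 with limits near 1.21 and 1.97, in line with the reported 1.54 (1.21-1.96). A small discrepancy in the last digit is expected because the inputs are rounded to two decimals.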

| UGT2B7 mRNA expression in NSCLC
In the batch-normalized combined data set of GSE32863 and GSE37764, UGT2B7 was highly expressed in tumor tissues compared with adjacent normal tissues (p = 0.012) (Figure 4A). This result was consistent with our transcriptome sequencing analysis of 11 paired never-smoking female NSCLC samples (data not shown). Then, the relationship between UGT2B7 expression and prognosis was explored in the combined data from GSE11969 and GSE13213 after batch normalization. Compared with low UGT2B7 expression, patients with high UGT2B7 expression had significantly poorer overall survival among never-smoking female Asian lung adenocarcinoma patients with EGFR mutation (HR = 4.80, 95% CI = 1.18-14.21, p = 0.027) (Figure 4B).

| DISCUSSION
TABLE 1 Association between rs12233719 (G>T) and never-smoking NSCLC risk in women of the Shenyang training set and the Tianjin validation set

In this study, by conducting a two-stage case-control study, we explored whether SNPs in genomic regions associated with SHBG concentrations contributed to never-smoking female NSCLC. Our findings indicated that the sex hormones regulation-related SNP rs12233719 was associated with NSCLC risk among never-smoking Chinese women, which might partially explain NSCLC susceptibility in Chinese women.
In this study, UGT2B7 at locus 4q13.2 was identified as the candidate gene, and rs12233719 and rs7439366, both located in UGT2B7, as the candidate SNPs. The UGT2B7 gene is located on chromosome 4q13.2, spans 16 kb, is widely expressed in the breast, lung, kidney, and intestine, and is considered to be a highly polymorphic gene. 18 UGT2B7 consists of six exons and five introns and encodes 529 amino acid residues. UGT2B7 is a well-known pharmacogene belonging to the uridine diphosphate glucuronyltransferase (UGT) gene family, 19 which plays an essential role in estrogen regulation, and its enzymatic activity has been found to be changed by genetic variants. 20 Notably, UGT2B7 has unique specificity for estrogens and catechol estrogens, including estradiol, estriol, 4-OH-estrone, and 4-OH-estradiol, and serves a significant role in the elimination of exogenous and endogenous estrogens. 21 Genetic variation in UGT2B7 was also recently reported to increase SHBG levels in premenopausal women with oral contraceptive use. 22 Many studies have revealed associations

Our study is the first to identify the SNP rs12233719 G>T polymorphism as a risk factor for never-smoking female NSCLC, which could partially explain NSCLC susceptibility among never-smoking Chinese women. In the coding region of UGT2B7, rs12233719 is another missense mutation at position 211 (G211T). The G>T transversion at position 211 is associated with an amino acid change from Ala71 to Ser71, which results in a change from a lipophilic residue to a hydrophilic residue. 19,23 However, unlike previous studies that reported no statistically significant association between the UGT2B7 rs12233719 SNP and breast cancer risk in any genetic model, 24 our study found the SNP rs12233719 G>T polymorphism to be a risk factor for never-smoking female NSCLC.
A possible explanation might be that rs12233719 is a non-synonymous SNP that substitutes serine for alanine at amino acid 71 and alters the physical and chemical properties of this position, 25 suggesting that the polymorphism probably affects the expression and enzymatic activity of UGT2B7. Because the mechanism by which rs12233719 affects NSCLC is unknown, more functional experiments and further studies are urgently needed to elucidate the underlying linkage. Besides, given the source of the biological samples and the population used for SNP selection, the findings of our study are limited to the Chinese female population and should be interpreted cautiously.
We found that rs7439366 was not associated with never-smoking female NSCLC risk in our study. The rs7439366 polymorphism is a common missense variant located in exon 2 of UGT2B7, which gives rise to enzymes with either histidine (H) or tyrosine (Y) at amino acid 268. 26 Rs7439366 is a T to C transition at nucleotide 802, corresponding to a Tyr to His conversion at residue 268. The C allele frequencies were 0.489 and 0.732 in Caucasians and Japanese, respectively. 27 Many previous studies evaluated the role of the UGT2B7 rs7439366 SNP in cancer risk. To explore the role of UGT2B7 SNPs in cancer susceptibility, a meta-analysis pooled all eligible studies and found no significant association in any genetic model. Further subgroup analysis found that rs7439366 was associated with colorectal cancer risk. 28

Consideration should also be given to potential limitations of this study. First, we did not perform in vivo or in vitro experiments to confirm the relationship between rs12233719 and SHBG concentrations. Second, we adjusted for only a few potential confounders in the multivariable logistic regression model, which may lead to bias in an unpredictable direction. Residual confounders within individual participants, such as menopausal status, may affect the estimate. Menopausal status is closely correlated with women's lung cancer risk, but we were not able to assess it because of the limited data source. Thus, larger studies, especially prospective studies, are warranted in the future. Third, we did not prospectively estimate the sample size for the two-stage case-control study or for the transcriptome sequencing experiment. Because of the retrospective nature of the case-control design, we conducted the exploration and validation in a two-stage case-control study for screening SNPs without prospective sample size estimation.
Meanwhile, the transcriptome sequencing of 11 paired NSCLC tumor and control samples for screening differential genes in never-smoking female NSCLC was performed several years ago and was constrained by limited clinical samples. Besides, it was a preliminary exploration, and proper reference data for sample size estimation were lacking. Therefore, the results should be interpreted with caution. In the next step, we will try to expand the sample size based on this study.

| CONCLUSIONS
In summary, we found that rs12233719, involved in sex hormones regulation, was associated with NSCLC risk among never-smoking Chinese women. Our finding suggests that the SNP rs12233719 in sex hormones regulation may be significant in Chinese female NSCLC, and provides new insight into the possible role of ovarian sex steroid hormones in lung cancer.

ETHICS APPROVAL AND CONSENT TO PARTICIPATE
This study was approved by the Medical Ethics Committees of Human Studies at China Medical University, and written informed consent was signed by each participant.