Effect of the thymine‐DNA glycosylase rs4135050 variant on Saudi smoker population

Abstract Background Thymine‐DNA glycosylase (TDG) is an essential DNA‐repair enzyme that functions in both epigenetic regulation and genome maintenance. It is also responsible for the efficient correction of multiple endogenous DNA lesions that occur commonly in mammalian genomes. Research on disease‐associated genetic variants such as SNPs is expected to yield clinical advances through the identification of sensitive genetic markers and the development of disease prevention and therapy. To that end, the main objective of the present study is to identify possible interactions between cigarette smoking and the rs4135050 variant of the TDG gene, located in an intron, among Saudi individuals. Methods TDG rs4135050 (A/T) was genotyped in blood specimens obtained from 239 nonsmokers and 235 cigarette smokers. Results The T allele showed a significant protective effect in Saudi male smokers (OR = 0.64, p = 0.0187) compared to nonsmoking subjects, but not in female smokers. Furthermore, in smokers aged less than 29 years, the AT and AT+TT genotypes decreased the risk of initiation of smoking‐related diseases more than fourfold compared to the ancestral AA homozygous genotype. Paradoxically, the AT (OR = 3.88, p = 0.0169) and AT+TT (OR = 2.86, p = 0.0420) genotypes were present at a higher frequency in smoking patients aged more than 29 years than in nonsmokers of the same age. Conclusion Depending on the gender and age of patients, TDG rs4135050 may provide a novel biomarker for the early diagnosis and prevention of several diseases caused by cigarette smoking.

Puri, 2014), breast tumor (Verde et al., 2016), cardiovascular diseases, and asthma (Kovacs et al., 2012). CS may also cause somatic mutations in nonmalignant tissues (Boran et al., 2017). A recent study reported strong evidence that CS is a key driver of genomic instability and heterogeneity, which may lead to the initiation of diverse types of cancer, such as lung, bladder, and colorectal cancers (Kytola et al., 2017). Furthermore, CS may induce mutations in the cell cycle-control gene p53 (TP53 in humans), which are a major cause of cancer risk among various ethnic populations (Gibbons, Byers, & Kurie, 2014; Kytola et al., 2017; Liu et al., 2014; Wu et al., 2015).
Numerous toxic compounds are present in cigarettes, including reactive oxygen species, which may damage DNA and increase susceptibility to cancer (Kovacs et al., 2012; Pryor, 1997). DNA repair genes have a fundamental role in maintaining genome integrity by repairing the nucleotides damaged by CS (Sjolund et al., 2014) through various DNA repair processes, including the nucleotide excision repair mechanism, the base excision repair (BER) mechanism, and the mismatch repair mechanism (Christmann, Tomicic, Roos, & Kaina, 2003; Yu, Chen, Ford, Brackley, & Glickman, 1999). The BER process repairs DNA damage caused by endogenous and environmental agents. The BER pathway is generally initiated by DNA glycosylase enzymes, which recognize and excise mismatched and/or damaged nucleotides (Da, Shi, Ning, & Yu, 2018; Sjolund et al., 2014; Sjolund, Senejani, & Sweasy, 2013). Thymine DNA glycosylase (TDG) begins the BER pathway by cleaving the N-glycosidic bond between the targeted DNA base and the deoxyribose sugar. The TDG gene in humans is situated on chromosome 12q24.1, contains 10 exons, and encodes a protein of 410 amino acids (Cortazar, Kunz, Saito, Steinacher, & Schar, 2007). TDG is well known for its ability to catalyze the excision of uracil and thymine mispaired with guanine (Sjolund et al., 2014). It is an essential DNA-repair enzyme that functions in both epigenetic regulation and genome maintenance (Dodd, Yan, Kossmann, Martin, & Ivanov, 2018). It is also responsible for the efficient correction of multiple endogenous DNA lesions that commonly occur in mammalian genomes (Sjolund et al., 2014). DNA repair is a fundamental process in maintaining the stability of the human genome, and abnormal activity of this process may lead to cancer susceptibility (Huang et al., 2015). Genomic instability caused by DNA lesions may contribute to the inefficiency of DNA repair genes (Paz et al., 2017).
Therefore, studying genetic variants such as SNPs and how they lead to disease will likely result in clinical advances through the identification of sensitive genetic markers and the development of disease prevention and therapy (Saenko & Rogounovitch, 2018). Although the role of SNPs in disease development is not thoroughly understood, they have been widely detected in multiple diseases (Bonassi et al., 2005). Shedding light on the role in disease pathogenesis of SNPs in genes of the BER pathway is of particular value, as the BER mechanism is sensitive to diverse endogenous and exogenous factors and may be considered a biomarker of DNA damage (Huang et al., 2015). However, no previous studies have evaluated the effect of the rs4135050 polymorphic variants on the molecular activity of TDG and the subsequent functioning of the BER system. Genetic polymorphisms in DNA repair genes such as TDG have been cited as a major influence in the development of various types of cancer, because they can modify and alter gene function (de Boer, 2002; Xi, Jones, & Mohrenweiser, 2004). One study described the TDG rs4135113 SNP as ordinarily heterozygous with a minor allele frequency of 10%, commonly detected in African and East Asian individuals (Maiti, Morgan, Pozharski, & Drohat, 2008). This SNP may drive tumorigenesis (Sjolund et al., 2014) and is also associated with esophageal squamous cell carcinoma in the Chinese population (Li et al., 2013).
The main objective of the present study is to identify the possible correlation between CS and the genetic polymorphism TDG rs4135050 among Saudi individuals. As a potential biomarker, this polymorphism has practical applications not only in the diagnosis of diseases associated with CS but also in the prevention of CS effects in healthy individuals.

| Ethical compliance
The study was approved by an ethical committee of Applied Medical Sciences College, King Saud University (KSU), Riyadh, Kingdom of Saudi Arabia (KSA) (ethical approval reference number CAMS 13/3536).

| Specimen collection from participants
The participants were 474 Saudi men and women who visited Aleman Public Hospital in Riyadh, Kingdom of Saudi Arabia (KSA), from January 2016 to January 2018. Among them, 239 were nonsmokers, and the other 235 were cigarette smokers matched to the nonsmokers in ethnicity and age. The smokers and nonsmokers completed a self-administered questionnaire about smoking frequency, smoking status, age, gender, and family history. Self-reported CS history and medical history, including allergy symptoms and disease, were also obtained from the questionnaire. All procedures were performed according to ethical standards. Exclusion criteria included a history of any kind of inflammatory and/or chronic respiratory disease and a family history of cancer. Blood specimens were collected from both groups for genotyping of the TDG gene. A detailed description of the study subjects' general characteristics is given in Table 1.

| Genomic DNA isolation from blood samples
First, 3-ml blood specimens were collected from the subjects in EDTA-coated (anticoagulant) tubes. Next, genomic DNA was immediately purified from peripheral lymphocytes (200 μl) with the DNA Blood Mini Kit (QIAGEN) according to standard procedures. The purified DNA samples were then preserved at −80°C until molecular analyses were performed. Finally, a spectrophotometer (NanoDrop 8000, Thermo Fisher Scientific) was used to measure the concentration and purity of the isolated DNA. If the A260/A280 ratio of a purified DNA sample was not between 1.7 and 2.0, the isolated DNA was deemed contaminated and excluded from the study.
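The purity criterion described above can be expressed as a simple filter. The following is a minimal sketch, assuming absorbance readings are available as plain numbers and that the bounds 1.7 and 2.0 are inclusive; the function name `passes_purity_check` and the sample readings are hypothetical and are not part of the study's workflow.

```python
def passes_purity_check(a260: float, a280: float,
                        low: float = 1.7, high: float = 2.0) -> bool:
    """Return True when the A260/A280 ratio lies within [low, high]."""
    if a280 <= 0:
        return False  # a non-positive A280 reading cannot yield a valid ratio
    ratio = a260 / a280
    return low <= ratio <= high


# Hypothetical readings (sample ID -> (A260, A280)), for illustration only.
readings = {
    "S001": (0.95, 0.50),  # ratio 1.90
    "S002": (0.80, 0.55),  # ratio about 1.45
    "S003": (1.05, 0.50),  # ratio 2.10
}

# Keep only samples whose ratio falls inside the accepted window.
kept = [sample_id for sample_id, (a260, a280) in readings.items()
        if passes_purity_check(a260, a280)]
print(kept)
```

Treating the bounds as inclusive is a design choice made here for illustration; the text does not state how boundary values were handled.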

| TDG SNP selection and genotyping
Before genotyping, 10 ng of genomic DNA was prepared from each blood specimen. TDG (GenBank reference sequence NC_000012.12; accession number NC_000012; region 103965815...103988878) SNP rs4135050 (A/T) was selected from the NCBI database (http://www.ncbi.nlm.nih.gov/snp) based on its location, allele frequency, and disease relevance among diverse ethnic groups. Each sample was genotyped in a 10-μl reaction using a TaqMan assay. Each 10-μl reaction comprised the following components: DNA template (2 μl), 40× TaqMan® Genotyping SNP Assay (0.2 μl) (Applied Biosystems), TaqMan® Genotyping Master Mix (5.3 μl) (Applied Biosystems), and DNase-free water (2.5 μl). A negative control was prepared by substituting the DNA with an equivalent volume of DNase-free water. The DNA was amplified in 96-well plates under the following PCR cycle conditions: an initial denaturation stage at 95°C for 5 min, followed by 40 cycles of 95°C for 30 s (denaturation), 60°C for 30 s (annealing), and 72°C for 30 s (extension), and a final extension stage at 72°C for 5 min. PCR was performed on a QuantStudio™ 7 Flex Real-Time PCR System (Applied Biosystems) with sequence-detection software for data analyses.
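As a quick arithmetic check of the reaction composition listed above, the per-reaction volumes can be summed and scaled to a full plate. This is only a sketch: the per-reaction volumes come from the text, whereas the 10% pipetting overage, the choice of 96 reactions, and the helper names `total_volume` and `scale_premix` are assumptions introduced for illustration.

```python
# Per-reaction volumes in microliters, as listed in the text.
REACTION_UL = {
    "DNA template": 2.0,
    "40x TaqMan Genotyping SNP Assay": 0.2,
    "TaqMan Genotyping Master Mix": 5.3,
    "DNase-free water": 2.5,
}


def total_volume(mix):
    """Sum the component volumes of a single reaction (microliters)."""
    return round(sum(mix.values()), 3)


def scale_premix(mix, n_reactions, overage=0.10, template_key="DNA template"):
    """Scale the shared reagents (everything except the template) for
    n_reactions, adding a fractional overage for pipetting loss.
    The 10% default overage is an assumption, not a value from the study."""
    factor = n_reactions * (1.0 + overage)
    return {name: round(volume * factor, 2)
            for name, volume in mix.items() if name != template_key}


print(total_volume(REACTION_UL))   # components should add up to 10 microliters
premix = scale_premix(REACTION_UL, 96)
for name, volume in premix.items():
    print(f"{name}: {volume} ul")
```

The template is excluded from the premix because it differs per well, and in the negative control it is replaced by the same volume of DNase-free water.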

| Statistical methods
All statistical analyses were performed with the Statistical Package for the Social Sciences (SPSS) software (version 16.0, SPSS) and Microsoft Excel. Hardy-Weinberg equilibrium (HWE) of the SNP genotype distributions in the smoking and nonsmoking groups was assessed by the chi-squared test. Allele and genotype frequencies were compared between the groups using both the chi-squared test and Fisher's exact test. Multiple logistic regression analyses were used to determine the odds ratios (ORs) and 95% confidence intervals (CIs), in order to examine the strength of the association between the TDG SNP and CS. Statistical significance was defined as p < 0.05.

Table 1 describes the basic features of the smoking and nonsmoking participants. The population comprised a total of 474 Saudi individuals: 235 male and female CS patients and 239 male and female nonsmoking controls. No significant differences were found between the two groups in age, gender, and smoking characteristics (see Table 1). The mean ages of the nonsmoking control group and the smoking patients were almost equal (27.9 ± 8.63 and 28.5 ± 5.36, respectively). The smoking group was divided into those who had smoked cigarettes for more than 5 years (40% of the group) and those who had smoked cigarettes for 5 years or less (60%). The smoking group was further separated into two categories based on the average daily number of cigarettes smoked: smokers who smoked more than 10 cigarettes a day (53%) and those who smoked fewer than 10 cigarettes a day (47%, Table 1). CS within the family was reported in 63% of the smoking group and in 33% of the nonsmoking group. Finally, 66% of the smoking patients had stopped CS for a period and then started again, and parental consanguinity was reported in 40% of tobacco users and 38% of nonsmokers (Table 1).

| Genetic variation in TDG SNP rs4135050 with CS
The relation between the genetic polymorphism rs4135050 (A/T) of the TDG gene and CS was investigated in the smoking and nonsmoking groups of the Saudi Arabian population, with genotype distributions assessed for HWE. The TDG SNP is located in an intron. Table 2 shows the allele and genotype distributions of TDG SNP rs4135050 among the smokers and the nonsmoking control group. In this population, the homozygous ancestral genotype (AA) was used as the reference for detecting any associated CS risk. No significant correlations were observed between any smoking behavior and the selected TDG SNP. The genotype distribution of this SNP was 9% AA, 36% AT, and 55% TT in smoking patients and 8% AA, 31% AT, and 61% TT in the nonsmoking control group. The frequency of the T allele, relative to the wild-type A allele, did not differ between the two groups: the T allele was present at 73% and 77% among the smokers and the controls, respectively, compared to 27% and 23% for the reference A allele (Table 2). Narrower confidence intervals indicate more precise estimates and greater statistical power.
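The relationship between the genotype percentages and the allele percentages quoted above, and the HWE chi-squared test named in the statistical methods, can be sketched as follows. This is an illustrative reimplementation in plain Python, not the SPSS procedure used in the study; the function names are hypothetical, the genotype proportions are the rounded percentages from the text, and the test assumes both alleles are present in the sample.

```python
import math


def allele_frequencies(p_aa, p_at, p_tt):
    """Return (freq_A, freq_T) from genotype proportions."""
    total = p_aa + p_at + p_tt
    freq_a = (p_aa + 0.5 * p_at) / total  # each heterozygote carries one A
    return freq_a, 1.0 - freq_a


def hwe_chi_square(n_aa, n_at, n_tt):
    """Chi-squared goodness-of-fit test against Hardy-Weinberg
    proportions (1 degree of freedom). Takes genotype counts."""
    n = n_aa + n_at + n_tt
    p = (2 * n_aa + n_at) / (2 * n)  # frequency of allele A
    q = 1.0 - p
    expected = (n * p * p, 2 * n * p * q, n * q * q)
    observed = (n_aa, n_at, n_tt)
    chi2 = sum((o - e) ** 2 / e for o, e in zip(observed, expected))
    # Survival function of the chi-squared distribution with 1 df.
    p_value = math.erfc(math.sqrt(chi2 / 2.0))
    return chi2, p_value


# Rounded genotype proportions reported in the text (AA, AT, TT).
smoker_a, smoker_t = allele_frequencies(0.09, 0.36, 0.55)
control_a, control_t = allele_frequencies(0.08, 0.31, 0.61)
print(f"smokers:  A = {smoker_a:.3f}, T = {smoker_t:.3f}")
print(f"controls: A = {control_a:.3f}, T = {control_t:.3f}")
```

Because the inputs are rounded percentages, the computed allele frequencies agree with the reported 27%/73% and 23%/77% only up to rounding. `hwe_chi_square` expects genotype counts, which are given in Table 2 rather than in the text.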

| Frequencies of TDG SNP rs4135050 according to smoking duration
As shown in Table 3, the smokers were divided into 93 long-term smokers (>5 years) and 142 short-term smokers (≤5 years) to investigate any association between the selected TDG SNP and the duration of smoking (Table 3A,B). The allele and genotype frequencies of the rs4135050 SNP did not show any relationship with CS in long- or short-term smokers when compared to nonsmokers (Table 3A,B). In nonsmokers, the genotype frequencies were 8% AA, 31% AT, and 61% TT. In long-term and short-term smokers, the genotype frequencies were 8% and 11% AA, 38% and 36% AT, and 54% and 53% TT, respectively. The allele frequencies of this SNP were 23% A and 77% T in those who had never smoked, 27% A and 73% T in long-term smokers, and 29% A and 71% T in short-term smokers (Table 3A,B).

| Relationship between TDG SNP rs4135050 and daily CS
As presented in Table 3C,D, the smoking subjects were divided into heavy smokers (≥10 cigarettes per day, 125 subjects) and moderate smokers (<10 cigarettes per day, 110 subjects) in order to investigate the relationship between the allele and genotype frequencies of the TDG polymorphism and the daily rate of cigarette consumption. The SNP analyses of the TDG gene did not show any statistically significant relationship in the heavy or moderate smokers as compared to the nonsmoking group (Table 3C,D). For example, the genotype frequencies were 10% and 9% for the AA reference genotype, 33% and 38% for heterozygous AT, and 57% and 53% for double mutant TT in heavy and moderate smokers, respectively. By comparison, the control group showed 8% AA, 31% AT, and 61% TT, with allele frequencies of 23% A and 77% T in nonsmoking individuals (Table 3C,D).

| Association between TDG SNP rs4135050 and gender in smokers
The results indicate an association between the gender of smokers and the rs4135050 polymorphism in the TDG gene. The allele and genotype frequencies of the TDG SNP observed in smokers and nonsmoking controls according to gender are given in Table 3E,F. Notably, the T allele shows a significant protective association with CS among male smokers (OR = 0.64, CI = 0.4409-0.9299, p = 0.0187). However, there is no significant association with the T allele among female smokers (Table 3F). By contrast, no correlation was observed between the genotype frequencies of the TDG SNP and CS in either gender. The genotype frequencies of the selected SNP were 10% and 4% for the AA reference genotype, 36% and 32% for heterozygous AT, and 54% and 64% for double-mutant TT in male and female smoking subjects, respectively. In the control group, these values were 5% and 9% for AA, 29% and 35% for AT, and 66% and 56% for TT in the male and female populations, respectively (Table 3E,F).
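For readers less familiar with how an allelic odds ratio and its confidence interval are obtained, the following sketch computes an unadjusted OR with a Wald (Woolf) 95% CI from a 2×2 table of allele counts. It is a simpler stand-in for the logistic regression run in SPSS, not a reproduction of it, and the counts in the example are hypothetical placeholders rather than values from Table 3.

```python
import math


def odds_ratio_ci(a, b, c, d, z=1.96):
    """Unadjusted odds ratio with a Wald (Woolf) confidence interval.

    a: smokers carrying the T allele      b: smokers carrying the A allele
    c: nonsmokers carrying the T allele   d: nonsmokers carrying the A allele
    All four counts must be positive.
    """
    odds_ratio = (a * d) / (b * c)
    se_log_or = math.sqrt(1 / a + 1 / b + 1 / c + 1 / d)
    lower = math.exp(math.log(odds_ratio) - z * se_log_or)
    upper = math.exp(math.log(odds_ratio) + z * se_log_or)
    return odds_ratio, lower, upper


# Hypothetical allele counts, for illustration only.
odds_ratio, lower, upper = odds_ratio_ci(60, 40, 70, 30)
print(f"OR = {odds_ratio:.2f}, 95% CI = {lower:.4f}-{upper:.4f}")
```

An OR below 1 with a CI that excludes 1 corresponds to the kind of protective association reported for the T allele in male smokers; a CI that spans 1 corresponds to a nonsignificant result.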

| TDG SNP rs4135050 correlation with smoking patient age and other clinical characteristics
One of the most important questions investigated in this work was whether the allele and genotype variations of the tested TDG SNP are linked to the age of CS patients.
To determine this, the nonsmoking and smoking subjects were categorized by age, with 92 smokers and 70 nonsmokers aged 29 and older and 143 smokers and 169 nonsmokers below age 29 (Table 1; Table 3G). Among CS subjects under 29 years, the AT and AT+TT genotypes of the analyzed SNP were associated with a nearly 4-fold and a nearly 3-fold increase, respectively, in the risk of developing diseases linked to CS relative to the AA homozygous genotype (AT: OR = 3.88, CI = 1.2295-12.2596, p = 0.0169; AT+TT: OR = 2.86, CI = 1.0037-8.1219, p = 0.0420; Table 3H). The T allele distribution shows no relationship with smoking effects in younger smokers (<29) as compared to the control population (Table 3H). Finally, a correlation was sought between SNP rs4135050 of the TDG gene and certain clinical characteristics not previously examined: quitting smoking, family smoking history, and parental consanguinity. The resulting analyses show no difference in genotype and allele frequencies between CS patients and nonsmoking subjects for these characteristics (Table S1).
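The reported odds ratios and confidence limits can be checked for internal consistency. Under the assumption that the intervals are Wald intervals, symmetric on the log scale (an assumption; the text does not state how the CIs were computed), the point estimate should equal the geometric mean of the two limits, and the standard error of log(OR) can be recovered from the interval width. The helper name `wald_summary` is hypothetical.

```python
import math


def wald_summary(lower, upper, z=1.96):
    """Recover the point estimate and SE(log OR) implied by a CI that is
    symmetric on the log scale (assumed Wald interval)."""
    point = math.sqrt(lower * upper)  # geometric mean of the limits
    se_log_or = (math.log(upper) - math.log(lower)) / (2 * z)
    return point, se_log_or


# (reported OR, lower limit, upper limit) as given in the text.
reported = {
    "T allele, male smokers": (0.64, 0.4409, 0.9299),
    "AT vs AA": (3.88, 1.2295, 12.2596),
    "AT+TT vs AA": (2.86, 1.0037, 8.1219),
}

for label, (odds_ratio, lower, upper) in reported.items():
    point, se_log_or = wald_summary(lower, upper)
    print(f"{label}: reported OR {odds_ratio:.2f}, "
          f"implied OR {point:.2f}, SE(log OR) {se_log_or:.3f}")
```

The wide intervals for the genotype comparisons reflect the small number of AA homozygotes in each age stratum, which is why the lower limits sit close to 1.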

| 7 of 12 ALMUTAIRI eT AL.

| Comparison of the allele distribution of TDG rs4135050 between KSA and other populations
A comparison was made between the allele frequencies of TDG rs4135050 in the study population (Saudi Arabian) and those in other populations available from the International HapMap Project study groups (http://hapmap.ncbi.nlm.nih.gov/). The results reveal that the allelic distribution of TDG rs4135050 in the Saudi Arabian population differs from that of some, but not all, HapMap populations. The allele distribution in our study population is similar to that found in two HapMap populations, CEU and YRI (Table 4), whereas it differs significantly from that in the two other populations, HCB and JPT (Table 4).

| Linkage disequilibrium
One challenge of the current study was to distinguish the association of TDG rs4135050 itself from that of a set of nearby SNPs showing varying degrees of association due to local linkage disequilibrium (LD) patterns. Genomic regions were visually inspected to determine the extent and position of their association signal relative to the nearby TDG rs4135050 (Figure 1). The analysis reveals that most of the SNP marker combinations exhibit high LD scores, and several exhibit perfect LD. Figure 1 shows various loci located very close to SNP rs4135050, with r² values from above 0.8 up to 1 (Figure 1c).
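To make the r² threshold concrete, the standard pairwise LD statistic can be computed from haplotype and allele frequencies as r² = D² / (pA(1 − pA) pB(1 − pB)), with D = pAB − pA·pB. The sketch below uses this textbook definition; it is not the LDlink implementation, and the frequencies in the examples are hypothetical, apart from the 0.27 A-allele frequency borrowed from the smoker group for illustration.

```python
def ld_r_squared(p_ab, p_a, p_b):
    """Pairwise LD r^2 from the frequency of the A-B haplotype (p_ab)
    and the allele frequencies at the two loci (p_a, p_b)."""
    d = p_ab - p_a * p_b  # coefficient of linkage disequilibrium
    denominator = p_a * (1 - p_a) * p_b * (1 - p_b)
    if denominator == 0:
        raise ValueError("both loci must be polymorphic")
    return d * d / denominator


# Perfect LD: the two alleles always occur on the same haplotype.
perfect = ld_r_squared(p_ab=0.27, p_a=0.27, p_b=0.27)

# Linkage equilibrium: haplotype frequency equals the product of allele frequencies.
independent = ld_r_squared(p_ab=0.25, p_a=0.5, p_b=0.5)

print(f"perfect LD r^2 = {perfect:.2f}, equilibrium r^2 = {independent:.2f}")
```

An r² of 1 means either SNP fully predicts the other, so an association signal at rs4135050 cannot be separated statistically from that of a perfect proxy; values above 0.8 are conventionally treated as strong proxies.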

| DISCUSSION
Smoking accounts for 80% of lung cancer cases, and it also increases the risk of chronic periodontitis (Shereef, Sanara, Karuppanan, Noorudeen, & Joseph, 2015), cancer cell invasion, and metastasis (Liao, Yong, & Hua, 2018). It has also been demonstrated that tobacco smoking contributes to the development of multiple autoimmune diseases, allergies, chronic pulmonary and vascular diseases, and cancers (Qiu et al., 2017). In recent years, CS has become a major public health issue among adolescents in the KSA (Algorinees et al., 2016), and smokeless tobacco consumption appears to be a potential risk factor for oral cancer (Alharbi & Quadri, 2018). The damaging impacts of CS are attributed to the numerous chemical components of cigarettes, such as nicotine and carbon monoxide (Qiu et al., 2017). CS generates DNA damage, leading to mutations and potentially changing the immune microenvironment, which contributes to smoking-related immune dysfunction (Desrichard et al., 2018). Most DNA mutations, if not repaired, may lead to genetic instability; DNA repair pathways therefore play an essential role in preventing carcinogenesis and maintaining DNA integrity (Kiyohara, Takayama, & Nakanishi, 2006). Genes in DNA repair pathways are vital in protecting DNA from the multiple types of damage initiated by tobacco's chemical carcinogens (Hoeijmakers, 2001). Genetic variations, such as SNPs of DNA repair genes, modify DNA repair efficiency by changing protein function and therefore increase the risk for various cancers (de Boer, 2002; Xi et al., 2004) and for diseases such as chronic pulmonary disease and lung malignancy (Arimilli, Schmidt, Damratoski, & Prasad, 2017; Kheradmand et al., 2017). Among DNA repair genes, the TDG gene product was identified as the first mismatch-specific enzyme, playing a key role in recognizing and correcting a variety of damaged and/or mismatched nucleotides (Cortazar et al., 2007).
The available data support the hypothesis that TDG genetic polymorphisms and tobacco smoking may lead to the development of smoking-related diseases. The main goal of this study was to investigate potential associations between the genetic polymorphism rs4135050 of the TDG gene and CS, using samples from cases and controls in a Saudi Arabian population, in order to detect a genetic marker that could help decrease the risk of disease caused by CS among healthy individuals. A literature review revealed no prior work assessing the relationship between genetic variation of the TDG gene and CS effects. The present study focuses on the distribution of TDG SNP rs4135050 in genomic DNA isolated from the peripheral blood cells of cigarette smokers and nonsmoking subjects. No significant relationship between the TDG gene polymorphism tested here and smoking behavior was found in the study population. In addition, no genotype or allele differences were detected for this SNP in terms of duration of CS or daily rate of CS among Saudi smokers as compared to the control subjects. The results suggest that the TDG expression profile is not influenced by the TDG SNP tested here, possibly owing to the intronic location of the analyzed polymorphism. Therefore, further studies of TDG SNPs located in other positions on the gene are strongly recommended, particularly of those located in the regulatory regions, such as the promoter and exon regions. Specific DNA sequences located in introns, termed cis-regulatory elements, may nevertheless participate in the transcriptional regulation of gene expression (Jeziorska, Jordan, & Vance, 2009). Similar results were found in previous studies of other TDG polymorphisms in cancer among other populations.
These polymorphisms were TDG SNP rs4135113, which is unrelated to the risk of skin cancer (Ruczinski et al., 2012), rectal cancer (Curtin et al., 2011), and lung cancer (Krzesniak, Butkiewicz, Samojedny, Chorazy, & Rusin, 2004), and TDG SNP rs2888805, which was reported to have no significant correlation with lung cancer risk (Krzesniak et al., 2004). The present study shows that the smoker's gender plays a major role in the allele distribution of TDG rs4135050; in male patients, the T allele of rs4135050 has a significant protective effect against diseases caused by CS. In the female population, this SNP also appears to increase protection from CS-related disease, but not to a statistically significant degree. It should be noted that the number of female samples was insufficient to identify any statistically significant correlation that may exist for this polymorphism in the TDG gene. This limitation was caused by social traditions in the KSA. Therefore, further study is necessary to confirm these results. Sex-specific effects of CS are supported by other recent studies; in particular, smoking has been found to confer a greater risk of specific cancers in women than in men. Anderson, Moezardalan, Messina, Latreille, and Shaw (2011) documented that in women, CS can considerably increase the risk of advanced colorectal neoplasia after as little as 10 pack-years of smoking, whereas it takes 30 or more pack-years for men. This effect is closely related to the effect of CS on sex hormones and seems to vary by menopausal status.
Smokers have higher progesterone (Duskova et al., 2012), higher testosterone (Cupisti et al., 2010; Duskova et al., 2012), and lower estrogen levels (Duskova et al., 2012; Gu et al., 2013). The mediating effect of smoking on sex hormones and the subsequent risk of chronic disease, including both cancer and cardiovascular health problems, have attracted growing interest from researchers in recent decades (Benson, Green, Pirie, & Beral, 2010). In a recent study using a sample of nearly 80,000 postmenopausal women, Luo et al. observed an increased risk of breast cancer of 9% and 16% in former smokers and current smokers, respectively, as compared to nonsmokers (Luo et al., 2011). The total number of Saudi smokers among adolescent males increased from 2001 onward (Al-Bedah, Qureshi, Al-Guhaimani, & Dukhan, 2011). The present study shows that the age of smokers significantly (p < 0.05) affects the relationship between cigarette consumption and TDG rs4135050. In individuals over 29 years old, the AT genotype has protective effects, whereas it has harmful effects in the population under 29 years old. A recent report gives the median age of the Saudi population as 30.2 years ("General Authority of Statistics, Kingdome of Saudi Arabia," 2018), and persons of this age, especially males, consume tobacco products in large amounts, since the cost of a pack of 20 cigarettes does not exceed US$2.50. Several epidemiological studies suggest that younger smokers are at greater risk of developing lung cancer and that smoking is more harmful in this age category. Earlier smoking cessation may bring greater benefits in young adults than in older adults. These results suggest that smoking prevention in young adults should be taken seriously.

| CONCLUSION
The study results show that the rs4135050 SNP has a protective effect in older males, which could help to reduce the potential risk of developing smoking-related disease. In smokers aged less than 29 years, the genotype distribution of this polymorphism is associated with an increased risk of developing diseases related to CS. Thus, this SNP may provide a novel biomarker for the early diagnosis and prevention of several diseases caused by CS in this subcategory of the population. Further research with sufficiently large samples, functional analyses, and various other populations is recommended to verify these findings and to examine the relation between genetic variation of TDG and the effects of smoking.
FIGURE 1 Linkage disequilibrium (LD) plot for the SNP rs4135050, generated with the LDlink tool for interactively exploring proxy and putatively functional variants (https://ldlink.nci.nih.gov). (a) Proxy SNPs for thymine-DNA glycosylase (TDG) rs4135050. (b) Proxy genes by chromosome 11 coordinate. (c) Table of proxy variants from the LD plot of TDG rs4135050