Genome‐wide time‐to‐event analysis on smoking progression stages in a family‐based study

Abstract Background Various pivotal stages in smoking behavior can be identified, including initiation, conversion from experimenting to established use, development of tolerance, and cessation. Previous studies have shown high heritability for age of smoking initiation and cessation; however, time‐to‐event genome‐wide association studies aiming to identify underpinning genes that accelerate or delay these transitions are missing to date. Methods We investigated which single nucleotide polymorphisms (SNPs) across the whole genome contribute to the hazard ratio of transition between different stages of smoking behavior by performing time‐to‐event analyses within a large Finnish twin family cohort (N = 1962), and further conducted mediation analyses of plausible intermediate traits for significant SNPs. Results Genome‐wide significant signals were detected for three of the four transitions: (1) for smoking cessation on 10p14 (P = 4.47e‐08 for rs72779075 flanked by RP11‐575N15 and GATA3), (2) for tolerance on 11p13 (P = 1.29e‐08 for rs11031684 in RP1‐65P5.1), mediated by smoking quantity, and on 9q34.12 (P = 3.81e‐08 for rs2304808 in FUBP3), independent of smoking quantity, and (3) for smoking initiation on 19q13.33 (P = 3.37e‐08 for rs73050610 flanked by TRPM4 and SLC6A16) in analysis adjusted for first time sensations. Although our top SNPs did not replicate, another SNP in the TRPM4‐SLC6A16 gene region showed statistically significant association after region‐based multiple testing correction in an independent Australian twin family sample. Conclusion Our results suggest that the functional effect of the TRPM4‐SLC6A16 gene region deserves further investigation, and that complex neurotransmitter networks including dopamine and glutamate may play a critical role in smoking initiation. Moreover, comparison of these results implies that genetic contributions to the complex smoking behavioral phenotypes vary among the transitions.


Introduction
Various pivotal stages can be identified in an individual's smoking history, including smoking initiation, conversion from experimenting to established use, development of tolerance, and cessation. Each transition is likely influenced by environmental and genetic factors, some of which are common to all steps, and others that are specific. Modest to high heritability has been reported for the majority of smoking behavior phenotypes (Madden et al. 2004;Horimoto et al. 2012;Loukola et al. 2014), with a study of Finnish adult twins reporting heritability estimates of 0.59 in males and 0.36 in females for age at initiation of smoking (Broms et al. 2006).
Nicotine is the main psychoactive compound in tobacco, and exerts its functions by binding to nicotinic acetylcholine receptors (nAChR). Genome-wide association study (GWAS) meta-analyses have robustly reported that the CHRNA5-CHRNA3-CHRNB4 nAChR gene cluster on 15q25 and the CHRNB3-CHRNA6 region on 8p11.21 are associated with smoking quantity (measured by cigarettes per day, CPD) and nicotine dependence (ND) (measured by the Fagerström Test for Nicotine Dependence, FTND (Heatherton et al. 1991)) (Liu et al. 2010; The Tobacco and Genetics Consortium 2010; Thorgeirsson et al. 2010). However, less than 1% of the variance in the amount smoked is explained by alleles of these genes, with an average effect per allele of one CPD. Age of onset phenotypes have been utilized in some targeted studies of nAChR genes. Variants in the CHRNA5-CHRNA3-CHRNB4 gene cluster are shown to predict a later age of smoking cessation , and the effect of a functional CHRNA5 variant (rs16969968) on smoking quantity is reported to be stronger in early-onset smokers than in late-onset smokers (Hartz et al. 2012). Further, a genetic risk score composed of CHRNA5-CHRNA3-CHRNB4 and CYP2A6 (encoding the main metabolic enzyme for nicotine) variants highlighted in large CPD GWAS meta-analyses (Liu et al. 2010; The Tobacco and Genetics Consortium 2010; Thorgeirsson et al. 2010) was unrelated to smoking initiation, but associated with progression to heavy smoking and ND (Belsky et al. 2013).
Several GWAS have targeted smoking initiation (Vink et al. 2009; The Tobacco and Genetics Consortium 2010; Thorgeirsson et al. 2010;Siedlinski et al. 2011;Argos et al. 2014) or cessation (The Tobacco and Genetics Consortium 2010; Siedlinski et al. 2011;Argos et al. 2014) with phenotypes dichotomized into ever versus never or used as quantitative age of onset phenotypes. Only the large GWAS meta-analysis of the Tobacco and Genetics Consortium yielded signals in tyrosine kinase and dopamine signaling pathway genes that genome-wide significantly associated with smoking initiation (never vs. ever smokers) and smoking cessation (former vs. current smokers), respectively (The Tobacco and Genetics Consortium 2010). Time-to-event analysis is more powerful than analysis of binary traits or transformed quantitative phenotypes because it incorporates information of follow-up time span and allows for censoring. There is a huge gap in understanding the contribution of an associating variant to a specific trait. Causal mediation analysis has been used to improve the understanding of the mechanisms underlying detected associations (Jiang et al. 2013;Liu et al. 2013). The estimation of mediation effects in the context of survival models has been discussed in previous literature (Lange and Hansen 2011;VanderWeele 2011;Nemes et al. 2013). Smoking behavior is likely influenced by a variety of additional factors besides the function of nicotinic receptors and nicotine metabolism, such as psychiatric disorders (e.g., schizophrenia, depression) and somatic consequences of smoking (e.g., bronchitis, chronic obstructive pulmonary disease). Understanding the mechanisms underlying the progression of smoking behavior could facilitate the development of targeted cessation pharmacotherapies and interventions.
In this study, we investigated which SNPs across the whole genome contribute to the speed of transition between different stages of smoking behavior by performing time-to-event analyses within a large Finnish twin family cohort (N = 1962). We tracked and elaborately recorded smoking history by detailed interviews. We adopted time-to-event random effects models to examine the rate at which the smokers proceed to the next stage, and incorporated a kinship matrix to account for the family structure. Specifically, we tested whether genetic variants are associated with a younger age at smoking initiation, speed of transition to daily smoking (dichotomized into rapid vs. slow progression), speed of transition from daily smoking to the period of heaviest smoking, and earlier quitting from smoking. When performing association analyses, we considered plausible intermediate traits as covariates. We then investigated whether the independent variable, that is, a SNP, affects the outcome independently or influences the mediators, which in turn affects the outcome.

Materials and Methods
Sample Data collection has been described in previous publications (Broms et al. 2007;Loukola et al. 2014 initiated smoking and smoked on average 15 CPD. Altogether 880 subjects were successful quitters defined by selfreported abstinence of at least 6 months at the time of the interview. Written informed consent was obtained from all subjects who were interviewed and/or gave DNA samples before the beginning of the studies. The collection of the informed consent as well as blood samples followed the recommendations given in the Declaration of Helsinki and its amendments. Data collection was approved by the hospital district of Helsinki and Uusimaa, the ethical committee for epidemiology and public health (HUS 136/E3/ 01). A flowchart of the overall study design is shown in Figure 1.

Replication sample
For replication of the detected associations, we utilized an independent Australian twin family sample (NAG-OZALC, N = 1884, N = 3389, and N = 2723, for initiation, tolerance, and cessation analyses, respectively). A brief sample description is presented in the Table S1; detailed sample description has been previously published elsewhere (Heath et al. 2011).

Phenotypes
Detailed information on the evolution of smoking behavior as well as relevant covariates was retrospectively collected through structured telephone interviews, as previously described (Loukola et al. 2014). Questions used to inquire the several milestones in smoking behavior are presented in Table 1. We constructed four time-to-event phenotypic variables based on the transitions measured in years between progressive smoking states: (1) smoking initiation (years from birth to the age of smoking the first cigarette), (2) persistent smoking (years from the age of smoking the first cigarette to the age of daily smoking), (3) tolerance (years from the age of daily smoking to the age when the heaviest smoking started), and (4) cessation (years from the age of daily smoking to the age of successful quitting). For smoking cessation, we defined continuous abstinence of more than 6 months as successful quitting, as previously suggested (Hughes et al. 2003), and those individuals still smoking by the time of the interview were treated as censored. We excluded subjects (N = 70) abstinent for less than 6 months prior to the interview because their cessation status cannot be deduced. Further, we excluded 55 subjects who reported quitting due to health reasons, as genetic background may have little or no effect on quitting success in such a situation. The basic characteristics of the data used in the analyses are listed in Table 2.

Evaluation of Covariates
We first performed genome-wide time-to-event analyses for each of the transitions with only sex as a covariate. As an exception, for smoking initiation we also included another genotype-independent covariate, birth year, as the age of smoking initiation of an individual may be affected by the specific environment in his/her generation (Morabia et al. 2002). As a follow-up analysis, we then conducted another set of genome-wide time-to-event analyses with additional transition-specific intermediate covariates in order to obtain comparable results for the following mediation analysis of significant SNPs, and to increase the chance of detecting SNPs associated with the transitions independently of the intermediate covariates. We evaluated biologically plausible covariates for each transition (selected based on literature and a priori knowledge of factors affecting smoking behavior), and included those significantly associated with the transition (P < 0.05) ( Table 2), as only significantly associated covariates are eligible as candidate mediators.
For smoking initiation, we considered positive and negative first time sensations as these variables attempt to capture the individual responses to the first-ever dose of nicotine, and likely have a significant effect on the probability of smoking a whole cigarette (Rios-Bedoya et al. 2009).
For persistent smoking, we considered age at the first cigarette, as it significantly affects the downstream steps in smoking behavior (Breslau et al. 1993). Further, the interval between the first and second cigarette was considered as an estimate of the initial speed of transition. However, as the genome-wide analyses showed lack of power, no follow-up analysis was performed.
For tolerance, we considered age of initiation of daily smoking, as it is shown to predict earlier age at heaviest smoking (Kendler et al. 2013). Further, we considered CPD, max CPD ever smoked during a 24 h period, and Age of successful quitting "Do you still smoke or have you quit?"; if one replies "has quit", then ask "When did you last smoke (even a puff)?"; if one replies "Last puff >6 months ago", then ask "How old were you when you last smoked (even a puff)"?

Covariate
Cigarettes per day (CPD) "How many cigarettes do you/did you used to smoke per day?" 8 response categories (1-2, 3-5, 6-10, 11-15, 16-19, 20-25, 26-39, and 40 + ), original categorical observations were replaced with class means of CPD (1.5,3.5,8,13,17.5,22.5,32.5,and 45 CPD,respectively) 14.95 (AE9.25) Maximum cigarettes per day (max CPD) "What is the maximum number of cigarettes you have ever smoked during 1 day (24-h period)?" 28.77 (AE14.41) Positive first time sensations Sum score of three questions measuring sensation felt after smoking the first cigarette or first puffs ("While smoking your very first cigarettes, did you (1) like the taste or smell of the cigarette, (2) feel more relaxed, (3) feel a pleasurable rush or buzz?") (range 1-10) 3.06 (AE2.66) Negative first time sensations Sum score of seven questions measuring sensation felt after smoking the first cigarette or first puffs ("While smoking your very first cigarettes, did you (1) cough, (2) feel dizzy or light headed, (3) get a headache, (4) feel your heart racing, (5) feel nauseated, (6) feel your muscles tremble or become jittery, (7)  The participants were queried about each DSM-IV nicotine withdrawal symptom: irritability, restlessness, concentration problems, depressed mood, increased appetite, sleep problems, nervousness, and decreased heart rate (Association 2000), within the context of a smoking cessation. The reported symptoms were summed up to form a symptom score (range 0-8) CPD at the period of heaviest smoking, as smoking quantity may affect the development of tolerance; rodent studies show that rapid tolerance is related to frequency of nicotine administration and dose (Aceto et al. 1986). For smoking cessation, we considered FTND and DSM-IV nicotine withdrawal symptoms, as they likely affect the ability to quit (Kozlowski et al. 1994).

Genotyping and quality control
Genotyping was performed at the Wellcome Trust Sanger Institute using the Human670-QuadCustom Illumina BeadChip (N = 1104) and the Illumina Human Core Exome BeadChip (N = 858). Pre-imputation exclusion criteria for the data generated with the Human670-QuadCustom Illumina BeadChip were minor allele frequency (MAF) < 0.01, sample and SNP call rate <0.95 (<0.99 for SNPs with MAF < 0.05); and the criteria for the data generated with the Illumina Human Core Exome BeadChip were minor allele count <2, sample call rate <0.98, SNP call rate <0.95 (<0.99 for SNPs with MAF < 0.05). Both genotype datasets were filtered according to the Hardy-Weinberg equilibrium (HWE) test P < 1e-06. Further, sample heterozygosity test, gender, and Multidimensional Scaling (MDS) outlier checks were done for both. Pre-phasing of the data was done with SHAPEIT2 (Delaneau et al. 2013) and imputation with IMPUTE2 (Howie et al. 2009) using the 1000 Genomes Phase I integrated haplotypes (produced using SHAPEIT2) reference panel (1000 Genomes Project Consortium, 2012). Quality controls and imputation for the GWAS data were done centrally at the Institute for Molecular Medicine Finland (FIMM), University of Helsinki, Helsinki, Finland. The sample sizes for each transition analysis differ slightly because any subject with missing data on the corresponding phenotype or covariates was excluded from the analysis of that transition. Out of the 1529 subjects, 450 proceeded to daily smoking within one year, while for 1079 subjects it took over a year. In our data set additional 37 subjects did not proceed to daily smoking during the follow-up period.

Statistical analyses
The genome-wide time-to-event analyses for smoking initiation, tolerance, and cessation were performed using the Cox proportional hazards (PH) model (Cox 1972) with random effects. We calculated the empirical kinship matrix based on the observed relationship of family members, and employed the coxme R package (Therneau 2012) which implements the Cox PH model with random effects of multivariate normal distribution by utilizing penalized partial likelihood (Ripatti and Palmgren 2000). The selected sample with only one co-twin from each MZ pair and at most one parent in each family resulted in a kinship matrix in which individuals in a family share the same genetic correlation coefficient, dramatically reducing the computational time of the coxme function. For persistent smoking, we found that over a quarter of individuals (N = 450) became daily smokers within 1 year, and those who did not engage in daily smoking within 20 years since initiation were considered as long-term survivors. In order to account for this, we first adopted and implemented a mixture cure model to analyze this transition in the context of survival framework (Yu and Peng 2008). The results, however, showed an inflated false positive error rate (data not shown) probably due to the potential inaccuracy of the variance estimation as shown in previous simulation studies (Yu and Peng 2008). We therefore addressed this issue by dichotomizing the survival time variable and classified those becoming a daily smoker within 1 year after smoking the first cigarette as rapid progression (N = 450), and those becoming a daily smoker after > 1 year as slow progression (N = 1079). We then conducted the association analysis on this binary variable with the logistic linear mixed effects model implemented in the glmmML R package (Brostr€ om and Holmberg 2011). We performed the single-variant association analyses only for those SNPs with a MAF > 5% and HWE test P > 1e-05, and ensured that identified top signals had high imputation information score (>0.8). The total number of SNPs included in the genome-wide time-to-event analyses was 5,918,992. We checked for potential population stratification by investigating the principal components for the family founders with the EIGENSOFT package (Price et al. 2006); no outliers with foreign ancestry were found, as was expected as the twin data consist purely of native Finnish population. We adopted the genome-wide significance Pvalue of P < 5e-08 as a cutoff, which has been broadly recognized as a criterion based on the sequencing data of European populations (Sham and Purcell 2014). For chromosomes containing loci exceeding the cutoff, we performed conditional analyses where we adjusted for the top SNP to test whether the detected association represented an independent signal. We list top five SNPs, regardless of their P-values, to allow for comparison with results from the follow-up studies adjusting for the relevant covariates. For SNPs identified as significantly associated with the transitions, we further performed mediation analyses to investigate whether the association is through plausible mediators. Details of the mediation analyses are presented in the Appendix S1. We used GWAVA (Ritchie et al. 2014) for predicting the potential functional effects of the associating noncoding region SNPs.
For genome-wide significant signals, replication was attempted for the top five SNPs as well as with all SNPs within the nominated genes (with AE50 kb flanking regions). For the replication analyses, models identical to those applied in the discovery sample were used. To account for multiple testing, we applied a modified Bonferroni correction based on the effective number of independent SNPs in the gene regions calculated using a formula proposed by Gao and colleagues (Gao et al. 2008;Hendricks et al. 2014).
We further conducted fixed-effect meta-analyses for the top five SNPs based on the effect sizes and standard errors from the discovery and replication studies using GWAMA (http://www.well.ox.ac.uk/gwama). P-values below 5e-08 were considered statistically significant.

Genome-wide time-to-event analysis of smoking initiation
The top five SNPs for the age at the first cigarette are listed in Table 3. In the time-to-event analysis rs73050610 on 19q13.33 was highlighted (P = 1.12e-07) (Fig. S1, Table 3). In a follow-up analysis additionally adjusted for first time sensations rs73050610 achieved genome-wide significance (P = 3.37e-08) (Fig. S2, Table 3). The LD block in which all top five SNPs are located is flanked by genes TRPM4 (1 kb apart) and SLC6A16 (60 kb apart). A regional plot of the 19q13.33 locus is shown in Figure S3. In an analysis conditioned on rs73050610, no residual genome-wide significant signal remained, suggesting that there is only one independent signal in this locus. The hazard ratio (HR) of rs73050610 is 0.80, suggesting that carriers of the minor allele have a 20% lower hazard per allele of smoking the first whole cigarette.
None of the 19q13.33 top five SNPs showed statistically significant evidence for replication in an independent Australian twin family sample; however, the effect sizes shared the same direction in analyses adjusted for first time sensations. When attempting replication with all SNPs located in TRPM4 and SLC6A16 (with AE50 kb flanking regions), statistically significant association was seen in analyses adjusted for first time sensations for  rs352813 (P = 9.2e-04, surpassing the significance threshold of P = 9.43e-04 based on the modified Bonferroni correction), located 30 kb from the top SNP rs73050610 (Table S2). Rs352813 was not statistically significant in the Finnish sample, and the direction of effect of rs352813 was different in the Australian sample when compared to the Finnish sample. Altogether 18 SNPs in 19q13.33 showed some association (P < 0.05) in both populations. Meta-analysis of the top five SNPs did not yield genome-wide statistically significant signals. The estimated average causal mediation effects (ACME) of rs73050610 through the positive and negative first time sensations were 0.0213 (P = 0.70) (a positive coefficient from the mediation analysis means that the hazard is decreased. Refer to the Appendix S1 for the details of the mediation analyses) and À0.0291 (P = 0.77), respectively, suggesting that the effect of rs73050610 is not mediated through the positive or negative first time sensations.

Genome-wide time-to-event analysis of persistent smoking
The top five SNPs for the transition from the age at the first cigarette to the age of daily smoking (rapid vs. slow transition) are shown in Table 3. None of the SNPs exceeded or approached the genome-wide significance threshold. Manhattan and Q-Q plots are presented in Figure S4. The Q-Q plots show deflated P-values indicating lack of sufficient statistical power, likely due to the limited sample sizes for an association analysis with a binary variable. We did not pursue follow-up analyses or attempt replication for this transition.

Genome-wide time-to-event analysis of tolerance
The top five SNPs for the transition from daily smoking to heaviest smoking are presented in Table 3. In the time-to-event analysis rs11031684 on 11p13 showed genome-wide significant association (P = 1.29e-08) (Fig. S5, Table 3) with an HR of 1.46, suggesting that the minor allele accelerates the progression to tolerance. This SNP is located in a pseudogene RP1-65P5.1, and is flanked by RCN1 and WT1 within a distance of approximately 150 kb. A regional plot of the 11p13 locus is shown in Figure S6. In an analysis conditioned on rs11031684, no residual genome-wide significant signal remained, suggesting that there is only one independent signal in this locus. In a follow-up analysis additionally adjusted for age of daily smoking, CPD, and max CPD, rs2304808 residing in FUBP3 on 9q34.12 showed genome-wide significant association (P = 3.81e-08) (Fig. S7, Table 3) with an HR of 1.31, suggesting that carriers of the minor allele progress more quickly to tolerance. A regional plot of the 9q34.12 locus is shown in Figure S8. In an analysis conditioned on rs2304808, no residual genome-wide significant signal remained, suggesting that there is only one independent signal in this locus. None of the top five SNPs replicated in an independent Australian twin family sample, nor did any SNPs located within RP1-65P5.1, WT1, RCN1, or FUBP3. Meta-analysis of the top five SNPs did not yield genome-wide statistically significant signals.
The estimated ACME of rs11031684 through CPD and max CPD were À0.525 (P = 0.04) and À0.619 (P = 0.07), respectively, suggesting that some of the effects of rs11031684 is mediated through CPD. Additionally, we found that rs11031684 was nominally associated with CPD (P = 0.0084) and max CPD (P = 0.0062). The estimated ACME of rs2304808 through CPD and max CPD were 0.231 (P = 0.22) and 0.304 (P = 0.24), respectively, suggesting that the effect of rs2304808 is not mediated through CPD or max CPD.

Genome-wide time-to-event analysis of cessation
The top five SNPs for the transition between daily smoking and cessation are presented in Table 3. In the time-toevent analysis rs72779075 on 10p14 showed genome-wide significant association (P = 4.47e-08) (Fig. S9, Table 3) with an HR of 1.48, suggesting that carriers of the minor allele quit earlier than noncarriers. In a follow-up analysis additionally adjusted for FTND and DSM-IV nicotine withdrawal symptom score, rs72779075 remained as the top SNP but the signal no longer was significant (Fig. S10, Table 3). This locus is close to a pseudogene RP11-575N15 (with a distance of <40 kb), and the nearest gene is GATA3 (725 kb apart). A regional plot of the 10p14 locus is shown in Figure S11. In an analysis conditioned on rs72779075, no residual genome-wide significant signal remained, suggesting that there is only one independent signal in this locus. None of the top five SNPs replicated in an independent Australian twin family sample, nor did any SNPs located within GATA3. Meta-analysis of the top five SNPs did not yield genome-wide statistically significant signals.
The estimated ACME of rs72779075 through FTND and DSM-IV nicotine withdrawal symptoms were À4.1744 (P = 0.15), and À0.25967 (P = 0.55), respectively, suggesting that the effect of rs72779075 is not mediated through ND or nicotine withdrawal symptoms.

Discussion
The progression of smoking behavior from initiation to persistent smoking or cessation is a complex process involving multiple factors. Although some of the genetic factors related to smoking quantity and ND have been identified, we have only began to understand the underlying genome-wide genetic effects on the development of smoking behavior. In this study, we identified novel SNPs associated with three specific transitions in smoking behavior in a Finnish twin family sample (N = 1962).
Considering age at smoking initiation we found that Finnish Twin Cohort subjects born at later decades began smoking at younger age compared to subjects born at earlier decades, which is consistent with a previous Swiss study (Morabia et al. 2002), and that females started smoking later than males, which is also in accordance with previous findings (Okoli et al. 2013). This is also consistent with the evolution of tobacco use in Finland in the 20th century. Although the Q-Q plots (Figs S1 and S2) suggest that the family correlation structure was well controlled for, a mild overdispersion is observed. This implies that common environmental factors may also play a role in smoking initiation even after adjusting for birth year. DZ twins share more environmental factors than non-twin members of a family, and this environmental correlation structure is not captured by the used kinship matrix, which may lead to slight P-value inflation.
In the time-to-event analysis of age at smoking initiation adjusted for first time sensations, multiple SNPs on 19q13.33, flanked by TRPM4 and SLC6A16, were highlighted in both the Finnish sample and the independent Australian sample. The highlighted SNPs from the two studies differed but intertwined with each other, providing motivation for further investigating the involvement of nearby genes in smoking initiation. The associating SNPs seem to have a heterogeneous effect. The top five SNPs from the Finnish sample showed no statistically significant association in the Australian sample although they shared the same direction of effect. On the other hand, in the replication sample, a SNP on 19q13.33, located 30 kb from the top SNP in the Finnish sample, showed association adjusted for multiple testing, with a direction of effect opposite from that seen in the Finnish sample. The heterogeneous effects of these markers may reflect variation in LD block structures (Rosenberg et al. 2010) or be due to gene-environment interactions which substantially amplify the difference of SNP effects, and may suggest that these population-specific interactions play a critical role in the complex behavioral phenotypes, as previously suggested (Adeyemo and Rotimi 2010;Ho et al. 2010).
First time sensations plausibly affect the probability of smoking a whole cigarette; in line with this, in our study sample positive and negative first time sensations are associated with earlier and later age of initiation, respectively. In order to evaluate whether the association on 19q13.33 was independent of first time sensations, we included them as covariates in the follow-up analysis, and detected genome-wide significant association. Further, mediation analysis confirmed that the effect of the top SNP, rs73050610, is independent of first time sensations, and thus the effect is likely due to other mechanisms besides the initial sensations experienced after the firstever dose of nicotine. The functional annotation with GWAVA suggests that rs3843746, which is in complete LD with rs73050610 in the Finnish population (D 0 = 1.00, r 2 = 0.979), is a CTCF-binding region variant. CTCF is involved in multiple regulatory influences on expression of genes, suggesting that the highlighted SNP may have a role in regulating nearby genes. Both of the flanking genes have functions relevant for smoking behavior. TRPM4 is a calcium-activated ion channel involved in many activities including immune response (Guinamard et al. 2010), and is the key gene encoding the channel for most calcium-activated nonselective cationic currents (I can ) observed in native tissues (Mrejeru et al. 2011). I can are involved in the generation of tonic and bursting activity in dopamine neurons (Mrejeru et al. 2011), and the burst firing is proposed to encode a "reward" signal during habit learning and pathological addictions (Phillips et al. 2003). Thus, variants in TRPM4 may affect smoking behavior through the dopaminergic system. The other flanking gene, SLC6A16, is a member of the Na + /Cl À dependent neurotransmitter transporter gene family (Farmer et al. 2000), and little is known about its substrates and functional significance. Another member of the solute carrier gene group, SLC17A7, located 211 kb from rs73050610, is a vesicular glutamate transporter previously found to be induced by smoking (Flatscher-Bader et al. 2008). Blockade of glutamatergic transmission inhibits the rewarding-enhancing effects of nicotine, thus reducing nicotine-seeking behavior (Li et al. 2014). Alterations in glutamatergic neurotransmission are involved in several psychiatric disorders, such as schizophrenia and alcohol dependence (Shigeri et al. 2004;Comasco et al. 2014), which also influence smoking behavior; however, in this study we did not assess comorbid psychiatric disorders. Inclusion of such phenotypes in future studies would allow scrutiny of potential pleiotropic effects.
Our analyses of persistent smoking indicate that being a female, smoking the first cigarette at a later age, and having a longer interval between the first and the second cigarette predicts later onset of daily smoking. We detected no genome-wide significant associations for the transition from the age of smoking the first cigarette to the age when daily smoking started. We likely had insufficient power in the analysis of the binary variable (rapid vs. slow transition) especially when using the mixed effects model. Larger samples are needed to investigate the transition to daily smoking.
Our analyses of tolerance indicate that being a woman and smoking less are related to slower progression to tolerance, while later age of daily smoking predicts acceleration. In the time-to-event analysis, we detected a genome-wide significant association with rs11031684 residing on 11p13 in a pseudogene RP1-65P5.1, and located 100 kb downstream of WT1. WT1 is a transcription factor involved in the regulation of human cell growth and differentiation, and is an established tumor suppressor gene. Exposure to heavy smoking influences the methylation pattern of CpG islands in WT1 (Bruno et al. 2012), providing a plausible mechanism for smoking induced cancers. In addition to affecting methylation, heavy smoking is suggested to affect the development of tolerance (Aceto et al. 1986). In order to evaluate whether the detected association was independent of smoking quantity, we included measures of smoking quantity as covariates in the follow-up analysis. The signal on 11p13 no longer was significant at a genomewide level (P = 4.04e-06); further, mediation analysis suggested that rs11031684 affects the hazard of progression to tolerance partly through smoking quantity. Interestingly, in the follow-up analysis a novel genome-wide significant signal emerged for rs2304808 on 9q34.12 within FUBP3, which belongs to a family of homologous gene-regulatory proteins that regulate many common target genes. Media-tion analysis further confirmed that the effect of rs2304808 was not mediated through smoking quantity. Inclusion of smoking quantity as a covariate substantially increased the significance of rs2304808, suggesting that including intermediate covariates may increase the power to detect SNPs that are independent of the covariates and directly associated with the phenotype.
In the analysis of smoking cessation we found that being a female, scoring higher in ND, and stronger nicotine withdrawal symptoms predicts slower transition to quitting, whereas later age of daily smoking predicts faster quitting. In the time-to-event analysis we detected genome-wide significant association with rs72779075 on 10p14, located 35 kb from a pseudogene RP11-575N15 and around 725 kb downstream of GATA3. In the follow-up analysis the signal of this SNP was no longer significant; however, our mediation analysis showed no statistically significant evidence for the effect of rs72779075 being mediated by ND or nicotine withdrawal symptoms, and thus the effect on difficulty in achieving and maintaining abstinence is likely due to other mechanisms besides the severity of ND and withdrawal. Interestingly, nicotine upregulates expression of GATA3 through stimulation of nAChRs (Arredondo et al. 2006). GATA3 is crucial in inducing allergic airway inflammation (Barnes 2008); although rs72779075 is located 725 kb downstream of GATA3, it may tag variants that influence symptoms of airway inflammation and thus may motivate smokers to quit. Alternatively, RP11-575N15 may have an unidentified function. Transcripts produced from pseudogenes may, for example, regulate the effects of microRNAs on their targets by competing for microRNA binding (Swami 2010).
Although our data included an extraordinarily detailed smoking history, there are still some limitations in our study. Smoking behavior encompasses psychiatric and social behaviors in which both complex genetic and environmental factors are involved; these were not accounted for in our analyses. Also other plausibly relevant covariates, such as socio-economic status, and working environment, were not considered. Further, our phenotype data were collected retrospectively in subjects with a mean age of 56 years at the time of the interview; thus the accuracy of self-reported ages of onsets may be influenced by recall bias. Although the interviews contained detailed measures of ND, the age of onset of ND was not assessed, and thus we were not able to perform time-to-event analyses of ND. However, we analyzed tolerance, which is the key dimension of ND. The efficient long-term survival model accounting for family structure, which to the best of our knowledge is not available, would be more appropriate when analyzing the transitions to daily smoking and quitting, and it should be considered in the future. Future studies should also attempt to investigate low-frequency variants (MAF < 0.05) in the context of transitions in smoking behavior.
The lack of replication may implicate false positive findings. Alternatively, it may be due to lack of power in the replication sample, population specificity of the associations, as well as gene-environment interactions. In addition, our study sample comes from one of the bestcharacterized founder populations, the Finns. Unique LD patterns are observed in founder populations (Service et al. 2006); thus, the lack of replication may at least partly be due to the genetic heterogeneity between the discovery sample (Finns) and replication sample (Australians). It has been shown that population isolates, especially those founded recently, such as Finland, have longer stretches of LD than outbred populations and may thus achieve better genome-wide coverage with equivalent numbers of markers (Peltonen et al. 2000;Service et al. 2006). Our top SNPs may tag underlying functional variants in the Finnish sample, but due to differences in LD structures the functional variants are not necessarily captured by these SNPs in the Australian data. Population-specific functional variants are known to exist (Lim et al. 2014), and one has already been documented in the Finnish population for a behavioral trait (Bevilacqua et al. 2010).
The advantages of this study include the detailed phenotype profiles, allowing us to more precisely handle the complexity of the smoking behavior phenotype, which has previously been modeled in a relatively static way (e.g., ever vs. never smoker). This approach to phenotype refinement may help to identify novel signals, and perhaps be tractable with smaller samples than conventionally required. Our novel results suggest that the various stages in smoking history are affected by different underpinning mechanisms. Complex neurotransmitter networks including dopamine and glutamate may play a critical role in initiation, while airway inflammation possibly contributes to smoking cessation.
In conclusion, we detected genome-wide significant association between SNPs and three transitions in smoking behavior in a Finnish twin family sample. The interpretation of our findings should be cautious before robust evidence of replication is obtained. Our results are valuable for guiding follow-up functional analyses, provide valuable clues into the etiology of smoking behavior, and encourage further studies utilizing time-to-event phenotypes in addictive behavior.

Supporting Information
Additional supporting information may be found online in the supporting information tab for this article:  and birth year, as well as positive and negative sensation scores) (k = 1.089). Figure S3. Regional plot of the 19q13.33 locus rs73050610 identified for smoking initiation (data from analysis adjusted for sex and birth year, as well as positive and negative sensation scores). Figure S4. (A) Manhattan and (B) Q--Q plots for the genome-wide time-to-event analysis of persistent smoking (adjusted for sex) (k = 1.002). Figure S5. (A) Manhattan and (B) Q-Q plots for the genome-wide time-to-event analysis of tolerance (adjusted for sex) (k = 1.027). Figure S6. Regional plot of the 11p13 locus rs11031684 identified in the genome-wide time-to-event analysis of tolerance (data from analysis adjusted for sex). Figure S7. (A) Manhattan and (B) Q-Q plots for genome-wide time-to-event analysis of tolerance (adjusted for sex, age of daily smoking, CPD, and max CPD) (k = 1.078). Figure S8. Regional plot of the 9q34.12 locus rs2304808 identified in the genome-wide time-to-event analysis of tolerance (data from analysis adjusted for sex, age of daily smoking, CPD, and max CPD). Figure S9. Manhattan and Q-Q plots for the genomewide time-to-event analysis of cessation (adjusted for sex) (k = 1.001). Figure S10. Manhattan and Q-Q plots for the genomewide time-to-event analysis of cessation (adjusted for sex, FTND, and DSM-IV nicotine withdrawal) (k = 1.011). Figure S11. Regional plot of the 10p14 locus rs72779075 identified in the genome-wide time-to-event analysis of cessation (data from analysis adjusted for sex). Table S1. Description of the NAG-OZALC sample used for replication. Table S2. Results of the replication study for SNPs located in TRPM4 and SLC6A16 (with AE50 kb flanking regions). For comparison, corresponding results from the Finnish discovery sample are also shown. Table S3. List of SNPs in smoking-related genes reported from previous GWAS that show nominal association (P < 0.05) with the transitions. Appendix S1. Details of the mediation analyses.