Using Mendelian randomization to explore the gateway hypothesis: possible causal effects of smoking initiation and alcohol consumption on substance use outcomes

Abstract
Background and Aims: Initial use of drugs such as tobacco and alcohol may lead to subsequent more problematic drug use (the ‘gateway’ hypothesis). However, observed associations may be due to a shared underlying risk factor, such as trait impulsivity. We used bidirectional Mendelian randomization (MR) to test the gateway hypothesis.
Design: Our main method was inverse-variance weighted (IVW) MR, with other methods included as sensitivity analyses (where consistent results across methods would raise confidence in our primary results). MR is a genetic instrumental variable approach used to support stronger causal inference in observational studies.
Setting and participants: Genome-wide association summary data among European ancestry individuals for smoking initiation, alcoholic drinks per week, cannabis use and dependence, cocaine and opioid dependence (n = 1749–1 232 091).
Measurements: Genetic variants for exposure.
Findings: We found evidence of causal effects from smoking initiation to increased drinks per week [IVW: β = 0.06; 95% confidence interval (CI) = 0.03–0.09; P = 9.44 × 10^−06], cannabis use [IVW: odds ratio (OR) = 1.34; 95% CI = 1.24–1.44; P = 1.95 × 10^−14] and cannabis dependence (IVW: OR = 1.68; 95% CI = 1.12–2.51; P = 0.01). We also found evidence of an effect of cannabis use on the increased likelihood of smoking initiation (IVW: OR = 1.39; 95% CI = 1.08–1.80; P = 0.01). We did not find evidence of an effect of drinks per week on other substance use outcomes, except weak evidence of an effect on cannabis use (IVW: OR = 0.55; 95% CI = 0.16–1.93; P-value = 0.35). We found weak evidence of an effect of opioid dependence on increased drinks per week (IVW: β = 0.002; 95% CI = 0.0005–0.003; P = 8.61 × 10^−03).
Conclusions: Bidirectional Mendelian randomization testing of the gateway hypothesis reveals that smoking initiation may lead to increased alcohol consumption, cannabis use and cannabis dependence. Cannabis use may also lead to smoking initiation, and opioid dependence may lead to alcohol consumption. However, given that tobacco and alcohol use typically begin before other drug use, these results may reflect a shared risk factor or a bidirectional effect for cannabis use and opioid dependence.


INTRODUCTION
Illicit substance use and substance use disorders result in a substantial global burden on a range of health conditions [1,2]. Identifying causal risk factors in the development of problematic substance use is important for designing successful interventions and preventing subsequent health problems.
The gateway hypothesis, in its simplest form, is the theory that initial use of legal 'gateway' drugs, including tobacco and alcohol, may lead to illicit drug use such as cannabis, cocaine and opioids [3][4][5].
While these studies may support the gateway hypothesis, it is equally plausible that there are underlying shared risk factors; for example, risk-taking or impulsive behaviours. Previous studies have reported an association of attention deficit hyperactivity disorder (ADHD) with substance use outcomes [27,28] and of ADHD genetic risk with smoking initiation [29,30], supporting impulsivity as a potential shared risk factor, although others (such as risk-taking or adverse childhood experiences) could also lead to these outcomes. In terms of establishing whether the relationships between smoking and alcohol and other substance use are causal, there is some evidence (e.g. from randomized controlled trials) that smoking cessation may result in reduced substance use or abstinence [31], supporting a possible causal effect of smoking on substance use outcomes.
Mendelian randomization (MR) is a well-established method for causal inference based on instrumental variable (IV) analysis, which attempts to overcome issues of residual confounding and reverse causation [32][33][34][35]. MR uses genetic variants, assigned randomly at conception, as IVs for an exposure to estimate the causal relationship with an outcome. In two-sample MR [36] the single nucleotide polymorphism (SNP)-exposure and SNP-outcome estimates are obtained from independent-sample genome-wide association studies (GWAS) to estimate possible causal effects. Previous MR studies examining this relationship examined cannabis use only, and used smaller GWAS sample sizes than in the current study. One study found weak evidence of a causal effect of smoking initiation on cannabis use [37], while the other found no evidence [38]. Incorporating larger GWAS and a range of substance use outcomes may improve power to detect causal effects and provide clearer evidence as to whether or not these relationships are due to a gateway effect.
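As a minimal sketch of the estimator underlying two-sample MR (the notation here is ours and is an assumption, not taken from the cited studies): for a single SNP j with SNP-exposure estimate \hat{\beta}_{Xj} and SNP-outcome estimate \hat{\beta}_{Yj} (standard error \sigma_{Yj}), the ratio (Wald) estimate of the causal effect and its first-order standard error can be written as

```latex
\hat{\theta}_j = \frac{\hat{\beta}_{Yj}}{\hat{\beta}_{Xj}},
\qquad
\operatorname{se}\bigl(\hat{\theta}_j\bigr) \approx \frac{\sigma_{Yj}}{\lvert \hat{\beta}_{Xj} \rvert}
```

The standard error approximation ignores uncertainty in \hat{\beta}_{Xj}, so it is reasonable only when the SNP is a strong instrument for the exposure.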
We applied this two-sample MR approach to investigate the possible causal effect between both smoking initiation and alcohol consumption (defined as drinks per week) and substance use outcomes of cannabis use and dependence, cocaine dependence and opioid dependence. We refer to these outcomes as 'illicit substance use', although we acknowledge that cannabis is not illegal in all jurisdictions. We also examined the association between smoking initiation and alcohol consumption. We used a bidirectional approach (Fig. 1) to assess whether there is evidence supporting the gateway hypothesis (i.e. that smoking initiation/alcohol consumption can lead to use of other substances and dependence) or whether there is evidence of a shared risk factor. Some pathways (e.g. from opioid use to smoking initiation) are unlikely, so analyses in this direction acted more as a sensitivity analysis, which could help to identify a shared risk factor rather than a causal effect.
FIGURE 1 Bidirectional two-sample Mendelian randomization between smoking initiation/alcohol consumption and illicit substance use outcomes. A directed acyclic graph (DAG) for the causal effect between smoking initiation/alcohol consumption and illicit substance use outcomes. Evidence of a causal effect in the other direction may indicate a bidirectional effect or a common underlying risk factor.

Data sources
We used GWAS summary statistics obtained from several consortia and other samples, the details of which are shown in Table 1, together with the variance explained by genome-wide significant SNPs and SNP heritabilities where these were reported. GWAS were conducted in samples of European ancestry. Sample overlap should be avoided or reduced, so as not to bias the estimates towards a more conservative effect estimate [39]. Therefore, we used GWAS with certain samples excluded from the consortia (see Table 1).

Smoking initiation
The smoking initiation GWAS [23] identified 378 conditionally independent genome-wide significant SNPs associated with ever being a smoker, i.e. where participants reported ever being a regular smoker in their life. See Supporting information for further details. The total sample size was 1 232 091 for the GWAS and Sequencing Consortium of Alcohol and Nicotine use (GSCAN) consortium; however, the sample size for the GWAS in each of our analyses varied to try to avoid sample overlap (see Table 1). Full genome-wide summary statistics were only publicly available without 23andMe. We requested 23andMe summary statistics separately and meta-analysed them with the publicly available data to recreate the original full GWAS summary statistics. The meta-analysis was conducted using the genome-wide association meta-analysis (GWAMA) software [44].
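The per-SNP combination step of such a meta-analysis can be sketched as follows. This is an illustration assuming a fixed-effect inverse-variance model and effect estimates already aligned to the same effect allele; it is not GWAMA's implementation, and the function name and the numbers are hypothetical.

```python
import math

def fixed_effect_meta(estimates):
    """Combine (beta, se) pairs for one SNP by inverse-variance weighting.

    Assumes every beta refers to the same effect allele.
    """
    weights = [1.0 / se ** 2 for _, se in estimates]
    total_weight = sum(weights)
    beta = sum(w * b for w, (b, _) in zip(weights, estimates)) / total_weight
    se = math.sqrt(1.0 / total_weight)
    return beta, se

# Hypothetical estimates for one SNP from two sets of summary statistics
public_stats = (0.020, 0.004)      # (beta, standard error)
additional_stats = (0.026, 0.006)

meta_beta, meta_se = fixed_effect_meta([public_stats, additional_stats])
print(f"meta-analysed beta = {meta_beta:.4f}, se = {meta_se:.4f}")
```

The pooled standard error is always smaller than that of either input, which is why recombining the two sets of summary statistics recovers the precision of the full GWAS.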

Drinks per week
The drinks per week GWAS [23] identified 99 independent genome-wide significant SNPs associated with the average number of alcoholic drinks consumed per week. See Supporting information for further details.

Cannabis use
The cannabis use GWAS [40] identified eight independent genome-wide significant SNPs associated with ever using cannabis. See Supporting information for further details.

Cannabis dependence
The cannabis dependence GWAS [41] did not identify any genome-wide significant SNPs associated with cannabis dependence. Cases were established based on meeting three or more criteria for Diagnostic and Statistical Manual of Mental Disorders, 4th edition (DSM-IV) cannabis dependence.

Cocaine dependence
The cocaine dependence GWAS [42] identified one genome-wide significant SNP associated with cocaine dependence. All participants were interviewed using the semi-structured assessment for drug dependence and alcoholism (SSADA) and cocaine-dependent cases were established based on responses according to the DSM-IV criteria and reflect life-time cocaine dependence.

Opioid dependence
The opioid dependence GWAS [43] did not identify any genome-wide significant SNPs associated with opioid dependence. All participants were interviewed using the SSADA and opioid-dependent cases were

Statistical analyses
MR was used to assess whether relationships may be causal by using genetic variants as IV proxies for the exposures. Further details can be found in the Supporting information. Two-sample MR was conducted in R (version 4.0.0) [45] using the TwoSampleMR package (version 0.5.3) [46,47]. Genome-wide significant SNPs were selected as instruments for the smoking initiation, alcohol and cannabis use exposures. However, where cocaine, opioid and cannabis dependence were the exposures, there were either too few or no genome-wide significant SNPs, so we used a less stringent threshold of 1 × 10^−05.
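The threshold logic described above might be sketched as follows. The genome-wide significance value of 5 × 10^−08 is the conventional one and is an assumption here, as are the `min_instruments` cut-off, the function names and the example SNPs; the per-SNP F-statistic uses the common approximation (beta/se)^2. Clumping for independence is not shown.

```python
GENOME_WIDE = 5e-8   # conventional threshold (assumption; not stated in the text)
RELAXED = 1e-5       # less stringent threshold described in the text

def select_instruments(snps, min_instruments=2):
    """Return (instruments, threshold used).

    Falls back to the relaxed threshold when too few SNPs reach
    genome-wide significance. `min_instruments` is a hypothetical cut-off.
    """
    strict = [s for s in snps if s["p"] < GENOME_WIDE]
    if len(strict) >= min_instruments:
        return strict, GENOME_WIDE
    relaxed = [s for s in snps if s["p"] < RELAXED]
    return relaxed, RELAXED

def mean_f_statistic(instruments):
    """Mean of per-SNP F-statistics, approximated as (beta / se) ** 2."""
    return sum((s["beta"] / s["se"]) ** 2 for s in instruments) / len(instruments)

# Hypothetical SNP-exposure summary statistics
snps = [
    {"rsid": "rsA", "beta": 0.060, "se": 0.010, "p": 1e-9},
    {"rsid": "rsB", "beta": 0.045, "se": 0.010, "p": 3e-6},
    {"rsid": "rsC", "beta": 0.020, "se": 0.010, "p": 2e-4},
]

instruments, threshold = select_instruments(snps)
print([s["rsid"] for s in instruments], threshold, mean_f_statistic(instruments))
```

Relaxing the threshold admits weaker instruments, so checking instrument strength matters more for the exposures selected in this way.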
Multiple MR methods were used to assess the causal effects of: (i) the exposure of smoking initiation/alcohol consumption on illicit substance use outcomes and (ii) illicit substance use exposures on smoking initiation/alcohol consumption. These were inverse-variance weighted (IVW) [48], MR-Egger [49], weighted median [50], simple mode and weighted mode [51] MR methods. We were interested in the question of whether there is evidence of causal effects. We were concerned with the strength of evidence for an effect, as opposed to the effect estimate, and considered whether the direction of effect was as predicted and the strength of statistical evidence against the null. To do this we interpreted the P-value as a continuous measure of statistical evidence [52] and considered whether our results were consistent across different MR approaches. The IVW approach was our main method, with the others being sensitivity analyses which make different assumptions. We describe our findings in terms of lack of evidence, weak evidence, evidence or strong evidence of an effect, accounting for all these factors. The sensitivity methods have less statistical power than the IVW approach; therefore, we considered all results and the consistency of the direction of effect observed among the methods.
TABLE 1 GWAS used for two-sample Mendelian randomization.
We also estimated the mean F-statistic, unweighted and weighted
Finally, we conducted multivariable MR (MVMR) to investigate whether the causal effect of smoking initiation was independent of that for the drinks per week exposure for any illicit substance use outcomes where both exposures were associated with the outcome. MVMR is an extension of MR that estimates the causal effect of multiple exposures on an outcome and assesses whether each exposure is independent of the others [54]. Please note that our analyses were not pre-registered, and therefore our results should be considered exploratory.
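To make the main method and one sensitivity method concrete, the sketch below implements a fixed-effect IVW estimate (with Cochran's Q for heterogeneity) and the MR-Egger point estimates from summary statistics. It is an illustration under simplifying assumptions, not the TwoSampleMR code used in the analyses: packaged implementations may differ in details such as how standard errors are scaled, and the input values are hypothetical.

```python
import math

def ivw(bx, by, se_y):
    """Fixed-effect IVW: weighted regression of by on bx through the origin."""
    w = [1.0 / s ** 2 for s in se_y]
    sxx = sum(wi * x * x for wi, x in zip(w, bx))
    sxy = sum(wi * x * y for wi, x, y in zip(w, bx, by))
    theta = sxy / sxx
    se = math.sqrt(1.0 / sxx)
    # Cochran's Q: weighted squared deviations of SNP-outcome estimates from the fit
    q = sum(wi * (y - theta * x) ** 2 for wi, x, y in zip(w, bx, by))
    return theta, se, q

def mr_egger(bx, by, se_y):
    """MR-Egger point estimates: weighted regression with an intercept.

    SNPs are first oriented so that every SNP-exposure effect is positive.
    A non-zero intercept suggests directional pleiotropy.
    """
    signs = [1.0 if x >= 0 else -1.0 for x in bx]
    x = [s * xi for s, xi in zip(signs, bx)]
    y = [s * yi for s, yi in zip(signs, by)]
    w = [1.0 / s ** 2 for s in se_y]
    sw = sum(w)
    mean_x = sum(wi * xi for wi, xi in zip(w, x)) / sw
    mean_y = sum(wi * yi for wi, yi in zip(w, y)) / sw
    sxx = sum(wi * (xi - mean_x) ** 2 for wi, xi in zip(w, x))
    sxy = sum(wi * (xi - mean_x) * (yi - mean_y) for wi, xi, yi in zip(w, x, y))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return intercept, slope

# Hypothetical summary statistics built so that, after orientation,
# by = 0.001 + 0.5 * bx (a causal slope of 0.5 plus a pleiotropic offset)
bx = [0.02, -0.03, 0.05, 0.04]
by = [0.011, -0.016, 0.026, 0.021]
se_y = [0.004, 0.005, 0.006, 0.005]

theta, theta_se, q_stat = ivw(bx, by, se_y)
egger_intercept, egger_slope = mr_egger(bx, by, se_y)
print(theta, theta_se, q_stat, egger_intercept, egger_slope)
```

In this constructed example the IVW estimate is pulled above 0.5 by the offset, whereas MR-Egger attributes the offset to the intercept; this is the sense in which the sensitivity methods rely on different assumptions from IVW.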

RESULTS
Evidence of causal effects of smoking initiation on illicit substance use outcomes
Our two-sample MR results (Supporting information, Table S2 and  analyses (see also Supporting information, Fig. S1). We observed evidence of heterogeneity in results for the IVW method (see also Supporting information, Fig. S2), but this was not necessarily indicative of horizontal pleiotropy (see also Supporting information, Fig. S3).
Leave-one-out analyses did not reveal that any single SNP was driving the association (Supporting information, Fig. S4).
We also found evidence of a causal effect of smoking initiation on cannabis use (IVW: OR = 1.34; 95% CI = 1.24-1.44; P-value = 1.95 × 10^−14). Results were in a consistent direction among MR analyses (see also Supporting information, Fig. S5), although evidence for this was only found additionally for the weighted median method. There was evidence of heterogeneity with both the IVW and MR-Egger methods (see also Supporting information, Fig. S6) but not horizontal pleiotropy (see also Supporting information, Fig. S7). Leave-one-out analyses did not reveal that any single SNP was driving the association (Supporting information, Fig. S8).
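For readers who want to relate a reported odds ratio and confidence interval to the underlying log-odds estimate, the sketch below back-calculates the estimate, its standard error and a two-sided P-value under a normal approximation. The function name is hypothetical, and because the reported OR and CI are rounded, the recovered P-value only approximates the one reported above.

```python
import math

def log_or_summary(odds_ratio, ci_low, ci_high, z_crit=1.959964):
    """Recover log-OR, standard error, z and two-sided P from an OR and its 95% CI."""
    beta = math.log(odds_ratio)
    se = (math.log(ci_high) - math.log(ci_low)) / (2.0 * z_crit)
    z = beta / se
    p = math.erfc(abs(z) / math.sqrt(2.0))  # two-sided normal tail probability
    return beta, se, z, p

# Reported IVW result for smoking initiation on cannabis use
beta, se, z, p = log_or_summary(1.34, 1.24, 1.44)
print(f"log-OR = {beta:.3f}, se = {se:.3f}, z = {z:.2f}, P = {p:.2e}")
```

The recovered P-value is of the same order of magnitude as the reported one, which is a quick internal consistency check on the estimate, interval and P-value.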
FIGURE 2 Forest plot for two-sample Mendelian randomization with smoking initiation as the exposure. Causal effects from the inverse-variance weighted Mendelian randomization method where smoking initiation is the exposure. Effect estimates are presented as beta or odds ratios (OR) depending on whether the outcome was continuous or binary, with 95% confidence intervals (CI). SNP = single nucleotide polymorphism.
We found evidence of a causal effect of smoking initiation on cannabis dependence (IVW: OR = 1.68; 95% CI = 1.12-2.51; P-value = 0.01). Results were in a consistent direction for the SIMEX-adjusted MR-Egger and weighted median methods (Supporting information, Table S1), although evidence for these was weak (see also Supporting information, Fig. S9). There was no evidence of heterogeneity or horizontal pleiotropy (see also Supporting information, Figs S10 and S11). Leave-one-out analyses did not reveal that any single SNP was driving the association (Supporting information, Fig. S12).
Finally, we did not find evidence of a causal effect of smoking

Causal effects of illicit substance use exposures on smoking initiation
For the direction of illicit substance use to smoking initiation (Supporting information, Table S3

DISCUSSION
We examined whether there was evidence for causal effects of smoking initiation and alcohol consumption on cannabis use and dependence on cannabis, cocaine and opioids, which may support the 'gateway' hypothesis. We also examined the reverse direction, where evidence of an association, particularly in both directions, may be indicative of an underlying common risk factor.
Our main findings were those for cannabis use and dependence, which suggest that ever smoking may act as a gateway to subsequent cannabis use and perhaps even dependence, although evidence was weaker for the latter. This supports previous observational studies demonstrating an association between these phenotypes [7,8,55], and is in line with previous findings suggesting that tobacco is a gateway drug to other more problematic substance use [5,6,8,10]. Our MR analyses support stronger causal inference, although further triangulation with other study designs would strengthen this. Previous literature also suggests that alcohol consumption may be causally associated with cannabis use; however, our MVMR results suggest no evidence for independent effects of alcohol consumption, only evidence for a causal effect of smoking initiation on cannabis use.
We also found evidence for a potential causal pathway from cannabis use to smoking initiation. It has been previously suggested that cannabis use may act as gateway to tobacco use, possibly due to the form in which cannabis is used, i.e. if smoked with tobacco [56].
However, our finding of potential causal pathways between cannabis use and smoking initiation in both directions may suggest that this association is due to an underlying common risk factor, as opposed to either being a gateway drug. We found that all the SNPs used in the and this has been discussed previously in relation to mental health behavioural risk factors [57,58]. Previous studies have suggested that impulsive or risk-taking behaviours may be associated with smoking initiation and substance use [59][60][61]. Therefore, the mechanisms behind these associations need to be examined further, and the possibility of a bidirectional relationship should also be considered.
We found a potential causal effect of smoking initiation on increased drinks per week, but did not find an association in the reverse direction. It is plausible that an underlying risk-taking behaviour may affect alcohol consumption via smoking. However, a biological mechanism behind this association should also be considered and studied further. Finally, we saw weak evidence of a causal effect of opioid dependence on increased drinks per week; however, due to the low power of the opioid dependence GWAS and the small effect size, we would interpret this with caution. Opioid dependence (compared with ever use) is less likely to be explained by underlying risk-taking behaviour.
Therefore, research into alternative shared risk factors is warranted. It may be the case that opioid dependence has a causal effect on increased alcohol use, and this also warrants further investigation.

Limitations
Our study is the first, to our knowledge, to examine whether causal pathways may exist between smoking initiation/alcohol con-
We also found some evidence of heterogeneity and horizontal pleiotropy for different analyses, so these results should be interpreted with caution, as some of the SNPs used may be associated with the outcome other than via the exposure. However, the additional MR analyses, which account for this, were generally in the same direction as our main results, although we were unable to formally test for directional pleiotropy in some cases where the I-squared estimate was low. In cases where the IVW shows evidence for a causal effect but results are inconsistent across the sensitivity analyses, this may be indicative of pleiotropy. However, inconsistent effects across sensitivity analyses combined with no evidence from the IVW are more likely to reflect a lack of evidence for an effect.
Another consideration is that the MR instruments used may not be valid for smoking, as they may be picking up risk-taking behaviours more than smoking itself [62]. Therefore, it would be useful to examine this further with other smoking-related phenotypes, such as smoking heaviness. Additionally, while we tried to avoid sample overlap, there was still some for the cannabis use GWAS (17% of the sample was also present in the GWAS for smoking initiation and drinks per week). Sample overlap could bias estimates towards a more conservative effect estimate [39], which should be considered when interpreting our results.
The MR analysis itself is subject to several limitations [33]. For example, the GWAS used for MR may suffer from 'winner's curse', where the SNP-exposure estimates are overestimated because SNPs with the smallest P-values are selected, thereby biasing the MR estimate towards the null. Thus, interpreting the direction of effect as opposed to the effect size itself is more valid here. The effect estimate may also be biased by trait heterogeneity; for example, different aspects of substance use behaviours may be associated with the same genetic variants and therefore it is difficult to gain a precise estimate for a single aspect of any substance use behaviour.
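The direction of the winner's curse bias can be illustrated with a small simulation. This is a deliberately simplified sketch: every SNP is given the same true SNP-exposure effect, selection uses a hypothetical z-score cut-off, the SNP-outcome effect is taken at its expected value from an independent sample, and all parameter values are invented for illustration.

```python
import random
import statistics

def simulate_winners_curse(true_bx=0.02, se=0.01, theta=0.5,
                           n_snps=20000, z_cut=3.0, seed=1):
    """Return (mean selected SNP-exposure estimate, implied causal estimate)."""
    rng = random.Random(seed)
    selected = []
    for _ in range(n_snps):
        bx_hat = rng.gauss(true_bx, se)      # noisy discovery estimate
        if abs(bx_hat / se) > z_cut:         # keep only 'significant' SNPs
            selected.append(bx_hat)
    mean_selected_bx = statistics.mean(selected)
    by_expected = theta * true_bx            # outcome sample is independent of selection
    implied_theta = by_expected / mean_selected_bx
    return mean_selected_bx, implied_theta

mean_bx, implied_theta = simulate_winners_curse()
print(f"true bx = 0.02, mean selected bx = {mean_bx:.4f}")
print(f"true theta = 0.5, implied theta = {implied_theta:.3f}")
```

Because selection favours SNPs whose exposure effects were overestimated by chance, the denominator of the ratio estimate is inflated and the implied causal estimate is pulled towards zero, which is why the direction of effect is more trustworthy than its magnitude.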
Finally, our results should be considered in the context of the multiple potential causal pathways that we have investigated.

CONCLUSION
While, to some extent, our findings support the gateway hypothesis, they also point to a potential underlying common risk factor. With better-powered GWAS, more precise instruments and additional research, we may be able to interrogate this further.
Triangulating our results with other approaches would help to answer this question [63,64]. By so doing we may be able to identify risk factors for substance use, which could ultimately help with intervention design.