Exploring the cross‐cancer effect of smoking and its fingerprints in blood DNA methylation on multiple cancers: A Mendelian randomization study

Abstract Aberrant smoking-related DNA methylation has been widely investigated as a carcinogenic mechanism, but whether cross-cancer epigenetic pathways exist remains unclear. We conducted two-sample Mendelian randomization (MR) analyses of smoking behaviors (age of smoking initiation, smoking initiation, smoking cessation and lifetime smoking index [LSI]) and of smoking-related DNA methylation to investigate their effects on 15 site-specific cancers. The analyses drew on a genome-wide association study (GWAS) of 1.2 million European individuals and an epigenome-wide association study (EWAS) of 5907 blood samples from Europeans for smoking, and on 15 GWASs of European ancestry for the site-specific cancers. Significant CpG sites were further examined by colocalization analysis, and those with cross-cancer effects were validated by overlap with tissue-specific eQTLs. In the genomic MR, smoking initiation, smoking cessation and LSI were suggested to be causally associated with the risk of seven site-specific cancers, with strong evidence for cancers of the lung, cervix and colorectum. In the epigenetic MR, methylation at 75 CpG sites was significantly associated with increased risks of multiple cancers. Eight of the 75 CpG sites showed cross-cancer effects; among them, cg06639488 (EFNA1), cg12101586 (CYP1A1) and cg14142171 (HLA-L) were validated by eQTLs at specific cancer sites, and cg07932199 (ATXN2) had strong evidence of association with cancers of the lung (coefficient, 0.65; 95% confidence interval [CI], 0.31-1.00), colorectum (0.90 [0.61, 1.18]), breast (0.31 [0.20, 0.43]) and endometrium (0.98 [0.68, 1.27]). These findings highlight potential strategies targeting cross-cancer pathways that involve DNA methylation.


| INTRODUCTION
Smoking has been widely recognized as a risk factor for numerous diseases, including cancer. Observational and experimental studies have confirmed the causal relationship between smoking and the risk of lung cancer, 1,2 and other common cancers such as breast cancer, 3 prostate cancer, 4 ovarian cancer and cervical cancer have also been considered potential consequences of smoking. 5 Studies have found that smoking can induce aberrant DNA methylation, 6 which is considered a potential mechanism triggering carcinogenesis; more broadly, epigenetic modification through DNA methylation has long been investigated as one of the mechanisms of carcinogenesis. 7 DNA methylation is classically characterized as the formation of 5-methylcytosine at the C5 position of cytosine in cytosine-phosphate-guanine (CpG) dinucleotides. This can hinder the binding of transcription complexes to DNA and cause unprogrammed alterations in downstream gene expression, 8 such as hypomethylation activating proto-oncogenes and promoter-region hypermethylation silencing tumor-suppressor genes 9 in carcinoma tissues. A number of studies have investigated the association between smoking and individual site-specific cancers via differential methylation, 10-12 suggesting a key role of methylation in carcinogenesis. However, whether this epigenetic effect is exerted universally across multiple cancers remains unknown, and whether methylation mechanisms are shared through specific CpG sites or common pathways is worth exploring.
Mendelian randomization (MR) analysis uses genetic variants, for example, single nucleotide polymorphisms (SNPs) or quantitative trait loci (QTLs), as proxies for a risk factor of interest to explore the causal relationship between an exposure and a disease. 13 Because alleles are randomly allocated at conception, this design minimizes the influence of unmeasured confounding and reduces bias from reverse causality.
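The intuition that a genetic instrument sidesteps unmeasured confounding can be illustrated with a small simulation. The sketch below is not part of this study: the data-generating model, the effect sizes (true causal effect 0.3) and the sample size are arbitrary assumptions. It compares an ordinary regression of outcome on exposure with the ratio (Wald) estimate, which divides the SNP-outcome association by the SNP-exposure association.

```python
import random


def simulate(n=100_000, beta_xy=0.3, seed=1):
    """Simulate a SNP G, an exposure X and an outcome Y sharing a hidden confounder U.

    Hypothetical data-generating model (illustration only):
        G ~ Binomial(2, 0.3)        allele count of one SNP
        X = 0.5*G + U + noise       exposure
        Y = beta_xy*X + U + noise   outcome; U affects both X and Y
    """
    rng = random.Random(seed)
    g, x, y = [], [], []
    for _ in range(n):
        gi = int(rng.random() < 0.3) + int(rng.random() < 0.3)
        u = rng.gauss(0, 1)
        xi = 0.5 * gi + u + rng.gauss(0, 1)
        yi = beta_xy * xi + u + rng.gauss(0, 1)
        g.append(gi)
        x.append(xi)
        y.append(yi)
    return g, x, y


def cov(a, b):
    """Sample covariance of two equal-length sequences."""
    ma, mb = sum(a) / len(a), sum(b) / len(b)
    return sum((ai - ma) * (bi - mb) for ai, bi in zip(a, b)) / (len(a) - 1)


g, x, y = simulate()
naive = cov(x, y) / cov(x, x)  # ordinary regression slope of Y on X, biased by U
wald = cov(g, y) / cov(g, x)   # ratio estimate: (G-Y association) / (G-X association)
print(f"naive regression: {naive:.2f}  MR ratio estimate: {wald:.2f}  true effect: 0.30")
```

Under this model the naive slope tends towards roughly 0.78 because U raises both X and Y, whereas the ratio estimate stays close to 0.3. The ratio is only valid if G is associated with X, is independent of U, and affects Y solely through X, which are the standard instrumental-variable assumptions.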
In our study, we sequentially conducted two two-sample MR analyses, using SNPs as genetic instrumental variables (IVs) and methylation QTLs (mQTLs) as epigenetic proxies for methylation at CpG sites, to explore the causal effect of smoking on the risk of multiple site-specific cancers. We further assessed the cross-cancer effects of smoking-related blood methylation and validated their tissue specificity with expression QTLs (eQTLs).

| Study design
In our study, we first conducted a two-sample MR analysis to investigate the causal effect of smoking on genetic predisposition to 15 site-specific cancers, choosing three phenotypes of smoking behavior (age of initiation, smoking initiation and smoking cessation) and an aggregate lifetime smoking index (LSI) as measurements of smoking. We then performed a second two-sample MR analysis to examine this causality at the epigenome-wide level, using mQTLs as IVs for smoking-related blood DNA methylation at CpG sites, and further focused on CpG sites that had a cross-cancer effect and were validated with tissue-specific eQTLs. For the significant CpG sites, we also conducted a colocalization analysis to investigate whether shared variants affect both DNA methylation and cancer susceptibility (Figure 1).

| Genome-wide association study summary-level data of smoking behaviors
The IVs for the four smoking measurements were extracted from two genome-wide association studies (GWASs). First, SNPs associated with age of smoking initiation (10 variants, N = 341 427), smoking initiation (378 variants, N = 1 232 091) and smoking cessation (24 variants, N = 547 219) at the genome-wide significance threshold (P < 5 × 10⁻⁸) were obtained as genetic instruments from a published GWAS that identified variants associated with different aspects of smoking (initiation, cessation and heaviness) in a total of 1 232 091 individuals of European ancestry. 14 In our study, smoking initiation and smoking cessation are both binary phenotypes: for smoking initiation, current or former smokers were coded as "2" and never smokers as "1"; for smoking cessation, current smokers were coded as "2" and former smokers as "1". We then derived genetic variants for LSI, an aggregate indicator of smoking, from a GWAS of 462 690 individuals of European ancestry (126 independent genome-wide significant SNPs). 15 Linkage disequilibrium (LD) was calculated based on the 1000 Genomes European reference panel, and only independent variants (r² ≤ 0.01 within a clumping window of 10 000 kb) were retained (Tables S1 and S2).
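The instrument-selection step (genome-wide significance followed by LD clumping at r² ≤ 0.01 within 10 000 kb) follows a greedy logic that can be sketched as below. This is a simplified illustration, not the clumping routine the authors used (in practice clumping is typically run against an LD reference panel with tools such as PLINK); the SNP identifiers, positions, P-values and r² values are hypothetical.

```python
def clump(snps, ld_r2, p_threshold=5e-8, r2_threshold=0.01, window_kb=10_000):
    """Greedy LD clumping: keep the most significant SNP, drop SNPs in LD with it.

    snps:  list of dicts with keys "id", "chrom", "pos_bp", "p".
    ld_r2: dict mapping frozenset({id_a, id_b}) -> r^2 from an LD reference panel;
           pairs absent from the dict are treated as independent (r^2 = 0).
    """
    candidates = sorted((s for s in snps if s["p"] < p_threshold), key=lambda s: s["p"])
    kept = []
    for snp in candidates:
        # A SNP is dropped if it is close to, and correlated with, an already-kept SNP.
        in_ld = any(
            k["chrom"] == snp["chrom"]
            and abs(k["pos_bp"] - snp["pos_bp"]) <= window_kb * 1_000
            and ld_r2.get(frozenset((k["id"], snp["id"])), 0.0) > r2_threshold
            for k in kept
        )
        if not in_ld:
            kept.append(snp)
    return [s["id"] for s in kept]


# Hypothetical summary statistics and LD values, for illustration only.
snps = [
    {"id": "snp1", "chrom": 1, "pos_bp": 1_000_000, "p": 1e-12},
    {"id": "snp2", "chrom": 1, "pos_bp": 1_050_000, "p": 3e-9},   # r^2 = 0.40 with snp1
    {"id": "snp3", "chrom": 1, "pos_bp": 3_000_000, "p": 2e-8},   # r^2 = 0.005 with snp1
    {"id": "snp4", "chrom": 2, "pos_bp": 500_000, "p": 4e-10},
    {"id": "snp5", "chrom": 2, "pos_bp": 900_000, "p": 1e-6},     # not genome-wide significant
]
ld = {frozenset(("snp1", "snp2")): 0.40, frozenset(("snp1", "snp3")): 0.005}
print(clump(snps, ld))  # ['snp1', 'snp4', 'snp3']
```

Here snp2 is removed because it is both near snp1 and correlated with it above the r² threshold, while snp3 survives because its r² with snp1 is below 0.01.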

| Epigenome-wide data of smoking-related DNA methylation
Information on the association between smoking and DNA methylation (ie, smoking-related CpG sites) was derived from an epigenome-wide meta-analysis of 5907 blood-derived DNA samples from participants in 16 cohorts of the Cohorts for Heart and Aging Research in Genomic Epidemiology (CHARGE) Consortium. 6 A total of 2623 CpG sites were differentially methylated between current smokers and never smokers (false discovery rate [FDR] < 0.05, P < 1 × 10⁻⁷). We then obtained CpG-associated mQTLs as genetic proxies from the Genetics of DNA Methylation Consortium (GoDMC) (http://mqtldb.godmc.org.uk/), an mQTL database containing genetic and methylation data from over 30 000 participants. 16
(2) colorectal cancer (CRC) GWAS data were acquired from a meta-

Figure 1. Study design. Two-sample Mendelian randomization analysis: Step 1, first MR, association of smoking behaviors with cancers (SNPs as genetic proxies); Step 2, secondary MR, association of DNA methylation (CpG sites) with cancers. Subsequent analyses: Step 3, cross-cancer overlaps of CpG sites and genes among multiple cancers; colocalization of shared variants driving both methylation at CpG sites and susceptibility to cancers; Step 5, eQTL validation, overlapping CpG-site-related mQTLs with expression QTLs (GTEx).

| Two-sample MR
We performed two two-sample MR analyses in succession, at the genome and epigenome levels. In the first MR analysis, genetic variants (SNPs) identified for the four smoking measurements were used as exposures to investigate their causal effects on the risk of multiple cancers. In the second, we used mQTLs as epigenetic proxies for methylation at smoking-associated CpG sites.
The effect allele of each mQTL was aligned so that its effect was in the same direction as the effect of smoking on DNA methylation. We used the Wald ratio to estimate the association when an exposure was proxied by only one SNP, and the random-effects inverse-variance weighted (IVW) method as the main method to estimate the combined effect for each exposure. Sensitivity analyses were additionally applied to assess the robustness of the results. MR-Egger regression and its intercept test were used to detect and correct for horizontal pleiotropy. 26 The weighted median method was used because it provides consistent estimates when at least 50% of the weight comes from valid IVs. 27 The MR-PRESSO method was also employed to detect horizontal pleiotropy (global test), correct for outliers by removing them (outlier test) and assess whether this correction significantly changed the estimate (distortion test). 28 Cochran's Q statistic was used to evaluate heterogeneity among genetic variants (P < 0.05 indicating heterogeneity). F-statistics were calculated to measure instrument strength (F < 10 indicating a weak instrument). 29 The beta coefficient was calculated per SD for each genetic instrument, and odds ratios (ORs) with 95% confidence intervals (CIs) were scaled to a one-SD increase in genetically predicted smoking and a one-unit increase in the log OR of liability to each cancer. Multiple testing was controlled with the false discovery rate (FDR < 0.05 considered significant). MR analyses were performed using the "TwoSampleMR" R package.
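The core summary-data estimators named above can be written in a few lines. The Python sketch below is an illustration under stated assumptions, not the TwoSampleMR code the authors ran: it assumes the SNP-exposure and SNP-outcome effects are already harmonized to the same effect allele, uses first-order standard errors, and assumes the random-effects IVW is the multiplicative form. It covers only the Wald ratio, IVW, Cochran's Q (with a chi-square P-value) and Benjamini-Hochberg FDR adjustment; MR-Egger, the weighted median and MR-PRESSO are not shown, and all numbers are hypothetical.

```python
import math


def wald_ratio(beta_exp, beta_out, se_out):
    """Wald ratio for a single instrument, with first-order standard error."""
    return beta_out / beta_exp, se_out / abs(beta_exp)


def ivw(beta_exp, beta_out, se_out):
    """IVW estimate with multiplicative random effects, plus Cochran's Q.

    Equivalent to a regression of beta_out on beta_exp through the origin
    with weights 1 / se_out**2 (first-order weights).
    """
    w = [bx ** 2 / so ** 2 for bx, so in zip(beta_exp, se_out)]
    ratios = [by / bx for bx, by in zip(beta_exp, beta_out)]
    est = sum(wi * ri for wi, ri in zip(w, ratios)) / sum(w)
    se_fixed = 1 / math.sqrt(sum(w))
    q = sum(wi * (ri - est) ** 2 for wi, ri in zip(w, ratios))
    df = len(w) - 1
    # Multiplicative random effects: inflate the SE by sqrt(Q/df) if Q/df > 1.
    se = se_fixed * math.sqrt(max(1.0, q / df))
    return est, se, q, df


def chi2_sf(x, df):
    """Upper-tail P-value of a chi-square statistic with integer df (closed form)."""
    y = x / 2.0
    a = 0.5 if df % 2 else 1.0
    p = math.erfc(math.sqrt(y)) if df % 2 else math.exp(-y)
    while a < df / 2.0:
        if y > 0:
            p += math.exp(a * math.log(y) - y - math.lgamma(a + 1))
        a += 1.0
    return p


def bh_fdr(pvals):
    """Benjamini-Hochberg FDR-adjusted P-values, returned in input order."""
    m = len(pvals)
    order = sorted(range(m), key=lambda i: pvals[i])
    adjusted = [0.0] * m
    running_min = 1.0
    for rank in range(m, 0, -1):  # from the largest P-value down to the smallest
        i = order[rank - 1]
        running_min = min(running_min, pvals[i] * m / rank)
        adjusted[i] = running_min
    return adjusted


# Hypothetical harmonized summary statistics for three SNPs.
bx = [0.10, 0.08, 0.12]
by = [0.030, 0.020, 0.040]
se_y = [0.010, 0.010, 0.012]
est, se, q, df = ivw(bx, by, se_y)
p_ivw = math.erfc(abs(est / se) / math.sqrt(2))  # two-sided normal P-value
print(f"IVW beta={est:.3f} (SE {se:.3f}), P={p_ivw:.2g}; Q={q:.2f}, P_Q={chi2_sf(q, df):.2f}")
print("Wald ratio, SNP 1:", wald_ratio(bx[0], by[0], se_y[0]))
print("BH-adjusted:", bh_fdr([0.01, 0.04, 0.03, 0.005]))
```

In this toy example Q/df is below 1, so the random-effects SE equals the fixed-effect SE; the inflation only applies when the SNP-specific ratios disagree more than their standard errors allow.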
For the CpG sites significantly associated with multiple cancers in the second MR analysis, we further identified those with cross-cancer associations. The mQTLs of these CpG sites were then searched in the Genotype-Tissue Expression (GTEx) resource 30 to investigate their expression effects as eQTLs in cancer-associated tissues. Expression evidence was assessed using both P-values (significance of the eQTL effect) and m-values (posterior probability that an eQTL effect exists in the specific tissue, from the cross-tissue meta-analysis), 31,32 and mQTLs with a Bonferroni-corrected P-value < 0.05/(number of SNP-gene pairs) and an m-value > 0.9 were considered to have a statistically significant eQTL effect.
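The two-part eQTL significance rule can be expressed as a simple filter. In this sketch the record fields and values are hypothetical, not GTEx output, and counting distinct SNP-gene pairs across all queried tissues for the Bonferroni denominator is an assumption about how the correction was applied.

```python
def significant_eqtls(records, alpha=0.05, m_threshold=0.9):
    """Keep eQTL records passing a Bonferroni-corrected P-value and an m-value cut-off.

    records: list of dicts with hypothetical keys "snp", "gene", "tissue", "p", "m".
    The Bonferroni denominator is the number of distinct SNP-gene pairs tested
    (assumption: pairs are counted across all tissues queried).
    """
    n_pairs = len({(r["snp"], r["gene"]) for r in records})
    p_threshold = alpha / n_pairs
    return [r for r in records if r["p"] < p_threshold and r["m"] > m_threshold]


# Hypothetical GTEx-style lookups for two mQTLs, for illustration only.
records = [
    {"snp": "snp_a", "gene": "GENE1", "tissue": "Lung", "p": 1e-5, "m": 0.95},
    {"snp": "snp_a", "gene": "GENE1", "tissue": "Breast", "p": 1e-5, "m": 0.50},
    {"snp": "snp_b", "gene": "GENE2", "tissue": "Lung", "p": 0.03, "m": 0.99},
]
for r in significant_eqtls(records):
    print(r["snp"], r["gene"], r["tissue"])  # snp_a GENE1 Lung
```

With two SNP-gene pairs the corrected threshold is 0.05/2 = 0.025, so snp_b fails on its P-value despite a high m-value, and the breast record for snp_a fails on its m-value.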

| Colocalization analysis
Among the CpG sites that were significantly associated with risk of multiple cancers, we additionally performed a colocalization analysis to investigate whether the susceptibility to site-specific cancers was driven by the same variants influencing methylation at the CpG sites.
A posterior probability of association (PPA) of 75% or higher for both the summary effect of the CpG site and the single effect of an mQTL was deemed evidence of colocalization. GWAS data for cancers and EWAS data for smoking-related methylation (with mQTLs as proxies) were the same as those used in the MR analysis. The colocalization analysis was performed using the "coloc" R package. 33 All analyses were undertaken with R software 4.2.1.
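Colocalization in the coloc framework compares five hypotheses about a genomic region using Wakefield approximate Bayes factors. The sketch below re-implements that calculation in Python for illustration; it is not the coloc package itself. The priors p1 = p2 = 10⁻⁴ and p12 = 10⁻⁵ and the prior effect SDs (0.15 for a standardized quantitative trait such as methylation, 0.2 for a case-control trait such as cancer) mirror the defaults in coloc's documentation, and reading the 75% criterion as a threshold on PP.H4 (one shared causal variant) is an assumption. The regional summary statistics are hypothetical.

```python
import math


def log_abf(beta, se, prior_sd):
    """Wakefield log approximate Bayes factor (association vs none) for one SNP."""
    r = prior_sd ** 2 / (prior_sd ** 2 + se ** 2)
    return 0.5 * (math.log(1 - r) + r * (beta / se) ** 2)


def logsumexp(values):
    """Numerically stable log(sum(exp(v) for v in values))."""
    top = max(values)
    return top + math.log(sum(math.exp(v - top) for v in values))


def coloc_posteriors(trait1, trait2, p1=1e-4, p2=1e-4, p12=1e-5, sd1=0.15, sd2=0.2):
    """Posterior probabilities of H0-H4 for two traits measured on the same SNPs.

    trait1, trait2: lists of (beta, se) aligned SNP by SNP (at least two SNPs).
    H0 no association; H1 trait 1 only; H2 trait 2 only;
    H3 both traits, distinct causal SNPs; H4 both traits, one shared causal SNP.
    """
    l1 = [log_abf(b, s, sd1) for b, s in trait1]
    l2 = [log_abf(b, s, sd2) for b, s in trait2]
    n = len(l1)
    log_h = [
        0.0,
        math.log(p1) + logsumexp(l1),
        math.log(p2) + logsumexp(l2),
        # H3 sums over ordered pairs of different SNPs (i != j).
        math.log(p1) + math.log(p2)
        + logsumexp([l1[i] + l2[j] for i in range(n) for j in range(n) if i != j]),
        math.log(p12) + logsumexp([a + b for a, b in zip(l1, l2)]),
    ]
    total = logsumexp(log_h)
    return [math.exp(v - total) for v in log_h]


# Hypothetical region of four SNPs; the third SNP drives both traits.
methylation = [(0.01, 0.05), (0.02, 0.05), (0.50, 0.05), (0.00, 0.05)]
cancer = [(0.01, 0.04), (0.00, 0.04), (0.30, 0.04), (0.02, 0.04)]
pp = coloc_posteriors(methylation, cancer)
print("PP.H0-H4:", [round(v, 3) for v in pp])
print("colocalized (PP.H4 >= 0.75):", pp[4] >= 0.75)
```

Summing H3 directly over pairs of distinct SNPs is mathematically the same as coloc's subtraction of the shared-SNP term from the product of the two sums, and it avoids a cancellation error when one SNP dominates both traits.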

| MR analysis of smoking behaviors and multiple cancers
Genetic variants for the three smoking behaviors (age of smoking initiation, smoking initiation and smoking cessation) and LSI are shown in Table S2; the F-statistic for each IV was above 10, suggesting no substantial weak-instrument bias.
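A common approximation of the per-SNP F-statistic is the squared ratio of the SNP-exposure effect to its standard error. The text does not state which formula was used, so the sketch below is an assumption, and the summary statistics are hypothetical.

```python
def f_statistic(beta, se):
    """Approximate F-statistic of one SNP-exposure association: (beta / se) ** 2."""
    return (beta / se) ** 2


def weak_instruments(snps, threshold=10.0):
    """Return IDs of instruments whose approximate F-statistic is below the threshold."""
    return [snp_id for snp_id, beta, se in snps if f_statistic(beta, se) < threshold]


# Hypothetical SNP-exposure summary statistics: (id, beta, se).
instruments = [("snp_a", 0.020, 0.003), ("snp_b", 0.015, 0.004), ("snp_c", 0.006, 0.003)]
for snp_id, beta, se in instruments:
    print(snp_id, round(f_statistic(beta, se), 1))
print("weak:", weak_instruments(instruments))  # weak: ['snp_c']
```

Because a two-sided P-value of 5 × 10⁻⁸ corresponds to |z| ≈ 5.45, any SNP selected at that threshold has an approximate F-statistic of at least about 29.7 under this formula.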
Using the IVW method, seven types of site-specific cancers were significantly associated with three of the four smoking measurements (all except age of smoking initiation) (Figure 2; Table S3).

| Colocalization analysis
Given that our results for smoking initiation (comparing current/former smokers with never smokers) indicated an overall association of smoking with increased risk of CRC, which was also supported by the subsequent epigenetic MR analysis, this finding did not affect our overall conclusions. Nonetheless, some previous studies have offered clues consistent with this apparently contradictory finding. The genetic correlation analysis in the GWAS from which we obtained three smoking measurements showed that smoking cessation (coded with current smokers as "2" and former smokers as "1") was negatively associated with inflammatory bowel disease (especially ulcerative colitis), potentially indicating an irregular association pattern between smoking cessation and intestinal diseases. 14 Nevertheless, because the relationships between smoking behaviors and cancers are complex, and the effect of smoking cessation is modified by other factors including smoking duration, smoking intensity and age at quitting, 38 further evidence is needed.
Numerous studies have revealed the effects of smoking on DNA methylation across the whole epigenome, 6 which may also contribute to increased risk of multiple cancers. Our study found the CpG site cg06639488 (EFNA1) to have a cross-cancer effect on breast and lung cancer, consistent with previous findings.

EFNA1 belongs to the ephrin subfamily that acts as ligands for Eph receptors, and the interaction of EFNA1 with its most common receptor, EphA2, is deemed crucial to the onset of malignant tumors, possibly via regulation of the cell cytoskeleton and cell adhesion. 39,40 Upregulation of EFNA1 has been reported in a broad variety of cancers; for instance, a study using the UALCAN database reported higher transcription and expression of EFNA1 in breast cancer tissues than in para-cancerous tissues, suggesting the potential value of EPHA/EFNA family-related pathways in predicting breast cancer. 41 In addition, EphA2 was found to be overexpressed in diverse cancers, and for lung cancer, which was also implicated in our findings, EPHA2 blockade has been proposed as a targeted strategy. 42 48 Similarly, a study analyzing the role of m7G-lncRNAs using TCGA identified ATXN2 as a key target regulated by m7G-lncRNAs, with higher expression in CRC. 49 Since evidence on the association of ATXN2 with multiple cancers is relatively sparse, our study provides supportive evidence from the epigenetic perspective of methylation, and the related pathways warrant deeper research in the future.
Notably, a previous study investigating the causal effect of mQTLs at lung cancer-related CpG sites on lung cancer suggested no confounding effects of smoking behaviors in the associations, 50 which may indicate little overlap between smoking-associated and lung cancer-associated methylation pathways, such that they might not confound each other.
This finding also highlights the need for future investigation of the interaction and overlap of methylation among different traits.
Our study has several strengths. First, we explored the association between smoking and 15 site-specific cancers, providing a comprehensive view of how cancer risks vary in response to smoking, and further investigated the cross-cancer effect of smoking. Second, genetic instruments for smoking behaviors and multiple cancers were derived from the most recent and largest GWASs available, supporting accuracy and reliability. Third, MR analyses were applied to avoid reverse causality and reduce interference from confounding factors, and evidence was provided from both genetic and epigenetic perspectives by using SNPs and mQTLs as IVs, respectively, further highlighting the potential role of methylation modification in carcinogenesis. Our study also has some limitations. We obtained methylation data from peripheral blood samples, which may show methylation patterns different from those of specific tissues, 51 and our methylation data did not capture methylation changes over time.
Nonetheless, we further validated our main findings (eg, EFNA1, CYP1A1, HLA-L, etc) with colocalization and tissue-specific expression evidence to strengthen their causal interpretation. The power of the analyses for some cancers (eg, biliary cancer, testicular cancer, etc) might be reduced owing to their small case numbers. Horizontal pleiotropy is an inevitable problem when using genetic variants as instruments, especially for phenotypes proxied by only a few SNPs; however, we conducted sensitivity analyses, such as MR-Egger and MR-PRESSO, to detect and correct for pleiotropy and remove outliers. Finally, all GWASs were derived from cohorts or studies of European ancestry, which limits the generalizability of our conclusions to other populations.

| CONCLUSIONS
Our study found smoking behaviors to be genetically associated with multiple cancers and provided further epigenetic evidence that DNA methylation at CpG sites could act as a crucial part of carcinogenesis. Aberrant smoking-related DNA methylation at several CpG sites, including cg06639488 (EFNA1), cg12101586 (CYP1A1), cg14142171 (HLA-L) and cg07932199 (ATXN2), was indicated to have cross-cancer carcinogenic effects.

AUTHOR CONTRIBUTIONS
The work reported in the paper has been performed by the authors, unless clearly specified in the text. Study conceptualization and design: