Gene–environment interactions involving functional variants: Results from the Breast Cancer Association Consortium

Investigating the most likely causal variants identified by fine‐mapping analyses may improve the power to detect gene–environment interactions. We assessed the interplay between 70 single nucleotide polymorphisms identified by genetic fine‐scale mapping of susceptibility loci and 11 epidemiological breast cancer risk factors in relation to breast cancer. Analyses were conducted on up to 58,573 subjects (26,968 cases and 31,605 controls) from the Breast Cancer Association Consortium, in one of the largest studies of its kind. Analyses were carried out separately for estrogen receptor (ER) positive (ER+) and ER negative (ER–) disease. The Bayesian False Discovery Probability (BFDP) was computed to assess the noteworthiness of the results. Four potential gene–environment interactions were identified as noteworthy (BFDP < 0.80) when assuming a true prior interaction probability of 0.01. The strongest interaction result in relation to overall breast cancer risk was found between CFLAR‐rs7558475 and current smoking (ORint = 0.77, 95% CI: 0.67–0.88, p int = 1.8 × 10−4). The interaction with the strongest statistical evidence was found between 5q14‐rs7707921 and alcohol consumption (ORint = 1.36, 95% CI: 1.16–1.59, p int = 1.9 × 10−5) in relation to ER– disease risk. The remaining two gene–environment interactions were also identified in relation to ER– breast cancer risk and were found between 3p21‐rs6796502 and age at menarche (ORint = 1.26, 95% CI: 1.12–1.43, p int = 1.8 × 10−4) and between 8q23‐rs13267382 and age at first full‐term pregnancy (ORint = 0.89, 95% CI: 0.83–0.95, p int = 5.2 × 10−4). While these results do not suggest any strong gene–environment interactions, our results may still be useful to inform experimental studies. These may, in turn, shed light on the potential interactions observed.

What's new?
Although it is widely acknowledged that genes and environment may interact to cooperatively modify breast cancer risk, no such interaction is known at the single nucleotide polymorphism (SNP) level. Here, the authors assessed the interplay of 70 SNPs with 11 known breast cancer risk factors in estrogen receptor-positive and -negative disease. Weak interactions were found with individual SNPs and current smoking or alcohol consumption, but no strong gene-environment interaction was identified. These data do not support the model of strong modification of genetic cancer risk by environmental factors.

In 1968, MacMahon stated that "In no field are there more complex examples of the gene-environment relationship than in experimental cancer research." 1 Following his words, and the general opinion that genetic and non-genetic risk factors do not give rise to the disease solely by acting on independent pathways, several studies have investigated gene-environment interplay in relation to breast cancer risk. Such studies are motivated by the prospect that identifying gene-environment interactions in relation to breast cancer could provide insight into the biological mechanisms underlying the disease, help distinguish women at high risk from women at lower risk, and improve the accuracy of risk prediction models. However, despite large-scale international efforts, few single nucleotide polymorphisms (SNPs) have to date been found to have their effect on breast carcinogenesis modified by an epidemiological risk factor, and only one of these has been replicated. 2,3 Several breast cancer risk loci previously identified in genome-wide association studies (GWAS) were recently investigated further by genetic fine-scale mapping within the Collaborative Oncological Gene-Environment Study (COGS), using samples from studies participating in the Breast Cancer Association Consortium (BCAC). The SNPs identified in the fine-mapping studies were then investigated in functional studies to identify potential causal associations. Considering causal variants may improve the power to detect gene-environment interplay; conversely, if no interactions are detected, the weight of evidence against gene-environment interactions for the locus in question is strengthened. Additionally, new susceptibility alleles were identified from genotypes generated by imputation using the 1000 Genomes Project reference panel. Therefore, in these analyses, multiplicative gene-environment interaction in relation to breast cancer risk was assessed between 55 potentially causal and 15 newly identified SNP alleles and the following 11 established epidemiological risk factors: age at menarche, oral contraceptive (OC) use, parity, age at first full-term pregnancy (FTP), number of FTPs, breastfeeding, use of menopausal hormone therapy (MHT), body mass index (BMI), adult height, smoking and alcohol consumption. We also investigated interactions in relation to estrogen receptor (ER)-specific breast cancer risk, since different disease subtypes may arise through different pathways. The analyses reported in this article are based on the largest currently available dataset with genetic data and extensive epidemiological information.

Study subjects
Data on subjects of European descent derived from 21 studies participating in the BCAC were pooled. A brief description of each study can be found in Supporting Information Table S1. There were 12 population-based and 9 non-population based studies, each contributing at least 200 cases and 200 controls with available SNP data and information on at least one epidemiological risk factor. Subjects were excluded from the gene-environment interaction analyses if they were male, of non-European origin, a prevalent case or had missing data on age at diagnosis or age at interview, the epidemiological risk factor in question or any of the adjustment variables. Hence, the number of study subjects for each SNP-risk factor pair varied with the availability of epidemiological data. Analyses were based on between 11,342 subjects (5,445 cases and 5,897 controls) for effect modification by alcohol consumption and 58,573 subjects (26,968 cases and 31,605 controls) for effect modification by number of FTPs. The set of study subjects that were included in at least one gene-environment interaction model comprised 30,000 cases and 34,501 controls. All studies were approved by the relevant ethics committees and informed consent was obtained from all participants.
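The exclusion rules above can be sketched as a per-pair filter, because the analysis set is rebuilt for each risk factor. This is a minimal illustration only: the `Subject` record, its field names and the `analysis_set` helper are hypothetical and do not reflect the BCAC data dictionary.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Optional


@dataclass
class Subject:
    # Hypothetical record layout; field names are illustrative only.
    is_case: bool
    sex: str
    european: bool
    prevalent_case: bool
    reference_age: Optional[float]  # age at diagnosis (cases) or interview (controls)
    risk_factors: Dict[str, Optional[float]] = field(default_factory=dict)
    covariates: Dict[str, Optional[float]] = field(default_factory=dict)


def eligible(s: Subject, factor: str) -> bool:
    """Apply the exclusion criteria for one SNP-risk factor pair."""
    if s.sex != "female" or not s.european or s.prevalent_case:
        return False
    if s.reference_age is None:
        return False
    # The risk factor under study must be observed ...
    if s.risk_factors.get(factor) is None:
        return False
    # ... and so must every adjustment variable (complete-case analysis).
    return all(v is not None for v in s.covariates.values())


def analysis_set(subjects: List[Subject], factor: str) -> List[Subject]:
    """Subjects available for interaction models involving `factor`."""
    return [s for s in subjects if eligible(s, factor)]
```

Because eligibility depends on `factor`, the number of cases and controls differs between risk factors, which is why the sample sizes quoted above range from about 11,000 to about 58,000 subjects.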

SNP selection and genotyping
Genotyping was carried out using an Illumina iSelect array (iCOGS) in the framework of the COGS project (www.nature.com/icogs). With the aim of detecting causal variants, a number of loci known to confer breast cancer risk at the time of the design of the iCOGS array were further investigated using genetic fine-scale mapping. To increase SNP density, the respective regions were imputed using the March 2012 release of the 1000 Genomes Project as the reference panel. The functional follow-up work was not carried out centrally for all regions but was divided between the different working groups of BCAC, and thus the methods used varied somewhat. [4][5][6][7][8][9][10][11][12][13][14][15][16][17] In addition, imputed genotypes were used for 15 new susceptibility loci identified through a meta-analysis of 11 GWAS, with SNP genotypes generated by imputation using the 1000 Genomes Project March 2012 release as the reference panel. 5 A list of the 70 SNPs included in the analyses for this report can be found in Supporting Information Table S2.

Data filtering
Data from the participating studies were centrally cleaned and harmonized. Information on epidemiological factors was ascertained as of the reference date. In the case-control studies, this was defined as the date of diagnosis for cases and the date of questionnaire for controls, and in the three cohort studies (

Statistical analysis
Association analyses of SNP alleles and breast cancer risk were carried out using logistic regression models adjusted for age at reference, study and ethnicity. In all models used in this study, genotyped SNPs were treated as ordinal variables (counts of minor alleles) and imputed SNPs as continuous variables.
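The two SNP codings can be illustrated with a short sketch, assuming genotype calls are stored as two-letter strings and imputed genotypes as posterior genotype probabilities; the helper names are hypothetical. The imputed variable is the expected number of minor alleles (the dosage), which is why it is treated as continuous rather than ordinal.

```python
def minor_allele_count(genotype: str, minor: str) -> int:
    """Ordinal coding of a directly genotyped SNP: 0, 1 or 2 minor alleles."""
    if len(genotype) != 2:
        raise ValueError("expected a diploid call such as 'AG'")
    return sum(1 for allele in genotype if allele == minor)


def imputed_dosage(p_het: float, p_hom_minor: float) -> float:
    """Continuous coding of an imputed SNP: expected minor-allele count.

    p_het       -- posterior probability of carrying one minor allele
    p_hom_minor -- posterior probability of carrying two minor alleles
    """
    if p_het < 0 or p_hom_minor < 0 or p_het + p_hom_minor > 1 + 1e-9:
        raise ValueError("invalid genotype probabilities")
    return p_het + 2.0 * p_hom_minor
```

For example, `minor_allele_count("AG", "G")` returns 1, and `imputed_dosage(0.2, 0.7)` returns approximately 1.6.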
The main effects of the epidemiological risk factors were also investigated using logistic regression models adjusted for reference age, study and self-assessed ethnicity. Heterogeneity across studies was explored by means of Cochran's Q-test. The epidemiological variables used in these analyses were categorized as follows: age at menarche (per 2 years), ever use of OC (yes or no), ever parous (yes or no), number of FTPs for parous women (1, 2, 3 and 4 FTPs), ever breastfed (yes or no), age at first FTP (per 5 years), adult BMI for pre- and postmenopausal women, respectively (per 5 kg/m²), adult height (per 5 cm), current use of MHT in the form of estrogen and progesterone or estrogen only (yes or no), lifetime average alcohol intake (per 10 g/day), current smoking (yes or no) and pack-years of smoking (per 10 pack-years).
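Reporting effects "per 2 years", "per 5 kg/m²" and so on amounts to dividing each continuous variable by its increment before fitting, so that the exponentiated coefficient is the OR for one increment. A minimal sketch, in which the variable names are hypothetical:

```python
import math

# Reporting increments taken from the list above; the keys are hypothetical names.
INCREMENTS = {
    "age_at_menarche": 2.0,   # years
    "age_at_first_ftp": 5.0,  # years
    "bmi": 5.0,               # kg/m^2
    "height": 5.0,            # cm
    "alcohol": 10.0,          # g/day, lifetime average
    "pack_years": 10.0,       # pack-years
}


def to_model_units(variable: str, raw_value: float) -> float:
    """Rescale a raw measurement so that one unit equals one reporting increment."""
    return raw_value / INCREMENTS[variable]


def or_per_increment(log_or_per_raw_unit: float, variable: str) -> float:
    """Convert a log-OR per raw unit (e.g. per g/day) into an OR per increment."""
    return math.exp(log_or_per_raw_unit * INCREMENTS[variable])
```

For example, a height of 170 cm enters the model as 34 units, and a log-OR of 0.01 per g/day of alcohol corresponds to an OR of about 1.105 per 10 g/day.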
Multiplicative gene-environment interaction was assessed by comparing logistic regression models with and without SNP-risk factor interaction terms by means of the likelihood ratio test. All models in this study were adjusted for study, reference age and ethnicity, so as to capture genetic population sub-structure. An interaction term between the epidemiological variable and an indicator for population-based study design was included to protect against bias due to the differing selection of study participants in non-population-based versus population-based studies. Interactions of SNPs and epidemiological risk factors were also investigated in relation to ER-specific (ER+ or ER–) breast cancer risk, using cases and controls. Furthermore, potential differences in gene-environment interaction according to ER status were assessed in case-only analyses comparing ER– cases with ER+ cases. The ER-specific models and the case-only analyses were adjusted in the same way as the interaction models for overall breast cancer risk. To help interpret the results of the interaction analyses, the association between SNPs and breast cancer risk was also estimated within strata of the epidemiological variables.
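The likelihood ratio test compares the maximized log-likelihoods of the two nested logistic models. The sketch below assumes the interaction is a single product term (one degree of freedom), as for a per-allele SNP coding combined with a binary or per-increment risk factor; for one degree of freedom the chi-square tail probability can be written with the complementary error function, so no statistics library is needed. Model fitting itself is not shown, and the log-likelihood values in the usage note are made up.

```python
import math

# Nested models (schematic); both include study, reference age, ethnicity
# and the E x population-based-design term as covariates:
#   null:        logit P(case) = covariates + b1*SNP + b2*E
#   alternative: logit P(case) = covariates + b1*SNP + b2*E + b3*SNP*E


def lr_test_one_df(loglik_null: float, loglik_alt: float):
    """Likelihood ratio test for one added interaction term (1 df)."""
    stat = 2.0 * (loglik_alt - loglik_null)
    if stat < 0:
        raise ValueError("the larger model cannot fit worse than the nested one")
    # Survival function of a chi-square variable with 1 df: erfc(sqrt(x / 2)).
    p_value = math.erfc(math.sqrt(stat / 2.0))
    return stat, p_value
```

With hypothetical log-likelihoods of -100.0 and -95.0, the statistic is 10 and the p value is about 0.0016, which would not pass the 7 × 10−4 threshold used in this study.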
MHT use was subdivided into estrogen-only and combined (estrogen plus progestogen) therapy and investigated in relation to breast cancer risk among postmenopausal women only. All statistical models involving MHT use were further adjusted for former MHT use and for current use of the MHT preparation (estrogen only or combined) not included in the interaction term. Additionally, interactions of SNPs and BMI in postmenopausal women were assessed among never- and former users of MHT only. All risk analyses were carried out using SAS 9.2.
Between-study heterogeneity of the interaction odds ratio (OR) estimates was investigated using Cochran's Q-test and quantified by I², the proportion of the total observed variation attributable to true heterogeneity. Heterogeneity was investigated for SNP-risk factor pairs with an interaction p value below the Bonferroni-corrected threshold of statistical significance for genetic main effects, computed by dividing the standard threshold of 0.05 by the number of SNPs (0.05/70 ≈ 7 × 10−4). Interaction ORs were tested for heterogeneity across studies on the basis of interaction p values in models of overall or ER-specific breast cancer risk; for the ER-specific models, only if a heterogeneity p < 0.05 had been observed for ER+ versus ER– disease. Heterogeneity tests were conducted using the R package "rmeta" (version 2.2).
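For reference, the fixed-effect quantities behind Cochran's Q and I² can be written in a few lines. This Python sketch is not the "rmeta" implementation used in the study; it assumes study-specific interaction log-ORs with their standard errors as input and uses inverse-variance weights. It also shows the Bonferroni threshold.

```python
from typing import Sequence, Tuple

# Bonferroni threshold: 0.05 divided by the 70 SNPs (about 7.1e-4).
BONFERRONI_THRESHOLD = 0.05 / 70


def cochran_q_i2(log_ors: Sequence[float], ses: Sequence[float]) -> Tuple[float, float, float]:
    """Return (pooled log-OR, Cochran's Q, I^2) for study-specific estimates."""
    if len(log_ors) != len(ses) or len(log_ors) < 2:
        raise ValueError("need at least two studies with matching SEs")
    weights = [1.0 / se ** 2 for se in ses]  # inverse-variance weights
    pooled = sum(w * b for w, b in zip(weights, log_ors)) / sum(weights)
    q = sum(w * (b - pooled) ** 2 for w, b in zip(weights, log_ors))
    df = len(log_ors) - 1
    # I^2: share of the total variation attributed to between-study heterogeneity,
    # truncated at zero when Q is smaller than its degrees of freedom.
    i2 = max(0.0, (q - df) / q) if q > 0 else 0.0
    return pooled, q, i2
```

Under homogeneity, Q approximately follows a chi-square distribution with k − 1 degrees of freedom for k studies; its p value is omitted here to keep the sketch free of dependencies.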
The Bayesian False Discovery Probability (BFDP) was computed to control the number of false-positive findings and assess the noteworthiness of the results. 18 The cut-off for noteworthiness is based on the ratio of the cost of a false non-discovery to the cost of a false discovery. As suggested in the literature, we set the cost of failing to discover a true association to four times the cost of a falsely reported one, classifying results with a BFDP < 0.8 as noteworthy. The BFDP was calculated for all SNP-risk factor pairs with an interaction p value below the Bonferroni-corrected threshold given above (p < 7 × 10−4). It was computed for two different prior probabilities of a true interaction (0.01 and 0.001), under the assumption that a true interaction OR lies within the interval 0.66–1.5 with 95% probability. As a complementary measure to the BFDP, we also computed the approximate Bayes factor (ABF), which approximates the ratio of the probability of the data given that the null hypothesis is true to the probability of the data given that the alternative hypothesis is true. The null hypothesis in this case is that the coefficient of the interaction term in the logistic regression model is equal to zero.
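The ABF and BFDP follow Wakefield's approximation and can be sketched as below, assuming a normal prior on the log interaction OR with mean zero and variance W chosen so that 95% of the prior mass lies between 1/1.5 ≈ 0.66 and 1.5, and taking V as the squared standard error recovered from a reported 95% CI. The function names are hypothetical, and values recomputed from rounded published CIs need not reproduce the reported ABFs exactly.

```python
import math

Z95 = 1.959964  # standard normal 97.5% quantile


def se_from_ci(lower: float, upper: float) -> float:
    """Standard error of a log-OR recovered from a 95% CI on the OR scale."""
    return (math.log(upper) - math.log(lower)) / (2.0 * Z95)


def prior_variance(or_upper: float = 1.5) -> float:
    """W such that P(1/or_upper < OR < or_upper) = 0.95 under the prior."""
    return (math.log(or_upper) / Z95) ** 2


def approximate_bayes_factor(log_or: float, se: float, w: float) -> float:
    """ABF = P(data | H0) / P(data | H1); small values favour an interaction."""
    v = se ** 2
    z2 = (log_or / se) ** 2
    r = w / (v + w)
    return math.sqrt((v + w) / v) * math.exp(-0.5 * z2 * r)


def bfdp(abf: float, prior: float) -> float:
    """Posterior probability of H0, given a prior probability of a true interaction."""
    posterior_odds_null = abf * (1.0 - prior) / prior
    return posterior_odds_null / (1.0 + posterior_odds_null)


def is_noteworthy(bfdp_value: float, cost_ratio: float = 4.0) -> bool:
    """Noteworthy if BFDP < R / (1 + R); R = 4 gives the 0.8 cut-off."""
    return bfdp_value < cost_ratio / (1.0 + cost_ratio)
```

Plugging in the ABF reported in the Results for the 5q14-rs7707921 by alcohol interaction (0.005), `bfdp(0.005, 0.01)` gives about 0.33, in line with the reported BFDP, whereas `bfdp(0.005, 0.001)` gives about 0.83, above the 0.8 cut-off.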

Results
The studies included in the gene-environment interaction analyses are listed in Table 1 together with the number of cases and controls, overall and by ER status. The median time between questionnaire and diagnosis was 3 years in the MCCS cohort, 2 years in the UKBGS cohort and 7 years in the CPSII cohort.
The associations between SNP alleles and breast cancer risk in the subset of BCAC studies with risk factor data available were consistent with earlier reports and can be found in Supporting Information Table S3. [4][5][6][7][8][9][10][11][12][13][14] Main effects of the epidemiological variables on breast cancer risk across studies are presented in Supporting Information Figure 1. These analyses were carried out using only population-based studies, and the results were consistent with those reported earlier in the literature. 3,19–30 Current use of OC, MHT use (E only, as well as E + P), alcohol consumption, height and never having breastfed (vs. ever having breastfed) were all associated with an increased risk of breast cancer. A reduction in risk was observed for older age at menarche, ever being parous, a higher number of FTPs and higher BMI in premenopausal women. For current smoking and pack-years of smoking, no significant association with breast cancer risk was detected.
The complete results from the interaction analyses, showing the association between SNPs and breast cancer risk across categories of the epidemiological variables, are presented in Supporting Information Table S4. We identified four SNP-risk factor pairs with at least one interaction p < 7 × 10−4 in relation to overall, ER+ or ER– breast cancer risk, as presented in Table 2. All of these interactions were classified as noteworthy (BFDP < 0.8) assuming a prior probability of true interaction of 0.01, but none remained noteworthy at a prior probability of 0.001 (Table 3).
The first interaction, in relation to overall disease risk, was noted between the variant CFLAR-rs7558475 and current smoking (ORint = 0.77, 95% CI: 0.67–0.88, p int = 1.8 × 10−4). This result was considered noteworthy (BFDP = 0.40) assuming a prior probability of true interaction of 0.01, and the approximate Bayes factor (ABF = 0.007) indicated that the data were almost 140 times more likely under the alternative hypothesis than under the null. Breast cancer risk was reduced for current smokers carrying the minor allele (G) (OR per-allele = 0.76, 95% CI: 0.66–0.88, p = 2.2 × 10−4), whereas no risk association was observed among non-smoking carriers (OR per-allele = 0.99, 95% CI: 0.91–1.08, p = 0.9). When comparing ER– cases with ER+ cases, the results did not indicate any effect heterogeneity (p het = 0.48). There was no strong evidence of interaction with respect to either ER+ risk (p int = 0.0014) or ER– risk (p int = 0.75).
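For a binary exposure such as current smoking, the multiplicative model implies that the log interaction OR is the difference between the per-allele log ORs in exposed and unexposed women, so the interaction OR is approximately the ratio of the two stratum-specific per-allele ORs (approximately, because the stratified models re-estimate the adjustment terms). A sketch of this consistency check, with hypothetical helper names:

```python
import math


def or_with_ci(log_or: float, se: float, z: float = 1.96):
    """Point estimate and 95% CI on the OR scale from a log-OR and its SE."""
    return math.exp(log_or), math.exp(log_or - z * se), math.exp(log_or + z * se)


def implied_interaction_or(or_exposed: float, or_unexposed: float) -> float:
    """ORint implied by per-allele ORs among exposed and unexposed subjects."""
    return or_exposed / or_unexposed
```

With the per-allele ORs reported above for current smokers (0.76) and non-smokers (0.99), `implied_interaction_or(0.76, 0.99)` is about 0.77, in line with the reported ORint for CFLAR-rs7558475.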
The most noteworthy result of the gene-environment interaction analyses (noteworthy at a prior probability of 0.01) was observed between the variant 5q14-rs7707921, located in an intron of the autophagy related 10 (ATG10) gene, and alcohol consumption (ORint = 1.36, 95% CI: 1.16–1.59, p int = 1.9 × 10−5) in relation to ER– breast cancer. This result had the lowest BFDP (0.33), and the data were about 200 times more likely under the alternative than under the null (ABF = 0.005). Carriers of the minor allele (T) of rs7707921 had an increased risk of ER– breast cancer if they consumed >20 g of alcohol per day (OR per-allele = 2.56, 95% CI: 1.45–4.62, p = 0.001), but not if they consumed <20 g of alcohol per day (OR per-allele = 1.07, 95% CI: 0.92–1.24, p = 0.36). Strong effect heterogeneity was detected when comparing ER– cases with ER+ cases (p het = 6.7 × 10−6). Together with the absence of interaction in relation to ER+ disease (p int = 0.79) and overall breast cancer risk (p int = 0.70), this indicated that the interaction might be specific to ER– disease.
In addition, indications of two further interactions were noted in relation to ER– disease risk. One of these was between 3p21-rs6796502 and age at menarche (ORint = 1.26, 95% CI: 1.12–1.43, p int = 1.8 × 10−4), with BFDP = 0.49 and ABF = 0.010, implying that the data were 100 times more likely under the alternative hypothesis than under the null. Carriers of the minor allele (A) of 3p21-rs6796502 who experienced menarche at age 11 years or younger had a reduced risk of ER– breast cancer (OR per-allele = 0.70, 95% CI: 0.54–0.90, p = 0.006), whereas there was no association between the genetic variant and disease risk for women who had their menarche between the ages of 12 and 13 years (OR per-allele = 0.88, …). While the observed interaction was in relation to ER– disease risk, no effect heterogeneity was detected when comparing ER– and ER+ cases (p het = 0.53), nor was there any indication of an interaction in relation to overall breast cancer risk (p int = 0.94). Hence, it is not possible to conclude that the observed interaction is specific to ER– disease.

Finally, an indication of a gene-environment interaction was found between 8q23-rs13267382 and age at first FTP (ORint = 0.89, 95% CI: 0.83–0.95, p int = 5.2 × 10−4) in relation to ER– disease risk. This interaction had BFDP = 0.61, assuming a prior probability of true interaction of 0.01, and ABF = 0.016, indicating that the data are about 60 times more likely under the alternative than under the null. No interaction was observed in relation to ER+ breast cancer (p int = 0.98) or overall breast cancer risk (p int = 0.47), and no effect heterogeneity was found when comparing ER– and ER+ breast cancer (p het = 0.99). Our findings indicated a modest reduction in ER– breast cancer risk for minor allele (A) carriers who were aged 30 or above at their first FTP (OR per-allele = 0.79, 95% CI: 0.68–0.91, p = 0.001), whereas for women who had their first child at younger ages the allele had no effect on risk.

Discussion
From the analyses presented in this work, four SNP-risk factor pairs with p int < 7 × 10−4 were identified, and all of these interactions were considered noteworthy (BFDP < 0.8) assuming a prior probability of true interaction of 0.01. One of the results was detected in relation to overall breast cancer risk, while the three remaining results were observed in relation to ER– disease.
The strongest gene-environment interaction in relation to overall breast cancer risk was noted between rs7558475, located in the CASP8 and FADD-like apoptosis regulator (CFLAR) gene, and current smoking (p int = 1.8 × 10−4). The protein product of CFLAR regulates apoptosis; it is therefore possible that CFLAR genetic variants affect the response to DNA damage caused by tobacco-associated carcinogens and thereby modify the breast cancer risk conferred by smoking. However, although rs7558475 is located in a CFLAR enhancer region, reports from recent functional studies and expression quantitative trait locus (eQTL) analyses did not provide any convincing evidence of functionality. 6,31 Hence, further work is required to understand the possible biological mechanisms underlying the observed interaction.
The strongest statistical evidence of interaction was found in relation to ER– breast cancer risk, between the intron variant 5q14-rs7707921 in the autophagy related 10 (ATG10) gene and alcohol consumption (p int = 1.9 × 10−5). Autophagy, which is considered a survival mechanism of the cell, may act as a tumor suppressor but may also promote tumor growth by supporting cell survival, and has been suggested as a target in cancer therapy. 32 It has been reported that autophagy could have a protective effect on esophageal epithelial cells responding to ethanol-induced oxidative stress. 33 Also, while ethanol promotes oxidative stress in cancer-associated fibroblasts, it has been reported to induce autophagy resistance in epithelial cells. 34 Given this information, it is conceivable that alcohol consumption could influence the effect of an autophagy-related polymorphism on breast cancer risk. However, the biological mechanism needs to be investigated further. The position of the variant ATG10-rs7707921 does not coincide with any strong regulatory elements. The eQTL analyses carried out within the framework of BCAC showed a strong association between the T allele of rs7707921 and expression of the ribosomal protein S23 gene (RPS23) in breast tissue, as well as a moderate association between the allele and expression of the ATPase, H+ transporting, lysosomal accessory protein 1-like (ATP6AP1L) gene. 5 RPS23 encodes a ribosomal protein and ATP6AP1L is also protein coding, but neither gene has yet been implicated in ER– breast cancer risk, and their expression levels have not been assessed in relation to alcohol consumption or oxidative stress. Further work is thus needed to understand how the protein products of these genes could interact with alcohol consumption to modify the association of rs7707921 with ER– breast cancer risk.
Furthermore, we found an indication of a possible interaction between 3p21-rs6796502 and age at menarche (p int = 1.8 × 10−4) in relation to ER– breast cancer. Our results suggest that the reduced risk of ER– breast tumors among carriers of the A allele is modified for women with a late age at menarche (≥14 years). However, according to a recent functional study, the SNP is not located in the vicinity of any genes or enhancer regions in mammary cell lines, nor are any significant results available from eQTL analyses. 5 In addition, no significant effect heterogeneity was found when comparing the interaction between ER– and ER+ cases to support the notion that the result could be specific to ER– disease. It is thus necessary to first confirm this interaction with further data before attempting any biological explanation.
The interaction observed between the intron variant 8q23-rs13267382 of the long intergenic non-protein coding RNA 536 gene (LINC00536) and age at first FTP (p int = 5.2 × 10−4) suggests that the variant is associated with a reduced risk of ER– breast cancer at older ages at first FTP, the association being statistically significant for women who were at least 30 years old at their first FTP. Overall, this variant was not reported to be associated with ER– disease risk, 5 which is confirmed in the current report. Neither this SNP nor any variants in high linkage disequilibrium with it are positioned in the vicinity of any regulatory genomic feature. As for the interaction with 3p21-rs6796502, there was no clear evidence that this interaction is specific to ER– disease. Therefore, further data are required to confirm this interaction.

Cancer Genetics and Epigenetics Barrdahl et al.

This work is subject to a number of limitations. First, despite central harmonization of the data, substantial heterogeneity was observed across studies in the risk estimates of the epidemiological risk factors; this motivated the inclusion of a product term between study design and epidemiological variable in the interaction models, and the estimation of epidemiological main effects from the population-based studies only. Second, the study population consisted predominantly of case-control studies (only three cohort studies), which are known to be prone to selection and recall bias, and hence to misclassification of risk factors. However, gene-environment interaction estimates were similar in the overall study population and in the subset of population-based studies (data not shown). Misclassification of epidemiological risk factors is known to reduce the power to detect interactions rather than to increase the probability of false-positive findings. 35 Hence, this study is more likely to be limited by power than to report spurious gene-environment interactions. Also, our findings are based on study participants of Caucasian origin and may therefore not be generalizable to other populations. For the ER-specific risk analyses, in particular in the subgroup of ER– cases (N = 4,662), power was reduced by the smaller sample size.
However, this study also has several strengths. To begin with, the interaction analyses are based on the largest dataset presently available. The four indicated interactions were based on 11,337 subjects (5,385 cases and 5,952 controls) in analyses with respect to alcohol consumption, and 19,427 subjects (9,073 cases and 10,354 controls) for current smoking, as well as 43,513 subjects (20,147 cases and 23,366 controls) in the analyses of age at menarche and 37,819 subjects (17,382 cases and 20,508 controls) in the analyses of age at first FTP.
Taken together, the results presented in this report do not support the existence of strong modification of allelic effects on breast cancer risk by the epidemiological risk factors investigated. However, they contribute to the global body of knowledge on gene-environment interactions by generating hypotheses, thereby providing guidance for future functional studies and large-scale replication studies.