Exposures to estrogens from endogenous (lifetime ovulatory cycles, parity, adiposity) and exogenous (oral contraceptives, postmenopausal hormone therapy) sources are well-established breast cancer risk factors. Estrogens act as growth factors in estrogen-sensitive tissues, such as the breast, and this growth response is mediated by estrogen receptors. Estrogen receptors belong to the nuclear receptor superfamily of ligand-inducible transcription factors and can interact directly with DNA, altering the expression of downstream genes.
Two estrogen receptor isoforms (ER-α and ER-β) exist, encoded by 2 separate genes, ESR1 on chromosome 6 and ESR2 on chromosome 14. Both proteins are expressed in normal breast luminal epithelial cells, the morphological cell type of most breast tumors.1 Both isoforms can also be expressed in breast tumors. However, somatic loss of expression is associated with tumors whose growth is no longer controlled by steroid hormones. Such tumors are more aggressive and have poorer short-term prognosis.
Studies of associations between polymorphisms in ESR2 and breast cancer risk have been inconclusive. In 2003, Försti et al.2 found no association between ESR2 polymorphisms and breast cancer risk in a small case-control study of 219 breast cancer cases and 248 healthy male controls. In 2004, Gold et al.3 reported on estrogen receptor genotypes and haplotypes, and suggested that haplotypes of ESR2 may increase breast cancer risk among Ashkenazi Jewish women. In a larger case-control study (723 cases and 480 controls), Maguire et al.4 described an ESR2 haplotype that significantly increased breast cancer risk. In addition to the studies of associations between ESR2 and breast cancer risk, the role of ESR2 variants has also been explored in body weight extremes,5 ovulatory defects and menstrual disorders,6 anorexia nervosa7 and Alzheimer's disease.8
In vitro studies also suggest that ESR2 variation may influence the susceptibility to and development of breast cancer. For example, variant ESR2 mRNA transcripts have been isolated from human breast cancer cell lines9 and tumors.10, 11 ESR2 coexpression with ESR1 has been observed in both normal and malignant breast tissues.12, 13, 14
We hypothesized that inherited polymorphisms in genes related to sex steroid hormone synthesis, metabolism and cell signaling could alter the function of these genes and the proteins they encode, thereby altering breast cancer risk; in this report, we present results for ESR2. We used a haplotype tagging approach, which aims to capture common variants in the ESR2 gene. Here, we present these haplotypes and describe their association with breast cancer risk in a pooled analysis of nested case-control studies from a large collaboration of prospective studies, the Breast and Prostate Cancer Cohort Consortium (BPC3),15 which includes 5,789 cases of breast cancer and 7,761 controls.
BBD, benign breast disease; BMI, body mass index; BPC3, Breast and Prostate Cancer Cohort Consortium; CI, confidence interval; CPS-II, Cancer Prevention Study II; EPIC, European prospective investigation into cancer and nutrition; ER, estrogen receptor; ESR2, estrogen receptor beta; FTP, full-term pregnancy; HRT, hormone replacement therapy; htSNP, haplotype tagging single nucleotide polymorphism; LD, linkage disequilibrium; MEC, multiethnic cohort; NHS, Nurses' Health Study; OR, odds ratio; PR, progesterone receptor; QC, quality control; SNP, single nucleotide polymorphism; WHS, Women's Health Study.
Material and Methods
The BPC3 has been described in detail elsewhere.15 Briefly, the consortium includes large well-established cohorts assembled in the United States and Europe that have both DNA samples and extensive questionnaire information [the American Cancer Society Cancer Prevention Study II (CPS-II),16 the European Prospective Investigation into Cancer and Nutrition (EPIC) cohort,17 the Harvard Nurses' Health Study (NHS),18 the Women's Health Study (WHS)19 and the Hawaii-Los Angeles Multiethnic Cohort (MEC)20]. With the exception of the MEC, most women in these cohorts are Caucasians of United States and European descent. Cases were identified in each cohort by self-report with subsequent confirmation of the diagnosis from medical records or tumor registries, and/or linkage with population-based tumor registries (method of confirmation varied by cohort). Controls were matched to cases by ethnicity and age and, in some cohorts, by additional criteria such as country of residence in EPIC.
Coding regions of ESR2 were sequenced in a panel of 95 advanced breast cancer cases from the MEC (19 from each of the 5 ethnic groups: African American, Latina, Japanese, Native Hawaiian and Caucasian). All SNPs detected in the sequencing scan (8 total) existed previously in dbSNP or had been reported in the literature.5 Forty SNPs with minor allele frequency >5% overall or >1% in any one ethnic group were selected from this resequencing, together with those available in dbSNP from the nonsequenced areas, for use in selecting haplotype tagging SNPs. These SNPs were genotyped in a reference panel of 349 healthy women from the MEC populations (including 70 Caucasians) at the Broad Institute (Cambridge, MA) using the Sequenom and Illumina platforms, and 5 htSNPs were selected to ensure a minimum R2H (a measure of how well the SNPs selected describe the haplotypes observed in the screening population) among Caucasians of 0.7 or greater using the method of Stram et al.21
Thellenberg-Karlsson et al.22 described a polymorphism (rs2987983) in the 5′ region of ESR2 that was associated with prostate cancer risk. This polymorphism failed to genotype in our initial screen; however, using HapMap data (data release 21 July 2006 on NCBI build 35 and dbSNP build 124), we found that this polymorphism is in complete linkage disequilibrium (r2 and D′ = 1.0) with rs3020450, one of the htSNPs we selected.
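The two linkage disequilibrium measures quoted above (r2 and D′) can be illustrated with a minimal Python sketch. It assumes two biallelic loci with known haplotype and allele frequencies; the function name `ld_stats` and the example frequencies are hypothetical and are not taken from the HapMap data used in the study.

```python
def ld_stats(p_ab, p_a, p_b):
    """Return (D, D', r^2) for two biallelic loci.

    p_ab: frequency of the haplotype carrying allele A at locus 1 and
          allele B at locus 2
    p_a, p_b: frequencies of allele A and allele B (assumed strictly
              between 0 and 1, i.e. both loci are polymorphic)
    """
    if not (0 < p_a < 1 and 0 < p_b < 1):
        raise ValueError("both loci must be polymorphic")
    d = p_ab - p_a * p_b  # raw disequilibrium coefficient
    # D' rescales D by its maximum attainable magnitude given the allele frequencies.
    if d >= 0:
        d_max = min(p_a * (1 - p_b), (1 - p_a) * p_b)
    else:
        d_max = min(p_a * p_b, (1 - p_a) * (1 - p_b))
    d_prime = abs(d) / d_max if d_max > 0 else 0.0
    r2 = d * d / (p_a * (1 - p_a) * p_b * (1 - p_b))
    return d, d_prime, r2


# Hypothetical case 1: the two minor alleles always travel together
# (same frequency, same haplotype), so both D' and r^2 equal 1.
complete = ld_stats(p_ab=0.3, p_a=0.3, p_b=0.3)

# Hypothetical case 2: allele B occurs only on the A background but is rarer,
# so D' is 1 while r^2 is well below 1.
partial = ld_stats(p_ab=0.1, p_a=0.3, p_b=0.1)
```

The second case shows why reporting both measures is informative: only when r2 is also 1 does one SNP act as a perfect proxy for the other, which is the situation described for rs2987983 and rs3020450.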
Genotyping of the 5 htSNPs [rs3020450, rs1256031, rs1256049, rs4986938 (ESR2_G1730A) and rs944459] in the breast cancer cases and controls was performed in 3 laboratories (University of Southern California, Los Angeles, CA; Harvard School of Public Health, Boston, MA and International Agency for Research on Cancer, Lyon, France) using a fluorescent 5′ endonuclease assay and the ABI-PRISM 7900 for sequence detection (Taqman). Initial quality control checks of the SNP assays were performed at the manufacturer (ABI, Foster City, CA); an additional 500 test reactions were run by the BPC3. Assay characteristics for the 5 htSNPs for ESR2 are available on a public website (http://www.uscnorris.com/mecgenetics/CohortGCKView.aspx). Sequence validation for each SNP assay was performed and 100% concordance was observed (http://snp500cancer.nci.nih.gov).23 To assess inter-laboratory variation, each genotyping center ran assays on a designated set of 94 samples from the Coriell Biorepository (Camden, NJ).23 The internal quality of genotype data at each genotyping center was assessed by typing 5–10% blinded samples in duplicate or greater (depending on study). One htSNP (rs944459) tagged a haplotype common only among African Americans, and as such was genotyped but not included in analyses. The remaining 4 htSNPs still tag the known variants of ESR2 with an R2H of 0.70.
We used conditional multivariate logistic regression to estimate odds ratios (ORs) for disease in subjects with a linear (log-odds additive) scoring for 0, 1 or 2 copies of the minor allele of each SNP. We also used conditional logistic regression with additive scoring and the most common haplotype as the referent to estimate haplotype-specific ORs using an expectation–substitution approach to assign expected haplotype counts based on the unphased genotype data and to account for uncertainty in assignment.24, 25 Haplotype frequencies and subject-specific expected haplotype counts were calculated separately for each cohort (and country within EPIC or ethnicity in the MEC). We combined rare haplotypes (those with estimated individual frequencies less than 5% in all cohorts) into a single category with a combined frequency of less than 1.6% of the controls.
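To make the log-odds additive scoring concrete, the following Python sketch fits a conditional logistic model for the special case of 1:1 matched pairs and a single SNP coded as 0, 1 or 2 copies of the minor allele. It is an illustration under simplifying assumptions (one covariate, one control per case, Newton-Raphson fitting), not the software used in the analysis; the function name `fit_matched_pairs` and the example data are hypothetical.

```python
import math


def fit_matched_pairs(pairs, tol=1e-10, max_iter=50):
    """Per-allele log odds ratio from 1:1 matched case-control pairs.

    pairs: iterable of (case_dosage, control_dosage), each dosage being the
    number of minor-allele copies (0, 1 or 2).
    Returns (beta, standard_error).
    """
    # Pairs with equal dosage contribute a constant to the conditional
    # likelihood, so only the case-minus-control differences matter.
    diffs = [case - control for case, control in pairs if case != control]
    if not any(d > 0 for d in diffs) or not any(d < 0 for d in diffs):
        raise ValueError("need discordant pairs in both directions")

    def score_and_info(beta):
        score = 0.0
        info = 0.0
        for d in diffs:
            # Conditional probability that the true case is the case.
            p = 1.0 / (1.0 + math.exp(-beta * d))
            score += d * (1.0 - p)
            info += d * d * p * (1.0 - p)
        return score, info

    beta = 0.0
    for _ in range(max_iter):
        score, info = score_and_info(beta)
        step = score / info  # Newton-Raphson update
        beta += step
        if abs(step) < tol:
            break
    _, info = score_and_info(beta)
    return beta, 1.0 / math.sqrt(info)


# Hypothetical data: 30 pairs in which only the case carries one minor allele,
# 20 pairs in which only the control does, and 10 concordant pairs.
example = [(1, 0)] * 30 + [(0, 1)] * 20 + [(1, 1)] * 10
beta, se = fit_matched_pairs(example)
odds_ratio = math.exp(beta)
```

With only single-allele differences, the estimate reduces to the familiar ratio of discordant pairs (30/20), which gives a quick sanity check on the fit. Larger matched sets and haplotype dosages estimated by expectation-substitution would need a general conditional likelihood rather than this pairwise shortcut.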
To test the global null hypothesis of no association between variation in ESR2 haplotypes and htSNPs and risk of breast cancer (or subtypes defined by receptor status), we used a likelihood ratio test comparing a model with additive effects for each common haplotype (treating the most common haplotype as the referent) to the intercept-only model. In addition, we used permutation testing26 to further evaluate the association between haplotypes and breast cancer risk. Ten thousand permuted data sets were generated by shuffling case–control status within each matched case–control set. Matching schemes and variables varied by cohort, ranging from 1:1 (WHS, CPS-II) to frequency matching (MEC). The association of each SNP and haplotype with breast cancer risk was evaluated in each of the 10,000 permutations using the log-additive model. The minimum p-value across all the variants tested (4 SNPs and 6 haplotypes, each modeled independently, for 10 tests per permutation) in each permuted data set was compared with the lowest p-value observed in the original data set. The multiple-comparisons-corrected p-value is the number of permutations in which the minimum p-value was less than the smallest observed p-value, divided by 10,000.
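The permutation correction described above can be sketched in a few lines of Python. The sketch covers only the two generic steps, shuffling case-control labels within matched sets and counting permutations whose minimum p-value beats the observed one; the association test itself is left out. The function names and the toy inputs are hypothetical.

```python
import random


def shuffle_within_sets(matched_sets, rng):
    """Permute case (1) / control (0) labels separately inside each matched set."""
    permuted = []
    for statuses in matched_sets:
        labels = list(statuses)
        rng.shuffle(labels)  # keeps the number of cases per set unchanged
        permuted.append(labels)
    return permuted


def min_p_corrected(observed_p_values, permuted_min_p_values):
    """Fraction of permutations whose minimum p-value is below the observed minimum."""
    smallest_observed = min(observed_p_values)
    hits = sum(1 for p in permuted_min_p_values if p < smallest_observed)
    return hits / len(permuted_min_p_values)


rng = random.Random(0)
toy_sets = [[1, 0], [1, 0, 0], [1, 1, 0, 0]]
toy_permuted = shuffle_within_sets(toy_sets, rng)

# Toy illustration of the final ratio: 20 of 10,000 permuted minima fall
# below an observed minimum of 0.0007 (the permuted values are made up).
corrected = min_p_corrected([0.0007, 0.2], [0.0001] * 20 + [0.5] * 9980)
```

Because all 10 tests are recomputed on each permuted data set and only their minimum is retained, the correction accounts for the correlation between SNP and haplotype tests, which a Bonferroni adjustment would ignore.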
We considered conditional models adjusting for known breast cancer risk factors. The covariates included to account for breast cancer risk factors were age at menarche (≤12 years, 13–14 years, 15+ years), menopausal status (pre, post and unknown), parity [ever/never full-term pregnancy (FTP)], body mass index (BMI in kg/m2 as a continuous variable) and use of postmenopausal hormones (ever/never). Other common risk factors including family history of breast cancer, personal history of benign breast disease and age at menopause were unavailable for large numbers of women, and therefore were not included in the models. We also evaluated these covariates (including those with large proportions of missing data) for possible interaction effects using likelihood ratio testing. Models with the main effect of genotype and the covariate of interest were compared with the models with the main effects of genotype and the covariate of interest, plus a multiplicative interaction term of the 2 variables. Finally, we tested whether the association between ESR2 and breast cancer differed by receptor (ER and PR) status. Power calculations were carried out using the program Quanto.27 The rmeta package in the R environment was used to create Figure 2 to examine heterogeneity across the cohorts.
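The likelihood ratio comparison of nested models (main effects only versus main effects plus an interaction term) can be sketched as follows. The sketch assumes that the maximized log-likelihoods of the two fitted models are already available and that the test statistic is referred to a chi-square distribution with degrees of freedom equal to the number of added parameters; the function names and the example log-likelihoods are hypothetical.

```python
import math


def chi2_sf(x, df):
    """Upper-tail probability of a chi-square variable with integer df >= 1."""
    if x <= 0:
        return 1.0
    half = x / 2.0
    if df % 2 == 0:
        # Even df: finite Poisson-type sum.
        terms = sum(half ** i / math.factorial(i) for i in range(df // 2))
        return math.exp(-half) * terms
    # Odd df: complementary error function plus a finite correction sum.
    k = (df - 1) // 2
    tail = sum(half ** (i - 0.5) / math.gamma(i + 0.5) for i in range(1, k + 1))
    return math.erfc(math.sqrt(half)) + math.exp(-half) * tail


def lrt_p_value(loglik_reduced, loglik_full, extra_params):
    """Likelihood ratio statistic and p-value for two nested models."""
    stat = 2.0 * (loglik_full - loglik_reduced)
    return stat, chi2_sf(stat, extra_params)


# Hypothetical log-likelihoods: adding one interaction term raises the
# maximized log-likelihood from -10.0 to -8.0.
stat, p_interaction = lrt_p_value(-10.0, -8.0, extra_params=1)
```

The same function applies to the global haplotype test by setting `extra_params` to the number of haplotype terms added to the intercept-only model.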
Results
Figure 1 shows the genomic structure of the region around ESR2, which consists of a single haplotype block. The 4 haplotype tagging SNPs in Caucasians account for 96% of the haplotype diversity at this locus. The full set of 5 htSNPs tags common haplotypes with a minimum R2H of 0.75 among Caucasians, 0.58 among African Americans, 0.17 among Japanese, 0.23 among Native Hawaiians and 0.12 among Latinas. When restricting to the 4 htSNPs that tag the haplotypes among Caucasians, the R2H values are 0.75, 0.22, 0.17, 0.21 and 0.12, respectively. The haplotypes tagged by these 4 SNPs ranged in allelic prevalence from 5 to 46% among the MEC Caucasian samples used for tagSNP selection, and were similar in the case–control analyses (5–45%).
A total of 5,789 cases and 7,761 controls from the participating cohorts were available for genotyping. Samples not yielding a genotype were removed from individual SNP analyses, and samples not yielding a genotype for at least 1 SNP were removed from haplotype analyses. Genotyping concordance was above 99% for between-center QC samples and for center-specific blinded QC samples. Genotype success rate among cases and controls in all cohorts was above 95%. One polymorphism (rs1256049) deviated from Hardy–Weinberg equilibrium among the controls of the MEC Caucasians (p = 0.016) and EPIC (p = 0.003); however, genotype distributions between all cohorts were similar.
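A check for departure from Hardy–Weinberg equilibrium can be sketched as a 1-degree-of-freedom chi-square goodness-of-fit test on control genotype counts. The text does not state which test was used, so the Pearson chi-square form below is an assumption; the function name and the genotype counts are hypothetical.

```python
import math


def hwe_chi_square(n_aa, n_ab, n_bb):
    """Pearson chi-square test of Hardy-Weinberg proportions (1 d.f.).

    n_aa, n_ab, n_bb: observed counts of the three genotypes; both alleles
    are assumed to be present so that all expected counts are positive.
    Returns (chi_square, p_value).
    """
    n = n_aa + n_ab + n_bb
    p = (2 * n_aa + n_ab) / (2 * n)  # frequency of allele A
    q = 1.0 - p
    expected = (n * p * p, 2 * n * p * q, n * q * q)
    observed = (n_aa, n_ab, n_bb)
    chi2 = sum((o - e) ** 2 / e for o, e in zip(observed, expected))
    # For 1 d.f. the chi-square upper tail equals erfc(sqrt(x / 2)).
    p_value = math.erfc(math.sqrt(chi2 / 2.0))
    return chi2, p_value


# Hypothetical counts with a heterozygote deficit relative to the
# 25 / 50 / 25 expected under equilibrium for these allele frequencies.
chi2, p_value = hwe_chi_square(30, 40, 30)
```

For rare minor alleles, where the expected count of minor-allele homozygotes is small, an exact test is usually preferred over this chi-square approximation.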
None of the single nucleotide polymorphisms studied showed an association with breast cancer risk (Table I). The p-values for tests of heterogeneity of risk estimates between participating cohorts ranged from 0.10 to 0.50 across the single nucleotide polymorphisms. The global test comparing haplotype frequencies in cases and controls was of borderline significance (d.f. = 6, p = 0.04). However, one haplotype (CCAC) showed an increase in breast cancer risk (p = 0.0007, OR 1.17, 95% CI 1.07–1.28; Table II). The p-values for heterogeneity tests of associations between haplotypes and breast cancer risk between cohorts ranged from 0.10 to 0.65. Figure 2 shows the risk associated with the CCAC haplotype in each cohort. We also used permutation testing to correct for multiple comparisons. Of the 10,000 permutations, only 20 yielded a minimum p-value less than that observed for the most significant haplotype. Therefore, the multiple-comparisons-corrected p-value for this haplotype is 0.002 (20/10,000).
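As a consistency check on the reported haplotype result, the Wald statistic can be reconstructed from the odds ratio and its 95% confidence interval. The Python sketch below assumes that the interval is symmetric on the log scale and was built with the usual normal critical value; the function name is hypothetical.

```python
import math


def wald_from_ci(odds_ratio, ci_low, ci_high, z_crit=1.959964):
    """Recover the standard error, z statistic and two-sided p-value
    from an odds ratio and its confidence limits."""
    se = (math.log(ci_high) - math.log(ci_low)) / (2.0 * z_crit)
    z = math.log(odds_ratio) / se
    p = math.erfc(abs(z) / math.sqrt(2.0))  # two-sided normal tail area
    return se, z, p


# Values reported for the CCAC haplotype: OR 1.17, 95% CI 1.07-1.28.
se, z, p = wald_from_ci(1.17, 1.07, 1.28)
```

With these inputs z is about 3.4 and the two-sided p-value is of the order of 0.0006, in line with the reported p = 0.0007 once rounding of the odds ratio and confidence limits is taken into account.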
Table I. Association Between ESR2 htSNPs and Breast Cancer Risk in the Breast and Prostate Cancer Cohort Consortium
Cases and controls of invasive breast cancer from all participating studies, totals are the sum of haplotype scores.
Unadjusted logistic regression conditional on matched case–control sets. Global p-value for association of breast cancer risk with haplotypes = 0.04.
Freq < 5%
Upon stratification by age at diagnosis (<63 or 63+, median age overall = 63 years), the risk associated with this haplotype was restricted to younger women (Table III). No statistically significant interactions (p-interaction < 0.05) between haplotypes and breast cancer risk factors [recent hormone replacement therapy (HRT), ever HRT, age at first FTP, ever FTP, family history of breast cancer, age at menarche, age at menopause, personal history of benign breast disease, menopausal status or BMI (in kg/m2 in 3 categories; <25, 25–29, ≥30)] were observed for this haplotype. No difference in risk was observed upon stratification by estrogen or progesterone receptor status (data not shown). Estrone and estradiol levels were available for postmenopausal cases and controls from EPIC and the NHS. An interaction between the CCAC haplotype and estrone levels was observed (Table IV, p = 0.03); similar, though not statistically significant, results were observed with estradiol (data not shown).
Table III. Association Between ESR2 Haplotypes and Breast Cancer Risk in the Breast and Prostate Cancer Cohort Consortium, Stratified at Age 63
Estrone levels below (low) or above (high) the median. Median was determined separately by cohort (EPIC or NHS) among controls only.
Relative risk and 95% confidence interval from conditional logistic regression.
p-interaction = 0.03
Discussion
ESR2 is an obvious candidate gene to harbor allelic variants that predispose to breast cancer along the sex steroid hormone synthesis, metabolism and signaling pathway. However, it is not the only candidate along this pathway, and many other genes are currently under study to examine associations between common variants and breast cancer risk. At present, no clear consensus has been reached in the field with regard to studying the effect of variants in large numbers of genes simultaneously on disease risk. Therefore, we have chosen to present results from the ESR2 gene independent of other genes.
Given that the global test for association between ESR2 haplotypes and breast cancer risk was of borderline significance (p = 0.04), with only one (CCAC) of the 6 common haplotypes showing a statistically significant increase in risk (p = 0.0007), we used permutation testing as an additional multiple comparisons correction procedure. After correction for multiple comparisons (at the gene level) using permutation testing, the CCAC haplotype remains nominally significantly associated with breast cancer risk (corrected p-value = 0.002), though not at the stringent threshold (10^-4) that has been proposed for candidate gene studies.
The low magnitude of risk limits the power to detect interactions with nongenetic risk factors. Nevertheless, we did find some intriguing results upon stratification by age at diagnosis (Table III) and estrone levels (Table IV). The stratified analyses by age suggest that the CCAC haplotype is a risk factor only in younger women. We chose to dichotomize at age 63 because this is the median age at diagnosis across all cohorts, and is similar to the median age at diagnosis in the SEER data (61 years).22 While breast cancer incidence rates increase dramatically after menopause, they continue to increase well into the seventh decade. In fact, risk factors for breast cancer, particularly body mass index, have been shown to vary in their effect on premenopausal or postmenopausal diagnosis of breast cancer. Therefore, the most likely interpretation of the interaction between the CCAC haplotype and age at diagnosis on breast cancer risk is related to overall lifetime risk, as opposed to risk relative to some specific life event, such as menopause. Among women with lower estrone levels, women carrying the CCAC haplotype had a further reduction in breast cancer risk. This could imply that a variant on this haplotype reduces the ability of cells to respond to estrogen signaling by altering the function of the ESR2 gene. However, these stratified analyses must be interpreted cautiously, particularly with respect to estrone levels, where the small number of samples available leads to unstable risk estimates (as evidenced by the very wide confidence intervals). Further replication is necessary before definitive conclusions can be drawn.
Examining the other polymorphisms genotyped in the screen for htSNPs does not yield any a priori candidate causal SNPs (i.e., nonsynonymous or splice site SNPs) on this haplotype. However, a putatively causal polymorphism (either part of the screen or not) could be incompletely tagged by this haplotype either due to incomplete linkage, different allele frequency, or both. Given that no obviously functional polymorphisms have been described on this haplotype, we cannot rule out that the association we observe between the CCAC haplotype and breast cancer risk is due to chance.
The BPC3 was established to overcome the sample-size limitation of many studies that examine genetic variants for association with breast and prostate cancer. Given the sample size in this study (5,789 cases and 7,761 controls), we have >90% power at a type I error rate of 10^-4 to detect an allele of frequency 0.2 with a per-allele relative risk of 1.2. As such, the results we present here confidently exclude common variation of ESR2 from being associated with moderate or greater breast cancer risk. However, one less common variant (the CCAC haplotype, 8% of control chromosomes) is found to be associated with a modest increase in breast cancer risk. Even with the large sample size of the current study, roughly 12,000 cases and controls would be needed for 80% power to detect a similar association (per-allele OR 1.17) at a type I error rate of 10^-4. For this reason, we should be cautious when interpreting the association between the CCAC haplotype and breast cancer risk. Similarly, the population studied here is predominantly postmenopausal Caucasian women, and the htSNPs selected tag haplotypes most efficiently among Caucasians. Therefore, we cannot draw conclusions about the association between variants of ESR2 and breast cancer risk in other populations, nor should these htSNPs be assumed to tag variants in non-Caucasian populations.
In conclusion, we have performed an exhaustive scan of SNPs in the ESR2 gene, selected htSNPs based on this scan, and evaluated the association between these htSNPs and breast cancer risk. One haplotype of ESR2 is significantly associated with a 17% increase in breast cancer risk per copy of the haplotype carried among Caucasian women.
We thank the participants in the component cohort studies and the expert contributions of Hardeep Ranu, Craig Labadie, Lisa Cardinale, Shamika Ketkar, Johannah Butler (Harvard University), Robert Welch, Cynthia Glaser, Laurie Burdett (National Cancer Institute), Loreall Pooler (University of Southern California), Laure Dossus and James McKay (EPIC).
David G. Cox, Philip Bretsky, Peter Kraft and Paul Pharoah made up the writing committee for this work, and were responsible for data analyses, manuscript preparation and editing.
Stephen Chanock, Federico Canzian, Christopher Haiman, Daniel O. Stram and Meredith Yeager provided expertise in genotyping and results analyses as well as manuscript editing.
David Altshuler, Noel Burtt and Joel Hirschhorn carried out the sequencing, dense genotyping and htSNP selection.
Demetrius Albanes, Pilar Amiano, Goran Berglund, Heiner Boeing, Julie Buring, Françoise Clavel-Chapelon, Graham A. Colditz, Heather Spencer Feigelson, Susan E. Hankinson, Robert Hoover, David J. Hunter, Rudolf Kaaks, Laurence Kolonel, Loic LeMarchand, Eiliv Lund, Domenico Palli, Petra Peeters, Malcolm C. Pike, Elio Riboli, Michael Thun, Anne Tjonneland, Ruth C. Travis and Dimitrios Trichopoulos contributed substantially to sample collection and manuscript editing.
Breast and Prostate Cancer Cohort Consortium members: Writing Committee: David G. Cox, Program in Molecular and Genetic Epidemiology, Epidemiology Department, Harvard School of Public Health, Boston, MA; Philip Bretsky, Cedars-Sinai Medical Center, Los Angeles, CA; Peter Kraft, Program in Molecular and Genetic Epidemiology, Epidemiology Department, Harvard School of Public Health, Boston, MA; Paul Pharoah, Strangeways Research Laboratory, Cambridge, United Kingdom.
Additional Contributing Authors: Demetrius Albanes, Division of Cancer Epidemiology and Genetics, National Cancer Institute, Rockville, MD; David Altshuler, Program in Medical and Population Genetics, Broad Institute at Harvard and MIT, Cambridge, MA; Pilar Amiano, Molecular and Nutritional Epidemiology Unit, Scientific Institute of Tuscany, Florence, Italy; Goran Berglund, Department of Medicine, Lund University, Lund, Sweden; Heiner Boeing, Department of Epidemiology, German Institute of Human Nutrition, Potsdam-Rehbruecke, Germany; Julie Buring, Division of Preventive Medicine, Department of Medicine, Brigham & Women's Hospital, Harvard Medical School, Boston, MA; Noel Burtt, Program in Medical and Population Genetics, Broad Institute at Harvard and MIT, Cambridge, MA; Eugenia E. Calle, Epidemiology and Surveillance Research, American Cancer Society, Atlanta, GA; Federico Canzian, Genomic Epidemiology Group, German Cancer Research Center, Heidelberg, Germany; Stephen Chanock, Core Genotyping Facility, National Cancer Institute, Gaithersburg, MD; Françoise Clavel-Chapelon, INSERM, Institut Gustave Roussy, Villejuif, France; Graham A. Colditz, Department of Medicine, Channing Laboratory, Brigham and Women's Hospital and Harvard Medical School, Boston, MA; Heather Spencer Feigelson, Epidemiology and Surveillance Research, American Cancer Society, Atlanta, GA; Christopher A. Haiman, University of Southern California, Los Angeles, CA; Susan E. Hankinson, Department of Medicine, Channing Laboratory, Brigham and Women's Hospital and Harvard Medical School, Boston, MA; Joel Hirschhorn, Division of Preventive Medicine, Department of Medicine, Brigham & Women's Hospital, Harvard Medical School, Boston, MA; Brian E. Henderson, University of Southern California, Los Angeles, CA; Robert Hoover, Core Genotyping Facility, National Cancer Institute, Gaithersburg, MD; David J. Hunter, Program in Molecular and Genetic Epidemiology, Epidemiology Department, Harvard School of Public Health, Boston, MA; Rudolf Kaaks, Division of Cancer Epidemiology, German National Cancer Center (DKFZ), Heidelberg, Germany; Laurence Kolonel, Loic LeMarchand, Epidemiology Program, Cancer Research Center, University of Hawaii, Honolulu, HI; Eiliv Lund, Institute of Community Medicine, University of Tromsø, Tromsø, Norway; Domenico Palli, Molecular and Nutritional Epidemiology Unit, Scientific Institute of Tuscany, Florence, Italy; Petra H.M. Peeters, Julius Center for Health Sciences and Primary Care, University Medical Center Utrecht, Utrecht, The Netherlands; Malcolm C. Pike, University of Southern California, Los Angeles, CA; Elio Riboli, Imperial College, London, United Kingdom; Daniel O. Stram, University of Southern California, Los Angeles, CA; Michael Thun, Epidemiology and Surveillance Research, American Cancer Society, Atlanta, GA; Anne Tjonneland, Institute of Cancer Epidemiology, Danish Cancer Society, Copenhagen, Denmark; Ruth C. Travis, Cancer Research UK Epidemiology Unit, University of Oxford, Richard Doll Building, Oxford, United Kingdom; Dimitrios Trichopoulos, Department of Hygiene and Epidemiology, School of Medicine, University of Athens, Athens, Greece; Meredith Yeager, Core Genotyping Facility, National Cancer Institute, Gaithersburg, MD.