Two colorectal cancer (CRC) susceptibility loci have been found to be significantly associated with an increased risk of CRC in Dutch Lynch syndrome (LS) patients. Recently, in a combined study of Australian and Polish LS patients, only MLH1 mutation carriers were found to be at increased risk of CRC. A combined analysis of the three data sets was performed to better define this association. This cohort study includes 1,352 individuals from 424 families with a molecular diagnosis of LS, drawn from the three sample populations. Seven SNPs from six different CRC susceptibility loci were genotyped by both research groups, and the data were analyzed collectively. We identified associations at two of the six CRC susceptibility loci in MLH1 mutation carriers from the combined LS cohort: 11q23.1 (rs3802842; HR = 2.68, p ≤ 0.0001), which increased the risk of CRC, and rs3802842 in a pair-wise combination with 8q23.3 (rs16892766), which affected age at diagnosis of CRC (log-rank test, p ≤ 0.0001). Median age at diagnosis of CRC differed by 28 years between individuals carrying three risk alleles and those carrying no risk alleles for the pair-wise SNP combination. MLH1 mutation carriers also showed a trend toward increased risk of CRC for the pair-wise combination (p = 0.002), which did not reach the multiple-testing-corrected significance threshold. This study confirms the role of modifier loci in LS. We consider that LS patients with MLH1 mutations would greatly benefit from additional genotyping of SNPs rs3802842 and rs16892766 for personalized risk assessment and a tailored surveillance program.
Lynch syndrome (LS) is an autosomal dominantly inherited cancer syndrome characterized by early-onset epithelial cancers. Patients with germline mutations in the DNA mismatch repair genes MLH1 (MIM 120436), MSH2 (MIM 609309), MSH6 (MIM 600678) and PMS2 (MIM 600259), or in EPCAM (MIM 185535), a gene whose alteration causes methylation and inactivation of MSH2, are defined as having LS.1–5 The most recent penetrance estimates for LS suggest that by age 70 years, 53% of men and 33% of women will develop CRC and 44% of women will develop endometrial cancer (EC).6,7 Individuals diagnosed with LS are also at greater risk of developing epithelial malignancies in a variety of other organs.8,9
Mutations in DNA mismatch repair (MMR) genes are not generally considered to display a genotype–phenotype correlation, although some disease characteristics correlate with the mutated MMR gene: for example, endometrial cancer is more frequent in MSH6 mutation carriers, and extracolonic malignancies are more frequent in MSH2 mutation carriers, than in MLH1 mutation carriers.10,11 Patients are generally not categorized into phenotype groups based on the mutated gene or the site of the mutation, although younger onset of CRC is observed in MLH1 mutation carriers.11 The search for modifier genes in LS has been under way for more than a decade, but definitive results have been elusive,12–15 probably because of the small sample sizes of earlier reports.
Recent genome-wide association studies (GWASs) clearly demonstrate that common genetic variation contributes to the risk of developing CRC, and several low-penetrance CRC susceptibility loci have been identified to date.16–20 In 2009, two of these loci, 8q23.3 (rs16892766) and 11q23.1 (rs3802842), were associated with an increased risk of developing CRC in Dutch LS patients,21 an association that was confirmed in a combined Australian/Polish LS cohort, but only in MLH1 mutation carriers.22 Although 11q23.1, together with the 15q13 and 20p12 loci, has been shown to be differentially associated with early-onset CRC depending on a family history of CRC,23 the associations with 8q23.3 and 11q23.1 could not be replicated in a French LS cohort.24 In fact, a decreased risk of CRC was observed for the 8q23.3 CC genotype, but only two individuals harbored this genotype.24
In this report, a combined analysis of the data sets from the studies by Wijnen et al.21 and Talseth-Palmer et al.22 was undertaken, confirming the role of modifier loci in LS. The combined data set, which comprises the largest cohort examined in the search for modifier genes in Lynch syndrome to date, reveals significant associations that can provide the basis for more specific risk assessment in LS.
CI: confidence interval; CRC: colorectal cancer; EC: endometrial cancer; GWAS: genome-wide association study; HR: hazard ratio; HWE: Hardy–Weinberg equilibrium; LS: Lynch syndrome; MMR: mismatch repair; OR: odds ratio; ORF: open reading frame; SNP: single nucleotide polymorphism.
Material and Methods
The study complies with the ethical requirements of the Hunter Area Research Ethics Committee (Australia), the University of Newcastle Human Research Ethics Committee (Australia), the Ethics Committees of the Pomeranian Academy of Medicine (Poland) and Leiden University Medical Center (the Netherlands). Written informed consent was obtained from all participants. The sample cohort consists of 1,352 LS patients (all mutation positive) from 424 families, representing the largest LS cohort examined for modifier genes to date. The individual sample cohorts have been described previously.21,22
Wijnen et al.21 genotyped six single nucleotide polymorphisms (SNPs) from six different CRC susceptibility loci, whereas Talseth-Palmer et al.22 genotyped nine SNPs from the same six loci; in both studies, DNA was extracted from whole blood. Following publication by Wijnen et al.,21 additional SNPs were genotyped in the Dutch cohort. All genotyping for both studies was performed in 2008, which limited the number of loci that could be studied. Two SNPs were chosen from the 18q21 region because common SNPs in SMAD7 (MIM 602932) had been well documented to influence CRC risk.18,19 The SNPs included in the current analysis were rs16892766 (8q23.3), rs6983267 (8q24.1), rs10795668 (10p14), rs3802842 (11q23.1), rs4779584 (15q13.3), rs4464148 (18q21) and rs4939827 (18q21.1).
A Pearson's chi-square test was used to evaluate deviation from Hardy–Weinberg equilibrium (HWE). The association of each SNP with CRC risk was estimated as heterozygous and homozygous odds ratios (OR) using simple logistic regression, and using multiple logistic regression to adjust for gene, gender and country of sample origin. Kaplan–Meier estimates were used to examine the association between genotype and age at diagnosis of CRC. Age at diagnosis of CRC, endometrial cancer or extracolonic cancer was used as the event time, and age at last follow-up was used for cancer-free (censored) individuals. Homogeneity of the Kaplan–Meier curves was examined with the Wilcoxon (Breslow), log-rank and Tarone–Ware tests; all three tests were required to be significant for a result to be considered reliable, but only the log-rank test is reported. The association between genotype and risk of CRC was further evaluated using Cox proportional hazards models. Clustering of samples within families was adjusted for by including family as a frailty term in the model; individuals who were the only sampled member of their family were grouped together as one “cluster.” Both simple and adjusted (for gene, gender and country of sample origin) analyses were performed. The additive effect of SNPs was tested by entering all SNPs into the model as separate covariates. If a significant association was observed, a further Cox proportional hazards model was fitted with a variable giving the number of risk alleles, counting two for a homozygote and one for a heterozygote (as previously reported25).
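As an illustration of the HWE check described above, the following is a minimal Python sketch of a Pearson chi-square test for a single biallelic SNP. It is not the Stata procedure used in the study. It assumes that counts of the two homozygous genotypes and the heterozygote are available, estimates the allele frequency from those counts, and uses one degree of freedom. The function name `hwe_chi_square` is hypothetical, and the example counts are invented.

```python
import math


def hwe_chi_square(n_hom_ref: int, n_het: int, n_hom_alt: int) -> tuple[float, float]:
    """Pearson chi-square test of Hardy-Weinberg equilibrium for one biallelic SNP.

    Takes genotype counts (e.g. AA, AC, CC) and returns the chi-square
    statistic and its p-value on 1 degree of freedom. Assumes both alleles
    are observed; otherwise one expected count is zero.
    """
    n = n_hom_ref + n_het + n_hom_alt
    p = (2 * n_hom_ref + n_het) / (2 * n)  # frequency of the reference allele
    q = 1 - p
    observed = (n_hom_ref, n_het, n_hom_alt)
    expected = (n * p * p, 2 * n * p * q, n * q * q)
    chi2 = sum((o - e) ** 2 / e for o, e in zip(observed, expected))
    # For 1 df, P(X > x) = erfc(sqrt(x / 2))
    return chi2, math.erfc(math.sqrt(chi2 / 2))


print(hwe_chi_square(25, 50, 25))  # prints (0.0, 1.0)
```

Counts that exactly match the HWE expectation give a statistic of zero. A strong heterozygote deficit, such as 50/0/50, gives a statistic of 100 and a negligible p-value.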
Supporting Information Table 1 (Supporting Information file 1) presents clinical and molecular characteristics of the sample cohort, while Supporting Information Table 2 (Supporting Information file 1) displays the characteristics of the individuals included in the Kaplan–Meier and Cox proportional hazards regression analyses for each sample group.
The alpha level for all tests was set at p < 0.05. Bonferroni correction for multiple testing was applied because six tests were performed in the Cox proportional hazards regression model for individuals with CRC as their first primary tumor vs. tumor-free individuals, for each of 7 SNPs (6 × 7 = 42 tests). The corrected significance threshold is therefore p < 0.05/42 ≈ 0.0012, and any result with p < 0.0012 is considered significant. All statistical analyses were performed with Stata, version 10 (StataCorp, College Station, TX).
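The threshold arithmetic above can be checked with a short Python sketch. The helper name `bonferroni_threshold` is hypothetical. The p-values checked are those reported in the Results, and the reported bound p ≤ 0.0001 is treated as the value 0.0001.

```python
def bonferroni_threshold(alpha: float, n_tests: int) -> float:
    """Per-test significance level after Bonferroni correction."""
    return alpha / n_tests


n_tests = 6 * 7  # six Cox regression tests for each of seven SNPs
threshold = bonferroni_threshold(0.05, n_tests)
print(f"{n_tests} tests -> threshold {threshold:.5f}")  # prints "42 tests -> threshold 0.00119"

# p-values reported in the text, checked against the corrected threshold
for label, p in [("rs3802842 in MLH1 carriers", 0.0001), ("three risk alleles vs none", 0.002)]:
    print(label, "significant" if p < threshold else "not significant")
```

Rounded to four decimal places, the threshold is 0.0012. This is why the pair-wise result at p = 0.002 is reported only as a trend.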
All seven SNPs were in Hardy–Weinberg equilibrium (HWE) in the combined sample cohort. Genotype distributions and logistic regression results are shown in Supporting Information Table 3 (Supporting Information file 1); no increased risk of CRC was observed for any SNP.
To verify that the sample cohort is representative of what is expected in LS, Kaplan–Meier analysis was performed stratified by mutated gene. Median age at diagnosis of CRC (the age at which 50% of the group remains CRC-free) was similar for MLH1 and MSH2 mutation carriers (53 and 52 years, respectively) but higher for MSH6 mutation carriers (72 years); Log-rank p ≤ 0.0001, see Figure 1. MSH6 mutation carriers are at decreased risk of CRC compared with MLH1 mutation carriers (HR = 0.32, 95% CI = 0.20–0.49, p ≤ 0.0001) and MSH2 mutation carriers.
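The median ages above are read off Kaplan–Meier curves. The sketch below shows the product-limit estimator in Python together with one common median convention: the earliest event age at which estimated CRC-free survival falls to 0.5 or below. It is not the Stata routine used in the study. The function names are hypothetical and the example ages are invented. Individuals censored at an age are kept in the risk set at that age, which is the usual convention for ties.

```python
from collections import Counter


def kaplan_meier(times, events):
    """Product-limit (Kaplan-Meier) estimate of CRC-free survival.

    times  -- age at CRC diagnosis (event) or at last follow-up (censored)
    events -- 1 if CRC was diagnosed at that age, 0 if censored
    Returns a list of (age, survival) pairs, one per distinct event age.
    """
    events_at = Counter(t for t, e in zip(times, events) if e)
    leaving_at = Counter(times)  # events and censorings at each age
    at_risk = len(times)
    survival = 1.0
    curve = []
    for t in sorted(leaving_at):
        d = events_at.get(t, 0)
        if d:
            # Individuals censored at age t are still counted as at risk at t
            survival *= 1 - d / at_risk
            curve.append((t, survival))
        at_risk -= leaving_at[t]
    return curve


def km_median(times, events):
    """Earliest event age at which estimated survival is <= 0.5, or None if never reached."""
    for t, s in kaplan_meier(times, events):
        if s <= 0.5:
            return t
    return None


# Invented example: six carriers, ages in years
ages = [40, 45, 50, 55, 60, 65]
crc = [1, 0, 1, 1, 0, 1]
print(km_median(ages, crc))  # prints 55
```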
Furthermore, Kaplan–Meier analysis showed no evidence of association for any of the seven SNPs when endometrial cancer was used as the endpoint (see Supporting Information Table 4 in Supporting Information file 1), and only one SNP, rs3802842, displayed a trend when the total sample cohort was analyzed with CRC as the endpoint using Cox proportional hazards regression (see Table 1). We also analyzed all seven SNPs in individuals with any extracolonic cancer versus cancer-free individuals; no significant findings were observed (data not shown).
Table 1. Regression analysis of CRC risk
Subgroups of the sample cohort [MLH1, MSH2, females and males (MSH6 excluded due to low sample size)] were analyzed because of the trend observed for SNP rs3802842 in the adjusted analysis (see Table 1). Although there is no significant difference in the total cohort (Log-rank p = 0.0960, see Supporting Information Fig. 1), a significant difference in age at diagnosis of CRC between rs3802842 genotypes is seen in MLH1 mutation carriers (see Fig. 2) but not in MSH2 mutation carriers (see Fig. 3). MLH1 mutation carriers with the CC (variant) genotype develop CRC on average 11 years earlier than those with the AA (wild-type) genotype (Log-rank p = 0.0003) and are also at higher risk of developing CRC (HR = 2.68, 95% CI = 1.56–4.63, p ≤ 0.0001, see Table 2). In addition, a trend toward higher risk of CRC for SNP rs3802842 is observed in all females (HR = 1.92, 95% CI = 1.15–3.21, p = 0.013), again with an average difference of 11 years between the AA and CC genotypes (Log-rank p = 0.0085). No significant difference was observed in MSH2 mutation carriers (Log-rank p = 0.5177) or in males (Log-rank p = 0.8217).
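The genotype comparisons above rely on the log-rank test. The sketch below is a minimal two-group version in Python, for example comparing CC with AA carriers of rs3802842. The analyses in the text compare all genotype groups, which requires the k-group generalization with k - 1 degrees of freedom. The function name and the toy data are hypothetical, and this is not the authors' Stata code.

```python
import math


def logrank_two_group(times, events, groups):
    """Two-group log-rank test.

    times  -- age at event or at last follow-up
    events -- 1 for an event (e.g. CRC diagnosis), 0 for censored
    groups -- 1 for the index group (e.g. CC genotype), 0 for the reference (e.g. AA)
    Returns (chi-square statistic on 1 df, p-value).
    """
    data = list(zip(times, events, groups))
    observed_minus_expected = 0.0
    variance = 0.0
    for t in sorted({ti for ti, e, _ in data if e}):
        n = sum(1 for ti, _, _ in data if ti >= t)  # total at risk at age t
        n1 = sum(1 for ti, _, g in data if ti >= t and g == 1)  # index group at risk
        d = sum(1 for ti, e, _ in data if ti == t and e)  # events at age t
        d1 = sum(1 for ti, e, g in data if ti == t and e and g == 1)
        # Observed minus expected events in the index group at age t
        observed_minus_expected += d1 - d * n1 / n
        if n > 1:
            # Hypergeometric variance of d1 given the margins at age t
            variance += d * (n1 / n) * (1 - n1 / n) * (n - d) / (n - 1)
    if variance == 0:
        return 0.0, 1.0
    chi2 = observed_minus_expected ** 2 / variance
    # Chi-square survival function for 1 df
    return chi2, math.erfc(math.sqrt(chi2 / 2))


# Toy data: the index group has all of the earliest events
chi2, p = logrank_two_group([1, 2, 3, 4, 5, 6], [1] * 6, [1, 1, 1, 0, 0, 0])
print(round(chi2, 2))  # prints 5.05
```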
Table 2. Regression analysis of CRC risk by subgroup
All seven SNPs were entered into the Cox proportional hazards regression model in the combined sample cohort and in the subgroups (MLH1 and MSH2 mutation carriers, females and males) to test for an additive effect of the investigated SNPs. No significant results were observed other than the previously described rs3802842 association. To investigate additive effects of each SNP with rs3802842, each SNP was included in the model as a separate covariate in MLH1 mutation carriers. Again, only rs3802842 was statistically significant.
To investigate whether a cluster of risk alleles increases the risk of CRC, pair-wise combinations of SNP rs3802842 with each of the other six SNPs were analyzed in the Cox proportional hazards regression model in MLH1 mutation carriers. The pair-wise combination of SNPs rs3802842 and rs16892766 revealed a significant difference in median age at diagnosis of CRC: patients carrying three risk alleles (three variant alleles across the two genotypes) were diagnosed 28 years younger than individuals carrying no risk alleles (zero variant alleles across the two genotypes); Log-rank p ≤ 0.0001, see Figure 4. Median age at diagnosis for carriers of two and one risk alleles was 49 and 53 years, respectively. No individual carried four risk alleles for this combination. MLH1 mutation carriers with three risk alleles are also at increased risk of CRC compared with those with no risk alleles, although this did not reach the significance threshold used in this study (HR = 4.86, 95% CI = 1.80–13.11, p = 0.002). The associations for carrying two risk alleles (HR = 1.77, 95% CI = 1.10–2.86, p = 0.019) and one risk allele (HR = 1.09, 95% CI = 0.78–1.53, p = 0.613) are not significant. Specific genotypes for the pair-wise combination of SNPs rs3802842 and rs16892766 for the seven MLH1 mutation carriers with three risk alleles are displayed in Supporting Information Table 5 (Supporting Information file 1). Four MSH2 and eight MSH6 mutation carriers also carried three risk alleles for this SNP combination, but results were not significant (seven of the MSH6 mutation carriers, aged 39–55, were not diagnosed with CRC). The SNP combination (rs3802842 + rs16892766) was not analyzed with extracolonic cancer as the endpoint, as only one out of twenty-one MLH1 carriers had extracolonic cancer as their first primary tumor.
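The risk-allele count used for the pair-wise analysis can be sketched as follows. The sketch assumes that genotypes are stored as two-letter strings and that C is the risk (variant) allele at both rs3802842 and rs16892766, which is consistent with the CC genotypes discussed in the text. The helper names and data layout are hypothetical. Following the Methods, a homozygote contributes two alleles and a heterozygote one, so the pair-wise score ranges from 0 to 4 and could then be entered as a covariate in a Cox model.

```python
# Assumption for illustration: C is treated as the risk (variant) allele at both SNPs.
RISK_ALLELES = {"rs3802842": "C", "rs16892766": "C"}


def risk_allele_count(genotype: str, risk_allele: str) -> int:
    """Copies of the risk allele in a two-letter genotype (0, 1 or 2)."""
    if len(genotype) != 2:
        raise ValueError(f"expected a two-letter genotype, got {genotype!r}")
    return genotype.count(risk_allele)


def combined_risk_score(genotypes: dict[str, str], risk_alleles: dict[str, str] = RISK_ALLELES) -> int:
    """Sum risk-allele counts over the SNPs in risk_alleles (0-4 for one SNP pair)."""
    return sum(risk_allele_count(genotypes[snp], allele) for snp, allele in risk_alleles.items())


# Example: CC at rs3802842 plus AC at rs16892766 gives three risk alleles
print(combined_risk_score({"rs3802842": "CC", "rs16892766": "AC"}))  # prints 3
```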
The search for modifier genes/loci that influence disease expression in LS patients is beginning to reveal robust associations. By combining data sets from two large independent LS studies,21,22 the current study provides strong evidence that genetic variation defined by rs3802842 (11q23.1) and rs16892766 (8q23.3) affects the risk of developing CRC. Current CRC screening recommendations for LS patients include colonoscopy every 1–2 years beginning at age 20–25 years.26,27 A recently reported case study suggests that the optimal colonoscopy interval lies closer to 1 year in individuals with germline MSH2 mutations,28 indicating that personalized surveillance programs may decrease disease morbidity. Our results suggest that genotyping MLH1 mutation carriers for SNPs rs3802842 and rs16892766 could identify individuals harboring three risk alleles, in whom annual colonoscopy from the age of 20 (or earlier if a family member was diagnosed with CRC before the age of 25) would be expected to decrease the probability of developing CRC.
SNP rs3802842 is located in a gene-rich region containing four open reading frames (ORFs).19 The risk allele of SNP rs3802842 is not likely to be somatically selected for during the development of CRC,19,29 suggesting that the germline genotype at this locus is what matters for a patient's risk of developing CRC. SNP rs3802842 has also been shown to be associated with early onset of CRC depending on a family history of CRC or of tumors of the LS spectrum.23 The association observed for SNP rs3802842 in MLH1 mutation carriers in the current study suggests that the variant (CC) genotype of this SNP is needed for these patients to be at increased risk of CRC. Pittman et al.25 excluded a coding change in any of the four ORFs in the region as the basis of the association and suggested that the underlying sequence change defined by SNP rs3802842 might exert regulatory effects on genes mapping outside 11q23.1.
Previous studies have shown that individuals carrying the risk allele of SNP rs16892766 present with more advanced tumors at diagnosis, and that the risk allele is associated with CRC in younger patients.20,30 This is consistent with the much lower age at diagnosis of CRC observed in the current study when this SNP is combined pair-wise with SNP rs3802842. SNP rs16892766 maps to the gene EIF3H (MIM 603912), which is involved in the regulation of gene expression; deregulation of EIF3H can lead to altered cell growth and cancer.31 Because no association was found between rs16892766 and mRNA expression in colorectal adenomas or carcinomas, another SNP at 8q23.3, rs16888589, has been identified through in vitro experiments as a genetic variant for CRC.32 More recent fine-mapping of 8q23.3 suggests that UTP23 (Gene ID 84294), rather than EIF3H, is the target of the genetic variation associated with CRC; it is possible that both genes play a role in CRC development, given their related roles in mRNA translation.33
Complex traits are thought to involve multiple genetic effects, and additive components are expected to account for most of the genetic variance observed in complex traits.34 Multilocus interactions are assessed less accurately than main genetic effects,34,35 because they explain a lower proportion of the genetic variance. Previously, a combination of 6–10 CRC loci was shown to be associated with CRC risk,29,36 but this was not confirmed in a CRC study of genotype–phenotype correlations.30 When all seven SNPs were combined in the current study, no additive effect was observed other than for the pair-wise SNP combination described herein. Only SNP rs3802842 is significant as a single SNP. The risk of developing CRC almost doubles with the addition of rs16892766 risk alleles, and the difference in age at diagnosis becomes more significant, suggesting that SNP rs16892766 contributes subtly to increased risk of CRC even though it has no significant effect on its own. The lack of a single-SNP effect is likely due to the low frequency of the rs16892766 minor allele. Single-SNP association testing may have reduced power to identify variants with small effect sizes,35 and combinations of polymorphisms may carry information about phenotype that cannot be discovered from individual SNPs alone.37 In complex traits, identified variants often account for only a minor proportion of the estimated heritability.38 Knowing how individual genetic variants combine to produce phenotypic change could advance applications such as personalized medicine. At present it is not known why 11q23.1 and 8q23.3 in a pair-wise combination increase the risk of CRC in LS patients harboring MLH1 mutations; both loci might exert regulatory effects on genes mapping outside them.
There is evidence that MLH1 and MSH2 have different functional domains, which could result in subtle differences between the two proteins that are influenced by independent genetic factors.39 The data presented in this report confirm that even though MLH1 and MSH2 mutation carriers start off with the same risk of CRC (see Fig. 1), the addition of other genetic factors (rs3802842 and rs16892766) significantly alters the risk of developing CRC for MLH1 mutation carriers but not for MSH2 mutation carriers (see Figs. 2–4). The decreased risk of developing CRC in MSH6 mutation carriers compared with MLH1 and MSH2 mutation carriers observed in the current study has also been reported previously,10,40,41 indicating that the sample cohort is representative of what is expected in LS.
In the current study, similar to the results of the Dutch study, a trend can be observed for SNP rs3802842 in all females irrespective of which MMR gene is mutated. This is most likely due to the excess of female MLH1 mutation carriers (47% of all females, of whom 54% have developed CRC) compared with female MSH2 and MSH6 mutation carriers (35% and 14% of all females, of whom 35% and 11% have developed CRC, respectively). Moreover, a large CRC study found no evidence that the association between rs3802842 genotype and CRC risk is modified by gender.25
It has been speculated that SNP rs3802842 may influence the development of other types of cancer.42 We therefore tested the association of all SNPs (alone or in combination) with endometrial and extracolonic cancer, but no association was observed. This supports a role for rs3802842 and rs16892766 specifically in CRC development, in line with the findings of Niittymäki et al.42
Possible biases in the current study include confounding factors, such as smoking, lifestyle and other environmental factors, that may influence the reported results; however, these should not differ between MLH1 and MSH2 mutation carriers, as the sample size of each group is sufficient to counteract such problems. It could also be argued that only seven individuals carry three risk alleles for the significant SNP combination and that this number is too small to provide reliable results. However, of these seven, five (71%) had developed CRC at very young ages, whereas only 32–35% of individuals with 0–2 risk alleles had developed CRC, and at much older ages.
Our results are not consistent with those of Houlle et al.24 If epistatic effects are real, gene substitution effects may vary widely between populations that differ in allele frequency, so an effect that is significant in one population may not replicate in others.34 However, this does not explain the differences between our study and the French study,24 because the minor allele frequencies in the sample populations presented in the current report, 8.6% for rs16892766 and 25.6% for rs3802842 in the combined cohort (7.5% and 27.8% in the Australian, 8.2% and 24.4% in the Polish and 9.5% and 25% in the Dutch cohort, respectively), do not differ from those in the French study or from previously reported allele frequencies in Europeans.19,20 As pointed out by the authors of the French study, the number of individuals with the rs16892766 CC genotype underlying their observed decrease in CRC risk is very small, and the observation would no longer be significant if correction for multiple testing were applied. Our combined sample cohort showed no significant association for any of the SNPs when analyzed with Cox proportional hazards regression, even when stratified for gene; significant results were observed only when MLH1 mutation carriers were analyzed as a subgroup. It is unclear whether the French study analyzed MLH1 mutation carriers as a separate subgroup, and it is therefore uncertain whether ethnic group-specific effects exist. However, the results presented in this report are not easily transferred to individuals of non-European descent, as the C allele of SNP rs16892766 is very rare in Asian individuals.43 A study of the two CRC susceptibility loci in an Asian LS population would be very beneficial.
The importance of the results identified in this report lies in the use of this information for the management of LS patients diagnosed with a germline mutation in MLH1. Tailoring surveillance options to the genetic background of the patient should allow for better outcomes in terms of patient uptake and reduce some of the pressure on stretched health care services. Future directions include analyzing a much larger sample cohort with more markers in the respective chromosomal regions and/or exome sequencing of the 11q23.1 region in a larger number of patients, so that the observed association can be characterized in more detail.
In conclusion, the association between the two low-penetrance CRC susceptibility loci (8q23.3 and 11q23.1) and increased risk of developing CRC in individuals diagnosed with LS confirms the role of modifier genes in LS. We consider that LS patients with MLH1 mutations would greatly benefit from additional genotyping of SNPs rs3802842 and rs16892766. On the basis of these genotyping results, a personalized risk assessment and surveillance program could be offered to patients harboring three risk alleles for these two SNPs, as they are at increased risk of CRC and therefore likely to develop CRC at much younger ages than the average age of disease onset.
The authors thank the participants for contributing to this study. The Dutch Cancer Genetics Group is represented by: Dr Anja Wagner, Dr Rolf H. Sijmons, Dr Cora M. Aalfs, Dr Irma Kluijt, Dr Nicoline Hoogerbrugge, Dr Encarna Gomez Garcia, Dr Fred H. Menko, Dr Tom G. W. Letteboer and Dr Fredrik J. Hes.