Targeted next‐generation sequencing of 22 mismatch repair genes identifies Lynch syndrome families

Abstract Causative germline mutations in mismatch repair (MMR) genes can only be identified in ~50% of families with a clinical diagnosis of the inherited colorectal cancer (CRC) syndrome hereditary nonpolyposis colorectal cancer (HNPCC)/Lynch syndrome (LS). Identification of these patients is critical as they are at substantially increased risk of developing multiple primary tumors, mainly colorectal and endometrial cancer (EC), occurring at a young age. This demonstrates the need to develop new and/or more thorough mutation detection approaches. Next‐generation sequencing (NGS) was used to screen 22 genes involved in the DNA MMR pathway in constitutional DNA from 14 HNPCC and 12 sporadic EC patients, plus 2 positive controls. Several software tools were used for analysis and functional annotation. We identified 5 exonic indel variants, 42 exonic nonsynonymous single‐nucleotide variants (SNVs) and 1 intronic variant of significance. Three of these variants were class 5 (pathogenic) or class 4 (likely pathogenic), 5 were class 3 (uncertain clinical relevance) and 40 were classified as variants of unknown clinical significance. In conclusion, we have identified two LS families from the sporadic EC patients, one without a family history of cancer, supporting the notion of universal MMR screening of EC patients. In addition, we have detected three novel class 3 variants in EC cases. We have also discovered a polygenic interaction which is the most likely cause of cancer development in an HNPCC patient and which could explain previous inconsistent results reported on an intronic EXO1 variant.


Introduction
Surveillance programs for patients with an inherited predisposition to colorectal cancer have proven efficacy in the reduction of morbidity and mortality by up to 65% [1]. Colorectal cancer (CRC) is a heterogeneous disease and one of the most common cancers worldwide [2]. CRC can be categorized into two groups; one associated with chromosomal instability and the other with microsatellite instability (MSI) [3]. An inherited form of the latter, called hereditary nonpolyposis colorectal cancer (HNPCC)/Lynch syndrome (LS), is associated with the inactivation of genes involved in DNA mismatch repair (MMR). MMR deficiency has been observed in 15-17% of all primary CRC [4,5], 30% of endometrial cancer (EC) [6], and approximately 10% of ovarian tumors [7].
The identification of germline mutations in families with LS accounts for only ~50% of all families that fulfil the Amsterdam criteria [8]. Patients with germline DNA MMR mutations in MLH1, MSH2, MSH6 and PMS2 or mutations in EPCAM (leading to impaired DNA repair through epigenetic silencing of MSH2) are defined as having LS [9][10][11], whereas the mutation negative patients are referred to as belonging to the entity known as HNPCC and only have a clinical diagnosis of the disease according to the Amsterdam criteria. On top of the high risk of CRC and EC, patients are also at greater risk of developing other epithelial malignancies [12][13][14]. The primary function of MMR genes is to eliminate base-base mismatches and insertion-deletion loops which arise as a consequence of DNA polymerase slippage during DNA replication [15]. MMR confers several genetic stabilization functions: It corrects DNA biosynthesis errors, ensures the fidelity of genetic recombination and participates in the earliest steps of checkpoint and apoptotic responses [16,17]. The absence of an effective MMR pathway is presumed to lead to the accumulation of mutations and an increased risk of disease.
EC is the most common gynecological malignancy in Western countries [18], yet the genetic basis of the disease is poorly understood. The disease has a strong association with obesity: the higher the BMI, the higher the risk of EC [19]. MSI has been reported in 22-45% of sporadic EC [20][21][22]. The rate of MSI tumors reported in EC is much higher than in other cancers, illustrating that abnormalities in the DNA MMR pathway appear to play a central role in EC development [23]. However, MSI analysis, together with the clinical diagnostic criteria, is thought to have limited value in screening for LS-associated EC, as only a small portion of patients with MSI tumors have mutations in MMR genes and about 15% of EC cases that did not fulfil the clinical diagnostic criteria of the disease had MMR gene mutations [24]. Two of the largest studies to date have found that 1.8-2.1% of EC cases have LS [25,26]. Taken together with the fact that only half of patients with a clinical diagnosis of HNPCC have mutations in MMR genes, this shows that our efforts should be focused on the development of new and/or more thorough mutation detection approaches.
In this study we used targeted next-generation sequencing (NGS) to examine 22 genes involved in the DNA MMR pathway in constitutional DNA from 14 HNPCC patients and 12 EC patients, plus 2 positive controls and a replicate.

Materials and Methods
The study complies with the requirements of the Hunter New England Human Research Ethics Committee and the University of Newcastle Human Research Ethics Committee, Newcastle, NSW, Australia. Written informed consent was obtained from all participants.

Participants
HNPCC probands referred to Hunter Area Pathology Service (HAPS, Pathology North) for genetic testing between the years 1997 and 2010 were used in this study. Fourteen unrelated HNPCC probands who had been screened for mutations in MLH1, MSH2, MSH6 and/or PMS2 using a combination of DNA sequencing and multiplex ligation-dependent probe amplification (MLPA) assays, and who were found to be mutation negative, were included in the study. All patients had a diagnosis of CRC and conformed to the Amsterdam II criteria. Immunohistochemistry (IHC) and/or MSI results indicating a loss of expression of one or more of the four MMR genes (MLH1, MSH2, MSH6 and PMS2) were available for seven of these patients. Two mutation-positive samples and a replicate were included as internal controls.
DNA was also included from patients who were confirmed histologically as EC patients derived from the Hunter Centre for Gynaecological Cancer, John Hunter Hospital between the years of 2005 and 2006. A total of 12 EC patients were included in the study, comprising six patients with an additional diagnosis of CRC, three patients with colorectal adenomas, one patient with a renal cancer, one with breast cancer and one with a family history of CRC.

Bioinformatics
NGSANE [27] v0.4.0.2 was used to process the raw sequence files. Briefly, the pipeline maps the fastq files using the Burrows-Wheeler Aligner (BWA) [28] to the human reference genome (b37, as obtained from the GATK reference bundle), with subsequent score recalibration and realignment using GATK [29] v2.8-1 with dbSNP v135 as guide. Variants were called over all alignment files simultaneously using GATK, and dbSNP, HapMap v3.3 and the 1000 Genomes Project variant information were used as resources for the variant recalibration. Larger structural variants were called with Pindel v0.2.5 [30]. Quality control and reporting were done using NGSANE.

Results
The demographic characteristics of HNPCC and EC patients in the current study can be seen in Tables 1 and 2. Six of the EC cases were obese, three were overweight, one was of normal weight and two were of unknown BMI status (see Table 2). All the HNPCC cases fulfilled the Amsterdam II criteria, while a family history of cancer was observed in 7/12 EC patients.
For all samples, at least 97.2% of reads mapped to the reference genome. All samples generated a mean read depth (coverage) of at least 110×. In the 29 samples we detected 897 indels (insertions/deletions) and 3900 SNVs using GATK; in addition, we detected 3830 structural variants using Pindel. From the annotation we identified 5 exonic indel variants, 42 nonsynonymous SNVs and one intronic variant of significance. The variants of significance are listed in three tables according to variant classification: deleterious or probably deleterious variants, class 5/4 (Table 3); variants of uncertain clinical relevance, class 3 (Table 4); and probably benign variants or polymorphisms, variants of unknown clinical significance (Table 5).
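The two breakdowns of the variants of significance (by variant type and by classification) can be cross-checked with simple arithmetic. The following is a minimal sketch that uses only the counts reported in this paper; the dictionary names are illustrative and are not part of the study's pipeline.

```python
# Counts of variants of significance by variant type, as reported in the text.
by_type = {"exonic_indel": 5, "nonsynonymous_snv": 42, "intronic": 1}

# Counts by classification, as reported in the abstract:
# class 5/4 (Table 3), class 3 (Table 4), probably benign/polymorphism (Table 5).
by_class = {"class_5_or_4": 3, "class_3": 5, "benign_or_polymorphism": 40}

total_by_type = sum(by_type.values())
total_by_class = sum(by_class.values())

# Both breakdowns describe the same set of variants, so the totals must agree.
assert total_by_type == total_by_class
print(total_by_type)  # 48
```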

Exonic variants and clinical interpretation
We identified five exonic insertion/deletion (indel) variants, which included the pathogenic class 5 MSH2 c.187_188insGG (LS #17) variant previously identified with Sanger sequencing in the LS sample used as a positive control (see Table 3 for details). In addition, a class 5 MSH6 exon 3 deletion (c.458_627del, EC #20), identified by Pindel, was detected in one EC case and validated using MLPA (see Table 3 for details). This frameshift deletion was identified in a patient who was diagnosed with EC at the age of 60 years and who also had a diagnosis of CRC but no family history of disease. Both variants, MSH2 c.187_188insGG and MSH6 c.458_627del, are frameshift mutations that disrupt the normal reading frame and are the cause of cancer development in these patients.
Three MSH3 nonframeshift deletions were also detected in exon 1 using GATK in up to 11 cases (both HNPCC and EC patients, see Table 5 for details). One of the deletions had been reported previously and is listed in the SNP database used by ANNOVAR. All three variants are classified as polymorphisms or probably benign variants as they do not disrupt the reading frame. Ten patients had all three variants, while one patient had two of the three variants.
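The distinction between frameshift and nonframeshift indels drawn above follows from whether the number of inserted or deleted coding bases is a multiple of three. The following is a minimal sketch of that check for the two class 5 variants; the helper names `indel_length` and `is_frameshift` are hypothetical, and the parser deliberately handles only the two simple coding-DNA forms that appear in this section (`c.START_ENDdel` and `c.START_ENDinsSEQ`). It is not the annotation method used in the study.

```python
import re


def indel_length(hgvs_c: str) -> int:
    """Return the number of bases deleted or inserted for a simple
    coding-DNA description of the form c.START_ENDdel or c.START_ENDinsSEQ."""
    deletion = re.fullmatch(r"c\.(\d+)_(\d+)del", hgvs_c)
    if deletion:
        start, end = int(deletion.group(1)), int(deletion.group(2))
        return end - start + 1  # both positions are included in the deletion
    insertion = re.fullmatch(r"c\.\d+_\d+ins([ACGT]+)", hgvs_c)
    if insertion:
        return len(insertion.group(1))
    raise ValueError(f"unsupported variant description: {hgvs_c}")


def is_frameshift(hgvs_c: str) -> bool:
    """An indel shifts the reading frame when its length is not a multiple of 3."""
    return indel_length(hgvs_c) % 3 != 0


for variant in ("c.187_188insGG", "c.458_627del"):
    print(variant, indel_length(variant), is_frameshift(variant))
# c.187_188insGG 2 True
# c.458_627del 170 True
```

An indel whose length is divisible by three, such as the MSH3 exon 1 deletions described here as nonframeshift, would return `False` from the same check.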

Nonsynonymous single-nucleotide variants (SNVs) and clinical interpretation
We detected 42 nonsynonymous SNVs in thirteen different genes (listed in Tables 3, 4, and 5 according to variant classification). One class 4 SNV, MLH1 c.116G>A (EC #21), was predicted to affect splicing and was found to be likely pathogenic (Table 3); it was validated using Sanger sequencing. This variant was identified in one EC patient who was diagnosed at the age of 57 years and who also had a diagnosis of bowel polyps. The patient has two sisters with diagnoses of both EC and CRC.
Four novel SNVs, LIG1 c.980C>T (EC #24), POLD2 c.203G>T (HNPCC #18), RPA1 c.1160G>A (EC #28) and RPA2 c.731A>G (EC #25), were classified as variants of uncertain significance (class 3) and are listed in Table 4. Each of the four SNVs was found in only one sample and had at least one annotation score indicating that the variant is deleterious or disease causing.
In Table 5 we have listed the variants classified as polymorphisms or probably benign variants; 37 of these are SNVs. Twelve of these were detected in a number of samples, including the LS case used as a positive control. Nine of the SNVs were seen in only one sample. The minor allele frequency (MAF, from 1000 genomes project) in these nine SNVs is very low or no frequency data is reported.

Intronic variants and clinical interpretation
The average number of intronic variants identified was ~2000-2500 per sample across the 22 genes. One of these, an intronic variant in EXO1, was identified by ANNOVAR as affecting splicing; the consequence of this change is not predictable, but a skip of exon 15 is very likely (see Table 4). The EXO1 c.2212-1G>C variant (HNPCC #18) was identified in one HNPCC patient diagnosed with CRC at the age of 68 years. IHC of the patient's tumor showed a lack of MLH1 expression, and a common polymorphism in MLH1 was identified in this patient with Sanger sequencing during mutation screening (MLH1 c.655A>G, p.Ile219Val, classified as class 1 in LOVD). The patient has a family history of CRC, uterine cancer, and melanoma.

Internal control sample (replicate sample)
We chose one sample to act as an internal control sample (replicated as samples #11 and #31). As seen in Table 1, the patient carries two common polymorphisms in MSH2 and MSH6 in addition to an exon deletion in PMS2, all previously detected with Sanger sequencing. The two common polymorphisms were detected in both samples, and 95.99% of the genotypes (variants and reference-conform calls) are identical between the two samples. The PMS2 exon deletion was not detected with the computational tools used in this study. However, further investigation of the data generated with the technology (NimbleGen human custom array) shows a dip in coverage in this genomic region for samples #11 and #31, which may be indicative of a heterozygous deletion (see Fig. 1).
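Replicate concordance of the kind reported above can be expressed as the fraction of sites with identical genotype calls in both samples. The following is a minimal sketch under stated assumptions: the paper does not specify how sites with a missing call in one replicate were handled, so this version considers only sites called in both; the function name and the toy genotypes are hypothetical and are not study data.

```python
def genotype_concordance(calls_a: dict, calls_b: dict) -> float:
    """Fraction of sites genotyped in both samples whose genotypes are identical.

    Reference-conform calls (e.g. "0/0") count as genotypes, so agreement on
    the reference allele contributes to concordance as well.
    """
    shared_sites = calls_a.keys() & calls_b.keys()
    if not shared_sites:
        raise ValueError("the two samples share no genotyped sites")
    matches = sum(1 for site in shared_sites if calls_a[site] == calls_b[site])
    return matches / len(shared_sites)


# Hypothetical toy genotypes keyed by (chromosome, position); not study data.
replicate_11 = {("2", 100): "0/1", ("2", 200): "0/0", ("7", 300): "1/1", ("7", 400): "0/0"}
replicate_31 = {("2", 100): "0/1", ("2", 200): "0/0", ("7", 300): "0/1", ("7", 400): "0/0"}

print(genotype_concordance(replicate_11, replicate_31))  # 0.75
```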

Discussion
Among the 26 HNPCC and EC patients screened with targeted NGS, we have identified two exonic variants that are consistent with a diagnosis of LS in two EC patients (one class 5 variant, MSH6 c.458_627del, and one class 4 variant, MLH1 c.116G>A). The MSH6 exon 3 deletion has previously been reported and is predicted to change the function of the gene [33]. The SNV in MLH1 is considered to affect splicing, has also previously been reported [34], and is listed multiple times in the Leiden Open Variation Database (LOVD). The patients carrying the MSH6 and MLH1 variants had EC diagnosed years later than the average age of cancer development in LS patients (60 and 57 years of age, respectively), and neither of the two was a carrier of any additional variants that were considered to be of significance. There was no family history of cancer reported in the patient harboring the MSH6 deletion, while the patient carrying the MLH1 variant has a sister diagnosed with EC and CRC. Both patients had a high BMI (see Table 2), placing them in the obese category, which is atypical in LS.
In one HNPCC patient sample (#18, CRC at 68 years of age) that showed an absence of MLH1 expression by IHC, an intronic variant in EXO1 (intron 14) was detected which was predicted to affect splicing. The same variant was first reported by Wu et al. [35]. It was dismissed as a mutation as it was later detected in three Dutch controls (3/704) [36], and it therefore seems unlikely to represent a disease-causing mutation. In this context, it should be noted that there are two enzymatically active alternate splice forms of EXO1, one of which contains exon 14 while the other is truncated after exon 13 [37]. The two forms result from alternative RNA splicing, and the predicted proteins differ at a small region of the COOH terminus of each protein [37]. Interestingly, the region of EXO1 that interacts with MSH2 is located in the COOH terminus [38]. Human EXO1 is a 5′ to 3′ exonuclease that directly interacts with MSH2, MSH3 and MLH1, and is thought to stabilize higher order complexes of MMR proteins [39][40][41]. The gene participates in DNA MMR and possibly the DNA recombination function of MLH1 [39]. This could explain the observed absence of MLH1 expression assessed by IHC in this patient. However, methylation of the promoter region of MLH1 cannot be ruled out as an explanation for the absence of MLH1 expression [42], even though this is most likely not the case for this family, which has reported a strong history of CRC (methylation of MLH1 is usually seen in sporadic CRC cases). Mutations in EXO1 have also been associated with late onset CRC or atypical HNPCC and a weak mutator phenotype but, when combined with additional weak mutator alleles, can increase genetic instability [43], as observed in the patient in the current study.
EXO1 appears to act as a modifier gene rather than a highly penetrant germline mutation [35]. The results from this study support this hypothesis, as the patient harboring the EXO1 intronic variant also harbored additional variants, not observed in any of our other patients, that included one class 3 variant (POLD2, listed in Table 4) and two class 1 or 2 variants (MSH3 and POLD1, listed in Table 5). The patient developed CRC at the age of 68 and had a strong family history of CRC with later age of onset, supporting the notion that changes in EXO1 are not associated with younger ages of disease onset. These findings suggest that the patient may be harboring multiple low penetrance variants that potentially increase cancer risk at later ages of onset. Only one other patient (HNPCC #6) harbored multiple variants: two class 1/2 variants (MLH1 c.2101C>A and POLD1 c.56G>A). Interestingly, this patient also shows a lack of MLH1 expression as judged by IHC.
Three novel class 3 variants, LIG1 c.980C>T, RPA1 c.1160G>A and RPA2 c.731A>G, were identified in individual EC patients and were not observed in any other subjects. These genes play multiple roles in human MMR, and the variants will require further investigation to determine if they play a role in disease development. Four other patients each carried an SNV classified as class 1 or 2, two in MLH3 and two in PMS1. Both MLH3 variants were deemed to be tolerated and were classified as common polymorphisms with reported minor allele frequencies (MAF) of 0.006 and 0.007. Similarly, one of the PMS1 variants was tolerated and considered benign with a MAF = 0.009, whereas the second PMS1 variant (c.605G>A) had previously been reported to have a functional significance via the alteration of exonic splicing enhancers as judged by in silico analysis [44].
The present study has some potential limitations. We acknowledge that NGS technology development is so rapid that library preparation and number of total effective reads in this study do not conform to current standards. However, as we detected the previously identified missense variants in our replicates and genotype accuracy is almost 96% (which is over what is expected from this technology [45]), we argue that our approach is appropriate for the presented results. Pseudogenes of PMS2 present challenges but can be overcome with proper primer design [46], or elimination of exons 12 and 15 during analysis [47]. We did not specifically optimise the primer design for this study and the coverage of PMS2 was the lowest of all genes. Even though the NimbleGen tiled region has probes covering exon 10, the PMS2 exon 10 deletion (in sample #11/31) was not detected by the computational tools used in this study. While GATK does not call large deletions, we speculate that Pindel did not call this deletion due to the uneven read coverage in this region. The coverage fluctuation is due to PMS2 having many pseudogenes, one of which, PMS2CL, lacks exon 10. This inflates the coverage across the whole gene except for exon 10, where no additional reads are contributed from the pseudogene. Therefore, the additional coverage drop in sample #11/31 is not detected as a deletion, as there is already a general coverage drop across all samples at this location. However, visually inspecting the coverage across all samples at this location indicates the expected deletion to be present (see Figure 1). The sample size is small and the samples have previously been screened for mutations in MLH1, MSH2, MSH6 and/or PMS2, which can explain why we have not detected any class 4/5 variants in any of the HNPCC cases included in the study. Another possible limitation is the fact that tumor tissue was not available for IHC/MSI for all the samples included in the study.
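The reasoning about the PMS2 exon 10 coverage drop can be illustrated numerically: because every sample shows a dip at exon 10 relative to the rest of the gene, a sample-specific deletion only becomes apparent when each sample's exon-to-gene depth ratio is compared with the cohort. The following is a minimal sketch of that idea and is not the analysis performed in this study (the deletion was assessed by visual inspection); the function name, the depth values and the 0.7 threshold are all hypothetical.

```python
from statistics import median


def relative_exon_depth(exon_depth: dict, gene_depth: dict) -> dict:
    """Per-sample exon/gene depth ratio, divided by the cohort median ratio.

    A dip shared by all samples (e.g. caused by a pseudogene that lacks the
    exon) cancels out, so values well below 1 point to a sample-specific loss.
    """
    ratios = {sample: exon_depth[sample] / gene_depth[sample] for sample in exon_depth}
    cohort_median = median(ratios.values())
    return {sample: ratio / cohort_median for sample, ratio in ratios.items()}


# Hypothetical mean read depths; not study data.
gene_depth = {"s1": 200.0, "s2": 180.0, "s3": 220.0, "s11": 200.0}
exon10_depth = {"s1": 100.0, "s2": 90.0, "s3": 110.0, "s11": 50.0}

relative = relative_exon_depth(exon10_depth, gene_depth)

# Illustrative threshold (an assumption): flag samples with a relative depth
# clearly below 1, as would be expected for a heterozygous deletion.
flagged = [sample for sample, value in relative.items() if value < 0.7]
print(flagged)  # ['s11']
```

In this toy example all samples show exon 10 at half the gene-wide depth, yet only the sample with a further halving is flagged, which mirrors the additional drop described for sample #11/31.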
In the analysis presented here we have focused on exonic variants and nonsynonymous SNVs. We have not addressed the presence or absence of all intronic variants, even though there is an enormous amount of data to be investigated. SNVs that do not change the encoded amino acid (synonymous SNVs) were not the focus of this study, but may still affect messenger RNA splicing, stability and structure, transcription factor binding as well as protein folding, which can have a significant effect on the function of the protein [48].
In conclusion, by utilising new technology we have identified two LS families from the EC patients, one without a family history of cancer but with a high BMI (obese), supporting the notion of universal screening of all EC patients. In addition, we have detected three novel class 3 variants to be followed up in EC cases. This study shows that MMR screening provides new information about genetic risk for patients diagnosed with HNPCC and that genes not routinely tested for can play a role in cancer development in HNPCC patients through polygenic interactions that may indeed be causative. More patients need to be tested, but there is the potential for rapid uptake of new testing strategies to improve risk assessment and prophylactic measures to reduce the burden of disease in this susceptible group of patients. Finally, the discovery of new genetic loci affecting the risk of developing cancer in this population will also have implications for cancer patients in the general population, as the polygenic interaction reported herein may confer novel insight into new pathways for cancer development.