Lynch syndrome mutation spectrum in New South Wales, Australia, including 55 novel mutations

Abstract Background Lynch syndrome, the most frequent hereditary colorectal cancer syndrome, is caused by defects in mismatch repair genes. Genetic testing is important for identifying mutation carriers who can benefit from intensive surveillance programs. One of the challenges with genetic testing is interpreting the pathogenicity of detected DNA variants. The aim of this study was to investigate all putative pathogenic variants tested for at the Division of Molecular Medicine, Pathology North, in Newcastle, Australia, to establish whether previous variant classification is in accordance with that recently performed in the InSiGHT collaboration. Methods Prediction programs and available literature were used to classify new variants or variants without classification. Results We identified 333 mutation positive families, in which 211 different putative pathogenic mismatch repair mutations were found. Most variants with an InSiGHT classification (141 out of 146) were in accordance with our classification. Five variants were discordant, of which one can definitively be reclassified according to the InSiGHT scheme as class 5. Sixty‐four variants had not been classified by InSiGHT, of which 55 have not been previously reported. Conclusion Our classifications were mostly in accordance with the InSiGHT scheme. In addition to already known MMR mutations, we have also presented 55 novel pathogenic or putative pathogenic mutations.


Introduction
Lynch syndrome (OMIM #120435), formerly called hereditary nonpolyposis colorectal cancer (HNPCC), is the most frequent hereditary predisposition to colorectal cancer (CRC). Lynch syndrome patients also have an increased risk of developing other epithelial malignancies, including endometrial, ovarian, stomach, hepatobiliary, urinary, small bowel, brain, and sebaceous tumors (recently reviewed by Cohen and Leininger 2014).
Genetic testing is currently offered to families that fulfill the clinical diagnostic criteria defined by the Amsterdam I and II criteria or the Bethesda guidelines. The four DNA mismatch repair (MMR) genes associated with Lynch syndrome, MLH1 (OMIM *120436), MSH2 (OMIM *609309), MSH6 (OMIM *600678), and PMS2 (OMIM *600259), are the only genes routinely screened in predisposition testing. Recently, germline deletions in EPCAM (OMIM *185535) have been associated with MSH2 epimutation, which silences MSH2 expression. Microsatellite instability (MSI) is the hallmark of Lynch syndrome (Umar et al. 2004) and is detected by size fractionation of a series of mono- or dinucleotide repeat sequences. Immunohistochemical (IHC) testing of tumor tissue for loss of MMR protein expression is used to guide the selection of genes to test.
Lynch syndrome families have an earlier onset of cancer than the general population, and it is therefore imperative to identify high-risk individuals within these families in order to detect tumors at curable stages. The clinical criteria are useful for identifying HNPCC, but only approximately half of the identified HNPCC families harbor a pathogenic MMR mutation and can thus be reclassified as having Lynch syndrome (Sjursen et al. 2010; Steinke et al. 2014). Genetic testing is therefore important for identifying high-risk mutation carriers.
One of the challenges with genetic testing is the interpretation of detected DNA variants. Interpretation is often straightforward when the variant is a stop codon, a frameshift deletion or insertion, a splice mutation in the highly conserved splice donor or acceptor sites (exon-intron boundaries), or a gross deletion. All of these alterations are expected to disrupt protein function, leading to defective MMR. However, when the alteration is a missense mutation (amino acid substitution), an in-frame deletion or insertion, or an intronic mutation outside the splice donor or acceptor sites, the interpretation can be very complex, requiring extended analyses that even then may not reveal the true nature of the change. Those missense changes that cannot be readily classified are termed variants of uncertain clinical significance (VUS). A five-tiered scheme for the classification of variants has been proposed (Plon et al. 2008; Thompson et al. 2013), in which class 1 and class 2 variants are not pathogenic and likely not pathogenic, respectively, class 3 variants are VUS, and class 4 and class 5 variants are likely pathogenic and pathogenic, respectively. The classification of variants is important because class 4 and 5 variants are used to confirm a Lynch syndrome diagnosis, after which predictive testing can be offered to family members. In 2014, the International Society for Gastrointestinal Hereditary Tumours (InSiGHT) developed, tested and applied the five-tiered scheme for the classification of constitutional variants in MLH1, MSH2, MSH6, and PMS2 (Thompson et al. 2014).
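The five-tiered scheme and its clinical consequence, as summarized above, can be expressed as a small lookup. This is a minimal sketch: the names `VariantClass`, `confirms_diagnosis` and `offer_predictive_testing` are hypothetical, and the actual InSiGHT criteria for assigning a class (Thompson et al. 2014) are not modeled here, only the mapping from an already assigned class to its clinical use.

```python
from enum import IntEnum


class VariantClass(IntEnum):
    """Five-tiered variant classes (hypothetical names, for illustration)."""
    NOT_PATHOGENIC = 1
    LIKELY_NOT_PATHOGENIC = 2
    UNCERTAIN = 3            # variant of uncertain significance (VUS)
    LIKELY_PATHOGENIC = 4
    PATHOGENIC = 5


def confirms_diagnosis(cls: VariantClass) -> bool:
    """Only class 4 and class 5 variants confirm a Lynch syndrome diagnosis."""
    return cls >= VariantClass.LIKELY_PATHOGENIC


def offer_predictive_testing(cls: VariantClass) -> bool:
    """Predictive testing of relatives follows from a confirmed diagnosis."""
    return confirms_diagnosis(cls)


if __name__ == "__main__":
    for c in VariantClass:
        print(c.value, c.name, "predictive testing:", offer_predictive_testing(c))
```

Using an ordered `IntEnum` keeps the "class 4 or higher" rule a single comparison, which mirrors how the text treats classes 4 and 5 together.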
The aim of this study was to investigate all putative pathogenic variants tested for at the Division of Genetics, Hunter Area Pathology Service (HAPS), Pathology North, in Newcastle, New South Wales, Australia, to establish whether previous variant classification is in accordance with that performed in the InSiGHT collaboration.
This report is a follow-up to the studies by Scott et al. (2001) and Talseth-Palmer et al. (2010), which presented data from 32 (MLH1 or MSH2) and 35 (MSH6 or PMS2) mutation positive families, respectively. Here, we present data from 333 mutation positive families.

Material and Methods
The study complies with the requirements of the Hunter New England Human Research Ethics Committee and the University of Newcastle Human Research Ethics Committee, Newcastle, NSW, Australia. Written informed consent was obtained from all participants.
HNPCC probands referred to HAPS (Pathology North) for genetic testing between 1997 and 2010 were included in this study. Over 2000 patients diagnosed with HNPCC were included, of whom 834, belonging to 333 families, have a molecular diagnosis of Lynch syndrome (germline changes in DNA MMR genes).
Mutation analyses for MLH1, MSH2, and MSH6 were performed at the Division of Genetics, HAPS, Pathology North in Newcastle, New South Wales (NSW), Australia, as described by Scott et al. (2001) and Talseth-Palmer et al. (2010). PMS2 gene analyses were performed at IMVS Pathology, Adelaide, Australia. Suspected splice variants were further tested by RNA analysis: transformed lymphocytes were grown in culture, RNA was isolated and converted to cDNA, the transcripts were size fractionated, and any transcripts that were not of the expected size were subjected to Sanger sequencing. RNA analyses were performed for the following variants: MLH1 c.588+1G>T and c.1731G>A; MSH2 c.1759G>T; and MSH6 c.3173-22_3173-11del, c.3646+2dupT, and c.3556+3_3556+13del.
As all 333 probands harbored a germline MMR gene defect, they were defined as having Lynch syndrome. Members of these families were offered predictive genetic testing.
The Leiden Open Variation Database (LOVD; http://chromium.liacs.nl/LOVD2/colon_cancer/variants) was used to find the variants' database ID numbers and, where reported, their InSiGHT classifications. This classification system is described by Thompson et al. (2013). If a variant was not reported in the LOVD database, we used the same system as InSiGHT for classification. Available prediction programs and research literature, including Alamut software (Interactive Biosoftware, Rouen, France), were also used to annotate pathogenicity. The following tools and measures were used to assess the functional impact of observed variants at the protein level: Grantham's distance (Grantham 1974), PhyloP (Pollard et al. 2010), SIFT (Kumar et al. 2009), MutationTaster (Schwarz et al. 2014), the NCBI Conserved Domain Database (Marchler-Bauer et al. 2015), and UniProt (http://www.uniprot.org/uniprot/). The splice prediction tools used were SpliceSiteFinder-like (Zhang 1998), MaxEntScan (Yeo and Burge 2004), NNSPLICE (Reese et al. 1997), GeneSplicer (Pertea et al. 2001), and Human Splicing Finder (Desmet et al. 2009). Literature searches in PubMed, ClinVar (Landrum et al. 2014), and Google Scholar were performed to check whether the variants had been reported previously.
All new MMR variants identified in our cohort and included in this paper have been submitted to the LOVD/InSiGHT database.
Two hundred and eleven different MMR variants (205 class 4/5 variants and 6 class 3 variants), considered to be the cause of Lynch syndrome, were identified in these families (Table 1). Of these, 141 variants had a LOVD DB-ID number and an InSiGHT classification (class 4 or 5) in accordance with our own interpretation (Table S1). Eighty-four of these variants were found in single families, 29 in two families, 13 in three families, eight in four families, two in five families, and three in six families, whereas one variant was found in eight families and another in eleven families. Of these 141 variants, 63 were in MSH2, 53 in MLH1, 23 in MSH6, and 2 in PMS2.
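The per-family counts above can be cross-checked with a short tally. This sketch only re-adds the numbers stated in the text; the total number of families it derives (260) is not stated in the text and assumes that each of these families carries exactly one of the 141 variants. The helper name `tally` is hypothetical.

```python
# Families per variant -> number of distinct variants found in that many families,
# for the 141 variants with a concordant InSiGHT classification.
families_per_variant = {1: 84, 2: 29, 3: 13, 4: 8, 5: 2, 6: 3, 8: 1, 11: 1}

# Distribution of the same 141 variants across the four MMR genes.
variants_per_gene = {"MSH2": 63, "MLH1": 53, "MSH6": 23, "PMS2": 2}


def tally(distribution: dict[int, int]) -> tuple[int, int]:
    """Return (number of distinct variants, number of families carrying them)."""
    n_variants = sum(distribution.values())
    n_families = sum(k * n for k, n in distribution.items())
    return n_variants, n_families


n_variants, n_families = tally(families_per_variant)
print(n_variants, n_families)  # 141 260
# Both breakdowns should describe the same set of variants.
print(sum(variants_per_gene.values()) == n_variants)  # True
```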
Five of the nine splice mutations are located in the highly conserved splice donor or acceptor sites (positions ±1 and ±2). Three mutations considered to affect splicing (Table 2) have been confirmed by RNA analyses in our laboratory to cause aberrant splicing (pathogenic): two MSH6 variants (c.3173-22_3173-11del and c.3646+2dup) and one MSH2 variant (c.1759G>T). One predicted splice mutation is located in the last nucleotide of exon 14 (MLH1 c.1667G>A) and is predicted by three in silico tools to alter splicing.
Sixty of the variants without an InSiGHT classification were found in single families, whereas four variants were found in two families. Only ten of these variants have been published previously; thus, fifty-four of the variants are novel.
Families with only an identified class 1, 2, or 3 MMR variant were not included in this study. However, according to the LOVD database, five of the variants that we have interpreted as class 4 or 5 are class 3 based on "insufficient evidence" (Table 3). These are two variants in MLH1 (c.1A>G and c.988_990del), one in MSH2 (c.2635-3C>G), one in MSH6 (c.3556+3_+13del), and one in PMS2 (c.1A>G). The PMS2 start codon variant was found in the same patient together with a novel (class 3) PMS2 variant, an in-frame deletion (c.834_842del).
MLH1 c.1A>G (MLH1_0001457) has been reported to the LOVD database once, by us, and PMS2 c.1A>G has been reported to the LOVD database six times. Both are classified as VUS by InSiGHT due to insufficient evidence. The in-frame deletion in MLH1, c.988_990del, p.(Ile330del) (MLH1_01631), has been reported in four patients in the LOVD database. We argue for the pathogenicity of these three variants in the Discussion. The predicted splice mutation MSH6 c.3556+3_+13del (MSH6_00661) has been reported twice to the LOVD database and is classified as a VUS due to the absence of RNA analyses. RNA analysis performed in our laboratory showed that the mutation results in skipping of exon 6. Another predicted splice mutation, MSH2 c.2635-3C>G (MSH2_01371), has been reported twice to the LOVD database and was classified as a VUS due to the lack of RNA analysis. RNA analysis was not available for this study either. Three in silico splice prediction tools predicted that this variant alters the acceptor site 3 bp downstream, with a change of −70.8% compared with the wild type (WT). Our family harboring this mutation has a strong family history of cancer: five family members were found to be mutation carriers, three with CRC (diagnosed at 26, 35, and 49 years of age), one with adenomas at 26 years of age, and one unaffected in his twenties. Thus, MSH2 c.2635-3C>G may be a high-risk mutation in this family, although cDNA analysis needs to be performed before a conclusion can be made.
The class 3 PMS2 variant (c.834_842del) found together with PMS2 c.1A>G in one of our patients has not been reported previously. This deletion causes the loss of four residues in exon 8 (His278, Gly279, Val280, and Gly281) and the insertion of Gln. His278, Gly279, and Gly281 are highly conserved residues in the MutL_Trans_hPMS_2_like protein domain and are therefore likely important for correct function (this transducer domain is important in the transduction of structural signals from the ATP-binding site to the DNA breakage/reunion regions of the enzymes). Structurally, residues 278-280 form a turn between two beta-strands (http://www.uniprot.org/uniprot/P54278). It is therefore possible that the deletion alters the overall protein structure, leading to altered protein function.

Discussion
In this study, we report the MMR variants identified in 333 consecutively collected families from NSW, Australia, over a period of almost 20 years; these variants have since been used for predictive testing. The MMR mutation spectrum in the present cohort is similar to that previously reported (Cohen and Leininger 2014). Mutations in MLH1 and MSH2 account for 80% of the variants identified, whereas mutations in MSH6 and PMS2 account for almost 20%. The Amsterdam II criteria were fulfilled by 58% of the mutation positive families, which is in accordance with other studies (Syngal et al. 2000; Sjursen et al. 2010; Steinke et al. 2014).
Table 4. Overview of the distribution of all the different MMR mutation types reported in this study (including data from Tables 2 and 3 and Table S1).
The aim of this study was to investigate the variants used for predictive testing in order to establish whether they had been correctly classified and were in accordance with the recently published classification protocol of the InSiGHT collaboration on variant pathogenicity assessment. Sixty-nine percent of the variants (146 out of 211) had a LOVD DB-ID number and an InSiGHT classification. Of these, 141 (96.6%) were in accordance with our own interpretation as class 4 or 5. Five variants were defined as class 3 variants by InSiGHT, whereas we interpreted them as class 4 or 5 (Table 3). Two of these were mutations in the start codons of MLH1 and PMS2, both of which have been considered pathogenic in several studies. Recent functional studies of MLH1 c.1A>G revealed that translation is mostly initiated at an in-frame position 103 nucleotides downstream, but also at two other ATG sequences downstream (Parsons et al. 2015). These two ATG sequences showed minimal (c.89ATG) or some (c.122ATG) protein expression, but because they lie out of frame, translation from these start codons leads to a truncated protein (Parsons et al. 2015). The protein encoded by the in-frame transcript initiating from position c.103 (lacking the first 34 amino acids) showed loss of in vitro mismatch repair activity comparable to that of known pathogenic mutations. Other nucleotide substitutions in the MLH1 start codon, c.2T>A (MLH1_00031) (Mangold et al. 2005) and c.2T>G (Bonadona et al. 2011; Canard et al. 2012), are reported as pathogenic because of abnormal IHC and MSI-H tumors. The other start codon variant, PMS2 c.1A>G, has previously been found in three index patients whose tumors showed loss of PMS2 staining and in whom biallelic PMS2 variants were found (Senter et al. 2008), as in our patient.
Their patients had CRC in their twenties (Senter et al. 2008), whereas the index patient in our family had synchronous CRC and endometrial cancer at 55 years of age. In another study, a monoallelic PMS2 c.1A>G mutation was interpreted as pathogenic because of a lack of PMS2 protein staining in a patient with endometrial cancer diagnosed at 42 years of age (Borras et al. 2013). As described, there is considerable evidence for the pathogenicity of both MLH1 c.1A>G and PMS2 c.1A>G. A third discordant variant was MLH1 c.988_990del, p.[Ser295Argfs*21, Ile330del]. A minigene assay has shown that this variant leads to partial skipping of exon 11 (Tournier et al. 2008); it was reported as a weak but reproducible factor contributing to exon 11 skipping, with the skipped transcript found in 5% of the transcripts. Blood samples suitable for RNA extraction were not available from patients carrying this MLH1 variant, so the partial splicing effect has not been confirmed. Another functional study showed that MLH1 c.988_990del is associated with reduced protein expression and MMR activity and with altered subcellular localization of the MLH1 protein (Raevaara et al. 2005), whereas its interaction with PMS2 was comparable to that of the WT. In addition to the functional assays implicating loss of MLH1 function, three reports of this in-frame variant in Amsterdam-positive patients whose tumors showed loss of MLH1 protein and MSI-H have been submitted to the LOVD/InSiGHT database (Desiree du Sart; MLH1_001631) (Southey et al. 2005; Jenkins et al. 2006). Therefore, according to the InSiGHT five-tiered classification system, this variant should be classified as class 5.
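The reading-frame argument for MLH1 c.1A>G reduces to simple codon arithmetic: an alternative ATG whose first nucleotide is at cDNA position c.p is in frame with the original start codon only if (p − 1) is divisible by 3, in which case the protein lacks (p − 1)/3 N-terminal residues. A minimal sketch, assuming that c.89, c.103 and c.122 denote the A of each downstream ATG; the function name is hypothetical.

```python
from typing import Optional


def n_terminal_residues_lost(atg_cdna_pos: int) -> Optional[int]:
    """Residues missing from the N-terminus when translation starts at an
    alternative ATG beginning at c.<atg_cdna_pos>; None if it is out of frame."""
    offset = atg_cdna_pos - 1  # nucleotides between c.1 and the new start
    if offset % 3 != 0:
        return None  # shifted reading frame: not an N-terminally truncated MLH1
    return offset // 3


for pos in (89, 103, 122):
    print(f"c.{pos}ATG:", n_terminal_residues_lost(pos))
# c.89ATG: None
# c.103ATG: 34
# c.122ATG: None
```

The result for c.103 agrees with the 34 missing amino acids stated above, and the two other start sites fall out of frame.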
The fourth discordant variant was MSH6 c.3556+3_+13del, which we have shown by cDNA analysis to produce an aberrant transcript lacking exon 6. Thus, this variant should now be reclassified from class 3 to pathogenic class 5. The fifth discordant variant was MSH2 c.2635-3C>G. RNA analyses have not been performed by us, and we could not find any cDNA data in the literature either. The variant may cause aberrant splicing, as indicated by the prediction programs; however, this must be confirmed by cDNA analysis. Thus, we agree with LOVD/InSiGHT that this is a class 3 variant, which should not be used for predictive testing. Taken together, one of the five discordant variants has to be reclassified by us.
In addition to reported MMR mutations, we have presented 55 novel mutations (54 in Table 2 and one in Table 3). The 64 mutations in Table 2 are classified as pathogenic (class 5) or probably pathogenic (class 4) because they are frameshift, nonsense, exon deletion or duplication, or splice mutations, or, in one case, a duplication of 66 nucleotides. Nonsense and frameshift mutations usually result in premature termination codons, which target transcripts for nonsense-mediated mRNA decay. However, the effect of such mutations in the last exon cannot be determined conclusively, because transcripts with a premature termination codon in the last exon generally escape nonsense-mediated decay. Three of the MLH1 frameshift variants reported here are in the last exon (c.2114del, p.Pro705Leufs*78; c.2149del, p.Glu717Asnfs*66; and c.2196del, p.Lys732Asnfs*51). They alter the reading frame and change the last 51, 39, and 24 amino acids of exon 19, respectively. In addition, all three variants extend the protein by 25 amino acids. The C-terminal end of MLH1 is important for dimerization with PMS2, and missense mutations of codons 749 and 750 have been shown to be pathogenic because they abolish PMS2 binding (Kosinski et al. 2010). MLH1 needs to bind PMS2 to form a catalytically functional and correctly localized heterodimer, which is important in MMR. Therefore, these three MLH1 frameshift mutations are interpreted as class 4 variants, and the families are offered predictive testing.
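The protein-level consequences of the three last-exon MLH1 frameshifts can be reproduced from their HGVS protein descriptions. In HGVS notation, p.Pro705Leufs*78 means that the first changed residue (Leu705) is position 1 of the new reading frame and the stop codon is position 78, so the mutant protein is 705 + 78 − 2 = 781 residues long. This sketch assumes the 756-residue MLH1 reference protein (UniProt P40692) and handles only the simple three-letter `fs*N` form; the function name is hypothetical. The 51, 39 and 24 residues given above appear to correspond to the number of wild-type residues downstream of the first altered codon.

```python
import re

MLH1_REF_LENGTH = 756  # residues in the MLH1 reference protein (UniProt P40692)

# Simple HGVS frameshift form, e.g. "p.Pro705Leufs*78".
FS_PATTERN = re.compile(r"p\.[A-Z][a-z]{2}(\d+)[A-Z][a-z]{2}fs\*(\d+)")


def frameshift_summary(hgvs_p: str, ref_length: int = MLH1_REF_LENGTH) -> dict[str, int]:
    """Summarize a frameshift given in three-letter HGVS protein notation."""
    match = FS_PATTERN.fullmatch(hgvs_p)
    if match is None:
        raise ValueError(f"unsupported notation: {hgvs_p}")
    first_changed, ter_pos = int(match.group(1)), int(match.group(2))
    # The first changed residue is position 1 of the new frame, so the stop
    # codon sits at first_changed + ter_pos - 1; the protein ends one residue before it.
    mutant_length = first_changed + ter_pos - 2
    return {
        "mutant_length": mutant_length,
        "extension": mutant_length - ref_length,
        "wt_residues_downstream": ref_length - first_changed,
    }


for variant in ("p.Pro705Leufs*78", "p.Glu717Asnfs*66", "p.Lys732Asnfs*51"):
    print(variant, frameshift_summary(variant))
```

All three variants converge on the same 781-residue product, which is why each is described as a 25-amino-acid extension.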
The nine novel splice mutations we report (Table 2) are either in the highly conserved splice donor or acceptor sites (n = 5) or were confirmed by RNA analyses to cause aberrant splicing (n = 3). One predicted splice mutation is located in the last nucleotide of exon 14 (MLH1 c.1667G>A). We have interpreted MLH1 c.1667G>A as likely pathogenic because another substitution of the same nucleotide has been classified by InSiGHT as pathogenic (MLH1 c.1667G>T; MLH1_01162). In addition, MLH1 c.1667G>C is listed as likely pathogenic in ClinVar (accession RCV000164556.1). The predicted changes at the donor site one nucleotide downstream are almost equal for MLH1 c.1667G>A (−43.4%) and c.1667G>C (−43.7%) compared with the WT (c.1667G), whereas the predicted change for c.1667G>T is somewhat larger (−55.7%). MLH1 c.1667G>A was detected in one family fulfilling the Amsterdam criteria, in which the proband and his brother were 49 and 36 years old, respectively, at diagnosis of CRC.
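The splice-site percentages quoted here, and for MSH2 c.2635-3C>G in the Results, are relative changes of a predicted splice-site score, mutant versus wild type. A minimal sketch, assuming the reported figures have the form (mutant − WT)/WT × 100; how the scores of the individual prediction tools were combined into one figure is not stated in the text, and the scores used in the example are hypothetical, not values from this study.

```python
def relative_change_percent(wt_score: float, mut_score: float) -> float:
    """Percentage change of a predicted splice-site score relative to wild type."""
    if wt_score == 0:
        raise ValueError("wild-type score must be non-zero")
    return (mut_score - wt_score) / wt_score * 100.0


# Hypothetical wild-type and mutant donor scores, for illustration only.
print(round(relative_change_percent(10.0, 5.66), 1))  # -43.4

# Reported changes for the three substitutions of MLH1 c.1667G.
reported = {"c.1667G>A": -43.4, "c.1667G>C": -43.7, "c.1667G>T": -55.7}
for variant, change in reported.items():
    gap = abs(change - reported["c.1667G>A"])
    print(f"{variant}: {change}% (differs from G>A by {gap:.1f} points)")
```

The comparison makes the argument explicit: G>A and G>C differ by 0.3 percentage points, whereas G>T differs by 12.3.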
One in-frame duplication in MSH2 (c.2051_2116dup; p.Val684_Val705dup), found in an Amsterdam-positive family, has been interpreted as likely pathogenic because it leads to the insertion of 22 amino acids in the highly conserved ATP-binding cassette domain of MSH2. This duplication most probably disturbs the structure (http://www.uniprot.org/uniprot/P43246) of an important functional domain necessary for correct mismatch repair.
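Whether a c.start_end deletion or duplication preserves the reading frame, and how many codons it removes or adds, follows from its length. A minimal sketch with a hypothetical helper name, applied to in-frame variants discussed in this paper: the 66-nt MSH2 duplication corresponds to the 22 inserted residues of p.Val684_Val705dup, and the 9-nt PMS2 c.834_842del to a net loss of three residues (four lost, one Gln inserted).

```python
def inframe_codons(start: int, end: int) -> int:
    """Codons removed or added by a c.start_end deletion or duplication.

    Raises ValueError when the length is not a multiple of three,
    i.e. when the variant would shift the reading frame.
    """
    length = end - start + 1
    if length % 3 != 0:
        raise ValueError(f"{length} nt is not a multiple of 3 (frameshift)")
    return length // 3


print(inframe_codons(2051, 2116))  # MSH2 c.2051_2116dup -> 22
print(inframe_codons(834, 842))    # PMS2 c.834_842del  -> 3
print(inframe_codons(988, 990))    # MLH1 c.988_990del  -> 1
print(705 - 684 + 1)               # residues in p.Val684_Val705 -> 22
```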
In our cohort, MSH2 variants are the most frequent, both in number of variants (91 out of 211) and in number of families (147 out of 333). Thus, MSH2 variants were the causative variant in almost half of our families. Most families have their own unique causative MMR variant. However, some variants were found in several families, the most frequent being MSH2 c.942+3A>T, found in 11 families, and MLH1 c.350C>T, found in 8 families. The most frequent mutations in our cohort are well known and have several public entries in the LOVD database. We have not identified any typical Australian founder mutations.
The most frequent mutation types in this study were frameshift, splice site, nonsense, and exon deletion/duplication mutations, accounting for 94% of the mutations. Exon deletion/duplication amounted to 15% of the mutations, and more than 2/3 (71%) of these were found in the MSH2 gene. The spectrum of mutation types is similar to that found in other studies (Wijnen et al. 1998;Nilbert et al. 2009;Sjursen et al. 2010).
In conclusion, we found that most variants with an InSiGHT classification (141 out of 146) were in accordance with our classification. Five variants did not have the same classification, of which four can be reclassified by InSiGHT. In addition to already known MMR mutations, we have presented 55 novel pathogenic or putative pathogenic mutations.