Identification of Lynch syndrome risk variants in the Romanian population

Abstract Two familial forms of colorectal cancer (CRC), Lynch syndrome (LS) and familial adenomatous polyposis (FAP), are caused by rare mutations in DNA mismatch repair (MMR) genes (MLH1, MSH2, MSH6, PMS2) and the genes APC and MUTYH, respectively. No information is available on the presence of high‐risk CRC mutations in the Romanian population. We performed whole‐genome sequencing of 61 Romanian CRC cases with a family history of cancer and/or early onset of disease, focusing the analysis on candidate variants in the LS and FAP genes. The frequencies of all candidate variants were assessed in a cohort of 688 CRC cases and 4567 controls. Immunohistochemical (IHC) staining for MLH1, MSH2, MSH6, and PMS2 was performed on tumour tissue. We identified 11 candidate variants in 11 cases: six variants in MLH1, one in MSH6, one in PMS2, and three in APC. Combining information on the predicted impact of the variants on the proteins, IHC results and previous reports, we found three novel pathogenic variants (MLH1:p.Lys84ThrfsTer4, MLH1:p.Ala586CysfsTer7, PMS2:p.Arg211ThrfsTer38) and two novel variants that are unlikely to be pathogenic. We also confirmed three previously published pathogenic LS variants and suggest reclassifying a previously reported variant of uncertain significance as pathogenic (MLH1:c.1559‐1G>C).

Carriers of LS mutations have an estimated 25%-75% lifetime risk of CRC as well as an increased risk of several other cancer types, including endometrial and ovarian cancer. 5 Due to the low frequency of LS and heterogeneity in phenotypic expression, it has proven difficult to accurately establish population prevalence and to assess the penetrance of LS mutations. 6 About 15% of CRC cases are somatically hypermutated as a consequence of MMR deficiency. Of these, 1%-3% are due to LS while most of the remaining MMR-deficient tumours have somatic inactivation of MLH1 via hypermethylation of the gene promoter. 7 In MMR-deficient tumours, both copies of the same MMR gene have been inactivated, so the respective protein product is not produced. MMR-deficient tumours exhibit several clinical characteristics that have implications for therapy, in particular with regard to the use of immune system modulators. 8

[...] method and sequenced on Illumina HiSeq X machines. Sequencing reads were aligned to build 38 of the human reference sequence (GRCh38) using the Burrows-Wheeler Aligner (BWA). 11 Alignments were merged into a single BAM file and marked for duplicates using Picard. 12 Only nonduplicate reads were used for the downstream analyses. Variants were called using version 3.8-0 of the Genome Analysis Toolkit (GATK), 13 using a multisample configuration.

| Variant annotation and filtering
Variants were annotated using release 8.0 of the Variant Effect Predictor (VEP-Ensembl). 14 To filter out variants over a certain frequency threshold, we used a reference set of 38 000 Icelandic individuals whole-genome sequenced at deCODE genetics, an extension of a previously described set of 15 220 WGS Icelanders. 15 None of the variants described here had any carriers in the Icelandic dataset. Additional frequency filtering was performed using alleles from publicly available datasets of the Exome Aggregation Consortium. 16

| Genetic analysis
Only rare (below 1% allelic frequency) coding and splice region variants were considered, including variants with predicted high (stop, frameshift, and splice essential) and moderate (missense, in-frame, and splice region) impact on protein function. We focused on single-nucleotide polymorphisms and small indels (< 20 base pairs). We [...] Table S1.
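The filtering criteria above can be illustrated with a minimal Python sketch. It is not the pipeline used in the study: the `Variant` record, the function names and the mapping of the impact categories to Sequence Ontology consequence terms (as reported by VEP) are assumptions made for illustration only.

```python
from dataclasses import dataclass

# Assumed mapping of the impact categories in the text to
# Sequence Ontology consequence terms (illustrative, not from the paper).
HIGH_IMPACT = {
    "stop_gained",
    "frameshift_variant",
    "splice_acceptor_variant",
    "splice_donor_variant",
}
MODERATE_IMPACT = {
    "missense_variant",
    "inframe_insertion",
    "inframe_deletion",
    "splice_region_variant",
}

MAX_ALLELE_FREQ = 0.01  # "rare" means below 1% allelic frequency
MAX_INDEL_LEN = 20      # small indels are shorter than 20 base pairs


@dataclass
class Variant:
    """Hypothetical minimal variant record."""
    gene: str
    ref: str
    alt: str
    consequence: str
    allele_freq: float


def indel_length(v: Variant) -> int:
    """Length difference between alleles; 0 for single-nucleotide changes."""
    return abs(len(v.ref) - len(v.alt))


def passes_filter(v: Variant) -> bool:
    """Keep rare high/moderate-impact SNPs and small indels."""
    if v.allele_freq >= MAX_ALLELE_FREQ:
        return False
    if v.consequence not in HIGH_IMPACT | MODERATE_IMPACT:
        return False
    return indel_length(v) < MAX_INDEL_LEN
```

Under these assumptions, a call such as `passes_filter(Variant("MLH1", "G", "C", "splice_acceptor_variant", 0.0001))` keeps the variant, whereas a synonymous change or a variant at 5% frequency is discarded.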

| Immunohistochemistry
Paraffin blocks with tumour samples from all 11 carriers of variants in the LS genes were collected, and sections from them were stained for MLH1, MSH6 and PMS2 to assess whether the protein was present. Immunohistochemistry was performed on 3 μm sections. Following deparaffinization in xylene, samples were rehydrated in ethanol and subjected to heat-induced epitope retrieval [...]
We assessed the pathogenicity of these variants in the Romanian population based on IHC results, predicted protein effect and annotation in the ClinVar and InSIGHT databases (Table 2; APRP, assessment of pathogenicity in the Romanian population).
We divide our results into two categories: novel variants and previously documented variants. We summarized all reports regarding the pathogenicity of the previously reported variants in Table S2, using the output from the ClinVar database. An overview of the personal and familial history of cancer for the 11 carriers is given in Table 4. [...] do not support that this variant should be classified as pathogenic.

We classify this variant as tier III based on the recommendations of the ACMG.

| DISCUSSIONS
This study is the first assessment of rare variants underlying LS in Romanian CRC patients. We identify new variants specific to the Romanian population and show that some variants previously reported to be pathogenic in other populations also occur in Romania.
We identified three novel pathogenic variants and two novel variants that are unlikely to be pathogenic. We also confirmed three previously published pathogenic variants and suggest reclassifying a variant previously classified as VUS as pathogenic. Due to study limitations, we were not able to classify the three APC variants identified in the Romanian population. We note that, of the two rare missense variants in APC identified in the same individual, we classify one as a likely benign variant based on the ACMG guidelines for classification of sequence variants. 20 The other variant, p.Ala927Gly, has been reported previously as a VUS, but we note that it is located within a critical domain that is intolerant to mutations. Our present study is the first one, to our knowledge, to examine rare sequence variants associated with CRC in the Romanian population.
In total, we identified six pathogenic variants, one nonpathogenic variant and four variants of uncertain significance in the Romanian population. To determine the prevalence of these variants in Romania, we assessed the frequencies of the 11 variants in the full ROMCAN cohort. As described in Table 3, none of the mutations was found in more than one CRC patient, except for MLH1:c.1559-1G>C. Our results do not suggest any strong association between the 11 variants identified here and BRC, LuCa or PrCa.
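As an illustration of how carrier frequencies in cases and controls might be compared, the following Python sketch computes a carrier frequency and a one-sided Fisher's exact p-value from the hypergeometric distribution. The source does not state which statistical test, if any, was applied, so the choice of test is an assumption, as are the function names; any carrier counts passed to these functions are hypothetical and are not results from this study.

```python
from math import comb


def carrier_frequency(carriers: int, total: int) -> float:
    """Fraction of individuals in a group who carry the variant."""
    if total <= 0:
        raise ValueError("total must be positive")
    return carriers / total


def fisher_one_sided(case_carriers: int, n_cases: int,
                     control_carriers: int, n_controls: int) -> float:
    """One-sided Fisher's exact test (assumed test, not from the paper).

    Returns the probability of observing at least `case_carriers`
    carriers among the cases if carriers were distributed at random
    across cases and controls (hypergeometric null).
    """
    total = n_cases + n_controls
    total_carriers = case_carriers + control_carriers
    upper = min(n_cases, total_carriers)
    # Sum hypergeometric terms for k = case_carriers .. upper.
    numerator = sum(
        comb(n_cases, k) * comb(n_controls, total_carriers - k)
        for k in range(case_carriers, upper + 1)
    )
    return numerator / comb(total, total_carriers)
```

With the cohort sizes reported in the abstract (688 CRC cases and 4567 controls), a hypothetical variant seen in two cases and no controls would be evaluated as `fisher_one_sided(2, 688, 0, 4567)`. For variants observed only once or twice, such a test has limited power, which is consistent with the cautious interpretation in the text.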
Identification of LS variants in the Romanian population is important in order to reduce the incidence and mortality of this multicancer disorder. Our present study is the largest effort, to our knowledge, to examine the genetic profile of this pathology in Eastern Europe. Due to study limitations, we were not able to extrapolate any other clinical observations, and we emphasize the need for future follow-up studies in the Romanian population. This study is the first step towards improving our understanding of the genetic particularities of this pathology in Romania and provides new insights for the scientific community studying the genetic epidemiology of LS.

CONFLICT OF INTEREST
The authors from deCODE genetics are employees of deCODE genetics/AMGEN.

ACKNOWLEDGEMENTS
This study was funded in part by the European Union FP7 Program