Toward a better definition of EPCAM deletions in Lynch Syndrome: Report of new variants in Italy and the associated molecular phenotype

Abstract
Background: Inherited epimutations of Mismatch Repair (MMR) genes are responsible for Lynch Syndrome (LS) in a small, but well defined, subset of patients. Methylation of the MSH2 promoter consequent to the deletion of the upstream EPCAM gene is found in about 1%–3% of LS patients and represents a classical secondary, constitutional and tissue‐specific epimutation. Several different EPCAM deletions have been reported worldwide, for the most part representing private variants caused by an Alu‐mediated recombination.
Methods: A total of 712 patients with suspected LS were tested for MMR gene mutations at our institute. EPCAM deletions were detected by multiplex ligation‐dependent probe amplification (MLPA) and then defined by Long‐Range polymerase chain reaction (PCR)/Sanger sequencing. A comprehensive molecular characterization of colorectal cancer (CRC) tissues was carried out by immunohistochemistry of MMR proteins, Microsatellite Instability (MSI) assay, methylation‐specific MLPA and transcript analyses. In addition, somatic deletions and/or variants were investigated by MLPA and next generation sequencing (NGS).
Results: An EPCAM deletion was found in five unrelated probands in Italy: variants c.556‐490_*8438del and c.858+1193_*5826del are novel; c.859‐1430_*2033del and c.859‐670_*530del were previously reported. All probands were affected by CRC at a young age; tumors showed MSI and abnormal MSH2/MSH6 protein expression. MSH2 promoter methylation, as well as aberrant in‐frame or out‐of‐frame EPCAM/MSH2 fusion transcripts, were detected in CRCs and normal mucosae.
Conclusion: An EPCAM deletion was the causative variant in about 2% of our institutional series of 224 LS patients, consistent with previously estimated frequencies. Early age at onset and multiple CRCs were the main clinical features of this subset of patients.


| INTRODUCTION
Pathogenic variants in MLH1, MSH2, MSH6 and PMS2 Mismatch Repair (MMR) genes (OMIM: 120436, 609309, 600678 and 600259) are causative of Lynch Syndrome (LS), an autosomal dominant condition that confers elevated risk of developing colorectal cancer (CRC), endometrial cancer (EC) and several other types of cancer. Typical phenotypic features of LS tumors are Microsatellite Instability (MSI) and loss of MMR protein expression (Lynch et al., 2009).
Biallelic inactivation of one of the MMR genes, as a result of combined germline and second-hit somatic mutations, facilitates carcinogenesis. Nonsense, missense, frameshift and splicing variants, as well as deletions of one or more exons, are responsible for loss of function of the inherited, predisposing MMR allele. In addition, some so-called epimutations have been reported. An epimutation is defined as a heritable change that does not alter the DNA sequence but affects gene expression via DNA methylation or histone modification. In LS, constitutional epimutations are mainly reported for the MLH1 gene, either as a primary event arising de novo in gametogenesis or as a secondary event caused by an in cis heritable genetic variant (Cini et al., 2015; Hitchins, 2013).
As for the MSH2 gene, a constitutional tissue-specific promoter methylation can be inherited as a secondary epimutation, due to a deletion involving the last exons of the EPCAM gene (OMIM: 185535), which maps upstream of MSH2. As a consequence, EPCAM is brought close to MSH2 and, in EpCAM-expressing tissues, a read-through EPCAM/MSH2 fusion transcript is generated while the native MSH2 promoter is hypermethylated and silenced (Kovacs, Papp, Szentirmay, Otto, & Olah, 2009; Ligtenberg et al., 2009). This complex event represents the first hit in a well-defined subset of LS patients (1%-3% of the total) who are typified by the expression of aberrant EPCAM/MSH2 fusion transcripts in normal and tumor colon tissues (Kovacs et al., 2009; Ligtenberg et al., 2009).
Several different EPCAM deletions have been described worldwide, in some cases as recurrent/founder mutations (Dymerska et al., 2017; Eguchi et al., 2016; Mur et al., 2014; Spaepen et al., 2013). It has been established that the mechanism underlying these rearrangements is an Alu-mediated deletion, involving different highly homologous Alu sequences that are interspersed in intronic and intergenic regions within the EPCAM/MSH2 locus (Tutlewska, Lubinski, & Kurzawski, 2013).
Here we report on the identification and characterization of four different EPCAM deletions in five unrelated Italian LS families, and their epigenetic effect on the MSH2 locus.

| MATERIALS AND METHODS
From 1995 to 2017, DNA from a total of 712 unrelated cancer patients was collected in a diagnostic setting by the Functional Oncogenetics and Oncogenomics Laboratory at the CRO National Cancer Institute (Aviano, Italy). Patients were enrolled for genetic testing because of personal and family history consistent with LS and/or evidence of high MSI (MSI-H) and/or negative for MMR protein according to immunohistochemistry (IHC). Diagnosis of LS was confirmed by MMR and EPCAM genetic testing in 224/712 patients.
The genetic testing protocol and the use of DNA samples for research purposes were approved by the Local Independent Ethics Committee (CRO-15-1997). Written informed consent was obtained from all participants of the present study.
Mismatch Repair protein IHC was performed on Formalin-Fixed and Paraffin-Embedded (FFPE) sections, stained on the Ventana BenchMark Ultra platform (Ventana Medical Systems, Inc., Oro Valley AZ). MLH-1 (M1), MSH2 (G219-1129), MSH6 (44) and PMS2 (EPR3947) (Roche Diagnostics, Indianapolis, IN) monoclonal antibodies were used to qualitatively identify human MMR proteins. Lesions were considered negative for protein expression when a complete absence of nuclear staining was evident in tumor cells with concomitant nuclear staining of adjacent normal epithelial and stromal cells.
Long-Range polymerase chain reactions (PCRs) for breakpoint detection were performed with the Expand™ Long Template PCR System (Merck KGaA, Darmstadt, Germany). Shorter PCRs spanning the breakpoints were obtained with GoTaq (Promega, Madison, WI) (primers and conditions available upon request). PCR products were purified from agarose gel with the Wizard® SV Gel and PCR Clean-Up System (Promega), then sequenced using a modified BigDye® cycle sequencing protocol (30 cycles of 94°C for 30 s, 55°C for 30 s and 60°C for 2 min) to prevent secondary structures due to Alu-mediated hybridization.
Tumor and normal mucosa methylation statuses were assessed using Methylation Specific-MLPA (MS-MLPA) with SALSA® MLPA® ME011 MMR genes probemix (MRC-Holland). Blood DNA was used as a control.
Mutation analysis of tumor DNA was performed on the two available frozen CRC samples with a TruSeq Custom Amplicon (Illumina, San Diego, CA) next generation sequencing (NGS) panel, including coding sequences of MSH2, MSH6 and MLH1 genes. Libraries were run on an Illumina MiSeq platform, with a mean coverage >4,000 reads. MLPA analysis for loss of heterozygosity (LOH) detection was carried out on frozen and FFPE tumor samples with the SALSA® MLPA® probemix P072-C1 (MSH6).
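As a reading aid for the MLPA-based LOH analysis, the following is a minimal sketch of how a probe dosage quotient is commonly derived: each target probe signal is normalized to reference probes within the same sample and then divided by the same normalized value from a control sample. The function name, probe labels and peak values are hypothetical illustrations, not the probes or data used in this study, and the actual analysis software and thresholds are not described in the text.

```python
from statistics import mean


def dosage_quotients(sample, control, reference_probes):
    """Return a dosage quotient per target probe.

    sample / control: dict mapping probe name -> peak signal.
    reference_probes: probe names assumed to be copy-number neutral.
    A quotient near 1.0 suggests an unchanged copy number, while a quotient
    near 0.5 suggests loss of one of two copies (in a pure cell population).
    """
    sample_norm = mean(sample[p] for p in reference_probes)
    control_norm = mean(control[p] for p in reference_probes)
    return {
        probe: (sample[probe] / sample_norm) / (control[probe] / control_norm)
        for probe in sample
        if probe not in reference_probes
    }


# Invented peak values, for illustration only.
control = {"ref_1": 1000, "ref_2": 2000, "EPCAM_ex9": 800, "MSH2_ex1": 1200}
tumor = {"ref_1": 500, "ref_2": 1000, "EPCAM_ex9": 200, "MSH2_ex1": 600}

quotients = dosage_quotients(tumor, control, ["ref_1", "ref_2"])
for probe, value in quotients.items():
    print(f"{probe}: {value:.2f}")
```

In tumor DNA the observed quotient also depends on the fraction of neoplastic cells in the sample, so any cut-off applied to such values is an assumption that has to be set for the specific assay.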
Aberrant transcripts were investigated in RNAs extracted from frozen or FFPE samples. In brief, RNA from tumor and/or normal mucosa of carriers was purified with the TRIzol™ Reagent (ThermoFisher Scientific, Waltham, MA) or the RNeasy FFPE kit (Qiagen, Hilden, Germany) according to the manufacturers' protocols, and cDNA was synthesized using the SuperScript™ III Reverse Transcriptase (ThermoFisher Scientific) and random primers. The presence of fusion transcripts was evaluated by amplification and direct sequencing of PCR products covering different regions between exon 4 or 7 of the EPCAM gene and exon 1, 2 or 3 of MSH2 (primers and details available upon request). The reference sequences NM_002354.2 and NG_012352.2 (EPCAM) and NM_000251.2 and NG_007110.2 (MSH2) were used for reporting aberrant transcripts and genomic deletions.
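The deletions in this report are named with HGVS coding-DNA coordinates relative to the EPCAM reference transcript, for example c.859-1430_*2033del. In this notation, "859-1430" denotes an intronic position 1,430 nucleotides upstream of coding nucleotide 859, and "*2033" denotes a position 2,033 nucleotides downstream of the stop codon. A minimal sketch of how such a name decomposes is given below; the helper names `parse_position` and `parse_hgvs_deletion` are hypothetical (they do not belong to any HGVS library), and the pattern only covers the simple `c.<start>_<end>del` form used here.

```python
import re
from typing import NamedTuple, Tuple


class CdsPosition(NamedTuple):
    anchor: int               # coding nucleotide, or distance past the stop codon
    offset: int               # intronic offset (0 when the position is not intronic)
    downstream_of_stop: bool  # True for "*N" positions


_POSITION = re.compile(r"^(\*?)(\d+)([+-]\d+)?$")
_DELETION = re.compile(r"^c\.([^_]+)_([^_]+)del$")


def parse_position(text: str) -> CdsPosition:
    match = _POSITION.match(text)
    if match is None:
        raise ValueError(f"unsupported position: {text!r}")
    star, anchor, offset = match.groups()
    return CdsPosition(int(anchor), int(offset) if offset else 0, star == "*")


def parse_hgvs_deletion(name: str) -> Tuple[CdsPosition, CdsPosition]:
    # Typeset text often carries Unicode hyphens; map them to ASCII "-".
    name = name.replace("\u2010", "-").replace("\u2011", "-")
    match = _DELETION.match(name)
    if match is None:
        raise ValueError(f"unsupported variant name: {name!r}")
    return parse_position(match.group(1)), parse_position(match.group(2))


for variant in ("c.556-490_*8438del", "c.858+1193_*5826del",
                "c.859-1430_*2033del", "c.859-670_*530del"):
    start, end = parse_hgvs_deletion(variant)
    print(variant, start, end)
```

All four names share the same shape: a 5' breakpoint inside an EPCAM intron and a 3' breakpoint downstream of the EPCAM stop codon.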

| RESULTS
Of the 224 patients who tested positive for LS in our laboratory from 1995 to 2017, five unrelated patients, all Italian, carried a partial deletion of the EPCAM gene, involving exons 6-9 (families AV114 and UD18) or exons 8-9 (families PD31, AV182 and PD78), as assessed by MLPA analyses. No other pathogenic variants were found in the MSH2 and MSH6 genes of these patients.
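The frequency implied by these counts (five carriers among 224 confirmed LS patients, out of 712 tested) can be checked with a short calculation. The sketch below also attaches a Wilson score interval to the proportion; the choice of that interval and the function name `wilson_interval` are ours for illustration and are not part of the analysis reported here.

```python
from math import sqrt


def wilson_interval(events: int, total: int, z: float = 1.959964):
    """Wilson score interval for a binomial proportion (z = 1.96 for ~95%)."""
    p = events / total
    denom = 1 + z**2 / total
    center = (p + z**2 / (2 * total)) / denom
    half_width = z * sqrt(p * (1 - p) / total + z**2 / (4 * total**2)) / denom
    return center - half_width, center + half_width


tested, confirmed_ls, epcam_carriers = 712, 224, 5

ls_fraction = confirmed_ls / tested
epcam_fraction = epcam_carriers / confirmed_ls
low, high = wilson_interval(epcam_carriers, confirmed_ls)

print(f"LS confirmed in {ls_fraction:.1%} of tested patients")
print(f"EPCAM deletion in {epcam_fraction:.1%} of LS patients "
      f"(interval {low:.1%} to {high:.1%})")
```

With only five carriers, the interval around the point estimate is necessarily broad, which is worth keeping in mind when comparing the figure with frequencies from other series.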
All probands were diagnosed with MSI-H CRC at a young age (34-50 years) and had a family history of CRC. Immunohistochemistry, performed on a representative tumor of each proband, revealed abnormal MSH2 and MSH6 expression (Table 1): besides loss of nuclear MSH2/MSH6 staining, one tumor retained focal MSH2 expression and some showed heterogeneous MSH2 and/or MSH6 cytoplasmic staining (Figure 1a). Twenty relatives from four families were tested for the EPCAM deletion: eight healthy subjects (age 19-57 years) and four patients affected by early-onset CRC (age 25-45 years) tested positive by MLPA for the deletion detected in the proband. Clinical and molecular data of all carriers are listed in Table 1.
No pathogenic variant acting as a second hit was found by NGS screening in either MSH2 or other MMR genes in the DNA from the CFS395T and CFS396T tumors. MLPA performed on available tumor samples revealed a variegated pattern, ranging from retention of the second EPCAM allele to LOH of varying extent (up to the MSH2 and MSH6 loci) (Table 1 and Figure 1d).
Several EPCAM-MSH2 fusion transcripts were detected by PCR in normal mucosa and in CRC samples CFS395T, CFS396T and CFS1475T (Table S1 and Figure 1e), but not in wild-type controls (not shown).
Sequencing of PCR products revealed at least five different aberrant transcripts originating from Del_16.5: three in-frame and two creating a frameshift with a premature stop codon in exon 2 (Figure 1f-g shows a representative case). Del_4.9 and Del_11.5 also yielded different in-frame and out-of-frame EPCAM/MSH2 fusion transcripts (Table 1).
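The distinction between in-frame and out-of-frame fusion transcripts comes down to modular arithmetic on the coding sequence: a junction keeps the downstream codons in their native frame only if the number of coding nucleotides upstream of the junction and the position at which the downstream sequence resumes differ by a multiple of three. The sketch below illustrates this with short invented sequences; they are toy examples, not EPCAM or MSH2 sequence, and the function names are hypothetical.

```python
STOP_CODONS = {"TAA", "TAG", "TGA"}


def is_in_frame(upstream_cds_len: int, downstream_cds_offset: int) -> bool:
    """True if the junction keeps the downstream codons in their native frame.

    upstream_cds_len: coding nucleotides contributed before the junction.
    downstream_cds_offset: 0-based index, within the downstream gene's own
    coding sequence, of the first nucleotide after the junction.
    """
    return (upstream_cds_len - downstream_cds_offset) % 3 == 0


def first_stop(seq: str):
    """0-based index of the first stop codon read from position 0, else None."""
    for i in range(0, len(seq) - 2, 3):
        if seq[i:i + 3] in STOP_CODONS:
            return i
    return None


# Toy downstream coding sequence: ATG AAA CTG ACC GGT TAA
downstream = "ATGAAACTGACCGGTTAA"
offset = 3  # the fusion resumes right after the downstream ATG

in_frame_fusion = "ATGGCC" + downstream[offset:]       # 6 upstream nucleotides
out_of_frame_fusion = "ATGGCCGG" + downstream[offset:]  # 8 upstream nucleotides

print(is_in_frame(6, offset), first_stop(in_frame_fusion))
print(is_in_frame(8, offset), first_stop(out_of_frame_fusion))
```

In the toy out-of-frame fusion the shifted reading frame meets a stop codon well before the native one, which is the same kind of event as the premature stop codon described for the frameshifted transcripts.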

| DISCUSSION
An increasing body of evidence supports the relevance of epigenetic modifications in the pathogenesis of hereditary syndromes. The elucidation of the primary event causing the epigenetic change is crucial to ascertain the actual secondary nature of the epimutation and to optimize cascade testing in family members. In the context of LS testing, the importance of epimutation analysis is universally recognized and EPCAM deletion is a well-known primary event. In the last 10 years many different variants of this gene have been described; however, construction of a complete and curated public database for EPCAM deletions is still in progress (https://databases.lovd.nl/shared/genes/EPCAM), and the actual frequency of EPCAM mutations in LS is poorly defined. In particular, no precise estimate of EPCAM variant incidence in the Italian LS population is available so far. Our institutional database includes 224 unrelated, molecularly confirmed LS families, five of which carry an EPCAM deletion (>2%), a frequency consistent with literature data (Tutlewska et al., 2013). All EPCAM deletions reported worldwide are bona fide Alu recombination-mediated variants, as are the four deletions presented here. With the exception of a few founder mutations identified in the Netherlands and Poland (Dymerska et al., 2017; Niessen et al., 2009), deletions in the EPCAM gene are in general rare, sometimes private, variants with no recombination hotspot sites. Two of the four deletions that we describe here (Del_16.5 and Del_11.5) are novel and are not recorded in any database or in the literature. The Del_2.6 variant was identical to that reported in the Netherlands. The Del_4.9 variant was first described as a founder mutation in the Dutch population and in the large American Family R (Ligtenberg et al., 2009; Lynch et al., 2011; Niessen et al., 2009).
Interestingly, the Del_4.9 variant described here and that previously reported (GenBank: FJ347525.1) had different breakpoints, although they involved the same Alu sequences, and our family had no known Dutch ancestry.
For three out of four deletions, we found evidence of EPCAM/MSH2 fusion transcripts in both normal and tumor colon tissues. These were concomitant with MSH2 promoter methylation, which acted as a first hit for gene inactivation.
In all EPCAM-deleted tumors analyzed, nuclear MSH2/MSH6 expression at IHC was abnormal. In three cases the loss of nuclear MSH2 staining was compatible with a large somatic deletion of the locus. A peculiarity was the MSH2 cytoplasmic staining detected in two cases. This aberrant localization is similar to that described in a previous report of a larger deletion involving both EPCAM and the first two exons of MSH2 (Sekine et al., 2017), and is consistent with the production of the alternative in-frame fusion transcripts, predicted to be translated into chimeric proteins. It is tempting to speculate that the fusion with EPCAM alters the intrinsic capability of MSH2 to form a functional complex with MSH6, which is responsible for MSH2/MSH6 nuclear shuttling (Gassman et al., 2011).
Somatic EPCAM/MMR second hits could be identified in only three of six EPCAM deletion-associated tumors. The sub-optimal quality of FFPE DNA from samples CFS825T and CFS1475T prevented NGS testing, so somatic MSH2 point mutations cannot be excluded.
By contrast, MSH2 deficiency remains unexplained in sample CFS395T, in which both MLPA and NGS analyses were performed but neither LOH nor DNA sequence variants were detected.
EPCAM deletion carriers typically show high penetrance of early-onset CRC, with a cumulative CRC risk of up to 75%; compared with MSH2 mutation carriers, they rarely develop multiple tumors of different histotypes. Moreover, the cumulative risk of EC in EPCAM deletion carriers is lower (12% vs. 51%), although this risk is slightly increased for deletions more proximal to the MSH2 locus (Kempers et al., 2011). In keeping with previous genotype-phenotype correlations, even in our small dataset EC is absent and the most frequent tumor is CRC, with 16 CRCs in eight affected carriers. Notably, one duodenal tumor was also recorded, confirming a previous suggestion of increased risk (Kempers et al., 2011), but the data are insufficient to correlate this clinical phenotype with particular EPCAM genotypes. Given the presence of multiple metachronous CRCs, gastrointestinal prevention protocols for EPCAM deletion carriers should focus on the colorectum, but surveillance of the small intestine is also warranted. It is worth noting that EC occurrence, even if rare, cannot be excluded, and gynecologic surveillance is still advised for all women with an EPCAM deletion.
This report underscores the importance of EPCAM genetic screening in LS patients, especially in the presence of an atypical pattern of MSH2 protein expression in tumors. Looking ahead, a thorough molecular characterization of the breakpoints might help in genetic counseling by laying the basis for improved tumor risk estimates.