Exome sequencing in 51 early onset non‐familial CRC cases

Abstract

Background: Colorectal cancer (CRC) with an age of onset <40 years suggests a germline genetic cause. In total, 51 simplex cases were included to test the hypothesis of CRC as a Mendelian trait caused by either heterozygous autosomal dominant or biallelic autosomal recessive pathogenic variants.

Methods: The cohort underwent whole exome sequencing (WES) at 100× coverage. Both a dominant and a recessive model were used to search for predisposing genetic factors. In addition, we assayed recessive variants of potential moderate risk that were enriched in our young-onset CRC cohort. Variants were filtered using a candidate cancer gene list or by selecting variants more likely to be pathogenic based on variant type (e.g., loss-of-function) or allele frequency.

Results: We identified one pathogenic variant in PTEN in a patient subsequently confirmed to have a hereditary hamartoma tumor syndrome (Cowden syndrome) and one patient with a pathogenic heterozygous variant in PMS2 that was originally not identified by WES due to low-quality reads resulting from pseudogenes. In addition, we identified three heterozygous candidate variants in known cancer susceptibility genes (BMPR1A, BRIP1, and SRC), three truncating variants in possibly novel cancer genes (CLSPN, SEC24B, and SSH2), and four candidate missense variants in ACACA, NR2C2, INPP4A, and DIDO1. We also identified five possible autosomal recessive candidate genes: ATP10B, PKHD1, UGGT2, MYH13, and TFF3.

Conclusion: Two clear pathogenic variants were identified in patients who had not been identified clinically. Thus, the chance of detecting a hereditary cancer syndrome in patients with CRC at a young age but without family history is 2/51 (4%), and the clinical benefit of genetic testing in this patient group is therefore low. Of note, using stringent filtering, we identified a total of ten candidate heterozygous variants and five possibly biallelic autosomal recessive candidate genes that warrant further study.


| Whole exome sequencing of early onset CRC and familial BRC samples
DNA was quantified using a Qubit Fluorometer (Life Technologies, USA). Sequencing libraries were prepared according to the TruSeq DNA Sample Preparation Kit EUC 15005180 or EUC 15026489 (Illumina, USA). Briefly, 1-1.5 μg of genomic DNA was fragmented using the Covaris 400 bp protocol (Covaris, Inc., USA). After fragmentation, all samples were subjected to end-repair, A-tailing, and ligation of Illumina Multiplexing PE adapters. An additional gel-based size selection step was performed, and the adapter-ligated fragments were subsequently enriched by PCR followed by purification using Agencourt AMPure Beads (Beckman Coulter, Sweden). For exome capture, libraries were pre-pooled in equimolar amounts and enriched in 5- or 6-plex reactions according to the TruSeq Exome Enrichment Kit Protocol (EUC 15013230). Library size was checked on a Bioanalyzer High Sensitivity DNA chip (Agilent Technologies, Sweden), and concentration was determined by quantitative PCR. The pooled DNA libraries were clustered on a cBot instrument (Illumina) using the TruSeq PE Cluster Kit v3. Paired-end sequencing was performed for 100 cycles on a HiSeq 2000 instrument (Illumina) with TruSeq SBS Chemistry v3, according to the manufacturer's protocol. Base calling was performed with RTA (1.12.4.2 or 1.13.48), and the resulting BCL files were filtered, de-multiplexed, and converted to FASTQ format using CASAVA 1.7 or 1.8 (Illumina). Sequencing was performed to an average coverage of 100×.

| Rare autosomal dominant and autosomal recessive analysis in cancer susceptibility gene list
To search for causative mutations in autosomal dominant or autosomal recessive cancer susceptibility genes, we first used an in silico cancer gene list modified from Vogelstein et al. (2013) in the analysis of the 51 early-onset CRC samples. The gene list contains 244 known cancer-related genes (known somatic cancer driver genes as well as hereditary cancer genes). Candidate variants were primarily selected by filtering for splicing and non-silent variants with MMAF <1% in autosomal recessive cancer genes and <0.1% in autosomal dominant cancer genes. Variants with an MMAF higher than the prevalence of the cancer syndrome associated with the gene were manually excluded. The resulting candidate variants were classified as of uncertain significance, likely pathogenic, or pathogenic using the American College of Medical Genetics and Genomics and Association for Molecular Pathology (ACMG-AMP) guideline criteria (Richards et al., 2015) (Figure 1). The candidate variants were then confirmed by Sanger sequencing. Poor-quality PMS2 variants were analysed manually in the alignment files (BAM).
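The frequency cut-offs described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not the pipeline used in the study: the `Variant` record, the `inheritance` annotation, and the `keep_variant` helper are hypothetical names, MMAF is taken to mean the highest minor allele frequency across the population databases, and the prevalence comparison (done manually in the study) is simplified to a single numeric check. The APC example uses the MMAF of 0.05% and the prevalence of <1/10,000 quoted in the Discussion.

```python
from dataclasses import dataclass
from typing import Optional

# Cut-offs from the text, expressed as fractions:
# MMAF < 0.1% for autosomal dominant genes, < 1% for autosomal recessive genes.
MMAF_CUTOFF = {"AD": 0.001, "AR": 0.01}


@dataclass
class Variant:
    gene: str
    inheritance: str   # "AD" or "AR"; hypothetical annotation of the gene list
    effect: str        # e.g. "missense", "nonsense", "splicing", "synonymous"
    mmaf: float        # assumed: highest minor allele frequency across databases


def keep_variant(v: Variant, syndrome_prevalence: Optional[float] = None) -> bool:
    """Apply the gene-list filters; the prevalence value is optional user input."""
    if v.effect == "synonymous":          # keep only splicing and non-silent variants
        return False
    if v.mmaf >= MMAF_CUTOFF[v.inheritance]:
        return False
    # The study excluded these variants manually; this comparison is a simplification.
    if syndrome_prevalence is not None and v.mmaf > syndrome_prevalence:
        return False
    return True


apc = Variant(gene="APC", inheritance="AD", effect="missense", mmaf=0.0005)
print(keep_variant(apc))                            # passes the 0.1% cut-off
print(keep_variant(apc, syndrome_prevalence=1e-4))  # fails the prevalence check
```

Keeping the prevalence check as a separate, optional argument mirrors the two-stage process in the text: an automatic frequency filter followed by a gene-specific review.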

| Analysis of missense variants
In this analysis, we selected all missense variants in the exomes of the 51 CRC patients with MMAF <0.1% in all public databases. We then grouped the variants by CADD score (Kircher et al., 2014) ("more than 20", "more than 25", and "more than 30") and by ExAC missense Z-score (Lek et al., 2016) ("less than or equal to 3" and "more than 3") (Figure 3).
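The grouping above can be illustrated with a small Python sketch. It assumes each variant is reduced to a hypothetical `(cadd, missense_z)` pair and treats the CADD groups as nested (a variant with CADD >30 also counts in the >25 and >20 groups), which is how the counts are reported in the Results. The function name and the input values are made up for illustration.

```python
CADD_THRESHOLDS = (20, 25, 30)   # "more than 20", "more than 25", "more than 30"
Z_CUTOFF = 3                     # ExAC missense Z-score: "<=3" versus ">3"


def group_missense(variants):
    """Count variants per (CADD threshold, Z-score class); CADD groups are nested.

    variants: iterable of (cadd, missense_z) tuples (hypothetical input format).
    """
    counts = {(t, z): 0 for t in CADD_THRESHOLDS for z in ("<=3", ">3")}
    for cadd, missense_z in variants:
        z_class = ">3" if missense_z > Z_CUTOFF else "<=3"
        for threshold in CADD_THRESHOLDS:
            if cadd > threshold:
                counts[(threshold, z_class)] += 1
    return counts


# Made-up scores, for illustration only.
example = [(21.0, 1.0), (26.0, 3.5), (31.0, 3.0), (19.0, 5.0)]
for key, n in sorted(group_missense(example).items()):
    print(key, n)
```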

FIGURE 1 Autosomal dominant and autosomal recessive analysis in cancer susceptibility gene list. Exome data of 51 early-onset CRC cases were filtered for variants present in the in silico cancer gene list; variants with MMAF less than 0.1% if the gene is an autosomal dominant cancer gene, or less than 1% if the gene is an autosomal recessive cancer gene; and variants with MMAF less than the prevalence of the cancer syndrome suggested by the gene. The remaining variants were classified using the American College of Medical Genetics and Genomics and the Association for Molecular Pathology (ACMG-AMP) guidelines criteria (Richards et al., 2015).

2.7.4 | Autosomal recessive genes analysis (rare monogenic and less common risk genes)
To pinpoint autosomal recessive genes predisposing to early-onset colorectal cancer, we hypothesized that the patients inherited biallelic risk variants, one from each parent, in a homozygous or compound heterozygous state. A gene was assumed to be associated with an increased risk of CRC if cases with possible biallelic variants were more prevalent than in a control population. Because individual-level genotype data from a normal population were not available, we used a cohort of familial breast cancer (BRC) cases, processed in the same way as the CRC samples, as a comparison group, assuming that possible biallelic variants predisposing to early-onset colorectal cancer are not associated with familial breast cancer. Under this hypothesis, we searched the exome for possible biallelic variants in our 51 early-onset CRC cases, with 56 familial BRC cases as a comparison group. In the first filtering step, we selected genes with splicing and non-silent variants that had an MMAF <20% and were predicted to be pathogenic by more than 4 out of 9 in silico effect predictors (SIFT (Kumar et al., 2009), PolyPhen2 HDIV (Adzhubei et al., 2010), PolyPhen2 HVAR (Adzhubei et al., 2010), PhyloP (Cooper et al., 2005), LRT (Chun & Fay, 2009), Mutation Taster (Schwarz et al., 2010), Mutation Assessor (Reva et al., 2011), FATHMM (Shihab et al., 2015), GERP++ (Davydov et al., 2010)).
The next filtering step was to select genes with possible biallelic variants (possible compound heterozygous or homozygous) in at least two CRC cases and with no possible compound heterozygous or homozygous case in the familial breast cancer cohort. If any two variants always had a similar MAF across population allele frequency databases, or if the variants appeared together in multiple samples, they were considered likely to be on the same allele and were excluded (Figure 4). The variants were prioritized based on MMAF and on the observed frequency in CRC compared with the statistical likelihood of the two alleles occurring together (computed by multiplying the MMAF of the two alleles).
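The two filtering steps can be sketched as follows. This is a minimal illustration under stated assumptions rather than the study's implementation: the data layout (a mapping from sample to gene to genotype calls) and all function names are hypothetical, "more than 4 out of 9" is read as at least 5 predictors, and the same-allele exclusion step is omitted because it requires population frequency data not modelled here.

```python
from collections import defaultdict


def predicted_damaging(calls, min_votes=5):
    """calls: dict predictor name -> bool. 'More than 4 out of 9' means >= 5 votes."""
    return sum(calls.values()) >= min_votes


def possibly_biallelic(genotypes):
    """genotypes: list of 'het'/'hom' calls for one sample in one gene.

    One homozygous variant or two heterozygous variants count as possibly
    biallelic; phase is unknown, hence 'possibly'.
    """
    return "hom" in genotypes or genotypes.count("het") >= 2


def candidate_recessive_genes(crc, brc, min_cases=2):
    """crc, brc: dict sample -> dict gene -> list of genotype calls that passed
    the MMAF and predictor filters (hypothetical input format)."""
    def carriers(cohort):
        n = defaultdict(int)
        for genes in cohort.values():
            for gene, genotypes in genes.items():
                if possibly_biallelic(genotypes):
                    n[gene] += 1
        return n

    crc_n, brc_n = carriers(crc), carriers(brc)
    return sorted(g for g, n in crc_n.items()
                  if n >= min_cases and brc_n.get(g, 0) == 0)


# Made-up genotypes for three CRC cases and two comparison samples.
crc = {"c1": {"G1": ["het", "het"], "G2": ["hom"]},
       "c2": {"G1": ["hom"], "G2": ["het"]},
       "c3": {"G2": ["hom"], "G3": ["het", "het"]}}
brc = {"b1": {"G2": ["hom"]}, "b2": {"G1": ["het"]}}
print(candidate_recessive_genes(crc, brc))
```

In this toy input, only the gene with two possibly biallelic CRC cases and no such case in the comparison cohort is retained.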

| Database submission of novel variants
Relevant variant information has been uploaded to the Leiden Open Variation Database (LOVD), http://www.lovd.nl/3.0

| RESULTS
We first searched for variants in known cancer susceptibility genes (Figure 1), resulting in eight heterozygous variants in eight genes (Table 1, Figure 1). Among them, the variant in PTEN, p.Ser59Ter, has been reported to be pathogenic in ClinVar (Landrum et al., 2018) and had been reported as a somatic mutation 15 times in the TCGA dataset in various tumor types in cBioportal (Cerami et al., 2012;Gao et al., 2013). The patient was subsequently examined at the Department of Clinical Genetics at an age of 50 years and found to have macrocephaly (OFC: 65 cm), papillomatous lesions on hands and oral mucosa, intellectual disability (IQ = 84), and hamartomatous intestinal polyps. He was sent to surveillance of the thyroid according to clinical guidelines and multinodular goiter was detected. In all, these findings confirmed the clinical diagnosis of Cowden syndrome/ Hamartoma tumor syndrome (MIM #601728). His parents were both 23 years of age when he was born and neither had had any cancer. Genetic counseling was offered to his family  members. Unfortunately, no family members have contacted us and we have therefore not been able to perform genetic testing of the parents/siblings in order to confirm a de novo or familial variant. There were six candidate missense variants in the following cancer genes: MSH2, APC, PTPN12, BMPR1A, POLE, and SRC, and one inframe duplication in BRIP1. All were classified as variants of unknown significance according to ACMG criteria ( Table 1). All of these genes are expressed in colon tissue (www.genecards.org), but none of these specific variants have been reported as somatic variants in 3,473 colorectal cancer samples or in any other tumor types in cBioPortal. After our study, immunohistochemistry of mismatch repair genes was performed as part of another research project on one patient with CRC at 37 years. His tumor showed loss of PMS2 protein. 
Targeted sequencing of PMS2 (based on a clinical nested PCR approach) detected a NM_000535.5:c.2113G>A, p.Glu705Lys variant that is known to be pathogenic and has been reported as causative in many families, even though the INSIGHT expert panel have interpreted it as a variant of unknown significance (ClinVar). Thereafter, we manually checked the sequencing files of the other patients for poor quality PMS2 variants and no other possible pathogenic variant in PMS2 was found. No homozygous/compound heterozygous variants were found among the 244 cancer genes. In addition to the cancer gene list, we also performed an analysis on frameshift-, nonsense-, and splice variants in the exome ( Figure 2). After Sanger verification, apart from the variant in PTEN, there were 10 possible candidate variants from 10 genes (CLSPN, CELSR2, ADAM17, BIRC6, SEC24B, RBM27, PPARGC1B, NCOA7, SSH2, and MYO9B) (1 frameshift deletion, 7 nonsense-, and 2 splice variants) ( Table 2). All these genes are expressed in the normal colon tissue (genecards). None of the variants were found in cBioPortal.
In the missense variant analysis (Figure 3), there were in total 3,800 missense variants with MMAF <0.1% in all public databases. To select candidate variants, we retained those with a CADD score (Kircher et al., 2014) of more than 20 (n = 2,301) (Supporting Information, Table S1). Among them, 248 variants were in genes in which the normal population has fewer missense variants than expected (ExAC missense Z-score >3; Lek et al., 2016) (Supporting Information, Table S2), suggesting that missense variants might be detrimental to normal gene function. If we instead categorize the 2,301 variants by higher CADD score, 1,140 variants had a CADD score >25 and 367 variants had a CADD score >30 (Supporting Information, Table S2). Of the 2,301 variants, 669 had never been reported in a normal population. Of those with CADD >30, 82 had never been reported, and eight of these were in a gene where missense variants were expected to be deleterious based on the constraint metric (ExAC missense Z-score; Lek et al., 2016), resulting in the following candidate gene list: ACACA, KIAA1109, GPHN, NR2C2, GPR25, ZNF462, INPP4A, and DIDO1 (Table 3).
In the autosomal recessive gene analysis (Figure 4), there were 16 genes in which at least two CRC cases had homozygous or possible compound heterozygous variants and no such case occurred in the breast cancer cohort used as a comparison group (Table 4). Six genes (ATP10B, PKHD1, PTPRQ, UGGT2, MYH13, and TFF3) had variants with MMAF <5% that were observed together in CRC cases twenty times more often than expected from the computed likelihood. These six genes are therefore more likely to be candidate autosomal recessive risk factors.
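The observed-versus-expected comparison can be made concrete with a short sketch. It follows the description in the Methods (expected co-occurrence computed by multiplying the MMAF of the two alleles); the function names and the example numbers are hypothetical and are not taken from Table 4.

```python
def expected_cooccurrence(mmaf_a, mmaf_b):
    """Expected chance of the two alleles occurring together: product of MMAFs."""
    return mmaf_a * mmaf_b


def enrichment(n_cases_with_pair, n_cases, mmaf_a, mmaf_b):
    """Ratio of the observed case frequency to the expected co-occurrence."""
    observed = n_cases_with_pair / n_cases
    return observed / expected_cooccurrence(mmaf_a, mmaf_b)


# Made-up example: two of 51 cases carry a variant pair, each allele at MMAF 4%.
ratio = enrichment(n_cases_with_pair=2, n_cases=51, mmaf_a=0.04, mmaf_b=0.04)
print(f"observed/expected = {ratio:.1f}")
```

For a homozygous variant, the same function applies with `mmaf_a` equal to `mmaf_b`. With only 51 cases, the smallest non-zero observed frequency at the two-case threshold is 2/51, so the ratio is most informative for ranking genes rather than as a formal test.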

| DISCUSSION
Since onset of colorectal cancer before the age of 40 years is infrequent, genetic predisposition may be suspected. Our cohort of 51 simplex cases with early-onset CRC tested two hypotheses: first, a pathogenic variant in an autosomal dominant cancer gene; and second, an autosomal recessive cancer syndrome.

FIGURE 4 Autosomal recessive genes analysis. Exome data of 51 early-onset CRC cases and 56 familial BRC samples were filtered for splicing and non-silent variants with MMAF less than 20% that were predicted to be pathogenic by more than 4 out of 9 in silico predictors. Genes with possible biallelic variants were then selected if at least two CRC cases had homozygous or possible compound heterozygous variants in the gene and the familial breast cancer cohort had none. Possible compound heterozygous variants were considered to be on the same allele and were removed if any two variants always had a similar MAF among population allele frequency databases or the variants showed up together in multiple samples.
In order to address the first hypothesis, we searched for heterozygous variants in genes known to be important in hereditary cancer or known somatic driver genes according to the classification by the Vogelstein group (Vogelstein et al., 2013) (Table 1). We found one pathogenic variant in PTEN, p.Ser59Ter, known to cause Cowden/hamartoma tumor syndrome; the syndrome was later confirmed clinically in the patient, and genetic counseling was offered to his relatives. While the other variants (Table 1) suggested by the cancer gene list approach are of unknown clinical significance (Landrum et al., 2018), the genes are known to be cancer-related. As we did not have access to tumor material or DNA from relatives, further analyses of gene expression in tumors and/or segregation in the family unfortunately could not be performed. One patient demonstrated loss of PMS2 protein in his tumor and harbored a pathogenic variant in exon 12 of PMS2 that was detected by a clinical lab. This variant was initially missed by our WES due to poor quality in the mapping step, a result of the pseudogene PMS2CL, which is highly homologous to exons 9 and 11-15 (Takeda et al., 2014). Loss of function of the APC gene leads to the development of adenomas, a precursor lesion to CRC, in familial adenomatous polyposis (MIM #175100). This condition occurs in <1/10,000; thus the p.Asn944Asp variant, with an MMAF of 0.05%, is probably too common to be disease-causing. Pathogenic variants in MSH2 are known to cause Lynch syndrome (MIM #120435). The incidence of Lynch syndrome has been estimated to be 1/660-1/2,000 (de la Chapelle, 2005) and approximately 40% of cases are caused by MSH2 mutation (i.e., incidence of MSH2-Lynch syndrome: 1/1,650-1/5,000) (Moller et al., 2017). In the ClinVar database, there are 850 pathogenic variants in MSH2; because the syndrome incidence is shared among many different variants, the p.Val722Phe MSH2 variant, with a frequency of 1/5,000 (0.02%), is probably too common to be disease-causing.
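The frequency arguments for APC and MSH2 above are simple arithmetic and can be checked with exact fractions. The sketch below only uses the figures quoted in the text. The helper name `too_common` is hypothetical, and comparing a variant frequency directly with syndrome incidence is the same simplification the text makes.

```python
from fractions import Fraction

# Lynch syndrome incidence range and the share attributed to MSH2, as quoted above.
lynch_incidence = (Fraction(1, 660), Fraction(1, 2000))
msh2_share = Fraction(40, 100)
msh2_incidence = tuple(x * msh2_share for x in lynch_incidence)
print(msh2_incidence)  # upper and lower estimates for MSH2-Lynch syndrome


def too_common(variant_freq, syndrome_incidence):
    """A single variant at or above the whole syndrome's incidence is implausible
    as a cause when many different pathogenic variants share that incidence."""
    return variant_freq >= syndrome_incidence


# MSH2 p.Val722Phe at 1/5,000 versus the lower MSH2-Lynch estimate.
print(too_common(Fraction(1, 5000), msh2_incidence[1]))
# APC p.Asn944Asp at 0.05% versus FAP at <1/10,000.
print(too_common(Fraction(5, 10000), Fraction(1, 10000)))
```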
PTPN12 is critical for the regulation of cell proliferation, differentiation, and neoplastic transformation (Takekawa et al., 1992) but is not associated with a known clinical cancer syndrome. POLE encodes the catalytic subunit of DNA polymerase epsilon and mutations cause susceptibility to CRC and other tumor types (MIM # 615083). Both of these genes are somatically mutated in CRC, but these variants were also present in 0.01% of a normal population and are likely too common to be cancer predisposition variants. The BMPR1A, BRIP1 and SRC variants have never been reported in healthy individuals which makes them highly interesting. BMPR1A variants can cause juvenile intestinal polyposis (MIM #174900). However, the only clinical information we had available from the patient was microsatellite-stable CRC at 35 years with no information  Rosenthal et al., 2018). However, the pathogenicity of our in-frame BRIP1 duplication is not known. SRC, encoding a tyrosine kinase receptor, is frequently implicated in cancer (Turro et al., 2016). It was suggested as a putative CRC gene based on proteomic analysis (Zhang et al., 2014) and may have a role in colon cancer progression (Irby et al., 1999).
In an attempt to search for novel cancer genes, we identified 10 truncating variants (in addition to the pathogenic PTEN variant) (Table 2) in genes which are not known to be susceptibility genes for cancer: CLSPN, CELSR2, ADAM17, BIRC6, SEC24B, RBM27, PPARGC1B, NCOA7, SSH2, and MYO9B. All these genes are expressed in colon tissue (genecards.org). Seven of these have never been reported in normal databases, supporting their role as rare risk variants. Three have a function highly related to cancer. CLSPN is involved in the DNA damage checkpoint (Chini & Chen, 2003) and is required for the activation of CHK1 during a DNA replication checkpoint response (Kumagai & Dunphy, 2000). It is also involved in DNA replication and repair of DNA damage repair and may function as a tumor suppressor gene (Azenha, Lopes, & Martins, 2017). Nine somatic truncating mutations in CLSPN have been reported in CRC in cBio-Portal including one at position 1139, that is, more distal to our variant. Therefore, this gene is the most likely candidate activating EGFR (Zunke & Rose-John, 2017). ADAM17 is upregulated in colorectal cancer cells (Blanchot-Jossic et al., 2005) and blocking ADAM17 in mouse colorectal cancer xenografts inhibited tumor growth (Rios-Doria et al., 2015). In humans, loss of both alleles of ADAM17 can cause a rare neonatal autosomal recessive inflammatory bowel and skin disease (MIM #603639). As both CELSR2 and ADAM17 seem to have growth-stimulating functions, the loss-of-function mutations are less likely candidates for cancer predisposition. The variants in BIRC6, RBM27, and NCOA7 were found in 0.02%-0.05% of the reference population which reduces the likelihood of them being pathogenic. Of these, BIRC6 exhibited a role in resistance to apoptosis (Chen et al., 1999;Hao et al., 2004). 
The gene has been suggested to be a prognosis predictor in colorectal cancer (Hu et al., 2015) and BIRC6 mutations have been found in colorectal adenomas  and are common in carcinomas (Wolff et al., 2018). NCOA7/ERAP140 is a nuclear receptor coactivator that binds to the estrogen receptor among others. It has been implicated as a risk factor for breast cancer (Higginbotham et al., 2011) but has as yet no known association to CRC. It has been shown to be over-expressed in oral squamous cell carcinoma where it promotes cell proliferation (Xie et al., 2016). Two genes have some association to cancer but not to CRC: PPARGC1B is a co-activator for ESR1 and variants in the gene have been associated with increased risk for estrogen-positive breast cancer (Li et al., 2011). One study demonstrated that MYO9B was involved in the migration of prostate cancer cell lines and might be important for metastasis (Makowska, Hughes, White, Wells, & Peckham, 2015). To date, no studies have implicated RBM27, SEC24B, or SSH2 in CRC or cancer development. As the variants in SEC24B and SSH2 have never been reported in the normal population databases, they are also possible candidate genes. In total, 3,800 rare missense variants were found in the exomes of the 51 young CRC cases. In order to try to reduce the number of candidates to consider, we filtered using CADD score > 20 which left 2301 missense variants (Supporting Information, Table S2). In order to pinpoint the candidate variants, we used even more stringent criteria (higher CADD, variants that were never previously reported and in genes that normally do not have many missense variants) giving a short-list of eight variants. 
CADD represents multiple lines of in silico evidence of pathogenicity and can predict pathogenicity reasonably well, however, in silico tools (such as CADD and the ExAC missense Z-score) will also miss clinically relevant pathogenic variants (van der Velde et al., 2015) and therefore this way of filtering is not optimal. In this way, we have probably excluded several variants that may have a pathogenic effect and in the future when many hundred thousand individuals have been sequenced in population databases, we may be able to filter solely on MMAF. Among the eight top candidate variants, one has been reported to be a possible tumor suppressor gene: INPP4A. INPP4A has been shown to be downregulated in pancreatic cancer and to inhibit cell proliferation and promote apoptosis in bladder and pancreatic cancer cells (Wang, Feng, Jiang, & Zuo, 2017;Wang, Wu, Huang, & Chen, 2018). Three have been described in cancer, but their function remains unclear. ACACA (acetyl-CoA carboxylase or ACC1) catalyses the rate-limiting reaction in the biogenesis of long-chain fatty acids and is thus vital for cancer cell survival during hypoxia (Gao et al., 2016). Inhibition of ACACA can lead to either decreased cell proliferation (Jones et al., 2017;Singh, Yadav, Kumar, & Saini, 2015) or decreased apoptosis (Keenan et al., 2015) and increased risk of metastasis/tumor recurrence (in mice) (Rios Garcia et al., 2017). NR2C2 can inhibit cancer initiation, but promote cancer progression (Lin et al., 2017). In colon cancer cell lines, NR2C2 is required for cell survival and its inhibition induces cell death (McNew et al., 2016;Singh et al., 2012). DIDO1 regulates embryonic stem cell renewal  and is upregulated in melanoma tissues and cell lines as well as colorectal tumors (Braig & Bosserhoff, 2013;Sillars-Hardebol et al., 2012), although in the latter, there was no correlation to gene copy number and DIDO1 was suggested to be a passenger on the common 20q duplication seen in CRC. 
The remaining four have no known cancer-related function (KIAA1109, GPHN, GPR25, ZNF462). Thus, whole exome sequencing identified a large number of possible candidate variants in this study. By using very stringent criteria, we could identify four possible candidate genes with a putative role in cancer; however, we have probably excluded several candidate variants in the filtering. The challenge is therefore how to rank the variants in the candidate list to obtain an optimal balance between sensitivity and specificity.
In the search for a recessive syndrome, we found no cases of rare biallelic variants in any known cancer genes in our patients. We also searched the entire exome for more common autosomal recessive alleles and listed 16 possible candidate genes (Table 4), with six of them (ATP10B, PKHD1, PTPRQ, UGGT2, MYH13, and TFF3) more likely than the others based on an MMAF <5% and an observed frequency 20 times higher than the expected likelihood of occurring together. ATP10B is the catalytic part of a complex that catalyzes the hydrolysis of ATP coupled to the transport of aminophospholipids from the outer to the inner leaflet of the late endosomes (UniProt). It has been reported to be somatically mutated in 1.2% of all CRC in cBioPortal (accessed 18 December 2018), but its possible function in cancer is not yet known. Mutations in PKHD1 are known to cause autosomal recessive polycystic kidney disease (MIM #606702). The gene encodes a protein that is important for correct mitotic spindle formation and function, and its inhibition leads to mitotic defects (Zhang et al., 2010). PKHD1 is mutated in 5% of all CRC in cBioPortal, and in an analysis of 13,023 genes in 11 colorectal cases, PKHD1 was ranked as the seventh most commonly somatically mutated gene (Sjoblom et al., 2006). However, whether the finding reached statistical significance has been debated (Forrest & Cavet, 2007; Ward et al., 2011). PTPRQ, protein tyrosine phosphatase receptor-like type Q, was reported to be involved in phosphorylation/dephosphorylation signaling pathways and metastasis (Laczmanska et al., 2014, 2016; Sato et al., 2017). Somatic mutations in this gene are found in 1.2% of all CRC (cBioPortal, accessed 18 December 2018); however, as a suspected oncogene, it is a less likely candidate for an autosomal recessive predisposition gene. UGGT2 is a uridine diphosphate-glucose:glycoprotein glucosyltransferase (Takeda et al., 2014).
It has no known function in cancer, although 2.4% of all CRC carry somatic mutations in UGGT2 (cBioPortal). MYH13 also has no known role in cancer, but somatic mutations are found in 2.3% of CRC (cBioPortal). TFF3, trefoil factor 3, is a secreted protein that stimulates cell migration and prevents apoptosis, enabling repair of the intestinal mucosa. It was suggested to be a risk factor for early recurrence of CRC and to be involved in promoting lymph node metastasis (Huang, Li, Wang, & Zhang, 2013; Morito et al., 2015), although TFF3 is very rarely mutated in CRC tissue (0.1%, cBioPortal). In all, the relevance of our detected possibly biallelic variants is unknown. In summary, we have detected six putative candidate genes, of which five (ATP10B, PKHD1, UGGT2, MYH13, and TFF3) may be recessive candidate risk genes for CRC.
Using the breast cancer cohort as a comparison group in the search for autosomal recessive genes may not be an ideal approach, since some genes predispose to both breast and colorectal cancer. However, to look for possible compound heterozygous variants, we need a cohort for which all individual genotype information is available. We considered using genotype data from public databases, but there are two issues. First, because of data privacy, far fewer samples have individual genotype information available than contribute to allele frequency summaries. The second issue is platform error: not all variants seen in the analysis originate from the samples. Errors can arise at any step after DNA isolation, including contamination, library preparation, sequencing, and computational processing. An advantage of our in-house BRC cohort is that the individuals are Swedish, the same population as the study group. In addition, the DNA from the BRC cohort was collected and processed in the same way as that of the study group. These advantages eliminate artifacts caused by differences between populations and between platforms.
The number of our proposed candidate truncating variants (10 variants in 51 cases) is slightly lower than expected compared with a similar study (26 of the 102 cases) (Adam et al., 2016), considering that our frequency cut-off is less stringent (0.1% vs. 0.01%). This discrepancy can likely be explained by the number of databases and sub-populations used when filtering for rare variants. Adam et al. filtered their variants using 12 population databases, including the seven ExAC cohorts. Our proposed candidate variants were required to have an MMAF <0.1% in 21 population databases. Among them, SweGen (Ameur et al., 2017) and our in-house database of 249 Swedish samples played a major role in removing normal variants belonging to the Swedish population. The ALL population in ExAC and gnomAD (Lek et al., 2016), together with their other seven sub-populations, even though they overlap, contributed significantly with variants belonging to other populations. Databases of 200 Danish samples and 1000Genomes (ALL and NFE populations) (1000Genomes Project Consortium et al., 2012) were also included to increase the sensitivity. This shows the importance of using population-specific databases of normal variants in the filtering stage. Our results (2/51 = 4% with a hereditary cancer syndrome: Cowden and PMS2-Lynch syndrome) are comparable to those of others. For example, Pearlman et al. showed that 2% of all CRC cases with onset before 50 years had a hereditary cancer syndrome other than Lynch syndrome (APC, MUTYH, SMAD4). They detected Lynch syndrome in 8%, especially in those with a family history (Pearlman et al., 2017). In our cohort, we selected patients with no family history, which largely excludes Lynch syndrome, although CRC caused by PMS2 mutation, with its low penetrance, can occur in the absence of family history. As PMS2 has multiple pseudogenes, pathogenic variants are often missed by WES and need to be manually analysed.
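When comparing the 2/51 yield with other cohorts, it can help to keep the sampling uncertainty of such a small proportion in mind. The sketch below computes a Wilson score interval using only the standard library. The choice of the Wilson method and the 95% level (z = 1.96) are assumptions made for illustration and are not part of the original analysis.

```python
import math


def wilson_interval(successes, n, z=1.96):
    """Wilson score interval for a binomial proportion (z = 1.96 for about 95%)."""
    p = successes / n
    denom = 1 + z**2 / n
    centre = (p + z**2 / (2 * n)) / denom
    half_width = z * math.sqrt(p * (1 - p) / n + z**2 / (4 * n**2)) / denom
    return centre - half_width, centre + half_width


low, high = wilson_interval(2, 51)
print(f"2/51 = {2 / 51:.1%}, interval {low:.1%} to {high:.1%}")
```

The interval is wide because the cohort is small, which is worth bearing in mind when setting the 4% figure against percentages from larger studies.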
Simplex CRC cases are often caused by other mechanisms such as somatic mutations (Haraldsdottir et al., 2014; Mensenkamp et al., 2014) or epigenetic aberrations (Moreira et al., 2015). In the near future, patients with cancer will likely undergo paired tumor and germline testing in order to detect somatic aberrations that can guide therapy as part of genomic medicine initiatives across the Western world. As a bonus, both hereditary variants and somatic driver mutations can be identified in the same analysis. Until such pipelines are established, our results and those of others suggest that, in the absence of family history or other features suggestive of hereditary syndromes, such as polyposis or multiple tumors, genetic testing for germline mutations in young CRC patients is of limited clinical benefit.

| CONCLUSION
Whole exome sequencing in early-onset non-familial CRC patients identified a causative germline mutation in only two of 51 patients (4%). One patient had Cowden syndrome that had not been diagnosed clinically, and the other had PMS2-Lynch syndrome that was difficult to detect using WES. Our results suggest that germline analysis using WES or broad gene panels in simplex cases with CRC at a young age is of limited value in the clinic. In addition, we propose three candidate variants in known cancer susceptibility genes (BMPR1A, BRIP1, and SRC), up to three truncating variants in possibly novel cancer genes (CLSPN and possibly SEC24B and SSH2), four missense variants in genes involved in cancer initiation or progression (ACACA, NR2C2, INPP4A, and DIDO1), and five candidate recessive risk genes (ATP10B, PKHD1, UGGT2, MYH13, and TFF3). Further studies are needed to find support for the pathogenicity of these variants in novel Mendelian or complex disease in early-onset non-familial CRC.