Somatic mutations and germline sequence variants in patients with familial colorectal cancer



It is estimated that up to 35% of colorectal cancers (CRC) can be explained by hereditary factors. However, genes predisposing to highly penetrant CRC syndromes account for only a small fraction of all cases. Thus, most CRCs still remain molecularly unexplained. A recent systematic sequencing study on well-annotated human protein coding genes identified 280 somatically mutated candidate cancer genes (CAN genes) in breast and colorectal cancer. It is estimated that 8% of all reported cancer genes show both somatic and germline mutations. Therefore, the identified CAN genes serve as a distinct set of candidates for being involved in hereditary susceptibility. The aim of this study was to evaluate the role of colorectal CAN genes in familial CRC. Samples from 45 familial CRCs without known cancer predisposing mutations were screened for somatic and germline variants in 15 top-ranked CAN genes. Six of the genes were found to be somatically mutated in our tumor series. We identified 22 nonsynonymous somatic mutations of which the majority was of missense type. In germline, three novel nonsynonymous variants were identified in the following genes: CSMD3, EPHB6 and C10orf137, and none of the variants were present in 890 population-matched healthy controls. It is possible that the identified germline variants modulate predisposition to CRC. Functional validation and larger sample sets, however, will be required to clarify the role of the identified germline variants in CRC susceptibility.

Colorectal cancer (CRC) is the third most common cancer with around 50,000 estimated deaths in the United States in 2009.1 Current studies estimate that up to 35% of CRCs involve inherited susceptibility.2 However, germline mutations in genes of high penetrance, such as APC in FAP and MLH1 in HNPCC, are believed to explain only a subset of hereditary CRCs.3 Thus, the genetic etiology of many familial cases remains unexplained. It has been hypothesized that much of the remaining inherited genetic risk is a consequence of common low-penetrance and rare moderate-penetrance variants. Candidate gene sequencing approaches are well suited to detecting moderate-penetrance genes that methods such as linkage analysis and genome-wide association (GWA)-based strategies cannot detect. Further efforts to identify additional susceptibility genes are well justified to reduce morbidity and mortality caused by CRC.

The first genome-wide effort to identify genes mutated in cancer was carried out by Sjöblom et al.4 in a study where more than 13,000 genes in 11 breast and 11 colorectal cancers were analyzed. In the first screen, a total of 1,307 somatic mutations in 1,149 genes were identified. With a statistical method that took into account variations in mutation frequency, context and nucleotide type affected, a set of candidate cancer genes (CAN genes) was attained. The identified CAN genes, 189 in total, were described as true driver genes with mutation frequency higher than expected by chance alone. A more complete picture of the cancer genome was presented by Wood et al.5 in a follow-up study of the previous effort, including the analysis of around 18,000 protein-coding genes. Rank ordered lists of 280 CAN genes, equally spread between breast and colorectal cancer, were obtained.

Today, as many as 418 cancer genes have been identified, and of these 8% show both somatic and germline mutations. On the Human Cancer Gene Census website, 15 somatically mutated CRC genes are listed, and germline mutations have been reported in 6 of them. This indicates that the somatically mutated CAN genes serve as a distinct set of candidate genes for being involved in hereditary susceptibility. The aim of this study was to evaluate the role of colorectal CAN genes in familial CRC. The mutational profiles of the 15 top-ranked CAN genes were analyzed for somatic and germline variants in a series of 45 familial CRC patients.4, 5


CRC: colorectal cancer; CAN genes: candidate cancer genes; FAP: familial adenomatous polyposis; HNPCC: hereditary nonpolyposis colorectal cancer; GWA: genome-wide association

Material and Methods

Study subjects

The CRC samples and corresponding normal tissues used in this study were selected from a population-based set of 1,042 samples. These were prospectively collected at nine Finnish central hospitals between 1994 and 1998. This study was approved by the Helsinki University Hospital Ethics Committee. Complete clinical data, including age at disease onset, family history, pathology reports, tumor grade and stage, are described in more detail in previous works.7, 8 After extensive analyses, 40 of these 1,042 samples have been identified as carriers of mutations in known CRC susceptibility genes, whereas 113 familial cases have remained mutation negative. In this study, the focus is on mutation-negative familial cases, with typically one additional CRC case in first-degree relatives. From these, 45 tumor samples previously evaluated by a pathologist were selected. They all displayed >70% carcinoma tissue, were of grades I–III and of microsatellite stable status. Respective normal samples were available from all cases. The average age at diagnosis was 71.9 years (SD, ±11 years; median age, 73 years), and 13% of the patients had more than one diagnosed CRC case in a first-degree relative. DNA from paraffin-embedded tissues was derived for selected first-degree relatives. Blood DNA samples from population-matched healthy individuals, obtained from the Finnish Red Cross Blood Transfusion Service, were used as controls.

Candidate gene selection

A recent large-scale sequencing effort identified 140 possible colorectal CAN genes.5 In our study, 15 top-ranked CAN genes were selected and analyzed for mutational profile (Table 1). We excluded the following CAN genes: APC, TP53, SMAD4 and PTEN, which are known CRC-predisposing genes. KRAS has been extensively studied in familial CRC and was also left out of our analysis. All coding exons and adjacent splice sites were covered, based on the Reference Sequences with the longest coding region (NCBI database, Build 36.1, released March 2006).

Table 1. CAN genes screened for mutation profile

Sequencing methods

Primers were designed using the ExonPrimer program. Primers for amplifying DNA extracted from paraffin-embedded tissue were designed with the Primer3 program (http://frodo. Primer sequences and PCR conditions are available on request. The fragments were amplified using AmpliTaqGold® enzyme (Applied Biosystems, Foster City, CA). PCR products were purified enzymatically with ExoSAP-IT PCR purification kit (USB Corporation, Cleveland, OH). Direct sequencing was done using the Big Dye Terminator kit 3.1 (Applied Biosystems, Foster City, CA) and an ABI3730 Automatic DNA Sequencer (Applied Biosystems, Foster City, CA). All methods were performed according to the manufacturers' instructions.

Sequencing strategy

The initial sample panel consisted of 45 familial colorectal tumor samples. All variants observed and not found in SNP databases were confirmed by reamplifying and sequencing the tumor DNA. The respective normal samples were screened to distinguish between somatic and germline origin. A panel of control samples from anonymous Finnish blood donors was examined to exclude polymorphisms. A total of 480 control samples were selected according to patients' place of birth. If data from the first 45 patients were compatible with possible identification of a novel hereditary predisposing gene, the analysis was further extended to 61 familial CRC normal tissue DNA samples. The extended panel consisted of both microsatellite unstable (28) and microsatellite stable (33) cancer patients. Finally, all sequence graphs were examined both manually and by computer analysis (SoftGenetics, State College, PA). All reference sequences were obtained from the NCBI database (Build 36.1, released March 2006).
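
The screening order described above can be summarized as a small decision procedure. The following Python sketch is only an illustration of that logic; the class name, field names and category labels are hypothetical and were not part of the original analysis, which was carried out by manual and software-assisted inspection of sequence traces.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Observation:
    """One variant seen in a tumor sample (all field names are hypothetical)."""
    in_snp_database: bool     # already listed in a public SNP database
    confirmed_in_tumor: bool  # seen again after re-amplifying and re-sequencing tumor DNA
    in_matched_normal: bool   # present in the same patient's normal tissue
    control_carriers: int     # carriers found among population-matched controls


def triage(obs: Observation) -> str:
    """Apply the screening steps in the order given in the text."""
    if obs.in_snp_database:
        return "known polymorphism"
    if not obs.confirmed_in_tumor:
        return "unconfirmed"
    if not obs.in_matched_normal:
        # Absent from matched normal tissue: the change arose in the tumor.
        return "somatic mutation"
    if obs.control_carriers > 0:
        # Inherited and also seen in healthy controls: treated as a polymorphism.
        return "germline polymorphism"
    return "novel germline variant"


if __name__ == "__main__":
    example = Observation(
        in_snp_database=False,
        confirmed_in_tumor=True,
        in_matched_normal=True,
        control_carriers=0,
    )
    print(triage(example))
```

In this sketch, control screening is consulted only for variants that are also present in matched normal tissue, which mirrors the use of the control panel to exclude germline polymorphisms.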

Genotyping

Genotyping of 967 CRC cases and 410 healthy controls for CSMD3 c.4045 T>G, EPHB6 c.961 G>C and C10orf137 c.872 T>C was carried out using iPLEX Gold chemistry (Sequenom, San Diego, CA). The genotyping of SNP markers was performed by the Institute for Molecular Medicine Finland FIMM Technology Centre, University of Helsinki. Detailed protocols and primer sequences are available on request. The average genotyping success rate was 99.7%.

RNA extraction

Total cellular RNA was extracted using the RNeasy kit (Qiagen, Valencia, CA) and used for cDNA synthesis by reverse transcription PCR according to the standard protocol (Promega Corporation, Madison, WI).

In silico analysis

Two sequence homology-based tools were used to predict the potential impact of the identified nonsynonymous germline variants on protein function: Sorting Intolerant From Tolerant (SIFT) and Polymorphism Phenotyping (PolyPhen). If the SIFT prediction tolerance index score was ≤0.05, the variation was considered deleterious. Predictions made by PolyPhen were assigned as “probably damaging,” “possibly damaging” or “benign.” The potential splicing effects of the germline variants were predicted by two splice site prediction programs: NetGene2 and the Berkeley Drosophila Genome Project (BDGP, tools/splice.html).
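
The decision rule for the two prediction tools can be written out as a minimal Python sketch. The SIFT cut-off (≤0.05) and the three PolyPhen categories are taken from the text above; the function names are hypothetical, and the way the two results are combined (both tools must flag the variant, with "possibly damaging" counted as a flag) is an assumption made for illustration rather than a rule stated in this study.

```python
SIFT_DELETERIOUS_MAX = 0.05  # tolerance index threshold given in the text

POLYPHEN_CALLS = {"probably damaging", "possibly damaging", "benign"}
# Assumption: both "damaging" categories count as a positive PolyPhen call.
POLYPHEN_DAMAGING = {"probably damaging", "possibly damaging"}


def sift_is_deleterious(score: float) -> bool:
    """A SIFT tolerance index at or below 0.05 is treated as deleterious."""
    return score <= SIFT_DELETERIOUS_MAX


def polyphen_is_damaging(call: str) -> bool:
    """Map a PolyPhen category onto a yes/no call, rejecting unknown labels."""
    if call not in POLYPHEN_CALLS:
        raise ValueError(f"unknown PolyPhen category: {call!r}")
    return call in POLYPHEN_DAMAGING


def flagged_by_both(sift_score: float, polyphen_call: str) -> bool:
    """Assumed combination rule: a variant is flagged only if both tools agree."""
    return sift_is_deleterious(sift_score) and polyphen_is_damaging(polyphen_call)


if __name__ == "__main__":
    # Made-up input values, used only to show the call pattern.
    print(flagged_by_both(0.01, "probably damaging"))
    print(flagged_by_both(0.30, "benign"))
```

Requiring agreement between the two tools is the more conservative choice; a union rule would flag more variants at the cost of more false positives.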

Results

Samples from 45 familial CRCs without known cancer predisposing mutations were screened for somatic and germline variants in 15 CAN genes. In total, 356 exons were sequenced with a mean coverage of 90%. Tumors were first screened, and all variants observed that were not found in SNP databases were sequenced in the respective normal tissue of the individual. Altogether, 22 nonsynonymous somatic mutations and 16 germline variants were identified (Table 2). Each variant occurred only once in the sample panel.

Table 2. Somatic mutations and germline variants identified in CAN genes

The somatic mutations included 20 missense, 1 nonsense (FBXW7, R367X) and 1 splice-site mutation (TCF7L2, IVS10+1 G>A). The effect of the splice-site mutation was analyzed by cDNA sequencing, which revealed abnormal mRNA splicing leading predominantly to skipping of Exon 10. Notably, these changes were not observed in matched normal tissue, demonstrating their somatic origin. Seven of the somatic mutations were novel variants not found in the COSMIC database (Table 2). The most frequently mutated genes were phosphatidylinositol-4,5-bisphosphate 3-kinase catalytic subunit alpha isoform (PIK3CA) and F-box/WD repeat-containing protein 7 (FBXW7, also known as CDC4), with eight mutations in each. The location of the mutations in PIK3CA and FBXW7 was nonrandom. Most of the mutations occurred in known mutational hotspots in the domain regions of the proteins (Table 2).9, 10

Of the 16 identified germline variants, 11 were missense and 5 were silent changes. Silent germline variants were not studied further. The identified missense germline variants were screened in population-matched controls. The following three missense germline variants could not be found in a set of 480 population-matched healthy individuals (average success rate 94%): (i) Erythroid differentiation-related factor 1 (C10orf137) I291T, (ii) CUB and sushi domain-containing protein 3 precursor (CSMD3) F1349V and (iii) Ephrin type-B receptor 6 precursor (EPHB6) A321P (Table 2). In addition, the three germline variants were successfully genotyped in 410 population-matched healthy controls with negative results. To evaluate the contribution of the germline variants in our population-based material,7, 8 967 CRC cases were genotyped. No additional variants were identified.
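
The two control panels add up to the 890 controls quoted elsewhere in the text (480 + 410). As an illustration that was not part of the original analysis, the Python sketch below also computes an exact one-sided upper bound on the carrier frequency that remains compatible with observing no carriers among n controls. The function names are hypothetical, and the calculation assumes that all 890 controls were successfully typed, which slightly overstates the effective sample size given the reported success rates.

```python
def pooled_controls(*panel_sizes: int) -> int:
    """Total number of controls across the screened panels."""
    return sum(panel_sizes)


def upper_bound_zero_carriers(n: int, confidence: float = 0.95) -> float:
    """Exact one-sided upper bound on carrier frequency when 0 of n carry the variant.

    With true carrier frequency p, the chance of seeing no carriers is (1 - p) ** n.
    Setting this equal to 1 - confidence and solving for p gives the bound.
    """
    if n <= 0:
        raise ValueError("n must be a positive integer")
    if not 0.0 < confidence < 1.0:
        raise ValueError("confidence must lie strictly between 0 and 1")
    return 1.0 - (1.0 - confidence) ** (1.0 / n)


n_controls = pooled_controls(480, 410)
bound = upper_bound_zero_carriers(n_controls)

if __name__ == "__main__":
    print(f"controls screened: {n_controls}")
    print(f"95% upper bound on carrier frequency: {bound:.4%}")
```

Under these assumptions the bound is roughly 0.3%, so absence from the controls shows that the variants are rare in the population but does not by itself establish pathogenicity.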

None of the tumor samples showed loss of the wild-type allele, and each of these variants was detected only once in our sample set. In two of these families, one CRC case in first-degree relatives had been diagnosed in addition to the index (Fig. 1). One family had two diagnosed CRC cases in addition to the index. Other cancers had also been identified in these families. The germline variant I291T in C10orf137 was also observed in the affected child, diagnosed with CRC at age 45 years. EPHB6 A321P was not present in the affected brother. Finally, the genes showing the highest somatic mutation frequencies, PIK3CA and FBXW7, were screened for their entire coding region in an additional 61 normal tissue CRC samples. However, no germline variants were identified.

Figure 1.

Pedigrees of CRC cases with the identified germline variants. (a) I291T in C10orf137. (b) F1349V in CSMD3. (c) A321P in EPHB6. The affected proband carrying the variant is indicated by an arrowhead. The germline variant I291T in C10orf137 was also present in the affected son, diagnosed at age 45 years. Symbols denote rectal cancer, colon cancer, gastrointestinal cancer (site indeterminate) and gastric cancer. The age at diagnosis is indicated under each affected individual.

Two in silico prediction programs, SIFT and PolyPhen, were used to estimate the potential impact of the germline variants on protein function. Only EPHB6 A321P was predicted to have a deleterious effect on protein function when considering results from both programs. In addition, none of the germline variants were predicted to have a splicing effect, as tested in silico by the NetGene2 and BDGP programs.

Discussion

A recent systematic sequencing study on breast and colorectal cancer, including the great majority of human genes, identified 280 somatically mutated CAN genes. These included well-known cancer genes, but most had not previously been linked to cancer.5 Because 8% of the genes mutated in cancer show both somatic and germline mutations, we hypothesized that the identified somatically mutated CAN genes may be involved in hereditary predisposition.4, 5 We therefore examined the mutational profile of 15 CAN genes in a panel of 45 familial CRC cases without known CRC predisposing mutations.

Altogether, we identified 22 nonsynonymous somatic mutations of which the majority were of missense type. To our knowledge, seven of the somatic mutations have not been previously linked to cancer (Table 2). The top-ranked CAN gene PIK3CA was one of the most highly mutated genes with eight somatic missense mutations. PIK3CA encodes the p110 catalytic subunit of the class IA phosphatidylinositol 3-kinases (PI3Ks). Oncogenic PI3Ks are known to play a key role in CRC tumorigenesis. The somatic mutation frequencies of PIK3CA have been reported as 14–32% and 9% for sporadic and familial CRC, respectively.9, 11, 12 In this study, all somatic mutations affected residues in the helical and kinase domains of the protein, with most mutations falling into previously reported mutational hotspots E542, E545 and H1047. These hotspot mutations have been shown to elevate lipid kinase activity, highlighting the oncogenic nature of these alterations.13, 14

The mutational profiling of FBXW7 resulted in identification of eight somatic mutations. FBXW7 is a component of a ubiquitin ligase complex that targets molecules, including cyclin E and MYC, for degradation.15, 16 Numerous cancer-associated somatic FBXW7 mutations have been reported, and the gene is described to have a tumor-suppressive role in the development of cancer.17 Recent studies have shown that mutations of a single allele at FBXW7 can have dominant-negative effects in addition to the more common loss of function. The mutation data presented here are highly similar to those previously described, with mostly heterozygous missense mutations affecting conserved arginine residues in the protein interaction domain.18 Because these mutations affect residues in the protein interaction domain, a potential consequence is disruption of substrate binding. We also identified a heterozygous somatic nonsense mutation (R367X) located downstream of the dimerization domain. This somatic mutation has been previously identified, once in CRC and once in endometrial cancer, further strengthening its potential involvement in tumorigenesis.19, 20 Hypothetically, the truncated protein resulting from this nonsense mutation might not be able to bind substrates but might interfere with wild-type FBXW7 protein through dimerization.

A somatic splice-site mutation was identified in TCF7L2 that caused skipping of Exon 10. Wood et al.5 also identified two somatic mutations in the IVS10 donor of the TCF7L2 gene. TCF7L2 is an important Wnt signalling pathway component and regulates the proliferative cellular compartment in the intestine.21 TCF7L2 has been shown to be mutated in CRC, with frequent somatic frameshift mutations increasing the activity of the protein in microsatellite-unstable CRCs.22 Several TCF7L2 isoforms have been described that differ in their cell type-specific distribution and in their gene-regulatory responses.23 The splice mutation reported here could potentially favour tumor growth; however, functional assays are required to clarify the role of the aberrantly spliced form in CRC development.

Altogether, three novel germline variants were identified, C10orf137 I291T, CSMD3 F1349V and EPHB6 A321P, which were not present in SNP databases or in a set of 890 population-matched healthy controls. EPHB6 A321P was shown not to segregate with CRC in the family, speaking against, but not excluding, a possible role of this variant in CRC predisposition. C10orf137 I291T segregated with the disease phenotype, providing evidence for pathogenicity. Specimens to evaluate segregation of the CSMD3 variant were not available. All three identified missense variants were located at predicted domains (Table 2). When analyzed in silico, EPHB6 A321P was predicted to have a damaging effect on protein function. EPHB6 belongs to the Eph receptor tyrosine kinase family and lacks kinase activity. Several studies have suggested that EPHB6 may act as a tumor suppressor gene.24 Currently, the function of C10orf137, CSMD3 and EPHB6 has not been completely defined, and additional studies will be required to further clarify their role in cancer in general.

In conclusion, our data on somatic mutations in CAN genes are in good general agreement with previous works.4, 5 However, the majority of the analyzed genes did not show any mutations. This further highlights the previously presented idea that tumors are heterogeneous with very few mutations in common. Nevertheless, different genes contributing to cancer are often functionally equivalent, acting through the same molecular pathway. It appears that pathways rather than individual genes control the tumorigenic process.5, 25 To our knowledge, no germline variants have previously been identified in CSMD3, EPHB6 and C10orf137 in patients with CRC. All of these variants were present at a low frequency (1/45) in our sample set. It is, however, possible that they modulate predisposition to CRC, although the low frequency of these variants suggests a limited role in hereditary susceptibility. Functional validation and larger sample sets will be required to clarify the role of the identified germline variants in CRC susceptibility.

Acknowledgements

We thank Ms. Sini Marttinen, Ms. Sirpa Soisalo, Ms. Eevi Kaasinen, Ms. Mairi Kuris, Ms. Inga-Lill Svedberg, Ms. Iina Vuoristo, Ms. Maarit Ohranen and Ms. Maarit Lappalainen for technical assistance. This work was supported by grants from Academy of Finland (Finnish Center of Excellence Program 2006–2011), the Finnish Cancer Society and the Sigrid Juselius Foundation and by grants to A.E.G. (The Chancellor of the University of Helsinki, Finska Läkarsällskapet).