Cancer genomics and pathology: All Together Now


Tatsuhiro Shibata, MD, PhD, Division of Cancer Genomics, National Cancer Center Research Institute, 5-1-1 Tsukiji, Chuo-ku, Tokyo 104-0045, Japan. Email:


Cancer develops from a single cell with stepwise accumulation of genomic alterations. Recent innovative sequencing technologies have made it possible to sequence the full cancer genome. Cancer genome sequencing has been productive and helpful in the discovery of novel cancer genes. It has also revealed previously unknown but intriguing features of the cancer genome such as chromothripsis and kataegis. However, careful comparison of these studies has suggested that analyses of most tumors still seem to be incomplete, and histopathological diagnosis/classification will be essential for refining these data. With improved technology and completion of the cancer gene catalog, genetic diagnosis of individual cancers, such as examination of all potentially druggable mutations, will be performed routinely together with histological diagnosis. Pathologists will play a central role both in interpreting these patho-molecular diagnoses for oncologists and in the decision-making necessary for individualized medicine.


Multi-step carcinogenesis by accumulation of genomic alterations (Fig. 1)

Figure 1.

Accumulation of genomic alterations during carcinogenesis. Hepatocellular carcinoma (HCC) is shown as a representative case. (Top) Histological representation of multi-step hepato-carcinogenesis. (Bottom) Schematic representation of accumulation of genomic alterations. A regenerative nodule in liver cirrhosis, a premalignant lesion, is caused by clonal expansion, and genomic changes begin to be fixed at this stage.

Cancer develops from a single cell with stepwise accumulation of genetic and epigenetic alterations.1 Pathologically, this stepwise process can be recognized in a wide range of epithelial cancers (such as those of the colon, lung and liver): it starts as a precursor or premalignant lesion, proceeds to early (or in situ) cancer, and finally results in invasive/metastatic tumors that are potentially lethal. Genetic and epigenetic alterations therefore represent the true nature of cancer, and cancer is a disease of the genome. In addition, it should be borne in mind that cancer is a dynamic condition in which the genome is continuously changing. This process has recently been referred to as tumor (genome) evolution, because it shares some similarity with the evolution of life forms on earth.

Types of genome alteration in cancer

Cancer cells contain various types of genetic alterations. These include single nucleotide substitutions, small insertions/deletions, increases (gene amplification) and decreases (gene deletion) of copy number, and large structural alterations. The latter include inter-chromosomal translocations and intra-chromosomal rearrangements such as inversion and tandem duplication. These alterations affect genes, especially those encoding proteins, and cause changes in the component amino acids, small deletions or truncations of protein, increases/decreases in the expression of many genes, or the production of fusion genes.

Driver/passenger mutations

Genomic instability is one of the major driving forces of various genetic alterations in cancer cells. Because of this, a cancer cell may contain thousands of acquired mutations in its genome. Indeed, recent whole-genome sequencing analyses have revealed that the prevalence of somatic mutations in human cancer genomes ranges widely, from zero to approximately 30 mutations per Mb. It is predicted that most of these mutations are neutral and non-functional, while a handful are deeply implicated in carcinogenesis. The former are known as passenger mutations and the latter as (cancer) driver mutations.2 It remains uncertain how many driver mutations exist in each cancer. The number probably depends on the tumor type and the evolutionary process through which the tumor has arisen. An experimental model has proposed that more than four driver mutations are required to transform a normal epithelial cell,3 although in reality the number is likely to be higher than this.
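The scale implied by these figures can be checked with simple arithmetic. The following is a minimal sketch, not a method from any of the cited studies: it assumes a genome of roughly 3000 Mb (a round figure for the human genome that is not stated in the text), and the function names are hypothetical.

```python
# Illustrative arithmetic only. GENOME_SIZE_MB is an assumed round figure
# for the human genome; it is not taken from the text.
GENOME_SIZE_MB = 3000


def total_mutations(rate_per_mb, genome_size_mb=GENOME_SIZE_MB):
    """Convert a mutation prevalence (mutations per Mb) into a genome-wide count."""
    return rate_per_mb * genome_size_mb


def mutations_per_mb(mutation_count, genome_size_mb=GENOME_SIZE_MB):
    """Convert a genome-wide mutation count into a prevalence (mutations per Mb)."""
    return mutation_count / genome_size_mb


if __name__ == "__main__":
    # Prevalences chosen to span the range quoted in the text.
    for rate in (1, 10, 30):
        print(rate, "per Mb ->", total_mutations(rate), "mutations genome-wide")
```

Under this assumption, the upper end of the quoted range (about 30 mutations per Mb) corresponds to about 90 000 somatic mutations genome-wide, of which only a handful would be expected to be drivers.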

Oncogene addiction

By definition, driver mutations play important roles in cancer cells, promoting cell proliferation, anti-apoptosis, invasion/metastasis, and so on. It could be envisaged that correction of all driver mutations would be required in order to ‘normalize’ a cancer cell. Interestingly, however, previous studies have revealed that the survival of some cancer cells strongly depends on specific driver mutations. It seems that cancer cells depend on (or are ‘addicted’ to) the activation of specific signal pathways, a phenomenon known as ‘oncogene addiction’.4 For example, in a Myc-driven HCC mouse model, when Myc expression was artificially shut down, the tumors shrank rapidly, proving that this HCC model depends on activation of the Myc driver gene.5 The success of molecular-targeted therapy against addicted oncogenes (BCR-ABL, EGFR, HER2, BRAF and EML4-ALK etc.) in a wide range of cancers (leukemia, lung and breast cancers and melanoma) has also confirmed that this feature is common to many types of primary cancer.6–9

This review will focus mainly on acquired genomic alterations (somatic alterations) in cancer. In contrast to somatic mutations, pre-existing germline variations are reportedly associated with susceptibility to various types of cancer and responses to a wide range of drugs.


Figure 2.

Correlation of technological development and cancer gene discovery. Technological developments of genome-wide scanning in cancer genomics research (left) and representative new cancer genes discovered by these methods (right) are shown. The analytical resolution increases from top to bottom.

Discoveries of new cancer genes have followed the development of new analytical technologies. For example, the discovery of genome-wide allelic markers (such as microsatellite markers) and the development of allele-typing technologies facilitated genome-wide scanning for loss of heterozygosity (LOH) and led to the identification of tumor-suppressor genes such as RB1 and APC.10,11 Completion of the Human Genome Project12 made it possible to start large-scale PCR-based exon amplification and resequencing. At the initial stage, kinome (all genes encoding kinases) resequencing studies discovered mutations of BRAF and EGFR in melanoma and non-small cell lung cancer, respectively.13,14 After these successes, the hunt for novel cancer genes has evolved to encompass a genome-wide sequencing approach.

Next-generation sequencing (NGS) technology

Recent innovative sequencing technologies have dramatically improved the effectiveness of cancer gene hunting. In 2008, massively parallel sequencing made it feasible to sequence the human genome within a realistic timescale.15 The NGS technologies are based on several innovative principles, including single-molecule amplification, sequencing by synthesis, and paired-end (PE) sequencing (sequencing both ends of DNA fragments), which are quite different from classical Sanger (or capillary) sequencing. These techniques generate a massive number of short-sequence reads (from a million to a billion at maximum) in a single run, which takes about two weeks. Significantly, PE sequencing has made it possible to annotate structural rearrangements16 and to identify fusion transcripts by PE-transcriptome sequencing,17 both of which are very informative for understanding the cancer genome.

The NGS technology continues to improve. One current high-spec second-generation sequencer (Illumina HiSeq 2000, Illumina, San Diego, CA, USA) can read sequences equivalent to six human genomes within two weeks, and the speed is expected to increase radically in the future. Because this technology currently produces more than one billion short reads (100–150 bp) in each run, well-refined informatics procedures are needed for manipulation and analysis in order to gain insight from this massive volume of fragmented sequence data. This new technology is able to reveal nucleotide changes, including substitutions, insertions and deletions, and structural alterations in cancer genomes at single-nucleotide resolution.

Whole-genome sequencing (WGS) and whole-exome sequencing (WES)

The NGS technology has made it possible to sequence the full cancer genome (whole-genome sequencing: WGS). However, the cost of WGS is still rather high (approx. 5000 US dollars per genome) for analysis of hundreds of cases. Since the protein-coding regions (exons) constitute approximately 2% of the human genome, highly efficient enrichment of all protein-encoding exons (the whole exome) will reduce the sequencing cost dramatically (theoretically by about 50-fold). To achieve this, several techniques for genome enrichment, such as exon capture or exon enrichment, have been devised.18,19 Next generation sequencing, together with these methodologies (referred to as whole-exome sequencing: WES), has dramatically improved the speed of cancer gene hunting. Although WES can economically detect mutations in the protein-encoding regions, this approach cannot in principle detect chromosomal rearrangements, virus integration within the host genome, or mutations in the non-coding regions.
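The theoretical 50-fold figure follows directly from the 2% exome fraction. The sketch below makes the arithmetic explicit under simplifying assumptions that are not in the text: a genome size of about 3 billion bases, an identical mean read depth for WGS and WES, and perfectly efficient exon enrichment. All names are hypothetical.

```python
# Illustrative arithmetic only; both constants are round figures.
GENOME_SIZE_BP = 3_000_000_000  # assumed approximate human genome size
EXOME_FRACTION = 0.02           # "approximately 2%" as quoted in the text


def bases_required(target_size_bp, mean_depth):
    """Bases that must be sequenced to cover a target at a given mean depth."""
    return target_size_bp * mean_depth


def fold_reduction(exome_fraction=EXOME_FRACTION):
    """Theoretical reduction in sequencing when only the exome is targeted."""
    return 1.0 / exome_fraction


if __name__ == "__main__":
    depth = 30  # hypothetical mean depth, applied to both designs
    wgs = bases_required(GENOME_SIZE_BP, depth)
    wes = bases_required(GENOME_SIZE_BP * EXOME_FRACTION, depth)
    print("WGS bases:", wgs)
    print("WES bases:", wes)
    print("fold reduction:", fold_reduction())
```

In practice the saving is likely to be smaller than this ideal value, because enrichment is not perfectly efficient and exome studies are often sequenced to a greater depth than whole genomes.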

Whole-transcriptome sequencing (WTS)

Next generation sequencing can analyze the whole RNA profile (precisely speaking, reverse-transcribed cDNA), known as the ‘transcriptome,’ in an unbiased way. This approach can measure gene expression more quantitatively than microarray analysis. Moreover, it can detect novel fusion genes and unannotated non-coding RNAs (ncRNAs), which play important roles in tumorigenesis, tumor progression or metastasis. Combination of WGS and WTS can detect RNA editing events by finding discordance between the genome and the transcriptome.
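The idea of finding RNA editing candidates by genome/transcriptome discordance can be sketched as a simple comparison of allele sets at matching positions. This is a toy illustration under stated assumptions, not the procedure used in the cited studies: the input dictionaries, coordinates and function name are all hypothetical, and a real analysis would also need to exclude sequencing and mapping errors.

```python
def candidate_rna_edits(dna_calls, rna_calls):
    """Return sites where the transcriptome shows alleles absent from the genome.

    dna_calls: dict mapping (chrom, pos) -> set of alleles called from WGS
    rna_calls: dict mapping (chrom, pos) -> set of alleles called from WTS
    Both inputs are hypothetical, pre-computed allele calls.
    """
    candidates = {}
    for site, rna_alleles in rna_calls.items():
        dna_alleles = dna_calls.get(site)
        if dna_alleles is None:
            continue  # no genomic call at this site, so discordance cannot be judged
        rna_only = rna_alleles - dna_alleles
        if rna_only:
            candidates[site] = (sorted(dna_alleles), sorted(rna_only))
    return candidates


if __name__ == "__main__":
    dna = {("chr1", 100): {"A"}, ("chr1", 200): {"C", "T"}}
    rna = {("chr1", 100): {"A", "G"}, ("chr1", 200): {"T"}}
    print(candidate_rna_edits(dna, rna))
```

Only the first toy site is reported, because G appears in the RNA but not in the DNA; the second site shows expression of one of two genomic alleles, which is not a discordance in this sense.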

Single-cell sequencing

Because genetic heterogeneity is common in cancers, current analyses using mixed (or heterogeneous) tumor samples may lose important information. Ultimately, single cancer cell analysis would be preferable for detecting individual molecular events, but the application of this approach has just begun to be reported. Navin et al. reported the first single-nucleus sequencing analysis to accurately quantify copy number changes in individual breast cancer cells.20 This analysis revealed a dynamic feature (punctuated clonal evolution) during tumor progression. Recently, Hou et al. reported whole-exome single-cell sequencing.21 This pilot study analyzed 90 cells of a JAK2-negative myeloproliferative neoplasm and identified thrombocythemia-related candidate mutations.

Integrated analysis

Somatic mutation, copy number change, methylation and gene expression data each provide much insight into the genome of each cancer. Moreover, integration of these ‘omics’ data can provide multi-dimensional views of cancer. For example, integration of mutation and copy number change helps to identify new oncogenes (mutation and gene amplification) and tumor suppressor genes (mutation and deletion). Similarly, gene expression and methylation analyses (overexpression with CpG hypomethylation, or downregulation with CpG hypermethylation) together with genetic data are rich resources for cancer gene discovery. As an example of such a systematic approach to the molecular characterization of cancers, a large-scale integrated analysis of mutation (WES), copy number (SNP array), epigenome (DNA methylation), and gene expression (poly A RNA and small RNA expression) has been reported.22
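The logic of combining mutation and copy number data can be illustrated with a small rule-based sketch. It is a deliberately simplified assumption, not the analysis used in the cited study: the event labels, gene names and function name are hypothetical, and real studies rely on statistical tests of recurrence rather than simple co-occurrence.

```python
from collections import defaultdict


def integrate_events(events):
    """Label genes by the combination of alteration types observed across samples.

    events: iterable of (gene, event) pairs, where event is one of
            'mutation', 'amplification' or 'deletion' (hypothetical labels).
    """
    seen = defaultdict(set)
    for gene, event in events:
        seen[gene].add(event)

    labels = {}
    for gene, kinds in seen.items():
        if {"mutation", "amplification"} <= kinds and "deletion" not in kinds:
            labels[gene] = "candidate oncogene"
        elif {"mutation", "deletion"} <= kinds and "amplification" not in kinds:
            labels[gene] = "candidate tumor suppressor"
        else:
            labels[gene] = "unresolved"
    return labels


if __name__ == "__main__":
    toy = [
        ("GENE_A", "mutation"), ("GENE_A", "amplification"),
        ("GENE_B", "mutation"), ("GENE_B", "deletion"),
        ("GENE_C", "mutation"),
    ]
    print(integrate_events(toy))
```

Genes with only one type of evidence, or with conflicting copy number changes, are left unresolved here, which reflects the point that each data type alone is less informative than the combination.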

International consortium for the cancer genome database

In addition to collecting comprehensive genomic and molecular data, it is important to share these data with scientists all over the world to accelerate the progress of cancer research. The Catalogue of Somatic Mutations in Cancer (COSMIC), established by the Sanger Institute, is a foundational database for the cancer genome.23 To achieve world-wide cooperation in the standardization and coordination of cancer genome projects and database construction, the International Cancer Genome Consortium (ICGC) was launched in 2008.24 Such efforts are also valuable for understanding the effects and contributions of ethnic or epidemiological differences in each cancer by comparing genome data for the same tumor in different countries. This consortium aims to coordinate the generation of comprehensive catalogues of genomic abnormalities in 50 different cancer types and/or subtypes and to ensure the construction of a high-quality database that is available to the research community. Currently, 14 countries including Japan are participating in this consortium, and more than 40 cancer genome projects are now underway.


Up to May 2012, at least 75 papers related to cancer genome sequencing had been published. Most of those studies included whole-genome sequencing (26/75, 34.7%) or whole-exome sequencing (40/75, 53.3%), and eight papers were related to cancer transcriptome sequencing (8/75, 10.7%) (Fig. 3). Hematological tumors accounted for 22.7% of the reports, while solid epithelial tumors accounted for 54.7% and sarcoma and other non-epithelial tumors for 22.7%. The number of cases analyzed in these studies varied from one to 316 (average 24.6 cases).
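The proportions by sequencing approach follow directly from the paper counts quoted above. The short check below reproduces them; the variable and function names are hypothetical, and it assumes that each paper is counted in exactly one category.

```python
# Counts as quoted in the text (papers published up to May 2012).
TOTAL_PAPERS = 75
COUNTS = {"WGS": 26, "WES": 40, "WTS": 8}


def percentage(count, total=TOTAL_PAPERS):
    """Share of papers, as a percentage rounded to one decimal place."""
    return round(100 * count / total, 1)


shares = {approach: percentage(n) for approach, n in COUNTS.items()}
remaining = TOTAL_PAPERS - sum(COUNTS.values())

if __name__ == "__main__":
    print(shares)
    print("papers outside these three categories:", remaining)
```

Under that assumption, the three categories account for 74 of the 75 papers, which would be consistent with the small 'others' category shown in Fig. 3.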

Figure 3.

Current status of cancer genome sequencing research. (a) Distribution of cases analyzed in each cancer genome sequencing study. Most current studies analyzed fewer than 20 cases as a discovery set. (b) Distribution of whole exome (WES), whole genome (WGS), whole transcriptome (WTS) and other sequencing analyses of the cancer genome in reported studies. (c) Distribution of epithelial, hematological and other (including sarcoma) tumors in cancer genome sequencing studies.

Hematological tumors

In addition to chromosomal translocations, cancer genome sequencing has identified additional somatic mutations in hematological tumors. Of note, frequent mutations of the RNA splicing machinery have been found to be characteristic in a fraction of these tumors (myelodysplastic syndrome (MDS) and chronic lymphocytic leukemia (CLL)).

Acute myeloid leukemia (AML)

The first reported study involving whole cancer genome sequencing analyzed one AML case without any known cytogenetic abnormalities using Illumina sequencing.25 This study was the first to demonstrate the feasibility of whole cancer genome sequencing, and was a landmark work in this field. The same research group identified recurrent IDH1 and DNMT3A mutations in AML using this approach.26,27 Whole-genome sequencing of seven pairs of secondary AML and matched bone marrow identified genetic evolution profiles during the process of progression from MDS.28

Myelodysplastic syndrome (MDS)

This category contains heterogeneous subgroups of premalignant hematological disorders. Three analyses, two involving WES and one involving WGS, have been published. Yoshida et al. performed WES of 29 MDS cases and identified the RNA splicing machinery as a novel cancer-related pathway.29 This was the first report of mutations of the RNA splicing machinery components, including SF3B1, in cancer. A WGS study reported recurrent mutations of U2AF1, another RNA splicing factor, in MDS.30 Papaemmanuil et al. also identified frequent SF3B1 mutations in MDS, particularly in those with ring sideroblasts.31

Chronic lymphocytic leukemia (CLL)

Two groups have published three genome sequencing papers on CLL. The first study performed WGS on four CLL cases and identified four new cancer genes.32 Later, Quesada et al.33 and Wang et al.34 reported WES of larger sample cohorts (105 and 91 cases, respectively), confirming SF3B1 as a new recurrently mutated gene in CLL.

Non-Hodgkin lymphoma and other hematological malignancies

Whole-exome sequencing analyses of diffuse large B cell lymphoma (six cases) and non-Hodgkin lymphoma (13 cases of diffuse large B cell lymphoma and one case of follicular lymphoma) have been reported.35,36 These two studies identified recurrent mutations in histone-modifying molecules (MLL2, CREBBP and EP300) in this tumor type. The WGS of 12 cases of early T-cell precursor acute lymphoblastic leukemia revealed frequent activating mutations in genes regulating cytokine receptors and RAS signaling, and inactivating mutations of genes associated with hematopoietic development and histone modification.37 The WGS of 38 cases of multiple myeloma revealed clustered mutations in the NF-κB pathway as well as BRAF mutation in 4% of cases.38 The WES of a single case led to the identification of BRAF mutation as a common event in hairy-cell leukemia.39 Similarly, a single WES analysis led to the identification of frequent STAT3 mutation in large granular lymphocytic leukemia.40

Solid tumors

In contrast to hematological tumors, primary solid tumors, especially epithelial ones, contain more non-tumor cells (inflammatory cells, fibroblasts, endothelial cells, etc.). Therefore more extensive sequencing may be required to detect somatic mutations with greater confidence. Alternatively, short-term in vitro culture or xenotransplantation into mice for enrichment of tumor cells has also been tried. Unbiased mutation screening by genome sequencing has led to the discovery of new cancer genes, especially those for metabolic enzymes (e.g. IDH1) and chromatin regulators (e.g. SWI/SNF complex and histone H3.3), which may play broad and important roles in a wide range of solid tumors.

Hepatocellular carcinoma (HCC) with multiple etiological backgrounds

Multiple etiological factors, including hepatitis B and C virus infection, alcohol intake, obesity and intake of aflatoxin-B-contaminated food, are associated with HCC. Totoki et al. reported the first WGS analysis of a single HCV-positive HCC and demonstrated a characteristic mutation signature and a set of candidate driver mutations in HCC.41 Recently, Fujimoto et al. reported WGS of 27 HCCs with multiple etiological backgrounds (HCV, HBV and non-virus).42 This revealed an intimate association between the mutation profile and etiological factors as well as recurrent mutations in the SWI/SNF complex and other chromatin regulators (including ARID1A, ARID1B and ARID2). In agreement with this, WES of 10 HCV-positive HCCs identified recurrent ARID2 mutation.43 Another WES study of 24 alcohol-associated HCCs revealed ARID1A and NFE2L2 mutations.44 The WGS of 88 HCCs (including 81 HBV-positive HCCs) mapped HBV integration in a genome-wide manner and revealed multiple HBV genome integrations at the TERT, MLL4 and CCNE1 loci.45

Breast cancer (BC)

Gene expression profiling has identified the presence of multiple molecular subtypes (normal breast, luminal, HER2 and basal subtypes) in breast cancer.46 PCR-based target resequencing of 18 191 genes in 11 BCs revealed the first landscape of mutated genes in this cancer.47 The first WGS of breast cancer was performed on a metastatic lobular carcinoma.48 This study compared allele frequencies of mutations between metastatic and primary lesions, and found that most (25/32) mutations detected in the metastatic lesion were either absent from the primary tumor or present there only at a low frequency. The study also compared WGS and WTS data and identified somatic RNA editing events. Low-coverage WGS of 24 BCs (9 cell lines and 15 primary tumors) identified a complex rearrangement pattern.49 Intra-chromosomal rearrangements were more frequent than inter-chromosomal ones, and tandem duplication was particularly common in some cases. Ding et al. performed WGS of the primary tumor, the metastasis and the xenograft of a basal-type BC case.50 The metastatic lesion contained two additional de novo mutations, and mutations in the xenograft mostly overlapped with those in the other two samples.

Because of the histological/molecular heterogeneity of BC, large-scale analysis of a BC cohort is required. Recently, Stephens et al. reported WES and copy number analysis of 100 primary BC cases.51 This identified new driver genes including AKT2, MAP3K1 and TBX3. That study also revealed 73 different combinations of mutated cancer genes, implying a high genetic diversity among BCs. The WGS of 21 BCs, including BRCA1 or BRCA2 mutated cases, uncovered distinctive mutation profiles of substitutions (dominant C > T or C > G) together with their surrounding sequence contexts and microhomology-mediated small deletions among cases.52

Malignant melanoma (MM) including metastatic cases

Malignant melanoma is a highly metastatic and lethal cancer. It consists of several histological subtypes. One of the pioneering studies involving WGS analyzed a melanoma cell line (COLO829) and a corresponding normal lymphoid cell line.53 That analysis revealed a characteristic mutation signature (dominant C > T/G > A substitution) induced by UV exposure. The WES of seven other melanoma cell lines with matched normal samples revealed recurrent mutations of MAP2K1 and MAP2K2.54 The WES analysis of 14 primary MM cases identified TRRAP and GRIN2A,55 and the WES of eight metastatic MM cases identified MAP3K5 and MAP3K9,56 as recurrently mutated genes. Recently, the WGS of 25 metastatic MMs identified frequent PREX2 mutations.57 Harbour et al. reported WES of two cases of metastatic uveal MM and found frequent BAP1 mutations.58 Interestingly, germline BAP1 mutation was found to predispose individuals to melanocytic tumors.59

Lung cancer

Target resequencing of 623 genes in 188 cases of lung adenocarcinoma revealed the first landscape of somatic mutations in lung cancer.60 This identified multiple mutated kinase genes including ERBB4, EPHA3, KDR and NTRK. The WGS of a small cell lung cancer (SCLC) cell line (NCI-H209) revealed a dominant C > A/G > T substitution signature that is considered to be tightly associated with tobacco smoking.61 This also identified a recurrent rearrangement of PVT1-CHD7 in SCLC cell lines. Lee et al. performed WGS of a single non-small cell lung cancer (poorly differentiated adenocarcinoma) in a smoker.62 They identified the same C > A/G > T dominant substitution signature as that in SCLC. A combination of WGS and WTS for a single sample of adenocarcinoma from a young non-smoker identified the KIF5B-RET fusion gene.63
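Paired notations such as C > A/G > T reflect the fact that a substitution observed on one DNA strand is equivalent to the complementary substitution on the other strand. A minimal sketch of how a substitution spectrum might be tallied is given below. Reporting each change from the pyrimidine (C or T) base is assumed here as a convention; it is not stated in the text, and the function names are hypothetical.

```python
from collections import Counter

COMPLEMENT = {"A": "T", "C": "G", "G": "C", "T": "A"}


def canonical_substitution(ref, alt):
    """Express a substitution from the pyrimidine strand, e.g. G>T becomes C>A."""
    ref, alt = ref.upper(), alt.upper()
    if ref not in COMPLEMENT or alt not in COMPLEMENT or ref == alt:
        raise ValueError(f"not a single-nucleotide substitution: {ref}>{alt}")
    if ref in ("A", "G"):
        # Purine reference: switch to the complementary strand.
        ref, alt = COMPLEMENT[ref], COMPLEMENT[alt]
    return f"{ref}>{alt}"


def substitution_spectrum(substitutions):
    """Count substitutions in the six strand-symmetric classes."""
    return Counter(canonical_substitution(ref, alt) for ref, alt in substitutions)


if __name__ == "__main__":
    # Toy input: (reference base, mutated base) pairs.
    toy = [("G", "T"), ("C", "A"), ("C", "T"), ("G", "A"), ("G", "T")]
    print(substitution_spectrum(toy).most_common())
```

With this collapsing, the twelve possible single-nucleotide changes reduce to six classes, so that a UV-associated excess of C > T/G > A or a tobacco-associated excess of C > A/G > T appears as a single dominant class in the tally.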

Gastrointestinal and pancreatic cancers

Similarly to BC, PCR-based target resequencing of 18 191 genes has been performed in colorectal47 and pancreatic cancers,64 and this has provided important details of the genetic frameworks of these tumors. Exome sequencing of 16 954 genes in 15 pancreatic cancer cell lines and their matched normal samples revealed diverse mutations, and MLH1 haplo-insufficiency or complete inactivation was associated with genomic instability.65 The WGS of nine colorectal adenocarcinomas identified a recurrent VTI1A-TCF7L2 fusion gene.66 The WES of 10 pancreatic neuroendocrine tumors (PNET) revealed recurrent mutations in the DAXX/ATRX, MEN1 and mTOR pathway genes.67 Interestingly, DAXX/ATRX-mutated PNET invariably showed telomerase-independent telomere maintenance (known as alternative lengthening of telomeres, ALT).68

The WES of 22 gastric cancers identified recurrent inactivating mutations of ARID1A, which are more frequent in cases with microsatellite instability or Epstein-Barr virus infection.69 Another WES of 15 gastric adenocarcinoma cases identified recurrent mutations in the chromatin regulator ARID1A and in the cell adhesion gene FAT4.70

Glioblastoma multiforme (GBM) and other brain tumors

Glioblastoma multiforme is the most common and lethal type of brain tumor. PCR-based target resequencing of 20 661 genes identified recurrent mutations in the IDH1 gene.71 The first integrated-type study of the cancer genome analyzed 206 GBM cases and identified multiple new cancer genes and associations between genetic and epigenetic alterations.72 The WES of 48 pediatric GBMs identified driver mutations quite different from those of adult GBMs.73 This study revealed the first hotspot mutations in a histone gene (H3.3) and recurrent mutations of ATRX/DAXX. Oligodendroglioma is the second most common brain tumor, and exhibits characteristic losses of chromosomes 1p and 19q. The WES of seven oligodendrogliomas identified recurrent mutations of CIC (located on 19q) and FUBP1 (on 1p).74 The WGS of four SHH-type medulloblastomas identified a characteristic rearrangement signature (chromothripsis, see below), which is associated with TP53 mutation status.75

Renal cell cancer (RCC) of the clear cell type

Target resequencing of 3544 candidate genes in 101 clear cell RCCs (ccRCCs) identified inactivating mutations in histone-modifying genes (SETD2, KDM5C and KDM6A).76 The same group performed WES of seven ccRCCs and identified frequent mutations of PBRM1.77 Guo et al. performed WES of 10 ccRCCs and identified frequent mutations in genes of the ubiquitin-mediated proteolysis pathway.78

Prostate cancer (PC)

The WGS of seven PCs and corresponding normal tissues discovered a complex chain of balanced rearrangements in this tumor type.79 This study also suggested a link between genomic breakpoints of somatic rearrangements and chromatin or transcriptional regulation. The WES of 23 PCs including 16 lethal metastatic tumors identified hypermutated genomes in a fraction of these cases.80 The WES of 112 PCs further identified recurrent mutations of the SPOP, FOXA1 and MED12 genes.81 The WES of 61 PCs including heavily pretreated metastatic castration-resistant PCs obtained by rapid autopsy revealed recurrent mutations in androgen receptor-associating molecules (MLL2, FOXA1, UTX and ASXL1).82

Other solid tumors including sarcoma

The WES of eight ovarian clear cell carcinomas revealed frequent mutations in ARID1A and PPP2R1A.83 Integrated genomic analysis of 489 high-grade serous ovarian carcinomas identified dominant TP53 mutation (in 96% of cases) and a long-tailed list of other mutated genes including NF1 (4.1%), BRCA1 (3.5%) and BRCA2 (3.2%).22 The WES of eight fluke-associated cholangiocarcinomas found recurrent mutations in MLL3 and GNAS.84 The WES of 32 head and neck squamous cell carcinomas revealed NOTCH1 as a tumor suppressor gene in this tumor.85 The WES of 18 uterine leiomyomas identified MED12 mutations in 70% of cases.86 The WGS of 87 neuroblastomas found chromothripsis in high-grade cases and frequent mutations in the neuritogenesis genes.87 The WGS of four primary retinoblastomas identified RB inactivation with a few additional mutations and structural alterations.88

Cancer cell line encyclopedia

Cancer cell lines are good resources for analysis of tumor biology and also for evaluation of possible therapies. Barretina et al. reported WES and other genomic analyses (copy number and gene expression) of 947 cancer cell lines.89 On the basis of the pharmacological profiles of 24 anti-cancer drugs in 479 cell lines, they identified new potential genetic predictors of susceptibility to some of these agents.

Overall, cancer genome sequencing has been productive and powerful in the discovery of novel cancer genes, and the number of publications in this field is rapidly increasing. However, careful comparison of these studies has revealed that analyses of most tumors still seem to be incomplete: analyses of the same common tumor type have yielded lists of frequently mutated genes that are partially consistent but mainly different. For example, three groups analyzed different subsets of HCC (virus-associated cases in Asian patients and alcohol-associated cases in Caucasian patients) and reported different sets of significantly mutated genes.42–44 Four groups analyzed primary or metastatic melanoma samples and reported different driver genes.54–57 These differences may be due to inter-tumor heterogeneity within small numbers of analyzed cases or to the distinct etiological/ethnic backgrounds of the samples used in different studies. Therefore, larger-scale studies and international collaboration with data comparison (see above) will be required to overcome these inconsistencies. More importantly, it should be recognized that histopathological diagnosis/classification will be essential for refining and comparing these analyses and data.


Figure 4.

Tumor evolution and intra-tumoral heterogeneity. (a) Schematic representation of tumor evolution. Acquisition of driver genes produces new clones and promotes tumor evolution. Acquisition of some driver genes does not confer a survival advantage, and clones harboring such alterations disappear during evolution (shown by black clones). Other driver genes may produce clones fitted for distant metastases (shown by gray clones). (b) Tumor heterogeneity at the point of histological analysis. Multiple clones, including rapidly emerging clones (shown as clones a and b), coexist within a single tumor. A dormant clone (shown as clone d) may remain as a reservoir for another evolutionary process after drastic environmental changes (e.g. treatment by anti-tumor chemotherapy).

Pathological examination of a single primary tumor may reveal obvious morphological diversity (so-called intratumoral heterogeneity) among individual cancer cells. Because carcinogenesis is a process that involves cumulative changes in the driver genes, it follows that multiple clones co-exist in a single primary tumor (Fig. 4). In addition to such a Darwinian-like clonal evolution process, the recently proposed ‘cancer stem cell’ or ‘tumor initiating cell’ hypothesis argues that there is a hierarchy or complexity of tumor cells in the context of tumor formation and progression. Such phenotypic diversity is also evident during the process of metastasis or during therapeutic intervention.

Gerlinger et al. investigated intratumoral genetic heterogeneity by multi-region WES of RCC cases.90 They obtained samples from multiple regions of a primary tumor as well as metastatic lesions in four independent cases of RCC and performed WES on two cases. Sequencing analysis revealed shared and private mutations, the latter meaning mutations detected in only one region. Based on the combination of mutation data, they constructed a phylogenetic tree of tumor evolution for each case. Their analysis revealed that tumor evolution in these cases was not a simple linear process of mutation accumulation, but a more complex and branched pathway. Recently, Xu et al. applied single-cell sequencing to the analysis of tumor heterogeneity within RCC.91 They purified 20 single cells from the tumor and five single cells from the adjacent normal tissue and performed WES. Although this single-cell sequencing did not detect any clear diversity of acquired mutations among individual cancer cells, it revealed different substitution signatures among common, driver and private mutations.
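The distinction between shared and private mutations in multi-region data can be expressed with simple set counting. The following is a toy sketch under stated assumptions, not the analysis performed in the cited study: region names, mutation identifiers and the function name are hypothetical, and the extra 'in_all_regions' group (mutations found in every sampled region) is added here only for illustration.

```python
from collections import Counter


def classify_mutations(region_mutations):
    """Split mutations by the number of tumor regions in which they are detected.

    region_mutations: dict mapping region name -> set of mutation identifiers.
    Returns a dict with 'private' (one region only), 'shared' (more than one
    region) and 'in_all_regions' (the subset of shared found in every region).
    """
    n_regions = len(region_mutations)
    if n_regions < 2:
        raise ValueError("at least two regions are needed to define private mutations")

    # Count, for every mutation, the number of regions that carry it.
    counts = Counter(m for muts in region_mutations.values() for m in set(muts))
    return {
        "private": {m for m, c in counts.items() if c == 1},
        "shared": {m for m, c in counts.items() if c > 1},
        "in_all_regions": {m for m, c in counts.items() if c == n_regions},
    }


if __name__ == "__main__":
    toy = {
        "region_1": {"mut_a", "mut_b", "mut_c"},
        "region_2": {"mut_a", "mut_b", "mut_d"},
        "metastasis": {"mut_a", "mut_e"},
    }
    for group, muts in classify_mutations(toy).items():
        print(group, sorted(muts))
```

In a phylogenetic reading of such data, mutations present in all regions would correspond to the trunk of the tree, while shared and private mutations would define its branches, which is one way to see why a branched rather than linear pattern emerges.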

Yachida et al.92 and Campbell et al.93 reported the dynamics of genome evolution during metastasis of pancreatic cancer. Exome sequencing data and genome rearrangements revealed by low-coverage WGS, respectively, demonstrated similar dynamic changes in the cancer genome during metastasis. The former study employed seven autopsy samples of pancreatic cancer and analyzed multiple intra-pancreatic and distant metastatic lesions. This made it possible to trace each of the subpopulations that had been responsible for metastasis to distinct organs (peritoneum, liver and lung). The latter study determined the phylogenetic trees of metastatic lesions based on the presence of somatic rearrangements.

The WGS of eight paired samples of primary and relapsed AML was used to investigate clonal evolution during therapeutic intervention.94 Two major evolution patterns were demonstrated: (i) acquisition of additional mutations by the founding clones and (ii) acquisition of additional mutations and expansion of a subclone of the founding clones (dormant clone in Fig. 4) during the process of relapse.

In addition to genetic heterogeneity within a single tumor or between a primary and its metastasis, tumor multi-centricity is also an informative phenotype for understanding the complexity of carcinogenesis. Under high-risk conditions such as exposure to potent carcinogens, chronic infection/inflammation or tissue regeneration, as in liver cirrhosis, multiple independent tumors will occasionally arise in the same organ simultaneously (synchronous tumors) or within different time windows (metachronous tumors). Fujimoto et al. performed WGS of two pairs of multi-centric HCC tumors.42 The tumors within each pair did not share common mutations, but the somatic substitution signatures of each multi-centric pair showed significant similarity.
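One way to quantify how similar two substitution profiles are, even when the tumors share no individual mutations, is the cosine similarity between their substitution-count vectors. The sketch below is only an illustration: the text does not state which similarity measure was used in the cited study, so cosine similarity is an assumption, and the counts are invented toy values.

```python
import math
from typing import Sequence


def cosine_similarity(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine of the angle between two substitution-count vectors."""
    if len(a) != len(b):
        raise ValueError("profiles must have the same number of classes")
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        raise ValueError("profiles must contain at least one mutation")
    return dot / (norm_a * norm_b)


# Toy counts per substitution class for two tumors (hypothetical values).
tumor_1 = [120, 40, 300, 60, 250, 30]
tumor_2 = [100, 35, 280, 70, 260, 25]
similarity = cosine_similarity(tumor_1, tumor_2)
```

Because the measure depends only on the relative proportions of each substitution class, two tumors with entirely different mutated positions can still score close to 1 if the same mutational processes shaped them.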


In contrast to WGS/WES analyses, fewer studies of cancer WTS have been published to date. A landmark paper from the University of Michigan group reported the potential of WTS for detecting fusion genes in cancer.17 The same group identified novel fusion genes of RAF kinases (SLC45A3-BRAF and ESRP1-RAF1) in prostate and gastric cancers and melanoma,95 and reported recurrent fusion genes in the MAST kinase and NOTCH gene families in breast cancer.96 Recently Kohno et al. performed WTS of 30 Japanese lung adenocarcinomas and identified the KIF5B-RET fusion gene.97

Whole-transcriptome sequencing can also detect mutations in expressed genes. WTS of four adult-type granulosa-cell tumors led to the discovery of a frequent (86/89 cases) hotspot mutation of the FOXL2 gene.98 The same group also performed WTS of 14 non-epithelial ovarian tumors and discovered recurrent somatic DICER1 mutations in these tumors.99

Unbiased identification of expressed transcripts using WTS has led to the further discovery of novel ncRNAs. Prensner et al. discovered 121 unannotated ncRNAs in samples of prostate cancer and identified the ncRNA PCAT-1 as a novel prostate-specific regulator of cell proliferation.100


Because of the rapid development of technology and its application to cancer research, there is currently a flood of molecular and sequence data related to the cancer genome. Indeed, sequencing has gradually revealed the nature of the cancer genome and important new players (cancer-driving genes). However, do these data also provide new conceptual frameworks associated with the field of pathology?

New connections between mutations and tumor subtypes

Unbiased genome-wide mutation screening has efficiently identified previously unknown but frequent mutations in characteristic histological subtypes or rare tumors. For example, a tight association between specific mutations and histological subtypes (e.g. BRAF mutation in hairy cell leukemia39 and ARID1A mutation in ovarian clear cell carcinoma83) has been demonstrated. These discoveries are informative when attempting to understand the molecular carcinogenesis of these tumors.

Association between mutation signatures and etiological backgrounds

DNA mutation is caused by both exposure to exogenous/endogenous mutagens (e.g. smoking, UV exposure, radiation, DNA-damaging compounds including anti-cancer drugs, and endogenous mitochondria-derived reactive oxygen species) and intrinsic DNA instability (e.g. defects in the DNA repair system and fragility of the chromosome). Therefore the mutation profile of each cancer genome reflects the characteristic contribution of these factors during carcinogenesis. It is postulated that positive or negative clonal selection will occur for mutations in protein-encoding regions, while many passenger mutations in non-coding areas are neutral and would reflect each contribution more precisely. Therefore, WGS data are preferable and more reliable for analyzing the correlation between mutation signatures and environmental factors. Generally, C > T substitution at CpG sites, which is induced mainly by deamination of cytosine, is prominent in most cancers. However, there are good examples of characteristic mutation signatures in cancers resulting from specific environmental exposures, such as G > T substitution caused by cigarette smoking in SCLC61 and C > T substitution caused by UV exposure in melanoma.53
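A substitution profile of the kind discussed above can be tabulated with a short Python sketch. It assumes the common convention of reporting each substitution from the pyrimidine base of the mutated base pair, which collapses the twelve possible changes into six classes; under that convention the smoking-associated G > T change is counted as C > A. The function names and the toy input are hypothetical.

```python
from collections import Counter
from typing import Dict, Iterable, Tuple

COMPLEMENT = {"A": "T", "C": "G", "G": "C", "T": "A"}
CLASSES = ("C>A", "C>G", "C>T", "T>A", "T>C", "T>G")


def substitution_class(ref: str, alt: str) -> str:
    """Return the pyrimidine-based class of a single-base substitution."""
    ref, alt = ref.upper(), alt.upper()
    if ref not in COMPLEMENT or alt not in COMPLEMENT or ref == alt:
        raise ValueError(f"not a single-base substitution: {ref}>{alt}")
    if ref in ("A", "G"):
        # Purine reference: describe the change on the complementary strand.
        ref, alt = COMPLEMENT[ref], COMPLEMENT[alt]
    return f"{ref}>{alt}"


def substitution_spectrum(mutations: Iterable[Tuple[str, str]]) -> Dict[str, int]:
    """Count mutations per class; classes with no mutations are reported as 0."""
    counts = Counter(substitution_class(ref, alt) for ref, alt in mutations)
    return {cls: counts.get(cls, 0) for cls in CLASSES}


# Toy input of (reference base, mutated base) pairs; not real data.
spectrum = substitution_spectrum([("G", "T"), ("C", "A"), ("C", "T"), ("A", "G")])
```

Collapsing to six classes discards strand information, so analyses that compare the transcribed and untranscribed strands need the uncollapsed changes.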

In addition to C > T, T > C substitution is also dominant in HCC, but the environmental cause of this substitution remains unknown. Fujimoto et al. performed principal component analysis of 27 sets of WGS data from HCC and found associations between the HCC mutation profile and the presence of hepatitis B virus genome integration, or a history of alcohol intake.42 Nik-Zainal et al. used a non-negative matrix factorization model to analyze mutation profiles in BC.52 Based on WGS data for 21 BC cases, they extracted five distinct mutational signatures. By calculating the contribution of each signature, cancers with BRCA1 or BRCA2 mutations were found to exhibit a characteristic combination of signatures.
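The idea behind signature extraction by non-negative matrix factorization can be sketched as follows: a matrix V of mutation counts (rows are mutation classes, columns are tumors) is approximated by the product of a signature matrix W and a contribution matrix H, with all entries non-negative. The code below uses the standard multiplicative update rules for the squared-error objective in plain Python. It is a minimal illustration, not the procedure of the cited study; the six-row toy matrix, the two toy signatures, the rank and the iteration count are all assumptions.

```python
import random
from typing import List, Tuple

Matrix = List[List[float]]


def matmul(a: Matrix, b: Matrix) -> Matrix:
    cols_b = list(zip(*b))
    return [[sum(x * y for x, y in zip(row, col)) for col in cols_b] for row in a]


def transpose(m: Matrix) -> Matrix:
    return [list(col) for col in zip(*m)]


def squared_error(v: Matrix, w: Matrix, h: Matrix) -> float:
    wh = matmul(w, h)
    return sum((v[i][j] - wh[i][j]) ** 2
               for i in range(len(v)) for j in range(len(v[0])))


def nmf(v: Matrix, rank: int, n_iter: int = 200, seed: int = 0,
        eps: float = 1e-9) -> Tuple[Matrix, Matrix]:
    """Factorize v (classes x tumors) into w (classes x rank) and h (rank x tumors)."""
    rng = random.Random(seed)
    n_rows, n_cols = len(v), len(v[0])
    w = [[rng.random() + eps for _ in range(rank)] for _ in range(n_rows)]
    h = [[rng.random() + eps for _ in range(n_cols)] for _ in range(rank)]
    for _ in range(n_iter):
        # H <- H * (W^T V) / (W^T W H), element-wise
        wt = transpose(w)
        numer, denom = matmul(wt, v), matmul(matmul(wt, w), h)
        h = [[h[k][j] * numer[k][j] / (denom[k][j] + eps)
              for j in range(n_cols)] for k in range(rank)]
        # W <- W * (V H^T) / (W H H^T), element-wise
        ht = transpose(h)
        numer, denom = matmul(v, ht), matmul(w, matmul(h, ht))
        w = [[w[i][k] * numer[i][k] / (denom[i][k] + eps)
              for k in range(rank)] for i in range(n_rows)]
    return w, h


# Toy data: four tumors generated from two invented signatures.
toy_signatures = [[0.10, 0.10, 0.60, 0.05, 0.10, 0.05],
                  [0.50, 0.10, 0.10, 0.10, 0.10, 0.10]]
toy_contributions = [[100.0, 20.0, 50.0, 0.0],
                     [10.0, 80.0, 50.0, 60.0]]
toy_v = matmul(transpose(toy_signatures), toy_contributions)
w_fit, h_fit = nmf(toy_v, rank=2)
```

Each column of `w_fit` plays the role of a signature and each column of `h_fit` gives the contributions of the signatures to one tumor. The factorization is not unique and depends on the chosen rank and the random start, so in practice the number of signatures and their stability need separate evaluation.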

Further analysis involving large-scale WGS studies of common and rare cancers will provide valuable information on the association between environmental factors and cancer mutation profiles, which will also be valuable for studies on cancer prevention.

New concepts from cancer genome sequencing

Cancer genome sequencing has revealed previously unknown but intriguing features of the cancer genome. The underlying molecular mechanisms and pathological significance of these features remain to be clarified, but they are likely to be closely related to the fields of pathology and epidemiology.

  • 1. Chromothripsis (massive remodeling of a single chromosome: from the Greek, chromos for ‘chromosome’; thripsis, ‘shattering into pieces’) (Fig. 5). During analysis of structural rearrangements in cancers, Stephens et al. discovered one CLL case in which 42 somatic rearrangements occurred in the long arm of chromosome 4.101 Based on a survey of 746 cancer cell lines, they identified similar substantial but localized rearrangements in other cancers, and predicted that this massive remodeling of a single chromosome (called chromothripsis) is a more general phenomenon, occurring in about 2–3% of all cancer types (being observed more frequently in bone cancers101). Based on mathematical analysis, they proposed that the process of chromothripsis would be better explained by a ‘single catastrophic rearrangement model’ than by a ‘progressive rearrangement model’.
  • 2. Transcription-coupled repair of specific types of somatic mutation. WGS of a SCLC cell line identified dominant G > T transversion among somatic mutations, the number of G > T substitutions being lower on the transcribed strand than on the untranscribed strand.61 The authors further correlated the prevalence of mutation with the level of gene expression, and were able to recognize the effects of transcription-coupled repair on mutations of the transcribed strand. These findings suggest that G > T substitution on the transcribed strand is more likely to be repaired during carcinogenesis. Similar transcription-coupled mutation repair has been reported in other cancers. Recently, Fujimoto et al. reported that this phenotype completely disappeared in an MLH1-mutated (mismatch repair-deficient) case of HCC, thus implicating MLH1 in this process.42
  • 3. Kataegis (localized hyper-mutation: from the Greek for ‘thunderstorm’) (Fig. 6). To explore the regional clustering of somatic mutations in the breast cancer genome, Nik-Zainal et al. applied a ‘rainfall plot’ of inter-mutation distance for each mutation.52 They found two types of mutation clusters in the cancer genome: a large (more than 10 Mb) cluster of mutations (referred to as a macrocluster) and a smaller (about a few hundred base pairs) one (referred to as a microcluster). Such mutational features, termed kataegis, were observed in 13/21 BC cases.
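The quantity underlying a rainfall plot, the distance from each mutation to the preceding mutation on the same chromosome, can be computed with a short Python sketch; the plot itself usually shows these distances on a logarithmic scale against genomic position. The second function flags runs of closely spaced mutations. Its thresholds (`max_gap`, `min_size`) are hypothetical placeholders chosen for illustration and are not the criteria used in the cited study.

```python
from typing import Dict, Iterable, List, Tuple


def intermutation_distances(
    mutations: Iterable[Tuple[str, int]]
) -> List[Tuple[str, int, int]]:
    """Return (chromosome, position, distance to previous mutation) tuples.

    The first mutation on each chromosome has no predecessor and is skipped.
    """
    by_chrom: Dict[str, List[int]] = {}
    for chrom, pos in mutations:
        by_chrom.setdefault(chrom, []).append(pos)
    distances = []
    for chrom, positions in by_chrom.items():
        positions.sort()
        for prev, cur in zip(positions, positions[1:]):
            distances.append((chrom, cur, cur - prev))
    return distances


def find_clusters(positions: Iterable[int], max_gap: int = 1000,
                  min_size: int = 5) -> List[Tuple[int, int, int]]:
    """Return (start, end, n_mutations) for runs of closely spaced mutations.

    A run is a maximal series of mutations on one chromosome in which every
    consecutive gap is at most `max_gap` bases (hypothetical threshold).
    """
    ordered = sorted(positions)
    clusters = []
    run = ordered[:1]
    for pos in ordered[1:]:
        if pos - run[-1] <= max_gap:
            run.append(pos)
        else:
            if len(run) >= min_size:
                clusters.append((run[0], run[-1], len(run)))
            run = [pos]
    if len(run) >= min_size:
        clusters.append((run[0], run[-1], len(run)))
    return clusters


# Toy positions on one chromosome (not real data).
toy_positions = [10, 1_000_000, 1_000_100, 1_000_250, 1_000_300,
                 1_000_900, 5_000_000]
toy_clusters = find_clusters(toy_positions)
```

In a rainfall plot such a run appears as a vertical streak of points with small inter-mutation distances, standing out against the sparse background of randomly placed passenger mutations.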
Figure 5.

Chromothripsis is a new mode of somatic rearrangement in cancer. (a) Circos plot of a chromothripsis-positive cancer. The Circos plot shows chromosomes (outer rim), somatic copy number changes (middle rim) and rearrangements (inner rim). The arrow indicates a cluster of rearrangements in chromosome 22 in this tumor. (b) Schematic representation of two models that could give rise to chromothripsis. Left: An example of step-wise accumulation of single rearrangement events in a progressive rearrangement model. Right: Massive fragmentation and reorganization of the chromosome in a single catastrophic rearrangement model.

Figure 6.

Kataegis is a new distribution phenotype of somatic mutations. (a) Schematic representation of the distribution of somatic mutations (shown by gray arrows) in the cancer genome (shown by a bold black line). Top: Somatic mutations are usually distributed sparsely within the cancer genome, since passenger mutations, which constitute most of them, occur at relatively random positions. Middle and bottom: In cases of kataegis, the mutation cluster spans more than 10 Mb (macrocluster) or less than 500 kb (microcluster). (b) Representative rainfall plots showing sparse distribution and a macrocluster (indicated by a circle). Reprinted with permission.52


Currently, clinical diagnosis based on molecular data such as protein expression or gene alteration for treatment decision-making is carried out routinely for cancers including breast cancer (e.g. estrogen/progesterone receptor expression and HER2 amplification) and lung cancer (e.g. EGFR mutation and EML4-ALK rearrangement). In addition, information about specific mutation status (e.g. KRAS and BRAF) is valuable for oncologists considering whether molecular-targeted therapy (e.g. Cetuximab for treating colorectal cancer and Vemurafenib for treating melanoma) is warranted. Such personalized cancer medicine is based on knowledge of the cancer mutation repertoire and also of agents that target these altered genes or pathways. Therefore, current efforts to complete the cancer genome catalogue and uncover associations between specific genomic alterations and the efficacies of molecular targeting drugs will provide essential resources for practical molecular classification of tumors. In the near future, molecular diagnosis (for example, potentially druggable mutations in tumors) will be routinely reported by clinical laboratories, together with histological diagnosis. In this situation, pathologists will play a central role in both interpreting these patho-molecular diagnoses for oncologists, and the process of decision-making necessary for individualized medicine.


Here I have reviewed recent cancer genome sequencing research, including the identification of new cancer genes, new concepts, and molecular interpretations of old concepts in a wide range of tumors. New sequencing technologies can now tackle the complexity and dynamics of cancer, a long-standing theme that has been difficult to resolve until now. All these discoveries and emerging issues also pose important questions and challenges for the field of pathology, because pathologists can approach the complex and still unrevealed nature of each tumor through the combination of histological analysis and genomic profiling.


I thank all colleagues in our laboratory for discussion and comments on this review. This work was supported partially by the Program for Promotion of Fundamental Studies in Health Sciences of the National Institute of Biomedical Innovation (NIBIO) and National Cancer Center Research and Development Fund (23-A-8).