Department of Genetics, University of North Carolina, Chapel Hill, NC
Lineberger Comprehensive Cancer Center, University of North Carolina, Chapel Hill, NC
Address reprint requests to: Derek Chiang, Ph.D., Lineberger Comprehensive Cancer Center, University of North Carolina, 450 West Drive, CB #7295, Chapel Hill, NC 27599-7295. E-mail: firstname.lastname@example.org; fax: 919-966-3015.
Potential conflict of interest: D.Y.C. is currently an employee of Novartis.
Genetic alterations in specific driver genes lead to disruption of cellular pathways and are critical events in the instigation and progression of hepatocellular carcinoma (HCC). As a prerequisite for individualized cancer treatment, we sought to characterize the landscape of recurrent somatic mutations in HCC. We performed whole-exome sequencing on 87 HCCs and matched normal adjacent tissues to an average coverage of 59×. The overall mutation rate was roughly two mutations per Mb, with a median of 45 nonsynonymous mutations that altered the amino acid sequence (range, 2-381). We found recurrent mutations in several genes with high transcript levels: TP53 (18%); CTNNB1 (10%); KEAP1 (8%); C16orf62 (8%); MLL4 (7%); and RAC2 (5%). Significantly affected gene families include the nucleotide-binding domain and leucine-rich repeat-containing family, calcium channel subunits, and histone methyltransferases. In particular, the MLL family of methyltransferases for histone H3 lysine 4 was mutated in 20% of tumors. Conclusion: The NFE2L2-KEAP1 and MLL pathways are recurrently mutated in multiple cohorts of HCC. (Hepatology 2013;58:1693–1702)
Hepatocarcinogenesis is instigated by copy number alterations, mutations, and chromosomal rearrangements that activate oncogenes or inactivate tumor suppressors. Previous genetic characterization of hepatocellular carcinoma (HCC) has indicated significant heterogeneity among tumors, which has hampered the development of targeted therapy. Genomic and transcriptomic profiling studies have attempted to classify tumor molecular subgroups and have implicated several signaling pathways that are mutated in HCC.[1-3] The Wnt/β-catenin signaling pathway members CTNNB1, AXIN1, and AXIN2 are collectively mutated in up to half of tumors. The most frequently mutated tumor suppressor is TP53, which has mutations in over 20% of tumors. Over half of HCCs harbor gains of chromosome 1q and 8q, which include candidate oncogenes MCL1, SHC1, MYC, and COPS5/JAB1, among hundreds of other genes.[4-8]
To date, studies of the mutational spectrum of HCC have focused on a limited number of candidate genes. Advances in genome-sequencing technologies have enabled simultaneous analysis of thousands of expressed genes, accelerating the search for additional novel and recurrently mutated genes.[9-14] Recent studies have identified the adenosine triphosphate (ATP)-dependent nucleosome remodeling enzymes, ARID1A and ARID2, to be mutated in approximately 15% and 5% of tumors, respectively.[11-14] A regulator of the redox-signaling pathway, NFE2L2, is mutated in 6% of tumors. Other genes mutated at greater than 3% frequency include RPS6KA3, IL6ST, NRAS, KRAS, PIK3CA, PTEN, SAMD9L, DMXL1, and NLRP1.[11, 13, 15]
The genetic heterogeneity of HCC has complicated our understanding of the molecular basis of this disease. To further define the important recurrent and clinically actionable mutations in HCC, we embarked on a large-scale study of 87 tumors that was powered to detect mutated genes at a population prevalence of at least 10%. We hypothesized that multiple component genes of certain signaling pathways could be recurrently mutated in HCC.
Patients and Methods
Subjects were identified and informed consent was obtained from consecutive patients undergoing surgical resection for confirmed HCC at the University Health Network (Toronto, Ontario, Canada) and the University of North Carolina (Chapel Hill, NC). Five additional tumor specimens were procured from the Cooperative Human Tissue Network. All surgical specimens were processed according to established institutional HCC-tumor banking protocols. In brief, fresh surgical specimens underwent immediate gross pathologic examination by experienced liver pathologists. Samples were obtained from viable tumor and nontumor liver >2 cm away from the primary lesion. Tumors with evidence of earlier therapy, such as radiofrequency ablation or transarterial chemoembolization, were excluded. Tissue samples were bisected with half of the tissue stored in liquid nitrogen and the mirror sample retained for histologic confirmation. All tumor samples were histologically assessed for diagnosis and cellularity, and nontumor liver samples were confirmed to be free of tumor cells.
DNA Extraction and Sequencing Libraries
Tissue samples were thawed and weighed before homogenization. Genomic DNA (gDNA) and RNA were extracted with the Qiagen AllPrep kit (Qiagen, Valencia, CA), and quality was assessed on 2% agarose gel. gDNA for exome capture and sequencing was prepared using either the SureSelect Target Enrichment System for Illumina Paired-End Sequencing Library (Agilent Technologies, Santa Clara, CA), according to manufacturer's instructions (protocol versions 1.2 [May 2011] and 1.3.1 [February 2012]), or equivalent enzymes (New England Biolabs, Ipswich, MA) and Agilent's protocol (version 2.0.1; May 2010). Briefly, 2-3 μg of gDNA were sheared on a Covaris S220 (Covaris, Woburn, MA) to a 150-200-base-pair (bp) target size with the following instrument settings: 10% duty cycle; intensity 5; 200 cycles per burst; and six cycles of 60 seconds each in frequency-sweeping mode. Sheared fragments were ligated to Illumina's adapters and enriched by five to six cycles of amplification. Five hundred nanograms of amplified libraries were incubated with Agilent's whole-exome capture oligos for 24 hours at 65°C. Hybridized fragments were captured with streptavidin-coated beads, eluted, and amplified by 10-12 cycles of polymerase chain reaction (PCR) using either 1 of 12 of Agilent's indexed primers or nonindexed SureSelect GA PCR primers. Prepared libraries were pooled in batches of four and sequenced on an Illumina HiSeq 2000 (San Diego, CA) instrument generating 100-bp paired-end reads. Alternatively, individual libraries were sequenced on an Illumina GA II instrument generating 76-bp paired-end reads.
We enriched for protein-coding regions with the Agilent AllExon version 1 sequence capture baits for 10 tumors and matched normals, and with the Nimblegen Human exome version 1 sequence capture baits for five tumors and matched normals. These captured libraries were sequenced on the Illumina Genome Analyzer II to an average of 4.8× and 5.8×, respectively. An additional 72 tumors and matched normals were captured on the Agilent AllExon version 4 sequence capture baits and sequenced in pools of four samples with the Illumina HiSeq 2000 to an average of 59× coverage (Supporting Fig. 1).
Bioinformatics Sequence Analysis
Sequences were mapped to the human genome (NCBI37/hg19), excluding alternative haplotype chromosomes, using the Bowtie 2 alignment algorithm. Alignments were refined using The Genome Analysis ToolKit to mark PCR duplicate reads and perform base-quality recalibration.[17, 18] Alignments from each tumor and matched normal were then analyzed by the MuTect algorithm.[19, 20] In brief, MuTect includes a preprocessing step for sequence read qualities, a Bayesian classifier to assess the posterior probability of somatic mutations, and postprocessing of candidate mutations. Somatic mutations were assigned to transcript and amino acid coordinates using the ANNOVAR software suite.
Availability of Genetic Sequencing Data
Binary sequence alignment map files (BAM files) have been deposited in the National Center for Biotechnology Information dbGAP database.
Significantly Mutated Genes and Pathways
MutSig software (version 1.5) identified the list of significantly mutated genes among 87 HCCs (https://confluence.broadinstitute.org/display/CGATools/MutSig). Genes that harbored a greater number of mutations than expected by chance were detected with a binomial test. For each gene, the observed number of mutations across the 87 tumors was compared to the expected number based on the background mutation rates and the covered bases in all samples. The binomial probabilities were adjusted to false discovery rate (FDR) q-values with the Benjamini-Hochberg procedure and are reported in Table 2.
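The binomial test and FDR adjustment described above can be illustrated with a minimal sketch. This is not MutSig's implementation, which models background rates in more detail; it only shows the upper-tail binomial probability of seeing at least k mutations in n covered bases at a per-base background rate p, followed by the Benjamini-Hochberg step-up adjustment. The function names, gene labels, counts, and the background rate below are hypothetical values chosen for illustration.

```python
import math


def log_binom_pmf(i, n, p):
    """Log of P(X = i) for X ~ Binomial(n, p), with 0 < p < 1."""
    return (math.lgamma(n + 1) - math.lgamma(i + 1) - math.lgamma(n - i + 1)
            + i * math.log(p) + (n - i) * math.log1p(-p))


def binom_upper_tail(k, n, p):
    """P(X >= k): chance of at least k mutations among n covered bases."""
    if k <= 0:
        return 1.0
    total = 0.0
    for i in range(k, n + 1):
        term = math.exp(log_binom_pmf(i, n, p))
        total += term
        # Past the mean, terms shrink quickly; stop once they are negligible.
        if i > n * p and term < total * 1e-17:
            break
    return min(total, 1.0)


def benjamini_hochberg(pvalues):
    """Return BH-adjusted q-values in the same order as the input P values."""
    m = len(pvalues)
    order = sorted(range(m), key=lambda i: pvalues[i])
    qvalues = [0.0] * m
    running_min = 1.0
    # Walk from the largest P value down, enforcing monotone q-values.
    for rank in range(m, 0, -1):
        idx = order[rank - 1]
        running_min = min(running_min, pvalues[idx] * m / rank)
        qvalues[idx] = running_min
    return qvalues


# Hypothetical inputs: (observed mutations, covered bases summed over tumors).
background_rate = 2.5e-6  # assumed per-base rate, for illustration only
genes = {"GENE_A": (12, 100_000), "GENE_B": (1, 150_000), "GENE_C": (3, 400_000)}
pvals = [binom_upper_tail(k, n, background_rate) for k, n in genes.values()]
qvals = benjamini_hochberg(pvals)
```

Because n is the number of covered bases, longer or better-covered genes have a larger expected mutation count, which is how the test accounts for gene length.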
Table 1. Clinical Characteristics of HCC Samples
Patients N = 87 (%)
Patients may have more than one predisposing condition.
Gene families were downloaded in May 2012 from the HUGO Gene Nomenclature Committee database (http://www.genenames.org/genefamilies/a-z). For each gene family, we tested for an enrichment of mutations in the genes within the family relative to the genes outside of the family. For each individual, we calculated the per-base mutation rate among the exons of the genes within the gene family and among the exons of the genes outside of the gene family. We then tested whether the average mutation rate within a gene family was higher than the average mutation rate for genes outside the family using a one-sided paired t test.
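The gene-family comparison can be sketched as follows, assuming each tumor contributes one pair of per-base rates (inside and outside the family). The code computes the paired t statistic on the per-tumor differences and a one-sided P value by numerically integrating the Student t density; the helper names and the toy rates are hypothetical and are not taken from the study. In practice a statistics library routine for the paired t test would normally be used instead of the hand-rolled integration.

```python
import math


def per_base_rate(n_mutations, n_bases):
    """Mutations per covered base for one tumor and one set of exons."""
    return n_mutations / n_bases


def t_pdf(x, df):
    """Density of Student's t distribution with df degrees of freedom."""
    log_norm = (math.lgamma((df + 1) / 2) - math.lgamma(df / 2)
                - 0.5 * math.log(df * math.pi))
    return math.exp(log_norm - (df + 1) / 2 * math.log1p(x * x / df))


def t_sf(t, df, steps=2000):
    """P(T > t), via Simpson's rule on the density from 0 to |t|."""
    a = abs(t)
    if a == 0.0:
        return 0.5
    h = a / steps  # steps must be even for Simpson's rule
    total = t_pdf(0.0, df) + t_pdf(a, df)
    for i in range(1, steps):
        total += (4 if i % 2 else 2) * t_pdf(i * h, df)
    area = total * h / 3
    p = 0.5 - area if t > 0 else 0.5 + area
    return min(1.0, max(0.0, p))


def paired_t_one_sided(within, outside):
    """Test H1: mean(within - outside) > 0. Returns (t statistic, P value)."""
    diffs = [w - o for w, o in zip(within, outside)]
    n = len(diffs)
    mean = sum(diffs) / n
    var = sum((d - mean) ** 2 for d in diffs) / (n - 1)
    if var == 0.0:
        raise ValueError("all paired differences are identical")
    t_stat = mean / math.sqrt(var / n)
    return t_stat, t_sf(t_stat, n - 1)


# Toy per-tumor rates (mutations per Mb, hypothetical) for four tumors.
within_family = [3.0, 4.0, 5.0, 6.0]
outside_family = [1.0, 2.0, 2.0, 3.0]
t_stat, p_value = paired_t_one_sided(within_family, outside_family)
```

Pairing by tumor means that a tumor with a globally high mutation rate raises both values of its pair, so the test responds to enrichment within the family rather than to overall mutational burden.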
RNA Isolation From Tissue Samples and Quantitative Real-Time PCR
Total RNA was prepared using the Qiagen AllPrep kit (Qiagen), and quality was assessed on 2% agarose gel. MVP Human Liver Total RNA pool (Agilent Technologies) was introduced as a standard control. Complementary DNA (cDNA) was synthesized using 200-ng random primers (Thermo Scientific, Rockford, IL) and 200 U of M-MLV Reverse Transcriptase (Life Technologies, Grand Island, NY) from 2 μg of each total RNA sample, according to the manufacturer's instructions. All samples within an experiment were reverse transcribed under the same conditions, and the resulting cDNA was diluted (1:5) in nuclease-free water and stored in aliquots at −20°C until use. Real-time PCR with SYBR green detection was performed in a LightCycler 480 System (Roche Applied Science, Indianapolis, IN) in a total reaction volume of 20 μL in a 384-well plate. Each reaction contained 5 μL of diluted cDNA, 500 nM of each primer (as listed in Supporting Table 1), and 1× LightCycler 480 SYBR Green I master mix. The real-time PCR running protocol consisted of (1) 5-minute preincubation at 95°C, (2) amplification (10 seconds at 95°C, 10 seconds at 60°C, and 15 seconds at 72°C), (3) melting curve (10 seconds at 95°C, 65°C-97°C at 2.5°C/s, and a continuous fluorescent measurement), and (4) 10 seconds of cooling at 40°C. Relative quantitative analysis was carried out according to the 2^(−ΔΔCt) method.
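For readers unfamiliar with the ΔΔCt calculation, a minimal sketch follows. It assumes a single reference (housekeeping) gene and roughly 100% amplification efficiency, and it treats the pooled liver RNA standard as the calibrator; the reference gene is not named in this section, and the function name and Ct values in the usage note are hypothetical.

```python
def fold_change(ct_target_sample, ct_ref_sample,
                ct_target_calibrator, ct_ref_calibrator):
    """Relative expression of a target gene by the 2^(-ddCt) method.

    Each Ct is the threshold cycle measured for the target or reference
    gene, in the test sample or in the calibrator (assumed here to be the
    pooled liver RNA standard).
    """
    # Normalize the target gene to the reference gene within each specimen.
    delta_ct_sample = ct_target_sample - ct_ref_sample
    delta_ct_calibrator = ct_target_calibrator - ct_ref_calibrator
    # Compare the normalized sample against the normalized calibrator.
    delta_delta_ct = delta_ct_sample - delta_ct_calibrator
    # Each cycle of difference corresponds to a twofold change.
    return 2.0 ** (-delta_delta_ct)
```

For example, with hypothetical Ct values of 24 (target) and 18 (reference) in a tumor and 26 and 18 in the calibrator, ΔΔCt is 6 − 8 = −2, so `fold_change(24, 18, 26, 18)` gives a fourfold higher relative expression in the tumor.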
Descriptive characteristics of genetic and clinical variables were reported as frequencies and percentages for categorical variables; continuous variables were reported as medians and ranges. Comparisons of frequencies between genetic and clinical variables were performed using chi-square and Fisher's exact tests, where appropriate. Survival analyses were performed using Kaplan-Meier's method. Univariate survival analysis was performed using log-rank tests, and multivariate analyses were conducted using Cox's proportional hazards model.
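As an illustration of the Kaplan-Meier (product-limit) estimate used for the survival analyses, the sketch below computes the survival curve and the median survival time from follow-up times and event indicators. It is a simplified stand-in for a statistics package, with hypothetical function names and toy data, and it does not cover the log-rank test or the Cox model.

```python
def kaplan_meier(times, events):
    """Product-limit survival estimate.

    times:  follow-up time for each patient
    events: 1 if the event (recurrence or death) was observed, 0 if censored
    Returns a list of (time, survival probability) at each event time.
    """
    data = sorted(zip(times, events))
    at_risk = len(data)
    survival = 1.0
    curve = []
    i = 0
    while i < len(data):
        t = data[i][0]
        observed = 0
        leaving = 0
        # Group all patients who share the same follow-up time.
        while i < len(data) and data[i][0] == t:
            observed += data[i][1]
            leaving += 1
            i += 1
        if observed > 0:
            survival *= 1.0 - observed / at_risk
            curve.append((t, survival))
        # Both events and censored patients leave the risk set after time t.
        at_risk -= leaving
    return curve


def median_survival(curve):
    """First event time at which estimated survival is 0.5 or lower."""
    for t, s in curve:
        if s <= 0.5:
            return t
    return None  # median not reached during follow-up


# Toy data (months, hypothetical): the second patient is censored.
curve = kaplan_meier([1, 2, 3, 4], [1, 0, 1, 1])
```

Censored patients contribute to the risk set up to their last follow-up but never reduce the survival estimate themselves, which is why a median may be unreached when many patients are censored early.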
Clinical Characteristics of HCC Specimens
Complete clinical data were available for 87 of 89 (98%) tumor samples and are shown in Table 1. Cases included a mix of predisposing disease etiologies, including 43% and 21% of patients with hepatitis B and C, respectively. Nineteen cases had multifocal disease at time of surgery; however, only one tumor was submitted for analysis in each case. Positive staining for cytokeratin 19 (CK19) in >5% of cells was noted in 12 cases, and two tumors were fibrolamellar HCC. Median follow-up of cases was 33.8 months (range, 3-130). There were 3 (3.8%) perioperative deaths within 90 days. A total of 28 of 87 (32%) patients died, with a mean overall survival (OS) of 80.6 months and a 5-year OS estimated at 76% by Kaplan-Meier's analysis. During follow-up, there were a total of 44 recurrences for a median disease-free survival (DFS) of 39.1 months.
Significantly Mutated Genes in HCC Cohort
In total, we found 5,820 nonsynonymous mutations and 433 nonsense mutations in these 87 tumors (average, 66.1; range, 4-362) or 2.5 mutations per Mb sequenced (Fig. 1A). The somatic mutation rate is comparable to those reported in previous studies.[11-14] The mutational bias for CpG to A/T transversions in HCC was consistent with previous studies (Fig. 1B).
We followed standard statistical analyses to discriminate driver mutations from random mutations.[19, 24] We assumed that most of the mutations in cancer genomes represent background noise, whereas driver genes would be mutated more frequently than expected by chance. We used a binomial probability to estimate the expected number of mutations for each sample. This probability distribution corrects for gene length because of the assumption that longer genes will be expected to accumulate more mutations by chance. We calculated the background mutation rate using all of the nonsynonymous and synonymous mutations determined in all 87 samples. For each mutated gene, we then calculated the binomial probability of observing at least N mutations, given the background mutation rate. The P value was adjusted for multiple hypotheses using Benjamini-Hochberg's procedure for controlling FDR. In this analysis, we identified 13 genes that were significantly mutated from the discovery cohort, according to an FDR cutoff of 5% (Table 2).
The most frequently mutated genes in this cohort were the well-known oncogene, CTNNB1 (10%), and the tumor suppressor, TP53 (18%; Table 2). CTNNB1 mutations and activation of the Wnt pathway have been associated with large (>3 cm) tumors, poorly differentiated histology, tumor invasion and metastases, as well as HCV-associated HCC. TP53 mutations have been associated with all predisposing etiologies with specific Ser249 mutations associated with aflatoxin B exposure. KEAP1, encoding kelch-like ECH-associated protein 1, retains NFE2L2/NRF2 in the cytosol and regulates the Keap1-Nrf2 cell defense pathway. Previous studies have shown that the Keap1-Nrf2-signaling pathway mediates protective cellular responses to oxidative and xenobiotic damage.[26, 27] The roles of IGSF3, ATAD3B, and PCMTD1 have not been previously characterized in HCC.
Significant Enrichment of Mutations in Histone Methyltransferases
To further characterize the pattern of mutated genes and explore the significance of functional pathways in HCC, we analyzed mutations within known gene families (Table 3). Among four histone H3 lysine 4 methyltransferases of the MLL family, we validated 13 missense mutations by PCR and Sanger-based resequencing. We identified two tumors with MLL mutations, four with MLL2 mutations, one with MLL3 mutations, and six with MLL4 mutations (Fig. 2A-D). Among the MLL gene family, the MLL2 and MLL4 genes seem to be the most likely driver genes in HCC. MLL4 encodes mixed lineage leukemia-4, one of the MLL family of histone H3 lysine-4 (H3K4)-specific methyltransferases. Notably, MLL4 is a recurrent hotspot for hepatitis B virus (HBV) integration in nearly 12% of HCC genomes. MLL3 and MLL4 participate in transcriptional coactivator complexes and are necessary for expression of p53 target genes in response to DNA damage. Knockdown of MLL4 reduces cell-cycle progression and induces apoptosis.
Table 3. Significantly Mutated Gene Families in 87 HCCs
Columns: Gene Family; No. of Genes; Mutation Rate in Family; t Test (Rate)
Gene families listed: Nucleotide-binding domain and leucine-rich repeat-containing; Mixed lineage leukemia histone methyltransferases; Calcium channel subunits
Transcript Levels of Recurrently Mutated Genes
We further sought to confirm expression-level signatures of 13 recurrently mutated genes in tumor and liver samples used for sequencing analysis. Total RNA was extracted from 49 tumor samples, eight nontumor liver samples from HCC patients, and normal liver reference RNA. Among the tumors selected for expression analysis, 39 had mutations in recurrently altered genes and 10 lacked mutations in the genes of interest. In the nontumor liver specimens of HCC cases, overexpression of several genes was observed, including TP53 (six of eight samples), CTNNB1 (four of eight), ATAD3B (five of eight), PCMTD1 (five of eight), BRD9 (four of eight), TTLL2 (four of eight), TMEM170A (four of eight), TMEM51 (five of eight), and GJA1 (three of eight), whereas IGSF3 was underexpressed in five of eight samples (Fig. 3). Among samples with a confirmed TP53 mutation, the gene was overexpressed in five of nine and underexpressed in three of nine samples. CTNNB1 was overexpressed in seven of nine tumors with mutations in this gene. KEAP1 expression levels were similar in nontumor liver samples, compared to reference controls, but decreased expression was noted in four of six tumors with KEAP1 mutations. Increased expression of genes in samples harboring mutations was observed for CPA2 (three of six samples), ATAD3B (one of one), PCMTD1 (one of one), BRD9 (five of six), TTLL2 (two of four), TMEM170A (two of two), TMEM51 (three of three), and GJA1 (two of two).
Clinical Characteristics of HCC According to Mutation Status
HCC arising from hepatitis C infection demonstrated a significantly higher rate of CTNNB1 mutations (62.5% versus 37.5%; P = 0.038), confirming earlier reports associating mutations in β-catenin with hepatitis C virus (HCV). There was also a trend toward higher rates of microvascular invasion in HCC with MLL gene mutations (67% versus 45%; P = 0.11). Otherwise, there were no significant associations between individual gene or gene family mutations and clinical variables assessed.
Mutations in TP53 were associated with a significantly higher rate of recurrence (89% versus 40%; P = 0.006) and shorter DFS (median DFS: 7.9 versus 42.9 months; P = 0.001; Fig. 4). There was a trend toward decreased OS among TP53-mutated tumors (median OS: 26.0 versus 83.2 months; P = 0.1; Supporting Fig. 2). Tumors harboring mutations in the MLL family were associated with a trend toward earlier recurrence, with a median DFS of 28.9 months for MLL mutation carriers, compared to 45.8 months for cases without MLL mutations (P = 0.22), and may be associated with a more aggressive disease phenotype (Supporting Fig. 3). A trend toward lower rates of recurrence (12.5% versus 49.3%; P = 0.060) and prolonged DFS (P = 0.23) was observed in cases with CTNNB1 mutations, but did not reach statistical significance because of limited power. The presence of CPA2 and KEAP1 mutations was associated with decreased DFS; however, these analyses lacked sufficient statistical power.
Univariate analysis of DFS for all clinical and genetic variables identified tumor size (P = 0.042), multifocality (P = 0.077), and p53 mutation status (P = 0.001) as significant or borderline significant predictors of DFS (Table 4). Conditional multivariable survival analysis demonstrated that p53 mutation status was the only independent predictor of DFS (hazard ratio [HR] = 4.245; 95% confidence interval [CI]: 1.86-9.70; P = 0.02). Tumor multifocality was the only independent predictor of OS.
HCC is a genetically heterogeneous disease; this molecular diversity has led many groups to attempt to characterize HCC to improve our understanding of the genes and pathways involved in the etiology of this disease. The goal of genomic and transcriptomic profiling efforts in HCC is to develop a molecular classification of HCC that identifies characteristic driver genes that either predict prognosis or eventually could be developed as targets for tailored therapies. This study of 87 matched tumor-normal pairs more than doubles the number of HCC characterized by whole-exome sequencing, to a total of 158 tumors. As a result of limited sample sizes (ranging from 10 to 27 tumors), it should not be surprising that these studies have not yielded many overlapping genes. Indeed, larger sample cohorts with clinical follow-up data will be required to discern the prognostic significance of recurrently mutated genes.
An interesting emerging consensus from these HCC-sequencing studies is the prevalence of mutations in chromatin-regulatory enzymes. In particular, several studies have reported mutations in the SWI/SNF-related, ATP-dependent nucleosome remodelers, ARID1A and ARID2.[11-14] We only detected two mutations in ARID1A (2%) and one in ARID2 (1%), despite over 20× coverage of these genomic regions. However, our study concurs with recent reports of mutations in the MLL family of histone H3 lysine 4 methyltransferases, which can also be disrupted by genomic integration of HBV.[14, 28] The clinical characteristics of tumors harboring MLL gene mutations suggest that inactivation of the MLL gene family may be associated with an aggressive tumor phenotype. However, we have not evaluated the functional effect of these mutations on histone methylation. As more data on the MLL gene family are collected, further studies could assess how the most frequent mutations may impair enzymatic function or recruitment of these enzymes. Further work is needed to elaborate how disrupted chromatin regulators cooperate with alterations in known signaling pathways—such as the Wnt/β-catenin pathway or Myc targets—in tumor progression, cellular differentiation, or gene expression.
Woo et al. had previously demonstrated worse OS associated with p53 mutations in a cohort of predominantly Chinese HCC patients with HBV etiology. This study complements those findings by demonstrating the prognostic value of p53 mutation status in a North American series of HCC patients of mixed etiology (HBV/HCV). Combined, these data demonstrate that p53 mutation is associated with recurrence and DFS, oncologic outcomes that reflect an aspect of tumor biology, as well as OS, which includes death from both HCC and the underlying liver disease. The observation of p53 as an independent prognostic factor with an ability to predict outcomes in addition to tumor size and number may have important clinical implications in predicting outcomes for patients before treatment, such as resection or transplantation.
Sorafenib represents the first molecularly targeted therapy for HCC, and the vast majority of HCC clinical trials are currently evaluating the efficacy of tyrosine kinase inhibitors.[32, 33] However, the combined analysis of whole-exome sequencing from 158 tumors reveals that no single protein kinase is mutated at more than 5% frequency in HCC.[9-14] This scarcity of kinase mutations suggests that HCC might be rarely susceptible to the dramatic responses to kinase inhibitors that are observed in other cancer types.[34, 35] In contrast, the frequent mutation of MLL histone methyltransferases, together with that of the ARID ATP-dependent nucleosome remodeling enzymes identified in previous studies, suggests that epigenetic regulatory enzymes may represent important target genes in HCC.
Because most studies to date have been conducted on surgically resected tumors, we have little knowledge about the genetic alterations that occur in either very early lesions treated with ablative modalities or later stages of HCC progression that are not amenable to surgical treatment. Our understanding of tumor evolution could be improved by more-sensitive technologies that could sequence gDNA from core biopsy specimens. Another confounding issue with genomic profiling is the high rate of intratumor heterogeneity. Indeed, a pioneering study demonstrated considerable clonal heterogeneity within a single tumor lesion, with allelic frequencies as low as 13%.
In this series, we present the whole-exome sequencing analysis of a large, diverse series of HCC tumors and matching normal liver tissue. Our results support the genetic heterogeneity of HCC in that most genes were mutated in few (<20%) of the samples analyzed; however, analysis of gene families has indicated potentially important pathways, including MLL and NFE2L2-KEAP1, that are altered in subsets of tumors. Overexpression of several genes of interest was observed in tumors with identified mutations, but also in adjacent nontumor liver samples, which suggests a role of these genes in the premalignant “field effect” that is observed in the unaffected liver of HCC patients. We observed phenotypic differences in HCC according to gene mutation status, including p53 mutation status as an independent predictor of recurrence-free survival. Several other genes of interest demonstrated trends in time and risk of recurrence; these observations were limited by sample size and require further investigation in larger studies. The lack of correlation between mutation status and traditional prognostic features, such as tumor size, number, and vascular invasion, indicates that mutational profiling may enhance our ability to develop more-predictive models of tumor behavior. Further investigation is required to define the full breadth of gene mutations in HCC and to identify clinically relevant genes and pathways that can enhance our understanding of hepatocarcinogenesis and support the development of individualized therapy based on HCC genetic signatures.
The authors thank Bert O'Neil for a critical revision of this manuscript.