Genomic landscape of copy number aberrations enables the identification of oncogenic drivers in hepatocellular carcinoma


  • Potential conflict of interest: KW, SS, SD, TX, ZZ, YW, DP, WJY, PAR, MM and JX are or were employed by Pfizer Inc. KW, SS, SD, YW, and DP own stock in Pfizer.

  • Data availability: Both gene expression and SNP genotyping array data have been deposited in Gene Expression Omnibus under accession numbers GSE36376 and GSE38326, respectively.


Cancer is a genetic disease with frequent somatic DNA alterations. Studying recurrent copy number aberrations (CNAs) in human cancers would enable the elucidation of disease mechanisms and the prioritization of candidate oncogenic drivers with causal roles in oncogenesis. We have comprehensively and systematically characterized CNAs and the accompanying gene expression changes in tumors and matched nontumor liver tissues from 286 hepatocellular carcinoma (HCC) patients. Our analysis identified 29 recurrently amplified and 22 recurrently deleted regions with a high level of copy number changes. These regions harbor established oncogenes and tumor suppressors, including CCND1 (cyclin D1), MET (hepatocyte growth factor receptor), CDKN2A (cyclin-dependent kinase inhibitor 2A) and CDKN2B (cyclin-dependent kinase inhibitor 2B), as well as many other genes not previously reported to be involved in liver carcinogenesis. Pathway analysis of cis-acting genes in the amplification and deletion peaks implicates alterations of core cancer pathways, including cell-cycle, p53 signaling, phosphoinositide 3-kinase signaling, mitogen-activated protein kinase signaling, Wnt signaling, and transforming growth factor beta signaling, in a large proportion of HCC patients. We further credentialed two candidate driver genes (BCL9 and MTDH) from the recurrent focal amplification peaks and showed that they play a significant role in HCC growth and survival. Conclusion: We have demonstrated that characterizing the CNA landscape in HCC will facilitate the understanding of disease mechanisms and the identification of oncogenic drivers that may serve as potential therapeutic targets for the treatment of this devastating disease. (Hepatology 2013;58:706–717)


Abbreviations: AJCC, American Joint Committee on Cancer; BH, Benjamini-Hochberg's method; CCND1, cyclin D1; CDKN, cyclin-dependent kinase inhibitor; CGC, Cancer Gene Census; CFAs, colony formation assays; CHD1L, chromodomain helicase DNA binding protein 1-like; CIN, chromosome instability index; CNA, copy number aberration; DFS, disease-free survival; DSS, disease-specific survival; FDR, false discovery rate; FGF19, fibroblast growth factor 19; GAPDH, glyceraldehyde-3-phosphate dehydrogenase; HBV, hepatitis B virus; HCC, hepatocellular carcinoma; HCV, hepatitis C virus; IHC, immunohistochemical; IRB, institutional review board; MAPK, mitogen-activated protein kinase; MET, hepatocyte growth factor receptor; mRNA, messenger RNA; MSigDB, the Molecular Signatures Database; PI3K, phosphoinositide 3-kinase; siRNA, small interfering RNA; SNP, single-nucleotide polymorphism; TGF-β, transforming growth factor beta; VEGF, vascular endothelial growth factor.

Hepatocellular carcinoma (HCC) is the fifth-most common cancer and the third-most common cause of cancer-related death worldwide. It has high prevalence in Southeast Asia because of endemic hepatitis B virus (HBV) infection and is refractory to nearly all currently available anticancer therapies.[1] Extensive studies of HCC have implicated aberrant activation of many signaling pathways involved in cellular proliferation,[2] survival,[3] differentiation,[4] and angiogenesis.[5] Although these studies have increased the understanding of HCC tumorigenesis, few studies provide reliable information on how frequently these targets and pathways are altered in HCC patients. A number of genome-wide gene expression profiling studies have been performed using clinical samples from various geographic regions across the world: These studies have highlighted specific genes and molecular pathways in the pathogenesis of HCC and have proposed molecular classifications of HCC.[5-7] To further elucidate the mechanism of hepatocarcinogenesis, it is useful to reconstruct molecular events at both the gene expression and DNA copy number levels. With the rapid development of high-density single-nucleotide polymorphism (SNP) array and array-based comparative genomic hybridization, it has become feasible to characterize CNAs involved in tumor development and progression across the entire genome. Several groups have applied these technologies to identify copy number aberrations (CNAs) in HCC and nominated putative driver genes.[5, 8-10] However, many of the previous studies were limited by the modest size of the studied cohorts, whereas others lacked a coherent dataset, including both copy number and gene expression measurements from the same set of patients, which hindered a fully integrated analysis. 
It is also useful to comprehensively characterize HCC cell line models so that putative driver genes that are driven by CNAs can be studied in preclinical models carrying the matching genetic alterations. Toward this end, a comprehensive collection of characterized HCC cell line models is still lacking.

In this study, we comprehensively and systematically analyzed the genome-wide CNAs and accompanying gene expression changes in 286 primary HCCs and 30 HCC cell lines. This allowed us to characterize the genomic landscape of HCC and to identify regions in the HCC genome that have undergone recurrent high-level focal amplifications or deletions. Pathway analysis of cis-acting genes in these CNA regions suggests that key cancer pathways are altered in a significant proportion of HCC patients. We further proposed a fully integrated approach to identify candidate oncogenic drivers from recurrent focal amplicons and credentialed two candidate drivers (BCL9 and MTDH) by demonstrating that they play a significant role in tumor growth and survival in relevant HCC cell line models.

Patients and Methods

Patient Samples and HCC Cell Lines

A total of 286 pairs of fresh frozen tumor and adjacent nontumor liver tissues containing no necrosis or hemorrhage were collected from primary HCC patients who were treated with surgical resection at Samsung Medical Center (Seoul, Korea) from July 2000 to May 2006 (Table 1 and Supporting Table 1; Supporting Materials). Informed consent was obtained from each patient included in the study. This study was approved by the institutional review board (IRB) of Samsung Medical Center (IRB approval no.: SMC 2010-04-083). A list of HCC cell lines used in this study and their sources can be found in Supporting Table 2.

Table 1. Major Demographic and Clinicopathological Characteristics of the HCC Cohort and Their Associations With the CIN Score Inferred Based on CNA Data

Characteristics | Category | No. of Patients | Percentage | Average CIN Score | P Value
Edmondson grade | I | 33 | 11.5 | 0.097 | <0.0001^a
Tumor size, cm | <5 | 185 | 64.7 | 0.137 | <0.0001^b
Serum AFP, ng/mL | <20 | 108 | 39.3 | 0.135 | 0.0036^b
Intrahepatic metastasis | Absent | 223 | 78.0 | 0.141 | 0.0001^b
Vascular invasion | Absent | 133 | 46.5 | 0.127 | <0.0001^b
Child Pugh class | A | 273 | 95.5 | 0.151 | 0.0012^a
AJCC T stage | I | 120 | 42.0 | 0.128 | <0.0001^a
BCLC stage | 0-A | 160 | 56.0 | 0.138 | 0.2090^a

^a Linear trend test.
^b One-way analysis of variance test.
Abbreviations: AFP, alpha-fetoprotein; BCLC, Barcelona Clinic Liver cancer classification.

Table 2. Top Amplification and Deletion Peaks in HCC

CNA Type | Coordinates | Cytoband | Q Value | CN^a | Freq^b (%) | Ng^c | Cancer Genes
Amp | chr11:68927460-69253688 | 11q13.2 | 6.17E-14 | 4.26 | 4.9 | 3 | CCND1, FGF19
Del | chr1:611233-64445113 | 1p36.11 | 0.0466 | 1.21 | 7.0 | 720 | CDKN2C, ARID1A
Del | chr9:21855960-22437906 | 9p21.3 | 9.53E-37 | 1.05 | 12.6 | 2 | CDKN2A, CDKN2B

Shown are peaks with GISTIC2 residual Q values ≤0.05 and peak frequency ≥4%. The Cancer Genes column lists representative cancer genes in each CNA peak selected based on the CGC. BCL9 and MTDH were identified as putative HCC drivers in this study. Full sets of genes under each CNA peak can be found in Supporting Table 3.
^a Estimated copy number.
^b High peak frequency using cutoffs of 3 and 1.3 for amplifications and deletions, respectively.
^c Total number of genes in the peak.

SNP Genotyping Array and Gene Expression Array Data

Gene expression profiling was performed at Expression Analysis (Durham, NC) on Illumina Human HT-12 v4 BeadChips, according to the array manufacturer's protocol. Data were processed using an in-house pipeline to derive gene-summarized expression values (Supporting Materials). Genotyping was performed on the Human Omni1-Quad BeadChip by Illumina FastTrack Services (Illumina, San Diego, CA), where samples were processed according to the manufacturers' instructions. Raw data were processed using an in-house pipeline to obtain copy number segments and gene-summarized copy number estimates (Supporting Materials).

Copy Number Data Analysis

In primary HCC samples, copy number gain and loss cutoffs were selected to be 2.3 and 1.7, respectively, based on an assessment of replicate samples from the same SNP arrays. Copy numbers ≥3 and ≤1.3 were considered high-level amplifications and deletions, respectively, which represent conservative thresholds as primary tumor samples are typically a mixture of tumor cells and surrounding or infiltrating stromal cells. In HCC cell lines, we used a minimum copy number cutoff of 2.7 to select models with amplifications and treated models with copy numbers >1.7 and <2.3 as copy number neutral. GISTIC2 analysis[11] was performed on segmented copy number data using default parameters.
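The cutoffs described above can be read as a simple per-estimate classifier. The following is a minimal Python sketch, not the authors' pipeline: the function names and label strings are hypothetical, and treating the 2.3 and 1.7 gain/loss cutoffs as strict inequalities is an assumption, since the text does not say whether they are inclusive.

```python
def classify_primary_cn(cn: float) -> str:
    """Label a copy number estimate from a primary HCC sample.

    Cutoffs follow the text: high-level amplification >= 3,
    high-level deletion <= 1.3, gain above 2.3, loss below 1.7.
    Strictness of the 2.3 / 1.7 boundaries is an assumption.
    """
    if cn >= 3.0:
        return "high_amp"
    if cn <= 1.3:
        return "high_del"
    if cn > 2.3:
        return "gain"
    if cn < 1.7:
        return "loss"
    return "neutral"


def classify_cell_line_cn(cn: float) -> str:
    """Label a copy number estimate from an HCC cell line.

    The text defines only two classes for cell lines: amplified
    (copy number of at least 2.7) and neutral (between 1.7 and 2.3).
    Anything else is returned as "unclassified" (a hypothetical label).
    """
    if cn >= 2.7:
        return "amplified"
    if 1.7 < cn < 2.3:
        return "neutral"
    return "unclassified"
```

For example, an estimated copy number of 4.26 in a primary tumor would be labeled a high-level amplification, whereas 2.5 would count as a gain only.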

Chromosome Instability Index Score

We devised a chromosome instability index (CIN) score to measure the degree of CNAs across the entire genome of a tumor, taking into account both the total chromosomal regions that are altered in a tumor and the amplitude of these alterations. Specifically, for a tumor genome segmented into L segments, where l_i and a_i are the length and mean value (as log2 intensity ratios between tumors and matched normal samples) of segment i, the CIN score is defined as shown in Equation (1):

\[
\mathrm{CIN} = \frac{\sum_{i=1}^{L} l_i \, |a_i|}{\sum_{i=1}^{L} l_i} \tag{1}
\]
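As an illustration of how such a score could be computed from segmented data, the Python sketch below assumes the CIN score is the length-weighted mean of the absolute segment log2 ratios. That form is an assumption based on the definitions of segment length and mean value given above, and the function name `cin_score` is hypothetical.

```python
def cin_score(segments):
    """Length-weighted mean of absolute log2 ratios.

    `segments` is an iterable of (length, log2_ratio) pairs, one per
    copy number segment of a single tumor. The weighting scheme is an
    assumed reading of the CIN definition, not a confirmed formula.
    """
    segments = list(segments)
    total_length = sum(length for length, _ in segments)
    if total_length <= 0:
        raise ValueError("segments must have a positive total length")
    weighted = sum(length * abs(log2_ratio) for length, log2_ratio in segments)
    return weighted / total_length
```

Under this reading, a genome with half of its length unaltered, a quarter at a log2 ratio of 0.5, and a quarter at -0.3 would score 0.2; both the extent and the amplitude of the alterations raise the score.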

Statistical Analysis

Associations with disease-specific survival (DSS) and disease-free survival (DFS) were assessed using Cox's proportional hazards regression model (see Supporting Materials for definition of DSS and DFS). P values were corrected for multiple testing (of all genes on the microarray) using Benjamini-Hochberg's (BH) method[12] to calculate the false discovery rate (FDR). To assess the ability of copy number traits to predict patient outcomes, we compared the number of copy number traits that are associated with clinical outcomes to the number from a permutated dataset where the sample labels were randomly shuffled for each trait independently. cis-correlations between a gene's copy number and its own messenger RNA (mRNA) expression across tumors were calculated using Pearson's correlation. P values associated with the resulting correlation coefficients were corrected for multiple hypotheses testing using the BH method. The null correlation distribution was obtained by shuffling the sample label between each copy number and expression vector independently for all genes.
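The two generic building blocks named above, Pearson's correlation with a label-shuffled null and the BH adjustment, can be sketched in plain Python as follows. This is an illustrative sketch only: the function names, the number of permutations, and the seed are hypothetical, and the in-house implementation is not described in the text.

```python
import math
import random


def pearson_r(x, y):
    """Pearson correlation between two equal-length numeric sequences."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def null_correlations(copy_number, expression, n_perm=100, seed=0):
    """Null cis-correlations obtained by shuffling sample labels.

    The expression vector is shuffled relative to the copy number
    vector; n_perm and seed are illustrative defaults.
    """
    rng = random.Random(seed)
    shuffled = list(expression)
    null = []
    for _ in range(n_perm):
        rng.shuffle(shuffled)
        null.append(pearson_r(copy_number, shuffled))
    return null


def bh_adjust(pvalues):
    """Benjamini-Hochberg adjusted P values, returned in input order."""
    m = len(pvalues)
    order = sorted(range(m), key=lambda i: pvalues[i])
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest P value down, enforcing monotonicity.
    for rank in range(m, 0, -1):
        idx = order[rank - 1]
        running_min = min(running_min, pvalues[idx] * m / rank)
        adjusted[idx] = running_min
    return adjusted
```

Genes whose adjusted P value is at or below the chosen FDR threshold would then be called significant.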

Pathway Analysis

Genes with expression changes driven by somatic CNAs were selected from GISTIC2 amplification or deletion peaks with significant cis-correlation (FDR ≤0.05). We used the canonical pathway database from the Molecular Signatures Database (MSigDB),[13] excluding gene sets with fewer than 10 or more than 500 members. Overrepresentation of selected genes among these pathways was assessed using Fisher's exact test. The FDR was calculated based on 100 permutations where random gene sets of the same size were tested. The final top 17 pathways were selected based on (1) enrichment FDR ≤0.05 and (2) at least 30% of HCCs in the studied cohort having at least one gene in the pathway altered by somatic CNAs.
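The overrepresentation test can be illustrated with a small Python sketch. It assumes a one-sided (upper-tail) Fisher's exact test, which is equivalent to a hypergeometric tail probability; the text does not state which tail was used, and the function and parameter names here are hypothetical.

```python
from math import comb


def enrichment_p(n_universe, n_pathway, n_selected, n_overlap):
    """One-sided Fisher's exact P value, P(overlap >= n_overlap).

    n_universe: all genes considered
    n_pathway:  genes in the pathway
    n_selected: cis-acting genes selected from CNA peaks
    n_overlap:  selected genes that fall in the pathway
    """
    upper = min(n_pathway, n_selected)
    total = comb(n_universe, n_selected)
    tail = sum(
        comb(n_pathway, k) * comb(n_universe - n_pathway, n_selected - k)
        for k in range(n_overlap, upper + 1)
    )
    return tail / total


def fold_enrichment(n_universe, n_pathway, n_selected, n_overlap):
    """Observed pathway fraction among selected genes over the background fraction."""
    return (n_overlap / n_selected) / (n_pathway / n_universe)
```

With a toy universe of 10 genes, a 4-gene pathway, and 3 selected genes of which 2 are in the pathway, the P value is 1/3 and the fold enrichment is about 1.67.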

Quantitative Reverse-Transcription Polymerase Chain Reaction

Total RNA was extracted from cell lines using the RNeasy Plus Mini Kit (Qiagen, Valencia, CA). The Taqman gene expression assay was performed using the TaqMan RNA-to-CT 1-step Kit protocol (catalog no.: 4392938; Applied Biosystems, Foster City, CA), according to the manufacturer's instructions. Data were derived from three independent experiments. Data analysis was performed using Stratagene's software (Stratagene, La Jolla, CA), where threshold cycle values were unlogged and normalized by the glyceraldehyde-3-phosphate dehydrogenase (GAPDH) reference. Knockdown percentage was calculated as percent reduction in average signal from siBCL9 or siMTDH cells, relative to siControl cells (set to 100%), in each assay.
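The normalization and knockdown calculation can be sketched as follows. This Python example assumes that "unlogged" means converting each threshold cycle to 2^(-Ct), which presumes perfect doubling per cycle; the function names are hypothetical, and the vendor software may apply additional corrections.

```python
def relative_expression(ct_target, ct_reference):
    """Unlog threshold cycles and normalize the target by the reference gene.

    Assumes signal is proportional to 2 ** -Ct (perfect amplification
    efficiency), with GAPDH as the reference in the text.
    """
    return (2.0 ** -ct_target) / (2.0 ** -ct_reference)


def percent_reduction(treated, control):
    """Percent reduction of the mean treated signal relative to the mean control.

    The control mean is treated as 100%.
    """
    mean_treated = sum(treated) / len(treated)
    mean_control = sum(control) / len(control)
    return 100.0 * (1.0 - mean_treated / mean_control)
```

For instance, normalized signals averaging 0.25 in knockdown cells against 1.0 in control cells correspond to a 75% knockdown. The same percent-reduction helper would apply to proliferation signals or colony counts.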

Cell Proliferation Assay

The cell proliferation assay was performed using the CyQuant Direct Cell Proliferation assay (Life Technologies Corporation, Carlsbad, CA), according to the manufacturer's protocol. Data were derived from five independent experiments. Percent inhibition was calculated as percent reduction in average signal from siBCL9 or siMTDH cells, relative to siControl cells (set to 100%), in each assay. P values between siControl and siBCL9 or siMTDH samples were calculated using a two-sample t test.

Soft Agar Assay

Cells expressing small interfering RNAs (siRNAs) targeting BCL9, MTDH, or control were suspended in a top layer of RPMI growth media and 0.35% Ultrapure LMP agar (Life Technologies) and plated on a bottom layer of growth media and 0.6% LMP agar in a 96-well plate. Soft agar colonies were stained with 0.5 μM of calcein-AM solution (Life Technologies) and counted 5-14 days after plating with an Acumen eX3 multiplate reader (TTP LabTech Ltd., Melbourn, UK). Data were derived from five independent experiments. Percent inhibition was defined as percent reduction in average number of colonies formed in siBCL9 or siMTDH cells, relative to siControl cells (set to 100%), in each assay. P values between siControl and siBCL9 or siMTDH samples were calculated using a two-sample t test.


Results

Genomic Landscape of HCC

To characterize the genomic landscape of HCC, we compiled a collection of snap-frozen tumor and adjacent nontumor liver tissues from 286 patients who were treated with surgical resection (Table 1). Both RNA and DNA were isolated from all samples and profiled on the Illumina Human HT-12 v4 BeadChips and Human Omni1-Quad SNP genotyping arrays (Illumina), respectively.

Based on the SNP genotyping array data, we derived the somatic copy number profiles of the 286 HCCs using their matched nontumor liver tissue as references. On average, there are 200 somatic copy number gain events and 247 somatic copy number loss events per HCC, accounting for 12.0% and 11.3% of the genome, respectively. A genome-wide view of the segmented copy numbers revealed that most chromosome arms have undergone large-scale copy number gains or losses, with frequent gains observed on 1q, 6p, 7p, 7q, 8q, 13q, and 17q and frequent losses on 1p, 4q, 8p, 9p, 9q, 13p, 16p, and 16q (Fig. 1A). We also devised a CIN score, which is a single metric that summarizes the extent of CNAs in individual tumors (see Patients and Methods). We found that the CIN scores were positively associated with various features of tumor progression, such as American Joint Committee on Cancer (AJCC) stage, Edmondson grade, and tumor size, in agreement with our understanding of somatic CNAs as a cumulative process as a tumor advances (Table 1). On the other hand, the CIN scores were negatively associated with patients' age, the Child-Pugh score, and cirrhosis, which reflect overall liver function and pathological state of the non-HCC liver (Table 1). In addition to clinical HCC samples, we also profiled 30 HCC cell lines on the same gene expression and SNP genotyping array platforms. Overall, the spectrum of CNAs in HCC cell lines recapitulates primary HCCs (Fig. 1A).

Figure 1.

CNA landscape of primary HCCs and HCC cell lines. (A) Heatmap showing the CNA pattern of HCC genome. Columns represent markers along chromosomes and are sorted according to their genomic coordinates; rows represent tumors separated into primary HCCs (upper) and HCC cell lines (lower). Copy number changes of the 22 autosomes are shown in shades of red for copy number gains and shades of blue for copy number losses. Histograms on top of the heatmap show frequency of copy number gains (above the horizontal axis and in red) and losses (below the horizontal axis and in blue) in the primary HCC population. (B) Focal amplification and deletion peaks identified by GISTIC2 in primary HCCs. For both plots, markers on SNP arrays are plotted on the vertical axis and sorted by their genomic coordinates; horizontal axes show the statistical significance of each peak as Q values determined by GISTIC2. Vertical green lines represent the default GISTIC2 Q value cutoff of 0.25. Tick marks indicate the positions of high confidence peaks reported in Table 2, with number of genes in each peak shown in brackets and representative known cancer genes shown in parentheses.

To assess the extent to which somatic CNAs in HCC drive downstream transcriptional programs, we calculated the correlation between a gene's somatic copy number and its mRNA expression in cis across our patient cohort. Overall, there were 3,152 genes for which at least 10% of the expression variation could be explained by their own copy number changes (i.e., correlation coefficient ≥0.316), whereas by chance only one gene was expected at the same level of correlation (FDR = 3.17 × 10^-4) (Supporting Fig. 1A). This is consistent with previous studies[14] and suggests that somatic CNAs contribute significantly to the expression landscape in HCC. In addition, somatic copy numbers of 661 and 206 genes were significantly associated with DSS and DFS in our cohort, respectively (P < 1 × 10^-4), whereas by chance one would expect only two genes and one gene, respectively, at the same P value cutoff (Supporting Fig. 1B). Hence, somatic CNAs in HCC are clinically relevant and may provide novel prognostic markers. We also observed a nonrandom distribution of CNA-to-CNA correlations in which unlinked loci were frequently correlated with each other (Supporting Fig. 2). As expected, adjacent loci were highly correlated, whereas at a higher level some chromosome arms became either unlinked (e.g., 6p versus 6q and 17p versus 17q) or anticorrelated (e.g., 1p versus 1q and 8p versus 8q). In addition, numerous correlations between unlinked loci were observed, suggesting coselection of these genomic regions (e.g., 1p versus 16p, 1q versus 4q, and 5q versus 19q), as previously reported.[14]
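The 0.316 cutoff follows from treating the squared Pearson correlation as the fraction of expression variance explained by copy number. The short derivation below makes that step explicit, assuming a simple linear relationship between the two and restricting to positive correlations:

```latex
r^2 \ge 0.10 \;\Longrightarrow\; r \ge \sqrt{0.10} \approx 0.316 \qquad (r > 0)
```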

Figure 2.

Validation of BCL9 as an oncogenic driver in HCC. (A and B) Relationship between BCL9 somatic copy number and mRNA expression (A), and between BCL9 mRNA and protein expression (B), in primary HCCs. P value for the mRNA-protein association was derived from a linear trend test. (C and D) qRT-PCR data showing knockdown efficiency of siRNA targeting BCL9 in all tested cell line models for CyQuant proliferation assays (C) and CFAs (D). BCL9 expression levels were normalized by the GAPDH reference and set to be 1 in siControl experiments. Data were derived from three independent experiments and plotted as mean ± standard deviation. (E and F) Quantification of proliferation by CyQuant (Life Technologies Corporation, Carlsbad, CA) proliferation assays (E) and CFAs (F) in BCL9 and control siRNA-transfected cells. Percentage numbers shown above the bars for each cell line denote the average percent of inhibition in BCL9 siRNA-transfected cells, relative to siControl transfected cells. Asterisk indicates that the P value between siBCL9 and siControl is ≤0.05. The two HCC models with BCL9 amplification and the two unamplified control models were labeled by square brackets on the x axis.

Identification of Candidate Oncogenic Drivers in HCC

Although the overall CNA pattern is broadly consistent with the literature on HCC,[5, 9, 10, 14] the size and quality of our dataset should provide greater power to accurately localize and identify both large-scale and focal chromosomal alterations. To identify regions of copy number changes that may be responsible for driving tumorigenesis, we applied the GISTIC2 algorithm,[11] which incorporates both amplitude and frequency of CNAs to determine their statistical significance. Amplification or deletion peaks identified by GISTIC2 represent recurrent overlapping CNAs among multiple tumors, thus providing a finer resolution for mapping putative oncogenes and tumor-suppressor genes. Our GISTIC2 analysis identified 146 focal events, including 99 amplification peaks and 47 deletion peaks (Fig. 1B; Supporting Table 3). The median size of amplification peaks is 0.24 Mb (ranging from 1.5 kb to 11.6 Mb), containing an average of ∼5 genes per peak (excluding peaks that contain no genes, or “gene-less” hereafter). The median size of deletion peaks is 2.8 Mb (ranging from 46 kb to 122 Mb), containing an average of ∼100 genes per peak. We found that amplification peaks were significantly smaller than deletion peaks (P = 2.6 × 10^-7; Supporting Fig. 3), and that genes under the amplification peaks tended to have stronger cis-correlation than those under deletion peaks, whereas both showed stronger cis-correlation compared to genes not located within any peak (Supporting Fig. 3). These observations support the disease relevance of the CNA peaks and are consistent with the assumption that oncogene activation is more locus specific than tumor-suppressor inactivation in cancer. We also thoroughly examined the association of GISTIC2 peaks with clinical and outcome variables (summarized in Supporting Table 4).

Table 3. Top Pathways Overrepresented Among cis-Acting Genes in CNA Peaks

Canonical Pathway | Fold Enrichment^a | P Value | FDR | % HCCs^b
Pathways in cancer | 1.90 | 0.00015 | 0.00067 | 50.3
Cell cycle | 3.29 | 0 | 0 | 44.8
Ubiquitin-mediated proteolysis | 3.29 | 0 | 0 | 39.5
Wnt signaling pathway | 2.01 | 0.0038 | 0.0094 | 38.1
TGF-β signaling pathway | 2.35 | 0.0048 | 0.012 | 37.1
Insulin signaling pathway | 3.07 | 0 | 0 | 36.4
p53 pathway | 3.17 | 0.00019 | 0.00071 | 36.4
Oxidative phosphorylation | 2.87 | 0.000005 | 0.000047 | 35.7
Basal transcription factors | 4.67 | 0.000033 | 0.00025 | 35.7
ChREBP2 pathway | 3.44 | 0.00097 | 0.0031 | 34.6
eIF4 pathway | 6.31 | 0.000005 | 0.000046 | 32.9
Neurotrophin signaling pathway | 2.40 | 0.00048 | 0.0018 | 32.2
MAPK signaling pathway | 1.70 | 0.0051 | 0.012 | 31.5
PI3K pathway | 3.57 | 0.0028 | 0.00672 | 30.4

^a Calculated against all human genes in the MSigDB.
^b Percent of HCC patients with at least one gene altered in the pathway.
Abbreviations: ChREBP2, carbohydrate response element-binding protein 2; eIF4, eukaryotic translation initiation factor 4.

Figure 3.

Validation of MTDH as an oncogenic driver in HCC. (A and B) Relationship between MTDH somatic copy number and mRNA expression (A), and between MTDH mRNA and protein expression (B), in primary HCCs. P value for the mRNA-protein association was derived from a linear trend test. (C and D) qRT-PCR data showing knockdown efficiency of siRNA targeting MTDH in all tested cell line models for CyQuant (Life Technologies Corporation, Carlsbad, CA) proliferation assays (C) and CFAs (D). MTDH expression levels were normalized by the GAPDH reference and set to be 1 in siControl experiments. Data were derived from three independent experiments and plotted as mean ± standard deviation. (E and F) Quantification of proliferation by CyQuant proliferation assays (E) and CFAs (F) in MTDH and control siRNA transfected cells. Percentage numbers shown above the bars for each cell line denote the average percent of inhibition in MTDH siRNA-transfected cells, relative to siControl transfected cells. Asterisk indicates that the P value between siMTDH and siControl is ≤0.05. The two HCC models with MTDH amplification and the two unamplified control models were labeled by square brackets on the x axis.

We next focused on higher-confidence peaks with a residual Q value (by GISTIC2) ≤0.05 and a high-level alteration frequency of at least 4% in our cohort, resulting in 29 amplification peaks and 22 deletion peaks (excluding gene-less peaks) (Table 2). The most highly amplified peak is located at chromosome 11q13.2 and contains three genes, including cyclin D1 (CCND1) and fibroblast growth factor 19 (FGF19), both of which have recently been reported to be amplified in HCC and validated as bona fide HCC drivers.[9] Hepatocyte growth factor receptor (MET), one of 10 genes in the amplification peak located at 7q31.2, encodes the receptor for hepatocyte growth factor and has been implicated as an oncogene in several cancer types, including HCC.[2] Many clinical compounds are available that specifically inhibit MET, thus providing an actionable path forward for testing MET as a potential target in HCC. Another gene of interest is chromodomain helicase DNA binding protein 1-like (CHD1L), which has been shown to interact with poly(ADP-ribose) and is involved in chromatin relaxation subsequent to DNA damage. Recent studies[15] have established its oncogenic role in HCC both in vitro and in vivo. Overall, we found a number of genes in the Cancer Gene Census (CGC)[16] under the top amplification peaks (those not reviewed here include BCL9, ARNT, ABL2, REL, XPO1, COX6C, ATF1, and BCL11B). Consistent with previous findings in HCC, the most frequently deleted peak is located at chromosome 9p21.3 and encompasses cyclin-dependent kinase inhibitor 2A and 2B (CDKN2A and CDKN2B, respectively), two well-documented tumor-suppressor genes that play a regulatory role in the CDK4/6 and p53 pathways in cell-cycle G1 progression. Other well-known tumor-suppressor genes located within the top deletion peaks include PTEN, RB1, BRCA2, and SMAD4.

In addition to these well-known cancer genes, which recapitulated important drivers in HCC, our analysis also revealed other chromosomal regions that have undergone recurrent CNAs in HCC, affecting a greater number of genes not previously known to be involved in HCC. For example, seven additional amplification peaks were identified, each containing a single gene in the peak. These include TMLHE, A26A1, ABCC4, MTDH, PRDM14, BAT2D1, and RFWD2, which may be worth testing as potential drivers in HCC. Further studies are necessary to determine the function of these genes to understand their roles in hepatocarcinogenesis and identify potential therapeutic targets for HCC.

CNAs Affect Key Cancer Pathways in HCC

Another approach to gain insight into these candidate driver genes is to organize them into molecular pathways and cellular processes and search for patterns of pathway alterations. In addition to placing the candidate CNA drivers into a mechanistic context, this approach can also identify other genes on the altered pathway for which therapeutic options may be available. However, one challenge of CNA-based driver gene discovery is that passenger genes are often coamplified or codeleted in the same regions as the true driver genes, even after applying algorithms such as GISTIC2, which attempts to pinpoint the exact location of drivers by examining the minimal overlapping regions across a large tumor population. To alleviate this potential contamination from passenger genes, we focused on genes under GISTIC2 peaks with significant cis-correlation to their own mRNA (i.e., the so-called cis-acting genes). Our analysis showed that cell cycle was the most enriched pathway affected by somatic CNA involving cis-acting genes, such as CCND1, CDC16/23/25C, and CDKN2A/2B, together affecting 44.8% of HCCs in our study cohort (Table 3 and Supporting Table 5). The KEGG “Pathways in Cancer” was altered more frequently in our cohort than any other pathway, affecting more than half (50.3%) of the tumors, underscoring the broad-spectrum effect of somatic CNAs in targeting multiple key pathways in cancer simultaneously. More specifically, we also identified individual cancer-related molecular pathways that were significantly overrepresented among cis-acting genes driven by somatic CNAs, including Wnt signaling, transforming growth factor beta (TGF-β) signaling, the TP53 pathway, mitogen-activated protein kinase (MAPK) signaling, and the phosphoinositide 3-kinase (PI3K) pathway, many of which have established roles in HCC and therapeutic implications that may influence drug discovery and development. A detailed view of frequent somatic CNAs in critical signaling pathways identified in our HCC cohort is summarized in Supporting Fig. 4. Taken together, these results provided new insights into HCC carcinogenesis and prompted us to search for novel driver genes and potential therapeutic targets in these somatic CNA regions.

Validation of Candidate Oncogenic Drivers

To generate testable hypotheses that could be followed up experimentally in appropriate model systems, we focused on cis-acting candidate driver genes (i.e., with positive cis-correlation and an FDR ≤0.05) that are in a highly amplified peak with ≥4% frequency and ≤10 genes in the peak. We further filtered the list to those genes with ≥2-fold overexpression in the amplified tumors, compared to adjacent nontumor liver tissues, and with at least two HCC cell lines carrying the same gene amplification. Of the 14 candidate drivers from seven amplicons (Supporting Table 6), some were well-established oncogenic drivers in HCC, including CCND1, FGF19, and CHD1L.[9, 15] We were able to perform functional testing on two additional genes (BCL9 and MTDH), based on reagent availability and previous knowledge of their involvement in cancer. To test the hypothesis that HCCs with focal amplification of the candidate driver are more dependent on the driver for growth and survival, compared to HCCs without the gene amplification, we selected four HCC cell lines for each candidate driver to perform target knockdown using RNA interference: two with amplification of the target and two that were copy number neutral.
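The selection criteria in this paragraph amount to a conjunction of simple filters. The Python sketch below shows one way to encode them; the `Candidate` record, its field names, and the example values are hypothetical and do not come from the study's data.

```python
from dataclasses import dataclass


@dataclass
class Candidate:
    gene: str
    cis_r: float                 # cis-correlation coefficient
    cis_fdr: float               # BH-adjusted FDR of the cis-correlation
    peak_freq_pct: float         # high-level amplification frequency (%)
    genes_in_peak: int           # number of genes in the amplification peak
    fold_overexpression: float   # amplified tumors vs. adjacent nontumor liver
    amplified_cell_lines: int    # HCC cell lines carrying the amplification


def passes_driver_filter(c: Candidate) -> bool:
    """Apply the candidate-driver criteria stated in the text."""
    return (
        c.cis_r > 0
        and c.cis_fdr <= 0.05
        and c.peak_freq_pct >= 4.0
        and c.genes_in_peak <= 10
        and c.fold_overexpression >= 2.0
        and c.amplified_cell_lines >= 2
    )


# Hypothetical records for illustration only.
example_pass = Candidate("GENE_A", 0.6, 0.001, 8.0, 3, 2.5, 2)
example_fail = Candidate("GENE_B", 0.6, 0.001, 8.0, 25, 2.5, 2)
```

Here `example_fail` is rejected only because its peak contains more than 10 genes, which shows how each criterion acts as an independent gate.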

BCL9 encodes B-cell CLL/lymphoma 9 and is involved in the Wnt/β-catenin signaling pathway by mediating the recruitment of pygopus to the nuclear β-catenin/TCF complex.[17] Although a t(1;14)(q21;q32) translocation involving BCL9 and IGL has been found in B-cell acute lymphoblastic leukemia,[18] neither BCL9 translocation nor gene amplification has been reported in HCC. In our HCC cohort, BCL9 was located in the amplification peak at 1q21.1, which is highly amplified in 8.7% of HCCs (Table 2 and Supporting Table 3). There was a significant correlation between its somatic copy number and gene expression in primary HCCs (Fig. 2A), and protein expression measured by immunohistochemical (IHC) staining (Supporting Materials; Supporting Fig. 5A) correlated well with mRNA expression (Fig. 2B; Supporting Table 7). Transient transfection of siRNA SMARTpool for BCL9 significantly reduced gene expression of BCL9 in all four cell lines tested (Fig. 2C,D) and significantly decreased cell growth and survival in both proliferation assays (Fig. 2E) and colony formation assays (CFAs) (Fig. 2F) in MHCC97H and MHCC97L, the two cell lines with BCL9 gene amplification (Supporting Fig. 6). By contrast, siRNA-mediated inhibition of BCL9 gene expression had minimal effect on the SK-HEP-1 cell line, which is copy number neutral for BCL9, although the other BCL9 copy-number–neutral cell line (HUH6) showed significant growth inhibition upon BCL9 knockdown, suggesting that mechanisms other than BCL9 amplification may confer dependence on BCL9 expression.

Our analysis also identified a peak at 8q22.1 containing a single gene (MTDH), which encodes metadherin. MTDH has been implicated as an oncogene in a number of cancer types, including HCC.[19] However, previous work in HCC has not yet established the dependency of MTDH-driven tumorigenesis on MTDH focal amplification, especially in relevant preclinical models that harbor the MTDH amplification. In our study, MTDH was highly amplified in 12.9% of HCCs (Table 2 and Supporting Table 3). There was a significant cis-correlation between somatic copy number and mRNA expression of MTDH in primary HCCs (Fig. 3A), and protein expression by IHC (Supporting Fig. 5B) correlated well with mRNA expression (Fig. 3B; Supporting Table 8). We further identified two HCC models (MHCC97H and SNU-398) with amplification of the MTDH locus (Supporting Fig. 6). Transient transfection of siRNA SMARTpool for MTDH significantly reduced the gene expression levels of MTDH in all four cell lines tested (Fig. 3C,D). In the two MTDH-amplified HCC models, siRNA-mediated inhibition of MTDH gene expression significantly decreased cell growth and survival in both proliferation assays (Fig. 3E) and CFAs (Fig. 3F), whereas knockdown had a less prominent effect on the two MTDH copy-number–neutral lines (L-02 and SMMC-7721).


Our study represents one of the most comprehensive characterizations of the genomic landscape in a large primary HCC cohort and in preclinical models. In addition to revealing the overall CNA patterns in patients with primary HCC, we identified and characterized focal amplification and deletion peaks and prioritized potential oncogenic drivers and molecular pathways that may be implicated in hepatocarcinogenesis. Although our findings are broadly consistent with the literature on HCC, our sample size and resolution provide increased power to accurately identify and localize both large-scale and focal chromosomal alterations. Many of the CNA peaks from our analysis contain well-established genes known to be implicated in HCC or other cancer types. For example, genes contained in the most highly amplified peak (CCND1 and FGF19) and in the most frequently deleted peak (CDKN2A and CDKN2B) have been reported and validated as oncogenic drivers and tumor suppressors in HCC, respectively, supporting the validity of our data and analysis pipeline.

Our approach extends previous reports that interrogated both somatic CNAs and gene expression changes in HCC in two ways. First, our dataset included both somatic copy number and gene expression data from the same set of primary HCC patients, allowing us to fully integrate the two data types when prioritizing driver genes by requiring significant cis-correlation and overexpression of the candidate drivers in the specific subset of tumors carrying the CNAs. Second, our approach selected appropriate preclinical models for testing the candidate driver genes: cell lines with the gene amplification to assess activity, and cell lines without the amplification to establish a differential response to target knockdown between the models. This comparison increases confidence that the oncogenic effect is truly CNA driven.
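The integration criterion described above can be illustrated with a minimal sketch. It is not the pipeline used in this study: the function names, the copy number cutoff for calling an amplification, the correlation and fold-change thresholds, and the choice of non-amplified tumors as the comparison group are all assumptions made for illustration, and a real analysis would also test statistical significance.

```python
from math import sqrt
from statistics import mean


def pearson_r(x, y):
    """Pearson correlation between two equal-length lists of numbers."""
    mx, my = mean(x), mean(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        return 0.0  # no variation, so no measurable correlation
    return sxy / sqrt(sxx * syy)


def is_candidate_driver(copy_number, log2_expression,
                        amp_cutoff=3.0, min_r=0.5, min_log2_fc=1.0):
    """Flag one gene as a candidate driver (hypothetical thresholds).

    copy_number      per-tumor somatic copy number of the gene
    log2_expression  per-tumor log2 expression of the same gene
    A gene passes if copy number and expression are cis-correlated and the
    amplified tumors overexpress the gene relative to the other tumors.
    """
    r = pearson_r(copy_number, log2_expression)
    pairs = list(zip(copy_number, log2_expression))
    amplified = [e for c, e in pairs if c >= amp_cutoff]
    others = [e for c, e in pairs if c < amp_cutoff]
    if not amplified or not others:
        return False  # cannot compare the two groups
    log2_fc = mean(amplified) - mean(others)
    return r >= min_r and log2_fc >= min_log2_fc


# Toy data for six tumors (invented values, not from the study).
copy_number = [2, 2, 2, 4, 5, 2]
driver_like = [5.0, 5.2, 4.9, 7.1, 7.5, 5.1]     # expression tracks copy number
passenger_like = [5.0, 5.2, 4.9, 5.1, 5.0, 5.1]  # expression unaffected

print(is_candidate_driver(copy_number, driver_like))
print(is_candidate_driver(copy_number, passenger_like))
```

Requiring both conditions separates genes whose expression follows the amplification from genes that are merely co-amplified without a change in expression.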

We also noticed some differences between our study and previous reports. For example, using the Affymetrix SNP6 arrays on 58 HCC tumor and normal pairs, Jia et al.[8] identified a putative oncogene, HEY1, which was not identified in our analysis. Although HEY1 was amplified in ∼13.7% of HCCs in our cohort, it was not assigned by GISTIC2 to any amplification peak; however, our analysis did identify the tumor suppressor TRIM35 and another putative oncogene, SNRPE, both of which were originally reported in the Jia et al. study. Chiang et al.[5] studied a cohort of 100 HCCs that were primarily hepatitis C virus (HCV) positive and identified a focal amplicon containing vascular endothelial growth factor A (VEGFA). However, VEGFA was amplified in only ∼3.3% of our HCC cohort and was not highlighted by our analysis. Overall, such discrepancies may arise from one or more of the following factors: differences in sample origin, etiology, quality, and degree of stromal contamination; variations in copy number measurement technologies; different data processing and analysis algorithms; and different thresholds used for selecting genes.

Our study implicated two putative oncogenic driver genes (BCL9 and MTDH) as important for growth and survival in HCC. We found that amplification of BCL9 was significantly associated with poor DFS in our HCC cohort (P = 0.03; Supporting Fig. 7), which may indicate a distinct clinical behavior of HCC patients carrying BCL9 amplification. In addition, somatic copy numbers of both BCL9 and MTDH were positively associated with advanced AJCC tumor stage (Supporting Fig. 7), suggesting that the aberration of either gene may be involved in maintaining the aggressive phenotype of an established tumor. We also performed preliminary functional characterizations of both putative drivers by siRNA-mediated target knockdown in HCC cell lines that carry the respective target amplification and compared the results with those from models without the amplification. We noted that results on BCL9 were mixed in the HUH6 cell line, which is copy number neutral with respect to BCL9 but had decreased viability upon BCL9 knockdown in one of the assays. Because BCL9 is involved in the Wnt/β-catenin signaling pathway,[17] there may exist other mechanisms for activating this pathway in HUH6 cells: It has been shown that the Wnt pathway may be activated in the HUH6 cell line as a result of β-catenin mutations.[20] Blocking the Wnt/β-catenin pathway by knocking down BCL9 gene expression could then lead to tumor growth inhibition in HUH6 cells, which may be addicted to this pathway for their tumorigenic properties. More research is needed to fully validate these two genes as oncogenic drivers in HCC and to explore their utility in targeted cancer therapy. Our work nevertheless demonstrates a proof of concept that systematic clinical genomics approaches, such as the one presented here, could be valuable in uncovering novel, clinically relevant cancer driver genes, and that testing of such genes needs to be performed in relevant preclinical models, both with and without the corresponding genetic aberration.

Future directions of our work include high-throughput dropout screens to systematically test all genes within the focal amplicons, an unbiased approach similar to the forward genetic screening by Sawey et al.[9] One of the biggest challenges in CNA-driven target identification is to distinguish true driver gene(s) from passengers in a focal amplicon. It has been shown that multiple drivers may even coexist in a highly focal amplicon, such as CCND1 and FGF19.[9] It would be valuable to perform unbiased screening to validate all candidate somatic CNA drivers in appropriate models and then dissect key attributes that distinguish drivers from passengers to facilitate future in silico algorithm development. Toward this end, the genomic characterization of a comprehensive collection of 30 HCC cell line models in our study will also serve as a valuable resource for future research in this direction.


The authors thank Drs. John Lamb and Soonmyung Paik for scientific discussion in this study, Peter C. Roberts for facilitating data management and transfer, and Sylvie Sakata for study support.