Transcriptome classification of HCC is related to gene alterations and to new therapeutic targets


  • Potential conflict of interest: Nothing to report.


Hepatocellular carcinomas (HCCs) are a heterogeneous group of tumors that differ in risk factors and genetic alterations. We further investigated transcriptome-genotype-phenotype correlations in HCC. Global transcriptome analyses were performed on 57 HCCs and 3 hepatocellular adenomas and validated by quantitative RT-PCR using 63 additional HCCs. We determined loss of heterozygosity, gene mutations, promoter methylation of CDH1 and CDKN2A, and HBV DNA copy number for each tumor. Unsupervised transcriptome analysis identified 6 robust subgroups of HCC (G1-G6) associated with clinical and genetic characteristics. G1 tumors were associated with low copy number of HBV and overexpression of genes expressed in fetal liver and controlled by parental imprinting. G2 included HCCs infected with a high copy number of HBV and mutations in PIK3CA and TP53. In these first groups, we detected specific activation of the AKT pathway. G3 tumors were typified by mutation of TP53 and overexpression of genes controlling the cell cycle. G4 was a heterogeneous subgroup of tumors including TCF1-mutated hepatocellular adenomas and carcinomas. G5 and G6 were strongly related to β-catenin mutations that lead to Wnt pathway activation; in particular, G6 tumors were characterized by satellite nodules, higher activation of the Wnt pathway, and E-cadherin underexpression. Conclusion: These results have furthered our understanding of the genetic diversity of human HCC and have provided specific identifiers for classifying tumors. In addition, our classification has potential therapeutic implications because 50% of the tumors were related to WNT or AKT pathway activation, which potentially could be targeted by specific inhibiting therapies. (HEPATOLOGY 2007;45:42–52.)

Hepatocellular carcinoma (HCC) is one of the most frequently occurring solid tumors worldwide and is the third-leading cause of death from cancer.1 Cirrhosis of any origin and dysplastic regenerative nodules have long been considered the likely precursors of HCC because of their frequent association with HCC occurrence.2, 3 As in other solid tumors, a large number of genetic alterations accumulate during the carcinogenetic process. Some of these genetic alterations are specific to HCC etiological risk factors, particularly HBV infection, which can induce chromosome instability or insertional mutagenesis.4 Among the genetic alterations unrelated to HCC risk factors, microsatellite allelotypes and comparative genomic hybridization (CGH) studies have demonstrated recurrent chromosome aberrations.5 Altogether, the principal carcinogenetic pathways known to be deregulated in HCC are inactivation of TP53,6 Wnt/wingless activation (mainly through β-catenin-activating CTNNB1 mutations and AXIN1-inactivating mutations),7–9 retinoblastoma inactivation through RB1 and CDKN2A promoter methylation and rare gene mutations,10, 11 and insulin-like growth factor activation through IGF2 overexpression and IGF2R-inactivating mutations.12, 13

We previously showed that genetic alterations are closely associated with clinical characteristics of HCC that define 2 mechanisms of hepatocarcinogenesis.14 The first type of HCC was associated with a high level of chromosome instability and frequent TP53 and AXIN1 mutations and was closely linked to HBV infection and a poor prognosis. Conversely, the second subgroup of HCC tumors was chromosome-stable, had a high incidence of activating β-catenin alterations, and was not associated with viral infection. To further investigate genotype–phenotype correlations in HCC and identify pathways and/or biological processes deregulated in such heterogeneous tumors, we performed a comprehensive analysis at the clinical, genetic, and transcriptomic levels in 57 HCCs and 3 adenomas, and we validated the results using quantitative RT-PCR in 63 new independent HCCs.

Patients and Methods

Tumor Samples and Clinical Data.

A series of 120 HCCs and 3 hepatocellular adenomas were collected from 123 patients surgically treated in 3 French surgical departments from 1992 to 1999. A first group of tumors (57 HCCs and 3 hepatocellular adenomas) was used for genomewide transcriptome microarray analysis, and a second group of 63 independent HCCs was used for validation by quantitative RT-PCR analysis. All these tumors were clinically characterized as previously described14 (Supplementary Table 1). Briefly, the 123 patients had a 4:1 male/female sex ratio, 79% were born in France, and their mean age was 60 years. Thirty percent, 25%, 33%, and 5% of the patients had the HCC risk factors of hepatitis B virus, hepatitis C virus, alcohol abuse, and hemochromatosis, respectively. In 68% of the patients, HCC developed in a liver with cirrhosis or severe chronic hepatitis. Histological grade of tumor differentiation was assigned according to the Edmondson and Steiner grading system: 7% were grade I, 49% were grade II, 39% were grade III, and 4% were grade IV. Tumor size ranged from 3 to 190 mm (median = 60 mm). The preoperative α-fetoprotein serum level was available for 103 patients, 37 of whom had a level of more than 100 IU/ml. Macroscopic and/or microscopic vascular invasion was recorded for 38% of the patients. Satellite tumors, defined as nodules found less than 1 cm from the main tumor, were recorded for 42%. For the Affymetrix analysis, 5 pools of 3 nontumorous liver tissues matching the analyzed tumors were used: alcoholic cirrhosis (pool 1), alcoholic noncirrhotic liver (pool 2), HBV noncirrhotic liver (pool 3), HCV cirrhosis (pool 4), and HBV cirrhosis (pool 5). In the QRT-PCR experiments, we analyzed the RNA of these 15 nontumorous tissue samples and of 6 additional normal nontumorous liver samples.
We used 19 human fetal liver samples at different stages of development (ranging from 11 to 29 weeks of pregnancy) to test for genes potentially expressed during fetal life using QRT-PCR. The study was approved by the local Ethics Committee (CCPPRB Paris Saint-Louis), and informed consent was obtained in accordance with French legislation.

Gene Mutations, Chromosome Imbalance, Quantification of HBV Genome and DNA Methylation.

For each tumor, we screened for gene mutations in TP53, CTNNB1 coding for β-catenin, AXIN1, PIK3CA, EGFR, KRAS, NRAS, and HRAS using direct sequencing. Primers and protocols are available on request, and all mutations are described in Supplementary Table 2. Genomewide allelic loss was assessed as previously described.14 For all samples related to HBV infection either by serological results or viral DNA amplification, HBS and HBX copies of DNA were quantified in tumor and nontumor DNA using the SYBR Green (Applied Biosystems) method (for further details, see Supplementary Materials and Methods). DNA methylation at the CDH1 and CDKN2A promoters was assessed using bisulfite DNA and methylation-specific amplification as previously described.15, 16

Gene Expression Analysis.

Microarray analyses of 57 HCCs, 3 hepatocellular adenomas, and 5 pools of nontumorous samples were performed on HG-U133A Affymetrix GeneChip™ arrays using 5 μg of total RNA and 20 μg of cRNA per hybridization (GeneChip Fluidics Station 400), and the arrays were analyzed following the manufacturer's protocols (all methodological details of the gene expression analysis are given in Supplementary Materials and Methods).

Quantitative RT-PCR Data Processing.

Quantitative RT-PCR of 109 HCCs (46 from the “microarray” series of 57 HCCs and 63 new samples) and 21 nontumor liver samples was performed in duplicate, using TaqMan® Low Density Array and the ABI PRISM® 7900HT System (Applied Biosystems). Briefly, the expression of each gene was normalized to the internal control (ribosomal 18S) and expressed relative to the mean 18S-normalized expression of the same gene in nontumorous samples.
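The normalization described above can be illustrated with a minimal sketch. It assumes the common 2^(−ΔCt) convention, which implies a PCR efficiency of 100%; the source does not state the exact formula, and the function names are hypothetical.

```python
from statistics import mean


def delta_ct(ct_gene, ct_18s):
    """Ct of the gene of interest minus Ct of the 18S internal control."""
    return ct_gene - ct_18s


def relative_expression(tumor_ct, tumor_18s_ct, nontumor_cts, nontumor_18s_cts):
    """Tumor/nontumor (T/NT) expression ratio for one gene.

    Assumption: expression is 2 ** -dCt. The tumor value is divided by the
    mean 18S-normalized expression of the same gene in nontumorous samples.
    """
    tumor_expr = 2.0 ** -delta_ct(tumor_ct, tumor_18s_ct)
    nontumor_expr = mean(
        2.0 ** -delta_ct(gene_ct, ref_ct)
        for gene_ct, ref_ct in zip(nontumor_cts, nontumor_18s_cts)
    )
    return tumor_expr / nontumor_expr
```

For example, a tumor with ΔCt = 12 compared with nontumorous samples that all have ΔCt = 14 gives a T/NT ratio of 4, because each cycle of difference corresponds to a 2-fold change under this assumption.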

Western Blot and Immunohistochemistry.

Frozen tissue was homogenized in RIPA Lysis buffer (Santa Cruz), and protein concentration was determined using a BCA protein assay kit (Pierce). Immunoblot analysis was performed using antibodies against E-cadherin (SC-7870, 1:500; Santa Cruz), AKT (#9272, 1:2000; Cell Signalling), phospho-AKT ser473 (#9271, 1:200; Cell Signalling), phospho-GSK3 ser9 (#9336, 1:500; Cell Signalling), and IGF1Rβ (#3027, 1:500; Cell Signalling), followed by a peroxidase-conjugated secondary antibody (1:2000; Santa Cruz) and enhanced chemiluminescence detection (ECL; Pierce). Immunostaining was performed using monoclonal anti-β-catenin as previously described.17

Results


Unsupervised Transcriptome Analysis Defines Clusters of Tumors Closely Associated with Clinical Data and Genetic Alterations

In the whole series of 123 tumors (120 HCCs and 3 hepatocellular adenomas), we identified mutations in the CTNNB1 (encoding β-catenin), TP53, AXIN1, TCF1, PIK3CA, and KRAS genes in 34, 31, 13, 5, 2, and 1 tumors, respectively (Supplementary Table 2). No mutations were found in NRAS, HRAS, and EGFR. Hypermethylation of the CDKN2A and CDH1 promoters was identified in 35% and 16% of the tumors, respectively.

Fifty-seven HCCs, 3 hepatocellular adenomas, and 5 samples of pooled nontumorous tissues were analyzed using Affymetrix HG-U133A GeneChip™ arrays. On the basis of a hierarchical clustering analysis using the 6,712 probe sets whose expression varied the most across samples, we classified tumors into 2 major groups, each further divided into 3 subgroups (G1-G6; Fig. 1). This classification was extremely robust (the mean reproducibility score was at least 90% for all major groups and subgroups) and was conserved across different gene lists and clustering methods.

Figure 1.

Unsupervised hierarchical clustering. The dendrogram in the upper panel was obtained on the basis of the expression profile of the 6,712 probe sets whose expression varied the most across samples from 57 HCC tumors, 3 hepatocellular adenomas, and 5 pools of nontumor tissues, using Ward's linkage and 1 − Pearson correlation coefficient as the distance. The different HCC subgroups (G1-G6) are indicated in color. The heat map, in the lower panel, shows the expression of 493 probe sets corresponding to a subselection of HCC subgroup–specific genes (Supplementary Table 4). For each probe set, data are median-centered (black), with the lowest and highest intensity values in green and red, respectively.
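The two inputs to the clustering named in this legend, selection of the most variable probe sets and a 1 − Pearson distance between samples, can be sketched as follows. This is an illustration only: the function names and the toy matrix are hypothetical, variance is assumed as the measure of variability, and the Ward linkage step itself is left to a clustering library.

```python
from math import sqrt
from statistics import mean, pvariance


def top_variable_rows(matrix, k):
    """Return the k rows (probe sets) with the largest variance across samples."""
    return sorted(matrix, key=pvariance, reverse=True)[:k]


def pearson(x, y):
    """Pearson correlation coefficient between two equal-length sequences."""
    mx, my = mean(x), mean(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / sqrt(sxx * syy)


def correlation_distances(matrix):
    """Pairwise (1 - Pearson r) between samples, i.e. between matrix columns."""
    columns = list(zip(*matrix))
    n = len(columns)
    return [
        [0.0 if i == j else 1.0 - pearson(columns[i], columns[j]) for j in range(n)]
        for i in range(n)
    ]


# Hypothetical toy data: 4 probe sets (rows) x 4 samples (columns).
toy_expression = [
    [1.0, 1.2, 5.0, 5.1],
    [2.0, 2.1, 2.0, 2.1],  # nearly constant, dropped by the variance filter
    [6.0, 5.8, 1.0, 1.3],
    [3.0, 3.4, 7.0, 6.6],
]
toy_distances = correlation_distances(top_variable_rows(toy_expression, 3))
```

In the toy data, samples 1 and 2 share a profile and end up at a distance close to 0, whereas samples 1 and 3 are anticorrelated and end up at a distance above 1.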

We observed a high level of association between this HCC classification and genetic alterations and clinical factors (Table 1). The tumors in groups G1-G3 were associated with high chromosomal instability, yielding significantly higher fractional allelic loss (FAL), compared to the tumors in groups G4-G6 (P < 10−2). Among the frequent somatic gene mutations observed, the G5-G6 subgroups were highly associated with CTNNB1 mutations (P < 10−10) and the G2-G3 subgroups with TP53 mutations (P = 0.03). The rare TCF1 and PIK3CA mutations were associated with the G4 subgroup (P = 0.01) and the G2 subgroup (P = 0.01), respectively. Hypermethylation of the E-cadherin (CDH1) and P16 (CDKN2A) gene promoters was most frequently observed in the G5-G6 and G3 subgroups, respectively. In addition, HCC related to HBV infection was found in groups G1-G3; more specifically, tumors with a low number of HBV DNA copies were related to the G1 subgroup (P < 10−4). We validated these associations on the basis of the classification of a second set of 63 independent tumors using quantitative RT-PCR.
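As an illustration of the FAL variable, the sketch below computes the fraction of informative chromosome arms showing allelic loss and applies the 0.128 cutoff that appears in Table 1. The definition used here (lost arms divided by informative arms) and the handling of non-informative arms are assumptions; the authors' exact procedure is the one described in their earlier work.

```python
FAL_THRESHOLD = 0.128  # cutoff reported in Table 1


def fractional_allelic_loss(arm_status):
    """Fraction of informative chromosome arms that show allelic loss.

    arm_status maps an arm name to True (loss), False (retention), or
    None (non-informative, ignored here by assumption).
    """
    informative = [lost for lost in arm_status.values() if lost is not None]
    if not informative:
        raise ValueError("no informative chromosome arm")
    return sum(informative) / len(informative)


def is_chromosome_unstable(arm_status):
    """True when FAL exceeds the Table 1 cutoff."""
    return fractional_allelic_loss(arm_status) > FAL_THRESHOLD


# Hypothetical tumor: 2 of 4 informative arms are lost.
toy_tumor = {"17p": True, "16q": True, "4q": False, "5q": False, "13q": None}
```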

Table 1. Associations Between Transcriptomic Groups and Clinical, Pathological, and Genetic Variables

Variable            Associated group   Affymetrix hybridizations (57 HCCs)   QRT-PCR validation set (63 HCCs)   Complete set (120 HCCs)

Clinical characteristics
AFP > 100 IU/mL     G1                 .01                                   .006                               < 10−4
HBV low             G1                 < 10−4                                .04                                10−5
HBV high            G2                 .05                                   .07                                < 10−2
Age < 60 years      G1 and G2          .04                                   .09                                < 10−2
African origin      G1                 < 10−2                                .3                                 < 10−2
Hemochromatosis     G2                 1                                     < 10−3                             .04

Pathological characteristics
Satellite nodules   G6                 < 10−2                                .3                                 < 10−2

Genetic alterations
FAL > 0.128         G1, G2, and G3     < 10−2                                < 10−3                             < 10−5
CTNNB1 mutation     G5 and G6          < 10−10                               < 10−5                             < 10−15
TP53 mutation       G2 and G3          .03                                   .001                               < 10−3
AXIN1 mutation      G1 and G2          .1                                    < 10−2                             < 10−2
PIK3CA mutations    G2                 .01                                                                      .01
CDH1 methylation    G5 and G6          .01                                   < 10−2                             < 10−2
LOH 17p             G2 and G3          .005                                  .02                                < 10−3
LOH 16p             G1, G2, and G3     .005                                  .05                                .0005
LOH 16q             G1                 .05                                   .04                                < 10−2
LOH 4q              G1, G2, and G3     .002                                  .002                               3 × 10−6
LOH 5q              G3                 .02                                   .02                                < 10−3
LOH 13q             G2                 .02                                   < 10−2                             < 10−3
LOH 21q             G3                 < 10−2                                .03                                < 10−3
LOH 22q             G3                 < 10−2                                .02                                < 10−3

Shown are P values obtained from Fisher exact tests based on the given genetic or clinical variable.
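
The P values in Table 1 come from Fisher exact tests. A minimal sketch of such a test on a 2 × 2 table (feature present or absent, tumor inside or outside the associated group) is given below. It assumes a two-sided test, which the source does not specify, and the counts in the example are hypothetical rather than taken from the study.

```python
from math import comb


def fisher_exact_two_sided(a, b, c, d):
    """Two-sided Fisher exact test for the 2 x 2 table [[a, b], [c, d]].

    Sums the hypergeometric probabilities of every table with the same
    margins that is no more likely than the observed one.
    """
    row1, row2, col1 = a + b, c + d, a + c
    total = row1 + row2

    def prob(x):
        # Probability that the top-left cell equals x given fixed margins.
        return comb(row1, x) * comb(row2, col1 - x) / comb(total, col1)

    lowest = max(0, col1 - row2)
    highest = min(row1, col1)
    observed = prob(a)
    tolerance = 1 + 1e-7  # guards against floating-point ties
    p_value = sum(
        prob(x) for x in range(lowest, highest + 1) if prob(x) <= observed * tolerance
    )
    return min(1.0, p_value)


# Hypothetical counts: rows = in group / not in group,
# columns = feature present / feature absent.
toy_p = fisher_exact_two_sided(8, 2, 5, 45)
```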

Identification of a Predictor of the 6-Group Classification

To more easily classify tumors, we sought to define a gene predictor that would use quantitative RT-PCR data (for details, see Supplementary Materials and Methods and Supplementary Table 3). We identified a combination of 16 genes (Fig. 2) that correctly classified 45 of 46 HCCs previously characterized in the Affymetrix experiment. We then used this signature to partition the validation series of 63 independent HCC samples into 6 subgroups. As observed in the first set of tumors analyzed in the Affymetrix experiment, significant associations were found between genetic alterations and the different predicted subgroups (Table 1). These associations were confirmed using the complete series of 120 HCC tumors classified with both Affymetrix and QRT-PCR (n = 46), Affymetrix only (n = 11), or QRT-PCR only (n = 63); see Table 1 and Figure 3. We then sought to identify deregulated pathways in each group of tumors.

Figure 2.

Predictor of HCC classification using quantitative RT-PCR. (A) Formula used for class-membership prediction. Classifying a new sample requires the ΔCt values (control: 18S) of the 16 genes used in the formula. After calculating the distance between the given sample and the centroid representation of each class, we assigned the new sample to the closest class. (B) Parameters for each gene and for each class used in the formula.
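The closest-centroid rule described in this legend can be sketched as follows. The formula and per-gene parameters of Figure 2 are not reproduced in the text, so this sketch substitutes a plain squared Euclidean distance, and the gene names, classes, and centroid values are hypothetical placeholders; the published predictor uses 16 genes and 6 classes and may weight genes differently.

```python
def squared_distance(sample, centroid):
    """Squared Euclidean distance over the genes that define the centroid."""
    return sum((sample[gene] - centroid[gene]) ** 2 for gene in centroid)


def predict_class(sample, centroids):
    """Assign a sample (gene -> dCt value) to the class with the closest centroid."""
    return min(centroids, key=lambda label: squared_distance(sample, centroids[label]))


# Hypothetical centroids for 2 classes and 2 genes (dCt values, control: 18S).
TOY_CENTROIDS = {
    "G1": {"GENE_A": 2.0, "GENE_B": 10.0},
    "G6": {"GENE_A": 8.0, "GENE_B": 4.0},
}

toy_sample = {"GENE_A": 1.0, "GENE_B": 9.0}
toy_label = predict_class(toy_sample, TOY_CENTROIDS)
```

Here the toy sample lies at a squared distance of 2 from the hypothetical G1 centroid and 74 from the G6 centroid, so it is assigned to G1.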

Figure 3.

Clinical and genetic annotations for the classification of all 120 HCCs. Samples (in columns) are ordered by group (G1-G6). Sample names are in gray (56 Affymetrix-classified samples), red (1 HCC misclassified by QRT-PCR), or black (independent series of 63 HCCs). Features that are positive and negative are indicated in black and white boxes, respectively (FAL, fractional allelic loss, with tumors deleting more than 5 chromosome arms in black; Mut, mutation; sat. nodules, satellite nodules less than 1 cm from the principal tumor; AFP, alpha-fetoprotein).

Identification of Key Signaling Pathways Implicated in HCC Subgroups

HCC groups G1-G3 (associated with a high rate of chromosomal instability) were enriched with overexpression of cell-cycle/proliferation/DNA metabolism genes (P < 0.01; see Supplementary Tables 4 and 5).

Parentally Imprinted Genes Were Overexpressed in HCC Subgroup G1.

Among the high number of genes specifically overexpressed in the G1 subgroup, which was associated with HBV infection with a low number of viral DNA copies, AXIN1 mutations, younger age, high serum level of AFP, and frequent origin from Africa (Fig. 3, Table 1, and Supplementary Table 4), we found genes encoding proteins expressed during development, myosin heavy chain IIb (MYH4) and the transcription factor SOX9, and the parentally imprinted genes insulin-like growth factor 2 (IGF2), paternally expressed genes 3 and 10 (PEG3 and PEG10), alpha-fetoprotein (AFP), and sarcoglycan epsilon (SGCE). The differential expression of all these genes was validated using QRT-PCR on 109 HCC tumors (Fig. 4A). The parentally imprinted genes were highly overexpressed in normal fetal liver. H19 mRNA was also overexpressed, not only in the G1 samples but also in fetal samples, correlating with the level of IGF2 mRNA expression in these 2 groups (R² = 0.4 and 0.6, respectively). Interestingly, in the unsupervised classification, the magnitude of expression of AFP mRNA was identified as a discriminating marker of the G1 tumors and was observed to be correlated with a high AFP serum level (Table 1 and Fig. 3).

Figure 4.

Validation of selected genes and proteins specific for HCC subgroups G1 and G2. (A) Validation of AFP, IGF2, SOX9, PEG3, MYH4, and H19 gene expression using QRT-PCR. Box plots (from 25th percentile to the 75th percentile with a line at the median) show the range of relative expression (tumor versus the mean of 21 nontumors (T/NT)) in the predicted subgroups G1 (n = 10) and G2 to G6 (n = 99), 21 nontumor samples (NT), and 19 fetal liver samples (FL). The whiskers extend above and below the box to show the first and ninth deciles. P values from the Kruskal-Wallis test are indicated. (B) Validation of EEF1A2 and PRSS7 genes overexpressed in PIK3CA-mutated tumors compared to 107 nonmutated HCCs. P values from a t test are shown. (C) Protein expression analysis of Akt, phosphoAkt (Ser473), PhosphoGSK3β (Ser9), and IGF1R was performed using Western blotting in 44 HCCs distributed in all transcriptomic subgroups. The graphs show the percentage of tumors that overexpress these proteins in the G1, G2, and G3-G6 subgroups compared to nontumor tissues. The panel on the right illustrates protein expression profiles of the different HCC subgroups.

HCC Tumors in Subgroup G2 Included the Rare PIK3CA Mutated Cases.

Subgroup G2 tumors were related to HBV infection with a high number of viral DNA copies, frequent local and vascular invasion, and TP53 mutations (Table 1 and Fig. 3). An association with hemochromatosis-related tumors was only observed in the validation set, possibly because the number of tested patients overall was small (n = 6, Table 1). Interestingly, in the whole series we identified mutations in the PIK3CA gene in only 2 tumors that belonged to the G2 group. These 2 samples were closely associated in the unsupervised clustering analysis (HCC189 and HCC438; Fig. 1). We identified 38 genes specifically overexpressed in the PIK3CA-mutated samples when compared with the other tumors in groups G1-G3. Using QRT-PCR, we validated overexpression of 2 of these genes, the genes coding for protein elongation factor EEF1A2 and enterokinase PRSS7, specifically overexpressed in PIK3CA-mutated tumors (P = 0.001, Fig. 4B).

IGF2 overexpression in G1 and PIK3CA mutations in G2 were predicted to activate the AKT pathway. To test this hypothesis, we searched for possible activation of the AKT pathway using Western blotting in a series of 44 HCCs representing each of the transcriptomic groups. We found specific overexpression of Insulin-like growth factor 1 receptor (IGF1R) in 7 of 9 G1 tumors in contrast to IGF1R overexpression found in only 9% of the tumors in the other groups (P < 10−4; Fig. 4C). We also identified frequent overexpression of AKT in groups G1 (75%) and G2 (25%), compared to only 13% found in the other groups. Moreover, phosphorylated AKT proteins were overexpressed in all tumors of G1 and in 58% of the G2 tumors in contrast to only 8% in the other groups, demonstrating frequent activation of AKT in the tumors in groups G1 and G2 (P < 10−3; Fig. 4C). Accordingly, GSK3β, a target of activated AKT, was found to be phosphorylated in 78% of G1 tumors, in 42% of G2 tumors, but in only 9% of the tumors in the other groups (P < 10−3; Fig. 4C).

HCCs in Subgroup G3 Overexpressed Genes Encoding Proteins of the Nuclear Pore and That Control the Cell Cycle.

HCC subgroup G3 mainly included tumors with TP53 mutation but without HBV infection. Significant associations were found with chromosome losses at 17p, 5q, 21q, and 22q (Fig. 3 and Table 1), and we observed a high incidence of CDKN2A promoter hypermethylation (data not shown). Examples of key genes specifically differentially expressed in the G3 group are provided in Supplementary Figure 1. We observed overexpression of a large number of genes implicated in cell-cycle control (P < 0.0003), including those encoding proteins that control cell-cycle checkpoints (CDC6, MAD2L1, BUB1, TTK, SMC1L1), cyclins (CCNA2, CCNE2), and DNA replication checkpoints (MCM2, MCM3, MCM6, ASK). Interestingly, genes encoding proteins implicated in nucleus import/export were also overexpressed, including 6 proteins of the nuclear pore (KPNB1, RANBP5, XPO1, IPO7, NUP155, NUP107).

G4 Was a Heterogeneous Group of HCC Tumors.

The G4 subgroup of samples comprised the 5 sample pools of nontumorous liver tissues, which clustered tightly together within a large, heterogeneous group containing 20 tumors (Fig. 1). We also observed that 4 tumors, all with a TCF1 mutation, were closely associated in a small cluster of 5 tumors including the 3 hepatocellular adenomas and 2 well-differentiated HCCs. These results suggest a possible continuum of early transcriptome alterations between nontumorous, benign, and HCC tissues. We did not find other recurrent or unifying genetic and/or clinical characteristics in the remaining small clusters of G4 HCCs (Fig. 1).

Activation of the Wnt Pathway in G5 and G6 with Underexpression of Cell Adhesion Proteins in G6 Is Related to Satellite Tumor Nodules.

G5 and G6 HCCs were highly related to β-catenin activation (70% and 100% of the tumors contained CTNNB1 mutations, respectively). No chromosome deletions specific to G5 and/or G6 were identified. In a search for possible β-catenin-targeted genes, we found 83 genes to be significantly overexpressed in G5 and/or G6 HCCs (Supplementary Table 4). All but 2 of these genes were also found to be significantly differentially expressed in a supervised test comparing β-catenin-mutated and -nonmutated tumors. In addition to overexpression of GPR49, GLUL, and PAP/HIP, 3 well-known β-catenin target genes in the liver,18–20 overexpression of 6 putative β-catenin target genes potentially implicated in tumorigenesis was confirmed using QRT-PCR (Fig. 5A). We observed that the expression of all these putative β-catenin-targeted genes was more significant in G6 than in G5, even after exclusion of the samples without CTNNB1 mutation. We also found greater overexpression of β-catenin in G6 tumors than in G5 tumors combined with a loss of signal at the plasma membrane and a strong localization in cytoplasm and nucleus (Fig. 5B). Consistent with this observation, we found overexpression of LEF1, a transcription factor that interacts with β-catenin to activate Wnt-responsive target genes, in G6 tumors. In addition, we found underexpression of CDH1 (encoding E-cadherin) in the G6 subgroup (Fig. 5A), which may have accounted for the local invasion of these HCCs as shown by the quasi-constant presence of satellite nodules around the principal tumor (Fig. 3 and Table 1). We showed that the level of CDH1 mRNA down-regulation was highly related to the down-regulated expression of the E-cadherin protein in G6, consistent with the high level of promoter methylation of CDH1 in these tumors (Fig. 5C). 
Interestingly, in G5-related tumors, we observed enrichment of underexpressed genes involved in the response to biotic stimuli and immune response such as ARHGDIB, HLA-DPA1/B1, IFI16, IFI44, PTGER4, STAT1, and CLECSF2 (P < 0.003). Finally, combining transcriptome and immunohistochemical data, we found no evidence of significant activation of the β-catenin pathway in HCC subgroups G1-G4.

Figure 5.

Characterization of HCC tumors belonging to the G5 and G6 subgroups. (A) Validation of genes using QRT-PCR. Relative expression (tumor versus mean of nontumors [T/NT]) for GLUL, SPARCL1, TBX3, LAMA3, MERTK, EPHB2, PAP, LEF1, and CDH1 was analyzed in tumors related to G1-G4 (n = 75), G5 (n = 23), G6 (n = 11), and nontumor samples (n = 21). P values from the Kruskal-Wallis test are indicated. (B) β-Catenin immunostaining in representative cases of HCC mutated for β-catenin and belonging to G5 and G6. In case HCC303 (G5), note the small number of stained nuclei and the intense staining of the plasma membrane (white arrows). In case HCC305 (G6), the cytoplasm and nuclei of hepatocytes are intensely stained (black arrows) without a signal at the plasma membrane. (C) Protein expression of E-cadherin using Western blotting (upper panels) and mRNA expression using QRT-PCR (lower panel) in HCCs of groups G5 and G6.

Discussion


To elucidate the molecular diversity of HCC tumors without any a priori assumptions, we used an unsupervised transcriptome-wide approach to classify a large number of tumors that have been extensively clinically and genetically annotated. Using this approach, we obtained a robust classification of HCC that yielded 6 main subgroups reflecting the large spectrum of these tumors.1, 2, 21 This classification was achieved by QRT-PCR using 16 diagnostic genes and was applied to an independent set of 63 tumors, for a total of 120 HCCs studied. Our classification is in full agreement with previously published transcriptomic analyses of HCC,22–24 which identified 2 tumor groups linked to chromosome stability/instability (corresponding to meta-groups 1-3 and 4-6). However, our study went further by exquisitely refining this classification. Indeed, the strength of our study resides in our inclusion of a vast amount of medical data: (1) our series of tumors of patients surgically treated in France who had the main risk factors for HCC (HBV infection, HCV infection, alcohol abuse, and hemochromatosis), and (2) an exhaustive number of clinical, histopathological, and genetic annotations. From the results of our study, we have concluded that the primary clinical determinant of class membership is HBV infection, and the other main determinants are genetic and epigenetic alterations, including chromosome instability (FAL > 0.128), CTNNB1 and TP53 mutations, and parental imprinting (Fig. 6).

Figure 6.

Schematization of the different HCC subgroups defined by transcriptome analysis with their related clinical and genetic pathways. G1 to G6 are the subgroups of HCCs defined by transcriptome analysis. Vertical lines indicate significantly associated features (see Table 1, Fig. 3, and Supplementary Table 5). Red and green primarily indicate over- and underexpressed genes, respectively, in that particular functional category. LOH, loss of heterozygosity; Hemochrom, hemochromatosis; AFP, alpha-fetoprotein; HBV, hepatitis B virus; *, rare feature.

The natural history of HCC has taught us that HBV-related tumors (defining the G1 and G2 subgroups) are strikingly molecularly distinct from other HCCs. Indeed, tumors related to HCV infection and alcohol abuse were interspersed across subgroups G3-G6. Our transcriptomic classification has enabled the identification of new entities of tumors. Subgroup G1 included HBV-related tumors from younger patients (relative to the other HBV HCCs), frequently from Africa, with an equal sex ratio, a low number of viral DNA copies, frequent AXIN1 mutations, absence of TP53 mutation, and overexpression of genes normally imprinted (Fig. 6). These results suggest that HBV infection early in life25, 26 leads to a specific type of HCC that has immature features with abnormal parental gene imprinting selection, possibly through the persistence of fetal hepatocytes or alternatively through partial dedifferentiation of adult hepatocytes. These G1 tumors are related to the high-risk populations found in epidemiological studies.4, 27 Subgroup G6 had a 100% incidence of CTNNB1 mutation, a high level of Wnt pathway activation (higher than in G5), and inactivation of E-cadherin consistent with the high invasive potential of these tumors, as E-cadherin inactivation is known to participate in the cell invasion process.28 Interestingly, AXIN1-mutated tumors segregated to different groups that were not characterized by harboring a CTNNB1 mutation, suggesting that β-catenin and Axin1 may operate in distinct ways to promote carcinogenesis. In human HCCs, the inactivating mutations of the AXIN1 gene seem to differ strongly from the gain of function of β-catenin, as confirmed by in vitro experiments.17

Apart from these large subgroups of tumors defined by frequent genetic alterations, our analysis also defined homogeneous subgroups of tumors related to rare genetic alterations like TCF1 or PIK3CA mutations.29, 30 However, mutations specific to other small homogeneous subgroups of tumors have yet to be identified.

Thorgeirsson and collaborators defined a hepatoblast-like subtype of HCC that exhibited the most severe prognostic characteristics.31 Interestingly, our G1 tumor group shared the same characteristics as this hepatoblast-like type of HCC. However, in our series of HCC, the most severe prognosis was related to G3 tumors that did not overexpress fetal genes (Supplementary Fig. 2). Previous reports also identified the 2 main transcriptomic group classifications as a relevant survival predictor.22 We found better survival for subgroups G4-G6 than subgroups G1-G3; however, this difference did not reach statistical significance. These discrepancies may be related to the different distribution of risk factors in each tumor series. Furthermore, unsupervised transcriptomic classification does not appear to be a robust survival determinant, emphasizing the need to carefully consider the wide spectrum of epidemiological risk factors and genetic diversity of HCC when attempting to define “universal” HCC survival predictors.

In conclusion, our global transcriptomic analysis has established a robust classification reflecting the natural diversity of human HCC and the structural gene alterations and epigenetic de-regulations accumulated during tumor progression. The high diversity of HCC tumors has clinical implications, and our classification will help to further identify patients who will benefit from targeted therapies.


We warmly thank Jacqueline Godet, François Sigaux, and Philippe Dessen, managers of the Carte d'Identité des Tumeurs (CIT) program, founded by the Ligue Nationale contre le Cancer; Daniela Geromin, Christelle Thibault, Patricia Legoix, Damien Gerald, and Olivier Bluteau for their experimental help; Fabien Petel for his help in the submission of the data to EBI; and Philippe Bois for critically reading the manuscript. We thank the technicians from the CEPH, Fondation Jean Dausset, for their help in sequencing and all the clinicians who referred patients.