1. Top of page
  2. Abstract
  3. Patients and Methods
  4. Results
  5. Discussion
  6. References
  7. Supporting Information

We analyzed global gene expression patterns of 91 human hepatocellular carcinomas (HCCs) to define the molecular characteristics of the tumors and to test the prognostic value of the expression profiles. Unsupervised classification methods revealed two distinctive subclasses of HCC that are highly associated with patient survival. This association was validated via 5 independent supervised learning methods. We also identified the genes most strongly associated with survival by using the Cox proportional hazards survival analysis. This approach identified a limited number of genes that accurately predicted the length of survival and provides new molecular insight into the pathogenesis of HCC. Tumors from the low survival subclass have strong cell proliferation and antiapoptosis gene expression signatures. In addition, the low survival subclass displayed higher expression of genes involved in ubiquitination and histone modification, suggesting an etiological involvement of these processes in accelerating the progression of HCC. In conclusion, the biological differences identified in the HCC subclasses should provide an attractive source for the development of therapeutic targets (e.g., HIF1a) for selective treatment of HCC patients. Supplementary material for this article can be found on the HEPATOLOGY Web site. (HEPATOLOGY 2004;40:667–676.)

Hepatocellular carcinoma (HCC) is the fifth most common cancer in the world, accounting for an estimated 500,000 deaths annually.1 Although HCC is prevalent in Southeast Asia and sub-Saharan Africa, the incidence of HCC has doubled in the United States over the past 25 years, and incidence and mortality rates are likely to double over the next 10–20 years.2 Although much is known about both the cellular changes that lead to HCC and the etiological agents responsible for the majority of HCC cases (hepatitis B virus, hepatitis C virus, alcohol), the molecular pathogenesis of HCC is not well understood.3 Considerable efforts have been devoted to establishing a prognostic model for HCC by using clinical information and pathological classification to provide information at diagnosis on both survival and treatment options.4–10 Although much progress has been made (reviewed by Llovet et al.11), many issues still remain unresolved. For example, a staging system that reliably separates patients with early HCC as well as intermediate to advanced HCC into homogeneous groups with respect to prognosis does not exist. This is particularly important because the natural course of early HCC is unknown and the natural progression of intermediate and advanced HCC is known to be quite heterogeneous.12 It therefore appears axiomatic that improving the classification of HCC patients into groups with homogeneous prognosis would at least improve the application of currently available treatment modalities and at best provide new treatment strategies.

Recently, microarray technologies have been successfully used to predict clinical outcome and survival as well as classify different types of cancer.13–15 These microarray technologies have also been applied in many studies to define global gene expression patterns in primary human HCC as well as HCC-derived cell lines16 in an attempt to gain insight into the mechanisms of hepatocarcinogenesis. These studies have identified subgroups of HCC that differ according to etiological factors,17 mutations of tumor suppressor genes,18 rate of recurrence,19 and intrahepatic metastasis,20 as well as novel molecular markers for HCC diagnosis.21 However, most of these studies identified genes that are associated with limited aspects of tumor pathogenesis, and thus failed to create molecular prognostic indices that could be applied to the HCC patient population in general.

In the present study, we investigated the possibility that variations in gene expression in HCC obtained at diagnosis would permit the identification of distinct subclasses of HCC patients with different prognoses. The results revealed two subclasses of HCC patients characterized by significant differences in the length of survival. We also identified expression profiles of a limited number of genes that accurately predicted the length of survival. Our data indicate that it is possible to use gene expression patterns to accurately predict the clinical outcome of HCC at the time of diagnosis.

Patients and Methods


Complementary DNA Microarrays.

The Human Array-Ready Oligo Set (Version 2.0) containing 70-mer probes of 21,329 genes was obtained from Qiagen, Inc. (Valencia, CA). Oligo microarrays were produced at the Advanced Technology Center at the National Cancer Institute.

Human Tissue Samples and Preparation of RNA.

Surgically removed normal livers (n = 18) from patients with liver metastasis from colon cancers or from traffic accident patients were retrieved from the tissue bank of the Thomas E. Starzl Transplant Institute at the University of Pittsburgh Medical Center. One disease-free donor liver unsuitable for transplantation was also used. Total RNAs from the 19 normal livers were pooled and used as a reference for all microarray experiments. Ninety-one HCC tissues and 60 matched nontumor surrounding liver tissues were obtained from 90 patients undergoing partial hepatectomy as treatment for HCC. Tumor specimens originated from China and Belgium. Tissue banking was approved by the Institutional Review Board of all institutions. Total RNAs were isolated using the CsCl density gradient centrifugation method.22

Microarray Experiments and Data Analysis.

Twenty micrograms of total RNA from tissues were used to derive fluorescently (Cy5 or Cy3) labeled complementary DNA. A reference complementary DNA was generated using total RNA from 19 normal livers. At least two hybridizations were performed for each tissue sample using a dye-swap strategy to eliminate labeling bias of the fluorescent intensity measurement. A detailed procedure for microarray experimentation and data analysis is available in a supplementary note on the HEPATOLOGY Web site.
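As an illustration of why a dye-swap pair cancels labeling bias, the following minimal sketch averages the log2 sample-to-reference ratios from two hybridizations in which the dyes are exchanged. The exact procedure used in the study is described only in the supplementary note, so the function name, the averaging rule, and the intensities below are assumptions made for illustration.

```python
import math

def dye_swap_log_ratio(forward, reverse):
    """Combine a dye-swap pair into one log2(sample/reference) value.

    forward: (cy5, cy3) intensities with the sample labeled with Cy5
    reverse: (cy5, cy3) intensities with the sample labeled with Cy3
    """
    fwd = math.log2(forward[0] / forward[1])  # sample on Cy5, reference on Cy3
    rev = math.log2(reverse[1] / reverse[0])  # sample on Cy3, reference on Cy5
    return (fwd + rev) / 2.0                  # a multiplicative dye bias cancels here

# Hypothetical spot: true sample/reference ratio is 4, and Cy5 reads 20% too bright.
forward = (1.2 * 400.0, 100.0)
reverse = (1.2 * 100.0, 400.0)
combined = dye_swap_log_ratio(forward, reverse)
```

In this toy case the forward hybridization overestimates the ratio and the reverse one underestimates it by the same factor, so the average recovers log2(4) = 2.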

Supplementary Data.

Supplementary notes, figures, and tables can be accessed on the HEPATOLOGY Web site.


Results

We characterized gene expression profiles in 91 human primary HCC and 60 matched nontumor surrounding tissues (STs) using DNA microarrays. A hierarchical clustering analysis based on Pearson correlation coefficients was applied to all tissues on the basis of similarity in the expression pattern over all genes (Fig. 1A). As expected, it yielded two major clusters, one representing HCC tumors, and the other representing nontumor STs, with a few exceptions. Thus, the molecular configuration of HCC can be readily distinguished from nontumor STs, as has already been observed.18
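The similarity measure behind this clustering can be sketched as follows. This is a minimal illustration of a Pearson correlation distance between two expression profiles, not the software used in the study; the function names and the toy log2 ratios are hypothetical.

```python
import math

def pearson(x, y):
    """Pearson correlation coefficient between two equal-length profiles."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)

def correlation_distance(x, y):
    """Distance used for clustering: 0 for identical patterns, 2 for opposite ones."""
    return 1.0 - pearson(x, y)

# Hypothetical log2 expression ratios over four genes.
profiles = {
    "tumor_1": [2.0, 1.5, -1.0, -2.0],
    "tumor_2": [1.8, 1.2, -0.8, -1.9],
    "nontumor_1": [-1.0, -0.5, 1.2, 0.9],
}
d_tumor_tumor = correlation_distance(profiles["tumor_1"], profiles["tumor_2"])
d_tumor_nontumor = correlation_distance(profiles["tumor_1"], profiles["nontumor_1"])
```

A hierarchical clustering algorithm would merge the two tumor profiles first, because their correlation distance is far smaller than the tumor-to-nontumor distance.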


Figure 1. Hierarchical clustering analysis. (A) Unsupervised hierarchical clustering of 91 HCC tumors and 60 matched surrounding nontumor liver tissues separated the tissues into two main groups: HCC tumors and STs (see Supplementary Fig. 1 for details). (B) Hierarchical clustering of 91 HCC tumors only. Genes with an expression ratio that had at least a twofold difference relative to reference in at least 9 tissues were selected for hierarchical analysis (4,187 gene features). The data are presented in matrix format in which rows represent the individual gene and columns represent each tissue. Each cell in the matrix represents the expression level of a gene feature in an individual tissue. The red and green color in cells reflect high and low expression levels, respectively, as indicated in the scale bar (log2 transformed scale). (C) Kaplan-Meier plot of overall survival of HCC patients grouped on the basis of gene expression profiling. One patient (HCC16) was excluded from the data set due to death from septic shock after surgery. (D) Kaplan-Meier plot of overall survival of HCC patients grouped on the basis of serum AFP levels (>300 ng/mL). (E) Kaplan-Meier plot of overall survival of HCC patients grouped on the basis of both gene expression profiling and AFP levels in serum. HCC, hepatocellular carcinoma; ST, surrounding tissue; AFP, alpha fetoprotein.


Two Distinct Subclasses of HCC Revealed via Hierarchical Clustering of Gene Expression Patterns are Highly Associated With Survival of Patients.

Next, we attempted to identify subclasses of HCC solely on the basis of gene expression patterns. Genes with an expression ratio that had at least a twofold difference relative to the reference in at least 9 tumors were selected for hierarchical analysis (4,187 gene features). Clustering of the HCC data revealed 2 distinctive subtypes of gene expression patterns among 91 cases of HCC (Fig. 1B), suggesting a degree of heterogeneity among HCC gene expression profiles. Members of the 2 clusters also resided in compact and easily separable three-dimensional space when viewed by a three-dimensional multidimensional scaling plot based on their overall similarity of expression patterns (Supplementary Fig. 2), indicating that the 2 subclasses identified with hierarchical clustering are not due to artifacts from data processing. Having identified the 2 distinctive subclasses of HCC, we examined the association of clusters with clinical data. The two clusters showed weak associations with serum alpha fetoprotein (AFP) levels and Edmonson tumor grades. Cluster A contained a higher percentage of AFP+ (>300 ng/mL) patients (62.5%) and Edmonson grade III tumors (77%), while 42% and 50% of cluster B was AFP+ and grade III, respectively (Table 1). Significant association with the clusters was only detectable in patient survival. The overall survival time in cluster A (30.3 ± 8.02 months) was shorter than in cluster B (83.7 ± 10.3 months). As expected, a Kaplan-Meier survival curve and a log-rank test indicated poorer survival in cluster A patients (P < 1.0 × 10^−4) when compared with cluster B (Fig. 1C). Thus, the molecular differences between these 2 subclasses of HCC were associated with a remarkable difference in the clinical outcome of these patients.
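The gene selection rule described above (at least a twofold difference relative to the reference in at least 9 tumors) can be sketched as a simple filter on log2 ratios. The sketch assumes that a twofold difference in either direction counts, which the text does not state explicitly; the function name and the toy data are hypothetical.

```python
import math

def select_variable_genes(log2_ratios, min_fold=2.0, min_samples=9):
    """Keep genes whose ratio differs from the reference by at least
    min_fold (up or down) in at least min_samples tumors."""
    threshold = math.log2(min_fold)  # twofold corresponds to |log2 ratio| >= 1
    selected = []
    for gene, values in log2_ratios.items():
        n_changed = sum(1 for v in values if abs(v) >= threshold)
        if n_changed >= min_samples:
            selected.append(gene)
    return selected

# Hypothetical log2(tumor/reference) ratios for 12 tumors.
toy_ratios = {
    "gene_up": [1.2] * 9 + [0.0] * 3,     # changed in 9 tumors: kept
    "gene_flat": [0.3] * 12,              # never twofold: dropped
    "gene_few": [-1.5] * 8 + [0.0] * 4,   # changed in only 8 tumors: dropped
}
selected = select_variable_genes(toy_ratios)
```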

Table 1. Clinical and Pathological Features of HCC Patients

| Variable | Cluster A | Cluster B | Total |
|---|---|---|---|
| No. of patients | 40 | 50 | 90 |
| AFP (>300 ng/mL) | | | |
| Hereditary hemochromatosis | 1 | 2 | 3 |
| Wilson's disease | 1 | | 1 |
| Edmonson grade | | | |
| Survival (months)* | | | |

Abbreviations: HBV, hepatitis B virus; HCV, hepatitis C virus; NA, not applicable.

* Mean and SE of survival time were estimated as described in supplementary notes. See details in Supplementary Table 3.

It has been widely accepted that serum AFP levels are significantly related to the survival of HCC patients; higher levels of serum AFP indicate poorer survival.5, 6, 10 Among the many clinical indicators of the HCC patients, only serum AFP level (>300 ng/mL vs. ≤300 ng/mL) showed an association with survival in our patient cohort, and only with marginal significance (P = .13) (Fig. 1D). We determined whether or not our molecular classification of HCC could enhance the prognostic value of this clinical indicator previously used for prediction of survival. While the 2 subclasses of AFP+ patients showed a marginal difference in overall survival, AFP− patients in cluster A had a severely diminished overall survival (Supplementary Fig. 3). Moreover, when patients were subdivided into 4 groups based on serum AFP levels and gene expression clusters, AFP− patients in cluster A showed the worst overall survival among all patients (Fig. 1E).
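The survival comparisons reported here rest on Kaplan-Meier estimates and the log-rank test. The following minimal sketch shows both computations for two groups; it is an illustration rather than the software used in the study, and the function names and the follow-up times are hypothetical.

```python
import math

def kaplan_meier(times, events):
    """Return [(time, survival probability)] at each observed death time.
    events[i] is 1 for a death and 0 for a censored observation."""
    curve, survival = [], 1.0
    for t in sorted(set(times)):
        at_risk = sum(1 for u in times if u >= t)
        deaths = sum(1 for u, e in zip(times, events) if u == t and e)
        if deaths:
            survival *= 1.0 - deaths / at_risk
            curve.append((t, survival))
    return curve

def logrank_test(times_a, events_a, times_b, events_b):
    """Two-group log-rank test; returns (chi-square with 1 df, P value)."""
    all_times = list(times_a) + list(times_b)
    all_events = list(events_a) + list(events_b)
    observed_minus_expected, variance = 0.0, 0.0
    for t in sorted({u for u, e in zip(all_times, all_events) if e}):
        n_a = sum(1 for u in times_a if u >= t)
        n_b = sum(1 for u in times_b if u >= t)
        d_a = sum(1 for u, e in zip(times_a, events_a) if u == t and e)
        d = sum(1 for u, e in zip(all_times, all_events) if u == t and e)
        n = n_a + n_b
        observed_minus_expected += d_a - d * n_a / n
        if n > 1:
            variance += d * (n_a / n) * (n_b / n) * (n - d) / (n - 1)
    if variance == 0.0:
        return 0.0, 1.0
    chi2 = observed_minus_expected ** 2 / variance
    p_value = math.erfc(math.sqrt(chi2 / 2.0))  # upper tail of chi-square, 1 df
    return chi2, p_value

# Hypothetical follow-up times in months (1 = death, 0 = censored).
curve = kaplan_meier([5, 8, 12, 20], [1, 1, 0, 1])
chi2, p_value = logrank_test([3, 5, 7, 9, 11], [1] * 5, [20, 25, 30, 35, 40], [1] * 5)
```

In the first toy call, the censored patient at 12 months leaves the risk set without lowering the curve, which is the defining feature of the Kaplan-Meier estimate.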

Prediction of Survival With Gene Expression Profiles.

We applied 5 different statistical methods to determine whether or not gene expression patterns could be used to predict survival: linear discriminant analysis, support vector machines, nearest centroid, nearest neighbor, and compound covariate predictor. Before the analysis, 2 tumor samples, HCC89-2 and HCC16, were excluded from the data set because patient HCC16 died of septic shock after surgery and samples from 2 separate tumors were obtained from patient HCC89. In the absence of a totally independent data set, we attempted to assess the validity and reproducibility of our results by randomly dividing the HCC into 2 groups of nearly equal size: the training set (n = 45), which was used to develop the HCC classifiers, and the validation set (n = 44), which was used to evaluate the test. Briefly, we first identified the most differentially expressed genes between the 2 clusters in the training set. These genes were combined to form a series of classifiers that estimate the probability that a particular HCC belongs to cluster A or B. The number of genes in the classifiers was optimized to minimize misclassification errors during the “leave one out” cross-validation of the tumors in the training set. When applied to the validation set, all 5 models successfully separated poorer survival patients (cluster A) from longer survival patients (cluster B). All Kaplan-Meier survival curves and log-rank tests in the validation set showed significant differences between subclass A and B that were independently predicted using the 5 classifier models (Fig. 2A–G). Moreover, when we examined the predicted subclass memberships of the tumors, only a few discrepancies were observed (Fig. 2H). These results demonstrated not only strong association of gene expression patterns with the survival of the patients but also a robust reproducibility of these gene expression–based predictors.
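To make the training and cross-validation scheme concrete, the following minimal sketch implements one of the five model types, a nearest centroid classifier, together with a leave-one-out error count. It omits the gene selection and the optimization of gene number described above, and it is not the software used in the study; all names and the two-gene toy profiles are hypothetical.

```python
def centroid(rows):
    """Mean expression profile of a group of tumors."""
    return [sum(col) / len(rows) for col in zip(*rows)]

def squared_distance(u, v):
    return sum((a - b) ** 2 for a, b in zip(u, v))

def nearest_centroid_predict(train_x, train_y, sample):
    """Assign the sample to the cluster whose centroid is closest."""
    centroids = {}
    for label in set(train_y):
        rows = [x for x, y in zip(train_x, train_y) if y == label]
        centroids[label] = centroid(rows)
    return min(centroids, key=lambda lab: squared_distance(centroids[lab], sample))

def leave_one_out_errors(xs, ys):
    """Hold out each tumor in turn, train on the rest, count misclassifications."""
    errors = 0
    for i in range(len(xs)):
        train_x = xs[:i] + xs[i + 1:]
        train_y = ys[:i] + ys[i + 1:]
        if nearest_centroid_predict(train_x, train_y, xs[i]) != ys[i]:
            errors += 1
    return errors

# Hypothetical training set: two genes measured in six tumors.
train_x = [[2.0, 1.8], [1.7, 2.2], [2.3, 1.9],
           [-1.9, -2.1], [-2.2, -1.7], [-1.8, -2.0]]
train_y = ["A", "A", "A", "B", "B", "B"]
loo_errors = leave_one_out_errors(train_x, train_y)
predicted = nearest_centroid_predict(train_x, train_y, [1.5, 1.6])
```

In a full analysis, the gene selection step would also have to be repeated inside each leave-one-out round to avoid an optimistic error estimate.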


Figure 2. Survival analysis of outcome of prediction in validation set. (A) Kaplan-Meier plot of overall survival of HCC patients in validation set classified by hierarchical clustering analysis. (B–G) Kaplan-Meier plots of overall survival of HCC patients in validation set classified by linear discriminant analysis, support vector machines, nearest centroid, 3 nearest neighbor, 1 nearest neighbor, and compound covariate prediction models, respectively. (H) Hierarchical clustering of 44 HCC tissues in a validation set. Columns represent each tissue and rows represent outcomes of various prediction models as indicated. Each cell represents the membership of a tissue when a particular prediction model was applied in the validation set. The red and blue colors in cells represent clusters A and B, respectively. HCA, hierarchical clustering analysis; LDA, linear discriminant analysis; SVM, support vector machines; NC, nearest centroid; NN, nearest neighbor; CCP, compound covariate prediction.


Survival Genes.

Because the most striking feature of the unsupervised analysis of the expression profiles was the strong association with survival, we decided to apply supervised analysis of the genes whose expression is most strongly associated with length of survival. The univariate Cox proportional hazards model was used to assess the association of the gene expression with the survival. Expression of 442 features representing 406 unique genes (Supplementary Table 1) was highly correlated with length of survival with strong statistical significance (P < .001). The outcome of hierarchical cluster analysis of the HCC with the 406 survival genes was highly similar to the previous analysis with all the genes (Fig. 3A). With few exceptions, cluster memberships of each tumor remained the same in the 2 hierarchical cluster dendrograms, highlighting again the robustness of the predicted HCC subclasses and their strong association with length of survival. We noted that survival genes were almost equally divided into two groups, those whose expression is higher in subclass A tumors (HA genes) and those whose expression is higher in subclass B tumors (HB genes). When we categorized the survival genes according to the Gene Ontology, the biggest difference between HA genes and HB genes was observed in genes associated with cell proliferation (Supplementary Table 2). Of the HA survival genes, 45% belonged to the cell growth and maintenance category, while only 19% of HB survival genes were in the same Gene Ontology category, strongly suggesting that the HCCs in subclass A grow faster than those in subclass B.
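For readers unfamiliar with the univariate Cox model, the following minimal sketch fits a single-gene model by Newton-Raphson maximization of the partial likelihood and reports the hazard ratio with a Wald test P value. It is an illustration under stated assumptions (Breslow-style handling of ties, no convergence safeguards), not the software used in the study; the function name and the toy data are hypothetical.

```python
import math

def cox_univariate(times, events, x, max_iter=50, tol=1e-9):
    """Fit a one-covariate Cox model; returns (hazard ratio, Wald P value).
    events[i] is 1 for a death and 0 for a censored observation."""
    beta, info = 0.0, 0.0
    for _ in range(max_iter):
        score, info = 0.0, 0.0
        for t_i, e_i, x_i in zip(times, events, x):
            if not e_i:
                continue  # censored patients only contribute to risk sets
            risk_x = [x_j for t_j, x_j in zip(times, x) if t_j >= t_i]
            weights = [math.exp(beta * x_j) for x_j in risk_x]
            s0 = sum(weights)
            s1 = sum(w * x_j for w, x_j in zip(weights, risk_x))
            s2 = sum(w * x_j * x_j for w, x_j in zip(weights, risk_x))
            risk_mean = s1 / s0
            score += x_i - risk_mean          # first derivative term
            info += s2 / s0 - risk_mean ** 2  # observed information term
        step = score / info
        beta += step
        if abs(step) < tol:
            break
    wald_z = beta * math.sqrt(info)           # beta divided by its standard error
    wald_p = math.erfc(abs(wald_z) / math.sqrt(2.0))
    return math.exp(beta), wald_p

# Hypothetical data: higher log2 expression tends to go with earlier death.
months = [1, 2, 3, 4, 5, 6, 7, 8, 9, 10]
died = [1, 1, 1, 1, 1, 1, 1, 1, 1, 0]
expression = [1.2, -0.3, 0.8, 1.5, -0.5, 0.4, -1.0, 0.2, -0.8, -1.4]
hazard_ratio, wald_p = cox_univariate(months, died, expression)
```

A hazard ratio above 1 corresponds to a gene whose higher expression goes with shorter survival, and a ratio below 1 to the opposite pattern.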


Figure 3. Gene expression patterns of 406 survival genes. (A) Hierarchical clustering of 89 HCC tumors with survival genes separated the tissues into 2 main groups. The data are presented as described in Fig. 1. The red and blue color cells at the bottom of the dendrogram represent memberships of cluster A and B, respectively, from Fig. 1B. (B) Relative expression of 213 HB genes that were more expressed in cluster B HCC tissues. HCC tissues were ordered according to average expression level of 213 genes as indicated at the bottom of the colored heat map. (C) Relative expression of 193 HA genes that were more expressed in cluster A HCC tissues. HCC tissues were ordered according to average expression level of 193 genes as indicated at the bottom of the colored heat map. (D) Relative expression of HIF1a, EGLN2, and downstream target genes of HIF1a. HCC tissues were ordered according to average expression level of 193 genes as indicated.


We next generated an averaged gene expression index from HB genes to examine their predictive power. Patients were then ranked according to the average gene expression level of tumors from the highest to the lowest (Fig. 3B) and divided into two equal groups at the 50th percentile. Kaplan-Meier plots and log-rank tests of overall patient survival in the 2 divided groups revealed striking differences with strong statistical significance (P < 1.0 × 10^−4) (data not shown). Likewise, the average gene expression index from HA genes produced similar results (Fig. 3C) with comparable statistical significance (P < 1.0 × 10^−5). Perhaps not surprisingly, most of the patients in clusters A and B were well separated from each other in both 50th percentile segmentations (Fig. 3B and 3C). Taken together with the previous 2 independent clustering analyses and the cross-validation test of training and validation data sets, these results further support the notion that a distinct gene expression pattern predicts survival characteristics of the 2 subclasses of the HCC patients.
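The averaged expression index and the split at the 50th percentile can be sketched in a few lines. This is a minimal illustration; the function names, the patient labels, and the log2 ratios are hypothetical, and assigning patients exactly at the median to the lower group is an assumption.

```python
from statistics import mean, median

def expression_index(gene_values):
    """Average log2 expression over the genes in a signature for one tumor."""
    return mean(gene_values)

def split_at_median(index_by_patient):
    """Divide patients into high and low groups at the 50th percentile."""
    cutoff = median(index_by_patient.values())
    high = sorted(p for p, v in index_by_patient.items() if v > cutoff)
    low = sorted(p for p, v in index_by_patient.items() if v <= cutoff)
    return high, low

# Hypothetical log2 ratios of three signature genes in four patients.
signature_values = {
    "P1": [1.0, 1.4, 0.9],
    "P2": [-0.5, -0.2, -0.8],
    "P3": [0.6, 0.2, 0.4],
    "P4": [-1.2, -0.9, -1.5],
}
index_by_patient = {p: expression_index(v) for p, v in signature_values.items()}
high_group, low_group = split_at_median(index_by_patient)
```

The two resulting groups would then be compared with Kaplan-Meier plots and a log-rank test, as described in the text.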

Next, we employed a knowledge-based annotation of the survival genes based on a public database search, because the Gene Ontology Consortium term annotation of genes was not sufficient to provide insight into the underlying biological differences between the 2 subclasses of HCC. The survival genes fell within several biological groups (Table 2). The cell proliferation group was the best predictor of an unfavorable outcome of the disease, which is consistent with previous analyses in human lymphomas.23 Expression of typical cell proliferation markers such as PCNA and cell cycle regulators such as CDK4, CCNB1, CCNA2, and CKS2 was greater in subclass A than subclass B. Not surprisingly, many genes that are expressed more in subclass A are antiapoptotic. Recent studies have identified PTMA/ProT as an inhibitor of apoptosome formation, the essential step for the final activation of the caspase-dependent cascade in the apoptotic pathway,24 and have identified SET as an inhibitor of the Granzyme A–induced caspase-independent pathway.25 SET is also a subunit of the inhibitor of acetyltransferases complex that regulates histone modification and gene expression.26 Significantly, PTMA has recently been shown also to be part of the inhibitor of acetyltransferases complex,27 suggesting their multiple roles in hepatocarcinogenesis. Genes involved in prothrombin activation were expressed less in subclass A, indicating impairment of liver function in this subclass. Many of the genes with lower expression in subclass A were liver-specific (data not shown), which is consistent with the previous observation that poorly differentiated HCC tumors have less favorable clinical outcomes.28 Higher expression of genes involved in ubiquitination and sumoylation indicated that accelerated cell proliferation in the poorer survival group might be due to selective degradation of critical proteins, including cell cycle inhibitors. Concomitant overexpression of the histone H4 family with HRMT1L2 (H4-specific methyltransferase) in the poorer survival group of HCC may indicate unidentified roles of the histone H4 family and their modification in tumor development. Expression of HIF1a, the master regulator of hypoxia-induced gene expression,29 was enhanced in subclass A, while expression of EGLN2, a negative regulator of HIF1a by prolyl hydroxylation,30 was reduced (Fig. 3D). Both of these changes dramatically enhance HIF1a activity in tumor cells, which in turn provides a favorable environment for tumor growth.

Table 2. Summary of Selected Survival Genes

| Gene | Hazard Ratio | P (Wald test) | P (Likelihood Ratio test) | P (t test, A vs. B) | Unigene | Description |
|---|---|---|---|---|---|---|
| Prothrombin activation | | | | | | |
| F10 | 0.718 | .00093 | .00136 | 3.63E-07 | Hs.47913 | Coagulation factor X |
| F12 | 0.713 | .000064 | 8.39E-05 | 5.32E-15 | Hs.1321 | Coagulation factor XII (Hageman factor) |
| SERPINC1 | 0.774 | .00026 | .000303 | 3.49E-20 | Hs.75599 | Serine (or cysteine) proteinase inhibitor, clade C1 |
| SERPING1 | 0.586 | .00022 | .000273 | 2.26E-12 | Hs.151242 | Serine (or cysteine) proteinase inhibitor, clade G1 |
| Ubiquitination and sumoylation | | | | | | |
| UBE2D1 | 2.4 | .00034 | .000303 | 3.15E-12 | Hs.129683 | Ubiquitin-conjugating enzyme E2D 1 |
| USP1 | 2.22 | .00061 | .000951 | 3.07E-06 | Hs.35086 | Ubiquitin specific protease 1 |
| HSPC150 | 1.76 | .00074 | .00105 | 8.31E-08 | Hs.5199 | Similar to ubiquitin-conjugating enzyme |
| UBA2 | 2.43 | 7.40E-07 | 1.17E-06 | 9.28E-12 | Hs.4311 | SUMO-1 activating enzyme subunit 2 |
| RBX1 | 2.32 | .00017 | .00014 | 3.39E-07 | Hs.279919 | Ring-box 1 |
| RWDD1 | 2.34 | 5.30E-06 | 5.52E-05 | .002542 | Hs.22679 | RWD domain containing 1 |
| HIST1H4A | 1.97 | .000049 | 3.98E-05 | 4.51E-11 | Hs.248178 | Histone 1, H4a |
| HIST1H4C | 1.69 | .00051 | .000306 | 1.91E-10 | Hs.46423 | Histone 1, H4c |
| HIST2H4 | 2.04 | .00013 | .000141 | 6.44E-11 | Hs.55466 | Histone 2, H4 |
| HRMT1L2 | 2.31 | .00065 | .00087 | 3.38E-14 | Hs.20521 | HMT1 hnRNP methyltransferase-like 2 |
| CRFG | 3.77 | .000035 | 2.74E-05 | 6.51E-12 | Hs.215766 | G protein-binding protein CRFG |
| HDAC2 | 2.64 | .000015 | 2.42E-05 | 4.79E-12 | Hs.3352 | Histone deacetylase 2 |
| SLBP | 2.93 | .00039 | .000586 | .001256 | Hs.75257 | Stem-loop (histone) binding protein |
| PTMA | 3.75 | .000038 | 3.96E-05 | 7.89E-08 | Hs.250655 | Prothymosin, alpha |
| SET | 2.2 | .0007 | .0011 | 2.65E-07 | Hs.145279 | SET translocation |
| YWHAB | 3.48 | .00046 | .000408 | 5.25E-06 | Hs.182238 | 14-3-3 beta polypeptide |
| YWHAH | 2.44 | .00063 | .000347 | 1.18E-08 | Hs.349530 | 14-3-3 eta polypeptide |
| YWHAQ | 2.31 | .00085 | .000656 | 2.81E-11 | Hs.74405 | 14-3-3 theta polypeptide |
| NALP2 | 2.91 | .00035 | .000164 | 1.77E-11 | Hs.6844 | Neuronal apoptosis inhibitor protein 2 |
| PDCD5 | 3.04 | .000011 | 1.41E-05 | 6.25E-06 | Hs.166468 | Programmed cell death 5 |
| P8 | 0.525 | .00073 | .000704 | .000309 | Hs.424279 | p8 Protein (candidate of metastasis 1) |
| IER3 | 1.74 | .00038 | .000295 | 8.43E-11 | Hs.76095 | Immediate early response 3 |
| Cell cycle regulation and cell proliferation | | | | | | |
| PCNA | 1.92 | .00022 | .000301 | 3.57E-06 | Hs.78996 | Proliferating cell nuclear antigen |
| CDK4 | 2.09 | .00085 | .000836 | 3.78E-12 | Hs.95577 | Cyclin-dependent kinase 4 |
| TOPBP1 | 4.24 | .0001 | 6.85E-05 | 3.57E-08 | Hs.91417 | Topoisomerase (DNA) II binding protein |
| CGR11 | 0.422 | .00038 | .00029 | 1.57E-06 | Hs.159525 | Cell growth regulatory with EF-hand domain |
| BCAT1 | 1.58 | .000081 | .000254 | 4.35E-08 | Hs.317432 | Branched chain aminotransferase 1, cytosolic |
| CCNB1 | 2.19 | .000058 | 7.66E-05 | 6.66E-07 | Hs.23960 | Cyclin B1 |
| CKS2 | 1.77 | .00011 | .000115 | 9.41E-10 | Hs.83758 | CDC28 protein kinase regulatory subunit 2 |
| DLG7 | 2.69 | .000017 | 6.12E-05 | 4.3E-07 | Hs.77695 | Discs, large homolog 7 (Drosophila) |
| NAP1L1 | 1.98 | .000074 | .000111 | 6.42E-10 | Hs.302649 | Nucleosome assembly protein 1-like 1 |
| CCNA2 | 1.97 | .00015 | .000335 | 8.11E-10 | Hs.85137 | Cyclin A2 |
| MAPRE1 | 2.68 | .000014 | 9.43E-06 | 3.29E-13 | Hs.234279 | Microtubule-associated protein, RP/EB family, member 1 |
| TTK | 1.66 | .00095 | .00128 | 1.08E-10 | Hs.169840 | TTK protein kinase |
| BUB3 | 2.69 | .00083 | .00104 | 8.13E-09 | Hs.40323 | Budding uninhibited by benzimidazoles 3 homolog |
| CENPF | 1.71 | .00047 | .000593 | 4.53E-07 | Hs.77204 | Centromere protein F, 350/400ka |
| KNTC1 | 2.35 | .00012 | .000204 | 4.47E-05 | Hs.333355 | Kinetochore associated 1 |
| MCM2 | 2.03 | .00099 | .00119 | 8.1E-08 | Hs.57101 | Minichromosome maintenance deficient 2 |
| MCM6 | 1.88 | .00094 | .000852 | 2.3E-10 | Hs.155462 | Minichromosome maintenance deficient 6 |
| MCM7 | 2.41 | .00047 | .000434 | 2.6E-09 | Hs.77152 | Minichromosome maintenance deficient 7 |
| Regulation of HIF1a | | | | | | |
| HIF1A | 1.69 | .00022 | .000298 | 1.19E-07 | Hs.197540 | Hypoxia-inducible factor 1, alpha |
| EGLN2 | 0.461 | .00018 | .000256 | 1.1E-08 | Hs.324277 | egl nine homolog 2 |

Predicted biological features of each subgroup of HCC based on gene expression patterns were further validated using independent methods as described in the supplementary notes.

Three Distinctive Gene Expression Patterns in HCC and ST.

To gain additional insight into the biological differences between the 2 subclasses of HCC, we generated 2 different gene lists by applying significance analysis of microarrays.31 Gene list X represents the top 500 genes that were differentially expressed between ST and all HCC tissues. Gene list Y represents the top 500 genes that were differentially expressed between HCCs in A and B clusters (Fig. 4A and 4B). When gene expression patterns of all tissues were compared together, 3 different patterns were observed: X not Y (330 genes), X and Y (170 genes), and Y but not X (330 genes). Genes in the X not Y category had uniform differences between all HCCs and STs regardless of subclass A or B, representing common alterations of gene expression in HCC. Enhanced expression of 26S proteasome subunits such as PSMC4, PSME3, PSMD4, PSMD2, and PSMB4 indicated an enhanced activation of wide-ranging protein degradation in all HCCs. However, ubiquitination, a selective protein degradation pathway, was only active in subclass A HCC (see Table 2). Genes in the X and Y category display a subclass-specific gene expression pattern. Although gene expression was altered in all HCCs, a more pronounced alteration was observed in subclass A. Expression of the G1/S phase cell cycle regulator CDK4 was highest in subclass A, moderately enhanced in subclass B, and lowest in ST, which agreed well with the more proliferative features of subclass A. Enhanced expression of H2AFX, a histone H2 variant involved in the chromosome double-strand breaks response,32 in subclass A might reflect more chromosomal damage and/or instability in subclass A than in B. Additional reduction in the expression of liver-specific genes including the p450 family in the poorer survival group (subclass A) shows that reduced liver function is indeed a bad prognostic indicator for HCC patients.
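The three category sizes follow directly from the two 500-gene lists and their overlap of 170 genes. The following minimal sketch reproduces that arithmetic with set operations; the gene identifiers are placeholders invented so that the overlap has the reported size.

```python
# Placeholder identifiers: lists X and Y each hold 500 genes and share 170 of them.
list_x = {f"gene_{i}" for i in range(0, 500)}
list_y = {f"gene_{i}" for i in range(330, 830)}

x_not_y = list_x - list_y   # altered in all HCCs, regardless of subclass
x_and_y = list_x & list_y   # altered in all HCCs, more strongly in one subclass
y_not_x = list_y - list_x   # differ between subclasses A and B only

sizes = (len(x_not_y), len(x_and_y), len(y_not_x))
total_unique = len(list_x | list_y)
```

With 170 shared genes, each list contributes 500 − 170 = 330 genes of its own, so the two lists together cover 830 distinct genes.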


Figure 4. Distinctive gene expressions of HCC and ST. (A) Venn diagram of genes selected via significance analysis of microarrays. X represents genes differentially expressed between surrounding tissues and all HCC tissues. Y represents genes differentially expressed between cluster A HCC and cluster B HCC tissues. One hundred seventy genes were shared in 2 different gene lists. (B) Purple and red bars at the left side of the heat map represent X and Y genes, respectively. Pink bars at the right side of heat map represent survival genes. Colored bars at the top of heat map represent tissues as indicated. HCC, hepatocellular carcinoma; ST, surrounding tissue.



Discussion

Prognostic modeling of patients with HCC at diagnosis that considers tumor stage, functional impairments of the liver, and the general condition of the patient can provide valuable information and indicate therapy.4, 10 However, increased surveillance and advances in image technology have afforded earlier diagnosis of HCC. This development presents a challenge with respect to prognostic modeling of HCC, because the natural history of early HCC is unknown.12 In addition, intermediate and advanced HCC are quite heterogeneous,33 even though the natural history and prognostic factors are well defined.12 Therefore, it is necessary to establish robust methods capable of evaluating the prognosis of patients diagnosed at the early, intermediate, and late stages of HCC. As a first step in the development of a molecular prognostic evaluation, we have used gene expression profiling technology and unsupervised and supervised learning methods to successfully predict survival of HCC patients.

We applied three independent but complementary approaches for data analysis to uncover subclasses of HCC and the underlying biological differences between the subclasses. First, unsupervised classification methods based solely on gene expression patterns were applied. Hierarchical clustering of the data as well as multidimensional scaling revealed two subclasses of HCC strongly associated with the length of patients' survival. The differences in gene expression are quite robust as illustrated by the fact that the poorer survival group (subclass A) was successfully separated from the better survival group (subclass B) in the training data set as well as in the validation data set when 5 different statistical methods for prediction were applied (see Fig. 2). The presence of two extreme subgroups in AFP− patients was unexpected and probably accounts for the insufficient predictive power when AFP was used as a sole prognostic indicator.4, 10

Second, a univariate regression model was used to identify individual genes whose expression is highly correlated with length of survival. Application of survival genes for subclass prediction was highly accurate, as illustrated by the fact that averaged gene expression indices were sufficient to segregate the 2 subclasses even without the use of sophisticated prediction models. Also, information obtained from knowledge-based annotation of the 406 survival genes provided insight into the underlying biological differences between the 2 subclasses of HCC. Although quantitative measurement of cell proliferation and apoptotic rates in both subclasses strongly supported the long-established notion that imbalance between cell proliferation and cell death is the primary hallmark of tumors34 and provided the best quantitative separation of the 2 subclasses, many other issues were also highlighted.

The ubiquitin system is often deregulated in cancers.35 In HCC, the degree of ubiquitination is highly correlated with cell proliferation and survival of patients and has also been proposed as a possible predictive marker for recurrence of human HCC.36 In addition, PSMD10/Gankyrin, a subunit of the 26S proteasome that accelerates the degradation of the retinoblastoma protein, is overexpressed in HCC.37 Also, enhanced activation of ubiquitin-dependent protein degradation may account for deregulation of cell cycle control and faster cell proliferation in the poor survival group (subclass A). Therefore, deregulated components in ubiquitin-mediated protein degradation may provide attractive therapeutic targets for novel HCC treatment modalities.

The third approach involved the analysis of overall similarity and dissimilarity of gene expression between all the HCCs, subclasses A and B, and STs (see Fig. 4). Although the 2 subclasses of HCC may be viewed as distinctive biological entities, they still share significant overall similarity of gene expression when compared with ST. This may indicate that subclass A HCC accumulated additional oncogenic alterations of gene expression on top of a common HCC gene expression signature, thereby providing a more favorable environment for tumor cell growth. However, we cannot rule out the possibility that different mechanisms may contribute to the development of subclass A and B following exposure to different etiological factors, and the gene expression signature may reflect that etiological “footprint.” This scenario is unlikely, however, because the great majority of our HCC cases were associated with hepatitis B virus. Alternatively, the cell of origin of a tumor can be important in determining the clinical outcome, as shown for diffuse large B cell lymphoma.14 It is therefore possible that the 2 subclasses of HCC might represent different cellular origins (i.e., hepatic stem cells vs. hepatocytes) of the tumors.

Comparative analysis of our data and earlier studies demonstrated good concordance despite differences in patient populations and technology platforms (see supplementary notes). This concordance strongly supports the generality of our finding that the subclasses of HCC might represent distinct disease entities. Also, the observation that genes associated with early recurrence and intrahepatic metastasis of HCC19, 20 did not discriminate between subclasses A and B suggests that the information (at least from a gene expression standpoint) embedded in these important processes is not sufficient to predict survival. It is therefore likely that the additional information provided by the survival genes (only 2 of the genes associated with intrahepatic metastasis were among these) is needed for effectively predicting survival. This is of considerable importance, because a recent study on survival of HCC patients demonstrated that HCC was the prime cause of death in patients with compensated cirrhosis.38 However, considerable molecular heterogeneity still exists within each HCC subclass, as evidenced by quantitative differences in survival gene expression (see Fig. 3B and 3C) and the small fraction of patients who are frequently misclassified in the prediction models. It is therefore probable that more subclasses of HCC will emerge when gene expression data from more HCC patients become available.

The severity of HCC and the lack of good diagnostic markers and treatment strategies have rendered the disease a major challenge. Systematic analysis of gene expression patterns provides insight into the biology and pathogenesis of HCC. Our results indicate that HCC prognosis can be readily predicted from the gene expression profiles of the primary tumors. The microarray-based measurement of gene expression reflects the abundance of expressed messenger RNA and proteins in the HCC, as confirmed by quantitative reverse-transcriptase polymerase chain reaction and immunohistochemical staining (Supplementary Figs. 6 and 7). A limited set of quantitative reverse-transcriptase polymerase chain reaction and/or immunohistochemical staining assays may therefore be sufficient to predict the prognosis of patients at the time of diagnosis; however, a prospective study is needed to confirm this proposal. Nevertheless, the unique molecular characteristics of each subclass of HCC uncovered by a genome-wide survey of gene expression provide insight into the tumor biology of HCC and offer the opportunity for new therapeutic strategies. SET and PTMA are of particular interest as potential therapeutic targets because of their multitasking features. Even if a curative therapy for HCC patients cannot be offered at this stage, it may be possible to identify therapeutic targets that can slow the course of disease progression. For example, small molecules that inhibit PTMA and HIF1a activities are already available24, 39 and may provide opportunities to alter the course of HCC progression in both subclasses A and B.



Supporting Information


This article includes Supplementary Notes, Tables, and Figures:

suppmat_667.pdf (1882K): Supplementary notes, figures and tables.
