We analyzed global gene expression patterns of 91 human hepatocellular carcinomas (HCCs) to define the molecular characteristics of the tumors and to test the prognostic value of the expression profiles. Unsupervised classification methods revealed two distinctive subclasses of HCC that are highly associated with patient survival. This association was validated via 5 independent supervised learning methods. We also identified the genes most strongly associated with survival by using the Cox proportional hazards survival analysis. This approach identified a limited number of genes that accurately predicted the length of survival and provides new molecular insight into the pathogenesis of HCC. Tumors from the low survival subclass have strong cell proliferation and antiapoptosis gene expression signatures. In addition, the low survival subclass displayed higher expression of genes involved in ubiquitination and histone modification, suggesting an etiological involvement of these processes in accelerating the progression of HCC. In conclusion, the biological differences identified in the HCC subclasses should provide an attractive source for the development of therapeutic targets (e.g., HIF1a) for selective treatment of HCC patients. Supplementary material for this article can be found on the HEPATOLOGY Web site (http://interscience.wiley.com/jpages/0270-9139/suppmat/index.html) (HEPATOLOGY 2004;40:667–676.)
Hepatocellular carcinoma (HCC) is the fifth most common cancer in the world, accounting for an estimated 500,000 deaths annually.1 Although HCC is prevalent in Southeast Asia and sub-Saharan Africa, the incidence of HCC has doubled in the United States over the past 25 years, and incidence and mortality rates are likely to double over the next 10–20 years.2 Although much is known about both the cellular changes that lead to HCC and the etiological agents responsible for the majority of HCC cases (hepatitis B virus, hepatitis C virus, alcohol), the molecular pathogenesis of HCC is not well understood.3 Considerable effort has been devoted to establishing a prognostic model for HCC that uses clinical information and pathological classification to provide information at diagnosis on both survival and treatment options.4–10 Although much progress has been made (reviewed by Llovet et al.11), many issues remain unresolved. For example, no staging system reliably separates patients with early HCC, or those with intermediate to advanced HCC, into groups that are homogeneous with respect to prognosis. This is particularly important because the natural course of early HCC is unknown and the natural progression of intermediate and advanced HCC is known to be quite heterogeneous.12 It therefore appears axiomatic that improving the classification of HCC patients into groups with homogeneous prognosis would at least improve the application of currently available treatment modalities and at best provide new treatment strategies.
Recently, microarray technologies have been successfully used to predict clinical outcome and survival as well as classify different types of cancer.13–15 These microarray technologies have also been applied in many studies to define global gene expression patterns in primary human HCC as well as HCC-derived cell lines16 in an attempt to gain insight into the mechanisms of hepatocarcinogenesis. These studies have identified subgroups of HCC that differ according to etiological factors,17 mutations of tumor suppressor genes,18 rate of recurrence,19 and intrahepatic metastasis,20 as well as novel molecular markers for HCC diagnosis.21 However, most of these studies identified genes that are associated with limited aspects of tumor pathogenesis, and thus failed to create molecular prognostic indices that could be applied to the HCC patient population in general.
In the present study, we investigated the possibility that variations in gene expression in HCC obtained at diagnosis would permit the identification of distinct subclasses of HCC patients with different prognoses. The results revealed two subclasses of HCC patients characterized by significant differences in the length of survival. We also identified expression profiles of a limited number of genes that accurately predicted the length of survival. Our data indicate that it is possible to use gene expression patterns to accurately predict the clinical outcome of HCC at the time of diagnosis.
Abbreviations: HCC, hepatocellular carcinoma; ST, surrounding tissue; AFP, alpha fetoprotein; HA genes, genes with high expression in subclass A tumors; HB genes, genes with high expression in subclass B tumors.
Patients and Methods
Complementary DNA Microarrays.
The Human Array-Ready Oligo Set (Version 2.0) containing 70-mer probes of 21,329 genes was obtained from Qiagen, Inc. (Valencia, CA). Oligo microarrays were produced at the Advanced Technology Center at the National Cancer Institute.
Human Tissue Samples and Preparation of RNA.
Surgically removed normal liver tissues (n = 18) from patients with colon cancer metastatic to the liver or from traffic accident victims were retrieved from the tissue bank of the Thomas E. Starzl Transplant Institute at the University of Pittsburgh Medical Center. One disease-free donor liver unsuitable for transplantation was also used. Total RNAs from the 19 normal livers were pooled and used as a reference for all microarray experiments. Ninety-one HCC tissues and 60 matched nontumor surrounding liver tissues were obtained from 90 patients undergoing partial hepatectomy as treatment for HCC. Tumor specimens originated from China and Belgium. Tissue banking was approved by the Institutional Review Boards of all participating institutions. Total RNAs were isolated using the CsCl density gradient centrifugation method.22
Microarray Experiments and Data Analysis.
Twenty micrograms of total RNA from tissues were used to derive fluorescently (Cy5 or Cy3) labeled complementary DNA. A reference complementary DNA was generated using total RNA from 19 normal livers. At least two hybridizations were performed for each tissue sample using a dye-swap strategy to eliminate labeling bias of the fluorescent intensity measurement. A detailed procedure for microarray experimentation and data analysis is available in a supplementary note on the HEPATOLOGY Web site (http://interscience.wiley.com/jpages/0270-9139/suppmat/index.html).
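The dye-swap strategy can be illustrated with a minimal Python sketch. It assumes that labeling bias acts as a constant multiplicative factor on one dye. It also assumes that the two reversed-label hybridizations are combined by averaging their log2(sample/reference) ratios. The function name `dye_swap_log_ratio` is hypothetical, and the normalization actually used is described in the supplementary note.

```python
import math


def dye_swap_log_ratio(forward, reverse):
    """Combine a dye-swap pair into one log2(sample/reference) value per gene.

    forward: list of (cy5, cy3) intensities, sample labeled with Cy5.
    reverse: list of (cy5, cy3) intensities, sample labeled with Cy3.
    Assumes a constant multiplicative dye bias, which cancels in the average.
    """
    combined = []
    for (f_cy5, f_cy3), (r_cy5, r_cy3) in zip(forward, reverse):
        log_forward = math.log2(f_cy5 / f_cy3)  # sample / reference on array 1
        log_reverse = math.log2(r_cy3 / r_cy5)  # sample / reference on array 2
        combined.append((log_forward + log_reverse) / 2)
    return combined


# A gene with a true fourfold change (log2 ratio = 2) and a 1.5x Cy5 bias:
# array 1 reads sample 600 (Cy5) vs reference 100 (Cy3);
# array 2 reads reference 150 (Cy5) vs sample 400 (Cy3).
print(dye_swap_log_ratio([(600.0, 100.0)], [(150.0, 400.0)]))
```

Either array alone would report a biased log2 ratio of about 2.6 or 1.4. The average recovers a value close to the true ratio of 2.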
We characterized gene expression profiles in 91 human primary HCC and 60 matched nontumor surrounding tissues (STs) using DNA microarrays. A hierarchical clustering analysis based on Pearson correlation coefficients was applied to all tissues on the basis of similarity in the expression pattern over all genes (Fig. 1A). As expected, it yielded two major clusters, one representing HCC tumors, and the other representing nontumor STs, with a few exceptions. Thus, the molecular configuration of HCC can be readily distinguished from nontumor STs, as has already been observed.18
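As a minimal sketch of this kind of analysis, the Python code below clusters samples using 1 minus the Pearson correlation as the distance. It merges clusters by average linkage until a chosen number remain. Average linkage and the helper names are assumptions for illustration, since the text specifies only that clustering was based on Pearson correlation coefficients. A real analysis would use a dedicated clustering package.

```python
import math
from itertools import combinations


def pearson(x, y):
    """Pearson correlation coefficient of two equal-length expression vectors."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def average_linkage(profiles, k):
    """Agglomerative clustering of samples with distance 1 - r, stopped at k clusters."""
    dist = {(i, j): 1.0 - pearson(profiles[i], profiles[j])
            for i, j in combinations(range(len(profiles)), 2)}
    clusters = [[i] for i in range(len(profiles))]
    while len(clusters) > k:
        best = None
        for a, b in combinations(range(len(clusters)), 2):
            # mean distance over all sample pairs drawn from the two clusters
            pairs = [(min(i, j), max(i, j)) for i in clusters[a] for j in clusters[b]]
            d = sum(dist[p] for p in pairs) / len(pairs)
            if best is None or d < best[0]:
                best = (d, a, b)
        _, a, b = best
        clusters[a].extend(clusters[b])  # a < b, so deleting b leaves a in place
        del clusters[b]
    return sorted(sorted(c) for c in clusters)


# Toy data: samples 0 and 1 rise across four genes, samples 2 and 3 fall.
profiles = [[1, 2, 3, 4], [1.1, 2.1, 2.9, 4.2], [4, 3, 2, 1], [4.2, 2.9, 2.1, 0.9]]
print(average_linkage(profiles, 2))  # [[0, 1], [2, 3]]
```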
Two Distinct Subclasses of HCC Revealed via Hierarchical Clustering of Gene Expression Patterns Are Highly Associated With Survival of Patients.
Next, we attempted to identify subclasses of HCC solely on the basis of gene expression patterns. Genes whose expression ratio differed at least twofold from the reference in at least 9 tumors were selected for hierarchical analysis (4,187 gene features). Hierarchical clustering of the 91 HCCs on these genes revealed 2 distinctive subtypes of gene expression patterns (Fig. 1B), suggesting a degree of heterogeneity among HCC gene expression profiles. Members of the 2 clusters also resided in compact and easily separable regions when viewed in a three-dimensional multidimensional scaling plot based on their overall similarity of expression patterns (Supplementary Fig. 2), indicating that the 2 subclasses identified with hierarchical clustering are not artifacts of data processing. Having identified the 2 distinctive subclasses of HCC, we examined the association of the clusters with clinical data. The two clusters showed weak associations with serum alpha fetoprotein (AFP) levels and Edmondson tumor grades. Cluster A contained a higher percentage of AFP+ (>300 ng/mL) patients (62.5%) and Edmondson grade III tumors (77%), whereas 42% and 50% of cluster B patients were AFP+ and grade III, respectively (Table 1). The only clinical variable significantly associated with the clusters was patient survival. The overall survival time in cluster A (30.3 ± 8.02 months) was shorter than that in cluster B (83.7 ± 10.3 months). As expected, a Kaplan-Meier survival curve and a log-rank test indicated poorer survival in cluster A patients than in cluster B patients (P < 1.0 × 10⁻⁴) (Fig. 1C). Thus, the molecular differences between these 2 subclasses of HCC were associated with a remarkable difference in the clinical outcome of these patients.
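The log-rank comparison of the two clusters can be sketched in a few lines of Python. This is a textbook two-group log-rank test, written from scratch for illustration. It compares observed with expected deaths in one group and uses the hypergeometric variance. The function name is hypothetical, and the survival times in the example are invented toy values, not patient data.

```python
import math


def logrank_test(times_a, events_a, times_b, events_b):
    """Two-group log-rank test.

    events are 1 for death and 0 for censoring. Returns the chi-square statistic
    (1 degree of freedom) and its two-sided P value.
    """
    data = [(t, e, "a") for t, e in zip(times_a, events_a)]
    data += [(t, e, "b") for t, e in zip(times_b, events_b)]
    observed_minus_expected = 0.0
    variance = 0.0
    for t in sorted({t for t, e, _ in data if e == 1}):
        at_risk = [(tt, e, g) for tt, e, g in data if tt >= t]
        n = len(at_risk)
        n_a = sum(1 for _, _, g in at_risk if g == "a")
        d = sum(1 for tt, e, _ in at_risk if tt == t and e == 1)
        d_a = sum(1 for tt, e, g in at_risk if tt == t and e == 1 and g == "a")
        observed_minus_expected += d_a - d * n_a / n  # deaths in group a vs expectation
        if n > 1:
            variance += d * (n_a / n) * (1 - n_a / n) * (n - d) / (n - 1)
    chi2 = observed_minus_expected ** 2 / variance
    p = math.erfc(math.sqrt(chi2 / 2))  # upper tail of a chi-square with 1 df
    return chi2, p


# Toy survival times in months (1 = died, 0 = censored), not patient data.
chi2, p = logrank_test([2, 3, 4, 5, 6], [1, 1, 1, 1, 1],
                       [20, 25, 30, 35, 40], [1, 1, 1, 0, 0])
print(f"chi-square = {chi2:.2f}, P = {p:.4f}")
```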
Table 1. Clinical and Pathological Features of HCC Patients
Abbreviations: HBV, hepatitis B virus; HCV, hepatitis C virus; NA, not applicable.
Mean and SE of survival time were estimated as described in supplementary notes. See details in Supplementary Table 3.
It has been widely accepted that serum AFP levels are significantly related to the survival of HCC patients, with higher levels of serum AFP indicating poorer survival.5, 6, 10 Among the clinical indicators examined in our patient cohort, serum AFP level (>300 ng/mL vs. ≤300 ng/mL) showed only a marginal association with survival (P = .13) (Fig. 1D). We therefore asked whether our molecular classification of HCC could enhance the prognostic value of this clinical indicator, which has previously been used to predict survival. While the 2 subclasses of AFP+ patients showed only a marginal difference in overall survival, AFP− patients in cluster A had severely diminished overall survival (Supplementary Fig. 3). Moreover, when patients were subdivided into 4 groups based on serum AFP level and gene expression cluster, AFP− patients in cluster A showed the worst overall survival of all patients (Fig. 1E).
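Survival curves for subgroups such as the four AFP-by-cluster groups can be produced with the Kaplan-Meier product-limit estimator. The sketch below is a minimal Python version. The record layout (AFP status, cluster, months, death indicator) and all names are assumptions for illustration, and the example records are toy values.

```python
from collections import defaultdict


def kaplan_meier(times, events):
    """Product-limit survival estimate at each distinct death time.

    events[i] is 1 for death and 0 for censoring; returns [(time, S(time)), ...].
    """
    survival = 1.0
    curve = []
    for t in sorted({t for t, e in zip(times, events) if e == 1}):
        at_risk = sum(1 for tt in times if tt >= t)
        deaths = sum(1 for tt, e in zip(times, events) if tt == t and e == 1)
        survival *= 1 - deaths / at_risk
        curve.append((t, survival))
    return curve


def km_by_group(records):
    """records: (afp_positive, cluster, months, died) tuples -> curve per group."""
    groups = defaultdict(lambda: ([], []))
    for afp_positive, cluster, months, died in records:
        times, events = groups[(afp_positive, cluster)]
        times.append(months)
        events.append(died)
    return {key: kaplan_meier(t, e) for key, (t, e) in groups.items()}


records = [(False, "A", 3, 1), (False, "A", 8, 1), (True, "B", 40, 0), (True, "B", 25, 1)]
for group, curve in sorted(km_by_group(records).items()):
    print(group, curve)
```

A censored patient stays in the risk set up to the time of censoring but never triggers a drop in the curve. This is why the estimator handles patients still alive at last follow-up without discarding them.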
Prediction of Survival With Gene Expression Profiles.
We applied 5 different statistical methods to determine whether gene expression patterns could be used to predict survival: linear discriminant analysis, support vector machines, nearest centroid, nearest neighbor, and compound covariate predictor. Before the analysis, 2 tumor samples, HCC89-2 and HCC16, were excluded from the data set because patient HCC16 died of septic shock after surgery and samples from 2 separate tumors were obtained from patient HCC89. In the absence of a fully independent data set, we assessed the validity and reproducibility of our results by randomly dividing the HCCs into 2 groups of nearly equal size: a training set (n = 45), which was used to develop the HCC classifiers, and a validation set (n = 44), which was used to evaluate them. Briefly, we first identified the genes most differentially expressed between the 2 clusters in the training set. These genes were combined to form a series of classifiers that estimate the probability that a particular HCC belongs to cluster A or B. The number of genes in the classifiers was optimized to minimize misclassification errors during leave-one-out cross-validation of the tumors in the training set. When applied to the validation set, all 5 models successfully separated patients with poorer survival (cluster A) from those with longer survival (cluster B). All Kaplan-Meier survival curves and log-rank tests in the validation set showed significant differences between subclasses A and B as independently predicted by the 5 classifier models (Fig. 2A–G). Moreover, when we examined the predicted subclass memberships of the tumors, only a few discrepancies were observed (Fig. 2H). These results demonstrated not only a strong association of gene expression patterns with patient survival but also robust reproducibility of these gene expression–based predictors.
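Of the 5 methods, the compound covariate predictor is simple enough to sketch directly. The minimal Python version below follows the usual formulation: a weighted sum of log ratios, with per-gene two-sample t statistics as weights and a threshold midway between the two class means. It is wrapped in leave-one-out cross-validation that reselects genes inside every fold, so the held-out tumor never influences gene choice. Several details are assumptions: ranking genes by |t|, the fixed gene count per call, the function names, and the toy data. The text states only that the gene count was tuned to minimize leave-one-out errors in the training set.

```python
import math


def t_stat(values_a, values_b):
    """Two-sample t statistic with pooled variance (positive if higher in A)."""
    na, nb = len(values_a), len(values_b)
    ma, mb = sum(values_a) / na, sum(values_b) / nb
    ss = sum((v - ma) ** 2 for v in values_a) + sum((v - mb) ** 2 for v in values_b)
    pooled_var = ss / (na + nb - 2)
    return (ma - mb) / math.sqrt(pooled_var * (1 / na + 1 / nb))


def train_ccp(samples, labels, n_genes):
    """Fit a compound covariate predictor and return a classify(sample) function."""
    n_features = len(samples[0])
    a = [s for s, lab in zip(samples, labels) if lab == "A"]
    b = [s for s, lab in zip(samples, labels) if lab == "B"]
    t = [t_stat([s[g] for s in a], [s[g] for s in b]) for g in range(n_features)]
    genes = sorted(range(n_features), key=lambda g: -abs(t[g]))[:n_genes]

    def score(sample):
        # compound covariate: t-weighted sum of the selected genes' log ratios
        return sum(t[g] * sample[g] for g in genes)

    # threshold midway between the mean scores of the two training classes
    threshold = (sum(map(score, a)) / len(a) + sum(map(score, b)) / len(b)) / 2

    def classify(sample):
        return "A" if score(sample) > threshold else "B"

    return classify


def loocv_errors(samples, labels, n_genes):
    """Leave-one-out misclassification count; genes are reselected in every fold."""
    errors = 0
    for i in range(len(samples)):
        classify = train_ccp(samples[:i] + samples[i + 1:], labels[:i] + labels[i + 1:], n_genes)
        errors += classify(samples[i]) != labels[i]
    return errors


# Toy log ratios: genes 0 and 1 separate the classes, gene 2 is noise.
samples = [[2.0, 1.8, 0.1], [2.2, 2.1, -0.2], [1.9, 2.3, 0.3],
           [-2.0, -1.7, 0.2], [-2.1, -2.2, -0.1], [-1.8, -2.0, 0.0]]
labels = ["A", "A", "A", "B", "B", "B"]
best = min(range(1, 4), key=lambda k: loocv_errors(samples, labels, k))
print(best, loocv_errors(samples, labels, best))
```

The final `min` over gene counts mirrors the tuning step described in the text. A classifier chosen this way would then be frozen and applied once to the validation set.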
Because the most striking feature of the unsupervised analysis of the expression profiles was its strong association with survival, we next applied supervised analysis to identify the genes whose expression is most strongly associated with length of survival. A univariate Cox proportional hazards model was used to assess the association of each gene's expression with survival. Expression of 442 features representing 406 unique genes (Supplementary Table 1) was significantly associated with length of survival (P < .001). Hierarchical cluster analysis of the HCCs with these 406 survival genes gave results highly similar to those of the previous analysis (Fig. 3A). With few exceptions, the cluster membership of each tumor was the same in the 2 hierarchical cluster dendrograms, highlighting again the robustness of the predicted HCC subclasses and their strong association with length of survival. The survival genes were almost equally divided into two groups: those whose expression is higher in subclass A tumors (HA genes) and those whose expression is higher in subclass B tumors (HB genes). When we categorized the survival genes according to Gene Ontology, the biggest difference between HA genes and HB genes was in genes associated with cell proliferation (Supplementary Table 2). Of the HA survival genes, 45% belonged to the cell growth and maintenance category, whereas only 19% of HB survival genes were in the same category, strongly suggesting that HCCs in subclass A grow faster than those in subclass B.
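A univariate Cox screen fits one proportional hazards model per gene feature. The minimal Python sketch below maximizes the Cox partial likelihood for a single covariate by Newton-Raphson with step halving. It handles tied death times by the Breslow approach and returns the coefficient with a Wald P value. It is written from scratch for illustration, and its names and toy data are hypothetical. In practice one would call an established survival package, and a screen like the one described would keep the features whose P value falls below .001.

```python
import math


def partial_likelihood(beta, times, events, x):
    """Log partial likelihood, score and information for one covariate (Breslow ties)."""
    loglik = score = info = 0.0
    for ti, ei, xi in zip(times, events, x):
        if not ei:
            continue  # censored patients contribute only through risk sets
        risk = [xj for tj, xj in zip(times, x) if tj >= ti]
        w = [math.exp(beta * xj) for xj in risk]
        total = sum(w)
        mean = sum(wj * xj for wj, xj in zip(w, risk)) / total
        mean_sq = sum(wj * xj * xj for wj, xj in zip(w, risk)) / total
        loglik += beta * xi - math.log(total)
        score += xi - mean
        info += mean_sq - mean * mean
    return loglik, score, info


def cox_univariate(times, events, x, max_iter=50, tol=1e-9):
    """Fit a one-covariate Cox model; return (beta, two-sided Wald P value)."""
    beta = 0.0
    loglik, score, info = partial_likelihood(beta, times, events, x)
    for _ in range(max_iter):
        step = score / info  # Newton-Raphson step
        new = partial_likelihood(beta + step, times, events, x)
        while new[0] < loglik and abs(step) > tol:
            step /= 2  # halve the step until the log likelihood does not decrease
            new = partial_likelihood(beta + step, times, events, x)
        beta += step
        loglik, score, info = new
        if abs(step) < tol:
            break
    z = beta * math.sqrt(info)  # beta / SE, with SE = 1 / sqrt(information)
    return beta, math.erfc(abs(z) / math.sqrt(2))


# Toy data: higher expression tends to go with earlier death (all deaths observed).
times = [1, 2, 3, 4, 5, 6, 7, 8]
events = [1, 1, 1, 1, 1, 1, 1, 1]
expression = [2.0, 1.5, 1.8, 0.5, 1.0, 0.2, 0.8, -0.5]
beta, p = cox_univariate(times, events, expression)
print(f"beta = {beta:.2f}, hazard ratio = {math.exp(beta):.2f}, P = {p:.3f}")
```

A positive coefficient (hazard ratio above 1) marks a gene whose higher expression goes with shorter survival. This corresponds to the HA genes in the text, while the HB genes would show negative coefficients.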
We next generated an averaged gene expression index from the HB genes to examine their predictive power. Patients were ranked according to this index from the highest to the lowest (Fig. 3B) and divided at the median into two equal groups. Kaplan-Meier plots and log-rank tests of overall patient survival in the 2 groups revealed striking differences with strong statistical significance (P < 1.0 × 10⁻⁴) (data not shown). Likewise, the averaged gene expression index from HA genes produced similar results (Fig. 3C) with comparable statistical significance (P < 1.0 × 10⁻⁵). Perhaps not surprisingly, most patients in clusters A and B fell into opposite halves in both median splits (Fig. 3B and 3C). Taken together with the 2 independent clustering analyses and the cross-validation test with training and validation data sets, these results further support the notion that a distinct gene expression pattern predicts the survival characteristics of the 2 subclasses of HCC patients.
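The averaged expression index and the split into two equal halves can be sketched as follows. Several details are assumptions for illustration: averaging the genes' log ratios with equal weight, the function names, and the toy values. With an odd number of patients, this version places the extra patient in the lower half.

```python
def expression_index(sample, gene_idx):
    """Unweighted mean log ratio of the selected genes in one tumor."""
    return sum(sample[g] for g in gene_idx) / len(gene_idx)


def median_split(samples, gene_idx):
    """Rank tumors by the index (highest first) and split them into two halves.

    Returns (high_half, low_half) as lists of sample indices.
    """
    scores = [expression_index(s, gene_idx) for s in samples]
    order = sorted(range(len(samples)), key=lambda i: scores[i], reverse=True)
    half = len(order) // 2
    return order[:half], order[half:]


# Toy log ratios for four tumors; genes 0 and 1 form the index, gene 2 is ignored.
samples = [[1, 2, 9], [5, 6, 0], [0, 1, 3], [7, 8, 1]]
high, low = median_split(samples, [0, 1])
print(high, low)  # [3, 1] [0, 2]
```

Each half can then be compared with a Kaplan-Meier plot and a log-rank test, as described above.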
Next, we employed a knowledge-based annotation of the survival genes based on a public database search, because the Gene Ontology Consortium term annotation of genes was not sufficient to provide insight into the underlying biological differences between the 2 subclasses of HCC. The survival genes fell within several biological groups (Table 2). The cell proliferation group was the best predictor of an unfavorable outcome of the disease, which is consistent with previous analyses in human lymphomas.23 Expression of typical cell proliferation markers such as PCNA and cell cycle regulators such as CDK4, CCNB1, CCNA2, and CKS2 was greater in subclass A than in subclass B. Not surprisingly, many genes that are expressed more highly in subclass A are antiapoptotic. Recent studies have identified PTMA/ProT as an inhibitor of apoptosome formation, the essential step for the final activation of the caspase-dependent cascade in the apoptotic pathway,24 and have identified SET as an inhibitor of the Granzyme A–induced caspase-independent pathway.25 SET is also a subunit of the inhibitor of acetyltransferases complex that regulates histone modification and gene expression.26 Significantly, PTMA has recently been shown also to be part of the inhibitor of acetyltransferases complex,27 suggesting that these proteins play multiple roles in hepatocarcinogenesis. Genes involved in prothrombin activation were expressed less in subclass A, indicating impaired liver function in this subclass. Many of the genes with lower expression in subclass A were liver-specific (data not shown), which is consistent with the previous observation that poorly differentiated HCC tumors have less favorable clinical outcomes.28 Higher expression of genes involved in ubiquitination and sumoylation suggests that accelerated cell proliferation in the poorer survival group might be due to selective degradation of critical proteins, including cell cycle inhibitors.
Concomitant overexpression of the histone H4 family with HRMT1L2 (an H4-specific methyltransferase) in the poorer survival group of HCC may indicate unidentified roles of the histone H4 family and their modification in tumor development. Expression of HIF1a, the master regulator of hypoxia-induced gene expression,29 was enhanced in subclass A, while expression of EGLN2, a negative regulator of HIF1a by prolyl hydroxylation,30 was reduced (Fig. 3D). Together, these changes would be expected to enhance HIF1a activity in tumor cells, which in turn provides a favorable environment for tumor growth.
Microtubule-associated protein, RP/EB family, member 1
TTK protein kinase
Budding uninhibited by benzimidazoles 3 homolog
Centromere protein F, 350/400 kDa
Kinetochore associated 1
Minichromosome maintenance deficient 2
Minichromosome maintenance deficient 6
Minichromosome maintenance deficient 7
Regulation of HIF1a
Hypoxia-inducible factor 1, alpha
egl nine homolog 2
Predicted biological features of each subgroup of HCC based on gene expression patterns were further validated using independent methods as described in the supplementary notes.
Three Distinctive Gene Expression Patterns in HCC and ST.
To gain additional insight into the biological differences between the 2 subclasses of HCC, we generated 2 different gene lists by applying significance analysis of microarrays.31 Gene list X represents the top 500 genes that were differentially expressed between ST and all HCC tissues. Gene list Y represents the top 500 genes that were differentially expressed between HCCs in clusters A and B (Fig. 4A and 4B). When gene expression patterns of all tissues were compared together, 3 different patterns were observed: X not Y (330 genes), X and Y (170 genes), and Y but not X (330 genes). Genes in the X not Y category differed uniformly between all HCCs and STs regardless of subclass, representing common alterations of gene expression in HCC. Enhanced expression of 26S proteasome subunits such as PSMC4, PSME3, PSMD4, PSMD2, and PSMB4 indicated enhanced activation of wide-ranging protein degradation in all HCCs. However, ubiquitination, a selective protein degradation pathway, was active only in subclass A HCC (see Table 2). Genes in the X and Y category display a subclass-specific gene expression pattern: although their expression was altered in all HCCs, a more pronounced alteration was observed in subclass A. Expression of the G1/S phase cell cycle regulator CDK4 was highest in subclass A, moderately enhanced in subclass B, and lowest in ST, which agrees well with the more proliferative features of subclass A. Enhanced expression of H2FAX, a histone H2A variant involved in the response to chromosomal double-strand breaks,32 in subclass A might reflect more chromosomal damage and/or instability in subclass A than in B. The further reduction in expression of liver-specific genes, including the p450 family, in the poorer survival group (subclass A) supports the view that reduced liver function is a poor prognostic indicator for HCC patients.
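The construction of gene lists X and Y and their three-way comparison can be sketched in Python. The score below is the SAM relative difference, d = (mean1 - mean2) / (s + s0). Real SAM also tunes the fudge factor s0 and estimates false discovery rates by permutation. This sketch omits both steps by fixing s0 = 0.1, which is an assumption. Function names and the toy matrix are hypothetical.

```python
import math


def sam_d(group1, group2, s0=0.1):
    """SAM-style relative difference for one gene, with a fixed fudge factor s0."""
    n1, n2 = len(group1), len(group2)
    m1, m2 = sum(group1) / n1, sum(group2) / n2
    ss = sum((v - m1) ** 2 for v in group1) + sum((v - m2) ** 2 for v in group2)
    s = math.sqrt((1 / n1 + 1 / n2) / (n1 + n2 - 2) * ss)
    return (m1 - m2) / (s + s0)


def top_genes(expr, cols1, cols2, n_top):
    """Return the n_top genes with the largest |d| between two sets of columns."""
    d = {g: sam_d([v[c] for c in cols1], [v[c] for c in cols2]) for g, v in expr.items()}
    return set(sorted(d, key=lambda g: -abs(d[g]))[:n_top])


def partition(list_x, list_y):
    """Split genes into the three expression-pattern categories used in the text."""
    return {"X not Y": list_x - list_y, "X and Y": list_x & list_y, "Y not X": list_y - list_x}


# Toy data; columns 0-1 are ST, 2-3 subclass A, 4-5 subclass B.
expr = {
    "g1": [0.0, 0.1, 2.0, 2.1, 2.0, 2.1],  # altered in all HCC
    "g2": [0.0, 0.1, 4.0, 4.1, 2.0, 2.1],  # altered in all HCC, most in A
    "g3": [0.0, 0.1, 3.0, 3.1, 0.0, 0.1],  # altered in A only
    "g4": [1.0, 1.1, 1.0, 1.1, 1.0, 1.1],  # unchanged
}
ST, A, B = [0, 1], [2, 3], [4, 5]
list_x = top_genes(expr, ST, A + B, 2)  # ST vs all HCC
list_y = top_genes(expr, A, B, 2)       # subclass A vs subclass B
print(partition(list_x, list_y))
```

The s0 term keeps genes with tiny variance from dominating the ranking merely because their standard deviation is close to zero.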
Prognostic modeling of patients with HCC at diagnosis that considers tumor stage, functional impairment of the liver, and the general condition of the patient can provide valuable information and guide the choice of therapy.4, 10 However, increased surveillance and advances in imaging technology have enabled earlier diagnosis of HCC. This development presents a challenge with respect to prognostic modeling of HCC, because the natural history of early HCC is unknown.12 In addition, intermediate and advanced HCC are quite heterogeneous,33 even though the natural history and prognostic factors are well defined.12 Therefore, it is necessary to establish robust methods capable of evaluating the prognosis of patients diagnosed at the early, intermediate, and late stages of HCC. As a first step in the development of a molecular prognostic evaluation, we have used gene expression profiling technology and unsupervised and supervised learning methods to successfully predict survival of HCC patients.
We applied three independent but complementary approaches for data analysis to uncover subclasses of HCC and the underlying biological differences between the subclasses. First, unsupervised classification methods based solely on gene expression patterns were applied. Hierarchical clustering of the data as well as multidimensional scaling revealed two subclasses of HCC strongly associated with the length of patients' survival. The differences in gene expression are quite robust as illustrated by the fact that the poorer survival group (subclass A) was successfully separated from the better survival group (subclass B) in the training data set as well as in the validation data set when 5 different statistical methods for prediction were applied (see Fig. 2). The presence of two extreme subgroups in AFP− patients was unexpected and probably accounts for the insufficient predictive power when AFP was used as a sole prognostic indicator.4, 10
Second, a univariate regression model was used to identify individual genes whose expression is highly correlated with length of survival. Subclass prediction with the survival genes was highly accurate, as illustrated by the fact that averaged gene expression indices were sufficient to segregate the 2 subclasses even without sophisticated prediction models. In addition, knowledge-based annotation of the 406 survival genes provided insight into the underlying biological differences between the 2 subclasses of HCC. Although quantitative measurements of cell proliferation and apoptotic rates in both subclasses strongly support the long-established notion that an imbalance between cell proliferation and cell death is the primary hallmark of tumors34 and provided the best quantitative separation of the 2 subclasses, several other processes were also highlighted.
The ubiquitin system is often deregulated in cancers.35 In HCC, the degree of ubiquitination is highly correlated with cell proliferation and patient survival and has also been proposed as a possible predictive marker for recurrence of human HCC.36 In addition, PSMD10/Gankyrin, a subunit of the 26S proteasome that accelerates the degradation of the retinoblastoma protein, is overexpressed in HCC.37 Likewise, enhanced activation of ubiquitin-dependent protein degradation may account for deregulation of cell cycle control and faster cell proliferation in the poor survival group (subclass A). Therefore, deregulated components of ubiquitin-mediated protein degradation may provide attractive therapeutic targets for novel HCC treatment modalities.
The third approach involved the analysis of overall similarity and dissimilarity of gene expression between all the HCCs, subclasses A and B, and STs (see Fig. 4). Although the 2 subclasses of HCC may be viewed as distinctive biological entities, they still share significant overall similarity of gene expression when compared with ST. This may indicate that subclass A HCC accumulated additional oncogenic alterations of gene expression on top of a common HCC gene expression signature, thereby providing a more favorable environment for tumor cell growth. However, we cannot rule out the possibility that different mechanisms may contribute to the development of subclass A and B following exposure to different etiological factors, and the gene expression signature may reflect that etiological “footprint.” This scenario is unlikely, however, because the great majority of our HCC cases were associated with hepatitis B virus. Alternatively, the cell of origin of a tumor can be important in determining the clinical outcome, as shown for diffuse large B cell lymphoma.14 It is therefore possible that the 2 subclasses of HCC might represent different cellular origins (i.e., hepatic stem cells vs. hepatocytes) of the tumors.
Comparative analysis of our data and earlier studies demonstrated good concordance despite differences in patient populations and technology platforms (see supplementary notes). This concordance strongly supports the generality of our finding that the subclasses of HCC might represent distinct disease entities. Also, the observation that genes associated with early recurrence and intrahepatic metastasis of HCC19, 20 did not discriminate between subclasses A and B suggests that the information (at least from a gene expression standpoint) embedded in these important processes is not sufficient to predict survival. It is therefore likely that the additional information provided by the survival genes (only 2 of the genes associated with intrahepatic metastasis were among these) is needed for effectively predicting survival. This is of considerable importance, because a recent study of survival in HCC patients demonstrated that HCC was the prime cause of death in patients with compensated cirrhosis.38 However, considerable molecular heterogeneity still exists within each HCC subclass, as evidenced by quantitative differences in survival gene expression (see Fig. 3B and 3C) and the small fraction of patients who are frequently misclassified in the prediction models. It is therefore probable that more subclasses of HCC will emerge when gene expression data from more HCC patients become available.
The severity of HCC and the lack of good diagnostic markers and treatment strategies have rendered the disease a major challenge. Systematic analysis of gene expression patterns provides an insight into the biology and pathogenesis of HCC. Our results indicate that HCC prognosis can be readily predicted from the gene expression profiles of the primary tumors. Because the microarray-based measurement of gene expression reflects the abundance of expressed messenger RNA and proteins in the HCC as confirmed by quantitative reverse-transcriptase polymerase chain reaction and immunohistochemical staining (Supplementary Figs. 6 and 7), a limited set of quantitative reverse-transcriptase polymerase chain reaction and/or immunohistochemical staining assays may be sufficient to predict the prognosis of patients at the time of diagnosis; however, a prospective study is needed to confirm this proposal. Nevertheless, the unique molecular characteristics of each subclass of HCC uncovered by a genome-wide survey of gene expression provide insight into the tumor biology of HCC and offer the opportunity for new therapeutic strategies. SET and PTMA are of particular interest for potential therapeutic targets because of their multitasking features. Even if a curative therapy for HCC patients cannot be offered at this stage, it may be possible to identify therapeutic targets that can slow the course of disease progression. For example, small molecules that inhibit PTMA and HIF1a activities are already available24, 39 and may provide opportunities to alter the course of HCC progression in both subclass A and B.