Abstract

Progression of hepatocellular carcinoma (HCC) is a stepwise process that proceeds from pre-neoplastic lesions—including low-grade dysplastic nodules (LGDNs) and high-grade dysplastic nodules (HGDNs)—to advanced HCC. The molecular changes associated with this progression are unclear, however, and the morphological cues thought to distinguish pre-neoplastic lesions from well-differentiated HCC are not universally accepted. To understand the multistep process of hepato-carcinogenesis at the molecular level, we used oligo-nucleotide microarrays to investigate the transcription profiles of 50 hepatocellular nodular lesions ranging from LGDNs to primary HCC (Edmondson grades 1-3). We demonstrated that gene expression profiles can discriminate not only between dysplastic nodules and overt carcinoma but also between different histological grades of HCC via unsupervised hierarchical clustering with 10,376 genes. We identified 3,084 grade-associated genes, correlated with tumor progression, using one-way ANOVA and a one-versus-all unpooled t test. Functional assignment of these genes revealed discrete expression clusters representing grade-dependent biological properties of HCC. Using both diagonal linear discriminant analysis and support vector machines, we identified 240 genes that could accurately classify tumors according to histological grade, especially when attempting to discriminate LGDNs, HGDNs, and grade 1 HCC. In conclusion, a clear molecular demarcation between dysplastic nodules and overt HCC exists. The progression from grade 1 through grade 3 HCC is associated with changes in gene expression consistent with plausible functional consequences. Supplementary material for this article can be found on the HEPATOLOGY website (http://www.interscience.wiley.com/jpages/0270-9139/suppmat/index.html). (HEPATOLOGY 2005;42:809–818.)

Hepatocellular carcinoma (HCC) is one of the most common malignancies worldwide. Chronic hepatitis resulting from infection with hepatitis B virus or hepatitis C virus and exposure to carcinogens such as aflatoxin B1 are known major risk factors for HCC.1 Molecular investigations have recently found that genetic alterations of tumor suppressor genes or oncogenes such as p53, β-catenin, and AXIN1 might be involved in the progression to HCC,2–4 but the frequency of these somatic mutations appears to be low in HCCs. Furthermore, it is unclear how these genetic changes reflect the clinical characteristics of individual tumors. Therefore, the predominant molecular events underlying HCC in most patients remain unknown.

Because HCC typically develops in close association with pre-existing cirrhosis, it is widely believed that a liver with cirrhosis may contain pre-neoplastic nodules that are in an intermediate stage between nonneoplastic regenerating nodules and overtly malignant HCC.5, 6 These nodular lesions have been designated as “dysplastic nodules” by the International Working Party and are further divided into low-grade dysplastic nodules (LGDNs) and high-grade dysplastic nodules (HGDNs) depending on the degree of cytological or architectural atypia on histological examination.7 That these nodules frequently contain one or more microscopic foci of HCC suggests that dysplastic nodules, especially HGDNs, might be precancerous lesions of HCC.8, 9 Some investigators have adopted the concept of early HCC (eHCC), sometimes referred to as “carcinoma in situ” or “microinvasive carcinoma” of the liver, which is characterized by a small tumor mass lacking invasive growth properties such as vascular invasion or intrahepatic metastasis.10–12 However, there remains considerable controversy as to whether eHCC should be regarded as frank cancer or as a form of HGDN.13 As such, the distinction between precancerous and cancerous lesions remains debatable, and the developmental process from pre-neoplastic lesion to overt HCC is still unclear.

HCC can be classified into four different histological grades, known as Edmondson grades 1 through 4, which generally correspond to well-differentiated, moderately differentiated, poorly differentiated, and undifferentiated types of HCC, respectively.14, 15 Most cancer nodules less than 1 cm in diameter consist of well-differentiated cancerous tissues and are completely replaced by less well-differentiated cancerous tissues when the tumor size reaches a diameter of approximately 3 cm.16 As such, tumor de-differentiation and increasing tumor size are thought to reflect a continuum of morphological change in a multistep hepato-carcinogenesis process, but the molecular underpinnings of this are largely unknown.

Recently, DNA microarray technology has enabled the genome-wide analysis of gene transcript levels, and as such has yielded great insight into the molecular nature of cancer. Although several reports have described tumor-associated molecular expression profiles of liver cancers,17–20 little insight into the molecular nature of early or multistep hepato-carcinogenesis has been gained. To better understand this multistep process at the molecular level, we analyzed global transcript levels in the context of three different histological grades of HCC as well as precancerous LGDNs and HGDNs using a high-density spotted oligo-nucleotide microarray.

In the present study, we show that an extensive and remarkably reproducible expression signature comprising several thousand genes underlies the process of HCC progression. For the majority of genes, messenger RNA levels were either positively or negatively correlated with tumor progression. Functional analysis of these genes revealed discrete expression clusters representing grade-dependent biological properties of HCC, including cell proliferation, protein synthesis, and hepatocyte-specific functions. We also observed altered expression of known tumor suppressor genes and oncogenes that potentially contribute to this process. Additionally, we identified a subset of progression-associated genes that could accurately classify tumors according to grade and readily distinguish dysplastic nodules from low-grade cancer.

Patients and Methods

Patients and Tissue Preparation.

Primary HCCs, including Edmondson grade 1 (G1), grade 2 (G2), grade 3 (G3), and premalignant lesions of HCC (LGDNs and HGDNs), were obtained from 42 patients who underwent surgical treatment for HCC at Samsung Medical Center, Sungkyunkwan University School of Medicine, Seoul, Korea. Immediately after hepatectomy, freshly removed livers were serially sliced from the top edge to the bottom edge at 7- to 8-mm intervals and examined by a pathologist for the presence of nodular lesions. Any bulging nodules 10 mm or more in diameter or lesions macroscopically different in color from the surrounding liver, regardless of size, were snap-frozen in liquid nitrogen and stored at −80°C until use. Subsequent sections from the same nodule were fixed in 10% neutral formalin for confirmation of morphological diagnosis. The hematoxylin-eosin–stained sections were examined independently by two pathologists and classified as HCC with different histological grading according to the Edmondson and Steiner method or dysplastic nodules of low or high grade according to the guidelines of the International Working Party. In this way, we obtained a total of 30 HCCs (10 G1, 10 G2, and 10 G3), 10 LGDNs, and 10 HGDNs from 42 patients. To reduce experimental bias, we selected all specimens that had a background associated with cirrhosis and were hepatitis B virus (HBV) seropositive (Supplementary Table 1 ). Approval was obtained from the institutional review boards of the Catholic University of Korea College of Medicine and the Sungkyunkwan University School of Medicine. Informed consent was provided according to the Declaration of Helsinki.

DNA Microarrays.

The Compugen/Sigma Human Oligolibrary (60-mers) representing 18,664 LEADS clusters (Compugen/Sigma-Genosys, Woodland, TX) was spotted onto poly-L-lysine-coated glass microscope slides using an OmniGrid robotic arrayer (GeneMachines, San Carlos, CA). All microarrays were manufactured at the Microarray and Expression Genomics Laboratory of the Genome Institute of Singapore essentially according to Eisen and Brown.21

RNA Preparation and Microarray Hybridization.

Total RNA was extracted from frozen tissues using TRIzol reagent following the manufacturer's protocol (Life Technology, Rockville, MD). Human universal reference RNA (Stratagene, La Jolla, CA) was used as the reference RNA. Total RNA (20 μg) was used for DNA target synthesis as described.22 The reference RNA was labeled with Cyanine-3, and the test sample was labeled with Cyanine-5.

Data Processing and Analysis.

All GenePix files were uploaded into the Genome Institute of Singapore microarray database, and log expression ratios were normalized using the global median method. Microarray features with signal (foreground) intensities less than 50% above median local background intensity in both channels, and features automatically or manually flagged as “not found,” were treated as missing values. Genes with expression values in 70% or more of the tumors within each of the five grades were retained for further analyses. Of 18,708 probes, 10,376 passed this filter, and this probe set was used as the basis for all subsequent analyses. Hierarchical clustering of log ratios was performed using the Cluster and TreeView software23; Pearson correlation, mean centering, and average linkage were applied in all clustering applications. One-way ANOVA (F test) and one-versus-all (OVA) t tests were performed in the R statistical package (http://www.r-project.org/). Support vector machines (SVM) and diagonal linear discriminant analysis (DLDA) were used to assess the classification accuracy of gene classifiers with grade prediction potential.24–26
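
The normalization and filtering steps described above can be sketched in Python as follows. This is a minimal illustration under stated assumptions, not the pipeline used in the study: the function names `median_normalize` and `passes_presence_filter` are hypothetical, missing values are represented as `None`, and global median normalization is taken to mean subtracting each array's median log ratio.

```python
from statistics import median


def median_normalize(log_ratios):
    """Subtract the array-wide median from every measured log ratio.

    Assumption: "global median" normalization centres each array on its
    own median; missing values (None) are left untouched.
    """
    present = [v for v in log_ratios if v is not None]
    centre = median(present)
    return [None if v is None else v - centre for v in log_ratios]


def passes_presence_filter(values, grades, min_fraction=0.70):
    """Return True if a probe is measured in at least min_fraction of
    the samples within every grade.

    values: one log ratio (or None) per sample for a single probe.
    grades: the grade label of each sample, in the same order.
    """
    for grade in set(grades):
        group = [v for v, g in zip(values, grades) if g == grade]
        measured = sum(v is not None for v in group)
        if measured < min_fraction * len(group):
            return False
    return True
```

With four samples per grade, for example, a probe measured in three of four LGDNs (75%) is retained, whereas one measured in only two of four (50%) is dropped.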

Class Prediction: Stratified Three-fold Cross-validation.

Classification accuracies were assessed using a stratified three-fold cross-validation scheme. Here the arrays were randomly partitioned into three folds. Any random grouping that resulted in less than two members of any grade in any fold was discarded and resampled. Two folds were used to train the classifier, which was then tested on the remaining fold. At every training/test set selection, p genes were selected. The p genes used in the classifier comprised the p/2 genes most significantly upregulated and the p/2 genes most significantly downregulated for each grade according to the OVA Welch t test results. The three-fold cross-validation process was repeated 100 times, and the mean accuracies were reported.
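
The partitioning rule described above (random three-fold split, resampled whenever a fold holds fewer than two members of any grade) can be sketched as follows. This is an illustrative sketch only: the function names are hypothetical, the round-robin split after shuffling is one assumed way of forming near-equal folds, and `max_tries` is a safeguard added here that the text does not mention.

```python
import random
from collections import Counter


def fold_is_valid(fold, labels, grades, min_per_grade):
    """A fold is valid if it holds at least min_per_grade samples of every grade."""
    counts = Counter(labels[i] for i in fold)
    return all(counts[g] >= min_per_grade for g in grades)


def stratified_three_fold(labels, rng, n_folds=3, min_per_grade=2, max_tries=10000):
    """Randomly partition sample indices into folds, discarding and
    resampling any partition in which a fold is short of some grade."""
    indices = list(range(len(labels)))
    grades = set(labels)
    for _ in range(max_tries):
        rng.shuffle(indices)
        # Deal the shuffled indices round-robin into n_folds folds.
        folds = [indices[k::n_folds] for k in range(n_folds)]
        if all(fold_is_valid(f, labels, grades, min_per_grade) for f in folds):
            return [sorted(f) for f in folds]
    raise RuntimeError("no valid partition found")


def train_test_splits(folds):
    """Yield (train, test) index lists: two folds train, the third tests."""
    for k, test in enumerate(folds):
        train = [i for j, fold in enumerate(folds) if j != k for i in fold]
        yield sorted(train), test
```

Gene selection would be repeated inside each training set before the held-out fold is classified, and the whole procedure repeated (100 times in the study) with a fresh random partition each time.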

Results

Large-Scale Gene Expression Alterations Coincide With Different Histological Grade of HCC Progression.

To determine whether global alterations in gene expression could discern histological grade differences ranging from pre-neoplastic lesion to advanced HCC, we examined the expression profiles of a series of 50 hepatocellular nodular lesions from 42 patients (see Patients and Methods). These specimens were hybridized onto spotted oligo-nucleotide microarrays, each containing 18,861 probes representing approximately 18,000 unique genes. Of the 50 specimens, 9 were not included in the final dataset as a result of suboptimal average signal intensities (owing to poor RNA quality) or unusually high background fluorescence. Therefore, 41 samples made up the final dataset, which included 7 LGDNs, 7 HGDNs, and 9 G1, 10 G2, and 8 G3 HCCs. The relevant patient/tissue clinico-pathological variables are provided as supporting information (Supplementary Table 1).

First, we performed unsupervised hierarchical cluster analysis on the expression profiles of the 41 hepatocellular nodular lesions with the 10,376 genes that passed the basic filtering criteria described in Patients and Methods. This resulted in two predominant tissue clusters: one cluster (CI) that contained all the dysplastic nodules (LGDNs and HGDNs), and a second cluster (CII) that contained all of the G2 and G3 HCCs and a majority of the G1 tumors (6/9) (Fig. 1). Within the CII cluster, all of the G3 HCCs were found together in a single G3-exclusive subcluster, which was flanked by the G2 HCCs that together with the G3s comprised a larger G2-G3 subcluster. The ostensible separation of dysplastic hepatocytes (CI) from HCC (CII), along with the occurrence of grade-specific subclusters in CII, demonstrates that reproducible large-scale changes in gene expression distinguish pre-neoplastic lesions and overt HCC as well as different histological grades of HCC. Of the 9 G1 HCCs, 3 cases (G1-05, -06, and -09) were found to have expression profiles more similar to the pre-neoplastic nodules, while the rest clustered with the overt HCCs. This observation suggests that G1 HCCs are molecularly heterogeneous, sitting on the border between premalignant lesion and overt malignant carcinoma, and that this heterogeneity is distinguishable at the molecular level. Additionally, we compared a small number of nontumorigenic surrounding tissues (i.e., “normal” tissue) to the HCCs and dysplastic nodules. The expression profiles of the nontumorigenic surrounding tissues consistently clustered apart from the dysplastic nodules, suggesting that dysplasia itself is marked by transcriptional alterations distinct from “normal” liver tissue (Supplementary Fig. 1).

Figure 1. Unsupervised hierarchical clustering of pre-neoplastic lesions and primary HCCs can accurately partition tissues according to malignancy status and high tumor grade. (A) Two-dimensional clustergram of the 10,376 genes selected with minimal filtering criteria (see Patients and Methods). Each row represents a tumor profile; each column represents a probe's measurements. The color saturation reflects the difference in expression between the tissue specimen and the common reference RNA. (B) Tissue dendrogram derived from clustering using the 10,376 gene set. Note that the two dominant clusters, cluster I (CI) and cluster II (CII), accurately partition the pre-neoplastic and malignant tissues, with the exception of G1 HCCs, and that the CII cluster is further subdivided into branches that are largely grade-specific. LGDN, low-grade dysplastic nodule; HGDN, high-grade dysplastic nodule; G1-3, Edmondson grades 1-3.

Identification and Pattern Analysis of HCC Grade-Associated Genes.

To study in detail the genes most often correlated with tumor progression (referred to henceforth as “grade-associated” genes), we identified all genes associated with grade at P < .001 by either one-way ANOVA (F test) or OVA unpooled t test.26 The F test assigns the greatest significance to genes with expression profiles that show continuous variation among classes (e.g., grades), while the OVA t test assigns greater significance to genes with expression profiles that clearly distinguish one class from the rest. Consequently, the F test is biased toward selecting genes with profiles that progressively change from one class to the next, while the OVA t test is biased toward selection of those that show class-specific expression spikes. Thus, gene selection based on this combination of statistical measures allows for greater discovery of differentially expressed genes as it takes advantage of the inherent differences between the F test and the t test. We obtained 2,423 and 3,118 probes significant at P < .001 by F test and at least one of the five OVA t tests, respectively. After removing redundant discoveries, we were left with 3,084 probes with nonredundant gene identities with an expected maximum occurrence of false discoveries of 63 genes [10,376 × (0.001 + 0.001 × 5)]. Hierarchical clustering of these genes in Fig. 2 shows that the predominant grade-associated expression profiles are those with either positive or negative correlations with grade, rather than genes with spiking expression at precise stages of progression.
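
The one-versus-all unpooled (Welch) t test and the false-discovery bound quoted above can be illustrated with a short Python sketch. It is not the R code used in the study: the function names are hypothetical, missing values are assumed to have been removed or imputed beforehand, and only the t statistic and Welch-Satterthwaite degrees of freedom are computed, because converting them to P values requires a t-distribution function that the Python standard library does not provide.

```python
from math import sqrt
from statistics import mean, variance


def welch_t(a, b):
    """Unpooled (Welch) t statistic and Welch-Satterthwaite degrees of
    freedom for two groups; each group needs at least two values."""
    va = variance(a) / len(a)
    vb = variance(b) / len(b)
    t = (mean(a) - mean(b)) / sqrt(va + vb)
    df = (va + vb) ** 2 / (va ** 2 / (len(a) - 1) + vb ** 2 / (len(b) - 1))
    return t, df


def one_versus_all(values, grades):
    """Compare each grade against all remaining samples taken together.

    Returns {grade: (t, df)} for one probe; a positive t means the probe
    is higher in that grade than in the rest.
    """
    result = {}
    for grade in sorted(set(grades)):
        inside = [v for v, g in zip(values, grades) if g == grade]
        outside = [v for v, g in zip(values, grades) if g != grade]
        result[grade] = welch_t(inside, outside)
    return result


def expected_false_discoveries(n_probes=10376, alpha=0.001, n_ova_tests=5):
    """Upper bound on false discoveries when every probe undergoes one
    F test and n_ova_tests OVA t tests, each at significance level alpha."""
    return n_probes * (alpha + alpha * n_ova_tests)
```

With the default arguments the bound evaluates to 10,376 × 0.006 = 62.256, which rounds up to the 63 genes quoted in the text.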

Figure 2. Clusters of grade-associated genes with biological implications. Hierarchical clustergram of 3,084 gene expression patterns with significant associations with grade (P < .001; F test and/or OVA t test) is shown (left). Three clusters of highly correlated genes are shown: cell cycle genes (cluster 1, top), genes involved in protein synthesis (cluster 2, middle), and genes involved in liver-specific functions (cluster 3, bottom). Unigene names are given (Unigene build #161). Note that the majority of genes show gradual but continuous change from low dysplastic expression to high malignant expression, or high dysplastic expression to low malignant expression. LGDN, low-grade dysplastic nodule; HGDN, high-grade dysplastic nodule; G1-3, Edmondson grades 1-3.

Biological Properties of Grade-Associated Genes.

Hierarchical cluster analysis of the 3,084 grade-associated genes revealed several clusters of particularly highly correlated genes with ostensible biological implications, suggesting the coordination of certain biological activities with HCC progression. Genes of the top cluster (cluster 1) shown in Fig. 2 are characterized by a gradual increase in transcript levels, with the highest levels found consistently in G3 tumors. Using Gene Ontology terms, we observed enrichment in this cluster for genes associated with cell cycle functions, including numerous genes involved in DNA replication, chromatin remodeling, and cell proliferation. Comparative analysis of this cluster with the human cell cycle gene list defined by Whitfield et al.27 revealed further involvement of genes having periodic expression during the cell cycle (Fig. 2, cluster 1). Cluster 2 (middle cluster) is characterized by expression patterns showing a gradual increase from pre-neoplastic lesion to G2 HCC, followed by a relatively sharp increase in transcript abundance in most G3 tumors. These genes, which had the highest overall “within-cluster” correlation, are comprised predominantly of genes directly involved in protein synthesis, including ribosomal proteins, translation initiation, and elongation factors and constituents of the spliceosome. Finally, the genes comprising cluster 3 (Fig. 2, bottom cluster) are characterized by a gradual but large-magnitude decline in transcript levels from LGDNs to high-grade HCC, and are made up mostly of genes that have central roles in primary liver function or are expressed exclusively in hepatocytes. These include genes involved in fatty acid and lipid metabolism, detoxification pathways, and synthesis of complement and coagulation factors, suggesting a gradual loss of normal hepatocyte function coincident with progressive cellular de-differentiation.

Further examination of the grade-associated genes revealed a number of genes that, through altered expression, may contribute directly to the increasing malignant behavior of advancing HCC. Figure 3 shows representative function-associated clusters. For example, the top cluster shows 24 such genes known or suspected to play roles in oncogenic transformation or tumor suppression. In addition, several growth factors, genes involved in apoptosis, and cell adhesion molecules that might have potential roles in HCC development and progression through altered expression were also extracted via categorical analysis using Gene Ontology as shown in Fig. 3.

Figure 3. Expression patterns of grade-associated genes with possible roles in HCC pathogenesis. Grade-correlated genes were classified according to Gene Ontology terms or through a search of the literature. Subsets of these genes are shown for the following categories: (1) oncogenes and tumor suppressors, (2) growth factors, (3) apoptosis, and (4) cell adhesion. Unigene cluster symbols and names are shown (Unigene build #161). LGDN, low-grade dysplastic nodule; HGDN, high-grade dysplastic nodule; G1-3, Edmondson grades 1-3.

Grade-Associated Genes Predict Stage of HCC Progression.

We next sought to determine whether we could identify a subset of genes that could accurately classify the specimens according to grade. We addressed two classification problems: (1) discriminating among LGDNs, HGDNs, or G1, and (2) discriminating among G1, G2, or G3, because these are the most relevant problems in HCC diagnosis. Treating these as one five-grade problem could limit the use of some genes that otherwise might perform well in the two smaller clinically relevant problems.

We compared three different classification methods (DLDA, SVM,24 and k-nearest neighbor28) for each of the two classification problems. Of note, approximately 4% of the values were missing and were therefore imputed according to the k-nearest neighbor imputation method.28 Classification accuracies were assessed using stratified three-fold cross-validation with 100 repetitions (see Patients and Methods).
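
For readers unfamiliar with DLDA, the following Python sketch shows the method as it is commonly defined: each class is summarized by its per-gene mean, a single within-class variance is pooled per gene, covariances between genes are ignored, and a sample is assigned to the class with the smallest variance-scaled squared distance. Equal class priors are assumed, the function names are hypothetical, and this is not the implementation used in the study.

```python
def fit_dlda(X, y):
    """Fit DLDA: per-class mean of each gene plus a pooled within-class
    variance per gene (the diagonal of the covariance matrix).

    X: list of samples, each a list of expression values (no missing data).
    y: class label of each sample.
    """
    classes = sorted(set(y))
    n_genes = len(X[0])
    means = {}
    for c in classes:
        rows = [x for x, label in zip(X, y) if label == c]
        means[c] = [sum(r[j] for r in rows) / len(rows) for j in range(n_genes)]
    dof = len(X) - len(classes)
    # Note: a gene with zero pooled variance would cause a division by
    # zero at prediction time and should be removed beforehand.
    pooled = [
        sum((x[j] - means[label][j]) ** 2 for x, label in zip(X, y)) / dof
        for j in range(n_genes)
    ]
    return means, pooled


def predict_dlda(model, x):
    """Assign x to the class whose mean is nearest in variance-scaled distance."""
    means, pooled = model

    def distance(c):
        return sum((xj - mj) ** 2 / vj for xj, mj, vj in zip(x, means[c], pooled))

    return min(means, key=distance)
```

Because only means and per-gene variances are estimated, DLDA has very few parameters per gene, which is one reason it is often tried when samples are far outnumbered by genes.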

The number of genes for each grade, p, was varied to find the optimal number of gene classifiers. Figure 4A shows the plot of classification accuracy as p was varied for DLDA and SVM classifiers. We found DLDA and two varieties of SVM (i.e., linear and RBF kernels) to be the most robust classifiers for both problems. (Note: because the k-nearest neighbor classifiers had inferior performance, only the SVM and DLDA accuracies are shown.) These results suggested that using 30 to 50 genes per grade was optimal. We therefore decided to use 40 genes (the 20 most significantly upregulated and the 20 most significantly downregulated in each grade), resulting in 120 total genes for each problem (Fig. 4B-C); that is, 120 “early-stage” genes for discriminating among early-stage samples and 120 “late-stage” genes for discriminating among late-stage samples (Supplementary Tables 2 and 3), for a total of 240 grade classifier genes. As shown in Fig. 4, we would expect an approximately 95% classification accuracy in discriminating among early-stage samples and an approximately 91% classification accuracy in discriminating among late-stage samples.
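
The per-grade gene selection can be sketched as follows. This is a simplified illustration with a hypothetical function name: genes are ranked by signed OVA t statistic, whereas the study ranked by significance, and the two orderings can differ slightly when the Welch degrees of freedom vary between genes.

```python
def select_classifier_genes(t_stats, p=40):
    """Pick, for every grade, the p/2 most upregulated and the p/2 most
    downregulated genes.

    t_stats: {grade: {gene: signed OVA t statistic}}.
    Returns {grade: {"up": [...], "down": [...]}}, strongest gene first.
    """
    half = p // 2
    chosen = {}
    for grade, stats in t_stats.items():
        ranked = sorted(stats, key=stats.get)  # most negative t first
        chosen[grade] = {
            "down": ranked[:half],
            "up": ranked[-half:][::-1],
        }
    return chosen
```

With p = 40 and three grades per problem, this yields at most 3 × 40 = 120 genes per problem; the total is smaller if the same gene is selected for more than one grade.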

Figure 4. Determination of optimal classification strategies and high-accuracy gene classifiers. (A) Classification accuracies as a function of gene (classifier) number are shown for each of three different classification methods (DLDA, SVM linear, and SVM radial), for each of the two grade problems: G1-G2-G3 and LGDN-HGDN-G1. (B,C) Genes were classified according to OVA t test–derived P values for the LGDN-HGDN-G1 problem (B) and the G1-G2-G3 problem (C), and the top 20 most highly expressed and 20 most underexpressed genes in each grade class (i.e., 40 genes per grade class) were selected to constitute a 120-gene classifier for each problem. G1-3, Edmondson grades 1-3; LGDN, low-grade dysplastic nodule; HGDN, high-grade dysplastic nodule; DLDA, diagonal linear discriminant analysis; SVM, support vector machine.

The prediction confidence of a specimen can be assessed by the frequency with which it is correctly classified in 100 random partitions in three-fold cross-validation. A summary of the frequency of class assignments using the early- and late-stage genes is tabulated in Table 1. For the early-stage samples, 100% of the specimens were correctly classified the majority of the time by all three methods. For the late-stage samples, all but two arrays were classified correctly the majority of the time by all three methods. Specimens G2_02 and G3_08 were consistently misclassified into the adjacent lower grade.
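
The prediction confidence described above amounts to counting how often each class is assigned to a specimen across the 100 repetitions. A minimal sketch, with a hypothetical function name, is:

```python
from collections import Counter


def summarize_votes(predictions, true_label):
    """Summarize the class assignments one specimen received across
    repeated cross-validation runs.

    Returns the per-class counts, the fraction of runs in which the true
    class was assigned, and whether the majority call was wrong.
    """
    counts = Counter(predictions)
    majority = counts.most_common(1)[0][0]
    return {
        "counts": dict(counts),
        "confidence": counts[true_label] / len(predictions),
        "misclassified": majority != true_label,
    }
```

Taking the DLDA counts reported for specimen G2_02 in Table 1 (57 assignments to G1 and 43 to G2), `summarize_votes(["G1"] * 57 + ["G2"] * 43, "G2")` gives a confidence of 0.43 and flags the specimen as misclassified, in line with its assignment to the adjacent lower grade.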

Table 1. LGDN-HGDN-G1 Confusion Matrix and G1-G2-G3 Confusion Matrix of the DLDA and SVM Classifiers

LGDN-HGDN-G1 problem (class assignments per 100 cross-validation repetitions)

Sample         Diagonal LDA          SVM Linear            SVM Radial
              LGDN  HGDN    G1      LGDN  HGDN    G1      LGDN  HGDN    G1
LGDN_02         95     5     0        97     3     0        96     4     0
LGDN_03         65    35     0        80    20     0        76    24     0
LGDN_05        100     0     0       100     0     0       100     0     0
LGDN_07        100     0     0       100     0     0       100     0     0
LGDN_08         99     1     0        99     1     0        99     1     0
LGDN_09         99     1     0        98     2     0        97     3     0
LGDN_10        100     0     0       100     0     0        99     1     0
HGDN_01          1    93     6         3    57    40         3    62    35
HGDN_02          0   100     0         0    96     4         0    98     2
HGDN_05          0   100     0         0   100     0         0   100     0
HGDN_06          0   100     0         0    98     2         0    97     3
HGDN_07          0   100     0         0   100     0         0   100     0
HGDN_08          0   100     0         0   100     0         0   100     0
HGDN_10          0   100     0         1    99     0         3    97     0
G1_01            0     0   100         0     0   100         0     0   100
G1_03            0     0   100         0     0   100         0     0   100
G1_04            0     0   100         0     0   100         0     0   100
G1_05            2    10    88         0     0   100         1     0    99
G1_06            0     2    98         0     0   100         0     1    99
G1_07            0     2    98         0     9    91         0     7    93
G1_08            0     3    97         0     3    97         0     3    97
G1_09            0    42    58         1     6    93         1     7    92
G1_10            0     0   100         0     0   100         0     0   100

G1-G2-G3 problem (class assignments per 100 cross-validation repetitions)

Sample         Diagonal LDA          SVM Linear            SVM Radial
                G1    G2    G3        G1    G2    G3        G1    G2    G3
G1_01           62    37     1        85    15     0        86    14     0
G1_03           94     6     0        82    18     0        87    13     0
G1_04           99     1     0        99     1     0        98     2     0
G1_05          100     0     0       100     0     0       100     0     0
G1_06          100     0     0       100     0     0       100     0     0
G1_07           88    12     0        95     5     0        98     2     0
G1_08           89    11     0        97     3     0        95     5     0
G1_09          100     0     0        99     1     0        99     1     0
G1_10          100     0     0       100     0     0       100     0     0
G2_01            0    95     5         0    97     3         0    94     6
G2_02*          57    43     0        60    40     0        66    34     0
G2_03            0   100     0         4    93     3         3    91     6
G2_04            1    99     0         5    94     1         5    95     0
G2_05            0   100     0         0    99     1         0    99     1
G2_06            0    92     8         0    90    10         0    86    14
G2_07            2    97     1        11    84     5        14    83     3
G2_08            0    89    11         0    68    32         0    62    38
G2_09            0    97     3         1    88    11         1    84    15
G2_10            1    99     0        12    88     0        18    82     0
G3_02            0     0   100         0     0   100         0     0   100
G3_04            0    17    83         0    22    78         0    26    74
G3_05            0     0   100         0     0   100         0     0   100
G3_06            0     1    99         0     7    93         0     7    93
G3_07            0     0   100         0     2    98         0     6    94
G3_08*           0    57    43         0    65    35         0    64    36
G3_09            0     2    98         0    10    90         0    11    89
G3_10            0     2    98         0     5    95         1     7    92

  • Abbreviation: LDA, linear discriminant analysis.
  • *Indicates misclassified sample.

We next extended our validation of the 240 classifier genes to an independent set of specimens consisting of 5 new samples and the 9 samples that were previously excluded from the initial analysis. As shown in Table 2, we were able to correctly classify all 5 of the new fresh samples. Furthermore, despite RNA quality concerns, the majority of the remaining 9 samples (7 of 9) were also classified correctly. These data, though limited by a relatively small test set, suggest that these 240 progression-associated genes could be clinically useful classifiers for assisting diagnosis of all stages of hepato-carcinogenesis.

Table 2. Good Overall Classification by DLDA on Two Independent Datasets

Dataset             Sample      Class   Prediction
New dataset         HCC_168     G1      G1
                    HCC_141     G2      G2
                    HCC_143     G1      G1
                    HCC_203     HGDN    HGDN
                    HCC_219     G2      G2
Excluded dataset    LGDN_01     LGDN    LGDN
                    LGDN_04     LGDN    LGDN
                    LGDN_06     LGDN    LGDN
                    HGDN_03     HGDN    HGDN
                    HGDN_04     HGDN    HGDN
                    HGDN_09     HGDN    HGDN
                    G1_02*      G1      G1
                    G3_01*      G3      G2
                    G3_03*      G3      G1

  • NOTE. The new dataset is comprised of five previously unanalyzed tumor samples; the excluded dataset includes the nine arrays that were previously excluded from analysis due to suboptimal RNA or hybridization features.
  • *Indicates misclassified sample.

Discussion

Patients with HCC have a poor prognosis because most HCCs are detected at a stage too late for curative treatment. Therefore, early detection of small HCC or precancerous lesions appears to be the best way to achieve better therapeutic results. However, morphological and molecular features of precancerous lesions are far from being fully elucidated. The terminology of nodular hepatocellular lesions adopted by the International Working Party of the 1995 World Congress of Gastroenterology suggests that there is a continuum in hepato-carcinogenesis that includes low-grade dysplastic nodules, HGDNs, and dysplastic nodules with microscopic foci of HCC, which may enlarge and replace the nodule giving rise to a small HCC, and finally advanced HCC.7, 9 Despite the fact that this group provided several morphological criteria to discriminate between well-differentiated HCC and HGDN and/or LGDN, they acknowledged that a strict line could not be drawn between premalignant and malignant lesions by simple microscopic observation.

The recent advance of DNA microarray technology, a high-throughput method of monitoring gene expression, has made it possible to analyze the expression of thousands of genes at once. Consequently, expression profiling by microarrays has been profitably applied to gene discovery and class determination in human cancers.29 To understand molecular changes associated with the developmental stages of HCC, we assessed gene expression profiles of the different histopathological stages of HCC, including LGDNs, HGDNs, and G1-G3 HCCs, using a high-density spotted oligo-nucleotide microarray analysis. We observed not only the clear separation of dysplastic nodules (CI) from overt cancer (CII) but also grade-specific subclusters of HCC in CII via unsupervised hierarchical clustering analysis (Fig. 1). These results indicate that there is a clear difference in molecular signature between each histological grade in the progression of HCC. However, there is some molecular heterogeneity in G1 HCC. Most G1 HCCs (6/9) had expression profiles more closely related to frank carcinoma and clustered adjacent to G2 and G3 HCCs, as shown in Fig. 1; however, 3 cases of G1 HCC (G1-5, G1-6, and G1-9) were grouped with the precancerous nodules. Among these 3 cases, G1-9 was confirmed as eHCC upon histopathological review. In fact, histologically defined G1 HCC lesions can be further divided into small HCC with indistinct margins (eHCC) and small nodular HCC with distinct margins, with more than half of the latter encapsulated by a thin fibrous capsule.30 Unlike eHCC, which lacks an invasive growth pattern, the latter lesion revealed tumor invasion into the portal vein and intrahepatic metastasis in 27% (G1-5) and 10% (G1-6).31 This heterogeneity strongly suggests that G1 HCC might border between pre-neoplastic lesion and outright carcinoma, representing a transition state from dysplasia to carcinoma.
Furthermore, after carefully analyzing whether or not the replicative state of HBV infection could influence the resultant expression profiling by using pathological information pertaining to the replicative state of HBV infection (as measured by HBV DNA levels in serum), we were unable to find any significant correlation between our gene expression results and the replicative state of HBV in the samples, suggesting that the replicative state of HBV has little or no measurable effect on gene transcription in our HCC samples.

We also identified 3,084 grade-associated genes whose transcript levels were either positively or negatively correlated with tumor progression through a combination of one-way ANOVA and OVA unpooled t test. Functional analysis of these genes revealed discrete expression clusters representing grade-dependent biological properties of HCC, including cell proliferation, protein synthesis, and hepatocyte-specific function (Fig. 2). Using Gene Ontology terms, we performed categorical analysis according to gene function and extracted a number of well-known tumor suppressors, oncogenes, growth factors, effectors of apoptosis, and cell adhesion molecules involved in cell–cell and cell–matrix interactions whose expression patterns were associated with grade. For example, RARRES3 (retinoic acid receptor responder 3) is a class II tumor suppressor gene (i.e., downregulated in tumorigenesis rather than mutated) with growth-suppressive and apoptosis-inducing activity.32 It has previously been found to be downregulated in a manner correlated with progression of B-CLL33 and cellular de-differentiation in colorectal adenocarcinoma,34 consistent with our observation that this gene is downregulated in G2 and G3 HCCs. The majority of oncogenes and tumor suppressors identified here demonstrate expression patterns that systematically change from dysplasia to carcinoma, in some cases with alterations in expression already detectable in the pre-neoplastic state. It is therefore interesting to speculate that these genes, acting together or separately, could be directly involved in common pathways of HCC pathogenesis in a grade-dependent fashion.

Owing to recent advances in diagnostic imaging and to increased clinical and pathological interest, small hepatocellular nodular lesions, even those less than 1 cm in size, are frequently detected in patients with cirrhosis who are monitored as high-risk patients. These nodules could be LGDNs, HGDNs, or well-differentiated small HCCs, and sonography- or CT-guided needle biopsies of these nodules are performed routinely for differential diagnosis. However, it is often difficult, even for a hepatopathologist, to differentiate among these lesions, especially in needle-biopsy specimens with limited material. For this reason, the discovery of objective molecular markers or classifier genes that help to standardize the histological differential diagnosis of these nodules and lead to appropriate treatment is eagerly anticipated. In the present study, we identified a subset of genes that could accurately classify specimens according to histological grade. We treated classification as two separate problems rather than one five-grade problem: (1) discriminating among LGDNs, HGDNs, and G1 HCCs (early-stage lesions) and (2) discriminating among G1, G2, and G3 HCCs (late-stage lesions). We selected the 20 most highly expressed and the 20 most under-expressed genes in each grade class (i.e., 40 genes per grade class), resulting in 120 genes for each problem. Of the 23 pre-neoplastic lesions and G1 HCCs, none was misclassified by the chosen set of 120 early-stage–associated genes, whereas 2 of 27 samples were misclassified among the overt HCCs (Table 1). We extended the validation of the 240 outlier genes to an independent set of specimens consisting of 5 new samples and the 9 samples that had previously been excluded from analysis because of RNA quality concerns. We correctly classified all samples except for two G3 cases, which were misclassified as G2 and G1, respectively. Although more testing on a larger, independent set of tumors graded by a different pathologist will be necessary to establish the accuracy and clinical value of the classifier, these results suggest that the 240 outlier genes could be good classifiers, especially for distinguishing among LGDNs, HGDNs, and G1 HCCs, with either DLDA or SVM.
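The gene selection and DLDA steps described above can be sketched as follows. This is an illustrative sketch under stated assumptions, not the study's implementation: ranking genes by a one-versus-all unpooled t statistic is assumed here as the criterion for "most highly expressed" and "most under-expressed", the function names and the toy expression matrix are hypothetical, and the example uses k = 1 on four genes where the study used 20 up and 20 down per class (40 x 3 classes = 120 genes per problem).

```python
from math import sqrt
from statistics import mean, variance


def welch_t(a, b):
    """Unpooled (Welch) t statistic between two groups of values."""
    return (mean(a) - mean(b)) / sqrt(variance(a) / len(a) + variance(b) / len(b))


def select_outliers(X, y, k):
    """For each class, pick the k genes with the lowest and the k with the
    highest one-versus-all t statistic. A gene may be picked more than once."""
    chosen = []
    for c in sorted(set(y)):
        scores = []
        for j in range(len(X[0])):
            inside = [row[j] for row, lab in zip(X, y) if lab == c]
            outside = [row[j] for row, lab in zip(X, y) if lab != c]
            scores.append((welch_t(inside, outside), j))
        scores.sort()
        chosen += [j for _, j in scores[:k]] + [j for _, j in scores[-k:]]
    return chosen


def dlda_fit(X, y, genes):
    """Diagonal LDA: per-class gene means plus one pooled within-class
    variance per gene (covariances between genes are ignored)."""
    classes = sorted(set(y))
    means = {c: [mean(r[j] for r, lab in zip(X, y) if lab == c) for j in genes]
             for c in classes}
    dof = len(X) - len(classes)
    pooled = [sum((r[j] - means[lab][i]) ** 2 for r, lab in zip(X, y)) / dof
              for i, j in enumerate(genes)]
    return means, pooled


def dlda_predict(model, genes, sample):
    """Assign the class whose mean is nearest in variance-scaled distance."""
    means, pooled = model

    def distance(c):
        return sum((sample[j] - m) ** 2 / v
                   for j, m, v in zip(genes, means[c], pooled))

    return min(means, key=distance)


# Invented expression values: 9 samples x 4 genes, three early-stage classes.
X = [
    [5.0, 1.0, 3.0, 2.0], [5.2, 1.1, 3.1, 2.3], [4.9, 0.9, 2.8, 1.9],  # LGDN
    [3.0, 3.0, 3.0, 2.1], [3.1, 3.2, 2.9, 2.2], [2.9, 2.9, 3.2, 1.8],  # HGDN
    [1.0, 5.0, 3.1, 2.0], [1.2, 5.1, 2.9, 2.4], [0.9, 4.8, 3.0, 1.7],  # G1
]
y = ["LGDN"] * 3 + ["HGDN"] * 3 + ["G1"] * 3

picked = select_outliers(X, y, k=1)  # 2 * k genes per class, repeats allowed
genes = sorted(set(picked))          # drop repeats before fitting
model = dlda_fit(X, y, genes)
label = dlda_predict(model, genes, [4.8, 1.2, 3.0, 2.0])
```

Because selection is done per class, the same gene can be chosen for more than one class, which is why the sketch removes repeats before fitting. Note that selecting genes and estimating accuracy on the same samples is optimistic; an independent test set, as used above, is the appropriate check.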

In conclusion, despite numerous investigations of hepato-carcinogenesis, only limited or incomplete data are available regarding gene expression profiles during the development and progression of HCC in humans.18, 19, 35–37 Systematic approaches, such as the simultaneous evaluation of genome-wide transcripts and regulatory pathways in precancerous lesions and HCCs, are necessary to gain much-needed molecular insight into hepato-carcinogenesis. We uncovered the molecular signatures of pre-neoplastic lesions and of early- and advanced-stage HCC. Our 240 classifier genes for distinguishing the early and advanced stages of HCC classified lesions ranging from pre-neoplastic nodules to HCCs with high fidelity. Through further analysis of these outlier genes and intensive clinical validation, we hope to identify clinically useful biomarkers that will facilitate early detection of liver cancer and perhaps further elucidate the underlying molecular pathology of HCC.

References

  • 1
    Thorgeirsson SS, Grisham JW. Molecular pathogenesis of human hepatocellular carcinoma. Nat Genet 2002; 31: 339–346.
  • 2
    Pang A, Ng IO, Fan ST, Kwong YL. Clinicopathologic significance of genetic alterations in hepatocellular carcinoma. Cancer Genet Cytogenet 2003; 146: 8–15.
  • 3
    de La CA, Romagnolo B, Billuart P, Renard CA, Buendia MA, Soubrane O, et al. Somatic mutations of the beta-catenin gene are frequent in mouse and human hepatocellular carcinomas. Proc Natl Acad Sci U S A 1998; 95: 8847–8851.
  • 4
    Satoh S, Daigo Y, Furukawa Y, Kato T, Miwa N, Nishiwaki T, et al. AXIN1 mutations in hepatocellular carcinomas, and growth suppression in cancer cells by virus-mediated transfer of AXIN1. Nat Genet 2000; 24: 245–250.
  • 5
    Takayama T, Makuuchi M, Hirohashi S, Sakamoto M, Okazaki N, Takayasu K, et al. Malignant transformation of adenomatous hyperplasia to hepatocellular carcinoma. Lancet 1990; 336: 1150–1153.
  • 6
    Theise ND, Park YN, Kojiro M. Dysplastic nodules and hepatocarcinogenesis. Clin Liver Dis 2002; 6: 497–512.
  • 7
    Terminology of nodular hepatocellular lesions. International Working Party. HEPATOLOGY 1995; 22: 983–993.
  • 8
    Tornillo L, Carafa V, Sauter G, Moch H, Minola E, Gambacorta M, et al. Chromosomal alterations in hepatocellular nodules by comparative genomic hybridization: high-grade dysplastic nodules represent early stages of hepatocellular carcinoma. Lab Invest 2002; 82: 547–553.
  • 9
    Sakamoto M, Hirohashi S, Shimosato Y. Early stages of multistep hepatocarcinogenesis: adenomatous hyperplasia and early hepatocellular carcinoma. Hum Pathol 1991; 22: 172–178.
  • 10
    Kanai T, Hirohashi S, Upton MP, Noguchi M, Kishi K, Makuuchi M, et al. Pathology of small hepatocellular carcinoma. A proposal for a new gross classification. Cancer 1987; 60: 810–819.
  • 11
    Takayama T, Makuuchi M, Hirohashi S, Sakamoto M, Yamamoto J, Shimada K, et al. Early hepatocellular carcinoma as an entity with a high rate of surgical cure. HEPATOLOGY 1998; 28: 1241–1246.
  • 12
    Kojiro M. Premalignant lesions of hepatocellular carcinoma: pathologic viewpoint. J Hepatobiliary Pancreat Surg 2000; 7: 535–541.
  • 13
    Theise ND, Park YN, Kojiro M. Dysplastic nodules and hepatocarcinogenesis. Clin Liver Dis 2002; 6: 497–512.
  • 14
    Edmondson HA, Steiner PE. Primary carcinoma of the liver: a study of 100 cases among 48,900 necropsies. Cancer 1954; 7: 462–503.
  • 15
    Ferrell LD, Crawford JM, Dhillon AP, Scheuer PJ, Nakanuma Y. Proposal for standardized criteria for the diagnosis of benign, borderline, and malignant hepatocellular lesions arising in chronic advanced liver disease. Am J Surg Pathol 1993; 17: 1113–1123.
  • 16
    Kenmochi K, Sugihara S, Kojiro M. Relationship of histologic grade of hepatocellular carcinoma (HCC) to tumor size, and demonstration of tumor cells of multiple different grades in single small HCC. Liver 1987; 7: 18–26.
  • 17
    Chen X, Cheung ST, So S, Fan ST, Barry C, Higgins J, et al. Gene expression patterns in human liver cancers. Mol Biol Cell 2002; 13: 1929–1939.
  • 18
    Xu XR, Huang J, Xu ZG, Qian BZ, Zhu ZD, Yan Q, et al. Insight into hepatocellular carcinogenesis at transcriptome level by comparing gene expression profiles of hepatocellular carcinoma with those of corresponding noncancerous liver. Proc Natl Acad Sci U S A 2001; 98: 15089–15094.
  • 19
    Okabe H, Satoh S, Kato T, Kitahara O, Yanagawa R, Yamaoka Y, et al. Genome-wide analysis of gene expression in human hepatocellular carcinomas using cDNA microarray: identification of genes involved in viral carcinogenesis and tumor progression. Cancer Res 2001; 61: 2129–2137.
  • 20
    Lee JS, Chu IS, Heo J, Calvisi DF, Sun Z, Roskams T, et al. Classification and prediction of survival in hepatocellular carcinoma by gene expression profiling. HEPATOLOGY 2004; 40: 667–676.
  • 21
    Eisen MB, Brown PO. DNA arrays for analysis of gene expression. Methods Enzymol 1999; 303: 179–205.
  • 22
    DeRisi JL, Iyer VR, Brown PO. Exploring the metabolic and genetic control of gene expression on a genomic scale. Science 1997; 278: 680–686.
  • 23
    Eisen MB, Spellman PT, Brown PO, Botstein D. Cluster analysis and display of genome-wide expression patterns. Proc Natl Acad Sci U S A 1998; 95: 14863–14868.
  • 24
    Brown MP, Grundy WN, Lin D, Cristianini N, Sugnet CW, Furey TS, et al. Knowledge-based analysis of microarray gene expression data by using support vector machines. Proc Natl Acad Sci U S A 2000; 97: 262–267.
  • 25
    Furey TS, Cristianini N, Duffy N, Bednarski DW, Schummer M, Haussler D. Support vector machine classification and validation of cancer tissue samples using microarray expression data. Bioinformatics 2000; 16: 906–914.
  • 26
    Ramaswamy S, Tamayo P, Rifkin R, Mukherjee S, Yeang CH, Angelo M, et al. Multiclass cancer diagnosis using tumor gene expression signatures. Proc Natl Acad Sci U S A 2001; 98: 15149–15154.
  • 27
    Whitfield ML, Sherlock G, Saldanha AJ, Murray JI, Ball CA, Alexander KE, et al. Identification of genes periodically expressed in the human cell cycle and their expression in tumors. Mol Biol Cell 2002; 13: 1977–2000.
  • 28
    Troyanskaya O, Cantor M, Sherlock G, Brown P, Hastie T, Tibshirani R, et al. Missing value estimation methods for DNA microarrays. Bioinformatics 2001; 17: 520–525.
  • 29
    Kim I, Kang HC, Park J. Microarray application in cancer research. Cancer Research and Treatment 2004; 36: 207–213.
  • 30
    Kojiro M. Pathology of early hepatocellular carcinoma: progression from early to advanced. Hepatogastroenterology 1998; 45(Suppl 3): 1203–1205.
  • 31
    Kojiro M, Yano H, Nakashima O. Pathology of early hepatocellular carcinoma: progression from early to advanced. Semin Surg Oncol 1996; 12: 197–203.
  • 32
    Huang SL, Shyu RY, Yeh MY, Jiang SY. The retinoid-inducible gene I: effect on apoptosis and mitogen-activated kinase signal pathways. Anticancer Res 2002; 22: 799–804.
  • 33
    Casanova B, de la Fuente MT, Garcia-Gila M, Sanz L, Silva A, Garcia-Marco JA, Garcia-Pardo A. The class II tumor-suppressor gene RARRES3 is expressed in B cell lymphocytic leukemias and down-regulated with disease progression. Leukemia 2001; 15: 1521–1526.
  • 34
    Shyu RY, Jiang SY, Chou JM, Shih YL, Lee MS, Yu JC, et al. RARRES3 expression positively correlated to tumour differentiation in tissues of colorectal adenocarcinoma. Br J Cancer 2003; 89: 146–151.
  • 35
    Chen X, Cheung ST, So S, Fan ST, Barry C, Higgins J, et al. Gene expression patterns in human liver cancers. Mol Biol Cell 2002; 13: 1929–1939.
  • 36
    Chuma M, Sakamoto M, Yamazaki K, Ohta T, Ohki M, Asaka M, et al. Expression profiling in multistage hepatocarcinogenesis: identification of HSP70 as a molecular marker of early hepatocellular carcinoma. HEPATOLOGY 2003; 37: 198–207.
  • 37
    Lee JS, Chu IS, Heo J, Calvisi DF, Sun Z, Roskams T, et al. Classification and prediction of survival in hepatocellular carcinoma by gene expression profiling. HEPATOLOGY 2004; 40: 667–676.

Supporting Information

This article includes Supplementary material available at http://www.interscience.wiley.com/jpages/0270-9139/suppmat

Filename: jws-hep.20878.fig1.tif
Size: 298K
Description: Comparison of expression profiling of nontumorigenic tissues, dysplastic nodules, and HCCs. One hundred outlier genes for each grade (LGDN, HGDN, G1–3 HCCs) were selected by a combination of OVA t test and ANOVA (see Materials and Methods); the corresponding outliers were also selected from nontumorigenic tissues with cirrhosis (designated "NL"), and the expression patterns of all outlier genes were then combined and analyzed. The heat map of the hierarchical clustering analysis (a) and its dendrogram (b) show that the NL arrays lie very close to the dysplastic nodule cluster and form a subcluster within it, suggesting that the molecular nature of dysplastic nodules resembles that of NL, although the two may still have distinct signatures.

Filename: jws-hep.20878.tbl1-.doc
Size: 236K
Description: Supplementary Table 1. Clinical data and diagnostic details on experimental samples. Supplementary Table 2. The list of 120 early-stage genes for discriminating among early-stage samples. Supplementary Table 3. The list of 120 late-stage genes for discriminating among late-stage samples. In Supplementary Tables 2 and 3, any probes selected twice in a classification problem are marked with an asterisk.
