Molecular changes from dysplastic nodule to hepatocellular carcinoma through gene expression profiling

Authors

  • Suk Woo Nam,

    1. Department of Pathology, College of Medicine, The Catholic University of Korea, Seoul, South Korea
    2. Microdissection Genomics Research Center, College of Medicine, The Catholic University of Korea, Seoul, South Korea
  • Jik Young Park,

    1. Department of Pathology, College of Medicine, The Catholic University of Korea, Seoul, South Korea
    2. Microdissection Genomics Research Center, College of Medicine, The Catholic University of Korea, Seoul, South Korea
  • Adaikalavan Ramasamy,

    1. Genome Institute of Singapore, Singapore
  • Shirish Shevade,

    1. Genome Institute of Singapore, Singapore
  • Amirul Islam,

    1. Genome Institute of Singapore, Singapore
  • Philip M. Long,

    1. Genome Institute of Singapore, Singapore
  • Cheol Keun Park,

    1. Department of Pathology, Samsung Medical Center, Sungkyunkwan University School of Medicine, Seoul, South Korea
  • Soo Eun Park,

    1. Department of Pathology, College of Medicine, The Catholic University of Korea, Seoul, South Korea
    2. Microdissection Genomics Research Center, College of Medicine, The Catholic University of Korea, Seoul, South Korea
  • Su Young Kim,

    1. Department of Pathology, College of Medicine, The Catholic University of Korea, Seoul, South Korea
    2. Microdissection Genomics Research Center, College of Medicine, The Catholic University of Korea, Seoul, South Korea
  • Sug Hyung Lee,

    1. Department of Pathology, College of Medicine, The Catholic University of Korea, Seoul, South Korea
    2. Microdissection Genomics Research Center, College of Medicine, The Catholic University of Korea, Seoul, South Korea
  • Won Sang Park,

    1. Department of Pathology, College of Medicine, The Catholic University of Korea, Seoul, South Korea
    2. Microdissection Genomics Research Center, College of Medicine, The Catholic University of Korea, Seoul, South Korea
  • Nam Jin Yoo,

    1. Department of Pathology, College of Medicine, The Catholic University of Korea, Seoul, South Korea
    2. Microdissection Genomics Research Center, College of Medicine, The Catholic University of Korea, Seoul, South Korea
  • Edison T. Liu,

    1. Genome Institute of Singapore, Singapore
  • Lance D. Miller Ph.D.,

    Corresponding author
    1. Genome Institute of Singapore, Singapore
    • Microarray and Expression Genomics, Genome Institute of Singapore, Genome Building #02–01, 60 Biopolis Street, Singapore 138672
    • fax: (65) 6478–9060

  • Jung Young Lee M.D., Ph.D.

    Corresponding author
    1. Department of Pathology, College of Medicine, The Catholic University of Korea, Seoul, South Korea
    2. Microdissection Genomics Research Center, College of Medicine, The Catholic University of Korea, Seoul, South Korea
    • Department of Pathology, College of Medicine, The Catholic University of Korea, #505 Banpodong, Seocho-gu, Seoul, South Korea, 137–701
    • fax: (82) 2–537–6586


  • Potential conflict of interest: Nothing to report.

Abstract

Progression of hepatocellular carcinoma (HCC) is a stepwise process that proceeds from pre-neoplastic lesions—including low-grade dysplastic nodules (LGDNs) and high-grade dysplastic nodules (HGDNs)—to advanced HCC. The molecular changes associated with this progression are unclear, however, and the morphological cues thought to distinguish pre-neoplastic lesions from well-differentiated HCC are not universally accepted. To understand the multistep process of hepato-carcinogenesis at the molecular level, we used oligo-nucleotide microarrays to investigate the transcription profiles of 50 hepatocellular nodular lesions ranging from LGDNs to primary HCC (Edmondson grades 1-3). We demonstrated that gene expression profiles can discriminate not only between dysplastic nodules and overt carcinoma but also between different histological grades of HCC via unsupervised hierarchical clustering with 10,376 genes. We identified 3,084 grade-associated genes, correlated with tumor progression, using one-way ANOVA and a one-versus-all unpooled t test. Functional assignment of these genes revealed discrete expression clusters representing grade-dependent biological properties of HCC. Using both diagonal linear discriminant analysis and support vector machines, we identified 240 genes that could accurately classify tumors according to histological grade, especially when attempting to discriminate LGDNs, HGDNs, and grade 1 HCC. In conclusion, a clear molecular demarcation between dysplastic nodules and overt HCC exists. The progression from grade 1 through grade 3 HCC is associated with changes in gene expression consistent with plausible functional consequences. Supplementary material for this article can be found on the HEPATOLOGY website (http://www.interscience.wiley.com/jpages/0270-9139/suppmat/index.html). (HEPATOLOGY 2005;42:809–818.)

Hepatocellular carcinoma (HCC) is one of the most common malignancies worldwide. Chronic hepatitis resulting from infection with hepatitis B virus or hepatitis C virus and exposure to carcinogens such as aflatoxin B1 are known major risk factors for HCC.1 Molecular investigations have recently found that genetic alterations of tumor suppressor genes or oncogenes such as p53, β-catenin, and AXIN1 might be involved in the progression to HCC,2–4 but the frequency of these somatic mutations appears to be low in HCCs. Furthermore, it is unclear how these genetic changes reflect the clinical characteristics of the individual tumors. Therefore, the predominant molecular events underlying HCC in most patients remain unknown.

Because HCC typically develops in close association with pre-existing cirrhosis, it is widely believed that a liver with cirrhosis may contain pre-neoplastic nodules that are in an intermediate stage between nonneoplastic regenerating nodules and overtly malignant HCC.5, 6 These nodular lesions have been designated as “dysplastic nodules” by the International Working Party and are further divided into low-grade dysplastic nodules (LGDNs) and high-grade dysplastic nodules (HGDNs) depending on the degree of cytological or architectural atypia on histological examination.7 That these nodules frequently contain one or more microscopic foci of HCC suggests that dysplastic nodules, especially HGDNs, might be precancerous lesions of HCC.8, 9 Some investigators have adopted the concept of early HCC (eHCC), sometimes referred to as “carcinoma in situ” or “microinvasive carcinoma” of the liver, which is characterized by a small tumor mass lacking invasive growth properties such as vascular invasion or intrahepatic metastasis.10–12 However, there remains considerable controversy as to whether eHCC should be regarded as frank cancer or as a form of HGDN.13 As such, the distinction between precancerous and cancerous lesions remains debatable, and the developmental process from pre-neoplastic lesion to overt HCC is still unclear.

HCC can be classified into four different histological grades, known as Edmondson grades 1 through 4, which generally correspond to well-differentiated, moderately differentiated, poorly differentiated, and undifferentiated types of HCC, respectively.14, 15 Most cancer nodules less than 1 cm in diameter consist of well-differentiated cancerous tissues and are completely replaced by less well-differentiated cancerous tissues when the tumor size reaches a diameter of approximately 3 cm.16 As such, tumor de-differentiation and increasing tumor size are thought to reflect a continuum of morphological change in a multistep hepato-carcinogenesis process, but the molecular underpinnings of this are largely unknown.

Recently, DNA microarray technology has enabled the genome-wide analysis of gene transcript levels, and as such has yielded great insight into the molecular nature of cancer. Although several reports have described tumor-associated molecular expression profiles of liver cancers,17–20 little insight into the molecular nature of early or multistep hepato-carcinogenesis has been gained. To better understand this multistep process at the molecular level, we analyzed global transcript levels in the context of three different histological grades of HCC as well as precancerous LGDNs and HGDNs using a high-density spotted oligo-nucleotide microarray.

In the present study, we show that an extensive and remarkably reproducible expression signature comprising several thousand genes underlies the process of HCC progression. For the majority of genes, messenger RNA levels were either positively or negatively correlated with tumor progression. Functional analysis of these genes revealed discrete expression clusters representing grade-dependent biological properties of HCC, including cell proliferation, protein synthesis, and hepatocyte-specific functions. We also observed altered expression of known tumor suppressor genes and oncogenes that potentially contribute to this process. Additionally, we identified a subset of progression-associated genes that could accurately classify tumors according to grade and readily distinguish dysplastic nodules from low-grade cancer.

Abbreviations

HCC, hepatocellular carcinoma; LGDN, low-grade dysplastic nodule; HGDN, high-grade dysplastic nodule; eHCC, early hepatocellular carcinoma; G1-3, Edmondson grades 1-3; HBV, hepatitis B virus; OVA, one-versus-all; SVM, support vector machine; DLDA, diagonal linear discriminant analysis.

Patients and Methods

Patients and Tissue Preparation.

Primary HCCs, including Edmondson grade 1 (G1), grade 2 (G2), grade 3 (G3), and premalignant lesions of HCC (LGDNs and HGDNs), were obtained from 42 patients who underwent surgical treatment for HCC at Samsung Medical Center, Sungkyunkwan University School of Medicine, Seoul, Korea. Immediately after hepatectomy, freshly removed livers were serially sliced from the top edge to the bottom edge at 7- to 8-mm intervals and examined by a pathologist for the presence of nodular lesions. Any bulging nodules 10 mm or more in diameter or lesions macroscopically different in color from the surrounding liver, regardless of size, were snap-frozen in liquid nitrogen and stored at −80°C until use. Subsequent sections from the same nodule were fixed in 10% neutral formalin for confirmation of morphological diagnosis. The hematoxylin-eosin–stained sections were examined independently by two pathologists and classified either as HCC of a given histological grade according to the Edmondson and Steiner method or as dysplastic nodules of low or high grade according to the guidelines of the International Working Party. In this way, we obtained a total of 30 HCCs (10 G1, 10 G2, and 10 G3), 10 LGDNs, and 10 HGDNs from 42 patients. To reduce experimental bias, we selected specimens that had a background associated with cirrhosis and were hepatitis B virus (HBV) seropositive (Supplementary Table 1). Approval was obtained from the institutional review boards of the Catholic University of Korea College of Medicine and the Sungkyunkwan University School of Medicine. Informed consent was provided according to the Declaration of Helsinki.

DNA Microarrays.

The Compugen/Sigma Human Oligolibrary (60-mers) representing 18,664 LEADS clusters (Compugen/Sigma-Genosys, The Woodlands, TX) was spotted onto poly-L-lysine-coated glass microscope slides using an OmniGrid robotic arrayer (GeneMachines, San Carlos, CA). All microarrays were manufactured at the Microarray and Expression Genomics Laboratory of the Genome Institute of Singapore essentially according to Eisen and Brown.21

RNA Preparation and Microarray Hybridization.

Total RNA was extracted from frozen tissues using TRIzol reagent following the manufacturer's protocol (Life Technology, Rockville, MD). Human universal reference RNA (Stratagene, La Jolla, CA) was used as the reference RNA. Total RNA (20 μg) was used for DNA target synthesis as described.22 The reference RNA was labeled with Cyanine-3, and the test sample was labeled with Cyanine-5.

Data Processing and Analysis.

All GenePix files were uploaded into the Genome Institute of Singapore microarray database, and log expression ratios were normalized using the global median method. Microarray features with signal (foreground) intensities less than 50% above median local background intensity in both channels, and features automatically or manually flagged as “not found,” were treated as missing values. Genes with expression values in 70% or more of the tumors within each of the five grades were retained for further analyses. Of 18,708 probes, 10,376 passed this filter. This probe set was used as the basis for all subsequent analyses. Hierarchical clustering of log ratios was performed using the Cluster and TreeView software23; Pearson correlation, mean centering, and average linkage were applied in all clustering applications. One-way ANOVA (F test) and one-versus-all (OVA) t tests were performed in the R statistical package (http://www.r-project.org/). Support vector machines (SVM) and diagonal linear discriminant analysis (DLDA) were used to assess the classification accuracy of gene classifiers with grade prediction potential.24–26
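The two preprocessing steps described above can be sketched as follows. This is a minimal illustration, not the authors' pipeline: it assumes that "global median" normalization means subtracting each array's median log ratio, that missing values are represented by `None`, and the function names `median_normalize` and `passes_presence_filter` are hypothetical. The toy numbers are not study data.

```python
from statistics import median

def median_normalize(log_ratios):
    """Center one array's log ratios on their median; None marks a missing value."""
    present = [v for v in log_ratios if v is not None]
    center = median(present)
    return [None if v is None else v - center for v in log_ratios]

def passes_presence_filter(values_by_grade, min_percent=70):
    """Keep a probe only if it was measured in at least min_percent of the
    samples within every grade (integer arithmetic avoids rounding issues)."""
    for values in values_by_grade.values():
        n_present = sum(v is not None for v in values)
        if n_present * 100 < min_percent * len(values):
            return False
    return True

# Toy data (assumed for illustration only)
array = [0.5, None, 1.5, -0.5]
normalized = median_normalize(array)      # median of the present values is 0.5

probe = {"LGDN": [0.1, None, 0.3, 0.2], "G1": [None, None, 0.4]}
kept = passes_presence_filter(probe)      # G1 has only 1 of 3 values present
```

In this toy case the probe is dropped because one grade falls below the 70% presence threshold, even though the other grade passes.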

Class Prediction: Stratified Three-fold Cross-validation.

Classification accuracies were assessed using a stratified three-fold cross-validation scheme. Here the arrays were randomly partitioned into three folds. Any random grouping that resulted in fewer than two members of any grade in any fold was discarded and resampled. Two folds were used to train the classifier, which was then tested on the remaining fold. For every training/test split, p genes were selected for each grade: the p/2 genes most significantly upregulated and the p/2 genes most significantly downregulated in that grade according to the OVA Welch t test results. The three-fold cross-validation process was repeated 100 times, and the mean accuracies were reported.
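The partitioning rule described above (discard and resample any grouping with fewer than two members of a grade in a fold) can be sketched as a simple rejection loop. This is an illustrative reading of the text, not the authors' code: the near-equal fold sizes, the `max_tries` cap, and the names `stratified_folds` and `train_test_splits` are assumptions.

```python
import random
from collections import Counter

def stratified_folds(grades, n_folds=3, min_per_grade=2, rng=None, max_tries=100000):
    """Randomly assign each sample to a fold, resampling until every fold
    holds at least min_per_grade samples of every grade."""
    if rng is None:
        rng = random.Random()
    classes = sorted(set(grades))
    # Fold labels giving near-equal fold sizes (an assumption) before shuffling
    labels = [i % n_folds for i in range(len(grades))]
    for _ in range(max_tries):
        rng.shuffle(labels)
        counts = Counter(zip(labels, grades))
        if all(counts[(f, g)] >= min_per_grade
               for f in range(n_folds) for g in classes):
            return list(labels)
    raise RuntimeError("no acceptable partition found")

def train_test_splits(fold_of_sample, n_folds=3):
    """Yield (train_indices, test_indices), holding out each fold once."""
    for held_out in range(n_folds):
        test = [i for i, f in enumerate(fold_of_sample) if f == held_out]
        train = [i for i, f in enumerate(fold_of_sample) if f != held_out]
        yield train, test

# Early-stage problem sizes from the final dataset: 7 LGDNs, 7 HGDNs, 9 G1 HCCs
grades = ["LGDN"] * 7 + ["HGDN"] * 7 + ["G1"] * 9
folds = stratified_folds(grades, rng=random.Random(0))
splits = list(train_test_splits(folds))
```

Repeating the call 100 times with fresh random partitions and averaging the test accuracies would correspond to the repetition scheme described in the text.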

Results

Large-Scale Gene Expression Alterations Coincide With Different Histological Grade of HCC Progression.

To determine whether global alterations in gene expression could discern histological grade differences ranging from pre-neoplastic lesion to advanced HCC, we examined the expression profiles of a series of 50 hepatocellular nodular lesions from 42 patients (see Patients and Methods). These specimens were hybridized onto spotted oligo-nucleotide microarrays, each containing 18,861 probes representing approximately 18,000 unique genes. Of the 50 specimens, 9 were not included in the final dataset as a result of suboptimal average signal intensities (owing to poor RNA quality) or unusually high background fluorescence. Therefore, 41 samples comprised the final dataset, which included 7 LGDNs, 7 HGDNs, and 9 G1, 10 G2, and 8 G3 HCCs. The relevant patient/tissue clinico-pathological variables are provided as supporting information (Supplementary Table 1).

First, we performed unsupervised hierarchical cluster analysis on the expression profiles of the 41 hepatocellular nodular lesions with 10,376 genes that passed the basic filtering criteria described in Patients and Methods. This resulted in two predominant tissue clusters: one cluster (CI) that contained all the dysplastic nodules (LGDNs and HGDNs), and a second cluster (CII) that contained all of the G2 and G3 HCCs and a majority of the G1 tumors (6/9) (Fig. 1). Within the CII cluster, all of the G3 HCCs were found together in a single G3-exclusive subcluster, which was flanked by the G2 HCCs that together with the G3s comprised a larger G2-G3 subcluster. The ostensible separation of dysplastic hepatocytes (CI) from HCC (CII), along with the occurrence of grade-specific subclusters in CII, demonstrates that reproducible large-scale changes in gene expression distinguish pre-neoplastic lesions and overt HCC as well as different histological grades of HCC. Of the 9 G1 HCCs, 3 cases (G1-05, -06, and -09) were found to have expression profiles more similar to the pre-neoplastic nodules, while the rest clustered with the overt HCCs. This observation suggests that G1 HCCs are molecularly heterogeneous, lying at the transition from premalignant lesion to overt malignant carcinoma, and that this heterogeneity is distinguishable at the molecular level. Additionally, we compared a small number of nontumorigenic surrounding tissues (i.e., “normal” tissue) to the HCCs and dysplastic nodules. The expression profiles of the nontumorigenic surrounding tissues consistently clustered apart from the dysplastic nodules, suggesting that dysplasia itself is marked by transcriptional alterations distinct from “normal” liver tissue (Supplementary Fig. 1).

Figure 1.

Unsupervised hierarchical clustering of pre-neoplastic lesions and primary HCCs can accurately partition tissues according to malignancy status and high tumor grade. (A) Two-dimensional clustergram of the 10,376 genes selected with minimal filtering criteria (see Patients and Methods). Each row represents a tumor profile; each column represents a probe's measurements. The color saturation reflects the difference in expression between the tissue specimen and the common reference RNA. (B) Tissue dendrogram derived from clustering using the 10,376 gene set. Note that the two dominant clusters, cluster I (CI) and cluster II (CII), with the exception of G1 HCCs, accurately partition the pre-neoplastic and malignant tissues, and that the CII cluster is further subdivided into branches that are largely grade-specific. LGDN, low-grade dysplastic nodule; HGDN, high-grade dysplastic nodule; G1-3, Edmondson grades 1-3.

Identification and Pattern Analysis of HCC Grade-Associated Genes.

To study in detail the genes most often correlated with tumor progression (referred to henceforth as “grade-associated” genes), we identified all genes associated with grade at P < .001 by either one-way ANOVA (F test) or OVA unpooled t test.26 The F test assigns the greatest significance to genes with expression profiles that show continuous variation among classes (e.g., grades), while the OVA t test assigns greater significance to genes with expression profiles that clearly distinguish one class from the rest. Consequently, the F test is biased toward selecting genes with profiles that progressively change from one class to the next, while the OVA t test is biased toward selection of those that show class-specific expression spikes. Thus, gene selection based on this combination of statistical measures allows for greater discovery of differentially expressed genes as it takes advantage of the inherent differences between the F test and the t test. We obtained 2,423 and 3,118 probes significant at P < .001 by F test and at least one of the five OVA t tests, respectively. After removing redundant discoveries, we were left with 3,084 probes with nonredundant gene identities with an expected maximum occurrence of false discoveries of 63 genes [10,376 × (0.001 + 0.001 × 5)]. Hierarchical clustering of these genes in Fig. 2 shows that the predominant grade-associated expression profiles are those with either positive or negative correlations with grade, rather than genes with spiking expression at precise stages of progression.
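The OVA unpooled (Welch) t statistic and the bound on expected false discoveries quoted above can be sketched as follows. This is a minimal illustration under stated assumptions: the function names `welch_t` and `ova_t` are hypothetical, the toy expression values are not study data, and conversion of the statistic to a P value (which requires the Welch degrees of freedom and a t distribution) is omitted. The analysis in the paper was performed in R.

```python
import math
from statistics import mean, variance

def welch_t(group_a, group_b):
    """Unpooled (Welch) t statistic: each group keeps its own variance."""
    se = math.sqrt(variance(group_a) / len(group_a) +
                   variance(group_b) / len(group_b))
    return (mean(group_a) - mean(group_b)) / se

def ova_t(values_by_grade, grade):
    """One-versus-all: compare one grade against all other grades combined."""
    inside = values_by_grade[grade]
    outside = [v for g, vals in values_by_grade.items() if g != grade
               for v in vals]
    return welch_t(inside, outside)

# Toy log ratios for one gene (assumed for illustration only)
gene = {"LGDN": [1.0, 2.0, 3.0], "HGDN": [4.0, 5.0], "G1": [6.0]}
t_lgdn = ova_t(gene, "LGDN")

# Bound on expected false discoveries: one F test plus five OVA t tests,
# each at P < .001, across all 10,376 filtered probes.
n_probes, alpha, n_ova_tests = 10376, 0.001, 5
expected_false = n_probes * (alpha + alpha * n_ova_tests)   # about 62.3
reported_bound = math.ceil(expected_false)                  # rounds up to 63
```

The product evaluates to roughly 62.3, which matches the reported maximum of 63 genes when rounded up.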

Figure 2.

Clusters of grade-associated genes with biological implications. Hierarchical clustergram of 3,084 gene expression patterns with significant associations with grade (P < .001; F test and/or OVA t test) is shown (left). Three clusters of highly correlated genes are shown: cell cycle genes (cluster 1, top), genes involved in protein synthesis (cluster 2, middle), and genes involved in liver-specific functions (cluster 3, bottom). Unigene names are given (Unigene build #161). Note that the majority of genes show gradual but continuous change from low dysplastic expression to high malignant expression, or high dysplastic expression to low malignant expression. LGDN, low-grade dysplastic nodule; HGDN, high-grade dysplastic nodule; G1-3, Edmondson grades 1-3.

Biological Properties of Grade-Associated Genes.

Hierarchical cluster analysis of the 3,084 grade-associated genes revealed several clusters of particularly highly correlated genes with ostensible biological implications, suggesting the coordination of certain biological activities with HCC progression. Genes of the top cluster (cluster 1) shown in Fig. 2 are characterized by a gradual increase in transcript levels, with the highest levels found consistently in G3 tumors. Using Gene Ontology terms, we observed enrichment in this cluster for genes associated with cell cycle functions, including numerous genes involved in DNA replication, chromatin remodeling, and cell proliferation. Comparative analysis of this cluster with the human cell cycle gene list defined by Whitfield et al.27 revealed further involvement of genes having periodic expression during the cell cycle (Fig. 2, cluster 1). Cluster 2 (middle cluster) is characterized by expression patterns showing a gradual increase from pre-neoplastic lesion to G2 HCC, followed by a relatively sharp increase in transcript abundance in most G3 tumors. These genes, which had the highest overall “within-cluster” correlation, are predominantly genes directly involved in protein synthesis, including ribosomal proteins, translation initiation and elongation factors, and constituents of the spliceosome. Finally, the genes comprising cluster 3 (Fig. 2, bottom cluster) are characterized by a gradual but large-magnitude decline in transcript levels from LGDNs to high-grade HCC, and are made up mostly of genes that have central roles in primary liver function or are expressed exclusively in hepatocytes. These include genes involved in fatty acid and lipid metabolism, detoxification pathways, and synthesis of complement and coagulation factors, suggesting a gradual loss of normal hepatocyte function coincident with progressive cellular de-differentiation.

Further examination of the grade-associated genes revealed a number of genes that, through altered expression, may contribute directly to the increasing malignant behavior of advancing HCC. Figure 3 shows representative function-associated clusters. For example, the top cluster shows 24 such genes known or suspected to play roles in oncogenic transformation or tumor suppression. In addition, several growth factors, genes involved in apoptosis, and cell adhesion molecules that might have potential roles in HCC development and progression through altered expression were also extracted via categorical analysis using Gene Ontology as shown in Fig. 3.

Figure 3.

Expression patterns of grade-associated genes with possible roles in HCC pathogenesis. Grade-correlated genes were classified according to Gene Ontology terms or through a search of the literature. Subsets of these genes are shown for the following categories: (1) oncogenes and tumor suppressors, (2) growth factors, (3) apoptosis, and (4) cell adhesion. Unigene cluster symbols and names are shown (Unigene build #161). LGDN, low-grade dysplastic nodule; HGDN, high-grade dysplastic nodule; G1-3, Edmondson grades 1-3.

Grade-Associated Genes Predict Stage of HCC Progression.

We next sought to determine whether we could identify a subset of genes that could accurately classify the specimens according to grade. We addressed two classification problems: (1) discriminating among LGDNs, HGDNs, or G1, and (2) discriminating among G1, G2, or G3, because these are the most relevant problems in HCC diagnosis. Treating these as one five-grade problem could limit the use of some genes that otherwise might perform well in the two smaller clinically relevant problems.

We compared three different classification methods (DLDA, SVM,24 and k-nearest neighbor28) for each of the two classification problems. Of note, approximately 4% of the values were missing and were therefore imputed according to the k-nearest neighbor imputation method.28 Classification accuracies were assessed using stratified three-fold cross-validation with 100 repetitions (see Patients and Methods).

The number of genes for each grade, p, was varied to find the optimal number of gene classifiers. Figure 4A shows the plot of classification accuracy as p was varied for DLDA and SVM classifiers. We found DLDA and two varieties of SVM (i.e., linear and RBF kernels) to be the most robust classifiers for both problems. (Note: because the k-nearest neighbor classifiers had inferior performance, only the SVM and DLDA accuracies are shown.) These results suggested that using 30 to 50 genes per grade was optimal. We therefore decided to use 40 genes (the 20 most significantly upregulated and the 20 most significantly downregulated in each grade), resulting in 120 total genes for each problem (Fig. 4B-C); that is, 120 “early-stage” genes for discriminating among early-stage samples, and 120 “late-stage” genes for discriminating among late-stage samples (Supplementary Tables 2 and 3), for a total of 240 grade classifier genes. As shown in Fig. 4, we would expect an approximately 95% classification accuracy in discriminating among early-stage samples and an approximately 91% classification accuracy in discriminating among late-stage samples.
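For readers unfamiliar with DLDA, the rule can be sketched in a few lines: each class is summarized by its per-gene mean, all classes share a per-gene pooled within-class variance, and a sample is assigned to the class with the smallest variance-scaled squared distance. This is a generic textbook sketch assuming equal class priors and nonzero variances, not the implementation used in the study; `fit_dlda`, `predict_dlda`, and the toy data are hypothetical.

```python
from statistics import mean

def fit_dlda(samples, labels):
    """Estimate per-class gene means and pooled within-class gene variances."""
    classes = sorted(set(labels))
    n_genes = len(samples[0])
    means = {
        c: [mean(s[j] for s, l in zip(samples, labels) if l == c)
            for j in range(n_genes)]
        for c in classes
    }
    dof = len(samples) - len(classes)
    pooled = [
        sum((s[j] - means[l][j]) ** 2 for s, l in zip(samples, labels)) / dof
        for j in range(n_genes)
    ]
    return means, pooled

def predict_dlda(model, x):
    """Assign x to the class minimizing sum_j (x_j - mean_cj)^2 / var_j."""
    means, pooled = model

    def score(c):
        return sum((xj - mj) ** 2 / vj
                   for xj, mj, vj in zip(x, means[c], pooled))

    return min(means, key=score)

# Toy two-gene example (assumed for illustration only)
train = [[0.0, 1.0], [0.2, 1.1], [2.0, -1.0], [2.2, -0.9]]
train_labels = ["LGDN", "LGDN", "G1", "G1"]
model = fit_dlda(train, train_labels)
call_a = predict_dlda(model, [0.1, 1.0])
call_b = predict_dlda(model, [2.1, -1.0])
```

Because the covariance matrix is restricted to its diagonal, the rule needs only one variance per gene, which keeps it stable when the number of genes is large relative to the number of samples.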

Figure 4.

Determination of optimal classification strategies and high-accuracy gene classifiers. (A) Classification accuracies as a function of gene (classifier) number are shown for each of three different classification methods (DLDA, SVM linear, and SVM radial), for each of the two grade problems: G1-G2-G3 and LGDN-HGDN-G1. (B,C) Genes were ranked according to OVA t test–derived P values for the LGDN-HGDN-G1 problem (B) and the G1-G2-G3 problem (C), and the top 20 most highly expressed and 20 most underexpressed genes in each grade class (i.e., 40 genes per grade class) were selected to constitute a 120-gene classifier for each problem. G1-3, Edmondson grades 1-3; LGDN, low-grade dysplastic nodule; HGDN, high-grade dysplastic nodule; DLDA, diagonal linear discriminant analysis; SVM, support vector machine.

The prediction confidence of a specimen can be assessed by the frequency with which it is correctly classified in 100 random partitions in three-fold cross-validation. A summary of the frequency of class assignments using the early- and late-stage genes is tabulated in Table 1. For the early-stage samples, 100% of the specimens were correctly classified the majority of the time by all three methods. For the late-stage samples, all but two arrays were classified correctly the majority of the time by all three methods. Specimens G2_02 and G3_08 were consistently misclassified into the adjacent lower grade.
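The "majority of the time" criterion can be made concrete with a small tally. The sketch below assumes the Table 1 entries are the number of times (out of 100 partitions) a specimen was assigned to each class; the helper name `majority_call` is hypothetical, and the counts are the DLDA values as read from the Table 1 rows for G2_02 and G1_09.

```python
def majority_call(counts):
    """Return the class assigned most often across the repeated partitions."""
    return max(counts, key=counts.get)

# DLDA assignment counts as read from Table 1
g2_02 = {"G1": 57, "G2": 43, "G3": 0}       # true grade: G2
g1_09 = {"LGDN": 0, "HGDN": 42, "G1": 58}   # true grade: G1

g2_02_call = majority_call(g2_02)   # called G1, the adjacent lower grade
g1_09_call = majority_call(g1_09)   # called G1, correct but with a thin margin
```

A specimen such as G1_09 counts as correctly classified under this criterion even though it was assigned to HGDN in a substantial minority of partitions.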

Table 1. LGDN-HGDN-G1 Confusion Matrix and G1-G2-G3 Confusion Matrix of the DLDA and SVM Classifiers

LGDN-HGDN-G1 problem

               Diagonal LDA          SVM Linear          SVM Radial
Sample     LGDN  HGDN    G1    LGDN  HGDN    G1    LGDN  HGDN    G1
LGDN_02      95     5     0      97     3     0      96     4     0
LGDN_03      65    35     0      80    20     0      76    24     0
LGDN_05     100     0     0     100     0     0     100     0     0
LGDN_07     100     0     0     100     0     0     100     0     0
LGDN_08      99     1     0      99     1     0      99     1     0
LGDN_09      99     1     0      98     2     0      97     3     0
LGDN_10     100     0     0     100     0     0      99     1     0
HGDN_01       1    93     6       3    57    40       3    62    35
HGDN_02       0   100     0       0    96     4       0    98     2
HGDN_05       0   100     0       0   100     0       0   100     0
HGDN_06       0   100     0       0    98     2       0    97     3
HGDN_07       0   100     0       0   100     0       0   100     0
HGDN_08       0   100     0       0   100     0       0   100     0
HGDN_10       0   100     0       1    99     0       3    97     0
G1_01         0     0   100       0     0   100       0     0   100
G1_03         0     0   100       0     0   100       0     0   100
G1_04         0     0   100       0     0   100       0     0   100
G1_05         2    10    88       0     0   100       1     0    99
G1_06         0     2    98       0     0   100       0     1    99
G1_07         0     2    98       0     9    91       0     7    93
G1_08         0     3    97       0     3    97       0     3    97
G1_09         0    42    58       1     6    93       1     7    92
G1_10         0     0   100       0     0   100       0     0   100

G1-G2-G3 problem

               Diagonal LDA          SVM Linear          SVM Radial
Sample       G1    G2    G3      G1    G2    G3      G1    G2    G3
G1_01        62    37     1      85    15     0      86    14     0
G1_03        94     6     0      82    18     0      87    13     0
G1_04        99     1     0      99     1     0      98     2     0
G1_05       100     0     0     100     0     0     100     0     0
G1_06       100     0     0     100     0     0     100     0     0
G1_07        88    12     0      95     5     0      98     2     0
G1_08        89    11     0      97     3     0      95     5     0
G1_09       100     0     0      99     1     0      99     1     0
G1_10       100     0     0     100     0     0     100     0     0
G2_01         0    95     5       0    97     3       0    94     6
G2_02*       57    43     0      60    40     0      66    34     0
G2_03         0   100     0       4    93     3       3    91     6
G2_04         1    99     0       5    94     1       5    95     0
G2_05         0   100     0       0    99     1       0    99     1
G2_06         0    92     8       0    90    10       0    86    14
G2_07         2    97     1      11    84     5      14    83     3
G2_08         0    89    11       0    68    32       0    62    38
G2_09         0    97     3       1    88    11       1    84    15
G2_10         1    99     0      12    88     0      18    82     0
G3_02         0     0   100       0     0   100       0     0   100
G3_04         0    17    83       0    22    78       0    26    74
G3_05         0     0   100       0     0   100       0     0   100
G3_06         0     1    99       0     7    93       0     7    93
G3_07         0     0   100       0     2    98       0     6    94
G3_08*        0    57    43       0    65    35       0    64    36
G3_09         0     2    98       0    10    90       0    11    89
G3_10         0     2    98       0     5    95       1     7    92

  • Abbreviation: LDA, linear discriminant analysis.

  • * Indicates misclassified sample.

We next extended our validation of the 240 classifier genes to an independent set of specimens consisting of 5 new samples and the 9 samples that were previously excluded from the initial analysis. As shown in Table 2, we were able to correctly classify all 5 of the new fresh samples. Furthermore, despite RNA quality concerns, the majority of the remaining 9 samples (7 of 9) were also classified correctly. These data, though limited by a relatively small test set, suggest that these 240 progression-associated genes could be clinically useful classifiers for assisting diagnosis of all stages of hepato-carcinogenesis.

Table 2. Good Overall Classification by DLDA on Two Independent Datasets

Dataset             Sample      Class   Prediction
New dataset         HCC_168     G1      G1
                    HCC_141     G2      G2
                    HCC_143     G1      G1
                    HCC_203     HGDN    HGDN
                    HCC_219     G2      G2
Excluded dataset    LGDN_01     LGDN    LGDN
                    LGDN_04     LGDN    LGDN
                    LGDN_06     LGDN    LGDN
                    HGDN_03     HGDN    HGDN
                    HGDN_04     HGDN    HGDN
                    HGDN_09     HGDN    HGDN
                    G1_02*      G1      G1
                    G3_01*      G3      G2
                    G3_03*      G3      G1

  • NOTE. The new dataset comprises five previously unanalyzed tumor samples; the excluded dataset includes the nine arrays that were previously excluded from analysis due to suboptimal RNA or hybridization features.

  • * Indicates misclassified sample.

Discussion

Patients with HCC have a poor prognosis because most HCCs are detected at a stage too late for curative treatment. Therefore, early detection of small HCC or precancerous lesions appears to be the best way to achieve better therapeutic results. However, morphological and molecular features of precancerous lesions are far from being fully elucidated. The terminology of nodular hepatocellular lesions adopted by the International Working Party of the 1995 World Congress of Gastroenterology suggests that there is a continuum in hepato-carcinogenesis that includes low-grade dysplastic nodules, HGDNs, and dysplastic nodules with microscopic foci of HCC, which may enlarge and replace the nodule giving rise to a small HCC, and finally advanced HCC.7, 9 Despite the fact that this group provided several morphological criteria to discriminate between well-differentiated HCC and HGDN and/or LGDN, they acknowledged that a strict line could not be drawn between premalignant and malignant lesions by simple microscopic observation.

The recent advance of DNA microarray technology, a high-throughput method of monitoring gene expression, has made it possible to analyze the expression of thousands of genes at once. Consequently, expression profiling by microarrays has been profitably applied to gene discovery and class determination in human cancers.29 To understand molecular changes associated with the developmental stages of HCC, we assessed gene expression profiles of the different histopathological stages of HCC, including LGDNs, HGDNs, and G1-G3 HCCs, using a high-density spotted oligo-nucleotide microarray analysis. We observed not only the clear separation of dysplastic nodule (CI) from overt cancer (CII) but also grade-specific subclusters of HCC in CII via unsupervised hierarchical clustering analysis (Fig. 1). These results indicate that there is a clear difference in molecular signature between each histological grade in the progression of HCC. However, there is some molecular heterogeneity in G1 HCC. Most G1 HCCs (6/9) had expression profiles showing more relation to the frank carcinoma and clustered adjacent to G2 and G3 HCCs as shown in Fig. 1; however, 3 cases of G1 HCCs (G1-5, G1-6, and G1-9) were grouped into precancerous nodules. Among these 3 cases, G1-9 was confirmed as eHCC upon histopathological review. In fact, histologically defined G1 HCC lesions can be further divided into small HCC with indistinctive margin (eHCC) and small nodular HCC with distinctive margin, with more than half of these encapsulated by a thin fibrous capsule.30 Unlike eHCC, which lacks an invasive growth pattern, the later lesion revealed tumor invasion into the portal vein and intrahepatic metastasis in 27% (G1-5) and 10% (G1-6).31 This heterogeneity strongly suggests that G1 HCC might border between pre-neoplastic lesion and outright carcinoma representing a transition state from dysplasia to carcinoma. 
Furthermore, we carefully analyzed whether the replicative state of HBV infection (as measured by HBV DNA levels in serum) could influence the resultant expression profiles. We were unable to find any significant correlation between our gene expression results and the replicative state of HBV in the samples, suggesting that the replicative state of HBV has little or no measurable effect on gene transcription in our HCC samples.
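
The unsupervised hierarchical clustering step discussed above can be illustrated with a minimal sketch. It is not the pipeline used in this study: the distance measure (1 minus Pearson correlation), the average-linkage rule, the function names, and the toy expression values are all assumptions chosen only to show how samples with similar profiles end up in the same cluster.

```python
import math
from itertools import combinations


def pearson_distance(a, b):
    """Return 1 - Pearson correlation between two expression profiles."""
    n = len(a)
    mean_a, mean_b = sum(a) / n, sum(b) / n
    cov = sum((x - mean_a) * (y - mean_b) for x, y in zip(a, b))
    var_a = sum((x - mean_a) ** 2 for x in a)
    var_b = sum((y - mean_b) ** 2 for y in b)
    return 1.0 - cov / math.sqrt(var_a * var_b)


def average_linkage_clusters(profiles, n_clusters):
    """Merge the two closest clusters repeatedly until n_clusters remain."""
    clusters = [[name] for name in profiles]

    def linkage(c1, c2):
        # Average pairwise distance between members of the two clusters.
        dists = [pearson_distance(profiles[a], profiles[b])
                 for a in c1 for b in c2]
        return sum(dists) / len(dists)

    while len(clusters) > n_clusters:
        i, j = min(combinations(range(len(clusters)), 2),
                   key=lambda p: linkage(clusters[p[0]], clusters[p[1]]))
        clusters[i] = clusters[i] + clusters[j]
        del clusters[j]  # j > i, so index i is unaffected
    return [sorted(c) for c in clusters]


# Hypothetical toy profiles (4 genes per sample), not data from the study.
toy_profiles = {
    "LGDN-a": [1.0, 1.2, 0.1, 0.0],
    "HGDN-a": [0.9, 1.1, 0.2, 0.1],
    "G2-a": [0.1, 0.0, 1.3, 1.1],
    "G3-a": [0.0, 0.2, 1.5, 1.2],
}
clusters = average_linkage_clusters(toy_profiles, n_clusters=2)
```

With these toy values the two dysplastic-nodule samples fall into one cluster and the two carcinoma samples into the other, loosely mirroring the CI/CII split; real data would of course involve thousands of genes and a dendrogram rather than a fixed cluster count.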

We also identified 3,084 grade-associated genes whose transcript levels were either positively or negatively correlated with tumor progression through a combination of one-way ANOVA and OVA unpooled t test. Functional analysis of these genes revealed discrete expression clusters representing grade-dependent biological properties of HCC, including cell proliferation, protein synthesis, and hepatocyte-specific function (Fig. 2). Using Gene Ontology terms, we performed categorical analysis according to gene function and extracted a number of well-known genes, such as tumor suppressors, oncogenes, growth factors, effectors of apoptosis, and cell adhesion molecules involved in cell–cell and cell–matrix interactions, whose expression patterns were associated with grade. For example, RARRES3 (retinoic acid receptor responder 3) is a class II tumor suppressor gene (i.e., downregulated in tumorigenesis rather than mutated) with growth-suppressive and apoptosis-inducing activity.32 It has previously been found to be downregulated in a manner correlated with progression of B-CLL33 and cellular de-differentiation in colorectal adenocarcinoma,34 consistent with our observation that this gene is downregulated in G2 and G3 HCCs. The majority of oncogenes and tumor suppressors identified here demonstrate expression patterns that change systematically from dysplasia to carcinoma, and in some cases alterations in expression are already detectable in the pre-neoplastic state. It is therefore interesting to speculate that these genes, acting together or separately, could be directly involved in common pathways of HCC pathogenesis in a grade-dependent fashion.
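
The two statistics named above can be sketched for a single gene as follows. This is an illustration under stated assumptions rather than the analysis code of the study: "OVA" is read here as one-versus-all (one grade compared against all remaining samples), "unpooled" is taken to mean the Welch form of the t statistic with separate variances, and the function names and expression values are hypothetical. No significance thresholds are shown because none are given in this section.

```python
from statistics import mean, variance


def one_way_anova_f(groups):
    """F statistic for one gene measured across several grade groups."""
    all_values = [v for g in groups for v in g]
    grand_mean = mean(all_values)
    k, n = len(groups), len(all_values)
    ss_between = sum(len(g) * (mean(g) - grand_mean) ** 2 for g in groups)
    ss_within = sum((v - mean(g)) ** 2 for g in groups for v in g)
    return (ss_between / (k - 1)) / (ss_within / (n - k))


def ova_unpooled_t(groups, target):
    """Welch t statistic: the target grade versus all other samples."""
    inside = groups[target]
    outside = [v for i, g in enumerate(groups) if i != target for v in g]
    # Unpooled standard error: each side keeps its own sample variance.
    se = (variance(inside) / len(inside)
          + variance(outside) / len(outside)) ** 0.5
    return (mean(inside) - mean(outside)) / se


# Hypothetical log-expression values of one gene in three grades.
gene_by_grade = [
    [1.0, 1.2, 0.8],  # e.g. G1
    [2.0, 2.1, 1.9],  # e.g. G2
    [3.0, 3.2, 2.8],  # e.g. G3
]
f_stat = one_way_anova_f(gene_by_grade)
t_low = ova_unpooled_t(gene_by_grade, target=0)
t_high = ova_unpooled_t(gene_by_grade, target=2)
```

In this toy case the gene rises steadily with grade, so the F statistic is large, the one-versus-all statistic is negative for the lowest grade, and it is positive for the highest grade. A gene would be called grade-associated only after comparing such statistics against a chosen significance cutoff.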

Owing to recent advances in diagnostic imaging techniques and increased clinical and pathological interest, small hepatocellular nodular lesions, even those less than 1 cm in size, are frequently detected in patients with cirrhosis who are monitored as high-risk patients. These nodules could be LGDNs, HGDNs, or well-differentiated small HCCs, and sonography- or CT-guided needle biopsies of these nodules are performed routinely for differential diagnosis. However, it is often difficult, even for a hepatopathologist, to differentiate among these lesions, especially in needle-biopsied specimens with limited material. For this reason, the discovery of objective molecular markers or classifier genes that will help to standardize the histological differential diagnosis of these nodules and lead to appropriate treatment is eagerly anticipated. In the present study, we identified a subset of genes that could accurately classify specimens according to histological grade. We treated classification as two separate problems rather than one five-grade problem: (1) discriminating among LGDNs, HGDNs, and G1 HCCs (early-stage lesions) and (2) discriminating among G1, G2, and G3 HCCs (late-stage lesions). We selected the 20 most highly expressed and the 20 most under-expressed genes in each grade class (i.e., 40 genes per grade class), resulting in 120 total genes for each problem. Indeed, of 23 pre-neoplastic lesions and G1 HCCs, none was misclassified by the chosen set of 120 early-stage–associated genes, whereas 2 of 27 samples were misclassified in the case of overt HCC (Table 1). We extended our validation analysis of the 240 outlier genes to an independent set of specimens consisting of 5 new samples and the 9 samples that had previously been excluded from analysis because of RNA quality concerns. We were able to correctly classify all samples except for two G3 cases, which were misclassified as G2 and G1, respectively.
Although more testing on a larger, independent set of tumors graded by a different pathologist will be necessary to establish the accuracy and clinical value of the classifier, these results imply that the set of 240 outlier genes could potentially be good classifiers, especially for distinguishing among LGDNs, HGDNs, and G1 HCCs, via both DLDA and SVM.
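
The gene-selection and classification procedure described above can be sketched on toy data. The sketch makes several assumptions that are not taken from the study: genes are ranked by the simple difference between the class mean and the mean of all other samples, only DLDA (diagonal linear discriminant analysis) is shown and SVM is omitted, and all function names and expression values are hypothetical.

```python
from statistics import mean


def select_outlier_genes(samples, labels, k):
    """Pick, per class, the k most over- and k most under-expressed genes."""
    n_genes = len(samples[0])
    chosen = set()
    for cls in sorted(set(labels)):
        def score(g):
            # Assumed ranking score: class mean minus mean of the rest.
            inside = [s[g] for s, l in zip(samples, labels) if l == cls]
            outside = [s[g] for s, l in zip(samples, labels) if l != cls]
            return mean(inside) - mean(outside)
        ranked = sorted(range(n_genes), key=score)
        chosen.update(ranked[:k] + ranked[-k:])
    return sorted(chosen)


def fit_dlda(samples, labels, genes):
    """Estimate class means and one pooled variance per selected gene."""
    classes = sorted(set(labels))
    means = {c: [mean([s[g] for s, l in zip(samples, labels) if l == c])
                 for g in genes] for c in classes}
    dof = len(samples) - len(classes)
    pooled = []
    for idx, g in enumerate(genes):
        ss = sum((s[g] - means[l][idx]) ** 2 for s, l in zip(samples, labels))
        pooled.append(ss / dof + 1e-9)  # tiny constant avoids division by zero
    return means, pooled


def predict_dlda(model, genes, sample):
    """Assign the class whose mean is nearest in variance-scaled distance."""
    means, pooled = model

    def dist(c):
        return sum((sample[g] - m) ** 2 / v
                   for g, m, v in zip(genes, means[c], pooled))
    return min(means, key=dist)


# Hypothetical training data: 6 samples, 4 genes, 3 early-stage classes.
train = [
    [5.0, 1.0, 1.0, 3.0], [5.2, 1.1, 0.9, 3.0],  # LGDN
    [1.0, 5.0, 1.1, 3.1], [1.1, 5.1, 1.0, 2.9],  # HGDN
    [1.0, 1.0, 5.0, 3.0], [0.9, 1.2, 5.1, 3.1],  # G1
]
train_labels = ["LGDN", "LGDN", "HGDN", "HGDN", "G1", "G1"]

genes = select_outlier_genes(train, train_labels, k=1)
model = fit_dlda(train, train_labels, genes)
predicted = predict_dlda(model, genes, [1.0, 4.8, 1.2, 3.0])
```

Here `k=1` stands in for the 20 over-expressed and 20 under-expressed genes per class; the uninformative fourth gene is never selected, and the held-out toy sample is assigned to the class whose mean profile it most resembles. An honest accuracy estimate would require repeating gene selection inside each cross-validation fold so that test samples never influence which genes are chosen.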

In conclusion, despite numerous investigations of hepato-carcinogenesis, only limited or incomplete data are available regarding gene expression profiles during the development and progression of HCC in humans.18, 19, 35–37 Systemic approaches, such as the simultaneous evaluation of genome-wide transcripts and regulatory pathways in precancerous lesions and HCCs, are necessary to gain much-needed molecular insight into hepato-carcinogenesis. We uncovered the molecular signatures of pre-neoplastic lesions and of early- and advanced-stage HCC. Our 240 classifier genes for distinguishing the early and advanced stages of HCC exhibited high fidelity in classifying lesions from the pre-neoplastic stage through HCC. Through further analysis of these outlier genes and intensive clinical validation, we hope to identify clinically useful biomarkers that will facilitate early detection of liver cancer, and perhaps further elucidate the underlying molecular pathology of HCC.
