Identification of discriminators of hepatoma by gene expression profiling using a minimal dataset approach



The severity of hepatocellular carcinoma (HCC) and the lack of good diagnostic markers and treatment strategies have rendered the disease a major challenge. Previous microarray analyses of HCC were restricted to the selected tissue sample sets without validation on an independent series of tissue samples. We describe an approach to the identification of a composite discriminator cassette by intersecting different microarray datasets. We studied the global transcriptional profiles of matched HCC tumor and nontumor liver samples from 37 patients using complementary DNA (cDNA) microarrays. Application of nonparametric Wilcoxon statistical analyses (P < 1 × 10−6) and the criterion of a 1.5-fold differential gene expression change resulted in the identification of 218 genes, including BMI-1, ERBB3, and those involved in the ubiquitin-proteasome pathway. Elevated ERBB2 and epidermal growth factor receptor (EGFR) expression levels were detected in ERBB3-expressing tumors, suggesting the presence of ERBB3 cognate partners. Comparison of our dataset with an earlier study of approximately 150 tissue sets identified multiple overlapping discriminator markers, suggesting good concordance of data despite differences in patient populations and technology platforms. These overlapping discriminator markers could distinguish HCC tumor from nontumor liver samples with reasonable precision and the features were unlikely to appear by chance, as measured by Monte Carlo simulations. More significantly, validation of the discriminator cassettes on an independent set of 58 liver biopsy specimens yielded greater than 93% prediction accuracy. In conclusion, these data indicate the robustness of expression profiling in marker discovery using limited patient tissue specimens and identify novel genes that are highly likely to be excellent markers for HCC diagnosis and treatment. Supplementary material for this article can be found on the HEPATOLOGY website. (HEPATOLOGY 2004;39:944–953.)

Hepatocellular carcinoma (HCC) is the most common primary malignant tumor of the liver, accounting for more than 70% of liver cancers worldwide.1 A major risk factor associated with the development of HCC is hepatitis B virus (HBV) infection. Death is usually due to liver failure associated with cirrhosis and/or rapid outgrowth of multiple nodules. Approximately 0.25 to 1 million new cases of HCC are diagnosed each year, and the cancer is especially prevalent in Southeast Asia, China, and sub-Saharan Africa. Although surgical resection is considered to be the main curative treatment, only 10% to 15% of cases are suitable for surgery at the time of presentation. This is because either the disease is detected at an advanced stage at presentation or the underlying poor liver functional reserve precludes surgical intervention.

The lack of molecular markers that characterize tumor formation poses a major problem to effective diagnosis and prognosis of HCC. Current diagnosis of HCC relies on the presence of a liver mass on radiologic investigations and the detection of an elevated level of serum alpha fetoprotein (AFP).2 However, an elevated level of AFP is not exclusive to HCC and has been observed in benign hepatic disease, such as cirrhosis and other cancers such as germ cell cancer.3 Treatment of HCC includes the use of interferon therapy and antiviral drugs, but the results are unpredictable and the effectiveness may be limited.2, 4 Genome-wide analysis by microarray5 offers a systematic approach to uncover comprehensive information about the transcription profile of HCC. Previous studies have used microarrays to address the changes in gene expression of HCC.6–11 One major study was published recently by Chen et al.12 that investigated more than 100 liver tissue specimens. However, these reports were restricted to the tissue samples selected for each study and there was an absence of validation of their findings on an independent series of tissue samples, limiting the potential significance and utility of the data.

In the current study, we used complementary DNA (cDNA) microarrays to examine the global cellular changes in matched pairs of HBV-associated HCC tumor and nontumor liver tissue specimens of 37 patients. In addition, gene expression patterns between primary HCC tumors and liver cancer cell lines were examined for possible biologic variation. A comparison was performed with other independent microarray studies of HCC in an attempt to identify a composite cassette of discriminator genes that could potentially serve as tumor markers. To validate the utility of these discriminator cassettes to distinguish tumor from nontumor, prediction accuracy was assessed on an entirely independent set of 58 liver biopsy samples. These experiments indicated that array-based expression profiling on limited tissue sets generates robust data and identified novel molecular markers that are highly likely to be excellent markers for HCC diagnosis and targets for new disease management strategies.


Abbreviations: HCC, hepatocellular carcinoma; HBV, hepatitis B virus; AFP, alpha fetoprotein; cDNA, complementary DNA; kNN, k-nearest neighbor; RT-PCR, reverse-transcription polymerase chain reaction; IGFBP3, insulin-like growth factor binding protein 3; EGFR, epidermal growth factor receptor; PBGD, porphobilinogen deaminase.

Patients and Methods

RNA Isolation, RNA Amplification, and cDNA Microarray Hybridization.

All 37 patients (from whom the test set of tissue specimens was derived) had HBV-associated HCC and underwent curative liver resection. The paired samples of tumor and corresponding nontumor tissue specimens were obtained from the resected liver specimen. The validation tissue set comprised 58 liver biopsy samples from an independent cohort of 29 patients who also had HCC associated with HBV and underwent curative liver resection. Informed consent was obtained from the patients. The institutional research and ethics committee approved the study.

Tissue specimens were snap frozen in liquid nitrogen and stored at −150°C. A small section of each specimen was sampled and total RNA was isolated using Trizol reagent (Life Technologies, Bethesda, MD) according to the manufacturer's instructions. The integrity of the RNA specimen was verified by gel electrophoresis. The human liver cancer cell lines used in this study were PLC/PRF/5, HA22T, Huh1, Huh4, Tong, Hep3B, SNU182, SNU449, SNU475, HepG2, Huh6, Huh7, SKHep1, and Mahlavu. All cell lines were cultured under conditions recommended by the American Type Culture Collection (VA).

Because the amount of total RNA sample obtained from the limited tissue material was insufficient (approximately 4-15 μg) to be put directly on the array, RNA was linearly amplified using a procedure modified from Eberwine et al.13 Human universal reference RNA (Stratagene, La Jolla, CA), which comprised total RNA samples from 10 different human cell lines, was amplified and used as the reference for cDNA microarray analysis. Approximately 9,000 human cDNA features (Incyte Genomics, Palo Alto, CA) were spotted onto poly-L-lysine–coated slides using an OmniGrid arrayer (GeneMachines). Probes were generated from the amplified RNA material and hybridized to the chip as described previously.14 To minimize the effects of labeling biases, reciprocal dye swap labeling experiments were performed for each sample.

Data Analysis.

Raw data were analyzed on GenePix analysis software version 3.0 (Axon Instruments, Burlingame, CA) and uploaded to a relational database. The cDNA clones used for the microarray are represented by their UniGene identifiers. For each array, the logarithmic expression ratios for the spots were normalized. In addition, spots that did not meet our filtering criteria (see supplementary material for this article on the HEPATOLOGY website) were excluded, resulting in the inclusion of 8,716 features for subsequent analysis. Statistical comparison of genes between HCC tumor and nontumor specimens was performed by the Wilcoxon rank-sum nonparametric test. To evaluate gene expression patterns, hierarchical clustering (using the one minus Pearson's correlation metric and average linkage15) and multidimensional scaling were performed on normalized data (mean = 0, SD = 1). Functional characterization of genes was based on gene ontology16 and on other published works in the PubMed database.
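As an illustration of the normalization and similarity measure described above, the following minimal Python sketch standardizes an expression profile to mean 0 and SD 1 and computes one minus Pearson's correlation between two profiles. The function names and toy values are hypothetical, and the use of the population SD is an assumption; it is not the analysis software used in the study.

```python
import math

def standardize(values):
    """Rescale a profile to mean 0 and SD 1 (population SD; an assumption)."""
    n = len(values)
    mean = sum(values) / n
    sd = math.sqrt(sum((v - mean) ** 2 for v in values) / n)
    if sd == 0:
        raise ValueError("a constant profile cannot be standardized")
    return [(v - mean) / sd for v in values]

def pearson_distance(x, y):
    """One minus Pearson's correlation between two equal-length profiles."""
    zx, zy = standardize(x), standardize(y)
    r = sum(a * b for a, b in zip(zx, zy)) / len(zx)
    return 1.0 - r

# Hypothetical log expression ratios for two samples across four features.
a = [0.2, 1.1, -0.4, 2.0]
b = [0.1, 0.9, -0.6, 1.8]

print(pearson_distance(a, b))                 # near 0: similar profiles
print(pearson_distance(a, [-v for v in a]))   # near 2: opposite profiles
```

The distance ranges from 0 (perfectly correlated profiles) to 2 (perfectly anticorrelated profiles), so it depends on the shape of a profile rather than on its absolute magnitude.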

Three different sets of Monte Carlo simulations17 were performed to (1) measure the quality of a set of selected gene features to be used as potential markers (Pa); (2) determine whether the set of genes observed to perform well as tumor discriminators could appear merely by chance (Pb); and (3) approximate the significance of the number of observed overlapping genes after intersection of the important gene lists derived from two independent groups (i.e., the current study and the Chen et al. study; Pc). To validate the utility of the various expression cassettes to distinguish HCC tumor from nontumor liver specimens, the prediction accuracy of each discriminator cassette was assessed on an independent tissue set comprising 58 liver clinical biopsy specimens from 29 patients using a k-nearest neighbor (kNN) classification algorithm (k = 3) that employs Pearson correlation to measure the similarity between expression profiles. The algorithm was trained against the dataset comprising 74 tissue samples from 37 patients before testing against the new tissue set.
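The kNN classification step described above can be sketched as follows. This is a minimal illustration under stated assumptions, not the implementation used in the study: the function names, the majority-vote rule, and the toy training profiles are hypothetical, and only k = 3 and the use of Pearson correlation as the similarity measure come from the text.

```python
import math
from collections import Counter

def pearson(x, y):
    """Pearson correlation between two equal-length expression profiles."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)

def knn_predict(train_profiles, train_labels, query, k=3):
    """Label a query profile by majority vote of its k most correlated neighbors."""
    ranked = sorted(range(len(train_profiles)),
                    key=lambda i: pearson(train_profiles[i], query),
                    reverse=True)
    votes = Counter(train_labels[i] for i in ranked[:k])
    return votes.most_common(1)[0][0]

# Hypothetical training set: three tumor and three nontumor profiles.
train = [[2.0, -1.5, 1.0], [1.8, -1.7, 1.2], [2.2, -1.2, 0.7],
         [-1.9, 1.6, -0.8], [-2.1, 1.4, -1.1], [-1.7, 1.9, -0.6]]
labels = ["tumor", "tumor", "tumor", "nontumor", "nontumor", "nontumor"]

print(knn_predict(train, labels, [1.5, -1.0, 0.9]))
print(knn_predict(train, labels, [-1.0, 1.2, -0.3]))
```

With two classes and an odd k, the vote cannot tie, which is one practical reason to choose k = 3.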

Real-Time Semiquantitative Reverse-Transcription Polymerase Chain Reaction (RT-PCR).

Total RNA samples were analyzed for the expression levels of selected genes by real-time semiquantitative RT-PCR using the LightCycler RNA amplification kit SYBR Green I on the LightCycler (Roche, Basel, Switzerland) according to the manufacturer's instructions. Data are presented as the level of gene expression in each HCC tumor specimen relative to its corresponding nontumor liver specimen.


Results

Assessment of Global Gene Expression Differences Between HCC Tumor and Nontumor Liver Specimens.

The gene expression patterns of primary HCC tumor specimens and the corresponding nontumor liver tissue specimens from 37 patients were examined by cDNA microarray. First, we assessed the overall natural patterns of gene expression in the HCC tumor and nontumor liver tissue specimens based on unsupervised hierarchical clustering. Analysis of the variance in expression levels for each gene across all the tissue specimens indicated that 500 gene features (containing 493 unique UniGenes) showed the largest variability across both HCC tumor and nontumor liver tissue specimens (Fig. 1). Included in this list are AFP, an often used prognostic marker for HCC, and other genes associated with HCC such as HGF and MYC. Hierarchical clustering analysis based on these highly variant genes showed two main clusters, one representing the HCC tumor specimens and the other the nontumor liver tissue specimens, with only 6 of 37 HCC tumor specimens misclassified as nontumors (Supplementary Fig. 1B). Thus, the molecular configuration of HCC can be readily distinguished from that of nontumor liver specimens with minimal data manipulation.
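The unsupervised step described above (ranking features by variance, then clustering samples with average linkage on a one minus Pearson's correlation distance) can be sketched in a few lines of Python. The function names and the four-sample toy matrix are hypothetical, and the naive agglomeration loop is an assumption chosen for readability rather than the clustering software used in the study.

```python
import math
from itertools import combinations

def variance(values):
    """Sample variance of one feature across all specimens."""
    m = sum(values) / len(values)
    return sum((v - m) ** 2 for v in values) / (len(values) - 1)

def top_variable_features(matrix, n_top):
    """matrix maps feature name -> expression values; return the n_top most variable."""
    return sorted(matrix, key=lambda f: variance(matrix[f]), reverse=True)[:n_top]

def pearson_distance(x, y):
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return 1.0 - sxy / math.sqrt(sxx * syy)

def average_linkage(profiles, n_clusters=2):
    """Merge the closest pair of clusters until n_clusters remain."""
    dist = {(i, j): pearson_distance(profiles[i], profiles[j])
            for i, j in combinations(range(len(profiles)), 2)}
    clusters = [[i] for i in range(len(profiles))]

    def linkage(a, b):
        # Average of all pairwise distances between members of a and b.
        return sum(dist[min(i, j), max(i, j)] for i in a for j in b) / (len(a) * len(b))

    while len(clusters) > n_clusters:
        p, q = min(combinations(range(len(clusters)), 2),
                   key=lambda pq: linkage(clusters[pq[0]], clusters[pq[1]]))
        merged = clusters[p] + clusters[q]
        clusters = [c for k, c in enumerate(clusters) if k not in (p, q)] + [merged]
    return clusters

# Hypothetical data: samples 0-1 are tumor-like, samples 2-3 are nontumor-like.
matrix = {
    "g1": [2.0, 1.8, -1.9, -2.1],
    "g2": [-1.5, -1.7, 1.6, 1.4],
    "g3": [0.1, 0.0, 0.1, 0.0],    # nearly flat, dropped by the variance filter
    "g4": [1.0, 1.2, -0.8, -1.1],
}
top = top_variable_features(matrix, 3)
profiles = [[matrix[f][s] for f in top] for s in range(4)]
clusters = average_linkage(profiles, n_clusters=2)
print(top, clusters)
```

In the study the filter retained the 500 most variable of 8,716 features; the toy example keeps 3 of 4 only to make the mechanics visible.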

Figure 1.

Natural patterns of gene expression differences between HCC tumor and nontumor liver tissue specimens based on unsupervised clustering. Plot showing the variance of expression value for each of the gene features across all the HCC tumor and nontumor liver tissue specimens. Dotted line indicates the 500 most variable gene features.

Second, to investigate differential gene expression patterns between HCC tumor and nontumor liver specimens, we used the Wilcoxon rank-sum test and identified the top 2.5% candidate genes that displayed the smallest (best) P value scores (P < 1 × 10−6) and at least a 1.5-fold change in gene expression. For these 218 genes, false discovery rate analysis indicates a false-positive error of less than 0.4%. Multidimensional scaling analysis based on these outliers indicated that the HCC tumors were a more heterogeneous population than the nontumor liver tissue specimens (Fig. 2). Cancer cell lines derived from the primary tumor have traditionally been used as in vitro model systems for investigating the function of genes in the in vivo tumor environment. We asked how the expression pattern of the same 218 genes would appear in 14 established human liver cancer cell lines. It is apparent that the cell lines exhibited gene expression profiles that were different from the clinical HCC tumor and nontumor liver tissue specimens (Fig. 2), suggesting that these cell lines may have accumulated additional genetic or epigenetic alterations in culture.
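The two selection criteria described above (a rank-sum P value below 1 × 10−6 and at least a 1.5-fold change) can be sketched as follows. This is a minimal illustration under stated assumptions: the P value uses the large-sample normal approximation without tie or continuity correction, the fold change is taken as the difference in mean log2 ratios, and all function names and toy values are hypothetical. The text does not state how either quantity was computed in the study.

```python
import math

def average_ranks(values):
    """Ranks starting at 1, with tied values sharing their average rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg = (i + j) / 2.0 + 1.0
        for k in range(i, j + 1):
            ranks[order[k]] = avg
        i = j + 1
    return ranks

def rank_sum_p(x, y):
    """Two-sided Wilcoxon rank-sum P value by normal approximation."""
    n1, n2 = len(x), len(y)
    ranks = average_ranks(list(x) + list(y))
    w = sum(ranks[:n1])
    mean_w = n1 * (n1 + n2 + 1) / 2.0
    sd_w = math.sqrt(n1 * n2 * (n1 + n2 + 1) / 12.0)
    z = (w - mean_w) / sd_w
    return math.erfc(abs(z) / math.sqrt(2.0))

def is_outlier(tumor_log2, nontumor_log2, p_cut=1e-6, fold=1.5):
    """Apply both criteria to one gene's log2 expression ratios."""
    diff = (sum(tumor_log2) / len(tumor_log2)
            - sum(nontumor_log2) / len(nontumor_log2))
    return (rank_sum_p(tumor_log2, nontumor_log2) < p_cut
            and abs(diff) >= math.log2(fold))

# Hypothetical gene measured in 37 tumor and 37 nontumor specimens.
tumor = [1.0 + 0.01 * i for i in range(37)]
nontumor = [-1.0 + 0.01 * i for i in range(37)]
print(rank_sum_p(tumor, nontumor), is_outlier(tumor, nontumor))
```

Requiring both criteria excludes genes whose difference is statistically consistent but too small in magnitude to be useful as a marker.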

Figure 2.

Significant gene differential expression between HCC tumor and nontumor liver tissue specimens and comparison with liver cancer cell lines (P < 1 × 10−6, approximately 1.5-fold change). Multidimensional scaling plot illustrates the ability of these 218 outlier genes to separate HCC tumor specimens (red circles) from nontumor liver tissue specimens (green circles). The multidimensional plot also shows how different liver cancer cell lines (yellow circles) are from the clinical tissue samples.

Identification of Gene Clusters Differentially Expressed in HCC Tumor Tissue Specimens.

Among the 218 significant genes that distinguished HCC tumors from nontumor liver tissue specimens, more genes were observed to be overexpressed than underexpressed in the malignant tissue specimens (Supplementary Fig. 1). Mapping of the chromosomal location of these 218 unique outliers indicated that a disproportionate number of genes was located on chromosome 1 (Fig. 3A), particularly in the 1q region, and that the majority of these genes were more highly expressed in the tumor tissue specimens. This result correlated well with previous reports of frequent amplification on chromosome 1q in HCC tumor specimens.18 Further characterization of these outlier genes revealed many genes whose roles in HCC have yet to be fully understood. A substantial proportion of genes were involved in transport (e.g., PEA15), RNA processing (e.g., RDBP), and metabolic processes (e.g., NME1) and showed increased expression in HCC tumor specimens, possibly indicating accelerated rates of metabolism (Fig. 3B, Table 1, Supplementary Table 2). Several genes (e.g., SMT3H1) were members of the ubiquitin-proteasome pathway, suggesting considerable deregulation of this pathway in HCC. Transcription factors (e.g., ESR1) and genes involved in controlling growth and differentiation (e.g., GRN) and signal transduction (e.g., CSTB) formed the other dominant gene groups. Notably, the polycomb group protein BMI-1, which is believed to be an oncogene, was consistently expressed at much higher levels in HCC tumor specimens (Fig. 4). BMI-1 expression level is elevated in various tumors including Hodgkin's disease19 but it has not been previously studied in HCC. In addition, there were a number of genes (e.g., MAWBP, AD24) that had no known functions.

Genes that have been studied in HCC previously and were represented in our outlier list included HDGF and GHR.20, 21 A number of the differentially expressed genes such as MDK and CDC23 are consistent with those reported for cancer cells.22, 23 It is noteworthy that the list of outlier genes did not include AFP, which was elevated in a small number of our HCC tumor samples. This agrees well with previous reports that AFP levels were variably elevated in HCC12 and that AFP was present in approximately 50% of HCC.24

Figure 3.

Characterization of differentially expressed genes in HCC tumor specimens (P < 1 × 10−6, approximately 1.5-fold change). (A) Chromosomal distribution of the 218 outlier genes. The dark and light shaded bars represent genes that are at least 1.5-fold up-regulated and down-regulated, respectively, in HCC tumor specimens relative to nontumor liver samples. (B) Functional characterization of the outlier genes based on gene ontology and published works.

Table 1. Selected Genes Significantly Differentially Expressed in HCC Tumors

| Function | Gene Symbol | Gene Name | UniGene | Expression Change in HCC Tumor* |
| Transcription factor | ILF2 | Interleukin enhancer binding factor 2, 45 kD | Hs. 75117 | |
| | BMI1 | Murine leukemia viral (bmi-1) oncogene homolog | Hs. 431 | |
| | TAF9 | TAF9 RNA polymerase II, TATA box binding protein (TBP)-associated factor, 32 kD | Hs. 60679 | |
| | ZNF146 | Zinc finger protein 146 | Hs. 301819 | |
| | CHD4 | Chromodomain helicase DNA binding protein 4 | Hs. 74441 | |
| | NR4A1 | Nuclear receptor subfamily 4, group A, member 1 | Hs. 1119 | |
| | ZNF238 | Zinc finger protein 238 | Hs. 69997 | |
| | FOSB | FBJ murine osteosarcoma viral oncogene homolog B | Hs. 75678 | |
| RNA processing | H2AFY | H2A histone family, member Y | Hs. 75258 | |
| | SNRPB | Small nuclear ribonucleoprotein polypeptides B and B1 | Hs. 83753 | |
| | RPS7 | Ribosomal protein S7 | Hs. 301547 | |
| | MRPS14 | Mitochondrial ribosomal protein S14 | Hs. 247324 | |
| | SNRPD2 | Small nuclear ribonucleoprotein D2 polypeptide (16.5 kD) | Hs. 53125 | |
| | NCL | Nucleolin | Hs. 79110 | |
| | RPS10 | Ribosomal protein S10 | Hs. 76230 | |
| | RPL6 | Ribosomal protein L6 | Hs. 349961 | |
| | SNRPE | Small nuclear ribonucleoprotein polypeptide E | Hs. 334612 | |
| | SF3B4 | Splicing factor 3b, subunit 4, 49 kD | Hs. 25797 | |
| | RDBP | RD RNA-binding protein | Hs. 106061 | |
| | SNRPF | Small nuclear ribonucleoprotein polypeptide F | Hs. 105465 | |
| | RPS10 | Ribosomal protein S10 | Hs. 76230 | |
| DNA replication/repair | ADPRT | ADP-ribosyltransferase (NAD+; poly(ADP-ribose) polymerase) | Hs. 177766 | |
| | PRKDC | Protein kinase, DNA-activated, catalytic polypeptide | Hs. 155637 | |
| | SMC4L1 | SMC4 (structural maintenance of chromosomes 4, yeast)-like 1 | Hs. 50758 | |
| | FEN1 | Flap structure-specific endonuclease 1 | Hs. 4756 | |
| | MCM2 | Minichromosome maintenance deficient (S. cerevisiae) 2 (mitotin) | Hs. 57101 | |
| | HAT1 | Histone acetyltransferase 1 | Hs. 13340 | |
| Cell growth/differentiation | GPC3 | Glypican 3 | Hs. 119651 | |
| | MDK | Midkine (neurite growth-promoting factor 2) | Hs. 82045 | |
| | HDGF | Hepatoma-derived growth factor (high-mobility group protein 1-like) | Hs. 89525 | |
| | TP53BP2 | Tumor protein p53-binding protein, 2 | Hs. 44585 | |
| | CDC23 | CDC23 (cell division cycle 23, yeast, homolog) | Hs. 153546 | |
| | IGFBP3 | Insulin-like growth factor binding protein 3 | Hs. 77326 | |
| Immune response | TMPO | Thymopoietin | Hs. 11355 | |
| | IGKC | Immunoglobulin kappa constant | Hs. 156110 | |
| | IGHG3 | Immunoglobulin heavy constant gamma 3 (G3m marker) | Hs. 300697 | |
| | IGJ | Immunoglobulin J polypeptide, linker protein for immunoglobulin alpha and mu polypeptides | Hs. 76325 | |
| Cell adhesion | LAMR1 | Laminin receptor 1 (67 kD, ribosomal protein SA) | Hs. 181357 | |
| | CAPZA2 | Capping protein (actin filament) muscle Z-line, alpha 2 | Hs. 75546 | |
| | ARHE | Ras homolog gene family, member E | Hs. 6838 | |
| Signal transduction | CAP2 | Adenylyl cyclase-associated protein 2 | Hs. 296341 | |
| | CALM2 | Calmodulin 2 (phosphorylase kinase, delta) | Hs. 182278 | |
| | LASP1 | LIM and SH3 protein 1 | Hs. 334851 | |
| | SHC1 | SHC (Src homology 2 domain-containing) transforming protein 1 | Hs. 81972 | |
| | RGS5 | Regulator of G-protein signalling 5 | Hs. 24950 | |
| | HAX1 | HS1 binding protein | Hs. 15318 | |
| | ERBB3 | v-erb-b2 avian erythroblastic leukemia viral oncogene homolog 3 | Hs. 199067 | |
| Ubiquitin-proteasome pathway | UBD | Diubiquitin | Hs. 44532 | |
| | USP14 | Ubiquitin specific protease 14 (tRNA-guanine transglycosylase) | Hs. 75981 | |
| | PSMA1 | Proteasome (prosome, macropain) subunit, alpha type, 1 | Hs. 82159 | |
| | PSMB4 | Proteasome (prosome, macropain) subunit, beta type, 4 | Hs. 89545 | |
| Transport | CCT5 | Chaperonin containing TCP1, subunit 5 (epsilon) | Hs. 1600 | |
| | CCT3 | Chaperonin containing TCP1, subunit 3 (gamma) | Hs. 1708 | |
| | HSPA5 | Heat shock 70 kD protein 5 (glucose-regulated protein, 78 kD) | Hs. 75410 | |
| | XPO1 | Exportin 1 (CRM1, yeast, homolog) | Hs. 79090 | |
| | NUCB2 | Nucleobindin 2 | Hs. 3164 | |
| | ATP6IP1 | ATPase, H+ transporting, lysosomal interacting protein 1 | Hs. 6551 | |
| | AP3S1 | Adaptor-related protein complex 3, sigma 1 subunit | Hs. 80917 | |
| | VDAC2 | Voltage-dependent anion channel 2 | Hs. 78902 | |
| Metabolism | NME1 | Non-metastatic cells 1, protein (NM23A) expressed in | Hs. 118638 | |
| | DPM1 | Dolichyl-phosphate mannosyltransferase polypeptide 1, catalytic subunit | Hs. 5085 | |
| | ACLY | ATP citrate lyase | Hs. 174140 | |
| | GCN1L1 | GCN1 (general control of amino-acid synthesis 1, yeast)-like 1 | Hs. 75354 | |
| | NAT2 | N-acetyltransferase 2 (arylamine N-acetyltransferase) | Hs. 2 | |
| | CYP2C8 | Cytochrome P450, subfamily IIC (mephenytoin 4-hydroxylase), polypeptide 8 | Hs. 174220 | |
| | CYP2E | Cytochrome P450, subfamily IIE (ethanol-inducible) | Hs. 75183 | |
| Unknown | C20orf24 | Chromosome 20 open reading frame 24 | Hs. 184062 | |
| | C1orf9 | Chromosome 1 open reading frame 9 | Hs. 108636 | |
| | LOC51235 | Hypothetical protein | Hs. 181444 | |
| | FLJ12666 | Hypothetical protein FLJ12666 | Hs. 23767 | |

*Gene expression level showing at least 1.5-fold change in HCC tumor relative to nontumor liver tissues (P < 1 × 10−6). (The full gene list with GenBank accession numbers is available in the supplementary data.)

Figure 4.

Expression of BMI-1 in HCC tumor specimens as determined by cDNA microarray analysis. Data are presented as the level of expression (log base 2) in each HCC tumor specimen with respect to the corresponding nontumor liver sample.

To validate our microarray data, real-time RT-PCR analysis was performed for insulin-like growth factor binding protein 3 (IGFBP3) and ERBB3 in all 37 matched HCC tumor and nontumor liver samples. The results of real-time RT-PCR analyses of IGFBP3 and ERBB3 were consistent with previous reports examining these individual markers.25, 26 For example, IGFBP3 expression was diminished in 35 of 37 HCC tumors relative to their corresponding nontumor liver tissue specimens (Fig. 5A), whereas ERBB3 expression was elevated in 34 of 37 tumor samples (Fig. 5B). We observed good concordance in fold changes between microarray data and real-time RT-PCR results, and the magnitude of fold change obtained by PCR was similar or higher (as much as fivefold in some cases, given the sensitivity of PCR). Because ERBB3 is defective in tyrosine kinase activity and requires dimerization with other receptors, possibly another member of the ERBB family,27 we tested the hypothesis that HCC tumors expressing high levels of ERBB3 were associated with high expression of ERBB2 or epidermal growth factor receptor (EGFR). The expression of ERBB2 was elevated in 12 of 37 tumor specimens, whereas high EGFR expression was found in 15 of 37 tumor specimens (Fig. 5B). Among the HCC tumors in the top 50th percentile of ERBB3 expression, we found a significant concomitant increase in ERBB2 expression (t-test P = approximately .0026) but no association with high EGFR expression (t-test P = approximately .31). Thus, cognate partners of ERBB3 were present in tumor specimens expressing high levels of ERBB3.

Figure 5.

Real-time RT-PCR analysis of IGFBP3, ERBB3, ERBB2, and EGFR in HCC tumor samples. The gene expression patterns for (A) IGFBP3 and (B) ERBB3, ERBB2, and EGFR in all 37 HCC tumor specimens and their corresponding nontumor liver tissue specimens were examined. All data were normalized to the amount of the housekeeping gene porphobilinogen deaminase and are presented as relative fold expression change (log base 2) in each HCC tumor specimen with respect to its corresponding nontumor liver counterpart. A positive value depicts a higher expression level, whereas a negative value depicts a lower expression level in the tumor relative to the nontumor specimen.

Validation of HCC Tumor Discriminator Expression Cassettes.

To assess the validity of our expression cassette of genes for distinguishing HCC tumor from nontumor liver tissue specimens, we explored the intersection of our data with those published in the literature. In the study by Chen et al.,12 HCC tumor specimens from 82 patients and nontumor liver tissue specimens from 74 patients were examined and 1,648 features (containing 1,449 unique UniGenes) were reported to discriminate tumor from nontumor, of which 600 features (containing 540 unique UniGenes) formed the best discriminators. First, we asked whether any of the Chen et al. dataset of 600 most differentially expressed genes was included in our array of 8,716 features (containing 7,521 unique UniGenes) based on UniGene identifiers. A total of 265 features (containing 245 unique UniGenes) from our microarray were observed to overlap (Supplementary Table 3). Hierarchical clustering analysis based on the expression levels of these 265 “overlap” features separated our tissue set into two distinct groups of tumor and nontumor specimens, with five tissue samples misclassified (Fig. 6). Such clustering was significant (Pa < 1 × 10−6) based on random permutation testing of sample labels. The likelihood of a randomly chosen set of 265 features producing five or fewer samples misclassified was low (Pb = 1.5 × 10−3). Therefore, these 265 overlap features could distinguish HCC tumor from nontumor liver specimens with reasonable precision, and the features were unlikely to appear by chance.
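A sample-label permutation test of the kind used to obtain Pa can be sketched as follows. The details of the simulations used in the study are given in the supplementary information and are not reproduced here, so this is only an assumed, simplified version: the function names, the misclassification count for a two-cluster split, and the add-one estimator are all hypothetical choices.

```python
import random

def misclassified(cluster_ids, labels):
    """Samples on the wrong side of a two-cluster split (ids and labels are 0/1).

    The smaller count under the two possible cluster-to-label assignments is used.
    """
    mismatches = sum(1 for c, l in zip(cluster_ids, labels) if c != l)
    return min(mismatches, len(labels) - mismatches)

def label_permutation_p(cluster_ids, labels, n_iter=10000, seed=0):
    """Fraction of random relabelings that do at least as well as the true labels."""
    rng = random.Random(seed)
    observed = misclassified(cluster_ids, labels)
    shuffled = list(labels)
    hits = 0
    for _ in range(n_iter):
        rng.shuffle(shuffled)
        if misclassified(cluster_ids, shuffled) <= observed:
            hits += 1
    # Add-one estimator: the estimate can never be exactly zero.
    return (hits + 1) / (n_iter + 1)

# Hypothetical split of 74 specimens with five tumors placed in the nontumor cluster.
labels = [0] * 37 + [1] * 37
cluster_ids = [1] * 5 + [0] * 32 + [1] * 37
print(misclassified(cluster_ids, labels), label_permutation_p(cluster_ids, labels))
```

With this estimator the smallest reportable value is 1/(n_iter + 1), so resolving probabilities on the order of 1 × 10−6 would require at least a million permutations.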

Figure 6.

Intersection between the best 600 genes from the Chen et al. list of good discriminators and the GIS microarray expression data set (8,716 features) based on UniGene identifiers. The 265 overlap features (containing 245 unique UniGenes) obtained were applied on the GIS tissue set. Clustering analysis was performed based on the expression levels of these features. The significance of the clustering obtained (Pa) was measured by Monte Carlo sample label permutations. The probability that the observed 265 overlap features could appear merely by chance alone (Pb) was estimated by performing a different series of Monte Carlo simulations (see supplementary information for details).

Second, we asked whether our 218 significant gene list (containing 213 unique UniGenes) was present in the Chen et al. microarray of approximately 23,000 features (containing 17,220 unique UniGenes). A total of 230 features (containing 166 unique UniGenes) from the Chen et al. array were observed to overlap (Supplementary Table 4). Hierarchical clustering analysis based on the expression levels of these 230 overlap features separated the Chen et al. tissue set into distinct tumor and nontumor groups, with four tissue samples misclassified (Fig. 7). Random permutation of sample labels indicated that the clustering was significant (Pa < 1 × 10−6) and it was unlikely that a randomly chosen set of 230 features could produce four or fewer samples misclassified (Pb < 1 × 10−4). These 230 overlap features are therefore able to discern HCC tumor from nontumor liver specimens fairly accurately. We also sought the overlap between our 218 significant gene list and the Chen et al. list of 1,648 good discriminating genes. A total of 68 unique UniGenes overlapped (Supplementary Table 5). The likelihood that the overlap would arise by chance if the two gene lists were totally independent was minuscule (Pc < 1 × 10−8). Therefore, by cross-testing the results of each dataset with the other, we were able to validate tumor-nontumor discriminators for HCC diagnosis.
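The significance of a gene-list overlap (Pc) can be approximated by simulation, as in the sketch below. This is an assumed, simplified version: the function name is hypothetical, the two lists are treated as independent random draws from a common gene universe, and the universe and list sizes in the example are toy numbers, because the number of UniGenes shared by the two arrays is not stated in the text.

```python
import random

def overlap_p(universe_size, size_a, size_b, observed_overlap,
              n_iter=10000, seed=0):
    """Chance that two independent random gene lists overlap at least as much as observed."""
    rng = random.Random(seed)
    hits = 0
    for _ in range(n_iter):
        # By symmetry, list A can be fixed as genes 0..size_a-1; only list B is drawn.
        b = rng.sample(range(universe_size), size_b)
        overlap = sum(1 for g in b if g < size_a)
        if overlap >= observed_overlap:
            hits += 1
    # Add-one estimator: the estimate can never be exactly zero.
    return (hits + 1) / (n_iter + 1)

# Toy numbers only: lists of 50 and 100 genes from a universe of 1,000 genes
# share 5 genes on average, so an overlap of 20 is far beyond chance.
print(overlap_p(1000, 50, 100, observed_overlap=20))
```

Under the same independence assumption the overlap follows a hypergeometric distribution, which offers an exact alternative when the simulated estimate reaches its lower limit of 1/(n_iter + 1).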

Figure 7.

Intersection between the GIS 218 significant gene list and the Chen et al. microarray expression data set (approximately 23,000 features) based on UniGene identifiers. The 230 overlap features (containing 166 unique UniGenes) obtained were applied on the Chen et al. tissue set. Clustering analysis was performed based on the expression levels of these features. The significance of the clustering obtained (Pa) was measured by Monte Carlo sample label permutations. The probability that the observed 230 overlap features could appear merely by chance alone (Pb) was estimated by performing a different series of Monte Carlo simulations (see supplementary information for details).

Third, to definitively validate the utility of these probe sets to distinguish HCC tumor from nontumor liver tissue specimens, we assessed the accuracy of these four discriminator cassettes on an independent tissue set consisting of 58 liver clinical biopsy specimens from 29 patients. These 58 liver samples were processed separately on the same cDNA microarray platform as our test set of tissue samples from 37 patients. Using a kNN prediction algorithm, we found that all classifier probe cassettes could readily distinguish HCC tumor from nontumor liver specimens (Table 2). The GIS discriminator list (N = 218) gave four false-negative and no false-positive results for a predictive accuracy of 93%. The GIS genes found in the Chen et al. microarray (N = 166) gave three false-negative and no false-positive results for a predictive accuracy of 95%. The discriminators from the Chen et al. study found in the GIS microarray (N = 265) gave three false-negative and no false-positive results in the validation set for a predictive accuracy of 95%. Lastly, the intersect between the GIS and the Chen et al. discriminator probes, comprising only 68 genes, gave one false-negative and one false-positive result for a predictive accuracy of 96%. Therefore, the gene discriminators of tumor versus nontumor in HCC derived by the intersect analysis of limited tissue sets can be validated in an independent manner.

Table 2. Prediction Accuracy of Gene Classifiers Using kNN Algorithm on 58 Liver Biopsies From 29 Patients

| Gene Classifiers | No. of Gene Classifiers | Misclassification Rate | No. of False Negative Cases* | No. of False Positive Cases† | Predictive Accuracy |
| GIS discriminating genes | 218 | 4 of 58 | 4 | 0 | 93% |
| GIS genes present in the Chen et al. microarray | 166 | 3 of 58 | 3 | 0 | 95% |
| Chen et al. genes present in the GIS microarray | 265 | 3 of 58 | 3 | 0 | 95% |
| Intersect between GIS and Chen et al. discriminating genes | 68 | 2 of 58 | 1 | 1 | 96% |

*False negative cases refer to HCC tumors which were misclassified as non-tumor livers.
†False positive cases refer to non-tumor livers which were misclassified as HCC tumors.
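
The predictive accuracies in Table 2 follow from the false-negative and false-positive counts reported in the text for the 58 validation specimens. The short Python sketch below reproduces that arithmetic; the function name and dictionary layout are illustrative only, and the computed values may differ from the published percentages in the last digit because of rounding.

```python
def predictive_accuracy(n_samples, false_neg, false_pos):
    """Fraction of specimens classified correctly."""
    return (n_samples - false_neg - false_pos) / n_samples

# (false negatives, false positives) for each discriminator cassette, as reported.
cassettes = {
    "GIS discriminating genes (N = 218)": (4, 0),
    "GIS genes present in the Chen et al. microarray (N = 166)": (3, 0),
    "Chen et al. genes present in the GIS microarray (N = 265)": (3, 0),
    "Intersect between GIS and Chen et al. (N = 68)": (1, 1),
}

for name, (fn, fp) in cassettes.items():
    print(f"{name}: {100 * predictive_accuracy(58, fn, fp):.1f}%")
```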


Discussion

Due to the severity of the disease and the lack of good molecular markers for diagnosis and effective treatment strategies of the tumor, HCC remains a major cancer challenge. We analyzed gene expression patterns in matched HCC tumor and nontumor tissue specimens from 37 patients and uncovered 218 genes that can distinguish tumor from nontumor liver specimens. Previous work on HCC to identify unique expression signatures uncovered discriminating gene sets. We sought to capitalize on the progressive standardization of the array platforms by testing the performance of overlapping discriminator markers between our array results and those of Chen et al.12 Our results show remarkable concordance with those of Chen et al., despite the differences in patient populations and the technology platforms applied. We then validated these intersecting discriminators with an independent set of 58 HCC tumor and nontumor liver specimens and showed high predictive accuracy for these gene probes. Therefore, markers arising from array-based analysis of limited tissue sets are surprisingly robust. Moreover, we believe that future approaches combining in silico and limited experimental validation are sufficient to uncover biologically meaningful data from clinical expression array studies.

In-depth study of the functions of the useful markers shows that a substantial number of the genes are involved in the ubiquitin-proteasome pathway and are overexpressed in HCC tumor tissue specimens. Two of these genes, UBD and PSMD4, were highly expressed in the tumor tissue specimens in the current study, consistent with observations made in other independent array studies.9, 12 In addition to stability, the activities of many proteins such as the tumor suppressor p53 are modulated by ubiquitination.28 The molecular components of the proteasome complex are also involved in other cellular events that do not require proteolysis, such as endocytosis, the localization of certain proteins in the nucleus, and transcription.29, 30 Defects in the ubiquitin-proteasome pathway have been observed in a variety of human diseases, including neurodegenerative disease, metabolic disorders, and cancer.31 Recently, it was demonstrated that application of the proteasome inhibitor PS-341 in orthotopic human pancreatic tumor xenografts resulted in inhibition of tumor growth and angiogenesis.32 These data suggest that the ubiquitination pathways may be a legitimate target for HCC therapeutics.

The putative oncogene ERBB3 was consistently present at much higher levels in our set of HCC tumor tissue specimens. This agrees with an independent study that found high ERBB3 levels in the majority of HCC tumor specimens by immunohistochemistry.26 High ERBB2 expression was found in approximately 41% of the Chen et al. HCC tumor set,12 similar to our finding that approximately 32% of our HCC tumor set overexpressed ERBB2. In addition, there was an association between high ERBB3 and high ERBB2 expression in the HCC tumor specimens in the current study. The physiologic effects of ERBB3 signaling are not completely clear. ERBB3 has been shown to serve as the primary binding site for heregulin in adult rat hepatocytes and whole liver specimens.33 In addition, ERBB3 is capable of eliciting both differentiation and growth responses when transactivated by the other ERBB receptors ERBB2 and EGFR, neither of which binds heregulin.27 Together, these data suggest that drugs targeting the kinase domains of ERBB2 and EGFR or the extracellular domain of ERBB3 may be useful in HCC treatment strategies.

Our study highlights a sizable number of potential marker genes that may be classified as cell adhesion molecules. Alterations in the expression and function of adhesion molecules, including integrins and cadherins, are associated with both tumor suppression and progression in different diseases.34 Some of these genes, such as ITGB1, have been shown to have an essential role in HCC.35 Although little information is available currently for other genes, such as DNCH1, work on other gene members (e.g., DNLC2A and DNLC2B) indicates that their levels are substantially up-regulated and down-regulated, respectively, in HCC tumor specimens compared with their adjacent nontumor liver tissue specimens.36 The current development of antagonists, including cilengitide,37 against integrins signifies the importance of cell adhesion molecules in cancer.

In addition, we showed the polycomb group protein BMI-1 to be a significant outlier that was highly differentially expressed in HCC tumors. Originally identified in Drosophila, polycomb group proteins are evolutionarily conserved chromatin components that are involved in maintaining transcriptional “memory” and are vital in embryogenesis and control of cell identity.38 Conceivably, deregulation of this epigenetic silencing would have deleterious effects on normal development and may result in the development of cancer. This was demonstrated by Varambally et al.39 in their study of prostate cancer. They showed that the polycomb group protein EZH2 is highly expressed in prostate cancer and is involved in the progression of prostate cancer. Recent studies indicate that BMI-1 has an essential role in determining the proliferative activities of normal and leukemic stem cells.40 These findings suggest that the oncogenic potential of BMI-1 may render it an important tumor marker as well as therapeutic target for HCC.

IGFBP3 showed consistently low expression levels across our dataset and other datasets of HCC tumors,6, 12 suggesting a role for the IGF axis in HCC disease presentation. Perturbations of the IGF axis are also implicated in the formation of cancer in organs such as the colon, prostate, and lung.41 How the actions of IGFBP3 are modulated is still unclear, but the modulation is likely to be complex and to involve multiple players. Besides signaling through the IGF receptor-dependent pathway, IGFBP3 also mediates signals via other pathways such as transforming growth factor-beta, tumor necrosis factor-alpha, and RXRα.42 GPC3, a member of the heparan sulfate proteoglycans, showed consistently high abundance across HCC tumor specimens in the current study and in the Chen et al.12 datasets. High expression of GPC3 has also been reported in embryonal tumors such as Wilms tumor,43 although its expression is reported to be suppressed in breast cancer.44 Heparan sulfate proteoglycans act as coreceptors for heparin-binding growth factors and ultimately lead to stimulation or inhibition of growth factor activity.45 A recent study by Midorikawa et al. determined a role for GPC3 in hepatocarcinogenesis. They showed that GPC3 interacts with FGF2, inhibits the BMP7 signaling pathway, and modulates the activities of FGF2 and BMP7.46 The modulation of growth factors by GPC3 suggests a growth-promoting effect for GPC3 in HCC tumorigenesis and the use of GPC3 as a drug target for HCC.

Human liver cancer cell lines established from primary HCC tumors are traditionally used to study the mechanisms of tumorigenesis because of the ease of procurement and handling of these cell lines. How robustly cell lines mimic the biologic behavior of primary tumors has not been well documented. Recently, the genetic profiles of 19 HCC cell lines were characterized and two subtypes of cell lines that were highly correlated to the expression of AFP were identified.47 The overexpression of AFP, however, is not consistent across primary HCC tumors, as shown in the current study and by others.12, 24 Our findings reveal that many molecular features characteristic of primary HCC were lacking in the cell lines, whereas other important features exhibited more augmented or diminished expression levels in cell lines. Comparative genomic hybridization studies indicate that aberrant regulation of gene expression in primary HCC and liver cancer cell lines is frequently caused by the gain or loss of chromosomal regions.48 In particular, overrepresentations occur more frequently on chromosome 1q.18 This correlates well with our observation that a disproportionate number of genes located on chromosome 1q showed high expression in HCC tumor relative to nontumor tissue specimens.

In conclusion, our findings, taken together with other array studies, highlight the consistent biologic associations of HCC tumorigenesis with gene expression profiles. They suggest candidate markers and pathways uncovered by this approach that are likely to be useful in the diagnosis and treatment of HCC.


The authors thank Chandramouli Gadisetti and Christos Sotiriou for advice in the initial analysis; Adaikalavan Ramasamy, Safia S. Rahman, and Linda K. H. Teng for technical assistance; and Lance D. Miller for critical review of the manuscript.