Liver Failure & Liver Disease
Identification of discriminators of hepatoma by gene expression profiling using a minimal dataset approach
Article first published online: 25 MAR 2004
DOI: 10.1002/hep.20105
Copyright © 2004 American Association for the Study of Liver Diseases
How to Cite
Neo, S. Y., Leow, C. K., Vega, V. B., Long, P. M., Islam, A. F.M., Lai, P. B.S., Liu, E. T. and Ren, E. C. (2004), Identification of discriminators of hepatoma by gene expression profiling using a minimal dataset approach. Hepatology, 39: 944–953. doi: 10.1002/hep.20105
Publication History
- Issue published online: 25 MAR 2004
- Article first published online: 25 MAR 2004
- Manuscript Accepted: 20 DEC 2003
- Manuscript Received: 18 SEP 2003
Abstract
The severity of hepatocellular carcinoma (HCC) and the lack of good diagnostic markers and treatment strategies have rendered the disease a major challenge. Previous microarray analyses of HCC were restricted to the selected tissue sample sets without validation on an independent series of tissue samples. We describe an approach to the identification of a composite discriminator cassette by intersecting different microarray datasets. We studied the global transcriptional profiles of matched HCC tumor and nontumor liver samples from 37 patients using complementary DNA (cDNA) microarrays. Application of nonparametric Wilcoxon statistical analyses (P < 1 × 10⁻⁶) and the criterion of a 1.5-fold differential gene expression change resulted in the identification of 218 genes, including BMI-1, ERBB3, and those involved in the ubiquitin-proteasome pathway. Elevated ERBB2 and epidermal growth factor receptor (EGFR) expression levels were detected in ERBB3-expressing tumors, suggesting the presence of ERBB3 cognate partners. Comparison of our dataset with an earlier study of approximately 150 tissue sets identified multiple overlapping discriminator markers, suggesting good concordance of data despite differences in patient populations and technology platforms. These overlapping discriminator markers could distinguish HCC tumor from nontumor liver samples with reasonable precision and the features were unlikely to appear by chance, as measured by Monte Carlo simulations. More significantly, validation of the discriminator cassettes on an independent set of 58 liver biopsy specimens yielded greater than 93% prediction accuracy. In conclusion, these data indicate the robustness of expression profiling in marker discovery using limited patient tissue specimens and identify novel genes that are highly likely to be excellent markers for HCC diagnosis and treatment.
Supplementary material for this article can be found on the HEPATOLOGY website (http://interscience.wiley.com/jpages/0270-9139/suppmat/index.html). (HEPATOLOGY 2004;39:944–953.)
Hepatocellular carcinoma (HCC) is the most common primary malignant tumor of the liver, accounting for more than 70% of liver cancers worldwide.1 A major risk factor associated with the development of HCC is hepatitis B virus (HBV) infection. Death is usually due to liver failure associated with cirrhosis and/or rapid outgrowth of multiple nodules. Approximately 0.25 to 1 million new cases of HCC are diagnosed each year, and the cancer is especially prevalent in Southeast Asia, China, and sub-Saharan Africa. Although surgical resection is considered to be the main curative treatment, only 10% to 15% of cases are suitable for surgery at the time of presentation. This is because either the disease is detected at an advanced stage or the underlying poor liver functional reserve precludes surgical intervention.
The lack of molecular markers that characterize tumor formation poses a major problem to effective diagnosis and prognosis of HCC. Current diagnosis of HCC relies on the presence of a liver mass on radiologic investigations and the detection of an elevated level of serum alpha fetoprotein (AFP).2 However, an elevated level of AFP is not exclusive to HCC and has been observed in benign hepatic diseases such as cirrhosis and in other cancers such as germ cell cancer.3 Treatment of HCC includes the use of interferon therapy and antiviral drugs, but the results are unpredictable and the effectiveness may be limited.2, 4 Genome-wide analysis by microarray5 offers a systematic approach to uncover comprehensive information about the transcription profile of HCC. Previous studies have used microarrays to address the changes in gene expression of HCC.6–11 One major study, published recently by Chen et al.,12 investigated more than 100 liver tissue specimens. However, these reports were restricted to the tissue samples selected for each study and there was an absence of validation of their findings on an independent series of tissue samples, limiting the potential significance and utility of the data.
In the current study, we used complementary DNA (cDNA) microarrays to examine the global cellular changes in matched pairs of HBV-associated HCC tumor and nontumor liver tissue specimens of 37 patients. In addition, gene expression patterns between primary HCC tumors and liver cancer cell lines were examined for possible biologic variation. A comparison was performed with other independent microarray studies of HCC in an attempt to identify a composite cassette of discriminator genes that could potentially serve as tumor markers. To validate the utility of these discriminator cassettes to distinguish tumor from nontumor, prediction accuracy was assessed on an entirely independent set of 58 liver biopsy samples. These experiments indicated that array-based expression profiling on limited tissue sets generates robust data and identified novel molecular markers that are highly likely to be excellent markers for HCC diagnosis and targets for new disease management strategies.
Patients and Methods
RNA Isolation, RNA Amplification, and cDNA Microarray Hybridization.
All 37 patients (from whom the test set of tissue specimens was derived) had HBV-associated HCC and underwent curative liver resection. The paired samples of tumor and corresponding nontumor tissue specimens were obtained from the resected liver specimen. The validation tissue set comprised 58 liver biopsy samples from an independent cohort of 29 patients who also had HCC associated with HBV and underwent curative liver resection. Informed consent was obtained from the patients. The institutional research and ethics committee approved the study.
Tissue specimens were snap frozen in liquid nitrogen and stored at −150°C. A small section of each specimen was sampled and total RNA was isolated using Trizol reagent (Life Technologies, Bethesda, MD) according to the manufacturer's instructions. The integrity of the RNA specimen was verified by gel electrophoresis. The human liver cancer cell lines used in this study were PLC/PRF/5, HA22T, Huh1, Huh4, Tong, Hep3B, SNU182, SNU449, SNU475, HepG2, Huh6, Huh7, SKHep1, and Mahlavu. All cell lines were cultured under conditions recommended by the American Type Culture Collection (VA).
Because the amount of total RNA sample obtained from the limited tissue material was insufficient (approximately 4-15 μg) to be put directly on the array, RNA was linearly amplified using a procedure modified from Eberwine et al.13 Human universal reference RNA (Stratagene, La Jolla, CA), which comprised total RNA samples from 10 different human cell lines, was amplified and used as the reference for cDNA microarray analysis. Approximately 9,000 human cDNA features (Incyte Genomics, Palo Alto, CA) were spotted onto poly-L-lysine–coated slides using an OmniGrid arrayer (GeneMachines). Probes were generated from the amplified RNA material and hybridized to the chip as described previously.14 To minimize the effects of labeling biases, reciprocal dye swap labeling experiments were performed for each sample.
Data Analysis.
Raw data were analyzed on GenePix analysis software version 3.0 (Axon Instruments, Burlingame, CA) and uploaded to a relational database. The cDNA clones used for the microarray are represented by their UniGene identifiers. For each array, the logarithmic expression ratios for the spots were normalized. In addition, spots that did not meet our filtering criteria (see supplementary material for this article on the HEPATOLOGY website (http://interscience.wiley.com/jpages/0270-9139/suppmat/index.html)) were excluded, resulting in the inclusion of 8,716 features for subsequent analysis. Statistical comparison of genes between HCC tumor and nontumor specimens was performed by the Wilcoxon rank-sum nonparametric test. To evaluate gene expression patterns, hierarchical clustering using one minus Pearson's correlation metric and average linkage,15 and multidimensional scaling was performed on normalized data (mean = 0, SD = 1). Functional characterization of genes was based on gene ontology16 and on other published works in the PubMed database (http://www.ncbi.nlm.nih.gov/entrez).
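
The standardization (mean = 0, SD = 1) and the one minus Pearson's correlation metric mentioned above can be illustrated with a short sketch. It is not the authors' code: the function names, the use of the sample SD, and the toy profiles are assumptions made for illustration, and the average-linkage clustering step itself is not shown.

```python
import math

def standardize(values):
    """Scale a profile to mean 0 and SD 1 (sample SD is an assumption)."""
    n = len(values)
    mean = sum(values) / n
    sd = math.sqrt(sum((v - mean) ** 2 for v in values) / (n - 1))
    return [(v - mean) / sd for v in values]

def pearson(x, y):
    """Pearson correlation coefficient between two equal-length profiles."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    vx = sum((a - mx) ** 2 for a in x)
    vy = sum((b - my) ** 2 for b in y)
    return cov / math.sqrt(vx * vy)

def one_minus_pearson(x, y):
    """Distance used for clustering: 0 for identical shape, 2 for opposite."""
    return 1.0 - pearson(x, y)

# Hypothetical log expression ratios for four genes in three samples.
tumor_a = [1.2, 0.8, -0.5, 2.0]
tumor_b = [1.0, 0.9, -0.4, 1.8]
nontumor = [-1.1, -0.7, 0.6, -1.9]

d_close = one_minus_pearson(tumor_a, tumor_b)   # similar profiles, small distance
d_far = one_minus_pearson(tumor_a, nontumor)    # opposite profiles, distance near 2
```

Because the distance depends only on the shape of a profile, two samples with the same pattern of up- and down-regulation are close even when their absolute ratios differ.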
Three different sets of Monte Carlo simulations17 were performed to (1) measure the quality of a set of selected gene features to be used as potential markers (Pa); (2) determine whether the set of genes observed to have a good performance as tumor discriminators could appear merely by chance (Pb); and (3) approximate the significance of the number of observed overlapping genes after intersection of the important gene lists derived from two independent groups (i.e., the current study and the Chen et al. study; Pc). To validate the utility of the various expression cassettes to distinguish HCC tumor from nontumor liver specimens, the prediction accuracy of each discriminator cassette was assessed on an independent tissue set comprising 58 liver clinical biopsy specimens from 29 patients using a k-nearest neighbor (kNN) classification algorithm (k = 3) that employs Pearson correlation to measure the similarity between expression profiles. The algorithm was trained against the dataset comprising 74 tissue samples from 37 patients before testing against the new tissue set.
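
A kNN classifier with k = 3 and Pearson correlation as the similarity measure can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: the function names, the majority-vote rule, and the toy training profiles are all hypothetical.

```python
import math
from collections import Counter

def pearson(x, y):
    """Pearson correlation coefficient between two equal-length profiles."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    vx = sum((a - mx) ** 2 for a in x)
    vy = sum((b - my) ** 2 for b in y)
    return cov / math.sqrt(vx * vy)

def knn_predict(train_profiles, train_labels, query, k=3):
    """Label a query by majority vote among the k most correlated training profiles."""
    ranked = sorted(
        zip(train_profiles, train_labels),
        key=lambda pair: pearson(pair[0], query),
        reverse=True,  # highest correlation first
    )
    votes = Counter(label for _, label in ranked[:k])
    return votes.most_common(1)[0][0]

# Hypothetical training set: expression of four discriminator genes per sample.
train_profiles = [
    [2.0, 1.5, -1.0, -0.5],
    [1.8, 1.2, -0.8, -0.9],
    [2.2, 1.0, -1.2, -0.4],
    [-1.0, -0.5, 1.5, 1.0],
    [-0.8, -0.9, 1.2, 1.4],
    [-1.2, -0.4, 1.0, 1.6],
]
train_labels = ["tumor", "tumor", "tumor", "nontumor", "nontumor", "nontumor"]

pred_tumor = knn_predict(train_profiles, train_labels, [1.9, 1.1, -0.9, -0.6])
pred_nontumor = knn_predict(train_profiles, train_labels, [-0.9, -0.6, 1.3, 1.2])
```

With an odd k and two classes, the vote can never tie, which is one practical reason to prefer k = 3 over an even value.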
Real-Time Semiquantitative Reverse-Transcription Polymerase Chain Reaction (RT-PCR).
Total RNA samples were analyzed for the expression levels of selected genes by real-time semiquantitative RT-PCR using the LightCycler RNA amplification kit SYBR Green I on the LightCycler (Roche, Basel, Switzerland) according to the manufacturer's instructions. Data are presented as the level of gene expression in each HCC tumor specimen relative to its corresponding nontumor liver specimen.
Results
Assessment of Global Gene Expression Differences Between HCC Tumor and Nontumor Liver Specimens.
The gene expression patterns of primary HCC tumor specimens and the corresponding nontumor liver tissue specimens from 37 patients were examined by cDNA microarray. First, we assessed the overall natural patterns of gene expression in the HCC tumor and nontumor liver tissue specimens based on unsupervised hierarchical clustering. ANOVA in expression levels for each gene across all the tissue specimens indicated that 500 gene features (containing 493 unique UniGenes) showed the largest variability across both HCC tumor and nontumor liver tissue specimens (Fig. 1). Included in this list are AFP, an often used prognostic marker for HCC, and other genes associated with HCC such as HGF and MYC. Hierarchical clustering analysis based on these highly variant genes showed two main clusters, one representing the HCC tumor specimens and the other, the nontumor liver tissue specimens with only 6 of 37 HCC tumor specimens misclassified as nontumors (Supplementary Fig. 1B). Thus, the molecular configuration of HCC can be readily distinguished from that of nontumor liver specimens with minimal data manipulation.

Figure 1. Natural patterns of gene expression differences between HCC tumor and nontumor liver tissue specimens based on unsupervised clustering. Plot showing the variance of expression value for each of the gene features across all the HCC tumor and nontumor liver tissue specimens. Dotted line indicates the 500 most variable gene features.
Second, to investigate differential gene expression patterns between HCC tumor and nontumor liver specimens, we used the Wilcoxon rank-sum test and identified the top 2.5% candidate genes that displayed the smallest (best) P value scores (P < 1 × 10⁻⁶) and at least a 1.5-fold change in gene expression. For these 218 genes, false discovery rate analysis indicates a false-positive error of less than 0.4%. Multidimensional scaling analysis based on these outliers indicated that the HCC tumors were a more heterogeneous population than the nontumor liver tissue specimens (Fig. 2). Cancer cell lines derived from the primary tumor have traditionally been used as in vitro model systems for investigating the function of genes in the in vivo tumor environment. We asked how the expression pattern of the same 218 genes would appear in 14 established human liver cancer cell lines. It is apparent that the cell lines exhibited gene expression profiles that were different from the clinical HCC tumor and nontumor liver tissue specimens (Fig. 2), suggesting that these cell lines may have accumulated additional genetic or epigenetic alterations in culture.
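
The two-part selection rule (a rank-sum P value below 1 × 10⁻⁶ and at least a 1.5-fold change) can be sketched in a few lines. This is an illustration only: the normal approximation without tie correction, the use of the difference in mean log2 ratios as the fold-change measure, the function names, and the toy data are assumptions, since the text does not specify these details.

```python
import math

def average_ranks(values):
    """Return 1-based ranks, giving tied values the average of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg = (i + j) / 2.0 + 1.0
        for k in range(i, j + 1):
            ranks[order[k]] = avg
        i = j + 1
    return ranks

def rank_sum_p(x, y):
    """Two-sided Wilcoxon rank-sum P value (normal approximation, no tie correction)."""
    n1, n2 = len(x), len(y)
    ranks = average_ranks(list(x) + list(y))
    w = sum(ranks[:n1])                       # rank sum of the first group
    mean_w = n1 * (n1 + n2 + 1) / 2.0
    sd_w = math.sqrt(n1 * n2 * (n1 + n2 + 1) / 12.0)
    z = (w - mean_w) / sd_w
    return math.erfc(abs(z) / math.sqrt(2.0))

def is_outlier(tumor_log2, nontumor_log2, p_cut=1e-6, fold_cut=1.5):
    """Apply both criteria: small P value and a sufficient mean log2 difference."""
    p = rank_sum_p(tumor_log2, nontumor_log2)
    diff = sum(tumor_log2) / len(tumor_log2) - sum(nontumor_log2) / len(nontumor_log2)
    return p < p_cut and abs(diff) >= math.log2(fold_cut)

# Hypothetical log2 ratios for one gene in 37 tumor and 37 nontumor samples.
nontumor = [0.01 * i for i in range(37)]
separated_tumor = [1.0 + 0.01 * i for i in range(37)]   # clearly higher in tumor
similar_tumor = [0.005 + 0.01 * i for i in range(37)]   # interleaved with nontumor
```

With 37 samples per group, complete separation of the two groups gives z of about 7.4 under this approximation, so the 1 × 10⁻⁶ threshold is reachable but demands nearly non-overlapping distributions.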

Figure 2. Significant gene differential expression between HCC tumor and nontumor liver tissue specimens and comparison with liver cancer cell lines (P < 1 × 10⁻⁶, approximately 1.5-fold change). Multidimensional scaling plot illustrates the ability of these 218 outlier genes to separate HCC tumor specimens (red circles) from nontumor liver tissue specimens (green circles). The multidimensional plot also shows how different liver cancer cell lines (yellow circles) are from the clinical tissue samples.
Identification of Gene Clusters Differentially Expressed in HCC Tumor Tissue Specimens.
Among the 218 significant genes that distinguished HCC tumors from nontumor liver tissue specimens, more genes were observed to be overexpressed than underexpressed in the malignant tissue specimens (Supplementary Fig. 1). Mapping of the chromosomal location of these 218 unique outliers indicated that a disproportionate number of genes was located on chromosome 1 (Fig. 3A), particularly in the 1q region, and that the majority of these genes were more highly expressed in the tumor tissue specimens. This result correlated well with previous reports of frequent amplification of chromosome 1q in HCC tumor specimens.18 Further characterization of these outlier genes revealed many genes whose roles in HCC are not yet fully understood. A substantial proportion of genes were involved in transport (e.g., PEA15), RNA processing (e.g., RDBP), and metabolic processes (e.g., NME1) and showed increased expression in HCC tumor specimens, possibly indicating accelerated rates of metabolism (Fig. 3B, Table 1, Supplementary Table 2). Several genes (e.g., SMT3H1) were members of the ubiquitin-proteasome pathway, suggesting considerable deregulation of this pathway in HCC. Transcription factors (e.g., ESR1) and genes involved in controlling growth and differentiation (e.g., GRN) and signal transduction (e.g., CSTB) formed the other dominant gene groups. Notably, the polycomb group protein BMI-1, which is believed to be an oncogene, was consistently expressed at much higher levels in HCC tumor specimens (Fig. 4). BMI-1 expression level is elevated in various tumors including Hodgkin's disease,19 but it has not been previously studied in HCC. In addition, there were a number of genes (e.g., MAWBP, AD24) that had no known functions.
Genes that have been studied in HCC previously and were represented in our outlier list included HDGF and GHR.20, 21 A number of the differentially expressed genes such as MDK and CDC23 are consistent with those reported for cancer cells.22, 23 It is noteworthy that the list of outlier genes did not include AFP, which was elevated in a small number of our HCC tumor samples. This agrees well with previous reports that AFP levels were variably elevated in HCC12 and that AFP was present in approximately 50% of HCC.24

Figure 3. Characterization of differentially expressed genes in HCC tumor specimens (P < 1 × 10⁻⁶, approximately 1.5-fold change). (A) Chromosomal distribution of the 218 outlier genes. The dark-colored and light shaded bars represent genes that are at least 1.5-fold up-regulated and down-regulated, respectively, in HCC tumor specimens relative to nontumor liver samples. (B) Functional characterization of the outlier genes based on gene ontology and published works.
| Function | Gene Symbol | Gene Name | UniGene | Expression Change in HCC Tumor* |
|---|---|---|---|---|
| Transcription factor | ILF2 | Interleukin enhancer binding factor 2, 45 kD | Hs. 75117 | ↑ |
| | BMI1 | Murine leukemia viral (bmi-1) oncogene homolog | Hs. 431 | ↑ |
| | TAF9 | TAF9 RNA polymerase II, TATA box binding protein (TBP)-associated factor, 32 kD | Hs. 60679 | ↑ |
| | ZNF146 | Zinc finger protein 146 | Hs. 301819 | ↑ |
| | CHD4 | Chromodomain helicase DNA binding protein 4 | Hs. 74441 | ↑ |
| | NR4A1 | Nuclear receptor subfamily 4, group A, member 1 | Hs. 1119 | ↓ |
| | ZNF238 | Zinc finger protein 238 | Hs. 69997 | ↓ |
| | FOSB | FBJ murine osteosarcoma viral oncogene homolog B | Hs. 75678 | ↓ |
| RNA processing | H2AFY | H2A histone family, member Y | Hs. 75258 | ↑ |
| | SNRPB | Small nuclear ribonucleoprotein polypeptides B and B1 | Hs. 83753 | ↑ |
| | RPS7 | Ribosomal protein S7 | Hs. 301547 | ↑ |
| | MRPS14 | Mitochondrial ribosomal protein S14 | Hs. 247324 | ↑ |
| | SNRPD2 | Small nuclear ribonucleoprotein D2 polypeptide (16.5 kD) | Hs. 53125 | ↑ |
| | NCL | Nucleolin | Hs. 79110 | ↑ |
| | RPS10 | Ribosomal protein S10 | Hs. 76230 | ↑ |
| | RPL6 | Ribosomal protein L6 | Hs. 349961 | ↑ |
| | SNRPE | Small nuclear ribonucleoprotein polypeptide E | Hs. 334612 | ↑ |
| | SF3B4 | Splicing factor 3b, subunit 4, 49 kD | Hs. 25797 | ↑ |
| | RDBP | RD RNA-binding protein | Hs. 106061 | ↑ |
| | SNRPF | Small nuclear ribonucleoprotein polypeptide F | Hs. 105465 | ↑ |
| | RPS10 | Ribosomal protein S10 | Hs. 76230 | ↑ |
| DNA replication/repair | ADPRT | ADP-ribosyltransferase (NAD+; poly(ADP-ribose) polymerase) | Hs. 177766 | ↑ |
| | PRKDC | Protein kinase, DNA-activated, catalytic polypeptide | Hs. 155637 | ↑ |
| | SMC4L1 | SMC4 (structural maintenance of chromosomes 4, yeast)-like 1 | Hs. 50758 | ↑ |
| | FEN1 | Flap structure-specific endonuclease 1 | Hs. 4756 | ↑ |
| | MCM2 | Minichromosome maintenance deficient (S. cerevisiae) 2 (mitotin) | Hs. 57101 | ↑ |
| | HAT1 | Histone acetyltransferase 1 | Hs. 13340 | ↑ |
| Cell growth/differentiation | GPC3 | Glypican 3 | Hs. 119651 | ↑ |
| | MDK | Midkine (neurite growth-promoting factor 2) | Hs. 82045 | ↑ |
| | HDGF | Hepatoma-derived growth factor (high-mobility group protein 1-like) | Hs. 89525 | ↑ |
| | TP53BP2 | Tumor protein p53-binding protein, 2 | Hs. 44585 | ↑ |
| | CDC23 | CDC23 (cell division cycle 23, yeast, homolog) | Hs. 153546 | ↑ |
| | IGFBP3 | Insulin-like growth factor binding protein 3 | Hs. 77326 | ↓ |
| Immune response | TMPO | Thymopoietin | Hs. 11355 | ↑ |
| | IGKC | Immunoglobulin kappa constant | Hs. 156110 | ↓ |
| | IGHG3 | Immunoglobulin heavy constant gamma 3 (G3m marker) | Hs. 300697 | ↓ |
| | IGJ | Immunoglobulin J polypeptide, linker protein for immunoglobulin alpha and mu polypeptides | Hs. 76325 | ↓ |
| Cell adhesion | LAMR1 | Laminin receptor 1 (67 kD, ribosomal protein SA) | Hs. 181357 | ↑ |
| | CAPZA2 | Capping protein (actin filament) muscle Z-line, alpha 2 | Hs. 75546 | ↑ |
| | ARHE | Ras homolog gene family, member E | Hs. 6838 | ↓ |
| Signal transduction | CAP2 | Adenylyl cyclase-associated protein 2 | Hs. 296341 | ↑ |
| | CALM2 | Calmodulin 2 (phosphorylase kinase, delta) | Hs. 182278 | ↑ |
| | LASP1 | LIM and SH3 protein 1 | Hs. 334851 | ↑ |
| | SHC1 | SHC (Src homology 2 domain-containing) transforming protein 1 | Hs. 81972 | ↑ |
| | RGS5 | Regulator of G-protein signalling 5 | Hs. 24950 | ↑ |
| | HAX1 | HS1 binding protein | Hs. 15318 | ↑ |
| | ERBB3 | v-erb-b2 avian erythroblastic leukemia viral oncogene homolog 3 | Hs. 199067 | ↑ |
| Ubiquitin-proteasome pathway | UBD | Diubiquitin | Hs. 44532 | ↑ |
| | USP14 | Ubiquitin specific protease 14 (tRNA-guanine transglycosylase) | Hs. 75981 | ↑ |
| | PSMA1 | Proteasome (prosome, macropain) subunit, alpha type, 1 | Hs. 82159 | ↑ |
| | PSMB4 | Proteasome (prosome, macropain) subunit, beta type, 4 | Hs. 89545 | ↑ |
| Transport | CCT5 | Chaperonin containing TCP1, subunit 5 (epsilon) | Hs. 1600 | ↑ |
| | CCT3 | Chaperonin containing TCP1, subunit 3 (gamma) | Hs. 1708 | ↑ |
| | HSPA5 | Heat shock 70 kD protein 5 (glucose-regulated protein, 78 kD) | Hs. 75410 | ↑ |
| | XPO1 | Exportin 1 (CRM1, yeast, homolog) | Hs. 79090 | ↑ |
| | NUCB2 | Nucleobindin 2 | Hs. 3164 | ↑ |
| | ATP6IP1 | ATPase, H+ transporting, lysosomal interacting protein 1 | Hs. 6551 | ↑ |
| | AP3S1 | Adaptor-related protein complex 3, sigma 1 subunit | Hs. 80917 | ↑ |
| | VDAC2 | Voltage-dependent anion channel 2 | Hs. 78902 | ↑ |
| Metabolism | NME1 | Non-metastatic cells 1, protein (NM23A) expressed in | Hs. 118638 | ↑ |
| | DPM1 | Dolichyl-phosphate mannosyltransferase polypeptide 1, catalytic subunit | Hs. 5085 | ↑ |
| | ACLY | ATP citrate lyase | Hs. 174140 | ↑ |
| | GCN1L1 | GCN1 (general control of amino-acid synthesis 1, yeast)-like 1 | Hs. 75354 | ↑ |
| | NAT2 | N-acetyltransferase 2 (arylamine N-acetyltransferase) | Hs. 2 | ↓ |
| | CYP2C8 | Cytochrome P450, subfamily IIC (mephenytoin 4-hydroxylase), polypeptide 8 | Hs. 174220 | ↓ |
| | CYP2E | Cytochrome P450, subfamily IIE (ethanol-inducible) | Hs. 75183 | ↓ |
| Unknown | C20orf24 | Chromosome 20 open reading frame 24 | Hs. 184062 | ↑ |
| | C1orf9 | Chromosome 1 open reading frame 9 | Hs. 108636 | ↑ |
| | LOC51235 | Hypothetical protein | Hs. 181444 | ↑ |
| | FLJ12666 | Hypothetical protein FLJ12666 | Hs. 23767 | ↓ |

Figure 4. Expression of BMI-1 in HCC tumor specimens as determined by cDNA microarray analysis. Data are presented as the level of expression (log base 2) in each HCC tumor specimen with respect to the corresponding nontumor liver sample.
To validate our microarray data, real-time RT-PCR analysis was performed for insulin-like growth factor binding protein 3 (IGFBP3) and ERBB3 in all 37 matched HCC tumor and nontumor liver samples. The results of real-time RT-PCR analyses of IGFBP3 and ERBB3 were consistent with previous reports examining these individual markers.25, 26 For example, IGFBP3 expression was diminished in 35 of 37 HCC tumors relative to their corresponding nontumor liver tissue specimens (Fig. 5A), whereas ERBB3 expression was elevated in 34 of 37 tumor samples (Fig. 5B). We observed good concordance in fold changes between microarray data and real-time RT-PCR results, and the magnitude of fold change obtained by PCR was similar or higher (as much as fivefold in some cases, given the sensitivity of PCR). Because ERBB3 is defective in tyrosine kinase activity and requires dimerization with other receptors, possibly another member of the ERBB family,27 we tested the hypothesis that HCC tumors expressing high levels of ERBB3 were associated with high expression of ERBB2 or epidermal growth factor receptor (EGFR). The expression of ERBB2 was elevated in 12 of 37 tumor specimens, whereas high EGFR expression was found in 15 of 37 tumor specimens (Fig. 5B). Among the top 50th percentile of high ERBB3-expressing HCC tumors, we found a significant concomitant increase in ERBB2 expression (t-test P = approximately .0026) but no association with high EGFR expression (t-test P = approximately .31). Clearly, the cognate partners of ERBB3 were present in tumor specimens expressing high levels of ERBB3.

Figure 5. Real-time RT-PCR analysis of IGFBP3, ERBB3, ERBB2, and EGFR in HCC tumor samples. The gene expression patterns for (A) IGFBP3 and (B) ERBB3, ERBB2, and EGFR in all 37 HCC tumor specimens and their corresponding nontumor liver tissue specimens were examined. All data were normalized to the amount of the housekeeping gene porphobilinogen deaminase and are presented as relative fold expression change (log base 2) in each HCC tumor specimen with respect to its corresponding nontumor liver counterpart. A positive value depicts a higher expression level, whereas a negative value depicts a lower expression level in the tumor relative to the nontumor specimen.
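
The presentation used in Figure 5 (housekeeping normalization followed by a log base 2 tumor-to-nontumor ratio) can be written as a small calculation. The exact quantification procedure is not given in the text, so the formula, the function name, and the input values below are assumptions for illustration.

```python
import math

def log2_relative_change(target_tumor, ref_tumor, target_nontumor, ref_nontumor):
    """Log2 of housekeeping-normalized tumor expression over nontumor expression."""
    tumor_norm = target_tumor / ref_tumor            # normalize to housekeeping gene
    nontumor_norm = target_nontumor / ref_nontumor
    return math.log2(tumor_norm / nontumor_norm)

# Hypothetical relative quantities (arbitrary units) for one matched pair.
up = log2_relative_change(8.0, 2.0, 2.0, 2.0)     # positive: higher in tumor
down = log2_relative_change(1.0, 2.0, 4.0, 2.0)   # negative: lower in tumor
```

On this scale a value of 1 corresponds to a twofold increase in the tumor and a value of -1 to a twofold decrease.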
Validation of HCC Tumor Discriminator Expression Cassettes.
To assess the validity of our expression cassette of genes for distinguishing HCC tumor from nontumor liver tissue specimens, we explored the intersection of our data with those published in the literature. In the study by Chen et al.,12 HCC tumor specimens from 82 patients and nontumor liver tissue specimens from 74 patients were examined and 1,648 features (containing 1,449 unique UniGenes) were reported to discriminate tumor from nontumor, of which 600 features (containing 540 unique UniGenes) formed the best discriminators. First, we asked whether any of the Chen et al. dataset of 600 most differentially expressed genes was included in our array of 8,716 features (containing 7,521 unique UniGenes) based on UniGene identifiers. A total of 265 features (containing 245 unique UniGenes) from our microarray were observed to overlap (Supplementary Table 3). Hierarchical clustering analysis based on the expression levels of these 265 “overlap” features separated our tissue set into two distinct groups of tumor and nontumor specimens, with five tissue samples misclassified (Fig. 6). Such clustering was significant (Pa < 1 × 10⁻⁶) based on random permutation testing of sample labels. The likelihood of a randomly chosen set of 265 features producing five or fewer samples misclassified was low (Pb = 1.5 × 10⁻³). Therefore, these 265 overlap features could distinguish HCC tumor from nontumor liver specimens with reasonable precision, and the features were unlikely to appear by chance.
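
The sample-label permutation idea behind Pa can be sketched as follows. The actual simulation procedure is described only in the supplementary material, so this is a simplified stand-in: the misclassification count, the function names, the add-one P value estimate, and the toy cluster assignment are all assumptions.

```python
import random

def misclassified(clusters, labels):
    """Fewest disagreements between two clusters (0/1) and two classes (0/1),
    taking the better of the two possible cluster-to-class mappings."""
    mismatches = sum(c != l for c, l in zip(clusters, labels))
    return min(mismatches, len(labels) - mismatches)

def permutation_p(clusters, labels, n_perm=2000, seed=0):
    """Estimate how often shuffled labels agree with the clusters at least as well."""
    rng = random.Random(seed)
    observed = misclassified(clusters, labels)
    shuffled = list(labels)
    hits = 0
    for _ in range(n_perm):
        rng.shuffle(shuffled)
        if misclassified(clusters, shuffled) <= observed:
            hits += 1
    # Add-one estimate avoids reporting a P value of exactly zero.
    return (hits + 1) / (n_perm + 1)

# Hypothetical result: 74 samples (37 tumor = 1, 37 nontumor = 0),
# with five tumor samples falling into the nontumor cluster.
labels = [1] * 37 + [0] * 37
clusters = list(labels)
for i in range(5):
    clusters[i] = 0

p_a = permutation_p(clusters, labels)
```

The smallest P value such a simulation can report is about 1 / n_perm, so resolving values below 1 × 10⁻⁶ requires more than a million permutations.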

Figure 6. Intersection between the best 600 genes from the Chen et al. list of good discriminators and the GIS microarray expression data set (8,716 features) based on UniGene identifiers. The 265 overlap features (containing 245 unique UniGenes) obtained were applied on the GIS tissue set. Clustering analysis was performed based on the expression levels of these features. The significance of the clustering obtained (Pa) was measured by Monte Carlo sample label permutations. The probability that the observed 265 overlap features could appear merely by chance alone (Pb) was estimated by performing a different series of Monte Carlo simulations (see supplementary information for details).
Second, we asked whether our 218 significant gene list (containing 213 unique UniGenes) was present in the Chen et al. microarray of approximately 23,000 features (containing 17,220 unique UniGenes). A total of 230 features (containing 166 unique UniGenes) from the Chen et al. array were observed to overlap (Supplementary Table 4). Hierarchical clustering analysis based on the expression levels of these 230 overlap features separated the Chen et al. tissue set into distinct tumor and nontumor groups, with four tissue samples misclassified (Fig. 7). Random permutation of sample labels indicated that the clustering was significant (Pa < 1 × 10⁻⁶) and it was unlikely that a randomly chosen set of 230 features could produce four or fewer samples misclassified (Pb < 1 × 10⁻⁴). These 230 overlap features are therefore able to discern HCC tumor from nontumor liver specimens fairly accurately. We also sought the overlap between our 218 significant gene list and the Chen et al. list of 1,648 good discriminating genes. A total of 68 unique UniGenes overlapped (Supplementary Table 5). The likelihood that the overlap would arise by chance if the two gene lists were totally independent was minuscule (Pc < 1 × 10⁻⁸). Therefore, by cross-testing the results of each dataset with the other, we were able to validate tumor-nontumor discriminators for HCC diagnosis.
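
The reasoning behind Pc, namely how often two independent gene lists would share at least the observed number of genes, can be illustrated with a small Monte Carlo simulation. The universe size, list sizes, and observed overlaps below are hypothetical toy values rather than the study's figures, and the function name and add-one estimate are assumptions.

```python
import random

def overlap_p(universe_size, size_a, size_b, observed_overlap, n_sim=2000, seed=0):
    """Estimate P(overlap >= observed) when two lists are drawn independently
    and at random from a shared universe of genes."""
    rng = random.Random(seed)
    universe = range(universe_size)
    hits = 0
    for _ in range(n_sim):
        list_a = set(rng.sample(universe, size_a))
        list_b = rng.sample(universe, size_b)
        overlap = sum(1 for gene in list_b if gene in list_a)
        if overlap >= observed_overlap:
            hits += 1
    # Add-one estimate avoids reporting a P value of exactly zero.
    return (hits + 1) / (n_sim + 1)

# Toy universe of 1,000 genes with lists of 50 and 100 genes:
# about 5 shared genes are expected by chance.
p_strong = overlap_p(1000, 50, 100, observed_overlap=20)  # far above expectation
p_weak = overlap_p(1000, 50, 100, observed_overlap=5)     # near expectation
```

An overlap close to the chance expectation yields a large P value, whereas an overlap several times larger is essentially never reproduced by random draws.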

Figure 7. Intersection between the GIS 218 significant gene list and the Chen et al. microarray expression data set (approximately 23,000 features) based on UniGene identifiers. The 230 overlap features (containing 166 unique UniGenes) obtained were applied on the Chen et al. tissue set. Clustering analysis was performed based on the expression levels of these features. The significance of the clustering obtained (Pa) was measured by Monte Carlo sample label permutations. The probability that the observed 230 overlap features could appear merely by chance alone (Pb) was estimated by performing a different series of Monte Carlo simulations (see supplementary information for details).
Third, to definitively validate the utility of these probe sets to distinguish HCC tumor from nontumor liver tissue specimens, we assessed the accuracy of these four discriminator cassettes on an independent tissue set consisting of 58 liver clinical biopsy specimens from 29 patients. These 58 liver samples were processed separately on the same cDNA microarray platform as our test set of tissue samples from 37 patients. Using a kNN prediction algorithm, we found that all classifier probe cassettes could readily distinguish HCC tumor from nontumor liver specimens (Table 2). The GIS discriminator list (N = 218) resulted in a predictive accuracy of 93%, with four false-negative and no false-positive results. The GIS genes found in the Chen et al. microarray (N = 166) gave three false-negative and no false-positive results for a predictive accuracy of 95%. The discriminators from the Chen et al. study found in the GIS microarray (N = 265) gave three false-negative and no false-positive results in the validation set for a predictive accuracy of 95%. Lastly, the intersect between the GIS and the Chen et al. discriminator probes, comprising only 68 genes, gave a predictive accuracy of 96%, with one false-negative and one false-positive result. Therefore, the gene discriminators of tumor versus nontumor in HCC derived by the intersect analysis of limited tissue sets can be validated in an independent manner.
| Gene Classifiers | No. of Gene Classifiers | Misclassification Rate | No. of False Negative Cases* | No. of False Positive Cases† | Predictive Accuracy |
|---|---|---|---|---|---|
| GIS discriminating genes | 218 | 4 of 58 | 4 | – | 93% |
| GIS genes present in the Chen et al. microarray | 166 | 3 of 58 | 3 | – | 95% |
| Chen et al. genes present in the GIS microarray | 265 | 3 of 58 | 3 | – | 95% |
| Intersect between GIS and Chen et al. discriminating genes | 68 | 2 of 58 | 1 | 1 | 96% |
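
The predictive accuracies in Table 2 follow directly from the misclassification counts. The short calculation below reproduces them, assuming accuracy is simply the fraction of the 58 validation samples classified correctly; the function and variable names are illustrative.

```python
def predictive_accuracy(n_samples, false_negatives, false_positives):
    """Percentage of samples classified correctly."""
    errors = false_negatives + false_positives
    return 100.0 * (n_samples - errors) / n_samples

# (false negatives, false positives) per classifier set, taken from Table 2.
rows = {
    "GIS discriminating genes": (4, 0),
    "GIS genes present in the Chen et al. microarray": (3, 0),
    "Chen et al. genes present in the GIS microarray": (3, 0),
    "Intersect between GIS and Chen et al. discriminating genes": (1, 1),
}

accuracy = {name: predictive_accuracy(58, fn, fp) for name, (fn, fp) in rows.items()}
```

The computed values are approximately 93.1%, 94.8%, 94.8%, and 96.6%; the last is listed in the table as 96%.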
Discussion
Due to the severity of the disease and the lack of good molecular markers for diagnosis and effective treatment strategies of the tumor, HCC remains a major cancer challenge. We present an analysis of gene expression patterns on matched HCC tumor and nontumor tissue specimens from 37 patients and uncovered 218 genes that can distinguish tumor from nontumor liver specimens. Previous work on HCC to identify unique expression signatures uncovered discriminating gene sets. We sought to capitalize on the progressive standardization of the array platforms by testing the performance of overlapping discriminator markers between our array results and those of Chen et al.12 Our results show remarkable concordance with those of Chen et al., despite the differences in patient populations and the technology platforms applied. We then validated these intersecting discriminators with an independent set of 58 HCC tumor and nontumor liver specimens and showed high predictive accuracy for these gene probes. Therefore, markers arising from array-based analysis of limited tissue sets are surprisingly robust. Moreover, we believe that future approaches combining in silico and limited experimental validation are sufficient to uncover biologically meaningful data from clinical expression array studies.
In-depth study of the functions of the useful markers shows that a substantial number of the genes are involved in the ubiquitin-proteasome pathway and are overexpressed in HCC tumor tissue specimens. Two of these genes, UBD and PSMD4, were highly expressed in the tumor tissue specimens in the current study, in agreement with observations made in other independent array studies.9, 12 In addition to stability, the activities of many proteins such as the tumor suppressor p53 are modulated by ubiquitination.28 The molecular components of the proteasome complex are also involved in other cellular events that do not require proteolysis, such as endocytosis, the localization of certain proteins in the nucleus, and transcription.29, 30 Defects in the ubiquitin-proteasome pathway have been observed in a variety of human diseases, including neurodegenerative disease, metabolic disorders, and cancer.31 Recently, it was demonstrated that application of the proteasome inhibitor PS-341 in orthotopic human pancreatic tumor xenografts resulted in inhibition of tumor growth and angiogenesis.32 These data suggest that the ubiquitination pathways may be a legitimate target for HCC therapeutics.
The putative oncogene ERBB3 was consistently present at much higher levels in our set of HCC tumor tissue specimens. This is consistent with an independent study that found high ERBB3 levels in the majority of HCC tumor specimens by immunohistochemistry.26 High ERBB2 expression was found in approximately 41% of the Chen et al. HCC tumor set,12 similar to our finding that approximately 32% of our HCC tumor set overexpressed ERBB2. In addition, there was an association between high ERBB3 and high ERBB2 expression in the HCC tumor specimens in the current study. The physiologic effects of ERBB3 signaling are not completely clear. ERBB3 has been shown to serve as the primary binding site for heregulin in adult rat hepatocytes and whole liver specimens.33 In addition, ERBB3 is capable of eliciting both differentiation and growth responses when transactivated by the other ERBB receptors ERBB2 and EGFR, neither of which binds heregulin.27 Together, these data suggest that drugs targeting the kinase domains of ERBB2 and EGFR or the extracellular domain of ERBB3 may be useful in HCC treatment strategy.
Our study highlights a sizable number of potential marker genes that may be classified as cell adhesion molecules. Alterations in the expression and function of adhesion molecules, including integrins and cadherins, are associated with both tumor suppression and progression in different diseases.34 Some of these genes, such as ITGB1, have been shown to have an essential role in HCC.35 Although little information is available currently for other genes, such as DNCH1, work on other gene members (e.g., DNLC2A and DNLC2B) indicates that their levels are substantially up-regulated and down-regulated, respectively, in HCC tumor specimens compared with their adjacent nontumor liver tissue specimens.36 The current development of antagonists, including cilengitide,37 against integrins signifies the importance of cell adhesion molecules in cancer.
In addition, we showed the polycomb group protein BMI-1 to be a significant outlier that was highly differentially expressed in HCC tumors. Originally identified in Drosophila, polycomb group proteins are evolutionarily conserved chromatin components that are involved in maintaining transcriptional “memory” and are vital in embryogenesis and control of cell identity.38 Conceivably, deregulation of this epigenetic silencing would have deleterious effects on normal development and may result in the development of cancer. This was demonstrated by Varambally et al.39 in their study of prostate cancer. They showed that the polycomb group protein EZH2 is highly expressed in prostate cancer and is involved in the progression of prostate cancer. Recent studies indicate that BMI-1 has an essential role in determining the proliferative activities of normal and leukemic stem cells.40 These findings suggest that the oncogenic potential of BMI-1 may render it an important tumor marker as well as therapeutic target for HCC.
IGFBP3 showed consistently low expression levels across our dataset and other datasets of HCC tumors,6, 12 suggesting a role for the IGF axis in HCC disease presentation. Perturbations of the IGF axis are also implicated in the formation of cancer in organs such as the colon, prostate, and lung.41 How the actions of IGFBP3 are modulated is still unclear, but the mechanism is likely to be complex and involve multiple players. Besides signaling through the IGF receptor-dependent pathway, IGFBP3 also mediates signals via other pathways such as transforming growth factor-beta, tumor necrosis factor-alpha, and RXRα.42 GPC3, a member of the heparan sulfate proteoglycans, showed consistently high abundance across HCC tumor specimens in the current study and in the Chen et al.12 datasets. High expression of GPC3 has also been reported in embryonal tumors such as Wilms tumor43 although its expression is reported to be suppressed in breast cancer.44 Heparan sulfate proteoglycans act as coreceptors for heparin-binding growth factors and ultimately lead to stimulation or inhibition of growth factor activity.45 A recent study by Midorikawa et al. determined a role for GPC3 in hepatocarcinogenesis. They showed that GPC3 interacts with FGF2, inhibits the BMP7 signaling pathway, and modulates the activities of FGF2 and BMP7.46 The modulation of growth factors by GPC3 suggests a growth-promoting effect for GPC3 in HCC tumorigenesis and the use of GPC3 as a drug target for HCC.
Human liver cancer cell lines established from primary HCC tumors are traditionally used to study the mechanisms of tumorigenesis because of the ease of procurement and handling of these cell lines. How robustly cell lines mimic the biologic behavior of primary tumors has not been well documented. Recently, the genetic profiles of 19 HCC cell lines were characterized and two subtypes of cell lines that were highly correlated to the expression of AFP were identified.47 The overexpression of AFP, however, is not consistent across primary HCC tumors, as shown in the current study and by others.12, 24 Our findings reveal that many molecular features characteristic of primary HCC were lacking in the cell lines, whereas other important features exhibited more augmented or diminished expression levels in cell lines. Comparative genomic hybridization studies indicate that aberrant regulation of gene expression in primary HCC and liver cancer cell lines is frequently caused by the gain or loss of chromosomal regions.48 In particular, overrepresentations occur more frequently on chromosome 1q.18 This correlates well with our observation that a disproportionate number of genes located on chromosome 1q showed high expression in HCC tumor relative to nontumor tissue specimens.
In conclusion, our findings, taken together with other array studies, highlight the consistent biologic associations of HCC tumorigenesis with gene expression profiles. They suggest candidate markers and pathways uncovered by this approach that are likely to be useful in the diagnosis and treatment of HCC.
Acknowledgements
The authors thank Chandramouli Gadisetti and Christos Sotiriou for advice in the initial analysis; Adaikalavan Ramasamy, Safia S. Rahman, and Linda K. H. Teng for technical assistance; and Lance D. Miller for critical review of the manuscript.
References
- 1
- 2
- 3
- 4. Hepatitis B virus infection. N Engl J Med 1997; 337: 1733–1745.
- 5. Exploring the new world of the genome with DNA microarrays. Nat Genet 1999; 21(Suppl): 33–37.
- 6. Genome-wide analysis of gene expression in human hepatocellular carcinomas using cDNA microarray: identification of genes involved in viral carcinogenesis and tumor progression. Cancer Res 2001; 61: 2129–2137.
- 7. Differential gene expression between chronic hepatitis B and C hepatic lesion. Gastroenterology 2001; 120: 955–966.
- 8. Identification of differentially expressed genes in hepatocellular carcinoma with cDNA microarrays. HEPATOLOGY 2001; 33: 832–840.
- 9. Identification of differentially expressed genes in hepatocellular carcinoma and metastatic liver tumors by oligonucleotide expression profiling. Cancer 2001; 92: 395–405.
- 10. Expression profiling suggested a regulatory role of liver-enriched transcription factors in human hepatocellular carcinoma. Cancer Res 2001; 61: 3176–3181.
- 11. Insight into hepatocellular carcinogenesis at transcriptome level by comparing gene expression profiles of hepatocellular carcinoma with those of corresponding noncancerous liver. Proc Natl Acad Sci USA 2001; 98: 15089–15094.
- 12. Gene expression patterns in human liver cancers. Mol Biol Cell 2002; 13: 1929–1939.
- 13. Analysis of gene expression in single live neurons. Proc Natl Acad Sci USA 1992; 89: 3010–3014.
- 14. Core biopsies can be used to distinguish differences in expression profiling by cDNA microarrays. J Mol Diag 2002; 4: 30–36.
- 15. Cluster analysis and display of genome-wide expression patterns. Proc Natl Acad Sci USA 1998; 95: 14863–14868.
- 16. The Gene Ontology Consortium. Gene ontology: tool for the unification of biology. Nat Genet 2000; 25: 25–29. (Available at: http://www.geneontology.org).
- 17. Bootstrap methods and their application. Cambridge: Cambridge University Press, 1997.
- 18. Recurrent chromosomal abnormalities in hepatocellular carcinoma detected by comparative genomic hybridization. Genes Chrom Cancer 1997; 18: 59–65.
- 19. Coexpression of BMI-1 and EZH2 polycomb group genes in Reed-Sternberg cells of Hodgkin's disease. Am J Pathol 2000; 157: 709–715.
- 20. Antisense oligonucleotides of hepatoma-derived growth factor (HDGF) suppress the proliferation of hepatoma cells. Hepatogastroenterology 2002; 49: 1639–1644.
- 21. Insulin regulation of human hepatic growth hormone receptors: divergent effects on biosynthesis and surface translocation. J Clin Endocrinol Metab 2000; 85: 4712–4720.
- 22
- 23. The anaphase-promoting complex is required in G1 arrested yeast cells to inhibit B-type cyclin accumulation and to prevent uncontrolled entry into S-phase. J Cell Sci 1997; 110: 1523–1531.
- 24
- 25. A possible role for insulin-like growth factor-binding protein-3 autocrine/paracrine loops in controlling hepatocellular carcinoma cell proliferation. Cell Growth Diff 2002; 13: 115–122.
- 26. Expression and clinical significance of erb-B receptor family in hepatocellular carcinoma. Br J Cancer 2001; 84: 1377–1383.
- 27. Specificity within the EGF family/ErbB receptor family signaling network. Bioessays 1998; 20: 41–48.
- 28. p53 de-ubiquitination: at the edge between life and death. Nat Cell Biol 2002; 4: E152–E153.
- 29. Ubiquitin enters the new millennium. Mol Cell 2001; 8: 499–504.
- 30. Regulation of transcriptional activation domain function by ubiquitin. Science 2001; 293: 1651–1653.
- 31. Ubiquitin-dependent proteolysis: its role in human diseases and the design of therapeutic strategies. Mol Genet Metab 2002; 77: 44–56.
- 32. Effects of the proteasome inhibitor PS-341 on apoptosis and angiogenesis in orthotopic human pancreatic tumor xenografts. Mol Cancer Ther 2002; 1: 1243–1253.
- 33. Insulin regulates heregulin binding and ErbB3 expression in rat hepatocytes. J Biol Chem 1996; 271: 13491–13496.
- 34. Changing neighbours, changing behaviour: cell adhesion molecule-mediated signaling during tumor progression. EMBO J 2003; 22: 2318–2323.
- 35. Role of β1 integrins in adhesion and invasion of hepatocellular cells. HEPATOLOGY 1999; 29: 68–74.
- 36. Identification of two novel human dynein light chain genes, DNLC2A and DNLC2B, and their expression changes in hepatocellular carcinoma tissues from 68 Chinese patients. Gene 2001; 281: 103–113.
- 37
- 38. Polycomb, epigenomes and control of cell identity. Cell 2003; 112: 599–606.
- 39. The polycomb group protein EZH2 is involved in progression of prostate cancer. Nature 2002; 419: 624–629.
- 40. Bmi-1 determines the proliferative capacity of normal and leukaemic stem cells. Nature 2003; 423: 255–260.
- 41. Insulin-like growth factors and cancer. Lancet Oncol 2002; 3: 298–302.
- 42. Role of insulin-like growth factors and their binding proteins in growth control and carcinogenesis. J Cell Physiol 2000; 183: 1–9.
- 43. Expression of glypican 3 (GPC3) in embryonal tumors. Int J Cancer 2000; 89: 418–422.
- 44. Glypican-3 expression is silenced in human breast cancer. Oncogene 2001; 20: 7408–7412.
- 45. Glypicans: proteoglycans with a surprise. J Clin Invest 2001; 108: 497–501.
- 46. Glypican-3, overexpressed in hepatocellular carcinoma, modulates FGF2 and BMP-7 signaling. Int J Cancer 2003; 103: 455–465.
- 47. Functional and genomic implications of global gene expression profiles in cell lines from human hepatocellular cancer. HEPATOLOGY 2002; 35: 1134–1143.
- 48. Novel recurrent genetic imbalances in human hepatocellular carcinoma cell lines identified by comparative genomic hybridization. HEPATOLOGY 1999; 29: 1208–1214.
Supporting Information
This article includes Supplementary tables and figures http://www.interscience.wiley.com/jpages/0270-9139/suppmat/2004/39/v39.944.html
| Filename | Size | Description |
|---|---|---|
| suppmat_944_fig1.eps | 3701K | Hierarchical cluster analysis of 218 genes that show differential expression between HCC tumor and nontumor liver tissues from 37 patients (P < 1 × 10⁻⁶, ≥1.5-fold change). For comparison, the same gene information from 14 different liver cancer cell lines was aligned. Rows represent individual genes, columns represent unique samples. Each cell in the matrix represents the expression level of a gene in an individual sample. The red and green color bars reflect high and low expression levels respectively, while black indicates equivalent expression level. The magnitude of the log-transformed ratio is reflected by the degree of color saturation. |
| suppmat_944_fig1B.eps | 6081K | Hierarchical clustering of 37 pairs of HCC tumor and nontumor liver samples using the 500 most variable gene features separated the tissues into two main groups, tumor and nontumor. Rows represent individual genes, columns represent unique samples. Each cell in the matrix represents the expression level of a gene feature in an individual tissue sample. The red and green color bars reflect high and low expression levels respectively, while black indicates equivalent expression level. The magnitude of the log-transformed ratio is reflected by the degree of color saturation. |
| suppmat_944_fig6.eps | 3799K | Supporting Information file suppmat_944_fig6.eps |
| suppmat_944_fig7.eps | 6264K | Supporting Information file suppmat_944_fig7.eps |
| suppmat_944_table1.xls | 56K | Genes significantly differentially expressed between HCC tumor and nontumor liver tissues. |
| suppmat_944_table2.xls | 93K | Composite of UniGenes that overlapped in the intersection between the Chen et al. 600 significant gene list and the GIS microarray expression dataset. |
| suppmat_944_table3.xls | 76K | Composite of UniGenes that overlapped in the intersection between the GIS 218 significant gene list and the Chen et al. microarray expression dataset. |
| suppmat_944_table4.xls | 35K | Composite of UniGenes that overlapped in the intersection between the GIS and Chen et al. significant gene lists. |
