Hepatocellular carcinoma is one of the most lethal cancers worldwide. More accurate stratification of patients at risk is necessary to improve its clinical management. As epithelial–mesenchymal transition is critical for the invasiveness and metastasis of human cancers, we investigated the expression profiles of 12 genes related to epithelial–mesenchymal transition by real-time polymerase chain reaction. From a univariate Cox analysis of a training cohort of 128 hepatocellular carcinoma patients, four candidate genes (E-cadherin [CDH1], inhibitor of DNA binding 2 [ID2], matrix metalloproteinase 9 [MMP9], and transcription factor 3 [TCF3]) with significant prognostic value were selected to develop a risk score for patient survival. Patients with high risk scores calculated from the four-gene signature showed significantly shorter overall survival times. Moreover, the multivariate Cox analysis revealed that the four-gene signature (P = 0.0026) and tumor stage (P = 0.0023) were independent prognostic factors for overall survival. Subsequently, the four-gene signature was validated in an independent cohort of 231 patients from three institutions, in which a high risk score was significantly correlated with shorter overall survival (P = 0.00011) and disease-free survival (P = 0.00038). When the risk score was entered in a multivariate Cox analysis with tumor stage only, both the risk score (P = 0.0046) and tumor stage (P = 2.6 × 10⁻⁹) emerged as independent prognostic factors. In conclusion, we suggest that the proposed gene signature may improve the prediction accuracy for survival of hepatocellular carcinoma patients, and complement prognostic assessment based on important clinicopathologic parameters such as tumor stage. (Cancer Sci 2010)
Hepatocellular carcinoma (HCC) is the fifth most common cancer worldwide and the most common primary hepatic malignancy, being responsible for 80% of malignant tumors in adult livers. Moreover, its mortality is third among all cancers, behind only lung and colon cancer.(1) HCC is known for its endemic prevalence in Asia and Africa, and the incidence of HCC has doubled in the USA and Europe in the past four decades.(1–3) HCC is resistant to conventional chemotherapy and is rarely amenable to radiotherapy,(4) leaving this disease with no effective therapeutic options and a very poor prognosis. Although the major etiological agents have been identified, the molecular pathogenesis of HCC remains unclear.(5) It is therefore important to identify molecular targets to develop novel diagnostic, therapeutic, and preventive strategies.
Epithelial–mesenchymal transition (EMT) is a key step during embryogenesis but also plays a critical role in cancer progression, through which epithelial cancers invade and metastasize.(6) Therefore, EMT-related pathways have been studied in relation to cancer management and drug resistance, for instance in breast cancer(7) and ovarian cancer.(8) The existence of EMT in vivo has been controversial because its spatial and temporal heterogeneity complicates direct observation in the clinic.(6) Nevertheless, several EMT markers have been analyzed in clinical specimens and cell lines in vitro.(9–11) Meta-analysis of gene expression profiles in HCC revealed three robust subclasses of HCC.(12) Interestingly, one of the subgroups was characterized by overexpression of transforming growth factor-β (TGF-β) target gene sets including genes involved in EMT, and this subgroup was correlated with early recurrence. In another recent study analyzing EMT markers in HCC, the protein expression levels of E-cadherin, Snail (SNAI1), Slug, and Twist were evaluated by immunohistochemistry in 123 HCC samples, and a significant association of Snail and Twist with prognosis was revealed.(13) Thus, we hypothesized that gene expression profiling of EMT markers in a large number of HCC patients could provide a basis for prognostic predictors of patient outcomes.
In the present study, to construct a reliable prognostic gene signature that could identify HCC patients with a high risk of death, we examined the expression of twelve genes related to EMT by quantitative real-time polymerase chain reaction (PCR). Four genes (E-cadherin [CDH1], inhibitor of DNA binding 2 [ID2], matrix metalloproteinase 9 [MMP9], and transcription factor 3 [TCF3]) were selected as highly predictive of survival in the training cohort of 128 patients. The four-gene signature was positively validated in an independent cohort of 231 patients from three institutions. Thus, the novel four-gene signature may be useful to refine a patient’s prognosis and improve clinical management.
Materials and Methods
Patients and tissue samples.
The study comprised patient cohorts from three medical institutions. The training cohort included 128 randomly selected patients who underwent curative hepatectomy for primary HCC between 2001 and 2005 in the Department of Surgery, Samsung Medical Center (SMC), Korea. The validation cohort comprised three patient cohorts from three medical centers: 104 additional independent cases randomly selected from patients who underwent curative hepatectomy for primary HCC between 2001 and 2005 at the SMC, 94 randomly selected cases from patients who underwent curative hepatectomy for primary HCC between 1995 and 2004 at Ajou University Medical Center (AMC), and 33 randomly selected cases from patients who underwent curative hepatectomy for primary HCC between 2001 and 2004 at Hanyang University Medical Center (HMC). Patient characteristics for patient cohorts are summarized in Table 1. The study protocol was approved by the Institutional Review Boards of SMC, AMC, and HMC. Complete clinical data were available in all cases, except for two patients with unknown hepatitis C virus (HCV) infection, one patient with unknown alpha fetoprotein (AFP) level, and three patients with unknown status of liver cirrhosis from AMC. All patients had adequate liver function reserve, and had survived for at least 2 months after hepatectomy. Recurrence or death was evaluated from medical records of patients. We defined the recurrence as evidence of an overt new growing mass in the remaining liver or as distant metastasis in radiologic studies including computed tomography or magnetic resonance imaging. None of the patients had received treatment prior to surgery such as transarterial chemoembolization or radiofrequency ablation. Immediately after hepatectomy, fresh tumors and background livers were partly snap-frozen in liquid nitrogen and stored at −80°C and were partly embedded in paraffin after fixation in 10% formalin for histological diagnosis. 
All available hematoxylin–eosin stained slides were reviewed. The tumor grading was based on the criteria proposed by Edmondson and Steiner (I, well differentiated; II, moderately differentiated; III, poorly differentiated; IV, undifferentiated).(14) The conventional TNM system outlined in the cancer staging manual (6th ed.) by the American Joint Committee on Cancer was used in tumor staging. The tumor size was obtained from the pathology reports.
Table 1. Clinical characteristics of the training and validation cohorts (N = 359)
Columns: training cohort, SMC (n = 128); validation cohort, SMC (n = 104); validation cohort, AMC (n = 94); validation cohort, HMC (n = 33). Surviving row label: follow-up period, months.
There are two patients with unknown hepatitis C virus (HCV) infection, one patient with unknown alpha fetoprotein (AFP) level, and three patients with unknown status of liver cirrhosis from Ajou University Medical Center (AMC). HBV, hepatitis B virus; HMC, Hanyang University Medical Center; SMC, Samsung Medical Center.
RNA extraction and cDNA synthesis.
RNA extraction and cDNA synthesis were carried out as described previously.(15,16) Briefly, total RNA was extracted from cancerous and surrounding non-cancerous frozen tissues using an RNeasy Mini Kit (Qiagen, Hilden, Germany). The integrity of all tested total RNA samples was verified using a Bioanalyzer 2100 (Agilent Technologies, Santa Clara, CA, USA). DNase I treatment was routinely included in the extraction step. Samples containing 4 μg of total RNA were incubated with 2 μL of 10 μM oligo d(T)18 primer (Genotech, Daejeon, Korea) at 70°C for 7 min and cooled on ice for 5 min. After adding the enzyme mix to the annealed total RNA sample, the reaction was incubated for 90 min at 42°C prior to heat inactivation of the reverse transcriptase at 80°C for 10 min. The cDNA samples were brought up to a final volume of 400 μL with the addition of diethylpyrocarbonate (DEPC)-treated water.
Quantitative real-time PCR.
Real-time PCR amplifications were carried out as described previously.(15,16) Briefly, using Applied Biosystems Prism 7900HT instruments (Applied Biosystems, Foster City, CA, USA), the real-time PCR analysis was performed in a total volume of 10 μL with the following amplification steps: an initial activation step at 95°C for 10 min, followed by 45 cycles of denaturation at 95°C for 15 s and elongation at 60°C for 1 min. The primer and probe sequences were designed using Primer Express 3.0 software (Applied Biosystems), and all probe sequences were labeled with FAM at the 5′ end and with TAMRA at the 3′ end (Table 2). The mRNA levels of target genes (CDH1, CDH2, ID2, MMP2, MMP9, TCF3, TWIST1, vascular endothelial growth factor A [VEGFA], SNAI1, SNAI2, zinc finger E-box binding homeobox 1 [ZEB1], and ZEB2) were measured (the threshold cycle, CT) in triplicate and were then normalized relative to a set of reference genes (beta-2-microglobulin [B2M], GAPDH, hydroxymethylbilane synthase [HMBS], hypoxanthine phosphoribosyltransferase 1 [HPRT1], and succinate dehydrogenase complex, subunit A, flavoprotein [SDHA]) by subtracting the average of the expression of the five reference genes as an internal control.(17) Using the ΔCT values (target gene CT − average CT of reference genes), the mRNA copy number ratio was calculated as 2^(−ΔCT). Standard curves were constructed from the results of simultaneous amplifications of serial dilutions of the cDNA samples.
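The normalization described above can be illustrated with a minimal sketch. It is written in Python for illustration only (the study reports using R), the CT values are hypothetical, and averaging the triplicate wells before taking the difference is an assumption, since the text does not state the order of operations.

```python
from statistics import mean

def relative_copy_number(target_ct_replicates, reference_ct_by_gene):
    """Return (delta_ct, 2 ** -delta_ct) for one target gene in one sample."""
    # Assumption: triplicate wells are averaged first.
    target_ct = mean(target_ct_replicates)
    # Average CT across the reference genes (each itself a mean of replicates).
    reference_ct = mean(mean(cts) for cts in reference_ct_by_gene.values())
    delta_ct = target_ct - reference_ct
    return delta_ct, 2.0 ** (-delta_ct)

# Hypothetical CT values, for illustration only (not data from the study).
reference_cts = {
    "B2M": [18.0, 18.0, 18.0],
    "GAPDH": [19.0, 19.0, 19.0],
    "HMBS": [26.0, 26.0, 26.0],
    "HPRT1": [25.0, 25.0, 25.0],
    "SDHA": [22.0, 22.0, 22.0],
}
delta_ct, ratio = relative_copy_number([24.0, 24.0, 24.0], reference_cts)
print(delta_ct, ratio)
```

A larger ΔCT means the target crossed the threshold later than the reference genes, so the resulting copy number ratio is smaller.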
Table 2. Oligonucleotide sequences of PCR primers and probes
Statistical analysis.
Clinicopathologic variables of the training and validation cohorts were compared using the χ²-test or Fisher’s exact test (HCV infection). The gene expression data were log2-transformed. After transformation, results for each gene were centered and scaled to an average of 0 and an SD of 1. For the training cohort, univariate Cox regression analyses were run for each gene, as summarized in Table 3. Genes achieving P-values of <0.05 in the univariate Cox analyses were then entered as potential predictors of patient risk. The risk score was derived by the summation of each gene expression level multiplied by its corresponding regression coefficient.(18) The classification accuracy was measured by the area under the curve (AUC) of the receiver operating characteristic (ROC) curve. A multivariate Cox proportional hazard model was then used to identify independent prognostic factors for overall survival (OS). Kaplan–Meier survival curves were calculated using tumor recurrence (defined as the first appearance of a tumor at any site following definitive treatment) or death as the end points. The differences between OS curves or disease-free survival (DFS) curves were examined by the log-rank test. Differences in gene expression levels between HCC and non-cancerous tissues were evaluated by Student’s t-test. All tests were two-tailed, with a P-value of <0.05 considered statistically significant. All statistical analyses were done with the open source statistical programming environment R.
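As an illustration of the Kaplan–Meier curves mentioned above, the following is a minimal sketch of the product-limit estimator. It is written in Python for illustration only (the study used R), the follow-up times and event flags are hypothetical, and the function name is our own. It does not implement the log-rank test.

```python
def kaplan_meier(times, events):
    """Product-limit survival estimate.

    times:  follow-up time for each patient
    events: 1 if the end point (e.g. death) occurred, 0 if censored
    Returns a list of (time, survival probability) at each event time.
    """
    at_risk = len(times)
    survival = 1.0
    curve = []
    for t in sorted(set(times)):
        deaths = sum(1 for ti, ev in zip(times, events) if ti == t and ev == 1)
        leaving = sum(1 for ti in times if ti == t)  # deaths plus censored at t
        if deaths:
            survival *= 1.0 - deaths / at_risk
            curve.append((t, survival))
        at_risk -= leaving
    return curve

# Hypothetical follow-up in months (illustration only, not study data).
times = [6, 12, 12, 24, 36, 48]
events = [1, 1, 0, 1, 0, 0]
curve = kaplan_meier(times, events)
for t, s in curve:
    print(t, round(s, 3))
```

Censored patients leave the risk set without lowering the curve, which is why the estimate only steps down at times where an event was observed.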
Table 3. Univariate Cox regression analysis of OS according to gene expression in the training cohort
Results
We performed quantitative real-time PCR for 12 genes (CDH1, CDH2, ID2, MMP2, MMP9, TCF3, TWIST1, VEGFA, SNAI1, SNAI2, ZEB1, and ZEB2) related to the EMT process from frozen paired samples derived from a training cohort of 128 patients with HCC. Expressions of these 12 genes were measured in triplicate and were then normalized relative to the expression of a set of reference genes (B2M, GAPDH, HMBS, HPRT1, and SDHA) as an internal control.(17) The log2-transformed gene expression levels were centered and scaled to an average of 0 and an SD of 1. Gene expression levels were correlated with OS by univariate Cox regression analysis and the genes were ranked according to their effect on OS (Table 3). Four genes were significantly correlated with the OS of patients: CDH1 and ID2 were protective genes (associated with a hazard ratio of less than 1), and MMP9 and TCF3 were risk genes (associated with a hazard ratio of more than 1). These genes were then profiled in 40 paired noncancerous hepatic samples from the training cohort. Protective genes were down-regulated and risk genes were up-regulated in HCC compared to noncancerous livers (Fig. 1).
A patient’s risk score was derived by the summation of each gene expression level multiplied by its corresponding coefficient, as follows: risk score = (−0.333 × CDH1) + (−0.400 × ID2) + (0.339 × MMP9) + (0.387 × TCF3), wherein CDH1, ID2, MMP9, and TCF3 refer to the log2-transformed and normalized results for each gene. The AUC of ROC showing prediction of patient survival by risk score was 0.772 (Fig. 2a). The cut-off value of the risk score (θ = 0.303) was determined from the ROC of patient survival in the training cohort with 62.5% sensitivity, 80.6% specificity, and 75.0% accuracy (Fig. 2a). The risk score was used to classify patients into high (>0.303) or low (<0.303) risk groups, where high risk indicated poor survival. At the 5- and 7-year follow-up, approximately 80% of the low-risk group had survived at both time points, whereas 52% and 36% of the high-risk group had survived, respectively (Fig. 2b). The log-rank test showed that patients with a high risk score had a significantly shorter OS time (P = 3.5 × 10⁻⁶). Univariate Cox analysis of clinicopathologic parameters revealed that tumor grade (P = 0.0042), AFP level (P = 0.0052), tumor size (P = 0.00058), tumor stage (P = 3.1 × 10⁻¹⁰), vascular invasion (P = 0.00077), and tumor number (P = 8.9 × 10⁻⁷) were significant prognostic factors for OS. The risk score was included in a multivariate Cox regression analysis with clinicopathologic parameters. The risk score (P = 0.0026) and tumor stage (P = 0.0023) emerged as independent prognostic factors (Table 4). When patients were divided into subgroups according to tumor stage, patients with a high risk score had a significantly shorter OS time for both stage I (P = 5.6 × 10⁻⁵) and stage III/IV tumors (P = 0.0097, log-rank test; Fig. 2c,d). Among the stage I patients, five out of eight high-risk patients died within the follow-up period, whereas one out of 40 low-risk patients died during follow-up, resulting in an accuracy of 92% for OS.
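The risk score formula and cut-off above can be expressed as a short sketch. It is written in Python for illustration only (the study used R). The coefficients and the cut-off are those reported in the text; the function names, the use of the sample SD for scaling, and the example patient values are assumptions. The last lines recompute the stage I accuracy from the counts given above.

```python
from statistics import mean, stdev

# Coefficients and cut-off as reported for the training cohort.
COEFFICIENTS = {"CDH1": -0.333, "ID2": -0.400, "MMP9": 0.339, "TCF3": 0.387}
CUTOFF = 0.303

def standardize(log2_values):
    """Center and scale one gene's log2 expression values across a cohort."""
    mu = mean(log2_values)
    sd = stdev(log2_values)  # assumption: sample SD; the text does not specify
    return [(v - mu) / sd for v in log2_values]

def risk_score(z):
    """Weighted sum of standardized expression values for the four genes."""
    return sum(COEFFICIENTS[gene] * z[gene] for gene in COEFFICIENTS)

def risk_group(score):
    """Classify a patient as high or low risk using the reported cut-off."""
    return "high" if score > CUTOFF else "low"

# Hypothetical standardized values for one patient (illustration only).
patient = {"CDH1": -1.0, "ID2": -0.5, "MMP9": 1.0, "TCF3": 0.5}
score = risk_score(patient)
print(round(score, 4), risk_group(score))

# Stage I counts from the text: 5 of 8 high-risk patients died and
# 39 of 40 low-risk patients survived, so 44 of 48 were classified correctly.
stage1_accuracy = (5 + 39) / (8 + 40)
print(round(stage1_accuracy, 3))
```

Because the protective genes carry negative coefficients, lower CDH1 or ID2 expression raises the score, as does higher MMP9 or TCF3 expression.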
Table 4. Multivariate Cox regression analysis for OS in the training and validation cohort
Next, we tested the prognostic value of the four-gene risk score in independent cohorts of 104 cases from SMC, 94 cases from AMC, and 33 cases from HMC. The validation cohort from SMC contained a higher proportion of patients younger than 55 years (P = 0.017) and patients with hepatitis B virus (HBV) (P = 0.026) compared to the training cohort (Table 1). The validation cohort from AMC contained a higher proportion of patients with high tumor grade (P < 0.001), liver cirrhosis (P < 0.001), high tumor stage (P < 0.001), and large tumor size (P = 0.026) compared to the training cohort. The validation cohort from HMC contained a higher proportion of patients with high tumor grade (P < 0.001), high tumor stage (P = 0.023), and multiple tumors (P = 0.0063) compared to the training cohort. Consequently, the validation cohorts from AMC and HMC had worse prognosis compared to the training cohort from SMC. In the validation cohort, both the regression coefficients of risk score and the cut-off value derived from the training cohort were applied directly. For the SMC cohort, AUC of ROC showing prediction of patient survival by risk score was 0.743 (Fig. 3a) and patients with a high risk score (>0.303) had a significantly shorter OS time (P = 0.00064, log-rank test; Fig. 3b). The difference in OS time remained significant for a slightly higher cut-off value (P = 4.7 × 10⁻⁵, log-rank test; Fig. 3c). For the AMC cohort, AUC of ROC was 0.613 (Fig. 3d) and patients with a high risk score (>0.303) had a significantly shorter OS time (P = 0.016, log-rank test; Fig. 3e). The difference in OS time remained significant for a slightly higher cut-off value (P = 0.00036, log-rank test; Fig. 3f). However, for the HMC cohort, AUC of ROC was 0.68 (Fig. 3g) and patients with a high risk score (>0.303) had a shorter OS time, but the difference was not statistically significant (P = 0.17, log-rank test; Fig. 3h).
For the whole validation cohort (n = 231), AUC of ROC showing prediction of patient survival by risk score was 0.652 (Fig. 4a) and patients with a high risk score (>0.303) had a significantly shorter OS time (P = 0.00011, log-rank test; Fig. 4b). In addition, patients with a high risk score had a significantly shorter DFS time (P = 0.00038, log-rank test; Fig. 4c). Univariate Cox analysis of clinicopathologic parameters revealed that tumor grade (P = 7.1 × 10⁻⁶), AFP level (P = 2.1 × 10⁻⁵), liver cirrhosis (P = 0.0064), tumor size (P = 0.00017), tumor stage (P = 7.8 × 10⁻¹¹), vascular invasion (P = 1.1 × 10⁻⁶), and tumor number (P = 3.6 × 10⁻¹⁰) were significant prognostic factors for OS. However, the risk score was not an independent prognostic factor in a multivariate Cox analysis with all the important clinicopathologic parameters. When the risk score was entered in a multivariate Cox analysis with tumor stage only, both the risk score (P = 0.0046) and tumor stage (P = 2.6 × 10⁻⁹) emerged as independent prognostic factors (Table 4). On the other hand, in a multivariate Cox analysis for the risk score treated as a continuous variable, the risk score (P = 0.012), liver cirrhosis (P = 0.0056), tumor number (P = 0.00094), and vascular invasion (P = 0.0073) emerged as independent prognostic factors (data not shown). When patients were further stratified into subgroups according to tumor stage, patients with a high risk score had a significantly shorter OS time (P = 0.049) and DFS time (P = 0.024, log-rank test) for stage III–IV tumors (Fig. 4d,e).
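The AUC values reported for each cohort, and the sensitivity and specificity at a fixed cut-off, can be understood through the following minimal sketch. It is written in Python for illustration only (the study used R). It assumes the outcome is a binary indicator of death during follow-up, which the text does not spell out, and the scores and outcomes are hypothetical.

```python
def roc_auc(scores, died):
    """AUC as the probability that a patient who died has a higher score
    than a patient who survived (ties count as one half)."""
    pos = [s for s, d in zip(scores, died) if d]
    neg = [s for s, d in zip(scores, died) if not d]
    if not pos or not neg:
        raise ValueError("need at least one patient in each outcome group")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))

def sensitivity_specificity(scores, died, cutoff):
    """Sensitivity and specificity when score > cutoff predicts death."""
    tp = sum(1 for s, d in zip(scores, died) if d and s > cutoff)
    fn = sum(1 for s, d in zip(scores, died) if d and s <= cutoff)
    tn = sum(1 for s, d in zip(scores, died) if not d and s <= cutoff)
    fp = sum(1 for s, d in zip(scores, died) if not d and s > cutoff)
    return tp / (tp + fn), tn / (tn + fp)

# Hypothetical risk scores and outcomes (illustration only, not study data).
scores = [0.9, 0.4, 0.35, 0.1, -0.2, -0.5]
died = [1, 1, 0, 0, 1, 0]
auc = roc_auc(scores, died)
sens, spec = sensitivity_specificity(scores, died, cutoff=0.303)
print(round(auc, 3), round(sens, 3), round(spec, 3))
```

The AUC summarizes discrimination over all possible cut-offs, whereas sensitivity and specificity describe a single cut-off, such as the 0.303 threshold carried over from the training cohort.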
Discussion
HCC is a highly heterogeneous disease, and even in patients with similar clinical and pathological features, the outcome varies. Staging systems for HCC that are based on clinical and pathological findings can be complemented by molecular methods that add predictive power for patient outcomes. Gene-expression profiling with the use of microarrays or real-time PCR has been utilized to identify molecular classifications of patients with HCC.(19) However, the use of microarrays in clinical practice is limited by the large number of genes and relatively complex methodology involved.(20,21) On the other hand, quantitative real-time PCR involving a small number of genes allows for accurate and reproducible quantification of RNA obtained from both frozen tissues and paraffin-embedded tissues.(22,23) Thus, a gene signature based on real-time PCR may offer a more convenient clinical application.
Currently, there is no clear molecular classification of HCC.(19) In a study utilizing 91 HCC samples, a 406-gene signature could classify patients with significant differences in survival.(24) This gene signature revealed that transcripts related to cell proliferation, apoptosis, histone modification, and ubiquitination were important discriminators of patient survival. Subsequently, a subpopulation of patients with progenitor cell characteristics was found to be correlated with poor prognosis.(25) Another study utilized a 153-gene signature generated from 40 HCC patients to discriminate patients with different risk levels of death.(26) In addition, multiple gene signatures have been proposed to predict recurrence in HCC (12-gene,(27) 20-gene,(28) and 57-gene(29) signatures). Gene expression signatures for predicting HCC prognosis may therefore not be unique. Similarly, multiple gene expression signatures have been developed for predicting the prognosis of breast cancers, including 21-gene,(30) 70-gene,(31) and 76-gene signatures.(32) Although these gene signatures contained largely non-overlapping genes, their prognostic values were significant.
In this study, we evaluated 12 genes related to EMT processes and constructed a prognostic four-gene signature (CDH1, ID2, MMP9, and TCF3) for HCC. Not surprisingly, the prediction accuracy of the four-gene signature was best when applied to the validation cohort from SMC, which was most similar to the training cohort in patient characteristics. The AUCs were smaller in the validation cohorts from AMC and HMC, and the four-gene risk score did not achieve statistically significant classification at the designated score threshold for the validation cohort from HMC. However, the prognostic value of the gene-expression signature was positively validated in the total validation cohort from three institutions. Multivariate analysis further strengthened the finding that the four-gene signature was an independent prognostic factor along with tumor stage, thus complementing traditional clinicopathologic parameters.
The four genes in our model are closely related to tumor invasion and metastasis. E-cadherin encoded by CDH1 is the most prominent epithelial marker as the main molecule of adherens junctions.(6) A decreased expression of E-cadherin in HCC has been reported(33,34) and correlated with poor prognosis.(13) ID2 encoded by the ID2 gene belongs to a helix-loop-helix family of proteins and represses EMT induced by TGF-β in epithelial cells.(35) Decreased ID2 expression was correlated with shorter DFS in HCV-related HCC patients.(36) ID2 was also found in the 57-gene signature for predicting HCC recurrence.(29) At the protein level, decreased ID2 expression was correlated with de-differentiation of HCC.(37) MMPs have been found to be up-regulated in EMT cells(38) but are also capable of inducing EMT.(39) MMP9 overexpression has been linked to the growth of small HCC(40,41) and elevated plasma MMP9 levels have been observed in patients with HCC.(42) Overexpression of MMP9 protein has been reported to be correlated with poor prognosis of HCC patients.(43) Interestingly, the expression level of TCF3 was significantly associated with prognosis in our analysis, yet little is known about its relation to HCC. E12/E47 encoded by TCF3 and Twist encoded by TWIST1 are potent repressors of E-cadherin expression.(44,45) Expression of Twist has been reported to be significantly correlated with prognosis in HCC.(46) In another recent study analyzing EMT markers in HCC, a significant association of Snail and Twist with prognosis was revealed.(13) It is not clear why SNAI1 and TWIST1 were not significant prognostic factors in our patient cohort. We hypothesize that TCF3 may play a regulatory role similar to TWIST1, as suggested by its close correlation with prognosis in our patient cohort.
In conclusion, we found that the novel four-gene expression signature was associated with the prognosis of HCC patients. This signature could be useful in stratifying patients according to risk beyond traditional clinicopathologic parameters. Moreover, a quantitative real-time PCR assay is convenient in terms of the work load and is applicable for routine clinical use. Therefore, this new gene expression signature merits further study as a basis for selecting high-risk HCC patients.
This work was supported by intramural research funds from CbsBioscience, Inc (CBS-08-71).