Genome-wide DNA methylation profiles in hepatocellular carcinoma


  • Potential conflict of interest: Nothing to report.

  • This work was supported by the National Institutes of Health (grants R01 ES005116, P30 ES009089, P30 CA013696, and R03 CA150140).


Alterations in DNA methylation frequently occur in hepatocellular cancer (HCC). We have previously demonstrated that hypermethylation in candidate genes can be detected in plasma DNA before HCC diagnosis. To identify, with a genome-wide approach, additional genes hypermethylated in HCC that could be used for more accurate analysis of plasma DNA for early diagnosis, we analyzed tumor and adjacent nontumor tissues from 62 Taiwanese HCC cases using Illumina methylation arrays (Illumina, Inc., San Diego, CA) that screen 26,486 autosomal CpG sites. After Bonferroni adjustment, a total of 2,324 CpG sites significantly differed in methylation level, with 684 CpG sites significantly hypermethylated and 1,640 hypomethylated in tumor, compared to nontumor tissues. Array data were validated with pyrosequencing in a subset of five of these genes; correlation coefficients ranged from 0.92 to 0.97. Analysis of plasma DNA from 38 cases demonstrated that 37%-63% of cases had detectable hypermethylated DNA (≥5% methylation) for these five genes individually. At least one of these genes was hypermethylated in 87% of the cases, suggesting that measurement of DNA methylation in plasma samples is feasible. Conclusion: The panel of methylated genes identified in the current study will be further tested in a large cohort of prospectively collected samples to determine their utility as early biomarkers of HCC. (HEPATOLOGY 2012;55:1799–1810)

Hepatocellular carcinoma (HCC) is a complex disease and is likely the result of the accumulation of both genetic and epigenetic aberrations. A number of mutations have been observed in HCC, most frequently in p53.1 Gene-expression studies have found profiles associated with survival, recurrence, and metastasis.2 These changes in gene expression may be related to gene-specific DNA hyper- or hypomethylation, as has been reviewed elsewhere.3

Most previous methylation studies looked at one or a few genes at a time,4-11 although 105 genes were analyzed in one study.12 Though reasonably consistent results have been observed across studies, the exact frequencies of hypermethylation in tumor tissues differ. CDKN2A/INK4 (p16) is methylated in 30%-70% of HCCs.13-16 RASSF1A is methylated in up to 85% of HCCs,15, 17 GSTP1 in 50%-90%,18-20 and MGMT in 40%.21 Our studies also observed that frequent methylation of particular genes correlated with aflatoxin B1 (AFB1)-DNA adduct levels in liver tissues.15, 16, 18, 21 We found correlations between gene-specific hypermethylation in tumor tissue and plasma DNA using blood collected at the time of diagnosis.16 Using samples from a prospective ∼25,000-subject cohort, we found that methylation of three genes (RASSF1A, CDKN2A, and INK4B [p15]) in plasma DNA was predictive of later HCC development.22 These previous studies used a candidate gene approach.

To identify additional differentially methylated genes with a genome-wide approach, we used Illumina Infinium HumanMethylation27K arrays (Illumina, Inc., San Diego, CA) to analyze 27,578 CpG sites covering 14,495 genes in paired HCC tumor and adjacent nontumor tissues. The aims of the current study were first to identify DNA-methylation markers that significantly differentiate tumor tissue from adjacent nontumor tissue and then to test the feasibility of detecting the hypermethylated markers in plasma samples and their correlations with relevant liver tissues. Because plasma DNAs are mostly derived from necrotic or apoptotic cells with little released from white blood cells, it is appropriate to use plasma to study circulating tumor DNA.23 Recently, three other studies have reported DNA-methylation profiles in HCC tumor/adjacent tissues using Illumina arrays. Two studies used Illumina 1,500 Golden Gate arrays on five paired samples from Korea and 30 from France and the third used Illumina Human Methylation27K arrays on 12 samples from Germany.24-26 A fourth earlier study used methylated CpG-island amplification microarrays to study 6,458 CpG islands in 10 paired samples from Japan.27 These previous studies differed in sample size, technology used, and the major etiologic cause (i.e., hepatitis B virus [HBV], hepatitis C virus [HCV], and alcohol). The current study is the largest to date and is comprised of Taiwanese cases who are predominantly HBV positive.


Abbreviations: 5-HT, 5-hydroxytryptamine; AFB1, aflatoxin B1; bp, base pairs; HBV, hepatitis B virus; HBsAg, hepatitis B surface antigen; HCC, hepatocellular carcinoma; HCV, hepatitis C virus; NTU, National Taiwan University; PCR, polymerase chain reaction; QC, quality control; SD, standard deviation; TSS, transcription start site.

Patients and Methods

Patients and Biopsy Specimens.

This study was approved by the institutional review boards of Columbia University (New York, NY) and National Taiwan University (NTU; Taipei, Taiwan). Written informed consent was obtained. Sixty-six frozen liver tissues collected in the Department of Surgery at NTU Hospital were assayed. Demographic data and clinicopathologic characteristics were obtained from hospital charts; HBV (hepatitis B surface antigen; HBsAg) and HCV (anti-HCV) status were determined by immunoassay. For 39 subjects missing HCV status, liver tissues were stained with a monoclonal antibody against nonstructural protein 3 (Novocastra Laboratories Ltd., Newcastle upon Tyne, UK). Specimens were kept at −70°C until shipment to Columbia University, where pathologic analysis confirmed HCC status and indicated that adjacent tissues were primarily cirrhotic. Blood specimens were collected at the time of diagnosis for 30 patients, and the plasma was frozen. Plasma from 8 additional cases from the same hospital was included in the analysis.

DNA Preparation and Illumina Infinium Human Methylation Platform.

DNA was extracted by standard proteinase K/RNase treatment and phenol/chloroform extraction. Plasma DNAs were extracted using DNeasy Blood and Tissue Kits (Qiagen, Valencia, CA). Bisulfite modification of 1 μg of DNA was conducted using an EZ DNA Methylation Kit (Zymo Research, Irvine, CA).

The HumanMethylation27 DNA Analysis BeadChips (Illumina) were used to interrogate 27,578 highly informative CpG sites covering 14,495 genes, following the manufacturer's standard protocol. Paired samples (i.e., HCC tumor and adjacent nontumor tissues) were processed on the same chip to avoid chip-to-chip variation; four pairs of tissues were repeat-assayed as a quality control (QC). Information on location of CpG sites in the promoter regions was provided by Dr. Kim.28

Pyrosequencing for Candidate Gene Methylation.

Pyrosequencing was carried out with primers designed with Pyromark Assay Design software (version 2.0; Qiagen). The region selected for interrogation included the CpG sites identified to be differentially methylated based on the array data, as well as surrounding sites. Polymerase chain reaction (PCR) was performed in a 25-μL reaction mix containing 50 ng of bisulfite-converted DNA, 1× Pyromark PCR Master Mix (Qiagen), 1× Coral Load Concentrate (Qiagen), and 0.3 μM forward and 5′-biotinylated reverse primers, using the cycling conditions outlined in Supporting Table 1. Each set of amplifications included bisulfite-converted CpGenome universal methylated (Millipore, Billerica, MA), unmethylated (whole genome amplified DNA), and nontemplate controls.

The sequencing reaction and quantitation of methylation were conducted using a PyroMark Q24 instrument and software (Qiagen). Percent methylation was calculated by averaging across all CpG sites interrogated. A plasma DNA sample was considered positive if percent methylation was ≥5%, because lower values are not reliable.29
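The positivity rule described above can be illustrated with a short Python sketch. This is not the PyroMark software's calculation; the function names and the example readings are hypothetical, and only the averaging step and the 5% cutoff come from the text.

```python
from statistics import mean

POSITIVITY_CUTOFF = 5.0  # percent methylation; the cutoff stated in the text


def percent_methylation(cpg_percentages):
    """Average percent methylation across all CpG sites interrogated in one assay."""
    return mean(cpg_percentages)


def is_positive(cpg_percentages, cutoff=POSITIVITY_CUTOFF):
    """Call a plasma sample positive when its mean methylation is >= cutoff."""
    return percent_methylation(cpg_percentages) >= cutoff


# Hypothetical per-site readings (percent) for two plasma samples.
sample_a = [3.0, 8.0, 7.0]  # mean 6.0, at or above the cutoff
sample_b = [2.0, 4.0, 3.0]  # mean 3.0, below the cutoff
```

Because the call is made on the per-gene mean, a single high site does not by itself make a sample positive.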

Statistical Methods.

β-values were generated using the Illumina BeadStudio software.30 Sites on the sex chromosomes were removed from the analysis, leaving 26,486 autosomal sites. For QC, methylation measures with a detection P value >0.05 and samples with CpG coverage <95% were removed. This eliminated four pairs, with a final sample size of 62 paired tissues. For these samples, the control panel in the BeadStudio analytical software showed excellent intensity for staining (>15,000), clear clustering for the hybridization probes, good target-removal intensity (<400), and satisfactory bisulfite conversion.31 Demographic data for the 62 patients are presented in Table 1.
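As a rough illustration of the QC filter described above, the following Python sketch drops a tumor/adjacent pair when either sample has fewer than 95% of its CpG sites with a detection P value ≤0.05. Removing the whole pair when one member fails is an assumption made here to match the statement that four pairs were eliminated; the function names and toy values are hypothetical and are not BeadStudio output.

```python
def usable_fraction(detection_p, p_cutoff=0.05):
    """Fraction of CpG sites in one sample whose detection P value is <= p_cutoff."""
    usable = sum(p <= p_cutoff for p in detection_p)
    return usable / len(detection_p)


def qc_pairs(pairs, p_cutoff=0.05, min_coverage=0.95):
    """Keep a tumor/adjacent pair only if both samples reach the coverage threshold.

    pairs: dict mapping a pair ID to (tumor_detection_p, adjacent_detection_p).
    Returns the list of pair IDs that pass QC, in input order.
    """
    kept = []
    for pair_id, (tumor_p, adjacent_p) in pairs.items():
        if (usable_fraction(tumor_p, p_cutoff) >= min_coverage
                and usable_fraction(adjacent_p, p_cutoff) >= min_coverage):
            kept.append(pair_id)
    return kept


# Hypothetical detection P values for 20 sites per sample.
good = [0.001] * 20                  # 100% coverage
borderline = [0.001] * 19 + [0.20]   # 95% coverage, still kept
poor = [0.001] * 17 + [0.20] * 3     # 85% coverage, removed

toy_pairs = {"pair1": (good, good), "pair2": (good, borderline), "pair3": (poor, good)}
passed = qc_pairs(toy_pairs)
```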

Table 1. Clinical and Pathological Characteristics of 62 HCC Patients With Tissue DNA Methylation Array Data and 8 Additional Cases With Only Plasma DNA Data
Characteristics | Cases With Array Data | Cases With Only Plasma Data
Variables | (Mean ± SD) | (Mean ± SD)
Age at diagnosis (years) | 52.2 ± 14.2* | 53.0 ± 11.5
Gender | N (%) | N (%)
 Male | 54 (87) | 7 (100)
 Female | 8 (13) | 0
Viral status |  | 
 HBV (−) and HCV (−) | 7 (11) | 0
 HBV (+) and HCV (−) | 36 (58) | 7 (88)
 HBV (−) and HCV (+) | 6 (10) | 0
 HBV (+) and HCV (+) | 13 (21) | 1 (12)
Cigarette smoking |  | 
 No | 25 (40) | 2 (25)
 Yes | 24 (39) | 4 (50)
 Missing | 13 (21) | 2 (25)
Alcohol drinking |  | 
 No | 41 (66) | 8 (100)
 Yes | 8 (13) | 0
 Missing | 13 (21) | 0
AFB1-DNA adducts in tumor tissues |  | 
 High to medium | 37 (60) | 
 Low | 25 (40) | 
AFB1-DNA adducts in nontumor tissues |  | 
 High to medium | 15 (24) | 
 Low | 15 (24) | 
 Missing | 32 (52) | 

  • * Age missing for 5 subjects.

  • Data missing for 1 subject.

Paired t tests with Bonferroni's correction for multiple testing were used to identify CpG sites that were differentially methylated between tumor and adjacent nontumor tissues. A significant difference was defined as sites with a Bonferroni-corrected P value ≤0.05. A volcano plot displayed mean DNA-methylation differences for all 26,486 CpG sites. A Manhattan plot displayed the significance (−log10 [adjusted P value]) of the associations by chromosomes.
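The testing and plotting steps above can be sketched in Python as follows. This is an illustration under stated assumptions, not the authors' R code: the function names are hypothetical, the raw P value for each t statistic is assumed to come from a statistics library (for example `scipy.stats.ttest_rel`), and only the number of tests (26,486) and the 0.05 threshold are taken from the text.

```python
import math
from statistics import mean, stdev

N_TESTS = 26_486  # autosomal CpG sites tested, from the text


def paired_t_statistic(tumor, adjacent):
    """Paired t statistic and degrees of freedom for matched beta-values."""
    diffs = [t - a for t, a in zip(tumor, adjacent)]
    n = len(diffs)
    t_stat = mean(diffs) / (stdev(diffs) / math.sqrt(n))
    return t_stat, n - 1


def bonferroni(p_raw, n_tests=N_TESTS):
    """Bonferroni-adjusted P value: raw P times the number of tests, capped at 1."""
    return min(1.0, p_raw * n_tests)


def volcano_point(mean_diff, p_raw, n_tests=N_TESTS):
    """(x, y) for a volcano plot: mean methylation difference vs -log10(adjusted P)."""
    return mean_diff, -math.log10(bonferroni(p_raw, n_tests))


# Hypothetical beta-values for one CpG site in four tumor/adjacent pairs.
t_stat, df = paired_t_statistic([0.5, 0.6, 0.7, 0.8], [0.1, 0.3, 0.2, 0.5])
```

With 26,486 tests, an adjusted P value ≤0.05 corresponds to a raw P value of at most 0.05/26,486, or roughly 1.9 × 10⁻⁶.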

To select genes for validation of the methylation array data, we focused on hypermethylation because our long-term goal is to detect hypermethylated plasma DNA for early diagnosis of HCC. Candidate CpG sites were selected for confirmatory analysis with two methods. In method A, we required that (1) the mean difference in methylation levels between tumor and adjacent tissues would be ≥20%, (2) ≥70% of the tumor tissues had methylation levels greater than 2 standard deviations (SDs) above the mean methylation level of all 62 adjacent tissues, and (3) the mean methylation level for adjacent tissues would be ≤25%. In method B, we conducted 3-fold cross-validation, where we randomly chose 40 of 62 pairs to form a training set and the remaining 22 pairs as a testing set. We then repeated the paired t test using the training set and selected the top 100 most significant CpG sites with the following three loosened criteria to ensure selection of enough candidate CpG sites at each cross-validation: (1) the mean difference in methylation levels between tumor and adjacent tissues was ≥20%; (2) ≥60% of the tumor tissues had methylation levels greater than 2 SDs above the mean methylation level of the 40 adjacent tissues; and (3) the mean methylation level for adjacent tissues was ≤40%. We repeated the 3-fold cross-validation 1,000 times and selected the most frequently selected CpG sites, taking the same number of sites as in the list obtained with method A. We then applied three different prediction methods (diagonal linear discriminant analysis, support vector machines, and k-nearest neighbor) to determine the prediction accuracy of the selected panel (method B) using data on the remaining 22 pairs.32
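A minimal Python sketch of the three selection criteria and of the repeated random-split counting is given below. It is an illustration only: the function names and toy data are hypothetical, β-values are assumed to be on a 0-1 scale, and the sketch omits the step that first ranks sites by the paired t test and keeps the top 100.

```python
import random
from statistics import mean, stdev


def passes_criteria(tumor, adjacent, min_diff=0.20, min_frac=0.70, max_adjacent_mean=0.25):
    """Apply the three criteria to one CpG site (defaults are the method A thresholds)."""
    adj_mean = mean(adjacent)
    cutoff = adj_mean + 2 * stdev(adjacent)  # 2 SDs above the adjacent-tissue mean
    frac_high = sum(t > cutoff for t in tumor) / len(tumor)
    return (mean(tumor) - adj_mean >= min_diff      # criterion 1: mean difference
            and frac_high >= min_frac               # criterion 2: fraction of tumors above cutoff
            and adj_mean <= max_adjacent_mean)      # criterion 3: low adjacent-tissue mean


def selection_frequency(sites, n_pairs, n_train, n_repeats, seed=0):
    """Count how often each site passes the loosened (method B) criteria on random training sets."""
    rng = random.Random(seed)
    counts = {name: 0 for name in sites}
    for _ in range(n_repeats):
        train = rng.sample(range(n_pairs), n_train)  # indices of the training pairs
        for name, (tumor, adjacent) in sites.items():
            t = [tumor[i] for i in train]
            a = [adjacent[i] for i in train]
            if passes_criteria(t, a, min_diff=0.20, min_frac=0.60, max_adjacent_mean=0.40):
                counts[name] += 1
    return counts


# Hypothetical beta-values for two sites measured in six pairs.
toy_sites = {
    "hyper": ([0.60, 0.55, 0.65, 0.70, 0.50, 0.62], [0.10, 0.12, 0.08, 0.11, 0.09, 0.10]),
    "flat": ([0.30, 0.32, 0.28, 0.31, 0.29, 0.30], [0.28, 0.30, 0.29, 0.31, 0.27, 0.30]),
}
counts = selection_frequency(toy_sites, n_pairs=6, n_train=4, n_repeats=50)
```

In the study itself the corresponding call would use 62 pairs, training sets of 40, and 1,000 repeats; the consensus frequency reported in Table 3 corresponds to the count for each site.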

The hierarchical clustering of the methylation data was performed with the top 1,000 most significantly differentially methylated sites and with the two selected panels of CpG sites using methods A and B. Gene-ontology analysis was performed by the PANTHER classification system to compare the significant methylated gene lists with the reference (National Center for Biotechnology Information, human genome build 36).33 The binomial test was used to identify significantly enriched pathways, biological processes, molecular functions, cellular components, and protein class terms after Bonferroni's correction for multiple comparisons, with a cutoff of P ≤ 0.05.
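The idea behind a binomial enrichment test can be sketched in Python as below. This is a simplified, one-sided overrepresentation version written for illustration; it is not PANTHER's implementation, and the function names and any counts passed to them are hypothetical.

```python
import math


def binomial_upper_tail(k, n, p):
    """P(X >= k) for X ~ Binomial(n, p), with 0 < p < 1, summed in log space."""
    total = 0.0
    for i in range(k, n + 1):
        log_pmf = (math.lgamma(n + 1) - math.lgamma(i + 1) - math.lgamma(n - i + 1)
                   + i * math.log(p) + (n - i) * math.log(1 - p))
        total += math.exp(log_pmf)
    return min(1.0, total)


def enrichment_p(hits_in_list, list_size, hits_in_reference, reference_size, n_terms):
    """Bonferroni-adjusted P value that a term is overrepresented in a gene list."""
    expected_fraction = hits_in_reference / reference_size  # term frequency in the reference
    p_raw = binomial_upper_tail(hits_in_list, list_size, expected_fraction)
    return min(1.0, p_raw * n_terms)
```

A term would be reported as enriched when `enrichment_p(...)` is at most 0.05, with `n_terms` set to the number of terms tested in that category.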

To investigate whether methylation levels are affected by HCC risk factors, such as HBsAg status, HCV status, cigarette smoking (i.e., ever/never), alcohol consumption (i.e., ever/never), AFB1-DNA adduct level, and gender within tumor and adjacent nontumor tissues separately, we used a two-sample t test with Bonferroni's correction for multiple testing.

In the second-stage confirmatory analysis, Pearson's correlations between methylation levels using Illumina arrays and pyrosequencing on selected sites were calculated.
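For readers who want to reproduce this kind of comparison, Pearson's correlation between array β-values and pyrosequencing percentages can be computed as in the Python sketch below. The function name and the paired values are hypothetical; the study's own calculations were done in R.

```python
import math


def pearson_r(x, y):
    """Pearson's correlation coefficient between two equal-length sequences."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


# Hypothetical paired measurements for one CpG site in four tissues.
array_beta = [0.10, 0.20, 0.50, 0.80]    # array beta-values (0-1 scale)
pyro_percent = [12.0, 18.0, 55.0, 78.0]  # pyrosequencing percent methylation
r = pearson_r(array_beta, pyro_percent)
```

Pearson's r is unchanged by linear rescaling, so the array values (0-1) and pyrosequencing values (0-100) can be compared without converting units.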

All analyses were conducted using the R language.


Results

Clinical and Pathological Characteristics of the HCC Cases.

Clinical and pathological characteristics are described in Table 1. Almost 90% of cases were male and 79% were HBsAg positive. Approximately 31% of subjects were positive for HCV. Seven subjects were negative for both HBV and HCV, 36 subjects were positive for HBsAg and negative for HCV, 13 subjects were both HBV and HCV positive, and the remaining 6 subjects were negative for HBsAg and positive for HCV. Thus, viral infection, primarily HBV, was the major risk factor in this population. The average age at HCC diagnosis was 52.2 ± 14.2 years. Approximately 40% of cases smoked and 13% consumed alcohol, but data were missing for approximately 20% of subjects. AFB1-DNA adducts, measured previously in all tumor tissues and in approximately half of the adjacent tissues,34, 35 are also summarized in Table 1.

Reproducibility of Methylation Array Data.

Reproducibility of the Illumina platform was evaluated using replicates of four paired samples on a different day. High concordance was observed for all eight replicates, with coefficients of determination (R2) ranging from 0.96 to 0.98. A representative example of the concordance between two replicates for an adjacent tissue sample is given in Supporting Fig. 1 and is consistent with previous studies.24, 26 The site-by-site comparisons across the 26,486 sites between the four pairs of replicates gave absolute mean differences in methylation level ranging from 0.003 to 0.04.

Methylation Profiles Differentiate HCC Tumor From Nontumor Tissues.

Methylation levels of the individual 26,486 autosomal CpG sites as well as the overall means were compared between the 62 pairs of tissues. There were 2,324 CpG sites that significantly differed in methylation level between tumor and nontumor tissues after Bonferroni's adjustment (for a complete list, see Supporting Tables 2 and 3). Among all significant CpG sites, 684 were significantly hypermethylated (covering 548 genes) and 1,640 were significantly hypomethylated (covering 1,290 genes) in tumor, compared to nontumor, tissues. Figure 1 displays mean DNA-methylation differences between the 62 paired tumor/adjacent tissues at all 26,486 CpG sites using a volcano plot. Both hyper- and hypomethylation alterations are common events in HCC tumor tissues. The top 20 hyper- or hypomethylated sites ranked by statistical significance are given in Table 2. Regardless of whether they were hypo- or hypermethylated, all significant CpG sites had similar mean methylation levels in tumor tissues (42.2% versus 42.9%), whereas the mean methylation levels in nontumor tissues were dramatically different (26.0% for hypermethylated versus 58.4% for hypomethylated sites). Figure 2 shows the heatmap of the top 1,000 CpG sites (based on statistical significance) distinguishing tumor from adjacent tissues. In general, good separation of tumor and adjacent tissues was observed, with a small amount of misclassification.

Figure 1.

Volcano plot for differential DNA methylation analysis of 62 paired HCC tumor and adjacent nontumor tissues. The x-axis shows the mean DNA-methylation difference, whereas the y-axis shows the –log10 of the adjusted P value for each CpG site, representing the strength of association. CpG sites above the dashed line are statistically significant (P ≤ 0.05) after Bonferroni's correction.

Figure 2.

Hierarchical cluster analysis of the top 1,000 significantly differentially methylated CpG sites between 62 tumor and adjacent tissues. Blue represents adjacent nontumor tissue, and red represents tumor tissue.

Table 2. Top 20 Ranked Hypermethylated or Hypomethylated Genes*
Hypermethylated in Tumor | Hypomethylated in Tumor
Genes | Target ID | Mean in Tumor | Mean in Nontumor | Mean Difference | Adjusted P Value | Genes | Target ID | Mean in Tumor | Mean in Nontumor | Mean Difference | Adjusted P Value

  • * Top 20 hyper- or hypomethylated genes in HCC tumor tissues, compared to adjacent nontumor tissues ranked by statistical significance.

  • After Bonferroni's adjustment.


Characteristics of Significant CpG Sites in HCC Tissues.

A Manhattan plot was used to display the −log10 (adjusted P value) for the differences in methylation by chromosome (Supporting Fig. 2) and indicates that aberrant methylation is spread across all chromosomes. Among the 2,324 significantly differentially methylated CpG sites, >80% (82.3% and 85.8% for hyper- and hypomethylated sites, respectively) had a >10% absolute tumor/nontumor difference in percent methylation, and >50% had a >15% difference (Supporting Table 4). These data indicate that the methylation changes occurring during HCC development are robust and may provide useful biomarkers.

The majority of the significantly differentially methylated CpG sites are located within the proximal promoter regions. Among the 2,324 significant CpG sites, the distances to the transcription start site (TSS) ranged from 0 to 1,498 bp (base pairs), with an average of 407 bp and an SD of 362 bp. Hypermethylated CpG sites are more common within a short distance of the TSS (50.7% within 250 bp and 26.9% between 250 and 500 bp), compared to hypomethylated sites (41.6% and 23.3%, respectively) (Supporting Fig. 3). The average distance to the TSS was significantly shorter for hypermethylated (mean = 332 bp; SD = 312 bp), compared with hypomethylated, sites (mean = 437 bp; SD = 377 bp; P = 3.95 × 10⁻¹⁰). Within CpG islands, more sites were significantly hypermethylated in tumors, whereas within non-CpG island regions, more sites were significantly hypomethylated in tumors (Supporting Table 5; Supporting Fig. 4). The pattern did not vary whether the sites were in promoter regions or not.

Through PANTHER ontology analysis, we found 12 significant pathways for hypermethylated and 11 pathways for hypomethylated genes (Supporting Table 6). A number of potentially important cellular pathways involved in tumorigenesis were observed, including the pathways of heterotrimeric G-protein signaling, endothelin signaling, phosphoinositide-3 kinase, interleukin signaling, inflammation mediated by chemokine/cytokine signaling, and insulin/insulin growth factor. For the first time, Wnt and 5-hydroxytryptamine (5-HT)4-type receptor-mediated signaling pathways were identified.

Methylation Profiles Altered by HCC Risk Factors.

A two-sample t test was used to compare methylation levels within tumor and adjacent tissues separately for several HCC risk factors. No site was identified that was significantly differentially methylated by gender, HBV status, HCV status, or AFB1-DNA adduct levels (i.e., high/medium versus low) (data not shown). However, these null results may partly reflect the small number of females, the uneven distribution of viral status, and missing adduct data for some adjacent tissues. For alcohol consumption status, within adjacent tissues, methylation level at one CpG site in VPREB1 significantly differed between drinkers and nondrinkers, whereas within tumor tissues, seven CpG sites in CRISPLD1, PCDHB2, PCSK1, LXH1, KCTD8, TSHD3, and CXCL12 were identified after Bonferroni's adjustment. Further unsupervised hierarchical cluster analysis suggested a better separation of drinkers from nondrinkers using the top differentially methylated sites among tumor tissues (Supporting Fig. 5A), compared to nontumor tissues (Supporting Fig. 5B).

Selection of Candidate Genes and Validation of Methylation by Pyrosequencing.

To select the list of candidate CpG sites for confirmatory analysis, method A with the complete data set of 62 pairs resulted in a list of 24 sites in 18 genes (Supporting Table 7). The heatmap of the selected 24 CpG sites shows good separation of tumor and adjacent tissues in general (Supporting Fig. 6). Method B, based on 1,000 three-fold cross-validations of training sets with 40 pairs, resulted in a list of the 24 CpG sites that were most frequently selected (all in ≥98% of the 1,000 three-fold cross-validations) (Table 3). The two panels of 24 CpG sites had 20 overlapping sites (Table 3; Supporting Table 7). Figure 3 shows the heatmap of the selected 24 CpG sites using method B. The two heatmaps show similar separations. The selected panel of 24 CpG sites (method B) had high prediction accuracy in the testing set: 0.886 (SD = 0.044) based on diagonal linear discriminant analysis, 0.918 (SD = 0.044) based on support vector machines, and 0.877 (SD = 0.038) based on k-nearest neighbor. This suggests that the selected list of 24 CpG sites using the 3-fold cross-validation for second-stage confirmatory analysis is robust. Furthermore, compared to Fig. 2, which displays the top 1,000 differentially methylated sites (both hyper- and hypomethylated), almost the same set of tumor tissues was misclassified.

Figure 3.

Hierarchical cluster analysis of the 24 hypermethylated CpG sites selected by 1,000 three-fold cross-validations. Blue represents adjacent nontumor tissue, and red represents tumor tissue.

Table 3. Selected 24 CpG Sites Based on 1,000 Three-Fold Cross-validations Distinguishing HCC Tumors From Adjacent Tissues With Their Mean Methylation Levels in Tumor and Adjacent Tissues and the Differences*
Symbol | Target ID | Consensus Frequency | Mean β in Tumor | Mean β in Adjacent | Mean β Difference | Adjusted P Value | Function
BMP4 | cg14310034 | 1,000 | 0.55 | 0.14 | 0.41 | 5.36E-16 | Bone morphogenetic protein 4
C6orf206 | cg04600618 | 985 | 0.44 | 0.16 | 0.28 | 4.08E-09 | Radial spoke head 9 homolog
CCDC37 | cg00891278 | 980 | 0.60 | 0.31 | 0.29 | 5.07E-12 | Coiled-coil domain containing 37
CDKN2A | cg09099744 | 1,000 | 0.50 | 0.08 | 0.42 | 8.69E-14 | kinase inhibitor
CFTR | cg25509184 | 1,000 | 0.59 | 0.31 | 0.28 | 4.23E-11 | Cystic fibrosis transmembrane conductance regulator
DAB2IP | cg05684891 | 1,000 | 0.61 | 0.21 | 0.40 | 3.58E-17 | GTPase-activating protein
DNM3 | cg23391785 | 993 | 0.55 | 0.18 | 0.36 | 2.75E-10 | Dynamin 3
HIST1H3G | cg02909790 | 1,000 | 0.49 | 0.17 | 0.32 | 5.28E-10 | H3 histone family
HIST1H3J | cg17718302 | 1,000 | 0.37 | 0.05 | 0.31 | 2.36E-08 | H3 histone family
NKX6-2 | cg09260089 | 1,000 | 0.43 | 0.13 | 0.30 | 2.59E-10 | NK6-related transcription factor
PBX4 | cg19996355 | 999 | 0.47 | 0.15 | 0.32 | 2.46E-09 | Pre-B-cell leukemia homeobox 4
RAB31 | cg17982102 | 988 | 0.38 | 0.14 | 0.24 | 7.14E-10 | Member Ras oncogene family
SPDY1 | cg04786857 | 1,000 | 0.55 | 0.19 | 0.36 | 1.79E-14 | Speedy homolog 1
STEAP4 | cg00564163 | 999 | 0.49 | 0.19 | 0.31 | 1.68E-09 | Tumor necrosis factor
ZFP41 | cg12680609 | 1,000 | 0.51 | 0.12 | 0.39 | 3.25E-15 | Zinc finger protein 41 homolog
ZNF154 | cg08668790 | 1,000 | 0.51 | 0.12 | 0.39 | 4.67E-12 | Zinc finger protein 154
ZNF154 | cg21790626 | 1,000 | 0.51 | 0.08 | 0.43 | 1.14E-12 | Zinc finger protein 154
ZNF540 | cg03975694 | 1,000 | 0.53 | 0.24 | 0.30 | 2.03E-12 | Zinc finger protein 540

  • * Sites were selected based on a mean difference in methylation levels between tumor and adjacent tissues of at least 20%; more than 60% of tumors had methylation levels greater than 2 SDs above the mean in all nontumor tissues and with mean levels of methylation in adjacent tissues <40%.

  • After Bonferroni's adjustment.
ZNF540cg039756941,0000.530.240.302.03E-12Zinc finger protein 540

Because previous studies have found a good correlation between Illumina array percent methylation and that by pyrosequencing,28, 36, 37 we randomly selected just five genes (CDKL2, STEAP4, HIST1H3G, CDKN2A, and ZNF154) from the top 18 candidates for validation in 42 paired tissues. Data were analyzed by looking at the correlation between array and pyrosequencing data for both the specific CpG site on the array and the mean of all the CpG sites analyzed by pyrosequencing. Excellent correlations, ranging from 0.921 to 0.971, were found between array data and pyrosequencing results for both specific sites and the mean of all CpG sites (Table 4; Supporting Fig. 7).

Table 4. Correlations Between Array Data and Pyrosequencing Data Within 86 Tumor and Nontumor Tissues at the Specific CpG Sites on the Array As Well As the Mean of All CpG Sites Per Gene Assayed
Gene | Array CpG Site | Mean of All CpG Sites (No. of CpG Sites)
STEAP4 | 0.936 | 0.940 (4)
CDKL2 | 0.942 | 0.944 (3)
CDKN2A | 0.938 | 0.925 (5)
HIST1H3G | 0.954 | 0.921 (5)
ZNF154 | 0.971 | 0.967 (3)

Analysis of Methylation of Candidate Genes in Plasma Samples.

We next determined the feasibility of measuring methylation in the five randomly selected genes in plasma DNA available for a subset of 30 of the cases with tissue data plus eight plasma samples from additional cases. The characteristics of these additional 8 cases are similar to those of the 62 cases with tissue data (Table 1). The success rate of pyrosequencing ranged from 63% to 100% (Table 5). Detailed data on percent methylation for each sample are given in Supporting Table 7. The frequency of hypermethylated DNA in plasma (defined as methylation level by pyrosequencing ≥5%) ranged between 37% and 63% (Table 5). With available data, 33 (87%) subjects had at least one gene positive, whereas 2 subjects were positive for all five genes. However, data were complete for only 20 (53%) subjects. Five subjects were negative for all genes, but none had complete data.
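The percentages quoted above follow directly from the counts in Table 5, as the short Python check below illustrates. The counts are copied from Table 5 and the text; the helper name `pct` and the round-half-up convention are assumptions made for this sketch.

```python
# gene: (samples positive, samples with pyrosequencing data), from Table 5
TABLE5 = {
    "CDKL2": (14, 38),
    "CDKN2A": (13, 27),
    "HIST1H3G": (9, 24),
    "STEAP4": (20, 32),
    "ZNF154": (16, 34),
}
TOTAL_PLASMA = 38  # 30 cases with tissue data plus 8 additional cases


def pct(numerator, denominator):
    """Percentage rounded half up to a whole number."""
    return int(100 * numerator / denominator + 0.5)


# Positivity is expressed relative to samples with data for that gene.
positive_pct = {gene: pct(pos, n) for gene, (pos, n) in TABLE5.items()}
# Success rate is expressed relative to all 38 plasma samples.
success_pct = {gene: pct(n, TOTAL_PLASMA) for gene, (_, n) in TABLE5.items()}
any_gene_pct = pct(33, TOTAL_PLASMA)  # subjects with at least one gene positive
```

Note that the two columns of Table 5 use different denominators: positivity is computed among samples with usable data for that gene, whereas the success rate is computed among all 38 plasma samples.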

Table 5. Plasma DNA Methylation Using Pyrosequencing and Number of Samples With Available Pyrosequencing Data
Gene | Samples Positive* (N; %) | Samples With Data (N; %)
CDKL2 | 14 (37) | 38 (100)
CDKN2A | 13 (48) | 27 (71)
HIST1H3G | 9 (38) | 24 (63)
STEAP4 | 20 (63) | 32 (84)
ZNF154 | 16 (47) | 34 (89)

  • * Defined as ≥5% methylation.

  • Based on successful pyrosequencing analysis.


Discussion

We screened 62 paired tumor and adjacent tissues at 26,486 autosomal CpG sites. After Bonferroni's adjustment, we found 2,324 CpG sites to significantly differ in methylation level; 684 were significantly hypermethylated and 1,640 were significantly hypomethylated. Because our aim was to identify methylation biomarkers in plasma DNA, mostly derived from necrotic or apoptotic cells,23 for early identification of HCC in high-risk populations, we limited further study to hypermethylated sites. To select candidate CpG sites for confirmatory analysis, we used both the full data set and a training set of 40 pairs from the 3-fold cross-validation. Two panels of 24 hypermethylated CpG sites in 18 genes with 20 CpG sites overlapping were selected. This suggests that the selected panel of CpG sites based on the training set from the 3-fold cross-validation was robust. Further analysis of prediction accuracy using the testing data with 22 pairs suggested good prediction power in the testing set to separate tumor and adjacent nontumor tissues. With the largest sample size thus far, we identified more significant CpG sites that differentiate tumor from adjacent tissue than previous methylation array studies in HCC.24-26

Only one previous study by Ammerpohl et al. reported the use of Illumina HumanMethylation 27K arrays to investigate DNA methylation in 12 HCC paired tissues; alcohol was likely the major etiologic agent for half of the cases.26 All 24 sites we selected (Table 3) were also identified by Ammerpohl et al. as being significantly hypermethylated in HCC, compared to cirrhosis. Among all sites they identified as having a >20% difference in methylation, there was an overlap of 823 sites (63%) with our significant sites. These overlapping sites were 100% consistent in the direction of the methylation change. The magnitude of methylation levels was also significantly correlated (R2 from 0.76 to 0.99; P < 0.0001). In addition to identifying two novel pathways (Wnt and 5-HT4-type receptor-mediated signaling), 10 cellular pathways overlapped with those identified by Ammerpohl et al.

Two other studies have used the Illumina 1,500 Golden Gate Methylation Assay to evaluate five paired samples from Korea25 and 30 from France.24 In the Korean study, 24 new genes were identified as significantly hypermethylated in tumor.25 Nine genes (ADCYAP1, FLT3, HOXA9, IRAK3, MLF1, NPY, SH3BP2, TAL1, and TNFRSF10C) were also significantly hypermethylated in our tumor tissues. The remaining genes were nonsignificantly weakly hypermethylated in our tumors, except for HIC2, NOTCH3, and PTCH2, which showed no hypermethylation. These three genes were also not hypermethylated in Ammerpohl et al.26 and thus were unlikely to be significantly hypermethylated in HCC. The second study24 identified 27 genes as hypermethylated. Fourteen genes overlap with those we identified, including APC, BMP4, CDKN2A, F2R, FLT4, GSTP1, HOXA9, IGF1R, IRAK3, MYOD1, RASSF1, SH3BP2, TERT, and ZMYND10 (Supporting Table 2). Ninety-six of their one hundred and twenty-four significant CpG sites overlap with ours, with 92% consistency in the direction of methylation change.

Using pyrosequencing, we confirmed methylation data for the five genes analyzed. Array data were highly correlated with both the specific CpG site and the mean of the three to five CpG sites assayed within a gene (Table 4; Supporting Fig. 7).

We attempted to determine whether methylation changes in specific CpG sites were associated with certain risk factors, such as gender, viral infection, alcohol consumption, and AFB1-DNA adduct levels. We identified sites that differed significantly after Bonferroni's adjustment only for alcohol consumption. However, these results did not match previous data.24 Most of our cases were virus infected, whereas the previous study was able to look at noninfected cases in which alcohol was the major risk factor. This may explain the discrepant results. Data on survival were not available for most of our cases, so we were unable to investigate methylation profile and survival.

We also determined whether methylation of a random subset of five genes could be detected in plasma DNA by pyrosequencing. Not all samples were successfully amplified for all five genes, with HIST1H3G having the lowest frequency of usable data (63%). This may be a result of the larger PCR product for this gene (248 bp), compared to the other four genes (<200 bp). Future studies should consider PCR product size when designing pyrosequencing assays for plasma DNA. Using ≥5% methylation as the cutoff for positivity, the frequency of positive plasma DNA samples ranged from 37% to 63%. When positivity for any one gene was used to define a positive case, 87% of cases were positive. These results, in conjunction with our previous study of plasma from controls,22 suggest that analysis of plasma DNA is feasible and may be useful for the diagnosis of HCC. However, the quality of the bisulfite-treated plasma DNA will be a key component of a successful screening assay.

Among the strengths of our study is that it is the largest sample-size methylation-array study of HCC to date. Among the limitations is the lack of information on AFB1-DNA adducts in adjacent nontumor tissue for some cases. In addition, data on alcohol consumption and cigarette smoking were missing for approximately 20% of the cases. These missing data limited our ability to investigate relationships between methylation profiles and these factors. In addition, almost all our cases were infected with either HBV or HCV or both. Thus, we could not investigate the role of viral infection on methylation. Another limitation was the lack of healthy tissue from unaffected controls as a comparison group for our array studies. Our tumor adjacent tissues were primarily cirrhotic. Thus, we identified genes whose methylation was increased in progression from cirrhosis to HCC. Because our aim was to identify genes whose methylation is associated with HCC but not cirrhosis, this comparison is appropriate, but tells us nothing about progression from normal tissue. A limitation of our plasma DNA analysis is that only samples from cases were available. Thus, whereas the frequency of methylation was high, we have no data on controls. In our previous prospective study of plasma DNA analyzing three genes using methylation-specific PCR, we found CDKN2A methylation in 2 of 50 (4%) controls and a frequency of positive cases comparable to that in the current study (44% versus 48%).22

In summary, we used genome-wide methylation arrays to identify genes methylated in HCC from primarily HBV-infected Taiwanese cases. Pyrosequencing of candidate genes validated the array data, and analysis of plasma DNA suggests that these genes may be appropriate to apply as biomarkers of early HCC diagnosis. We are in the process of testing custom arrays for analyzing larger numbers of CpG sites followed by pyrosequencing that can be applied to small amounts of plasma DNA. We will then use this methodology in our prospective study that includes HCC cases and controls, as required, to further determine the utility of this approach.


The authors thank Dr. Abby Siegel for careful reading of the manuscript for this article.