Elevated methylation of HPV16 DNA is associated with the development of high grade cervical intraepithelial neoplasia

Authors

  • Lisa Mirabello,

    Corresponding author
    1. Division of Cancer Epidemiology and Genetics, National Cancer Institute, National Institutes of Health, Rockville, MD
    • Clinical Genetics Branch, Division of Cancer Epidemiology and Genetics, National Cancer Institute, NIH, 6120 Executive Blvd., EPS/7101, Rockville, MD 20892, USA
    • Tel.: +301-451-9733, Fax: +301-496-1854

  • Mark Schiffman,

    1. Division of Cancer Epidemiology and Genetics, National Cancer Institute, National Institutes of Health, Rockville, MD
  • Arpita Ghosh,

    1. Division of Cancer Epidemiology and Genetics, National Cancer Institute, National Institutes of Health, Rockville, MD
  • Ana C. Rodriguez,

    1. Proyecto Epidemiológico Guanacaste, Fundación INCIENSA, San José, Costa Rica
  • Natasa Vasiljevic,

    1. Centre for Cancer Prevention, Wolfson Institute of Preventive Medicine, Queen Mary, University of London, London, United Kingdom
  • Nicolas Wentzensen,

    1. Division of Cancer Epidemiology and Genetics, National Cancer Institute, National Institutes of Health, Rockville, MD
  • Rolando Herrero,

    1. Proyecto Epidemiológico Guanacaste, Fundación INCIENSA, San José, Costa Rica
  • Allan Hildesheim,

    1. Division of Cancer Epidemiology and Genetics, National Cancer Institute, National Institutes of Health, Rockville, MD
  • Sholom Wacholder,

    1. Division of Cancer Epidemiology and Genetics, National Cancer Institute, National Institutes of Health, Rockville, MD
  • Dorota Scibior-Bentkowska,

    1. Centre for Cancer Prevention, Wolfson Institute of Preventive Medicine, Queen Mary, University of London, London, United Kingdom
  • Robert D. Burk,

    1. Department of Pediatrics, Albert Einstein College of Medicine, Bronx, NY
    2. Department of Microbiology and Immunology, Albert Einstein College of Medicine, Bronx, NY
    3. Department of Epidemiology and Population Health, Albert Einstein College of Medicine, Bronx, NY
    4. Department of Obstetrics, Gynecology and Women's Health, Albert Einstein College of Medicine, Bronx, NY
  • Attila T. Lorincz

    Corresponding author
    1. Centre for Cancer Prevention, Wolfson Institute of Preventive Medicine, Queen Mary, University of London, London, United Kingdom
    • Tel.: +44-20-7882-3540, Fax: +44-20-7882-3890


  • This article was published online on 20 August 2012. An error was subsequently identified. This notice is included in the online and print versions to indicate that both have been corrected 7 September 2012.

Abstract

We explored the association of human papillomavirus type 16 (HPV16) DNA methylation with age, viral load, viral persistence and risk of incident and prevalent high-grade cervical intraepithelial neoplasia (CIN2+) in serially collected specimens from the Guanacaste, Costa Rica cohort. A total of 273 exfoliated cervical cell specimens (diagnostic and pre-diagnostic) were selected: (1) 92 with HPV16 DNA clearance (controls), (2) 72 with HPV16 DNA persistence (without CIN2+) and (3) 109 with CIN2+. DNA was extracted and bisulfite converted, and methylation was quantified using pyrosequencing assays at 66 CpG sites across the HPV16 genome. The Kruskal-Wallis test was used to identify significant differences among groups, and receiver operating characteristic (ROC) curve analyses were used to evaluate how well methylation identified women with CIN2+. In diagnostic specimens, 88% of CpG sites had significantly higher methylation levels in CIN2+ than in controls after correction for multiple tests. The highest area under the ROC curve (AUC) was 0.82, for CpG site 6457 in L1, where a diagnostic sensitivity of 91% corresponded to a specificity of 60% for CIN2+. Prospectively, 17% of CpG sites had significantly higher methylation in pre-diagnostic CIN2+ specimens (collected a median of 3 years before diagnosis) than in controls. The strongest pre-diagnostic CpG site was 6367 in L1, with an AUC of 0.76. Age-stratified analyses suggested that in women older than the median age of 28 years, high methylation is associated with a greater increase in the risk of precancer. Higher methylation in CIN2+ cases was not explained by higher viral load. We conclude that elevated levels of HPV16 DNA methylation may be useful for detecting concurrent CIN2+ and for predicting future CIN2+.

Persistent infection with human papillomavirus type 16 (HPV16) leads to the majority of cervical cancers and related cervical precancer.1, 2 However, neoplastic progression is an uncommon outcome,3, 4 and identifying biomarkers that distinguish HPV infections that clear spontaneously from those that progress to cervical precancer, or that detect prevalent precancer, has been the subject of considerable effort.5

HPV16 DNA methylation has recently been shown to be associated with the risk of precancer, i.e., cervical intraepithelial neoplasia grade 2 or worse (CIN2+).6–15 Some published studies have reported elevated methylation at specific CpG sites in CIN2+.6–13 However, methylation levels have varied considerably within outcome groups and conclusions have been inconsistent (reviewed in Mirabello et al.12). The two studies examining the most sites within the HPV16 genome had small sample sizes (<20 women) and found elevated methylation associated with CIN2+.11, 13 Larger studies (>50 women) have examined only a limited number of CpG sites, mostly within the upstream regulatory region (URR). The largest study to date, of 121 women, examined 16 URR CpG sites and found decreased methylation associated with progression.14 One prospective study15 examined six URR CpG sites and found that higher methylation was associated with a lower risk of CIN2. These variable reports may reflect the study of different CpG sites, including different E2 binding sites, or methodological differences.

We recently assessed HPV DNA methylation using an accurate, quantitative method in a large nested case–control and prospective epidemiological study12 in the Guanacaste, Costa Rica cohort. We found that methylation at several CpG sites within the HPV16 L1, L2 and E2–E4 open reading frames (ORFs) was associated with infection outcome. Higher methylation at each of these CpG sites was associated with an increased risk of CIN3 compared with women who cleared their HPV16 infections. Increased methylation at specific CpG sites in L1, L2 and E2–E4 combined was associated with an odds ratio of 52 [95% confidence interval (CI): 4.0–670] for CIN3, compared with low methylation at all three of these sites.12

Here we report additional exploration of methylation in the Guanacaste, Costa Rica cohort, beyond our previous report.12 We have enlarged the sample size by using more longitudinal and serial specimens and by adding CIN2 and invasive cancer cases and controls. In this report, we further explore the association between methylation and infection outcome and assess methylation changes over time. We also evaluate how well HPV16 DNA methylation distinguishes women who will develop CIN2+ from women who will clear their HPV16 infections without progression. Finally, we explore the effects of age and viral load on DNA methylation.

Abbreviations

ALTS: ASCUS-LSIL Triage Study; AUC: area under the ROC curve; CI: confidence interval; CIN: cervical intraepithelial neoplasia; CIN2+: CIN grade 2 or worse; HPV16: human papillomavirus type 16; ICC: intraclass correlation coefficient; OR: odds ratio; pAUC: partial area under the ROC curve; QC: quality control; ROC: receiver operating characteristic; STM: standard transport medium; URR: upstream regulatory region

Material and Methods

Study population

Relevant details of the Guanacaste, Costa Rica cohort have been described previously.12 In brief, the population-based cohort comprises 10,049 women aged 18 years or older. They were recruited for screening and follow-up between June 1993 and December 1994 as part of a natural history study of HPV infection and cervical neoplasia.16 The participation rate was 93.6%, and loss to follow-up was <10% over 7 years.17 The study protocol was reviewed and reapproved annually by the National Cancer Institute and Costa Rican institutional review boards.

Women were referred to colposcopy if they had abnormal cytology, abnormal direct visual examinations or the appearance of severe cervical abnormalities on review of their Cervigrams (magnified digital images of the cervix), as previously detailed.17 HPV16 infections were identified in 503 women. Swab-derived cervical cell specimens,16 previously documented to contain HPV16 DNA, were selected from 205 women: (1) 100 with HPV16 DNA clearance (in <2 years; controls); (2) 38 with HPV16 DNA persistence (2+ years) without observed progression to CIN2+ (persistence); (3) 22 with HPV16 infection and CIN2; (4) 31 with HPV16 infection and CIN3 and (5) 14 women with HPV16 infection and cervical cancer. Final diagnosis pathology reports and HPV genotyping were confirmed for all study participants. Controls were chosen from a set of 300 available controls to match the age distributions and proportion of HPV16 variant lineages (European, non-European) in the case outcome groups, and we analyzed the last HPV16-positive sample collected before HPV16 clearance. For the case outcome groups, we analyzed specimens collected at two time points (diagnostic and pre-diagnostic). We analyzed 36 HPV16 samples collected at the last HPV16-positive screening visit (referred to as diagnostic samples) from women with persistence, and 16 HPV16 samples from the screening visit closest to diagnosis of CIN2 (median time before diagnosis = 0 months), 28 from CIN3 (median time before diagnosis = 3 months) and 13 from cancer (median time before diagnosis = 0 months). We also analyzed samples collected at the first HPV16-positive screening visit in the case groups for those with these samples available (referred to as pre-diagnostic samples): 36 with persistence (median time before last HPV16-positive visit = 72 months), 17 with CIN2 (median time before diagnosis = 27 months), 16 with CIN3 (median time before diagnosis = 45 months) and three with cancer (median time before diagnosis = 34 months).

Seven women had multiple HPV16 specimens collected at successive screening visits before CIN3 diagnosis. For these women we additionally analyzed all serial HPV16 samples collected before diagnosis (3–7 serial samples per woman). This gave 30 serial samples in total for these seven women: 16 additional serial samples plus the 14 pre-diagnostic and diagnostic samples described above.

DNA isolation and bisulfite conversion

Genomic DNA was extracted from 200 μl of each cervical specimen in standard transport medium (STM) with a QIAamp DNA Mini Kit (Qiagen, Hilden, Germany), as recommended by the manufacturer. DNA was quantified by UV absorption, yielding approximately 800 ng of DNA per specimen in total. Of this, 250 ng was used in the bisulfite conversion reaction, in which unmethylated cytosines were converted to uracil with the EZ DNA Methylation Kit (Zymo Research, Irvine, CA) according to the manufacturer's instructions. Converted DNA was eluted in 25 μl Buffer EB.

HPV16 DNA methylation assay

All methylation levels reported here are from new measurements made in the Lorincz laboratory, blinded to CIN status and to the results from the Burk laboratory. The Lorincz assays were run after the Burk assays. Quality-control (QC) findings from the Burk laboratory informed the Lorincz assay protocol, although no analysis results were shared.

Primer sets with one biotin-labeled primer were used to amplify the bisulfite-converted DNA. New primers for 16 PCRs in E2–E4, L2, L1, URR, E6 and E7, covering 66 CpG positions, were designed using PyroMark Assay Design software version 2.0.1.15 (Qiagen). No primer overlapped a CG dyad, to prevent amplification bias. Where possible, a non-CG cytosine in the pyrosequenced region was included as an internal control for complete bisulfite conversion. PCRs were performed on converted DNA equivalent to 1,500 cells using the PyroMark PCR kit (Qiagen). Cell genome equivalents were calculated assuming 6.6 pg of DNA per diploid cell. Briefly, the following were combined and adjusted with water to a final reaction volume of 25 μl: 12.5 μl of PCR master mix, 2.5 μl Coral red, 1.2–1.5 μl of sample DNA, 1–2 μl of primer (7.5 pmol of each primer) and an optimized concentration of MgCl₂. Thermal cycling was 95°C for 15 min, then 50 cycles of 30 sec at 94°C, 30 sec at the optimized primer-specific annealing temperature and 30 sec at 72°C, followed by a final extension for 10 min at 72°C. Details of the primers, MgCl₂ amounts and annealing temperatures are given in Supporting Information Table 1. Amplification was confirmed on a QIAxcel capillary electrophoresis instrument (Qiagen). Ten microliters of PCR product was pyrosequenced on a PyroMark™ Q96 ID instrument (Qiagen) as previously described by Vasiljevic et al.18
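
The cell-equivalent input described above is simple arithmetic: 1,500 diploid genomes × 6.6 pg ≈ 9.9 ng of template. The Python sketch below reproduces it. The eluate concentration is an assumption, not a measured value: it takes the 250 ng converted and eluted in 25 μl with no loss (a nominal 10 ng/μl). Recovery after bisulfite treatment is lower in practice, so the computed volume is a lower bound. That is consistent with the 1.2–1.5 μl used in the protocol.

```python
# Template-input arithmetic for the bisulfite PCR (illustrative sketch).
# Assumption (not stated in the protocol): the 250 ng converted and eluted
# in 25 ul is fully recovered, giving a nominal 10 ng/ul eluate.

PG_PER_DIPLOID_CELL = 6.6  # pg DNA per diploid cell, as stated in the text


def template_mass_ng(cell_equivalents: float) -> float:
    """DNA mass (ng) equivalent to a given number of diploid cell genomes."""
    return cell_equivalents * PG_PER_DIPLOID_CELL / 1000.0


def nominal_volume_ul(mass_ng: float, input_ng: float = 250.0,
                      elution_ul: float = 25.0) -> float:
    """Eluate volume (ul) containing mass_ng, assuming 100% recovery."""
    concentration_ng_per_ul = input_ng / elution_ul
    return mass_ng / concentration_ng_per_ul


if __name__ == "__main__":
    mass = template_mass_ng(1500)
    print(f"1,500 cell equivalents = {mass:.1f} ng")  # 9.9 ng
    print(f"nominal eluate volume = {nominal_volume_ul(mass):.2f} ul")
```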

The nucleotide positions of CpG sites in the L2 gene region were shifted by +2 bp to match the CpG positions reported in our team's previous related study.12 This two-base discrepancy results from the use of different primers and reference HPV16 DNA sequences. Of the 66 CpG sites analyzed, 58 were included in the previous study.12

The final sample size was 273 specimens from 196 women, after removal of specimens from nine women that failed to amplify. Twenty QC samples were retested, blinded to disease outcome, from bisulfite treatment through methylation pyrosequencing, to assess variability at 20 CpG sites in the L1, URR, E6 and E7 regions. The intraclass correlation coefficients (ICCs) were >90% at 18 of the 20 CpG sites evaluated. Of the other two sites, one in E7 had an ICC of 83% and one in the URR had an ICC of 14% (data not shown).
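
The text does not state which form of the ICC was computed for the repeat-tested QC samples. As an assumption, the sketch below uses the one-way random-effects ICC(1,1) for repeat measurements of % methylation at a single CpG site. It is derived from the between-sample and within-sample mean squares of a one-way ANOVA. The example values are hypothetical.

```python
from statistics import mean


def icc_oneway(measurements):
    """One-way random-effects ICC(1,1).

    measurements: list of per-sample tuples, each holding k repeat
    measurements (e.g. % methylation from the original and QC runs).
    """
    n = len(measurements)
    k = len(measurements[0])
    if n < 2 or k < 2 or any(len(row) != k for row in measurements):
        raise ValueError("need >= 2 samples, each with the same k >= 2 repeats")
    grand_mean = mean(x for row in measurements for x in row)
    row_means = [mean(row) for row in measurements]
    # Between-sample and within-sample mean squares from a one-way ANOVA
    ms_between = k * sum((m - grand_mean) ** 2 for m in row_means) / (n - 1)
    ms_within = sum((x - m) ** 2
                    for row, m in zip(measurements, row_means)
                    for x in row) / (n * (k - 1))
    return (ms_between - ms_within) / (ms_between + (k - 1) * ms_within)


if __name__ == "__main__":
    # Hypothetical duplicate % methylation values for five QC samples
    qc = [(12.0, 13.5), (45.2, 44.1), (30.0, 31.2), (8.4, 9.0), (60.3, 58.8)]
    print(f"ICC(1,1) = {icc_oneway(qc):.3f}")
```

Other ICC forms, such as two-way models that treat the run as a fixed or random effect, can give somewhat different values on the same data.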

Viral load determination

Quantitative real-time PCR was carried out in 20 μl reactions. Each contained 10 μl of 2× QuantiFast SYBR Green PCR Master Mix (Qiagen, Hilden, Germany), primer (0.8 μM of each forward and reverse HPV16 E6 primer19 or 10 μM of each forward and reverse GAPDH primer19) and 1 μl of 1:20 diluted extracted DNA. Reactions were run on a Rotor-Gene instrument (RG-6000, Corbett Research) with an initial denaturation step at 95°C for 5 min. This was followed by 40 cycles of 95°C for 10 sec and 55°C for 30 sec for HPV16 E6, or 30 cycles of 95°C for 15 sec, 50°C for 20 sec and 72°C for 20 sec for GAPDH. Data were acquired at 510 nm at 55°C for E6 and at 72°C for GAPDH. Standard curves for HPV16 were obtained by amplifying a 10-fold dilution series of 10⁷ to 10¹ copies of HPV16 plasmid in a fixed amount of 100 pg of human placental DNA (Sigma-Aldrich). The standard curve for GAPDH was obtained by amplifying 250, 25 and 2.5 ng and 250 and 25 pg of DNA. The amount of human DNA was converted to a number of cells by assuming 6.6 pg of DNA per diploid cell. Standard curves were generated from the mean threshold cycle (CT) values of each dilution in triplicate, and the mean CT values of samples in duplicate were used for quantification. The amounts of E6 and GAPDH were determined by linear interpolation of the crossing point (Cp) value, using the equation of the regression line from the corresponding absolute standard curve. Viral load was estimated only in controls and in CIN2 and CIN3 diagnostic specimens.
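
The absolute-quantification step can be sketched as an ordinary least-squares fit of CT against log10 input, inverted to read off the quantity for a sample's CT. The following are assumptions, not details given in the text:

- The fit is a plain least-squares line (the instrument software's exact fitting is not described).
- Viral load is expressed as E6 copies per cell.

E6 and GAPDH are both measured in the same diluted extract, so the 1:20 dilution cancels in that ratio. All numbers in the example are hypothetical.

```python
import math


def fit_standard_curve(quantities, ct_values):
    """Least-squares fit of Ct = slope * log10(quantity) + intercept."""
    xs = [math.log10(q) for q in quantities]
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ct_values) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ct_values))
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x


def interpolate_quantity(ct, slope, intercept):
    """Invert the standard curve to get the starting quantity for a Ct."""
    return 10 ** ((ct - intercept) / slope)


def amplification_efficiency(slope):
    """Efficiency implied by the slope (1.0 means perfect doubling)."""
    return 10 ** (-1.0 / slope) - 1.0


def copies_per_cell(e6_copies, gapdh_ng, pg_per_cell=6.6):
    """HPV16 E6 copies divided by cell number derived from GAPDH DNA mass."""
    cells = gapdh_ng * 1000.0 / pg_per_cell
    return e6_copies / cells


if __name__ == "__main__":
    # Hypothetical mean Ct values for the 10^7..10^1 plasmid dilution series
    std_copies = [10 ** k for k in range(7, 0, -1)]
    std_ct = [14.1, 17.4, 20.8, 24.1, 27.5, 30.8, 34.2]
    slope, intercept = fit_standard_curve(std_copies, std_ct)
    print(f"slope={slope:.2f}, efficiency={amplification_efficiency(slope):.1%}")
    e6 = interpolate_quantity(25.0, slope, intercept)  # hypothetical sample Ct
    print(f"viral load = {copies_per_cell(e6, gapdh_ng=2.0):.1f} copies/cell")
```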

Replication and extension of previously reported analysis

Previously, we reported HPV16 methylation data from a smaller but overlapping set of specimens from women with CIN3, HPV16 persistence or clearance (controls).12 As a prelude to the current, larger analysis, we performed masked retesting of specimens from 93 women (33 controls, 35 women with persistence and 25 women with CIN3). In addition, the current analysis includes previously unanalyzed specimens from 59 more controls and six more women with CIN3 (including 16 more pre-diagnostic specimens from women with CIN3). To cover the spectrum of cervical neoplasia more completely, we also included 22 women with CIN2 and 14 with cervical cancer. This analysis therefore serves both to replicate and to extend the previously reported work. Moreover, the two laboratories (RDB and ATL) were able to further optimize the bisulfite treatment and pyrosequencing assay for this extended analysis. The intra-laboratory reliability of the assay as performed in the Lorincz laboratory, with this additional optimization, was excellent: 90% of the ICCs were >90%.

As for inter-laboratory reliability, the methylation data for the replicated individuals and CpG sites were not highly correlated between the two data sets, as would be expected during assay optimization. Correlations were highest for CpG sites in the L1 (median = 0.57), L2 (median = 0.58) and E2 (median = 0.63) regions.

Statistical methods

Because methylation was not normally distributed within the outcome groups, the non-parametric Kruskal-Wallis test was used. It determined whether the percentage of methylation at each individual CpG site was associated with HPV infection outcome. To account for multiple testing, p values were corrected using the Benjamini-Hochberg method, which controls the false discovery rate (the expected proportion of false positives among significant results) at the nominal level of 0.05.20 Logistic regression models were used to obtain the odds ratio (OR) and 95% CI for CIN2+ at each individual CpG site, using the controls as the referent group. For each CpG site, methylation levels were dichotomized at the second tertile (66.7th percentile) of that site's distribution in the controls. Women with methylation above the second tertile (high methylation) were compared with women below it (average to low methylation). To obtain ORs for a combination of CpG sites, we created a categorical variable for the number of sites with methylation in the top tertile (0, 1, 2 or 3). We fit a logistic regression model with this variable as the predictor of CIN2+. We then compared the odds of CIN2+ for women with any one, any two or all three sites highly methylated (>2nd tertile) with the odds for women with average to low methylation (<2nd tertile) at all three sites. Analyses used all diagnostic or pre-diagnostic specimens unless noted as restricted to specimens not included in the previous study.12 For the age analyses, cases and controls were stratified by the median age to assess effect modification by age. Viral load was analyzed as a continuous variable and included as a potential confounder in the logistic regression model for each individual CpG site.
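
The Benjamini-Hochberg step-up adjustment can be sketched as follows:

1. Sort the m p-values in ascending order.
2. Scale the p-value of rank i by m/i.
3. Take a running minimum from the largest rank downward, so the adjusted values stay monotone and never exceed 1.

Sites with an adjusted p ≤ 0.05 are then declared significant. This is a generic illustration, not the SPSS or R code the authors used, and the input p-values are hypothetical.

```python
def benjamini_hochberg(p_values):
    """Benjamini-Hochberg adjusted p-values, returned in the input order."""
    m = len(p_values)
    order = sorted(range(m), key=lambda i: p_values[i])  # ascending p
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest p-value down, enforcing monotonicity
    for rank in range(m, 0, -1):
        i = order[rank - 1]
        running_min = min(running_min, p_values[i] * m / rank)
        adjusted[i] = running_min
    return adjusted


if __name__ == "__main__":
    # Hypothetical Kruskal-Wallis p-values for four CpG sites
    raw = [0.01, 0.04, 0.03, 0.005]
    adjusted = benjamini_hochberg(raw)
    print(adjusted)
    print([q <= 0.05 for q in adjusted])
```

In R, `p.adjust(p, method = "BH")` returns the same adjusted values.
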
Spearman rank correlations were used to investigate the associations between viral methylation levels at each CpG site and time before CIN3 diagnosis. Receiver operating characteristic (ROC) curve analyses, including calculation of the area under the ROC curve (AUC), were used to evaluate how well methylation at individual CpG sites separated women with CIN2+ from controls (clearance). Partial areas under the curve (pAUC) were estimated for specificities of 40% or higher. They were standardized so that a perfect test has a pAUC of 1.0 and a non-discriminating test a pAUC of 0.5.21 Analyses were performed using SPSS 15.0 and R software packages.
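
The ROC quantities above can be sketched with the standard library. The sketch does three things:

- It builds the empirical ROC curve, calling a sample positive when its methylation is at or above each threshold.
- It integrates the curve by the trapezoidal rule. Over the full range this equals the Mann-Whitney probability, with ties counted as one half.
- It computes a partial area for false positive rates up to 0.6 (specificity ≥40%).

The text specifies only the endpoints of the standardization (1.0 perfect, 0.5 non-discriminating). The McClish-type formula used here satisfies both, but it is an assumption that this is exactly the correction of ref. 21. The data are hypothetical.

```python
def roc_points(case_values, control_values):
    """ROC vertices (FPR, TPR); a sample is positive if value >= threshold."""
    thresholds = sorted(set(case_values) | set(control_values), reverse=True)
    points = [(0.0, 0.0)]
    for t in thresholds:
        tpr = sum(v >= t for v in case_values) / len(case_values)
        fpr = sum(v >= t for v in control_values) / len(control_values)
        points.append((fpr, tpr))
    return points  # the lowest threshold gives (1.0, 1.0)


def partial_area(points, fpr_max=1.0):
    """Trapezoidal area under the ROC polyline for FPR between 0 and fpr_max."""
    area = 0.0
    for (x0, y0), (x1, y1) in zip(points, points[1:]):
        if x0 >= fpr_max:
            break
        if x1 > fpr_max:  # clip the last segment, interpolating TPR at fpr_max
            y1 = y0 + (y1 - y0) * (fpr_max - x0) / (x1 - x0)
            x1 = fpr_max
        area += (x1 - x0) * (y0 + y1) / 2.0
    return area


def mcclish_standardize(pauc, fpr_max):
    """Rescale a pAUC over FPR in [0, fpr_max] so chance = 0.5, perfect = 1.0."""
    chance = fpr_max ** 2 / 2.0  # area under the diagonal up to fpr_max
    perfect = fpr_max            # area when TPR = 1 over the whole range
    return 0.5 * (1.0 + (pauc - chance) / (perfect - chance))


if __name__ == "__main__":
    # Hypothetical % methylation at one CpG site
    cin2plus = [35.0, 52.1, 18.4, 61.0, 44.7, 29.9, 70.2, 25.5]
    clearance = [12.3, 20.8, 9.7, 33.0, 15.1, 27.4, 8.8, 19.6]
    pts = roc_points(cin2plus, clearance)
    print(f"AUC = {partial_area(pts):.3f}")
    fpr_max = 0.6  # specificity >= 40%
    std = mcclish_standardize(partial_area(pts, fpr_max), fpr_max)
    print(f"standardized pAUC = {std:.3f}")
```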

Results

We analyzed methylation data from the following specimens (total = 273 samples from 196 women) (Table 1):

- 92 control (clearance) specimens
- 93 diagnostic and 72 pre-diagnostic specimens from women with persistence or CIN2+
- 16 additional serial specimens from seven women who were eventually diagnosed with CIN3

Table 1. Pre-diagnostic and diagnostic specimens and number of women in each outcome group from Guanacaste, Costa Rica between 1993 and 2001

High methylation associated with CIN2+ diagnosis

Diagnostic specimens

When only specimens not included in the previous study12 were analyzed, 94% of CpG sites (62/66; all except four sites in the URR) had significantly higher methylation levels in CIN2+ cases than in controls after correction for multiple tests. Including all specimens, 87.9% of CpG sites (58/66) had significantly different methylation levels among the three outcome groups (controls, persistence, CIN2+), even after correction for multiple tests. The differences were greatest in L1, E2–E4 and L2, with p-values <10⁻⁵ (Supporting Information Table 2). Methylation levels did not differ significantly between women with CIN2 and those with CIN3 (p > 0.05 for every CpG site; Supporting Information Fig. 1), so these outcome groups were combined. Methylation was high in the L2 and L1 gene regions, particularly in women with cancer (Fig. 1a). High methylation (top tertile) at each of the significant CpG sites was associated with an increased risk of CIN2+ (Table 2 and Supporting Information Table 2). The strongest associations were in L1 (nucleotide position 6457; OR: 11.5, 95% CI: 4.8–27.7) and L2 (nucleotide position 4261; OR: 20.8, 95% CI: 6.6–65.7).
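
The three-group comparison above uses the Kruskal-Wallis test, which ranks all pooled values and compares rank sums across groups. With exactly three outcome groups, the chi-square reference distribution has 2 degrees of freedom, whose survival function is exp(−H/2). The sketch below therefore needs only the standard library. It is a generic illustration with hypothetical inputs, not the authors' analysis code. `scipy.stats.kruskal` applies the same tie correction.

```python
import math


def average_ranks(values):
    """1-based ranks; tied values share the average of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        for k in range(i, j + 1):
            ranks[order[k]] = (i + j) / 2.0 + 1.0
        i = j + 1
    return ranks


def kruskal_wallis_3(group_a, group_b, group_c):
    """Tie-corrected Kruskal-Wallis H and its p-value for three groups."""
    groups = [group_a, group_b, group_c]
    pooled = [x for g in groups for x in g]
    n = len(pooled)
    ranks = average_ranks(pooled)
    total, start = 0.0, 0
    for g in groups:
        rank_sum = sum(ranks[start:start + len(g)])
        total += rank_sum ** 2 / len(g)
        start += len(g)
    h = 12.0 / (n * (n + 1)) * total - 3.0 * (n + 1)
    # Tie correction: divide by 1 - sum(t^3 - t) / (n^3 - n)
    tie_sizes = {}
    for x in pooled:
        tie_sizes[x] = tie_sizes.get(x, 0) + 1
    correction = 1.0 - sum(t ** 3 - t for t in tie_sizes.values()) / (n ** 3 - n)
    if correction == 0:
        raise ValueError("all values are tied")
    h /= correction
    # Chi-square with 2 degrees of freedom has survival function exp(-x / 2)
    return h, math.exp(-h / 2.0)


if __name__ == "__main__":
    # Hypothetical % methylation at one CpG site in each outcome group
    clearance = [8.1, 12.4, 10.0, 15.2, 9.7]
    persistence = [14.0, 18.3, 12.4, 21.9, 16.5]
    cin2plus = [25.6, 31.0, 19.8, 40.2, 28.4]
    h, p = kruskal_wallis_3(clearance, persistence, cin2plus)
    print(f"H = {h:.2f}, p = {p:.4f}")
```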

Figure 1.

Median % methylation for the outcome groups at each CpG site for specimens collected at diagnosis (a) and 27–72 months (median time) before diagnosis (b). The legend indicates the color of each outcome group and the number of women in each group; women with CIN2 and CIN3 are combined because of their similarity in median % methylation. The x-axis indicates each individual CpG site by nucleotide position grouped by gene region.

Table 2. Tests of association and measures of predictive capacity for HPV16 CpG sites exhibiting diagnostic AUC ≥0.75 and pre-diagnostic AUC ≥0.7

There were no significant differences in methylation by cytology among the controls (normal vs. ASCUS/LSIL; data not shown). When the CIN2–3 cases were stratified by co-infection status, those with single HPV16 infections had slightly higher methylation levels at most CpG sites than those with multiple HPV infections (data not shown).

We stratified the cases and controls by the median age of 28 years, excluding the 13 cancer cases, who were generally among the older women. CIN2–3 cases aged >28 years had higher methylation at most sites than those aged ≤28 years, whereas controls aged >28 years had lower methylation at most sites (data not shown). A similar pattern was seen at most sites when women were categorized as young, middle-aged and older (18–25, 26–45 and 46+ years). Methylation tended to decrease with age in the controls and to increase with age in the cases (data not shown). At most CpG sites, high methylation (top tertile) was associated with a greater increase in the risk of CIN2–3 in women aged >28 years (Fig. 2). This was particularly true at nucleotide position 6457 in L1 (OR for women aged >28 years: 17.6, 95% CI: 4.2–72.6), where the interaction between methylation and age group was significant (p = 0.03).
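
For a single binary predictor, the unadjusted logistic-regression OR and its Wald 95% CI reduce to the 2×2 cross-product ratio ad/bc and Woolf's interval. The sketch below computes these directly, for methylation above the controls' second tertile. Three points are assumptions or limits of the sketch:

- The percentile rule is Python's `statistics.quantiles` with its default "exclusive" method. Other software may interpolate the cut point slightly differently.
- Values exactly at the cut point are counted as average-to-low.
- Running the function separately within each age stratum gives stratum-specific ORs like those in Fig. 2, but testing the interaction term itself (and adjusting for viral load) needs a full logistic model.

The data are hypothetical.

```python
import math
from statistics import quantiles


def tertile_odds_ratio(case_values, control_values, z=1.96):
    """OR (Wald 95% CI) for methylation above the controls' second tertile."""
    cut = quantiles(control_values, n=3)[1]  # 66.7th percentile of controls
    a = sum(v > cut for v in case_values)      # cases, high methylation
    b = len(case_values) - a                   # cases, average-to-low
    c = sum(v > cut for v in control_values)   # controls, high methylation
    d = len(control_values) - c                # controls, average-to-low
    if 0 in (a, b, c, d):
        raise ValueError("a 2x2 cell is empty; the Wald interval is undefined")
    log_or = math.log((a * d) / (b * c))
    se = math.sqrt(1 / a + 1 / b + 1 / c + 1 / d)
    return math.exp(log_or), math.exp(log_or - z * se), math.exp(log_or + z * se)


if __name__ == "__main__":
    # Hypothetical % methylation at one CpG site, split by age stratum
    strata = {
        "<=28": ([22, 35, 41, 18, 50, 29], [10, 14, 25, 9, 31, 12, 20, 16, 27]),
        ">28": ([48, 55, 39, 61, 12, 44], [8, 11, 19, 7, 24, 13, 15, 10, 26]),
    }
    for label, (cases, controls) in strata.items():
        or_, lo, hi = tertile_odds_ratio(cases, controls)
        print(f"age {label}: OR {or_:.1f} (95% CI {lo:.1f}-{hi:.1f})")
```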

Figure 2.

Odds ratio estimates for the association between high methylation and CIN2-3 stratified by the median age of 28 years. Logistic regression models were used to obtain the odds ratios for CIN2-3, using the controls as the referent group, for methylation dichotomized using the second tertile based on the distribution for that site in the controls. The legend shows the color of each age group. The x-axis indicates each individual CpG site by nucleotide position grouped by gene region.

Pre-diagnostic specimens

When only specimens not included in the previous study12 were analyzed, 23% of CpG sites (15/66) had significantly higher methylation levels in CIN2+ cases than in controls before correction for multiple tests. Three CpG sites (two in L2 and one in L1) remained significant after correction. Including all specimens, 11 CpG sites (16.7%) had significantly different methylation levels among the three outcome groups after correction for multiple tests: one in the URR, one in E2–E4, three in L2 and six in L1 (Table 2 and data not shown). Methylation levels were higher in the CIN2, CIN3 and cancer case groups, primarily in the L2 and L1 gene regions (Fig. 1b). High methylation (top tertile) at each of these 11 significant CpG sites was associated with an increased risk of CIN2+ (ORs ranged from 3.3 to 9.3; Table 2 and data not shown). The strongest risk estimate was for position 4261 in L2 (OR: 9.3, 95% CI: 2.3–45.1).

As with the diagnostic specimens, stratification by the median age of 28 years showed that CIN2+ cases aged >28 years had higher methylation at most sites than those aged ≤28 years, whereas controls aged >28 years had lower methylation at most sites (data not shown).

Serial CIN3 methylation increases over time

Seven women had samples collected at time points 0–7 years before CIN3 diagnosis in addition to their pre-diagnostic and diagnostic samples (30 samples total). Methylation tended to increase as the time to diagnosis decreased. This trend was significant at 10 CpG sites in L2, eight in L1, two in E6 and one in E2 (Spearman correlation coefficients between methylation and time before diagnosis of −0.38 to −0.56), and weak inverse associations were seen at most other CpG sites (Fig. 3). Some CpG positions in the URR showed the opposite trend (Fig. 3), but none of these were significant.
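
A negative Spearman coefficient here means that methylation is higher in samples taken closer to diagnosis. The sketch ranks both variables (averaging ties) and takes the Pearson correlation of the ranks, which is the definition of Spearman's rho. Significance testing is omitted, and the serial values shown are hypothetical.

```python
import math


def average_ranks(values):
    """1-based ranks; tied values share the average of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        for k in range(i, j + 1):
            ranks[order[k]] = (i + j) / 2.0 + 1.0
        i = j + 1
    return ranks


def spearman_rho(x, y):
    """Spearman's rho as the Pearson correlation of the average ranks."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("need two sequences of equal length >= 2")
    rx, ry = average_ranks(x), average_ranks(y)
    n = len(rx)
    mean_x, mean_y = sum(rx) / n, sum(ry) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(rx, ry))
    sd_x = math.sqrt(sum((a - mean_x) ** 2 for a in rx))
    sd_y = math.sqrt(sum((b - mean_y) ** 2 for b in ry))
    return cov / (sd_x * sd_y)


if __name__ == "__main__":
    # Hypothetical serial samples at one CpG site: years before CIN3 diagnosis
    years_before = [6.5, 5.0, 4.1, 3.0, 2.2, 1.0, 0.0]
    methylation = [11.0, 14.2, 12.5, 19.8, 24.1, 22.9, 31.4]
    print(f"rho = {spearman_rho(years_before, methylation):.2f}")
```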

Figure 3.

Mean % methylation by CpG site for 30 serial samples from seven women collected 0–7 years before diagnosis of CIN3. The legend indicates the color of each time interval. The x-axis indicates each individual CpG site by nucleotide position grouped by gene region.

Viral load correlations with methylation

There were no significant correlations between methylation at individual CpG sites and viral load in the controls or in the CIN2 or CIN3 diagnostic specimens after correction for multiple tests (data not shown). Adjusting for viral load did not substantially attenuate the risk estimates for the association between high methylation and CIN2-3 (excluding cancer cases) at any CpG site (Supporting Information Fig. 2). Likewise, the risk estimate for the positive association between viral load and CIN2-3 was not significantly affected by adding methylation at each CpG site to the model (data not shown).

Methylation distinguishes women with precancer

ROC analyses were used to assess the ability of methylation to distinguish women with CIN2+ from controls, both at each CpG site and for a combination of sites. The ROC curves plot sensitivity (%) against 100 − specificity (%).

Diagnostic specimens

The AUCs ranged from 0.52 to 0.82, and the pAUCs for a specificity of 40% or higher ranged from 0.45 to 0.79 (Supporting Information Table 2). Most CpG sites in L1 and L2 gave good separation (AUC ≥0.75; Table 2), compared with only one site each in E6 and E7 and two in E2–E4. The highest AUC was for a CpG site in L1 at nucleotide position 6457 (AUC: 0.82, 95% CI: 0.75–0.89; pAUC: 0.79). The ROC curve for this site is shown in Figure 4a; at a sensitivity of 91.1%, the corresponding specificity for CIN2+ was 60.2%. For women aged >28 years, the AUC at nucleotide position 6457 in L1 was 0.89 (95% CI: 0.81–0.97), and a sensitivity of 90.0% corresponded to a specificity of 75.6% for CIN2+. For CIN2-3, the AUCs at nucleotide position 6457 in L1 did not differ significantly between women aged >28 years and those aged ≤28 years (AUC: 0.85, 95% CI: 0.75–0.96, vs. 0.73, 95% CI: 0.61–0.86; p > 0.05).
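
Reading a specificity off an ROC curve at a fixed sensitivity (such as 91.1% sensitivity with 60.2% specificity at L1 6457) amounts to choosing the highest methylation cut-off that still calls the required fraction of cases positive. A minimal sketch, assuming that "positive" means methylation at or above the cut-off. The data are hypothetical. The paper's operating points depend on its observed values and software.

```python
def specificity_at_sensitivity(cases, controls, target=0.90):
    """Highest-specificity cut-off whose sensitivity is still >= target.

    A sample is called positive when its methylation is >= the threshold.
    Returns (threshold, sensitivity, specificity).
    """
    best = None
    for t in sorted(set(cases) | set(controls)):  # ascending thresholds
        sensitivity = sum(x >= t for x in cases) / len(cases)
        if sensitivity < target:
            break  # sensitivity can only fall as the threshold rises
        specificity = sum(x < t for x in controls) / len(controls)
        best = (t, sensitivity, specificity)
    return best


if __name__ == "__main__":
    # Hypothetical % methylation at one CpG site
    cin2plus = [22.0, 35.5, 41.2, 18.9, 50.3, 29.4, 60.1, 33.3, 27.8, 45.0]
    clearance = [9.1, 14.6, 25.0, 11.2, 31.7, 12.9, 20.4, 16.3, 19.5, 8.0]
    t, sens, spec = specificity_at_sensitivity(cin2plus, clearance)
    print(f"cut-off {t}: sensitivity {sens:.1%}, specificity {spec:.1%}")
```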

Figure 4.

ROC curves for the CpG site at nucleotide position 6457 in L1 using diagnostic specimens (a) and at position 6367 in L1 using pre-diagnostic specimens (b). These are the diagnostic and pre-diagnostic CpG sites with the highest AUCs. Sensitivity (%; the true positive rate) is plotted on the y-axis against 100 − specificity (%; the false positive rate) on the x-axis, with a diagonal reference line. The 95% CIs for each ROC curve are shown as dotted lines.

Methylation levels at CpG sites within the L1 and L2 gene regions are highly correlated (Supporting Information Fig. 3). Therefore, in an a posteriori exploratory analysis, we combined the effects of three sites that were weakly correlated (correlation coefficient r < 0.6) and had the highest AUCs and strongest ORs. For diagnostic specimens, we analyzed the combination of position 4261 in L2, position 6457 in L1 and position 790 in E7. The combined OR was 216 (95% CI: 21–999) for women with high methylation at all three CpG sites compared with women with low methylation at all three sites. For women with high methylation at any one or any two of the three CpG sites, the ORs for CIN2+ were 16.5 (95% CI: 2–139) and 33.2 (95% CI: 4–278), respectively. The AUC for this three-site combination was 0.82 (95% CI: 0.75–0.90). We also combined high methylation at the same L2 (position 4261) and L1 (position 6457) sites with high methylation at a site in E2–E4 (position 3436) or in E6 (position 218). These combinations gave similar results (data not shown), with AUCs of 0.83 (95% CI: 0.76–0.91) and 0.82 (95% CI: 0.75–0.90), respectively.
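
The three-site score counts how many of the chosen sites exceed their control-derived second-tertile cut-off (0–3) and compares each count with the all-low reference. With only this categorical predictor, the logistic model is saturated. Its ORs therefore equal the cross-product ratios from the count table, which the sketch computes directly; it does not reproduce the model-based CIs. The site labels follow the text, but the cut-off values and samples are hypothetical.

```python
def high_site_count(sample, cutoffs):
    """Number of sites whose methylation exceeds that site's control cut-off."""
    return sum(sample[site] > cut for site, cut in cutoffs.items())


def count_level_odds_ratios(cases, controls, cutoffs):
    """OR of CIN2+ for 1, 2, ... highly methylated sites versus none.

    With a single categorical predictor the logistic model is saturated, so
    each level's OR equals its cross-product ratio against level 0.
    """
    levels = len(cutoffs) + 1
    case_n = [0] * levels
    control_n = [0] * levels
    for s in cases:
        case_n[high_site_count(s, cutoffs)] += 1
    for s in controls:
        control_n[high_site_count(s, cutoffs)] += 1
    if case_n[0] == 0 or control_n[0] == 0:
        raise ValueError("reference level (no high sites) has an empty cell")
    ref_odds = case_n[0] / control_n[0]
    result = {}
    for k in range(1, levels):
        if case_n[k] and control_n[k]:
            result[k] = (case_n[k] / control_n[k]) / ref_odds
        else:
            result[k] = None  # OR is zero or undefined when a cell is empty
    return result


if __name__ == "__main__":
    # Site labels follow the text; cut-offs and samples are hypothetical
    cutoffs = {"L2_4261": 20.0, "L1_6457": 35.0, "E7_790": 5.0}
    cases = [
        {"L2_4261": 40.0, "L1_6457": 60.0, "E7_790": 9.0},
        {"L2_4261": 25.0, "L1_6457": 30.0, "E7_790": 2.0},
        {"L2_4261": 10.0, "L1_6457": 20.0, "E7_790": 1.0},
    ]
    controls = [
        {"L2_4261": 12.0, "L1_6457": 22.0, "E7_790": 3.0},
        {"L2_4261": 15.0, "L1_6457": 40.0, "E7_790": 2.5},
        {"L2_4261": 8.0, "L1_6457": 18.0, "E7_790": 1.5},
    ]
    print(count_level_odds_ratios(cases, controls, cutoffs))
```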

Pre-diagnostic specimens

AUCs ranged from 0.4 to 0.76, and all CpG sites in the E2–E4, E6, E7 and URR gene regions gave AUCs <0.7 (data not shown). Four CpG sites had AUCs ≥0.7 (Table 2):

- L2, nucleotide position 5173: AUC 0.73 (95% CI: 0.62–0.84); pAUC 0.75
- L2, nucleotide position 5128: AUC 0.71 (95% CI: 0.59–0.82); pAUC 0.71
- L1, nucleotide position 6367: AUC 0.76 (95% CI: 0.66–0.86); pAUC 0.75
- L1, nucleotide position 5927: AUC 0.70 (95% CI: 0.59–0.82); pAUC 0.77

The ROC curve for L1 6367 is shown in Figure 4b; at a sensitivity of 90.0%, the corresponding specificity for CIN2+ was 33.8%. CpG sites L2 5173, L1 5927 and L1 6367 were strong diagnostic and pre-diagnostic sites, with ORs >3 and AUCs >0.7 for CIN2+ in both diagnostic and pre-diagnostic specimens (Table 2). For women aged >28 years, the AUC at nucleotide position 6367 in L1 was 0.84 (95% CI: 0.71–0.96), and a sensitivity of 93.8% corresponded to a specificity of 42.5% for CIN2+.

Discussion

The molecular determinants of CIN2+ development in women infected with oncogenic HPV types are unknown. We need effective markers that retain high sensitivity while increasing the specificity of DNA-based HPV assays, to distinguish the very common benign HPV16 infections from the rare malignant ones. Our study shows that accurate quantitation of HPV16 DNA methylation, with classification of women by methylation level, has good sensitivity and specificity (AUC >0.8; sensitivity 91%, specificity 60%). It detects CIN2+ in HPV16 screen-positive women and distinguishes CIN2+ from HPV16 infections that clear. In older women, the diagnostic performance of DNA methylation triage appears to be higher. We provide a list of the stronger CpG biomarker sites on which, if confirmed, a diagnostic and/or prognostic assay could be based (Table 2).

In pre-diagnostic specimens, CpG sites in L2 and L1 had the strongest associations with outcome. The only other longitudinal study examined six CpG sites in the URR and found the opposite trend: high methylation was associated with a lower likelihood of CIN2+.15 In diagnostic specimens, numerous sites in L1, L2, E2–E4, E6 and E7 had significantly higher methylation levels in women with CIN2+ than in controls. The published literature has been reviewed recently.12 Overall, other cross-sectional studies have also observed high HPV16 DNA methylation associated with advanced disease in the L1 gene region6, 8, 10, 11, 13 and the L2 gene region.11, 13 Replication of selected samples and CpG sites indicates that methylation levels in the URR may be unreliable, probably because methylation in this region is very low. This may account for some of the inconsistencies in the published literature.7–10, 13–15, 22

A previous report on HPV16 methylation12 included information from a subset of the women analyzed in the current report, including 25 women with CIN3. Unlike that report, the current report includes women with CIN2 and cancer, as well as serial samples for selected women with CIN3; it also includes information from 59 additional controls and six additional women with CIN3. All of our methylation measurements come from a different laboratory: extraction, bisulfite conversion, primer design and methylation assays for the earlier report were performed in the Burk laboratory in the US, and were then carried out independently in a different laboratory in the UK for the current report. The two studies ran successively, and to avoid potential bias, the laboratory results of the previous study did not inform the current analyses.

Our independent replication corroborates the finding that women with CIN3 have higher methylation in the L1, L2 and E2–E4 gene regions;12 in addition, we detected strong associations between high methylation and CIN2+ at more CpG sites in all gene regions. Ten of the 11 CpG sites that were significantly associated with disease outcome (clearance, persistence, CIN3) in the previous study12 were included in this study and were also significantly associated with outcome (clearance, persistence, CIN2+; noted in Table 2 and Supporting Information Table 2). Our risk estimates were similar or stronger at all sites, except that L1 7136 showed an increased risk of CIN2+ in the current study but a decreased risk of CIN3 previously.12 The strongest site in the previous study,12 L1 5378, was not included in the current study; similarly, the two strongest sites in the current study, L1 6457 and L1 6367, were not tested in the earlier study. Nevertheless, the combined data highlight the importance of the L1 gene region.

We also extended the analyses to evaluate age and HPV viral load, and show that viral load is not a significant confounder, whereas age may have modifying effects on the DNA methylation results. Our findings expand on the previous study12 by using a larger sample size, a broader spectrum of cervical disease and serially collected specimens from the same women to evaluate the utility of viral methylation as a biomarker. In particular, we have shown that methylation at specific CpG sites can distinguish women who will be diagnosed with CIN2+ several years later.

In cervical cancer screening, the most important application for novel biomarkers5 is to triage women with positive cytology and/or HPV tests. For example, p16 immunocytochemical staining is a promising biomarker for this triage.23–25 The performance of our HPV16 methylation assay in detecting CIN2+ (sensitivity 91.1% and specificity 60.2% for CpG L1 6457) was comparable to that of p16 (sensitivity 92.6% and specificity 63.2%),25 although a comparison in similar populations is currently lacking. Because HPV infection is the necessary cause of almost all cervical cancers, HPV DNA detection has high diagnostic sensitivity. However, most infections clear, resulting in the poor specificity of HPV DNA tests.5 Therefore, for triage of HPV-positive women, a biomarker that retains relatively high sensitivity and has reasonable specificity would be most valuable. Based on data from the ASCUS-LSIL Triage Study (ALTS) and similar studies, HPV testing has been generally accepted for triage of ASCUS, with characteristics of >90% sensitivity and >45% specificity.26, 27 Our data demonstrate that HPV genomic methylation could serve as a triage marker after primary HPV testing that, unlike p16 and similar biomarkers, would not require morphological interpretation. A suitable DNA methylation assay could, in theory, detect most potentially transforming HPV infections while reducing the number of women unnecessarily referred to colposcopy or to a “see-and-treat” approach.5, 28 This concept clearly needs to be extended to a joint test covering the other important HPV types (18, 31, 45, etc.) and evaluated in much larger studies in diverse settings before it can be considered a realistic triage option for HPV DNA screen-positive women.
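To make the triage arithmetic concrete, the sketch below checks the operating points quoted above against the >90% sensitivity and >45% specificity characteristics cited for ASCUS triage, and estimates colposcopy referrals per 1,000 HPV16-positive women. The 10% CIN2+ prevalence is a hypothetical assumption chosen only for illustration, the function names are illustrative, and the p16 figures come from a different population, so this is not a head-to-head comparison.

```python
"""Sketch: checking triage operating points against benchmark characteristics
and estimating colposcopy referrals.

The benchmarks (>90% sensitivity, >45% specificity) are those cited in the
text for ASCUS triage. HYPOTHETICAL_PREVALENCE is an assumption, not study data.
"""

HYPOTHETICAL_PREVALENCE = 0.10  # assumed CIN2+ prevalence among HPV16-positives


def meets_triage_benchmark(sens: float, spec: float,
                           min_sens: float = 0.90, min_spec: float = 0.45) -> bool:
    """True if sensitivity and specificity both strictly exceed the benchmarks."""
    return sens > min_sens and spec > min_spec


def expected_referrals(sens: float, spec: float, prevalence: float,
                       n_women: int = 1000) -> tuple[float, float]:
    """Return (referred women with CIN2+, referred women without CIN2+)."""
    diseased = prevalence * n_women
    healthy = n_women - diseased
    return sens * diseased, (1.0 - spec) * healthy


# Operating points (sensitivity, specificity) as reported in the text.
MARKERS = {
    "HPV16 methylation, CpG L1 6457": (0.911, 0.602),
    "p16 immunocytochemistry": (0.926, 0.632),
}

if __name__ == "__main__":
    for name, (sens, spec) in MARKERS.items():
        tp, fp = expected_referrals(sens, spec, HYPOTHETICAL_PREVALENCE)
        print(f"{name}: meets benchmark = {meets_triage_benchmark(sens, spec)}; "
              f"referrals per 1,000 = {tp + fp:.1f} ({fp:.1f} without CIN2+)")
```

Under these assumptions, methylation triage at CpG L1 6457 would refer roughly 449 of 1,000 HPV16-positive women rather than all of them, while missing about 9 of the 100 hypothetical CIN2+ cases; the exact figures shift with the prevalence assumed.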

Most CpG sites in the HPV16 genome appear to be methylated in a coordinated fashion, which appears to explain why combining the strongest CpG sites did not significantly improve the diagnostic performance of HPV16 DNA methylation. Despite very strong associations with methylation, illustrated by the high odds ratios, the AUCs of combinations of the three strongest CpG sites improved little over that of the best single site (0.83 vs. 0.82), even though the sites were chosen so that their methylation levels were not highly correlated.

Because our pre-diagnostic samples were collected approximately 3 years before diagnosis of CIN2+, methylation measured in such samples could potentially provide a window for prognostic risk stratification. Methylation in these pre-diagnostic specimens, particularly at CpG sites within L1 and L2, showed good performance. The four CpG sites (two in L1 and two in L2) that had AUCs ≥0.7 in pre-diagnostic specimens also had AUCs >0.7 in diagnostic specimens. Given the poor to absent additivity of the AUCs, the best HPV DNA methylation assay may rely on accurate measurement of just one biologically robust CpG site, or perhaps on a simple average of a few sites per HPV type, in a cocktail. This approach would minimize costs and could yield a sufficiently accurate and robust assay, selected from several candidate CpGs in the L1, L2 or E2 ORFs.

The serial samples collected 0–7 years before CIN3 diagnosis suggest that methylation at most regions of the HPV16 genome increases over time until diagnosis, particularly in the L2 and L1 gene regions. This pattern is also reflected in the pre-diagnostic (first HPV16-positive) and diagnostic samples from women with CIN2+, with consistently higher methylation levels in the diagnostic samples. However, more extensive longitudinal analyses are needed to evaluate whether changes in methylation predict later detection of CIN2+.

In conclusion, we have shown that HPV16 DNA methylation may be a useful diagnostic biomarker for CIN2+ in the triage of HPV DNA-positive women and may also be a reasonable prognostic marker. In this cohort, viral methylation appears to occur years before detection of CIN2+ and may provide high sensitivity and reasonable specificity for the development of CIN2+. Additional follow-up studies, in cohorts of women in different settings and with more in-depth longitudinal data, are needed to extend our findings on pre-diagnostic samples, age modification and serial samples.

Acknowledgements

The authors are grateful to the women who participated in this study and to the Guanacaste Project staff who so carefully collected the samples over the years.
