Methylation of CpG 5962 in L1 of the human papillomavirus 16 genome as a potential predictive marker for viral persistence: A prospective large cohort study using cervical swab samples

Abstract Several studies have demonstrated that the viral genome can be methylated by the host cell during progression from persistent infection to cervical cancer. The aim of this study was to investigate whether methylation at a specific site could predict the development of viral persistence and whether viral load shows a correlation with specific methylation patterns. HPV16‐positive samples from women aged 20–29 years (n = 99) with a follow‐up time of 13 years, were included from a Danish cohort comprising 11 088 women. Viral load was measured by real‐time PCR and methylation status was determined for 39 CpG sites in the upstream regulatory region (URR), E6/E7, and L1 region of HPV16 by next‐generation sequencing. Participants were divided into two groups according to whether they were persistently (≥ 24 months) or transiently HPV16 infected. The general methylation status was significantly different between women with a persistent and women with a transient infection outcome (P = .025). One site located in L1 (nt. 5962) was statistically significantly (P = .00048) different in the methylation status after correction using the Holm‐Sidak method (alpha = 0.05). Correlation analyses of samples from HPV16 persistently infected women suggest that methylation is higher although viral load is lower. This study indicates that methylation at position 5962 of the HPV16 genome within the L1 gene might be a predictive marker for the development of a persistent HPV16 infection.


| 1059
FERTEY ET al. While the vast majority of women acquire a HR HPV infection during their lifetime, only a minority of infections persist and progress to cervical cancer or precancerous lesions, defined as high-grade cervical intraepithelial neoplasia grade 3 (CIN3+). 2 In a time span of 2-3 decades, approximately a third of untreated CIN3 lesions become invasive and progress to cervical cancer. 3 The transition from a productive to a transforming infection from CIN1 to CIN3 lesions is characterized by a substantial change in HPV gene expression, from high-level structural gene (L1 and L2) expression and low oncogene (E6 and E7) expression to high expression of oncogenes that interfere with cellular apoptosis and cell cycle regulation and diminished expression of structural proteins. 4 Altered viral gene expression has been linked to methylation of the viral genome for several DNA viruses including Kaposi Sarcoma Herpesvirus, Epstein-Barr Virus, Simian Virus 40 and Adenoviruses (reviewed by 5 ). DNA methylation, which is regulated by DNA methyltransferases (DNMTs) is also a common mechanism in mammalian cells to silence genes; and aberrant methylation patterns have been detected in many different promoters controlling tumor suppressor genes in cancer cells and in cancer-derived cell lines. [6][7][8][9] Recent findings identified a second class of enzymes involved in demethylation, the Ten-eleven-translocation proteins (TETs) that led to the conclusion that gene silencing by methylation is a reversible state. 10 It has been demonstrated that the papillomavirus genome is de novo methylated by DNMTs of the host cell that also play a role in innate immunity. 5,7 Therefore methylation of the viral genome might be a novel mechanism by which the host cell regulates viral gene expression and thereby HPV pathogenicity.
For several papillomaviruses it was shown, that during their normal lifecycle upon differentiation of the host cell, the viral URR becomes hypomethylated highlighting the dynamics of methylation-dependent gene expression regulation. 9,[11][12][13] Therefore several studies have analyzed the correlation between HPV associated lesions and methylation of the viral genome with the common observation that methylation reaches the highest degree in cancer samples. 6,[14][15][16] Experiments in tissue culture indicate that integration of the viral genome into the host chromosome may lead to an alteration in methylation patterns on the viral or host genome depending on the type of integration event. 8,9 Integration of the viral DNA into the host genome has been demonstrated to increase during persistence or carcinogenesis. [17][18][19] Recent studies in HPV16-infected women demonstrated that the levels of methylation slowly increase with enduring HPV persistence, with the diagnosis of cervical cancer and with age. 16,20 Most of the studies focused on samples derived from precancerous lesions and cancer and only a few studies have been conducted using a prospective study design. Furthermore, very few studies have been conducted combining Methylation status and viral load as possible predictors of future cervical disease. 20,21 With the samples used in this prospective cohort study it was possible to investigate the methylation status as well as the viral load years before CIN development. Due to the personal identification number that is being used in the pathology registry in Denmark, data on health status can be easily followed up and assigned to each patient years after the sample was taken. This unique feature allowed us to analyze the HPV16 genome methylation at an early time point of infection.

| Study population
The Danish HPV cohort has previously been described. 17,22 In brief, women were randomly selected from the general female population of Copenhagen, Denmark, using the unique 10-digit personal identification number, which is universally used. All women had cervical swabs taken at enrollment and at approximately 2 years later.
The existence of nationwide registers and the unique personal identification numbers ensure valid linkage between the pathology registry, and facilitate follow-up studies with virtually no loss to follow-up. The women within this study cohort were linked to the Pathology Data Bank and followed up until 2014 identifying all cervical lesions and diagnostic or treatment procedures. As HPV DNA testing took place several years after the study examinations were performed, the women were unaware of their HPV test results; and these results had no influence on clinical management of these women. All women were between the ages of 20 and 29 at the time the first samples were taken. The study was approved by the national Scientific Ethical Committee and the national Data Protection Board.

| Study design
In total 101 samples from women with single HPV16 infections from the enrollment examination were included and methylation analysis was performed. Two samples failed to amplify and were excluded from the study leaving 99 women within this study. To investigate whether methylation of the viral genome determines the outcome of an infection, DNA from cervical swabs was extracted, bisulfite treated and analyzed by next-generation sequencing on a Roche 454 platform. We chose next-generation sequencing for methylation analysis after bisulfite treatment to get quantitative data for each methylated CpG site, because cervical samples may contain molecules with mixed methylation status at the same CpG position. As DNA content and quality of the cervical swab samples was highly variable, not all regions could be amplified with the same efficiency after bisulfite treatment. After sequencing, the reads were sorted according to the respective tags to receive one pool per patient, nonfunctional sequences like primer or adapter sequences were removed and the reads were filtered according to their quality. We initially chose 50 CpG sites in the following regions for analysis: Enhancer/URR, E6/E7 and L1. However, multiple CpG sites within the L1-URR fragment yielded no signal or not enough information after quality filtering of the dataset. Therefore, we decided to only analyze those regions that match high-quality standards and contain sufficient information for statistical analysis, reducing the number of CpG sites further analyzed to 39. Women were included in the analysis who were also HPV16-positive at the second examination, that is, women, who were HPV16-positive at two time points with two years in between, were considered to have a persistent HPV16 infection (n = 53), whereas women who were HPV16 negative at the second examination were considered to have cleared their infection (n = 46) ( Figure 1). Finally, via linkage to the Pathology Data Bank we were able to determine the outcome of a persistent HPV16 infection. This way, we also assessed the role of specific methylation sites for the development of cervical disease of CIN2+ cases occurring up to 12 years following the second examination.

| HPV testing
Residual LBC samples were tested using the Hybrid Capture 2 (HC2) HPV test (Qiagen) using the HR probe set in the central molecular testing laboratory (UKT, Tuebingen) as previously described. 23 HPV genotyping was carried out using the INNO-LiPA HPV Genotyping v2 test as previously described. 24

| DNA isolation and bisulfite treatment
DNA was isolated from cervical swabs or cell cultures using the QiaSymphony robotic system (Qiagen) and the QiaSymphony Virus/Bacteria kit (Qiagen) with an 800 µl extraction protocol according to manufacturer's instructions. Extracted DNA was subjected to bisulfite treatment using the Epitect DNA Methylation Kit (Qiagen). In brief, 200-500 ng of DNA (depending on the DNA concentration of the sample) were converted according to the low concentrated samples-protocol of the manufacturer. During bisulfite treatment unmethylated cytosines (C) are delaminated, which are then converted into uracil (U), and replaced with thymine (T) during PCR amplification; methylated C nucleotides are protected from conversion and remain unmodified.

| Primer design and PCR
Primers flanking the CpG sites of interest were designed using an in silico bisulfite converted HPV16 sequence (Human papillomavirus type 16 clone 114/K, complete genome 7,906 bp circular DNA EU118173.1 GI:157087542). 4 µl of the bisulfitetreated sample was used for subsequent preamplification using six primer pairs, generating four larger amplicons. These served as template for the subsequent amplification that generated 12 amplicons, the primers are listed in Table S1 without M13 linker sequence, which was added to attach the identifier for next-generation sequencing. All segments were amplified by OneTaq DNA polymerase (NEB). The PCR for each method was carried out in 50 µl reaction volumes in an MJ research Thermal Cycler in a 96 well format with the following parameters: initial denaturation at 94°C for 5 minutes, followed by 45 cycles of 94°C for 30 seconds, 50°C for 1 minute, and 72°C for 1 minute with a final extension at 72°C for 10 minutes. PCR products were run on 1.5% agarose gel electrophoresis to check efficient amplification.

| Quantitative realtime PCR
Quantitative RealTime PCR was performed for viral load and integration status measurements as described. 17 Predesigned primers (QT00203763, Qiagen) were used for detection of the single copy gene IFNB1 for normalization. The HPV16 E2 and E6 primers have previously been described. 25 PCR was performed in a final volume of 20 μl containing 1 × Light Cycler 480 SYBR green I Master Mix (Roche Diagnostics), 0.3 μM primers and 5 μl of DNA.

| PCR purification and 454 Nextgensequencing
In a first step, multiplex identifiers (MIDs) for library preparation were added in an additional PCR reaction | 1061 FERTEY ET al.
described above using primers that carry an M13 sequence to create identical ends for the Amplicon pools. The sequence is described in Table S1. PCR products of each patient were pooled and purified using QiaQuick nucleotide removal kit (Qiagen) for subsequent 454 sequencing. 454 sequencing was carried out on the Titanium platform (Roche/454 Life Sciences) by Eurofins/MWG Operon, Erlangen Germany.

| Data analysis
The amplicons were sorted by barcodes (MIDs) to set up 101 libraries for analysis. All sequences were quality trimmed followed by cutting off nucleotides with low quality (<20, sliding window 7). Short reads, remaining fragments of MID and primer sequences were removed after demultiplexing. The reads were collapsed, so that no identical reads were present. The resulting reads were aligned to the reference sequence and percentage of methylation was calculated. 26 As reference sequence a consensus sequence of 22 HPV16 variants was used. The consensus sequence was generated with MegAlign (SeqMan NGen ® ) using the following sequences (Genebank Accession Numbers): EU118173.

| Statistical analysis
For analysis of viral load as risk factor for persistence, we compared transiently HPV16-infected women with persistently HPV16-infected women using cervical samples taken at the initial examination at enrollment. The Mann-Whitney test was used to test for significant differences. All statistical tests were conducted at the 5% two-sided significance level. For multiple comparisons the correction was performed using the Holm-Sidak method 27 with alpha = 0.05 assuming that all rows are sampled from populations with the same scatter (SD). Statistical tests and multiple testing were performed using GraphPad Prism 6 ® (GraphPad Software). Graphs were generated using GraphPad Prism 6 ® . Heatmaps were generated using the web-based application "CIMminer" (http://disco ver.nci. nih.gov/cimmi ner/home.do).
F I G U R E 1 Schematic overview of the study population. Overview of study design of the Danish HPV cohort study (modified from 32 ).
For methylation analysis 101 HPV-positive samples were selected, n = 99 samples have been finally included. Of these 99 samples, 53 remained HPV16-positive after 2 years and were therefore regarded as persistently infected. 46 were negative for HPV16 after 2 years and were therefore regarded as transiently infected 3 | RESULTS Figure 2A shows the persistent samples, whereas Figure  2B displays the transient samples. White regions indicate missing data, black indicates 100% methylation, whereas gray shades indicate the respective percentage of methylation. To control for the efficiency of the bisulfite treatment and the PCR reactions, DNA from the cervical carcinoma cell lines SiHa and Caski served as positive controls. SiHa cells are generally regarded as low methylated, because they harbor only 1-2 HPV16 copies, which are highly transcriptionally active. Caski cells display a higher overall methylation of the viral genome, because they contain up to 600 integrated HPV16 copies of which most of them are transcriptionally silenced by DNA methylation. 6,28,29 In agreement with this, SiHa cells showed a significantly lower methylation level than Caski cells except for the L1 region ( Figure S1B).

| Methylation as predictor of HPV16 infection outcome
To investigate the methylation pattern in patient samples with known infection outcome, women with confirmed HPV16 infections (n = 99) were selected and only the samples from the initial examination were investigated in this study. The patients were grouped according to their future infection outcome and the average methylation of 39 sites was analyzed using the Mann-Whitney test. The mean methylation status was significantly different between a persistent and a transient infection outcome (P = .025) ( Figure 3A). The mean methylation of the particular CpG sites showed a higher methylation in the majority of the CpG positions for the samples that will develop a persistent infection. In detail, the mean methylation status of the individual CpG sites displayed a highly variable distribution with the highest methylation levels in the L1 region ( Figure 3B). The two patient groups were further analyzed for specific CpG positions. Since testing for normal distribution (Shapiro-Wilk P < .05) failed, the nonparametric Mann-Whitney U-test was applied. Before correction for multiple testing, there were two sites below P = .05 that segregated samples that will develop a persistent infection from those who will clear the virus and will therefore be transiently infected. One site is located in E6 (bp 494, P = .016), the other site in E7 (bp 701, P = .022) ( Figure 3C).
To correct for multiple testing at the methylated CpG positions, a multiple comparison was applied to the data and correction for multiple testing was performed using the Holm-Sidak method with alpha = 0.05, assuming that all rows are samples from populations with the same scatter. An unpaired t test for each row was computed, using one pooled standard deviation (SD). The results are shown in Table 1. Only one CpG site remained highly significant after correction for multiple testing and is therefore considered differentially methylated in cervical samples of women that develop a persistent HPV16 infection and that clear the infection (Figure 4). The site is located in L1 (bp 5962, P = .00048), a region, which has been proposed by several studies to be a prognostic marker for cancer development. 30,31 We also analyzed the methylation in persistently infected samples regarding a predictive association with CIN2 development. CIN 2+ encompasses CIN 2, CIN 3, adenocarcinoma in situ (AIS), and cancer. The histological diagnosis has been described in detail. 17 Of the 53 women with a persistent HPV16 infection, 28 went on to develop a CIN2+ during follow-up, whereas 25 remained normal. No statistically significant association between methylation and CIN2+development was found, indicating that at this early time point no specific methylation pattern predicted development of severe disease (data not shown).

| Determination of HPV16 viral load
As previously described 32 out of the 11 088 women aged 20-29 years, 82 were diagnosed persistently infected with HPV16 (Figure 1). Viral load for these samples has

F I G U R E 3 The methylation status differs between samples with persistent and transient infection outcome before correction for multiple
testing. A, Overall comparison of the quantities of methylation between transiently and persistently infected women. For this, results were grouped according to the patient's future infection outcome and the mean of the methylation at every CpG was calculated, (each dot represents the mean value of one CpG position). The P value was calculated using the nonparametric, two-tailed Mann-Whitney test. B, Comparison of methylation quantities at each CpG site between transiently and persistently infected women. The mean of the methylation amount at each CpG was calculated, (each bar represents the mean value). SEMs are indicated. C, Comparison of methylation quantities at CpG sites 494 and 701 between transiently and persistently infected women. The mean of the methylation at every CpG was calculated, and a Mann-Whitney test was applied to test for significant difference between the two groups for each CpG. The only sites that showed statistical significance (P < .05) were CpGs 494 and 701 previously been analyzed for the baseline and the follow-up examination ( Figure S1A). 17 Viral loads at enrollment ranged from one to over 1000 viral DNA copies per cell for both the persistent and the transient samples. The mean viral load was significantly higher in cervical samples from transiently HPV16-infected women (P = .006).
T A B L E 1 Detailed overview of the CpGs analyzed by multiple comparison. The correction for multiple testing was performed using the

| Correlation of viral load with methylation of the HPV16 genome
Generally, samples that will develop a persistent infection displayed a lower viral load ( Figure S1A) and a higher methylation of the genome ( Figure 3B), whereas samples that remain transiently infected showed a higher viral load and a lower methylation status of the genome. Correlation of methylation and viral load was investigated as described before. Five sites were identified that showed a significant result ( Figure S2): One in E6 (bp 494, P = .0202), two in E7 (bp 502, P = .0035; bp 539, P = .0151), and two in L1 (bp 5616, P = .0103; bp 7033, P = .0089). The strongest correlation was observed for site 5616 with a correlation coefficient of r = 0.4407. These data indicate that there is a weak to moderate positive correlation between methylation and viral load.

| DISCUSSION
In this prospective cohort study, we investigated viral DNA methylation of HPV16 at 39 CpG sites in the URR, E6/E7, and L1 region. We used samples that were taken before diagnosis from a cohort of HPV16-infected women within the prospective Danish HPV Cohort from Copenhagen, Denmark. 22 Our observations show that it might be possible to predict a persistent infection as a risk factor for disease progression after a single test at a given time point.
We identified two CpG sites in E6/E7 that have statistically significant differences in methylation levels among HPV16-infected women before correction for multiple comparisons. Methylation levels were generally higher in samples from women that became persistently infected than in samples from women with a transient infection outcome.
The identified CpGs are not completely consistent with the results from other studies. On one hand, this might be due to the sample material as most of the studies conducted so far focused on cervical cancer or precancer samples and not on cervical smears before cancer diagnosis as in this study. On the other hand, this difference might be based on the fact that these studies had a nonprospective design. Most of these reports describe a hypermethylation of the L1 and L2 region in tumor samples. 16,30,[33][34][35][36][37][38][39] The E6/E7 region has not been in the focus of most studies conducted so far. In productive infections, E6 and E7 are always expressed but at relatively low levels. During carcinogenic progression, transcription of E6 and E7 increases often due to disruption of the E2 ORF after integration of the viral genome. Only the studies, which investigated whole HPV genome methylation provide data for this region 15,20 and found no significant differences. Other reports mainly focused on the URR but found inconsistent results. Several studies showed elevated methylation levels which were associated with CIN development, [14][15][16]40,41 whereas others had opposite findings. 6,[42][43][44] Results from a prospective cohort study described the association between HPV16 DNA methylation and cervical disease using serially taken samples analyzing 67 CpG sites distributed over the HPV16 genome. 35 In HPV16-positive samples that were taken before onset of disease or of HPV clearance, a CpG site in L2 (position 4261) showed increased methylation associated with development of CIN3. Others demonstrated significantly different methylated positions in the L1-region as well, but several reports identified CpGs between 5600 and 5617 as targets [45][46][47] or CpGs 6368, 6405, and 6443. 48,49 The CpG position we identified after correction for multiple testing (bp 5962) differs from the site that was identified in these studies. The major reason for this difference might be that the other studies used cancer or precancer samples, while we focused on prospective samples before establishment of viral persistence or development of cytological abnormalities. It was further demonstrated that increased methylation at CpG sites in the E2, L2, and L1 region was associated with a risk of CIN3 compared to women who cleared the infection. 35 Later Mirabello et al confirmed their findings in a larger number of samples and showed an increased methylation in L2, L1 and E2/E4 regions among cases with CIN2+. 20 We could not identify a statistically significant association between methylation within these regions and CIN2+-development in our prospective study, which suggests, that specific methylation patterns might not exist at the first time of HPV16 detection and probably develop later during progression of the infection.
Mirabello et al demonstrated that viral load did not affect methylation levels and that women older than the median age of 28 years tended to have higher methylation levels compared to women younger than 28 years. 20 We here analyzed samples of the young Danish cohort, which involves women 20-29 years of age at the time of enrollment. Therefore, we cannot make any statement whether methylation increases with age. At this time point, it is unclear whether the patterns of HPV16 methylation we observed would be the same for older women. Another study used laser micro dissection to capture different layers of the epithelium within lesions of HPV16-positive patients. Investigations of the methylation status of the HPV16 URR revealed dynamic changes in the methylation status in the context of the viral life cycle, demonstrating that the methylation status of the HPV16 genome is highly dynamic upon differentiation of the host cell. 13 Furthermore, the authors showed that neoplastic transformation was associated with methylation of two distinct CpG sites within the distal E2 binding site 1 (E2BS1) leading to URR hyperactivation. Other recent studies revealed that hypermethylation of E2BS3 and E2BS4 together with high viral load was associated with a worse cancer-specific survival rate in vulvar but not in vaginal carcinoma. 21 Taken together, a positive correlation of methylation and viral load might determine the outcome of a HPV infection.
One limitation of this study is the number of missing sequences particularly within the L1 region. This lack of data can only be compensated by assessing our data in a validation cohort. This, however, would require a large prospective patient cohort study with an extended follow-up period, which is out of scope of this study.
In summary, results from other prospective studies are not fully consistent with the CpG methylation patterns we identified here. Differences might stem from variations in the time points the samples were taken at, the methods applied for methylation analysis and the CpG positions that were actually analyzed. Sampling differences might also play a role in HPV DNA methylation analyses, since cervical swabs often contain diverse cell types and material from multiple lesions as compared to biopsy material. Another difference is that comparable studies used pyrosequencing for analysis. However, a recent report that compared next-generation-and pyrosequencing suggests that the results are similar but not identical with generally higher methylation levels identified by next-generation sequencing. 50 With this prospective study, we observed that methylation at the CpG site at nt 5962 located within the L1 ORF might be predictive for a future persistent HPV infection. Furthermore, we found modest correlations between the methylation of specific CpGs and a high viral load ( Table 2). In summary, we demonstrate the potential of using a specific HPV DNA methylation site as biomarker for the determination of a persistent HPV16 infection as risk factor for the development of cervical disease.

CONFLICT OF INTEREST
CM has received support for conference participation and speaker's fees from Sanofi Pasteur MSD. SKK has received speaker's and advisory board fees and research grants through her institution from Sanofi Pasteur MSD and Merck. TI received speaker honoraria from Hologic GmbH, Becton Dickinson Diagnostics GmbH and Sanofi Pasteur MSD; and his host institution (UKT Tuebingen) received an unconditional research grant from Hologic GmbH and Becton Dickinson Diagnostics GmbH.
T A B L E 2 Correlation of viral load with methylation of the viral genome. The methylation for each CpG was correlated with viral load using a Spearman nonparametric correlation (two-tailed). Only those positions that resulted in a significant correlation (P < .05) are shown