An epigenomic landscape of cervical intraepithelial neoplasia and cervical cancer using single‐base resolution methylome and hydroxymethylome

Abstract Background Cervical cancer (CC) is the second leading cause of cancer death among women worldwide. Epigenetic regulation of gene expression through DNA methylation and hydroxymethylation plays a pivotal role during tumorigenesis. In this study, to analyze the epigenomic landscape and identify potential biomarkers for CCs, we selected a series of samples from normal to cervical intra‐epithelial neoplasia (CINs) to CCs and performed an integrative analysis of whole‐genome bisulfite sequencing (WGBS‐seq), oxidative WGBS, RNA‐seq, and external histone modifications profiling data. Results In the development and progression of CC, there were genome‐wide hypo‐methylation and hypo‐hydroxymethylation, accompanied by local hyper‐methylation and hyper‐hydroxymethylation. Hydroxymethylation prefers to distribute in the CpG islands and CpG shores, as displayed a trend of gradual decline from health to CIN2, while a trend of increase from CIN3 to CC. The differentially methylated and hydroxymethylated region‐associated genes both enriched in Hippo and other cancer‐related signaling pathways that drive cervical carcinogenesis. Furthermore, we identified eight novel differentially methylated/hydroxymethylated‐associated genes (DES, MAL, MTIF2, PIP5K1A, RPS6KA6, ANGEL2, MPP, and PAPSS2) significantly correlated with the overall survival of CC. In addition, no any correlation was observed between methylation or hydroxymethylation levels and somatic copy number variations in CINs and CCs. Conclusion Our current study systematically delineates the map of methylome and hydroxymethylome from CINs to CC, and some differentially methylated/hydroxymethylated‐associated genes can be used as the potential epigenetic biomarkers in CC prognosis.

both enriched in Hippo and other cancer-related signaling pathways that drive cervical carcinogenesis. Furthermore, we identified eight novel differentially methylated/hydroxymethylated-associated genes (DES, MAL, MTIF2, PIP5K1A, RPS6KA6, ANGEL2, MPP, and PAPSS2) significantly correlated with the overall survival of CC. In addition, no any correlation was observed between methylation or hydroxymethylation levels and somatic copy number variations in CINs and CCs.
Conclusion: Our current study systematically delineates the map of methylome and hydroxymethylome from CINs to CC, and some differentially methylated/hydroxymethylated-associated genes can be used as the potential epigenetic biomarkers in CC prognosis.

K E Y W O R D S
cervical cancer, cervical intraepithelial neoplasia, DhMR, DMR, hydroxymethylation, methylation

BACKGROUND
Cervical cancer (CC) is one of the most common gynecological tumors that become a health threat to women. There are more than half a million new cases and more than 300,000 deaths worldwide each year. 1 China is one of the countries with a high incidence of CC due to the lack of a large-scale standardized regular screening and the low usage of human papilloma virus (HPV) vaccine. 2 Although, the 5-year survival rate of CC patients detected at an early stage is more than 90%, the survival rate decreases dramatically for advanced CC and metastatic CC. 3 Therefore, it is urgent need to further study the pathogenesis of CC and to find potential biomarkers to research and develop novel diagnostic techniques and treatments for advanced CC and metastatic CC. It is reported that aberrant epigenetic modifications, including primarily cytosine methylation (5mC) and hydroxymethylation (5hmC) can lead to inappropriate activation/suppression of genes and then drive the tumorigenisis. 4,5 While DNA 5mC has been implicated in numerous biological processes, and aberrant DNA methylation patterns are considered as a hallmark of cancer, the biological significance of DNA 5hmC in cancer remains elusive. Recent studies have shown global loss of 5hmC in a variety of human solid tumors (breast, colon, gastric, liver, lung, melanoma, and prostate cancer) compared with the normal surrounding matched tissue demonstrated by immunohistological chemistry and dot blot assays. 6,7 The depletion of DNA 5hmC in cancer may also be a potential biomarker for early detection and prognosis of distinct cancers.
In CC, one of the fundamental processes driving the initiation and progression of CC is the accumulation of genetic and epigenetic alterations in cervix epithelial cells. A large amount of evidence showed that the methylation of the host tumor suppressor gene promoter region can cause dysregulation of many genes which in turn leads to cervical tumorigenesis. 8 Verlaat et al reported that cancerlike methylation patterns in CC could be detected early in cervical intra-epithelial neoplasia (CIN) using 12 hostcell DNA methylated genes. 9 Furthermore, improvements in the methods used for detecting DNA methylation and hydroxymethylation have accelerated our understanding of epigenetic variants in cancers. Wang et al drew the whole 5mC and 5hmC profile in CC and compared to cervicitis tissues by semi-quantitative methylation analysis. 10 However, these studies did not draw a full epigenomic map that can differentiate 5mC from 5hmC in the entire pathogenesis from normal to CIN to CC.
In addition, multi-omics integrated analysis can better illuminate the mechanism of the occurrence and development of carcinoma. Integrated analysis of methylation and gene expression for diagnosis, treatment, or prognosis has been reported in breast cancer, lung cancer, thyroid cancer, and hepatocellular carcinoma, head and neck squamous cell carcinoma, and polycystic ovary syndrome. [11][12][13][14][15][16] Although CC screening programs are now carried out worldwide, many women are still diagnosed with advanced CC which the overall prognosis remains poor.
To date, only a limited number of reports about biomarkers were identified in CC by multi-omics integrated analysis.
In this study, we systematically analyzed the landscape diagram of methylation and hydroxymethylation from normal to CIN to CC at single-base resolution using wholegenome bisulfite sequencing (WGBS-seq) and oxidative WGBS (oxWGBS-seq) methods. In addition, we also performed an integrative analysis of WGBS-seq, oxWGBS-seq, RNA-seq data and histone modifications profiling data from TCGA to identify CC-specific potential epigenomic biomarkers.

A decreasing trend of global 5mC and 5hmC levels from healthy groups to CCs
We performed WGBS-seq and oxWGBS-seq on 16 genomic DNA extracted from two health cervical (healthy) tissues, six CIN tissues (CIN1, CIN2, CIN3, n = 2 for each stage), and four paired CC tissues, and adjacent paracancer tissues ( Figures S1 and S2). The adequate and enough DNA with good quality were sequenced to a depth of 10.48-fold and 10.44-fold for WGBS-seq and oxWGBS-seq with average cytosine coverage of 96.03% and 95.69%, respectively (Table S2). We observed both high bisulfite (unmethylated cytosine to uracil, 99.92%) and high oxidative bisulfite conversion rates (5-hydroxymethylcytosine to uracil, 96.57%; Table S2). In total, 4.8-14.8 million CpG sites could be used to evaluate the hydroxymethylation variation at baseresolution across all samples. Thus, we created genomewide, single-base-resolution 5mC and 5hmC maps of the healthy groups, CINs, and CCs, and we drew a map of 5mC and 5hmC modification profiling in CpG contexts ( Figure S3).
We then compared the global levels of 5mC and 5hmC in healthy groups, CINs and CCs according to beta values derived from WGBS-seq and oxWGBS-seq libraries (see methods for details, the level was defined as the ratio of methylated or hydroxymethylated cytosines to the total number of cytosines). On the genomic scale, the average global 5mC levels at CpG sites were 73.67 (56.20%-80.97%), and the average global 5hmC levels at the CpG sites were 2.39% (0.75%-7.23%) among 16 samples. Both 5mC and 5hmC levels showed a decreasing trend from healthy groups (72.45% and 3.31%) to CCs (68.66% and 0.94%), even if the level of 5mC and 5hmC in the adjacent para-cancer tissues tended to be higher than that in the cancer tissues ( Figure 1A). The average 5mC levels were 76.58%, 78.57%, 79.84%, and 72.13% for healthy, CIN1, CIN2, and CIN3, respectively. While the average 5hmC levels were 1.83%, 1.141%, 0.82%, and 3.62% for healthy, CIN1, CIN2, and CIN3, respectively.
In addition, we also analyzed the distribution frequency of methylation, hydroxymethylation, and unmethylation at modified CpG site. The 5hmC level was 43-fold lower than the 5mC level across all the patients ( Figure 1B). We found that there were 6.33%-24.63% of 5hmC modifications in all of 5mC modificated sits in all samples, while about 10% of 5hmC modifications in healthy cervix (Figure 1C, blue marked). Interestingly, we found that 100% 5hmC sites were also simultaneously modified by 5mC ( Figure 1C, yellow marked). Our results provide valuable data in support of the reports that upstream modification of hydroxymethylation is methylation. 17,18

Distribution of 5mC and 5hmC content in human genomic regions
We then analyzed the distribution of 5mC and 5hmC in genomic regions in each group. Our results showed that 5mC sites and 5hmC sites distributed across the human genome ( Figure 2A). To explore whether the methylation levels in a strand-dependent manner, the methylation levels were interrogated by both Watson and Crick strand. The methylation level in both strands was plotted for each sample, and we did not observe any differences of methylation between the Watson and Crick strands for each sample (typical result was shown in Figure S4A). Intriguingly, from the kernel density of mC/hmC levels results, the ratio of fully-methylated sites in the CC group was 47% lower than groups in the healthy and CINs. However, no differences were observed on the ratio of fullyhydroxymethylated sites among the healthy, CINs, and CC stages ( Figure S4B).
Consistent with the global methylation levels, the 5mC and 5hmC levels exhibited a similar trend in different genomic regions and all chromosome in CINs and CC. However, we observed different methylation patterns at locally functional regions including the exon and intron regions, promoter, and intergenic regions ( Figure 2A).
The DMR and DhMR were preferably enriched in CpG island (CpGi) and CpG shore (flanking ± 1 kbp of CpG island) (hypergeometric test, p < 0.05; Figure 2B). Remarkably, DMR of CINs and CC significantly enriched in exons (hypergeometric test, p < 0.05; Figure 2B), compared to the expected. DMR in CIN3 co-localized in enhancers, flanking 1-5 kb of genes, exon-intron boundaries and introns, while methylation in gene promoter was only significantly found in CC (hypergeometric test, p < 0.05; Figure 2B), suggesting the essential role of promoter methylation in CC initiation.
The number of DMR increased in CIN3 and CC stage, compared with CIN1 and CIN2, suggesting that the increased methylation changes in CC development. The number of DhMR increased dramatically in the CIN3 stage and decreased in CC stage ( Figure 3A). Next, we also compared the distribution of DMRs and DhMRs among CC at the chromosomal level. The results suggested that CIN3 was mainly hypomethylated, while CC was mainly hypermethylated. Consistent with these results, high hydroxymethylation occurred in CIN3, and a low level of hydroxymethylation occurred in CC ( Figure 3B). It is suggested that CIN3 displayed the most drastic demethylation in the process of CC.

Functionally associated genes based on DMRs and DhMRs
We then annotated the genes associated with DMR and DhMR. Here, we called them DMRs-associated genes (DAGs) or DhMRs-associated genes (DhAGs). Considering the promoter methylation mainly contributes to the gene expression, we focused on the promoter-linked DAGs and DhAGs, we found several genes (such as NFIX, CDH4, PDE4D, PITX2, and G6PD) were closely related to cervical carcinogenesis ( Figure 4A left). These hypermethylation genes could play a role in the CC progression. Similarly, we also found two genes (PLSCR4, CUL4B) in hydroxymethylation (Figure 4A right). Of these genes, NFIX and PITX2 might be used as early detection and prognostic markers for breast cancer. 19,20 CDH4 and PDE4D can be used as early diagnostic markers of gastrointestinal tumor and prostate cancer, respectively. 21,22 Together, these findings provided further support for an important role of epigenesis in CC tumorigenesis and progression.
To gain insight into the potential biological function of methylation and hydroxymethylation, the DAGs and DhAGs were enriched via GO and KEGG. GO enrichment results showed that CINs and CC-related biological processes including DNA−binding transcription activator activity and cell adhesion molecule binding ( Figure 4B). The KEGG enrichment result showed that these potentially methylated genes were significantly enriched in Hippo, cAMP, Adherens junction, Axon guidance, and Neuroactive ligand-receptor interaction (all p < 0.05); hydroxymethylated genes were also significantly clustered in Hippo, cAMP, Rap1, ErbB, and MAPK signaling pathways ( Figure 4C) (all p < 0.05).

The relationship between methylation and hydroxymethylation level and gene expression in DAGs and DhAGs
To determine the association between the methylation/hydroxymethylation level and gene expression, we downloaded the level 3 methylation data and the paired RNAseq data of CC deposited in TCGA database. Then, we analyzed the expression level of DMRs/DhMRs-associated genes (DAGs) from TCGA, of which these DAGs were overlapped to our results (Tables S3 and S4). And, we found that 67.61 % and 81.27 % for promoter and gene bodies of DMR/DhMR-associated genes that identified in our data, were epigenetically silenced in TCGA through the cor-relation between methylation and expression. 23 Typical results showed that gene expression was negatively correlated with methylation ( Figure 5A). As an example, the MAL expression showed a significant negative correlation with its promoter methylation (Spearman rho = −0.42, p = 1.6e −14). Besides, we also analyzed methylation and hydroxymethylation levels in promoters and exons in each stage of cervical lesions ( Figure S5A). Higher levels of methylation were observed in upregulated gene expression at the promoter boundary. These observations were consistent with previous studies. 24 Similarly, a positive trend was observed between methylation and gene expression in exon boundaries (±150 bp), especially in cancer and matched adjacent paracancer tissues. However, for hydroxymethylation, similar phenomenon was not found in any of these groups ( Figure S5B).
To further understand the clinical relevance of DAGs/DhAGs in CC, the same strategy was performed as the above. We firstly identified DMRs/DhMRs-associated genes (DAGs) from TCGA, of which these DAGs were overlapped to our results (Tables S3 and S4). And, then we investigated the association between the methylation or expression of these DAGs/DhAGs and overall survival. Both methylation and expression showed statistically positive correlation with the overall survival rate of MTIF2, PIP5K1A, and RPS6KA6 in CC, whereas those of DES and MAL exhibited negative association with overall survival (all log-rank p < 0.05 Figures 5B  and 5C). Besides, we used the same analysis method for DAGs/DhAGs of CIN3. Interestingly, three genes (ANGEL2, MPP1, and PAPSS2) were also identified with statistically significant differences in the overall survival for CC. Therefore, methylation of these eight genes could be used as potential biomarkers for predicting prognosis.

Integrated analysis of methylation/ hydroxymethylation and genomic variations
To investigate the relationship among methylation/hydroxymethylation levels and copy number variations, we selected CNV data using both the bisulfite sequencing data and the whole exome sequencing of CC and CINs in our previous study. 25 We compared the locations of CNV regions and methylation regions unmatched data. The SCNV gain or loss region did not consistently overlap with DMR regions across the genome ( Figures 6A  and 6B). And an example showed the inconsistency of the location of CNV and DMR/DhMR in chromosome 2 ( Figure 6C).
To explore the association between CNV and methylation/hydroxymethylation, the genome was randomly CNVvsmC and CNVvshmC) in all the matched data. And, the correlation coefficiency and p values were inconsistent across samples at specific stage ( Figure S6A).
In terms of the degree of change, the methylation/hydroxymethylation occurred earlier and much more than that of CNV ( Figures 6A and 6B, and S6B), which would be more suitable for a marker for early warning of CC. In addition, the mutation types in cytosine sites were measured according to methylated status using WGBSseq and oxWGBSseq data. Hydroxymethylated cytosines were highly enriched in C > A and C > G transversions (all p < 0.001), while C > A mutations were also observed at methylated cytosines (p < 0.01; Figure S7).

Integrated analysis of methylation and hydroxymethylation with histone modifications peaks
Finally, we associated the methylation and hydroxymethylation with histone modification. The methylation and hydroxymethylation levels were mapped to signals of ChIP-Seq dataset of six epigenetic marks in the normal human cervix from the ENCODE depository.
Promoter-and enhancer-associated histone binding peaks (peaks ± 5 kb, ranging ±15 kb) were associated with DNA methylation. H3K4me3 was depleted both in DNA methylation and hydroxymethylation. H3K27ac and H3K4me1 were depleted in methylation, while enriched in hydroxymethylation ( Figure 7A). Especially, methylation levels in promoter-associated H3K4me3 and enhancerassociated marks (H3K27ac and H3K4me1) were significantly lower in CIN3 and adjacent paracancer tissues than in tissues of healthy and CINs. H3K27me3, a poly-comb-associated mark, 27 was only depleted in methylation. No discrepancy was observed in methylation and hydroxymethylation at H3K36me3 peaks, which is a dsDNA repair-marks, 28 among all cervix tissues. The association of DNA methylation and hydroxymethylation as well as fourrelated histone modification localized at enhancer region, the EPS15 gene was shown as an example in both CC and paracancer tissues ( Figure 7B). Our analysis suggested that methylation and hydroxymethylation were an epigenetic mark in cervical tissues. And the hydroxymethylation resisted methylation in DNA modification at epigenetic binding sites. An enrichment of hydroxymethylation in promoter and enhancer elements indicated its gene regulatory activity. 29

DISCUSSION
Epigenetic modification, especially DNA methylation and DNA hydroxymethylation, as an epigenetic hallmark has been proved to play an essential role in cancer. Thus far, studies have reported that whole-genome methylation and hydroxymethylation profiling for several cancers, profiling for CC remains limited. In this study, we have performed a base-level analysis of DNA methylation and hydroxymethylation across the whole genome in healthy, CINs, and CC with matched adjacent paracancer tissues, which revealed novel insights regarding the role of epigenetic modification contributes to the cervical carcinogenesis and its prognosis. The global methylation level of tumor tissues and the matched adjacent paracancer tissues are the same, but tend toward be higher than CINs and healthy; the global hydroxymethylation level of adjacent paracancer tissues tends toward to be higher than tumor tissues ( Figure 2A). The results showed that although the morphology of the adjacent paracancer tissues had not changed, the epigenetics has changed on a large scale. And these observations are consistent with most of the previous work about cancer epigenetics. 30,31 Global hypo-methylation and hypo-hydroxymethylation have been observed in CC, especially, the number of DMRs and DhMRs increased in CC, compared with the healthy. The alteration of methylation and hydroxymethylation occurred in CpG islands and CpG shores in CINs and CC tissues. And the exon methylation was observed in CINs and CCs ( Figure 2B). There results indicated that the local Locally hyper-methylation and hyper-hydroxymethylation have been identified in the CINs and CC. When CC compared to normal tissue, high hydroxymethylation occurred in CIN3 and low hydroxymethylation occurred in CC which suggested drastic transformation have taken place in the process of CC development ( Figure 3A). A large number of hyper-DMRs and hypo-DhMRs occurred in CC, and amount of hypo-DMRs and hypo-DhMRs occurred in CIN3 ( Figure 3B). The change of DMR and DhMR from CINs to CC exhibited a non-linear pattern, rather than linear fashion.
Methylation and hydroxymethylation in lesions at CINs may be a key checkpoint for the fate of cervical cells. We found that methylation modification levels in stage CIN3 were distinct from CIN1-2 stages with aberrant decreased methylation and sharply increased hydroxymethylation. Additionally, the differential methylation regions in CIN3 specifically co-localized in enhancers. It used to be thought that methylation/hydroxymethylation level of CIN progressed through these pre-cancerous stages toward cancer in a linear fashion, 32 except report of the disease progression in CC. 33 Coincidently, the researchers use this model on the methylation status of the healthy individual, CIN1, CIN2+ and CC patients, and observed the non-linear and bi-modal methylation dynamic with maximum methylation in CIN2+ stage. Our study provides a more precise transition stage in the methylation dynamic process from CINs to CCs. Moreover, our results showed that a higher deviation of hydroxymethylation was observed than that of methylation in this process. Inspired by the previous studies, 34 we suspected that CIN3 is a critical stage that the methylation/hydroxymethylation status is reversible in this stage and could determine the clinical outcome of CC. Moreover, the hydroxymethylation level might be more sensitive than methylation in this process due to its high deviation.
Notably, novel methylated genes were identified in CC. Typically, cancer driver genes NFIX and CDH4 may act as methylation biomarkers for early detection for breast cancer and gastrointestinal tumorigenesis. 19,21 PITX2 acts as an oncogene in multi-cancers, while no reports about the role of methylation in other cancers as well as CC. As a whole, these novel methylated genes provided new insights for a better understanding of the progress of CC and may serve as useful biomarkers for the accurate management of CC.
GO and KEGG pathway analysis shows that the DAGs/DhAGs were significantly enriched in Hippo signaling pathways and human papillomavirus infection pathways. Hippo hijacks EGFR and HPV E6 oncoprotein to promote CC, 58 and its downstream YAP and TAZ proteins play a role in cervical tumor-infiltrated cell activation. 59 Previously study finds that mutation, amplification, and deletions of Hippo in hypermethylated CC subgroup. 60 We provided additional evidences that methylated genes in Hippo pathway occurred in CC, even at CINs stages. In addition, this finding is not similar to previous studies which have suggested that the cell cycle was a key biological process and a critical driver in CC by bioinformatics analysis. 61 To some extent, our finding shows the advantage of finding the CC signal pathway by WGBS-seq and oxWGBS-seq.
Gene expression was controlled by multiple aspects including DNA methylation, DNA hydroxymethylation, histone modifications, thus, the correlation between DNA methylation, and gene expression might be overestimated when all genes were involved in the correlation analysis. It can thus be suggested that the alteration in gene expression and changes in methylation might be associated with the outcome of CC. Using TCGA as an external validation, we identify the epigenetic changes that may play a key role in the prognosis of CC. We finally identified eight DMR/DhMR-related genes (DES, MAL, MTIF2, PIP5K1A, RPS6KA6, ANGEL2, MPP, and PAPSS2) that showed significant associations between overall survival and the identified DNA differential methylationrelated genes among the CC patients from TCGA. All the eight DMR/DhMR-related genes were novel prognostic genes at methylation level in CC. As a tumor suppressor gene, the promoter methylation of MAL has been found in CINs and CCs. 55 Hypermethylation of MAL is correlated with its downregulation of gene expression in esophageal adenocarcinomas. 62 PIP5K1α and PIP5K1A have been reported to be involved in carcinogenesis. 63,64 RPS6KA6 (RSK4) is a putative tumor suppressor gene. 65,66 The RSK4 gene has also been reported as oncogenic gene in several cancers. 67 Overexpression of RSK4 positively correlates with poor prognosis in RCC and ESCC. 67,68 Notably, the roles of DES and MTIF2 at genetic or epigenetic levels are unclear in human cancers.
The three prognostic markers (PAPSS2, ANGEL2, and MPP1) that hyper-hydroxymethylated in CIN3 also play important roles in cancers. PAPSS2 is critical for breast cancer cell migration and metastasis. 69 Abnormal ANGEL2 expression will affect the efficiency of pre-tRNA processing. 70 MPP1 may be a new protein interaction target for therapy against tumors. 71 The epigenetic modification of these genes needs to be further investigation in cancer biology and clinical implication.
The genetic and epigenetic variations co-contribute to cervical carcinogenesis. Herein, we found that C > A and C > G mutations in hydroxymethylated sites were higher than non-hydroxymethylated cytosines ( Figure S7). These observations demonstrated an association between DNA mutation and modification, and these associations was also reported in human genome of cell lines, brain, kidney, and myeloid cells. 72,73 However, we found that methylation/hydroxymethylation and CNV were independent variables in our cohort, in line with previous results that methylation levels unrelated with CNV changes. 74 In addition, we predicted the performance of methylation and hydroxymethylation are better than that of CNV in the diagnosis for cervical carcinogenesis. This could be due to epigenetic alterations that have been indicated to occur much earlier than genetic alterations in CC. 75

CONCLUSION
In this study, we performed the genome-wide profiles of DNA methylation and hydroxymethylation at single-base resolution in the healthy, CINs lesions, CC. We found several genes that provide further support for the importance of epigenetics in tumorigenesis and progress. Through multi-omics analysis, 5mC, and 5hmC might be irrelevant to CNV in cervical carcinogenesis. Finally, we identified eight novel prognosis-associated genes and may serve as novel targets for CC treatment.

Sample collection
For physical examination volunteers and patients suffering from CINs and CCs, the written informed consent was obtained from each person before enrollment. The cervical exfoliated cells were sampled using cervical brushes for each enrolled patient, which were used to observe the morphological characteristics based on the liquid-based cytology (Becton Dickinson Company, New Jersey, USA). In Shenzhen People's Hospital, cervical liquid-based cytological test was used as a routine screening test for CC. Hence, those patients with cytological abnormality were recommended to accept colposcopy examination. And then the colposcopy findings were used to determine if a biopsy is necessary. The diagnoses of CIN1, CIN2, CIN3, or CC were determined and reviewed by two pathologists independently. 76 This study was approved by the Ethical Review Board of Shenzhen People's Hospital (SPH-2017022). A total of 16 samples including two healthy cervix tissues and six CINs and four CCs were enrolled in this study. The healthy cervix tissues and the paired adjacent paracancer tissues to CCs were used as the control. The biopsies from CINs and the surgically resected tumors and paired adjacent paracancer tissues were snap-frozen in liquid nitrogen and stored at −80 • C. All tissues were diagnosed by two independent pathologists using H&E staining. The purity of samples was above or equal to 70% according to pathological results, while no tumor content was observed in adjacent tumor tissues, CINs, and healthy cervix. All the CINs and cancer patients were HPV positive, while the healthy donors were HPV negative. A detailed description of clinical information was found in Table S1.

WGBS-seq and oxWGBS-seq library preparation
The WGBS and oxWGBS libraries were prepared followed by the instruction of TrueMethyl Seq Kit (TrueMethyl Seq Kit, CEGX, Cambridge Epigenetix Limited). One microgram genomic DNA and non-methylated lambda DNA were fragmented to 250 bp, then end-repair step was performed in the presence of phosphokinase and DNA polymerase. The control DNA was spiked in to assess oxidative conversion rate. Then, repaired DNA was adding tailing A and adaptorization. The adaptorized libraries were purified with magnetic beads, denatured, and oxidized with or without KRuO4 to generate oxBS and BS libraries. After oxidation, the DNA was converted by sodium bisulfite and amplified with methylated primers. Then the digestion control was amplified and qualified with TaqαI enzymes. Finally, the libraries were amplified, qualified with bioanalyzer, and sequenced by Illumina Xten with PE150.

Methylome and hydroxymethylome bioinformatical analysis
Raw bisulfite sequencing data were filtered by cutadapt and trimgalore. Then clean data were mapped to reference hg19 using BSMAP software. 77 The methylation levels at a single cytosine site represented as the beta value (the ratio of methylated cytosine count to total number of cytosine counts). The hydroxymethylation levels were obtained by mlml algorithm. 78 The DMR and DhMR were analyzed by metilene, 79 using circular binary segmentation with the following parameters and other default parameters. DMR was identified as differential methylated region with methylation levels > 0.2 and CpG coverage ≥ 5, 80,81 2D-KS and Mann-Whitney U test p < 0.001, and corrected by Benjamini-Hochberg method. DhMR was defined as differential hydroxymethylated region with levels > 0.2, CpG coverage ≥ 5, mininum CpG ≥ 5 per region, valley filter = 0.05, 2D-KS and Mann-Whitney U test p < 0.001, and corrected by Benjamini-Hochberg method. 79 The number of DMR/DhMR varied dependent on the parameter thresholds; the number and landscape of DhMR with different parameters of two algorithms were shown in Figure S8 and Table S5.

Transcriptome data and correlation analysis
The RNA-seq data of paired CC samples derived from our previously reported work was used to obtain the results in Supplementary Figure S5 25 And, the RNA-seq data were classified into high and low expression using absolute fold change of 2 as cutoff, while genes with no changes were defined as "no" change group. We calculated the methylation levels of all the genes localized around TSS or exons in the healthy, CINs, CC, and matched paracancer tissues.
For RNAseq from TCGA (n = 307), the RSEM value for each DAGs/DhAGs weas selected and correlated with the average methylation levels within DMR/DhMR for each gene with promoter methylation using Spearman coefficiency method.

SNV analysis using bisulfite sequencing data
The SNVs from WGBSseq and oxWGBSseq were obtained by Bis-SNP using bisulfite sequencing data. 82 Briefly, all the possible genotypes for each SNP were modeled using dbSNP data. Then the bisulfite data for each cytosine were used to establish Bayesian model using prior bisulfite conversion rate and probabilities of methylation level. Finally, the likelihood probability of SNP frequency was measured at the same loci by Bayesian interfere. We chose the SNVs that were identified both in bisulfite libraries and oxidative bisulfite libraries in the same sample to ensure the reliability. The overlapped SNVs were further filtered by 1000G, Esp6500, dbSNP150, Gnomad_exome, and exac03 datasets. Then the SNVs sites were counted by transversion/transition type, classified according to methylation status at cytosine sites, and viewed by ggplot2 package.

CNV analysis and correlation analysis
Copy number variation was called by Aberration Detection in Tumour Exome algorithm using whole-exome sequencing data. 83 The cutoff of copy number gain and loss were defined as absolute threshold ≥ 0.2. The copy number alterations were viewed through copynumber R package.
Copy number variation of bisulfite sequencing data was called by ReadDepth for single samples. 84 Calling was detected by circular binary segmentation using the default parameters of software. The optimized bin was used as 33 kb for each samples. CNV gain and loss threshold was set as copy number ≥ 2.68 and copy number ≤ 1.38, respectively. Using the z score of log2 (copy number ratio) and methylation variation as follows, 26 the correlation of pairwise CNV and methylation was measured for all samples using bisulfite sequencing data.
Whereas, M represent log2 of beta values. SD case means the standard deviation of case sample. And logR depicts the log2 of read density.

Geneset enrichment analysis
The DMR/DhMR-associated genes were annotated by GenomicRanges and genomation package. GO and KEGG pathway enrichment for DAGs/DhAGs of each disease stage was performed by clusterProfiler. 85

Enrichment analysis
To understand the enriched functional region that DMR/DhMR preferred to occur in the genome, we randomly generate the same size of DMR/DhMR region as the expected regions. Then we counted the number of DMR/DhMR with or without specific functional regions (intergenic, CpG island, CpG shore, CpG shelves, FAN-TOM5_enhancer, gene bodies, 3′UTR, 5′UTR, intron, exon, flanking 1-5 kb of genes, boundary, and promoters) using annotatr package. Then the numbers of observed DMR/DhMR was compared with that of expected regions using hypergeometric test. The enrichment fold was calculated by the ratio of each functional region to the randomly simulated region.

Statistical analysis
Methylation levels were measured as the ratio of methylated cytosine to total cytosines at a single site. For comparison, the functional regions were divided into 20 or 10 bins. And the average methylation levels at functional regions including promoter, exons, introns, CpGi, CpG shores (URL:https://genome.ucsc.edu/cgi-bin/hgTables), and enhancers were measured using genomation and GenomicRanges packages (R v3.6.1). 86 The functional regions were annotated by annotatr R package. 87 The average methylation level for each gene in DMR/DhMR was calculated using TCGA level3 methylation data (n = 307). The methylation level or gene expression was classified into a high and low group