Stratification of HPV‐associated and HPV‐negative oropharyngeal squamous cell carcinomas based on DNA methylation epigenotypes

While the incidence of oropharyngeal squamous cell carcinoma (OPSCC) has been increasing in these two decades, primarily due to human papillomavirus (HPV), stratification of OPSCC into molecular subgroups showing different clinicopathological features has not been fully investigated. We performed DNA methylome analysis using Infinium 450k for 170 OPSCC cases, including 89 cases in our cohort and 81 cases reported by The Cancer Genome Atlas, together with targeted exon sequencing analysis. We stratified OPSCC by hierarchical clustering analysis using methylome data. Methylation levels of classifier markers were validated quantitatively using pyrosequencing, and area under the curve (AUC) values of receiver operating characteristics (ROC) curves were calculated. OPSCC was stratified into four epigenotypes: HPV(+) high‐methylation (OP1), HPV(+) intermediate‐methylation (OP2), HPV(−) intermediate‐methylation (OP3) and HPV(−) low‐methylation (OP4). Ten methylation marker genes were generated: five to classify HPV(+) cases into OP1 and OP2, and five to classify HPV(−) cases into OP3 and OP4. AUC values of ROC curves were 0.969 and 0.952 for the two marker panels, respectively. While significantly higher TP53 mutation and CCND1 copy number gains were observed in HPV(−) than in HPV(+) groups (p < 0.01), no significant difference of genomic aberrations was observed between OP1 and OP2, or OP3 and OP4. The four epigenotypes showed significantly different prognosis (p = 0.0006), distinguishing the most favorable OPSCC subgroup (OP1) among generally favorable HPV(+) cases, and the most unfavorable OPSCC subgroup (OP3) among generally unfavorable HPV(−) cases. HPV(+) and HPV(−) OPSCC are further divided into distinct DNA methylation epigenotypes, showing significantly different prognosis.


Introduction
Head and neck squamous cell carcinoma (HNSCC) is the sixth most common cancer worldwide, with an annual incidence of 600,000 cases, accounting for 3-5% of all malignancies. Despite significant advancement in treatment strategy, HNSCC remains lethal with a 5-year survival rate of 40-50%. 1,2 HNSCC has shown a paradigmatic change in characteristics in the last two decades due to human papillomavirus (HPV) infection. Although the incidence of cervical cancer is decreasing due to improvement in screening systems and vaccination, 3 the incidence of oropharyngeal squamous cell carcinoma (OPSCC) has been on the rise over the last two decades, primarily due to the dramatic increase in the number of HPV-associated OPSCC cases 4,5 and the nonavailability of screening instruments for HPV-associated OPSCC, unlike cervical cancer. 6 HPV-associated OPSCC has distinct clinical and biological characteristics compared to HPV-negative cancer. 7,8 A metaanalysis confirmed that patients with HPV-associated OPSCC have a 28% lower risk of death (meta HR 0.72 [95% CI 0.5-1]) than HPV-negative OPSCC. 9 Several clinical trials on treatment de-escalation for HPV-associated OPSCC have attempted to maintain better overall survival (OS) by reducing the acute and late toxicity induced by treatment. 10 Recent reports, however, showed that approximately 20% HPV-associated OPSCC patients do not show good prognosis or good response to treatment, 11 and the molecular mechanism(s) underlying this intermediate-risk group remains elusive. This renders de-escalation of treatment strategy difficult, with the possibility of the existence of other significant molecular bases in HPV-associated OPSCC. Furthermore, the underlying molecular mechanisms of HPV-negative OPSCC, which exhibits a significantly worse prognosis than HPVassociated OPSCC, also need to be urgently elucidated. A proper understanding of molecular landscape of OPSCC is therefore required to address these issues.
Previous studies focusing on HPV-negative HNSCC cohorts revealed frequent mutations in several genes, such as TP53, CDKN2A and PIK3CA, as well as NOTCH pathway gene alterations using genomic and transcriptomic studies. [12][13][14][15] TP53 somatic mutation and CDKN2A genetic alterations are rare in HPV-associated OPSCC; instead, p53 and RB are inactivated by the HPV E6 and E7 oncoproteins, respectively. [15][16][17][18] In 2015, the Cancer Genome Atlas (TCGA) consortium performed a comprehensive genomic analysis of 279 HNSCC samples, 33 and 36 of which were OPSCC and HPV HNSCC samples, respectively. This analysis validated genetic alterations in HPV-negative HNSCC, such as TP53 and CDKN2A, and revealed distinct genetic alterations in HPV-associated OPSCC, such as PIK3CA, E2F1 and TRAF3. 19 The study, however, included only 33 OPSCC samples, which could not be precisely stratified.
DNA methylation, one of the critical epigenetic mechanisms silencing tumor suppressor genes in cancers, 20 has been used to analyze HNSCC samples in numerous studies, with the identification of promoter DNA methylations in several genes including CDKN2A (p16 INK4A ), DAPK, HOXA9, MGMT, GATA4 and RASSF1, which are important in HNSCC carcinogenesis. 21,22 Viral infection could induce aberrant DNA methylations during carcinogenesis, 23,24 and HPV-associated OPSCC tends to harbor more aberrant DNA methylations compared to HPV-negative OPSCC. 25,26 TCGA also performed a comprehensive DNA methylation analysis of 279 HNSCC samples, revealing that HPVassociated HNSCC harbored significantly higher methylation level than HPV-negative HNSCC, although this cohort included only 33 OPSCC cases. Simon et al. reanalyzed the data of 532 HNSCC samples uploaded by TCGA, including 81 OPSCC samples, and stratified HNSCC into five clusters: one HPV(+) cluster and four HPV(−) clusters. 27 Although one of the four HPV(−) cluster was named H3K36 cluster, characterized with H3K36Met alterations or NSD1 mutation, the clinicopathological features of the five clusters, such as distinct prognosis, were unclear. While only three OPSCC cases were included in H3K36 cluster, HPV-associated OPSCC or even HPV-associated HNSCC was not divided into further subtypes.
Taken together, we hypothesized that HPV-associated and HPV-negative OPSCC could be further classified into molecular subgroups with distinct risks, and comprehensive OPSCC stratification based on genetic and epigenetic statuses reflecting disease prognosis using larger cohort might be useful for de-escalation treatment strategies.
In our study, we performed DNA methylation analysis of 89 clinical OPSCC samples and five noncancerous samples on a genome-wide scale using Infinium HumanMethylation450 BeadChip arrays. This is the largest cohort study for the sake of OPSCC samples only. We performed hierarchical clustering analysis for a total of 170 OPSCC samples by combining 81 samples from TCGA, and categorized OPSCC into four DNA methylation epigenotypes, including two HPV(+) epigenotypes and two HPV(−) epigenotypes. We identified five classifier marker genes to divide HPV(+) OPSCC and another five to divide HPV(−) OPSCC, and validated these markers quantitatively by pyrosequencing. We also performed targeted exon sequencing analysis of 86 OPSCC samples and seven noncancerous samples to analyze mutation profiles of the identified subtypes. We found that DNA methylation status can successfully divide OPSCC into four molecular subgroups reflecting distinct prognosis, which could not be classified by genomic mutation analysis.

Study design
We performed DNA methylome analysis using Infinium 450k for 89 OPSCC cases and targeted exon sequencing for 86 OPSCC cases, and analyzed the data together with 81 cases reported by TCGA. We stratified OPSCC by hierarchical clustering analysis using methylome data. Methylation levels of classifier markers were validated quantitatively using pyrosequencing, and area under the curve (AUC) values of receiver operating characteristics (ROC) curves were calculated (Supporting Information Fig. S1).

Clinical samples and cells
Clinical specimens were collected from 96 patients with OPSCC who underwent therapy at Chiba University Hospital or Hamamatsu University Hospital. Fifty-nine OPSCC What's new? Infection with human papilloma virus (HPV) generally lowers risk of death for patients with oropharyngeal squamous cell carcinoma but 20% of patients do not "benefit" from HPV infection in this sense. Here the authors used DNA methylome analysis to further subdivide HPV+ and HPV-cancer patients, successfully recapitulating the distinct prognosis. They also generated classifier markers to accurately distinguish between subtypes based on epigenome analysis, but not genomic mutations, thus potentially aiding with clinical treatment decisions in the future.
samples were obtained at Chiba University, and 37 OPSCC samples were obtained at Hamamatsu University. Written informed consent was obtained from all patients. Noncancerous mucosa samples were obtained from OPSCC patients via biopsy from the side opposite to the cancerous side. These biopsy specimens were immediately frozen in liquid nitrogen and stored at −80 C. The cancer cell content of the frozen materials was determined to be ≤10, 20,30,40,50,60,70,80 or ≥90% through microscopic examination by two independent pathologists, and was enriched by dissection when necessary. Among 96 samples, 89 samples, of which the cancer cell content was determined ≥50% by both the two pathologists, were used for subsequent molecular analyses. The remaining seven samples were excluded because two contained no cancer cell, three were determined as ≤50% cancer cell content by both pathologists, and two were determined as ≤50% by one pathologist. DNA was extracted using the QIAquick DNA mini kit (Qiagen, Hilden, Germany). DNA of normal peripheral blood cells (PBC) was purchased from the Coriell Cell Repositories. The protocol was approved by the institutional review board at Chiba University and Hamamatsu University.
Immunohistochemistry IHC for p53 was performed using a DO-7 anti-mouse monoclonal antibody (anti-p53 DO-7 primary antibody, Roche, Basel, Switzerland) and the BenchMark ULTRA automated staining system (Roche). Samples with diffuse strong positive nuclear staining were considered p53-IHC(+) and designated as TP53-mutation(+), while those with negative nuclear staining were considered p53-IHC(−) and also designated as TP53mutation(+). Those with sporadic positive nuclear staining were considered p53-IHC wild-type and designated as TP53-mutation(−). For p16 INK4A , IHC was performed using an antimouse monoclonal antibody (CINtec p16 Histology, Roche) and BenchMark ULTRA, and samples with positive nuclear staining (nuclear and cytoplasmic immunoreactivity in >75% of tumor cells) were considered p16-IHC(+) and designated as p16(+).

Amplification of L1 DNA region in high-risk HPV
HPV infection status was evaluated by amplifying a portion of the L1 region in high-risk HPV using GP5 and GP6 primers, as previously reported. 28 Samples with the PCR amplicon were designated as HPV-L1(+).

Infinium assays
The Infinium HumanMethylation450 BeadChip (Illumina) contains approximately 485,000 individual CpG sites covering 99% RefSeq genes with an average of 17 CpG sites per gene. For each CpG site, β-value, ranging from 0.00 to 1.00, was measured using a methylated probe relative to the sum of both methylated and unmethylated probes. Bisulfite conversion was performed using the Zymo EZ DNA methylation kit (Zymo Research, Irvine, CA) with 500 ng of genomic DNA for each sample. Whole genome amplification, labeling, hybridization and scanning were performed according to the manufacturer's protocols. Infinium analysis was performed against 89 clinical OPSCC and five noncancerous mucosa samples.
For analyzing DNA methylation at promoter regions based on Infinium data, the probe nearest to the transcription start site (TSS) was selected when multiple probes were designed for one promoter region. The CpG score for each probe was calculated based on a previous report. 29,30 For analysis of CpG island shore regions, N_Shore and S_Shore, the probe nearest to TSS was selected among probes at the CpG island shore (annotated by Illumina). For analysis of gene body, the probe showing the highest CpG score was selected among probes at gene body, that is, +4 kb downstream from the TSS through the transcription termination site based on a previous report. 31

TCGA samples and their HPV status
Independent Infinium datasets of 81 OPSCC samples supplied by TCGA (https://gdac.broadinstitute.org/) were used for additional and combined analysis of DNA methylation status. Their HPV status was determined by either p16-IHC or in situ hybridization test, and adapted to our study.

Targeted exon-sequencing
Targeted exon-sequencing against 86 clinical OPSCC and seven noncancerous mucosa samples was performed using GeneRead DNAseq Targeted Panels V2 and the Human Comprehensive Cancer Panel (Qiagen), which targets 160 candidate driver genes, suits short amplicon design and enables detection of low-level mutations, reducing the amount of DNA required. For each of the somatic mutations identified using this analysis, variants with an allele frequency <0.1 and >0.9 were excluded. The known variants previously reported in the dbSNP (http://www.ncbi.nlm.nih.gov/SNP) were filtered out, and synonymous mutations were also excluded. Subsequently, we used MutationTaster classification tools that can predict the functional consequences of amino acid changes or frameshift mutation, and considered the observed mutations significant when MutationTaster evaluated "disease-causing automatically." Finally, we excluded the mutations that were also found in seven noncancerous samples.

Gene ontology analysis
Gene annotation enrichment analysis was conducted based on GO (biological process, cellular component and molecular function) using the Functional Annotation tool of Metascape (http://metascape.org/gp/index.html#/main/step1). 32

Pyrosequencing analysis
Methylation status in 77 OPSCC cases and two normal samples was quantitatively validated using pyrosequencing with PyroMark Q96 (Qiagen) as previously reported. 30 Primers were designed using the Pyro Q-CpG software (Qiagen) to  Table S1). Methylation control samples (0, 25, 50, 75 and 100%) were prepared as described previously 33 and used to confirm the high quantification ability of pyrosequencing assays (Supporting Information Fig. S2).

Statistical analysis
Association between clinicopathological factors and DNA methylation or mutation status was analyzed using the Fisher's exact test, χ 2 test and Student's t-test. Unsupervised two-way hierarchical clustering was performed using the Cluster 3.0 software. OS was measured from the date of registration to the date of OPSCC-related death, filtering out patients who were alive at last follow-up and non-OPSCC-related deaths. OS distribution was estimated using the Kaplan-Meier method. Differences between groups were determined using a log-rank test and Cox proportional hazard model using the R software (www.r-project.org/). Univariate and multivariate analysis for correlation of clinicopathological factors with prognosis was performed using the Cox proportional hazard model. p < 0.01 or <0.05 was considered statistically significant. False discovery rate was estimated by calculation of q-value by Benjamini-Hochberg method.

Data availability
Infinium data were submitted to the GEO database under the accession number GSE124633. Targeted exon-sequencing data were submitted to the Sequence Read Archive (SRA) database under the accession number: SRR8775783−SRR8775850.

Immunostaining analyses and HPV status
HPV status was evaluated using PCR amplification of the L1 region in high-risk HPV with GP5/GP6 primers (Fig. 1a). Expression of p16 INK4A and p53 in HPV-associated OPSCC and HPVnegative OPSCC was analyzed using IHC (Fig. 1b). Whereas 49 of 51 HPV-associated OPSCC samples subjected to IHC were p16(+), only one HPV-negative OPSCC sample was p16(+). Most of the p16(+) cases, except two, were TP53-mutation(−). HPV status and p16(+) status correlated significantly with TP53-mutation (−) (p < 1 × 10 −5 ), consistent with a previous study. 31 Hierarchical clustering analysis of Infinium 450k data We performed Infinium 450k BeadArray analysis on 89 clinical OPSCC and five noncancerous mucosa samples (Fig. 1c). According to the previously proposed classification, 29 Infinium probes were classified into three categories: high-CpG, intermediate-CpG and low-CpG probes, on the basis of CpG ratio and GC content within the 500-bp region (from −250 bp to +249 bp) around the probe site. In our study, we used highand intermediate-CpG probes, and the probe closest to the TSS was selected for each gene for promoter analysis; 2,404 probes with standard deviation in β-values >0.10 in 89 clinical OPSCC and five noncancerous mucosa samples were selected.
Unsupervised 2-way hierarchical clustering analysis of 89 OPSCC and five normal mucosa samples clearly classified these samples into three clusters: high methylation epigenotype (HME), intermediate methylation epigenotype (IME) and low methylation epigenotype (LME). OPSCC samples in HME were highly methylated compared to those in the other epigenotypes (p = 2 × 10 −67 ) and were all HPV-associated OPSCC (p < 1 × 10 −5 ; Fig. 1c). IME was the moderately methylated cluster including both HPV-associated and HPV-negative OPSCC samples. LME was the lowest methylation cluster, including the five noncancerous mucosa samples and mainly HPV-negative OPSCC samples. We also investigated CpG island shore regions (N_Shore: upstream from TSS and S_Shore: downstream from TSS) and gene body regions. All of these regions in each cluster were generally hypermethylated.
To validate these epigenotypes, we performed unsupervised two-way hierarchical clustering analysis of 81 OPSCC data uploaded by TCGA using the same 2,404 probes. OPSCC samples were clearly classified into three epigenotypes: HME, IME and LME (Fig. 1d). While almost all HME and IME samples were HPV-associated OPSCC, LME mainly included HPV-negative samples. Outlier samples mainly included HPV-negative OPSCC with intermediate level of methylation.

Stratification of OPSCC into four subtypes
For clearer stratification, we combined our methylation data of 89 OPSCC samples and TCGA's methylation data of 81 OPSCC samples (Fig. 2a). We selected 1,315 high-and intermediate-CpG probes with standard deviation of β-values >0.10 in the 170 clinical OPSCC and five noncancerous mucosa samples for this analysis. The 170 OPSCC cases were classified into mainly four subgroups: OP1, HPV(+) HME; OP2, HPV(+) IME; OP3, HPV(−) IME; OP4, HPV(−) LME (Fig. 2b). An additional cluster including all the normal samples was considered as "Others (Normal-like)". Upon increasing the number of samples, IME was stratified into HPV(+) IME and HPV(−) IME. The OP1 subgroup (HPV(+) HME) showed markedly higher methylation levels in promoter region than the other subgroups when a cluster of 486 genes was extracted from the 1,315 high-and  We also investigated methylation levels in CpG island shore (N_Shore and S_Shore) and gene body regions. All these regions in each epigenotype generally showed higher level of DNA methylation, and the tendencies of average β-values in each epigenotype were similar in promoter region, except for the gene body between HPV(+) IME and HPV(−) IME. In summary, methylation levels were altered most apparently in promoter region, and we focused on promoter region for subsequent analyses.
As for HPV(−) group, 81 genes were extracted as OP3 markers showing hypermethylation in OP3 but not in OP4, while 41 genes were extracted as OP3 + OP4 markers showing hypermethylation in both OP3 and OP4 (Fig. 3b). PRC-target genes in ES cells were not significantly frequent in OP3 markers (15%, 12/81; p = 0.1), but more frequently included in OP3 + OP4 markers significantly (24%, 10/41; p = 0.003; Fig. 3b). Marker genes specifically hypermethylated in a subgroup, such as OP1 and OP3, did not show enrichment of PRC-target genes in ES cells, while those hypermethylated commonly, such as OP1 + OP2 and OP3 + OP4, showed significant enrichment of PRC-target genes, as observed in other malignancies. 30 Given these results, the five most valuable genes were chosen from OP1 and OP3 markers as HPV(+) and HPV(−) classifier markers, respectively (Figs. 3 and 4). Five HPV(+) classifier markers were MYOD1, CDH11, PLCB4, FGF5 and PIK3AP1 (Fig. 4a). The accuracy of classification was evaluated by ROC analysis based on true and false positive rates using quantitative DNA methylation levels of the classifier markers (Fig. 4b). When OPSCC sample showing hypermethylation (β > 0.2) in ≥2 out of 5 HPV(+) classifier markers was determined as OP1, the calculated AUC values of our cohort was 1.00 for our cohort, and as high as 0.933 if TCGA samples and our cohort were combined. The methylation levels of the markers were then validated by quantitative methylation analysis using pyrosequencing (Fig. 4a). We analyzed the methylation levels of CpG sites at the selected Infinium probe site as well as its surrounding CpG sites, using OPSCC and noncancerous samples that were previously used for Infinium analysis. Methylation levels were confirmed to be highly similar, not only for the selected probe site, but also the surrounding CpG sites (Fig. 4b). When OPSCC sample showing methylation level >20% in ≥2 out of 5 HPV(+) classifier markers was determined as OP1 by using pyrosequencing data, the calculated AUC value was as high as 0.969.
We reclassified the 86 OPSCC samples into OP1 -OP4 epigenotypes using HPV status and DNA methylation information of classifier markers, and compared gene mutations, copy number variations and clinicopathological data of OPSCC samples (Supporting Information Fig. S4). The representative carcinogenic pathways, namely, those involved in p53 signaling, cell cycle and survival, RTK/RAS signaling, PI3K signaling, epigenome and differentiation, were compared. For the HPV(−) group, OP3 and OP4 showed significantly higher number of somatic mutations in TP53 and CCND1 copy number gains (p < 0.01). For carcinogenic pathways, genes associated with p53 signaling and cell cycle and survival showed a significantly higher number of somatic mutations and copy number variations (p < 0.01). For the HPV(+) group, p16(+) and TP53-mutation(−) that were determined by IHC, as well as subsite of primary tumors (tonsils and base of tongue) were significantly more associated with OP1 and OP2 than OP3 and OP4 (p < 0.01). Notably, OP1 included remarkably lower number of dead patients, while OP3 included remarkably higher number of dead patients (p < 0.01). Any molecular markers other than DNA methylation did not significantly distinguish OP1 and OP2, or OP3 and OP4 (Supporting Information Fig. S4).

Combined analysis of genomic alterations using our and TCGA's data
To further clarify the relationship among mutation status, clinicopathological factors and each epigenotype, we analyzed mutations and copy number variations by combining data of our 86 OPSCC samples and TCGA's 74 OPSCC samples (Fig. 5). In this combined analysis, mutations that were classified as "silent" were excluded. OP3 and OP4 showed significantly higher number of somatic mutations in TP53 and CCND1 copy number gains (p < 0.01). The incidence of somatic mutations and copy number variations in genes associated with p53 signaling and cell cycle and survival were also significantly high in OP3 and OP4 (p < 0.01). The subsites of primary tumors (tonsils and base of tongue) were significantly highly represented in the OP1 and OP2 (p < 0.01). OP3 included remarkably higher number of dead patients (p < 0.01), whereas OP1 included remarkably lower number of dead patients (p < 0.01). Any molecular markers other than DNA methylation did not significantly distinguish OP1 and OP2, or OP3 and OP4 (Fig. 5).

Comparison of epigenotype and mutation signatures
To compare epigenotypes with mutation signatures, we performed mutation signature analysis for a total of 34,988 SNVs in OPSCC samples. Mutation signature analysis did not divide OPSCC samples into any subtypes, showing highly similar mutation signatures among epigenotypes (Supporting Information Fig. S5a). All the epigenotypes had frequent C>A (35-36%) and C>T mutations (41-42%; Supporting Information Fig. S5b). OPSCC samples were highly enriched in Signature 1, which correlated with age of cancer diagnosis (Supporting Information Fig. S6).

Prognostic analysis of OPSCC samples with respect to HPV status and epigenotypes
We finally investigated the association of epigenotype and HPV status with prognosis. HPV-associated OPSCC showed significantly better prognosis than HPV-negative OPSCC  ; Fig. 6b). Among generally favorable HPV(+) OPSCC, OP1 showed the most favorable outcome, with OP2 being relatively worse (p = 0.02). Among generally unfavorable HPV(−) OPSCC, OP3 showed the most unfavorable outcome, with OP4 being relatively better (p = 0.03). These indicate that HPV-associated favorable OPSCC and HPV-negative  unfavorable OPSCC can be further divided by DNA methylation epigenotypes.
We also analyzed the prognosis based on combination of HPV status and smoking or alcohol consumption status. The HPV(+) OPSCC showed significantly better prognosis than HPV(−) in these cohorts, but none of the HPV(+) or HPV(−) groups were further stratified significantly by smoking status (p = 0.3 and 0.6, respectively) or alcohol consumption status (p = 0.7 and 0.8, respectively; Supporting Information Fig. S7b). To investigate the correlation between prognosis and clinicopathological factors, we performed multivariate analysis using cox proportional hazard model (Supporting Information Table S2). OPSCC epigenotypes investigated in our study was only considered statically significant in multivariate analysis (p = 0.0002, HR 2.2 [95% CI, 1.5-3.3]).

Discussion
To elucidate the molecular mechanisms underlying intermediate risk HPV-associated OPSCC and generally unfavorable HPV-negative OPSCC, we here analyzed DNA methylome by the Infinium 450K, genetic alterations by targeted exon sequencing, and their correlations with clinicopathological factors, using our cohort of 89 OPSCC in combination with TCGA's 81 OPSCC samples. We found that OPSCC is stratified into four epigenotypes (OP1-OP4), reflecting different clinicopathological features. OP1 showed the most favorable outcome among generally favorable HPV(+) OPSCC, and OP3 showed the most unfavorable outcome among generally unfavorable HPV(−) OPSCC. These indicate that DNA methylation epigenotypes can further stratify HPV-associated and HPV-negative OPSCC, reflecting distinct prognosis, which could not be achieved by other features such as genomic mutations.
The correlation between DNA methylation and cancer has been observed in several cancer types, especially in virusassociated cancers such as gastric cancer (associated with Epstein-Barr virus, EBV), 30 hepatocellular carcinoma (associated with hepatitis B and C viruses), 35 cervical cancer 36 and HNSCC (associated with HPV), 19 partially because aberrant DNA methylation can be induced by virus infection during carcinogenesis. 23,24 Lleras et al. showed that HPV-associated HNSCC samples harbored approximately three times higher methylation than HPV-negative HNSCC samples. 26 We have previously observed aberrant promoter hypermethylation in OPSCC samples, and showed that HPV-associated OPSCC samples had significantly higher methylation level. 31 Two previous studies based on comprehensive genomewide data of HNSCC by TCGA, investigated distinct subtypes of HNSCC in association of HPV status, although these data included only 33 and 81 OPSCC samples, respectively, 19,27 and mainly included other HNSCC samples. Lechner et al. also performed an DNA methylation analysis of 32 OPSCC samples on a genome-wide scale using formalin-fixed paraffin-embedded tissues, and showed hypermethylation of genes including CDH1 in HPV(+) OPSCC cases. 37 Despite clinical requirements, there has been no study analyzing solely OPSCC samples in a sufficiently large sample size. To the best of our knowledge, this is the first study to stratify OPSCC cases using an adequate sample size, and successfully into four DNA methylation epigenotypes. Favorable HPV(+) OPSCC as well as unfavorable HPV(−) OPSCC are further divided into subgroups reflecting distinct prognosis, and two panels of classifier markers were established.
The complex mutation landscape of HNSCC has been elucidated by several studies, including TCGA projects. 19,38 HPV-negative HNSCC is mainly characterized by genomic loss of TP53 and genetic alteration of CDKN2A/RB1, and is also associated with mutations in genes involved in carcinogenic signaling of differentiation (NOTCH1 and TP63), oxidative stress (NFE2L2 and KEAP1), WNT signaling (AJUVA and FAT1), immune evasion (HLA-A, B2M and TGFBR2) and chromatin remodeling (MLL2 and NSD1). The amplification of genes encoding receptor tyrosine kinases (RTKs) such as EGFR, ERBB2, MET and FGFR1 are also observed in this cancer type. In contrast, HPV-associated HNSCC are characterized by few genomic alterations in TP53 and CDKN2A, but harbor PIK3CA alterations, copy-number gains in TRAF3 and E2F1 and lack of CCND1 amplification, compared to HPVnegative cancer. While studies solely focusing on OPSCC are lacking, we compared our OPSCC data of DNA methylome as well as targeted exon sequencing with TCGA data for OPSCC cases. Both our data and TCGA data showed that TP53 mutations, CCND1 copy number gains and alterations in genes associated with p53 signaling and cell cycle and survival were more frequent in HPV(−) than in HPV(+) OPSCC, consistent with the previous studies.
Although the relationship between genetic and epigenetic alterations has been previously reported in other cancer types, such as colorectal cancer exhibiting correlation of high methylation with BRAF mutation or intermediate methylation with KRAS mutation, 39,40 reports on HNSCC are limited. Loss-offunction mutation in NSD1, which regulates the H3K36 methylation pathway, was previously reported in HNSCC, and the H3K36 alteration subgroup in our study showed marked DNA hypomethylation. However, only three OPSCC samples were included in this subgroup, which mainly constituted from laryngeal cancer and oral cancer samples. 27 In our OPSCC cohort with 89 cases, only six samples harbored NSD1 mutation, and there was no significant corelation with any epigenotype. Although genetic mutation could stratify OPSCC samples into two types, HPV(+) OPSCC and HPV(−) OPSCC, it could not further divide HPV(+) or HPV(−) OPSCC into subtypes, indicating that, besides genetic alterations, the accumulation of aberrant epigenetic alterations might deeply influence OPSCC biology and contribute to additional subclassification of both HPV-associated and HPVnegative OPSCC. 16,41 Whereas HPV-associated OPSCC shows generally better prognosis than HPV-negative OPSCC, a fraction of HPVassociated OPSCC cases has reportedly exhibited therapeutic resistance or worse prognosis. Ang et al. reported the importance of smoking status combined with tumor stage, in stratifying HPV-associated OPSCC, 11 which was subsequently validated by Granata et al. 42 However, the molecular aberrations underlying this intermediate risk is yet to be clarified despite previous genome-wide and epigenome-wide studies in HNSCC. 19,38 This could be because only a few epigenome analyses had focused solely on OPSCC, and even with these, the cohort size was small. [43][44][45] We here performed DNA methylation analysis of the largest OPSCC cohort on a genome-wide scale, and 170 OPSCC cases were successfully classified into four epigenotypes reflecting the prognostic status, while TNM stage did not correlate with prognosis. While there have been no molecular markers available that clinically distinguish low-risk patients from intermediate-risk patients in HPV-associated OPSCC, we here established highly accurate classifier marker panels to easily categorize OPSCC into distinct epigenotypes. While smoking status was reportedly associated with subgroup of HPV-associated OPSCC with relatively worse prognosis, 11 smoking status was not significantly associated with OP2 or any other epigenotypes (Fig. 5), and HPV(+) or HPV(−) OPSCC was not further stratified significantly by smoking status, suggesting that independent molecular markers that are more accurate than smoking status were identified in our study. In addition, multivariate analysis estimated our classification into four distinct epigenotypes as the only independent prognostic factor.
Although we successfully divided OPSCC into four epigenotypes, the cause of DNA methylation remains unclear. To explain two distinct patterns of aberrant DNA methylation accumulation in HME and IME, at least two different environmental factors might be involved. As HME is observed only in HPV(+) groups, HME might be related to HPV infection. The most direct explanation is that HPV might induce DNA methylation. E7, an oncoprotein of HPV, has been shown to directly interact with DNA methyltransferase 1 (DNMT1) in vitro, 46 and increased expression of DNMT1 has been observed in HPV-associated OPSCC 47 and cervical cancer, which is also an HPV-associated cancer. 48 E6/E7 is reportedly upregulated in the case of E2-disrupted HPV integration or DNA methylation of E2-binding sites (E2BSs). 49 Parfenov et al. reported in a relatively small 35 HNSCC cohort (including 22 OPSCC cases) that DNA methylation profiles of HPV-integration(+) tumors were similar to those of HPV-negative tumors, whereas HPV-integration(−) tumors showed relatively higher methylation. 44 While no significant association was observed between HPV-integration status and clinical outcome in this 35 HNSCC cohort, Nulton et al. reported a significant relationship between HPV-integration (+) and poor survival in 56 HPV-associated HNSCC. 50 In gastric cancer, while infected EBV genome exists in episomal form and EBV infection was shown to cause extensive DNA methylation induction, EBV(+) gastric cancer shows markedly high DNA methylation and favorable clinical outcome. 30 Considering these, episomal HPV might be involved in DNA methylation induction and form a subgroup with high methylation and relatively good prognosis; further investigations, for example, DNA methylation status of E2BSs in episomal HPV genome, are however required to fully clarify this. IME exists in both of HPV(+) and HPV(−) OPSCC (OP2 and OP3, respectively), and corresponds to poorer prognosis regardless of HPV status. A certain factor independent of HPV might be expected to involve DNA methylation induction to develop IME and poor prognosis. Our present study, however, showed no significant correlation of IME (OP2 and OP3) with any molecular or clinicopathological features including smoking, alcohol intake or genetic alterations. Further detailed investigations are also necessary to clarify the causes of IME and its association with poor prognosis.
In summary, our DNA methylation epigenotypes distinguished the most favorable OPSCC subgroup (OP1) from OP2 among generally favorable HPV(+) cases, and the most unfavorable OPSCC subgroup (OP3) from OP4 among generally unfavorable HPV(−) cases. We also established classifier marker panels, while such subclassification of OPSCC could not be achieved by other features such as genomic mutation. These findings might improve our understanding of the molecular basis of OPSCC and de-escalation treatment strategies.