Investigation of somatic mutation profiles and tumor evolution of primary oropharyngeal cancer and sequential lymph node metastases using multiregional whole‐exome sequencing

Lymph node (LN) metastasis is an important factor in determining the treatment and prognosis of oropharyngeal squamous cell carcinoma (OPSCC). Here, we compared the somatic mutational profiles and clonal evolution of primary and metastatic LNs using multiregion sequencing of human papilloma virus (HPV)‐positive OPSCC and HPV‐negative OPSCC. We performed high‐depth whole‐exome sequencing (200×) of 76 samples from 18 patients with OPSCC (10 HPV‐positive and 8 HPV‐negative), including 18 primary tumor samples, 40 metastatic LN samples, and 18 normal tissue samples. Among 40 metastatic LNs, 22 showed extranodal extension (ENE). Mutation profiles of HPV‐positive OPSCC and HPV‐negative OPSCC were similar to those reported previously. Somatic mutations in CDKN2A and TP53 were frequently detected in HPV‐negative OPSCC. Somatic mutations in HPV‐positive OPSCC samples showed APOBEC‐related signatures. Somatic mutations from metastatic LNs showed a different pattern than the primary tumor. Somatic mutations acquired in the WNT pathway during metastasis showed a significant relationship with ENE. Clonal evolution analysis of primary and metastatic LNs showed that, in some cases, each metastatic LN originated from a different primary tumor sub‐clone.

Lymph node (LN) metastasis is an important factor in determining the treatment and prognosis of oropharyngeal squamous cell carcinoma (OPSCC). Here, we compared the somatic mutational profiles and clonal evolution of primary and metastatic LNs using multiregion sequencing of human papilloma virus (HPV)-positive OPSCC and HPV-negative OPSCC. We performed high-depth whole-exome sequencing (2009) of 76 samples from 18 patients with OPSCC (10 HPV-positive and 8 HPV-negative), including 18 primary tumor samples, 40 metastatic LN samples, and 18 normal tissue samples. Among 40 metastatic LNs, 22 showed extranodal extension (ENE). Mutation profiles of HPV-positive OPSCC and HPVnegative OPSCC were similar to those reported previously. Somatic mutations in CDKN2A and TP53 were frequently detected in HPV-negative OPSCC. Somatic mutations in HPV-positive OPSCC samples showed APOBEC-related signatures. Somatic mutations from metastatic LNs showed a different pattern than the primary tumor. Somatic mutations acquired in the WNT pathway during metastasis showed a significant relationship with ENE. Clonal evolution analysis of primary and metastatic LNs showed that, in some cases, each metastatic LN originated from a different primary tumor sub-clone.

Introduction
The incidence of oropharyngeal squamous cell carcinoma (OPSCC) associated with human papilloma virus (HPV) has increased in recent decades, whereas the incidence of OPSCC caused by smoking and drinking has decreased. HPV-positive OPSCC has distinct clinical features compared with HPV-negative OPSCC [1][2][3]. It occurs in relatively young patients and the response to treatment is favorable, so the patient's life expectancy is relatively longer than HPV-negative OPSCC. Therefore, a range of clinical trials has investigated how to reduce treatment-related morbidities and improve the quality of life of these patients [4][5][6][7][8]. However, despite the favorable prognosis of HPVpositive OPSCC, the locoregional failure rate remains high at about 15%, and 10-20% of HPV-positive OPSCC patients eventually die due to disease progression. Although HPV-negative OPSCC due to smoking and drinking has decreased, it still accounts for 30% of OPSCC cases and the prognosis of these patients is worse than that of patients with HPV-positive OPSCC. To improve the poor prognosis of HPV-negative OPSCC patients and a subset of HPV-positive OPSCC patients, molecular and genomic studies to understand the mechanisms underlying the progression and metastasis of OPSCC are required.
Most OPSCC cases originate from the palatine tonsils, with lymphatic metastasis occurring first in the level II area known as the first echelon lymph nodes (LNs). Thereafter, LN metastasis sequentially occurs to level III and level IV LNs along the jugular chain. During this process, extranodal extension (ENE), an adverse prognostic feature, occurs in some metastatic LNs. In the revised 8th American Joint Cancer Committee (AJCC) classification guidelines, the N classification was upgraded to N2a or N3b for the finding of ENE, including in HPV-negative OPSCC [9][10][11][12]. Although there is controversy surrounding the prognostic significance of ENE findings in HPV-positive OPSCC, current NCCN guidelines recommend adjuvant treatment if ENE is observed.
Next-generation sequencing (NGS) technology has made it possible to perform genomic analysis of cancer samples to identify genetic mutations related to the progression and metastasis of cancer in a cost-effective manner. In this study, we performed multiregional whole-exome sequencing of the primary tumor and metastatic LN samples to explore the genetic variation and tumor evolution of OPSCC tumors with sequential LN metastases and ENE. An understanding of the molecular genetic mechanisms underlying sequential lymph node metastasis and ENE will likely improve the treatment outcomes of OPSCC patients.

Subject ascertainment
Among OPSCC patients who underwent surgery at Severance Hospital, 18 patients who had surgical specimens available and who provided written informed consent for use of these samples to obtain genetic information were enrolled in this study. An experienced specialized head and neck pathologist (S. J. Shin) marked the tumor-corresponding region in formalin-fixed paraffinembedded (FFPE) blocks through hematoxylin and eosin (H&E) staining. Tumor portions on LNs containing tumor metastasis were marked and matched with the primary tumor followed by manual dissection. For metastatic LN samples, metastatic LNs with ENE findings were preferentially collected, and if available, metastatic LNs samples were collected at different neck levels including II, III, and IV. Normal tissue samples were also obtained from surgical specimens. Clinical information was obtained from medical records. This study was approved by the Institutional Review Board of Yonsei University (4-2020-1370). All research procedures conformed to the principles of the Helsinki Declaration.

DNA extraction and library preparation
Genomic DNA was extracted from matched tumor, metastatic LN, and normal tissue samples that were re-evaluated by a specialized head and neck pathologist using QIAamp DNA FFPE kits (Qiagen, Germantown, Maryland, USA) following standard protocols. A sufficient amount of DNA (over 1 lg) was extracted from 18 primary tumors, 40 matched metastatic LNs, and adjacent normal tissue. Each sample was prepared according to Agilent library preparation protocols (Agilent SureSelect Human All exon V6 kit, Agilent, Santa Clara, CA, USA). Libraries underwent paired-end sequencing on an Illumina HiSeq 4000 instrument according to the manufacturer's protocol.

Bioinformatics analyses of somatic mutations
To generate analysis-ready bam files from Fastq files, we followed the 'Best Practices' workflow suggested by the Broad Institute. Briefly, raw sequencing files were aligned to the Genome Research Consortium human build 38 (GRCh38) using BWA-MEM [13]. After applying MarkDuplicates and Base Recalibration processes, initial raw candidate variants were called for each sample using the MUTECT2 caller [14,15]. To reduce falsepositive variants, all variants were filtered using the following in-house filtering criteria: (a) total depth < 50, (b) variant allelic frequency < 7%, (c) altered read counts < 4. Additional filtering criteria were then applied to assess pathogenicity: (a) combined annotation dependent depletion (CADD) phred score < 26, and (b) minor allele frequency (MAF) > 0.1% in the gnomAD database for global and East Asian populations [16]. To extract mutational signatures based on 30 recurrent base substitution patterns from the Catalog of Somatic Mutations in Cancer (COSMIC), we used a public web service program called Mutalisk [17]. Copy number log-ratios were computed with CNVKIT [18]. Unsupervised clustering of CNV was performed using the R package 'CNTOOLS' [19]. Oncoplots were constructed using MAF-TOOLS [20]. Mutated genes were categorized based on oncogenic signaling pathways reported in TCGA [21]. Tumor mutational burden (TMB) was analyzed by comparison with previously reported data following basic scripts in MAFTOOLS [22].

Immunohistochemistry
Pathologist (SJS) reviewed all hematoxylin and eosin (H&E) slides used at the time of diagnosis. Formalinfixed, paraffin-embedded tissue blocks of metastatic LNs chosen, and sections (4-lm thickness) from each metastatic LN blocks were immunostained with a primary antibody against b-catenin (1 : 200, mouse monoclonal, Cell marque, Merck, Darmstadt, Germany), using the Ventana Benchmark XT automated staining system (Ventana Medical Systems, Tucson, AZ, USA) according to the manufacturer's protocol.

Bioinformatics-based clonal analysis
To analyze ancestral sub-clones of each metastatic LN, somatic mutations in the primary tumor were listed without considering pathogenicity (CADD phred score and MAF score). Variant allelic frequency (VAF) of selected candidates from metastatic LNs was calculated using an in-house script. Based on these processes, it was not necessary to interpret somatic mutations in the LNs themselves during the process of LNs metastasis. Clonal lineage reconstruction and VAF-based clustering were performed using LICHeE with all parameters set to default settings except for the following: -maxVAFAbsent 0.005, -minVAFPresent 0.05, -minClusterSize 5. The best-scored lineage tree from each sample was exported in DOT format for GRAPHVIZ visualization [23].

Statistical analysis
To determine somatic mutations associated with ENE status, we grouped somatic mutations into clusters based on the previously mentioned oncogenic signaling pathways. Categorical variables consisting of mutations occurring in each signaling pathway were compared using the Fisher's exact test. Statistical analyses were performed with SPSS version 26 (IBM Corp., Armonk, NY, USA), and forest plots were generated using GRAPHPAD PRISM version 6 (GraphPad Software, La Jolla, CA, USA).

Tumor characteristics
A total of 76 samples (18 from primary tumors, 40 from metastatic LNs, and 18 from normal tissue) were obtained from 18 patients with OPSCC. Among 40 metastatic LNs, 22 metastatic LNs had ENE findings (Table 1). First, we evaluated the genetic landscape of the primary tumors according to HPV status ( samples was similar to that reported previously for TCGA data. The most common somatic mutations were single nucleotide polymorphisms (SNPs), with missense mutations the most common type of SNP (Fig. S1). Copy number variation (CNV) analysis showed a gain of 3q and 8q and loss of 3p, 11, and 13. Unsupervised clustering analysis showed no distinctive subsets according to HPV status (Fig. S2). To analyze when acquired or missing somatic mutations occurred during the process of LN metastasis, we generated oncoplots representing the mutational profiles of all primary tumor and metastatic LN samples (Fig. S3). Interestingly, metastatic LNs had unique somatic mutations compared with their primary tumor and other metastatic LNs located at different neck levels in the same patient. To evaluate pathogenic somatic mutations related to ENE of metastatic LNs, we analyzed the somatic mutations frequently occurred in the metastatic LN with ENE. Surprisingly, somatic mutations of WNT pathway were observed in 41.7% (10 out of 24) from metastatic LNs with ENE compared with 6.3% (1 out of 16) from metastatic LNs without ENE and 5.6% (1 out of 18) from the primary tumor. In addition, most mutations (90.9%, 10 out of 11) were observed in HPV-positive patients ( Fig. 2A). Among metastatic LNs with ENE, two somatic mutations were nonsense mutations in tumor suppressor genes (TLE2 and APC) while 12 somatic mutations (SFRP5, CHD4, CHD8, DVL1, APC, TLE4, and SOST) were missense mutations that had a probable or possibly damaging effect based on PolyPhen prediction scores. Additionally, all somatic missense mutations had a phred score over 27 CADD (Table S1). To determine the correlation between mutations in oncogenic signaling pathways and ENE findings, we performed the statistical analysis. Somatic mutations in the WNT pathway were significantly associated with ENE based on a two-tailed Fisher's exact test (odds ratio = 10.714, 95% confidence interval = 1.210-94.862, P = 0.027, Fig. 2B). To determine whether metastatic LNs containing WNT pathway mutations resulted in up-regulation of the WNT pathway, we stained metastatic LNs with an antibody specific to bcatenin to determine activation of the WNT pathway. As the expression of b-catenin may be increased in cancer cells, we compared the intensity of b-catenin staining between metastatic LNs with and without ENE. Compared with metastatic LNs without ENE, metastatic LNs with ENE showed more intense bcatenin staining than metastatic LNs without ENE (Fig. 2C).
Finally, we evaluated the clonal evolution of primary tumors and metastatic LNs. We harvested metastatic tissues from LNs at different neck levels. By analyzing the initial sub-clones of each metastatic LN, we were able to infer the evolutionary process of LN metastases. We extracted the VAF of somatic mutations belonging to primary tumors only for each metastatic LN. By evaluating only the somatic mutations of primary tumors, we could infer the initial sub-     clones responsible for each metastatic LN. If metastatic LNs spread sequentially along the jugular lymph node chain from upper neck levels down to lower levels, all metastatic LNs should originate from the same sub-clone of the primary tumor. We found that most metastatic LNs originated from the same subclone of the primary tumor (Fig. S4). However, several metastatic LNs did not originate from the same subclone of the primary tumor, i.e., they arose independently from different sub-clones of the primary tumor. For example, level IV metastatic LNs of sample HN01 originated from different sub-clone of the primary tumor. Level III metastatic LNs of sample HN04 were derived from a more ancestral sub-clone than level II metastatic LNs. Level IV metastatic LNs of sample HN18 originated from independent sub-clones different from the sub-clone from which level I metastatic LNs arose. These different origins of metastatic LNs mean that LN metastases can arise directly from the primary tumor rather than from other metastatic LNs located at upper LN levels. Interestingly, we observed that contralateral metastatic LNs and retropharyngeal LNs originated from sub-clones that were completely different to those from which ipsilateral metastatic LNs arose in samples HN05, HN15, and HN17. Taken together, our results suggest that cancer cells of metastatic LNs not only come from ipsilateral metastatic LNs but also from the primary tumor (Fig. 3).

Discussion
Here, using multiregional whole-exome sequencing of the matched primary tumor and metastatic LN samples, we analyzed the genetic landscape and clonal evolution of primary tumors and metastatic LNs in HPVpositive and HPV-negative OPSCC. First, we analyzed the somatic mutational profile of the primary tumors of OPSCC patients and observed similar patterns of somatic mutations as reported previously. We found that acquired somatic mutations in the WNT pathway showed a significant association with ENE of metastatic LNs and that the WNT pathway activity was up-regulated in metastatic LNs with ENE. Finally, we confirmed that metastatic LNs not only originated from sub-clones of other ipsilateral metastatic LNs located at different neck levels but that some also arose from independent sub-clones of the primary tumor. HPV status has received attention recently as an important prognostic factor in determining treatment outcomes of OPSCC patients. Treatment results are better for HPV-positive OPSCC patients than HPVnegative OPSCC patients in terms of disease-free survival (DFS), tumor-specific survival (TSS), and overall survival (OS) [25]. Therefore, the 8th American Joint Committee on Cancer (AJCC) staging system modified the stage of OPSCC to reflect the prognostic implication of p16+ as representative of HPV status [26]. As expected, the somatic mutations, mutational signatures, and copy number alterations that we detected in primary tumors of OPSCC were similar to those found previously [21,[27][28][29]. Because our analysis was based on Asians, there may be limitations in interpreting the result. However, we detected a smaller overall number of somatic mutations in HPV-negative OPSCC samples than in HPV-positive OPSCC samples, and somatic mutations in NOTCH and HIPPO pathways were detected only in HPV-positive OPSCC samples, but these findings should be interpreted with caution because of the limited number of HPV-negative OPSCC samples analyzed in this study. To overcome the low quality of DNA extracted from FFPE samples, we applied stringent criteria to reduce the number of false-positive variants. We found that mutations present in some high-purity samples had a great effect on the overall genetic landscape. Notwithstanding, our finding that the most frequently detected mutations in HPV-negative OPSCC samples were in CDKN2A and TP53 is consistent with previous studies. These results indirectly indicate that our analysis based on FFPE samples was not distorted by the limited number of samples or unbalanced sample quality.
Lymph node metastasis itself is considered an important feature in cancer treatment and prognosis evaluation and is believed to be an intermediate step in the progression of cancer during which distant metastasis occurs [30][31][32]. Moreover, ENE has long been considered a pathologic high-risk feature, meaning that patients with the finding of ENE are at an increased risk of disease progression and would benefit  from adjuvant therapy [33]. Although the pathologic LN status of HPV-positive OPSCC is based solely on the number of positive lymph nodes in the revised 8th AJCC staging system, the finding of ENE is still considered an important risk factor when planning adjuvant treatment. In addition, several studies have reported that ENE is significantly related to poorer DFS, PFS, and OS even in HPV-positive OPSCC patients [34,35]. In this study, we found that acquired somatic mutations in the WNT pathway were significantly associated with ENE of metastatic LNs. Furthermore, we demonstrated that the WNT pathway was activated in metastatic LNs with ENE compared to LNs without ENE in the same patients using immunohistochemical staining of b-catenin as a biomarker of WNT pathway activation. The previous study reported an association between ENE in metastatic LNs with CTTN and MMP9 mutations [36]. In addition, CTTN and MMP9 are related to the WNT activation [37][38][39]. Due to the limited number of samples that we evaluated, our results need to be validated in a larger cohort. Nevertheless, the association between ENE and somatic mutations in the WNT pathway suggests another therapeutic approach to improve refractory OPSCC based on the control of locoregional recurrences associated with ENE of metastatic LNs. OPSCC metastasizes to first echelon LNs according to the location of the primary tumor, then sequentially spreads to ipsilateral LNs [40,41]. As OPSCC commonly metastasizes to level II LNs followed by level III-IV LNs irrespective of HPV status, elective neck dissection including that of level II-IV LNs is usually performed in the surgical treatment of N0 OPSCC patients. Several studies have demonstrated that the number and location of metastatic LNs are critical predictors of treatment outcomes and survival, but studies on why these nodal factors of metastatic LNs affect prognosis are lacking [30,[42][43][44]]. In the current study, we found that a large number of somatic mutations occurred in metastatic LNs. We compared clonal evolution between primary tumor and metastatic LN samples and found that metastatic LNs not only originated from ipsilateral metastatic LNs located at different neck levels but that some also arose independently from sub-clones of the primary tumor. Since we applied strict filtering criteria to reduce false-positive calls, it was thought that there were many falsenegative mutations in metastatic LNs. Therefore, we set the somatic mutations identified in the primary tumor as the correct mutation set, and the presence or absence of mutations was checked in each metastatic LN. We could demonstrate the heterogeneity of the primary tumor, but it was difficult to infer the heterogeneity of each metastatic LNs itself. However, these findings suggest the presence of inter-lesional heterogeneity in metastatic LNs at different neck levels in OPSCC patients. We attempted to evaluate which oncogenic signaling pathway affects phylogenetic tree separation but did not find a clear correlation. This may be due to the low number of the oncogenic signaling pathway involved. Previous studies have demonstrated that a higher level of intratumoral heterogeneity is related to a poorer prognosis [45][46][47]. Although further research is required, inter-lesional heterogeneity of metastatic LNs in OPSCC may lead to a poor prognosis and treatment resistance, as does intratumoral heterogeneity of the primary tumor. Furthermore, our analysis suggested that in order to plan the targeted therapy, mutation analysis should be performed in consideration of the location of recurrence, not only the genetic analysis of the primary tumor.

Conclusions
Somatic mutations in WNT pathway-associated genes were significantly associated with ENE of metastatic LNs, and WNT pathway activity was up-regulated in metastatic LNs. Furthermore, some metastatic LNs, including contralateral or retropharyngeal LNs, were found to originate independently from sub-clones of the primary tumor rather than from ipsilateral Fig. 3. Phylogenetic trees of the primary tumor and metastatic LNs. Each circle represented a sub-clone of the primary tumor. The number of somatic mutations is indicated in circles. The mean variant allelic fractions of sub-clones are indicated next to the line. Heatmap nearby each tree showed the presence of each mutation in each sub-clone. Columns in the heatmap represented each sub-clone. Right label of heatmap showed somatic mutations included in the oncogenic signaling pathway reported in the previous report. (A) Level IV metastatic LNs of HN01 originated from different sub-clones than Level II and III metastatic LNs. (B) Level II metastatic LNs of HN04 were derived from ancestral sub-clones rather than Level III metastatic LNs. (C) Level IV metastatic LNs of HN18 were derived from different sub-clones than Level I metastatic LNs. (D) Retropharyngeal metastatic LNs of HN05 were derived from different sub-clones than level III metastatic LNs. (E) Contralateral level II metastatic LNs of HN15 originated from an ancestral sub-clone rather than level II metastatic LNs. (F) Contralateral level I metastatic LNs of HN17 originated from an ancestral sub-clone rather than level I and II metastatic LNs. cLv, contralateral level; GL, germline; Lv, level; RP, retropharyngeal. metastatic LNs located elsewhere in the neck. Future studies should investigate how inter-lesional heterogeneity of metastatic LNs of OPSCC patients affects treatment outcomes and prognosis.

Supporting information
Additional supporting information may be found online in the Supporting Information section at the end of the article. Table S1. Somatic mutations from metastatic LNs in WNT pathway. Fig. S1. Summary of somatic mutations in primary tumors.