Unraveling most abundant mutational signatures in head and neck cancer

Genomic alterations are a driving force in the multistep process of head and neck cancer (HNC) and result from the interaction of exogenous environmental exposures and endogenous cellular processes. Each of these processes leaves a characteristic pattern of mutations on the tumor genome providing the unique opportunity to decipher specific signatures of mutational processes operative during HNC pathogenesis and to address their prognostic value. Computational analysis of whole exome sequencing data of the HIPO‐HNC (Heidelberg Center for Personalized Oncology‐head and neck cancer) (n = 83) and TCGA‐HNSC (The Cancer Genome Atlas‐Head and Neck Squamous Cell Carcinoma) (n = 506) cohorts revealed five common mutational signatures (Catalogue of Somatic Mutations in Cancer [COSMIC] Signatures 1, 2, 3, 13 and 16) and demonstrated their significant association with etiological risk factors (tobacco, alcohol and HPV16). Unsupervised hierarchical clustering identified four clusters (A, B, C1 and C2) of which Subcluster C2 was enriched for cases with a higher frequency of signature 16 mutations. Tumors of Subcluster C2 had significantly lower p16INK4A expression accompanied by homozygous CDKN2A deletion in almost one half of cases. Survival analysis revealed an unfavorable prognosis for patients with tumors characterized by a higher mutation burden attributed to signature 16 as well as cases in Subcluster C2. Finally, a LASSO‐Cox regression model was applied to prioritize clinically relevant signatures and to establish a prognostic risk score for head and neck squamous cell carcinoma patients. In conclusion, our study provides a proof of concept that computational analysis of somatic mutational signatures is not only a powerful tool to decipher environmental and intrinsic processes in the pathogenesis of HNC, but could also pave the way to establish reliable prognostic patterns.

In the last few decades, the incidence of oropharyngeal squamous cell carcinoma (OPSCC) has been increasing in developed countries, and HPV is emerging as an important factor in the rise of OPSCCs. 5 Due to the lack of symptoms in the early stage and of effective screening techniques, the majority of HNSCC patients are diagnosed at an advanced stage. 6 Despite our current knowledge of underlying mutational and biological processes as well as implementation of an aggressive multimodal therapy consisting of surgery, radiotherapy and platinum-based chemotherapy, the prognosis of patients with advanced HNSCC remains dismal with a 5-year survival rate of less than 50%. [7][8][9] Hence, new concepts are urgently needed that will advance our understanding of the complex interplay between etiology, mutational and biological processes that operate during HNSCC pathogenesis.
HNSCC develops in a multistep process that involves different molecular alterations, including the accumulation of multiple genetic and epigenetic changes with tumor progression. 2 Somatic mutations in a cancer genome are the cumulative result of mutational processes arising from the intrinsic infidelity of the DNA replication machinery, exogenous or endogenous mutagen exposures, enzymatic modification of DNA or defective DNA repair. 10 However, the processes that cause somatic mutations in HNC remain poorly understood. 11 Genomic analyses revealed the presence of several mutational signatures in HNSCC. 12,13 It is worth noting that most individual cancer genomes exhibit more than one mutational signature and that many different combinations of signatures have been observed. 14 In addition, global gene expression and DNA methylome profiling analysis elucidated distinct HNSCC subgroups with characteristic features concerning clinical and pathological traits as well as patient prognosis. [15][16][17] To date, little progress has been made in utilizing this informa-

| Analysis of mutational signatures
Supervised mutational signature analysis of high-confidence somatic SNVs in individual samples was performed based on non-negative matrix factorization formalism as described previously. 10
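The idea behind a supervised signature fit can be illustrated with a minimal sketch. It is not the authors' pipeline, which follows the procedure of the cited reference. The sketch assumes that the signature profiles are known and held fixed, and that only the non-negative exposures of one sample are estimated, here with the multiplicative update rule commonly used in non-negative matrix factorization. The function name `fit_signatures` and the four-channel toy profiles are hypothetical; real COSMIC single-base substitution profiles have 96 trinucleotide channels.

```python
def fit_signatures(catalog, signatures, n_iter=500, eps=1e-12):
    """Estimate non-negative exposures h so that catalog ~ sum_k h[k] * signatures[k].

    catalog    : mutation counts per channel for one sample (length C)
    signatures : K fixed, non-negative signature profiles, each of length C
    The signatures are never updated; only h changes (multiplicative updates
    that reduce the squared reconstruction error while keeping h >= 0).
    """
    n_sig = len(signatures)
    n_chan = len(catalog)
    # Start from a uniform split of the total mutation count.
    h = [float(sum(catalog)) / n_sig] * n_sig
    for _ in range(n_iter):
        # Current reconstruction of the catalog from the exposures.
        recon = [sum(h[k] * signatures[k][c] for k in range(n_sig))
                 for c in range(n_chan)]
        updated = []
        for k in range(n_sig):
            num = sum(signatures[k][c] * catalog[c] for c in range(n_chan))
            den = sum(signatures[k][c] * recon[c] for c in range(n_chan))
            updated.append(h[k] * num / (den + eps))
        h = updated
    return h


# Toy example (hypothetical 4-channel profiles, not COSMIC data).
sig_a = [0.7, 0.1, 0.1, 0.1]
sig_b = [0.1, 0.1, 0.1, 0.7]
# Catalog built as 30 mutations from sig_a plus 10 from sig_b.
toy_catalog = [22.0, 4.0, 4.0, 10.0]
exposures = fit_signatures(toy_catalog, [sig_a, sig_b])
print(exposures)  # close to [30.0, 10.0]
```

Because the toy catalog is an exact mixture of the two profiles, the fit recovers the mixing weights; with real data the reconstruction is only approximate and the residual should be inspected.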

| Statistical analysis and hierarchical clustering
All patient data were collected and documented using the program

| LASSO Cox regression model
The LASSO Cox regression algorithm was applied to prioritize most relevant prognostic candidates of mutational signatures for the TCGA-HNSC cohort. The risk score was computed by "glmnet" (Lambda.
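A risk score derived from a penalized Cox model is typically the linear predictor, that is, the sum of the retained coefficients multiplied by the corresponding covariates. The sketch below illustrates this under stated assumptions: the coefficient values are hypothetical placeholders (not values from this study), the covariates are relative signature contributions, and the median split into high-risk and low-risk groups is an assumed cutoff that the text does not specify.

```python
from statistics import median


def risk_score(fractions, coefficients):
    """Linear predictor: sum of coefficient * covariate over retained signatures.

    fractions    : dict signature name -> relative contribution in one tumor
    coefficients : dict signature name -> non-zero coefficient kept by the
                   LASSO penalty (hypothetical values in this sketch)
    """
    return sum(coef * fractions.get(name, 0.0)
               for name, coef in coefficients.items())


def split_by_median(scores):
    """Label each patient 'high' or 'low' relative to the cohort median (assumed cutoff)."""
    cutoff = median(scores.values())
    return {patient: ("high" if value > cutoff else "low")
            for patient, value in scores.items()}


# Hypothetical coefficients and patients, for illustration only.
toy_coefficients = {"Sig16": 2.0, "Sig1": -1.0}
toy_patients = {
    "P1": {"Sig16": 0.50, "Sig1": 0.25},
    "P2": {"Sig16": 0.05, "Sig1": 0.60},
    "P3": {"Sig16": 0.20, "Sig1": 0.20},
}
toy_scores = {p: risk_score(f, toy_coefficients) for p, f in toy_patients.items()}
print(split_by_median(toy_scores))
```

In practice the coefficients would be read from the fitted model at the chosen penalty strength, and the resulting groups would be compared with a survival analysis.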

| Somatic mutation frequency in the HIPO-HNC cohort
Total somatic mutation counts per cancer genome were determined based on whole-exome sequencing data, which were available for n = 83 cases of the HIPO-HNC cohort. Mutation counts were slightly higher in smokers as compared to nonsmokers (P = .054), which did   Figure S1A). Similar data were obtained for mutation counts of the TCGA-HNSC cohort, except for age, which revealed a weak but significant positive correlation with mutation counts (Supplemental Figure S1B).

| Most abundant mutation signatures in the HIPO-HNC cohort
We computed the relative contribution of distinct mutational signatures for individual cancer genomes (Figure 1A, Supplemental Table S2). This analysis identified Signatures 1, 2, 3, 13 and 16 (nomenclature according to COSMIC) 27 as the most abundant mutational signatures in the HIPO-HNC cohort (Figure 1A, Supplemental Table S3).
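As a minimal sketch of this step, the relative contribution of a signature in one genome can be taken as its exposure divided by the sum of all exposures in that genome, and signatures can then be ranked across the cohort. The ranking criterion used here (mean relative contribution across samples) is an assumption, as are the function names and the toy exposures.

```python
def relative_contributions(exposures):
    """Convert absolute exposures of one sample into fractions that sum to 1."""
    total = sum(exposures.values())
    if total == 0:
        return {name: 0.0 for name in exposures}
    return {name: value / total for name, value in exposures.items()}


def rank_signatures(cohort):
    """Rank signatures by mean relative contribution across all samples.

    cohort : dict sample id -> dict signature name -> absolute exposure
    A signature missing from a sample counts as zero for that sample.
    """
    sums = {}
    for exposures in cohort.values():
        for name, fraction in relative_contributions(exposures).items():
            sums[name] = sums.get(name, 0.0) + fraction
    n_samples = len(cohort)
    means = {name: total / n_samples for name, total in sums.items()}
    return sorted(means.items(), key=lambda item: item[1], reverse=True)


# Hypothetical exposures for two samples, for illustration only.
toy_cohort = {
    "sample_1": {"Sig1": 50, "Sig16": 50},
    "sample_2": {"Sig1": 10, "Sig16": 30},
}
print(rank_signatures(toy_cohort))
```

Working with fractions rather than absolute counts keeps samples with very different mutation burdens comparable.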
In 64 out of 83 cases (77.1%), the relative mutation burden attributed  Table S3). Moreover, unsupervised hierarchical clustering revealed  Table S4). In particular, a high relative contribution of either Signature 2 in Cluster A or Signature 1 in Cluster C1 was a characteristic feature of HPV16-related cancers, while Signature 16 (enriched in cluster C2) was almost absent in these cases (Supplemental Figure S2C). In contrast, a high relative contribution or  Table S5). In contrast, no significant difference in survival was detected for any other more abundant signature (Supplemental Figure S3G,H).

| Most abundant mutation signatures in the TCGA-HNSC cohort
To confirm our findings in an independent and larger patient cohort,   In 117 cases (23.1%), it was larger than 75% (Figure 3B). Most subgroups of the TCGA-HNSC cohort based on risk factor stratification shared Signatures 1, 2, 3, 13 and 16 as the top five candidates (Supplemental Table S7). The only exception was the subgroup without smoking history, in which Signatures 3 and 16 were replaced by Signatures 6 and 7. However, the latter two signatures had no impact on clinical outcome in either cohort, including subgroups with or without a smoking history (Supplemental Figure S4A, data not shown), had only a minor impact on the stratification of clusters and subclusters (Supplemental Figure S4B,C), and were not related to strong enrichments of somatic mutations in MutSig genes (Supplemental Figure S4D,E).  Table S8; Supplemental Figure S5C). In the group of smokers, pack-years were slightly but significantly associated with the relative fraction of somatic mutations attributed to Signature 16 (Supplemental Figure S5D). Patients with a high mutational burden attributed to Signature 16 had a significantly shorter OS (Figure 4B,C), which was most prominent for oral SCC and OPSCC (Supplemental  Table S9). The unfavorable OS of patients in the BC2 as compared to the AC1 subgroup was independent of risk factors (tobacco, HPV) and of the primary tumor site (Supplemental Figure S6). Together with a positive resection margin, high tumor mutation counts served as an independent risk factor for OS in a multivariate Cox regression model adjusted for tobacco, HPV16, gender, age, tumor size and lymph node metastasis (Supplemental Table S9).

| Association between CDKN2A expression and Signature 16
Global gene expression analysis was conducted to identify differentially expressed genes (DEGs) among individual clusters of the TCGA-HNSC cohort (Supplemental Table S10). We selected private DEGs of Subcluster C2 for further analysis (Supplemental Figure S7A) Table S11). As expected, the risk score was higher in Clusters B and C2 and significantly different between Clusters BC2 and AC1 (Figure 6D,E). Furthermore, a higher risk score was significantly associated with unfavorable PFS and DFS in the HIPO-HNC cohort (Figure 6F,G).

| DISCUSSION
The main objective of our study was to profile somatic mutations in order to enable a comprehensive and system-based interrogation of mutational landscapes and to evaluate their prognostic impact on the survival of HNSCC patients. We conducted a whole-exome sequencing analysis of samples from the HIPO-HNC cohort, for which complete information on clinical features was available, and confirmed our findings in the larger TCGA-HNSC cohort.
In the HIPO-HNC cohort, we identified five prevalent mutational Ohio cohort). 13 In addition to an association with smoking and HPV-negative cancers, Signature 16 is also associated with alcohol intake. 30,[34][35][36] In a recent study, Signature 4 was found mainly in cancers derived from epithelia directly exposed to tobacco smoke and was most abundant in lung and laryngeal cancers. 29 Signature 4 mutations were also found in oral cavity and pharynx cancers, albeit in much smaller numbers, most likely due to less exposure to tobacco smoke or more efficient clearance. A higher abundance of Signature 4 mutations based on our analysis was confirmed for the TCGA-HNSC and HIPO-HNC cohorts (data not shown), but as laryngeal cancers represent a minor subgroup within both cohorts, Signature 4 was not ranked among the top five signatures in our study.
Global gene expression profiling demonstrated that tumors with a high relative mutational burden attributed to Signature 46,47 Targeting of CDK4 and 6 is the subject of many clinical early phase III trials. 45 Although our sample size was limited, our integration of exome sequencing data, combined with the validation of our findings in a larger cohort of clinical samples, provides a complementary breadth and depth of molecular information. To further unravel the prognostic impact of mutational signatures in HNSCC, especially of Signatures 3 and 16, prospective studies are warranted.
Another attractive avenue for the clinical translation of the data presented in our study is to combine mutation frequencies attributed to clinically relevant, highly abundant signatures with those of less abundant signatures in order to establish a reliable prognostic risk score for treatment intensification or de-escalation in distinct categories of HNSCC patients (HPV-driven or not).