Identification of novel oncogenes in oral cancer among elderly nonsmokers

Abstract

Objectives: In recent years, an increase in oral cancer among elderly nonsmokers has been noted. The aim of this study was to identify novel oncogenes in oral cancer in older nonsmokers.

Material and Methods: Whole‐exome sequencing (WES) data from 324 oral cancer patients were obtained from The Cancer Genome Atlas. Single nucleotide variants (SNVs) and insertions/deletions (INDELs) were extracted from the WES data of older patients. Fisher's exact test was performed to determine the specificity of variants in these genes. Finally, SNVs and INDELs were identified by target enrichment sequencing.

Results: Gene ontology analysis of 112 genes with significant SNVs or INDELs in nonsmokers revealed that nonsynonymous SNVs in HECTD4 were significantly more frequent in nonsmokers than in smokers by target enrichment sequencing (p = .02).

Conclusions: Further investigation of the function of HECTD4 variants as oncogenes in older nonsmokers is warranted.

Smoking is the most important risk factor for oral cancer, and most oral cancers have been considered to arise in smokers.
However, epidemiological studies have recently identified an increasing incidence of oral cancer in nonsmokers (Dahlstrom et al., 2008; Farshadpour et al., 2007). By age group, the incidence of oral cancer is increasing in the 50- to 79-year-old stratum (Ellington et al., 2020). In addition, oral cancer among nonsmoking women has been increasing from the youngest to the oldest age groups (Satgunaseelan et al., 2020). The number of elderly nonsmokers with oral cancer is thus expected to increase in the future.
The causes of this increasing incidence of oral cancer among elderly nonsmokers remain unclear. We hypothesized that this epidemiologically distinct disease would also prove to be genomically distinct, particularly with respect to alterations caused by smoking, and that a better understanding of the differences would identify novel opportunities for treatment and/or primary prevention.
To identify candidate oncogenes in elderly nonsmokers, whole-exome sequencing (WES) data from The Cancer Genome Atlas (TCGA) were examined for the specificity of single-nucleotide variants (SNVs) in two groups of oral cancer patients: elderly nonsmokers and elderly smokers. Fisher's exact test showed specificity of SNVs for 112 genes in the nonsmoker group, from which we selected 21 genes for target enrichment sequencing in our elderly clinical cases, as reported here.

| Retrieval of public data
A total of 328 anonymized patients bearing primary oral cancers were identified from the TCGA database. Clinical patient files were downloaded from the TCGA database website (Firebrowse, http://firebrowse.org/). Of these 328 oral cancer patients, individuals ≥65 years old were defined as elderly and classified by sex and into smoker and nonsmoker groups (Figure 1).
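The grouping described above can be illustrated with a minimal Python sketch. Only the ≥65-year threshold comes from the text; the record layout (the `age`, `sex`, and `smoker` fields), the boolean encoding of smoking status, and the example patients are hypothetical and do not reflect the actual structure of the TCGA clinical files.

```python
from collections import Counter

ELDERLY_AGE = 65  # threshold stated in the text (>= 65 years)


def classify(patient):
    """Return (age_group, smoking_group) for one patient record."""
    age_group = "elderly" if patient["age"] >= ELDERLY_AGE else "younger"
    smoking_group = "smoker" if patient["smoker"] else "nonsmoker"
    return age_group, smoking_group


def tabulate(patients):
    """Count patients per (age_group, smoking_group, sex) cell."""
    return Counter(classify(p) + (p["sex"],) for p in patients)


# Hypothetical records for illustration only; these are not TCGA data.
patients = [
    {"age": 72, "sex": "F", "smoker": False},
    {"age": 65, "sex": "M", "smoker": True},
    {"age": 58, "sex": "M", "smoker": True},
    {"age": 80, "sex": "F", "smoker": False},
]
counts = tabulate(patients)
```

A table of this kind yields the four age-by-smoking groups and the sex ratio within each group.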

| Statistical analysis
Fisher's exact test was performed to determine the specificity of gene SNVs between smokers and nonsmokers (R package; https://bioconductor.org/packages/release/bioc/html/edgeR.html). All statistical tests were two-sided, with values of p < .05 considered significant.
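For readers unfamiliar with the test, the following is a minimal pure-Python sketch of a two-sided Fisher's exact test on a 2 × 2 table (variant carriers versus non-carriers, nonsmokers versus smokers). It is not the R workflow cited above; the function name and the carrier counts in the usage lines are hypothetical, and only the group sizes (96 elderly nonsmokers, 39 elderly smokers) are taken from the TCGA cohort described in this paper.

```python
from math import comb


def fisher_exact_two_sided(a, b, c, d):
    """Two-sided Fisher's exact test p value for the table [[a, b], [c, d]]."""
    row1, row2 = a + b, c + d
    col1 = a + c
    total = comb(row1 + row2, col1)

    def prob(x):
        # Hypergeometric probability of observing x in the top-left cell
        return comb(row1, x) * comb(row2, col1 - x) / total

    p_obs = prob(a)
    lo = max(0, col1 - row2)
    hi = min(row1, col1)
    # Sum the probabilities of all tables no more likely than the observed one
    # (small relative tolerance guards against floating-point ties)
    p = sum(prob(x) for x in range(lo, hi + 1) if prob(x) <= p_obs * (1 + 1e-7))
    return min(p, 1.0)


# Hypothetical carrier counts: 20 of 96 nonsmokers vs. 2 of 39 smokers
p_value = fisher_exact_two_sided(20, 96 - 20, 2, 39 - 2)
significant = p_value < 0.05  # threshold stated in the text
```

The two-sided p value here sums the probabilities of every table with the same margins that is no more probable than the observed table, which is one common convention for two-sided exact tests.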
FIGURE 1 Analysis of the TCGA data. Patients aged 65 years and older were defined as elderly and classified into smoking and nonsmoking groups by sex. (a) Numbers of male and female patients among the 328 anonymized primary oral cancer patients identified in the TCGA database. (b) Smoking status of primary oral cancer patients aged 65 years and older identified in the TCGA database. (c) Sex ratio in each of the four groups defined by age and smoking status. TCGA, The Cancer Genome Atlas.

| DNA extraction, quantification, and QC
Tissue sections (thickness, 8 µm) were prepared from paraffin-embedded specimens generated before surgery or from diagnostic biopsy specimens obtained before starting clinical therapy. One of two successive sections was stained with hematoxylin and eosin (HE) to assess the cancerous portion. Manual microdissection using a scalpel and microscope was conducted to recover cancerous portions using the HE-stained slide as a reference.
Chromosomal DNA was isolated from formalin-fixed paraffin-embedded (FFPE) tissues of patients (n = 63) with head and neck squamous cell carcinoma (HNSCC) following the instructions from the kit manufacturer (Maxwell RSC DNA FFPE kit; Promega). The concentration of extracted DNA was determined fluorometrically using the Qubit dsDNA kit (Thermo Fisher Scientific).
The integrity score (ΔΔCq values) and concentration of extracted chromosomal DNA were measured using an Agilent NGS FFPE QC kit (Agilent Technologies) for all samples. As described in the Agilent protocol for HaloPlex HS Target Enrichment, the amount of input DNA was determined based on ΔΔCq values. For ΔΔCq < 1.5, 50 ng was used, and for ΔΔCq > 1.5, 100 ng was used. The amount of DNA determined by the criteria described above was fragmented using restriction enzymes. Probes with sequence indexes were added and hybridized to the targeted fragments. Each probe was an oligonucleotide designed to hybridize to both ends of a targeted DNA restriction fragment, thereby guiding the targeted fragments to form circular DNA molecules. The HaloPlex probes were biotinylated, and the targeted fragments were then retrieved with magnetic streptavidin beads. Small fragments and unligated probes were removed from the mix by AMPure purification (Agencourt Bioscience). Next, the circular molecules were closed by ligation. Finally, enriched DNA fragments were amplified using universal primers. The concentration of the enriched library was estimated using a library quantification kit (Kapa Biosystems). High-throughput sequencing was performed with 100-bp paired-end reads on a MiSeq platform (Illumina) for each enriched library according to the protocols from the manufacturer.
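The input-amount rule stated above (50 ng for ΔΔCq < 1.5, 100 ng for ΔΔCq > 1.5) can be written as a short Python sketch. The function name is hypothetical, and because the text does not say how a score of exactly 1.5 is handled, the sketch raises an error for that case rather than guessing.

```python
def input_dna_ng(delta_delta_cq):
    """Return the input DNA amount (ng) for a given FFPE integrity score."""
    if delta_delta_cq < 1.5:
        return 50   # better-preserved DNA: lower input
    if delta_delta_cq > 1.5:
        return 100  # more degraded DNA: higher input
    # The stated rule does not cover a score of exactly 1.5
    raise ValueError("ddCq of exactly 1.5 is not covered by the stated rule")


# Hypothetical integrity scores for three samples
amounts = [input_dna_ng(score) for score in (0.8, 1.2, 2.3)]
```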
Read alignments to the GRCh38/hg38 reference genome were performed using the Burrows-Wheeler Aligner. Nonmappable reads were removed using SAMtools. In this experiment, we used SelectVariants to choose variants with "DP > 10" (depth of coverage greater than 10×).
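The effect of the "DP > 10" criterion can be illustrated with a minimal Python sketch that applies the same threshold to VCF text lines. This is not the SelectVariants tool itself; the function names and example records are hypothetical, and the sketch assumes that depth is read from the `DP` key of the INFO column.

```python
def info_depth(vcf_line):
    """Return the DP value from the INFO column of a VCF data line, or None."""
    fields = vcf_line.rstrip("\n").split("\t")
    info = fields[7]  # INFO is the 8th column of a VCF record
    for entry in info.split(";"):
        key, _, value = entry.partition("=")
        if key == "DP" and value.isdigit():
            return int(value)
    return None


def filter_by_depth(vcf_lines, min_exclusive=10):
    """Keep header lines and records whose INFO DP exceeds min_exclusive."""
    kept = []
    for line in vcf_lines:
        if line.startswith("#"):
            kept.append(line)  # header lines pass through unchanged
            continue
        depth = info_depth(line)
        if depth is not None and depth > min_exclusive:
            kept.append(line)
    return kept


# Hypothetical records for illustration only
vcf_lines = [
    "#CHROM\tPOS\tID\tREF\tALT\tQUAL\tFILTER\tINFO",
    "chr1\t100\t.\tA\tG\t50\tPASS\tDP=10;AF=0.5",
    "chr1\t200\t.\tC\tT\t60\tPASS\tAF=0.4;DP=11",
    "chr2\t300\t.\tG\tA\t40\tPASS\tAF=0.2",
]
filtered = filter_by_depth(vcf_lines)
```

Note that the comparison is strict, so a record with a depth of exactly 10 is dropped, and records without a `DP` annotation are also excluded in this sketch.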

| TCGA data analysis
A total of 328 anonymized patients bearing primary oral cancers identified from the TCGA database included WES data (Figure 1a).
We defined elderly individuals as ≥65 years old and focused on 135 cases from the 328 cases extracted from the TCGA (Figure 1b). TCGA data for the 135 cases of elderly oral cancer showed frequencies of 29% for smokers (n = 39) and 71% for nonsmokers (n = 96).
Proportions of sex differences in smoking prevalence and age were examined. The results showed that the proportion of females was higher in the elderly nonsmoker group than in other groups (Figure 1c).

| Extraction of particular SNVs
To identify novel oncogenes for oral cancer in elderly nonsmokers, we extracted genes that exhibited SNVs and INDELs from the WES data of TCGA in elderly oral cancer patients.  159- to 2509-fold) per nucleotide in the coding region of the target gene (Figure 3a,b). nonsmoker groups showed that an SNV (p = .0424) in the 3′-UTR of BAG3 was significant in the nonsmoker group (Table 3). On the other hand, a nonsynonymous SNV (p = .0413) of CUBN was significant in the smoker group (Figure 2; Table 3).

| Evaluation of nonsynonymous SNVs
In-silico analyses were only performed for nonsynonymous SNVs that resulted in an amino acid substitution. PolyPhen2 and SIFT were used to predict the effects of missense mutations.

| Analysis of clinical data
No significant differences were seen in survival or treatment response according to the presence or absence of HECTD4 variants.
Regarding the effect of alcohol consumption, HECTD4 variants were slightly more common in the group without a history of alcohol  (Nojadeh et al., 2018). Frameshift deletion of DCHS2 (rs140019361), an STR site that repeats TTTG six times, was found in 95% of oral cancers in elderly individuals, both smokers and nonsmokers (Table 3). Nonframeshift deletion of STR sites in MAP3K1 (rs570353965) and FAM155A (rs3832903) was also found at similarly high rates (Table 5).

| DISCUSSION
Many reports have described gene mutations in oral cancer, and genetic factors and the mechanism of carcinogenesis are being elucidated to some extent. However, these reports largely concern environmental factors (smoking, alcohol, mechanical irritation) (Ali et al., 2017; Sasahira & Kirita, 2018  proteins. These results suggest that HECTD4 plays a role as an E3 ligase targeting AR and MYC (Vatapalli et al., 2020). The proto-oncogene c-Myc is markedly upregulated in oral cancer patients, and its expression correlates with the clinicopathological grade and stage of oral cancer. In addition, c-Myc has been reported as an important factor in the development of oral cancer in immunocompromised mice (Wang et al., 2019). AR has also been recognized for its importance in cancer etiology and progression. OSCC cells express AR, and in vitro experiments and patient studies have shown that AR plays an important role in promoting cell growth. In our searches, we did not find any reports that described HECTD4 as a putative driver of OSCC.
However, reports that decreased HECTD4 in prostate cancer leads to increased AR and MYC proteins and that increased AR/MYC is oncogenic in OSCC suggest that loss-of-function mutations in HECTD4 are oncogenic and may be a cause of carcinogenesis in elderly nonsmokers.
No significant differences were seen in survival or treatment response according to the presence or absence of HECTD4 variants.
Due to the small number of cases in this study, the correlation between HECTD4 mutations and prognosis could not be fully tested. in the absence of lymph node metastases. This issue should therefore be examined in more detail in a larger cohort in future research.
To our knowledge, this study is the first to reveal a genetic variant that characterizes elderly nonsmokers with OSCC, particularly in a Japanese cohort. However, some limitations of the present study should be considered. First, the small number of cases in this study may have been the reason that the mutation in HECTD4 was the only gene mutation to show molecular biological features of oral cancer in older nonsmokers. Second, somatic mutations that accumulate in normal tissues have been linked to aging, disease, and disorders. A recent series of studies reported comprehensive genomic analyses of morphologically normal tissues (Li et al., 2021).
Even in morphologically normal tissues, accumulation of somatic mutations and clonal expansion were widely observed to varying degrees. We hypothesized that a comprehensive genomic analysis of normal tissues from smokers and nonsmokers would facilitate an understanding of the carcinogenic risk of oral cancer in elderly nonsmokers. Third, this study was retrospective in design and examined only somatic mutations, so no germline information was examined. In addition, DNA was extracted from formalin-fixed, paraffin-embedded sections and may thus have been degraded or modified to some extent. In the future, we would like to increase the number of cases and examine in detail the effects of the presence or absence of HECTD4 variants on malignancy and prognosis in clinical specimens, to investigate whether HECTD4 has any role as a marker.

| CONCLUSION
We identified a significant variant in HECTD4 in elderly nonsmokers, compared with elderly smokers. Further study is warranted to elucidate whether HECTD4 variants may lead to the identification of therapeutic targets.

Funding for this study had no direct impact on study design or conduct; data collection, management, analysis, or interpretation; or preparation, review, or approval of the manuscript.