Single‐nucleotide polymorphism at alcohol dehydrogenase 1B: A susceptible gene marker in oro‐/hypopharyngeal cancers from genome‐wide association study

Abstract Introduction In the era of precision preventive medicine, susceptible genetic markers for oro‐/hypopharyngeal squamous cell carcinoma (OPSCC) have been investigated for genome‐wide associations. Materials and Methods A case–control study including 659 male head and neck squamous cell carcinoma (HNSCC) patients, including 331 oropharyngeal cancer, treated between March 1996 and December 2016 and 2400 normal controls was performed. A single‐nucleotide polymorphism (SNP) array was used to determine genetic loci that increase susceptibility to OPSCC. Results We analyzed the allele frequencies of 664,994 autosomal SNPs in 659 HNSCC cases; 7 SNPs scattered in loci of chromosomes 5, 7, 9, 11, and 19 were significant in genome‐wide association analysis (Pc < 1.0669 × 10−7). In OPSCCs (n = 331), two clustered regions in chromosomes 4 and 6 were significantly different from the controls. We successfully identified a missense alteration of the SNP region in alcohol dehydrogenase 1B (ADH1B) (https://genome.ucsc.edu; hg38); the top correlated locus was rs1229984 (p = 1 × 10−11). Adjusted for environmental exposure, including smoking, alcohol, and areca quid, a region in chromosome 12, related to alcohol metabolism, was found to independently increase the susceptibility to OPSCC. The ADH1B rs1229984 AA genotype had better overall survival compared to the AG and GG genotypes (p = 0.042) in OPSCC. The GG genotype in rs1229984 was significantly associated with a younger age of onset than other genotypes (p = 0.001 and <0.001, respectively) in OPSCC patients who consumed alcohol. Conclusion ADH1B was an important genetic locus that significantly correlated with the development of OPSCCs and patient survival.


| INTRODUCTION
Head and neck squamous cell carcinomas (HNSCCs) have the sixth highest incidence of all cancers worldwide, and the majority are oral cavity cancers. They are the fourth most common malignancy among Taiwanese males 1 and their incidence is increasing. Oro-/hypopharyngeal squamous cell carcinoma (OPSCC) is a subgroup of HNSCCs. In Taiwan, OPSCC accounted for approximately 941 new cases (11.47% of HNSCCs) and 400 deaths in 2019 (https://www.hpa.gov.tw/Pages/Detail.aspx?nodeid=269&pid=14913). Treatments for OPSCC, such as surgery, irradiation, and chemotherapy, are usually accompanied by swallowing and speech sequelae. Early detection of OPSCC significantly improves the survival rate and functional outcome.
The tumorigenesis of HNSCCs is closely associated with tobacco, areca quid (AQ), and alcohol consumption. 2,3 Human papillomavirus (HPV) infection was recently identified as playing an important role in the carcinogenesis of oropharyngeal cancer. More importantly, HPV infection in oropharyngeal cancer patients confers a better prognosis after chemoradiation therapy. 4 In addition to environmental exposures, individual susceptibility has been investigated to delineate the complex host-environment interactions. In our previous study, familial aggregation was observed for HNSCCs, including OPSCCs, in a population-based analysis, 5 suggesting a role for genetics in tumor development. In oropharyngeal cancers, genome-wide association analyses have identified several loci, including rs1229984 in ADH1B, rs3828805 in HLA-DQB1, rs4318431 near GALNT14, rs13211972 in MUC21, and rs34518860 in HLA-DQA1, that correlate with patients' susceptibility and prognosis. 6,7 In the literature, studies on genetic susceptibility in AQ-endemic regions have focused on oral cavity cancer. Genetic studies of OPSCC in areas of AQ use are limited, and the case numbers in these studies were 85 patients in Japan and 103 patients in Taiwan. 8,9 Knowledge of susceptibility genes in OPSCC could be clinically helpful. Early detection of oropharyngeal and esophageal lesions is challenging. Using genetics to identify individuals susceptible to OPSCC could improve screening and early detection in a cost-effective manner. 10 To identify high-risk genetic loci for OPSCC in Taiwan, we retrospectively investigated 697 HNSCC patients, including 331 OPSCCs and 345 cancers of the oral cavity and other subsites. A questionnaire was used to collect detailed information about environmental exposures. Single-nucleotide polymorphism (SNP) arrays and environmental exposure adjustments were used to find high-risk loci for OPSCCs.

KEYWORDS
alcohol dehydrogenase 1B, aldehyde dehydrogenase 2, genome-wide association study, head and neck cancer, oropharyngeal cancer

| Study population
We recruited 697 male HNSCC patients, including 331 with OPSCC, treated at Chang Gung Memorial Hospital, Lin-Kou, Taiwan, between 1996 and 2016. All patients had histologically confirmed primary squamous cell carcinoma.
As per the 2010 guidelines from the American Joint Committee on Cancer, oral cavity cancer was defined as cancer occurring in the lip, buccal mucosa, alveolus, retromolar area, tongue, floor of the mouth, or hard palate. 11 Oropharyngeal cancer encompassed the subsites of the soft palate, oropharyngeal walls, or tonsils. 11 Similarly, hypopharyngeal cancer was defined as cancer located in the lower portion of the pharynx and included the lateral pharyngeal walls, pyriform sinus, and posterior cricoid region. 11 All patients were followed up regularly for >2 years. A questionnaire was used to gather demographic information, family history, and behaviors including cigarette smoking, alcohol consumption, and areca quid (AQ) use. Individuals were categorized as smokers if they had smoked more than 100 cigarettes in their lifetime. Those who drank alcohol at least once a month were classified as alcohol drinkers. AQ chewers were identified as those who had consumed over 100 nuts in their lifetime.
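The exposure definitions above can be written as a small classification rule. The following is a minimal sketch, assuming the questionnaire yields a lifetime cigarette count, an average number of drinking occasions per month, and a lifetime areca nut count; the field names and the `classify_exposure` helper are hypothetical and do not come from the study's data dictionary.

```python
from dataclasses import dataclass


@dataclass
class Questionnaire:
    # Hypothetical field names; the study's actual questionnaire items are not given.
    lifetime_cigarettes: int
    drinking_occasions_per_month: float
    lifetime_areca_nuts: int


def classify_exposure(q: Questionnaire) -> dict:
    """Apply the three exposure definitions stated in the text."""
    return {
        "smoker": q.lifetime_cigarettes > 100,             # more than 100 cigarettes
        "drinker": q.drinking_occasions_per_month >= 1,    # at least once a month
        "aq_chewer": q.lifetime_areca_nuts > 100,          # over 100 nuts
    }


example = classify_exposure(Questionnaire(100, 1, 101))
print(example)
```

Note that exactly 100 cigarettes does not meet the smoker definition, because the text specifies "more than 100".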

| Participant selection
This study was approved by the Institutional Review Board of Chang Gung Medical Foundation (201800439B0). We recruited 2400 ethnically and geographically matched healthy controls (TMU-201805076) from a biobank established as a nationwide population study. 12 Informed consent was obtained from all participants. The controls were recruited from Taiwan. Most (98%) of the population was Han Chinese, and a few were Hakka Chinese. Furthermore, 100 healthy controls were randomly selected for genotype validation. 13

| DNA extraction
In all participants, a 10 mL sample of venous blood was collected into a vacuum tube containing an anticoagulant (Vacutainer; BD). The buffy coat obtained from this sample was isolated and stored at −80°C. High-molecular-weight DNA was then extracted from the buffy coat cells using the phenol-chloroform method and subsequently stored at −80°C. 13

| Genotyping and quality control
A genome-wide association study (GWAS) of 703,949 SNPs genotyped in 697 HNSCC patients and 2400 controls was performed using the Axiom Genome-Wide TWB Array Plate (Affymetrix GeneTitan; Thermo Fisher Scientific). 14 DNA quantity and purity were evaluated with a NanoDrop ND-1000 spectrophotometer (NanoDrop Technologies LLC); an absorbance ratio of 260/280 and a purity index >1.8 were considered optimal. The volume for array analysis was 50 μL at a concentration of 15 ng/μL for all samples. The GWAS dataset was analyzed using PLINK (v1.90b5). Logistic regression was performed with sex and ancestry-specific principal components (PC1-PC10) as covariates. 15,16 Genomic coordinates followed the National Center for Biotechnology Information Human Genome Build 37 (GRCh37). Of the 703,949 genotyped SNPs, the 664,994 located on autosomal chromosomes underwent quality control. SNPs with a call rate exceeding 0.95 were retained, whereas 47,488 variants were excluded due to missing genotype information. A total of 143,017 variants were discarded due to low minor allele frequency (<0.01), and 5849 were eliminated for deviating from Hardy-Weinberg equilibrium. To visualize potential deviations from the expected distribution, a quantile-quantile plot was generated using the Bioconductor package GWASTools in R (The R Foundation). 17 The genomic inflation factor was computed with PLINK. Linkage disequilibrium analyses for the SNPs rs1229984 and rs671 were conducted using the LDproxy module of the online tool LDlink (https://analysistools.nci.nih.gov/LDlink). Because the 703,949 SNPs genotyped on the TWB Array had been preselected and filtered based on their frequencies in Asian populations, no further genotype imputation was performed.
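The variant counts in this paragraph can be checked with simple arithmetic. The sketch below assumes that the three exclusion steps (missing genotypes, low minor allele frequency, Hardy-Weinberg deviation) remove non-overlapping sets of variants; the text does not state this explicitly, so treat it as an assumption.

```python
# Counts taken from the text; variable names are illustrative only.
autosomal_snps = 664_994

excluded = {
    "missing genotype information": 47_488,
    "minor allele frequency < 0.01": 143_017,
    "Hardy-Weinberg deviation": 5_849,
}

# Assumes the exclusion sets do not overlap (not stated in the text).
remaining = autosomal_snps - sum(excluded.values())
print(remaining)  # 468640
```

Under that assumption the result, 468,640, agrees with the number of variants reported as retained in the Results.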

| SNP array genotyping and quality control
Quality control procedures were carried out at both the individual and marker levels. Initial assessments covered individual-level aspects such as sample quality, kinship, and population stratification. Dish-sample quality control (DQC) was used to monitor non-polymorphic sites by measuring the signal and background channels. Individuals with unsatisfactory DQC values and call rates below 97% were excluded.
Additionally, a plate pass rate was calculated as the number of samples with acceptable DQC values and a call rate of 97% or higher, divided by the total number of samples on the respective plate. Samples were included in the analysis if the plate pass rate exceeded 95% and the average call rate of passing samples exceeded 99%. Inbreeding coefficients were evaluated, and samples showing significant kinship were removed. Genome-wide identity was examined by multidimensional scaling analysis, which allowed outlier clusters to be identified and removed.
At the marker level, markers failing to meet specific criteria were omitted. These criteria were a missing rate below 2%, a minor allele frequency exceeding 1%, and adherence to Hardy-Weinberg equilibrium (p > 0.001). The replication sample underwent an identical set of quality control procedures to ensure consistency and reliability.
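The three marker-level criteria can be expressed as a single filter. The following is a minimal sketch, not the pipeline used in the study: it takes genotype counts for one biallelic marker and applies the stated thresholds. The Hardy-Weinberg check uses a one-degree-of-freedom chi-square approximation as a simplification; dedicated tools such as PLINK generally use an exact test, so p-values near the threshold may differ. The function names are hypothetical.

```python
import math


def hwe_chi2_pvalue(n_aa: int, n_ab: int, n_bb: int) -> float:
    """Chi-square (1 df) p-value for deviation from Hardy-Weinberg equilibrium."""
    n = n_aa + n_ab + n_bb
    p = (2 * n_aa + n_ab) / (2 * n)  # frequency of allele A
    q = 1 - p
    expected = (n * p * p, 2 * n * p * q, n * q * q)
    observed = (n_aa, n_ab, n_bb)
    chi2 = sum((o - e) ** 2 / e for o, e in zip(observed, expected) if e > 0)
    # Survival function of chi-square with 1 df: erfc(sqrt(x / 2)).
    return math.erfc(math.sqrt(chi2 / 2))


def passes_marker_qc(n_aa: int, n_ab: int, n_bb: int, n_missing: int) -> bool:
    """Apply the stated criteria: missing rate < 2%, MAF > 1%, HWE p > 0.001."""
    called = n_aa + n_ab + n_bb
    if called == 0:
        return False
    missing_rate = n_missing / (called + n_missing)
    freq_a = (2 * n_aa + n_ab) / (2 * called)
    maf = min(freq_a, 1 - freq_a)
    return (
        missing_rate < 0.02
        and maf > 0.01
        and hwe_chi2_pvalue(n_aa, n_ab, n_bb) > 0.001
    )


# Hypothetical genotype counts, for illustration only.
print(passes_marker_qc(250, 500, 250, 0))   # genotypes in exact HWE proportions
print(passes_marker_qc(500, 0, 500, 0))     # no heterozygotes at all
```

The first call passes all three criteria, while the second fails the Hardy-Weinberg check because the observed heterozygote count is far below expectation.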

| ADH1B and ALDH2 genotyping
In the replication study, SNPs associated in the initial GWAS were directly sequenced in an independent sample set of 100 HNSCC cases and 100 controls. Genotyping of ALDH2 rs671 and ADH1B rs1229984 was done by direct sequencing. The wild-type allele ALDH2*1 and the variant allele ALDH2*2 were defined as Glu504 and Lys504 (rs671), respectively. Polymerase chain reaction (PCR) amplification of the SNP rs671 in ALDH2 was performed using the forward primer 5′-TCCTATTGCATTGGGCATATT-3′ and reverse primer 5′-TCCATTTACGCCTCAACTCA-3′. The forward and reverse primers for rs1229984 were 5′-TCACCCCTTCTCCAACACTC-3′ and 5′-ATTCTGTAGATGGTGGCTGTAG-3′. The annealing temperature for both rs671 and rs1229984 was 58°C. The conditions for PCR reactions were as previously described. 18

| Statistical analysis
The distributions of age, tumor subsites, cigarette smoking, AQ chewing, and alcohol consumption among HNSCC patients were calculated. The chi-squared test and t-test were used for categorical and continuous variables, respectively. p-values < 0.05 were considered significant. A multivariate analysis was performed using Cox regression. The curves for age at disease onset were compared between genotypes using the Kolmogorov-Smirnov test. 19 Overall survival (OS) and disease-free survival (DFS) were assessed using the Kaplan-Meier method, and differences were evaluated using the log-rank test. All analyses were performed using SPSS Statistics version 18 (IBM Corp.) and R (version 4.2.2, The R Foundation). 20
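For readers unfamiliar with the Kaplan-Meier method named above, the estimator can be sketched in a few lines. This is an illustration in plain Python rather than the SPSS or R routines used in the study, and the follow-up times below are hypothetical, not study data. At each distinct event time the survival probability is multiplied by one minus the fraction of at-risk patients who had the event; censored patients leave the risk set without changing the estimate.

```python
def kaplan_meier(times, events):
    """Return [(time, survival)] at each distinct time with at least one event.

    times  : follow-up time for each patient
    events : 1 if the event (e.g. death) was observed, 0 if censored
    """
    data = sorted(zip(times, events))
    n_at_risk = len(data)
    survival = 1.0
    curve = []
    i = 0
    while i < len(data):
        t = data[i][0]
        n_events = 0
        n_leaving = 0
        # Gather everyone whose follow-up ends at time t.
        while i < len(data) and data[i][0] == t:
            n_events += data[i][1]
            n_leaving += 1
            i += 1
        if n_events:
            survival *= 1 - n_events / n_at_risk
            curve.append((t, survival))
        n_at_risk -= n_leaving
    return curve


# Hypothetical follow-up times in years (1 = event observed, 0 = censored).
toy_times = [2, 3, 3, 5, 8]
toy_events = [1, 1, 0, 1, 0]
for t, s in kaplan_meier(toy_times, toy_events):
    print(t, round(s, 3))
```

Patients censored at the same time as an event are counted as still at risk at that time, which is the usual convention.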

| RESULTS
A GWAS of 697 HNSCC patients and 2400 controls was initially performed (Table 1); 703,949 variants were analyzed using the TW2.0 SNP chip. After quality control and removal of variants with minor allele frequencies <1%, 468,640 variants and 3085 samples (including 697 cases and 2400 controls) were included. SNPs on sex chromosomes were excluded from the GWAS.
The principal component analysis map showed no differences in the distribution of ancestry between HNSCC patients and controls (Figure S1). A quantile-quantile plot used for quality control demonstrated successful population matching (Figure S2). The genomic inflation factor (λGC) 21 was 1.061, suggesting an acceptable population structure in the GWAS. Figure 1 shows the Manhattan plot according to tumor subsites. When comparing allele frequencies of the 664,994 autosomal SNPs in 331 OPSCC cases and 2400 controls, 13 SNPs reached the threshold of genome-wide significance (Pc < 1.0669 × 10−7) (Table S1).
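The genomic inflation factor reported above is conventionally defined as the median of the observed one-degree-of-freedom chi-square statistics divided by the expected median under the null (about 0.4549). The sketch below follows that standard definition starting from a list of p-values; whether it matches PLINK's internal calculation in every detail is an assumption, and the evenly spaced p-values are synthetic, not study results.

```python
from statistics import NormalDist, median

# Median of a chi-square distribution with 1 degree of freedom.
CHI2_1DF_MEDIAN = 0.4549364


def genomic_inflation(p_values):
    """Estimate lambda_GC from two-sided association p-values (0 < p <= 1)."""
    nd = NormalDist()
    # A two-sided p-value maps to a 1-df chi-square via the squared normal quantile.
    chi2 = [nd.inv_cdf(p / 2) ** 2 for p in p_values]
    return median(chi2) / CHI2_1DF_MEDIAN


# Synthetic, evenly spaced p-values mimic a null distribution.
null_like = [(i + 0.5) / 1000 for i in range(1000)]
print(round(genomic_inflation(null_like), 3))
```

A value close to 1 indicates that the test statistics are not systematically inflated, which is the sense in which 1.061 is read as acceptable.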
A cut-off p-value of 1 × 10−7 was determined by false discovery rate. In all HNSCC patients (n = 697), only scattered loci in chromosomes 7, 9, 11, and 19 were identified as significant (Table 1, Figure 1). Similar scattered loci were found in the oral cavity cancer subgroup (n = 293) compared to the controls. In the OPSCC subgroup (n = 331), two regions located in chromosomes 4 and 6 were different compared to the controls. We successfully identified a missense alteration in the SNP region of ADH1B (https://genome.ucsc.edu; hg38); the top correlated locus was rs1229984 (Figure 2; p = 1 × 10−11). Adjustment for environmental exposure, including cigarettes, alcohol, and AQ, led to the identification of a region of chromosome 12 for susceptibility to OPSCC (Figure 3). We further identified a missense alteration in the SNP region of ALDH2 (https://genome.ucsc.edu; hg38); the top correlated locus was rs671 (Figure 4, p = 1 × 10−25). The sequence results of rs671 and rs1229984 in 100 samples matched completely with the array genotyping.
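As a numerical observation, the significance threshold quoted in this section (1.0669 × 10−7) equals 0.05 divided by the 468,640 variants retained after quality control. The text attributes the cut-off to the false discovery rate, so the Bonferroni-style calculation below is offered only as an arithmetic check, with the 0.05 error rate as an assumption.

```python
# Post-QC variant count reported in the Results.
n_variants = 468_640
alpha = 0.05  # assumed overall error rate; not stated in the text

threshold = alpha / n_variants
print(f"{threshold:.4e}")  # 1.0669e-07

# Reported p-values of the two top loci, compared against the threshold.
reported = {"rs1229984": 1e-11, "rs671": 1e-25}
significant = {snp: p < threshold for snp, p in reported.items()}
print(significant)
```

Both reported top loci fall several orders of magnitude below this threshold.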
To investigate the effect of rs671 and rs1229984 on HNSCC carcinogenesis, univariate analyses under different genetic models were performed (Table 2). OPSCC carried the highest hazard ratio (HR) for rs671 (1.920, 95% confidence interval [95% CI]: 1.512-2.439) and rs1229984 (1.714, 95% CI: 1.356-2.167) in the dominant model compared with HNSCC and OSCC. Multiple logistic regression analyses were also performed to adjust for environmental factors, including cigarette, alcohol, and AQ use (Table 3).

| Clinical implications
Alcohol, cigarettes, and AQ are the three most common environmental causes of HNSCCs. To clarify the role of rs671 and rs1229984 in HNSCC tumorigenesis, we adjusted for these environmental factors using logistic regression (Table 3). We demonstrated that rs671 was independently related to both OPSCC and oral cavity cancers (odds ratio [OR]: 2.078-2.165), while rs1229984 was only related to OPSCCs (OR: 1.648, 95% CI: 1.335-2.034). AQ users had the highest risk (OR: 14.212, 95% CI: 9.579-21.085) of oral cavity cancer, while alcohol played a more important role in OPSCCs (OR: 13.123, 95% CI: 9.152-18.815). This suggested that alcohol consumption and genotypes of ADH1B and ALDH2 may be used as predictors of susceptibility to OPSCCs.
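To make the reported odds ratios and confidence intervals easier to interpret, the sketch below shows how an unadjusted odds ratio with a Wald (Woolf) 95% CI is obtained from a 2 × 2 table, and how genotype counts collapse under a dominant model (carriers of at least one variant allele versus non-carriers). It is not the adjusted multiple logistic regression used for Table 3, and all counts are hypothetical.

```python
import math


def dominant_counts(n_ref_hom: int, n_het: int, n_var_hom: int):
    """Collapse genotype counts under a dominant model: (carriers, non-carriers)."""
    return n_het + n_var_hom, n_ref_hom


def odds_ratio_ci(case_exp, case_unexp, ctrl_exp, ctrl_unexp, z=1.959964):
    """Unadjusted odds ratio with a Wald 95% CI computed on the log scale."""
    odds_ratio = (case_exp * ctrl_unexp) / (case_unexp * ctrl_exp)
    se_log = math.sqrt(1 / case_exp + 1 / case_unexp + 1 / ctrl_exp + 1 / ctrl_unexp)
    lower = math.exp(math.log(odds_ratio) - z * se_log)
    upper = math.exp(math.log(odds_ratio) + z * se_log)
    return odds_ratio, lower, upper


# Hypothetical genotype counts (reference homozygote, heterozygote, variant homozygote).
case_carriers, case_noncarriers = dominant_counts(10, 15, 5)
ctrl_carriers, ctrl_noncarriers = dominant_counts(20, 8, 2)

or_, lo, hi = odds_ratio_ci(case_carriers, case_noncarriers, ctrl_carriers, ctrl_noncarriers)
print(round(or_, 3), round(lo, 3), round(hi, 3))
```

An interval that excludes 1, as in the values reported for rs671 and rs1229984, indicates an association at the 5% level before any adjustment for covariates.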

| DISCUSSION
In the literature, few risk loci have been reported for OPSCC, owing to the lower incidence of OPSCC compared with other HNSCCs. Previous studies reported five susceptibility loci for OPSCC, rs1229984, rs3828805, rs4318431, rs13211972, and rs34518860, in European and American populations. 6,7 In this study, we identified two susceptibility loci, rs1229984 and rs671, for OPSCC in the Han Chinese population. Of these, rs1229984 appears to be an important genetic locus for OPSCC in Asians, Europeans, and Americans alike. Both enzymes, ADH1B (rs1229984) and ALDH2 (rs671), are involved in the alcohol metabolic pathway. Alcohol is first oxidized to acetaldehyde by ADH and then converted to acetate by ALDH, which decreases cytotoxic stress. A previous study demonstrated a causal effect of smoking and alcohol on oral cancer and OPSCC. 22 In this GWAS of OPSCC, we found that ADH1B (rs1229984) and ALDH2 (rs671) increase the risk of OPSCC in the Han Chinese population. 6 Our results may provide genetic evidence for the association between alcohol consumption and OPSCC.
Previous studies have aimed to understand individual susceptibility to environmental exposure by investigating gene-environment interactions, particularly for genetic markers that increase the risk for HNSCCs. Genetic studies have focused on carcinogen metabolism and the capability for DNA repair. Various genes, including XRCC1 23 and CYP1B1 (rs10012 and rs1056836), 24 have been implicated in HNSCC pathogenesis. These genes are mainly involved in the maintenance of genetic integrity or carcinogen metabolism.
In a GWAS-based meta-analysis of aerodigestive tract squamous cell carcinomas in patients of European ancestry, ADH1B was found to play a significant role in oral and oropharyngeal cancer development. The T allele was significantly protective against oral and oropharyngeal cancers compared to the C allele (OR: 0.58, 95% CI: 0.50-0.67). In our previous study, a functional genetic polymorphism of the T allele in ADH1B was also associated with the occurrence of multiple upper aerodigestive tract primary tumors. 18 No other studies have demonstrated an association of ADH1B and ALDH2 with OPSCCs. As Figure 1 shows, the genetic effects of ADH1B and ALDH2 were easily obscured in HNSCCs as a whole. When we stratified the population by tumor subsite, the associations between ADH1B, ALDH2, and OPSCC became increasingly evident. Alcohol has a demonstrated association with esophageal cancers 25 ; its effects are more important in upper digestive tract carcinogenesis. These effects are caused by alcohol metabolites and by ethanol acting as a solvent for carcinogens that cause upper aerodigestive tract cancers. 26,27 The effects of alcohol are more prominent among Asians because of alcohol flush reactions, which are related to the reduced activity of alcohol-metabolizing enzymes. Surprisingly, the genes encoding alcohol-metabolizing enzymes emerged as critical loci in HNSCCs, and their influence was independent of exposure to cigarettes, alcohol, and AQ.
Our analysis showed that the effects of ADH1B and ALDH2 are not limited to cancer onset but also extend to cancer prognosis. 28,29 The prognostic value of ADH1B has seldom been investigated in the literature. ADH1B (rs1229984) was associated with better OS in HNSCC (p = 0.030) and OPSCC (p = 0.042) patients, but not in oral cavity cancer patients (p = 0.433). Lee et al. found that pre-diagnosis alcohol consumption was significantly related to worse OS in HNSCC patients. ADH1B and ALDH2 modified the relationship between alcohol use and OS of HNSCC patients. 30 Kagemoto et al. 28 demonstrated that ADH1B and ALDH2 were related to survival in esophageal cancer patients. 18 Lee et al. speculated that the influence of alcohol, ADH1B, and ALDH2 on OS could be related to the advanced stage of HNSCC. 30 We have previously demonstrated that ADH1B*1 allele carriers have a significantly increased risk of developing multiple primary tumors in the upper digestive tract (OR, 2.093; 95% CI: 1.149-3.812). 18 The influence of ADH1B on survival could come from the increased risk of multiple primary tumors in susceptible genetic carriers. The underlying mechanisms need to be investigated in a larger population with a longer follow-up period.
HPV infection has been reported to play an important role in oropharyngeal cancers. 31 A limitation of this study is the lack of adjustment for HPV infection in OPSCCs. We used a PCR-based method to detect HPV infection in a small subset of OPSCCs in this study (n = 144). 32 The rate of HPV infection in OPSCCs was low (2.1%, data not shown). Samples in our study were recruited from 1996 onward, and the infection rate in OPSCCs could be low in patients recruited two decades ago. 31 Our study indicates that genetics also plays a role in tumor formation in HNSCCs.
Abbreviations: AQ, areca-quid chewing; CI, confidence interval; HNSqCC, head and neck squamous cell carcinoma; HR, hazard ratio. The bold values stand for p value < 0.05.
Although OSCC and OPSCC are both located within the head and neck region, we found that the susceptibility loci in OPSCCs differ from those in OSCCs. Alcohol-metabolizing genes are more important in the tumor formation in
FIGURE 5 (A) In oro-/hypopharyngeal squamous cell carcinomas, the alcohol dehydrogenase 1B rs1229984 GG genotype had better disease-free survival (DFS) compared to the AA and AG genotypes (p = 0.010). (B) The GG genotype had better overall survival (OS) compared to the AA and AG genotypes (p = 0.042).

TABLE 1 Tumor subsite distributions in the study cohort (n = 698).
FIGURE 1 Manhattan plot of the genome-wide association results. The y-axis corresponds to −log10 p-values and the x-axis corresponds to the genomic positions. The horizontal red line (p = 5 × 10−8) denotes the false discovery rate. (A) Head and neck squamous cell carcinomas. (B) Oro-/hypopharyngeal squamous cell carcinomas (OPSCCs). (C) Oral cavity squamous cell carcinoma. A clustered region of single-nucleotide polymorphisms in chromosome 4 (blue arrow) was found in OPSCCs (B).
(standard deviation [SD]: ±9.290), 55.96 (SD: ±10.717), and 49.59 years (SD: ±9.845), respectively. The GG genotype had a significantly lower age of onset for OPSCCs among alcohol users compared to other genotypes (p = 0.001 and <0.001, respectively). The average ages at onset for rs671 GG (n = 117), GA (n = 221), and AA
FIGURE 2 Linkage disequilibrium diagram of chromosome 4. We successfully identified a missense alteration of the single-nucleotide polymorphism region in alcohol dehydrogenase 1B (https://genome.ucsc.edu; hg38); the top correlated locus was rs1229984 (p = 1 × 10−11).
FIGURE 3 Manhattan plot of the genome-wide association results after adjustment for environmental exposures of alcohol, smoking, and areca quid. Chromosome 12 was significantly related to all head and neck cancers (arrow). (A) Head and neck squamous cell carcinomas. (B) Oro-/hypopharyngeal squamous cell carcinomas. (C) Oral cavity cancers.
TABLE 2 Nonsynonymous SNPs in ADH1B (rs1229984) and ALDH2 (rs671) and head and neck cancer, oro-/hypopharyngeal cancer, and oral squamous cell carcinoma in univariate analysis.
TABLE 3 The hazard ratio of rs671 and rs1229984 in tumorigenesis after logistic regression adjusting for environmental exposure in different tumor locations.