Evaluating shared genetic influences on nonsyndromic cleft lip/palate and oropharyngeal neoplasms

Abstract
It has been hypothesised that nonsyndromic cleft lip/palate (nsCL/P) and cancer may share aetiological risk factors. Population studies have found inconsistent evidence for increased incidence of cancer in nsCL/P cases, but several genes (e.g., CDH1, AXIN2) have been implicated in the aetiologies of both phenotypes. We aimed to evaluate shared genetic aetiology between nsCL/P and oral cavity/oropharyngeal cancers (OC/OPC), which affect similar anatomical regions. Using a primary sample of 5,048 OC/OPC cases and 5,450 controls of European ancestry and a replication sample of 750 cases and 336,319 controls from UK Biobank, we estimated genetic overlap using nsCL/P polygenic risk scores (PRS) and performed Mendelian randomization analyses to evaluate potential causal mechanisms. In the primary sample, we found strong evidence for an association between an nsCL/P PRS and increased odds of OC/OPC (per standard deviation increase in score, odds ratio [OR]: 1.09; 95% confidence interval [CI]: 1.04, 1.13; p = .000053). Although confidence intervals overlapped with the primary estimate, we did not find confirmatory evidence of an association between the PRS and OC/OPC in UK Biobank (OR: 1.02; 95% CI: 0.95, 1.10; p = .55). Mendelian randomization analyses provided evidence that major nsCL/P risk variants are unlikely to influence OC/OPC. Our findings suggest possible shared genetic influences on nsCL/P and OC/OPC.

There are several practical limitations of comparing cancer incidence in nsCL/P cases with incidence in the general population. First, the co-occurrence of nsCL/P and cancer is uncommon because nsCL/P itself is relatively rare (around 1 in 700; Mossey et al., 2009), and the two phenotypes typically differ in timing of onset. Second, cancer subtypes (e.g., childhood leukaemia and colorectal cancer) are highly distinct, and stratifying by subtype would further reduce statistical power. Third, syndromic and nonsyndromic forms of CL/P have different aetiologies and can be difficult to distinguish, so combining them could be problematic if they differ in cancer risk.
An alternative approach to evaluating the co-occurrence of nsCL/P and cancer is to explore shared genetic risk factors. Previous studies have implicated several genes in both nsCL/P and cancer (Dunkhase et al., 2016), notably CDH1 (Hozyasz et al., 2014; Machado et al., 2017; Vogelaar et al., 2012), which is linked to gastric and breast cancer (Pharoah, Guilford, & Caldas, 2001), and AXIN2 (Letra, Menezes, Granjeiro, & Vieira, 2009; Letra et al., 2012; Machado et al., 2017; Mostowska et al., 2012), which is associated with colorectal cancer and tooth agenesis (Liu et al., 2000; Lammi et al., 2004). These findings suggest that common biological pathways may influence nsCL/P and cancer risk, but the extent to which the two phenotypes share genetic risk factors is currently unclear.
Two methods for evaluating shared genetic risk factors between traits are polygenic risk scores (PRS) and Mendelian randomization (Davey Smith & Ebrahim, 2003). PRS combine multiple genetic variants associated with a phenotype into a single score, which can be used to evaluate genetic overlap between two traits (Dudbridge, 2013). Mendelian randomization uses genetic variants robustly associated with a trait, typically at genome-wide significance, as instrumental variables to evaluate possible causal relationships (Davey Smith & Hemani, 2014). Here, Mendelian randomization can be applied to test whether common nsCL/P genetic variants, used as a proxy for an individual's latent liability to nsCL/P, influence cancer risk. Previous studies have demonstrated the utility of nsCL/P PRS and Mendelian randomization for disentangling mechanistic relationships (Dardani et al., 2020; Howe et al., 2018, 2019). For example, these approaches were used to provide evidence of shared genetic influences between nsCL/P and facial morphology (Howe et al., 2018).
A group of cancers that are strong candidates for shared genetic aetiology with nsCL/P are cancers of the oral cavity and oropharynx (OC/OPC) because of the similar anatomical sites affected. Major risk factors for OC/OPC include alcohol consumption, tobacco use and human papillomavirus infection (IARC Working Group on the Evaluation of Carcinogenic Risks to Humans, 2007) but OC/OPC also has a substantial heritable component (Lesseur et al., 2016). The possibility of shared genetic risk factors between nsCL/P and OC/OPC has not been previously investigated, possibly because of the relative rarity of both phenotypes.
Here, we first constructed nsCL/P PRS at a number of different p value thresholds using nsCL/P genome-wide association study (GWAS) summary data. We then evaluated the association between the nsCL/P PRS and OC/OPC using individual participant data on OC/OPC cases and controls from the largest OC/OPC GWAS (Lesseur et al., 2016) and from UK Biobank. To differentiate between the two data sets, we refer to them as the International Agency for Research on Cancer (IARC) and UK Biobank samples, respectively. We then applied Mendelian randomization to both data sets to evaluate a potential causal effect of liability to nsCL/P (a latent measure proxied by common nsCL/P variants) on OC/OPC. Finally, we used data from UK Biobank to explore associations between nsCL/P PRS and potential shared risk factors for nsCL/P and OC/OPC (alcohol and tobacco measures).
2 | METHODS
2.1 | Data sources
2.1.1 | Summary data from nsCL/P GWAS meta-analysis
We used data from two nsCL/P GWAS, which have previously been meta-analysed and published (Beaty et al., 2010; Ludwig et al., 2012; Mangold et al., 2010). The combined summary statistics were not publicly available, so the meta-analysis summary statistics were reconstructed by the authors, as described in detail previously (Howe et al., 2018, 2019). In brief, a transmission disequilibrium test (TDT) was implemented in a genome-wide association study of 638 parent-offspring trios and 178 parent-offspring duos of European descent (Beaty et al., 2010). The TDT results were then meta-analysed with GWAS summary results on 399 cases and 1,318 controls from the Bonn-II study, also of European descent (Mangold et al., 2010).
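To make the combination of family-based and case-control evidence concrete, the sketch below computes a TDT odds ratio from transmissions by heterozygous parents (b transmitted versus c untransmitted; OR = b/c, SE of log OR = sqrt(1/b + 1/c), chi-squared = (b - c)^2 / (b + c)) and pools it with a case-control log odds ratio using fixed-effect inverse-variance weights. This is a minimal illustration under stated assumptions: the counts and the case-control estimate are hypothetical, only complete trios are handled (duos need a different treatment), and it is not the reconstruction procedure used by Howe et al.

```python
import math

def tdt(transmitted, untransmitted):
    """Transmission disequilibrium test from heterozygous-parent transmissions.

    transmitted (b): times heterozygous parents passed the tested allele on;
    untransmitted (c): times they passed the other allele.
    Returns (log OR, SE of log OR, McNemar-type chi-squared on 1 df).
    """
    b, c = transmitted, untransmitted
    return math.log(b / c), math.sqrt(1 / b + 1 / c), (b - c) ** 2 / (b + c)

def fixed_effect(estimates):
    """Fixed-effect inverse-variance meta-analysis of (log OR, SE) pairs."""
    weights = [1 / se ** 2 for _, se in estimates]
    pooled = sum(w * b for w, (b, _) in zip(weights, estimates)) / sum(weights)
    return pooled, math.sqrt(1 / sum(weights))

# Hypothetical numbers for a single SNP, for illustration only.
trio_log_or, trio_se, trio_chi2 = tdt(transmitted=300, untransmitted=220)
cc_log_or, cc_se = math.log(1.25), 0.08   # hypothetical case-control estimate
beta, se = fixed_effect([(trio_log_or, trio_se), (cc_log_or, cc_se)])
p = math.erfc(abs(beta / se) / math.sqrt(2))   # two-sided normal p value
print(f"TDT chi2 = {trio_chi2:.2f}; combined OR = {math.exp(beta):.3f}, p = {p:.1e}")
```

The pooled estimate always lies between the two inputs and is pulled towards the more precise one.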

| IARC OC and OPC cancer GWAS data set
In this study, we used a data set of OC/OPC cases and controls which were part of a previous OC/OPC GWAS (Lesseur et al., 2016). In brief, the data set includes 6,034 OC, oropharyngeal or hypopharyngeal cancer cases and 6,585 controls. More information on the study design, genotyping, and phenotyping is contained in the Supporting Information Methods.
For analyses, the data set was restricted to 5,048 cases and 5,450 controls of recent European ancestry (confirmed by principal components analysis), which were split into two subsamples, based on the continent of the study centre (North America and Europe).

| UK Biobank
UK Biobank is a large-scale cohort study of 502,655 participants aged between 40 and 69 years, who were recruited from 22 recruitment centres across the United Kingdom between 2006 and 2010. In UK Biobank, we identified OC/OPC cases and controls using secondary care data from Hospital Episode Statistics (HES) as well as the death and cancer registers. For secondary analyses, we used data on self-reported alcohol consumption and tobacco smoking which were collected at baseline using a questionnaire. More information on the study design, genotyping and phenotyping is contained in the Supporting Information Methods.
For analyses, we used a subset of 750 OC/OPC cases and 336,319 controls, obtained by restricting to individuals of self-reported "White British" descent and using kinship coefficients to remove the individuals related to the greatest number of other individuals.
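The relatedness filter described above is a greedy procedure: repeatedly drop the participant involved in the most related pairs until no related pair remains. A minimal sketch follows, assuming pairwise kinship coefficients are available as (id, id, kinship) tuples. The cutoff of 0.0442 (a commonly used KING-style boundary for third-degree or closer relatives) and the tie-breaking rule are assumptions; the study's exact settings are not stated here.

```python
from collections import Counter

# Assumed cutoff for "related" (third-degree or closer, KING-style kinship).
KINSHIP_THRESHOLD = 0.0442

def prune_related(pairs, threshold=KINSHIP_THRESHOLD):
    """Greedily remove individuals until no related pair remains.

    pairs: iterable of (id_a, id_b, kinship) tuples.
    At each step the individual involved in the most remaining related
    pairs is removed (ties broken by the larger ID for reproducibility).
    Returns the set of removed IDs.
    """
    edges = {(a, b) for a, b, kinship in pairs if kinship > threshold}
    removed = set()
    while edges:
        degree = Counter()
        for a, b in edges:
            degree[a] += 1
            degree[b] += 1
        worst = max(degree, key=lambda i: (degree[i], i))
        removed.add(worst)
        # Drop every related pair that involved the removed individual.
        edges = {(a, b) for a, b in edges if worst not in (a, b)}
    return removed

# Hypothetical pairs: A is related to B, C and D; E and F are distant relatives.
example_pairs = [
    ("A", "B", 0.25), ("A", "C", 0.25), ("A", "D", 0.12),
    ("E", "F", 0.05), ("G", "H", 0.01),
]
print(sorted(prune_related(example_pairs)))  # ['A', 'F']
```

Removing the most-connected individual first keeps more participants than dropping one member of each pair at random: here one removal (A) resolves three relationships.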

| Statistical analyses
2.2.1 | The association of nsCL/P PRS with OC/OPC in the IARC sample
nsCL/P PRS were defined using the nsCL/P meta-analysis GWAS summary statistics. The summary statistics were LD clumped (r² < .1 within a 250 kb window) at 11 different p value inclusion thresholds (.000001, .000005, .00001, .00004, .0001, .0005, .001, .005, .01, .05, .1) to generate, for each score, a set of variants that are independent in terms of LD. Each PRS was then constructed using effect alleles and weights based on the magnitude of the effect of each genetic variant on nsCL/P, taken from the meta-analysis GWAS summary statistics. LD clumping was performed using PLINK (Purcell et al., 2007) with the 1000 Genomes (Phase 3; Genomes Project Consortium, 2015) CEU samples as the reference panel.
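Conceptually, clumping is greedy: take the most significant remaining SNP below the p value threshold as an index SNP, discard nearby SNPs in LD with it, and repeat. The sketch below illustrates that logic in the spirit of PLINK clumping, not PLINK's exact implementation. The `r2` lookup stands in for LD computed from a reference panel such as 1000 Genomes CEU; the SNP names, positions and LD values are toy placeholders.

```python
def clump(snps, r2, p_threshold, r2_threshold=0.1, window_kb=250):
    """Greedy LD clumping.

    snps: list of dicts with keys "id", "chrom", "pos" (bp) and "p".
    r2: callable (id_a, id_b) -> LD r^2 from a reference panel.
    Returns the retained index SNP IDs, most significant first.
    """
    candidates = sorted((s for s in snps if s["p"] < p_threshold), key=lambda s: s["p"])
    removed, index = set(), []
    for lead in candidates:
        if lead["id"] in removed:
            continue
        index.append(lead["id"])
        for other in candidates:
            if other["id"] in removed or other["id"] in index:
                continue
            near = (other["chrom"] == lead["chrom"]
                    and abs(other["pos"] - lead["pos"]) <= window_kb * 1000)
            # Drop SNPs in the window whose LD with the index SNP reaches the threshold.
            if near and r2(lead["id"], other["id"]) >= r2_threshold:
                removed.add(other["id"])
    return index

# Toy LD values (hypothetical); unlisted pairs are treated as r^2 = 0.
TOY_LD = {
    frozenset(("snp1", "snp2")): 0.60,
    frozenset(("snp1", "snp3")): 0.05,
    frozenset(("snp2", "snp3")): 0.02,
}

def toy_r2(a, b):
    return TOY_LD.get(frozenset((a, b)), 0.0)

example_snps = [
    {"id": "snp1", "chrom": 1, "pos": 1_000_000, "p": 1e-8},
    {"id": "snp2", "chrom": 1, "pos": 1_100_000, "p": 1e-5},
    {"id": "snp3", "chrom": 1, "pos": 1_200_000, "p": 1e-4},
    {"id": "snp4", "chrom": 1, "pos": 2_000_000, "p": 1e-6},
    {"id": "snp5", "chrom": 2, "pos": 500_000, "p": 0.2},
]
print(clump(example_snps, toy_r2, p_threshold=0.1))  # ['snp1', 'snp4', 'snp3']
```

Running the same function at each of the 11 inclusion thresholds yields nested SNP sets, which is why liberal thresholds produce scores with thousands of SNPs.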
Next, the 11 nsCL/P PRS were constructed separately in the European and North American OC/OPC case-control subsamples using PLINK (Purcell et al., 2007), with an individual's PRS defined as the sum, across all variants in the score, of each variant's nsCL/P weight multiplied by the number of effect alleles the individual carries. The associations between the nsCL/P PRS and OC/OPC case-control status were then estimated in the two subsamples using logistic regression adjusting for the first 10 genetic principal components, sex and age. Analyses were run separately for all cases, OC cases only and oropharyngeal cases only (OPC). As a sensitivity analysis, we additionally removed cases and controls with less than 70% CEU ancestry. The effect sizes, standard errors and p values from the European and North American subsamples were meta-analysed using a fixed-effects model.
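The scoring and association steps can be written out explicitly: PRS_i = sum_j w_j * g_ij, where g_ij is the number of effect alleles (0, 1 or 2) individual i carries at SNP j, followed by a logistic regression of case status on the standardised score and covariates, so that exp(coefficient) is the OR per SD of the PRS. The sketch below is a NumPy illustration on simulated data, not the PLINK pipeline used in the study. All genotypes, weights, covariates and the simulated OR of 1.09 are assumptions, and the 10 "principal components" are random placeholders.

```python
import numpy as np

def polygenic_score(dosages, weights):
    """PRS_i = sum_j w_j * g_ij, with g_ij the effect-allele count (0, 1, 2)."""
    return dosages @ weights

def logistic_fit(X, y, n_iter=25):
    """Maximum-likelihood logistic regression via Newton-Raphson.

    X must already contain an intercept column. Returns coefficients and
    standard errors from the inverse information matrix.
    """
    beta = np.zeros(X.shape[1])
    for _ in range(n_iter):
        prob = 1.0 / (1.0 + np.exp(-X @ beta))
        info = X.T @ (X * (prob * (1.0 - prob))[:, None])   # Fisher information
        beta = beta + np.linalg.solve(info, X.T @ (y - prob))
    prob = 1.0 / (1.0 + np.exp(-X @ beta))
    info = X.T @ (X * (prob * (1.0 - prob))[:, None])
    return beta, np.sqrt(np.diag(np.linalg.inv(info)))

rng = np.random.default_rng(2024)
n_people, n_snps = 20_000, 200
freqs = rng.uniform(0.1, 0.9, n_snps)
dosages = rng.binomial(2, freqs, size=(n_people, n_snps)).astype(float)
weights = rng.normal(0.0, 0.05, n_snps)          # stand-in for GWAS log-odds weights
prs = polygenic_score(dosages, weights)
prs_sd = (prs - prs.mean()) / prs.std()          # one unit = one SD of the score

age = rng.normal(60.0, 10.0, n_people)
sex = rng.integers(0, 2, n_people).astype(float)
pcs = rng.normal(size=(n_people, 10))            # placeholders for 10 genetic PCs
true_log_or = np.log(1.09)                       # simulated effect per SD
linear = -0.5 + true_log_or * prs_sd + 0.02 * (age - 60.0) + 0.2 * sex
status = (rng.random(n_people) < 1.0 / (1.0 + np.exp(-linear))).astype(float)

X = np.column_stack([np.ones(n_people), prs_sd, age - age.mean(), sex, pcs])
beta, se = logistic_fit(X, status)
or_sd = np.exp(beta[1])
lo, hi = np.exp(beta[1] - 1.96 * se[1]), np.exp(beta[1] + 1.96 * se[1])
print(f"OR per SD of PRS: {or_sd:.3f} (95% CI {lo:.3f}, {hi:.3f})")
```

Standardising the score before fitting is what makes the reported ORs comparable across the 11 thresholds, since raw scores built from 10 or 10,000 SNPs are on very different scales.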

| The association of nsCL/P PRS with OC/OPC in UK Biobank
The UK Biobank data set was used to follow up the analyses in the IARC OC/OPC GWAS data set. In UK Biobank, we tested the association with OC/OPC of the nsCL/P PRS that was most strongly associated with OC/OPC in the IARC data set. The limited number of OC/OPC cases (N = 750) meant that it was not possible to stratify by OC/OPC subtype, so all cases were analysed together. All analyses were adjusted for the first 10 genetic principal components, age and sex.
2.2.3 | Mendelian randomization analysis of liability to nsCL/P on OC/OPC
Mendelian randomization was then applied to evaluate a possible causal relationship between liability to nsCL/P and OC/OPC. As genetic instruments for liability to nsCL/P, we used six genome-wide significant single nucleotide polymorphisms (SNPs), as in a previous study (Howe et al., 2018; Ludwig et al., 2012; Table S2). These variants have large effects on nsCL/P risk (Ludwig et al., 2012), suggesting that they can be used as effective genetic instruments. Although recent nsCL/P GWAS have identified additional genome-wide significant markers in European populations (Leslie et al., 2017; Ludwig et al., 2016), we were limited by the available summary data. SNP data for liability to nsCL/P were extracted from the nsCL/P meta-analysis GWAS summary statistics.
SNP data for OC/OPC were extracted from a meta-analysis of the IARC GWAS sample (European and North American subsamples; Lesseur et al., 2016) and UK Biobank. The GWAS in the IARC sample has been described in detail previously (Lesseur et al., 2016). In UK Biobank, the GWAS of 750 cases and 336,319 controls was conducted using logistic regression in PLINK v2.00 (Purcell et al., 2007), with sex, age and the first 10 principal components fitted as covariates. The GWAS summary data from all data sets were meta-analysed using METAL (Willer, Li, & Abecasis, 2010) to generate combined summary statistics for OC/OPC.
We then used the TwoSampleMR package in R to conduct Mendelian randomization analyses, reporting the inverse-variance weighted (IVW) estimate as the primary analysis. We also performed several sensitivity analyses: MR Egger, weighted median, weighted mode and Cochran's Q heterogeneity test (Hemani, Bowden, & Davey Smith, 2018).
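The estimators listed above need only per-SNP summary statistics: bx (SNP effect on nsCL/P, log odds), by (SNP effect on OC/OPC, log odds) and sy (standard error of by). The following is a minimal pure-Python illustration, not the TwoSampleMR implementation. It covers IVW with first-order weights (beta_IVW = sum(bx*by/sy^2) / sum(bx^2/sy^2)), Cochran's Q over the per-SNP Wald ratios, MR-Egger as a weighted regression of by on bx with an intercept after orienting SNPs so bx > 0, and an interpolated weighted median of Wald ratios; weighted mode is omitted. The heterogeneity-inflated standard error is an assumption of this sketch, and the six toy SNP effects are hypothetical, not the study's instruments.

```python
import math

def ivw(bx, by, sy):
    """IVW estimate with first-order weights.

    Returns (estimate, fixed-effect SE, SE inflated for heterogeneity, Cochran's Q).
    Q can be compared with a chi-squared distribution on k - 1 df.
    """
    w = [(x / s) ** 2 for x, s in zip(bx, sy)]      # 1 / var(Wald ratio), first order
    ratios = [y / x for x, y in zip(bx, by)]         # per-SNP Wald ratios
    est = sum(wj * rj for wj, rj in zip(w, ratios)) / sum(w)
    se_fixed = 1.0 / math.sqrt(sum(w))
    q = sum(wj * (rj - est) ** 2 for wj, rj in zip(w, ratios))
    sigma = math.sqrt(q / (len(bx) - 1)) if len(bx) > 1 else 1.0
    return est, se_fixed, se_fixed * max(1.0, sigma), q

def mr_egger(bx, by, sy):
    """Weighted regression of by on bx with intercept; returns (slope, intercept)."""
    xs = [abs(x) for x in bx]                        # orient so every bx is positive
    ys = [y if x >= 0 else -y for x, y in zip(bx, by)]
    w = [1 / s ** 2 for s in sy]
    sw = sum(w)
    xbar = sum(wj * x for wj, x in zip(w, xs)) / sw
    ybar = sum(wj * y for wj, y in zip(w, ys)) / sw
    sxy = sum(wj * (x - xbar) * (y - ybar) for wj, x, y in zip(w, xs, ys))
    sxx = sum(wj * (x - xbar) ** 2 for wj, x in zip(w, xs))
    slope = sxy / sxx
    return slope, ybar - slope * xbar                # intercept indicates directional pleiotropy

def weighted_median(bx, by, sy):
    """Interpolated weighted median of Wald ratios."""
    pairs = sorted((y / x, (x / s) ** 2) for x, y, s in zip(bx, by, sy))
    ratios = [r for r, _ in pairs]
    weights = [w for _, w in pairs]
    total, cum, positions = sum(weights), 0.0, []
    for w in weights:
        cum += w
        positions.append((cum - w / 2) / total)
    if positions[0] >= 0.5:
        return ratios[0]
    for j in range(1, len(ratios)):
        if positions[j] >= 0.5:
            frac = (0.5 - positions[j - 1]) / (positions[j] - positions[j - 1])
            return ratios[j - 1] + frac * (ratios[j] - ratios[j - 1])
    return ratios[-1]

# Toy summary statistics for six hypothetical instruments (not the study's data).
bx = [0.45, 0.38, 0.30, 0.52, 0.28, 0.33]
by = [0.010, -0.015, 0.004, -0.006, 0.012, -0.003]
sy = [0.020, 0.021, 0.019, 0.022, 0.020, 0.020]
est, se_f, se_m, q = ivw(bx, by, sy)
slope, intercept = mr_egger(bx, by, sy)
print(f"IVW OR {math.exp(est):.3f} (SE {se_m:.3f}), Q = {q:.2f}")
print(f"MR-Egger slope {slope:.3f}, intercept {intercept:.4f}")
print(f"Weighted median {weighted_median(bx, by, sy):.3f}")
```

Because bx is on the log odds scale of nsCL/P, exp(estimate) is read as the OR for OC/OPC per 1-unit increase in the log odds of liability to nsCL/P.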

| nsCL/P PRS, alcohol consumption, and tobacco smoking
We tested the nsCL/P PRS (including SNPs with p < .1) for association with units of alcohol consumed per week and pack years of tobacco smoking. All analyses were adjusted for sex, age and the first 10 genetic principal components.

| nsCL/P PRS and risk of OC/OPC in the IARC sample
We found strong evidence for an association between nsCL/P PRS and increased risk of OC/OPC in the IARC sample of 5,048 cases and 5,450 controls. The strongest association was for the nsCL/P PRS including SNPs with p < .1, where a 1 standard deviation (SD) increase in nsCL/P PRS was associated with increased odds of OC/OPC (odds ratio [OR]: 1.09; 95% confidence interval [CI]: 1.04, 1.13; p = .000053). PRS with more liberal inclusion thresholds (e.g., p < .05 and p < .1), which included thousands of SNPs, were more strongly associated with risk of OC/OPC than PRS with more conservative inclusion thresholds (Figure 1 and Table 1).

| nsCL/P PRS and risk of OC/OPC in UK Biobank
In the UK Biobank sample of 750 cases and 336,319 controls, we attempted to replicate the association with OC/OPC of the nsCL/P PRS most strongly associated with OC/OPC in the IARC sample (p value inclusion threshold < .1). Here, we found no clear evidence that this PRS is associated with increased risk of OC/OPC (OR: 1.02; 95% CI: 0.95, 1.10; p = .55). However, the confidence intervals overlapped with the IARC sample estimate (OR: 1.09; 95% CI: 1.04, 1.13), and a meta-analysis of the two estimates suggested evidence for an association (OR: 1.07; 95% CI: 1.03, 1.11; p = .0009).
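The pooled estimate can be checked approximately from the two published estimates. Assuming each 95% CI is symmetric on the log scale, SE = (ln upper - ln lower) / (2 x 1.96), and a fixed-effect inverse-variance model (an assumption here) combines the two log ORs. Because the inputs are rounded to two decimals, this is only a rough check and the p value is not recomputed.

```python
import math

def log_or_and_se(odds_ratio, lower, upper, z=1.96):
    """Recover log(OR) and its SE from an OR and a log-symmetric 95% CI."""
    return math.log(odds_ratio), (math.log(upper) - math.log(lower)) / (2 * z)

def fixed_effect(estimates):
    """Fixed-effect inverse-variance meta-analysis of (log OR, SE) pairs."""
    weights = [1 / se ** 2 for _, se in estimates]
    pooled = sum(w * b for w, (b, _) in zip(weights, estimates)) / sum(weights)
    return pooled, math.sqrt(1 / sum(weights))

iarc = log_or_and_se(1.09, 1.04, 1.13)   # IARC estimate reported above
ukb = log_or_and_se(1.02, 0.95, 1.10)    # UK Biobank estimate reported above
beta, se = fixed_effect([iarc, ukb])
print(f"pooled OR = {math.exp(beta):.2f} "
      f"(95% CI {math.exp(beta - 1.96 * se):.2f}, {math.exp(beta + 1.96 * se):.2f})")
# pooled OR = 1.07 (95% CI 1.03, 1.11)
```

The IARC estimate receives roughly three times the weight of the UK Biobank estimate because its confidence interval is much narrower, which is why the pooled OR sits close to 1.09.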

| Mendelian randomization: Liability to nsCL/P and risk of OC/OPC
Using combined GWAS summary data from the IARC and UK Biobank samples (5,798 cases and 341,769 controls), we found little evidence for an effect of liability to nsCL/P on OC/OPC: a 1-unit increase in the log odds of liability to nsCL/P was not strongly associated with OC/OPC (MR IVW OR: 0.99; 95% CI: 0.95, 1.05; p = .84). Mendelian randomization sensitivity analyses generated consistent effect estimates (Table 2).

| nsCL/P PRS, alcohol consumption and tobacco smoking
To evaluate whether shared genetic effects could relate to interactions of the nsCL/P PRS with shared environmental risk factors, we investigated the genetic overlap between the PRS and measures of alcohol consumption and tobacco smoking in UK Biobank. Again, we used the nsCL/P PRS most strongly associated with OC/OPC in the IARC sample (p < .1). We did not find strong evidence that the nsCL/P PRS was associated with either measure. A 1 SD increase in nsCL/P PRS corresponded to 0.02 fewer pack years of lifetime smoking (95% CI: −0.07, 0.03; p = .46; N = 284,144) and 0.05 fewer units of alcohol per week (95% CI: −0.11, 0.01; p = .11; N = 336,026).

| DISCUSSION
In this study, we found some evidence to suggest that nsCL/P PRS are associated with modestly increased risk of OC/OPC. Follow-up Mendelian randomization analyses using well-established nsCL/P risk SNPs as proxies for latent liability to nsCL/P suggested no consistent association between key common nsCL/P variants and OC/OPC. These findings are consistent with our initial hypothesis that nsCL/P and OC/OPC share some genetic risk factors, as opposed to the possibility of a causal relationship, which has previously been demonstrated for nsCL/P and philtrum width (Howe et al., 2018).
The specific shared genetic influences and biological pathways that may underlie the association between the nsCL/P PRS and OC/OPC are currently unclear. While the Mendelian randomization approach includes only replicated variants with strong associations, the PRS approach is much more liberal and includes many variants with weaker associations. The increased variation explained by the PRS approach comes with the caveat that it includes many variants of small effect whose direct relevance to nsCL/P is less clear. The interpretation of what a PRS is proxying for becomes increasingly complex as the number of SNPs included in the score increases. Indeed, the nsCL/P PRS most strongly associated with OC/OPC included over 10,000 SNPs. One possibility is that the genetic overlap may be attributable to certain genes being involved both in early development and in tumour suppression. For example, the CDH1 gene, thought to be related to nsCL/P and cancer subtypes, has been shown to be involved both in axonal growth and patterning in the developing murine brain (Konishi, Stegmüller, Matsuda, Bonni, & Bonni, 2004) and in tumour suppression (Berx, Becker, Höfler, & Van Roy, 1998). Another possibility is that the nsCL/P PRS may capture environmental influences relating to both nsCL/P and cancer risk. Maternal behaviours such as cigarette smoking and alcohol consumption have been hypothesised to influence nsCL/P risk, and both are established risk factors for OC/OPC.
FIGURE 1 Association of nsCL/P PRS with OC/OPC in the International Agency for Research on Cancer sample. A forest plot showing the associations between different nsCL/P PRS, ranging from 10 SNPs to over 10,000 SNPs, and OC/OPC risk. nsCL/P, nonsyndromic cleft lip/palate; OC, oral cavity; OPC, oropharynx; PRS, polygenic risk score; SNP, single nucleotide polymorphism
Given that the maternal genotype is correlated with the offspring genotype, it is plausible that nsCL/P PRS could capture effects of maternal environmental factors. The foetal genotype may also influence the effect of environmental exposures on risk of orofacial clefts. For example, there is some evidence that maternal and foetal ADH1C haplotypes may modify the association between maternal alcohol consumption and risk of orofacial clefts via alcohol metabolism (Boyles et al., 2010). We did not find strong evidence that nsCL/P PRS are associated with alcohol and cigarette use in UK Biobank, but we were unable to test pathways relating to metabolism.
The findings of this study are consistent with previous genetic studies that have found evidence of loci associated with both nsCL/P and various cancer subtypes (Dunkhase et al., 2016; Hozyasz et al., 2014; Letra et al., 2009, 2012; Machado et al., 2017; Mostowska et al., 2012; Vogelaar et al., 2012). Our results are also consistent with a previous study exploring adult-onset cancers in nsCL/P cases, which was underpowered to detect a modest effect (Bille et al., 2005).
This study is the first to investigate the genetic overlap between nsCL/P and OC/OPC. Previous epidemiological and genetic studies have explored the relationship between nsCL/P and all cancers; however, cancers arising from different organs may be aetiologically heterogeneous. Similar results were found in the OC and OPC subtype analyses, although these subtypes are also aetiologically heterogeneous. In Western populations, smoking and alcohol are established risk factors for both OC and OPC, while human papillomavirus (HPV) infection is thought to be a risk factor for OPC only (Lesseur et al., 2016). Future work could evaluate genetic overlap between nsCL/P and other cancer subtypes to determine whether the association is consistent across cancer subtypes or specific to oral tissues.
The use of a PRS as a genetic proxy for nsCL/P in our analyses is a considerable strength. PRS analyses have advantages over the candidate gene or candidate SNP approaches used in previous studies (Dunkhase et al., 2016; Hozyasz et al., 2014; Letra et al., 2009, 2012; Machado et al., 2017; Mostowska et al., 2012; Vogelaar et al., 2012) because they reduce the number of statistical tests and extend evidence of genetic overlap at specific loci to an examination of genome-wide genetic overlap. Although the nsCL/P GWAS data set was modestly sized, constructing the nsCL/P PRS in a much larger OC/OPC GWAS data set meant that analyses were well powered to detect a modest genetic overlap.
Nevertheless, this study has several limitations. First, the lack of a convincing replication of the PRS analysis in UK Biobank weakens the argument for genetic overlap between the two phenotypes. However, this may be because of the modest number of OC/OPC cases in UK Biobank, and the confidence intervals overlapped between the two studies. Second, the OC/OPC GWAS data set was highly ancestrally heterogeneous and included samples from 12 different epidemiological studies, including a case-only study (the Head and Neck 5000). Therefore, allele frequency differences between cases and controls relating to population differences could result in spurious associations with the nsCL/P PRS, although this is unlikely given the number of SNPs in the PRS. Third, because there were insufficient genetic instruments for OC/OPC, we did not perform a bidirectional Mendelian randomization analysis, so we cannot rule out the possibility that OC/OPC variants affect nsCL/P risk. Fourth, we were unable to run stratified analyses investigating possible heterogeneity between HPV-driven and non-HPV-driven tumours. Fifth, we treated nsCL/P subtypes (cleft lip with cleft palate, cleft lip only) as a single phenotype despite growing evidence that they are aetiologically distinct (Leslie et al., 2017; Ludwig et al., 2016; Sharp et al., 2017). This complicates interpretation of the results because the oropharynx includes the soft palate, while the oral cavity includes the lips and hard palate. It is therefore possible that the different nsCL/P subtypes have different mechanistic relationships with the OC/OPC subtypes. Finally, we were unable to explore the effects of somatic mutations in oral tissues, which could influence both orofacial cleft risk and localised cancer.
To conclude, we found some evidence of shared genetic influences on nsCL/P and OC/OPC that did not appear to be explained by alcohol or tobacco use. Follow-up analyses, potentially using additional data sets such as the Cleft Collective (www.bristol.ac.uk/dental/cleft-collective/), are required to investigate the possible common biological pathways between nsCL/P and OC/OPC, and to evaluate possible mechanistic relationships between nsCL/P and cancer subtypes affecting anatomical sites distinct from orofacial clefts.
R01-DE-013939 (Scott). Parts of this study were supported by the Intramural Research Program of the NIH, National Institute of Environmental Health Sciences (Wilcox, Lie), and additional recruitment was supported by the Smile Train Foundation for recruitment in China (Jabs, Beaty, Shi) and a grant from the Korean government (Jee). The genome-wide association study, also known as the Cleft Consortium, is part of the Gene Environment Association Studies (GENEVA) program of the trans-NIH Genes, Environment and Health Initiative (GEI) supported by U01-DE-018993. Genotyping services were provided by the Center for Inherited Disease Research (CIDR). CIDR is fully funded through a federal contract from the National Institutes of Health (NIH) to The Johns Hopkins University, contract number HHSN268200782096C. Funds for genotyping were provided by the NIDCR through CIDR's NIH contract. Assistance with genotype cleaning, as well as with general study coordination, was provided by the GENEVA Coordinating Center (U01-HG-004446) and by the National Center for Biotechnology Information (NCBI). The Bonn nsCL/P GWAS was supported by the Deutsche Research Foundation (grant FOR 423, and individual grants MA 2546/3-1 to E.M. and LU 1944/3-1 to K.U.L.). Genotyping for the OC/OPC GWAS data set was performed at the Center for Inherited Disease Research (CIDR), funded through the U.S. National Institute of Dental and Craniofacial Research (NIDCR) grant 1X01HG007780-0.
Genotyping for shared controls with the Lung OncoArray initiative was funded through grant X01HG007492-0. The University of Pittsburgh head and neck cancer case-control study is supported by National Institutes of Health grants P50 CA097190 and P30 CA047904. The Carolina Head and Neck Cancer Study (CHANCE) was supported by the National Cancer Institute (R01-CA90731). The HN5000 study was funded by the National Institute for Health Research (NIHR) under its Programme Grants for Applied Research scheme (RP-PG-0707-10034). The views expressed in this publication are those of the author(s) and not necessarily those of the NHS, the NIHR or the Department of Health.