Identification of novel susceptibility loci for non‐syndromic cleft lip with or without cleft palate

Abstract Although several genome‐wide association studies (GWAS) of non‐syndromic cleft lip with or without cleft palate (NSCL/P) have been reported, more novel association signals are remained to be exploited. Here, we performed an in‐depth analysis of our previously published Chinese GWAS cohort study with replication in an extra dbGaP case‐parent trios and another in‐house Nanjing cohort, and finally identified five novel significant association signals (rs11119445: 3’ of SERTAD4, P = 6.44 × 10−14; rs227227 and rs12561877: intron of SYT14, P = 5.02 × 10−13 and 2.80 × 10−11, respectively; rs643118: intron of TRAF3IP3, P = 4.45 × 10−6; rs2095293: intron of NR6A1, P = 2.98 × 10−5). The mean (standard deviation) of the weighted genetic risk score (wGRS) from these SNPs was 1.83 (0.65) for NSCL/P cases and 1.58 (0.68) for controls, respectively (P = 2.67 × 10−16). Rs643118 was identified as a shared susceptible factor of NSCL/P among Asians and Europeans, while rs227227 may contribute to the risk of NSCL/P as well as NSCPO. In addition, sertad4 knockdown zebrafish models resulted in down‐regulation of sox2 and caused oedema around the heart and mandibular deficiency, compared with control embryos. Taken together, this study has improved our understanding of the genetic susceptibility to NSCL/P and provided further clues to its aetiology in the Chinese population.


| INTRODUC TI ON
Orofacial clefts are among the most common craniofacial birth defects worldwide with an overall prevalence of one per 700 live births. 1,2 Affected individuals face feeding difficulties and typically require multiple repair surgeries, therapeutic dental procedures and speech therapy throughout childhood. 3 Approximately 70% of orofacial clefts are non-syndromic orofacial clefts (NSOCs), which are commonly categorized as non-syndromic cleft lip with or without cleft palate (NSCL/P) and non-syndromic cleft palate only (NSCPO), due to the different developmental origins of the lip and palate. 4 The aetiology of NSCL/P is related to both genetic susceptibility and epidemiological risk factors such as maternal smoking and alcohol consumption. [5][6][7][8] In the past few years, genome-wide association studies (GWAS) have successfully identified thousands of loci associated with complex traits and diseases, including NSCL/P. To date, more than forty susceptibility loci have been reported to be associated with NSCL/P risk, which aid in understanding biological mechanisms involved in the risk of NSCL/P. [9][10][11][12][13][14] In our previous work, we conducted a case-control-based GWAS followed by two rounds of replication and identified five genome-wide significant common variant signals that influence the risk of NSCL/P. 13 However, the currently confirmed NSCL/P risk loci explain only a fraction of the heritability of NSCL/P. The extent of genetic contribution, including that attributable to common variants, remains largely unknown. One of the reasons is that the threshold of GWAS is very strict, which leads to high false negative. Therefore, studies focusing on SNPs with relatively moderate P values of GWAS were demonstrated to be helpful and useful in improving the understanding of the missing heritability of GWAS. For instance, by concentrating on the SNPs with relatively moderate P values in the GWAS, Lin et al identified additional loci by testing promising associations in an extended 3-stage validation consisting of 6053 coronary heart disease (CHD) cases and 7410 controls. 15 Wang et al selected 16 significant but unreplicated SNPs from stage 1 of a GWAS analysis to validate their association with the risk of NSCL/P and identified an independent locus in 10q25.3 that was associated with NSCL/P. 16 Furthermore, the database of Genotypes and Phenotypes (dbGaP) is a highly utilized tool for sharing individual-level data and summary-level data from GWAS, sequencing studies and other large-scale genomic studies, 17 which was widely used in a large number of researches to increase the understanding of the genetic architecture. 11,12,18 In the current study, to explore additional promising signals from our previously published Chinese GWAS cohort study, we performed an in-depth analysis of data from that study, focusing on the risk loci with P values ranging from 10 −3 to 10 −5 that did not reach genome-wide significance in the previous GWAS, and then followed by two replications in additional dbGaP case-parent trios and an in-house case-control cohort. We identified five novel significant association signals for NSCL/P. Among them, rs11119445, rs227227 and rs12561877 reached the genome-wide significance and rs643118 was a shared NSCL/P susceptibility variant between Asian and European populations. Then, we calculated the weighted genetic risk score (wGRS) of the susceptibility loci based on odds ratio of each variant from the replication cohort to assess the predictive ability. Furthermore, morphological defects in embryos were analysed to reveal the potential functional role of genes during zebrafish embryogenesis.

| Primary GWAS data
As shown in Table 1 and previously reported, 13 the primary GWAS data consisted of two independent cohorts which were respectively derived from Huaxi (504 NSCL/P cases and 455 newborn controls) and Nanjing (354 NSCL/P cases and 793 controls). All samples from the Huaxi cohort and NSCL/P cases from the Nanjing cohort were genotyped using Affymetrix Axiom Genome-Wide CHB1 and CHB2 arrays by the CapitalBio corporation (1,280,786 single nucleotide polymorphisms, SNPs); the controls from the Nanjing cohort were from a previous study 15  and to improve coverage of the genome, we used imputed data as a control for the Nanjing cohort. The principal component analysis (PCA) in discovery stage indicated that the cases and controls were genetically matched, without evidence of gross population stratification, which has been described previously. 13 After the basic quality control, we extracted best-guess genotype data for SNPs with imputation quality info >0.8 and minor allele frequency (MAF)>0.05 of sex-matched individuals and combined them with the genotype data of the Nanjing cases. 13

| Replication samples
The imputed data of the International Consortium to Identify Genes

| DNA extraction and genotyping (second-stage replication)
Genomic DNA from the second-stage replication cohort was isolated from peripheral blood lymphocytes of all subjects using the conventional phenol-chloroform method and the Qiagen Blood kit.
TaqMan assays (Applied Biosystems) were performed to genotype all the samples. Genotype analysis was performed by investigators blinded to the case/control status. Approximately 5% of the samples were randomly selected for repeated analysis. Further information on the primers and probes is shown in Table S1.

| Weighted genetic risk scores
To develop a risk scoring system based on genetic markers and assess their predictive ability, we used five susceptibility SNPs (rs11119445, rs227227, rs12561877, rs643118 and rs2095293) to calculate weighted genetic risk score (wGRS) values in the secondstage replication. The wGRS was calculated by multiplying the number of risk alleles for each SNP by its weight according to the following formula: where k is the number of SNP replicates in the study, which equals 5; βi is the weight of each SNP, which is the natural log of the odds ratio for each allele; and SNPi is the number of copies of the risk allele (0, 1 or 2).

| Gene expression during mouse craniofacial development and human dental pulp stem cell cultures (DPSCs)
Gene expression during growth and fusion of the facial prominences in the C57BL/6J mouse strain during embryonic days (E) 10.5-14.5 were downloaded from the GEO data set (GSE67985). 20 Processed microarray expression data from dental pulp stem cell cultures (DPSCs) of NSCL/P patients (N = 7) and controls (N = 6) were searched from EMBL-EBI (E-GEOD-42589) to assess differences in expression levels for the associated genes. 21

| Microinjection of morpholino oligos
The translation-blocking morpholino antisense oligonucleotide (MO) against sertad4 and standard control MO were

| CRISPR/Cas9-mediated genome editing (crispant) of sertad4 in zebrafish embryos
The zebrafish sertad4 gene sequences were obtained from the zebrafish information network (www.zfin.org). Single guide RNAs (sgRNAs) were designed using the CRISPRscan algorithm 22  For the CRISPR/Cas9 microinjection, Cas9 protein (GenScript, Z03388-100) and sgRNA mix were prepared and zebrafish embryos were injected directly with 200 ng/μL and 100 ng/μL of Cas9 and sgRNA per embryo, respectively. To confirm genome editing, direct sequencing of PCR products was applied.

| Plasmid construction
The coding sequence (CDS) of human and zebrafish nr6a1 was amplified and cloned into pXT7-NR6A1-Human and pXT7-nr6a1a-Zebrafish between the restriction enzyme site EcoRI and XhoI, respectively. All constructs were confirmed by Sanger sequencing. For the over-expression experiment, 50 pg human NR6A1 or zebrafish nr6a1 mRNAs per embryo was injected into one-cell stage embryos.

| Statistical analyses
During the discovery stage, we used a fixed-effects inverse variance method when there was no indication of heterogeneity; otherwise, we adopted a random-effect model for the corresponding SNPs. The P value for heterogeneity was calculated using Cochran's Q, and the proportion of the total variation was quantified by I 2 statistic. In the replication stage, the association between each SNP and NSCL/P risk in case-parent trios was evaluated via the transmission disequilibrium test (TDT). The demographic characteristics of cases and controls were analysed by the chi-squared (χ 2 ) test.

| Study overview
To discover additional susceptibility variants for NSCL/P in the Chinese population, we first conducted a meta-analysis of two previously published GWAS, totalling 2,106 individuals from the Chinese population and 842,556 genetic variants that passed quality control (Table 1 and Figure 1).
Then, based on the selection criteria (see Materials and Methods), a total of 391 SNPs were chosen for replication (Table S2) in the Asian group of the dbGaP database, and 6 SNPs were selected and further replicated in an independent cohort of 1,050 cases and 919 controls.

| The prediction value of the identified variants for NSCL/P
The wGRS was based on the ORs reported for the cumulative effect of multiple genetic risk variants, which were calculated from F I G U R E 2 Regional plot of the five newly identified SNPs.

| Associations between SNPs and risk of NSCL/P across racial groups
We further evaluated the above six genetic variants in NSCL/P case-parent trios in the dbGaP database across racial groups.
Rs643118 was significantly associated with an increased risk of NSCL/P in Europeans (P = 2.76 × 10 −3 ), which was consistent with the findings in the Asian populations. However, the other five NSCL/P risk SNPs selected in the Asian group of the dbGaP database were not replicated in populations of European ancestry ( Figure 3).

| Annotation and functional assessment of genetic variants
As predicted by HaploReg v4.1, rs11119445 and rs12561877 have regulatory effects on the transcriptional enhancer factor-1 (TEF-1) motifs (Table S3). The 3D chromatin looping data in blood demonstrated that the interacting genes of rs11119445 are SERTAD4 and SERTAD4-AS1 ( Figure S2). We conducted eQTL analysis based on the GTEx database and found that rs643118 exhibited a significant association with the expression of TRAF3IP3 (P = 1.60 × 10 −12 ) in whole blood ( Figure S3).

| Gene expression in mouse craniofacial structures and human DPSCs
We examined the gene expression in the proximal and distal maxilla of mice during embryonic E10.5-E14.5 period ( Figure S4). The expression levels of the genes (Sertad4, Syt14, Traf3ip3, Nr6a1) vary greatly over that period. Comparison of Nr6a1 expression in the proximal and distal parts of the maxilla and mandible showed different expression patterns relative to the other three genes. Then, we compared microarray expression data from DPSCs of NSCL/P cases (n = 7) and controls (n = 6) and found that SERTAD4 was significantly down-regulated (P = .039), while NR6A1 was significantly up-regulated in the DPSCs of NSCL/P patients (P = .008, Figure S5).

| Effects of candidate genes in zebrafish embryo models
To explore the functional roles of the relevant genes during zebrafish embryogenesis, the zebrafish embryos after microinjection or zebrafish nr6a1a did not exhibit any significant abnormalities ( Figure 4B).

| D ISCUSS I ON
The present study investigated the additional promising signals re- Interestingly, of the five NSCL/P risk variants, only rs643118 was a susceptibility variant shared between Asian and European populations, suggesting obvious genetic heterogeneity between these two populations. Furthermore, we generated zebrafish models of the candidate genes based on the existing databases to explore the functional roles of the genes during embryogenesis.
The genetic region, 1q32.2, was demonstrated by multiple GWAS to be associated with NSCL/P and NSCPO. The LD between our newly identified SNPs (rs11119445, rs227227, rs12561877 and rs643118) and other SNPs in this region reported to be associated with NSCL/P is low (r 2 < .5 in 1000 Genomes Project data from an Asian population) (Table S4). Interestingly, rs227227 was in moderate LD with rs2485893 (r 2 = .6, 10 kb 3' of SYT14) whose G allele was associated with NSCPO among Western Han Chinese, 28 indicating that rs227227 may be a shared susceptibility factor for NSCL/P and NSCPO among the Chinese population. Together, these results indicate that our finding represents novel significant association signals and illustrates the complex genetic architecture of 1q32.2.
The associated SNPs at the 1q32.2 locus span 8.7 kb 3' of SERTAD4, SYT14 and TRAF3IP3. We evaluated the LD pattern of the significant SNPs at the 1q32.2 and found that almost all of the SNPs that in high LD with them were in these three genes on the Chr 1. Little is known about SERTAD4 in the craniofacial development, which is a conserved orthologue of the SERTA domain family.
Proteins containing the SERTA domain have previously been linked to cell cycle progression and chromatin remodelling. 29 The sertad4 knockdown in our zebrafish model induced mandibular deficiency, heart failure and down-regulation of Sox2 protein which had been implicated in various processes of early embryogenesis. 27,30 SYT14 participates in pathomechanical neurodegeneration and contributes to abnormal neurodevelopment. 31 Previous studies found that the expression of Syt14 was highly restricted to the mouse heart and testis but absent in the brain, suggesting that Syt14 may be involved in membrane trafficking in specific tissues other than the brain. 32 RNA-mediated SYT14 knockdown can inhibit proliferation and colony formation and promote apoptosis of glioma cells. 33 TRAF3IP3 is expressed in the immune system and participates in cell maturation, tissue development and immune response. 34 It was reported to be one of the network hubs, which suggested a potential role in the head and neck squamous cell carcinoma evolution mechanisms related to inflammation and the microenvironment. 35 Our study identified a new risk locus at 9q33.

CO N FLI C T O F I NTE R E S T
The authors declare no conflicts of interest.