A targeted sequencing panel identifies rare damaging variants in multiple genes in the cranial neural tube defect, anencephaly

Neural tube defects (NTDs) affecting the brain (anencephaly) are lethal before or at birth, whereas lower spinal defects (spina bifida) may lead to lifelong neurological handicap. Collectively, NTDs rank among the most common birth defects worldwide. This study focuses on anencephaly, which despite having a similar frequency to spina bifida and being the most common type of NTD observed in mouse models, has had more limited inclusion in genetic studies. A genetic influence is strongly implicated in determining risk of NTDs and a molecular diagnosis is of fundamental importance to families both in terms of understanding the origin of the condition and for managing future pregnancies. Here we used a custom panel of 191 NTD candidate genes to screen 90 patients with cranial NTDs (n = 85 anencephaly and n = 5 craniorachischisis) with a targeted exome sequencing platform. After filtering and comparing to our in‐house control exome database (N = 509), we identified 397 rare variants (minor allele frequency, MAF < 1%), 21 of which were previously unreported and predicted damaging. This included 1 frameshift (PDGFRA), 2 stop‐gained (MAT1A; NOS2) and 18 missense variations. Together with evidence for oligogenic inheritance, this study provides new information on the possible genetic causation of anencephaly.

Neural tube defects (NTDs) affecting the brain (anencephaly) are lethal before or at birth, whereas lower spinal defects (spina bifida) may lead to lifelong neurological handicap. Collectively, NTDs rank among the most common birth defects worldwide. This study focuses on anencephaly, which despite having a similar frequency to spina bifida and being the most common type of NTD observed in mouse models, has had more limited inclusion in genetic studies.
A genetic influence is strongly implicated in determining risk of NTDs and a molecular diagnosis is of fundamental importance to families both in terms of understanding the origin of the condition and for managing future pregnancies. Here we used a custom panel of 191 NTD candidate genes to screen 90 patients with cranial NTDs (n = 85 anencephaly and n = 5 craniorachischisis) with a targeted exome sequencing platform. After filtering and comparing to our in-house control exome database (N = 509), we identified 397 rare variants (minor allele frequency, MAF < 1%), 21 of which were previously unreported and predicted damaging. This NTDs are generally considered multifactorial with both genetic and environmental factors implicated in their etiology. Although most NTDs occur sporadically, there is a strong genetic component with a heritability estimate of up to 70%. 3 Despite this, the molecular basis in humans has proven difficult to resolve. This may be attributed to the high degree of heterogeneity between unrelated sporadic cases, incomplete penetrance and the general paucity of large individual families displaying Mendelian inheritance. The complex molecular requirements for normal neural tube closure are well illustrated by the occurrence of NTDs in more than 200 different mouse genetic models. 4 It has been estimated that up to 70% of potential NTD cases are preventable by the mother taking periconceptional supplements of folic acid. 5,6 However, evidence for causal mutations in genes regulating folate metabolism remain tenuous. 7 In humans, planar cell polarity (PCP) gene variants have been reported in heterozygous form throughout the NTD phenotypic spectrum. When tested, these variants are mostly found in an unaffected parent, prompting speculation regarding incomplete penetrance effects, which may be explained by co-inheritance of additional variants in other closely related or interacting genes that remain undetected using the methodologies employed. 8 This genetic interaction is evident in NTD mouse models doubly heterozygous for PCP genes, 9 suggesting that a comprehensive study of PCP genes in humans would be a valuable exercise.
The patient cohorts in most candidate gene studies contain mixed primary and secondary neurulation phenotypes, typically including only a small number of craniorachischisis or anencephalic cases. There is therefore a relative lack of information concerning the genetic basis of anencephaly in humans. In contrast, exencephaly, the developmental forerunner of anencephaly, is present in more than 90% of mouse mutants that transmit NTDs in a Mendelian fashion. It is also striking that the primary folate-sensitive NTD in mouse models is exencephaly, with less supporting evidence for primary prevention of mouse spina bifida by folic acid supplementation. 4,10,11 Even so, it is clear that both anencephaly and spina bifida can be prevented by folic acid in humans. 5,12 In addition, there is a marked excess in females in anencephaly and craniorachischisis compared to spina bifida. 1 Collectively, these data might suggest that genetic factors implicated in anencephaly and craniorachischisis could be different from those implicated in spina bifida cases.
In this study, we have carried out a comprehensive, targeted exome sequencing of fetuses with anencephaly using a panel of 191 genes with biological relevance to NTDs. Unlike the traditional gene-by-gene search method, this allowed us to test the hypothesis that multiple hits in different, but closely related genes may combine to provide an oligogenic explanation for NTD risk.

| NTD and healthy control human DNA samples
The sample group for this study comprised 85 fetuses with anencephaly and 4 with craniorachischisis that underwent pregnancy termination following prenatal diagnosis by ultrasound in the north east of England between 1992 and 2011. Although folic acid supplements were recommended from 1991 onwards there was no mandatory folate fortification of food for this population and folic acid supplement use, typically 400 μg/day, has been estimated to have an uptake of up to around 30%. 13

| Selection of NTD candidate genes
The targeted gene panel for NTDs included 191 genes that were selected by one of several criteria. These included genes that encode enzymes of folate one-carbon metabolism or components of the PCP or related cell polarity pathways that are implicated in normal neurulation in animal models. Additional members of these pathways were added even where a role in neurulation has not yet been directly indicated and also multiple gene family members, for example, CELSR1, CELSR2 and CELSR3, irrespective of whether there was a prior established link with NTDs. The panel design also included a number of prominent genes underlying NTDs in well-studied mouse models (eg, PAX3 and ZIC3; see gene panel list in Table S1 in Appendix S3, Supporting information).

| Target enrichment library preparation and sequencing
Custom capture libraries were generated from 3 μg gDNA using the SureSelectXT Reagent kit (Agilent Technologies, Santa Clara, California). RNA baits (

| Sequence alignment, variant annotation and filtering
An in-house pipeline was used for the sequence alignment, variant calling and annotation. FASTQ files were trimmed with Cutadapt 15 and the sequencing reads were aligned to the human reference genome (hg19) using the Burrows-Wheeler Aligner (v0.6.1-r104) 16
Rare variants were defined as minor allele frequency (MAF) <1% in the Exome Aggregation Consortium (ExAC) Database. 25 Variants that were specific to NTD patients and not present in the capture control (N = 6), in-house exome controls (N = 509) and in the dbSNP, 26 1000 Genomes Project, 27 ExAC, and the Genome Aggregation Database (gnomAD) 25 were labeled "Novel," and were verified by Sanger sequencing (Methods in Appendix S2).

| Mutation burden analysis
The burden of carrying multiple rare/novel variants within the 191 candidate genes was compared between the NTD and the ExAC cohorts. To allow an unbiased gene-level comparison, the sequencing coverage of the targeted regions was analyzed between the 2 cohorts.
The exon coordinates for the canonical transcripts of all 191 genes were extracted as a Browser Extensible Data (BED) file from the University of California Santa Cruz (UCSC) database, which was used as the reference sequence for target regions. The percentage coverage of the targeted bases at ≥×30 read depths was calculated using Picard (http://broadinstitute.github.io/picard/) for the NTD samples, and the coverage information for the ExAC data was downloaded from the ExAC database. The gene-level coverage between the cohorts was compared using Student's t test (2-tailed). Because the sample-level information is not available at the ExAC database, the number of each variant was compared in the burden analysis for consistency as described in D'Alessandro et al. 28 Only rare/novel variants predicted to be damaging were used for further analysis. Fisher's exact test (2-tailed) was used to assess the enrichment of rare/ novel variants on the genes showing comparable sequence coverage.
Bonferroni correction was performed by multiplying the each P-value with the number of genes used in the test (n = 13).

| RESULTS
On average, 97.5% of bases were covered at ≥×30 read depth across 191 genes, indicating a high targeting efficiency. A mean target coverage depth of ×110 was achieved with more than 96% of reads having Phred quality score greater than Q30 (corresponds to 99.9% base call accuracy). A total of 1380 variants, which passed quality control, were identified of which 1359 were single nucleotide variants (SNVs) and 21 were insertions/deletions (Indels) (Figure 1).

| Relatedness analysis
Principal component analysis identified 3 samples (52F03, 00F133 and 389F07) with higher variation than the rest of the samples which might be suggestive of different ethnic background ( Figure S1 in Appendix S1). Apart from the healthy control trios (child-motherfather) included in the capture sequencing, kinship analysis did not indicate relatedness between any of the samples ( Figure S2 in Appendix S1), including 2 NTD cases which carried an identical extremely rare variant in the CELSR1 gene (described below). Thr67Ile; rs28941785) in the cystathionine gamma-lyase (CTH) gene, which causes cystathioninuria in a recessive form. 33

| Unreported and rare variants
After filtering out intron, 3 0 and 5 0 UTR and synonymous variants, a total of 397 rare (MAF < 1%, including novel) variants were identified in 89/90 NTD cases from 110/191 of the panel of NTD candidate genes. Of these, 209 variants were predicted to be damaging by at least one of the mutation effect predictors described in Methods (summarized in Table S2 in Appendix S3). Following an additional filter to include our in-house exomes (n = 509) and publically available data (1000Genome, dbSNP, ExAC and gnomAD), there remained 21 novel variants that were predicted to be damaging (1 frameshift, 2 stop-gained and 18 missense; Table 1).
We identified 3 novel loss-of-function (LoF) variants. These  36 Disturbance in the methionine cycle has been found to cause NTDs in experimental models in the mouse. 31 However, clinical manifestations of MAT deficiency are variable, including no neurological abnormalities. 36 Also, a stop-gain variant c.1893C>A (p. Tyr631Ter) was detected in the nitric oxide synthase 2 (NOS2) gene ( Figure 2C). In ExAC, 20 LoF variants are reported in NOS2 with a pLI = 0, suggesting tolerance of LoF and low confidence of causal effect. However, NOS2 has been previously associated with a cranial NTD phenotype where A/G genotype of the rs4795067 SNP within NOS2 was shown to be associated specifically with increased cranial NTD risk. 37 One individual (283F06) was heterozygous for a novel missense variant in the catalytic N-terminal domain of the methylenetetrahydrofolate reductase (MTHFR) gene (c.601C>T; p.His201Tyr) ( Figure 2D), which was predicted to be damaging by all 6 mutation predictors tested (Table 1) (Table 1 and Table S2 in Appendix S3). SCRIB mutations have previously been implicated in human craniorachischisis. 40

| Genes harboring multiple novel and rare variants
In 51/191 genes we identified more than 1 novel and/or rare variants predicted to be damaging (Table S2 in Appendix S3). To assess for potential enrichment of rare/novel damaging variants in the NTD cohort (N = 90) compared to the ExAC controls (N = 60 706), a mutation burden analysis was carried out. To eliminate the possibility of identifying significant enrichment differences due to sequencing coverage variation between the cohorts, percentage coverage of the targeted regions at ≥30 read depth for all 191 genes were compared between the NTD cases and the ExAC controls. As expected, on average NTD cases showed higher percentage coverage, due to the nature of the targeted capture sequencing design. Therefore, further analysis was limited to genes showing percentage differences not greater than 3.5% between the cohorts. This resulted in 13 genes that were comparable to each other (97.5% in NTD cases vs 95.7% in controls; t test Pvalue = 0.13, 2-tailed), and 8 of these genes (COBL, FAT4, MTRR, PDGFRA, PRICKLE2, SALL4, TCN2 and TXN2) had at least 1 rare/novel variant predicted to be damaging which could be used for further analysis. Using Fisher's exact test, we identified 4 genes (COBL, FAT4, PDGFRA and TXN2) that showed significant enrichment in the NTD cases ( Table 2). Of these, COBL and FAT4 stayed statistically significant after the multiple test correction (Table S4 in Appendix S3).  predicted to be damaging. In the NTD capture controls (n = 4 unrelated), each carried 2 novel/rare variants on average, with approximately 1.5 of these predicted to be damaging (Table S3 in Appendix S3). Out of 90 NTD cases, 75 carried more than 1 novel/rare damaging variants involving 98 candidate genes (Table S2 in Appendix S3).
As can be seen from the diversity of the genes affected, there was

| DISCUSSION
In this study we have performed a comprehensive targeted exome sequencing analysis of NTD candidate genes in a cohort of unrelated fetuses with anencephaly and craniorachischisis, to identify genetic variants associated with these disorders. Few genetic studies have comprehensively investigated anencephaly, which may reflect the extra difficulty of collecting samples following termination rather than live born NTD cases, which are usually either spina bifida aperta, or closed spinal dysraphism. We have examined 191 genes previously implicated in either mouse or human NTDs, the majority of which have not been previously analyzed in a large cohort of anencephaly patients or studied for polygenic or burden effects.

| PDGFRA
A novel PDGFRA frameshift variant c.3029_3030delAG leading to premature stop codon (p.Arg1011ThrfsTer4) was identified in an anencephalic patient. PDGFRA is a cell-surface tyrosine kinase receptor, which has an essential role in embryonic development and cell proliferation. 41 Homozygous Pdgfra mutant mice are embryonic lethal presenting with a wavy neural tube and cranial region closure failure. 34 In humans, the PDGFRA-promoter haplotype H1 was found to confer low transcriptional activity and has been associated with increased NTD risk. 35 Therefore, it is possible that the reduced abun-

| Folate one-carbon metabolism
Despite the well-documented benefits of folic acid in NTD preven-  Abbreviations: CI, confidence interval; OR, Odds ratio; P-val, nominal P-values; P-adj, Bonferroni corrected P-values. Mutation burden analysis was performed for the 191 genes in the panel. Genes that showed significant differences are shown. Because sample-level information is not available at the ExAC database, the burden analysis was performed assuming that each variant serves as an independent sample. To allow unbiased selection, the genelevel coverage between the NTD cases and ExAC control was compared using Student's t test (2-tailed). The enrichment of the novel/rare variants predicted to be damaging were assessed using Fisher's exact test (2-tailed).
phenotype, Gldc deficient mice variably display NTDs and/or features of NKH. 11 We did not identify variants in other GCS genes, AMT or GCSH, after filtering for frequency (<1% MAF) and mutation prediction in our cohort.

| Burden analysis
To maximize the number of the cases to be analyzed, only a small number of controls were included in the sequencing platform. Therefore, to perform burden analysis, the public database (ExAC) was used. However, because of the lack of similarity in the sequence coverage in many of the genes between the NTD and control cohorts, we were only able to assess 13/191 genes. COBL encodes an actin regulator protein, and the Vangl2 Lp /Cobl C101/C101 mutant mice show exencephaly, indicating an interaction between these genes. 45 FAT4 encodes a protocadherin family protein that is associated with the PCP pathway. 46 A study of Fat4-null mice described impaired convergent extension with faulty ureteric tubule elongation. 46

| Study limitations
No parental samples were available for this study, excluding the possibility of identifying de novo mutations or evaluating penetrance or co-inheritance of risk alleles from each parent. Due to the lethality of this phenotype, it is also difficult to ascertain familial anencephalic cases.
Only DNA was available which excluded the possibility of testing gene/protein expression associations with the variants. As a candidate gene approach, we were unable to comprehensively include all genes that could be implicated in NTD causation. This can more easily be achieved using whole exome/genome sequencing (WES/WGS) in a hypothesis-free manner. However, recent WES studies 32,47 have still focused on analysis of known candidate genes. Our data and findings of these other studies concur that no major recurring genetic defect explains any substantial group of NTDs, consistent with the notion that all NTD types are most likely complex traits involving interaction of multiple loci and non-genetic factors. It remains possible that substantial genetic contributions come from outside the coding regions, which would also be missed in WES. Future investigations will need to include non-coding regulatory regions (eg, enhancers), using WES for more comprehensive analysis. However, identification of causal variants can be even more challenging at this level. For the moment the authenticity of reported human NTD mutations still needs to be unequivocally verified. Although functional analysis was beyond the scope of this study, it would be of great benefit to test candidate variants in mouse models to provide confirmation of causal effect.

| CONCLUSION AND FUTURE STUDIES
We have identified novel and rare variants within genes with known biological association with NTDs, specifically focusing on folate metabolism and PCP pathways. This study highlights the potential involvement of PCP genes including CELSR1 in association with anencephaly phenotypes, and also PDGFRA as strong NTD candidates in humans. Further identification and functional testing of genetic factors will lead to improved understanding of molecular mechanisms of NTD, and ultimately may help to create a targeted and cost-effective