Analysis of ANK3 and CACNA1C variants identified in bipolar disorder whole genome sequence data

Objectives Genetic markers in the genes encoding ankyrin 3 (ANK3) and the α-calcium channel subunit (CACNA1C) are associated with bipolar disorder (BP). The associated variants in the CACNA1C gene are mainly within intron 3 of the gene. ANK3 BP-associated variants are in two distinct clusters at the ends of the gene, indicating disease allele heterogeneity. Methods In order to screen both coding and non-coding regions to identify potential aetiological variants, we used whole-genome sequencing in 99 BP cases. Variants with markedly different allele frequencies in the BP samples and the 1,000 genomes project European data were genotyped in 1,510 BP cases and 1,095 controls. Results We found that the CACNA1C intron 3 variant, rs79398153, potentially affecting an ENCyclopedia of DNA Elements (ENCODE)-defined region, showed an association with BP (p = 0.015). We also found the ANK3 BP-associated variant rs139972937, responsible for an asparagine to serine change (p = 0.042). However, a previous study had not found support for an association between rs139972937 and BP. The variants at ANK3 and CACNA1C previously known to be associated with BP were not in linkage disequilibrium with either of the two variants that we identified and these are therefore independent of the previous haplotypes implicated by genome-wide association. Conclusions Sequencing in additional BP samples is needed to find the molecular pathology that explains the previous association findings. If changes similar to those we have found can be shown to have an effect on the expression and function of ANK3 and CACNA1C, they might help to explain the so-called ‘missing heritability’ of BP.

Bipolar disorder (BP) is a common disease with a worldwide average population prevalence of 1.4%, which rises to 2.4% if bipolar spectrum disorders are included (1). Twin and family studies indicate that BP is genetically related to some types of unipolar affective disorder (2). The genetic heritability of BP is thought to be between 79% and 93% (3)(4)(5)(6), with a ten-fold increase in risk to the relatives of probands with BP (7). Many linkage studies of specific chromosomal regions and whole genomes in multiply affected families support the presence of locus heterogeneity, with multiple susceptibility loci (8)(9)(10). About half of the segregation analyses of systematically ascertained families imply that BP has an autosomal dominant mode of inheritance (11). Other models have favoured a single major locus with a polygenic multifactorial background and pure polygenic transmission. Linkage and linkage disequilibrium (LD) analyses demonstrate locus heterogeneity (12,13). Genome-wide association studies (GWAS), meta-analyses, and replication studies focusing on BP have been carried out on combined cohort sizes of up to 7,481 cases and 9,250 controls (14)(15)(16)(17). These and other single-locus case-control association studies have repeatedly implicated the L-type calcium channel α1C subunit (CACNA1C) and ankyrin 3 (ANK3) genes in BP. The strongest allelic association signal in CACNA1C is localized entirely within intron 3 of the gene with the single nucleotide polymorphisms (SNPs) rs1006737 (p = 7.0 × 10⁻⁸) (14,18,19), rs4765913 (p = 1.52 × 10⁻⁸) (14), rs4765914 (p = 1.52 × 10⁻⁸) (20), and rs1024582 (p = 1.7 × 10⁻⁷) (17,21). GWAS results across five different psychiatric illnesses further implicate rs1024582 in susceptibility to both BP and schizophrenia, assuming that there has not been substantial misdiagnosis, especially where schizoaffective BP cases are included in the schizophrenia group (20).
Intron 3 of CACNA1C contains a chromosomal region with high levels of LD, strong mammalian conservation, and multiple sites designated by the ENCyclopedia of DNA Elements (ENCODE) project as being able to affect gene expression. Studies show that the presence of the rs1006737 CACNA1C BP risk variant may have an impact on certain brain activities. One study showed that the rs1006737 risk variant in healthy males is associated with lower extraversion, trait anxiety, paranoid ideation, and higher harm avoidance (22). The rs1006737 risk variant has been associated with increased amygdala functioning observed by magnetic resonance imaging during emotional processing; the enhancement of activation leads to impaired facial emotion recognition in BP patients (20,23,24,25,26). There has been conflicting evidence as to whether the presence of the CACNA1C variant results in brain volumetric alteration. Some reports state that this SNP has been associated with brainstem alterations, increased grey matter density, as well as a cortical volume increase (27)(28)(29). A conflicting study did not report any association between this SNP and brain volumetric alterations (30). Mutations/variants located in intronic regions can also affect the stability of RNA and protein expression, and can have a strong effect on the transcriptional regulation of the gene.
In ANK3, the strongest evidence for allelic association comes from SNPs rs10994338 (p = 1.20 × 10⁻⁷) (31), rs4948418 (p = 8.93 × 10⁻⁹) (15), rs10994336 (p = 9.1 × 10⁻⁹) (14,26,32,33,34), and rs10994397 (p = 7.1 × 10⁻⁹) (17) at the 5′ end, and the SNP rs9804190 (p = 1.20 × 10⁻⁴) (17,26,35) at the 3′ end of the longest isoform (NM_001204403) of the gene. These regions are over 340 kb apart and appear to be independently associated with BP, with no significant interactions between SNPs from the two regions (32). However, the existing data on ANK3 show that only low-frequency aetiological base-pair (bp) changes are present with an odds ratio less than 1.35 for BP (17). The SNP associations are not replicated in every study (36)(37)(38)(39); however, the ANK3 association has been reported in several different ancestral populations (36,40,41,42). Several novel, rare potential aetiological bp changes have been identified by us through sequencing the gene in our samples. These were selected for having haplotypes associated with BP (43). Doyle et al. (44) sequenced the 8 kb brain-expressed exon 48 of ANK3 but could not find potential aetiological bp changes that were associated with BP. This exon is of recent evolutionary history, and variation in the exon appears to be tolerated. Sequencing analysis of ANK3 demonstrated the impact of heterogeneity on replication of allelic associations, even within well-defined ancestral populations (43). mRNA analysis has detected differential regulation of distinct ANK3 transcription start sites and coupling of specific 5′ ends with 3′ mRNA splicing events, suggesting that brain-specific cis-regulatory transcriptional changes might be relevant to BP molecular pathology (45). Gene network analysis and tests of epistasis have found further support for an association of ANK3 with BP (46,47). The genetic variants associated with disease have no known biological function.
However, one study showed that the presence of the rs10994336 BP risk variant in healthy males might predict lower novelty seeking, lower behavioural activation scores, and high startle reactivity (22). In healthy volunteers, rs10994336 may be associated with reduced white matter integrity in the anterior limb of the internal capsule, as well as with altered set-shifting and decision-making (48). These findings may be consistent with previous diffusion tensor imaging studies in patients with BP (49)(50)(51)(52)(53)(54) and core phenotypes of BP (55)(56)(57)(58). Lithium has been shown to alter Ank3 mRNA levels in the mouse brain (59), and lithium and sodium valproate have been shown to change Ank3 protein amounts in rat neuronal dendritic spines (60). In another animal model, RNA interference of Ank3 in the hippocampus dentate gyrus induced a reduction of anxiety-related behaviours and increased activity during the light phase, which were attenuated by chronic treatment with the mood stabilizer, lithium. Similar behavioural alterations of reduced anxiety and increased motivation for reward were also exhibited by Ank3+/− heterozygous mice compared with wild-type Ank3+/+ mice (61).
Given the typical natural history of BP, which consists of episodes of both mania and depression with complete recovery very often between episodes, it can be argued that genetic susceptibility will involve aetiological bp changes influencing the control of gene expression and mRNA translation rather than mutations creating structural protein abnormalities. Therefore, we chose whole-genome sequencing (WGS) rather than exome sequencing in order to be able to investigate intronic and noncoding control regions of susceptibility genes along with the exonic coding regions.

Subjects
This study included 1,510 affected research subjects with BP. These were sampled in three cohorts. The first cohort, UCL1, included 506 research subjects with bipolar I disorder (BP-I), defined by the presence of mania and hospitalization according to Research Diagnostic Criteria (RDC) (62). UCL1 was included in the previously reported mega-analysis by the Psychiatric Genetic Consortium (PGC) BP GWAS (17). The second and third cohorts, UCL2 and UCL3, consisted, respectively, of 593 and 411 subjects with BP-I or bipolar II disorder (BP-II). Ancestry screening was used as a selection criterion for the inclusion of cases. Samples were included if at least three out of four grandparents were English, Irish, Scottish, or Welsh and if the fourth grandparent was non-Jewish European, before the European Union enlargement in 2004. The sample of 1,095 controls comprised 614 screened subjects who had no first-degree family or personal history of psychiatric illness and an additional 481 unscreened normal British subjects, obtained from the European Collection of Animal Cell Cultures (ECACC). National Health Service (NHS) multicentre research ethics approval was obtained. All participants provided signed consent.
Research subjects with BP had been given an NHS clinical diagnosis of ICD-10 BP and then needed to fulfil RDC (62) for BP with clinical data collected by the lifetime version of the Schizophrenia and Affective Disorder Schedule (SADS-L) (63). DNA samples were collected from blood samples from the UCL1 cohort, saliva samples for the UCL2 cohort, and a mixture of both blood and saliva for the UCL3 samples. DNA from blood samples was extracted using a standard phenol-chloroform method and from saliva samples using the Oragene protocol for DNA extraction (DNA Genotek, Ottawa, ON, Canada).

WGS
WGS was performed on 99 of the subjects with BP-I selected from all our cohorts who had a positive family history of BP or bipolar spectrum disorder and an early age at onset. The genomic DNA was sequenced using 100 bp paired-end reads on a Hi-Seq 1000 (Illumina Inc., San Diego, CA, USA). Sequence data alignment to the National Center for Biotechnology Information human reference genome 37.1 (hg19) and variant calling were performed using the CASAVA 1.8.2 pipeline at Illumina (http://res.illumina.com/documents/products/technotes/technote_snp_caller_sequencing.pdf). The sequence data from these individuals were further analysed and annotated using kGAP (Knome Inc., Boston, MA, USA).

Variant selection
ANK3 and CACNA1C non-synonymous variants present in the coding exons were identified using the Knome VARIANTS software (Knome Inc.) (Supplementary Table 1). The same software was used to identify variants in the 5′ untranslated region (UTR), 3′ UTR, splicing sites [donor site consists of 5 bp in the exon and 6 bp in the intron, acceptor site consists of 3 bp in the exon and 20 bp in the intron (64)], promoter region (1,000 bp from the first exon of every coding isoform), and the third intron of CACNA1C (Supplementary Table 1, Supplementary Fig. 1, Supplementary Fig. 2). Allele counts for each SNP in the BP samples were compared to those from the 372 European samples in the 1,000 Genomes (1,000G) Project (phase 1, version 3; ftp://ftp.1000genomes.ebi.ac.uk/vol1/ftp/release/20110521/ALL.wgs.phase1_release_v3.20101123.snps_indels_sv.sites.vcf.gz). SNPs for which the variant allele was more common in subjects with BP than in the 1,000G Project data, significant at p < 0.05 using Fisher's exact test, were chosen for genotyping in the complete UCL case-control sample. Variants which were present in poly-base regions and insertions in repeat regions were excluded from genotyping.
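The selection filter described above can be sketched in a few lines. This is a minimal illustration, not the authors' pipeline: it assumes a one-sided Fisher's exact test (the text specifies only that the variant allele had to be more common in BP at p < 0.05), the function name `fisher_one_sided` is hypothetical, and the allele counts in the usage lines are invented for illustration.

```python
from math import comb


def fisher_one_sided(case_alt, case_ref, pop_alt, pop_ref):
    """One-sided Fisher's exact test on a 2x2 allele-count table.

    Returns P(X >= case_alt) under the hypergeometric null, i.e. the
    probability of seeing at least this many variant alleles in the
    case group if allele frequency were the same in both groups.
    """
    n_case = case_alt + case_ref          # alleles sampled in cases
    n_alt = case_alt + pop_alt            # variant alleles overall
    n_total = n_case + pop_alt + pop_ref  # all alleles
    denom = comb(n_total, n_case)
    upper = min(n_case, n_alt)
    tail = sum(
        comb(n_alt, k) * comb(n_total - n_alt, n_case - k)
        for k in range(case_alt, upper + 1)
    )
    return tail / denom


# Hypothetical counts: 99 sequenced cases and 372 reference samples
# give 198 and 744 alleles respectively; the variant counts are made up.
bp_alt, bp_total = 6, 2 * 99
eur_alt, eur_total = 5, 2 * 372
p = fisher_one_sided(bp_alt, bp_total - bp_alt, eur_alt, eur_total - eur_alt)
selected_for_genotyping = p < 0.05
print(f"p = {p:.4g}, selected = {selected_for_genotyping}")
```

A two-sided test would be the more conservative choice if the direction of the frequency difference were not fixed in advance.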
The variants in the third intron of CACNA1C were selected if they met four criteria:

Genotyping
Genotyping for the selected SNPs in 1,510 BP cases (UCL1, UCL2, and UCL3 samples) and 1,095 ancestrally matched controls was performed in-house with allele-specific polymerase chain reaction (PCR) using KASPar reagents (KBiosciences, Hoddesdon, UK) on a LightCycler 480 (Roche, Burgess Hill, UK) real-time PCR machine. For all SNPs genotyped, 17% of samples were duplicated to detect error and confirm the reproducibility of genotypes. Allele-specific primers were designed for each of the SNPs using Primer Picker (KBiosciences), as shown in Supplementary Table 2. All these data were analysed to confirm Hardy-Weinberg equilibrium (HWE). Allelic association tests for SNPs were performed using Fisher's exact test. Significance values shown for all analyses are uncorrected for multiple testing, and a cutoff significance value of p < 0.05 was used.

Burden analysis
A burden analysis was performed on the data separately for ANK3 and CACNA1C. A chi-square test was used to compare the numbers of case and control individuals carrying one or more of the variant alleles against the numbers of case and control individuals who were found to be homozygous for the reference alleles at all of the loci tested.
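As a rough sketch of the carrier-versus-non-carrier comparison described above, the following computes a Pearson chi-square statistic on the collapsed 2x2 table. It assumes no continuity correction (the text does not say whether one was applied), and the function names and example genotypes are hypothetical.

```python
from math import erfc, sqrt


def is_carrier(variant_allele_counts):
    """True if an individual carries a variant allele at any tested locus.

    variant_allele_counts: per-locus counts of the variant allele (0, 1 or 2).
    """
    return any(count > 0 for count in variant_allele_counts)


def burden_chi_square(case_carriers, case_noncarriers,
                      control_carriers, control_noncarriers):
    """Pearson chi-square (1 df, no continuity correction) for a 2x2 table."""
    a, b = case_carriers, case_noncarriers
    c, d = control_carriers, control_noncarriers
    n = a + b + c + d
    margins = (a + b, c + d, a + c, b + d)
    if 0 in margins:
        raise ValueError("a row or column total is zero; test is undefined")
    chi2 = n * (a * d - b * c) ** 2 / (margins[0] * margins[1]
                                       * margins[2] * margins[3])
    # Survival function of chi-square with 1 df: P(X > chi2) = erfc(sqrt(chi2/2))
    p_value = erfc(sqrt(chi2 / 2))
    return chi2, p_value


# Hypothetical genotypes for three individuals across four loci.
individuals = [[0, 0, 0, 0], [0, 1, 0, 0], [0, 0, 2, 1]]
n_carriers = sum(is_carrier(g) for g in individuals)

# Hypothetical collapsed counts, for illustration only.
chi2, p_value = burden_chi_square(20, 80, 10, 90)
print(f"carriers = {n_carriers}, chi2 = {chi2:.3f}, p = {p_value:.4f}")
```

Collapsing to carrier status discards how many variant alleles each person carries, which is what makes the test a simple 2x2 comparison.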

Haplotype analysis
Haplotype analysis was performed using Haploview (66) to determine the LD between GWAS-associated SNPs and the rare variants reported here. Haplotype blocks were determined using a solid spine of LD (D′ = 1).
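For readers unfamiliar with the D′ measure used in the block definition, here is a minimal sketch of how D′ and r² follow from two-locus haplotype counts. It assumes phased haplotype counts are already available, which is a simplification: tools working from unphased genotypes must first estimate haplotype frequencies. The function name and the counts are hypothetical.

```python
def ld_measures(n_AB, n_Ab, n_aB, n_ab):
    """Return (|D'|, r^2) for two biallelic loci from haplotype counts.

    n_AB etc. are counts of the four possible two-locus haplotypes.
    """
    n = n_AB + n_Ab + n_aB + n_ab
    p_A = (n_AB + n_Ab) / n   # frequency of allele A at locus 1
    p_B = (n_AB + n_aB) / n   # frequency of allele B at locus 2
    d = n_AB / n - p_A * p_B  # raw disequilibrium coefficient D
    if d >= 0:
        d_max = min(p_A * (1 - p_B), (1 - p_A) * p_B)
    else:
        d_max = min(p_A * p_B, (1 - p_A) * (1 - p_B))
    if d_max == 0:
        raise ValueError("at least one locus is monomorphic")
    r2 = d * d / (p_A * (1 - p_A) * p_B * (1 - p_B))
    return abs(d) / d_max, r2


# Hypothetical counts: a rare allele A seen only on B-bearing haplotypes.
d_prime, r2 = ld_measures(n_AB=5, n_Ab=0, n_aB=295, n_ab=700)
print(f"|D'| = {d_prime:.3f}, r^2 = {r2:.4f}")
```

The hypothetical counts illustrate why D′ and r² are usually read together: a rare allele confined to one background yields a high D′ while r² stays small.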

Variant calling and selection
WGS in 99 samples with BP produced a mean depth coverage of 37.0 with 90% of the genome sequenced. A total of 0.12% of bases were heterozygous and the transition/transversion (Ti/Tv) ratio was 2.0 (see Supplementary Table 3).
The ANK3 variant rs184389434 is located in the promoter region of the gene and was predicted to create a binding site for three new transcription factors [ETS-related gene (Erg-1), Ultrabithorax (Ubx) and Octamer-1 (Oct-1)]. rs139972937 causes a non-conservative amino acid change from asparagine to serine at position 2,643 (N2643S) in exon 34 of the alternative isoforms NM_020987.3 and CCDS7258.1. The N2643S amino acid substitution was predicted to be benign with a score of 0 (sensitivity 1, specificity 0) by PolyPhen-2 and tolerated with a score of 0.71 by SIFT. The previously unreported ANK3 variant, ss825679002, was located in the 3′ UTR of the gene and was found to be in a microRNA binding site. Bioinformatic analysis using TargetScan and miRanda predicted no effect on microRNA binding.
One of the novel variants in CACNA1C, ss825679004, was in the promoter region of the gene. Alibaba 2.1 analysis of ss825679004 predicted it to disrupt the binding sites for three transcription factors [Activating Enhancer Binding Protein-2alpha (AP-2alpha), NF-muE1, Specificity Protein-1 (Sp1)] and to create a new one for a different transcription factor [GC Factor (GCF)]. CACNA1C variant ss825679005 is the sixth base in the intron of the splice donor site for exon 17. This exon is present in all known isoforms of the gene and this variant might alter splicing efficiency.
The CACNA1C intron 3 variants rs146482058, rs79398153, rs191953785, rs112312080, rs113414207, ss825679006, and ss825679007 were present in the region of high LD between chr12:2,230,353 and chr12:2,559,413. Each variant was also located in an ENCODE-marked region (65). All seven SNP regions are marked by H3 mono-methylation of lysine 4 (H3K4me1), H3 tri-methylation of lysine 4 (H3K4me3), and H3 acetylation of lysine 27 (H3K27ac), and active transcriptional enhancers with distinct chromatin signatures (68). Enrichment for H3K4me1 and H3K27ac at a genetic level distinguishes active enhancers from inactive or poised enhancers (69,70). The presence of H3K4me1- and H3K27ac-marked chromatin, with low levels of H3K4me3 and an absence of another histone marker, H3K27me3, represent putative human embryonic stem cell (hESC) enhancers and have been shown to localize proximally to genes that are expressed during development in hESCs and in epiblast cells (70). Additionally, rs112312080 and ss825679007 were found to be present on DNase I hypersensitivity sites, as listed by the ENCODE project.

Genotyping
Assays were designed for 13 SNPs which passed filtering tests for genotyping in the complete UCL BP case-control sample. Genotype data were generated for 12 of these variants and the genotype distributions for each SNP followed HWE in the case and control cohorts. The non-synonymous ANK3 variant rs139972937 was found to be associated with BP (Fisher's exact test p = 0.042) (Table 1). Nine cases with BP and only one control were found to be heterozygous for this variant. None of the other ANK3 variants were found to be associated with BP, as shown in Table 1.
Of the SNPs found in the CACNA1C intronic region, rs79398153 was found to be associated with BP (Fisher's exact test p = 0.015). We detected 88 heterozygote and three homozygote cases for this variant and 44 heterozygote controls. The excess of homozygotes may possibly represent a recessive effect, as only one homozygote would be expected under HWE, but this excess is not statistically significant (71). rs79398153 is located in an ENCODE-marked region for H3K4me1, H3K27ac, and H3K4me3. None of the other intronic CACNA1C SNPs were associated with BP. Imputation for rs79398153 in UCL1 using GWAS data showed that it was still significantly associated with BP (p = 0.022) (17). Burden analysis showed that, overall, there was no excess of the variants genotyped in cases versus controls for either CACNA1C or ANK3.
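The HWE expectation for rs79398153 homozygotes mentioned above can be reproduced approximately with the following sketch. It assumes that all 1,510 cases yielded a genotype call for this SNP, which the text does not state; the true denominator may be smaller. The function name is hypothetical.

```python
def expected_homozygotes(n_het, n_hom, n_genotyped):
    """Expected count of variant homozygotes under Hardy-Weinberg equilibrium.

    The variant allele frequency q is estimated from the observed genotypes,
    and the expected homozygote count is q^2 * N.
    """
    q = (n_het + 2 * n_hom) / (2 * n_genotyped)
    return q * q * n_genotyped


# Case genotype counts from the text; the denominator of 1,510 is an
# assumption (every case successfully genotyped).
expected = expected_homozygotes(n_het=88, n_hom=3, n_genotyped=1510)
print(f"expected homozygotes under HWE: {expected:.2f}")  # about 1.46
```

Under that assumption the expectation is roughly 1.5 homozygotes against the three observed, in line with the small, non-significant excess reported.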

Discussion
We have analysed genetic variation by WGS in two of the best-replicated bipolar susceptibility genes, CACNA1C and ANK3. This analysis has identified novel possible BP susceptibility variants in both of these genes. The CACNA1C intron 3 variant rs79398153 may impact CACNA1C gene expression by virtue of its presence in an ENCODE-marked transcriptional enhancer region.
The ANK3 amino acid changing variant, rs139972937 (N2643S), was associated with BP in our sample (p = 0.042). In a previous study, this rare variant was found in two of 1,119 cases and one of 1,078 controls, and in a family containing seven subjects with BP, a father and six offspring, where only the father and two of the offspring possessed the variant (44). These conflicting findings are not uncommon in the genetics of complex diseases, but this lack of support casts doubt on the true aetiological importance of this variant in BP.
It is of note that the allele frequencies of some of the variants selected for genotyping were found to be markedly different in our control samples compared to the frequencies in the European 1,000G Project. This underlines the importance of typing variants in matched control samples on the same platform rather than relying on publicly available data such as the 1,000G Project in order to mitigate possible spurious association findings.
The variants we have found appear to be acting independently of the allelic and haplotypic associations found in the previous BP allelic association studies. Independent genetic replication and biological validation of intronic potential aetiological bp changes would support the argument for carrying out WGS as well as exome sequencing in BP. The biphasic nature of BP makes a compelling argument for the existence of genetically determined pathological switch mechanisms that may manifest themselves in the loss of control of gene expression. Findings such as ours may help to explain the 'missing heritability' in this common complex disorder. Further analyses in much larger samples are needed to find aetiological bp changes in the ANK3 and CACNA1C genes that are carried by the main haplotypes showing strong association with BP. The outcome could be personalized treatment for BP, based on susceptibility genotypes. 588 Fiorentino et al.

Supporting Information
Additional Supporting Information may be found in the online version of this article: Figure S1. The introns and exons of the different splice variants of the ANK3 gene are shown, along with the genomic regions of the gene that were analysed for variant selection. The locations of the variants detected by sequencing are shown, as are the variants that were selected for genotyping in the full case control sample. Figure S2. The introns and exons of the different splice variants of the CACNA1C gene are shown, along with the genomic regions of the gene that were analysed for variant selection. The locations of the variants detected by sequencing are shown, as are the variants that were selected for genotyping in the full case-control sample. Table S1. Variants identified using the Knome VARIANTS software (Knome). VAF = variant allele(s) frequency. Table S2. Allele-specific primers designed using Primer Picker (KBiosciences) for genotyping. Table S3. Next Generation Sequencing Control information.