Coming to grips with complex disorders: Genetic risk prediction in bipolar disorder using panels of genes identified through convergent functional genomics

Authors


  • ABN is a scientific co-founder of Mindscape Diagnostics.

  • How to Cite this Article: Patel SD, Le-Niculescu H, Koller DL, Green SD, Lahiri DK, McMahon F, Nurnberger JI, Niculescu AB. 2010. Coming to Grips With Complex Disorders: Genetic Risk Prediction in Bipolar Disorder Using Panels of Genes Identified Through Convergent Functional Genomics. Am J Med Genet Part B 153B: 850–877.

Abstract

We previously proposed and provided proof of principle for the use of a complementary approach, convergent functional genomics (CFG), combining gene expression and genetic data, from human and animal model studies, as a way of mining the existing GWAS datasets for signals that are there already, but did not reach significance using a genetics-only approach [Le-Niculescu et al., 2009b]. CFG provides a fit-to-disease prioritization of genes that leads to generalizability in independent cohorts, and counterbalances the fit-to-cohort prioritization inherent in classic genetic-only approaches, which have been plagued by poor reproducibility across cohorts. We have now extended our previous work to include more datasets of GWAS, and more recent evidence from other lines of work. In essence our analysis is the most comprehensive integration of genetics and functional genomics to date in the field of bipolar disorder. Biological pathway analyses identified top canonical pathways, and epistatic interaction testing inside these pathways has identified genes that merit future follow-up as direct interactors (intra-pathway epistasis, INPEP). Moreover, we have put together a panel of best P-value single nucleotide polymorphisms (SNPs), based on the top candidate genes we identified. We have developed a genetic risk prediction score (GRPS) based on our panel, and demonstrate how in two independent test cohorts the GRPS differentiates between subjects with bipolar disorder and normal controls, in both European-American and African-American populations. Lastly, we describe a prototype of how such testing could be used to categorize disease risk in individuals and aid personalized medicine approaches, in psychiatry and beyond. © 2010 Wiley-Liss, Inc.

INTRODUCTION

As part of a convergent functional genomics (CFG) strategy, expanding upon our earlier work [Le-Niculescu et al., 2009b], we set out to comprehensively identify candidate genes for bipolar disorder, integrating available evidence in the field to date. We used data from four publicly available genome-wide association study (GWAS) datasets for bipolar disorder [WTCC, 2007; Baum et al., 2008; Sklar et al., 2008]. We integrated those data with gene expression data (human postmortem brain and human blood gene expression data published by others or by us), as well as with relevant animal model brain and blood gene expression data generated by our group [Niculescu et al., 2000a; Ogden et al., 2004; Le-Niculescu et al., 2007a, 2007b]. In addition, as part of this comprehensive approach, we integrated other genetic data: published human genetic (linkage or association) data for bipolar and related disorders to date, and relevant mouse genetic (QTL or transgenic) data (Fig. 1).

Figure 1.

Convergent functional genomics. Integration of multiple independent lines of evidence. The maximal possible score from GWAS data (6 pt) is weighted equally with the maximal possible score from other lines of evidence (other human and animal model gene expression and genetic data) (6 pt).

Once the genes involved in a disorder are identified and prioritized for likelihood of involvement, an obvious next step is to apply that knowledge to genetic testing of individuals to determine risk for the disorder. Based on our comprehensive identification of top candidate genes described in this paper, we have chosen the best SNPs in those genes by their P values in the GWAS datasets used, and assembled a genetic risk prediction (GRP) panel out of those SNPs. We then developed a genetic risk prediction score (GRPS) for bipolar disorder based on the presence or absence of the alleles of the SNPs associated with the illness, and tested the GRPS in an independent study (GAIN-BP) [Smith et al., 2009] for which we had both genotypic and clinical data available, comparing the bipolar subjects to demographically matched normal controls. Our results show that a relatively small panel of genes identified by CFG analysis can differentiate very well between bipolar disorder subjects and controls at a population level, although at an individual level the margin is razor thin. The latter point suggests that the cumulative combinatorics of common variants plays a major role in risk for illness. Overall, our work sheds light on the genetic architecture and pathophysiology of bipolar disorder. In particular, it has implications for genetic testing to assess risk for illness before the illness manifests itself clinically.

METHODS

Genome-Wide Association Studies (GWAS) Data for Bipolar Disorder

Four bipolar GWAS were used for the expanded CFG discovery analysis. The GWAS data for the bipolar study from the Wellcome Trust Consortium (WTCC) 2007 are available at http://www.wtccc.org.uk/info/access_to_data_samples.shtml. The GWAS data from the NIMH and German studies [Baum et al., 2008] are available at http://mapgenetics.nimh.nih.gov/bp_pooling. The GWAS data from the STEP-BD study are available at http://pngu.mgh.harvard.edu/~purcell/bpwgas [Sklar et al., 2008].

One independent study, GAIN-BP [Smith et al., 2009], was used for testing the results of the discovery analyses. The GWAS data for GAIN-BP used for analyses described in this manuscript was obtained from the database of Genotype and Phenotype (dbGaP) found at www.ncbi.nlm.nih.gov through PHS project number 000017, data request numbers 2575-2, 2574-2, and 2573-2 provided to John I. Nurnberger, Jr.

The software package PLINK (http://pngu.mgh.harvard.edu/~purcell) was used to extract individual genotype information for each subject from the GAIN-BP GWAS data files. We used European American (EA) and, separately, African American (AA) bipolar subjects and controls. Out of 1001 EA bipolar subjects in GAIN-BP, we used only 407 for our GRPS testing analyses, from wave 5 of the NIMH Bipolar Genetics Consortium collection, to avoid any individual overlap (16%) or even pedigree overlap (57%) with probands from waves 1 to 4 that were also used in the NIMH [Baum et al., 2008] study mentioned above. We also used 317 AA bipolar subjects from wave 5 of the NIMH Bipolar Genetics Consortium collection. Controls numbered 1034 for EA, and 671 for AA.

As a caveat, there was overlap in the control subjects within two of four discovery datasets (NIMH and STEP-BD), and between these two datasets and one of the datasets (GAIN-BP EA) used to test our results. However, as described below, multiple other studies and lines of evidence, human and animal model, are integrated in the CFG prioritization approach (Fig. 1), which minimizes the relative contribution and impact of individual studies and the controls overlap in the discovery dataset. More importantly, there is no overlap at the bipolar subject level within discovery datasets, and between discovery datasets and the test dataset. This ensures that there is at least a degree of independence within discovery cohorts, and between discovery and test cohorts. Finally, the fact that the GRPS differentiates as well or better in the completely independent GAIN-BP AA cohort provides strong reassurance and confirmatory evidence for the method.

SNPs with a nominal genotypic P-value <0.05 were selected for our analysis. No Bonferroni correction was performed.

Gene Identification

To identify the genes that correspond to the selected SNPs, the lists of SNPs from the GWAS were uploaded to the CHIP Bioinformatics Tools website (http://snpper.chip.org). In cases where a SNP mapped to a region close to multiple genes, we selected all the genes that were provided by SNPper. SNPs for which no gene was identified were not included in our subsequent analysis.
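The SNP selection and gene assignment steps described above can be sketched in Python. This is only an illustration under stated assumptions: the `snp_to_genes` dictionary is a hypothetical stand-in for the SNPper lookup, the SNP and gene identifiers are placeholders, and the function name is ours. Keeping the lowest P-value per gene is a convenience for later scoring, which only asks whether a gene has at least one SNP below a given threshold.

```python
NOMINAL_P = 0.05  # nominal threshold from the text; no Bonferroni correction


def best_p_per_gene(snp_pvalues, snp_to_genes, threshold=NOMINAL_P):
    """Return the lowest nominal P-value per gene among the selected SNPs.

    snp_pvalues:  dict of SNP id -> genotypic P-value from one GWAS
    snp_to_genes: dict of SNP id -> list of genes (hypothetical stand-in
                  for the SNPper output; a SNP may map to several genes)
    """
    best = {}
    for snp, p in snp_pvalues.items():
        if p >= threshold:
            continue  # SNP is not nominally significant
        # SNPs with no mapped gene contribute nothing, as in the text
        for gene in snp_to_genes.get(snp, []):
            if gene not in best or p < best[gene]:
                best[gene] = p
    return best


# Placeholder data, not taken from any of the GWAS datasets
example_pvalues = {"rs_a": 0.0008, "rs_b": 0.03, "rs_c": 0.20, "rs_d": 0.01}
example_mapping = {
    "rs_a": ["GENE1"],
    "rs_b": ["GENE1", "GENE2"],  # SNP near two genes: both are kept
    "rs_c": ["GENE3"],           # above the threshold, so GENE3 is not selected
}                                # rs_d has no mapped gene and is dropped
print(best_p_per_gene(example_pvalues, example_mapping))
```

In this toy input, `rs_c` fails the P < 0.05 filter and `rs_d` has no gene assignment, so only `GENE1` and `GENE2` are carried forward.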

Human Postmortem Brain Gene Expression

Information about our candidate genes was obtained using GeneCards (http://www.genecards.org), the Online Mendelian Inheritance in Man database (http://ncbi.nlm.nih.gov/entrez/query.fcgi?db=OMIM), as well as database searches using PubMed (http://ncbi.nlm.nih.gov/PubMed) and various combinations of keywords (gene name, bipolar, depression, human, postmortem, brain).

Human Genetic (Linkage, Association) Convergence

To designate convergence for a particular gene, the gene had to map within 10 cM [see Niculescu et al., 2000b for detailed discussion] of a microsatellite marker for which at least one published study showed evidence for linkage for bipolar disorder or depression, or a positive association study for the gene itself was reported in the literature. The University of Southampton's sequence-based integrated map of the human genome (The Genetic Epidemiological Group, Human Genetics Division, University of Southampton: http://cedar.genetics.soton.ac.uk/public_html/) was used to obtain cM locations for both genes and markers. The sex-averaged cM value was calculated and used to determine convergence to a particular marker. For markers that were not present in the Southampton database, the Marshfield database (Center for Medical Genetics, Marshfield, WI, USA: http://research.marshfieldclinic.org/genetics) was used with the NCBI Map Viewer web-site to evaluate linkage convergence.
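The positional part of this convergence rule can be illustrated with a short Python sketch. It assumes that the sex-averaged position is the simple mean of the female and male map positions and that "within 10 cM" is inclusive; both are our assumptions, as are the function names and the example positions.

```python
def sex_averaged_cm(female_cm, male_cm):
    """Mean of the female and male genetic map positions (assumed definition)."""
    return (female_cm + male_cm) / 2.0


def converges(gene, marker, window_cm=10.0):
    """Check whether a gene lies within window_cm of a linkage marker.

    gene and marker are (chromosome, sex-averaged cM) tuples. The
    comparison is inclusive, which is an assumption of this sketch.
    """
    gene_chrom, gene_cm = gene
    marker_chrom, marker_cm = marker
    return gene_chrom == marker_chrom and abs(gene_cm - marker_cm) <= window_cm


# Hypothetical positions, for illustration only
gene_pos = ("11", sex_averaged_cm(60.0, 40.0))   # 50.0 cM on chromosome 11
print(converges(gene_pos, ("11", 58.5)))          # same chromosome, 8.5 cM away
print(converges(gene_pos, ("11", 61.0)))          # same chromosome, 11 cM away
print(converges(gene_pos, ("13", 50.0)))          # different chromosome
```

A gene can also reach convergence through a positive association study of the gene itself, which is a literature finding and is not modeled here.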

We have established in the lab manually curated databases of all the published human postmortem brain and human genetic literature to date on bipolar and related disorders [Niculescu and Le-Niculescu, 2010]. These large databases have been used in our CFG cross-validation analyses.

Human Blood Gene Expression Data

For human blood gene expression evidence, we have used previously generated data from our group [Le-Niculescu et al., 2009a], as well as published data from the literature.

Animal Model Brain and Blood Gene Expression Data

For animal model brain and blood gene expression evidence, we have used previously generated data from two different animal models for bipolar disorder developed by our group, one pharmacogenomic and one transgenic [Niculescu et al., 2000a; Le-Niculescu et al., 2007b].

Mouse Genetic (QTL, Transgenic) Convergence

To search for mouse genetic evidence—quantitative trait loci (QTL) or transgenic—for our candidate genes, we utilized the MGI_3.54—Mouse Genome Informatics (Jackson Laboratory, Bar Harbor, Maine) and used the search menu for mouse phenotypes and mouse models of human disease/abnormal behaviors, using the following sub-categories: abnormal emotion/affect behavior and abnormal sleep pattern/circadian rhythm. To designate convergence for a particular gene, the gene had to map within 10 cM of a QTL marker for the abnormal behavior, or a transgenic mouse of the gene itself displayed that behavior.

Convergent Functional Genomics (CFG) Analysis Scoring

We used two nominal P-value thresholds for scoring genes in the CFG analysis: a lower stringency threshold (P < 0.05) and a higher stringency threshold (P < 0.001). Genes from each GWAS dataset that had at least one SNP with a P-value of <0.05 received 1 point; those that had at least one SNP with a P-value of <0.001 received 1.5 points. All other cross-validating lines of evidence (other human data, animal model data) received a maximum of 1 point each (for human genetic data: 0.5 points if it is linkage, 1 point if it is association; for mouse genetic data: 0.5 points if it is QTL, 1 point if it is transgenic; for human and mouse gene expression data: 1 point each for brain or blood data, 0.5 points if it is from cells in culture/cell lines). Thus the maximum possible CFG score for each gene is 12 (6 = 4 × 1.5 points from the four GWAS, and 6 points from the other lines of evidence). As we are interested in discovering signal in GWAS, we weighted data from GWAS more heavily, bringing the data from this one methodological approach on par with the data from all the other methodological approaches combined. It has not escaped our attention that other ways of weighting the scores of the lines of evidence may give slightly different results in terms of prioritization, if not in terms of the list of genes per se. Nevertheless, we feel this simple scoring system provides a good separation of genes based on our focus on identifying signal in the GWAS.
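The scoring rules above translate into a small Python sketch. The point values and thresholds come from the text; the dictionary keys (for example `"mouse_brain"`, `"tissue"`, `"cell_culture"`) and function names are our own labels, introduced only for illustration. The worked example uses the ARNTL row of Table I as we read it from the flattened table.

```python
# Points for the six non-GWAS lines of evidence (maximum 1 point each).
# Key names are hypothetical labels chosen for this sketch.
EVIDENCE_POINTS = {
    "human_genetic": {"linkage": 0.5, "association": 1.0},
    "mouse_genetic": {"qtl": 0.5, "transgenic": 1.0},
    "human_brain": {"tissue": 1.0, "cell_culture": 0.5},
    "human_blood": {"tissue": 1.0, "cell_culture": 0.5},
    "mouse_brain": {"tissue": 1.0, "cell_culture": 0.5},
    "mouse_blood": {"tissue": 1.0, "cell_culture": 0.5},
}


def gwas_points(best_p):
    """Points from one GWAS, given the gene's lowest SNP P-value (or None)."""
    if best_p is None:
        return 0.0
    if best_p < 0.001:
        return 1.5  # higher stringency threshold
    if best_p < 0.05:
        return 1.0  # lower stringency threshold
    return 0.0


def cfg_score(gwas_best_p, evidence):
    """CFG score: GWAS points (up to 6) plus other evidence (up to 6).

    gwas_best_p: lowest P-value per GWAS for the gene (None if absent)
    evidence:    dict of evidence line -> kind, using EVIDENCE_POINTS keys
    """
    total = sum(gwas_points(p) for p in gwas_best_p)
    for line, kind in evidence.items():
        total += EVIDENCE_POINTS[line][kind]
    return total


# ARNTL as read from Table I: four GWAS P-values, transgenic mouse evidence,
# mouse brain expression, human association, human postmortem brain.
arntl = cfg_score(
    [7.71e-4, 3.84e-2, 3.72e-2, 2.56e-2],
    {
        "mouse_genetic": "transgenic",
        "mouse_brain": "tissue",
        "human_genetic": "association",
        "human_brain": "tissue",
    },
)
print(arntl)  # 4.5 GWAS points + 4 other points = 8.5, matching Table I
```

Because each line of evidence appears at most once as a dictionary key, the per-line cap of 1 point is enforced by construction.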

Pathway Analysis

Ingenuity 8.0 (Ingenuity Systems, Redwood City, CA) was employed to analyze the molecular networks, biological functions and canonical pathways of the top candidate genes resulting from our CFG analysis. The Ingenuity program generated the P values assigned to the different pathways (Table I).

Table I. Top Candidate Genes for Bipolar Disorder Identified by Convergent Functional Genomics (CFG) of Genome-Wide Association Studies (GWAS) Data and Replication of Findings in an Independent Study (GAIN-BP)
Entrez ID | Gene symbol/name | GWAS WTCC [2007] P-value | GWAS NIMH [Baum et al., 2008] P-value | GWAS German [Baum et al., 2008] P-value | GWAS STEP-BD [Sklar et al., 2008] P-value | Mouse genetic evidence (QTL, TG) | Mouse models brain evidence [Ogden et al., 2004; Le-Niculescu et al., 2008] | Mouse models blood evidence [Le-Niculescu et al., 2008] | Human genetic evidence (linkage or association) | Human postmortem brain evidence | Human blood/other peripheral tissue evidence | CFG score | GAIN-BP P-value
Top genes with a CFG score of 6.5 and above (n = 56) are shown. The complete list of genes (n = 1,529) is available as Supplementary Information online. I, increased; D, decreased in expression. For human blood data: I, increased in high mood (mania); D, decreased in high mood (mania)/increased in low mood (depression). [For human blood data, where references other than Le-Niculescu et al. [2009a] are cited, the studies are in lymphoblastoid cell lines without correlation with mood state; I, increased; D, decreased.] METH, methamphetamine; VPA, valproate; PFC, prefrontal cortex; AMY, amygdala; CP, caudate putamen; NAC, nucleus accumbens; VT, ventral tegmentum; DBP, DBP knock-out mice; NST, nonstressed; ST, stressed; BP, bipolar disorder; BAD, bipolar affective disorder; MDD, major depressive disorder; TG, transgenic; QTL, quantitative trait locus. For additional human genetic evidence: assoc., association evidence; where that is not mentioned, the evidence is linkage only. Gene symbols underlined are blood biomarker candidate genes. P values in bold are <0.001. The column on the right of the bolded line depicts replication of findings in an independent bipolar GWAS (GAIN-BP). Forty-six of our top 56 genes had a P < 0.05 in the GAIN-BP study. The P-value cited is the lowest for any SNP designated by SNPper to be within the gene or flanking regions. As there were 6,041 genes at P < 0.05 in that study, and the number of genes in the human genome is estimated at 20,500 [Clamp et al., 2007], the enrichment factor provided by our approach is (46/56)/(6,041/20,500) = 2.8-fold. (As a caveat, the GAIN-BP P values are calculated for the whole cohort in the study, which contains some overlap with the cohort in the NIMH study; see Materials and Methods Section.)
In other words, the positive predictive value (PPV) of a CFG score greater than 6 (i.e., >50% of the maximum CFG score) for predicting possible involvement in bipolar disorder for a gene (P < 0.05) is 46/56 = 82.1%. The negative predictive value (NPV) is (20,500 − 6,041 − 10)/(20,500 − 56) = 70.7%. Such numbers compare favorably with detection modalities used in science in general, and medicine in particular.

406ARNTL (Aryl hydrocarbon receptor nuclear translocator-like)7.71E−04, rs47571413.84E−02, rs47571383.72E−02, rs38163602.56E−02, rs11022781Abnormal sleep pattern/circadian rhythm (TG)PFC (D) Cat IV-meth 11p15.2I (BP) [Nakatani et al., 2006] 8.54.87E−02
         BP [Mansour et al., 2006; Nievergelt et al., 2006; Partonen et al., 2007] (assoc)    
4155MBP (Myelin basic protein) 8.30E−03, rs4705498.19E−04, rs4708211.17E−03, rs12967023 NST PFC (D) ST PFC (D) ST AMY (I)BP Blood (I) Cat IV-meth18q23D(BP) [Tkachev et al., 2003] 8.53.14E−03
         BP [Potash et al., 2008] (assoc)I (male BP)Mood (I) [Le-Niculescu et al., 2009a]  
         BP [Freimer et al., 1996; Schulze et al., 2003; Maziade et al., 2005]D (female BP) [Chambers and Perrone-Bizzozero, 2004]   
627BDNF (Brain derived neurotrophic factor)1.05E−02, rs169172343.76E−02, rs9259461.91E−03, rs12291063 Abnormal emotion/affect behavior (TG)PFC (D) Cat IV-METH 11p14.1D(BP) [Knable et al., 2004; Torrey et al., 2005; Pillai, 2008]BP (D) [Karege et al., 2004]8.04.19E−02
         BP [Neves-Pereira et al., 2002; Sklar et al., 2002; Liu et al., 2008] (assoc)    
         BP [McInnes et al., 1996; Detera-Wadleigh et al., 1999]    
         Depression [Aguilera et al., 2009] (assoc)    
         MDD [Licinio et al., 2009] (assoc)    
3084NRG1 (Neuregulin 1)1.07E−05, rs78211902.19E−03, rs3273804.51E−03, rs64680958.81E−04, rs2466085   8p12I (BP) [Tkachev et al., 2003]BP (I) [Begemann et al., 2008]8.01.16E−02
         BP [Green et al., 2005; Walss-Bass et al., 2006; Thomson et al., 2007; Georgieva et al., 2008] (assoc)D (unipolar depression) [Bertram et al., 2007]   
         BP [Cichon et al., 2001; Park et al., 2004]    
6096RORB (RAR-related orphan receptor beta)1.29E−02, rs108694355.88E−04, rs13278371.95E−02, rs13590738.99E−04, rs10869435Abnormal emotion/affect behavior (TG)ST AMY (I) ST PFC (D) 9q21.13  8.09.38E−03
         BP [Macgregor et al., 2004]    
         BP [McGrath et al., 2009] (assoc)    
27185DISC1 (Disrupted in schizophrenia 1)1.31E−02, rs66714232.99E−03, rs94317146.08E−03, rs75346818.61E−03, rs821577Abnormal emotion/affect behavior (TG)  1q42.2MDD [Sawamura et al., 2005]BP(D) [Maeda et al., 2006]8.02.73E−02
         BP [Hodgkinson et al., 2004; Thomson et al., 2005; Maeda et al., 2006; Perlis et al., 2008] (assoc)    
         MDD [Schosser et al., 2009] (assoc)    
54715A2BP1 (Ataxin-2-binding protein 1)3.42E−05, rs80461704.23E−04, rs18182901.59E−04, rs110771354.18E−03, rs7187986 VT (D) Cat III-VPA 16p13.2  7.51.11E−02
         BP [Baum et al., 2008; Johnson et al., 2009] (assoc)    
         BP [Ewald et al., 2002]    
216ALDH1A1 (Aldehyde dehydrogenase family 1, subfamily A1)1.29E−02, rs3484781.58E−04, rs3484583.34E−02, rs7873724 Abnormal sleep pattern/circadian rhythm (QTL)NST PFC (D) ST AMY (I)BP blood (D) Cat IV-Meth9q21.13I (BP) [Pennington et al., 2008] 7.53.08E−02
         BP [Macgregor et al., 2004]    
2890GRIA1 (Glutamate receptor, ionotropic, AMPA 1)1.47E−02, rs170962106.55E−03, rs49586679.19E−03, rs77192926.84E−03, rs1461232Abnormal emotion/affect behavior (QTL)VT (D) Cat IV-Meth 5q33.2I (BP), (MDD) [Choudary et al., 2005] 7.52.48E−04
         BP [Kerner et al., 2009] (assoc)    
         BP [Morissette et al., 1999; Sklar et al., 2004]    
4842NOS1 (Nitric oxide synthase 1), neuronal1.72E−02, rs16078173.73E−02, rs22719874.56E−02, rs128115832.12E−02, rs10850803Abnormal emotion/affect behavior abnormal sleep pattern/circadian rhythm (QTL)NST AMY (D) 12q24.22I/D (BP) [Benes et al., 2006] 7.51.13E−02
         BP [Fallin et al., 2005] (assoc)    
         BP [Morissette et al., 1999; Chagnon et al., 2004]    
773CACNA1A (Calcium channel, voltage-dependent, P/Q type, alpha 1A) subunit2.99E−02, rs177779412.12E−02, rs104218107.04E−04, rs104218102.70E−03, rs16016Abnormal emotion/affect behavior (QTL)  19p13.13D (BP) [Iwamoto et al., 2004] 7.05.67E−03
         BP [Ferreira et al., 2008b] (assoc)    
         MDD [Zubenko et al., 2003]    
10659CUGBP2 (CUG triplet repeat, RNA binding protein 2)2.84E−05, rs6829703.38E−03, rs9329182.66E−02, rs23789919.51E−03, rs1990 ST PFC (I) 10p14 BP(I) [Matigian et al., 2007]7.01.59E−02
         MDD [Zubenko et al., 2003]    
1826DSCAM (Down syndrome cell adhesion molecule) like 11.39E−03, rs4553042.72E−04, rs81292838.11E−04, rs72780731.51E−02, rs2837504   21q22.2I (BP) [Amano et al., 2008] 7.01.66E−03
         BP [Amano et al., 2008] (assoc)    
2913GRM3 (Glutamate receptor, metabotropic 3)3.43E−02, rs69559173.18E−03, rs102264017.36E−03, rs22375522.22E−04, rs2237554 ST AMY (I) 7q21.12D (MDD/suicide) [Klempan et al., 2009] 7.05.69E−03
         BP [Lambert et al., 2005; Etain et al., 2006]I (BP) [Choudary et al., 2005]   
2932GSK3B (Glycogen synthase kinase 3 beta)9.82E−03, rs178110131.62E−02, rs178102356.72E−03, rs6438552 Abnormal emotion/affect behavior (TG)CP (D) Cat IV-VPA ST PFC (D) ST AMY (I) PFC (D) Cat IV-METH 3q13.33D (BP) [Nakatani et al., 2006; Vawter et al., 2006] 7.0 
         BP [Szczepankiewicz et al., 2006; Lachman et al., 2007] (assoc)I (MDD) [Vawter et al., 2006]   
         BP [Bailer et al., 2002; Benedetti et al., 2004; Maziade et al., 2005]    
3356HTR2A (5-Hydroxytryptamine (serotonin) receptor 2A)1.86E−02, rs20252964.52E−02, rs9729791.65E−03, rs172887235.60E−03, rs977003Abnormal emotion/affect behavior, abnormal sleep pattern/circadian rhythm (TG)  13q14.2D (BP) [Knable et al., 2004; Torrey et al., 2005] 7.03.19E−03
         BP [Arranz et al., 1997; Lin et al., 2003; McAuley et al., 2009] (assoc)I (Suicide) [Klempan et al., 2009]   
         BP [Chee et al., 2001; Ranade et al., 2003] (assoc)    
         BP [Badenhop et al., 2002]    
         Major affective disorders [Bonnier et al., 2002] (assoc)    
         Mood disorders [Brezo et al., 2009] (assoc)    
         Response to antidepressants (SSRI) [Uher et al., 2009] (assoc)    
3775KCNK1 (Potassium channel, subfamily K, member 1)1.89E−02, rs38432507.60E−03, rs46492403.47E−04, rs7012094.38E−02, rs4649343   1q42.2D (BP) [Jurata et al., 2004]BP (I) [Matigian et al., 2007]7.0 
         BP [Curtis et al., 2003; Macgregor et al., 2004]    
11278KLF12 (Kruppel-like factor 12)2.76E−03, rs48851516.77E−04, rs96001601.68E−04, rs9543443 Abnormal emotion/affect behavior Abnormal sleep pattern/circadian rhythm (QTL)ST AMY (I) ST PFC (D) 13q22.1 Mood (D) [Le-Niculescu et al., 2009a]7.04.93E−04
         BP [Potash et al., 2003]    
10150MBNL2 (Muscleblind-like 2)2.94E−03, rs64913454.64E−02, rs73186234.02E−04, rs95845521.61E−02, rs16953952 AMY (D) Cat III-VPADBP NST blood (D)13q32.1  7.04.08E−02
         BP [Kelsoe et al., 2001]    
89797NAV2 (Neuron navigator 2)4.16E−03, rs21199815.77E−04, rs13727972.04E−03, rs22183291.87E−03, rs10500860Abnormal emotion/affect behavior (TG)  11p15.1D, bipolar suicide [Kim et al., 2007] 7.01.73E−03
         BP [Detera-Wadleigh et al., 1999]    
4684NCAM1 (Neural cell adhesion molecule 1)2.77E−02, rs112145012.61E−02, rs5869038.62E−03, rs122792611.75E−02, rs4366519   11q23.1D (BP) [Atz et al., 2007], MDD (D) [Tochigi et al., 2008]BP (D) peripheral blood cells [Wakabayashi et al., 2008]7.0 
         BP[Atz et al., 2007] (assoc), BP [Arai et al., 2004] (assoc)    
2908NR3C1 (Nuclear receptor subfamily 3, group C, member 1) (glucocorticoid receptor)4.03E−03, rs172092513.71E−02, rs104826722.96E−02, rs104826722.69E−02, rs2918417Abnormal emotion/affect behavior (TG)  5q31.3D (BP) [Knable et al., 2004; Torrey et al., 2005] 7.07.54E−03
         BP [Etain et al., 2006]I, MDD suicide [Sequeira et al., 2007]   
         MDD [van West et al., 2006]    
         MDD [Wong et al., 2008] (assoc)    
         Response to antidepressants (SSRI) [Uher et al., 2009] (assoc)    
4988OPRM1 (Opioid receptor, mu 1)7.82E−04, rs20108847.31E−03, rs6508251.90E−03, rs77454992.11E−02, rs2141289Abnormal emotion/affect behavior (TG)  6q25.2I (BP) [Ryan et al., 2006] 7.07.57E−03
         BP [Cheng et al., 2006]    
5101PCDH9 (Protocadherin 9)9.77E−03, rs170821491.19E−03, rs93176264.80E−04, rs7986387 Abnormal emotion/affect behavior, abnormal sleep pattern/circadian rhythm (QTL)NST AMY (I) 13q21.32D (MDD/suicide) [Klempan et al., 2009] 7.04.92E−03
         BP [Potash et al., 2003]    
         MDD [Wong et al., 2006] (assoc)    
5142PDE4B (Phosphodiesterase 4B), cAMP-specific (phosphodiesterase E4 dunce homolog, drosophila)6.02E−03, rs65881901.60E−03, rs5393221.41E−02, rs120215744.42E−02, rs17417507   1p31.2D (BP) [Fatemi et al., 2008]BP (I) [Padmos et al., 2008]7.01.54E−03
         BP [Millar et al., 2007] (assoc) MDD (I), leukocyte [Numata et al., 2009]  
5581PRKCE (Protein kinase C, epsilon)4.59E−03, rs27112932.37E−04, rs25952211.20E−02, rs67483752.48E−02, rs4557033Abnormal emotion/affect behavior (TG)  2p21D (BP) [Torrey et al., 2005] 7.01.46E−03
         BP [Etain et al., 2006]    
5764PTN (Pleiotrophin) (heparin binding growth factor 8, neurite growth-promoting factor 1)2.85E−02, rs69778191.90E−02, rs69777494.56E−03, rs3206828.78E−04, rs17169022 CP (I) Cat IV-Meth 7q33I (MDD) [Tochigi et al., 2008] 7.01.03E−03
         BP [Segurado et al., 2003]    
5797PTPRM (Protein tyrosine phosphatase, receptor type, M)1.74E−02, rs7279511.10E−02, rs37863672.41E−04, rs169526201.01E−02, rs4121619   18p11.23I (BP) [Nakatani et al., 2006]Mood (D) [Le-Niculescu et al., 2009a]7.01.14E−03
         BP [Segurado et al., 2003]    
6263RYR3 (Ryanodine receptor 3)1.21E−03, rs169579452.89E−04, rs25962056.09E−03, rs26709558.07E−03, rs744776Abnormal emotion/affect behavior (TG)CP (I) Cat IV-VPA 15q13.3  7.01.11E−03
         Depression [Levinson et al., 2007]    
9522SCAMP1 (Secretory carrier membrane protein 1)1.71E−02, rs10198031.31E−02, rs19683822.46E−03, rs168753822.25E−02, rs16875428 ST PFC (D)DBP NST (D)5q14.1 Mood (D) [Le-Niculescu et al., 2009a]7.04.14E−02
287ANK2 (Ankyrin 2), neuronal4.77E−04, rs174454591.34E−02, rs175905938.90E−03, rs105165935.18E−03, rs1351998Abnormal emotion/affect behavior (QTL)ST PFC (I) 4q25  6.59.42E−05
         BP [Lambert et al., 2005]    
351APP (Amyloid beta (A4) precursor protein)3.37E−02, rs39919.86E−03, rs28299847.81E−03, rs37876201.04E−02, rs2830048Abnormal emotion/affect behavior, abnormal sleep pattern/circadian rhythm (TG)  21q21.3I (BP) [Jurata et al., 2004] 6.5 
         BP [Morissette et al., 1999]    
6310ATXN1 (Ataxin 1)1.11E−03, rs93708935.55E−03, rs22371986.58E−03, rs121988383.96E−04, rs909786 ST PFC (D) 6p22.3 Mood (I)6.51.12E−03
815CAMK2A (Calcium/calmodulin-dependent protein kinase II alpha) 1.76E−02, rs105156393.62E−02, rs37976172.30E−02, rs4958469Abnormal emotion/affect behavior, abnormal sleep pattern/circadian rhythm (TG)NST AMY (I) 5q32D (BP) [Xing et al., 2002], I (MDD) [Novak et al., 2006; Tochigi et al., 2008] 6.5 
         BP [Sklar et al., 2004; Etain et al., 2006]    
960CD44 (CD44 antigen)3.48E−02, rs169271003.94E−03, rs3536151.06E−02, rs7115768  CP (I) Cat IV-MethBP blood (D) Cat IV-Meth11p13 BP (I) [Middleton et al., 2005]6.54.01E−04
         BP [McInnes et al., 1996]    
1012CDH13 (Cadherin 13)5.89E−03, rs18626822.50E−03, rs71982529.08E−04, rs9314088.01E−03, rs7197423Abnormal emotion/affect behavior (QTL)NST AMY (D) 16q23.3  6.52.90E−03
         BP [Etain et al., 2006]    
1387CREBBP (CREB binding protein)5.02E−03, rs1300361.39E−03, rs133320763.64E−03, rs1299631.91E−02, rs11644593Abnormal emotion/affect behavior (TG)ST PFC (D) 16p13.3  6.51.37E−03
         BP [Ewald et al., 2002]    
1612DAPK1 (Death-associated protein kinase 1)4.02E−02, rs111419095.97E−05, rs108686444.04E−02, rs31242361.56E−03, rs3124236Abnormal emotion/affect behavior (QTL)AMY (D) Cat III-VPA 9q21.33  6.55.20E−03
         BP [Segurado et al., 2003]    
9201DCLK1 (Doublecortin-like kinase 1)1.20E−022.36E−035.27E−032.59E−02 BP (D) Cat IV-VPADBP NST (D)13q13.3  6.5 
         BP, SZ [Maziade et al., 2005]    
1729DIAPH1 (Diaphanous (Drosophila, homolog) 1)2.62E−02, rs7404744.70E−02, rs119546583.38E−03, rs3973277.63E−03, rs3792896 CP (I) Cat III-VPA 5q31.3 BP(I) [Matigian et al., 2007]6.5 
         BP [Etain et al., 2006]    
27086FOXP1 (Forkhead box P1)4.80E−03, rs9504439.66E−04, rs76402375.33E−03, rs38460311.58E−04, rs17718783 NST AMY (D) ST PFC (D) 3p13  6.52.97E−03
         BP [McInnes et al., 1996; Etain et al., 2006]    
2770GNAI1 (Guanine nucleotide binding protein, alpha inhibiting 1)4.98E−03, rs104291567.55E−03, rs69736161.55E−02, rs9169053.71E−02, rs2523189 ST PFC (D) 7q21.11D (BP) [Jurata et al., 2004] 6.54.32E−02
         BP [Lambert et al., 2005]    
2897GRIK1 (Glutamate receptor, ionotropic, kainate 1)5.39E−04, rs21544902.79E−03, rs28324763.36E−02, rs4649824.47E−02, rs467155Abnormal emotion/affect behavior (QTL)  21q21.3D (BP) [Iwamoto et al., 2004; Choudary et al., 2005; Nakatani et al., 2006] 6.55.91E−03
         BP [Detera-Wadleigh et al., 1999; Morissette et al., 1999]I (MDD) [Choudary et al., 2005]   
2939GSTA2 (Glutathione S-transferase, alpha 2) (Yc2)1.14E−03, rs22079501.93E−03, rs26086321.89E−03, rs22241981.52E−02, rs2749010  BP (D) Cat III-Meth6p12.2I (BP) [Benes et al., 2006] 6.5 
         BP [Lambert et al., 2005]    
3751KCND2 (Potassium voltage-gated channel, Shal-related family, member 2)5.78E−03, rs101561254.08E−03, rs125389905.24E−05, rs102685913.86E−02, rs2191736Abnormal emotion/affect behavior (QTL)ST PFC (D) 7q31.31  6.5 
         BP [Etain et al., 2006]    
4008LMO7 (LIM domain only 7)6.62E−05, rs95304601.11E−02, rs95931328.17E−03, rs15705546.59E−03, rs9530460Abnormal emotion/affect behavior, abnormal sleep pattern/circadian rhythm (QTL)  13q22.2 Anti-depressant (D) lymphocytes [Kalman et al., 2005]6.52.26E−02
         BP [Potash et al., 2003]    
23040MYT1L (Myelin transcription factor 1-like)2.25E−04, rs19917731.31E−02, rs14216141.25E−02, rs105194861.65E−02, rs17039396Abnormal sleep pattern/circadian rhythm (QTL)ST PFC (D) 2p25.3  6.55.83E−03
         BP [Detera-Wadleigh et al., 1999]    
4720NDUFS2 (NADH dehydrogenase (ubiquinone) Fe-S protein 2), 49 kDa (NADH-coenzyme Q reductase)4.27E−02, rs50851.08E−02, rs114214.67E−02, rs114213.61E−02, rs5085 BP (I) Cat III-VPA 1q23 BP(D) [Middleton et al., 2005]6.5 
         BP [Fallin et al., 2004]    
4897NRCAM (Neuronal cell adhesion molecule)1.63E−03, rs132278365.94E−04, rs37634618.60E−04, rs15489494.35E−02, rs11974528Abnormal sleep pattern/circadian rhythm (QTL)NST AMY (I) 7q31.1  6.51.16E−03
10846PDE10A (Phosphodiesterase 10A)1.50E−02, rs29835069.64E−03, rs125257631.50E−03, rs4541653.40E−03, rs2983521Abnormal emotion/affect behavior (TG)NST AMY (D) ST PFC (D) 6q27  6.52.16E−02
         BP [Cheng et al., 2006]    
11122PTPRT (Protein tyrosine phosphatase, receptor type, T)6.27E−03, rs60303853.45E−03, rs24254781.12E−02, rs10160714.67E−03, rs1883842 ST AMY (I) 20q12I (MDD/suicide) [Sequeira et al., 2007] 6.51.20E−03
         BP [Radhakrishna et al., 2001]D (MDD) [Aston et al., 2005]   
6546SLC8A1 (Solute carrier family 8 (sodium/calcium exchanger), member 1)4.57E−03, rs104900492.77E−04, rs170253722.28E−02, rs3817977.44E−03, rs12052585Abnormal emotion/affect behavior (QTL)ST AMY (I) ST PFC (D) 2p22.1  6.51.30E−02
         BP [Etain et al., 2006]    
8224SYN3 (Synapsin III)1.67E−04, rs110895994.94E−03, rs1303014.17E−03, rs37884672.03E−02, rs933255   22q12.3D (BP) [Vawter et al., 2002] 6.57.97E−03
         BP [Lachman et al., 2006] (assoc)    
         BP [Detera-Wadleigh et al., 1999; Kelsoe et al., 2001; Potash et al., 2003]    
7074TIAM1 (T-cell lymphoma invasion and metastasis 1)7.39E−05, rs133400181.82E−03, rs124827962.65E−03, rs22570622.49E−03, rs845945Abnormal emotion/affect behavior (QTL)  21q22.11D(MDD) [Aston et al., 2005] 6.53.31E−02
         BP [Morissette et al., 1999]    
128553TSHZ2 (Teashirt family zinc finger 2)1.98E−02, rs72631158.22E−03, rs27413563.58E−04, rs1692701.73E−02, rs6068531Abnormal emotion/affect behavior (QTL)  20q13.2 Mood (D) [Le-Niculescu et al., 2009a]6.58.16E−03
         BP [Radhakrishna et al., 2001]    
79683ZDHHC14 (Zinc finger, DHHC domain containing 14)4.09E−03, rs18854524.59E−03, rs5961833.56E−02, rs169002544.89E−03, rs17297221 ST AMY (I) 6q25.3 Mood (D) [Le-Niculescu et al., 2009a]6.52.40E−02
         BP [Cheng et al., 2006]    

Epistasis Testing

The GAIN-BP case and control data were employed to test for epistatic interactions among SNPs in genes from the GRPS panel having a role in one or more of the top canonical biological pathways from our pathway analysis. These pathways, and the genes comprising each that were considered, are listed in Table II. Within each pathway, SNP × SNP allelic epistasis was tested for each distinct pair of SNPs using the PLINK software package.
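As an illustration only, the bookkeeping behind "each distinct pair of SNPs" can be sketched as follows. This is not the PLINK analysis itself. The gene and SNP identifiers are hypothetical placeholders, and including pairs of SNPs from the same gene is an assumption, since the text does not say whether such pairs were excluded.

```python
from itertools import combinations


def snp_pairs(pathway_snps):
    """Return every distinct unordered pair of SNPs in one pathway.

    pathway_snps maps a gene symbol to the list of panel SNPs in that gene.
    Assumption: pairs of SNPs within the same gene are kept.
    """
    snps = sorted({snp for gene_snps in pathway_snps.values() for snp in gene_snps})
    return list(combinations(snps, 2))


# Hypothetical identifiers, for illustration only (not the actual panel SNPs).
example_pathway = {
    "GENE_A": ["snp_a1", "snp_a2"],
    "GENE_B": ["snp_b1"],
    "GENE_C": ["snp_c1", "snp_c2"],
}

pairs = snp_pairs(example_pathway)
n_snps = 5
# n SNPs give n * (n - 1) / 2 distinct pairs, each of which is one epistasis test.
assert len(pairs) == n_snps * (n_snps - 1) // 2
```

The number of tests grows quadratically with the number of SNPs in a pathway, which is relevant to the caveat that the nominal epistatic P values were not corrected for multiple comparisons.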

Table II. Biological Pathways
Ingenuity pathway analysis of the top candidate genes from Table I.

Top networks | Score
(1) Nervous system development and function, neurological disease, genetic disorder | 38
(2) Cellular compromise, neurological disease, drug metabolism | 38
(3) Cellular assembly and organization, cardiovascular system development and function, cellular growth and proliferation | 24
(4) Amino acid metabolism, cancer, cell morphology | 19

Diseases and disorders | P-Value | # Molecules
Genetic disorder | 1.52E−19 to 3.47E−03 | 53
Neurological disease | 1.52E−19 to 3.67E−03 | 47
Psychological disorders | 1.52E−19 to 1.95E−03 | 35
Endocrine system disorders | 4.63E−15 to 3.67E−03 | 38
Metabolic disease | 2.44E−14 to 1.05E−07 | 38

Molecular and cellular functions | P-Value | # Molecules
Cellular assembly and organization | 4.21E−08 to 3.67E−03 | 23
Cell-to-cell signaling and interaction | 8.46E−08 to 3.67E−03 | 26
Cellular movement | 2.87E−07 to 3.67E−03 | 15
Amino acid metabolism | 1.68E−06 to 2.81E−03 | 15
Molecular transport | 1.68E−06 to 3.67E−03 | 18

Physiological system development and function | P-Value | # Molecules
Behavior | 4.22E−13 to 2.94E−03 | 19
Nervous system development and function | 2.83E−11 to 3.67E−03 | 33
Organismal functions | 1.97E−10 to 1.45E−09 | 10
Tissue morphology | 2.27E−05 to 3.67E−03 | 17
Hematological system development and function | 7.44E−05 to 3.67E−03 | 14

Top canonical pathways | P-Value | Ratio
(1) G-protein coupled receptor signaling | 8.50E−07 | 8/218 (0.037)
(2) CREB signaling in neurons | 3.00E−06 | 7/194 (0.036)
(3) Synaptic long term depression | 1.46E−05 | 6/164 (0.037)
(4) cAMP-mediated signaling | 2.47E−05 | 6/164 (0.037)
(5) Neuropathic pain signaling in dorsal horn neurons | 3.54E−05 | 5/104 (0.048)

Genetic Risk Prediction Panel and Scoring

Out of our analysis, a panel of top genes prioritized by CFG scoring (Fig. 1) can be chosen. We developed a genetic risk prediction (GRP) panel, based on a list of top genes from Table III (n = 56, all the genes that had a CFG score better than 6, that is, greater than 50% of the maximum possible CFG score of 12). All the SNPs for these genes that had nominal P values <0.05 in one or several of the four GWAS datasets (Wellcome, German, NIMH, STEP-BD) we used were identified. The best P-value SNPs in each study were assembled in a GRP panel (Table III), and tested in the GAIN-BP data. As a caveat, not all the SNPs in our GRP panel had been genotyped in the GAIN-BP. Overall, out of 216 SNPs in our panel (4 SNPs × 56 genes = 224 SNPs theoretically, but some genes did not have a nominally significant SNP in one or another of the four discovery GWAS), only 118 were tested in the GAIN-BP sample.

Table III. Intra-Pathway Epistasis (INPEP) Testing Identifies Genes That May Work Together
Inside each of the top canonical pathways depicted in Table II, we tested for epistatic interactions between genes in the pathway, in an independent dataset, the GAIN-BP, as a way of identifying and prioritizing interactions. The top epistatic interactions in each pathway are depicted in bold. These genes merit future follow-up work to elucidate the biological and pathophysiological relevance of their interactions. As a caveat, the P-value was not corrected for multiple comparisons.

Ingenuity top canonical pathways | Genes | Nominal epistatic P values
G-Protein coupled receptor signaling | CAMK2A, GRM3, PDE10A*, OPRM1, GNAI1, PRKCE, PDE4B*, HTR2A* | *0.0133
CREB signaling in neurons | CAMK2A, GRM3, GRIA1, CREBBP*, GNAI1*, PRKCE, GRIK1 | *0.0138
Synaptic long term depression | NOS1*, GRM3*, GRIA1, RYR3, GNAI1, PRKCE | *0.0173
cAMP-mediated signaling | CAMK2A, GRM3, PDE10A*, OPRM1, GNAI1, PDE4B* | *0.0133
Neuropathic pain signaling in dorsal horn neurons | CAMK2A, BDNF, GRM3*, GRIA1, PRKCE* | *0.0298

Each SNP has two alleles (represented by base letters at that position). One of them is associated with the illness (affected), the other not (non-affected), based on the odds ratio from the GAIN-BP. We assigned the affected allele a score of 1 and the non-affected allele a score of 0. A two-dimensional matrix of subjects by GRP panel alleles is generated, with the cells populated by 0 or 1 (Fig. S3). A SNP in a particular individual subject can have any combination of 1 and 0 (1 and 1, 0 and 1, 1 and 0, 0 and 0). By adding these numbers, the minimum score for a SNP in an individual subject is 0, and the maximum score is 2. By adding the scores for all the alleles in the panel, dividing by the number of alleles, and multiplying by 100, we generate for each subject an average score corresponding to a genetic loading for disease, which we call the genetic risk prediction score (GRPS). From lower to higher genetic risk, the GRPS has a minimum value of 0 and a maximum value of 100. As a caveat, the assignments of 0 and 1 were made based on information for that allele in GAIN-BP for EA subjects, and separately for AA subjects, and rest on the assumption that the same alleles are associated with bipolar disorder in all subjects of the same ethnicity. However, the GAIN-BP test GWAS is not used to select the panel of genes and SNPs in the GRP panel, which is derived completely independently from the CFG analysis of the four discovery GWAS.
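The scoring just described can be sketched in a few lines. This is a minimal illustration, not the code used in the study. The function name and the example genotypes are hypothetical, and averaging over only the alleles actually genotyped in a subject is an assumption, since the handling of missing genotypes is not described.

```python
def grps(allele_scores):
    """Genetic risk prediction score for one subject, on a 0 to 100 scale.

    allele_scores: flat list of 0/1 values, two per genotyped SNP
    (1 = illness-associated allele, 0 = the other allele).
    Assumption: the average is taken over the alleles supplied.
    """
    if not allele_scores:
        raise ValueError("no genotyped alleles for this subject")
    if any(a not in (0, 1) for a in allele_scores):
        raise ValueError("allele scores must be 0 or 1")
    return 100.0 * sum(allele_scores) / len(allele_scores)


# Hypothetical subject with three SNPs (six alleles): genotypes (1,1), (0,1), (0,0).
subject = [1, 1, 0, 1, 0, 0]
score = grps(subject)  # 3 of the 6 alleles are illness-associated
```

A subject carrying no illness-associated alleles scores 0, and a subject homozygous for the illness-associated allele at every SNP scores 100.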

The software package PLINK (http://pngu.mgh.harvard.edu/~purcell) was used to extract individual genotype information for each subject from the GAIN-BP GWAS data files. We analyzed separately European-American (EA) and African-American (AA) bipolar subjects and controls, to examine any potential ethnicity variability (Fig. 3a,b). To test for a significant difference in GRPS between bipolar and control subjects, a one-tailed t-test was performed. We also analyzed males and females separately from each other, to look at any gender-induced variability (Fig. 3c,d). Finally, we tested for the ability of the GRPS to distinguish between bipolar subjects based on an important clinical variable, episode frequency, which is the sum of all episodes of illness (depression and mania), divided by the number of years of illness. We used a case-case design with extremes in phenotype. Thus, we compared the GRPS in subjects with the top 1/3 of episode frequency scores versus subjects with the bottom 1/3 of episode frequency scores (Fig. 3e,f).
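The group comparisons above can be illustrated with a short sketch. It is not the analysis code used in the study. All function names are hypothetical, the pooled-variance (Student) form of the t statistic is an assumption because the text does not name the variant used, and the handling of ties when ranking episode frequency is likewise assumed.

```python
from math import sqrt
from statistics import mean, variance


def episode_frequency(n_depressive, n_manic, years_ill):
    """Total number of mood episodes divided by years of illness."""
    return (n_depressive + n_manic) / years_ill


def extreme_thirds(subjects):
    """Split (subject_id, episode_frequency) pairs into bottom and top thirds."""
    ranked = sorted(subjects, key=lambda s: s[1])
    k = len(ranked) // 3
    return ranked[:k], ranked[len(ranked) - k:]


def student_t(group_a, group_b):
    """Pooled-variance two-sample t statistic (a minus b) and degrees of freedom."""
    na, nb = len(group_a), len(group_b)
    pooled = ((na - 1) * variance(group_a) + (nb - 1) * variance(group_b)) / (na + nb - 2)
    t = (mean(group_a) - mean(group_b)) / sqrt(pooled * (1 / na + 1 / nb))
    return t, na + nb - 2


# Hypothetical GRPS values, for illustration only.
t_stat, dof = student_t([2.0, 4.0, 6.0], [1.0, 2.0, 3.0])
```

The one-tailed P value is the upper-tail probability of the t distribution with `dof` degrees of freedom at `t_stat`. A statistics library can supply it, for example `scipy.stats.ttest_ind` with `alternative="greater"` in recent SciPy releases.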

GRPS Prediction Testing

In a subsequent analysis, we used a split cohort design. We split the GAIN-BP samples for each ethnicity into a 2/3 cohort used for setting GRPS thresholds for bipolar and controls, and a 1/3 cohort used for testing the predictive value of these settings. Inside each ethnicity, the assignment to cohorts was matched for gender, but otherwise random (Table SI).
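A gender-matched random split of this kind could be sketched as below. This is an assumption-laden illustration rather than the procedure actually used. The function name, the record layout, and the fixed random seed are hypothetical, and the function would be applied separately within each ethnicity.

```python
import random
from collections import defaultdict


def split_cohort(subjects, seed=0):
    """Randomly split subjects into a 2/3 cohort and a 1/3 cohort within each gender.

    subjects: list of dicts with at least 'id' and 'gender' keys.
    Returns (two_thirds_cohort, one_third_cohort).
    """
    rng = random.Random(seed)  # fixed seed so the split is reproducible
    by_gender = defaultdict(list)
    for s in subjects:
        by_gender[s["gender"]].append(s)

    train, test = [], []
    for gender in sorted(by_gender):
        group = by_gender[gender][:]
        rng.shuffle(group)
        cut = round(len(group) * 2 / 3)
        train.extend(group[:cut])
        test.extend(group[cut:])
    return train, test


# Hypothetical subjects: six males and three females.
example = [{"id": i, "gender": "M"} for i in range(6)]
example += [{"id": i, "gender": "F"} for i in range(6, 9)]
two_thirds, one_third = split_cohort(example, seed=1)
```

Splitting within each gender keeps the male-to-female ratio the same in both cohorts, which is one way to read "matched for gender, but otherwise random".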

The average GRPS score for bipolar subjects in the 2/3 cohort is used as a cut-off for bipolar in the test 1/3 cohort (i.e., being above that threshold), and the average GRPS score for controls in the 2/3 cohort is used as a cut-off for controls in the test 1/3 cohort (i.e., being below that threshold). The subjects who are in between these two thresholds are called undetermined. Furthermore, to stratify risk, we categorized subjects in the 1/3 testing cohort into Category 1 if they fall within one standard deviation above the bipolar threshold or within one standard deviation below the control threshold. Category 2 subjects are between one and two standard deviations from the thresholds, Category 3 between two and three standard deviations, and Category 4 are those who fall beyond three standard deviations of the threshold. The positive predictive value (PPV) of the test was calculated for each of the categories (Fig. 4).
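The thresholding and categorization rules can be illustrated with the following sketch. It is not the study's implementation. The function names and example values are hypothetical, and measuring distance in units of each group's standard deviation in the 2/3 cohort is an assumption, since the text does not say which standard deviation was used. The sketch also assumes the bipolar mean lies above the control mean.

```python
from statistics import mean, stdev


def fit_thresholds(train_bipolar, train_control):
    """Mean GRPS (cut-off) and SD of each group in the 2/3 cohort."""
    return {
        "bp_cut": mean(train_bipolar), "bp_sd": stdev(train_bipolar),
        "ctl_cut": mean(train_control), "ctl_sd": stdev(train_control),
    }


def classify(score, th):
    """Return (call, category); category is None for undetermined subjects."""
    if score > th["bp_cut"]:
        call, distance = "bipolar", (score - th["bp_cut"]) / th["bp_sd"]
    elif score < th["ctl_cut"]:
        call, distance = "control", (th["ctl_cut"] - score) / th["ctl_sd"]
    else:
        return "undetermined", None
    # Category 1: under 1 SD from the cut-off; 2: 1 to 2 SD; 3: 2 to 3 SD; 4: beyond 3 SD.
    return call, min(int(distance) + 1, 4)


def ppv(calls, truths, target):
    """Fraction of subjects called `target` whose true status is `target`."""
    called = [t for c, t in zip(calls, truths) if c == target]
    return sum(t == target for t in called) / len(called) if called else float("nan")


# Hypothetical training scores, for illustration only.
th = fit_thresholds([52.0, 54.0, 56.0], [46.0, 48.0, 50.0])
```

To obtain a PPV per category, `ppv` would be applied to the subset of test subjects falling in that category.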

RESULTS

Top Candidate Genes

In order to minimize false negatives, we initially cast a wide net, using as a filter a minimal requirement for a gene to have both some genetic and some functional genomic evidence. We thus generated an initial list of 1,657 unique genes with at least one SNP at P < 0.05 in at least one of the four primary GWAS analyzed, that also had some functional (gene expression) evidence (human or animal model data) implicating them in bipolar disorder or depression. Of interest, a previous similar analysis by us using just three GWAS [Le-Niculescu et al., 2009b] yielded 1,529 unique genes, suggesting that: (1) with our genetic-genomic filtering of the GWAS in the primary analysis we are already capturing most of the genes that may be involved in bipolar disorder, with additional studies providing an asymptotic contribution beyond this point; and (2) using our thresholds and minimal requirements, the number of genes potentially involved, directly or indirectly, in bipolar disorder may indeed be quite large, up to 10% of the genome (see also Supplementary Information, Fig. S1).

In order to minimize false positives, we then used a CFG analysis integrating multiple lines of evidence to prioritize this initial list of 1,657 genes, and focused our subsequent analyses on only the top CFG scoring candidate genes. Fifty-six genes had a CFG score above 6 (>50% of the maximum possible score) (Table III and Fig. 2).

Figure 2.

Top bipolar candidate genes. The lines of evidence (CFG scoring) are depicted on the right side of the pyramid.

As a way of testing the validity of our approach, we have examined whether our top findings were over-represented in an independent GWAS of bipolar disorder, the GAIN-BP study. Forty-six of the top 56 genes identified by our approach had a P-value of <0.05 in that independent study, an estimated almost threefold enrichment over what would be expected by chance alone in that study (see Table III).

Candidate Blood Biomarkers

Of the top candidate genes from Table III (see also Fig. 2), 22 out of 56 have prior blood gene expression evidence implicating them as potential blood biomarkers. The additional evidence provided by GWAS data suggests a genetic rather than purely environmental (medications, stress) basis for their alteration in disease, and their potential utility as trait rather than purely state markers.

Biological Pathways

Ingenuity pathway analysis was carried out on the top 56 genes, revealing results similar to our previous work [Le-Niculescu et al., 2009b]. Notably, G-protein coupled receptor signaling, cAMP related signaling and synaptic long-term depression were the top canonical pathways over-represented in bipolar disorder, which is informative (and reassuring, as these pathways are highly druggable) for new drug discovery efforts by pharmaceutical companies.

Intra-Pathway Testing for Epistasis (INPEP)

Epistatic interaction testing inside each of these pathways, using the independent GAIN-BP data, revealed some nominally significant pair-wise P values (Table II). For example, the possible interactions between CREBBP and GNAI1, and between NOS1 and GRM3, are non-obvious.

These prioritized pairs of genes inside each pathway may merit future hypothesis driven confirmatory genetic studies in independent cohorts, as well as testing for mechanistic interactions relevant to bipolar disorder pathophysiology in follow-up biological studies, such as transgenic mice studies.

DISCUSSION

Our CFG approach helped prioritize, as in our previous work [Le-Niculescu et al., 2009b] and as expected, genes for which there was consistent evidence among the four discovery GWAS datasets, or stronger evidence in one or another of the datasets. However, it also prioritized genes with weaker evidence in the GWAS data, but with strong independent evidence in terms of gene expression studies and other prior human or animal genetic work.

At the very top of our list of candidate genes for bipolar disorder we have six genes: ARNTL, MBP, BDNF, NRG1, RORB and DISC1.

Aryl hydrocarbon receptor nuclear translocator-like (ARNTL), a transcription factor, is a circadian clock gene. Another circadian top candidate gene identified by our analysis is RORB (Fig. 2 and Table III). RORB was also recently reported by us to be associated with bipolar disorder in an independent pediatric bipolar sample [McGrath et al., 2009]. Circadian rhythm and sleep abnormalities have long been described in bipolar disorder: excessive sleep in the depressive phase, reduced need for sleep in the manic phase [Bauer et al., 2006]. Sleep deprivation is one of the more powerful and rapid acting treatment modalities for severe depression, and can lead to precipitation of manic episodes in bipolar patients [Wirz-Justice et al., 2004]. We had previously described the identification of clock gene D-box binding protein (DBP) as a potential candidate gene for bipolar disorder [Niculescu et al., 2000b], using a CFG approach. DBP was changed in expression by acute methamphetamine treatment in rat pre-frontal cortex (PFC) [Niculescu et al., 2000b], and mapped near a human genetic linkage locus for bipolar disorder [Morissette et al., 1999] and for depression [Zubenko et al., 2002] on chromosome 19q13. Subsequently, DBP was also reported changed in expression by acute and chronic amphetamine treatments in mice [Sokolov et al., 2003]. Moreover, DBP knock-out mice have abnormal circadian and homeostatic aspects of sleep regulation [Franken et al., 2000]. More recently, we have conducted extensive behavioral and gene expression studies in DBP KO mice. These mice display a bipolar-like phenotype [Le-Niculescu et al., 2008], which is modulated by stress. Decreases in DBP expression have also been recently reported in fibroblasts from bipolar subjects [Yang et al., 2008].
In parallel, work carried out by us using an expanded CFG approach in a mouse pharmacogenomic model for bipolar disorder identified ARNTL and a series of other clock genes (CRY2, CSNK1Ds, and CCR4/nocturnin), as potential bipolar candidate genes [Ogden et al., 2004]. Following that, three independent reports have shown some suggestive association for ARNTL in human bipolar samples [Mansour et al., 2006; Nievergelt et al., 2006; Shi et al., 2008]. ARNTL is upstream of DBP in the circadian clock intracellular molecular machinery, driving the transcription of DBP [Ripperger and Schibler, 2006; van der Veen et al., 2006]. An increase in ARNTL gene expression was reported in postmortem brains from bipolar subjects [Nakatani et al., 2006]. Seasonal affective disorder (SAD), a variant of bipolar disorder [Magnusson and Partonen, 2005], is tied to the amount of daylight, which is a primary regulator of circadian rhythms and clock gene expression; associations between polymorphisms in the clock genes ARNTL, PER2, and NPAS2 and SAD have previously been reported [Johansson et al., 2003; Partonen et al., 2007]. Overall, ARNTL, RORB, DBP and related circadian clock genes are compelling candidates for involvement in bipolar disorders, acting as rheostats as well as underlying the core clinical phenomenology of cycling and switching from depression to mania [Bunney and Bunney, 2000; Wager-Smith and Kay, 2000; Niculescu et al., 2000b; Niculescu and Kelsoe, 2001; Kelsoe and Niculescu, 2002; Lenox et al., 2002; Hasler et al., 2006; Wirz-Justice, 2006; McClung, 2007; Le-Niculescu et al., 2008].

Myelin basic protein (MBP) is involved in white matter build-up and connectivity processes [Harauz et al., 2009]. Based on human postmortem and blood biomarker work, as well as animal models (see Table III), it may be decreased in expression in the depressive phase of bipolar disorder, leading to a slowing of action potential transmission, potential disconnection between brain regions, and outward psychomotor retardation. The additional evidence provided by GWAS data indicates a genetic rather than purely environmental (medications, stress) basis for its alteration in disease, and its potential utility as a trait marker for increased vulnerability. MBP alterations have also been reported in other neuropsychiatric disorders such as schizophrenia [Parlapani et al., 2009], alcoholism [Lewohl et al., 2005], and multiple sclerosis [Zamvil et al., 1985], as well as in animal models of stress reactivity [Le-Niculescu et al., 2008]. Myelin-related genes may be a common if non-specific denominator of vulnerability to mental illness in response to stress [Le-Niculescu et al., 2008].

BDNF is a growth factor involved in neurotrophicity and synaptic transmission. Other growth factor top candidate genes identified by our analysis include NRG1 and PTN (Fig. 2 and Table III). BDNF has been previously implicated in a variety of neuropsychiatric disorders, by both animal model and human studies: depression [Pezawas et al., 2008; Sen et al., 2008], bipolar disorder [Ogden et al., 2004], anxiety, alcoholism [Rodd et al., 2007], and schizophrenia [Le-Niculescu et al., 2007a; Chao et al., 2008]. Notably, there are several candidate gene association studies to date implicating BDNF in bipolar disorder [Fan and Sklar, 2008; Liu et al., 2008].

Amyloid beta precursor protein (APP), an Alzheimer disease (AD) candidate gene, is among the top candidate genes for bipolar disorder (Table III). Another key gene involved in AD, GSK3b, is also present on our list of top candidate genes. Previous epidemiological literature has pointed to increased AD in bipolar patients, and the prophylactic effect of the mood stabilizer lithium on the incidence of AD in bipolar patients [Nunes et al., 2007]. Notably, GSK3b is a target of lithium treatment [Beaulieu et al., 2008a], as well as of serotonergic anti-depressants [Beaulieu et al., 2008b]. APP has recently been shown to have a neurotrophic role [Oh et al., 2008], similar to growth factors such as BDNF. APP has also been reported to be increased in expression in bipolar postmortem brains compared to normal controls [Jurata et al., 2004]. It remains to be seen if APP's role in AD is pathogenic or is in fact a defense/compensatory mechanism to try to maintain neuronal survival [Rohn et al., 2008]. The possibility that drugs that regulate APP levels may have an impact on mood (i.e., downregulation of APP may be depressogenic) needs to be explored, given the prevalence of depression in the elderly in general [Alexopoulos et al., 2005], and in AD patients in particular [Sun et al., 2008]. In any case, this is an intriguing example of potential genetic co-morbidity, overlap and interdependence between mood and cognition.

Genetic Risk Prediction

Once the genes involved in a disorder are identified, and prioritized for likelihood of involvement, then an obvious next step is developing a way of applying that knowledge to genetic testing of individuals to determine risk for the disorder. Based on our comprehensive identification of top candidate genes described above, we pursued a polygenic panel approach, with digitized binary scoring for presence or absence, similar to the one we have devised and employed in the past for biomarkers [Le-Niculescu et al., 2009a]. Somewhat similar approaches, looking however at larger panels of markers without CFG prioritization, were subsequently also described by other groups [Purcell et al., 2009].

We have chosen the best SNPs in our CFG prioritized genes by their P values in the GWAS datasets used, and assembled a genetic risk prediction (GRP) panel out of those SNPs (Table III). We then developed a genetic risk prediction score (GRPS) for bipolar disorder based on the presence or absence of the alleles of the SNPs associated with the illness, and tested the GRPS in an independent study (GAIN-BP) for which we had both genotypic and clinical data available, comparing the bipolar subjects to demographically matched normal controls (Fig. 3).

Figure 3.

The genetic risk prediction score (GRPS) for bipolar disorder differentiates between bipolar subjects and normal controls in an independent study, in two different ethnic groups. The GRPS is based on a panel of the best P-value SNPs (n = 216) from the top genes (n = 56) for bipolar disorder identified by CFG of four GWAS for bipolar disorder (see Table III). The GRPS shows statistically significant differences between subjects with bipolar disorder and normal controls, in European-American (EA) (a) as well as in African-American (AA) (b) subjects, in an independent GWAS study [GAIN-BP, Smith et al., 2009]. Out of 216 SNPs in our panel, 118 SNPs were genotyped in the GAIN-BP study. Gender analyses exhibited slight trends towards higher GRPS in males than females in both ethnicities, which did not reach statistical significance (c,d). Episode frequency: of note, the GRPS is able to differentiate between high episode frequency and low episode frequency bipolar subjects in EA (e), but not AA subjects (f). Episode frequency is a measure of clinical severity, tied to recurrence and to cycling between euthymia, depression and mania.

We demonstrate that in independent test cohorts, the GRPS differentiates between subjects with bipolar disorder and normal controls, in both European-American (EA) and African-American (AA) subjects (Fig. 3a,b). The GRPS also differentiates between high episode frequency and low episode frequency bipolar subjects in EA, but not AA subjects (Fig. 3e,f). Gender analyses exhibited slight trends towards higher GRPS in males than females in both ethnicities, that did not reach statistical significance (Fig. 3c,d). Lastly, we also describe a prototype of how such testing could be used at an individual rather than population level, to categorize individuals by risk and aid diagnostic and personalized medicine approaches (Fig. 4).

Figure 4.

Prototype of how GRPS testing could be used at an individual rather than population level, to aid diagnostic and personalized medicine approaches. We split the GAIN-BP samples from each ethnicity into a 2/3 cohort used for setting GRPS thresholds for bipolar and controls (a,c), and a 1/3 cohort used for testing the predictive value of these settings (b,d). The average GRPS score for bipolar subjects in the 2/3 cohort is used as a cut-off for bipolar in the test 1/3 cohort (i.e., being above that threshold), and the average GRPS score for controls in the 2/3 cohort is used as a cut-off for controls in the test 1/3 cohort (i.e., being below that threshold). The subjects who are in between these two thresholds are called undetermined. Furthermore, to stratify risk, we categorized subjects in the 1/3 testing cohort into Category 1 if they fall within one standard deviation above the bipolar threshold or within one standard deviation below the control threshold. Category 2 subjects are between one and two standard deviations from the thresholds, Category 3 between two and three standard deviations, and Category 4 are those who fall beyond three standard deviations of the threshold. The positive predictive value (PPV) of the tests increases in the higher categories, and the test is somewhat better at distinguishing controls (i.e., in a practical application, individuals who are at lower risk of developing the illness) than bipolars (i.e., in a practical application, individuals who are at higher risk of developing the illness).

Our results show that a relatively small size panel identified by CFG analysis can differentiate very well between bipolar disorder subjects and controls at a population level, although at an individual level the margin is razor thin (Figs. 3 and 4). On average, a bipolar subject differs from a control subject by about 2 alleles out of 236 tested. The latter point suggests that the cumulative combinatorics of common gene variants plays a major role in genetic risk for illness. Overall, our work sheds light on the genetic architecture and pathophysiology of bipolar disorder. In particular, it has implications for genetic testing to assess risk for illness. Our evaluation of the predictive value of the GRPS suggests some utility by itself at identifying risk for illness (Fig. 4). More likely, such genetic information will have to be combined with family history and other clinical information (phenomics) [Niculescu et al., 2006], as well as with blood biomarker testing [Le-Niculescu et al., 2009a], to provide a comprehensive picture of risk of illness [Niculescu, 2006; Niculescu et al., 2009].
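To put the "razor thin" margin in GRPS units, the arithmetic can be sketched as follows. This is an illustration under one assumption: that all 118 tested SNPs are genotyped in a given subject, so that the score is an average over 236 alleles.

```python
snps_tested = 118                   # panel SNPs genotyped in GAIN-BP
alleles_tested = 2 * snps_tested    # two alleles per SNP
allele_gap = 2                      # approximate average case-control difference quoted above

# GRPS is 100 times the fraction of illness-associated alleles,
# so a gap of k alleles corresponds to 100 * k / alleles_tested points.
grps_gap = 100 * allele_gap / alleles_tested
```

Under that assumption, a two-allele difference corresponds to less than one point on the 0 to 100 GRPS scale.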

Limitations and Confounds

No correction of best P values for number of SNPs tested/gene size effect was performed. While this is arguably a valid statistical issue for genetic studies by themselves, some of the multiple SNPs tested per gene could be in linkage disequilibrium, and the Bonferroni correction might be too conservative [Rice et al., 2008]. One would expect some noise due to gene size, as larger genes have more SNPs tested per gene. However, we did not observe a significant correlation between gene size and our top candidate gene prioritization using CFG (Supplementary Information—Fig. S2). That may be due to the fact that we are using this evidence for integration across platforms and modalities, along with a series of other lines of evidence that have their own attendant noise, as part of a Bayesian-like approach to pull signal from noise and prioritize findings. The convergence of lines of evidence arguably factors out the noise of the different individual approaches, and makes our network-like CFG approach relatively resilient to error even when one or another of the nodes (lines of evidence) is weak (Fig. 1).

Our approach relies on a list of genes from the GWAS datasets generated by SNPPER identifying SNPs in genes. We may thus be missing genes where the assignment is not made by the software, and discarding SNPs that fall into intergenic or regulatory regions, such as promoter or enhancer regions. Moreover, genes where the illness-associated SNPs do not lead to a change in expression levels are not included in our CFG-GWA cross-validation. Similarly, genes that have changes in expression levels but no intragenic SNP in the GWAS datasets are not included. Interestingly, some of these latter genes may be changed in expression as a consequence of distal regulatory SNPs or other genes in a network, an exciting area for future systems biology studies awaiting better bioinformatic tools and data analysis now on the horizon [Stumpf et al., 2008]. Our panel of genes prioritized by CFG is certainly not exhaustive; it is just an example of one approach. Some of the genes with strong published evidence of association, such as ANK3 [Ferreira et al., 2008a; Schulze et al., 2009] and CACNA1C [Ferreira et al., 2008a], are not prioritized by our approach, although related genes, ANK2 and CACNA1A, are (Table III). It may be that genes prioritized by P values in genetic studies alone are the result of a fit-to-cohort phenomenon, resulting in poor reproducibility and predictive value in independent cohorts [Niculescu and Le-Niculescu, 2010; Paynter et al., 2010]. The few of them that are reproduced across studies may be more of a common denominator in bipolar patients (who tend to be heterogeneous), somewhat like housekeeping genes, but not necessarily more biologically relevant or important. CFG arguably identifies and prioritizes genes that have functional evidence and hence are more likely to be biologically relevant [Niculescu and Le-Niculescu, 2010]. 
By being in essence a fit-to-disease approach, CFG also generates findings that are more reproducible and have predictive value in independent studies and cohorts, as we have demonstrated in our previous work on blood biomarkers [Kurian et al., 2009; Le-Niculescu et al., 2009a], and as we demonstrate in this current genetic work. That is the key litmus test, in our view.

Other animal models data could potentially be used for CFG cross-validation, in addition to the data from the pharmacogenomic (methamphetamine/valproate) [Ogden et al., 2004] and the genetic (DBP knock-out mouse) [Le-Niculescu et al., 2008] models that we generated and used. However, these are some of the best animal models with corresponding comprehensive brain and blood gene expression datasets published to date. Moreover, we relied, as an additional line of evidence, on an extensive public mouse QTL/transgenic database.

As new human blood, postmortem brain, and human genetic studies are published, new evidence will be available for some of the genes we have identified. However, any new evidence will likely not remove genes from our results, but rather move them up higher in the prioritization list/pyramid (Fig. 2).

Different ways of weighing the lines of evidence included in the CFG analysis, rather than the equal weight approach we have used, may become available in the future, based on more empirical and quantitative methods. Other ways of weighing the scores of the lines of evidence may give slightly different results in terms of prioritization, if not in terms of the list of top genes per se.

Pathways identified by Ingenuity may be based on some of the same body of knowledge and published literature used in our direct CFG scoring. However, it is reassuring to see that different independent systematization and curation efforts lead to a consistent picture of genes involved in behavior, neurological disease, psychological disorders, and nervous system development coming up at the top of the over-represented pathways from our top candidate genes for bipolar disorder identified by our genetic-genomic combined approach.

CONCLUSIONS AND FUTURE DIRECTIONS

First, in spite of these limitations, our analysis is arguably the most comprehensive integration of genetics and functional genomics to date in the field of bipolar disorder, yielding a series of candidate genes, blood biomarkers, pathways and mechanisms, that are prime targets for follow-up hypothesis driven studies. Such studies may include individual candidate gene association studies with more SNPs tested per gene, deep re-sequencing, and/or biological validation such as cell culture [Pletnikov et al., 2007] and transgenic animal work [Hikida et al., 2007; Le-Niculescu et al., 2008].

Second, our work provides additional integrated evidence focusing attention and prioritizing a number of genes as candidate blood biomarkers for bipolar disorder, with an inherited genetic basis (Table III). While prior evidence existed as to alterations in gene expression levels of those genes in whole-blood samples or lymphoblastoid cell lines (LCLs) from mood disorders patients, it was unclear prior to our analysis whether those alterations were truly related to the disorder or were instead related to medication effects and environmental factors.

Third, our work provides proof of how a combined approach, integrating functional and genotypic data, can be used for other complex disorders, psychiatric and non-psychiatric. What we are seeing across GWAS of complex disorders are not necessarily the same genes showing the strongest signal, but rather consistency at the level of gene families or biological pathways. The distance from genotype to phenotype may be a bridge too far for genetic-only approaches, given the intervening complex layers of epigenetics, gene expression regulation and endophenotypes [Tan et al., 2008]. Using GWAS data in conjunction with gene expression data as part of CFG or integrative genomics [Degnan et al., 2008] approaches, followed by pathway-level analysis of the prioritized candidate genes, can lead to the unraveling of the genetic code of complex disorders such as bipolar disorder.

Fourth, we have focused attention on key biological pathways in bipolar disorder, and used genetic epistatic testing to identify and prioritize molecular interactions inside those pathways. We believe that this intra-pathway epistasis testing approach (INPEP) may help with future work aimed at dissecting the molecular architecture of complex disorders.

Fifth, we have put together a panel of best P-value single nucleotide polymorphisms (SNPs), based on the top candidate genes we identified. Such a panel could be used for genetic testing for bipolar disorder. To that end, we have developed a genetic risk prediction score (GRPS) based on our panel, and demonstrate how, in independent cohorts, the GRPS differentiates between patients with bipolar disorder and normal controls. Based on the GRPS, we demonstrate a prototype of individual subject categorization for risk of illness. We anticipate that the GRPS approach will have utility for other complex disorders, psychiatric and non-psychiatric.
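
For readers who want a concrete picture of panel-based scoring and categorization, the following is a minimal illustrative sketch only, not the scoring procedure used in this study. It assumes an additive model in which each panel SNP contributes the number of copies (0, 1 or 2) of an allele designated as risk-associated, and a simple three-way categorization relative to reference cohort means. The SNP identifiers, risk alleles, threshold values and function names are all hypothetical.

```python
# Hypothetical panel: SNP identifier -> allele assumed to be risk-associated.
# These identifiers and alleles are placeholders, not markers from the study.
RISK_ALLELES = {"snp_a": "A", "snp_b": "G", "snp_c": "T"}


def risk_score(genotype, risk_alleles=RISK_ALLELES):
    """Average number of risk-allele copies (0, 1 or 2) per genotyped panel SNP.

    `genotype` maps a SNP identifier to a two-letter genotype such as "AG".
    Panel SNPs missing from `genotype` are skipped (an assumption).
    """
    counts = [
        genotype[snp].count(allele)
        for snp, allele in risk_alleles.items()
        if snp in genotype
    ]
    if not counts:
        raise ValueError("no panel SNPs were genotyped for this subject")
    return sum(counts) / len(counts)


def categorize(score, control_mean, case_mean):
    """Assumed three-way categorization relative to two reference means.

    Scores below the control mean are labeled "lower", scores above the
    case mean "higher", and everything in between "intermediate".
    """
    if control_mean > case_mean:
        raise ValueError("expected control_mean <= case_mean")
    if score < control_mean:
        return "lower"
    if score > case_mean:
        return "higher"
    return "intermediate"


if __name__ == "__main__":
    subject = {"snp_a": "AA", "snp_b": "AG", "snp_c": "CC"}
    score = risk_score(subject)
    # Reference means below are made-up numbers for illustration only.
    print(score, categorize(score, control_mean=0.9, case_mean=1.1))
```

In a sketch of this kind, the narrow gap between the two reference means is what makes the intermediate category matter: many subjects, cases and controls alike, would be expected to fall inside it.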

Lastly, while we cannot exclude that rare genetic variants with major effects may exist in some individuals and families, we propose a cumulative combinatorics of common variants genetic model for bipolar disorder based on our findings (Fig. 5), to account for the razor-thin genetic load margin between clinically ill subjects and normal controls, which leaves a major role to be played by the environment [Lahiri et al., 2009; Niculescu et al., 2009]. A stressful/hostile environment may lead to sub-threshold illness even in individuals with a normal genetic load, whereas a favorable environment may lead to supranormative functioning in certain life areas for individuals who carry a higher genetic risk. From a speculative standpoint, this proposed flexible interplay between genetic load, environment and phenotype may permit evolution to engender diversity, select and conserve alleles, ultimately shaping population groups.

Figure 5.

The genetic architecture of bipolar disorders: Cumulative combinatorics of common gene variants and environment (CC × CGV × E) model. We proposed in our previous work [Le-Niculescu et al., 2009b] that the repertoire of genes that may be involved directly or indirectly in bipolar disorders/mood regulation is large, up to 10% of the genes in the genome (complexity). Our current work suggests that different combinations of genes/alleles are found in different individuals (heterogeneity), and many alleles in these genes are shared between bipolars and controls (overlap). The environment may play a key interactive role in the trajectory from genetic risk to ultimate phenotype (interdependence), by modulating gene expression. For example, with the right environment, an individual with a higher genetic load (GRPS score) may become normal or even a high performer. This shuffling of the genetic deck of cards and the interaction with the environment provide a basis for Darwinian adaptation and evolution of mood, a key bodily function synchronizing energy metabolism, trophicity and activity to external and internal milieu conditions [Le-Niculescu et al., 2009a,b]. Circadian clock molecular mechanisms, involving ARNTL, RORB, DBP and other molecules, may be essential mediators [Niculescu et al., 2000b; Ogden et al., 2004; Takahashi et al., 2008; Le-Niculescu et al., 2008; Zhang et al., 2009]. Geometric symbols in the figure depict different genes/alleles.

From a pragmatic utility standpoint, we would like to suggest that genetic testing with highly prioritized panels of best markers will have, by itself, a rather modest role in informing clinical decisions regarding early intervention and prevention efforts, for example before the illness fully manifests itself clinically, in young offspring from high-risk families. After the illness manifests itself, biomarker and phenomic testing approaches, including clinical data, may have higher yield than genetic testing, and a multi-modal integration of testing modalities may be optimal, as individual markers are likely not to be specific for a single disorder. The continuing re-evaluation in psychiatric nosology [Niculescu et al., 2009; O'Donovan et al., 2009] brought about by recent advances will have to be taken into account as well for final interpretation of any such testing. Our emerging appreciation of the complexity, heterogeneity, overlap and interdependence of major psychiatric disorders as currently defined, and their building blocks/Lego-like nature [Le-Niculescu et al., 2007a], may make the development of tests for specific modular and dimensional disease manifestations (mood, psychosis, anxiety) [Niculescu et al., 2009] more useful and precise than those for broad diagnostic categories like bipolar disorder, schizophrenia or post-traumatic stress disorder.

Acknowledgements

This work was supported by funds from INGEN (Indiana Genomics Initiative of Indiana University), INBRAIN (Indiana Center for Biomarker Research In Neuropsychiatry), NARSAD Young Investigator Award and VA Merit Award to ABN, as well as NIMH R01 MH071912-01 to Ming Tsuang and ABN. ABN would like to thank Nicholas Schork for extensive discussions on genetic data analyses, Daniel Salomon, Sunil Kurian and Howard Edenberg for help and advice with microarray data analyses, Ming Tsuang and Steven Faraone for help and advice with translational studies, as well as Mariano Erpe, Joyti Gupta and Jesse Townes for their precise work with database maintenance and data analyses. Most importantly, we would like to thank our coworkers in the field whose painstaking work we have cited and integrated in our analyses, particularly the BiGS consortium (see Supplementary Information), as well as the subjects who participated in these studies, their families and their caregivers. Without their contribution, such work to advance the understanding of mental illness would not be possible. This work is, in essence, a field-wide collaboration.
