Genome-Wide Association Study Confirms SNPs in SNCA and the MAPT Region as Common Risk Factors for Parkinson Disease
*Corresponding author: Eden R. Martin, John P. Hussman Institute for Human Genomics, 1501 NW 10th Avenue, BRB 305 (M-860), Miami FL 33136. Tel: 305-243-2372; Fax: 305-243-1968; E-mail: Emartin1@med.miami.edu
Parkinson disease (PD) is a chronic neurodegenerative disorder with a cumulative prevalence of greater than one per thousand. To date three independent genome-wide association studies (GWAS) have investigated the genetic susceptibility to PD. These studies implicated several genes as PD risk loci with strong, but not genome-wide significant, associations.
In this study, we combined data from two previously published GWAS of Caucasian subjects with our GWAS of 604 cases and 619 controls for a joint analysis with a combined sample size of 1752 cases and 1745 controls. SNPs in SNCA (rs2736990, p-value = 6.7 × 10−8; genome-wide adjusted p = 0.0109, odds ratio (OR) = 1.29 [95% CI: 1.17–1.42] G vs. A allele, population attributable risk percent (PAR%) = 12%) and the MAPT region (rs11012, p-value = 5.6 × 10−8; genome-wide adjusted p = 0.0079, OR = 0.70 [95% CI: 0.62–0.79] T vs. C allele, PAR%= 8%) were genome-wide significant. No other SNPs were genome-wide significant in this analysis. This study confirms that SNCA and the MAPT region are major genes whose common variants are influencing risk of PD.
Parkinson disease [PD (OMIM 168600)] is a chronic neurodegenerative disorder with a cumulative prevalence of greater than one per thousand (Kuopio et al., 1999) with at least 1.5 million cases in the United States and 6 million worldwide (Thomas & Beal, 2007). Some genetic contributions to PD are well recognised. Mutations with high penetrance were initially identified in PD due to their easily detectable effects in relatively rare, early-onset, Mendelian forms of PD (Polymeropoulos et al., 1997; Kitada et al., 1998; Bonifati et al., 2003; Paisan-Ruiz et al., 2004; Zimprich et al., 2004). These known mutations explain less than 10% of PD cases (Lesage & Brice, 2009). Over the last several years significant effort has been focused on investigating the contributions of common variants to PD risk and age-at-onset. Candidate gene approaches to identify genetic associations with PD have been used to follow up family-based genome-wide linkage studies to identify genomic regions containing risk loci (reviewed by Lesage & Brice, 2009). This focused approach restricts the number of association tests performed, but is limited to identifying loci detectable by linkage analysis (Risch & Merikangas, 1996). More recently three genome-wide association studies (GWAS) have been conducted in PD, albeit with results that did not reach genome-wide significance (Maraganore et al., 2005; Fung et al., 2006a; Pankratz et al., 2009). While genes that achieve genome-wide significance in a complex disease GWAS are important, the presence of genetic heterogeneity, unobserved environmental interactions and sub-phenotypes with distinct genetic etiologies can all reduce the apparent contribution of important genes to below the threshold for genome-wide significance, requiring that large samples be assembled for adequate power.
Associations with PD have been replicated in the candidate gene and GWAS contexts, including those described early in PD association studies, such as alpha-synuclein (SNCA, Entrez Gene ID (EGI):6622) (Kruger et al., 1999; Farrer et al., 2001; Mueller et al., 2005; Maraganore et al., 2006; McCulloch et al., 2008; Myhre et al., 2008; Sutherland et al., 2009) and the microtubule-associated protein tau (MAPT, EGI:4137) inversion region on chromosome 17 (Martin et al., 2001; Scott et al., 2001; Zappia et al., 2003; Healy et al., 2004; Kwok et al., 2004; Levecque et al., 2004; Skipper et al., 2004; Mamah et al., 2005; Fidani et al., 2006; Fung et al., 2006b; Goris et al., 2007; Vandrovcova et al., 2007; Winkler et al., 2007; Zabetian et al., 2007), as well as ubiquitin-specific protease 24 (USP24, EGI:23358) (Oliveira et al., 2005; Li et al., 2006; Haugarvoll et al., 2009), ELAV-like 4 (ELAVL4, EGI:1996) (Noureddine et al., 2005; Haugarvoll et al., 2007; DeStefano et al., 2008), monoamine oxidase B (MAOB, EGI:4129) (Kurth et al., 1993), apolipoprotein E (APOE, EGI:348) (Rubinsztein et al., 1994), and the mitochondrial haplogroups (Kosel et al., 1998; Ross et al., 2003; van der Walt et al., 2003; Autere et al., 2004; Ghezzi et al., 2005; Huerta et al., 2005; Pyle et al., 2005; Gaweda-Walerych et al., 2008). The consistency of results, particularly for SNCA and MAPT, suggests that the failure to reach genome-wide significance in previous studies is due to the relatively small GWAS datasets.
We have conducted a GWAS of PD at the Hussman Institute for Human Genomics at the University of Miami (HIHG) in a sample of 604 unrelated cases and 619 unrelated controls using 491,376 autosomal and sex chromosome SNPs. To increase the power of the analysis we included data from two of the three previous PD GWAS: the National Institute of Neurological Disorders and Stroke (NINDS) study (Fung et al., 2006a) and the joint dataset from the Progeni/GenePD studies that was genotyped at the Center for Inherited Disease Research (CIDR) (Pankratz et al., 2009). We excluded the Mayo Clinic GWAS, which used discordant sibling pairs as subjects for analysis (Maraganore et al., 2005). This provided a joint analysis dataset with a combined sample size of 1752 cases and 1745 controls with genotypes at 495,715 SNPs after imputation and sample and SNP quality control procedures.
We demonstrate in our Caucasian-based population that the SNCA and MAPT regions are the strongest genetic contributors to PD risk, reaching genome-wide significance and establishing these factors without controversy. In addition, several genes replicated in all three datasets, but with less stringent significance. Although they did not achieve genome-wide significance in the joint analysis, the consistency of their effects makes them strong candidates and may provide additional insight into the pathological mechanisms of PD.
Materials and Methods
Samples in the HIHG GWAS include individuals with PD collected by one of 13 ascertainment centers in the PD Genetics Collaboration (Scott et al., 2001) or by the Morris K. Udall Parkinson Disease Center of Excellence (J.M. Vance, PI) ascertainment core. These participants were recruited by participating movement disorder and neurology clinics, referrals, and advertisements. Unaffected spouse and friend controls were recruited when available and willing to participate. All participants provided written informed consent, in accord with protocols established by institutional review boards at each centre.
All individuals with PD were examined by a board-certified neurologist. A neurological exam and standard clinical evaluation were performed on all participants with PD. Affected individuals exhibited at least two cardinal symptoms of PD, e.g. bradykinesia, resting tremor, and rigidity, and no other causes of parkinsonism or atypical clinical features. Unaffected individuals had no symptoms of PD upon physical examination and self-reported symptom questionnaire (Rocca et al., 1998). Individuals were excluded if there was a history of encephalitis, neuroleptic therapy within one year before diagnosis, evidence of normal pressure hydrocephalus, or a clinical course with unusual features suggesting atypical or secondary parkinsonism. Additionally, a blood sample, family history, medical history, and standard cognitive test (Blessed Orientation Memory Concentration (BOMC) test (Katzman et al., 1983) or Modified Mini Mental Status exam (3MS) (Folstein et al., 1975)) were obtained for each individual. To ensure diagnostic consistency across sites, clinical data for all participants were reviewed by a panel consisting of a board-certified neurologist with fellowship training in movement disorders, a board-certified neurologist and medical geneticist, and a certified physician assistant.
Genotypes for 635 PD cases and 255 PD controls were generated using the Illumina Infinium 610-quad BeadChip (Illumina, San Diego, CA, USA) and the Illumina Infinium II assay protocol (Gunderson et al., 2005). Additionally, we included 223 cognitively-normal controls with no PD symptoms by self-reported symptom questionnaire (Rocca et al., 1998) from a previous GWAS (Beecham et al., 2009) of late-onset Alzheimer disease (LOAD) genotyped using the Illumina HumanHap 550 BeadChip, and another 164 cognitively-normal controls from a second LOAD study with no self-reported PD symptoms by questionnaire (Rocca et al., 1998) genotyped using the 1M-Duo Infinium HD BeadChip. Genotypes were determined using Illumina BeadStudio Genotyping Module version 3.2.33; samples with 99% genotyping efficiency were used to redefine genotype clusters, per the manufacturer's recommendation. Concordance of genotype calls for two CEPH samples with six replicates each was 99.98%.
For the HIHG data, samples with genotyping efficiency of greater than 98% were included in subsequent quality control (QC) and statistical analysis steps. One case and seven control samples were removed for low efficiency. Population stratification was assessed using Structure (Pritchard et al., 2000; Falush et al., 2003) and Eigenstrat (Price et al., 2006). For Structure analysis, 5000 independent autosomal SNPs with minor allele frequency (MAF) > 0.25 were chosen using PLINK (Purcell et al., 2007) with an r2 threshold of 0.2, and Structure was run with 10,000 iterations of burn-in and 15,000 iterations of estimation. These analyses indicated that no stratification was present in our sample (Fig. S1a). For Eigenstrat analysis, 30,000 independent autosomal SNPs with MAF > 0.25 were used to generate plots of principal component loadings for samples and to remove outliers (6 cases, 7 controls) using the top ten principal components over five iterations with a threshold of six standard deviations (Fig. S1b).
Further quality control steps for the HIHG samples included checks for duplicated and related samples using mean identity by state (22 cases were removed as duplicate samples). Two cases and nine controls were identified for excess or insufficient heterozygosity at sex chromosomes and were removed. Finally, SNPs with MAF < 0.01 in the combined case-control dataset were removed (21,976 SNPs), as were SNPs with significant deviation from Hardy-Weinberg equilibrium (HWE) in controls (78 SNPs with p < 1.00 × 10−7) and SNPs with a differential missing rate between cases and controls (29 SNPs with p < 1 × 10−5).
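The SNP-level MAF and HWE filters described above can be illustrated with a minimal Python sketch. It is not the pipeline used in the study: the function names are hypothetical, genotype counts are assumed to be available as (AA, AB, BB) tuples, and a 1-df chi-square approximation stands in for the exact HWE test, so p-values for rare variants will differ from those of an exact test.

```python
import math

def minor_allele_freq(n_aa, n_ab, n_bb):
    """Minor allele frequency from genotype counts (AA, AB, BB)."""
    n_alleles = 2 * (n_aa + n_ab + n_bb)
    freq_a = (2 * n_aa + n_ab) / n_alleles
    return min(freq_a, 1.0 - freq_a)

def hwe_chisq_pvalue(n_aa, n_ab, n_bb):
    """Chi-square (1 df) approximation to the HWE test; NOT the exact test."""
    n = n_aa + n_ab + n_bb
    p = (2 * n_aa + n_ab) / (2 * n)
    q = 1.0 - p
    expected = (n * p * p, 2 * n * p * q, n * q * q)
    if min(expected) == 0:
        return 1.0  # monomorphic SNP: nothing to test
    observed = (n_aa, n_ab, n_bb)
    chi2 = sum((o - e) ** 2 / e for o, e in zip(observed, expected))
    # Upper tail of a chi-square distribution with 1 df.
    return math.erfc(math.sqrt(chi2 / 2.0))

def passes_qc(case_counts, control_counts, maf_min=0.01, hwe_alpha=1e-7):
    """MAF is checked in cases plus controls, HWE in controls only."""
    combined = tuple(a + b for a, b in zip(case_counts, control_counts))
    if minor_allele_freq(*combined) < maf_min:
        return False
    return hwe_chisq_pvalue(*control_counts) >= hwe_alpha
```

The default thresholds mirror those stated in the text (MAF < 0.01 removed; HWE p < 1 × 10−7 in controls removed); the differential-missingness filter is not covered by this sketch.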
Since controls from the LOAD GWAS dataset were genotyped on different Illumina chips, we examined all controls for homogeneity of genotype frequencies. This filter is used to detect where genotyping error rates differ across studies, and is a critical QC step when multiple genotyping experiments are combined into a single data set for analysis. A 4 degree of freedom (df) Fisher's exact test with 10,000 permutations was used at all SNPs to test frequency differences for genotypes generated from the three genotyping chips in the study. SNPs were removed if the p-value for this test was less than 0.001 (546 SNPs). After all QC procedures, 604 cases and 619 controls with 491,376 SNPs were available for association analysis in the HIHG GWAS dataset. The sex and age distributions for these samples are described in Table 1.
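To make the cross-platform homogeneity filter concrete, the following Python sketch runs a Monte Carlo permutation test on a platform-by-genotype table for one SNP. It is an illustration under stated assumptions rather than the procedure used in the study: a Pearson chi-square statistic is substituted for the Fisher's exact statistic, and the function names, the integer coding of platforms and genotypes, and the fixed random seed are all hypothetical choices.

```python
import random

def chi_square_stat(labels, genotypes, n_groups, n_genotypes=3):
    """Pearson chi-square statistic for a platform-by-genotype table."""
    table = [[0] * n_genotypes for _ in range(n_groups)]
    for platform, genotype in zip(labels, genotypes):
        table[platform][genotype] += 1
    n = len(genotypes)
    row = [sum(r) for r in table]
    col = [sum(table[i][j] for i in range(n_groups)) for j in range(n_genotypes)]
    stat = 0.0
    for i in range(n_groups):
        for j in range(n_genotypes):
            expected = row[i] * col[j] / n
            if expected > 0:
                stat += (table[i][j] - expected) ** 2 / expected
    return stat

def homogeneity_pvalue(labels, genotypes, n_groups, n_perm=10000, seed=0):
    """Permutation p-value: shuffle platform labels, keep genotypes fixed."""
    rng = random.Random(seed)
    observed = chi_square_stat(labels, genotypes, n_groups)
    shuffled = list(labels)
    hits = 0
    for _ in range(n_perm):
        rng.shuffle(shuffled)
        if chi_square_stat(shuffled, genotypes, n_groups) >= observed - 1e-12:
            hits += 1
    # Add-one correction keeps the estimate away from exactly zero.
    return (hits + 1) / (n_perm + 1)
```

With three platforms and three genotype classes the table has (3 − 1) × (3 − 1) = 4 degrees of freedom, matching the 4 df test described above; a SNP would be dropped when the returned p-value falls below 0.001.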
Table 1. Demographic properties of the HIHG, CIDR and NINDS samples.
Data were downloaded from dbGAP (http://www.ncbi.nlm.nih.gov/sites/entrez?db=gap) for the CIDR (Pankratz et al., 2009) and NINDS (Fung et al., 2006a) PD GWAS. Imputation of SNP genotypes from these and the HIHG data was performed independently using the software package Impute (Marchini et al., 2007). Samples were removed from these GWAS for genotyping efficiency < 0.98 (CIDR: 9 cases, 4 controls removed; NINDS: 11 cases, 9 controls removed). SNP data for these GWAS were removed for genotyping efficiency < 0.98 (CIDR: 7,886 SNPs removed; NINDS: 16,886 SNPs removed), MAF < 0.01 (CIDR: 7,676 SNPs removed; NINDS: 4,596 SNPs removed), HWE p < 10−7 (CIDR: 790 SNPs removed; NINDS: 67 SNPs removed), and differential missingness by disease status p < 10−5 (CIDR: 29 SNPs removed). Eigenstrat and Structure were used to ensure that the merged dataset did not contain stratified samples (Figs. S2a, b). The Fisher's exact test for homogeneity described above was run in controls for all 5 genotyping platforms using an 8 df test, removing SNPs for p < 0.001. Imputation was performed on each dataset independently after QC filters had been applied. The remaining high-quality samples and genotypes from each study were used to impute SNPs based on the HapMap reference panel. Individual genotypes with probability of 90% or greater were included. Imputed SNPs with greater than 5% missing genotypes were excluded from analysis. Upon merging the imputed GWAS files in PLINK, HWE, MAF, SNP genotyping efficiency, and tests for differential missing rate by phenotype filters were applied as described above. The final merged dataset for analysis contained 1752 cases and 1745 controls with genotypes at 495,715 genotyped and imputed SNPs.
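The two imputation thresholds (genotype probability of at least 90%; at most 5% missing genotypes per SNP) can be sketched as follows. This is an illustrative Python fragment, not the Impute or PLINK workflow itself; the function names and the representation of each genotype as a triple of posterior probabilities for 0, 1 and 2 copies of an allele are assumptions.

```python
def call_genotype(probs, threshold=0.9):
    """Return 0, 1 or 2 if the best posterior meets the threshold, else None."""
    best = max(range(3), key=lambda g: probs[g])
    return best if probs[best] >= threshold else None

def keep_imputed_snp(prob_rows, threshold=0.9, max_missing=0.05):
    """Hard-call one SNP across samples and apply the missingness filter.

    prob_rows: one (P(0), P(1), P(2)) triple per sample.
    Returns (keep, calls); the SNP is dropped when more than
    max_missing of the samples have no confident call.
    """
    calls = [call_genotype(p, threshold) for p in prob_rows]
    missing = sum(c is None for c in calls) / len(calls)
    return missing <= max_missing, calls
```

Genotypes that fail the probability threshold are treated as missing rather than assigned their most likely value, which is what makes the per-SNP missingness filter meaningful for imputed data.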
Association analysis of the genotype data was conducted with PLINK (Purcell et al., 2007). Cochran-Armitage (Armitage, 1955) trend tests were calculated at each SNP to assess allelic association. Additional analyses evaluating dominant, recessive and genotypic exposures were performed using logistic regression in PLINK. To avoid over-correcting for multiple comparisons by using the conservative Bonferroni correction, we used PLINK to generate empirically adjusted p-values based on 10,000 permutations to adjust for multiple tests (Purcell et al., 2007). Additional logistic regression analysis fitting covariates for age of onset/age at exam, sex or history of smoking was conducted to assess confounding by these variables at all SNPs. If there was an indication that an individual had reported any history of smoking, they were scored as a smoker, a non-smoker if they reported not smoking, and missing otherwise.
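As an illustration of the trend test named above, the following Python sketch computes the Cochran-Armitage statistic for a single SNP from genotype counts, using additive weights (0, 1, 2) and an asymptotic 1-df chi-square p-value. The function name and the input layout are hypothetical, and the sketch does not reproduce PLINK's implementation or the permutation-based adjustment for multiple testing.

```python
import math

def armitage_trend_test(cases, controls, weights=(0, 1, 2)):
    """Cochran-Armitage trend test for one SNP.

    cases, controls: genotype counts ordered by number of copies
    of the tested allele (0, 1, 2).
    Returns (chi-square statistic with 1 df, asymptotic p-value).
    """
    n = [r + s for r, s in zip(cases, controls)]
    total = sum(n)
    n_cases = sum(cases)
    n_controls = sum(controls)
    sum_wr = sum(w * r for w, r in zip(weights, cases))
    sum_wn = sum(w * k for w, k in zip(weights, n))
    sum_w2n = sum(w * w * k for w, k in zip(weights, n))
    numerator = total * (total * sum_wr - n_cases * sum_wn) ** 2
    denominator = n_cases * n_controls * (total * sum_w2n - sum_wn ** 2)
    if denominator == 0:
        return 0.0, 1.0  # no variation in genotype or in phenotype
    chi2 = numerator / denominator
    # Upper tail of a chi-square distribution with 1 df.
    return chi2, math.erfc(math.sqrt(chi2 / 2.0))
```

For example, with case counts (10, 20, 30) and control counts (30, 20, 10) the statistic is 20.0. Unlike a test on allele counts, the trend test does not assume Hardy-Weinberg equilibrium in the combined sample.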
The population proportional attributable risk (PAR%) was calculated using the formula for retrospective studies, [(SNP allele frequency)×(OR-1)]/[1+(SNP allele frequency)×(OR-1)], using the SNP allele frequency in controls as the exposure frequency, and the odds ratio (OR) in place of the relative risk (Woodward, 2005).
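The formula above is simple enough to state directly in code. In this short Python sketch the function name is hypothetical and the input values in the usage note are illustrative only; they are not allele frequencies or odds ratios reported in this study.

```python
def par_percent(allele_freq, odds_ratio):
    """Population attributable risk percent for a risk allele.

    allele_freq: frequency of the allele in controls (exposure frequency).
    odds_ratio: odds ratio used in place of the relative risk.
    """
    excess = allele_freq * (odds_ratio - 1.0)
    return 100.0 * excess / (1.0 + excess)
```

With an illustrative control allele frequency of 0.30 and an OR of 1.5, `par_percent(0.30, 1.5)` gives approximately 13.0%. The formula is intended for alleles with OR > 1; for an allele with OR < 1 it returns a negative value, so the complementary allele and the reciprocal OR would be the natural inputs.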
Assessment of associated SNPs and the presence of chromosome 17q21.31 alleles in the H1-H2 haplotype clades in MAPT was accomplished using rs1981997 as a haplotype tag SNP, because the major (G) and minor (A) alleles of this SNP are fixed in the H1 and H2 clades respectively (Stefansson et al., 2005). A 2-locus haplotype association analysis was performed using PLINK with SNPs in the MAPT region to determine which alleles were on the H1 haplotype, which has previously been associated with PD (Golbe et al., 2001; Maraganore et al., 2001; Martin et al., 2001; Farrer et al., 2002; Skipper et al., 2004; Zabetian et al., 2007; Refenes et al., 2009).
In addition to the statistical analysis of the joint sample, as a final check against cryptic bias arising from heterogeneity in genotyping error rates across studies, we evaluated association with meta-analysis techniques using METAL (Abecasis & Willer, 2007).
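One weighting scheme that METAL offers combines per-study Z-scores weighted by the square root of the sample size. The Python sketch below illustrates that scheme in isolation; the text does not state which METAL scheme or options were used in this study, so the choice of scheme, the function name and the input layout here are assumptions.

```python
from math import sqrt
from statistics import NormalDist

_NORMAL = NormalDist()

def weighted_z_meta(studies):
    """Sample-size-weighted Z-score meta-analysis for one SNP.

    studies: iterable of (two_sided_p, effect_sign, sample_size), where
    effect_sign is +1 or -1 relative to a common reference allele.
    Returns (combined Z, two-sided combined p-value).
    """
    numerator = 0.0
    weight_sq_sum = 0.0
    for p_value, sign, n in studies:
        # Convert the two-sided p-value into a signed Z-score.
        z = _NORMAL.inv_cdf(1.0 - p_value / 2.0) * (1 if sign >= 0 else -1)
        weight = sqrt(n)
        numerator += weight * z
        weight_sq_sum += weight * weight
    z_meta = numerator / sqrt(weight_sq_sum)
    p_meta = 2.0 * _NORMAL.cdf(-abs(z_meta))
    return z_meta, p_meta
```

Because each study contributes only a p-value, a direction and a sample size, differences in genotyping error rates between studies cannot create spurious case-control differences, which is the rationale for using a meta-analysis as a check on the pooled analysis. The conversion through `1.0 - p_value / 2.0` loses precision for extremely small p-values, so a production implementation would work from test statistics directly.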
Results
In the HIHG sample no SNPs were statistically significantly associated with PD at the genome-wide level using permutation tests with a multiple testing corrected threshold of p < 0.05 (Table 2). SNCA showed the strongest association in the HIHG study (SNCA intron, rs356220 p = 2.7 × 10−6, OR = 1.48; 95% CI [1.25–1.74]). Other top associations included SNPs in the chromosome 1p22 gene chloride channel accessory 4 (CLCA4, EGI:22802, intron, rs1543467 p = 3.0 × 10−6, OR = 1.51; 95% CI [1.27–1.80]), the chromosome 3q26 gene neuroligin 1 (NLGN1, EGI:22871, intron, rs976683 p = 1 × 10−5, OR = 1.49; 95% CI [1.25–1.78]) and the chromosome 20p11 gene solute carrier family 24, member 3 (SLC24A3, EGI:57419, intronic SNPs, rs1406968 p = 1.2 × 10−5, OR = 1.47; 95% CI [1.24–1.75]; rs4813368 p = 1.2 × 10−5, OR = 1.48; 95% CI [1.24–1.76]; r2 for rs1406968:rs4813368 = 0.94). Neither sex nor age-at-onset (age-at-exam in controls) was a confounder for the top associations in the HIHG sample.
Table 2. Trend test results for association with PD in the HIHG GWAS sample, with exact Hardy-Weinberg test results, for all SNPs associated with PD at p < 5 × 10−5.
| SNP | Trend test p-value | HWE exact test p-value | Chromosome band | Gene |
|---|---|---|---|---|
| rs356220 | 2.67 × 10−6 | 0.930 | 4q22.1 | SNCA |
| rs1543467 | 2.97 × 10−6 | 0.117 | 1p22.3 | CLCA4 |
| rs12063142 | 5.02 × 10−6 | 0.468 | 1p36.13 | – |
| rs9513249 | 5.92 × 10−6 | 0.011 | 13q32.1 | – |
| rs12870589 | 9.45 × 10−6 | 0.002 | 13q32.2 | – |
| rs976683 | 1.04 × 10−5 | 0.529 | 3q26.31 | NLGN1 |
| rs1816879 | 1.16 × 10−5 | 0.784 | 15q22.1 | – |
| rs1406968 | 1.16 × 10−5 | 0.705 | 20p11.23 | SLC24A3 |
| rs4813368 | 1.22 × 10−5 | 0.701 | 20p11.23 | SLC24A3 |
| rs7646773 | 1.30 × 10−5 | 0.865 | 3p11.2 | – |
| rs1992695 | 1.34 × 10−5 | 0.422 | 4q28.3 | – |
| rs7322222 | 1.35 × 10−5 | 0.061 | 13q32.1 | – |
| rs10851073 | 1.44 × 10−5 | 0.001 | 13q32.2 | – |
| rs4238458 | 1.52 × 10−5 | 0.407 | 15q24.1 | – |
| rs11625012 | 1.61 × 10−5 | 0.507 | 14q22.1 | – |
| rs7698161 | 1.76 × 10−5 | 0.738 | 4q28.3 | – |
| rs7496513 | 1.85 × 10−5 | 0.402 | 15q24.1 | – |
| rs135066 | 1.91 × 10−5 | 1.000 | 22q13.2 | MPPED1 |
| rs13032621 | 1.95 × 10−5 | 1.000 | 2q37.2 | – |
| rs9457743 | 2.03 × 10−5 | 0.931 | 6q25.3 | – |
| rs1159278 | 2.21 × 10−5 | 0.025 | 13q32.1 | RAP2A |
| rs12142266 | 2.31 × 10−5 | 0.406 | 1p22.3 | – |
| rs358079 | 2.99 × 10−5 | 0.794 | 3p14.3 | CACNA2D3 |
| rs13411180 | 3.09 × 10−5 | 0.832 | 2q31.2 | ZNF533 |
| rs13009601 | 3.13 × 10−5 | 0.848 | 2q31.2 | ZNF533 |
| rs6930229 | 3.14 × 10−5 | 1.000 | 6q25.3 | – |
| rs1515274 | 3.39 × 10−5 | 0.850 | 2q31.2 | ZNF533 |
| rs13157 | 3.52 × 10−5 | 1.000 | 1p36.11 | RUNX3 |
| rs929708 | 3.70 × 10−5 | 0.378 | 3p25.3 | ATP2B2 |
| rs10882088 | 3.89 × 10−5 | 0.612 | 10q23.33 | KIF11 |
| rs11833635 | 4.09 × 10−5 | 0.743 | 12q15 | PTPRR |
| rs6798732 | 4.51 × 10−5 | 0.736 | 3p11.2 | – |
| rs4777585 | 4.57 × 10−5 | 0.496 | 15q24.1 | NEO1 |
| rs12938031 | 4.61 × 10−5 | 1.000 | 17q21.31 | CRHR1 |
| rs6598020 | 4.63 × 10−5 | 0.270 | 11p15.5 | TMEM16J |
| rs4927602 | 4.79 × 10−5 | 0.936 | 2p25.3 | SNTG2 |
Two SNPs were statistically significant after empirical adjustment for multiple comparisons (permutation p < 0.05) in the joint analysis of the HIHG, CIDR and NINDS samples (Table 3, Table S1); notably these SNPs were genotyped, not imputed, in all three datasets. These SNPs are in the genes pleckstrin homology domain-containing protein, family M, member 1 (PLEKHM1, EGI:9842, promoter, rs11012 p = 5.6 × 10−8, OR = 0.71; 95% CI [0.62–0.79]), and alpha-synuclein (SNCA intron, rs2736990 p = 6.7 × 10−8, OR = 1.30; 95% CI [1.18–1.43]). In addition to rs2736990 in SNCA, there were two SNPs with 9 × 10−5 > p-value > 1 × 10−5 (rs11931074, rs356220), two SNPs with 9 × 10−4 > p-value > 1 × 10−4 (rs365188, rs1866995), two with 9 × 10−3 > p-value > 1 × 10−3 (rs2583985, rs3775439) as well as three more SNPs with p < 0.05 (rs3857059, rs12502363, rs3822095), out of 15 total SNCA SNPs. The SNP rs356220, which was the most associated SNP from the HIHG study, had a p-value of 9.7 × 10−5 in the joint sample. The LD in SNCA is weaker than that at chromosome 17q21.31, due to the large inversion in the 17q region which inhibits recombination. The 3’ terminus of PLEKHM1 is 400 kilobases from the 3’ terminus of MAPT on chromosome 17, and rs11012 is in linkage disequilibrium (LD; mean r2= 0.75; Fig. S3) with SNPs in MAPT. Strong association (p < 1 × 10−5) with PD spans the entire region around MAPT, including SNPs in intramembrane protease 5 (IMP5 exonic SNPs, rs12373139 p = 2.9 × 10−6, OR = 0.75; 95% CI [0.67–0.85]; rs12185268 p = 3.6 × 10−6 OR = 0.76; 95% CI [0.67–0.85]), MAPT intron (rs8070723 p = 4.5 × 10−6, OR = 0.76; 95% CI [0.67–0.85]) and chromosome 17 open reading frame 69 (C17orf69 exon:Y132C, rs393152 p = 3.5 × 10−6, OR = 0.76; 95% CI [0.67–0.85]). 
The smallest-to-largest ranking of the p-values for association with these SNPs and PD varied greatly within the three datasets (rs11012 ranked 290 in HIHG, 11 in CIDR and 78,027 in NINDS; rs2736990 ranked 38 in HIHG, 440 in CIDR and 41,827 in NINDS), illustrating that real associations might be deep within the ranked associations in underpowered GWAS, and that increased sample size will help true associations rise to the top. Neither sex nor age-at-onset (age-at-exam in controls) was a confounder for the top associations in the joint analysis. Additionally, no SNP was significant after correction for multiple tests in the dominant, recessive or genotypic analyses.
Table 3. Results for SNPs with p < 1 × 10−5 from the joint analysis with imputation of the HIHG, CIDR and NINDS GWAS data.
| SNP | Chromosome band | Gene | Genotyped (G) or imputed (I) | HIHG p-value | CIDR p-value | NINDS p-value | Joint p-value | Genome-wide adjusted p-value |
|---|---|---|---|---|---|---|---|---|
| rs11012 | 17q21.31 | PLEKHM1 | GGG | 6.58 × 10−4 | 4.40 × 10−5 | 0.188 | 5.65 × 10−8 | 0.011 |
| rs2736990 | 4q22.1 | SNCA | GGG | 5.08 × 10−5 | 8.67 × 10−4 | 0.102 | 6.74 × 10−8 | 0.014 |
| rs12063142 | 1p36.13 | TAS1R2 | GGG | 6.41 × 10−6 | 0.032 | 0.033 | 4.83 × 10−7 | 0.159 |
| rs4837628 | 9q33.1 | DBC1 | GGG | 4.83 × 10−3 | 5.66 × 10−5 | 0.984 | 1.03 × 10−6 | 0.281 |
| rs11248060 | 4p16.3 | DGKQ | GGG | 0.244 | 2.05 × 10−5 | 0.005 | 2.26 × 10−6 | 0.509 |
| rs1981997 | 17q21.31 | MAPT | GGG | 4.96 × 10−3 | 1.37 × 10−4 | 0.733 | 2.45 × 10−6 | 0.539 |
| rs12373139 | 17q21.31 | IMP5 | GGI | 4.96 × 10−3 | 1.92 × 10−4 | 0.349 | 2.85 × 10−6 | 0.602 |
| rs10464059 | 5q35.3 | – | III | 1.86 × 10−4 | 4.78 × 10−4 | 0.939 | 2.91 × 10−6 | 0.610 |
| rs13355682 | 5q35.3 | – | III | 1.86 × 10−4 | 4.78 × 10−4 | 0.888 | 3.16 × 10−6 | 0.640 |
| rs393152 | 17q21.31 | C17orf69 | GGG | 5.86 × 10−3 | 2.58 × 10−4 | 0.281 | 3.46 × 10−6 | 0.674 |
| rs12185268 | 17q21.31 | IMP5 | GGG | 4.96 × 10−3 | 2.52 × 10−4 | 0.349 | 3.62 × 10−6 | 0.689 |
| rs8070723 | 17q21.31 | MAPT | GGG | 5.86 × 10−3 | 2.30 × 10−4 | 0.392 | 4.49 × 10−6 | 0.723 |
| rs611199 | 9q33.1 | DBC1 | GGG | 5.43 × 10−3 | 9.61 × 10−4 | 0.212 | 7.65 × 10−6 | 0.908 |
| rs17080196 | 5q35.3 | GFPT2 | III | 8.35 × 10−4 | 1.52 × 10−3 | 0.633 | 7.73 × 10−6 | 0.911 |
| rs1635291 | 17q21.31 | – | GGG | 7.27 × 10−3 | 1.89 × 10−4 | 0.517 | 7.76 × 10−6 | 0.912 |
| rs7703402 | 5q35.3 | GFPT2 | III | 1.57 × 10−4 | 1.06 × 10−3 | 0.192 | 8.41 × 10−6 | 0.936 |
| rs6864729 | 5q35.3 | GFPT2 | III | 8.35 × 10−4 | 1.52 × 10−3 | 0.696 | 8.87 × 10−6 | 0.945 |
| rs2303012 | 5q35.3 | GFPT2 | III | 9.09 × 10−4 | 1.52 × 10−3 | 0.696 | 9.49 × 10−6 | 0.951 |
| rs974002 | 14q13.1 | NPAS3 | GGG | 6.29 × 10−3 | 7.37 × 10−3 | 0.017 | 9.85 × 10−6 | 0.954 |
Haplotype analysis with associated chromosome 17q21.31 SNPs and the H1-H2 haplotype clade tag SNP rs1981997 revealed high D’ and r2 values for all SNPs in the region, and alleles at those SNPs with positive effect sizes were in strong LD with the G allele of rs1981997, indicating their presence in the H1 haplotype clade (Table S3, Fig. S3).
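The two LD measures reported here, D' and r2, answer different questions, which a small Python sketch can make explicit. It assumes phased haplotype counts for two biallelic SNPs are already available (in practice haplotype frequencies are estimated from unphased genotypes, for example by PLINK); the function name and argument layout are hypothetical.

```python
def ld_from_haplotypes(n11, n12, n21, n22):
    """D' and r^2 between two biallelic SNPs from phased haplotype counts.

    n11: haplotypes carrying allele 1 at SNP A and allele 1 at SNP B
    n12: allele 1 at SNP A, allele 2 at SNP B
    n21: allele 2 at SNP A, allele 1 at SNP B
    n22: allele 2 at both SNPs
    """
    total = n11 + n12 + n21 + n22
    p_a = (n11 + n12) / total  # frequency of allele 1 at SNP A
    p_b = (n11 + n21) / total  # frequency of allele 1 at SNP B
    d = n11 / total - p_a * p_b
    denom = p_a * (1 - p_a) * p_b * (1 - p_b)
    if denom == 0:
        raise ValueError("both SNPs must be polymorphic")
    r2 = d * d / denom
    # D is scaled by its maximum possible magnitude given the allele frequencies.
    if d >= 0:
        d_max = min(p_a * (1 - p_b), (1 - p_a) * p_b)
    else:
        d_max = min(p_a * p_b, (1 - p_a) * (1 - p_b))
    return abs(d) / d_max, r2
```

For haplotype counts (60, 0, 20, 20), D' is 1 while r2 is only 0.375: one of the four possible haplotypes is absent, yet the allele frequencies at the two SNPs differ, so one SNP is an imperfect proxy for the other. A tag SNP such as rs1981997 is most informative for SNPs where both measures are high.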
Additional strong association signals in the joint data were observed in SNPs in the chromosome 9q33 gene deleted in bladder cancer 1 (DBC1, EGI:1620, intron, rs4837628 p = 1.07 × 10−6, OR = 0.79; 95% CI [0.72–0.87]), the chromosome 14q13 gene neuronal PAS domain protein 3 (NPAS3, EGI:64067, intron, rs974002 p = 9.9 × 10−6, OR = 1.32; 95% CI [1.17–1.49]), and in imputed SNPs in the chromosome 5q35 gene glutamine-fructose-6-phosphate transaminase 2 (GFPT2, EGI:9945, intronic SNPs, rs17080196 p = 7.7 × 10−6, OR = 0.77; 95% CI [0.69–0.87]; rs7703402 (promoter) p = 8.4 × 10−6, OR = 0.76; 95% CI [0.66–0.85]; rs6864729 p = 8.9 × 10−6, OR = 0.77; 95% CI [0.69–0.87]; rs2303012 p = 9.5 × 10−6, OR = 0.77; 95% CI [0.69–0.87]; Table 3, Table S1). However, none of these associations survives adjustment for multiple testing.
Weaker associations in the joint data that replicated at an uncorrected p ≤ 0.05 with consistent effect directions in each of the three GWAS were observed at several loci (Table 4, Table S2). These associations were observed in the chromosome 1p36.13 gene taste receptor 1, member 2 (TAS1R2, intron, rs12063142 p = 4.83 × 10−7, OR = 0.76; 95% CI [0.69–0.85]), at rs974002 in NPAS3, in the chromosome 8q21 gene matrix metallopeptidase 16 (MMP16, EGI:4325, intron, rs3851539 p = 3.8 × 10−5, OR = 1.21; 95% CI [1.11–1.34]), the chromosome 12q24 gene kinase suppressor of ras 2 (KSR2, EGI:283455, intron, rs7960736 p = 6.3 × 10−5, OR = 1.22; 95% CI [1.11–1.34]), at the chromosome 14q24 SNP rs11159221 (p = 1 × 10−4, OR = 0.79; 95% CI [0.69–0.89]), in the chromosome 2q31 gene WAS/WASL-interacting protein family, member 1 (WASPIP, listed as WIPF1 in Table 4, EGI:7456, intron, rs1991601 p = 3 × 10−4, OR = 1.23; 95% CI [1.09–1.38]), the chromosome 11q23 gene family with sequence similarity 55, member A (FAM55A, EGI:120400, intron, rs1080074 p = 4 × 10−4, OR = 1.29; 95% CI [1.12–1.48]), and the chromosome 15q22 gene RAR-related orphan receptor A (RORA, EGI:6095, intron, rs1863270 p = 5 × 10−4, OR = 1.19; 95% CI [1.08–1.33]).
Table 4. Association results that replicate in the HIHG, CIDR and NINDS studies at the p < 0.05 level with effects in the same direction. All SNPs were genotyped.
| SNP | Chromosome band | Gene | HIHG p-value | CIDR p-value | NINDS p-value | Joint p-value |
|---|---|---|---|---|---|---|
| rs12063142 | 1p36.13 | TAS1R2 | 6.41 × 10−6 | 0.032 | 0.033 | 4.83 × 10−7 |
| rs974002 | 14q13.1 | NPAS3 | 6.32 × 10−3 | 7.39 × 10−3 | 0.017 | 9.85 × 10−6 |
| rs3851539 | 8q21.3 | MMP16 | 7.86 × 10−3 | 0.015 | 0.034 | 3.81 × 10−5 |
| rs7960736 | 12q24.23 | KSR2 | 0.026 | 0.022 | 6.94 × 10−3 | 6.26 × 10−5 |
| rs11159221 | 14q24.3 | – | 0.024 | 0.045 | 5.21 × 10−3 | 1.32 × 10−4 |
| rs1991601 | 2q31.1 | WIPF1 | 0.019 | 0.043 | 0.043 | 3.26 × 10−4 |
| rs1080074 | 11q23.2 | FAM55A | 0.038 | 0.036 | 0.035 | 3.77 × 10−4 |
| rs1863270 | 15q22.2 | RORA | 0.047 | 0.044 | 0.032 | 5.33 × 10−4 |
Results from each GWAS and the joint analysis for SNPs in the top 10 genes from the PDgene database (http://www.pdgene.org) are summarised in Table 5. The PDgene database is a free online meta-analysis summary of PD genetic association studies. No associations with p < 2.00 × 10−3 were observed in genes other than SNCA and MAPT. The strongest association among all SNPs in these genes was in monoamine oxidase B (MAOB intron, rs209766 p = 0.0026, OR = 0.86; 95% CI [0.76–0.96]). Weaker associations were also detected in leucine-rich repeat kinase 2 (LRRK2, EGI:120892, intron, rs1907632 p = 0.0321, OR = 1.15; 95% CI [1.01–1.30]), USP24 (intron, rs12065953 p = 0.0078, OR = 1.46; 95% CI [1.10–1.94]), and ELAVL4 (intron, rs17105974 p = 0.0418, OR = 0.82; 95% CI [0.65–0.99]).
Table 5. Summary of results for the top 10 genes in order from PDgene (June 2009) for each GWAS dataset for the SNP in each gene with the smallest joint analysis p-value.
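| Gene | SNP | HIHG p-value | CIDR p-value | NINDS p-value | Joint p-value | SNPs tested | Gene size (bp) |
|---|---|---|---|---|---|---|---|
| SNCA | rs2736990 | 5.08 × 10−5 | 8.67 × 10−4 | 0.102 | 6.70 × 10−8 | 15 | 112,876 |
| MAPT | rs1981997 | 4.96 × 10−3 | 1.37 × 10−4 | 0.733 | 2.50 × 10−6 | 15 | 134,002 |
| USP24 | rs12065953 | 8.96 × 10−3 | 0.195 | 0.129 | 7.81 × 10−3 | 2 | 149,006 |
| ELAVL4 | rs17105974 | 3.39 × 10−3 | 0.059 | 0.142 | 0.042 | 10 | 153,854 |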
Results from the meta-analysis of the three GWAS were consistent with those observed in the joint analysis (data not shown; Pearson's correlation coefficient for p-values from the two analyses = 0.95). The SNPs rs11012 (meta-analysis p-value = 7.33 × 10−8) in the chromosome 17q21.31 region and rs2736990 (meta-analysis p-value = 7.88 × 10−8) in SNCA were the most strongly associated SNPs in the meta-analysis.
Discussion
This study unambiguously identifies SNCA and MAPT as risk factors for idiopathic PD. The power to declare association at genome-wide levels of significance in any of these GWAS is low for the effects observed in this study; however, in aggregate these datasets provide conclusive evidence for major risk genes and highlight other genes of interest. Our study illustrates the utility of freely available online data resources and large collaborative studies that are better powered to detect associations with modest effects in GWAS data. It is only with joint analysis of three independent GWAS datasets that SNPs in the SNCA and the MAPT region reached genome-wide significance. The exact mechanisms by which variation in SNCA and MAPT influence risk for PD are unknown (Devine & Lewis, 2008). Joint analysis with trend tests of allelic association also suggests other candidates that show strong, but not genome-wide evidence of association: DBC1, NPAS3, and GFPT2.
The SNCA gene product α-synuclein (AS) is an abundant brain protein which is localised in axon terminals where it may mediate synaptic processes (Cabin et al., 2002). PD is considered a synucleinopathy in which aggregations of precipitated filamentous AS (which form Lewy bodies) are a common finding in PD brains at autopsy (Lewy, 1912; Tretiakoff, 1919; Okazaki et al., 1961; Kosaka et al., 1976; Spillantini et al., 1997). Mutations in SNCA have been shown to cosegregate with PD with an autosomal dominant mode of inheritance (Polymeropoulos et al., 1997; Kruger et al., 1998; Singleton et al., 2003; Chartier-Harlin et al., 2004; Ibanez et al., 2004; Zarranz et al., 2004). Other variation in SNCA has been reported to increase the risk of PD (Kruger et al., 1999).
The MAPT protein (tau) is critical for the assembly and stabilisation of the microtubule network, which is essential for axonal transport in neurons (Garcia & Cleveland, 2001). Neurological disorders where aggregations of tau are observed in brain tissue are known as tauopathies. The most common tauopathy is Alzheimer disease (AD), where hyperphosphorylated tau accumulates in intraneuronal neurofibrillary tangles (NFTs) (Goedert et al., 1989). Additionally, progressive supranuclear palsy (PSP) and corticobasal degeneration (CBD) are neurodegenerative diseases characterised by parkinsonism and by tau deposition in neurons and glia (Litvan et al., 1996; Dickson et al., 2002). A further set of disorders referred to as frontotemporal dementia and parkinsonism linked to chromosome 17 (FTDP-17) is also characterised by the abnormal deposition of tau (Ingram & Spillantini, 2002). Haplotypes in the chromosome 17q21.31 region have been associated with PSP and CBD (Pittman et al., 2005), and with PD (Golbe et al., 2001; Maraganore et al., 2001; Martin et al., 2001; Farrer et al., 2002; Healy et al., 2004; Skipper et al., 2004; Zabetian et al., 2007; Refenes et al., 2009). In the current study we report haplotypic associations of alleles on the H1 haplotype clade, which is consistent with previous investigations of PD, PSP, and CBD. The H1 clade has also been shown to have higher expression levels of tau than H2 (Kwok et al., 2004), which may provide some insight into PD pathology, as previous studies have shown that tau mediates and promotes the polymerisation of AS (Giasson et al., 2003).
The gene diacylglycerol kinase, theta (DGKQ) was previously reported to be associated with PD (Pankratz et al., 2008). This gene is thought to be active in the phosphatidylinositol signaling pathway and is expressed in the brain.
Interestingly, associations between SNPs in NPAS3 and smoking cessation have been observed in two of three GWAS datasets investigating genetic differences between the ability of individuals to quit smoking (Uhl et al., 2008). Given the known relationship between smoking and PD (Dorn, 1959; Allam et al., 2004), this gene may merit further investigation in PD studies. We saw no evidence of confounding by or interaction with history of smoking at NPAS3 SNPs (data not shown).
The HIHG data and the CIDR data were analysed for interactions with and confounding by history of smoking. Although no SNPs were significant after correction for multiple tests (data not shown), these and other exposures may explain or modify genetic susceptibilities to PD. Previous research in PD families has observed interactions between smoking and the genes nitric oxide synthase 2A (NOS2A) (Hancock et al., 2006), glutathione S-transferase omega 1 (GSTO1) (Wahner et al., 2007) and SNCA (McCulloch et al., 2008). This study included only two NOS2A SNPs, neither of which was included in the previous study, and which do not provide sufficient coverage of the gene to evaluate the previous findings of interaction. Similarly, the reported interaction between smoking and the Rep1 variant in SNCA was not replicable with the SNPs in this study. The interaction between history of smoking and GSTO1 at rs4925 was not replicated.
DBC1 is a gene that has been observed to be deleted in some bladder cancer cell lines (Habuchi et al., 1998). The DBC1 gene product is detectable in several tissues, including brain, spinal cord, cortex and cerebellum (http://www.genecards.org/). The DBC1 protein inhibits cell proliferation by negative regulation of the G1/S transition (Nishiyama et al., 2001). Additionally, this protein mediates non-apoptotic cell death (Wright et al., 2004), and is a regulator of components of the plasminogen pathway (Louhelainen et al., 2006). The relationship between PD and DBC1 is not obvious based on the known biology of this gene; however, the cell-death phenotypes for this gene in sensitive brain regions might play a role in PD pathophysiology.
It should also be noted that the GFPT2 results were obtained by imputation in the three GWAS samples used here. These genotypes are likely to be assigned with higher error rates than assayed SNPs. GFPT2 is the rate-limiting enzyme for the entry of glucose into the hexosamine biosynthesis pathway and is thus a biologically important gene (Oki et al., 1999). Energy production in the central nervous system is a mixture of glycolysis and electron transport in the mitochondrion. As mitochondria are clearly affected in PD (Beal, 2007; Schapira, 2008), enzymes such as GFPT2 may become increasingly important in energy production and thereby cell survival. Additionally, variants in this gene have been associated with type II diabetes mellitus (T2DM) and diabetic nephropathy in Caucasian and African American samples (Zhang et al., 2004); T2DM has been suggested as a risk factor for PD (Hu et al., 2007; Driver et al., 2008).
Despite the intriguing results of our study, there are some limitations. The error rate of imputed genotypes is likely to be higher than that of assayed genotypes, although this effect should be mild for imputed genotypes with high posterior probabilities (Marchini et al., 2007). It is possible that PD is composed of several sub-phenotypes, each with a distinct genetic etiology. This analysis would have optimal power to detect only associations that underlie the common features of the trait definitions for all three GWAS. The power of the joint analysis is also not optimal for the magnitude of effects seen in this study (≥80% power for the combined sample at α = 1 × 10−7 and MAF 0.25 for odds ratios <0.59 or >1.59; ≥80% power for the HIHG sample at α = 1 × 10−7 and MAF 0.25 for odds ratios <0.39 or >2.14); however, with the addition of larger PD studies more genes might be conclusively associated. These results support the meta-analysis by Pankratz et al. of the CIDR GWAS results and the NINDS GWAS results to the extent that the MAPT region is prominently represented in the top associations (Pankratz et al., 2009); however, the overall overlap among top hits is sparse. A final limitation is the lack of detailed covariate data across datasets for exposures such as smoking, pesticide exposure and coffee drinking for model adjustment and gene × environment interaction analysis. This is an important point, as the value of these datasets would increase greatly with the inclusion of well-documented environmental and sub-phenotype data, such as diabetes status, depression and dementia. Future studies of PD should include important environmental factors in the data collection protocol.
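The dependence of power on sample size, allele frequency and effect size noted among the limitations above can be explored with a rough calculation. The sketch below is an illustration under stated assumptions, not the method used in this study: it approximates the power of a two-sided test comparing allele frequencies between cases and controls using a normal approximation, and treats the MAF as the allele frequency in controls. The function name `allelic_power` is hypothetical. Because the genetic model and software behind the published power thresholds are not stated here, its output should not be expected to reproduce them.

```python
from statistics import NormalDist


def allelic_power(n_cases, n_controls, maf, odds_ratio, alpha):
    """Approximate power of a two-sided z-test on allele frequencies.

    Assumptions (ours, for illustration): each subject contributes two
    independent alleles, `maf` is the allele frequency in controls, and
    `odds_ratio` is the per-allele odds ratio.
    """
    norm = NormalDist()

    # Allele frequency in cases implied by the per-allele odds ratio.
    odds_cases = odds_ratio * maf / (1.0 - maf)
    p_cases = odds_cases / (1.0 + odds_cases)

    # Standard error of the difference in allele frequencies.
    alleles_cases = 2 * n_cases
    alleles_controls = 2 * n_controls
    se = (p_cases * (1.0 - p_cases) / alleles_cases
          + maf * (1.0 - maf) / alleles_controls) ** 0.5

    # Two-sided critical value and standardised effect.
    z_crit = norm.inv_cdf(1.0 - alpha / 2.0)
    shift = abs(p_cases - maf) / se

    # Probability of exceeding the critical value in either tail.
    return norm.cdf(shift - z_crit) + norm.cdf(-shift - z_crit)


if __name__ == "__main__":
    # Combined sample size reported in this study: 1752 cases, 1745 controls.
    for oratio in (1.1, 1.29, 1.59):
        power = allelic_power(1752, 1745, maf=0.25, odds_ratio=oratio, alpha=1e-7)
        print(f"OR = {oratio}: approximate power = {power:.3f}")
```

Varying `odds_ratio` at a fixed sample size shows how quickly power falls for modest effects at a stringent α, which is the practical argument for pooling datasets.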
In conclusion, SNCA and MAPT are significantly associated at a genome-wide level with idiopathic PD. Several other biologically plausible genes are associated with PD but do not meet the genome-wide significance threshold. PD is a complex phenotype with substantial variation in age of onset and clinical course. It is possible that more genes are associated with subsets of PD cases or are associated with more modest effects than those detected here. These results suggest that observing additional PD-gene associations will require the pooling of datasets to achieve very large samples of thousands of cases and controls from which to obtain statistically significant p-values for loci of more modest effects after correction for multiple tests. Collaborative projects with coordinated ascertainment are necessary to advance PD genetic epidemiology.
Conflict of Interest
Acknowledgements
We are grateful to the patients and control subjects who participated in this study. We thank the members of the PD Genetics Collaboration: Martha A. Nance, Ray L. Watts, Jean P. Hubble, William C. Koller, Kelly Lyons, Rajesh Pahwa, Matthew B. Stern, Amy Colcher, Bradley C. Hiner, Joseph Jankovic, William G. Ondo, Fred H. Allen, Jr., Christopher G. Goetz, Gary W. Small, Donna Masterman, Frank Mastaglia, and Jonathan L. Haines, who contributed subjects to this study. Some of the samples used in this study were collected while the Udall PDRCE was at Duke University. This work was supported by National Institutes of Health grant NS39764. We also thank the investigators from the Pankratz et al. (2009) (accession number: phs000126.v1.p1) and Fung et al. (2006a) (accession number: phs000089.v1.p1) studies for making their data available on dbGaP.