Replication of GWAS Associations for GAK and MAPT in Parkinson's Disease
Corresponding author: Shannon L. Rhodes, Department of Epidemiology, UCLA School of Public Health, 650 Charles E. Young Drive, Box 951772, Los Angeles, California 90095–1772. Tel: 310 206 7458; Fax: 310 206 6039; E-mail: firstname.lastname@example.org
In the investigation of disease aetiology, the genome-wide association study (GWAS) provides a hypothesis-free investigation of the broader human genome and, as with all scientific investigations, replication is essential to validate any findings. To date, six GWAS have been performed to investigate the influence of common genetic variation in Parkinson's disease (PD) and only two associations have been replicated: alpha synuclein (SNCA) and microtubule-associated protein tau (MAPT), both PD candidate genes before GWAS. In our population-based study, we genotyped four of the top single-nucleotide polymorphisms (SNPs) from a previous study. By using the identical analytic method and genetic model in our independent sample, we provide evidence for replication of rs1724425 near MAPT (OR = 0.74, P= 0.0163) and rs1564282 in cyclin G-associated kinase (GAK; OR = 1.61, P= 0.0151); rs3775478 of multimerin 1 (MMRN1) (P= 0.30) and rs356229 of SNCA (P= 0.14) did not replicate in our study population. While MAPT has been considered a PD candidate gene and has been observed in association with PD in other GWAS, GAK is a new candidate for investigation in future studies.
Parkinson's disease (PD; OMIM #168600), a debilitating and progressive neurodegenerative disorder characterized by motor (rigidity, tremor, bradykinesia) and nonmotor (sleep disturbances, constipation, depression, autonomic dysfunction) symptoms, contributes substantially to disability and loss of quality of life in greater than 1% of people over age 50. There is currently no preventive or curative therapy for PD, and medications merely attempt to ameliorate symptoms. While many rare genetic factors have been identified in studies of small and large pedigrees containing multiple subjects with PD, the genetic susceptibilities underlying idiopathic, typically later-onset PD remain largely unidentified.
In the investigation of disease etiology, the genome-wide association study (GWAS) provides a hypothesis-free method to interrogate the broader human genome and has been informatively applied to numerous common complex diseases (e.g., Wellcome Trust Case Control Consortium, 2007). As stated by the NCI-NHGRI Working Group on Replication in Association Studies, “as the transition to genome-wide association studies occurs, the challenge will be to separate true associations from the blizzard of false positives attained” (Chanock et al., 2007). To date, six GWAS of PD have been published, five of which utilized US or European Caucasian subjects. In the case of the first (Maraganore et al., 2005), second (Fung et al., 2006), and a meta-analysis (Evangelou et al., 2007) of the first and second PD GWAS, the top single-nucleotide polymorphism (SNP) findings did not replicate (Elbaz et al., 2006; Evangelou et al., 2010). Two more recent PD GWAS, one in Caucasians (Simón-Sánchez et al., 2009) and the other in Asians (Satake et al., 2009), included replication samples in their primary publication, as well as replicating each other's findings for two alpha synuclein (SNCA) SNPs (rs11931074 and rs3857059) and a new PD locus, PARK16, although the latter is infrequent (minor allele frequency (MAF) 3–4%) in Caucasians. The most recently published PD GWAS (Edwards et al., 2010) pooled data from 635 PD cases and 612 cognitively normal controls from three different study populations with two prior GWAS (Fung et al., 2006; Pankratz et al., 2009). This effort reported a genome-wide significant association between PD and rs2736990 in SNCA, a variant also observed to be highly significant in the Simón-Sánchez et al. (2009) GWAS; but, to our knowledge, no published studies have provided replication for other top findings of Pankratz et al. (2009).
According to the NCI-NHGRI Working Group on Replication in Association Studies (Chanock et al., 2007), specific factors that contribute to a quality replication study include independence of the datasets, sufficient sample size, and similarity of phenotype definition, study population, and genetic variant assessed. The Parkinson's Environment and Genes (PEG) study is a case–control study of PD that has enrolled predominantly Caucasian incident PD cases and population-based controls from a three-county region of central California. Similar to other studies of PD, our cases have been evaluated by a neurologist according to established criteria; unlike many GWAS, our controls have been drawn from the same region from which our cases arose, likely providing a comparison group more appropriate than those constructed from databanks or other sources. In the PEG study, we have specifically genotyped four of the top SNPs from the GenePD/PROGENI (Pankratz et al., 2009) GWAS in an attempt to provide replication for those findings.
Materials and Methods
Written informed consent was obtained from all enrolled subjects, and all procedures were approved by the University of California at Los Angeles (UCLA) Human Subjects Committee. Subject recruitment methods have been published previously (Ritz et al., 2009), and case definition criteria have been described in detail elsewhere (Kang et al., 2005). Briefly, incident PD cases (diagnosis within 3 years of enrolment) were recruited between January 2001 and December 2007 through neurologists, large medical groups, and public service announcements in a three-county (Fresno, Kern, and Tulare) area of central California. Cases were examined by UCLA movement disorder specialists at least once and confirmed as having clinically “probable” or “possible” PD according to published criteria (Hughes et al., 1992). Population-based controls were recruited from the same three counties as cases, initially utilizing Medicare lists and later, after implementation of the Health Insurance Portability and Accountability Act, from randomly selected residential parcels identified from publicly available tax-collector records providing addresses for all zoned living units in the three counties. Controls were marginally matched to cases according to age, gender, and self-identified racial or ethnic background and were free of PD according to self-report at the time of enrolment. There were no statistically significant genotype or allele frequency differences between the Medicare-based and random parcel-based controls.
Cases and controls completed a telephone interview for the collection of demographic (age, gender, self-identified racial or ethnic background, parental and grandparental country of origin) and risk factor (family history of PD, smoking behaviour) data, and provided blood or buccal samples for the extraction of DNA. Four of the top variants identified by Pankratz et al. (2009) – rs1564282 of cyclin G-associated kinase (GAK); rs3775478 in multimerin 1 (MMRN1), which is 5′ of SNCA; rs356229, which is 3′ of SNCA; and rs1724425 in the C17orf69/CRHR1/MAPT region – were genotyped on the Applied Biosystems SNPlex array (Tobler et al., 2005). Three hundred fifty-three cases and 438 controls were included in the array; 341 (96.6%) case and 402 (91.8%) control samples met quality control standards and provided data for these analyses; overall genotyping rate for included subjects was 99.3%.
For comparability to the GenePD/PROGENI analysis, only subjects reporting European/Caucasian ancestry (273 cases, 306 controls) were included in these analyses. All four SNPs were evaluated by χ² test for deviations from Hardy–Weinberg equilibrium. Odds ratios (ORs), 95% confidence intervals (95% CIs), and two-sided P-values were estimated by logistic regression adjusting for gender (male/female), age (of onset for PD cases/interview for controls), and smoking status (ever smoked for at least 1 year/never smoked for at least 1 year). As the PEG study population has a different distribution of age and family history of PD compared to the GenePD/PROGENI study, sensitivity analyses were conducted: (1) adjusted for family history of PD (defined as at least one first-degree relative with a diagnosis of PD) in addition to other covariates, and (2) stratified by age of onset/interview below or equal to versus above the mean age, which was 69 years for cases and 67 years for controls. All analyses were performed using PLINK (Purcell et al., 2007). Each SNP was evaluated under the additive model as previously reported in Pankratz et al. (2009) and under the dominant model, with the exception of rs1724425 of MAPT, which was investigated under a recessive model as reported by Pankratz et al. (2009). The P-values presented are uncorrected unless stated otherwise. For the purpose of multiple-comparison correction, eight tests were performed in this investigation. Meta-analysis of PROGENI/GenePD results and PEG results was performed using METAL (Abecasis & Willer, 2007; Willer et al., 2010).
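The Hardy–Weinberg check described above can be sketched as a one-degree-of-freedom χ² goodness-of-fit test on genotype counts. The following is a minimal illustration in Python and is not the PLINK routine used in the study; the function name and the example counts are hypothetical.

```python
import math


def hwe_chi_square(n_major_hom, n_het, n_minor_hom):
    """Chi-square test (1 df) for deviation from Hardy-Weinberg equilibrium.

    Arguments are genotype counts (illustrative names, not from the study).
    Returns (chi_square_statistic, p_value).
    """
    n = n_major_hom + n_het + n_minor_hom
    if n == 0:
        raise ValueError("no genotypes supplied")
    # Minor allele frequency estimated from the observed genotype counts.
    q = (2 * n_minor_hom + n_het) / (2 * n)
    p = 1.0 - q
    if q == 0.0 or p == 0.0:
        raise ValueError("SNP is monomorphic; the test is undefined")
    # Genotype counts expected under Hardy-Weinberg proportions.
    expected = (n * p * p, 2 * n * p * q, n * q * q)
    observed = (n_major_hom, n_het, n_minor_hom)
    chi2 = sum((o - e) ** 2 / e for o, e in zip(observed, expected))
    # Upper tail of a chi-square distribution with 1 df: erfc(sqrt(x / 2)).
    p_value = math.erfc(math.sqrt(chi2 / 2.0))
    return chi2, p_value


# Hypothetical counts, NOT data from the PEG study.
chi2, p_value = hwe_chi_square(180, 100, 26)
print(f"chi2 = {chi2:.3f}, P = {p_value:.4f}")
```

A small P-value from such a test would flag a SNP whose genotype distribution departs from Hardy–Weinberg proportions, which can indicate genotyping problems.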
Results
PD cases enrolled in the PEG study are slightly older, more likely to be male, and more likely to have been nonsmokers compared to controls (Table 1). The minor allele distributions observed for the four SNPs in our study population are similar to those observed in the GenePD/PROGENI GWAS (“the GWAS”) and in the HapMap data for Caucasians. Under the additive model (Table 2), we observed a nearly identical effect estimate to that observed in the GWAS for rs1724425 near MAPT (P= 0.0158) and a similar magnitude OR for rs1564282 in GAK (P= 0.0142). The associations with PD for rs3775478 of MMRN1 and rs356229 near SNCA were not statistically significant in our study population (P= 0.29 and 0.16, respectively). Under the alternative genetic models (Table 3), only rs1564282 of GAK reached statistical significance (P= 0.0054). Sensitivity analyses including family history as a covariate did not change the effect estimates. In meta-analysis combining the GWAS results with our results, rs1564282 in GAK nearly reached the conservative Bonferroni-corrected genome-wide significance level of 1.5 × 10⁻⁷ (rs1564282 P= 2.7 × 10⁻⁷; Table 2).
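To illustrate how two sets of summary statistics combine in a meta-analysis, the sketch below applies a fixed-effect, inverse-variance weighted model to the rs1564282 (GAK) values in Table 2. This is an assumption-laden approximation and not a reproduction of the METAL run: the weighting scheme used by the authors is not stated, the GWAS standard error is back-calculated from its reported OR and P-value, and the PEG standard error is back-calculated from its 95% CI. All function names are hypothetical.

```python
import math
from statistics import NormalDist

Z_95 = NormalDist().inv_cdf(0.975)  # about 1.96


def se_from_p(odds_ratio, p_two_sided):
    """Back-calculate SE(log OR) from an OR and its two-sided Wald P-value."""
    z = -NormalDist().inv_cdf(p_two_sided / 2.0)
    return abs(math.log(odds_ratio)) / z


def se_from_ci(lower, upper):
    """Back-calculate SE(log OR) from the limits of a 95% CI."""
    return (math.log(upper) - math.log(lower)) / (2.0 * Z_95)


def fixed_effect_meta(studies):
    """Inverse-variance pooling of (log_or, se) pairs.

    Returns (pooled_or, pooled_se, two_sided_p).
    """
    weights = [1.0 / se ** 2 for _, se in studies]
    total = sum(weights)
    beta = sum(w * b for w, (b, _) in zip(weights, studies)) / total
    se = math.sqrt(1.0 / total)
    z = beta / se
    p = math.erfc(abs(z) / math.sqrt(2.0))  # two-sided normal P-value
    return math.exp(beta), se, p


# rs1564282 (GAK), values as reported in Table 2.
gwas = (math.log(1.70), se_from_p(1.70, 6.0e-6))   # GenePD/PROGENI
peg = (math.log(1.61), se_from_ci(1.10, 2.37))     # PEG

pooled_or, pooled_se, pooled_p = fixed_effect_meta([gwas, peg])
print(f"pooled OR = {pooled_or:.2f}, P = {pooled_p:.1e}")
```

Under these assumptions the pooled P-value comes out at the same order of magnitude as the 2.7 × 10⁻⁷ reported in Table 2; exact agreement is not expected because the inputs are rounded and the weighting scheme is assumed.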
Table 1. PEG study population demographics, Caucasians only.
|Mean (SD)||69.0 (10.4)||67.2 (12.1)|| || ||0.058|
|Range||34–88||34–92|| || || |
|≤mean age||120||44.0||135||44.1|| |
|>mean age||153||56.0||171||55.9|| |
|Family history of PD||Negative||232||85.0||278||90.8|| |
|Parent and sibling||2||0.70||0||0.0||n.c.|
|Sibling and child||1||0.35||0||0.0||n.c.|
Table 2. Additive model association results.
|SNP||Chr||Position (bp)||Minor allele||GWAS MAF cases (%)||GWAS MAF controls (%)||GWAS OR||GWAS P||PEG MAF cases (%)||PEG MAF controls (%)||PEG OR||PEG 95% CI||PEG P||Meta-analysis P|
|GAK rs1564282||4||852313||T||13.1||8.7||1.70||6.0 × 10⁻⁶||13.2||9.0||1.61||1.10–2.37||0.0142||2.7 × 10⁻⁷|
|SNCA rs356229||4||90606597||G||43.5||36.9||1.35||5.5 × 10⁻⁵||42.2||38.1||1.19||0.94–1.52||0.1559||2.7 × 10⁻⁵|
|MMRN1 rs3775478||4||90842840||G||10.2||6.9||1.69||6.1 × 10⁻⁵||8.9||7.2||1.27||0.82–1.96||0.2896||6.4 × 10⁻⁵|
|MAPT rs1724425||17||43828055||T||38.7||44.9||0.75||7.8 × 10⁻⁵||37.9||45.1||0.74||0.58–0.95||0.0158||3.7 × 10⁻⁶|
Table 3. Dominant or recessive model association results, PEG study only.
| ||CT||24.2||15.4|| || || || |
| ||TT||1.1||1.3|| || || || |
| ||AG||48.9||47.2|| || || || |
| ||GG||17.8||14.5|| || || || |
| ||AG||17.0||13.1|| || || || |
| ||GG||0.4||0.7|| || || || |
| ||CT||47.6||52.1|| || || || |
| ||TT||14.1||19.0|| || || || |
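The additive, dominant, and recessive models named in Materials and Methods differ only in how the number of minor alleles a subject carries is coded before it enters the regression. The sketch below shows that coding, together with the Bonferroni threshold implied by eight tests. It is an illustration only: the function names are hypothetical, and the family-wise alpha of 0.05 is an assumption, since the text does not state which corrected threshold was applied.

```python
def code_genotype(minor_allele_count, model):
    """Code a genotype (0, 1, or 2 minor alleles) under a genetic model."""
    if minor_allele_count not in (0, 1, 2):
        raise ValueError("minor_allele_count must be 0, 1, or 2")
    if model == "additive":
        # Each additional minor allele shifts the log odds by the same amount.
        return minor_allele_count
    if model == "dominant":
        # One copy of the minor allele is enough to confer the effect.
        return int(minor_allele_count >= 1)
    if model == "recessive":
        # Two copies of the minor allele are required for the effect.
        return int(minor_allele_count == 2)
    raise ValueError(f"unknown model: {model}")


def bonferroni_threshold(alpha, n_tests):
    """Per-test significance threshold under Bonferroni correction."""
    return alpha / n_tests


for model in ("additive", "dominant", "recessive"):
    print(model, [code_genotype(count, model) for count in (0, 1, 2)])

# Assumed family-wise alpha of 0.05 across the eight tests in this study.
threshold = bonferroni_threshold(0.05, 8)
print(f"Bonferroni threshold for eight tests: {threshold}")
```

With these assumptions the per-test threshold is 0.05/8 = 0.00625, so the uncorrected P= 0.0054 reported for rs1564282 of GAK under the alternative model would fall just below it.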
Discussion
We provide a published replication of the association between PD and rs1564282 of GAK, and do so in an independent, nonfamilial study sample. MAPT and SNCA have been considered candidate genes for PD for many years, but GAK, located at 4p16.3, is new and intriguing. GAK has been shown to be differentially expressed in the substantia nigra of PD brains compared to controls (Grünblatt et al., 2004). This kinase is involved in the cell cycle (Kimura et al., 1997) and in microtubule growth around the chromosome during spindle formation (Tanenbaum et al., 2010). In addition, GAK has been observed to play a role in clathrin-mediated endocytosis/vesicle trafficking (Ungewickell & Hinrichsen, 2007), and clathrin has been observed to colocalize with SNCA aggregates in microglia (Liu et al., 2007), suggesting a possible mechanistic pathway for GAK influence in PD.
Our study differs from the GenePD/PROGENI GWAS study in that 15% of our cases and 9.2% of our controls reported a first-degree relative with PD. In the GWAS, all cases had at least one affected sibling, and all controls reported no family history of PD. Furthermore, ∼30% of cases in the GWAS reported a parent with PD; in PEG 9.5% of cases and 6.5% of controls reported a parent with PD. Sensitivity analysis adjusting for or excluding subjects with a family history of PD did not alter the results.
In an effort to understand why we observed no association for SNCA – the first gene linked to PD by familial genetic methods (Golbe et al., 1990) – we considered family history of PD and age of onset of PD as factors by which our study differs from the GenePD/PROGENI study and through which the SNCA-PD association might be confounded. As discussed above, although the PEG distribution of family history of PD is different from the GenePD/PROGENI distribution, sensitivity analyses adjusting for family history of PD did not alter our results. Because the PEG cases (mean age of 69 years) and controls (mean 67 years) are older than the GenePD/PROGENI cases (mean 62 years) and controls (mean 55 years), we performed a sensitivity analysis stratifying by index age (defined as age of onset for cases and age at interview for controls). When limiting our analysis to the younger (at or below the mean) cases (≤69 years) and controls (≤67 years), we observed a suggestive association for rs356229 of SNCA (OR = 1.38, 95% CI: 0.95–2.02, P= 0.0946); this association was absent in our stratum of older subjects (OR = 1.09, 95% CI: 0.79–1.50, P= 0.5945). Considering both family history of PD and age of onset together, 17% of cases and 12% of controls in the younger stratum have at least one first-degree relative with PD compared to 13% of cases and 7% of controls in the older stratum; this difference is not statistically significant (χ² P-value > 0.2).
Therefore, we hypothesize that the absence of an association for SNCA in our study population may be due to an underrepresentation of younger cases compared to the GenePD/PROGENI study population and other mostly clinic-based studies of PD reporting positive SNCA associations. In fact, in the 31 study populations designated on the PDGene.org website as reporting a positive result for any variant in SNCA, the lowest mean age of PD onset is 47 (Winkler et al., 2007), the highest mean age of onset is 65 (the Elbaz, Hadjigeorgiou, and Checkoway populations included in Maraganore, 2006), and the average of the means for age of onset is 58.9 years. In our PEG study, the mean age of PD onset is 69 years. In addition, in a gene-pesticide analysis of the SNCA Rep1 variants in our PEG study population, Gatto et al. (2010) report an almost threefold increase in PD risk associated with the 263 allele combined with high exposure to the pesticide paraquat, but only in subjects with an age of onset prior to age 68. This is consistent with a multifactorial disease process where genetic effects vary by age and supports a model where SNCA effects are more pronounced in middle age.
An additional issue to consider in evaluating our replication is the comparability of environmental factors between our population and those of GenePD and PROGENI. Hypothetically, if an environmental factor is necessary for a genetic factor to influence disease risk – as is likely the case for a disease of complex etiology such as Parkinson's – a population with a greater proportion of genetic factor carriers also experiencing the necessary environmental exposure will have a higher likelihood of detecting a gene-disease association. This is similar to the observation that in a population where an environmental risk factor is ubiquitous the disease will appear to be solely genetic in origin. Our study population is derived from a region of California with extensive commercial agricultural pesticide use, likely resulting in a greater proportion of our study subjects being exposed to pesticides, a recognised risk factor for PD. The GenePD and PROGENI studies are unlikely to have a similarly pesticide-exposed population, suggesting that the GAK and MAPT associations might not be strongly influenced by pesticide exposure. Correspondingly, this also suggests that the MMRN1 and SNCA associations might be influenced by an environmental exposure present in the GenePD and PROGENI studies but absent or less frequent in our PEG Study; or, alternatively, that the PEG study participants might be more highly exposed than GenePD and PROGENI participants to one or more environmental risk factors that impact PD risk regardless of SNCA genotype and effectively mask the SNCA effect.
Another consideration for our replication, or nonreplication, of these SNPs is the power of our study to detect the previously observed association ORs. Specifically, for SNCA and MMRN1 at the respective minor allele frequencies in our controls and given our study sample size, we have 0.81–0.82 power to detect ORs greater than one under the log-additive model at an alpha of 0.05 (one-sided), if the true ORs are as reported by GenePD/PROGENI. If the true ORs for SNCA and MMRN1 are smaller, then our chance to detect their effects would be diminished.
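The power figures above can be approximated with a simple normal-approximation calculation. The sketch below is not the authors' method, which is not described; it assumes an unadjusted per-allele (allelic) test under Hardy–Weinberg proportions as a stand-in for the log-additive model, and it takes the sample sizes from Materials and Methods and the control MAFs and GWAS ORs from Table 2. Function names are hypothetical.

```python
import math
from statistics import NormalDist


def case_allele_frequency(control_maf, odds_ratio):
    """Minor allele frequency expected in cases for a given per-allele OR."""
    odds = control_maf / (1.0 - control_maf) * odds_ratio
    return odds / (1.0 + odds)


def approximate_power(n_cases, n_controls, control_maf, odds_ratio, alpha=0.05):
    """One-sided power of a Wald test on the allelic log odds ratio."""
    case_maf = case_allele_frequency(control_maf, odds_ratio)
    case_alleles = 2 * n_cases
    control_alleles = 2 * n_controls
    # Variance of the log OR from a 2 x 2 table of allele counts.
    variance = (1.0 / (case_alleles * case_maf * (1.0 - case_maf))
                + 1.0 / (control_alleles * control_maf * (1.0 - control_maf)))
    z_effect = abs(math.log(odds_ratio)) / math.sqrt(variance)
    z_alpha = NormalDist().inv_cdf(1.0 - alpha)  # one-sided critical value
    return NormalDist().cdf(z_effect - z_alpha)


# 273 cases and 306 controls; control MAFs and GWAS ORs from Table 2.
power_snca = approximate_power(273, 306, 0.381, 1.35)
power_mmrn1 = approximate_power(273, 306, 0.072, 1.69)
print(f"SNCA rs356229:   power = {power_snca:.2f}")
print(f"MMRN1 rs3775478: power = {power_mmrn1:.2f}")
```

Under these assumptions both values come out at roughly 0.8, in line with the 0.81–0.82 reported; lowering the `odds_ratio` argument shows how quickly power falls if the true effects are smaller than the GWAS estimates.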
The PEG study is similar to the GenePD/PROGENI study, and therefore provides an acceptable replication sample, in that all cases underwent neurologic evaluation based on standard criteria and all data analysed were from subjects reporting Caucasian ancestry. In addition, similar to the GenePD and PROGENI studies, no PD cases in the PEG study carried the LRRK2 G2019S mutation. Finally, we used analytic methods and covariate adjustment identical to those reported by Pankratz et al. (2009). Our findings support an association between PD and rs1724425 near MAPT. In addition, our replication of the association for rs1564282 in GAK in this population-based case–control study of idiopathic PD suggests benefit to be gained by further study of the mechanistic role of this gene in the etiology of PD.
This work was supported by the National Institute of Environmental Health Science [ES10544, ES16732] and the National Institute of Neurological Disorders and Stroke [NS 038367].