Parkinson's Disease and Low Frequency Alleles Found Together Throughout LRRK2

Authors

  • Coro Paisán-Ruiz,

    1. Molecular Neuroscience Department and Reta Lila Weston Laboratories, UCL Institute of Neurology, 9th Floor, Queen Square House, Queen Square, London WC1N 3BG, England
  • Nicole Washecka,

    1. Molecular Genetics Section, Laboratory of Neurogenetics, National Institute on Aging, National Institutes of Health, Bethesda, MD, USA
  • Priti Nath,

    1. Molecular Genetics Section, Laboratory of Neurogenetics, National Institute on Aging, National Institutes of Health, Bethesda, MD, USA
  • Andrew B. Singleton,

    1. Molecular Genetics Section, Laboratory of Neurogenetics, National Institute on Aging, National Institutes of Health, Bethesda, MD, USA
    2. Public Health Sciences and Center for Public Health Genomics, University of Virginia, Charlottesville, VA, USA
  • Elizabeth H. Corder

    Corresponding author
    1. Matrix Genomics, Inc. 3900 Paseo del Sol, Santa Fe, NM 87505

*Corresponding author: Elizabeth H. Corder, Ph.D., Matrix Genomics, Inc., 3900 Paseo del Sol, Santa Fe, NM 87507. Tel: +1-505-216-0660; Fax: +1-505-216-0885; E-mail: elizabethcorder@hotmail.com

Summary

Mutations within LRRK2, most notably p.G2019S, cause Parkinson's disease (PD) in rare monogenic families and are found in sporadic cases in diverse populations. We investigated variation throughout LRRK2 (84 SNPs; genotype or diplotype found for 49 LD blocks) for 275 cases (European ancestry, onset at age 60 or older) and 275 neurologically healthy control subjects (NINDS Neurogenetics Repository). Three grade-of-membership groups, i.e. genetic risk sets, were identified that exactly matched many subjects (cases: 46, 4, 137; controls: 0, 178, 0), and distinguished 94% of the subjects (i.e. >50% likeness to one set). Set I, affected, carried certain low frequency alleles located in multiple functional domains. Set II was unaffected. Set III, also affected, resembled set II except for slightly elevated frequencies of minor alleles not defining set I. We conclude that certain low frequency alleles distributed throughout LRRK2 are a genetic background to a third of cases, defining a distinct subset.

Introduction

Parkinson's disease (PD, OMIM #168600) is a chronic neurodegenerative disease with a cumulative prevalence of greater than one per thousand people (Kuopio et al., 1999). It is well characterized clinically (resting tremor, bradykinesia, postural instability, rigidity) and pathologically (loss of dopaminergic neurons in the pars compacta of the substantia nigra). Genetically, studies of rare monogenic families have identified five causative genes, the most common (∼7%) (Healy et al., 2008) being leucine-rich repeat kinase 2 (LRRK2; OMIM *609007) located on chromosome 12q12. Candidate gene studies have to date been less successful in demonstrating the genetic background to sporadic PD.

Our interest focused on the LRRK2 gene as both familial and sporadic cases in diverse populations are known to carry the p.G2019S mutation, e.g. ∼1% of sporadic cases with European ancestry (Healy et al., 2008). The gene is large, spanning 1.4 Mb and consisting of 51 exons and multiple functional domains (Leucine-rich repeats (LRR), Roc, COR, RAS, Kinase, WD40 motif). The encoded protein dardarin is thought to play a role in intracellular signaling (Marin et al., 2008). It is expressed in multiple brain regions, particularly in the substantia nigra, consistent with direct involvement in dopaminergic cell death.

Nonetheless, a genome-wide search (Fung et al., 2006) did not identify any SNP within (51 SNPs), or close to, LRRK2 as relevant to PD in the NINDS Neurogenetics Repository sample of 275 cases of European ancestry with onset at age 60 or older, and 275 neurologically healthy control subjects, when applying a rather stringent criterion (uncorrected P-value < 0.0001). These negative findings were not likely related to differing population structure for case and control subjects, as comparison of the two groups demonstrated no appreciable differences (STRUCTURE; http://pritch.bsd.uchicago.edu/structure.html) (Fung et al., 2006; Falush et al., 2003).

Paisan-Ruiz et al. then sequenced all 51 exons, and at least 50 bp of flanking intronic sequence, for the sample (Paisan-Ruiz et al., 2008). Coding variants were found in twelve patients (4 cases carried p.G2019S) as well as in seven control subjects (no p.G2019S mutations). A total of 135 variants were identified in the sample, including SNPs from the genome-wide study and those identified by sequencing; many were unique to one or several subjects. Considering the 84 SNPs having a minor allele count of 5 or more, six SNPs scattered throughout the gene were associated with PD by χ2 testing using a weaker criterion (p < 0.05), indicated by an asterisk throughout this paper: rs1157655* (A allele, intron 2), rs1907632* (T, intron 11), rs11564205* (G, intron 34), rs11564203* (A, intron 39), rs11829088* (G, intron 39), and rs11564173* (A, intron 46).

Our goal was to take this information a step further by first identifying linkage disequilibrium (LD) blocks to simplify the data and render it more meaningful, and then by identifying genetic risk sets for PD, each defined by genotype/diplotype frequencies for the LD blocks. This was accomplished by grade-of-membership (GoM) analysis (Clive et al., 1983; Woodbury & Clive, 1974; Woodbury et al., 1978; Woodbury et al., 1994).

GoM is a form of latent classification analysis that incorporates large amounts of information to identify major patterns within the data. It allows individuals to resemble one of the identified patterns, or GoM groups (here genetic risk sets) or more often, to partly resemble two or more groups. The degree of likeness of individuals to each GoM group is given by membership scores in the groups, like weights, which range from zero (no likeness of the subject to the GoM group) to one (an exact match), summing to one for each subject. This fuzziness with respect to individuals minimizes the number of groups needed to represent the sample. Unlike other forms of latent classification, it operates efficiently in L1 space (linear differences), rather than L2 space (sum of squares differences), providing 5-fold better ability to identify patterns according to the signal detection literature, i.e. high power compared to more usual genetic epidemiologic approaches (Corder et al., 2001). Importantly, the GoM groups (represented here by frequencies for genotypes/diplotypes), and the likeness of individuals to the groups (represented by membership scores), are jointly estimated using maximum likelihood (see Methods section), closely defining the space concerning LRRK2, avoiding multiple comparisons. The best number of groups is decided according to an information criterion or empirically, as in this instance when three groups were sufficient to distinguish most (94%) case and control subjects, also identifying a distinct subset, about a third, of cases.

Methods

Study Subjects

The 275 sporadic PD cases age 60 or older, and 275 neurologically healthy control subjects having a similar age-sex distribution, are members of a NINDS cohort hosted by the Coriell Institute (http://ccr.coriell.org/Sections/Collections/NINDS/). Initially, 51 SNPs tagging each intron were investigated for the subjects in a genome-wide association analysis using HumanHap317 SNP arrays (Fung et al., 2006). Subsequently, all 51 exons were sequenced for the subjects including at least 50 bp of flanking intronic sequence to identify additional mutations and SNPs. This identified 12 cases who carried mutations (p.H275H, p.M712V, p.A1430A, p.R1728L, p.R1728H, p.G2019S (n = 4), p.T2141M, p.R2143H, p.L2466H) as well as 7 controls with LRRK2 variants (p.C228S, p.Y707Y, p.A716V, p.K871E, p.L1870F, p.E2395K, p.G2432G) (Paisan-Ruiz et al., 2008). Information on age and sex was available for cases.

Coding of the Data for Entry into GoM Analysis

GoM evaluates categorical data, usually 2 to 6 possible outcomes for each variable. PD status: 0 = control, 1 = onset < age 65, 2 = onset age 65 to 74, 3 = onset age 75 to 88. Sex: 0 = control, 1 = male (case), 2 = female (case). Mutation status: 0 = no mutation, 1 = mutation (case), 2 = mutation (control). Number of very low frequency alleles found at 34 loci (minor allele count ≤ 5 at each locus): 0, 1, or 2 (at 2 to 5 of these loci).
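As an illustration, the coding scheme just described can be written as a few small functions. This is a sketch only: the function names and argument conventions are hypothetical and are not taken from the original analysis files; only the category codes themselves come from the text above.

```python
def code_pd_status(is_case, onset_age=None):
    """0 = control, 1 = onset < age 65, 2 = onset age 65 to 74, 3 = onset age 75 to 88."""
    if not is_case:
        return 0
    if onset_age < 65:
        return 1
    if onset_age < 75:
        return 2
    return 3


def code_sex(is_case, sex=None):
    """0 = control, 1 = male (case), 2 = female (case)."""
    if not is_case:
        return 0
    return 1 if sex == "male" else 2


def code_mutation(is_case, has_mutation):
    """0 = no mutation, 1 = mutation (case), 2 = mutation (control)."""
    if not has_mutation:
        return 0
    return 1 if is_case else 2


def code_low_freq_count(n_loci_with_minor_allele):
    """0, 1, or 2, where 2 stands for minor alleles at 2 to 5 of the 34 loci."""
    return min(n_loci_with_minor_allele, 2)
```

Note that sex and mutation status are coded jointly with case status, so that control subjects always fall into their own category for those variables.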

A total of 84 SNPs had minor allele counts of 5 or more. These loci were in strong LD (Paisan-Ruiz et al., 2008). Thus, we created relatively independent variables and facilitated the identification of relevant low frequency alleles by the identification of LD blocks. The LD blocks were identified using the Carlson method (HelixTree Software) specifying the minor allele frequency threshold as 0.01 (not the default value of 0.10) and the R^2 LD threshold as 0.80.
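The LD-block step can be illustrated with a greedy r²-binning sketch in the spirit of the Carlson method. It is not the HelixTree implementation: r² is approximated here by the squared correlation of minor-allele counts (an assumption), all SNPs are assumed to have already passed the minor allele frequency threshold, and the function name and toy genotype matrix are hypothetical.

```python
import numpy as np


def greedy_ld_bins(genotypes, r2_threshold=0.80):
    """Group SNPs into LD bins.

    genotypes: array of shape (n_subjects, n_snps) holding minor-allele
    counts (0, 1, 2) with no missing values.
    Returns a list of bins, each a sorted list of SNP column indices.
    """
    genotypes = np.asarray(genotypes, dtype=float)
    r2 = np.corrcoef(genotypes, rowvar=False) ** 2
    unbinned = set(range(genotypes.shape[1]))
    bins = []
    while unbinned:
        # Choose the SNP that exceeds the threshold with the most unbinned SNPs.
        best = max(
            sorted(unbinned),
            key=lambda s: sum(r2[s, t] >= r2_threshold for t in unbinned),
        )
        # The chosen SNP always belongs to its own bin.
        members = sorted({best} | {t for t in unbinned if r2[best, t] >= r2_threshold})
        bins.append(members)
        unbinned -= set(members)
    return bins


# Toy example: SNPs 0 and 1 are perfectly correlated; SNP 2 is not.
toy = np.array([[0, 0, 2], [1, 1, 0], [2, 2, 1], [0, 0, 1], [1, 1, 2], [2, 2, 0]])
print(greedy_ld_bins(toy))  # [[0, 1], [2]]
```

Because each bin is built around a single SNP, the members of a bin need not be contiguous along the gene, which is consistent with blocks whose SNPs are dispersed over a large region.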

There were 49 LD blocks, labeled from B0 to B48 (Table 1). B0 had minor allele frequency 0.009, i.e. <0.01. B1 to B12 consisted of multiple loci usually dispersed over a large section of the gene, and overlapping each other. Three of these LD blocks contained the six SNPs previously associated with PD in the sample: B15*, rs1157655* (intron 2); B4*, rs1907632* (intron 11) + rs11564205* (intron 34) + rs11564203* (intron 39) + rs11829088* (intron 39); and, B43*, rs11564173* (intron 46). Thus B4* contained four of the six associated SNPs distributed across functional domains including LRR, Roc, COR and Kinase domains.

Table 1. LD blocks located within the LRRK2 gene.

49 LD blocks (B0 to B48) were identified from 84 SNPs located within the LRRK2 gene having minor allele count of 5 or more (Carlson method, R^2 minimum LD threshold 0.80, minimum allele frequency threshold 0.01, HelixTree Software). The SNPs are listed in map order from 5′ to 3′ except for non-contiguous SNPs belonging to B1 to B12, which consist of multiple SNPs. SNPs previously associated with PD [B15* (rs1157655), B4* (rs1907632, rs11564205, rs11564203, rs11829088), B43* (rs11564173); p < 0.05 in χ2 testing] (Paisan-Ruiz et al., 2008) and those identified here according to a less stringent criterion [B28* (rs10784498), B31* (rs33958906); p < 0.10 in an additive co-dominant model] are tagged with an asterisk.

Block Number | SNP Name | Minor Allele Freq. | Location
13 | rs1388587 | 0.294 | 5′ near gene
7 | rs2201144 | 0.081 | 5′ near gene
7 | rs2131088 | 0.068 | Intron 4
7 | rs1388596 | 0.067 | Intron 7
14 | rs12230685 | 0.156 | Intron 2
15* | rs11175655* | 0.144 | Intron 2
16 | rs10878245 | 0.399 | Exon 5
17 | rs10878246 | 0.163 | Intron 5
18 | rs10878247 | 0.291 | Intron 5
8 | rs10878249 | 0.399 | Intron 5
8 | c.839–160C>T | 0.354 | Intron 7
8 | rs7955902 | 0.361 | Intron 9
19 | rs11564187 | 0.027 | Intron 5
20 | rs7134379 | 0.273 | Intron 8
21 | rs1491938 | 0.433 | Intron 10
22 | rs7969677 | 0.194 | Intron 11
4* | rs1907632* | 0.166 | Intron 11
4* | rs11564205* | 0.164 | Intron 34
4* | rs11564203* | 0.164 | Intron 39
4* | rs11829088* | 0.164 | Intron 39
23 | rs2723264 | 0.209 | Intron 12
3 | rs10784461 | 0.467 | Intron 13
3 | rs10784462 | 0.463 | Intron 14
3 | rs11175847 | 0.460 | Intron 18
3 | rs36220740 | 0.468 | Intron 19
3 | rs12820920 | 0.461 | Intron 21
2 | rs7308720 | 0.066 | Exon 14
2 | rs7133914 | 0.068 | Exon 30
2 | rs11175964 | 0.066 | Exon 30
2 | rs11176022 | 0.063 | Intron 37
2 | rs10878386 | 0.069 | Intron 39
2 | rs11176195 | 0.069 | Intron 47
2 | rs12426498 | 0.066 | Intron 50
9 | rs10784470 | 0.285 | Intron 15
9 | rs11564148 | 0.283 | Exon 34
9 | rs4768230 | 0.282 | Intron 35
10 | rs11564129 | 0.093 | Intron 16
10 | rs11564149 | 0.095 | Intron 28
24 | rs10506151 | 0.138 | Intron 16
25 | rs10878307 | 0.068 | Exon 18
26 | c.2680 + 11insA | 0.075 | Intron 20
27 | rs7966550 | 0.138 | Exon 22
28* | rs10784498* | 0.357 | Intron 26
29 | c.4309 + 12delT | 0.025 | Intron 30
1 | rs10650388 | 0.368 | Intron 30
1 | rs7302503 | 0.362 | Intron 31
1 | rs1427267 | 0.362 | Intron 32
1 | rs1427263 | 0.364 | Exon 34
1 | rs7137665 | 0.362 | Intron 36
1 | rs4768236 | 0.366 | Intron 47
1 | rs3761863 | 0.368 | Exon 49
1 | rs12426362 | 0.370 | Intron 49
30 | rs11175985 | 0.134 | Intron 31
31* | rs33958906* | 0.026 | Exon 32
5 | rs1896252 | 0.461 | Intron 33
5 | rs11176013 | 0.455 | Exon 34
5 | rs10878371 | 0.459 | Exon 37
5 | rs7298930 | 0.457 | Intron 39
32 | rs35303786 | 0.014 | Exon 34
33 | rs17444054 | 0.028 | Intron 37
34 | rs10878372 | 0.218 | Intron 37
0 | c.5656 + 7C>T | 0.009 | Intron 38
35 | c.5656 + 35G>A | 0.027 | Intron 38
36 | rs12370996 | 0.040 | Intron 39
11 | rs11176052 | 0.262 | Intron 39
11 | rs11176053 | 0.262 | Intron 39
37 | rs7307562 | 0.378 | Intron 39
38 | rs2404835 | 0.329 | Intron 40
6 | rs1427271 | 0.165 | Intron 40
6 | rs7307310 | 0.138 | Intron 43
6 | rs4768238 | 0.137 | Intron 50
6 | rs1465527 | 0.137 | 3′ near gene
39 | rs10735934 | 0.494 | Intron 40
40 | rs10506155 | 0.328 | Intron 41
41 | rs33995883 | 0.021 | Exon 42
12 | rs10878405 | 0.312 | Exon 43
12 | rs10467147 | 0.317 | 3′ near gene
42 | rs11176143 | 0.113 | Intron 43
43* | rs11564173* | 0.114 | Intron 46
44 | c.7029–8(8>9T) | 0.369 | Intron 47
45 | rs33962975 | 0.130 | Exon 48
46 | rs11835105 | 0.184 | Intron 48
47 | rs3789329 | 0.029 | Intron 49
48 | rs1820545 | 0.438 | 3′ near gene

Next, diplotype for individuals was inferred for LD blocks B1 to B12 using the E-M algorithm. Each value had >99% probability, with 9 exceptions (>92% probability). Diplotype was not inferred when there was missing genotype information as this would misidentify infrequent alleles, if present. The data was nearly complete (648 missing values among the 49 blocks). Missing values did not substantially represent untyped very low frequency alleles: the majority of missing data was limited to 5 case and 8 control subjects. Diplotype for B1 to B12, and genotype for B0, B13 to B48 were coded numerically, grouping low frequency values <5% together (Table 2).
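The final coding step, in which values with frequency below 5% are grouped together, might look like the following sketch. It assumes that diplotypes have already been inferred (the E-M step is not reproduced here), that each subject's diplotype is a string such as "TAT:CTC", and that missing values are represented by None; the function name and the integer codes are hypothetical.

```python
from collections import Counter


def code_diplotypes(diplotypes, min_freq=0.05):
    """Map diplotype strings to integer codes, pooling rare ones.

    Diplotypes with frequency >= min_freq among non-missing subjects get
    codes 1, 2, ... in order of decreasing frequency; all remaining
    diplotypes share one 'low frequency' code. Missing values stay None.
    """
    observed = [d for d in diplotypes if d is not None]
    counts = Counter(observed)
    n = len(observed)
    common = sorted(
        (d for d, c in counts.items() if c / n >= min_freq),
        key=lambda d: (-counts[d], d),
    )
    codes = {d: i + 1 for i, d in enumerate(common)}
    low_freq_code = len(common) + 1
    coded = [None if d is None else codes.get(d, low_freq_code) for d in diplotypes]
    return coded, codes
```

Leaving missing values uncoded, rather than imputing them, mirrors the decision above not to infer diplotype when genotype information was missing.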

Table 2. Haplotype frequencies for LD blocks B1 to B12 consisting of multiple SNPs.

Block | Haplotype | Frequency
7 | TAT | 0.910
 | CTC | 0.060
 | CAT | 0.010
 | CTT | 0.002
 | TAC | 0.002
 | TTC | 0.001
8 | TCC | 0.590
 | CTA | 0.350
 | CCC | 0.040
 | CCA | 0.010
 | TTA | 0.003
 | CTC | 0.003
 | TTC | 0.002
4* | CAGT | 0.830
 | TGAG | 0.160
 | TAAG | 0.008
 | CGGT | 0.004
 | TGGT | 0.002
 | TAGT | 0.002
 | CGAG | 0.001
 | CGGG | 0.001
3 | ACGCA | 0.510
 | GGTAG | 0.450
 | GGGCA | 0.010
 | ACGAA | 0.010
 | ACTAG | 0.050
 | GCTAG | 0.050
 | GGGAA | 0.003
 | GGTCG | 0.003
 | ACTCG | 0.002
 | AGGCA | 0.002
 | ACGCG | 0.001
 | GCGCA | 0.001
2 | CGGTAAT | 0.930
 | GAACTGC | 0.060
 | GAATTGC | 0.004
 | CAACTGC | 0.003
 | CAGTAAT | 0.002
 | GAACAGT | 0.002
 | CGGTAAC | 0.001
 | CGGTAGC | 0.001
9 | GTG | 0.700
 | TAA | 0.270
 | TTG | 0.020
 | GAA | 0.006
 | GAG | 0.005
10 | TC | 0.900
 | CT | 0.090
 | TT | 0.004
 | CC | 0.002
1 | CAGATACA | 0.590
 | AGACCCTT | 0.330
 | CAGATCTT | 0.020
 | AGACCACA | 0.020
 | CAGATATT | 0.006
 | CGACCCTT | 0.004
 | AAGATACA | 0.003
 | CAGATCCA | 0.003
 | CAGACACA | 0.003
 | CGAATCCT | 0.002
 | AGACCATA | 0.002
 | AGACTCTT | 0.002
 | CAGCTACA | 0.002
 | AGACCCCT | 0.002
 | AAGCCATA | 0.002
 | CGACCACA | 0.001
 | CGACTCTT | 0.001
 | CAGACACT | 0.001
 | AAGCTACA | 0.001
 | CAGATCCT | 0.001
 | AAGATCCT | 0.000
 | AAGATCTT | 0.000
5 | CGCC | 0.530
 | TATA | 0.450
 | TGCC | 0.005
 | CGCA | 0.003
 | TGTA | 0.003
 | CGTC | 0.003
 | CATA | 0.002
 | TATC | 0.001
 | CACC | 0.001
 | TACC | 0.001
11 | CC | 0.740
 | TT | 0.260
6 | CCGT | 0.830
 | TTAC | 0.140
 | TCGT | 0.030
 | TTGC | 0.005
 | CCAT | 0.002
 | CCAC | 0.001
 | TTGT | 0.001
 | TCAC | 0.001
12 | GG | 0.670
 | AA | 0.300
 | GA | 0.020
 | AG | 0.010

Grade-of-Membership Analysis

Patterns of polymorphisms associated with high and low risk were identified by grade-of-membership (GoM) analysis, so named for the graded membership scores of individuals in the identified GoM groups (here, patterns of polymorphisms). These scores reflect the resemblance of an individual to the groups. Each group is represented by a set of outcome probabilities for the variables, e.g. being male or female.

More formally, the GoM model likelihood can be described after first identifying four indices. The first is the number of subjects I (i = 1, 2, … , I); here, I = 550. The second is the number of variables J (j = 1, 2, … , J); there are J = 9 variables in the final model. The third is Lj, the number of response levels for the jth variable (l = 1, 2, … , Lj). The fourth is the number of latent groups K (k = 1, 2, … , K); here, K = 3. The response of the ith subject to the jth variable is recorded by binary indicators yijl (i.e., yijl = 0, 1), and the basic GoM model gives the probability that the ith subject has the lth level of the jth variable. The model with these definitions is

$$\Pr(y_{ijl} = 1) = \sum_{k=1}^{K} g_{ik}\,\lambda_{kjl} \qquad (1)$$

where the gik are convexly constrained scores (i.e., 0.0 ≤ gik ≤ 1.0; ∑k gik = 1.0) for subjects and the λkjl are the probabilities that, for the kth latent group, the lth level is found for the jth variable. The procedure thus uses this expression to identify K profiles representing the pattern of J × Lj responses found for I subjects.

The parameters gik and λkjl are estimated simultaneously using the likelihood function (in its most basic form).

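$$L = \prod_{i=1}^{I} \prod_{j=1}^{J} \prod_{l=1}^{L_j} \left( \sum_{k=1}^{K} g_{ik}\,\lambda_{kjl} \right)^{y_{ijl}} \qquad (2)$$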

In the likelihood, yijl is 1.0 if the lth level of the jth variable is present for the ith subject and 0.0 if it is not present.
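To make the roles of gik, λkjl and yijl concrete, the log of the basic GoM likelihood can be evaluated as in the sketch below. It assumes the standard form in which each observed response contributes the term ∑k gik λkjl, and it only evaluates the likelihood for given parameters; it does not perform the joint maximum likelihood estimation. The function name and the data layout (one array per variable) are assumptions made for illustration.

```python
import numpy as np


def gom_log_likelihood(y, g, lam):
    """Log-likelihood of the basic GoM model.

    y:   list of J arrays, each of shape (I, L_j), with y[j][i, l] = 1 if
         subject i shows level l of variable j and 0 otherwise.
    g:   array of shape (I, K); rows are membership scores summing to 1.
    lam: list of J arrays, each of shape (K, L_j); rows are the outcome
         probabilities of variable j for group k.
    """
    g = np.asarray(g, dtype=float)
    total = 0.0
    for y_j, lam_j in zip(y, lam):
        # p[i, l] = sum_k g_ik * lambda_kjl
        p = g @ np.asarray(lam_j, dtype=float)
        observed = np.asarray(y_j) > 0
        # Only observed levels contribute; a row of zeros (missing) adds nothing.
        total += float(np.sum(np.log(p[observed])))
    return total
```

A subject with membership score 1 in a single group contributes exactly that group's outcome probabilities, while a subject with split membership contributes a weighted mixture of them.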

Variables used to define the GoM groups are termed ‘internal’ variables. Initially, each facet of the data was used to construct models specifying K = 2, 3, 4, 5, or 6 groups. These models, despite efforts to minimize LD, reflected relationships among the variables unrelated to PD status. Thus a second set of models was constructed employing a reduced set of ‘internal’ variables: PD status, sex, mutation status, number of very low frequency alleles, and information on five SNP blocks that demonstrated a modicum of association with PD (B15*, B4*, B28*, B31*, B43*) (Armitage trend test, p < 0.10, HelixTree Software).
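The Armitage trend test used to screen the LD blocks can be sketched as follows. This is a minimal illustration rather than the HelixTree implementation: the function name, the additive weights (0, 1, 2) for the three genotype classes, and the example counts are assumptions, and the p-value uses the normal approximation.

```python
import math


def armitage_trend_test(case_counts, control_counts, weights=(0, 1, 2)):
    """Cochran-Armitage trend test for a 2 x 3 genotype table.

    case_counts, control_counts: genotype counts ordered by number of
    minor alleles (0, 1, 2). Returns (z, two_sided_p).
    """
    n_case = sum(case_counts)
    n_ctrl = sum(control_counts)
    n = n_case + n_ctrl
    col = [a + b for a, b in zip(case_counts, control_counts)]
    sum_t_case = sum(t * a for t, a in zip(weights, case_counts))
    sum_t_col = sum(t * c for t, c in zip(weights, col))
    sum_t2_col = sum(t * t * c for t, c in zip(weights, col))
    # Trend statistic and its variance under the null hypothesis.
    t_stat = n * sum_t_case - n_case * sum_t_col
    variance = (n_case * n_ctrl / n) * (n * sum_t2_col - sum_t_col ** 2)
    if variance == 0:
        return 0.0, 1.0
    z = t_stat / math.sqrt(variance)
    p = math.erfc(abs(z) / math.sqrt(2))
    return z, p


# Hypothetical counts, not data from this study.
z, p = armitage_trend_test([10, 20, 30], [30, 20, 10])
print(round(z, 3), p < 0.10)  # 4.472 True
```

Under this criterion a block would be retained as an internal variable when the two-sided p-value falls below 0.10.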

Information on the other LD blocks further characterized the groups as ‘external’ variables. One option in the likelihood is to separate calculations for ‘internal’ and ‘external’ variables (here, the external variables are the LD blocks not demonstrating evidence of association). For internal variables, maximum likelihood estimates (MLEs) of gik and λkjl are generated and the information in the internal variables is used to define the K groups. For external variables the likelihood is evaluated (and MLEs of λkjl are generated) but the information is not used to redefine the K groups; that is, the likelihood equations involving the gik are disabled for external variables so that the gik, and thus the definition of the K groups, are not changed. The model presented here represents three patterns of polymorphisms that distinguish parsimoniously between high and low risk, and identifies two patterns associated with risk, one concise and the other diverse.

Results

Overview

Three patterns of genetic variation were identified that represent 94% of the subjects. The number of very low frequency alleles and the occurrence of mutation played very limited roles in defining the patterns and distinguishing the subjects. Pattern I represented a specific set of minor alleles as a background to PD among about a third of cases. Pattern II was unaffected. Pattern III represented PD associated with a more diverse occurrence of other low frequency alleles. Age at the time of diagnosis ranged from age 60 to the late 80's for both patterns I and III, but occurrence before age 65 was more likely for pattern I (22% vs 11%).

Pattern I: A specific set of minor alleles scattered throughout LRRK2 was a common background to sporadic PD (Table 3). A core set of minor alleles was found: minor alleles were the rule for B15* (GA, not GG), for B4* (CAGT:TGAG or else diverse minor diplotypes, not the common diplotype CAGT:CAGT) and for B43* (GA or AA, not GG). These three LD blocks include all six of the SNPs individually associated with PD in the sample, and were the most informative blocks (B4*, H = 0.55; B15*, H = 0.53; B43*, H = 0.52). The H statistic (Shannon, Bell Laboratories, Berkeley Heights, NJ, USA) describes the extent to which outcomes for the variable differ for patterns I, II, and III. Values above 0.50 denote strong differences among the GoM groups for the variable.

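Table 3. Patterns of risk (I, II, III) for sporadic PD involving LRRK2.

Patterns of risk for PD are displayed (I, II, III). Each pattern is defined by the displayed probabilities (shown as percentages). Pattern I is affected and carries a set of minor alleles for B15*, B17, B4* (most notably), B28*, B31*, B11, B43* and B46. Pattern II is unaffected and at low risk with respect to genetic variation within LRRK2. Pattern III is affected and has slightly elevated minor allele frequencies at locations not found for pattern I. Variables used to identify the patterns were PD status, sex, mutation status, number of very low frequency alleles, B15*, B4*, B28*, B31* and B43*. The GoM maximum likelihood method automatically generates outcome probabilities for other variables based on the membership of individuals in the identified sets. Most subjects matched (100% membership) or closely resembled (>50% membership) one of these patterns: I: 46 cases were exact matches (91 were close matches); II: 178 control subjects were exact matches (250 were close matches); III: 137 cases were exact matches (178 were close matches). None of the control subjects were exact matches to I or III; 10% of control subjects resembled I. Information content for each variable is denoted by ‘H’. Values near zero indicate similar outcome frequencies for each set; higher values indicate greater information content.

Variable | Outcome | I | II | III | H
PD status | Control | 0 | 100 | 0 | 0.71
 | Onset age 59 to 64 | 22 | 0 | 11 |
 | Onset age 65 to 74 | 43 | 0 | 55 |
 | Onset age 75 to 88 | 35 | 0 | 34 |
Sex | Control | 0 | 100 | 0 | 0.70
 | Male | 68 | 0 | 59 |
 | Female | 32 | 0 | 41 |
Mutation | No | 98 | 97 | 95 | 0.02
 | Yes, case | 2 | 0 | 5 |
 | Yes, control | 0 | 3 | 0 |
Number of very low freq. alleles | 0 | 74 | 80 | 73 | 0.004
 | 1 | 22 | 16 | 20 |
 | 2 to 5 | 4 | 4 | 7 |
B13 | CC | 47 | 50 | 52 | 0.004
 | CG | 46 | 44 | 37 |
 | GG | 7 | 7 | 11 |
B7 | TAT:TAT | 90 | 85 | 86 | 0.03
 | TAT:CTC | 0 | 12 | 13 |
 | Low freq. | 10 | 4 | 1 |
B14 | CC | 91 | 67 | 67 | 0.03
 | CT | 9 | 33 | 33 |
B15* | GG | 0 | 100 | 100 | 0.53
 | GA | 100 | 0 | 0 |
B16 | CC | 62 | 30 | 26 | 0.07
 | CT | 38 | 52 | 51 |
 | TT | 0 | 18 | 23 |
B17 | TT | 0 | 92 | 98 | 0.42
 | TG | 100 | 8 | 2 |
B18 | CC | 72 | 47 | 39 | 0.04
 | CT | 28 | 46 | 47 |
 | TT | 0 | 8 | 14 |
B8 | TCC:TCC | 60 | 28 | 29 | 0.07
 | TCC:CTA | 38 | 43 | 42 |
 | CTA:CTA | 0 | 17 | 15 |
 | Low freq. | 2 | 13 | 14 |
B19 | AA | 98 | 95 | 93 | 0.004
 | AG or GG | 2 | 5 | 7 |
B20 | CC | 76 | 50 | 41 | 0.05
 | CT | 24 | 43 | 47 |
 | TT | 0 | 8 | 12 |
B21 | TT | 61 | 25 | 19 | 0.09
 | TC | 39 | 53 | 54 |
 | CC | 0 | 22 | 27 |
B22 | GG | 90 | 59 | 57 | 0.04
 | GA | 10 | 41 | 43 |
B4* | CAGT:CAGT | 0 | 100 | 100 | 0.55
 | CAGT:TGAG | 86 | 0 | 0 |
 | Low freq. | 14 | 0 | 0 |
B23 | CC | 87 | 53 | 58 | 0.04
 | CT | 13 | 47 | 42 |
B3 | ACGCA:ACGCA | 0 | 36 | 38 | 0.10
 | ACGCA:GGTAG | 48 | 39 | 45 |
 | GGTAG:GGTAG | 45 | 15 | 11 |
 | Low freq. | 7 | 10 | 6 |
B2 | CGGTAAT:CGGTAAT | 95 | 82 | 83 | 0.01
 | Low freq. | 5 | 18 | 17 |
B9 | GTG:GTG | 70 | 45 | 43 | 0.04
 | GTG:TAA | 25 | 41 | 42 |
 | TAA:TAA | 0 | 11 | 9 |
 | Low freq. | 5 | 4 | 6 |
B10 | TC:TC | 85 | 83 | 77 | 0.01
 | TC:CT | 12 | 16 | 20 |
 | Low freq. | 3 | 1 | 3 |
B24 | CC | 90 | 70 | 70 | 0.02
 | CA | 10 | 30 | 30 |
B25 | AA | 95 | 84 | 85 | 0.005
 | AG or GG | 5 | 16 | 15 |
B26 | AA | 92 | 85 | 83 | 0.005
 | AC or CC | 8 | 15 | 17 |
B27 | TT | 89 | 70 | 71 | 0.02
 | TC | 11 | 30 | 29 |
B28* | GG | 0 | 58 | 58 | 0.25
 | AG | 62 | 42 | 42 |
 | AA | 38 | 0 | 0 |
B29 | AA | 98 | 95 | 93 | 0.03
 | AC | 2 | 5 | 7 |
B1 | CAGATACA:CAGATACA | 59 | 27 | 32 | 0.06
 | AGACCCTT:CAGATACA | 25 | 48 | 40 |
 | AGACCCTT:AGACCCTT | 0 | 15 | 18 |
 | Low freq. | 16 | 8 | 10 |
B30 | CC | 86 | 71 | 70 | 0.01
 | CT or TT | 14 | 29 | 30 |
B31* | CC | 94 | 100 | 89 | 0.03
 | CT or TT | 6 | 0 | 11 |
B5 | CGCC:CGCC | 49 | 20 | 21 | 0.08
 | CGCC:TATA | 47 | 54 | 47 |
 | TATA:TATA | 0 | 23 | 29 |
 | Low freq. | 4 | 4 | 2 |
B32 | TT | 93 | 99 | 98 | 0.01
 | CT | 7 | 1 | 2 |
B33 | TT | 98 | 94 | 93 | 0.003
 | GT or GG | 2 | 6 | 7 |
B34 | AA | 80 | 54 | 57 | 0.02
 | AG | 20 | 46 | 43 |
B35 | GG | 98 | 94 | 94 | 0.003
 | AG or AA | 2 | 6 | 6 |
B00 | AA | 99 | 98 | 98 | 0.001
 | AG | 1 | 2 | 2 |
B36 | CC | 98 | 92 | 89 | 0.01
 | CT | 2 | 8 | 11 |
B11 | CC:CC | 1 | 72 | 78 | 0.24
 | CC:TT | 99 | 28 | 22 |
B37 | GG | 65 | 29 | 31 | 0.07
 | GT | 35 | 52 | 51 |
 | TT | 0 | 19 | 18 |
B38 | CC | 70 | 42 | 38 | 0.05
 | CT | 30 | 42 | 42 |
 | TT | 0 | 15 | 19 |
B6 | CCGT:CCGT | 87 | 69 | 62 | 0.02
 | CCGT:TTAC | 8 | 22 | 28 |
 | Low freq. | 4 | 8 | 11 |
B39 | AA | 46 | 20 | 16 | 0.09
 | AC | 54 | 50 | 50 |
 | CC | 0 | 31 | 34 |
B40 | GG | 65 | 38 | 36 | 0.04
 | AG | 34 | 50 | 51 |
 | AA | 1 | 12 | 13 |
B41 | AA | 98 | 96 | 94 | 0.003
 | AG | 2 | 4 | 6 |
B12 | GG:GG | 48 | 46 | 43 | 0.02
 | GG:AA | 41 | 40 | 41 |
 | AA:AA | 1 | 12 | 13 |
 | Low freq. | 9 | 3 | 3 |
B42 | GG | 86 | 77 | 77 | 0.005
 | GA | 14 | 23 | 23 |
B43* | GG | 0 | 100 | 100 | 0.52
 | GA or AA | 100 | 0 | 0 |
B44 | AA | 65 | 32 | 34 | 0.05
 | AC | 33 | 52 | 47 |
 | CC | 1 | 17 | 19 |
B45 | AA | 86 | 74 | 73 | 0.01
 | AG | 14 | 26 | 27 |
B46 | TT | 5 | 80 | 85 | 0.24
 | TG | 95 | 20 | 15 |
B47 | TT | 95 | 95 | 94 | 0.0003
 | TC or CC | 5 | 5 | 6 |
B48 | TT | 51 | 25 | 24 | 0.06
 | CT | 50 | 50 | 52 |
 | CC | 0 | 25 | 24 |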

The two LD blocks that demonstrated weaker evidence of association when considered individually contributed minor alleles to pattern I with some probability: B28*, AA (38% chance) or AG (H = 0.25); B31*, CT or TT (6% chance) (H = 0.03). Values of H above 0.05 might be considered notable, while lower values indicate that the groups differ only slightly in genotypic frequency. However, it might be noted that SNPs with low frequency minor alleles can vary in frequency of the minor allele, e.g. 3-fold, and still have a low H-value.

Other LD blocks, not providing evidence of association when considered individually, also contributed minor alleles to pattern I: B17 carried TG, while II & III carried TT almost exclusively (H = 0.42); B3 carried ACGCA:GGTAG or else diverse minor diplotypes, while II & III often carried ACGCA:ACGCA (H = 0.10); B11 carried CC:TT, while II & III often carried CC:CC (H = 0.24); and B46 usually carried TG, while II & III usually carried TT (H = 0.46). This extends the pattern of association of minor alleles from intron 2 to intron 48 as representing a subset of PD.

Finally, diverse low frequency diplotypes were more likely for B7 (10%; 5′ near gene, intron 4, intron 7) (H = 0.03), B1 (16%; introns 30, 31, & 32, exon 34, introns 36 & 47, exon 49, intron 49) (H = 0.06), and B12 (9%; exon 43, 3′ near gene) (H = 0.02). Therefore, pattern I, taking the broadest definition, extends from the 5′ near-gene region to the 3′ near-gene region.

Pattern II: Unaffected. Minor alleles had low frequency. Four case subjects unexpectedly matched this low risk pattern with respect to LRRK2. Thus, PD was possible, if infrequent, when neither high-risk pattern for LRRK2 was present.

Pattern III: This typology also represents sporadic PD at age 60 and older. However, it does not follow the pattern of minor alleles found together for pattern I, and was more diverse. Minor alleles were slightly more likely for LD blocks not part of pattern I: B13, B16, B18, B8, B19, B20, B21, B22, B9, B26, B29, B33, B35, B00, B36, B38, B6, B40, B41, B42, B44, B45, and B47. Again, this pattern involved essentially the whole gene. Curiously, minor genotype frequencies at B19, B29, B33, B35 and B41 were essentially identical – higher for III than for I or II, suggesting that they might be found together for a small subset of cases. Both affected patterns had some chance of carrying minor alleles at B31*, this being the only point of overlap between the relatively distinct pattern found for I and the less distinct pattern of low frequency alleles found for III.

Resemblance of Individuals to the Patterns

This data analytic approach does not force individuals into discrete groups. Instead, individuals divide membership among model-based idealized groups, essentially stereotypes, here labeled I, II or III, depending on the degree of resemblance. Subjects who match a particular group have a membership score of one in that group, and scores of zero in the other groups. Other subjects have positive scores in two or three groups, summing to one.

Here, the size of group I was 120.044, summing the membership of subjects in group I; clearly smaller than the number of case subjects. The size of group II was 242.766, less than the number of control subjects, indicating that some control subjects carry risk factors for PD. Group III was larger than group I, size 187.19, the sum of I and III being larger than the number of case subjects.

These model-based groups exactly matched many subjects: cases: 46 (17% of all cases), 4, 137; controls: 0, 178, 0. Most (94%) subjects resembled (>50% match) one of these patterns: cases: 91 (33% of all cases), 4, 178; controls: 27, 250, 0. Each case carrying p.G2019S had membership in III (1.00, 0.47, 0.83, 1.00) and, possibly, membership in I (0.00, 0.53, 0.17, 0.00), suggesting that the pattern of minor alleles found for I was not required for causation when the mutation was present.
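The group sizes, exact matches and close matches reported above are all simple summaries of the membership score matrix, as the following sketch illustrates. The function name and the numerical tolerance are assumptions; the example rows are the membership scores reported above for the four p.G2019S carriers, with membership in II taken to be zero because scores in I and III already sum to one.

```python
import numpy as np


def summarize_membership(g, tol=1e-9):
    """Summarize a membership matrix g of shape (n_subjects, n_groups).

    Returns (sizes, exact, close):
      sizes: sum of membership scores per group (the 'size' of each group),
      exact: number of subjects with a score of one in the group,
      close: number of subjects with a score above 0.50 in the group.
    """
    g = np.asarray(g, dtype=float)
    sizes = g.sum(axis=0)
    exact = (g >= 1.0 - tol).sum(axis=0)
    close = (g > 0.5).sum(axis=0)
    return sizes, exact, close


# Membership scores (I, II, III) of the four cases carrying p.G2019S.
carriers = [
    [0.00, 0.00, 1.00],
    [0.53, 0.00, 0.47],
    [0.17, 0.00, 0.83],
    [0.00, 0.00, 1.00],
]
sizes, exact, close = summarize_membership(carriers)
print(sizes, exact, close)
```

For these four carriers, two match pattern III exactly, three resemble III and one resembles I. Applied to all 550 subjects, the same column sums give the reported group sizes, which add up to the sample size (120.044 + 242.766 + 187.19 = 550).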

Multiple Minor Alleles Found Together for Pattern I

To verify that multiple minor alleles were found together for a subset of cases, we went back to the data. All 46 cases who matched pattern I carried minor alleles for the core LD blocks B4* (“TGAG” or diverse minor alleles), B15* (AG), and B43* (AG or AA), whereas all 178 control subjects matching II (65% of all control subjects) and 137 cases matching III (50% of all cases) carried two copies of the common alleles for B4* (“CAGT”), B15* (GG), and B43* (GG).

The core set of minor alleles was usually found with minor alleles at B17 and B46. All 46 cases matching I carried TG or GG at B17, while the common TT genotype was usually found for controls matching II (166 of 178) and cases matching III (134 of 137). Almost all (45 of 46) cases matching I carried TG or GG at B46, whereas the common TT genotype was usually found for controls matching II (146 of 178) and cases matching III (118 of 137). Therefore, alterations in LRRK2 were often occurring together from intron 2 to intron 48 among a subset of 17% of the cases.

We then considered the 94% of cases who resembled (>50% match) one of the patterns (Fig. 1): Minor alleles were much more likely for all the SNPs comprising these mentioned LD blocks for the third of cases like pattern I than for other cases or the control subjects. This was most evident for each of the four loci that composed B4* (>95% probability at each locus).

Figure 1.

The frequency of minor alleles is shown for PD cases like (>50% match, i.e. a membership score of 0.50 or higher) pattern I (n = 91), control subjects like pattern II (n = 250), and PD cases like pattern III (n = 178).

Discussion

There is ample evidence that the LRRK2 gene is a determinant of PD in certain families and for the general population, involving the p.G2019S mutation (Healy et al., 2008) and allelic associations of SNPs located within LRRK2 (Paisan-Ruiz et al., 2008). We sought to identify patterns of polymorphisms within the gene that more fully describe the genetic background of sporadic PD in relation to LRRK2. To accomplish this aim we identified LD blocks to simplify the data and allow identification of low frequency alleles. Information on genotype/diplotype for these blocks identified three patterns of risk represented by GoM groups (I, II, and III). GoM has been employed in a similar way to reduce complex data to a tractable number of patterns in previous medical and genetic studies; to define subtypes of disease and patterns of disease progression, endophenotypes, genetic risk sets for disease, and as a form of sibpair linkage analysis having apparently high statistical power (Corder & Woodbury, 1993; Corder et al., 2000; Corder et al., 2001; Corder et al., 2005; Corder et al., 2006; Corder & Hefler, 2006; Corder & Mellick, 2006; Corder et al., 2007; Corder et al., 2008a,b; Corder & Beaumont, 2007; Golanska et al., 2009; Hallmayer et al., 2005; Helisalmi et al., 2004; Iivonen et al., 2004; Licastro et al., 2007a,b).

The three patterns distinguished between high (I, III) and low (II) risk with respect to LRRK2, also defining a distinct subset of minor alleles found together, distributed widely throughout the gene across functional domains (I). These stereotypic backgrounds effectively partitioned the subjects into three groups: 94% of the subjects resembled one of the patterns, and a third of the cases carried most of the minor alleles characteristic of pattern I.

A distinct subset of cases was characterized by a set of minor alleles found together and distributed throughout the LRRK2 locus: the 46 cases who matched pattern I did, in fact, carry one or two copies of minor alleles at all six loci previously identified as associated with sporadic PD in the dataset, here represented by B4*, B15*, and B43*. These cases usually carried additional minor alleles also identified as part of pattern I. None of the subjects matching patterns II (65% of all controls) or III (50% of all cases) carried minor alleles at these locations. Four of the six associated loci (Paisan-Ruiz et al., 2008) were located in one LD block (B4*), rs1907632_T (intron 11), rs11564205_G (intron 34), rs11564203_A (intron 39) and rs11829088_G (intron 39) (i.e. ‘TGAG’ or other very minor alleles), that extends from intron 11 to intron 39 across several functional domains of dardarin, including LRR, Roc, COR and Kinase domains.

The minor alleles at high probability for pattern I, taking the broadest definition, were located in non-coding regions throughout the gene region from the 5′ near gene region to the 3′ near gene region, except for B31* located in exon 32. Thus, alterations within introns located throughout the gene appear to alter LRRK2 function especially when they are found together, possibly defining a distinct very high risk allele. This information might possibly be used in the future to identify persons at very high risk before the onset of symptoms, when preventive interventions might be undertaken. It might also motivate focused cell culture studies of LRRK2 function.

Moreover, when evaluated in this way, diverse low frequency diplotypes were more likely for I than for II or III at B7 (which provided no statistically significant evidence of association on its own) (10%; 3 SNPs extending from the 5′UTR to intron 7), B1 (16%; 7 SNPs extending from intron 30 to intron 49), and B12 (9%; 1 SNP located within the 3′ UTR). These low frequency variants were also scattered throughout the LRRK2 locus and most of them were located within intronic sequences, with the exception of rs1427263 and rs3761863 at B1 spanning functional domains such as Roc, COR, Kinase and WD40 domains. Thus, LRRK2 alterations were dispersed from the 5′ UTR to the 3′ UTR regions, involving essentially the whole gene.

Under a weaker criterion, namely at least a 50% match to pattern I, 91 cases (33%) had a relatively distinct genetic background to PD involving minor alleles at the LRRK2 locus: >95% of these cases carried minor alleles at all four loci that comprise LD block B4*. In contrast, none of the control subjects matched pattern I, although 27 had >50% resemblance to pattern I and might be at elevated risk for PD.
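
The distinction between matching a pattern and resembling it can be made concrete with a short Python sketch. It assumes each subject has three grade-of-membership scores (for patterns I, II and III) that sum to 1; the function name, the tolerance, and the treatment of a score of exactly 0.5 (here counted as not resembling) are assumptions of this illustration rather than details taken from the analysis.

```python
from typing import Sequence

LABELS = ("I", "II", "III")

def classify(scores: Sequence[float], tol: float = 1e-9) -> str:
    """Label a subject from membership scores (g_I, g_II, g_III)."""
    if len(scores) != 3 or abs(sum(scores) - 1.0) > 1e-6:
        raise ValueError("expected three membership scores summing to 1")
    # Index of the pattern with the largest membership score.
    top = max(range(3), key=lambda k: scores[k])
    if scores[top] >= 1.0 - tol:
        return f"matches {LABELS[top]}"      # exact match to one pattern
    if scores[top] > 0.5:
        return f"resembles {LABELS[top]}"    # more than 50% likeness
    return "limited resemblance"             # no pattern exceeds 50%
```

For example, a hypothetical subject with scores (0.6, 0.4, 0.0) would be labelled as resembling pattern I, whereas one with scores (0.41, 0.26, 0.33) would have only limited resemblance to any pattern.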

Pattern III, representing the majority of cases, had only slight elevations in the frequencies of minor alleles at other locations. Many cases matched (n = 137) or resembled (n = 178) this pattern. None of the control subjects matched or resembled pattern III. Possibly, pattern III is a mixture of many patterns of vulnerability involving LRRK2. The only point of overlap between I and III was that both patterns involved the possible occurrence of minor alleles at B31*. The stereotypic groups had the following probabilities of carrying a minor allele at B31*: 6%, 0% and 11% (patterns I, II and III, respectively). Thus the minor allele was associated with high risk. However, eight control subjects (3%) carried B31*TC: three resembled pattern I and might be considered to be at elevated risk for PD, and five had limited resemblance to I and/or III, with membership scores from 0.33 to 0.41, and might be considered to be at lesser risk.

We conclude that a well-defined subset of PD occurring at ages 60 and older in populations of European ancestry has a pattern of multiple minor alleles found together. This information might be useful to define risk for presently healthy individuals. Whether these findings apply to other populations is an open question. None of the six SNPs included in a core set of matches to pattern I (B15*, B4*, B43*) would be useful when investigating Asian populations (Table 4); therefore, the investigated set of SNPs is not relevant to all other populations.

Table 4. Core SNP frequencies in diverse populations.

| Population | SNP | Position | ObsHET | PredHET | HWpval | MAF | Alleles |
|---|---|---|---|---|---|---|---|
| CEU | rs11175655 | 38909994 | 0.183 | 0.193 | 1 | 0.108 | G:A |
| CEU | rs1907632 | 38936769 | 0.233 | 0.231 | 1 | 0.133 | G:A |
| CEU | rs11564205 | 39000276 | 0.233 | 0.231 | 1 | 0.133 | A:G |
| CEU | rs11564203 | 39010848 | 0.233 | 0.231 | 1 | 0.133 | G:A |
| CEU | rs11829088 | 39014046 | 0.233 | 0.231 | 1 | 0.133 | T:G |
| CEU | rs11564173 | 39036738 | 0.15 | 0.167 | 0.79 | 0.092 | G:A |
| YRI | rs11175655 | 38909994 | 0.117 | 0.139 | 0.55 | 0.075 | G:A |
| YRI | rs1907632 | 38936769 | 0.217 | 0.219 | 1 | 0.125 | G:A |
| YRI | rs11564205 | 39000276 | 0.333 | 0.339 | 1 | 0.217 | A:G |
| YRI | rs11564203 | 39010848 | 0.317 | 0.289 | 0.87 | 0.175 | G:A |
| YRI | rs11829088 | 39014046 | 0.333 | 0.32 | 1 | 0.2 | T:G |
| YRI | rs11564173 | 39036738 | 0.333 | 0.299 | 0.75 | 0.183 | G:A |
| CHB-JPT | rs11175655 | 38909994 | 0 | 0 | 0 | 0 | G:G |
| CHB-JPT | rs1907632 | 38936769 | 0.011 | 0.011 | 1 | 0.006 | G:A |
| CHB-JPT | rs11564205 | 39000276 | 0.011 | 0.011 | 1 | 0.006 | A:G |
| CHB-JPT | rs11564203 | 39010848 | 0.011 | 0.011 | 1 | 0.006 | G:A |
| CHB-JPT | rs11829088 | 39014046 | 0.011 | 0.011 | 1 | 0.006 | T:G |
| CHB-JPT | rs11564173 | 39036738 | 0 | 0 | 0 | 0 | G:G |

Analysis performed by Haploview 4.1 software (http://www.broad.mit.edu/haploview/haploview) with HapMap data (http://www.hapmap.org/). ObsHET: observed heterozygosity; PredHET: predicted heterozygosity; HWpval: Hardy-Weinberg equilibrium p-value; MAF: minor allele frequency. YRI: Yoruba in Ibadan, Nigeria; JPT: Japanese in Tokyo, Japan; CHB: Han Chinese in Beijing, China; CEU: CEPH (Utah residents with ancestry from northern and western Europe).
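
As a consistency check on Table 4, the predicted heterozygosity column can be compared with the value expected under Hardy-Weinberg equilibrium, 2p(1 − p), where p is the minor allele frequency. The Python sketch below does this for a few rows copied from the table. It assumes that PredHET was computed in this way; the function and variable names are illustrative, and small deviations are expected because the tabulated frequencies are rounded to three decimals.

```python
# (population, SNP) -> (MAF, PredHET), values copied from Table 4.
TABLE4_ROWS = {
    ("CEU", "rs11175655"): (0.108, 0.193),
    ("CEU", "rs1907632"): (0.133, 0.231),
    ("CEU", "rs11564173"): (0.092, 0.167),
    ("YRI", "rs11564205"): (0.217, 0.339),
    ("YRI", "rs11829088"): (0.200, 0.320),
}

def predicted_heterozygosity(maf: float) -> float:
    """Expected heterozygote frequency 2p(1 - p) under Hardy-Weinberg."""
    return 2.0 * maf * (1.0 - maf)

def max_abs_deviation(rows: dict) -> float:
    """Largest gap between 2p(1 - p) and the tabulated PredHET."""
    return max(
        abs(predicted_heterozygosity(maf) - pred_het)
        for maf, pred_het in rows.values()
    )

if __name__ == "__main__":
    for (pop, snp), (maf, pred_het) in TABLE4_ROWS.items():
        print(pop, snp, round(predicted_heterozygosity(maf), 3), pred_het)
```

For the rows shown, the recomputed values agree with the tabulated PredHET to within rounding error.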

Mutations were not a major background to sporadic PD (<5%). The four case subjects who carried p.G2019S resembled III more than I, suggesting that the pattern of multiple minor alleles found for I was not needed for the mutation to be penetrant. The phenotypic variability, and incomplete penetrance, found in some p.G2019S carriers may depend on specific alterations found for LRRK2 and contributions of interacting proteins. Very low frequency alleles, mostly found in flanking intronic regions, played only a small role.

One limitation of many genetic studies is that control subjects are drawn from persons who are not yet affected rather than from persons established to be at low risk. Here, the 10% of control subjects who resembled pattern I may not have displayed any clinical features because of their age at sample collection, or because of the absence of other important risk factors. The data analytic approach taken here tends to minimize the problem of high-risk control subjects when identifying the genetic background relevant to disease.

No significant association between disease and common variability in LRRK2 has previously been reported in samples of European ancestry (Biskup et al., 2005; Paisan-Ruiz et al., 2005, 2006). However, these data suggest that LRRK2 variation may contribute to the risk for sporadic PD in the North American population, and that this contribution is driven mainly by multiple low frequency minor alleles scattered throughout the LRRK2 locus. One speculation is that low frequency alleles as a class are less robust than the more common alleles. These results are cautionary in three respects: information on low frequency alleles should not be ignored in data analysis (e.g. such alleles can be grouped together); stringent p-value thresholds in genome-wide studies may exclude what might later turn out to be important risk factors; and, where possible, the use of LD and higher dimensional data analysis may be needed to establish a pattern or patterns of risk.

These findings indicate the importance of specific multiple minor alleles within the LRRK2 gene as a background to perhaps one-third of sporadic PD occurring at ages 60 and older, and suggest that a second pattern of risk involving minor alleles at alternate loci might, in part, be a background to sporadic PD among the majority of cases. However, further analyses of the LRRK2 gene and additional molecular approaches, such as studies of gene-gene and gene-environment interactions, are probably necessary to assess the role of minor alleles within the LRRK2 locus in idiopathic PD and to gain molecular insights into the biochemical pathway that underlies this complex disorder.

Acknowledgements

All samples used here were from the National Institute of Neurological Disorders and Stroke–supported Neurogenetics Repository hosted by the Coriell Institute for Research (Camden, NJ; http://ccr.coriell.org/Sections/Collections/NINDS/). This work was supported in part by the Intramural Research Program of the National Institute on Aging, National Institutes of Health, Department of Health and Human Services, project Z01 AG000957-06.
