Summary
Mutations within LRRK2, most notably p.G2019S, cause Parkinson's disease (PD) in rare monogenic families, and sporadic occurrences in diverse populations. We investigated variation throughout LRRK2 (84 SNPs; genotype or diplotype found for 49 LD blocks) for 275 cases (European ancestry, onset at age 60 or older) and 275 neurologically healthy control subjects (NINDS Neurogenetics Repository). Three grade-of-membership groups, i.e. genetic risk sets, were identified that exactly matched many subjects (cases: 46, 4, 137; controls: 0, 178, 0), and distinguished 94% of the subjects (i.e. >50% likeness to one set). Set I, affected, carried certain low frequency alleles located in multiple functional domains. Set II was unaffected. Set III, also affected, resembled set II except for slightly elevated frequencies of minor alleles not defining set I. We conclude that certain low frequency alleles distributed throughout LRRK2 are a genetic background to a third of cases, defining a distinct subset.
Introduction
Parkinson's disease (PD, OMIM #168600) is a chronic neurodegenerative disease with a cumulative prevalence of greater than one per thousand people (Kuopio et al., 1999). It is well characterized clinically (resting tremor, bradykinesia, postural instability, rigidity) and pathologically (loss of dopaminergic neurons in the pars compacta of the substantia nigra). Genetically, studies of rare monogenic families have identified five causative genes, the most common (∼7%) (Healy et al., 2008) being leucine-rich repeat kinase 2 (LRRK2; OMIM *609007), located on chromosome 12q12. Candidate gene studies have to date been less successful in demonstrating the genetic background to sporadic PD.
Our interest focused on the LRRK2 gene as both familial and sporadic cases in diverse populations are known to carry the p.G2019S mutation, e.g. ∼1% of sporadic cases with European ancestry (Healy et al., 2008). The gene is large, spanning 1.4 Mb and consisting of 51 exons and multiple functional domains (Leucine-rich repeats (LRR), Roc, COR, RAS, Kinase, WD40 motif). The encoded protein dardarin is thought to play a role in intracellular signaling (Marin et al., 2008). It is expressed in multiple brain regions, particularly in the substantia nigra, consistent with direct involvement in dopaminergic cell death.
Nonetheless, a genome-wide search (Fung et al., 2006) did not identify any SNP within (51 SNPs), or close to, LRRK2 as relevant to PD in the NINDS Neurogenetics Repository sample of 275 cases of European ancestry with onset at age 60 or older and 275 neurologically healthy control subjects, applying a rather stringent criterion (uncorrected P-value < 0.0001). These negative findings were unlikely to reflect differing population structure between case and control subjects, as comparison of the two groups demonstrated no appreciable differences (STRUCTURE; http://pritch.bsd.uchicago.edu/structure.html) (Fung et al., 2006; Falush et al., 2003).
Paisan-Ruiz et al. then sequenced all 51 exons, and at least 50 bp of flanking intronic sequence, for the sample (Paisan-Ruiz et al., 2008). Coding variants were found in twelve patients (4 cases carried p.G2019S) as well as in seven control subjects (no p.G2019S mutations). A total of 135 variants were identified in the sample, including SNPs from the genome-wide study and those identified by sequencing; many were unique to one or several subjects. Considering the 84 SNPs having a minor allele frequency of 5% or more, six SNPs scattered throughout the gene were associated with PD by χ2 testing using a weaker criterion (p < 0.05). These are indicated by an asterisk throughout this paper: rs11175655* (A allele, intron 2), rs1907632* (T, intron 11), rs11564205* (G, intron 34), rs11564203* (A, intron 39), rs11829088* (G, intron 39), and rs11564173* (A, intron 46).
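The contrast between the two significance criteria can be illustrated with a minimal sketch of an allelic χ2 test on a 2×2 table of allele counts. This is not the analysis code used in the cited studies; the function name and the allele counts are hypothetical, and the test shown is a plain Pearson χ2 with one degree of freedom and no continuity correction.

```python
import math

def allelic_chi2(case_minor, case_major, ctrl_minor, ctrl_major):
    """Pearson chi-square (1 df, no continuity correction) on a 2x2 allele-count table."""
    table = [[case_minor, case_major], [ctrl_minor, ctrl_major]]
    row_totals = [sum(r) for r in table]
    col_totals = [table[0][j] + table[1][j] for j in range(2)]
    n = sum(row_totals)
    chi2 = 0.0
    for i in range(2):
        for j in range(2):
            expected = row_totals[i] * col_totals[j] / n
            chi2 += (table[i][j] - expected) ** 2 / expected
    # For 1 degree of freedom the upper-tail probability is erfc(sqrt(chi2 / 2)).
    p_value = math.erfc(math.sqrt(chi2 / 2.0))
    return chi2, p_value

# Hypothetical counts: 275 cases and 275 controls contribute 550 alleles each.
chi2, p = allelic_chi2(80, 470, 55, 495)
print(f"chi2 = {chi2:.2f}, p = {p:.4f}")
print("passes p < 0.05:", p < 0.05)
print("passes p < 0.0001:", p < 0.0001)
```

With these made-up counts the SNP would be reported under the weaker criterion but not under the stringent genome-wide one, which is the situation described for the six asterisked SNPs.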
Our goal was to take this information a step further, first by identifying linkage disequilibrium (LD) blocks to simplify the data and render it more meaningful, and then by identifying genetic risk sets for PD, each defined by genotype/diplotype frequencies for the LD blocks. This was accomplished by grade-of-membership analysis (GoM) (Clive et al., 1983; Woodbury & Clive, 1974; Woodbury et al., 1978; Woodbury et al., 1994).
GoM is a form of latent classification analysis that incorporates large amounts of information to identify major patterns within the data. It allows individuals to resemble one of the identified patterns, or GoM groups (here genetic risk sets), or, more often, to partly resemble two or more groups. The degree of likeness of individuals to each GoM group is given by membership scores in the groups. These act like weights, ranging from zero (no likeness of the subject to the GoM group) to one (an exact match) and summing to one for each subject. This fuzziness with respect to individuals minimizes the number of groups needed to represent the sample. Unlike other forms of latent classification, it operates efficiently in L1 space (linear differences), rather than L2 space (sum of squares differences), providing 5-fold better ability to identify patterns according to the signal detection literature, i.e. high power compared to more usual genetic epidemiologic approaches (Corder et al., 2001). Importantly, the GoM groups (represented here by frequencies for genotypes/diplotypes) and the likeness of individuals to the groups (represented by membership scores) are jointly estimated using maximum likelihood (see Methods section), closely defining the space concerning LRRK2 and avoiding multiple comparisons. The best number of groups is decided according to an information criterion or empirically, as in this instance, when three groups were sufficient to distinguish most (94%) case and control subjects while also identifying a distinct subset, about a third, of cases.
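The role of membership scores as weights can be sketched as follows. This is a minimal illustration of the usual GoM formulation as we understand it, in which the probability that a subject shows a given category at a block is the membership-weighted average of the group-specific category frequencies. The block names, category labels and all numbers are hypothetical, and the maximum likelihood estimation step itself is omitted.

```python
import math

# Hypothetical group profiles: lambdas[k][block][category] is the frequency of
# `category` at `block` in pure-type GoM group k (0 = I, 1 = II, 2 = III).
lambdas = [
    {"blockA": {"common": 0.10, "minor": 0.90}, "blockB": {"common": 0.94, "minor": 0.06}},
    {"blockA": {"common": 1.00, "minor": 0.00}, "blockB": {"common": 1.00, "minor": 0.00}},
    {"blockA": {"common": 1.00, "minor": 0.00}, "blockB": {"common": 0.89, "minor": 0.11}},
]

def category_prob(scores, lambdas, block, category):
    """Membership-weighted probability: sum over groups of g_k * lambda_k(block, category)."""
    return sum(g * lambdas[k][block][category] for k, g in enumerate(scores))

def log_likelihood(scores, lambdas, observed):
    """Log-likelihood of one subject's observed categories given membership scores."""
    return sum(
        math.log(category_prob(scores, lambdas, block, category))
        for block, category in observed.items()
    )

# A hypothetical subject who is 70% like group I and 30% like group III.
scores = [0.7, 0.0, 0.3]
observed = {"blockA": "minor", "blockB": "common"}
print(category_prob(scores, lambdas, "blockA", "minor"))
print(log_likelihood(scores, lambdas, observed))
```

In a full analysis both the group profiles and every subject's scores would be adjusted to maximize the sum of such log-likelihoods over all subjects, subject to each subject's scores being non-negative and summing to one.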
Discussion
There is ample evidence that the LRRK2 gene is a determinant of PD in certain families and for the general population, involving the p.G2019S mutation (Healy et al., 2008) and allelic associations of SNPs located within LRRK2 (Paisan-Ruiz et al., 2008). We sought to identify patterns of polymorphisms within the gene that more fully describe the genetic background of sporadic PD in relation to LRRK2. To accomplish this aim we identified LD blocks to simplify the data and allow identification of low frequency alleles. Information on genotype/diplotype for these blocks identified three patterns of risk represented by GoM groups (I, II, and III). GoM has been employed in a similar way to reduce complex data to a tractable number of patterns in previous medical and genetic studies; to define subtypes of disease and patterns of disease progression, endophenotypes, genetic risk sets for disease, and as a form of sibpair linkage analysis having apparently high statistical power (Corder & Woodbury, 1993; Corder et al., 2000; Corder et al., 2001; Corder et al., 2005; Corder et al., 2006; Corder & Hefler, 2006; Corder & Mellick, 2006; Corder et al., 2007; Corder et al., 2008a,b; Corder & Beaumont, 2007; Golanska et al., 2009; Hallmayer et al., 2005; Helisalmi et al., 2004; Iivonen et al., 2004; Licastro et al., 2007a,b).
The three patterns distinguished between high (I, III) and low (II) risk with respect to LRRK2, also defining a distinct subset of minor alleles found together, distributed widely throughout the gene across functional domains (I). These stereotypic backgrounds effectively partitioned the subjects into three groups: 94% of the subjects resembled one of the patterns, and a third of the cases carried most of the minor alleles characteristic of pattern I.
A distinct subset of cases was characterized by a set of minor alleles found together and distributed throughout the LRRK2 locus: the 46 cases who matched pattern I did, in fact, carry one or two copies of minor alleles at all six loci previously identified as associated with sporadic PD in the dataset, represented here by B4*, B15*, and B43*. These cases usually carried additional minor alleles also identified as part of pattern I. None of the subjects matching patterns II (65% of all controls) or III (50% of all cases) carried minor alleles at these locations. Four of the six associated loci (Paisan-Ruiz et al., 2008) were located in one LD block (B4*): rs1907632_T (intron 11), rs11564205_G (intron 34), rs11564203_A (intron 39) and rs11829088_G (intron 39) (i.e. ‘TGAG’ or other very minor alleles). This block extends from intron 11 to intron 39 across several functional domains of dardarin, including the LRR, Roc, COR and Kinase domains.
The minor alleles at high probability for pattern I, taking the broadest definition, were located in non-coding regions throughout the gene region from the 5′ near gene region to the 3′ near gene region, except for B31* located in exon 32. Thus, alterations within introns located throughout the gene appear to alter LRRK2 function especially when they are found together, possibly defining a distinct very high risk allele. This information might possibly be used in the future to identify persons at very high risk before the onset of symptoms, when preventive interventions might be undertaken. It might also motivate focused cell culture studies of LRRK2 function.
Moreover, when evaluated in this way, diverse low frequency diplotypes were more likely for I than for II or III at B7 (which provided no statistically significant evidence of association on its own) (10%; 3 SNPs extending from the 5′ UTR to intron 7), B1 (16%; 7 SNPs extending from intron 30 to intron 49), and B12 (9%; 1 SNP located within the 3′ UTR). These low frequency variants were also scattered throughout the LRRK2 locus, and most of them were located within intronic sequences, with the exception of rs1427263 and rs3761863 at B1, which spans functional domains such as the Roc, COR, Kinase and WD40 domains. Thus, LRRK2 alterations were dispersed from the 5′ UTR to the 3′ UTR regions, involving essentially the whole gene.
Taking a weaker criterion, namely, at least 50% match to pattern I, 91 cases (33%) had a relatively distinct genetic background to PD involving minor alleles at the LRRK2 locus: >95% of these cases carried minor alleles at all four loci that comprise LD block B4*. In contrast, none of the control subjects matched pattern I, although 27 had >50% resemblance to pattern I and might possibly be at elevated risk for PD.
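The distinction between subjects who match a pattern and those who merely resemble it can be expressed as a simple rule on the membership scores. The sketch below is illustrative only: the function name, the tolerance and the example scores are hypothetical, and the cut-off is taken as strictly greater than 0.5.

```python
def classify(scores, labels=("I", "II", "III"), threshold=0.5, tol=1e-9):
    """Label a subject as matching, resembling, or mixed across GoM patterns."""
    if abs(sum(scores) - 1.0) > 1e-6:
        raise ValueError("membership scores must sum to one")
    top = max(range(len(scores)), key=lambda k: scores[k])
    if scores[top] >= 1.0 - tol:
        return ("match", labels[top])      # exact match: score of one
    if scores[top] > threshold:
        return ("resembles", labels[top])  # more than 50% likeness
    return ("mixed", None)                 # no single pattern dominates

print(classify([1.0, 0.0, 0.0]))
print(classify([0.6, 0.0, 0.4]))
print(classify([0.33, 0.26, 0.41]))
```

Under this rule a subject with scores such as 0.33 and 0.41 on two patterns would be counted as resembling neither, which is how subjects with only limited resemblance to a pattern are treated in the discussion.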
Pattern III, representing the majority of cases, had only slight elevations in the frequencies of minor alleles at other locations. Many cases matched (n = 137) or resembled (n = 178) this pattern. None of the control subjects matched or resembled pattern III. Possibly, pattern III is a mixture of many patterns of vulnerability involving LRRK2. The only point of overlap between I and III was that both patterns involved the possible occurrence of minor alleles at B31*. The stereotypic groups had the following probabilities of carrying a minor allele at B31*: 6% (I), 0% (II), and 11% (III). Thus the minor allele was associated with high risk. However, eight control subjects (3%) carried B31*TC: three resembled pattern I and might be considered to be at elevated risk for PD, and five had limited resemblance to I and/or III, with membership scores from 0.33 to 0.41, and might be considered to be at lesser risk.
The conclusion that we draw is that a well-defined subset of PD occurring at ages 60 and older in populations with European ancestry has a pattern of multiple minor alleles found together. This information might be useful to define risk for presently healthy individuals. Whether these findings apply to other populations is an open question. None of the six SNPs included in a core set of matches to pattern I (B15*, B4*, B43*) would be useful when investigating Asian populations (Table 4); therefore, the investigated set of SNPs is not relevant to all other populations.
Table 4. Core SNP frequencies in diverse populations.

| Population | SNP | Position | ObsHET | PredHET | HWpval | MAF | Alleles |
|---|---|---|---|---|---|---|---|
| CEU | rs11175655 | 38909994 | 0.183 | 0.193 | 1 | 0.108 | G:A |
| CEU | rs1907632 | 38936769 | 0.233 | 0.231 | 1 | 0.133 | G:A |
| CEU | rs11564205 | 39000276 | 0.233 | 0.231 | 1 | 0.133 | A:G |
| CEU | rs11564203 | 39010848 | 0.233 | 0.231 | 1 | 0.133 | G:A |
| CEU | rs11829088 | 39014046 | 0.233 | 0.231 | 1 | 0.133 | T:G |
| CEU | rs11564173 | 39036738 | 0.15 | 0.167 | 0.79 | 0.092 | G:A |
| YRB | rs11175655 | 38909994 | 0.117 | 0.139 | 0.55 | 0.075 | G:A |
| YRB | rs1907632 | 38936769 | 0.217 | 0.219 | 1 | 0.125 | G:A |
| YRB | rs11564205 | 39000276 | 0.333 | 0.339 | 1 | 0.217 | A:G |
| YRB | rs11564203 | 39010848 | 0.317 | 0.289 | 0.87 | 0.175 | G:A |
| YRB | rs11829088 | 39014046 | 0.333 | 0.32 | 1 | 0.2 | T:G |
| YRB | rs11564173 | 39036738 | 0.333 | 0.299 | 0.75 | 0.183 | G:A |
| CHB-JPT | rs11175655 | 38909994 | 0 | 0 | 0 | 0 | G:G |
| CHB-JPT | rs1907632 | 38936769 | 0.011 | 0.011 | 1 | 0.006 | G:A |
| CHB-JPT | rs11564205 | 39000276 | 0.011 | 0.011 | 1 | 0.006 | A:G |
| CHB-JPT | rs11564203 | 39010848 | 0.011 | 0.011 | 1 | 0.006 | G:A |
| CHB-JPT | rs11829088 | 39014046 | 0.011 | 0.011 | 1 | 0.006 | T:G |
| CHB-JPT | rs11564173 | 39036738 | 0 | 0 | 0 | 0 | G:G |

ObsHET, observed heterozygosity; PredHET, predicted heterozygosity; HWpval, Hardy-Weinberg equilibrium P-value; MAF, minor allele frequency.
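The PredHET column of Table 4 can be related to the MAF column through the Hardy-Weinberg expectation for a biallelic SNP, 2p(1 − p), where p is the minor allele frequency. The short sketch below is an illustration under that assumption rather than the software output itself; the function name is hypothetical and the MAF values are copied from the CEU rows of the table.

```python
def predicted_heterozygosity(maf):
    """Expected heterozygote frequency 2p(1 - p) under Hardy-Weinberg equilibrium."""
    return 2.0 * maf * (1.0 - maf)

# Minor allele frequencies taken from the CEU rows of Table 4.
ceu_maf = {"rs11175655": 0.108, "rs1907632": 0.133, "rs11564173": 0.092}

for snp, maf in ceu_maf.items():
    print(snp, round(predicted_heterozygosity(maf), 3))
```

Because the tabulated MAF values are themselves rounded to three decimals, some rows may differ from the tabulated PredHET in the last digit. In the CHB-JPT rows, a MAF of 0 or 0.006 implies essentially no heterozygotes, which is why these SNPs carry little information in that sample.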
Mutations were not a major background to sporadic PD (<5%). The four case subjects who carried p.G2019S resembled III more than I, suggesting that the pattern of multiple minor alleles found for I was not needed for the mutation to be penetrant. The phenotypic variability, and incomplete penetrance, found in some p.G2019S carriers may depend on specific alterations found for LRRK2 and contributions of interacting proteins. Very low frequency alleles, mostly found in flanking intronic regions, played only a small role.
One limitation of many genetic studies is that control subjects are drawn from persons who are not yet affected rather than persons established to be at low risk. Here, the 10% of control subjects who resembled I may not have displayed any clinical features because of their age at sample collection, or because other important risk factors were absent. The data analytic approach taken here tends to minimize the problem of high-risk control subjects when identifying the genetic background relevant to disease.
No significant association between disease and common variability in LRRK2 has previously been reported in samples of European ancestry (Biskup et al., 2005; Paisan-Ruiz et al., 2005, 2006); however, these data suggest that LRRK2 variation may contribute to the risk for sporadic PD in the North American population, and that this contribution is driven mainly by multiple low frequency minor alleles scattered throughout the LRRK2 locus. One speculation is that low frequency alleles as a class are less robust than the more common alleles. These results are cautionary in three respects: information on low frequency alleles should not be ignored in data analysis (e.g. they can be grouped together); stringent p-value thresholds in genome-wide studies may exclude what might later turn out to be important risk factors; and, where possible, LD and higher dimensional data analysis may be needed to establish patterns of risk.
These findings indicate the importance of specific multiple minor alleles within the LRRK2 gene as a background to perhaps one-third of sporadic PD occurring at ages 60 and older, and suggest that a second pattern of risk involving minor alleles at alternate loci might, in part, be a background to sporadic PD among the majority of cases. However, further analyses of the LRRK2 gene and additional approaches, such as the study of gene-gene and gene-environment interactions, are probably necessary to assess the role of minor alleles within the LRRK2 locus in idiopathic PD and to gain molecular insights into the biochemical pathway that underlies this complex disorder.