Genetic Association Studies in Epilepsy: “The Truth Is Out There”
Address corresponding Author: Professor Samuel F Berkovic, Epilepsy Research Centre, University of Melbourne, 1st Floor Neurosciences Building, Austin Health, Banksia Street, Heidelberg West, VIC 3081, Australia. E-mail: firstname.lastname@example.org
Summary: Success has been achieved in identifying many mutations in rare monogenic epilepsy syndromes by using linkage analysis, but dissecting the genetic basis of common epilepsy syndromes has proven more difficult. Common epilepsies are genetically complex disorders believed to be influenced by variation in several susceptibility genes. Association studies can theoretically identify these genes, but despite more than 50 association studies in epilepsy, no consistent or convincing susceptibility genes have emerged, leading to scepticism about the association-study approach. We review the results of existing association studies in focal epilepsies, generalized epilepsies, febrile seizures, and epilepsy pharmacogenetics. By using an illustrative example, we discuss how methodologic issues of sample size, selection of appropriate controls, population stratification, and significance thresholds can lead to bias and false-positive associations; the importance of biologic plausibility also is emphasized. Newer methodologic refinements for association studies, such as use of two control groups, genomic control, haplotyping, and use of two independent datasets, are discussed. A summary of existing guidelines and a checklist for planning and appraising such association studies in epilepsy is presented. We remain cautiously optimistic that with methodologic refinements and multicenter collaborations with large sample sizes, association studies will ultimately be useful in dissecting the genetic basis of common epilepsy syndromes.
Epilepsy is heterogeneous, incorporating numerous epilepsy syndromes with different etiologies. When a genetic basis exists for epilepsy, defining the genetic contribution has proven to be a formidable task (1). Success has been achieved in some families with rare monogenic epilepsy syndromes (2), with mutations of large effect where concordance between genotype and phenotype is reasonably strong. In contrast, progress has been slow for most of the common epilepsy syndromes encountered in daily clinical practice.
It is posited that most epilepsies, much like other common diseases such as diabetes or asthma, are influenced by the effect of variation at several genes. Where this variation is in the form of common single nucleotide polymorphisms (SNPs), the model is termed the common-disease/common-variant hypothesis (3); some evidence exists to support this hypothesis (4). However, where the underlying genetic basis is a rare variation (present at a frequency of less than 1%), then the model is termed the common-disease/rare-variant hypothesis (5). Irrespective of which model applies (or, as is likely, a mixture of both), each gene contributes a small or modest effect to the epilepsy phenotype and by itself is insufficient to cause epilepsy. Such genes are termed “susceptibility genes.” In addition, environmental factors may play a part (6). Concordance between genotype and phenotype is therefore relatively weak compared with monogenic disorders. Common epilepsy syndromes are thus polygenic or complex disorders, with the latter term preferred, as it also includes environmental influences. Approaches for genetic dissection of complex disorders have been reviewed in detail (6,7), and a summary of the two commonly used methods follows.
Linkage analysis has been the traditional tool used in monogenic diseases to narrow the number of candidate genes, one of which will have the underlying molecular defect (7). Classic linkage analysis usually requires large families with many affected members (multiplex families), or multiple families in which the same mutant gene is responsible. This procedure is efficient for monogenic disorders, in which the pathogenic effect is strong enough to define the affected status on clinical grounds, without too much confusion from incomplete penetrance (mutation carriers who are not clinically affected), or phenocopies [such as relatives with epilepsy but different biologic disease mechanisms, which may or may not be genetic (e.g., poststroke seizures)].
Complex disorders such as the common epilepsy syndromes seen in daily clinical practice are theoretically amenable to linkage analyses, particularly nonparametric approaches (8); however, these have less power compared with linkage analyses carried out for monogenic disorders, and success with these approaches has been limited. The pathogenic effect of each putative “susceptibility gene” is small or modest, making detection difficult. To complicate matters, phenocopies and nonmendelian modes of genetic inheritance (8) further muddy the waters. Therefore those who are clinically affected may not all have the same susceptibility genes, whereas a particular susceptibility gene may be found in both affected and unaffected persons, frustrating our attempts at dissecting common epilepsies.
Use of linkage methods for common epilepsy syndromes also is limited by availability of multiplex families. Although such families are not uncommon in the idiopathic generalized epilepsies or febrile seizures, they are less common in focal epilepsies. Two genome-wide linkage studies have thus been performed in idiopathic generalized epilepsies (9,10), resulting in several possible susceptibility loci; a novel epilepsy gene has been identified after investigation of one such locus (11). Overall success, although encouraging, has been limited, partly because of the practical difficulties and also because of relatively small sample sizes used so far. Attention has thus turned to association studies for dissecting complex diseases.
Association studies compare the frequency of specific alleles in affected cases against those in unaffected controls. Instead of tracking the co-inheritance or otherwise of marker and disease alleles within families, association studies generally involve populations (and usually larger numbers of individuals). An allele is said to be associated with the disease if its frequency differs between cases and controls more than would be predicted by chance (8), provided the controls are representative of the test population in all aspects other than disease affection status.
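As a minimal sketch of the comparison just described, allele counts in cases and controls can be arranged in a 2 × 2 table and tested with a Pearson chi-square statistic. The counts below are hypothetical and are not drawn from any study reviewed here; the function name is ours, and the test assumes that the two alleles of each individual can be treated as independent observations.

```python
import math

def allelic_chi_square(case_a, case_b, ctrl_a, ctrl_b):
    """Pearson chi-square (1 df), p value and odds ratio for a 2 x 2 allele-count table."""
    n = case_a + case_b + ctrl_a + ctrl_b
    cases, controls = case_a + case_b, ctrl_a + ctrl_b
    total_a, total_b = case_a + ctrl_a, case_b + ctrl_b
    chi2 = n * (case_a * ctrl_b - case_b * ctrl_a) ** 2 / (cases * controls * total_a * total_b)
    # For 1 df, the upper-tail probability of chi-square equals erfc(sqrt(chi2 / 2)).
    p_value = math.erfc(math.sqrt(chi2 / 2))
    odds_ratio = (case_a * ctrl_b) / (case_b * ctrl_a)
    return chi2, p_value, odds_ratio

# Hypothetical counts: 100 cases and 100 controls, two alleles each.
chi2, p_value, odds_ratio = allelic_chi_square(90, 110, 60, 140)
print(f"chi2 = {chi2:.2f}, p = {p_value:.4f}, OR = {odds_ratio:.2f}")
```

A small p value from such a test shows only that allele frequencies differ; whether the controls are truly comparable to the cases is a separate question.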
Guilt by association, however, is not sufficient proof of causation (12). Linkage disequilibrium (LD) occurs when alleles are found together more than would be predicted by chance; an associated allele may therefore not be causative but instead in LD with the actual pathogenic allele. Alternatively, the association may have arisen by chance, or it may be artifactual because of methodologic weaknesses and consequent bias (12,13). Proof of an association ultimately lies in replication of the initial findings in several different populations and demonstrating that the associated allele plausibly alters biologic function, leading to disease (12,14).
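For illustration, LD between two biallelic loci is commonly summarised by the coefficient D (the excess of the observed haplotype frequency over that expected under independence) and by its normalised form r². The sketch below uses hypothetical frequencies; the function and parameter names are ours.

```python
def linkage_disequilibrium(p_ab, p_a, p_b):
    """Return (D, r squared) for alleles A and B at two biallelic loci.

    p_ab is the frequency of the haplotype carrying both A and B;
    p_a and p_b are the individual allele frequencies.
    """
    d = p_ab - p_a * p_b  # zero when the alleles occur together only by chance
    r2 = d * d / (p_a * (1 - p_a) * p_b * (1 - p_b))
    return d, r2

# Hypothetical frequencies: the A-B haplotype is seen far more often than 0.3 * 0.3 = 0.09.
d, r2 = linkage_disequilibrium(p_ab=0.25, p_a=0.3, p_b=0.3)
print(f"D = {d:.3f}, r^2 = {r2:.3f}")
```

When r² is high, a marker allele and the pathogenic allele are nearly interchangeable in an association test, which is why an associated marker need not be the causative variant.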
NOT QUITE READY FOR PRIME TIME?
Association studies may provide greater statistical power than linkage analysis in dissection of complex diseases under certain circumstances (15,16). Case collection also is easier. However, association studies have performed poorly thus far. A cross-disciplinary review showed that of more than 600 associations, only 166 were studied 3 times or more, and of these, only six were consistently replicated (17). Indeed, replication failure was the norm, leading one journal openly to discourage submission of association studies in complex disorders unless major biologic insights are found (18). This has led to scepticism about each new published association, and journals have therefore published guidelines for genetic association studies in an attempt to stem the tide of nonreplicable studies (19–22).
However, methodology in association studies is evolving, and nonreplicable studies from as recently as 5 to 6 years ago may reflect the lower stringency of the methods available at that time. New methods for correction of population stratification (13), increasing use of family-based controls (as opposed to population-based controls), use of two independent datasets of cases and controls within one study, and increasing rigor in defining statistical significance have gradually become more common in published studies (14).
EPILEPSY: THE SCORE SO FAR
The situation in neurology and epilepsy is not dissimilar, with many reports of a novel association cast into doubt by subsequent replication failure. Multiple conflicting and nonreplicable association studies have been done in the past 7 years for both focal and generalized epilepsy syndromes. Studies on febrile seizures and the pharmacogenetics of medically refractory epilepsy also have been performed.
We summarize the results of the existing studies, with a brief analysis of their strengths and weaknesses, and endeavor to identify common limitations underlying these studies. Association studies of interleukin genes and temporal lobe epilepsy (TLE) provide a good overview of the problems of association studies and will be used as an example. Analysis of the other association studies is detailed in Tables 1 to 3. Finally, we also attempt to synthesize the existing sets of guidelines for association studies (1,19–22) into a checklist, specific for epilepsy, for readers and researchers in this evolving field.
Table 1. Summary of association studies in focal epilepsy
|Interleukin 1β and temporal lobe epilepsy (TLE): see example|
|Prodynorphin gene and TLE|
| Stogmann et al. (39)||155 (43 with family history of seizures)||L-allele associated with increased risk of TLE in subgroup with family history of seizures||Unclear if subgroup analysis was prespecified before analysis or not Corrected p value = 0.002||1. If positive results were derived through analysis of multiple subgroups defined after primary analysis (as opposed to preplanned analyses), this may lead to false positives (22,42)|
| Tilgen et al. (40)||182 (46 with family history of seizures)||L-allele not associated with increased risk of TLE as a whole or in subgroup with family history of seizures in both studies (40,41)||Both negative replication studies have only slightly more cases with family history than initial study. Initial association possibly due to chance but debated (40).||2. Negative replication studies had low number of TLE with family history of seizures; all three prodynorphin study findings pertain to small subgroups of <61 with resulting imprecision in risk estimates. Larger studies needed|
| Gambardella et al. (41)||175 (60 with family history of seizures)|| |
|ApoE gene and TLE|
| Blumcke et al. (85); Gambardella et al. (86)|| ||No association between ApoEɛ4 and TLE or age at onset of TLE in both studies (85,86)||Three ApoE alleles imply six possible genotypes; demonstrating significant association between genotypes is thus more difficult, as each comparison group is small, leading to reduced statistical power||1. Positive subgroup analysis should be considered as hypothesis generating rather than confirmatory |
2. Overall evidence suggests ApoE probably not a major susceptibility gene for TLE
| Briellmann et al. (43)||43||ApoEɛ4 allele may shorten latency between initial insult and seizure onset||Positive findings through subgroup analysis only|| |
| Mercier et al. (87)||48||ApoEɛ4 allele associated with TLE and earlier onset of seizures||Small sample size, borderline significance (p = 0.028)|| |
|GABA(B) receptor 1 gene and TLE|
| Gambardella et al. (48)||141||G1465A polymorphism in GABA(B) receptor 1 gene associated with TLE||Missense variant is located within highly conserved region; high genotypic (A/G vs. GG) odds ratio of 38; logistic regression used to offset potential population stratification||1. Missense variant is reasonably plausible; functional confirmation is lacking |
2. Logistic regression offsets overt stratification but not cryptic stratification
|Brain-derived neurotrophic factor (BDNF) gene and partial epilepsy|
| Kanemoto et al. (47)||219||T allele of C240T polymorphism associated with increased risk of partial epilepsy||Link between this polymorphism and epilepsy unclear; association also contrary to authors' a priori hypothesis that T-allele was protective against epilepsy||1. Weak biologic plausibility |
2. Presence of linkage disequilibrium within BDNF suggests that the C240T polymorphism may be a marker for the true susceptibility allele, assuming the C240T association is not spurious
|Sodium-potassium transporting ATPase gene (ATP1A2) and TLE|
| Buono et al. (44)||56||4 base-pair polymorphism in ATP1A2 not associated with TLE||Frequency of this allele in blacks more than twice that of whites; population admixture seen (blacks and whites in cases and controls), but proportions were equal because of matching||1. Potential false-positive association from population stratification avoided because of ethnic matching |
2. Small sample size, however, raises possibility of false-negative findings due to inadequate power
|Prion protein gene and mesial TLE with hippocampal sclerosis (HS)|
| Walz et al. (45)||100||Variant allele (Asn171Ser) associated with TLE-HS and with poorer seizure control after epilepsy surgery in a Brazilian cohort||Population admixture seen but proportions were equal due to matching; no genomic control used||Genetic admixture in Brazil (46) raises the possibility of cryptic stratification; genomic control useful for confirming veracity of association|
Table 2. Summary of association studies in idiopathic generalised epilepsy
|A. Positive studies with replication|
|Opioid receptor μ-subunit gene (OPRM) and idiopathic absence epilepsy (IAE)|
| Sander et al. (50)||72||G-allele of A118G polymorphism in OPRM1 associated with IAE||Borderline significance (p = 0.019), possible stratification as cases and controls drawn from different sources||1. Analyses with family-based controls are stratification-free by design, which if positive strengthens association found with population-based controls|
|Wilkie et al. (51)||230 (mixed group of 115 IAE and 115 of other IGE types)||G-allele of A118G polymorphism in OPRM1 associated with IGE as a whole, but not with IAE specifically||Both family-based and population-based controls used to confirm association with IGE||2. Note initial association with IAE not borne out in replication study (51) despite larger numbers of IAE subjects |
3. Further confirmation of this association needs to be done in nonwhite populations, as patterns of LD vary in different populations (51)
|B. Positive studies with replication failure|
|1. Voltage-gated calcium channel gene (CACNA1A) and IGE|
| Chioza et al. (88)||188||Single nucleotide polymorphism (SNP) designated SNP8 in CACNA1A associated with IGE||Both family-based and population-based controls used to confirm association with IGE||1. SNP8 is a silent SNP that does not change the coding sequence, implying that it is in LD with the true variant|
| Sander et al. (89)||354||SNP8 not associated with IGE||Both family-based and population-based controls used to refute association||2. Replication failure occurred despite larger sample size, thus statistical power is unlikely to be the cause |
3. Either the initial association was a false-positive due to chance, or pattern of LD differed in the two study populations (less likely)
|2. Neuronal nicotinic acetylcholine receptor α4 subunit gene (CHRNA4) and IGE|
| Steinlein et al. (55)||103||Silent polymorphism in CHRNA4 tentatively associated with IGE||Borderline significance (p = 0.02) and weak biologic plausibility||1. Initial study (done in 1997) showed weaknesses inherent in earlier studies: “off-the-shelf” controls from different populations, weak significance (p = 0.08 after correction for multiple comparisons)|
| Chioza et al. (90)||182||Initial published association of CHRNA4 not confirmed||Both family-based and population-based controls used to refute association||2. Second study (2000) showed better methodology: two control sources, and matched population controls from the same source|
|3. GluR5 kainate receptor gene (GRIK1) and juvenile absence epilepsy (JAE)|
| Sander et al. (91)||15||Tetranucleotide polymorphism in GRIK1 associated with JAE||Family-based controls used||1. First study probably false-positive |
2. There may still be a true susceptibility allele in LD with GRIK1 but unlikely
| Izzi et al. (92)||25||Sequencing of GRIK1 in JAE subjects with associated tetranucleotide polymorphism showed no causative mutations||Exons, regulatory regions, and intron-exon boundaries were screened but proved negative|| |
|4. Voltage-gated potassium channel gene (KCNQ3) and IGE/juvenile myoclonic epilepsy (JME)|
| Haug et al. (61)||71 (IGE)||No association found between IGE and KCNQ3 using transmission-disequilibrium test (TDT)||Family-based controls used||1. Each study used slightly different subjects (all types of IGE in 1 study, only JME in the other), thus possible that KCNQ3 is associated with JME and not other IGE types, but remains to be replicated|
| Vijai et al. (93)||119 (JME)||Association found between JME and KCNQ3 using TDT||Family-based controls used||2. Studies using family-based controls provide less statistical power than those using population controls|
|C. Positive studies pending replication|
|1. IGE studies|
| Sander et al. (94)||366||Human anion exchanger 3 gene (AE3) associated with IGE||p = 0.021; both family-based and population-based controls used||Borderline statistical significance|
| Chioza et al. (95)||187||G protein–activated inward-rectifying potassium channel (KCNJ3) associated with IGE||p = 0.051||Borderline statistical significance|
| Sander et al. (96)||133||Dopamine transporter gene (DAT) associated with IGE||p = 0.043||Borderline statistical significance|
|2. Childhood absence epilepsy (CAE) studies|
| Chen et al. (97)||118||12 mutations in calcium channel gene CACNA1H found in CAE subjects but not in controls||Reasonably plausible findings as these missense mutations are in a highly conserved region of CACNA1H||Combined approach of mutation screening in cases, then testing for allelic association against controls|
| Gu et al. (98)||42||Polymorphism in leucine-rich glioma activated 4 gene (LGI4) associated with CAE||Small sample size, borderline significance (p = 0.01)||Combined approach of mutation screening in cases, then testing for allelic association against controls|
|3. JME studies|
| Pal et al. (75)||20||SNPs in promoter region of BRD2 (RING3) associated with JME||Three control groups (both family-based and population-based controls) used||Good methodology but biologic plausibility uncertain, replication pending|
|D. Negative studies|
| Samochowiec et al. (52)||119||5HT2c receptor gene not associated with IGE|| |
| Haug et al. (99)||143||Activity-regulated cytoskeleton-associated gene (ARC) not associated with IGE|| ||Mutation screening then testing for allelic association|
| Sander et al. (96)||133||Glutamate transporter (EAAT2) and serotonin transporter (SERT) genes not associated with IGE|| |
| Sander et al. (100)||118||GABA(B) receptor 1 gene not associated with IGE|| |
| Lu et al. (56)||68||GABA(B) receptor 1 gene not associated with CAE in Chinese|| ||Mutation screening then testing for allelic association|
| Sobetzko et al. (59)||104||Glycine receptor subunit genes (GLRA3 and GLRB) not associated with IGE|| ||Mutation screening then testing for allelic association|
| Goodwin et al. (101)||115||Type 3 metabotropic glutamate receptors mGluR7 and mGluR8 not associated with IGE|| |
| Izzi et al. (57)||144||Type 4 metabotropic glutamate receptor (GRM4) not associated with JME|| ||Mutation screening then testing for allelic association|
| Chioza et al. (102)||220||C5733T and R482stop mutations in calcium channel CACNA1A not associated with IGE|| |
| Chen et al. (58)||192||T-type calcium channel gene α (1G) is not associated with CAE in Chinese|| ||Mutation screening then testing for allelic association|
| Steinlein et al. (103)||115||Potassium channel gene KCNQ2 not associated with IGE|| ||Mutation screening then testing for allelic association|
| Sander et al. (104)||126||Potassium channel hKCa3 not associated with IGE||Both family-based and population-based controls used|| |
| Kananura et al. (105)||65||Potassium channel TASK-3 not associated with idiopathic absence epilepsy (IAE)|| ||Mutation screening then testing for allelic association|
| Haug et al. (106)||46||Sodium channel gene SCN2A not associated with IGE|| ||Mutation screening then testing for allelic association|
| Haug et al. (107)||92||Sodium channel gene SCN2B not associated with IGE|| ||Mutation screening then testing for allelic association|
| Haug et al. (53)||248 (126 with JME, 122 with IAE)||Monoamine oxidase A gene not associated with either JME or IAE|| |
| Sander et al. (54)||125 (70 with JME, 55 with IAE)||Paired-box–containing gene (PAX6) not associated with either JME or IAE|| |
Table 3. Summary of association studies in febrile seizures
|A. Positive studies with replication failure|
|1. GABA(A) receptor γ2-subunit gene (GABRG2)|
| Chou et al. (62)||104||C-allele of SNP211037 in GABRG2 gene associated with febrile seizures||Silent SNP, does not change amino acid sequence. Population-based controls used||1. Weak biologic plausibility for association |
2. Initial study (62) also examined two SNPs for association but did not correct for multiple testing
| Nakayama et al. (65)||94||C-allele of SNP211037 in GABRG2 gene not associated with febrile seizures||Mutation screening then testing for allelic association; both family-based and population-based controls used|| |
|2. Interleukin-1β gene|
| Virta et al. (63)||35||Allele 2 at position –511 of the interleukin-1β associated with febrile seizures||Borderline significance (p = 0.03); allele is the same allele discussed in the interleukin-1β example||1. As in the example, biologic plausibility is weak |
2. Initial study also had a small sample size
| Tilgen et al. (66)||99||Association with allele 2 not confirmed|| |
|3. Neuronal nicotinic acetylcholine receptor α4 subunit gene (CHRNA4)|
| Chou et al. (64)||102||T-allele at SNP1044396 in CHRNA4 associated with febrile seizures||Silent SNP, also no clear dose–response relation||Overall, biologic plausibility is weak|
| Mulley et al. (67)||49||Association with T-allele not confirmed|| ||Replication study negative but underpowered to rule out a small effect|
|B. Positive studies pending replication|
| Tsai et al. (68)||51||Polymorphism in interleukin-1 receptor antagonist gene associated with febrile seizures||Borderline significance (p = 0.03); weak biologic plausibility; no correction for multiple comparisons||Replication pending, but possibly a spurious association|
|C. Negative studies|
| Chou et al. (69)||77||Potassium channel gene KCNQ2 not associated with febrile seizures|| |
| Tsai et al. (70)||51||Interleukin-4 gene not associated with febrile seizures|| |
AN ILLUSTRATIVE EXAMPLE: INTERLEUKIN-1β AND TLE
Interleukin 1 is a proinflammatory cytokine, and its receptors have been found in the hippocampus. An initial report of association between a polymorphism in the promoter region of the interleukin-1β (IL-1β) gene and TLE (23) was followed by three negative replication attempts (24–26), with one other positive report of association in a different study cohort using patients with partial epilepsy (27).
The initial Japanese study (23) examined four interleukin polymorphisms in three groups of subjects: 50 with TLE and hippocampal sclerosis (HS), 53 with TLE but without HS, and 112 healthy controls. All subjects came from the same geographic area.
Only one polymorphism was found to be significantly associated. TLE with HS was associated with homozygosity for the T allele (TT homozygosity) at position –511 of the promoter region of IL-1β. After correction for multiple comparisons, the corrected p value was 0.017. No significant difference was found in T-allele frequencies between groups.
Three replication studies examining the same IL-1β-511 polymorphism were performed in German (24), American (25), and Chinese (26) populations, comparing between 61 and 86 cases of TLE with HS against normal controls. None found evidence of an association.
A Finnish study using a heterogeneous group of 48 subjects with focal epilepsy found an association with the T allele but not with TT homozygosity (27). A later subset analysis of a slightly enlarged Japanese cohort found an association between T alleles and TLE with prolonged febrile convulsions (28).
Nonreplicability: Possible reasons
What is one to make of the conflicting data? Do the three unsuccessful replication studies negate the findings of the first study? The possible reasons for nonreplication have recently been reviewed (14), but we discuss those pertinent to the IL-1 studies.
Population stratification and spurious association
Populations differ in the frequency distributions of alleles at the same loci, reflecting differing ancestral history that might include responses to natural selection, migration patterns, and stochastic (random) effects such as population bottlenecks (e.g., natural disasters) and founder events. Different ethnic groups may thus have different allele frequencies in both disease and nondisease genes. If controls do not exactly reflect the ethnic structure of the test population and the test and control populations have different allele frequencies at loci unrelated to the disease studied, the study can show spurious association attributable to such population stratification. This leads to reports of associations between a disease and genes that are biologically unrelated to the disease (13).
A whimsical but pithy example is given by Lander and Schork (8). If one were to do a genetic association study in San Francisco by using skill with chopsticks as the “disease phenotype,” one might find that the HLA-A1 allele is positively associated with chopstick skills, although it clearly has no biologic plausibility. This is because allele HLA-A1 is more common in Asians, who probably would be overrepresented in cases if controls were not accurately selected for ethnic origin. Population stratification has thus caused a spurious association.
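The arithmetic behind such a spurious association can be sketched with two hypothetical subpopulations. In the illustrative counts below (ours, not taken from any study), the allele is unrelated to the phenotype within each subpopulation, yet pooling them produces an apparent association because cases are drawn mostly from the subpopulation in which the allele is common.

```python
def odds_ratio(case_a, case_b, ctrl_a, ctrl_b):
    """Allelic odds ratio from counts of allele A and allele B in cases and controls."""
    return (case_a * ctrl_b) / (case_b * ctrl_a)

# Hypothetical allele counts (case A, case B, control A, control B).
# Subpopulation 1: allele A frequency 0.6, supplies 80% of cases but 20% of controls.
# Subpopulation 2: allele A frequency 0.2, supplies 20% of cases but 80% of controls.
strata = {
    "subpopulation 1": (96, 64, 24, 16),
    "subpopulation 2": (8, 32, 32, 128),
}

within = {name: odds_ratio(*counts) for name, counts in strata.items()}
pooled_counts = tuple(sum(column) for column in zip(*strata.values()))
pooled = odds_ratio(*pooled_counts)

for name, value in within.items():
    print(f"{name}: OR = {value:.2f}")
print(f"pooled: OR = {pooled:.2f}")
```

The within-stratum odds ratios are exactly 1, whereas the pooled odds ratio is well above 2: the "association" is produced entirely by the unequal mix of subpopulations in cases and controls.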
Undetected population stratification is often suspected for nonreplicable studies (13), and good evidence indicates that population stratification exists even in well-designed studies (29). Selection of appropriate controls is therefore critical. By sampling and matching cases and controls from the same source population or geographic region, stratification can be minimized, although not necessarily eliminated. Complementary methods for controlling for cryptic population stratification by using genetic markers unlinked to the marker locus being examined also can be used during analysis to determine the veracity of the association (13,29,30).
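One widely used form of such a correction, genomic control, can be sketched as follows, assuming that chi-square statistics (1 df) are available for a set of unlinked "null" markers typed in the same cases and controls. The inflation factor is estimated as the median null-marker statistic divided by the median of the chi-square distribution with 1 df, and the candidate statistic is deflated by that factor. The marker values here are hypothetical and the function name is ours.

```python
import statistics

CHI2_1DF_MEDIAN = 0.4549364  # median of the chi-square distribution with 1 df

def genomic_control(candidate_chi2, null_marker_chi2):
    """Return (inflation factor, adjusted chi-square) for a candidate marker."""
    inflation = statistics.median(null_marker_chi2) / CHI2_1DF_MEDIAN
    inflation = max(1.0, inflation)  # never inflate the candidate statistic
    return inflation, candidate_chi2 / inflation

# Hypothetical statistics from five unlinked markers and one candidate polymorphism.
inflation, adjusted = genomic_control(9.6, [0.2, 0.5, 0.9, 1.1, 1.6])
print(f"inflation factor = {inflation:.2f}, adjusted chi2 = {adjusted:.2f}")
```

In practice many more than five null markers would be needed for a stable estimate; the small list is for illustration only.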
Alternative strategies use family-based controls such as siblings or parents (for example, the transmission-disequilibrium test) (6). These methods detect associations independent of population stratification and have become more widely used, although they have less statistical power than traditional case–control designs with population controls (22).
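The basic form of the transmission-disequilibrium test can be sketched as a McNemar-type statistic: among heterozygous parents of affected offspring, the number of times the candidate allele is transmitted is compared with the number of times it is not. The counts below are hypothetical, and the function name is ours.

```python
import math

def tdt(transmitted, not_transmitted):
    """Return (chi-square with 1 df, p value) for the basic TDT.

    Counts refer to heterozygous parents only; homozygous parents are
    uninformative and are excluded before counting.
    """
    b, c = transmitted, not_transmitted
    chi2 = (b - c) ** 2 / (b + c)
    p_value = math.erfc(math.sqrt(chi2 / 2))  # upper tail of chi-square, 1 df
    return chi2, p_value

# Hypothetical data: 100 heterozygous parents, allele transmitted 60 times.
tdt_chi2, tdt_p = tdt(60, 40)
print(f"TDT chi2 = {tdt_chi2:.2f}, p = {tdt_p:.4f}")
```

Because each parent serves as his or her own control, the comparison is unaffected by differences in allele frequency between subpopulations; the loss of power arises because only heterozygous parents contribute.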
Returning to our example, the initial Japanese study (23), as well as the three subsequent replication attempts (24–26), drew cases and controls from the same geographic area or matched them for ancestry or both. Stratification is therefore less likely to be the cause of nonreplication. Neither of the positive studies (23,27) used methods to control for cryptic stratification, as these methods became available only from the year 2000.
Statistical power: Bigger is better
An important cause of replication failure is statistically underpowered replication attempts. Even when the initial positive study reflects a true association, publication bias means that it is likely to overestimate the true effect size. Effect-size estimates in subsequent studies will therefore tend to regress toward the true, usually less extreme, value (14,31,32). To avoid false-negatives (type II errors), replication attempts must factor this in by increasing sample size, so that they retain the statistical power to detect the association at a lower odds ratio (14,32).
Of the three IL replication studies, only one (24) specifically calculated the statistical power of its replication attempt, deliberately amassing a study cohort 72% larger than that of the initial study. The case sample size was only 22 to 34% larger in the other two studies (25,26).
It should be noted, however, that the average number of cases for all three replication studies was only 71. Although the initial odds ratio (23) was 3.3, if the true effect size was smaller (for example, <2.0, which is not inconceivable in a complex disorder), all three studies would be underpowered. A minimum of ∼250 case–control pairs would be required to detect a difference reliably if the true genotype risk ratio were 2.0, with the sample size needed inversely proportional to risk (6). Although this sample size may appear onerous, it illustrates that any replication study must amass larger numbers both relative to the study it is trying to replicate and on an absolute numeric basis, with tacit assumptions of small effect sizes. Bigger is indeed better (20).
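The way required sample size grows as the effect shrinks can be sketched with the standard normal-approximation formula for comparing two proportions, applied here to allele frequencies. This is not the calculation used in reference (6), so the numbers it produces should not be expected to match the figure of about 250 pairs quoted above. The control allele frequency of 0.2, the 5% two-sided significance level, and 80% power are all assumptions chosen for illustration.

```python
import math
from statistics import NormalDist

def case_allele_freq(control_freq, odds_ratio):
    """Allele frequency in cases implied by a control frequency and an allelic odds ratio."""
    return odds_ratio * control_freq / (1 - control_freq + odds_ratio * control_freq)

def alleles_per_group(control_freq, odds_ratio, alpha=0.05, power=0.8):
    """Approximate number of alleles needed in each group (two per person)."""
    p0 = control_freq
    p1 = case_allele_freq(control_freq, odds_ratio)
    z = NormalDist().inv_cdf(1 - alpha / 2) + NormalDist().inv_cdf(power)
    n = z ** 2 * (p0 * (1 - p0) + p1 * (1 - p1)) / (p1 - p0) ** 2
    return math.ceil(n)

# 3.3 is the odds ratio reported in the initial study; 2.0 and 1.5 are smaller assumed effects.
for assumed_or in (3.3, 2.0, 1.5):
    print(assumed_or, alleles_per_group(0.2, assumed_or))
```

Tightening the significance level (for example, `alpha=0.0005`) increases the required numbers further, which is one reason stricter thresholds and larger cohorts go hand in hand.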
False-positive associations and statistical significance
Given that numerous researchers worldwide are performing association studies on multiple polymorphisms in many epilepsy syndromes, it is inevitable that despite robust study designs and analyses, positive results that are significant at the 5% level will be found by chance. Publication bias further exacerbates this problem, as initial negative studies are less often submitted or published (14,32), although recent evidence suggests that this issue may be less a problem than originally feared (4).
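The scale of the problem can be illustrated with simple arithmetic, under the simplifying assumptions that the tests are independent and that none of the tested polymorphisms has any true effect. The figure of 50 tests is hypothetical.

```python
def false_positive_burden(n_tests, alpha=0.05):
    """Return (expected number of false positives, probability of at least one)."""
    expected = n_tests * alpha
    p_at_least_one = 1 - (1 - alpha) ** n_tests
    return expected, p_at_least_one

expected, p_any = false_positive_burden(50)
print(f"expected false positives = {expected:.1f}, P(at least one) = {p_any:.2f}")
```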
Adoption of more rigorous criteria for declaring statistical significance will help to reduce this problem. Published criteria are available for statistical significance in genome-wide linkage studies in complex diseases (33), but no good consensus has been reached on the significance threshold for association studies. Although thresholds such as 0.01, 0.0005, or even 1 × 10⁻⁸ have been suggested (14,20,31), other existing authorities are less specific on thresholds (19,21). Furthermore, the issue of multiple comparisons and appropriate use (and misuse) of the Bonferroni method must be dealt with (34), although Bayesian methods have been proposed as an alternative to the Bonferroni method (35).
An insufficiently stringent significance threshold is the most likely cause of nonreplicability in our IL example. Leaving aside the issue of Bonferroni corrections, the corrected p value was 0.017, which would not meet any of the suggested significance criteria (14,20,31). It is therefore possible that the findings arose by chance alone, which explains the subsequent replication failures. However, we cannot discount that a true (but small) effect may exist that all three replication studies failed to confirm because of inadequate sample size. In such situations, meta-analysis may provide an answer in the face of conflicting data (32,36).
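The comparison made above can be written out explicitly. The sketch below applies a simple Bonferroni correction to a hypothetical set of four uncorrected p values (these are not the values from the initial study, which we do not reproduce) and then checks the corrected p value of 0.017 reported for the IL-1β polymorphism against the three suggested thresholds.

```python
def bonferroni(p_values):
    """Multiply each p value by the number of tests, capping the result at 1."""
    m = len(p_values)
    return [min(1.0, p * m) for p in p_values]

# Hypothetical uncorrected p values for four polymorphisms tested in one study.
corrected = bonferroni([0.004, 0.03, 0.2, 0.6])
print(corrected)

SUGGESTED_THRESHOLDS = (0.01, 0.0005, 1e-8)  # thresholds quoted in the text
reported_corrected_p = 0.017                 # corrected p value of the initial IL-1β study
meets = {t: reported_corrected_p < t for t in SUGGESTED_THRESHOLDS}
print(meets)
```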
Even if an allele–phenotype association has been demonstrated and replicated in several methodologically robust studies, it is not automatically the genetic culprit (12). Because of LD, a possibility exists that this allele is not the “smoking gun.” It may merely be a marker that has “hitchhiked” to significance through linkage to the true pathogenic variant within the same or adjacent gene (12). The human genome appears to be structured into blocks of LD (37); this in turn implies that the true culprit could lie anywhere within a block comprising primarily “hitchhikers.”
Ultimately, the validity of the association must be demonstrated in a biologically meaningful way (14,19). However, such proof is usually not part of association studies, perhaps because it entails use of different technologies, which must be pursued through a substantial collaboration. Presence of a gene dose–response effect (such as risk increasing progressively with zero, one, or two alleles), or association with a class of polymorphism that is predicted to be at a higher risk of causing disease (such as a missense variant within an otherwise conserved region) (38), can be used as circumstantial evidence to infer causality. These are still less persuasive than experimentally demonstrating derangement of biologic function, although to be fair, it is sometimes the case that biologic function may not be fully understood at the time of the discovery, and biologic plausibility may emerge only later with increasing knowledge of gene function.
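A gene dose–response effect is often assessed with a test for linear trend across genotype classes. The sketch below uses the Cochran-Armitage trend test as one such option; this is our choice for illustration, not necessarily the method of the cited studies, and the genotype counts are hypothetical.

```python
import math


def trend_test(cases, controls, scores=(0, 1, 2)):
    """Cochran-Armitage trend test across genotype classes.

    cases / controls: counts of subjects carrying 0, 1, and 2 copies of the allele.
    Returns (chi-square with 1 df, two-sided p value).
    """
    totals = [r + s for r, s in zip(cases, controls)]
    n_all, n_cases = sum(totals), sum(cases)
    # Observed minus expected score total among cases.
    t = sum(x * (r - n * n_cases / n_all) for x, r, n in zip(scores, cases, totals))
    sum_x = sum(x * n for x, n in zip(scores, totals))
    sum_x2 = sum(x * x * n for x, n in zip(scores, totals))
    var = n_cases * (n_all - n_cases) / n_all**3 * (n_all * sum_x2 - sum_x**2)
    chi2 = t * t / var
    # Upper tail of a chi-square distribution with 1 degree of freedom.
    return chi2, math.erfc(math.sqrt(chi2 / 2))


# Hypothetical genotype counts for 0, 1, and 2 copies of the candidate allele.
chi2, p = trend_test(cases=(40, 45, 15), controls=(60, 35, 5))
print(f"trend chi-square = {chi2:.2f}, p = {p:.4f}")
```

A significant trend is still only circumstantial evidence; it does not replace a functional demonstration.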
The initial IL study (23) failed to show a clear gene dose–response effect with the T allele. In addition, if the T allele truly conferred disease susceptibility, one would expect the T allele to be overrepresented in the cases. However, the allele frequencies were not significantly different before or after Bonferroni adjustment. Finally, no convincing link was shown between the associated polymorphism and epilepsy. These findings weaken the validity of the association.
In summary, the initial positive association between the IL-1β-511 polymorphism and TLE with HS was likely to have arisen by chance, given the weak p value and lack of biologic plausibility. This probably accounts for the three replication failures, although their small sample sizes weaken their credibility as replication studies.
OTHER ASSOCIATION STUDIES IN FOCAL EPILEPSY
Association studies in focal epilepsy have focused mainly on possible susceptibility alleles for TLE; these are summarised in Table 1, and we highlight some important points.
Interpretation of conflicting reports of association of prodynorphin with TLE (39–41) is difficult. Initial reports of association were limited to a subgroup of TLE with a family history of seizures (39); all prodynorphin studies, however, were statistically underpowered to detect association in this subgroup. Multiple subgroup analyses (especially if these are not prespecified before analysis) will increase the chances of a positive finding by chance (22,42), and resultant positive findings (39,42) should be considered as hypothesis-generating rather than definitive. Small subgroups also result in lack of precision in risk estimates.
The ATP1A2 study (44) provides a good illustration of population admixture. Cases and controls included blacks and whites, but because of ethnic matching, the ratios of blacks to whites were identical in both groups. Strikingly, the insertion polymorphism was more than twice as common in blacks as in whites. If not for ethnic matching, differences in the degree of population admixture in cases and controls could well have caused a spurious association as an outcome of population stratification. This illustrates the importance of detecting and correcting for stratification. A Brazilian study (45) likewise involved an admixed population and used ethnically matched cases and controls; however, because of genetic admixture in Brazilians even within one ethnic group (46), cryptic population stratification remains a potential cause of spurious association (29). Genomic control, by using a set of unlinked markers, is one method of overcoming this problem (13,29).
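How unequal admixture produces a spurious association can be shown with a small numerical sketch. The counts below are hypothetical: within each ancestry group the allele is equally common in cases and controls, but the groups differ in allele frequency and are unevenly represented among cases and controls. The Mantel-Haenszel estimator is used here simply as a standard stratified analysis; it is not the method of the cited studies and corrects only for known strata, not cryptic ones.

```python
def odds_ratio(a, b, c, d):
    """a, b = cases with / without allele; c, d = controls with / without allele."""
    return (a * d) / (b * c)


def mantel_haenszel_or(strata):
    """Odds ratio adjusted for stratum (here, ancestry group)."""
    num = sum(a * d / (a + b + c + d) for a, b, c, d in strata)
    den = sum(b * c / (a + b + c + d) for a, b, c, d in strata)
    return num / den


# Hypothetical counts (a, b, c, d) per ancestry group.
group1 = (80, 120, 20, 30)   # carrier frequency 40% in both cases and controls
group2 = (10, 40, 40, 160)   # carrier frequency 20% in both cases and controls
pooled = tuple(x + y for x, y in zip(group1, group2))

crude = odds_ratio(*pooled)
adjusted = mantel_haenszel_or([group1, group2])
print(f"crude OR ignoring ancestry: {crude:.2f}")
print(f"ancestry-adjusted OR:       {adjusted:.2f}")
```

The crude odds ratio departs from 1 only because cases are drawn disproportionately from the group with the higher allele frequency; matching or stratified analysis removes the artifact.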
The lack of clear biologic plausibility is another problem affecting most of these studies in focal epilepsy, as convincing functional abnormalities have generally not been demonstrated. On the whole, it is reasonable to conclude that no susceptibility allele has been conclusively demonstrated thus far, although several candidates (45,47,48) await replication.
ASSOCIATION STUDIES IN IDIOPATHIC GENERALIZED EPILEPSIES
Genetic influences play a larger role in generalized epilepsies than in focal epilepsies (49); accordingly, association studies in idiopathic generalized epilepsy (IGE) have proliferated in the past 5 years in the hope of identifying susceptibility genes, especially given linkage data suggesting certain candidate regions (9,10).
No fewer than 25 genes have been examined as possible candidates. The majority of the studies have been negative, and thus no further replication attempts have been made on these. No clear susceptibility gene for IGE has yet been identified, save possibly the opioid receptor μ-subunit gene (50,51).
Our summary of association studies in IGE (Table 2) is divided into four groups: positive studies that have been replicated, initial positive studies with subsequent replication failure, weakly positive studies without published replication attempts, and finally, negative studies.
Notably, methodology has evolved over the past 5 years. Earlier studies used simpler case–control designs (52–54) and a less rigorous threshold of significance (55) compared with later studies. More recent studies have moved toward performing mutation analysis of candidate genes to identify possible susceptibility alleles in affected individuals, and then comparing the frequencies of these alleles with those in controls in a case–control design (56–59). Encouragingly, family-based controls are increasingly used, often in addition to population-based controls, allowing stratification-free analysis (6). This is in contrast to studies in focal epilepsy in which population-based controls predominate.
Unfortunately, many of the underlying problems that bedevil association studies in focal epilepsy still plague those in IGE. Population stratification, weak biologic plausibility, and significance thresholds continue to be concerns.
Small statistically underpowered studies also are common. Although sample sizes are generally larger than those in studies in focal epilepsy, probably because of easier patient availability, sample sizes in general still involve <200 cases. It is thus possible that sample size does not permit identification of alleles conferring low risk (6). In addition, two negative studies (60,61) with only 68 to 71 subjects used the transmission-disequilibrium test exclusively to test for association. This test has less statistical power than association studies using population controls; thus these may be false-negative results (22).
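The transmission-disequilibrium test compares how often heterozygous parents transmit, versus do not transmit, the candidate allele to an affected child. A minimal sketch follows, with hypothetical counts; only heterozygous parents are informative, which is one reason small family collections yield few usable observations.

```python
import math


def tdt(transmitted: int, untransmitted: int):
    """Transmission-disequilibrium test (McNemar-type chi-square, 1 df).

    transmitted / untransmitted: number of times heterozygous parents passed,
    or did not pass, the candidate allele to an affected child.
    Returns (chi-square, two-sided p value).
    """
    chi2 = (transmitted - untransmitted) ** 2 / (transmitted + untransmitted)
    return chi2, math.erfc(math.sqrt(chi2 / 2))


# Hypothetical counts from 70 informative (heterozygous) parents.
chi2, p = tdt(transmitted=42, untransmitted=28)
print(f"TDT chi-square = {chi2:.2f}, p = {p:.3f}")
```

Because the comparison is made within families, the result is not affected by population stratification.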
Overall, despite 33 association studies, no definite susceptibility gene for IGE has yet been identified, except perhaps the opioid receptor μ-subunit gene (50,51), although biologic plausibility is still uncertain for this association. Concern also is expressed that IGE may conform to a genetic model in which IGE develops only if an individual has sufficient polygenic variation to exceed a threshold, but requiring only a subset of a much larger group of polygenic susceptibility loci. Under this model, association studies would require even larger sample sizes, perhaps on the order of thousands or tens of thousands of subjects (6), although this increase in subject numbers has to be weighed against an increase in heterogeneity within the study population.
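The dependence of sample size on effect size can be made concrete with a standard two-proportion approximation. The sketch below is illustrative only: it assumes an allele-based case–control comparison with equal numbers of cases and controls, two alleles per subject, and a two-sided test; the function name, allele frequency, and odds ratios are hypothetical choices for the example.

```python
import math
from statistics import NormalDist


def cases_needed(p_control: float, odds_ratio: float,
                 alpha: float = 0.05, power: float = 0.80) -> int:
    """Approximate number of cases (with an equal number of controls) needed
    to detect a given allelic odds ratio, using the normal approximation
    for comparing two proportions."""
    odds = odds_ratio * p_control / (1 - p_control)
    p_case = odds / (1 + odds)
    p_bar = (p_case + p_control) / 2
    z_a = NormalDist().inv_cdf(1 - alpha / 2)
    z_b = NormalDist().inv_cdf(power)
    numerator = (z_a * math.sqrt(2 * p_bar * (1 - p_bar))
                 + z_b * math.sqrt(p_case * (1 - p_case)
                                   + p_control * (1 - p_control))) ** 2
    alleles_per_group = numerator / (p_case - p_control) ** 2
    # Each subject contributes two alleles.
    return math.ceil(alleles_per_group / 2)


# Hypothetical allele frequency of 20% in controls.
for oratio in (2.0, 1.5, 1.2):
    print(f"OR {oratio}: about {cases_needed(0.20, oratio)} cases needed")
```

Stricter significance thresholds, rarer alleles, or incomplete LD between marker and causal variant would all push these numbers higher.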
Genetic association studies in febrile seizures also have given discouraging and conflicting results (Table 3). Initial reports of positive association between the γ2-subunit of the γ-aminobutyric acid (GABA)A receptor (62), the IL-1β-511 polymorphism (63), and the α4 subunit of the neuronal nicotinic acetylcholine receptor (64) were followed by negative replication studies (65–67). A single positive study associating IL-1–receptor antagonist polymorphisms (68) with febrile seizures is pending replication, whereas other negative studies appear to exclude a potassium channel (KCNQ2) (69) and the IL-4 gene (70) as major susceptibility genes.
Small sample sizes are again a problem, the largest study recruiting only 104 cases, which is disappointing given that febrile seizures are common. Varying proportions of simple and complicated febrile seizures in different studies also complicate comparison, rendering the overall picture far from clear. Given the frequency of febrile seizures, studies with sample sizes of thousands are feasible and ideally should be conducted on a multicenter basis. However, febrile seizures may be just as genetically heterogeneous as IGE, and even larger sample sizes may be needed.
One study has reported an association between a potassium channel and overall seizure susceptibility in a heterogeneous group of subjects with epilepsy (71). The study population consisted of a mixture of IGE and focal epilepsies, and a minor allele of the KCNJ10 potassium channel gene was found to be associated with seizure resistance. The subjects, all of European ancestry, were derived from two geographically different source populations (United States and Germany), with differing degrees of admixture in cases (30% German) and controls (50% German). Differing allele frequencies were noted in United States and German controls. Population stratification resulted, despite apparent ethnic matching, which accords with recent evidence that stratification exists even in well-designed matched studies (29). Statistical adjustment was performed, but given that this merely adjusts for overt but not cryptic stratification, methods such as genomic control (30) should be used as a statistical adjunct to verify association.
An association between resistance to antiepileptic drugs (AEDs) and a silent SNP in the drug-transporter gene ABCB1 was recently reported (72). The risk of pharmacoresistance was 2.7 times higher in the CC versus the TT genotype. Subject recruitment in this study was robust; prospectively recruited cases were clearly phenotyped, and all study subjects were drawn from the same source. Although these results are intriguing, the associated polymorphism is silent; the actual causal variant remains unidentified, as the associated SNP lies within a large block of LD. The authors also identified common haplotypes within ABCB1 containing the associated SNP, but no information was published as to whether any particular haplotype conferred risk of pharmacoresistance.
Notably, this was the sole genetic-association study in epilepsy using genomic control, in which a set of unlinked markers is used to correct for cryptic population stratification (30). The association with ABCB1 persisted even after this correction, confirming that this association was unlikely to be due to population stratification. Our group, however, was unable to replicate these findings despite doubling the sample size (73); we believe that the initial findings may be due to chance.
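The logic of genomic control can be sketched in a few lines. The version below follows the commonly described median-based form, in which test statistics at unlinked markers estimate an inflation factor that is then used to deflate the candidate statistic; the chi-square values are hypothetical, and details of the procedure in the cited work may differ.

```python
import math
from statistics import median

# Median of a chi-square distribution with 1 degree of freedom.
CHI2_1DF_MEDIAN = 0.4549


def inflation_factor(null_marker_chi2):
    """Lambda: inflation of test statistics at unlinked (presumed null) markers."""
    return max(1.0, median(null_marker_chi2) / CHI2_1DF_MEDIAN)


def genomic_control(candidate_chi2, null_marker_chi2):
    """Return (lambda, corrected chi-square, corrected p value) for the candidate."""
    lam = inflation_factor(null_marker_chi2)
    corrected = candidate_chi2 / lam
    return lam, corrected, math.erfc(math.sqrt(corrected / 2))


# Hypothetical 1-df chi-square statistics at 11 unlinked markers.
null_stats = [0.1, 0.3, 0.5, 0.7, 0.9, 1.2, 1.6, 2.4, 0.6, 0.8, 1.1]
lam, corrected, p = genomic_control(6.0, null_stats)
print(f"lambda = {lam:.2f}, corrected chi-square = {corrected:.2f}, p = {p:.3f}")
```

In practice many more unlinked markers would be needed for a stable estimate of the inflation factor; an association that survives this deflation is less likely to reflect stratification, although it may still be a chance finding.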
GENOME-WIDE ASSOCIATION STUDIES
With the recent compilation of the draft human genome sequence, genome-wide studies of association are becoming a reality. More than 2 million SNPs are available as markers for association throughout the entire human genome. Each SNP could potentially be tested for association with any epilepsy phenotype, leading to an avalanche of positive associations, a large number of which will be false positives due to chance. The problems that plague existing association studies will thus be magnified with genome-wide association studies and inflation of false-positive rates (22). The most pressing problem of defining an appropriate level of significance is yet unsolved, whereas biologic plausibility for any association is often unclear.
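The scale of the problem is easily quantified. As a rough sketch, assuming that every SNP is tested once, that no SNP is truly associated, and that tests are treated as independent, the expected number of false positives is simply the number of tests multiplied by the significance threshold. The thresholds below are those that have been suggested for association studies.

```python
N_SNPS = 2_000_000  # number of SNP markers quoted in the text


def expected_false_positives(n_tests: int, alpha: float) -> float:
    """Expected number of null tests falling below the threshold alpha."""
    return n_tests * alpha


for alpha in (0.05, 0.01, 0.0005, 1e-8):
    count = expected_false_positives(N_SNPS, alpha)
    print(f"threshold {alpha:g}: about {count:g} false positives expected")
```

Only the most stringent threshold brings the expected number of chance findings below one, and testing several phenotypes multiplies the count further.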
Some success has been reported for genome-wide association studies in myocardial infarction (74), including demonstration of biologic plausibility of the candidate gene, but such studies are not commonplace as yet. In contrast to genome-wide linkage studies (9,10), no genome-wide association studies in epilepsy have yet been published.
RECOMMENDATIONS AND FUTURE APPROACHES
Despite more than 50 association studies in epilepsy involving more than 30 genes, the overall impression is that of uncertain associations and failed replication attempts. Problems of potential population stratification, inadequate sample size, borderline statistical significance, and unconvincing biologic plausibility recur. The search for a plausible susceptibility gene for any common epilepsy syndrome has so far been unsuccessful; the truth is still “out there.”
Recent guidelines have attempted to improve standards for genetic-association studies (1,19–22). We attempt to synthesize the key points and provide a checklist for readers (Table 4). Emphasis has been placed earlier on issues of methodology, statistical analysis, and biologic plausibility. It is worthwhile now to examine newer methodologic refinements that are applicable to the dissection of the genetic basis of epilepsy.
Table 4. Guidelines for appraising genetic association studies in epilepsy
|1. Study design|
| a. Clearly defined phenotype||Cases should be defined by using standard ILAE classification (108), with supporting clinical, EEG, and neuroimaging data|
| b. Defined source of controls||For population controls, preference should be given to ethnically matched controls derived from the same source population as cases; family-based controls should be considered; use of two or more independent control groups is preferred, with controls selected to provide maximal similarity to cases to reduce confounding; arbitrary “off-the-shelf” controls are discouraged|
| c. Adequate sample size||Consideration must be given to effect size and allele frequencies when calculating appropriate sample sizes to provide ≥80% statistical power; larger sample sizes are ideal; other factors such as weak effects, rare alleles, and incomplete LD should ideally be factored in as a “worst-case scenario”; replication studies should be statistically powered to detect a smaller effect size than the original positive study|
|2. Molecular genetics|
| a. Blinding||Analysis should be performed blinded|
| b. Case–control batches||If analysis is performed in batches, cases and controls should be present in each batch to avoid genotyping errors introducing bias|
|3. Statistical analysis|
| a. Clear a priori hypothesis||Testing multiple hypotheses, then publishing only positive findings is disingenuous and leads to false-positive results|
| b. Multiple testing||If multiple testing with multiple markers or with independent phenotypes is done, this must be stated and corrected for; subgroup analysis should be considered only as hypothesis generating, unless the subgroup was prespecified in the study design, and biologic plausibility demonstrated|
| c. Population stratification||Cases and controls should be matched for ethnic and/or geographic origin; methods used to correct for cryptic stratification should be considered for positive studies|
| d. Haplotypes||If haplotypes are used, the method of construction must be described|
| a. Strength of association||Odds ratios and 95% confidence intervals should be stated|
| b. Positive results||Steps taken to minimize potential sources of bias in the study should be described|
| c. Negative results||Was the study adequately powered? Possible reasons for negative findings, such as sample size, unexpectedly low allele frequencies, or decreased statistical power with family-based controls, should be stated|
|5. Biologic plausibility|
| a. Functional changes||The putative associated allele should be shown to alter function in a biologically meaningful way linked to the epilepsy phenotype|
| b. Marker polymorphisms||If functional changes are unconvincing (e.g., silent SNPs), the size of the block of linkage disequilibrium into which the associated polymorphism falls should be demonstrated; sequencing for other potentially pathogenic variants within this block should be carried out to search for the true causative variant|
| c. Dose–effect relation||A linear trend in disease risk should be shown to exist with increasing numbers of copies of the risk allele; if this is not shown, is the mechanism for a dominant effect explicable from a biologic viewpoint?|
| a. Independent replication||Replication in an independent group lends strength to the association|
| b. Two independent datasets||Studies involving two independent datasets of case–control groups that both show convincing association are preferred|
Use of two or more independent control groups
Detection of association by using both population- and family-based controls strengthens the association. This approach is particularly suited for IGE, as it is an early-onset epilepsy syndrome, implying that genetic material will often be available from both parents for use as controls. In contrast, late-onset neurologic diseases such as stroke or dementia are disadvantaged by unavailability of such controls. One recent study has used three control groups, drawing from population- and family-based controls, which is encouraging (75).
Use of haplotypes in association studies
LD is a double-edged sword for analyzing complex diseases (37). It complicates association studies, as an associated polymorphism falls within a haplotype block, and it is therefore difficult to identify the true causative variant of all the SNPs within the block.
However, compensatory advantages exist. It appears that much of the human genome can be parsed into haplotype blocks (76), and haplotype diversity is limited, with between two and four haplotypes in a specific region describing most of the diversity in a population. The implication is that only a few “tagging” SNPs of the total number of SNPs within a block must be identified, and their identity can then be used to extrapolate the identity of the other SNPs within the haplotype block (37,77).
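The idea of tagging can be illustrated with a toy example. The sketch below searches exhaustively for the smallest set of SNP positions that distinguishes every haplotype in a block; this is a deliberate simplification (practical tag-SNP selection relies on LD measures and population haplotype frequencies), and the haplotypes shown are hypothetical.

```python
from itertools import combinations


def tag_snps(haplotypes):
    """Smallest set of SNP positions whose alleles distinguish every haplotype.

    haplotypes: list of equal-length strings, one character per SNP.
    Returns a tuple of 0-based positions.
    """
    n_snps = len(haplotypes[0])
    for size in range(1, n_snps + 1):
        for positions in combinations(range(n_snps), size):
            signatures = {tuple(h[i] for i in positions) for h in haplotypes}
            if len(signatures) == len(haplotypes):
                return positions
    # Reached only if two haplotypes are identical; then no set can separate them.
    return tuple(range(n_snps))


# Hypothetical block of 8 SNPs in which 4 haplotypes describe most of the diversity.
block = ["ACGTACGT", "ACGAACTT", "GCGTTCGT", "GTGATCTA"]
tags = tag_snps(block)
print(f"{len(tags)} of {len(block[0])} SNPs suffice; positions: {tags}")
```

Genotyping only the tagging positions then identifies which haplotype is present, and the alleles at the remaining SNPs in the block follow from it.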
The HapMap consortium is currently developing a haplotype map of the human genome (78). Such a map should reduce the complexity and the cost of genome-wide association studies. The daunting problem of millions of candidate SNPs may be simplified to possibly thousands of haplotypes.
Additionally, one can instead use the haplotype as a risk factor, implicitly acknowledging indeterminate SNP causality within a block. This method has been used for Crohn's disease (79) and more recently for Parkinson's disease, in which association studies using SNPs had given conflicting results (80). Use of haplotypes rather than SNPs may potentially improve statistical power (81) under certain circumstances, although debate still exists as to whether SNPs or haplotypes are more appropriate for association studies (82).
Use of two independent datasets within one study
Two recent examples from cardiology and immunology are illustrative of this approach (74,83). Both used an initial dataset to screen and find several polymorphisms that were significantly associated. A second independent dataset was then used to verify the initial associations.
This approach in effect entails two studies: an initial and a replication study. Needless to say, each dataset must have sufficient statistical power; it is not a simple matter of taking a study cohort and dividing it in half for analysis! Such study designs demand large sample sizes, but this is still possible for common epilepsy syndromes such as IGE or TLE, assuming that these conditions (especially IGE) have a tractable degree of genetic heterogeneity. Even studies of rare disorders such as graft-versus-host disease have used this approach (83); thus it is incumbent on us to do better with common epilepsy syndromes. Collaboration between centers is one way of generating datasets of sufficient size for analysis, although this also means that phenotyping must be consistent across different centers.
With refinements in the methodology of association studies and better understanding of the structure and function of the human genome, we now have an opportunity to improve our ability to dissect the genetic basis of epilepsy. Pessimism about the suitability of genetic-association studies for epilepsy should be tempered by a cautious optimism—optimism that many past methodologic mistakes are remediable, yet being mindful that dissecting the underlying genetic basis of common epilepsy syndromes may eventually prove impossible by using the association-study approach with realistic sample sizes.
In a sense, genetic epidemiology in epilepsy is no different from traditional epidemiology. Adherence to basic principles of robust study design and appropriate statistical analysis is vital (21); no panacea exists for a poorly performed study, no matter what the results. Multicenter collaborations are imperative in the field of genetics, and epilepsy must move in that direction (84). Coordination of such studies to ensure uniformity on such a scale will be the challenge, especially at the clinical level. We look forward to multicenter genetic studies involving subjects numbered in the thousands or tens of thousands.
Acknowledgment: We thank Drs. R. Briellmann and I. Taylor for their helpful comments. N.C.K.T. is supported by research fellowships from the National Healthcare Group and the National Medical Research Council, Singapore.