An Analysis Paradigm for Investigating Multi-locus Effects in Complex Disease: Examination of Three GABAA Receptor Subunit Genes on 15q11-q13 as Risk Factors for Autistic Disorder.

Authors


Correspondence author: Allison Ashley-Koch, Ph.D., Center for Human Genetics, Duke University Medical Center, Box 3400, 2007 Snyderman Genomic Sciences Building, 595 LaSalle Street, Durham, NC 27710, Phone: (919) 684-1805; Fax: (919) 684-0912; E-mail: allison.ashleykoch@duke.edu.

Summary

Gene-gene interactions are likely involved in many complex genetic disorders, and new statistical approaches for detecting such interactions are needed. We propose a multi-analytic paradigm, relying on convergence of evidence across multiple analysis tools. Our paradigm tests for main and interactive effects, through allele, genotype and haplotype association. We applied our paradigm to genotype data from three GABAA receptor subunit genes (GABRB3, GABRA5, and GABRG3) on chromosome 15 in 470 Caucasian autism families. Because these genes have previously been implicated in autism, we hypothesized that they interact to contribute to risk. We detected no evidence of main effects by allelic (PDT, FBAT) or genotypic (genotype-PDT) association at individual markers. However, three two-marker haplotypes in GABRG3 were significant (HBAT). We detected no significant multi-locus associations using genotype-PDT analysis or the EMDR data reduction program. However, consistent with the haplotype findings, the best single locus EMDR model selected a GABRG3 marker. Further, the best pairwise genotype-PDT result involved GABRB3 and GABRG3, and all multi-locus EMDR models also selected GABRB3 and GABRG3 markers. GABA receptor subunit genes do not significantly interact to contribute to autism risk in our overall data set. However, the consistency of results across analyses suggests that we have defined a useful framework for evaluating gene-gene interactions.

Introduction

Gene-gene interactions are hypothesized to play an important role in the etiology of many complex genetic disorders. In spite of this, most candidate gene association studies typically assess effects of candidate genes independently of each other. Studies of the joint effect of other candidate genes are rare. It is our working hypothesis that for some genes these epistatic, or gene-gene, interactions may be more important than the independent effects of the single genes. The idea that gene-gene interactions play an important role in human biology is not new. Wright (1932) emphasized that the relationship between genotype and phenotype is dependent on dynamic interactive networks of genes and environmental factors. This idea holds true today. Gibson (1996) stressed that gene-gene and gene-environment interactions must be ubiquitous, given the complexities of intermolecular interactions that are necessary to regulate gene expression and the hierarchical complexity of metabolic networks. The identification and characterization of common complex disease susceptibility genes remains one of the great challenges facing human geneticists.

Traditional parametric statistical methods, such as logistic regression, can be applied to the detection of gene-gene interactions. However, for several reasons, these traditional methods may not be sufficiently powered or flexible to detect genetic effects that are only observed in the presence of other genes or environmental factors (Templeton, 2000; Schlichting & Pigliucci, 1998). For example, with many parametric statistical methods, modeling high-order interactions may leave several contingency table cells with no observations, and therefore lead to very large standard errors of the coefficient estimates (Hosmer & Lemeshow, 2000). This situation is exacerbated as the number of polymorphisms increases, because the number of combinations of variants that needs to be evaluated increases exponentially. Some methods, such as stepwise logistic regression, deal with this difficulty by only considering factors that have a statistically significant marginal or main effect for the model. The limitation of this approach is that factors with purely interactive effects will be missed.
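
The sparsity argument can be made concrete with a little arithmetic. The sketch below counts the multi-locus genotype cells for biallelic SNPs (three genotypes per SNP) and the average number of subjects per cell. The sample size of 940 (470 cases plus 470 matched controls) is borrowed from the design described later in this paper; the function names are illustrative only.

```python
def genotype_cells(n_loci: int, genotypes_per_locus: int = 3) -> int:
    """Number of multi-locus genotype combinations for n_loci biallelic SNPs."""
    return genotypes_per_locus ** n_loci


def mean_subjects_per_cell(n_subjects: int, n_loci: int) -> float:
    """Average number of subjects per genotype cell if subjects were spread evenly.

    Real data are spread unevenly (rare genotypes), so many cells will hold
    far fewer subjects than this average, and some will be empty.
    """
    return n_subjects / genotype_cells(n_loci)


if __name__ == "__main__":
    # Assumed sample: 470 cases + 470 matched controls = 940 subjects.
    for k in range(1, 6):
        print(k, genotype_cells(k), round(mean_subjects_per_cell(940, k), 1))
```

With four loci there are already 81 cells, so even a balanced sample of this size averages fewer than 12 subjects per cell.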

To address these issues, the multifactor dimensionality reduction (MDR) method, a data reduction approach, has been developed (Ritchie et al. 2000). We have subsequently modified this approach into the ‘extended’ MDR or EMDR which allows for more analysis options, including additional test statistics and permutation tests (Mei et al., in press). Recently, Martin and colleagues have also expanded the original pedigree disequilibrium test (Martin, 2000b) to the genotype-PDT in order to examine genotypic association, allowing for analysis of not only a single locus, but also multi-locus effects (Martin et al. 2003). These are just a few examples of recently developed methods that can be applied to detect gene-gene interactions. However, this methodology is still in its infancy, and no single method has thus far proven to be superior to others. Furthermore, in order to sort out true interactive effects from joint effects that are driven by a strong main effect, one must also concurrently assess for single locus effects. Thus, given all these issues, a multi-analytic approach to analysis of gene-gene interactions in complex disease, searching for consistency of results and preponderance of evidence to make conclusions, should prove most useful.

One of the prototypical complex diseases hypothesized to include epistatic genetic effects is autistic disorder (AutD; MIM209850), a neurodevelopmental disorder characterized by disturbances in social, communicative and behavioural functioning. Significant evidence from twin and family studies supports a strong genetic component (Bailey et al. 1995; Lotspeich & Ciaranello, 1993; Folstein & Piven, 1991; Steffenburg et al. 1989; Ritvo et al. 1985; Folstein & Rutter, 1977), but the inheritance of AutD appears to be quite complex. The concordance rates among monozygotic and dizygotic twins are not consistent with Mendelian modes of inheritance (Bailey et al. 1995), and the pattern of familial recurrence risks in AutD is most consistent with an oligogenic model including epistasis (Jorde et al. 1991).

Examination of functional candidate genes is one approach that has been utilized by many investigators in the genetic dissection of the complex AutD phenotype. The GABAergic system, in particular, has received much attention. GABA is the primary inhibitory neurotransmitter in the adult brain, but during development GABA acts as an excitatory neurotransmitter, due to high intracellular chloride concentration in immature neurons (Jentsch et al. 2002). In brain, GABA acts on the GABA receptor complex, a heteropentameric structure forming a central chloride channel. Eighteen different receptor subunit genes have been characterized in mammals. Classes of subunits include α, β, δ, ε, γ, π, θ, and ρ. In addition to providing binding sites for GABA, the GABA receptor contains sites for several therapeutic agents and drugs, including benzodiazepines, barbiturates, anesthetics and alcohols. Binding studies using labeled ligands in human children and nonhuman primates indicate that GABA receptor density is greatest early in life, and then dramatically decreases to adult levels (Chugani et al. 2001).

Several lines of genetic evidence specifically implicate the involvement of three GABAA receptor subunit genes located on 15q11-q13 (α5, β3, γ3) in AutD susceptibility. First and foremost, these three GABAA receptor subunit genes are physically positioned in the region on chromosome 15q which is the most common site of chromosomal abnormalities observed in autistic patients (Wolpert et al. 2000; Gillberg, 1998; Schroer et al. 1998). But there is also evidence connecting the individual genes to AutD. Genetic markers within the GABRB3 gene have been implicated in AutD susceptibility through both genetic linkage (Shao et al. 2002, 2003; Philippe et al. 1999; Liu et al. 2001) and linkage disequilibrium (Cook et al. 1997; Martin et al. 2000a; Buxbaum et al. 2002) analyses, making GABRB3 an excellent candidate gene for AutD. However, this association with GABRB3 has not been universally confirmed (Maestrini et al. 1999; Salmon et al. 1999). Analyses of phenotypic subsets of AutD have provided even stronger linkage evidence for the GABRB3 region. Shao and colleagues (Shao et al. 2003) found a significant increase in the linkage evidence for the GABRB3 region in families in which affected individuals had a high degree of insistence on sameness. Similarly, recent examination of this region in AutD multiplex families (2 or more affected individuals) revealed that the evidence for linkage to the GABRB3 region was increased in the subset of families with probands who had greater savant skills (Nurmi et al. 2003). This association between savant skills and the GABRB3 region, however, was not replicated in our own data set (Ma et al. 2005a). In addition to evidence implicating the GABRB3 region, we have also observed evidence for association between AutD and the GABRG3 region (Menold et al. 2001). Further evidence implicating a role for GABRG3 is the observation that mutations in GABRG2, which encodes another γ subunit, have been detected in families with epilepsy phenotypes (Wallace et al. 2001). This may be important with regard to GABRG3 because seizures are observed in subjects with autism more often than in the general population (Pavone et al. 2004; Tuchman & Rapin, 2002).

There are functional data to support a role of the GABAA receptors as well. PET imaging in vivo using [11C]flumazenil (FMZ) shows decreased binding in autistic children compared with controls (Chugani, 2001). Further, there is a significant decrease in GABAA receptors in the hippocampus from autistic brains compared with control brains (Blatt et al. 2001). The hippocampus composes part of the limbic system, controlling both emotion and long-term memory. Disruptions of the limbic system could lead to some of the hallmark characteristics of autism, namely difficulties in social interactions.

Due to these functional and genetic data, we have examined the possibility that this cluster of GABA genes on chromosome 15q11-q13 may be acting epistatically to contribute to autism susceptibility. We have chosen to address this complex question with a multi-analytic paradigm that emphasizes reproducibility of results across different analytic tools. As a prototype for the analysis of gene-gene interactions, our approach uses both methods to assess main effects, namely the PDT and genotype-PDT (Martin et al. 2000b, 2003) and the Haplotype Based Association Test (Horvath, 2004), and methods to detect multi-locus effects, namely the EMDR (Mei et al. in press) and the multi-locus genotype-PDT (Martin et al. 2003).

Materials and Methods

Data Set

Statistical analyses were performed on a total of 470 Caucasian AutD families (265 multiplex (two or more affected individuals) and 205 trios (parents and affected offspring)). The Collaborative Autism Team (CAT) from the Duke Center for Human Genetics and the WS Hall Psychiatric Institute contributed 246 families. Two-hundred-and-twenty-four families were ascertained by the Autism Genetic Resource Exchange (AGRE). All affected individuals were ascertained on the basis of a clinical diagnosis of AutD and were between 3 and 21 years of age. The Autism Diagnostic Interview-Revised (ADI-R) was used to confirm the clinical diagnosis of autism. The classification of an individual with autism required that their ADI-R scores exceed cutoffs in each of the three critical areas: social behaviour, communication (nonverbal or verbal) and restricted, repetitive behaviours. Families were excluded in cases where the AutD diagnosis was not idiopathic (e.g. Fragile X Syndrome, Tuberous Sclerosis Complex) or was associated with a cytogenetic abnormality. Blood was obtained from patients and other family members under IRB-approved procedures. DNA was extracted from whole blood using standard protocols (Vance, 1998).

SNP Genotyping

SNPs within each candidate region were identified using NCBI's Single Nucleotide Polymorphism database (dbSNP) (http://www.ncbi.nlm.nih.gov/SNP). Table 1 describes the locations of the SNPs that were genotyped within GABRB3, GABRA5 and GABRG3. We genotyped 5 SNPs within each of GABRB3 and GABRG3 and 4 SNPs within GABRA5. SNPs were selected for analysis based on availability of an assay from the manufacturer (Applied Biosystems, Foster City, CA), coverage across the gene, and rare allele frequencies. Most markers had a heterozygosity of over 35%, with the exception of markers rs2081648 in GABRB3 (23%), rs140681 in GABRA5 (16%) and hcv428306 in GABRG3 (21%). Since we examined multiple SNPs per gene, we used Haploview (Barrett et al. 2004) to assist us in identifying ‘haplotype tagging’ SNPs, so that we could reduce the redundant information coming from SNPs within the same haplotype block of each gene. However, as demonstrated by the linkage disequilibrium table (Table 2), only two markers were in significant LD as defined by r² > 0.20 (hcv42974 and rs7173260 in GABRA5). Thus, we used all markers in all analyses, with the exception of the EMDR analysis where we omitted rs7173260.

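Table 1.  SNP Locations within the GABAA receptor subunit genes

| Gene | SNP Ref ID (NCBI or Celera) | SNP location within gene | SNP chromosomal location (bp) (NCBI build 35) |
|---|---|---|---|
| GABRB3 | rs2081648 | Intron 8 | 24349292 |
| GABRB3 | rs1426217 | Intron 6 | 24372218 |
| GABRB3 | rs754185 | Intron 3 | 24438972 |
| GABRB3 | rs890317/hcv8865209 | Intron 3 | 24473294 |
| GABRB3 | rs2059574 | Intron 3 | 24548136 |
| GABRA5 | hcv42974 | Intron 7 | 24743281 |
| GABRA5 | rs7173260 | Intron 7 | 24754690 |
| GABRA5 | rs140681 | Intron 7 | 24764958 |
| GABRA5 | rs140683 | Intron 9 | 24771081 |
| GABRG3 | rs7172534/hcv2078506 | Intron 3 | 24855745 |
| GABRG3 | rs208129 | Intron 3 | 25007653 |
| GABRG3 | rs897173 | Intron 3 | 25052647 |
| GABRG3 | hcv428306 | Intron 6 | 25406295 |
| GABRG3 | rs140679 | Exon 8 | 25446271 |
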
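Table 2.  Linkage disequilibrium between all SNPs typed, calculated by r² and D′

In each matrix, values above the diagonal are for affected individuals and values below the diagonal are for normal individuals.

r²

| SNP | rs2081648 | rs1426217 | rs754185 | hcv8865209 | rs2059574 | hcv42974 | rs7173260 | rs140681 | rs140683 | hcv2078506 | rs208129 | rs897173 | hcv428306 | rs140679 |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| rs2081648 | - | 0.113 | 0.025 | 0.002 | 0.005 | 0.003 | 0.003 | 0.005 | 0.001 | 0.002 | 0.000 | 0.001 | 0.001 | 0.001 |
| rs1426217 | 0.111 | - | 0.000 | 0.006 | 0.000 | 0.004 | 0.009 | 0.000 | 0.001 | 0.000 | 0.002 | 0.000 | 0.000 | 0.000 |
| rs754185 | 0.020 | 0.000 | - | 0.002 | 0.006 | 0.001 | 0.001 | 0.000 | 0.001 | 0.003 | 0.011 | 0.000 | 0.000 | 0.000 |
| hcv8865209 | 0.018 | 0.021 | 0.007 | - | 0.012 | 0.001 | 0.001 | 0.000 | 0.001 | 0.002 | 0.002 | 0.000 | 0.000 | 0.001 |
| rs2059574 | 0.017 | 0.011 | 0.000 | 0.016 | - | 0.001 | 0.000 | 0.008 | 0.000 | 0.001 | 0.004 | 0.002 | 0.000 | 0.000 |
| hcv42974 | 0.027 | 0.005 | 0.001 | 0.000 | 0.012 | - | 0.362 | 0.002 | 0.126 | 0.003 | 0.000 | 0.000 | 0.009 | 0.001 |
| rs7173260 | 0.004 | 0.001 | 0.024 | 0.019 | 0.021 | 0.273 | - | 0.002 | 0.103 | 0.000 | 0.003 | 0.005 | 0.002 | 0.002 |
| rs140681 | 0.000 | 0.001 | 0.041 | 0.025 | 0.000 | 0.001 | 0.001 | - | 0.061 | 0.010 | 0.001 | 0.004 | 0.007 | 0.001 |
| rs140683 | 0.017 | 0.001 | 0.001 | 0.000 | 0.013 | 0.126 | 0.006 | 0.062 | - | 0.000 | 0.007 | 0.006 | 0.002 | 0.006 |
| hcv2078506 | 0.008 | 0.031 | 0.003 | 0.004 | 0.004 | 0.000 | 0.001 | 0.000 | 0.000 | - | 0.004 | 0.001 | 0.004 | 0.011 |
| rs208129 | 0.000 | 0.007 | 0.007 | 0.003 | 0.000 | 0.003 | 0.019 | 0.004 | 0.001 | 0.006 | - | 0.001 | 0.004 | 0.001 |
| rs897173 | 0.011 | 0.000 | 0.000 | 0.001 | 0.000 | 0.000 | 0.007 | 0.014 | 0.000 | 0.000 | 0.012 | - | 0.000 | 0.000 |
| hcv428306 | 0.033 | 0.013 | 0.000 | 0.000 | 0.005 | 0.012 | 0.002 | 0.003 | 0.001 | 0.000 | 0.000 | 0.011 | - | 0.000 |
| rs140679 | 0.004 | 0.015 | 0.001 | 0.017 | 0.001 | 0.000 | 0.000 | 0.004 | 0.004 | 0.001 | 0.016 | 0.013 | 0.006 | - |

D′

| SNP | rs2081648 | rs1426217 | rs754185 | hcv8865209 | rs2059574 | hcv42974 | rs7173260 | rs140681 | rs140683 | hcv2078506 | rs208129 | rs897173 | hcv428306 | rs140679 |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| rs2081648 | - | 1.000 | 0.280 | 0.171 | 0.180 | 0.095 | 0.140 | 0.568 | 0.059 | 0.086 | 0.032 | 0.184 | 0.242 | 0.063 |
| rs1426217 | 1.000 | - | 0.006 | 0.137 | 0.011 | 0.102 | 0.104 | 0.003 | 0.038 | 0.004 | 0.048 | 0.026 | 0.035 | 0.026 |
| rs754185 | 0.307 | 0.001 | - | 0.090 | 0.109 | 0.032 | 0.047 | 0.080 | 0.037 | 0.064 | 0.111 | 0.004 | 0.035 | 0.020 |
| hcv8865209 | 0.644 | 0.257 | 0.184 | - | 0.169 | 0.026 | 0.054 | 0.001 | 0.040 | 0.049 | 0.052 | 0.016 | 0.035 | 0.047 |
| rs2059574 | 0.366 | 0.115 | 0.027 | 0.211 | - | 0.053 | 0.020 | 0.298 | 0.011 | 0.042 | 0.084 | 0.076 | 0.012 | 0.001 |
| hcv42974 | 0.334 | 0.106 | 0.027 | 0.045 | 0.157 | - | 0.926 | 0.234 | 0.428 | 0.090 | 0.002 | 0.018 | 0.177 | 0.041 |
| rs7173260 | 0.226 | 0.031 | 0.277 | 0.184 | 0.198 | 1.000 | - | 0.127 | 0.423 | 0.002 | 0.081 | 0.124 | 0.089 | 0.053 |
| rs140681 | 0.187 | 0.148 | 0.994 | 0.999 | 0.013 | 0.065 | 0.064 | - | 1.000 | 0.267 | 0.127 | 0.106 | 0.103 | 0.092 |
| rs140683 | 0.353 | 0.035 | 0.035 | 0.016 | 0.120 | 0.466 | 0.121 | 1.000 | - | 0.030 | 0.096 | 0.172 | 0.145 | 0.097 |
| hcv2078506 | 0.198 | 0.253 | 0.060 | 0.080 | 0.076 | 0.010 | 0.056 | 0.033 | 0.006 | - | 0.097 | 0.051 | 0.135 | 0.122 |
| rs208129 | 0.025 | 0.103 | 0.083 | 0.124 | 0.027 | 0.095 | 0.142 | 0.341 | 0.047 | 0.129 | - | 0.071 | 0.216 | 0.048 |
| rs897173 | 0.585 | 0.023 | 0.002 | 0.036 | 0.018 | 0.018 | 0.121 | 0.237 | 0.031 | 0.048 | 0.284 | - | 0.007 | 0.019 |
| hcv428306 | 0.186 | 0.374 | 0.038 | 0.013 | 0.208 | 0.232 | 0.087 | 0.073 | 0.087 | 0.027 | 0.046 | 0.595 | - | 0.006 |
| rs140679 | 0.158 | 0.123 | 0.041 | 0.250 | 0.026 | 0.026 | 0.022 | 0.208 | 0.065 | 0.045 | 0.148 | 0.193 | 0.207 | - |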

SNP genotyping was performed by TaqMan, using ‘Assays-on-Demand’ or ‘Assays-by-Design’ SNP genotyping products (Applied Biosystems, Foster City, CA). For all genotype assays, quality control measures were applied, including genotyping a series of blinded duplicate samples and CEPH controls. The genotypes of all duplicate samples had to match in order for the assay to pass quality control. Further, we required that each assay achieve 95% efficiency (i.e. the genotypes of at least 95% of the samples could be called with certainty) to be considered for statistical analysis. Assays were performed on ABI 9700 dual 384-well Geneamp PCR systems according to the manufacturer's instructions (Applied Biosystems, Foster City, CA). Genotypes were analyzed using an ABI Prism 7900HT Sequence Detection System (Applied Biosystems, Foster City, CA). Each reaction contained 2.7ng of total genomic DNA.
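
The two quality-control criteria described above (concordant blinded duplicates and at least 95% of samples called) can be sketched as a simple check. This is only an illustration: the function names, the use of `None` for an uncalled genotype, and the genotype string encoding are assumptions, not part of the laboratory pipeline.

```python
def call_rate(calls):
    """Fraction of samples with a genotype call (None marks an uncalled sample)."""
    return sum(c is not None for c in calls) / len(calls)


def duplicates_concordant(duplicate_pairs):
    """True only if every blinded duplicate pair received the same genotype."""
    return all(first == second for first, second in duplicate_pairs)


def assay_passes_qc(calls, duplicate_pairs, min_call_rate=0.95):
    """Apply both criteria: all duplicates match and call rate >= threshold."""
    return duplicates_concordant(duplicate_pairs) and call_rate(calls) >= min_call_rate
```

For example, an assay with 96 of 100 samples called and all duplicates matching would pass, whereas a single discordant duplicate pair would fail the assay regardless of call rate.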

Statistical Methods

The first step of the multi-analytic paradigm describes the characteristics of each marker with respect to deviations from Hardy-Weinberg equilibrium and linkage disequilibrium with the other markers under examination. Hardy-Weinberg equilibrium was assessed using exact tests implemented in the Genetic Data Analysis program (Zaykin et al. 1995). Pairwise linkage disequilibria (D′ and r²) between markers within each gene were calculated separately in affected and unaffected individuals, using one affected and one unaffected individual from each family, with the software package GOLD (Abecasis & Cookson, 2000). The patterns of linkage disequilibrium amongst the markers are shown in Table 2.
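
For readers less familiar with the two LD measures, the following minimal sketch computes D′ and r² for a pair of biallelic markers from a known haplotype frequency and the two allele frequencies. It uses the standard textbook definitions and is not the GOLD implementation; in practice the haplotype frequency must be estimated from genotype data, which this sketch simply assumes has been done. The function and parameter names are illustrative.

```python
def ld_measures(p_ab: float, p_a: float, p_b: float):
    """Return (D', r^2) for alleles A and B at two biallelic loci.

    p_ab : frequency of the haplotype carrying both A and B (assumed known)
    p_a  : frequency of allele A at the first locus
    p_b  : frequency of allele B at the second locus
    """
    d = p_ab - p_a * p_b  # raw disequilibrium coefficient D
    if d >= 0:
        d_max = min(p_a * (1 - p_b), (1 - p_a) * p_b)
    else:
        d_max = min(p_a * p_b, (1 - p_a) * (1 - p_b))
    d_prime = abs(d) / d_max if d_max > 0 else 0.0
    denom = p_a * (1 - p_a) * p_b * (1 - p_b)
    r2 = d * d / denom if denom > 0 else 0.0
    return d_prime, r2
```

With `p_a = 0.5`, `p_b = 0.2` and `p_ab = 0.2`, the pair has D′ = 1 but r² = 0.25, because D′ is scaled by the maximum D attainable given the allele frequencies whereas r² is not. The same pattern appears in Table 2, where rs2081648 and rs1426217 have D′ = 1.000 but r² of only about 0.11.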

The second step of the paradigm defines single locus allelic and genotypic associations with the phenotype. To accomplish this, we used the pedigree disequilibrium test (PDT; Martin, 2001; Martin, 2000), the genotype-PDT (Martin et al. 2003), and the family based association test (FBAT; Horvath, 2004). The PDT and FBAT are similar allele-based tests of association. However, the PDT is a valid test of both linkage and association in extended pedigrees, whereas the FBAT treats nuclear families within an extended pedigree as independent. The genotype-PDT is a genotype-based association test and an extension of the original PDT. All approaches provide both allele- or genotype-specific P-values, as well as global P-values adjusting for all alleles/genotypes. We report here only the global P-values.
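
The logic of family-based allelic association can be illustrated with the classic transmission/disequilibrium test (TDT) for parent-child trios. This is a deliberately simpler relative of the tests used here, not the PDT or FBAT themselves, which handle larger pedigrees and multiple sibs. The sketch assumes the counts of transmissions from heterozygous parents have already been tallied; the function name is illustrative.

```python
import math


def tdt(transmitted: int, untransmitted: int):
    """Classic TDT for one allele at a biallelic marker.

    transmitted   : heterozygous parents who transmitted the allele to an affected child
    untransmitted : heterozygous parents who did not transmit it
    Returns (chi-square statistic with 1 df, two-sided p-value).
    """
    total = transmitted + untransmitted
    if total == 0:
        raise ValueError("no informative (heterozygous) parents")
    chi2 = (transmitted - untransmitted) ** 2 / total
    # Upper tail of a 1-df chi-square: P(X > x) = erfc(sqrt(x / 2)).
    p_value = math.erfc(math.sqrt(chi2 / 2.0))
    return chi2, p_value
```

Under the null hypothesis of no linkage or association, a heterozygous parent transmits either allele with probability one half, so a marked imbalance between the two counts is evidence of association.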

The third step of the paradigm defines haplotype associations. We used the haplotype based association test (HBAT; Horvath, 2004) to accomplish this. For each pair of SNPs within each gene, the HBAT calculates both haplotype-specific and global P-values for significance testing. Here we report only the global P-values, which adjust for all possible haplotypes. For each SNP pair, haplotypes observed in fewer than 10 families were not considered, because such haplotypes were present in no more than about 2% of our data set. We did not examine haplotypes across genes as this violates the assumption of tight linkage between markers.

The fourth step of the paradigm defines possible multi-locus or gene-gene interactions that may occur with the phenotype. The multi-locus genotype-PDT, an extension of the genotype-PDT (Martin et al. 2003), was one method used to examine the joint effects of genes. We also used the EMDR (Mei et al. in press), an extension of the MDR data reduction program (Ritchie et al. 2000), to examine possible multi-locus effects. The EMDR differs from the original MDR program in that it allows missing data for individuals who were incompletely genotyped, includes a chi-square statistic and allows for a non-fixed permutation test (described below); please also see Ma et al. 2005b for further description.

The current version of the EMDR is restricted to either unrelated case-control data or matched pairs (e.g. discordant sibpairs, or pairs constructed from family triad data). In our analysis, we selected the proband (or most completely genotyped affected child) from each multiplex and triad family (n = 470 total) as the affected individual, and then generated an inferred ‘unaffected’ sibling (i.e. the untransmitted alleles) based on parental genotypes. The use of an ‘inferred’ unaffected individual constructed from untransmitted alleles in triad data (AFBAC population) has been previously examined in detail by Thomson (1995) and shown to be a reasonable approach. We chose to use untransmitted alleles as the control because, in complex diseases, individuals who do not meet established diagnostic criteria may still carry the at-risk genotype(s), making the untransmitted alleles of the cases a better control. Indeed, it has been shown that for diseases with low prevalence, unaffected siblings provide little information in family-based association tests (Kaplan & Martin, 2001). For each of the EMDR runs we allowed the program to find the best possible combination of loci that predicted affection status. We did this for single, double, triple and quadruple combinations of loci in our data set. While it is possible to examine even higher order interactions with this methodology, the interpretation of results becomes quite difficult, and the power is also reduced because of the inherent correction for multiple testing in the EMDR permutation test.
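
The construction of the inferred unaffected sibling from untransmitted parental alleles can be sketched for a single biallelic SNP as follows. The encoding (each genotype stored as the number of copies, 0, 1 or 2, of one reference allele, with `None` for missing data) and the function names are assumptions made for illustration; they do not describe the EMDR input format.

```python
def _transmittable(parent_genotype: int):
    """Allele counts (0 or 1) that a parent with this genotype can transmit."""
    return {0: {0}, 1: {0, 1}, 2: {1}}[parent_genotype]


def pseudo_control(father, mother, child):
    """Genotype formed by the two parental alleles NOT transmitted to the child.

    Returns None if any genotype is missing, and raises ValueError if the
    trio is not consistent with Mendelian inheritance.
    """
    if father is None or mother is None or child is None:
        return None
    consistent = any(
        tf + tm == child
        for tf in _transmittable(father)
        for tm in _transmittable(mother)
    )
    if not consistent:
        raise ValueError("trio genotypes are not Mendelian-consistent")
    # The parents carry father + mother copies in total; the child received
    # `child` of them, so the remainder are the untransmitted alleles.
    return father + mother - child
```

For example, two heterozygous parents with a child homozygous for the reference allele (`pseudo_control(1, 1, 2)`) yield a pseudo-control with no copies of that allele. Applying the function to each SNP in turn gives the multi-locus genotype of the matched control.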

To evaluate which combination of loci was the best at predicting affection status, two test statistics were used for each analysis run: the chi-square statistic and the classification error (CE). CE is the misclassification error in the entire data set, and is calculated in a marginal table from the percentage of controls classified as high-risk and the percentage of cases classified as low-risk. Therefore, the best model is the locus model that yields the largest chi-square or smallest CE. This approach differs from the original MDR by Ritchie et al. (2001) in that it does not use cross-validation, which divides the data set into n subsets, uses n-1 of them for training and uses the remaining subset for validation. For example, in the 10-fold cross-validation of the original MDR, our data set of 470 families would have been subdivided into 10 subsets of 47 families each, where 9 subsets would be used for training to identify a model and the final subset for validation of the model. We previously found that cross-validation produced a large variation in test statistics, leading to inconsistent conclusions (Mei et al. in press). Because cross-validation subdivides the data set into several smaller data sets, rather than utilizing the entire sample as a whole, the inconsistencies that we observed could be due to sample size and/or genetic heterogeneity. Thus, for the purposes of our analysis, we concluded that the analysis without cross-validation performs more reliably than the analysis with cross-validation.
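
The data reduction step and the two test statistics can be sketched as follows. This is a simplified illustration of the general MDR idea evaluated on the whole data set without cross-validation, not the EMDR program itself. Several details are assumptions: a genotype cell is labelled high-risk when it holds more cases than controls, tied or empty cells are treated as low-risk, CE is taken as the overall proportion of misclassified subjects, missing genotypes are not handled, and all names are illustrative.

```python
from collections import Counter
from itertools import combinations


def mdr_reduce(cells, status):
    """Pool multi-locus genotype cells into high/low risk and score the model.

    cells  : one hashable multi-locus genotype per subject
    status : 1 for case, 0 for control, in the same order
    Returns (high_risk_cells, classification_error, chi_square).
    """
    cases = Counter(c for c, s in zip(cells, status) if s == 1)
    controls = Counter(c for c, s in zip(cells, status) if s == 0)
    # Assumption: ties go to the low-risk group.
    high_risk = {c for c in set(cases) | set(controls) if cases[c] > controls[c]}
    a = sum(cases[c] for c in high_risk)      # cases classified high-risk
    b = sum(cases.values()) - a               # cases classified low-risk
    c_hr = sum(controls[c] for c in high_risk)  # controls classified high-risk
    d = sum(controls.values()) - c_hr         # controls classified low-risk
    n = a + b + c_hr + d
    ce = (b + c_hr) / n
    denom = (a + b) * (c_hr + d) * (a + c_hr) * (b + d)
    chi2 = n * (a * d - b * c_hr) ** 2 / denom if denom else 0.0
    return high_risk, ce, chi2


def best_model(genotypes, status, k):
    """Search all k-locus combinations; return (loci, CE, chi-square) with the smallest CE."""
    best = None
    for loci in combinations(range(len(genotypes[0])), k):
        cells = [tuple(row[i] for i in loci) for row in genotypes]
        _, ce, chi2 = mdr_reduce(cells, status)
        if best is None or ce < best[1]:
            best = (loci, ce, chi2)
    return best
```

Because the same data are used both to label the cells and to score the model, the raw statistics are optimistic, which is why a permutation test is needed to judge significance.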

To determine statistical significance for the best overall predicted combinations of loci, the EMDR provides three kinds of permutation tests (fixed, non-fixed and omnibus), each of which attempts to adjust for the data reduction technique across many locus combinations to a different extent. All permutation tests have the null hypothesis that a specific n-locus model is independent of case status (i.e. only randomly associated with disease risk). Data are simulated under the null hypothesis by permuting case and control status within each family triad (i.e. for a family triad, transmitted and non-transmitted genotypes are permuted randomly). For a review of these permutation tests, please see Mei et al. (in press). Briefly, the fixed permutation test only permutes a specific n-locus model (e.g. if marker A and marker C were selected as the best 2-locus model, only that combination would be permuted), the non-fixed permutation test permutes over all possible models within an n-locus test (e.g. all possible 2-locus combinations), and the omnibus test permutes over the entire set of models (i.e. all possible 1-locus, 2-locus...k-locus models). For the purpose of this analysis we used the non-fixed permutation test, which accounts for the multiple testing of all possible models for each n-locus combination.
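
A within-pair permutation test of this kind can be sketched generically. The code below swaps the case and pseudo-control labels within each matched pair with probability one half, recomputes a user-supplied statistic, and reports an empirical p-value. It is an assumption-laden illustration rather than the EMDR procedure: the `(hits + 1) / (n_perm + 1)` p-value convention, the default number of permutations, and the function names are choices made here. Passing a statistic for one fixed model mimics the fixed test; passing a statistic that re-searches all n-locus models and returns the best value on each permuted data set mimics the non-fixed test.

```python
import random


def paired_permutation_pvalue(pairs, statistic, n_perm=999, seed=0):
    """Empirical p-value from permuting labels within matched pairs.

    pairs     : list of (case_genotype, control_genotype) tuples, one per family
    statistic : callable taking (case_list, control_list); larger = stronger evidence
    """
    rng = random.Random(seed)
    observed = statistic([c for c, _ in pairs], [u for _, u in pairs])
    hits = 0
    for _ in range(n_perm):
        cases, controls = [], []
        for case, control in pairs:
            # Under the null, transmitted and untransmitted genotypes are exchangeable.
            if rng.random() < 0.5:
                case, control = control, case
            cases.append(case)
            controls.append(control)
        if statistic(cases, controls) >= observed:
            hits += 1
    return (hits + 1) / (n_perm + 1)
```

Permuting within families, rather than across the whole sample, preserves the matched structure of the data, so that the null distribution reflects only the random assignment of transmitted versus untransmitted genotypes.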

Results

We began first with the analyses of main effects with single loci. None of the SNPs examined (Table 1) deviated from Hardy-Weinberg equilibrium expectations in the subset of independent unaffected individuals observed in our sample. However, we did observe a deviation from equilibrium in the affected individuals at marker rs1426217 in GABRB3 (p = 0.02). This could represent a potentially interesting disease association, since the deviation was observed only in the affected individuals. For the single locus PDT and FBAT, as well as the genotype-PDT, none of the markers provided a statistically significant association when considering allelic or genotypic association at each marker independently (Table 3). However, when we examined haplotype associations within each gene, we did observe an association at the GABRG3 locus. As shown in Table 4, for pairwise analyses of markers within GABRG3, three pairwise combinations of markers provided global p-values of 0.05 or less. Furthermore, when we simultaneously examined all 5 markers that were genotyped in GABRG3, we observed a single haplotype that was over-transmitted to the affected individuals (p = 0.01). However, this haplotype was only present at a frequency of 2.5%. When we examined all markers for GABRB3 and GABRA5 simultaneously within each gene, there was no significant haplotype association detected for either gene (data not shown).

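Table 3.  Single locus PDT, FBAT and Genotype-PDT results

| Gene | SNP | P-value for PDT | P-value for FBAT | Global P-value for Genotype-PDT |
|---|---|---|---|---|
| GABRB3 | rs2081648 | 0.60 | 0.63 | 0.84 |
| GABRB3 | rs1426217 | 0.19 | 0.17 | 0.31 |
| GABRB3 | rs754185 | 0.67 | 0.73 | 0.85 |
| GABRB3 | hcv8865209 | 0.34 | 0.35 | 0.52 |
| GABRB3 | rs2059574 | 0.30 | 0.25 | 0.41 |
| GABRA5 | hcv42974 | 0.65 | 0.72 | 0.07 |
| GABRA5 | rs7173260 | 0.94 | 0.90 | 0.85 |
| GABRA5 | rs140681 | 0.89 | 0.81 | 0.76 |
| GABRA5 | rs140683 | 0.83 | 0.83 | 0.98 |
| GABRG3 | hcv2078506 | 0.08 | 0.08 | 0.24 |
| GABRG3 | rs208129 | 0.28 | 0.25 | 0.27 |
| GABRG3 | rs897173 | 0.24 | 0.24 | 0.45 |
| GABRG3 | hcv428306 | 1.00 | 0.95 | 0.61 |
| GABRG3 | rs140679 | 0.41 | 0.39 | 0.59 |
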
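Table 4.  Pairwise HBAT analysis within each gene

| Gene | SNP A | SNP B | Global P-value |
|---|---|---|---|
| GABRB3 | rs2081648 | rs1426217 | 0.18 |
| GABRB3 | rs2081648 | rs754185 | 0.93 |
| GABRB3 | rs2081648 | hcv8865209 | 0.17 |
| GABRB3 | rs2081648 | rs2059574 | 0.59 |
| GABRB3 | rs1426217 | rs754185 | 0.69 |
| GABRB3 | rs1426217 | hcv8865209 | 0.36 |
| GABRB3 | rs1426217 | rs2059574 | 0.47 |
| GABRB3 | rs754185 | hcv8865209 | 0.35 |
| GABRB3 | rs754185 | rs2059574 | 0.79 |
| GABRB3 | hcv8865209 | rs2059574 | 0.39 |
| GABRA5 | hcv42974 | rs7173260 | 0.69 |
| GABRA5 | hcv42974 | rs140681 | 0.89 |
| GABRA5 | hcv42974 | rs140683 | 0.62 |
| GABRA5 | rs7173260 | rs140681 | 0.93 |
| GABRA5 | rs7173260 | rs140683 | 0.92 |
| GABRA5 | rs140681 | rs140683 | 0.69 |
| GABRG3 | hcv2078506 | rs208129 | 0.03 |
| GABRG3 | hcv2078506 | rs897173 | 0.04 |
| GABRG3 | hcv2078506 | hcv428306 | 0.05 |
| GABRG3 | hcv2078506 | rs140679 | 0.13 |
| GABRG3 | rs208129 | rs897173 | 0.36 |
| GABRG3 | rs208129 | hcv428306 | 0.15 |
| GABRG3 | rs208129 | rs140679 | 0.64 |
| GABRG3 | rs897173 | hcv428306 | 0.46 |
| GABRG3 | rs897173 | rs140679 | 0.44 |
| GABRG3 | hcv428306 | rs140679 | 0.66 |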

For the multi-locus analyses, we began by performing pairwise analyses with the multi-locus genotype-PDT. None of the global p-values were less than 0.05 (data not shown). The smallest global p-value obtained was 0.12 for hcv8865209 in GABRB3 with rs140679 in GABRG3.

Table 5 describes the results of the best locus models identified by the EMDR analysis. As shown in the table, no single or multi-locus model provided evidence for association with the AutD phenotype when considering either the chi-square or classification error statistic. The best single locus model selected marker rs208129 in GABRG3. The best two-locus model selected markers rs2059574 in GABRB3 and rs208129 in GABRG3. The best three-locus model selected marker hcv8865209 in GABRB3 and markers rs208129 and rs140679 in GABRG3. And finally, the best four-locus model selected markers hcv8865209 and rs2059574 in GABRB3 and markers rs208129 and rs140679 in GABRG3.

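Table 5.  EMDR results on chromosome 15

Markers included in the analysis:

| Gene | Location (cM) | Marker | Marker number |
|---|---|---|---|
| GABRB3 | 11.07 | rs2081648 | 1 |
| GABRB3 | 11.08 | rs1426217 | 2 |
| GABRB3 | 11.23 | rs754185 | 3 |
| GABRB3 | 11.33 | hcv8865209 | 4 |
| GABRB3 | 11.54 | rs2059574 | 5 |
| GABRA5 | 12.06 | hcv42974 | 6 |
| GABRA5 | 12.12 | rs140681 | 7 |
| GABRA5 | 12.14 | rs140683 | 8 |
| GABRG3 | 12.38 | hcv2078506 | 9 |
| GABRG3 | 12.81 | rs208129 | 10 |
| GABRG3 | 12.94 | rs897173 | 11 |
| GABRG3 | 14.46 | hcv428306 | 12 |
| GABRG3 | 14.66 | rs140679 | 13 |

Best models:

| Number of loci | Best-model* | P-value for chi-square | P-value for classification error |
|---|---|---|---|
| 1 | 10 | 0.22 | 0.71 |
| 2 | 5 10 | 0.49 | 0.56 |
| 3 | 4 10 13 | 0.84 | 0.85 |
| 4 | 4 5 10 13 | 0.62 | 0.88 |

*Best-model numbers refer to the “marker number” column, i.e. the single locus best-model is marker number “10”, which is rs208129 in GABRG3.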

Discussion

Whilst it is hypothesized that gene-gene interactions play an important role in the etiology of many complex disorders, the methodology for detecting gene-gene interactions is still in its infancy. In this manuscript we present an analysis paradigm for examining multi-locus effects in complex diseases, and test this approach on real data from families with autistic disorder. We chose to take a multi-analytic approach and looked for convergence of evidence among the various methods, rather than relying solely on the results from a single analytic tool. We believe that it will be necessary to use multiple analysis tools in order to interpret findings of higher-order interactions among such data. In particular, one must not consider only interactive effects, but simultaneously assess main effects of the genes as well. The results from the main effects analyses are needed to distinguish true interactive effects from effects that are solely driven by a strong main effect of a single gene. We do acknowledge that one of the difficulties with this multi-analytic approach is the issue of multiple testing. We did not apply a correction for multiple testing across the various methods that we used. As shown by the linkage disequilibrium amongst the markers in Table 2, the markers that we examined were not completely independent. Thus, applying a Bonferroni correction for multiple testing would be conservative. However, it is unclear what the correction should be. This issue of multiple testing is undoubtedly an area for future development. In conjunction with a multi-analytic approach, replication in other independent data sets and molecular biological experiments will be necessary to conclusively elucidate complex genetic interactions.
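
To illustrate how strict a Bonferroni correction would be for even one step of the paradigm, the sketch below applies it to the 26 pairwise HBAT global p-values listed in Table 4. This is an illustration only: no such correction was applied in the analysis, the choice of the Table 4 tests as the family of comparisons is an assumption, and the function names are hypothetical.

```python
# Global p-values from Table 4 (10 GABRB3, 6 GABRA5 and 10 GABRG3 SNP pairs).
TABLE4_P_VALUES = [
    0.18, 0.93, 0.17, 0.59, 0.69, 0.36, 0.47, 0.35, 0.79, 0.39,
    0.69, 0.89, 0.62, 0.93, 0.92, 0.69,
    0.03, 0.04, 0.05, 0.13, 0.36, 0.15, 0.64, 0.46, 0.44, 0.66,
]


def bonferroni_threshold(alpha: float, n_tests: int) -> float:
    """Per-test significance threshold that keeps the family-wise error rate <= alpha."""
    return alpha / n_tests


def significant_after_bonferroni(p_values, alpha=0.05):
    """Return the p-values that remain significant after Bonferroni correction."""
    threshold = bonferroni_threshold(alpha, len(p_values))
    return [p for p in p_values if p <= threshold]
```

For 26 tests the per-test threshold is 0.05 / 26, or roughly 0.0019. Because Bonferroni assumes independent tests, and the SNP pairs here share markers that are in partial linkage disequilibrium, this threshold is stricter than necessary, which is the sense in which the correction would be conservative.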

The results of the application of our analysis paradigm to the real genotype data in autistic disorder do not support the presence of multi-locus effects amongst the GABA receptor subunit genes on chromosome 15 as a major contributor to autism etiology. Nonetheless, we were still pleased with the consistency of results that were obtained from the analysis paradigm. For example, in the main effects analysis the marker with the smallest p-value for the allele-based tests (PDT p = 0.08 and FBAT p = 0.08) was hcv2078506 in GABRG3. When we performed the HBAT analysis, it was again markers in GABRG3 that provided the smallest p-values. These results were also supported by the results of the EMDR, which selected marker rs208129 in GABRG3 as the best single locus model. Thus, although the results of the single locus EMDR analysis were not statistically significant, they were consistent with the more traditional allele- and haplotype-based analyses. That is to say that if there is a main effect in our data set of one of the GABA receptor subunit genes in AutD susceptibility, the consistency across the various analyses would suggest that it may lie within the GABRG3 gene. Furthermore, in the EMDR analysis, the two-, three- and four-locus models consistently selected markers in both GABRB3 and GABRG3, even though they were not statistically significant. And although we observed no significant global p-values for the multi-locus genotype-PDT analysis, the best result was obtained with marker hcv8865209 in GABRB3 and marker rs140679 in GABRG3, both markers also having been selected in EMDR multi-locus analyses. Thus the results, although not statistically significant, were consistent across analyses, suggesting a possible interactive effect between GABRB3 and GABRG3. It is interesting to note that these results are also consistent with the previous main effects analyses of the GABA receptor subunit genes on chromosome 15 in autism. That is, both the GABRB3 and GABRG3 regions have been implicated by linkage and/or association analysis, but GABRA5 has not. Thus, we conclude that our multi-analytic paradigm is a useful approach for evaluating the presence of multi-locus effects in complex disease.

Since we did observe consistency of results, albeit not statistically significant, it is possible that our sample size was insufficient to detect multi-locus interactions of small effect size. The power of the original MDR has been described previously in both simulated and real data from breast cancer patients (Ritchie et al. 2003). This analysis showed that the approach had reasonable power for sample sizes of 200 cases and 200 controls. This is less than half the sample size used in the current analysis (n = 470 cases and 470 controls). The EMDR, using the non-fixed permutation test, no cross-validation and a sample of 440 triads, has demonstrated sufficient power to identify the underlying multi-locus effects generated in the GAW14 simulated data (Mei et al. in press). While it is possible that the effect sizes that we were trying to detect in our analysis were so small that the present data set was under-powered, our previous experience with both the MDR and the EMDR would suggest that this is not the case.

Another possibility, given our previous findings, is that there is a specific phenotypic subset of patients with autistic disorder that harbours susceptibility in these GABA receptor subunit genes. In particular, we previously found that multiplex families with a high degree of insistence on sameness provided the most evidence for linkage at GABRB3 in our data set (Shao et al. 2003). Thus, we also analyzed these 23 multiplex families for association with the markers in this current analysis. The most significant result was obtained at marker hcv8865209 in GABRB3 (PDT p = 0.02 and genotype-PDT p = 0.04), consistent with our previous linkage findings. Furthermore, we identified an additional 69 singleton families whose probands also fit our definition of the ‘insistence on sameness’ subphenotype. When we combined the singleton and multiplex families, for a total of 92 families in the association analysis, we again found that marker hcv8865209 provided the most significant evidence of association (PDT p = 0.004, genotype-PDT p = 0.008). Thus, there does appear to be a specific subgroup of patients with autistic disorder that is providing the most evidence for autism susceptibility on chromosome 15. Because the combined subset of singleton and multiplex families who fit the criteria for insistence on sameness comprised fewer than 100 families, we did not attempt to perform the more complicated multi-locus analyses with these families. However, as our data set grows, we will be very interested in looking more closely at this subset. Taken together, these data suggest that the GABA receptor subunit genes remain excellent candidates for autism.

In conclusion, we have presented a multi-analytic framework for the examination of multi-locus effects in complex diseases. This analysis paradigm combines multiple analysis tools that test for both main and interactive effects, at the level of allele, genotype and haplotype. The consistency of results that we obtained with this approach in our application to autistic disorder data suggests that this multi-analytic paradigm performs well, and is a reasonable framework for approaching the analysis of gene-gene interactions in all complex disorders. While other investigators may decide to use different software within this paradigm, we believe the keys to success are to use multiple analysis tools to identify both the main and interactive effects of the markers, and to look for consistency of results across those tools.

Acknowledgements

We wish to thank the patients with autism and family members who agreed to participate in this study, and the personnel of the Center for Human Genetics at Duke University Medical Center for their input on this project.

This research was supported in part by National Institutes of Health (NIH) program project grant NS26630, NIH R01 grants HD36701, AG20135, and NS36768, and by the National Alliance for Autism Research (NAAR). The research conducted in this study complies with current U.S. laws.
