### Summary

To identify interacting loci in genetic epidemiological studies, the application of multi-locus methods of analysis is warranted. Several classification methods are available for this purpose, including multiple logistic regression, sum statistics, logic regression, and the multifactor dimensionality reduction (MDR) method. The objective of our study was to apply these four multi-locus methods to simulated case-control datasets that included a variety of underlying statistical two-locus interaction models, in order to compare the methods and evaluate their strengths and weaknesses. The sum statistic method, logic regression and MDR were generally good at identifying the interacting loci. The performance of logistic regression depended more strongly on the underlying model and on the multiple comparison adjustment procedure. However, all methods had impaired ability to identify the interacting loci in a model with two two-locus interactions of common disease alleles with relatively small effects. Several practical and methodological issues that should be considered when applying these methods, and that may warrant further research, are identified and discussed.

### Introduction

Complex diseases or traits result from the interplay among genes and environmental factors. One way in which genetic factors can act together on a trait is epistatically. Generally speaking, epistasis refers to the phenomenon that the relation between a gene and a trait outcome depends on another gene. More specifically, from a statistical point of view, epistasis or gene-gene interaction has been defined as the presence of non-additivity in a mathematical model that describes the relation between genetic variants and a trait in a population (Cordell et al. 2001, 2002).

The ability of multi-locus methods of analysis to detect and incorporate statistical gene-gene interaction in genetic association studies is likely to improve both the chance of retrieving causal genes and the prediction of trait outcome from genotype data (Cordell et al. 2001; Culverhouse et al. 2002). Researchers who want to identify interacting loci in their genetic association studies with multi-locus techniques can nowadays choose from a variety of readily available methods and software packages, in addition to traditional multiple regression analysis. The goal of our study was to apply four available multi-locus classification techniques, namely multiple logistic regression, the set association method as implemented in the Sumstat software, multifactor dimensionality reduction (MDR), and logic regression, to simulated data in order to identify strengths and weaknesses in the application of these methods. More specifically, we focussed on their ability to identify causal loci that may interact statistically according to specific underlying models. This should give researchers more insight into the performance of the methods and allow more informed choices about when and how to apply them.

Many multi-locus methods that can be applied to investigate gene-gene interactions have been introduced in the past decade, as reviewed by Hoh & Ott (2003) and Thornton-Wells et al. (2004). A number of these methods are particularly suitable for genetic case-control association studies, and freely available software has been released for some of them. The MDR software was introduced in 2003 and was developed specifically to analyse high-order joint effects of loci (and environment) in genetic association case-control and sib-pair study designs (Hahn et al. 2003). The general idea behind the non-parametric, genetic model-free MDR approach is to reduce the dimensionality of the multi-locus data by pooling the multi-locus genotype combinations into a high-risk and a low-risk group, based on the case-control ratio of each specific multi-locus genotype. This reduction of dimensionality overcomes the problem of few observations in high-order genotype combinations. An exhaustive search over all possible high-order genotype combinations is performed for a varying number of loci, and for each model size the best combination of single nucleotide polymorphisms (SNPs) is chosen based on classification error. Cross-validation is then used to determine how well the new one-dimensional multi-locus variable predicts case-control status, and how consistently the same combination is selected across the cross-validation subsets (cross-validation consistency). The locus or combination of loci with the best predictive value and the highest cross-validation consistency is selected as the final model. An empirical p-value for the testing accuracy and cross-validation consistency of the final model can be obtained with a permutation testing procedure. The MDR method has successfully identified interacting loci in several real data applications (Brassat et al. 2006; Cho et al. 2004; Coffey et al. 2004; Ma et al. 2005; Ritchie et al. 2001; Williams et al. 2004), and its performance has been evaluated on simulated data with a focus on a number of epistasis models that show no or small single locus main effects (Ritchie et al. 2001, 2003).
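
The pooling, exhaustive search and cross-validation consistency steps described above can be illustrated with a minimal Python sketch. It is not the MDR software, and several details are assumptions: the function names are hypothetical, SNPs are coded 0/1/2, a cell is labelled high-risk when its case:control ratio exceeds the overall case:control ratio, cells not seen in the training data are treated as low-risk, and folds are assigned deterministically rather than at random. Permutation testing is omitted.

```python
from collections import Counter
from itertools import combinations


def cell_of(genotype, loci):
    """Multi-locus genotype cell, e.g. (0, 2) for two SNPs coded 0/1/2."""
    return tuple(genotype[i] for i in loci)


def mdr_labels(genotypes, status, loci):
    """Pool cells into high-risk (1) or low-risk (0) by their case:control ratio."""
    n_cases = sum(status)
    threshold = n_cases / (len(status) - n_cases)  # overall case:control ratio
    cases, controls = Counter(), Counter()
    for g, y in zip(genotypes, status):
        (cases if y == 1 else controls)[cell_of(g, loci)] += 1
    labels = {}
    for cell in set(cases) | set(controls):
        # A cell observed only in cases has an infinite ratio and is high-risk.
        ratio = cases[cell] / controls[cell] if controls[cell] else float("inf")
        labels[cell] = int(ratio > threshold)
    return labels


def error_rate(genotypes, status, loci, labels):
    """Classification error of the pooled one-dimensional variable."""
    wrong = sum(labels.get(cell_of(g, loci), 0) != y  # unseen cell -> low-risk
                for g, y in zip(genotypes, status))
    return wrong / len(status)


def best_combination(genotypes, status, order=2):
    """Exhaustive search over all locus combinations of the given size."""
    best = None
    for loci in combinations(range(len(genotypes[0])), order):
        labels = mdr_labels(genotypes, status, loci)
        err = error_rate(genotypes, status, loci, labels)
        if best is None or err < best[1]:
            best = (loci, err)
    return best


def cv_consistency(genotypes, status, order=2, k=5):
    """Count how often each combination is chosen across k training folds."""
    n = len(status)
    chosen = Counter()
    for f in range(k):
        # Hypothetical deterministic split: subject i is held out in fold i % k.
        train = [i for i in range(n) if i % k != f]
        loci, _ = best_combination([genotypes[i] for i in train],
                                   [status[i] for i in train], order)
        chosen[loci] += 1
    return chosen
```

A combination selected in every fold has maximal cross-validation consistency; a full analysis would also report the testing error in the held-out folds.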

Logic regression is a generalized regression methodology that was introduced in 2001 and was also developed mainly to study high-order interactions in genetic studies (Kooperberg et al. 2001; Ruczinski et al. 2004). It is implemented in freely available software as an R and S-Plus package. In short, the goal of logic regression is to find Boolean combinations of binary predictors, for example SNPs, that are associated with the outcome of interest and that are incorporated into a regression model. The combinations of binary predictors are efficiently organized in tree form. Model fit is evaluated by comparing the fitted values with the response using a scoring function that depends on the type of regression. Thus, instead of modelling the interaction between several SNPs with many parameters in a regression model, a combination can be captured by a single tree parameter built with Boolean operators; SNPs that are included in the same logic tree rather than in separate trees may act non-additively on the scale of interest. The logic trees can be built and selected with a stochastic simulated annealing algorithm, and both the size and the number of trees can vary. Model selection can be guided by comparing the predictive values of the models, based on cross-validation or permutation testing procedures. The latest software version also includes a Monte Carlo regression option that can identify several of the best tree models that are related to trait outcome and that may fit the data equally well (Kooperberg & Ruczinski, 2005). The method has been applied to a post-PTCA restenosis dataset and has also been used successfully in the analysis of simulated genetic data (Kooperberg et al. 2001; Ruczinski et al. 2004).
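
To make the idea of a single tree parameter concrete, here is a minimal Python sketch; it is not the interface of the R/S-Plus package. Trees are hypothetical nested tuples over binary predictors (for example, carrier indicators), the score is assumed to be the deviance of a logistic regression on the tree's binary output, and an exhaustive search over two-leaf trees stands in for the simulated annealing search over larger and multiple trees.

```python
import math
from itertools import combinations, product


def evaluate(tree, x):
    """Evaluate a logic tree on one subject's binary predictors x (0/1)."""
    if isinstance(tree, int):  # leaf: index of a binary predictor
        return x[tree]
    op, *args = tree
    if op == "not":
        return 1 - evaluate(args[0], x)
    left, right = (evaluate(a, x) for a in args)
    return left & right if op == "and" else left | right


def xlogy(a, b):
    """a * log(b), with the convention 0 * log(0) = 0."""
    return 0.0 if a == 0 else a * math.log(b)


def logistic_deviance(indicator, status):
    """Deviance of logit(p) = b0 + b1 * L for a single binary covariate L.

    With one binary covariate the maximum likelihood fit reproduces the case
    fraction in each level of L, so no iterative fitting is needed.
    """
    loglik = 0.0
    for level in (0, 1):
        ys = [y for lv, y in zip(indicator, status) if lv == level]
        if ys:
            k, n = sum(ys), len(ys)
            loglik += xlogy(k, k / n) + xlogy(n - k, (n - k) / n)
    return -2.0 * loglik


def two_leaf_trees(n_predictors):
    """All trees op(A, B) where A and B are (possibly negated) distinct leaves."""
    for i, j in combinations(range(n_predictors), 2):
        for neg_i, neg_j, op in product((False, True), (False, True), ("and", "or")):
            left = ("not", i) if neg_i else i
            right = ("not", j) if neg_j else j
            yield (op, left, right)


def best_two_leaf_tree(X, status):
    """Exhaustive stand-in for simulated annealing: lowest-deviance tree."""
    def score(tree):
        return logistic_deviance([evaluate(tree, x) for x in X], status)
    return min(two_leaf_trees(len(X[0])), key=score)
```

If cases are exactly the carriers of predictor 0 who do not carry predictor 1, the search returns `("and", 0, ("not", 1))`: one Boolean term, and hence one regression coefficient, replaces two main effects and their interaction.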

The set-association method, as implemented in the Sumstat software and introduced by Hoh et al. (2001), is a non-parametric method that uses sum statistics to evaluate the joint effect of loci on trait outcome. The locus or set of loci associated with the trait of interest is identified by constructing and testing sum statistics that capture the combined information from multiple SNPs. Based on single-locus test statistics, for instance chi-square values from 2 × 3 genotype tables, chi-square values for deviation from Hardy-Weinberg equilibrium, or chi-square values for pair-wise interactions, SNPs are selected and added to a sum statistic that is calculated for an increasing number of loci. The SNPs are added sequentially in order of decreasing test statistic value. The significance of each sum is evaluated with a permutation procedure. The smallest of these empirical p-values is then in turn evaluated for global significance with permutation tests, which corrects for the testing of multiple sums. The goal is thus to find the subset of loci that is most significantly associated with the trait of interest. The power of this method has been evaluated with simulated data without a focus on interaction (Kim et al. 2003; Wille et al. 2003), and it has been applied in several case-control genetic association studies (de Quervain et al. 2004; Hoh et al. 2001; Maitland-van der Zee et al. 2005).
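
The sequential summing and the two levels of permutation testing can be sketched in Python as follows. This is an illustration under stated assumptions rather than the Sumstat implementation: only single-SNP 2 × 3 chi-square statistics are summed (no Hardy-Weinberg or interaction statistics), the function names are hypothetical, and ties in the smallest p-value are resolved by a hypothetical `prefer_largest` flag that picks the sum with the largest (or smallest) number of loci.

```python
import random


def chi_square_2x3(genotypes, status):
    """Pearson chi-square for a 2 x 3 table of status (0/1) by genotype (0/1/2)."""
    table = [[0, 0, 0], [0, 0, 0]]
    for g, y in zip(genotypes, status):
        table[y][g] += 1
    n = len(status)
    rows = [sum(table[0]), sum(table[1])]
    cols = [table[0][j] + table[1][j] for j in range(3)]
    stat = 0.0
    for i in range(2):
        for j in range(3):
            expected = rows[i] * cols[j] / n
            if expected > 0:  # skip empty genotype columns
                stat += (table[i][j] - expected) ** 2 / expected
    return stat


def ranked_sums(stats, max_loci):
    """SNP indices in decreasing order of their statistic, and the running sums."""
    order = sorted(range(len(stats)), key=lambda s: stats[s], reverse=True)[:max_loci]
    sums, total = [], 0.0
    for s in order:
        total += stats[s]
        sums.append(total)
    return order, sums


def set_association(genotypes, status, max_loci=5, n_perm=200, seed=1,
                    prefer_largest=True):
    """Sum statistics with permutation p-values and a global permutation p-value."""
    rng = random.Random(seed)
    columns = [[g[s] for g in genotypes] for s in range(len(genotypes[0]))]

    def statistics(y):
        return [chi_square_2x3(col, y) for col in columns]

    order, observed = ranked_sums(statistics(status), max_loci)
    permuted = []
    for _ in range(n_perm):
        y = status[:]
        rng.shuffle(y)  # break the genotype-status link
        permuted.append(ranked_sums(statistics(y), max_loci)[1])

    def pvalues(sums):
        # Fraction of permuted k-locus sums at least as large as the given one.
        return [sum(row[k] >= sums[k] for row in permuted) / n_perm
                for k in range(len(sums))]

    p_obs = pvalues(observed)
    p_min = min(p_obs)
    ties = [k for k, p in enumerate(p_obs) if p == p_min]
    k_best = ties[-1] if prefer_largest else ties[0]
    # Global p-value: how often a permuted data set reaches an equally small
    # minimum p-value, which corrects for having tested several sums.
    global_p = sum(min(pvalues(row)) <= p_min for row in permuted) / n_perm
    return order[:k_best + 1], p_obs, global_p
```

Once a strongly associated SNP enters the sum, permuted sums rarely exceed the observed ones, so several sums can share a minimum p-value of zero; the tie-breaking rule then decides whether the selected set stops at that SNP or extends to the maximum number of loci.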

Traditional multiple logistic regression has several advantages, such as transparency, familiarity, and the estimation of interpretable effect parameters, but it also has known disadvantages for identifying interacting loci in case-control association studies. Sparse data can easily become a problem in studies with relatively small sample sizes when (high-order) interaction parameters are included. Furthermore, the number of tests can become very high when the number of SNPs in the study and the order of the interactions to evaluate become large. This can lead to unreliable parameter estimates, inflated type I errors, and low power to detect the associated loci. Methods to deal with multiple comparisons, such as the Bonferroni correction, the False Discovery Rate (FDR), and randomization procedures, have been introduced to mitigate elevated type I error rates. Another difference from the previously discussed techniques is that logistic regression is a parametric, model-based method that requires the (interaction) model(s) to be specified explicitly, and the available choice of model definitions for investigating gene-gene interactions in logistic regression is large. Logistic regression has successfully identified interacting loci in past studies that included a relatively small number of loci, and it was recently shown that the technique can also be applied successfully to identify interacting loci in genome-wide association studies (Marchini et al. 2005).
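
The Bonferroni correction and an FDR procedure each take only a few lines. The Python sketch below assumes the Benjamini-Hochberg step-up procedure for FDR control, which this section does not name explicitly, and applies both to a hypothetical list of ten p-values.

```python
def bonferroni(pvalues, alpha=0.05):
    """Reject H0 when p <= alpha / m."""
    m = len(pvalues)
    return [p <= alpha / m for p in pvalues]


def benjamini_hochberg(pvalues, alpha=0.05):
    """Benjamini-Hochberg step-up procedure controlling the FDR at level alpha."""
    m = len(pvalues)
    order = sorted(range(m), key=lambda i: pvalues[i])
    # Largest rank k with p_(k) <= k * alpha / m; all hypotheses up to rank k are rejected.
    cutoff = 0
    for rank, i in enumerate(order, start=1):
        if pvalues[i] <= rank * alpha / m:
            cutoff = rank
    rejected = [False] * m
    for i in order[:cutoff]:
        rejected[i] = True
    return rejected


p = [0.001, 0.008, 0.039, 0.041, 0.042, 0.060, 0.074, 0.205, 0.212, 0.216]
print(sum(bonferroni(p)), sum(benjamini_hochberg(p)))  # prints: 1 2
```

With these p-values, Bonferroni (threshold 0.005) rejects one hypothesis and Benjamini-Hochberg rejects two, illustrating why Bonferroni is the more conservative of the two.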

Because statistical interaction is generally interpreted as deviation from additivity, its presence depends on the scale that is used (e.g. the logit or the penetrance scale). The multiplicative model on the penetrance scale is often considered to represent the standard statistical interaction model; this model is additive on the log penetrance scale and, when penetrances are low, approximately additive on the log odds scale that is implicit in the standard logistic regression model commonly used to analyse case-control studies. However, statistical models other than the multiplicative one also meet the criterion of non-additivity. In complex diseases, the statistical interaction present in the collected population data is not known beforehand. It is therefore important to know whether the applied multi-locus methods can detect the interacting loci under different underlying interaction models. For this reason, we evaluated the performance of the above-mentioned multi-locus methods for a selection of two-locus statistical interaction models.
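
A small worked example shows why the presence of interaction depends on the scale. Assume a multiplicative penetrance model for two binary risk factors with baseline penetrance f0 and relative risks r1 and r2 (the values below are hypothetical). The interaction contrast s(f11) - s(f10) - s(f01) + s(f00) on a scale s is then zero on the log scale, non-zero on the penetrance scale, and small but non-zero on the logit scale, because logit(f) = log f - log(1 - f) is close to log f only when f is small.

```python
import math


def logit(p):
    return math.log(p / (1 - p))


def interaction_contrast(f00, f10, f01, f11, scale):
    """Departure from additivity of the two single-factor effects on a given scale."""
    return scale(f11) - scale(f10) - scale(f01) + scale(f00)


def multiplicative_penetrances(f0, r1, r2):
    """Penetrances for neither factor, factor 1 only, factor 2 only, and both."""
    return f0, f0 * r1, f0 * r2, f0 * r1 * r2


for f0 in (0.01, 0.1):
    cells = multiplicative_penetrances(f0, 2.0, 2.0)
    print(f0,
          interaction_contrast(*cells, math.log),   # log scale
          interaction_contrast(*cells, logit),      # logit scale
          interaction_contrast(*cells, lambda p: p))  # penetrance scale
```

On the log scale the contrast is zero up to rounding; on the logit scale it is roughly 0.01 at f0 = 0.01 and about 0.17 at f0 = 0.1; on the penetrance scale it equals f0.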

### Results

Figure 2 shows, for all scenarios and methods, the number of replicates that contained the two causal loci, either as two separate loci or via an interaction. For the logistic regression and Sumstat results we could indicate the number of replicates in which the loci were identified via an interaction term (black shading); these would have been the positive replicates had we selected on interaction terms only. For the genetic heterogeneity model, with two two-locus interactions acting to increase disease risk, we counted the number of replicates in which the four loci were retrieved, either by identifying the simulated interaction terms or the single locus effects (Figure 3). Because of the low penetrance and low frequency of the risk-increasing genotypes for Mod2f0.05, Mod3f0.1 and Mod3f0.05, the final simulated case-control datasets for these scenarios contained too few cases (<100) and were therefore dropped.

For the traditional logistic regression technique without correction for multiple testing, the interacting loci were identified in almost all replicates, even when the causal allele frequency dropped to 0.05. The detection rate was lower for the four-locus models with a frequency of 0.3 and 0.5. After application of the FDR procedure and the Bonferroni method, the number of replicates in which the causal loci were detected was somewhat lower in most models, but severely reduced in the models with an allele frequency of 0.5 and in the four-locus models; the Bonferroni correction was more conservative than the FDR method. In Mod4f0.5, Mod5f0.5 and Mod6f0.5, where no marginal effects were expected, the loci were found specifically through the interaction terms. Relying only on the interaction parameters generally resulted in lower detection of the causal loci, and failed especially in Mod2. It also performed worse for the 0.05 MAF models, where the interaction terms were dropped because of sparse data. Furthermore, in most scenarios the number of replicates with significant interaction parameters was lower for the FDR-corrected results and lower still for the Bonferroni-corrected results.

The number of false positives for the null model in the uncorrected analysis was highly inflated. There was a high false positive detection rate for the null model with a frequency of 0.1 that could not be mitigated by the multiple comparison procedures. Closer inspection of the results and analyses showed that the sparse number of observations in the genotype cells at this allele frequency led to unreliable parameter estimates and to many extremely low p-values that passed the significance criteria. In the 0.05 frequency model, by contrast, the parameters of the causal loci were often dropped and not tested because there were too few observations in the heterozygous, and especially the homozygous, mutant genotype groups. The interaction terms were likewise dropped, which explains their absence from the null model findings.
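
The sparseness at low allele frequencies follows from simple arithmetic. The Python sketch below rests on several assumptions that are not stated in this section: codominant coding in a saturated two-locus model (so each of the four interaction dummies is estimated from a single two-locus genotype cell), Hardy-Weinberg and linkage equilibrium, a hypothetical total of n = 1000 subjects, and a rule-of-thumb minimum of five expected subjects per cell. Enrichment of risk genotypes among cases is ignored.

```python
def genotype_frequencies(q):
    """Hardy-Weinberg frequencies of 0, 1 and 2 minor alleles for MAF q."""
    p = 1.0 - q
    return [p * p, 2 * p * q, q * q]


def expected_cell_counts(q1, q2, n):
    """Expected 3 x 3 two-locus genotype counts, assuming linkage equilibrium."""
    f1, f2 = genotype_frequencies(q1), genotype_frequencies(q2)
    return [[n * a * b for b in f2] for a in f1]


def sparse_interaction_cells(q1, q2, n, min_count=5):
    """Interaction cells (both loci non-reference) with too few expected subjects."""
    counts = expected_cell_counts(q1, q2, n)
    return [(i, j) for i in (1, 2) for j in (1, 2) if counts[i][j] < min_count]


for maf in (0.5, 0.3, 0.1, 0.05):
    print(maf, sparse_interaction_cells(maf, maf, n=1000))
```

Under these assumptions no interaction cell is sparse at MAF 0.5 or 0.3, whereas at MAF 0.1 and 0.05 every cell involving a homozygous minor genotype falls below five expected subjects; at MAF 0.05 the double homozygous cell is expected to contain about 0.006 subjects.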

In general, Sumstat performed very well for the 0.3, 0.1 and 0.05 MAF scenarios, but poorly for Mod1f0.5, Mod2f0.5 and the high-frequency Mod6. For the last model, in Figure 3 we counted replicates as positive only if the correct single loci and/or both simulated interaction terms were identified. In the lower frequency models, however, the four causal loci were identified via interaction terms between one locus from the first two-locus interaction and one locus from the second. Had we counted those replicates as positive too, the number of replicates in Figure 3 would have been 100 for the 0.1 and 0.05 frequencies, and would have increased to 22 and 32 for the 0.5 and 0.3 frequencies, respectively. For all frequencies, the Sumstat method retrieved the causal loci for Mod4 and Mod5 via the interaction terms. In contrast to the logistic regression approach, the causal loci for Mod2 were also detected via the interaction term, especially for the 0.1 allele frequency, whereas detection in Mod3 depended mainly on finding the separate single locus effects. The results for the two single causal loci (not shown) were expected to be symmetrical, but for some models they differed substantially, by up to 19 replicates for Mod1f0.5. For this genetic model, all methods gave asymmetrical results for the two causal loci; the first locus was retrieved more often than the second. The number of falsely detected loci in the null model was low relative to the other methods, even without selection on the basis of the empirical sum statistic p-values or the final global p-value. In ModNull, interaction terms were selected more often than single terms.

We also checked how the results would change if only replicates with a global p-value below 0.05 were considered. This reduced the number of false positives in the null models to zero. It had marginal consequences for Mod1f0.5, Mod2f0.5 and Mod4f0.5, where the number of replicates containing the true loci dropped by at most 7, but large consequences for Mod3f0.05 and Mod5f0.5/f0.3, where the number of replicates containing the causal loci fell to 16, 49 and 28, respectively. For the four-locus models, the numbers in Figure 3 for the f0.5 and f0.3 models changed to 4 and 14, respectively.

The results differed when, among sum statistics sharing the smallest p-value, the one with the smallest rather than the largest number of loci was chosen (data not shown). The method then performed much worse for Mod1, Mod2f0.3 and Mod3 in terms of elevated type I error, because the p-value often reached its minimum of zero after including only one of the two causal loci. The method was also unable to identify the four loci in Mod6f0.1 and Mod6f0.05.

The logic regression analysis retrieved the true causal loci in a high number of replicates, with the exception of the high-frequency multiplicative model (Mod1f0.5), and of Mod6, where it had difficulty finding all four loci. The amount of noise varied: Mod1 and Mod3 generally contained at least three terms related to the two causal loci; the exceptions were Mod1f0.5 and Mod2, which contained just two terms referring to the causal loci and two referring to non-causal loci. The lower frequency Mod5 often contained four causal loci terms, and all Mod4 models contained two to four causal terms out of the four leaves. In all scenarios one tree was constructed, often with the maximum number of leaves allowed. The gain from allowing six leaves instead of four was very limited and led to a higher inclusion of non-causal loci (results not shown). The smaller the model, that is, the lower the number of leaves allowed, the fewer the false positive findings. We observed a tendency towards fewer false positive findings for low-frequency SNPs, reflected in the decreasing values for ModNull from f0.5 to f0.05.

The high-frequency four-locus Mod6 contained a high number of non-causal loci and a varying number of true loci. The performance for the low-frequency scenarios was remarkably better. The difference between the analyses allowing 6 and 8 leaves was most pronounced for the 0.3 and 0.5 MAFs, where the analysis with 8 leaves gave more correct positive findings but also many more false positive results.

The MDR showed the same trends as logic regression: it retrieved the loci in all situations except for the high-frequency Mod6, and its performance was slightly impaired in Mod1f0.5. Because the model could contain only two loci, no non-causal loci were included in the models that correctly identified the loci. Models containing three instead of two loci added no value; the number of retrieved causal loci remained similarly high, but the number of false positives was substantially higher in the high-frequency models. Here too, we observed a bias towards high-frequency SNPs under the null model.

### Discussion

We performed a simulation study to identify strengths and weaknesses of four multi-locus methods used to identify interacting loci in several case-control scenarios. These methods can be applied in numerous ways. For pragmatic and computational reasons, we have not used the methods to their full capacity regarding model selection and model testing options. We can therefore only draw conclusions about the strengths and weaknesses we encountered in the ways we applied them to the simulated data. Because we did not fix the type I error in an equivalent way for all methods, fair direct comparisons of power between the methods are not possible, and we have focussed on identifying strengths and weaknesses for each method separately.

In applying logistic regression, we encountered the known problems of standard regression techniques, namely inflated numbers of false positive findings and diminished power caused by sparse data and multiple testing, despite there being only 10 loci in these datasets. This underscores the importance of a model testing procedure, such as permutation tests, even in studies with few genetic variants. Furthermore, extensions of traditional regression models that were developed specifically to deal with sparse data, such as penalized regression models, could be applied. We chose to apply a full model containing parameters for the two loci and their interaction terms. We observed that the usefulness of the interaction parameters in a fully saturated two-locus logistic regression model for identifying the causal loci depended on the underlying genetic model, the allele frequency, and the multiple comparison correction procedure. Interaction parameters particularly improved the identification of interacting loci under statistical interaction models that showed no expected marginal effects. North et al. (2005) evaluated how well logistic regression models with different parameters, including interaction parameters, fitted the data compared with a fully saturated model, and how well they corresponded to the true underlying penetrance-based models for a variety of two-locus disease models. They showed that for some models the inclusion of interaction parameters is advantageous, but that there is no direct correspondence between the interactive effects in the logistic regression models and the underlying penetrance-based models displaying some kind of epistasis (North et al. 2005). The latter was confirmed in our study.

One approach that we have not evaluated here, but that can be used to identify gene-gene interactions, is the case-only approach. It can efficiently identify deviation from a multiplicative model for the relative risks by testing the association between loci among the cases. This design can be more powerful than the traditional case-control analysis. However, it is restricted to identifying interactions; the main effects of the loci cannot be estimated and tested. Furthermore, this approach is sensitive to violation of its underlying assumption that the loci are independent in the general population (Albert et al. 2001); bias will be introduced if linkage disequilibrium exists between the loci of interest in the control population.
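
The logic of the case-only design can be checked with expected counts. The Python sketch assumes carrier (dominant) coding with hypothetical carrier frequencies, independence of the two loci in the population, and a log-linear penetrance f0 · rA^a · rB^b · ψ^(ab), where ψ = 1 is the multiplicative model. Under these assumptions the odds ratio between the loci among cases equals ψ, the departure from multiplicativity; a Wald test on its logarithm with Woolf's standard error is one simple way to test it.

```python
import math


def expected_case_counts(n_cases, p_a, p_b, f0, r_a, r_b, psi):
    """Expected cases by carrier status at loci A and B (1 = carrier)."""
    weights = {}
    for a in (0, 1):
        for b in (0, 1):
            # Independence of the loci in the population: joint = product.
            freq = (p_a if a else 1 - p_a) * (p_b if b else 1 - p_b)
            # psi = 1 is the multiplicative (no-departure) model.
            penetrance = f0 * r_a ** a * r_b ** b * psi ** (a * b)
            weights[a, b] = freq * penetrance
    total = sum(weights.values())
    return {cell: n_cases * w / total for cell, w in weights.items()}


def case_only_odds_ratio(t):
    """Odds ratio for the association between the two loci among cases."""
    return t[1, 1] * t[0, 0] / (t[1, 0] * t[0, 1])


def case_only_z(t):
    """Wald z statistic for log(OR) using Woolf's standard error."""
    se = math.sqrt(sum(1.0 / n for n in t.values()))
    return math.log(case_only_odds_ratio(t)) / se


table = expected_case_counts(500, p_a=0.3, p_b=0.2, f0=0.01, r_a=1.5, r_b=1.5, psi=2.0)
print(case_only_odds_ratio(table), case_only_z(table))
```

The population frequency terms cancel in the odds ratio only because the loci are assumed independent; with linkage disequilibrium between them, the case-only odds ratio would also reflect that dependence, which is the source of bias noted above.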

The Sumstat method had difficulty finding the causal loci in the high-frequency four-locus models, but its results for the other scenarios were good. We did not run the set-association method without the constructed two-locus interaction input variables, so we cannot assess the added value of using these as test statistics in addition to the single-locus statistics. We did see that whether the loci were identified through the interaction parameter or through the single-locus variables depended on the underlying model and the MAF, but the loci were consistently found via the interaction term in the ‘exclusive OR’ and ‘missing lethal genotype’ scenarios. The inability to deal with interacting loci that show no or weak main effects is an often mentioned disadvantage of the set association approach (Heidema et al. 2006). Heidema and colleagues state in their review that genetic interactions are only tested for the loci that are incorporated into the sum statistic. In our view, incorporating loci into the sum statistic does not by itself address interactions between these loci, since the separate test statistics are simply added; however, we saw that this limitation can be overcome by introducing the interaction test statistics before calculating the sum statistics. The current Sumstat version cannot handle high-order interactions; only two-locus interaction parameters can be constructed. This was not a problem in our simulations, but could be a shortcoming in real applications. Selecting results on the basis of the permutation-based global p-values reduced the number of false positives in the null models to zero. However, this came at the expense of a large loss of power to retrieve the causal loci for some models.

The results for logic regression and MDR were similar. Both performed well for all models, with the exception of the high-frequency heterogeneity models with two different two-locus interactions, and showed slightly impaired performance for the high-frequency models with relatively small effect sizes. One limitation of our procedure is that we did not apply model selection techniques other than limiting the number of genetic parameters in the analysis. Consequently, we could not evaluate to what extent our type II error was underestimated and our type I error overestimated because we did not select the optimal model size or use global p-value cut-off points to select the final model.

One often highlighted strength of the MDR is its high power to identify high-order interactions for loci without a main effect (Heidema et al. 2006). Our results confirm this for two-locus interactions. We also saw that this characteristic was not limited to the MDR method, as logic regression and Sumstat were also capable of identifying these loci. However, we have not compared their power at a fixed type I error.

The number of replicates containing the loci under the null model decreased when a simple model size limiting strategy was applied, by capping the number of leaves (logic regression) and the number of loci (MDR), respectively. Limiting the model size to the smallest possible size given the number of true causal loci was advantageous compared with a larger model. However, restricting the model size to fewer than the actual number of causal SNPs in the data obviously impaired the performance of the methods. The publicly available MDR software is currently limited to a maximum of 15 loci. This limit would be too small when a larger number of loci may be involved in the genetic aetiology.

Both methods tended to show fewer positive replicates under the null model as the frequency of the SNPs of interest decreased. This bias towards high-frequency SNPs may warrant follow-up study to investigate its causes and consequences more carefully.

To judge whether the loci included in a multi-locus combination act additively, users of logic regression can evaluate the number of trees and the way the loci are included in them, and users of MDR can evaluate the graphically displayed case-control ratios for all multi-locus genotype combinations. This can become a daunting task, especially for the MDR method, as the number of loci increases, and also when the number of datasets is large, as in our case. We therefore did not explore this option in the current study, but we did find that in logic regression all loci were consistently combined in one tree, pointing towards a non-additive association between the loci and case-control status in the logistic regression model.

Finding similar results for the MDR method and logic regression was not surprising. The similarities in methodology between the MDR and an extension of a recursive partitioning technique related to the logic regression approach were discussed by Bastone et al. (2004); the MDR method can be viewed as a special form of recursive partitioning. One major distinction between MDR and logic regression is that in MDR tree growth is restricted to a single split in one tree. Based on this, one might expect logic regression to perform better under genetic heterogeneity because of its ability to create multiple splits and more than one tree. Heidema et al. (2006) also state that recursive partitioning techniques, of which logic regression is one example, should be able to detect genetic heterogeneity. This was not confirmed in our study, where logic regression had difficulty identifying the causal loci for some four-locus models. The two-locus heterogeneity-based model was handled well by both logic regression and the MDR method. Ritchie et al. (2003) identified the impaired performance of the MDR method under genetic heterogeneity. In their simulations, the power of the MDR for a model comparable to our four-locus model was lower, probably reflecting their selection of the output on statistical significance. The authors propose using cluster analysis or recursive partitioning techniques to identify clusters of individuals with similar genetic backgrounds before performing the MDR, to mitigate the low power under genetic heterogeneity, and state that more research into these methods is needed.

We applied the methods to a large number of different statistical interaction models for a variety of allele frequencies, but our simulations were not exhaustive. The high-frequency models with two two-locus interactions turned out to be the most challenging. They were characterized not only by interacting risk factors and genetic heterogeneity, but also by small expected marginal effects. So, although the causal loci were retrievable under a statistical heterogeneity model alone or under a penetrance model with small marginal effects alone, power was diminished when these features were combined. Furthermore, we did not set out to explore the limits of these methods regarding model complexity, effect size, allele frequency, sample size, and their combinations; the authors of the methods have touched upon these aspects. Future research could explore the performance of the methods for more complex underlying models with several one-locus effects and high-order interactions, gene-environment interactions, and covariates.

We simulated data for a small number of SNPs in order to evaluate the performance of multi-locus methods in identifying interacting loci. With the increasing use of large candidate gene arrays and genome-wide SNP arrays, it will be important to evaluate the strengths and weaknesses of multi-locus methods for handling large amounts of genetic data in which linkage disequilibrium is likely to be present. The specific relative strengths and weaknesses of a range of methods, including logistic regression, the set association approach and the MDR, in handling large amounts of SNP data have recently been discussed by Heidema et al. (2006).

Ideally, one would limit the analysis of genetic association data to one or a few methods and apply them in a way that captures all the underlying signals from associated genes, whether acting singly or interactively, and gives insight into the complex aetiology of the disease or trait of interest. Since every method has its own strengths and weaknesses, and a diversity of approaches can exist within each method, a multi-analytic approach could help distinguish between true and false positive findings. It is, however, important to apply the methods optimally and to understand the limits of each method in order to interpret the results correctly. The results of this study can give researchers insight into how best to apply the discussed methods in practice, show where the methods perform similarly, and help in interpreting the results of the different methods.