SEARCH

SEARCH BY CITATION

Keywords:

  • false discovery rate;
  • gene-gene interactions;
  • Lasso;
  • model selection;
  • multiple testing;
  • regression model

Abstract

It is increasingly recognized that multiple genetic variants, within the same or different genes, combine to affect liability for many common diseases. Indeed, the variants may interact among themselves and with environmental factors. Thus realistic genetic/statistical models can include an extremely large number of parameters, and it is by no means obvious how to find the variants contributing to liability. For models of multiple candidate genes and their interactions, we prove that statistical inference can be based on controlling the false discovery rate (FDR), which is defined as the expected number of false rejections divided by the number of rejections. Controlling the FDR automatically controls the overall error rate in the special case that all the null hypotheses are true. So do more standard methods such as Bonferroni correction. However, when some null hypotheses are false, the goals of Bonferroni and FDR differ, and FDR will have better power. Model selection procedures, such as forward stepwise regression, are often used to choose important predictors for complex models. By analysis of simulations of such models, we compare a computationally efficient form of forward stepwise regression against the FDR methods. We show that model selection includes numerous genetic variants having no impact on the trait, whereas FDR maintains a false-positive rate very close to the nominal rate. With good control over false positives and better power than Bonferroni, the FDR-based methods we introduce present a viable means of evaluating complex, multivariate genetic models. Naturally, as for any method seeking to explore complex genetic models, the power of the methods is limited by sample size and model complexity. Genet Epidemiol 25:36–47, 2003. © 2003 Wiley-Liss, Inc.