A computationally efficient hypothesis testing method for epistasis analysis using multifactor dimensionality reduction

Authors

  • Kristine A. Pattin,

    1. Computational Genetics Laboratory, Department of Genetics, Dartmouth Medical School, Lebanon, New Hampshire
  • Bill C. White,

    1. Computational Genetics Laboratory, Department of Genetics, Dartmouth Medical School, Lebanon, New Hampshire
  • Nate Barney,

    1. Computational Genetics Laboratory, Department of Genetics, Dartmouth Medical School, Lebanon, New Hampshire
  • Jiang Gui,

    1. Computational Genetics Laboratory, Department of Genetics, Dartmouth Medical School, Lebanon, New Hampshire
    2. Department of Community and Family Medicine, Dartmouth Medical School, Lebanon, New Hampshire
  • Heather H. Nelson,

    1. Department of Environmental Health, Harvard School of Public Health, Boston, Massachusetts
  • Karl T. Kelsey,

    1. Department of Community Health, Brown University, Providence, Rhode Island
  • Angeline S. Andrew,

    1. Department of Community and Family Medicine, Dartmouth Medical School, Lebanon, New Hampshire
    2. Norris-Cotton Cancer Center, Dartmouth Medical School, Lebanon, New Hampshire
  • Margaret R. Karagas,

    1. Department of Community and Family Medicine, Dartmouth Medical School, Lebanon, New Hampshire
    2. Norris-Cotton Cancer Center, Dartmouth Medical School, Lebanon, New Hampshire
  • Jason H. Moore

    Corresponding author
    1. Computational Genetics Laboratory, Department of Genetics, Dartmouth Medical School, Lebanon, New Hampshire
    2. Department of Community and Family Medicine, Dartmouth Medical School, Lebanon, New Hampshire
    3. Norris-Cotton Cancer Center, Dartmouth Medical School, Lebanon, New Hampshire
    4. Department of Computer Science, University of New Hampshire, Durham, New Hampshire
    5. Department of Computer Science, University of Vermont, Burlington, Vermont
    6. Translational Genomics Research Institute, Phoenix, Arizona
    • HB 7937, One Medical Center Drive, Dartmouth-Hitchcock Medical Center, Lebanon, NH 03756

Abstract

Multifactor dimensionality reduction (MDR) was developed as a nonparametric and model-free data mining method for detecting, characterizing, and interpreting epistasis in the absence of significant main effects in genetic and epidemiologic studies of complex traits such as disease susceptibility. The goal of MDR is to change the representation of the data using a constructive induction algorithm so that nonadditive interactions are easier to detect with any classification method, such as naïve Bayes or logistic regression. Traditionally, MDR-constructed variables have been evaluated with a naïve Bayes classifier combined with 10-fold cross-validation to estimate the predictive accuracy, or generalizability, of epistasis models, and permutation testing has been used to assess the statistical significance of the models obtained. The advantage of permutation testing is that it controls for false positives due to multiple testing. The disadvantage is that it is computationally expensive, an important issue when detecting epistasis on a genome-wide scale. The goal of the present study was to develop and evaluate several alternatives to large-scale permutation testing for assessing the statistical significance of MDR models. Using data simulated from 70 different epistasis models, we compared the power and type I error rate of MDR using a 1,000-fold permutation test with hypothesis testing using an extreme value distribution (EVD). We find that this new hypothesis testing method provides a reasonable alternative to the computationally expensive 1,000-fold permutation test and is 50 times faster. We then demonstrate this new method by applying it to a genetic epidemiology study of bladder cancer susceptibility that was previously analyzed using MDR and assessed using a 1,000-fold permutation test. Genet. Epidemiol. 2008. © 2008 Wiley-Liss, Inc.
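The contrast between the two significance tests described in the abstract can be illustrated with a minimal Python sketch. It is not the authors' implementation. The function `best_null_accuracy` is a hypothetical stand-in for running MDR on one permuted data set, the EVD is assumed to be a Gumbel distribution fitted by the method of moments, and the small permutation count (20), the toy problem sizes, and the observed accuracy are all illustrative assumptions rather than values taken from the study.

```python
import math
import random
import statistics

EULER_GAMMA = 0.5772156649015329  # Euler-Mascheroni constant


def best_null_accuracy(rng, n_models, n_samples):
    """Toy stand-in for one permuted MDR run (hypothetical, not MDR itself).

    Each candidate model classifies n_samples subjects at chance (p = 0.5);
    the best accuracy across all candidate models is returned.
    """
    best = 0.0
    for _ in range(n_models):
        # The count of set bits in n_samples random bits is Binomial(n_samples, 0.5).
        correct = bin(rng.getrandbits(n_samples)).count("1")
        best = max(best, correct / n_samples)
    return best


def empirical_p_value(observed, null_stats):
    """Permutation p-value with the usual +1 correction."""
    exceed = sum(1 for s in null_stats if s >= observed)
    return (exceed + 1) / (len(null_stats) + 1)


def fit_gumbel_moments(null_stats):
    """Method-of-moments fit of a Gumbel (type I extreme value) distribution."""
    sd = statistics.stdev(null_stats)
    if sd == 0:
        raise ValueError("null statistics are constant; cannot fit a scale")
    scale = sd * math.sqrt(6) / math.pi
    loc = statistics.fmean(null_stats) - EULER_GAMMA * scale
    return loc, scale


def gumbel_p_value(observed, loc, scale):
    """Upper-tail probability P(X >= observed) under the fitted Gumbel."""
    z = (observed - loc) / scale
    if z < -30.0:  # exp(-z) is huge, so the tail probability is 1 to machine precision
        return 1.0
    return -math.expm1(-math.exp(-z))


if __name__ == "__main__":
    rng = random.Random(42)
    n_models, n_samples = 45, 400  # assumed toy sizes
    observed = 0.60                # assumed accuracy of the best real model

    # Large-scale permutation test: 1,000 null replicates.
    full_null = [best_null_accuracy(rng, n_models, n_samples) for _ in range(1000)]
    # EVD approach: far fewer replicates, then a parametric tail estimate.
    small_null = [best_null_accuracy(rng, n_models, n_samples) for _ in range(20)]

    loc, scale = fit_gumbel_moments(small_null)
    print("1,000-permutation p-value:", empirical_p_value(observed, full_null))
    print("EVD p-value from 20 permutations:", gumbel_p_value(observed, loc, scale))
```

The method-of-moments fit is used here only to keep the sketch dependency-free; a maximum-likelihood fit from a statistics library would be a reasonable alternative. Because the fitted distribution extrapolates into the tail, the EVD p-value is not bounded below by 1/(number of permutations + 1), which is what allows a small permutation count to stand in for a large one.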
