Importance Measures for Epistatic Interactions in Case-Parent Trios

Authors


Corresponding author: Ingo Ruczinski, Department of Biostatistics, Johns Hopkins Bloomberg School of Public Health, 615 N Wolfe Street, Baltimore, MD 21218, USA. Tel: +1 410 614 7840; Fax: +1 410 955 0958; E-mail: ingo@jhu.edu

Summary

Ensemble methods (such as Bagging and Random Forests) take advantage of unstable base learners (such as decision trees) to improve predictions, and offer measures of variable importance useful for variable selection. LogicFS has been proposed as such an ensemble learner for case-control studies when interactions of single nucleotide polymorphisms (SNPs) are of particular interest. LogicFS uses bootstrap samples of the data and employs the Boolean trees derived in logic regression as base learners to create ensembles of models that allow for the quantification of the contributions of epistatic interactions to the disease risk. In this article, we propose an extension of logicFS suitable for case-parent trio data, and derive an additional importance measure that is much less influenced by linkage disequilibrium between SNPs than the measure originally used in logicFS. We illustrate the performance of the novel procedure in simulation studies and in a case study of 461 case-parent trios with autistic children.
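
The bootstrap-and-importance idea summarized above can be illustrated with a minimal sketch. It is not the logicFS implementation and it does not handle case-parent trios: it assumes simulated case-control data, and it replaces the logic regression base learner with a hypothetical stand-in (`fit_base_learner`) that simply picks the best-fitting two-SNP conjunction. All function names and simulation settings are assumptions made for illustration. The importance of an interaction is taken here as the average, over bootstrap samples, of the number of out-of-bag observations classified correctly with the interaction minus the number classified correctly once it is removed from the model.

```python
import random
from itertools import combinations


def make_data(n, n_snps, rng):
    """Simulate binary SNP indicators; risk is high only when SNP 0 AND SNP 1 are both set."""
    X, y = [], []
    for _ in range(n):
        row = [rng.random() < 0.5 for _ in range(n_snps)]
        risk = 0.9 if (row[0] and row[1]) else 0.1
        X.append(row)
        y.append(rng.random() < risk)
    return X, y


def n_correct(pair, X, y, idx):
    """Count observations in idx whose status equals the conjunction of the two SNPs."""
    i, j = pair
    return sum((X[k][i] and X[k][j]) == y[k] for k in idx)


def fit_base_learner(X, y, idx):
    """Hypothetical stand-in for a logic tree: the best-fitting two-SNP conjunction."""
    pairs = combinations(range(len(X[0])), 2)
    return max(pairs, key=lambda pair: n_correct(pair, X, y, idx))


def interaction_importance(X, y, n_boot, rng):
    """Average out-of-bag gain in correct classifications per selected interaction."""
    n = len(X)
    totals = {}
    for _ in range(n_boot):
        in_bag = [rng.randrange(n) for _ in range(n)]  # bootstrap sample
        oob = sorted(set(range(n)) - set(in_bag))      # out-of-bag observations
        pair = fit_base_learner(X, y, in_bag)
        with_term = n_correct(pair, X, y, oob)
        # Removing the only term leaves an empty model that predicts "control" for everyone.
        without_term = sum(not y[k] for k in oob)
        totals[pair] = totals.get(pair, 0) + (with_term - without_term)
    return {pair: total / n_boot for pair, total in totals.items()}


if __name__ == "__main__":
    rng = random.Random(1)
    X, y = make_data(400, 5, rng)
    scores = interaction_importance(X, y, 25, rng)
    for pair, score in sorted(scores.items(), key=lambda kv: -kv[1]):
        print(pair, round(score, 1))
```

In this toy setting the simulated interaction between SNP 0 and SNP 1 should receive the largest score. Interactions never selected in any bootstrap sample are omitted from the result, which amounts to an importance of zero.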
