Boosting for detection of gene–environment interactions


H. Pashova, Department of Biostatistics, University of Washington, F-600 Health Sciences Building, Campus Mail Stop 357232, Seattle, Washington 98195, U.S.A.



In genetic association studies, it is typically thought that genetic variants and environmental variables jointly will explain more of the inheritance of a phenotype than either of these two components separately. Traditional methods to identify gene–environment interactions typically consider only one measured environmental variable at a time. However, in practice, multiple environmental factors may each be imprecise surrogates for the underlying physiological process that actually interacts with the genetic factors. In this paper, we develop a variant of L2 boosting that is specifically designed to identify combinations of environmental variables that jointly modify the effect of a gene on a phenotype. Because the effect modifiers might have a small signal compared with the main effects, working in a space that is orthogonal to the main predictors allows us to focus on the interaction space. In a simulation study that investigates some plausible underlying model assumptions, our method outperforms the least absolute shrinkage and selection and Akaike Information Criterion and Bayesian Information Criterion model selection procedures as having the lowest test error. In an example for the Women's Health Initiative-Population Architecture using Genomics and Epidemiology study, the dedicated boosting method was able to pick out two single-nucleotide polymorphisms for which effect modification appears present. The performance was evaluated on an independent test set, and the results are promising. Copyright © 2012 John Wiley & Sons, Ltd.