Identification of association between disease and multiple markers via sparse partial least-squares regression
Article first published online: 15 JUN 2011
© 2011 Wiley-Liss, Inc.
Volume 35, Issue 6, pages 479–486, September 2011
How to Cite
Chun, H., Ballard, D. H., Cho, J. and Zhao, H. (2011), Identification of association between disease and multiple markers via sparse partial least-squares regression. Genet. Epidemiol., 35: 479–486. doi: 10.1002/gepi.20596
- Issue published online: 21 AUG 2011
- Manuscript Accepted: 19 APR 2011
- Manuscript Revised: 29 MAR 2011
- Manuscript Received: 11 NOV 2010
- NIH. Grant Numbers: GM50597, U01 DK062422, 1R01DK072373, UL1 RR024139
- NSF. Grant Number: DMS-0714817.
- multi-marker association study
- Crohn's disease
Although genome-wide association studies have led to the identification of hundreds of genes underlying dozens of traits in recent years, most published studies have primarily used single marker-based analysis. Intuitively, more information may be utilized when multiple markers are jointly analyzed. Therefore, many methods have been proposed in the literature for association analysis between traits and multiple markers. Among these methods, simulation and real data analyses have shown that it is often more effective to first reduce the dimensionality of the markers in a region through principal components analysis of all the markers, and then to perform association analysis between traits and those principal components that account for most of the genetic variation in the region. However, one major limitation of this approach is that the principal components are derived purely from marker genotypes, without consideration of their relevance to traits. Furthermore, these components are constructed as linear combinations of all the markers, even when only a limited number are potentially relevant to traits. In this manuscript, we propose the use of sparse partial least-squares regression to derive components that are linear combinations of only the relevant markers. This approach is able to use information from both traits and marker genotypes. Extensive simulations and real data analyses on a Crohn's disease data set suggest the superiority of this approach over existing methods.
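The contrast drawn in the abstract can be illustrated with a minimal Python sketch. It is not the authors' implementation. It assumes genotypes coded 0/1/2, independent markers (no linkage disequilibrium), and a simplified first sparse PLS direction obtained by soft-thresholding the marker-trait covariance vector. The function names, the sparsity parameter `eta`, and the simulated data are all hypothetical and serve only as illustration.

```python
import numpy as np

def first_pc_direction(X):
    """Leading principal-component loadings of the centered markers.
    The trait is not used, and every marker generally gets a nonzero weight."""
    Xc = X - X.mean(axis=0)
    _, _, vt = np.linalg.svd(Xc, full_matrices=False)
    return vt[0]

def sparse_pls_direction(X, y, eta=0.5):
    """Simplified first sparse PLS direction (an assumption of this sketch):
    soft-threshold X^T y at eta * max|X^T y|, then scale to unit length.
    eta in [0, 1) is a hypothetical tuning parameter; larger means sparser."""
    Xc = X - X.mean(axis=0)
    yc = y - y.mean()
    z = Xc.T @ yc                      # marker-trait covariance (up to scale)
    thresh = eta * np.max(np.abs(z))
    w = np.sign(z) * np.maximum(np.abs(z) - thresh, 0.0)
    norm = np.linalg.norm(w)
    return w / norm if norm > 0 else w

# Simulated region: 20 markers, of which only the first two affect disease status.
rng = np.random.default_rng(0)
n, p = 2000, 20
X = rng.binomial(2, 0.3, size=(n, p)).astype(float)   # genotypes coded 0/1/2
logit = -1.0 + 1.0 * X[:, 0] + 1.0 * X[:, 1]
y = rng.binomial(1, 1.0 / (1.0 + np.exp(-logit))).astype(float)

w_pca = first_pc_direction(X)
w_spls = sparse_pls_direction(X, y, eta=0.5)

# Component scores are linear combinations of the centered markers.
Xc = X - X.mean(axis=0)
t_pca, t_spls = Xc @ w_pca, Xc @ w_spls

print("markers with nonzero weight (PCA): ", np.count_nonzero(w_pca))
print("markers with nonzero weight (SPLS):", np.count_nonzero(w_spls))
print("|corr(component, trait)| PCA: ", abs(np.corrcoef(t_pca, y)[0, 1]))
print("|corr(component, trait)| SPLS:", abs(np.corrcoef(t_spls, y)[0, 1]))
```

In this sketch the principal component is computed from the genotypes alone, whereas the thresholded direction uses the trait to decide which markers enter the component. A full sparse partial least-squares fit would typically extract further components and choose the sparsity level by cross-validation, neither of which is shown here.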