A Partially Linear Tree-based Regression Model for Multivariate Outcomes
Article first published online: 10 MAY 2009
© 2009, The International Biometric Society
Volume 66, Issue 1, pages 89–96, March 2010
How to Cite
Yu, K., Wheeler, W., Li, Q., Bergen, A. W., Caporaso, N., Chatterjee, N. and Chen, J. (2010), A Partially Linear Tree-based Regression Model for Multivariate Outcomes. Biometrics, 66: 89–96. doi: 10.1111/j.1541-0420.2009.01235.x
- Issue published online: 17 MAR 2010
- Received May 2008. Revised December 2008. Accepted December 2008.
Keywords
- Generalized estimating equation
- Genetic association study
- Model selection
- Multiple-comparison adjustment
- Tree-based model
Summary
In the genetic study of complex traits, especially behavior-related ones such as smoking and alcoholism, several phenotypic measurements are usually obtained to describe the trait. Because the underlying etiology is poorly understood, however, no single measurement can fully quantify the complicated characteristics of the symptom. If these phenotypes share a common genetic mechanism, it is more advantageous to analyze them jointly as a multivariate trait, rather than studying each phenotype separately, to enhance the power to identify associated genes. We propose a multilocus association test for multivariate traits, derived from a partially linear tree-based regression model for multiple outcomes. This novel tree-based model provides a formal statistical testing framework for evaluating the association between a multivariate outcome and a set of candidate predictors, such as markers within a gene or pathway, while allowing adjustment for other covariates. Through simulation studies we show that the proposed method has an acceptable type I error rate and improved power over univariate outcome analysis, which studies each component of the complex trait separately with multiple-comparison adjustment. A candidate gene association study of multiple smoking-related phenotypes demonstrates the application and advantages of the new method. The proposed method is general enough to assess the joint effect of a set of multiple risk factors on a multivariate outcome in other biomedical research settings.
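The Summary contrasts a joint test of a multivariate trait with univariate analysis, in which each phenotype is tested separately and the results are corrected for multiple comparisons. The minimal sketch below illustrates only that contrast, for a single predictor and no covariates; it is not the partially linear tree-based model or the test proposed in the paper. Every choice here is an assumption made for illustration: Pearson correlation with a normal approximation as the per-outcome test, Bonferroni as the multiple-comparison adjustment, n times the sum of squared correlations as the joint statistic, a permutation null, and the hypothetical function names `pearson_r`, `univariate_bonferroni`, `joint_stat`, and `joint_permutation_test`.

```python
# Illustrative sketch only: the test statistics and adjustments below are assumed
# generic choices, NOT the authors' partially linear tree-based method.
import math
import random


def pearson_r(x, y):
    """Sample Pearson correlation between two equal-length sequences."""
    n = len(x)
    mx = sum(x) / n
    my = sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def univariate_bonferroni(g, outcomes):
    """Test g against each outcome separately; return (Bonferroni-adjusted min p, per-outcome p).

    Assumes the large-sample approximation sqrt(n) * r ~ N(0, 1) under the null.
    """
    n = len(g)
    pvals = []
    for y in outcomes:
        z = math.sqrt(n) * pearson_r(g, y)
        pvals.append(math.erfc(abs(z) / math.sqrt(2)))  # two-sided normal p-value
    return min(1.0, len(outcomes) * min(pvals)), pvals


def joint_stat(g, outcomes):
    """Assumed joint statistic: n times the sum of squared correlations over all K outcomes."""
    n = len(g)
    return n * sum(pearson_r(g, y) ** 2 for y in outcomes)


def joint_permutation_test(g, outcomes, n_perm=999, seed=1):
    """Permutation p-value for the joint statistic.

    Permuting g (not the outcomes) keeps each subject's outcome vector intact,
    so the correlation among the K outcomes is preserved under the null.
    """
    rng = random.Random(seed)
    observed = joint_stat(g, outcomes)
    g_perm = list(g)  # copy so the caller's predictor is not shuffled
    exceed = 0
    for _ in range(n_perm):
        rng.shuffle(g_perm)
        if joint_stat(g_perm, outcomes) >= observed:
            exceed += 1
    return observed, (exceed + 1) / (n_perm + 1)


if __name__ == "__main__":
    rng = random.Random(2024)
    n, k = 200, 3
    g = [rng.choice([0, 1, 2]) for _ in range(n)]  # hypothetical additive genotype coding
    shared = [rng.gauss(0, 1) for _ in range(n)]   # latent factor that correlates the outcomes
    outcomes = [
        [0.15 * gi + 0.5 * s + rng.gauss(0, 1) for gi, s in zip(g, shared)]
        for _ in range(k)
    ]
    p_bonf, p_each = univariate_bonferroni(g, outcomes)
    stat, p_joint = joint_permutation_test(g, outcomes)
    print("per-outcome p:", [round(p, 4) for p in p_each])
    print("Bonferroni-adjusted min p:", round(p_bonf, 4))
    print("joint permutation p:", round(p_joint, 4))
```

Permuting the predictor rather than the outcomes is what keeps the joint p-value valid when the phenotypes are correlated, since the unweighted sum of squares does not itself account for that correlation; a Wald- or Hotelling-type statistic weighted by the inverse outcome covariance is a common alternative. The simulated effect sizes are arbitrary and imply nothing about the paper's results.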