A Composite Likelihood Approach to Latent Multivariate Gaussian Modeling of SNP Data with Application to Genetic Association Testing
Article first published online: 12 AUG 2011
© 2011, The International Biometric Society
Volume 68, Issue 1, pages 307–315, March 2012
How to Cite
Han, F. and Pan, W. (2012), A Composite Likelihood Approach to Latent Multivariate Gaussian Modeling of SNP Data with Application to Genetic Association Testing. Biometrics, 68: 307–315. doi: 10.1111/j.1541-0420.2011.01649.x
- Issue published online: 23 MAR 2012
- Received July 2010. Revised June 2011. Accepted June 2011.
Keywords
- Genome-wide association study;
- Latent model;
- Logistic regression;
- Multimarker analysis;
- Multivariate discrete distribution
Summary
Many statistical tests have been proposed for case–control data to detect disease association with multiple single nucleotide polymorphisms (SNPs) in linkage disequilibrium. The main reason for the existence of so many tests is that each test aims to detect one or two aspects of many possible distributional differences between cases and controls, largely due to the lack of a general yet simple model for discrete genotype data. Here we propose a latent variable model to represent SNP data: the observed SNP data are assumed to be obtained by discretizing a latent multivariate Gaussian variate. Because the latent variate is multivariate Gaussian, its distribution is completely characterized by its mean vector and covariance matrix, in contrast to the much more complex forms of a general distribution for discrete multivariate SNP data. We propose a composite likelihood approach for parameter estimation. A direct application of this latent variable model is association testing with multiple SNPs in a candidate gene or region. In contrast to many existing tests that aim to detect only one or two aspects of many possible distributional differences of discrete SNP data, we can focus exclusively on testing the mean and covariance parameters of the latent Gaussian distributions for cases and controls. Our simulation results demonstrate potential power gains of the proposed approach over some existing methods.
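The latent-variable idea in the summary can be illustrated with a small simulation. The following is a minimal sketch only, not the authors' implementation: it assumes each SNP genotype (0, 1, or 2) is obtained by cutting the corresponding latent Gaussian coordinate at two thresholds shared by all SNPs. The function name `simulate_snps`, the threshold values, the covariance matrix, and the mean shift for cases are all hypothetical choices made for illustration; the summary does not specify them.

```python
import numpy as np

def simulate_snps(n, mean, cov, thresholds, seed=0):
    """Draw n latent Gaussian vectors and discretize each coordinate
    into a genotype coded 0, 1, or 2 (illustrative assumption)."""
    rng = np.random.default_rng(seed)
    latent = rng.multivariate_normal(mean, cov, size=n)  # shape (n, p)
    lo, hi = thresholds
    # Genotype = number of thresholds the latent value exceeds.
    genotypes = (latent > lo).astype(int) + (latent > hi).astype(int)
    return latent, genotypes

# Hypothetical setup: 3 SNPs in linkage disequilibrium, modeled by a
# latent covariance with nonzero off-diagonal entries.
p = 3
cov = np.array([[1.0, 0.6, 0.3],
                [0.6, 1.0, 0.6],
                [0.3, 0.6, 1.0]])

# Controls and cases differ only in the latent mean vector here.
_, controls = simulate_snps(500, np.zeros(p), cov, thresholds=(0.0, 1.0), seed=0)
_, cases = simulate_snps(500, np.full(p, 0.3), cov, thresholds=(0.0, 1.0), seed=1)

print("mean genotype per SNP, controls:", controls.mean(axis=0))
print("mean genotype per SNP, cases:   ", cases.mean(axis=0))
```

Under this sketch, any difference between the case and control genotype distributions is driven entirely by the latent mean vector and covariance matrix, which is why testing can concentrate on those two sets of parameters.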