Control of Population Stratification by Correlation-Selected Principal Components
Article first published online: 6 DEC 2010
© 2010, The International Biometric Society
Volume 67, Issue 3, pages 967–974, September 2011
How to Cite
Lee, S., Wright, F. A. and Zou, F. (2011), Control of Population Stratification by Correlation-Selected Principal Components. Biometrics, 67: 967–974. doi: 10.1111/j.1541-0420.2010.01520.x
- Issue published online: 14 SEP 2011
- Received June 2009. Revised July 2010. Accepted September 2010.
Keywords: Genomic control; Population stratification
Summary
In genome-wide association studies, population stratification is recognized as a source of inflated type I error, because it inflates association test statistics. Principal component-based methods applied to genotypes provide information about population structure and have been widely used to control for stratification. Here we explore the precise relationship between genotype principal components and the inflation of association test statistics, thereby drawing a connection between principal component-based stratification control and the alternative approach of genomic control. Our results provide an inherent justification for the use of principal components, but call into question the popular practice of selecting principal components based on the significance of their eigenvalues alone. We propose a new approach, called EigenCorr, which selects principal components based on both their eigenvalues and their correlation with the (disease) phenotype. Our approach tends to select fewer principal components for stratification control than does testing of eigenvalues alone, providing substantial computational savings and improvements in power. Analyses of simulated and real data demonstrate the usefulness of the proposed approach.
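The summary does not state the EigenCorr formulas, so the sketch below only illustrates why eigenvalues and phenotype correlations both matter. It assumes standardized genotypes (an N-by-M matrix X, each column with mean 0 and mean square 1), a standardized phenotype y, and a correlation-based per-SNP statistic T_j = N r_j^2. Under those assumptions a simple algebraic identity holds: mean_j T_j = (1/M) * sum_k lambda_k * rho_k^2, where lambda_k are the eigenvalues of X X^T and rho_k is the correlation of y with the k-th eigenvector. This identity is derived here for illustration and is not quoted from the paper; the function names, the simulated two-population data, and the use of 0.4549 as the chi-square(1) median for a genomic-control factor are assumptions of this sketch.

```python
import numpy as np

# Median of a chi-square distribution with 1 degree of freedom (about 0.4549),
# the usual reference value for a genomic-control inflation factor.
CHI2_1DF_MEDIAN = 0.4549


def standardize(a):
    """Center columns and scale them so each has mean square 1 (sum of squares = n)."""
    a = np.asarray(a, dtype=float)
    a = a - a.mean(axis=0)
    return a / np.sqrt((a ** 2).mean(axis=0))


def trend_statistics(x, y):
    """Hypothetical per-SNP statistic T_j = n * r_j**2 (r_j: genotype-phenotype correlation).

    x: standardized n-by-m genotype matrix; y: standardized length-n phenotype.
    """
    n = x.shape[0]
    r = x.T @ y / n  # Pearson correlation, since both inputs are standardized
    return n * r ** 2


def pc_contributions(x, y):
    """Split mean(T_j) into per-component terms lambda_k * rho_k**2 / m.

    lambda_k: eigenvalues of x @ x.T in descending order.
    rho_k: correlation of y with the k-th unit-norm eigenvector.
    """
    n, m = x.shape
    lam, u = np.linalg.eigh(x @ x.T)
    order = np.argsort(lam)[::-1]
    lam, u = lam[order], u[:, order]
    rho = u.T @ y / np.sqrt(n)  # eigenvectors with nonzero eigenvalue have mean 0
    return lam, rho, lam * rho ** 2 / m


# Hypothetical simulation: two subpopulations whose allele frequencies differ,
# and a binary phenotype whose case rate depends on population only (no causal SNP).
rng = np.random.default_rng(0)
n_per_pop, m = 100, 500
base = rng.uniform(0.1, 0.9, size=m)
shift = rng.normal(0.0, 0.05, size=m)
freqs = [np.clip(base + shift, 0.05, 0.95), np.clip(base - shift, 0.05, 0.95)]
geno = np.vstack([rng.binomial(2, f, size=(n_per_pop, m)) for f in freqs])
pop = np.repeat([0, 1], n_per_pop)
y_raw = rng.binomial(1, np.where(pop == 1, 0.7, 0.3))

keep = geno.std(axis=0) > 0  # drop monomorphic SNPs before standardizing
x = standardize(geno[:, keep])
y = standardize(y_raw)

t = trend_statistics(x, y)
lam, rho, contrib = pc_contributions(x, y)
print("mean T_j:", t.mean(), " sum of PC contributions:", contrib.sum())
print("genomic-control lambda:", np.median(t) / CHI2_1DF_MEDIAN)
print("top PCs by eigenvalue:    ", np.arange(5))
print("top PCs by contribution:  ", np.argsort(contrib)[::-1][:5])
```

In this decomposition a component with a large eigenvalue but near-zero correlation with the phenotype adds almost nothing to the average statistic, while a smaller-eigenvalue component that tracks the phenotype can add a lot. That is the intuition behind ranking components by both quantities rather than by eigenvalue significance alone; the actual EigenCorr selection rule should be taken from the paper itself.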