These authors jointly directed the work.
Marbled Inflation From Population Structure in Gene-Based Association Studies With Rare Variants
Article first published online: 6 MAR 2013
© 2013 Wiley Periodicals, Inc.
Volume 37, Issue 3, pages 286–292, April 2013
How to Cite
Liu, Q., Nicolae, D. L. and Chen, L. S. (2013), Marbled Inflation From Population Structure in Gene-Based Association Studies With Rare Variants. Genet. Epidemiol., 37: 286–292. doi: 10.1002/gepi.21714
- Issue published online: 25 MAR 2013
- Manuscript Accepted: 5 FEB 2013
- Manuscript Revised: 18 JAN 2013
- Manuscript Received: 29 NOV 2012
- NIH. Grant Numbers: HL087665, MH090937, HG005773
Keywords:
- sequencing studies
- gene-based association test
- genomic control
- principal component analysis
- C-alpha test
- burden test
Accurate genetic association studies are crucial for detecting and validating disease determinants. One of the main confounding factors that affect accuracy is population stratification, and great effort has been expended over the past decade to detect and adjust for it. We now have efficient solutions for population stratification adjustment in single-SNP (single-nucleotide polymorphism) inference in genome-wide association studies, but it is unclear whether these solutions can be applied effectively to rare variant studies, and in particular to gene-based (or set-based) association methods that jointly analyze multiple rare and common variants. Here we examine, both theoretically and empirically, the performance of two commonly used approaches for population stratification adjustment, genomic control and principal component analysis, when applied to gene-based association tests. We show that, unlike in single-SNP inference, genes with different compositions of rare and common variants may suffer from population stratification to varying extents. The inflation in gene-level statistics can depend on the number of SNPs in the gene, on their allele frequency spectrum, and on the gene-based testing method used in the analysis. Consequently, using a single, universal inflation factor for genomic control should be avoided in gene-based inference with sequencing data. We also demonstrate that caution is needed when using principal component adjustment, because the accuracy of the adjusted analyses depends on the underlying population substructure, on how the principal components are constructed, and on the number of principal components used to recover the substructure.
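For reference, the standard single-SNP form of genomic control estimates one inflation factor, lambda, as the median of the observed 1-df chi-square statistics divided by the median of the chi-square(1) null distribution (about 0.4549), and then divides every statistic by lambda. The following is a minimal sketch of that computation, not the authors' code. It assumes the inputs are 1-df chi-square statistics. The function names are hypothetical, and truncating lambda at 1 is a common convention rather than something stated in this abstract. Gene-based statistics such as burden or C-alpha tests need not follow a chi-square(1) null, so the reference median used here would not generally apply to them.

```python
from statistics import NormalDist, median

# Median of a chi-square distribution with 1 degree of freedom:
# if Z ~ N(0, 1) then Z**2 ~ chi2(1), and median(Z**2) = (Phi^{-1}(0.75))**2.
CHI2_1DF_MEDIAN = NormalDist().inv_cdf(0.75) ** 2  # approximately 0.4549


def genomic_control_lambda(chi2_stats):
    """Estimate a single genomic-control inflation factor (hypothetical helper).

    Assumes every statistic is a 1-df chi-square statistic under the null.
    """
    if not chi2_stats:
        raise ValueError("need at least one statistic")
    lam = median(chi2_stats) / CHI2_1DF_MEDIAN
    # Common convention: do not "correct" deflation, so lambda is floored at 1.
    return max(lam, 1.0)


def apply_genomic_control(chi2_stats, lam=None):
    """Divide each statistic by lambda (hypothetical helper).

    If lam is None, lambda is estimated from the same statistics.
    Returns the adjusted statistics and the lambda that was used.
    """
    if lam is None:
        lam = genomic_control_lambda(chi2_stats)
    return [s / lam for s in chi2_stats], lam
```

Because `apply_genomic_control` rescales every statistic by the same number, it is an instance of the "universal inflation factor" that the abstract cautions against for gene-based tests. Genes whose rare/common variant composition makes them more or less sensitive to stratification would be over- or under-corrected by a single lambda.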