A Novel Bayesian Semiparametric Algorithm for Inferring Population Structure and Adjusting for Case-Control Association Tests



Although the population-based case-control design is a popular choice for association mapping of complex genetic traits because data collection and statistical analysis are straightforward, it suffers from the inherent problem of population stratification. Methods have been developed to adjust these studies for population substructure, but efficient estimation of the number of subpopulations (K), which has evolutionary significance, remains a statistical challenge. In this article, we propose a Bayesian semiparametric approach that estimates population substructure under the assumption that K is random. Using extensive simulations, we find that our proposed method is not only computationally much faster than an existing Bayesian approach, Structure, but also estimates the number of subpopulations more accurately and thus yields more power to detect association in case-control studies.