A Bayesian method for estimating prevalence in the presence of a hidden sub-population


Michelle Xia, Department of Statistics, University of British Columbia, Vancouver, BC, Canada V6T1Z2.

E-mail: cxia@stat.ubc.ca


When estimating the prevalence of a binary trait in a population, a hidden sub-population that cannot be sampled leads to nonidentifiability and potentially biased estimation. We propose a Bayesian model of trait prevalence for a weighted sample from the non-hidden portion of the population, which models the relationship between prevalence and sampling probability. We study the behavior of the posterior distribution of the population prevalence and obtain its large-sample limits in simple analytical forms that have intuitively expected properties. We use MCMC simulations on finite samples to evaluate the effectiveness of statistical learning. We apply the model and the results to two illustrative datasets arising from weighted sampling. Our work confirms that sensible results can be obtained using Bayesian analysis, despite the nonidentifiability in this situation. Copyright © 2012 John Wiley & Sons, Ltd.
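
As a rough illustration of the nonidentifiability described above, the following minimal sketch uses only the law of total probability; it is not the model proposed in the paper. The function names, `p_observed` (prevalence in the sampled portion), `p_hidden` (prevalence in the hidden portion), and `hidden_fraction` (the share of the population that is hidden) are hypothetical and introduced here for illustration. Data from the non-hidden portion inform only `p_observed`, so every value of `p_hidden` between 0 and 1 is equally compatible with the sample.

```python
def overall_prevalence(p_observed, p_hidden, hidden_fraction):
    """Population prevalence as a mixture of the sampled and hidden portions.

    All three arguments are hypothetical illustration parameters,
    not quantities defined in the paper.
    """
    return (1.0 - hidden_fraction) * p_observed + hidden_fraction * p_hidden


def prevalence_bounds(p_observed, hidden_fraction):
    """Range of population prevalence compatible with the sampled data alone."""
    for value in (p_observed, hidden_fraction):
        if not 0.0 <= value <= 1.0:
            raise ValueError("arguments must lie in [0, 1]")
    # Lower bound: nobody in the hidden portion has the trait.
    lower = overall_prevalence(p_observed, 0.0, hidden_fraction)
    # Upper bound: everybody in the hidden portion has the trait.
    upper = overall_prevalence(p_observed, 1.0, hidden_fraction)
    return lower, upper


if __name__ == "__main__":
    low, high = prevalence_bounds(p_observed=0.2, hidden_fraction=0.25)
    print(f"population prevalence lies between {low:.3f} and {high:.3f}")
```

However large the sample, the width of this interval stays equal to `hidden_fraction`. This is why prior information, or a modeled link between prevalence and sampling probability such as the one the abstract describes, is needed to say more.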