Variable selection for high dimensional Bayesian density estimation: application to human exposure simulation


Brian J. Reich, Department of Statistics, North Carolina State University, 2311 Stinson Drive, Box 8203, Raleigh, NC 27695-8203, USA. E-mail:


Summary. Numerous studies have linked ambient air pollution to adverse health outcomes. Many studies of this nature relate outdoor pollution levels measured at a few monitoring stations to health outcomes. Recently, computational methods have been developed to model the distribution of personal exposures, rather than ambient concentrations, and then relate the exposure distribution to the health outcome. Although these methods show great promise, they are limited by the computational demands of the exposure model. We propose a method to alleviate these computational burdens, with the eventual goal of implementing a national study of the health effects of air pollution exposure. Our approach is to develop a statistical emulator for the exposure model, i.e. we use Bayesian density estimation to predict the conditional exposure distribution as a function of several variables, such as temperature, human activity and physical characteristics of the pollutant. This poses a challenging statistical problem because there are many predictors of the exposure distribution and density estimation is notoriously difficult in high dimensions. To overcome this challenge, we use stochastic search variable selection to identify a subset of the variables that have more than just additive effects on the mean of the exposure distribution. We apply our method to emulate an ozone exposure model in Philadelphia.