
Assessing geographic heterogeneity and variable importance in an air pollution data set



In this article, we examine the relationship between air quality and mortality in the United States using a published observational data set. Observational studies are complex and open to various interpretations. We show that the effect of air pollution on longevity is geographically heterogeneous. We also show that air pollution is much less important to longevity than income or smoking. Most often, authors do not address the relative importance of the variables under consideration, choosing instead to concentrate on specific claims of significance. Yet good policy decisions require knowledge of the magnitude of the relevant effects. Our analysis uses three methods for determining variable importance and shows how doing so puts predictor variables into a context that supports sound environmental policymaking. In particular, using both regression and recursive partitioning, we confirm a spatial interaction with the air quality variable PM2.5: there is no significant association of PM2.5 with longevity in the west of the United States. We also determine the importance of PM2.5 relative to the other predictor variables available in this data set. Our findings call into question the claim made by the original researchers. © 2013 Wiley Periodicals, Inc. Statistical Analysis and Data Mining, 2013