
Regression analysis using dependent Polya trees


  • Angela Schörgendorfer,

    Corresponding author
    1. IBM T.J. Watson Research Center, Yorktown Heights, New York, U.S.A.
    • Correspondence to: Angela Schörgendorfer, IBM T.J. Watson Research Center, 1101 Kitchawan Road, Yorktown Heights, New York, 10598, U.S.A.


  • Adam J. Branscum

    1. School of Biological and Population Health Sciences, Oregon State University, Corvallis, Oregon, U.S.A.


Many commonly used models for linear regression analysis impose overly simplistic shape and scale constraints on the residual structure of data. We propose a semiparametric Bayesian model for regression analysis that produces data-driven inference by using a new type of dependent Polya tree prior to model arbitrary residual distributions that are allowed to evolve across increasing levels of an ordinal covariate (e.g., time, in repeated measurement studies). By modeling residual distributions at consecutive covariate levels or time points using separate but dependent Polya tree priors, the model pools distributional information while remaining flexible enough to accommodate many types of changing residual distributions. The proposed dependent residual structure can be used in a wide range of regression settings, including fixed-effects and mixed-effects linear and nonlinear models for cross-sectional, prospective, and repeated measurement data. A simulation study illustrates the flexibility of our semiparametric regression model in accurately capturing evolving residual distributions. In an application to immune development data on immunoglobulin G antibodies in children, our model outperforms several contemporary semiparametric regression models according to a predictive model selection criterion. Copyright © 2013 John Wiley & Sons, Ltd.
