A Pseudo-Bayesian Shrinkage Approach to Regression with Missing Covariates


  • Nanhua Zhang

    Corresponding author
    1. Department of Epidemiology & Biostatistics, College of Public Health, University of South Florida, Tampa, Florida 33612–3085, U.S.A.
  • Roderick J. Little

    Corresponding author
    1. Department of Biostatistics, School of Public Health, University of Michigan, Ann Arbor, Michigan 48109–2029, U.S.A.




Summary. We consider the linear regression of outcome Y on regressors W and Z with some values of W missing, when our main interest is the effect of Z on Y, controlling for W. Three common approaches to regression with missing covariates are (i) complete-case analysis (CC), which discards the incomplete cases; (ii) ignorable likelihood methods, which base inference on the observed-data likelihood, assuming the missing data are missing at random (Rubin, 1976b); and (iii) nonignorable modeling, which posits a joint distribution of the variables and missing-data indicators. Another simple practical approach that has received little theoretical attention is to drop the regressors containing missing values from the regression model (DV, for drop variables). DV does not lead to bias when either (i) the regression coefficient of W is zero or (ii) W and Z are uncorrelated. We propose a pseudo-Bayesian approach for regression with missing covariates that compromises between the CC and DV estimates, exploiting information in the incomplete cases when the data support the DV assumptions. We illustrate favorable properties of the method by simulation and apply it to a liver cancer study. Extension of the method to more than one missing covariate is also discussed.
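The two estimates that the proposed method compromises between can be illustrated with a small simulation. The sketch below is not the pseudo-Bayesian estimator itself; it only computes the CC and DV least-squares estimates of the coefficient of Z. The helper name `cc_and_dv_estimates`, the coefficient values, the sample size, and the assumption that W is missing completely at random are all illustrative choices that do not come from the paper.

```python
import numpy as np


def cc_and_dv_estimates(y, z, w):
    """Return (CC, DV) least-squares estimates of the coefficient of z.

    Missing values of w are encoded as np.nan (an assumption of this sketch).
    """
    observed = ~np.isnan(w)
    # CC: regress y on (1, z, w) using only the complete cases.
    x_cc = np.column_stack([np.ones(observed.sum()), z[observed], w[observed]])
    beta_cc = np.linalg.lstsq(x_cc, y[observed], rcond=None)[0]
    # DV: drop w and regress y on (1, z) using all cases.
    x_dv = np.column_stack([np.ones(len(y)), z])
    beta_dv = np.linalg.lstsq(x_dv, y, rcond=None)[0]
    return beta_cc[1], beta_dv[1]


rng = np.random.default_rng(0)
n = 20000
beta_z, beta_w = 2.0, 1.5              # illustrative true coefficients
z = rng.normal(size=n)
missing = rng.random(n) < 0.4          # W missing completely at random

# Case 1: W independent of Z, so a DV assumption holds.
w_ind = rng.normal(size=n)
y_ind = 1.0 + beta_z * z + beta_w * w_ind + rng.normal(size=n)
cc_ind, dv_ind = cc_and_dv_estimates(y_ind, z, np.where(missing, np.nan, w_ind))

# Case 2: W correlated with Z (slope 0.5), so DV absorbs part of W's effect.
w_cor = 0.5 * z + np.sqrt(0.75) * rng.normal(size=n)
y_cor = 1.0 + beta_z * z + beta_w * w_cor + rng.normal(size=n)
cc_cor, dv_cor = cc_and_dv_estimates(y_cor, z, np.where(missing, np.nan, w_cor))

print("W independent of Z: CC = %.3f, DV = %.3f" % (cc_ind, dv_ind))
print("W correlated with Z: CC = %.3f, DV = %.3f" % (cc_cor, dv_cor))
```

In the first case both estimates should land near the true coefficient of Z, with DV using every case. In the second case the DV estimate should drift toward beta_z + 0.5 * beta_w, the usual omitted-variable bias, while CC stays near beta_z under this missingness mechanism.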