• Open Access

Sensitivity to missing data assumptions: Theory and an evaluation of the U.S. wage structure


  • Patrick Kline

    1. Department of Economics, University of California, Berkeley and NBER; pkline@econ.berkeley.edu
  • Andres Santos

    1. Department of Economics, University of California, San Diego; a2santos@ucsd.edu
    • We thank Elie Tamer and two anonymous referees for comments and suggestions that greatly improved this paper. We are also grateful to David Card, Guido Imbens, Justin McCrary, Azeem Shaikh, Hal White, and seminar participants at UC Berkeley, UC San Diego, the University of Michigan, USC, Stanford, Wisconsin, the 2010 NBER Summer Institute, the 2010 Western Economic Association Summer Meeting, the 2010 Seoul Summer Economics Conference, and the 2010 Econometric Society World Congress for useful comments and corrections. We thank Ivan Fernández-Val for assistance in replicating the results of Angrist, Chernozhukov, and Fernández-Val (2006). A previous version of this paper circulated under the title “Interval estimation of potentially misspecified quantile models in the presence of missing data.”


This paper develops methods for assessing the sensitivity of empirical conclusions regarding conditional distributions to departures from the missing at random (MAR) assumption. We index the degree of nonignorable selection governing the missing data process by the maximal Kolmogorov–Smirnov distance between the distributions of missing and observed outcomes across all values of the covariates. Sharp bounds on minimum mean square approximations to conditional quantiles are derived as a function of the nominal level of selection considered in the sensitivity analysis, and a weighted bootstrap procedure is developed for conducting inference. Using these techniques, we conduct an empirical assessment of the sensitivity of observed earnings patterns in U.S. Census data to deviations from the MAR assumption. We find that the well-documented increase in the returns to schooling between 1980 and 1990 is relatively robust to deviations from the MAR assumption except at the lowest quantiles of the distribution, but that conclusions regarding heterogeneity in returns and changes in the returns function between 1990 and 2000 are very sensitive to departures from ignorability.
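
To illustrate how a Kolmogorov–Smirnov restriction of this kind limits what missing data can do to a conditional distribution, the following is a minimal pointwise sketch. It is not the sharp quantile bounds or the estimator developed in the paper. The notation is assumed for illustration only: $D$ indicates whether the outcome $Y$ is observed, $p(x) = P(D = 0 \mid X = x)$ is the missing probability, $F_1$ and $F_0$ are the conditional distribution functions of observed and missing outcomes, and $k \in [0,1]$ is the nominal level of selection.

```latex
% Assumed notation (illustrative, not taken from the source text):
%   F_1(c|x) = P(Y <= c | X = x, D = 1)   (observed outcomes)
%   F_0(c|x) = P(Y <= c | X = x, D = 0)   (missing outcomes)
%   p(x)     = P(D = 0 | X = x)

% Selection restriction: maximal KS distance across covariate values
\sup_{x}\,\sup_{c}\,\bigl| F_1(c \mid x) - F_0(c \mid x) \bigr| \;\le\; k

% Law of total probability
F(c \mid x) \;=\; \bigl(1 - p(x)\bigr)\,F_1(c \mid x) \;+\; p(x)\,F_0(c \mid x)

% Since F_0 is a distribution function within k of F_1,
\max\{F_1(c \mid x) - k,\; 0\} \;\le\; F_0(c \mid x) \;\le\; \min\{F_1(c \mid x) + k,\; 1\}

% so, pointwise in c,
\bigl(1 - p(x)\bigr) F_1(c \mid x) + p(x)\max\{F_1(c \mid x) - k,\, 0\}
\;\le\; F(c \mid x) \;\le\;
\bigl(1 - p(x)\bigr) F_1(c \mid x) + p(x)\min\{F_1(c \mid x) + k,\, 1\}
```

Under these assumptions, $k = 0$ collapses the interval to $F = F_1$, which is the MAR case, while $k = 1$ leaves $F_0$ unrestricted and gives the interval $[(1 - p)F_1,\ (1 - p)F_1 + p]$. Intermediate values of $k$ trace out the sensitivity of the conditional distribution, and hence of its quantiles, to the degree of selection.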