Inference on Regressions with Interval Data on a Regressor or Outcome



This paper examines inference on regressions when interval data are available on one variable, the other variables being measured precisely. Let a population be characterized by a distribution P(y, x, v, v0, v1), where y ∈ R^1, x ∈ R^k, and the real variables (v, v0, v1) satisfy v0 ≤ v ≤ v1. Let a random sample be drawn from P and the realizations of (y, x, v0, v1) be observed, but not those of v. The problem of interest may be to infer E(y|x, v) or E(v|x). This analysis maintains Interval (I), Monotonicity (M), and Mean Independence (MI) assumptions: (I) P(v0 ≤ v ≤ v1) = 1; (M) E(y|x, v) is monotone in v; (MI) E(y|x, v, v0, v1) = E(y|x, v). No restrictions are imposed on the distribution of the unobserved values of v within the observed intervals [v0, v1]. It is found that the IMMI assumptions alone imply simple nonparametric bounds on E(y|x, v) and E(v|x). When y is binary, these assumptions combined with a semiparametric binary regression model yield an identification region for the parameters that may be estimated consistently by a modified maximum score (MMS) method. The IMMI assumptions combined with a parametric model for E(y|x, v) or E(v|x) yield an identification region that may be estimated consistently by a modified minimum-distance (MMD) method. Monte Carlo methods are used to characterize the finite-sample performance of these estimators. Empirical case studies are performed using interval wealth data in the Health and Retirement Study and interval income data in the Current Population Survey.
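
As an illustration of the simplest bound mentioned above, Assumption (I) alone implies E(v0|x) ≤ E(v|x) ≤ E(v1|x), so sample means of the interval endpoints within each value of x estimate a bound on E(v|x). The following is a minimal sketch under stated assumptions, not the paper's estimator: x is taken to be discrete, the function name interval_mean_bounds is hypothetical, and the data are simulated purely for illustration (v is bracketed into bins of width 5, loosely mimicking bracketed survey responses).

```python
import numpy as np


def interval_mean_bounds(x, v0, v1):
    """Estimate bounds on E(v | x) for discrete x from interval data [v0, v1].

    Hypothetical helper: relies only on the Interval assumption
    P(v0 <= v <= v1) = 1, which gives E(v0 | x) <= E(v | x) <= E(v1 | x).
    """
    x = np.asarray(x)
    v0 = np.asarray(v0, dtype=float)
    v1 = np.asarray(v1, dtype=float)
    if np.any(v0 > v1):
        raise ValueError("each interval must satisfy v0 <= v1")
    bounds = {}
    for value in np.unique(x):
        mask = x == value
        # Cell means of the lower and upper endpoints bracket the cell mean of v.
        bounds[value.item()] = (float(v0[mask].mean()), float(v1[mask].mean()))
    return bounds


# Simulated data (an assumption for illustration only, not the HRS or CPS data).
rng = np.random.default_rng(0)
n = 5000
x = rng.integers(0, 2, size=n)                 # binary covariate
v = rng.normal(loc=10.0 + 5.0 * x, scale=2.0)  # latent variable, unobserved in practice
v0 = np.floor(v / 5.0) * 5.0                   # lower endpoint of the reported bracket
v1 = v0 + 5.0                                  # upper endpoint of the reported bracket

bounds = interval_mean_bounds(x, v0, v1)
for value, (lower, upper) in bounds.items():
    print(f"x = {value}: E(v|x) lies in [{lower:.3f}, {upper:.3f}]")
```

Because every bracket in this simulation has width 5, each estimated bound also has width 5; with real interval data the width would vary with the reported brackets. The bounds on E(y|x, v) additionally rely on Assumptions (M) and (MI) and are not covered by this sketch.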