This is a revision of an October 2010 ArXiv/CEMMAP paper with the same title. Preliminary results of this paper were first presented at Chernozhukov's invited Cowles Foundation lecture at the North American meetings of the Econometric Society in June 2009. We thank seminar participants at Brown, Columbia, Hebrew University, Tel Aviv University, Harvard–MIT, the Dutch Econometric Study Group, Fuqua School of Business, NYU, and the New Economic School for helpful comments. We also thank Denis Chetverikov, JB Doyle, and Joonhwan Lee for their thorough reading of the paper and very useful comments.
Sparse Models and Methods for Optimal Instruments With an Application to Eminent Domain
Article first published online: 26 NOV 2012
© 2012 The Econometric Society
Volume 80, Issue 6, pages 2369–2429, November 2012
How to Cite
Belloni, A., Chen, D., Chernozhukov, V. and Hansen, C. (2012), Sparse Models and Methods for Optimal Instruments With an Application to Eminent Domain. Econometrica, 80: 2369–2429. doi: 10.3982/ECTA9626
- Manuscript received October, 2010; final revision received June, 2012.
Keywords: Inference on a low-dimensional parameter after model selection; imperfect model selection; instrumental variables; data-driven penalty; non-Gaussian errors; moderate deviations for self-normalized sums.
We develop results for the use of Lasso and post-Lasso methods to form first-stage predictions and estimate optimal instruments in linear instrumental variables (IV) models with many instruments, p. Our results apply even when p is much larger than the sample size, n. We show that the IV estimator based on using Lasso or post-Lasso in the first stage is root-n consistent and asymptotically normal when the first stage is approximately sparse, that is, when the conditional expectation of the endogenous variables given the instruments can be well-approximated by a relatively small set of variables whose identities may be unknown. We also show that the estimator is semiparametrically efficient when the structural error is homoscedastic. Notably, our results allow for imperfect model selection, and do not rely upon the unrealistic “beta-min” conditions that are widely used to establish validity of inference following model selection (see also Belloni, Chernozhukov, and Hansen (2011b)). In simulation experiments, the Lasso-based IV estimator with a data-driven penalty performs well compared to recently advocated many-instrument robust procedures. In an empirical example dealing with the effect of judicial eminent domain decisions on economic outcomes, the Lasso-based IV estimator outperforms an intuitive benchmark.
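The two-step procedure described in the abstract can be illustrated with a minimal sketch: a Lasso first stage selects among many instruments, a post-Lasso OLS refit forms the fitted value of the endogenous variable, and that fitted value is used as the instrument in the second stage. The coordinate-descent Lasso, the data-generating design, and the fixed penalty `lam` below are illustrative assumptions, not the paper's actual implementation.

```python
import numpy as np

def lasso_cd(X, y, lam, n_iter=100):
    """Plain coordinate-descent Lasso: min_b (1/2n)||y - Xb||^2 + lam*||b||_1."""
    n, p = X.shape
    b = np.zeros(p)
    col_ss = (X ** 2).sum(axis=0) / n
    for _ in range(n_iter):
        for j in range(p):
            r_j = y - X @ b + X[:, j] * b[j]          # partial residual excluding x_j
            rho = X[:, j] @ r_j / n
            b[j] = np.sign(rho) * max(abs(rho) - lam, 0.0) / col_ss[j]
    return b

rng = np.random.default_rng(0)
n, p, s = 200, 50, 3                                  # many instruments, few relevant
Z = rng.standard_normal((n, p))
pi = np.r_[np.ones(s), np.zeros(p - s)]               # approximately sparse first stage
v = rng.standard_normal(n)
u = 0.5 * v + rng.standard_normal(n)                  # endogeneity via correlated errors
d = Z @ pi + v                                        # endogenous regressor
alpha = 1.0                                           # structural parameter of interest
y = alpha * d + u

# Step 1: Lasso first stage selects a small set of instruments.
b = lasso_cd(Z, d, lam=0.1)
sel = np.flatnonzero(np.abs(b) > 1e-6)

# Step 2: post-Lasso -- refit d on the selected instruments by OLS.
Zs = Z[:, sel]
d_hat = Zs @ np.linalg.lstsq(Zs, d, rcond=None)[0]

# Step 3: IV estimate of alpha using d_hat as the estimated optimal instrument.
alpha_hat = (d_hat @ y) / (d_hat @ d)
```

Note that imperfect selection is harmless here: a few spuriously selected instruments only enter the post-Lasso refit, and the second stage remains root-n consistent as long as the fitted first stage approximates the conditional expectation well.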
Optimal instruments are conditional expectations. In developing the IV results, we establish a series of new results for Lasso and post-Lasso estimators of nonparametric conditional expectation functions which are of independent theoretical and practical interest. We construct a modification of Lasso designed to deal with non-Gaussian, heteroscedastic disturbances that uses a data-weighted ℓ1-penalty function. Using moderate deviation theory for self-normalized sums in a novel way, we provide convergence rates for the resulting Lasso and post-Lasso estimators that are as sharp as the corresponding rates in the homoscedastic Gaussian case under the condition that log p = o(n^{1/3}). We also provide a data-driven method for choosing the penalty level that must be specified in obtaining Lasso and post-Lasso estimates, and establish its asymptotic validity under non-Gaussian, heteroscedastic disturbances.
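A data-weighted ℓ1 penalty of the kind described above has two ingredients: an overall penalty level driven by a Gaussian quantile, and per-regressor loadings that adapt to heteroscedasticity. The sketch below is in the spirit of that construction; the specific constants `c` and `gamma`, and the use of first-stage residuals for the loadings, are illustrative assumptions rather than a verbatim transcription of the paper's rule.

```python
import numpy as np
from statistics import NormalDist

def penalty_level(n, p, c=1.1, gamma=0.05):
    """Overall penalty level of the form 2c*sqrt(n)*Phi^{-1}(1 - gamma/(2p)),
    which grows only like sqrt(log p) in the number of instruments."""
    return 2.0 * c * np.sqrt(n) * NormalDist().inv_cdf(1.0 - gamma / (2.0 * p))

def penalty_loadings(Z, resid):
    """Heteroscedasticity-adapted loadings: sqrt((1/n) sum_i z_ij^2 v_i^2),
    with v_i a (preliminary) first-stage residual for observation i."""
    return np.sqrt(((Z ** 2) * (resid[:, None] ** 2)).mean(axis=0))

# Illustration: the level rises slowly as instruments are added.
lam_small = penalty_level(n=200, p=10)
lam_large = penalty_level(n=200, p=1000)
```

In practice the loadings can be iterated: start from residuals of a pilot fit, compute loadings, re-estimate Lasso with the weighted penalty, and repeat once or twice.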