The editor in charge of this paper was Stefano DellaVigna.
PROXYING FOR UNOBSERVABLE VARIABLES WITH INTERNET DOCUMENT-FREQUENCY
Version of Record online: 15 JAN 2013
© 2012 by the European Economic Association
Journal of the European Economic Association
Volume 11, Issue 1, pages 137–165, February 2013
How to Cite
Saiz, A. and Simonsohn, U. (2013), PROXYING FOR UNOBSERVABLE VARIABLES WITH INTERNET DOCUMENT-FREQUENCY. Journal of the European Economic Association, 11: 137–165. doi: 10.1111/j.1542-4774.2012.01110.x
Acknowledgments: This is an improved version of an earlier circulating working paper (Saiz and Simonsohn, 2007). We thank participants at departmental presentations at Wharton, Berkeley, and IZA-Bonn, and at NARSC and SJDM conferences, the editor, and three referees for useful comments. Remaining errors are ours. Saiz acknowledges support from the Research Sponsors Program of the Zell/Lurie Real Estate Center at Wharton. Shalini Bhutani, David Kwon, Caleb Li, Joe Evangelist, and Blake Wilmarth provided excellent research assistance. Saiz is also a Research Fellow at IZA.
- Issue online: 15 JAN 2013
The internet contains billions of documents. We show that document frequencies in large decentralized textual databases can capture the cross-sectional variation in the occurrence frequencies of social phenomena. We characterize the econometric conditions under which such proxying is likely to succeed. We also propose using recently introduced internet search volume indexes as proxies for fundamental locational traits, and discuss their advantages and limitations. We then successfully proxy for a number of economic and demographic variables in US cities and states. We further obtain document-frequency measures of corruption by country and US state and replicate the econometric results of previous research studying its covariates. Finally, we provide the first measure of corruption in American cities. Poverty, population size, service-sector orientation, and ethnic fragmentation are shown to predict higher levels of corruption in urban America.
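As a rough illustration of the proxying idea, the Python sketch below turns raw document counts into a relative document frequency and compares it with an observed benchmark. Everything in it is an assumption made for illustration and is not taken from the paper: the normalization by a place's overall document count, the function names `relative_document_frequency` and `pearson`, and all of the counts and benchmark values, which are invented.

```python
import math


def relative_document_frequency(joint_hits, baseline_hits):
    """Share of documents mentioning a place that also mention the keyword.

    Assumed normalization: dividing by the place's overall document count
    keeps large places from dominating simply because more is written
    about them.
    """
    if baseline_hits <= 0:
        raise ValueError("baseline_hits must be positive")
    return joint_hits / baseline_hits


def pearson(xs, ys):
    """Plain Pearson correlation between two equal-length sequences."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / math.sqrt(var_x * var_y)


# Hypothetical counts, invented for illustration only:
# (documents mentioning place AND keyword, documents mentioning place)
hits = {
    "Place A": (1200, 400000),
    "Place B": (300, 250000),
    "Place C": (2500, 500000),
}

# Hypothetical observed benchmark for the same places (also invented).
benchmark = {"Place A": 3.1, "Place B": 1.4, "Place C": 4.8}

proxy = {
    place: relative_document_frequency(joint, base)
    for place, (joint, base) in hits.items()
}

places = sorted(hits)
correlation = pearson(
    [proxy[p] for p in places],
    [benchmark[p] for p in places],
)
print(f"Correlation between proxy and benchmark: {correlation:.3f}")
```

A high cross-sectional correlation with a variable that is already measured would be one informal check on such a proxy before it is used for a variable that cannot be observed directly.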