Note: Ken Bollen made valuable suggestions, and Nash Herndon provided useful editorial comments. Suggestions of the two referees and the editor were crucial in putting together a concise and informative paper. Partial financial support was provided by the U.S. Agency for International Development through the MEASURE Evaluation project, Carolina Population Center, University of North Carolina at Chapel Hill, under the terms of Cooperative Agreement #GPO-A-00-03-00003-00. The views represented in the paper are those of the authors, and the remaining errors are their responsibility.
SOCIOECONOMIC STATUS MEASUREMENT WITH DISCRETE PROXY VARIABLES: IS PRINCIPAL COMPONENT ANALYSIS A RELIABLE ANSWER?
Version of Record online: 16 FEB 2009
© 2009 The Authors. Journal compilation © 2009 International Association for Research in Income and Wealth. Published by Blackwell Publishing
Review of Income and Wealth
Volume 55, Issue 1, pages 128–165, March 2009
How to Cite
Kolenikov, S. and Angeles, G. (2009), SOCIOECONOMIC STATUS MEASUREMENT WITH DISCRETE PROXY VARIABLES: IS PRINCIPAL COMPONENT ANALYSIS A RELIABLE ANSWER?. Review of Income and Wealth, 55: 128–165. doi: 10.1111/j.1475-4991.2008.00309.x
- Issue online: 16 FEB 2009
The last several years have seen a growth in the number of publications in economics that use principal component analysis (PCA) in the area of welfare studies. This paper explores the ways discrete data can be incorporated into PCA. The effects of discreteness of the observed variables on the PCA are reviewed. The statistical properties of the popular Filmer and Pritchett (2001) procedure are analyzed. The concepts of polychoric and polyserial correlations are introduced with appropriate references to the existing literature demonstrating their statistical properties. A large simulation study is carried out to compare various implementations of discrete data PCA. The simulation results show that the currently used method of running PCA on a set of dummy variables as proposed by Filmer and Pritchett (2001) can be improved upon by using procedures appropriate for discrete data, such as retaining the ordinal variables without breaking them into a set of dummy variables or using polychoric correlations. An empirical example using Bangladesh 2000 Demographic and Health Survey data helps in explaining the differences between procedures.
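As a rough illustration of the comparison described in the abstract, the sketch below contrasts two of the approaches on simulated data: PCA on ordinal variables kept as integer codes, and PCA on the same variables broken into dummy indicators in the style of Filmer and Pritchett (2001). Everything about the setup is an assumption made for illustration and is not taken from the paper: the latent variable `ses`, the loading of 0.7, the cut points, the sample size, and the helper names `first_pc_scores` and `to_dummies` are all hypothetical. Polychoric correlations are omitted because estimating them requires a bivariate normal likelihood that does not fit in a short example.

```python
import numpy as np


def first_pc_scores(X):
    """Return scores on the first principal component of the correlation matrix of X."""
    Z = (X - X.mean(axis=0)) / X.std(axis=0)   # standardize each column
    corr = np.corrcoef(X, rowvar=False)        # columns are variables
    eigvals, eigvecs = np.linalg.eigh(corr)    # eigenvalues in ascending order
    weights = eigvecs[:, -1]                   # eigenvector of the largest eigenvalue
    return Z @ weights


def to_dummies(X_ord):
    """Expand each ordinal column into one 0/1 indicator column per observed category."""
    cols = []
    for j in range(X_ord.shape[1]):
        for level in np.unique(X_ord[:, j]):
            cols.append((X_ord[:, j] == level).astype(float))
    return np.column_stack(cols)


# Hypothetical data-generating process (an assumption, not the paper's design):
# five continuous indicators load on one latent SES factor, then each is cut
# into three ordered categories coded 0, 1, 2.
rng = np.random.default_rng(0)
n, p = 2000, 5
ses = rng.standard_normal(n)
noise = rng.standard_normal((n, p))
latent = 0.7 * ses[:, None] + np.sqrt(1 - 0.7 ** 2) * noise
X_ord = np.digitize(latent, bins=[-0.5, 0.5])

X_dum = to_dummies(X_ord)

score_ord = first_pc_scores(X_ord.astype(float))  # ordinal codes kept as they are
score_dum = first_pc_scores(X_dum)                # dummy-variable version

# The sign of a principal component is arbitrary, so compare absolute correlations.
corr_ord = abs(np.corrcoef(score_ord, ses)[0, 1])
corr_dum = abs(np.corrcoef(score_dum, ses)[0, 1])
print(f"ordinal PCA vs latent SES: {corr_ord:.3f}")
print(f"dummy PCA vs latent SES:   {corr_dum:.3f}")
```

A single run on one simulated data set says nothing general about which procedure performs better. The paper's conclusions rest on a large simulation study and on the Bangladesh 2000 Demographic and Health Survey example.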