Get access

What to Do about Missing Values in Time-Series Cross-Section Data


  • All information necessary to replicate the results in this article can be found in Honaker and King (2010). We have written an easy-to-use software package, with Matthew Blackwell, that implements all the methods introduced in this article; it is called “Amelia II: A Program for Missing Data” and is available at Our thanks to Neal Beck, Adam Berinsky, Matthew Blackwell, Jeff Lewis, Kevin Quinn, Don Rubin, Ken Scheve, and Jean Tomphie for helpful comments, the National Institutes of Aging (P01 AG17625-01), the National Science Foundation (SES-0318275, IIS-9874747, SES-0550873), and the Mexican Ministry of Health for research support.

James Honaker is a lecturer at The Pennsylvania State University, Department of Political Science, Pond Laboratory, University Park, PA 16802 ( Gary King is Albert J. Weatherhead III University Professor, Harvard University, Institute for Quantitative Social Science, 1737 Cambridge Street, Cambridge, MA 02138 (,


Applications of modern methods for analyzing data with missing values, based primarily on multiple imputation, have in the last half-decade become common in American politics and political behavior. Scholars in this subset of political science have thus increasingly avoided the biases and inefficiencies caused by ad hoc methods like listwise deletion and best guess imputation. However, researchers in much of comparative politics and international relations, and others with similar data, have been unable to do the same because the best available imputation methods work poorly with the time-series cross-section data structures common in these fields. We attempt to rectify this situation with three related developments. First, we build a multiple imputation model that allows smooth time trends, shifts across cross-sectional units, and correlations over time and space, resulting in far more accurate imputations. Second, we enable analysts to incorporate knowledge from area studies experts via priors on individual missing cell values, rather than on difficult-to-interpret model parameters. Third, because these tasks could not be accomplished within existing imputation algorithms, in that they cannot handle as many variables as needed even in the simpler cross-sectional data for which they were designed, we also develop a new algorithm that substantially expands the range of computationally feasible data types and sizes for which multiple imputation can be used. These developments also make it possible to implement the methods introduced here in freely available open source software that is considerably more reliable than existing algorithms.