Robust Estimation and Outlier Detection for Overdispersed Multinomial Models of Count Data

Authors


  • Earlier versions of this article were presented in seminars at Harvard University, Washington University, and Binghamton University—SUNY, and at the 2002 Annual Meeting of the American Political Science Association, the 2002 Political Methodology Summer Meeting, and the 2002 Annual Meeting of the Midwest Political Science Association. Significantly different versions of some parts were presented at the 2001 Joint Statistical Meetings and at the 2001 Political Methodology Summer Meeting. We thank Jonathan Wand for contributions to earlier versions of this work, Todd Rice and Lamarck, Inc., for generous support and provision of computing resources, John Jackson for giving us his FORTRAN code and Poland data, and Gary King for helpful comments. The authors share equal responsibility for all errors.

Walter R. Mebane, Jr. is Professor of Government, Cornell University, 217 White Hall, Ithaca, NY 14853-4601 (wrm1@cornell.edu). Jasjeet S. Sekhon is Assistant Professor, Government, Harvard University, 34 Kirkland Street, Cambridge, MA 02138 (jasjeet_sekhon@harvard.edu).

Abstract

We develop a robust estimator, the hyperbolic tangent (tanh) estimator, for overdispersed multinomial regression models of count data. The tanh estimator provides accurate estimates and reliable inferences even when the specified model fits poorly for as much as half of the data. Seriously ill-fitted counts (outliers) are identified as part of the estimation. A Monte Carlo sampling experiment shows that the tanh estimator produces good results at practical sample sizes even when ten percent of the data are generated by a significantly different process. The experiment also shows that, with contaminated data, estimation fails using four other estimators: the nonrobust maximum likelihood estimator, the additive logistic model, and two seemingly unrelated regression (SUR) models. When used to analyze data from Florida for the 2000 presidential election, the tanh estimator recovers well-known features of the election that the other four estimators fail to capture. In an analysis of data from the 1993 Polish parliamentary election, the tanh estimator gives sharper inferences than does a previously proposed heteroskedastic SUR model.
