How Many Countries for Multilevel Modeling? A Comparison of Frequentist and Bayesian Approaches


  • I am indebted to my reviewers and the editor Rick Wilson, whose criticisms and suggestions improved the article considerably. Equal thanks to Jeff Gill, Thomas Gschwend, Adam Ramey, Tobias Heinrich, Thomas Pluemper, Vera Troeger, Tom Scotto, Ray Duch, and Michel Becher for their excellent comments and advice. Earlier versions of this article were presented at the 2011 annual meeting of the Midwest Political Science Association and the first conference of the European Political Science Association in Dublin, 2011. I thank participants and discussants for helpful comments and suggestions. Replication data and supplemental material are available at the author's website and the AJPS dataverse at


Comparative researchers increasingly use multilevel models to test the effects of country-level factors on individual behavior and preferences. However, the asymptotic justification of widely employed estimation strategies presumes large samples, and applications in comparative politics routinely involve only a small number of countries. Thus, researchers and reviewers often wonder whether these models are applicable at all. In other words, how many countries do we need for multilevel modeling? I present results from a large-scale Monte Carlo experiment comparing the performance of multilevel models when few countries are available. I find that maximum likelihood estimates and confidence intervals can be severely biased, especially in models including cross-level interactions. In contrast, the Bayesian approach proves to be far more robust and yields considerably more conservative tests.