How to Analyze Political Attention with Minimal Assumptions and Costs


  • An earlier version of this article was presented to the Midwest Political Science Association and was awarded the 2006 Harold Gosnell Prize for Excellence in Political Methodology. We would like to thank Steven Abney, Scott Adler, Scott Ainsworth, Frank Baumgartner, Ken Bickers, David Blei, Jake Bowers, Janet Box-Steffensmeier, Patrick Brandt, Barry Burden, Suzie Linn, John Freeman, Ed Hovy, Will Howell, Simon Jackman, Brad Jones, Bryan Jones, Kris Kanthak, Gary King, Glen Krutz, Frances Lee, Bob Luskin, Chris Manning, Andrew Martin, Andrew McCallum, Iain McLean, Nate Monroe, Becky Morton, Stephen Purpura, Phil Schrodt, Gisela Sin, Betsy Sinclair, Michael Ward, John Wilkerson, Dan Wood, Chris Zorn, and seminar participants at UC Davis, Harvard University, the University of Michigan, the University of Pittsburgh, the University of Rochester, Stanford University, the University of Washington, and Washington University in St. Louis for their comments on earlier versions of the article. We would like to give special thanks to Cheryl Monroe for her contributions toward development of the Congressional corpus in specific and our data collection procedures in general. We would also like to thank Jacob Balazer (Michigan) and Tony Fader (Michigan) for research assistance. In addition, Quinn thanks the Center for Advanced Study in the Behavioral Sciences for its hospitality and support. This article is based upon work supported by the National Science Foundation under grants BCS 05-27513 and BCS 07-14688. Any opinions, findings, and conclusions or recommendations expressed in this article are those of the authors and do not necessarily reflect the views of the National Science Foundation. Supplementary materials, including web appendices and a replication archive with data and R package, can be found at

Kevin M. Quinn is Professor of Law, University of California, Berkeley, 490 Simon #7200, Berkeley, CA 94720-7200. Burt L. Monroe is Associate Professor of Political Science and Director of the Quantitative Social Science Initiative, The Pennsylvania State University, 230 Pond Lab, University Park, PA 16802-6200. Michael Colaresi is Associate Professor of Political Science, Michigan State University, 303 South Kedzie Hall, East Lansing, MI 48824. Michael H. Crespin is Assistant Professor of Political Science, University of Georgia, 407 Baldwin Hall, Athens, GA 30602. Dragomir R. Radev is Associate Professor, School of Information and Department of Electrical Engineering and Computer Science, University of Michigan, 3310 EECS Building, 1301 Beal Avenue, Ann Arbor, MI 48109-2122.


Previous methods of analyzing the substance of political attention have had to make several restrictive assumptions or have been prohibitively costly when applied to large-scale political texts. Here, we describe a topic model for legislative speech, a statistical learning model that uses word choices to infer topical categories covered in a set of speeches and to identify the topic of specific speeches. Our method estimates, rather than assumes, the substance of topics, the keywords that identify topics, and the hierarchical nesting of topics. We use the topic model to examine the agenda in the U.S. Senate from 1997 to 2004. Using a new database of over 118,000 speeches (70,000,000 words) from the Congressional Record, our model reveals speech topic categories that are both distinctive and meaningfully interrelated, and it offers a richer view of democratic agenda dynamics than had previously been possible.