Recounting the Courts? Applying Automated Content Analysis to Enhance Empirical Legal Research


*Address correspondence to Wayne McIntosh, Department of Government & Politics, University of Maryland, College Park, MD 20742. Evans is in the Department of Government & Politics, University of Maryland; McIntosh is Associate Professor, Department of Government & Politics, University of Maryland; Lin is Assistant Professor, College of Information Studies, University of Maryland; Cates is Professor, Department of Political Science, Towson University.


Political scientists in general, and public law specialists in particular, have only recently begun to exploit text classification using machine learning techniques to enable reliable and detailed content analysis of political and legal documents on a large scale. This article provides an overview and assessment of this methodology. We describe the basics of text classification, suggest applications of the technique to enhance empirical legal research (and political science more broadly), and report results of experiments designed to test the strengths and weaknesses of alternative approaches for classifying the positions and interpreting the content of advocacy briefs submitted to the U.S. Supreme Court. We find that the Wordscores method introduced by Laver et al. (2003) and various models using a Naïve Bayes classifier perform well at classifying the ideological direction of amicus curiae briefs submitted in the Bakke (1978) and Bollinger (2003) affirmative action cases. We also find that automated feature selection techniques can enable the detection of disparate issue conceptualizations by opposing sides in a single case, and facilitate analysis of relative linguistic “reliance” and “dominance” over time. We conclude by discussing the implications of our results and pointing to areas where technical and infrastructure improvements are most needed.
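To give a concrete sense of the Wordscores approach mentioned above, the sketch below implements the basic computation described by Laver et al. (2003): words are scored by their relative frequencies in reference texts with known a priori positions, and a previously unseen ("virgin") text is placed by the frequency-weighted mean of its word scores. This is an illustrative sketch only; the toy reference texts, the function names, and the -1.0/+1.0 position scale are hypothetical and do not reproduce the briefs or coding used in the study.

```python
from collections import Counter

def word_scores(ref_texts, ref_positions):
    """Score each word by its expected reference position (after Laver et al. 2003).
    ref_texts: list of token lists; ref_positions: a priori positions for each
    reference text (here a hypothetical scale, -1.0 = anti, +1.0 = pro)."""
    rel_freqs = []
    for tokens in ref_texts:
        total = len(tokens)
        rel_freqs.append({w: c / total for w, c in Counter(tokens).items()})
    vocab = set().union(*(f.keys() for f in rel_freqs))
    scores = {}
    for w in vocab:
        fs = [f.get(w, 0.0) for f in rel_freqs]
        denom = sum(fs)
        # P(reference r | word w) under equal priors, then expected position
        scores[w] = sum((f / denom) * a for f, a in zip(fs, ref_positions))
    return scores

def score_virgin(tokens, scores):
    """Mean word score over the scored words in a previously unseen text."""
    scored = [scores[w] for w in tokens if w in scores]
    return sum(scored) / len(scored) if scored else 0.0

# Hypothetical toy "briefs" standing in for real reference texts
pro = "diversity compelling interest educational benefits admissions".split()
anti = "quota colorblind strict scrutiny unconstitutional admissions".split()
scores = word_scores([pro, anti], [+1.0, -1.0])

print(round(score_virgin("diversity educational benefits".split(), scores), 3))  # → 1.0
```

Note how "admissions", which appears equally often in both toy reference texts, receives a score of 0.0 and so carries no positional information, while words unique to one side pull a virgin text toward that side's position.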