Volume 58, Issue 4
ARTICLE

Structural Topic Models for Open‐Ended Survey Responses

First published: 06 March 2014
Citations: 349

Our thanks to the Caltech SURF program, IQSS's Program on Text Analysis, and Dustin Tingley's dean support for supporting Jetson's initial participation during the summer of 2012. Brandon Stewart gratefully acknowledges funding from a National Science Foundation Graduate Research Fellowship. Alex Storer helped get computers to do their job. We thank the following for helpful comments and suggestions: Neal Beck, Justin Grimmer, Jennifer Jerit, Luke Keele, Gary King, Mik Laver, Rose McDermott, Helen Milner, Rich Nielsen, Brendan O'Connor, Mike Tomz, and participants in the Harvard Political Economy and Applied Statistics Workshops, UT Austin Government Department IR Seminar, Visions in Methodology 2013, and Stanford Methods Seminar. Replication files are available in the AJPS Data Archive on Dataverse (http://dvn.iq.harvard.edu/dvn/dv/ajps). The supplementary appendix is available at http://scholar.harvard.edu/files/dtingley/files/ajpsappendix.pdf.

Abstract

Collection and especially analysis of open‐ended survey responses are relatively rare in the discipline and, when conducted, are almost exclusively done through human coding. We present an alternative, semiautomated approach, the structural topic model (STM) (Roberts, Stewart, and Airoldi 2013; Roberts et al. 2013), that draws on recent developments in machine‐learning‐based analysis of textual data. A crucial contribution of the method is that it incorporates information about the document, such as the author's gender, political affiliation, and treatment assignment (if an experimental study). This article focuses on how the STM is helpful for survey researchers and experimentalists. The STM makes analyzing open‐ended responses easier, more revealing, and capable of being used to estimate treatment effects. We illustrate these innovations with analysis of text from surveys and experiments.

