Different Arenas, Different Deliberative Quality? Using a Systemic Framework to Evaluate Online Deliberation on Immigration Policy in Germany

© 2020 The Authors. Policy & Internet published by Wiley Periodicals, Inc. on behalf of Policy Studies Organization. This is an open access article under the terms of the Creative Commons Attribution License, which permits use, distribution and reproduction in any medium, provided the original work is properly cited.


Introduction: Deliberative Quality, Heterogeneous Online Deliberative Practices, and a Research Framework for a Comparative Assessment
A major strand of contemporary research on political participation addresses public deliberation as a cornerstone of democracy (Delli Carpini, 2002). Public deliberation is the process through which deliberative democracy occurs: a communicative practice of reason-giving between citizens, civil society organizations, government officials, and mass media concerning disputed issues in the public sphere (Chambers, 1996; Dryzek, 2000; Habermas, 2006; Wessler, 2018). Since the Internet has become prevalent in all areas of life and now complements traditional forms of political participation, a growing number of participatory practices today take place online (e.g., Coleman & Shane, 2012; Kersting, 2013; Warren, 2009). One of the reasons for this is the Internet's potential to provide an infrastructure for public deliberation (e.g., Dahlberg, 1998; Vromen, 2007; Wright & Street, 2007). Over time, public online deliberation has become increasingly pluralistic: deliberation processes are initiated for various purposes, by various actors and on various online platforms with different designs and different users (Esau, Friess, & Eilders, 2017; Janssen & Kies, 2005; Jensen, 2003). Accordingly, different types of public online deliberation can be expected to display different characteristics and fulfill different functions in democratic opinion- and will-formation, as well as in decision making.
The present study, therefore, used a systemic framework to evaluate different examples of online deliberation processes that raise different expectations (Ercan & Dryzek, 2015). A systemic approach to deliberative democracy is one that can conceptualize heterogeneous deliberative practices and sites, as well as their respective contributions to a deliberative system's overall quality (Elstub, Ercan, & Mendonça, 2016). This study refers to a crucial premise of the systemic approach: that the deliberative quality of communication will vary between different deliberative arenas within a deliberative system (Conover & Searing, 2005).
However, although a few case studies have successfully applied this premise to public deliberation (Felicetti, Niemeyer, & Curato, 2016; Jensen, 2003; Jonsson, 2015; Pedrini, 2014), no study has yet systematically analyzed this claim using both contemporary deliberative theory and empirical analysis of real-world deliberation. Our article addresses this gap and derives a framework that distinguishes between three different arenas of deliberation. We then flesh out various hypotheses concerning the deliberative quality of the deliberation processes that take place in these arenas and refer to both (i) classic concepts of deliberation (including rationality, reciprocity, respect, and constructiveness) and (ii) tolerant concepts of deliberation (including emotional expressions and storytelling as alternative forms of communication). Given the limited existing evidence on online deliberation processes from a systemic point of view, we seek to answer the following research questions:
RQ1: How does the deliberative quality of online participatory practices vary between different deliberative arenas?
RQ1.1: Does deliberative quality, according to a classic concept of deliberation, vary between different arenas?
RQ1.2: Does the proportion of alternative forms of communication (expressions of emotion and storytelling) vary between arenas?
Previous empirical studies have shown that a broad variety of context variables can influence the deliberative quality of online political communication, including the design and functionality of an online platform (e.g., Esau et al., 2017; Wright & Street, 2007), participant motivation and other participant characteristics (e.g., Springer, Engelmann, & Pfaffinger, 2015), and the degree of controversy of the topic under discussion (e.g., Ziegele, Breiner, & Quiring, 2014). While we agree that these factors influence deliberative quality, we also expect it to be affected by the nature of the public sphere (which we refer to here as "arena"). Accordingly, we selected different types of online deliberation sites for this study: an online consultation platform, three mass media platforms, and a social media (Facebook) community page. These selected cases are ideal for the purpose of this study because they all center on the same discussion topic and can each be allocated to one of three arenas: formal, semi-formal, or informal. We then applied a quantitative content analysis of user comments written in the different deliberative arenas to analyze their deliberative quality.
This article seeks to answer the research questions outlined above. We begin by briefly summarizing the core premises of systemic deliberative theory, on which basis we then develop an analytical framework that enables the categorization of the different arenas within a deliberative system. We outline each arena's crucial characteristics according to this framework and examine the implications for the quality of deliberation in each arena. Next, we develop seven theory-driven hypotheses that focus analytically and empirically on online deliberation sites. Finally, the last three sections outline the empirical study's methodology, results, and discussion of the results.
The Systemic Approach: Deliberation, the Deliberative System and Deliberative Quality

Three Core Premises of Systemic Approaches and the Concept of Deliberative Quality
In contemporary democratic theory and practice, deliberative democracy is one of the most influential paradigms of democratic legitimacy (Dryzek & Niemeyer, 2010, pp. 21-42). Even though all deliberative accounts share the core assumption that taking and giving reasons on the part of citizens and political elites is of outstanding importance for democratic legitimacy, there is a broad variety of approaches that spell out this basic idea in different ways. In contrast to previous generations of deliberative democracy research, the systemic approach aims to "reconnect deliberative democratic theory to its initial macro ambitions: to enhance and understand democracy at large scale" (Boswell & Corbett, 2017, p. 803). This point of departure has crucial implications for empirical evaluations of deliberative quality. First, we must differentiate assessments of the overall system's deliberative quality from assessments of an individual deliberation's quality, since "[a]pplying deliberative principles to evaluate particular instances of communication […] does not automatically translate to a concept that is useful in analysing and evaluating whole regimes or political systems" (Dryzek, 2009, p. 1382). Systemic scholars conceptualize "the 'deliberative quality' of a democracy as an 'emergent property' of the system as a whole"; accordingly, evaluations of a system's deliberative quality pose their own challenges (Fleuß et al., 2018, p. 13; Niemeyer, Curato, & Bächtiger, 2015). Second, a systemic approach to deliberative democracy implies that deliberation processes occur in a broad variety of deliberative sites, including for example parliaments, federal courts, civil society organizations, and pubs or kitchen tables (Mansbridge, 1999). Deliberations that occur in different sites fulfill varying functions for the system at large, and their quality may have to be evaluated by different standards (see Dryzek & Niemeyer, 2010, p. 7; Mansbridge et al., 2012, p. 22; also see Parkinson, 2006).
Classic deliberative theory conceptualizes deliberation as an approximation to Habermas's ideal speech situation (Habermas, 1990, 1996). This original conceptualization of deliberation, which has been termed "Type I deliberation" (Bächtiger, Niemeyer, Neblo, Steenbergen, & Steiner, 2010), characterizes deliberation as a process of strictly rational reason-giving that is restrained solely by the "forceless force of the better argument" and conforms with norms of mutual respect and reciprocity. The result to which it aspires is a consensual decision that realizes the common good (Goodin, 2018, pp. 884-885; Habermas, 1996). This concept also provides the basis for the most frequently applied measurement of deliberative quality, the Discourse Quality Index (DQI; Steenbergen, Bächtiger, Spörndli, & Steiner, 2003). This classic concept must be differentiated from tolerant ("Type II deliberation") concepts. Although scholars have proposed a variety of deliberation concepts that are summarized under this category, they share one fundamental feature: "Whatever the hue, type II deliberation involves a shift away from the idea of purely rational discourse toward a conception of deliberation that incorporates alternative forms of communication […]" (Bächtiger et al., 2010, p. 33). Systemic scholars tend to adopt a tolerant conception of deliberation that does justice to this heterogeneity of styles or forms, goals and sites of deliberation in real-world societies (Dryzek, 2000; Elstub et al., 2016; Mansbridge, 1999; Mansbridge et al., 2012).
In summary, systemic approaches to deliberation commit to three crucial claims (see Dryzek, 2000; Mansbridge et al., 2012; Owen & Smith, 2014):
Premise 1: Tolerant Conception of Deliberation. The deliberative practices that must be included in an evaluation of the overall deliberative quality of a (political) system are "deliberative" in a broad sense of the term: they are reason-giving processes between individuals or groups of individuals, although they do not necessarily conform with the rigid standards presupposed by early normative deliberative theory (Cohen, 1989; Habermas, 1990). The concept of deliberation, therefore, refers to different styles or forms of reason-giving which must be taken into account in measuring deliberative quality.
Premise 2: Heterogeneity of Deliberative Sites and Communication Styles. Deliberative systems are composed of a variety of deliberative sites (institutionalized and noninstitutionalized, decision making and non-decision making, online and offline) which may fulfill different functional roles within a complex political-societal system characterized by a division of labor (Christiano, 2012, p. 28; Kuyper, 2015; Mansbridge et al., 2012, pp. 2-3, 22-23). Deliberative practices that occur at different types of deliberative sites are expected to display different qualities. Accordingly, they are likely to differ with regard to their conformity with classic standards of deliberation (e.g., as measured by the DQI). They are also assumed to differ in terms of the forms of communication used in deliberative procedures: in different contexts of deliberative systems, deliberators are likely to use different communicative means, and to complement or substitute formal and rational "taking and giving of reasons" by rhetoric, narrative, and emotional expression.
From the perspective of systemic deliberative theories, communication flows, or "transmissions," between different deliberative sites (most importantly, from the system's periphery to its administrative decision-making core) are of outstanding importance to the quality of a deliberative system (Dryzek & Niemeyer, 2010, p. 11; Habermas, 1996). The analysis of the variances in deliberative quality between different deliberative sites conducted in this article constitutes a basis for a more comprehensive analysis of the quality of deliberative systems. This more comprehensive analysis would have to include an analysis of the interactions between different deliberative sites, as indicated by Premise 3 (see Fleuß & Helbig, forthcoming):
Premise 3: Interactive Relationships Between Deliberations in Different Sites. Within political-societal systems, deliberative and participatory practices interact with each other. Even though "[the] different settings are evaluated normally by different standards […], they are meant to fit together to make contributions to a process of collective decision-making" (Hendriks, 2016). Accordingly, the overall deliberative quality of a political system cannot be reduced to a mere accumulation of the deliberative quality of individual sites or spaces, but must include an analysis of their interactions.
Systemic approaches suggest applying a fine-grained conceptual framework to assess the vastly heterogeneous deliberative practices employed in contemporary polities and societies: on digital platforms and analogous fora, in innovative and "traditional" democratic institutions, in citizens' everyday communications and among government officials (see Boswell, 2013, p. 626). However, these conceptual possibilities are accompanied by a requirement to develop an empirically applicable framework that is suitable for reducing this overarching complexity. The next subsection, therefore, offers a systematization for the heterogeneity of deliberative procedures in political systems and allocates deliberations to three different arenas. This "map" then constitutes our conceptual starting point for deriving theory-driven hypotheses about deliberative quality in different online deliberation sites. The topic of all analyzed deliberation processes was immigration in Germany.

Mapping Out Deliberative Systems: Arenas of Deliberation and Their Characteristics
As mentioned previously, thinking about deliberative democracy in systemic terms is not an entirely new approach. The core idea can be traced "back to Habermas' (1996) notion of the dual-track model of deliberation" (Elstub et al., 2016, p. 143). Habermas conceptualized the socio-political system as being constituted by a political-administrative "center" and the "peripheries of discursive production": The center of the political system consists of the familiar institutions: parliaments, courts, administrative agencies and government. Each branch can be described as a specialized deliberative arena.
[…] At the periphery of the political system, the public sphere is rooted in networks for wild flows of messages-news, reports, commentaries, talks, scenes and images, and shows and movies with an informative, polemical, educational, or entertaining content. These published opinions originate from various types of actors-politicians and political parties, lobbyists and pressure groups, or actors of civil society. They are selected and shaped by mass-media professionals and received by broad and overlapping audiences, camps, subcultures, and so on (Habermas, 2006, pp. 415-416).
This basic understanding of the socio-political system is shared by most systemic scholars of deliberation (e.g., Dryzek, 2000; Mansbridge et al., 2012). In his broadly Habermasian approach, Dryzek distinguished between the public and the empowered spaces of deliberative systems (Dryzek, 2000; see also Dryzek & Niemeyer, 2010, p. 11). In democratic political systems, the public space is supposed to be the origin of the majority of democratic decision-making processes; it is "[…] ideally hosting free-ranging and wide-ranging communication, with no barriers limiting who can communicate, and few legal restrictions on what they can say" (Dryzek & Niemeyer, 2010, p. 11). The empowered space, by contrast, is "[…] home to deliberation among actors in institutions clearly producing collective decisions. Institutions here need to be formally constituted and empowered" and include deliberative sites or spaces such as parliaments, federal courts, cabinets or empowered stakeholders (Dryzek & Niemeyer, 2010, p. 11). This capacity to make collectively binding decisions is thus the primary criterion that distinguishes deliberations in empowered spaces from those in public spaces.
Dryzek's categorization has the advantage of being intuitive and easily applicable. The present article suggests a similar but adapted analytical framework. For the purposes of exploring the differences in deliberative quality in different deliberative sites throughout political systems, we need to balance the heterogeneity and complexity of deliberations in the real world's deliberative systems with the pragmatic need for a simplified and empirically feasible analytical framework. We refer to Conover and Searing (2005), who differentiated between three arenas of deliberation within deliberative systems, wherein arena 1 essentially coincides with Dryzek's (2000) empowered space and arenas 2 and 3 can be conceptualized as subcategories of Dryzek's public space:
(1) Arenas of highly formal deliberations that "occur within institutions such as national courts, parliaments, and civil science departments" (Conover & Searing, 2005, p. 270). Highly formal deliberations may also occur within certain subtypes of democratic innovations, for example, on government-run online consultation platforms.
(2) Arenas of semi-formal deliberations, such as "conversations between constituents and government officials, and conversations in political parties, interest groups, and the media" (Conover & Searing, 2005, p. 270).
(3) Arenas of informal deliberations, that is, the "less deliberative everyday discussions among political activists, attentive publics and general publics; a form of political talk that is essential to the system's democratic character" (Conover & Searing, 2005, p. 270). Such everyday discussions occur in a broad variety of deliberative sites, at kitchen tables and in pubs as well as on social media.
To provide the theoretical background for our hypotheses about the variations in deliberative quality that we expect to observe between arenas 1-3, we first require a more detailed characterization of these arenas. For this we refer to three criteria which can be extracted from the respective lines of argument in the theoretical writings of systemic scholars: Dryzek (2000, 2009, 2016), Habermas (1996), Owen and Smith (2014), Parkinson and Mansbridge (2012), and Smith (2016). Although it may be impossible to offer a comprehensive list of features that distinguish between the real-world deliberations that occur in a system's different arenas, we suggest the following three essential criteria to characterize arenas 1-3.
Deliberative Procedures' Functional Role Within a Political System. Deliberative procedures can fulfill different functions. Deliberations are supposed to facilitate domination-free opinion-forming processes and to enhance citizen support, as well as to create mutual respect or epistemic benefits (e.g., Alonso, 2017; Estlund, 1997; Geissel & Newton, 2012; Morrell, 2005; Setälä, Grönlund, & Herne, 2008). For the purpose of this article, we cannot consider all of these different functions, but we assume that a crucial function that distinguishes deliberative procedures in different arenas is their power to make collectively binding decisions (Dryzek & Niemeyer, 2010, p. 11; Fraser, 1990, p. 75). Deliberations occurring in arena 1 have a constitutionally fixed power to make collectively binding decisions. In contrast, deliberations in arena 2 are primarily meant to contribute to opinion- and will-formation and to the aggregation of political interests, thereby preparing political decision making without being able to implement laws themselves. Deliberations in arena 3 have an even more mediated (although not necessarily less relevant) impact on collectively binding decision-making processes: being located at the periphery of the system, they are meant to enable citizens to detect and form authentic political preferences in their everyday communication practices (also see Curato, Hammond, & Min, 2018, p. 99). These deliberations (and, more generally speaking, the communicative exchanges taking place in citizens' "lifeworld") enable citizens to form and articulate political opinions and therefore constitute a crucial resource for decision-oriented deliberations without having any immediate impact on the decisions themselves (Habermas, 1984, 1996; see also Fleuß, 2019, p. 128f.). Scholars may determine this role of a deliberative procedure in democratic decision making primarily by reference to constitutional documents and legal regulations: Is the respective deliberation ascribed a function within legislative procedures, either in the constitution or in other high-ranking laws or, for example, in founding documents? Do these documents ascribe a function in opinion- and will-formation to the deliberation?
The Inclusiveness of the Deliberative Procedure. Deliberations in arenas 1-3 differ with regard to the individuals who (typically) participate. Decision-oriented deliberations in arena 1, that is, in the system's center, tend to be dominated by educated, informed deliberators, such as parliamentarians, judges and other (semi-)professionals with comparatively high socioeconomic statuses. Deliberations in arenas 2 and 3 involve successively broader ranges of citizens (e.g., Conover & Searing, 2005, p. 278; Cook, Carpini, & Jacobs, 2007, p. 13f). In consequence, real-world deliberative procedures (located in arenas 1-3) differ significantly with regard to their participants' social backgrounds, education levels, genders, cognitive capacities, social skills and values (Conover & Searing, 2005, p. 278; Cook et al., 2007; Michels, 2011). We suggest classifying deliberative sites with regard to this criterion by answering the following questions: Who is authorized to participate in the respective deliberations? For example, are only officials, members of certain professions, or members of specific organizations admitted? If there are no formal restrictions on the groups admitted, scholars should assess whether other, informal mechanisms hinder people from accessing a deliberation, such as (comparatively) high inhibition thresholds for participating in public plenary debates.
Degree of Institutionalization/Regulatory Density. Deliberations in arenas 1-3 display crucial differences with regard to the degree of regulation through formal rules, behavioral and social conventions, or moderation. While arena 1 deliberations, such as parliamentary procedures, are usually structured by formal procedural rules, mass media communications and civil society deliberations (arena 2) are far less dominated by procedural norms (Bächtiger & Parkinson, 2018, p. 66; Habermas, 2018). Arena 3 deliberations, which take place at kitchen tables, in pubs and, generally speaking, in citizens' "life-world," are prime examples of deliberations that lack any pre-structured procedural rules or a homogeneous set of established behavioral conventions (Habermas, 1996, p. 354). In classifying deliberative sites according to this criterion, scholars must take into account different kinds of "procedural norms" to answer the following questions: To what extent are deliberations regulated by statutory procedural requirements? Are there rigid social conventions and expectations that might be relevant to participants, for example, in terms of what "can be said" in public? And is the deliberative procedure moderated such that an external actor can effectively intervene if procedural, behavioral or social norms are violated?
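The three criteria can be read as an ordinal profile for each arena. The following Python sketch records one such reading; it is an illustration under stated assumptions, not the authors' coding scheme. The three-step `Level` scale, the short role labels, and the helper `closest_arena` are hypothetical simplifications introduced here, and the nearest-profile rule is a toy device that cannot replace the qualitative questions listed for each criterion.

```python
from dataclasses import dataclass
from enum import IntEnum


class Level(IntEnum):
    """Coarse ordinal scale (an assumption made for this sketch)."""
    LOW = 1
    MEDIUM = 2
    HIGH = 3


@dataclass(frozen=True)
class ArenaProfile:
    name: str
    functional_role: str       # criterion 1: role in decision making
    inclusiveness: Level       # criterion 2: breadth of participation
    regulatory_density: Level  # criterion 3: rules, conventions, moderation


# Profiles paraphrased from the three criteria described in the text.
ARENAS = {
    1: ArenaProfile("formal", "collectively binding decisions",
                    Level.LOW, Level.HIGH),
    2: ArenaProfile("semi-formal", "opinion- and will-formation",
                    Level.MEDIUM, Level.MEDIUM),
    3: ArenaProfile("informal", "everyday preference formation",
                    Level.HIGH, Level.LOW),
}


def closest_arena(inclusiveness: Level, regulatory_density: Level) -> int:
    """Return the arena number whose ordinal profile is nearest to a site.

    Hypothetical helper: it compares only the two ordinal criteria and
    resolves ties in favor of the lower arena number.
    """
    def distance(profile: ArenaProfile) -> int:
        return (abs(profile.inclusiveness - inclusiveness)
                + abs(profile.regulatory_density - regulatory_density))

    return min(ARENAS, key=lambda number: distance(ARENAS[number]))
```

Under these assumptions, a site coded as broadly inclusive and weakly regulated, `closest_arena(Level.HIGH, Level.LOW)`, falls into arena 3, whereas a restricted and highly regulated site falls into arena 1.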

Hypotheses: Deliberative Quality in Different System Arenas
Research Question 1 asks how the general deliberative quality of online participatory practices varies between different deliberative arenas. One general, but rather vague, expectation among systemic deliberative theorists is that deliberations in different arenas of a system are likely to display different qualities (see Premises 1-3). Federal court and parliamentary deliberations are prime examples of procedures that tend to conform to the high standards of classic deliberative theory, such as rationality and reciprocity (Bächtiger & Parkinson, 2018, p. 66; Christiano, 2012; Habermas, 2018; Jensen, 2003, pp. 349, 371f.; Pedrini, 2014; Pedrini, Bächtiger, & Steenbergen, 2013; Rawls, 1997; Steiner, Bächtiger, Spörndli, & Steenbergen, 2004). By contrast, informal political debates, for example in social media, are expected to deviate from these standards (Boswell & Corbett, 2017; Conover & Searing, 2005, p. 278; Mansbridge, 1999; Pedrini, 2014). Nevertheless, existing empirical findings of differing DQI scores among different deliberative procedures have explored neither the variety of functions of the deliberative procedures in different contexts nor the different forms of communication used to fulfill these functions. This article, therefore, analyzes two interpretations of this general claim, in reference to the two concepts of deliberation (as alluded to in the previous section):
(a) The classic concept of deliberation. Deliberative procedures in different arenas tend to fulfill, to different degrees, deliberative theories' classic standards for high-quality deliberation: while highly formalized deliberations in federal courts or constituted deliberative councils can be expected to conform with the standards of rational reason-giving spelled out in first-generation deliberative theory, procedures in semi-formal and informal arenas can be expected to perform less well.
(b) The tolerant concept of deliberation. Deliberations that occur in the different arenas differ in both their "deliberative performance" and in the forms of communication used by their participants: it is not only that deliberations in semi-formal and informal settings are "deficient" in their reason-giving procedures, but that their participants also use different styles of reasoning, for example by including emotional expression, storytelling, and humor (Mansbridge et al., 2012; Young, 2002).
Our characterization of deliberative arenas (see Table 1) allows us to derive more fine-grained and theory-driven expectations of the deliberative quality found in the different deliberative procedures of arenas 1-3. Our hypotheses summarize these expectations and are based on our characterizations of the arenas, that is, their respective functions within the political system, their inclusiveness, and their degrees of institutionalization. The rest of this subsection, therefore, lays out seven empirically testable hypotheses.
At the highest level of abstraction, the theoretical expectation spelled out in claim (a) can be summarized by the following hypothesis:
H1: The highest level of deliberative quality, according to a classic concept of deliberation, will be found in arena 1, the lowest level will be found in arena 3; arena 2 will reach a medium level.
Research Questions 1.1 and 1.2 go into further detail with regard to the underlying concept of deliberation: RQ1.1 asks about variations in deliberative quality between the arenas according to a classic concept of deliberation. RQ1.2, on the other hand, focuses on differences in the proportion of alternative forms of communication.
To provide an answer to these questions, and with that a more differentiated assessment of these variations that we expected to predominate in arenas 1-3, we next seek to test six additional hypotheses (H2-H7).
Although the DQI has received significant attention and has been used in a number of studies, its overall index is not entirely suitable for this study since it was developed to analyze parliamentary debates. It, therefore, requires some adaptions to be applicable in online contexts. For the purposes of this study, we considered five classic standards of deliberative quality grouped into four dimensions-rationality, reciprocity, respect, and constructiveness-that are widely shared as elements of deliberation among deliberative theorists (Cohen, 1989;Gutmann & Thompson, 2009;Habermas, 1996). Furthermore, we also included alternative forms of communication that are elements referred to in

Rationality
A crucial measure of the quality of deliberation is the rationality displayed by its participants. We operationalized "rationality" here through the topic relevance of participants' claims and the presence of supporting arguments. From a theoretical point of view, we expected that deliberations in arena 1 would display the highest degree of topic relevance and deliberations in arena 3 the lowest, with arena 2 reaching a medium level. This expectation is based on arguments frequently advanced by democratic theorists related to characteristic (a), the procedure's functional role, and characteristic (b), its inclusiveness. First, we assume that discussions in deliberative procedures that are closely tied to making collectively binding decisions are likely to display a high degree of topic-relevant communication (characteristic a). To fulfill their decision-making function, parliamentary deliberations and formalized consultation platforms require participants to display a certain level of topic orientation (see Stromer-Galley & Martinson, 2009). Conversely, this assumption also suggests that the degree of topic relevance is likely to decrease when the deliberative procedures are not focused on making collectively binding decisions but instead fulfill other functions, for example, the inclusion of diverse preferences and establishment of mutual respect (Mansbridge et al., 2012, p. 22).
Second, we assume that the argument quality scores would differ between arenas, in particular, because of their different levels of inclusiveness (characteristic b). Deliberations in arena 1, such as parliamentary deliberations, tend to be less inclusive than semi-formal public deliberations in mass media (arena 2), let alone in everyday political discussions (arena 3). Systemic scholars have adopted an argument that was initially advanced by deliberative theory's feminist critics, and now largely agree (e.g., Curato et al., 2018, pp. 43-45) that classic standards for rational reason-giving tend to be fulfilled by certain groups of political actors, because "[t]he privileging of allegedly dispassionate speech styles […] often correlates with other differences of social privilege. The speech culture of white, middle class men tends to be more controlled, without significant gesture and expression or emotion" (Young, 2002, pp. 39, 56 passim). "Reasoned argumentation," accordingly, is expected to occur among well-educated (political) elites rather than in deliberations in the broader public sphere whose participants are comprised of actors from heterogeneous social, cultural, and educational backgrounds.
Since these theory-driven expectations of topic relevance and argumentation point in the same direction, we can summarize our expectations concerning the level of rationality in the following hypothesis:
H2: The highest level of rationality will be found in arena 1 and the lowest level will be found in arena 3; deliberative practices in the semi-formal arena (arena 2) will reach a medium level in this dimension.
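How an ordered expectation of this kind might be checked against coded comments can be sketched as follows. The sketch assumes that each comment has been hand-coded with two binary indicators (topic relevance and presence of a supporting argument) and that a comment's rationality score is the mean of the two; this scoring rule, the function names, and the toy data are illustrative assumptions and are not taken from the study's codebook or results.

```python
from collections import defaultdict


def rationality_by_arena(coded_comments):
    """Average a per-comment rationality score within each arena.

    coded_comments: iterable of (arena, topic_relevant, has_argument)
    tuples, where arena is 1, 2, or 3 and the indicators are booleans.
    The per-comment score is the mean of the two indicators, which is
    an assumption of this sketch.
    """
    sums = defaultdict(float)
    counts = defaultdict(int)
    for arena, topic_relevant, has_argument in coded_comments:
        sums[arena] += (int(topic_relevant) + int(has_argument)) / 2
        counts[arena] += 1
    return {arena: sums[arena] / counts[arena] for arena in counts}


def matches_h2_ordering(scores):
    """Check the descriptive pattern arena 1 > arena 2 > arena 3."""
    return scores[1] > scores[2] > scores[3]


# Invented toy data for illustration only; these are not study results.
TOY_COMMENTS = [
    (1, True, True), (1, True, False),
    (2, True, False), (2, False, False),
    (3, False, False), (3, False, False),
]

toy_scores = rationality_by_arena(TOY_COMMENTS)
print(toy_scores, matches_h2_ordering(toy_scores))
```

A check of this kind only describes the ordering of arena means; whether the differences between arenas are statistically reliable would require an inferential test, which this sketch does not attempt.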

Reciprocity
High-quality deliberation requires participants not only to take and give reasons, but also to connect and relate their arguments and claims to other participants' contributions (e.g., Gutmann & Thompson, 2002). At first glance, the extent to which deliberators refer to each other's comments does not seem to be closely associated with the deliberative procedures' function (a), inclusiveness (b), or regulatory density (c); rather, it seems likely to depend on institutional features, such as a platform's design or technical features (see e.g., Aragón, Gómez, & Kaltenbrunner, 2017), or on the nature of the topic discussed (Ziegele et al., 2014).
Although these features are not included in our characterization of deliberative arenas, the decision-making capacity of a deliberation (characteristic a), as well as the applicable formal rules and informal conventions (characteristic c), provide incentives for deliberators to refer to other participants' comments: in the case of deliberations with the power to make collectively binding decisions, deliberators are incentivized to create consensus, and referencing other speakers' contributions can help achieve this aim. This is also true regarding the impact of different degrees of regulatory density: the conventional and legal rules that structure deliberative procedures typically suggest and incentivize a communicative practice of reciprocity. We, therefore, stipulate that deliberative procedures characterized by explicit decision-making capacity and a higher degree of regulatory density will display higher levels of reciprocity: H3: Deliberations in arena 1 will display the highest level of reciprocity and deliberations in arena 3 will display the lowest level of reciprocity, with arena 2 reaching a medium level.

Respect
With regard to the degree of respectful communication in different arenas, deliberative scholars often assume that respectful and empathetic communication is easier among people who are alike (e.g., Mutz & Martin, 2001; Mutz, 2006, p. 106). Although creating respect and consensus across difference is a crucial goal of democratic deliberation (Dryzek, 2005; Ercan, 2017), a respectful style of communication is usually expected among homogeneous social groups and groups united by a common interest (Rosenberg, 2007, p. 355). The three arenas' characteristics do not explicitly refer to the level of group homogeneity or social integration among their participants; nevertheless, the arenas do differ with regard to their inclusiveness (characteristic b), and social, sociodemographic, and economic inclusiveness can reasonably be expected to be negatively correlated with participants' homogeneity. Therefore, on the basis of characteristic b, we can expect that the level of respect displayed in deliberations is likely to decrease sequentially from arena 1 to arena 3. 12 In addition, the three arenas are characterized by differing regulatory density (c): while arena 1 deliberations are highly regulated by both legal norms and specific behavioral conventions, this regulatory density decreases in arena 2 and again in arena 3. Consequently, we can expect that socially desirable conduct (that is, avoiding aggressive communication) is most likely to be displayed in arena 1, with arenas 2 and 3 being progressively less likely to be dominated by respectful communication. This hypothesis is summarized as follows: H4: The level of respect displayed in deliberations will be highest in arena 1 and lowest in arena 3, with deliberations in arena 2 reaching a medium level.

Constructiveness
Habermas's (1984) original account of communicative action assumed that consensus-oriented communication predominantly occurred when actors' rationales and behavior were not guided by power struggles or strategic considerations. At the same time, a deliberative procedure's constructiveness (that is, the proposal of solutions or compromises) is primarily promoted by two of its characteristics: its function (characteristic a) and its regulatory density (characteristic c). For example, procedural regulations may require deliberators (e.g., in parliamentary procedures or in federal courts) to reach consensus or compromise. However, the remaining characteristic, inclusiveness, could also plausibly affect constructiveness: since highly divergent initial preferences can make it harder (and less likely) to reach consensus, we hypothesize that deliberations among participants with homogeneous backgrounds, attitudes, and preferences (see characteristic b) may benefit from these starting conditions. In line with the other characteristics' potential impacts, this would suggest that more inclusive deliberative procedures might, in fact, display lower levels of constructiveness. Based on our characterization of the three arenas, we, therefore, formed the following hypothesis: H5: The level of constructiveness will be highest in arena 1 and lowest in arena 3, with deliberations in arena 2 reaching a medium level.
The five hypotheses (H1-H5) developed so far refer essentially to indicators that are also assessed by classic DQI measurements (Steenbergen et al., 2003). However, as outlined above, a systemic approach to deliberation acknowledges the plurality and heterogeneity of the different forms of communication that are expected to be present to different degrees in different arenas of a deliberative system. In this article, we, therefore, refer to additional measurements of deliberative quality that systemic scholars frequently take into account: storytelling, and the expression of positive and negative emotions.

Storytelling
Employing narratives, or personal storytelling, to justify a position is frequently acknowledged as a way of engaging in deliberation (e.g., Dryzek, 2000; Polletta & Lee, 2006; Young, 2002). However, theorists typically expect storytelling to occur more in informal than in highly formal settings. From the systemic perspective that forms our point of departure, deliberations in arena 3 are not intended to create collectively binding decisions but rather to help citizens identify and form authentic preferences (characteristic a). These deliberations tend to be located in everyday "lifeworld" settings and constitute a "resource" for public opinion- and will-formation in civil society organizations or mass media communications (arena 2) (Fleuß, 2019, p. 128f.). These different functional roles suggest that citizens in arenas 2 and 3 are more likely to rely on situated communication, that is, to employ narrative as a communicative means. In addition, deliberations that are less regulated by procedural norms are more open (and therefore more prone) to the occurrence of narratives (characteristic c). Moreover, narrative communication is generally considered to be more frequently found among less educated, less professionalized deliberators (Polletta & Gardner, 2018). Taken together, these assumptions about the factors promoting (or hindering) a narrative deliberative style suggest the following hypothesis: H6: Deliberations in arena 3 will display the highest degree of storytelling, with successively lower degrees in arenas 2 and 1.

Expressions of Emotion
Systemic approaches to communication acknowledge the contributions of a variety of forms of communication to systems' overall deliberative quality. Accordingly, these approaches are able to answer the challenges of difference democrats such as Iris Marion Young, who claim that the "privileging of allegedly dispassionate speech styles" is likely to give an advantage to Western, male, and highly educated political and social elites (Polletta & Gardner, 2018; Sanders, 1997; Young, 2002, p. 39). Arenas that are not limited to such elites but include a more pluralistic group of deliberators are therefore more likely to include expressions of emotion (characteristic b); since arenas 2 and 3 are more inclusive than arena 1, expressions of emotion can be expected to occur more often in arenas 2 and 3. In addition, the respective arenas' functions and their corresponding procedural rules may suggest that arena 3 is more prone to the occurrence of emotional expressions, since everyday political talk often refers to private or personal matters and is situated in contexts where deliberators also have personal relationships with each other (Habermas, 1996, p. 354). At the same time, mass media deliberations may also be expected to display a high share of emotional expressions, although for a different reason: news value theory suggests, in brief, that journalists select topics with a higher potential to evoke emotion, which tends to attract greater attention from readers or viewers (e.g., Eilders, 1997). Accordingly, we formulated our seventh and final hypothesis: H7: Deliberations in arenas 2 and 3 will display a significantly higher degree of emotional expression than deliberations in arena 1.

Methodology
This section explains how we tested our hypotheses regarding the deliberative quality and the presence of storytelling and emotional expressions in sample online deliberations that we assign to arenas 1-3. 13 For arena 1, we analyze a government-run online consultation platform that was supposed to perform an advisory function for legislative procedures (ThF-Gesetz, 2014, section 4) (characteristic a). 14 It requires participants to be formally enrolled (characteristic b) and provides comparatively precise guidelines for the deliberative procedure (characteristic c).
For arena 2, we analyze comments that were posted under news articles on three official German mass media platforms. German Basic Law generally ascribes mass media a significant role in opinion- and will-formation (Basic Law, 2019, section 5; see Sarcinelli, 2011, pp. 62-64) (characteristic a). As the debates took place on the websites of elite national news sources, we take inhibition thresholds for participating to be higher than in arena 3 deliberations (characteristic b). In addition, moderators were able to intervene in these debates, for example, by deleting "inappropriate" contributions (characteristic c). For arena 3, we analyze communications on a civil society Facebook community page. These communications were not closely associated with democratic decision-making procedures, but primarily enabled citizens to detect and form authentic preferences (characteristic a). These Facebook debates were openly accessible (characteristic b) and were not moderated (or otherwise "regulated") (characteristic c).
Although further research will have to establish whether our findings generalize to all arenas 1-3 in deliberative systems, our research design is useful for an exploratory analysis as it allows us to hold constant a range of variables: all of the deliberations dealt with the broader topic of refugee integration (and possible problems thereof) in Germany; the topic, and its associated complexity and level of controversy, were therefore constant across all cases, although the platforms' primary functions, inclusiveness, and regulatory density differed across the cases (see Table 1). The news articles discussed different aspects of this topic, but, as with the other two arenas, the comments revolved more narrowly around a forthcoming political decision to accommodate refugees on the Tempelhofer Feld, 15 subject to participatory planning processes conducted in Berlin in 2015.
The systemic approach to deliberation suggests that the deliberative quality of the online discussions in each of the arenas should vary, with regards to both their overall quality and their performances in the different sub-measures of quality outlined above. To test our hypotheses, we began by conducting a quantitative content analysis of all user comments in our sample deliberations. The final sample used for data analysis included comments on 47 posts on the consultation platform, 10 news articles across the three mass media platforms and 42 posts on Facebook. The selected news articles were published in December 2015 on three German news media websites, Süddeutsche Zeitung (SZ) Online, Welt Online, and Zeit Online, which are elite national news sources that are widely considered to be opinion leaders in Germany (Jarren & Vogel, 2011), and their online platforms are listed among the most popular German online resources (AGOF, 2015).
The first step of the sampling process consisted of the random selection of 10 news articles, from which all 1,747 user comments were copied, entered into a database and numbered chronologically. In the second step, up to 100 sequential comments were randomly selected from each of the 10 articles, leading to a total sample of 794 comments to be analyzed in arena 2. We also collected all of the user comments on the 47 posts from the consultation platform (arena 1; N = 603 comments) and on the 42 posts on the Facebook community page (arena 3; N = 767 comments). Ultimately, a final sample of 2,164 user comments was analyzed across the three platforms.
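The second sampling step can be illustrated with a short sketch. The text does not specify how the run of sequential comments was positioned within each article, so the sketch assumes a uniformly random starting point followed by a contiguous run; the function name `sample_sequential` and its parameters are hypothetical and not taken from the study.

```python
import random


def sample_sequential(comments, max_run=100, rng=None):
    """Return up to `max_run` consecutive comments from one article.

    Assumption (not stated in the article): the run starts at a
    uniformly random position and articles with fewer than `max_run`
    comments are kept in full.
    """
    rng = rng or random.Random()
    if len(comments) <= max_run:
        return list(comments)
    # Pick a start so that a full run of `max_run` comments still fits.
    start = rng.randrange(len(comments) - max_run + 1)
    return list(comments[start:start + max_run])


# Sample sizes reported in the text for arenas 1, 2, and 3.
ARENA_SAMPLE_SIZES = {"arena_1": 603, "arena_2": 794, "arena_3": 767}
TOTAL_SAMPLE = sum(ARENA_SAMPLE_SIZES.values())
```

Under this reading, an article with fewer than 100 comments contributes all of them, which is one way the ten articles could yield 794 rather than 1,000 comments. The three reported arena samples sum to the reported total of 2,164 comments.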
We considered four dimensions of deliberative quality (rationality, reciprocity, respect, and constructiveness) as well as the alternative forms of communication of storytelling and expression of emotion (Table 2). All variables were coded dichotomously. The comments were analyzed by a team of five trained coders. The intercoder reliability was tested for all analyzed categories and a Krippendorff's α value of 0.67 was considered the minimum value to allow tentative conclusions (Krippendorff, 2004). All variables had a Krippendorff's α > 0.70. In order to be able to compare the three analyzed arenas regarding the classic concept of deliberative quality, we applied a revised version of the DQI (Steenbergen et al., 2003). Like the DQI, the index we used is a theory-driven additive index based on discourse ethics (Habermas, 1990, 1996). For a comparative analysis of online deliberative arenas, we computed a five-component index measuring classic deliberative quality with the indicators topic relevance, argumentation, reciprocity, respect, and constructiveness. The index values ranged from 0 to 5, with 5 indicating the highest level of classic deliberative quality.
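As a minimal sketch of how such an additive index can be computed, the following assumes that each comment is represented as a mapping from indicator names to dichotomous codes (0 or 1). The field names and the function `quality_index` are hypothetical labels chosen for illustration; only the five components and the 0 to 5 range come from the text.

```python
# The five components named in the text; the identifiers are hypothetical.
INDEX_COMPONENTS = (
    "topic_relevance",
    "argumentation",
    "reciprocity",
    "respect",
    "constructiveness",
)


def quality_index(coded_comment):
    """Sum the five dichotomous codes of one comment into a 0-5 score."""
    values = [coded_comment[name] for name in INDEX_COMPONENTS]
    if any(value not in (0, 1) for value in values):
        raise ValueError("all index components must be coded 0 or 1")
    return sum(values)


# Illustrative comment (invented codes, not data from the study).
example = {
    "topic_relevance": 1,
    "argumentation": 1,
    "reciprocity": 0,
    "respect": 1,
    "constructiveness": 0,
    "storytelling": 1,  # coded, but not part of the classic index
}
example_score = quality_index(example)  # 3
```

Storytelling and expressions of emotion are coded in the same dichotomous way but are deliberately left out of the sum, since the index is meant to capture only the classic concept of deliberative quality.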

Findings
Drawing on our theoretical framework, we extracted expectations about the level of deliberative quality of each of the different online deliberation procedures in arenas 1-3. First, we tested H1, which expected to find a systematic variation in the level of deliberative quality across the three arenas. Our data supported this hypothesis: the highest overall deliberative quality was associated with the deliberations allocated to arena 1 (M = 3.08; see Table 2), a lower level of deliberative quality was associated with arena 2 (M = 2.88), and the lowest level was in arena 3 (M = 2.45). We used t-tests to compare the means of the three arenas pairwise and found that the mean score for arena 1 was significantly higher than the mean score for arena 2 (t = 4.02, p < 0.001) and the mean score for arena 2 was significantly higher than the mean score for arena 3 (t = 8.90, p < 0.001). Accordingly, we can confirm H1.
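The pairwise comparison of arena means can be sketched as follows. The text does not state which variant of the t-test was used, so the sketch assumes Welch's unequal-variance statistic, t = (m1 - m2) / sqrt(s1²/n1 + s2²/n2); the function name `welch_t` and the index scores below are invented for illustration and are not data from the study.

```python
from math import sqrt
from statistics import mean, variance


def welch_t(sample_a, sample_b):
    """Welch's t statistic for two independent samples.

    Assumption: unequal variances are allowed; `variance` is the
    sample variance (n - 1 in the denominator).
    """
    standard_error = sqrt(
        variance(sample_a) / len(sample_a)
        + variance(sample_b) / len(sample_b)
    )
    return (mean(sample_a) - mean(sample_b)) / standard_error


# Hypothetical index scores (0-5) for a handful of comments per arena.
scores_arena_a = [3, 4, 3, 2, 4]
scores_arena_b = [2, 2, 3, 1, 2]
t_value = welch_t(scores_arena_a, scores_arena_b)
```

A positive value indicates that the first sample has the higher mean. Turning the statistic into a p-value additionally requires the t distribution with the appropriate degrees of freedom, which a statistics package would normally supply.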
RQ1.1 also sought to understand the differences between arenas 1-3 for each of the four dimensions of classic measurements of deliberative quality, and the previous section laid out our hypotheses for each of these dimensions across the three arenas. We first focused on the dimension of rationality. In line with the differences in the degree of formalization of the deliberative arenas, H2 expected that the highest level of rationality would be found in arena 1 and the lowest level would be found in arena 3. As shown in Table 3, our results confirmed exactly this pattern regarding the presence of argumentation in the deliberations associated with each arena in our sample: the highest level of argumentation was found in arena 1 (68 percent), followed by arena 2 (46 percent), and finally arena 3 (36 percent). However, for our second rationality measure, topic relevance, we found a slightly different pattern: very high levels of topic relevance in arenas 1 (99 percent) and 3 (95 percent) and a medium level of topic relevance in arena 2 (75 percent). H2 was therefore only partly supported.
In addition, the reciprocity levels of the three arenas demonstrated a different pattern from the one associated with the other deliberative quality characteristics (H3). We found that reciprocity levels were highest on the mass media platforms (arena 2), with 77 percent of comments addressing other comments. Lower levels of reciprocity were found in the deliberations on the citizen consultation platform (arena 1, 37 percent) and on Facebook (arena 3, 29 percent). H3 was therefore rejected. H4 expected that the level of respect would decrease from arena 1 to 3. Our data confirmed this assumption, with the highest proportion of respectful comments found on the highly regulated consultation forum (99 percent), followed by the mass media platforms (87 percent), and Facebook (83 percent). Therefore, our findings support H4.
Table note: The index is a revised version of the Discourse Quality Index (Steenbergen et al., 2003) including five components (topic relevance, argumentation, reciprocity, respect, and constructiveness); the minimum score for the index is 0 and the maximum is 5.
H5 anticipated that the level of constructiveness would decrease from arena 1 to 3. The data shows another picture: compared to the other two arenas the highest level of constructiveness was found in the consultation platform (5 percent), a lower level was found on the mass media platforms and on Facebook (both 3 percent). However, the low percentages suggest that constructiveness is a rare phenomenon in all three analyzed arenas. Consequently, H5 has to be rejected.
With regard to alternative forms of communication, our hypotheses H6 and H7 expected that the different arenas would display different degrees of storytelling and emotional communication. In H6 we expected that deliberations in arena 3 would display the highest degree of storytelling compared to arenas 1 and 2. The data instead showed that storytelling was highest in arena 1 (31 percent), followed closely by arena 3 (28 percent) and lowest in arena 2 (3 percent). H6 was therefore rejected.
Concerning H7, we analyzed expressions of positive and negative emotions separately. The pattern for positive emotions was very similar to that found for storytelling (H6): as shown in Table 4, the highest proportion of expressions of positive emotions was found in arena 1 (18 percent), followed by arena 3 (10 percent), and the lowest in arena 2 (2 percent). In contrast, the highest proportion of expressions of negative emotions was found in arena 3 (48 percent), followed by arena 2 (23 percent), and the lowest in arena 1 (12 percent). Consequently, our findings support H7 only with regard to the expression of negative emotions, not positive emotions.

Summary and Discussion
By comparing and evaluating examples of online deliberation, this study employed a systemic approach as a conceptual point of departure to analyze deliberative democracy (Dryzek, 2000;Mansbridge et al., 2012). We used the plurality of deliberative sites (Ercan & Dryzek, 2015) to test the assumption that the deliberative quality of communication varies between the different types of deliberative arenas within a deliberative system (Conover & Searing, 2005). For our empirical analysis, we distinguished between three deliberative arenas that were characterized by (a) their functions within the deliberative system, (b) their inclusiveness, and (c) their degree of regulatory density. These core characteristics of deliberative arenas served as a basis for our hypotheses H1-H7.
In our first hypothesis (H1), we expected to find varying levels of deliberative quality (according to a classic concept of deliberation). Our empirical findings based on a quantitative content analysis of a consultation platform (arena 1), three mass media platforms (arena 2), and a social media community page (arena 3) broadly supported this. The overall level of deliberative quality was highest on the online consultation platform, while the mass media platforms showed a moderate level of deliberative quality, and the social media platform showed the lowest level of deliberative quality. H1 was accordingly confirmed.
However, regarding the different dimensions of deliberative quality, we found some support for our hypotheses but also some unexpected deviations. For instance, as we had expected, the levels of rationality (H2) and respect (H4) were highest in arena 1 (the consultation platform), followed by arena 2 (mass media platforms) and finally arena 3 (social media platform). However, the level of reciprocity (H3) was unexpectedly higher in the user discussions on the mass media platforms (arena 2) than on the consultation platform (arena 1), although, as expected, it was lowest on the social media platform (arena 3). Furthermore, we found relatively low levels of constructiveness in all arenas (H5), and although the highest level of constructiveness did occur on the consultation platform, none of the differences between the three arenas were significant. H5 was therefore rejected. In sum, the results met our expectations in the dimensions of rationality and respect, but reciprocity and constructiveness demonstrated unanticipated patterns.
RQ1.2 addressed variations in the amount of personal storytelling and of expressions of emotion in arenas 1-3, and our hypotheses H6 and H7 assumed that deliberations in less formal arenas were generally more likely to display high degrees of these alternative forms of communication. The results did not conform to our theoretical expectations. H6 assumed that arena 3 would display the highest incidence of storytelling; instead, although a high number of personal accounts were found on Facebook, the consultation platform showed an even higher proportion. H7 anticipated higher levels of emotional expression overall in arenas 2 and 3, but our results showed different patterns for positive and negative emotions. The highest proportion of positive emotions was found in arena 1 (consultation), followed by arena 3 (social media), and finally by arena 2 (mass media). However, the expression of negative emotions was in line with our hypothesis: the highest proportion of negative emotions was found on Facebook, a medium-high proportion was found on mass media platforms, and the lowest proportion was found on the consultation platform.
Against the backdrop of deliberative democracy theory, these results leave us with an ambivalent picture: We were able to confirm some of our theoretical expectations that were based on our systematization of deliberative arenas and their characteristics, with regards to variances in specific aspects of deliberative quality. However, we were not able to confirm all hypotheses and even our supportive results still need to be confirmed in further studies, including other examples of online deliberation practices and sites. The varying empirical levels of reciprocity remain a challenge for scholars working with a classic understanding of deliberation. As in this study, previous studies have found surprisingly high levels of reciprocal comments on mass media websites compared with discussion forums or consultation platforms (see Esau et al., 2017). Different factors could be responsible for the high amount of reciprocal user comments beneath news articles. One possible direction for future studies would be a better understanding of how the deliberative quality of journalistic articles influences reciprocity in comment sections. Finally, our theoretically grounded expectation of a greater occurrence of emotional expressions and personal storytelling in the more informal arenas (i.e., arenas 2 and 3) was not confirmed by our empirical analysis.
One major limitation of the study concerns the use of an aggregate index that combines different measurements of deliberative quality. Previous comparative research has shown that deliberation processes are not unidimensional phenomena (also see Bächtiger, Gerber, & Fournier-Tombs, forthcoming). Among other problems arising from this, index-based comparisons that aggregate different deliberative quality criteria have to be treated carefully and can therefore serve only as a rough benchmark. While the topic was held constant (integration of refugees in Germany), we were not able to control the framing of the topic. Possible variations due to different frames are not shown in this study and should be considered in future studies.
Our results could be explained by the unique characteristics of the topic under discussion. Immigration policy is a highly controversial subject that may be especially prone to negative emotional communication. Moreover, even though the overall discussion topic remained the same throughout the different arenas, slight framing differences may have had outsized effects. In addition, our theoretical explanation for the varying degrees of constructiveness that we anticipated in arenas 1-3 relied in part on the assumption that participants in arenas 2 and 3 would come from more heterogeneous backgrounds than participants in arena 1 (see H5). However, the social media community page chosen for analysis in our empirical study did not conform to this assumption because its commenters largely started with a shared political goal. Further research is also needed to examine the impact that online deliberations have on the communicative means applied by subsequent deliberators.
Our research design and the case selection allow us to hold constant a broad range of variables that are usually taken to impact deliberative quality. Therefore, they allow tentative conclusions with regard to the impact of the arenas' characteristics on deliberative quality. As we are testing general theory-driven hypotheses with a small number of cases, however, our findings do not put us in a position to conclusively confirm systemic deliberative theory or to suggest far-reaching adaptations of it. 16 In addition, empirical examinations of Premises 1 and 2 (as conducted in this study) are necessary, but not sufficient for a comprehensive systemic analysis. By the same token, recommendations for improving a system's deliberative quality presuppose the analysis conducted in this article, but must also consider the interplay between different deliberative sites (Premise 3, see Fleuß & Helbig, forthcoming). 17 Nonetheless, the findings of this study enable us to pinpoint some crucial gaps in contemporary deliberative theory that should be addressed in future collaborations of theoretical and empirical researchers. Systemic scholars seek to provide an integrative analytical framework capable of analyzing and evaluating a broad range of communicative practices that employ both rational reason-giving and alternative forms of communication (see Premise 1). However, the precise role of emotional communication in different deliberative arenas has not yet been sufficiently analyzed from the perspective of the systemic approach. Integrating empirical findings, systemic scholars might stipulate that emotional communication fulfills different functions for democratic decision making in different arenas of a deliberative system. For example, negative emotions in the sample deliberations were often used to mobilize likeminded readers to also participate in the discussion, whereas positive emotions were primarily used in support of other participants' statements.
On the basis of such refinements, scholars might be able to explain the puzzling findings with regards to positive and negative emotional expressions that we also observed in this study.
Moreover, the communicative practices analyzed by systemic scholars occur in heterogeneous sites of a deliberative system (see Premise 2). In contemporary societies, online sites of public communication and deliberation are becoming increasingly important, but deliberative theory was originally developed for offline deliberations, and crucial characteristics of online communication have therefore likely eluded this framework to date.
To fill these gaps and provide an integrative framework that is capable of analyzing deliberative processes in all parts of the deliberative system, systemic scholars are challenged to fine-tune their theoretical explanations and to integrate a broader range of empirical analyses in order to understand the mechanisms that underlie online deliberation processes.
Elstub et al. (2016, p. 141f.) distinguish between four generations of deliberative democratic theory.
First-generation theorists developed "an explicitly normative theory on the rational, impartial jus-