Behavioral Experiments for Assessing the Abstract Argumentation Semantics of Reinstatement

Authors


Correspondence should be sent to Iyad Rahwan, Masdar Institute of Science & Technology, P. O. Box 54224, Abu Dhabi, United Arab Emirates. E-mail: irahwan@acm.org

Abstract

Argumentation is a very fertile area of research in Artificial Intelligence, and various semantics have been developed to predict when an argument can be accepted, depending on the abstract structure of its defeaters and defenders. When these semantics make conflicting predictions, theoretical arbitration typically relies on ad hoc examples and normative intuition about which prediction ought to be the correct one. We advocate a complementary, descriptive-experimental method, based on the collection of behavioral data about the way human reasoners handle these critical cases. We report two studies applying this method to the case of reinstatement (both in its simple and floating forms). Results speak for the cognitive plausibility of reinstatement and yet show that it does not yield the full expected recovery of the attacked argument. Furthermore, results show that floating reinstatement yields comparable effects to that of simple reinstatement, thus arguing in favor of preferred argumentation semantics, rather than grounded argumentation semantics. Besides their theoretical value for validating and inspiring argumentation semantics, these results have applied value for developing artificial agents meant to argue with human users.

1. Introduction

Understanding human reasoning and decision making is a key question in cognitive science. There is considerable literature on understanding whether and why people deviate from formal, normative models of deductive reasoning (Bonnefon, 2009; Evans & Over, 2004; Johnson-Laird & Byrne, 2002), decision-theoretic reasoning (Shafir & LeBoeuf, 2002; Tversky & Kahneman, 1981), and defeasible reasoning (Stenning & van Lambalgen, 2008). The latter involves a basic form of reasoning in the presence of conflicting information, which can also be referred to as argumentation (van Eemeren, Grootendorst, & Henkemans, 1996).

Argumentation has become a very fertile area of research in Artificial Intelligence, as illustrated by recent volumes and journal special issues (Bench-Capon & Dunne, 2007; Besnard & Hunter, 2008; Rahwan & McBurney, 2007; Rahwan & Simari, 2009). A highly influential framework for studying argumentation-based reasoning was introduced by Dung (1995). An argumentation framework is simply a pair ⟨𝒜, ⇀⟩ where 𝒜 is a set of arguments and ⇀ is a defeat relation between arguments. This approach focuses on the defeat relations between arguments, leaving aside their origin or their internal structure. Various semantics have attempted to characterize "correct" argumentation-based reasoning within such a framework. Given an argumentation framework (that can take the form of a graph), a semantics assigns a status to each argument, that is, it determines whether the argument can be accepted.1

These semantics typically come from a normative perspective, which relies on intuition and ad hoc hypothetical examples as to what constitutes correct reasoning. We will argue that there are limits to relying solely on this approach, and we will advocate the use of psychological experiments as a methodological tool for informing and validating intuitions about argumentation-based reasoning.

In this article, we apply this experimental method to the problem of reinstatement, both in its simple and floating forms. All classical semantics deem simple reinstatement to be acceptable, but different semantics have different takes on the special case of floating reinstatement. We will show that psychological experiments can help evaluate these various semantics, and that they can provide unique insights even when all formal semantics are in agreement. Not only can these insights inform current and future semantics, they are also relevant to the design of software agents that can argue persuasively with humans, or provide reliable support for the human evaluation of arguments (e.g., on top of argument diagramming tools).

In the next section, we offer a brief reminder of Dung's abstract theory of argumentation, focusing on our examples of choice, simple and floating reinstatement. Then, we discuss how argumentation semantics are typically evaluated in the Artificial Intelligence literature, and we motivate the need for an empirical perspective. We then report two empirical studies investigating simple and floating reinstatement, respectively.

2. Abstract argumentation frameworks

In this section, we summarize key elements of abstract argumentation frameworks. This section contains technical background only; its main conclusions can be summarized as follows. Fig. 1 displays the canonical graph of simple reinstatement, whereas Fig. 2 displays the canonical graph of floating reinstatement. The main question is, in both cases, whether A can be accepted. For simple reinstatement, A is accepted by preferred as well as grounded semantics. For floating reinstatement, A is not accepted by grounded semantics, but it is accepted by preferred semantics. Additionally, preferred semantics also accept C and D in the (formally defined) "credulous" sense, but not in the "sceptical" sense.

Figure 1.

 The canonical graph of defeat and simple reinstatement. Argument A is defeated by argument B, which is in turn defeated by argument C.

Figure 2.

 The canonical graph of defeat and floating reinstatement. Argument A is defeated by B, which is itself defeated by C as well as D, although C and D are mutual defeaters.

We now lay bare the technical background required to arrive at these conclusions. In the following, we adopt the common assumption that argument sets are finite, and we begin with Dung's (1995) abstract definition of an argumentation framework.

Definition 1 (Argumentation framework):  An argumentation framework is a pair ⟨𝒜, ⇀⟩ where 𝒜 is a set of arguments and ⇀ ⊆ 𝒜 × 𝒜 is a defeat relation. An argument α defeats an argument β iff (α, β) ∈ ⇀, also written α ⇀ β.

An argumentation framework can be represented as a directed graph in which vertices are arguments and directed arcs characterize defeat among arguments.

The directed graphs displayed in Figs. 1 and 2 will be our running examples throughout the article. These two graphs display the canonical forms of simple and floating reinstatement, respectively. As will become apparent in the course of this section, the critical issue with these examples is whether argument A can be accepted in spite of being defeated by argument B.

Example 1:  The graph in Fig. 1 (simple reinstatement) consists of three arguments A, B, C, and features two defeat relations: B ⇀ A and C ⇀ B. The graph in Fig. 2 (floating reinstatement) consists of four arguments A, B, C, and D, and features five defeat relations: B ⇀ A, C ⇀ B, D ⇀ B, C ⇀ D, and D ⇀ C.
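
To make this structure concrete in code, here is a minimal Python sketch (ours, purely illustrative; the names SIMPLE and FLOATING are our own) that encodes each framework as a pair of a set of arguments and a set of (defeater, defeated) pairs. The sketches in the remainder of this section build on this encoding.

    # Each framework is a pair (arguments, defeats); a pair (x, y) in
    # defeats means that argument x defeats argument y.
    SIMPLE = ({"A", "B", "C"},
              {("B", "A"), ("C", "B")})
    FLOATING = ({"A", "B", "C", "D"},
                {("B", "A"), ("C", "B"), ("D", "B"), ("C", "D"), ("D", "C")})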

Note that each node is a complete argument: that is, a premise as well as a conclusion. The arrows between the nodes represent defeats among arguments. To see what actual arguments with these graph structures look like, consider the following three arguments, which instantiate the simple reinstatement structure in Fig. 1.

  • (A) Mary does not limit her phone usage. Therefore, Mary has a large phone bill.
  • (B) Mary has a speech disorder. Therefore, Mary limits her phone usage.
  • (C) Mary is a singer. Therefore, Mary does not have a speech disorder.

Clearly, argument (B) is an attempt to defeat argument (A) by undermining the latter's main premise—that is, argument (B) concludes that Mary limits her phone usage, negating (A)’s premise that she does not do so. In a similar fashion, argument (C) defeats argument (B) itself by undermining (B)’s premise.

There are many ways to define defeat (Rahwan & Simari, 2009). To simplify the reasoning problem, we opted for an explicit and simple notion of defeat: The defeater's conclusion explicitly negates the defeated argument's premise. This so-called undercutting defeat also ensures that the defeats are not symmetric.

The following natural language arguments follow the floating reinstatement structure shown in Fig. 2.

  • (A) Cody does not fly. Therefore, Cody is unable to escape by flying.
  • (B) Cody is a bird. Therefore, Cody flies.
  • (C) Cody is a rabbit. Therefore, Cody is not a bird.
  • (D) Cody is a cat. Therefore, Cody is not a bird.

Note here that (B) defeats (A) as above. Both (C) and (D) defeat (B) by undercutting its premise that Cody is a bird. However, (C) and (D) mutually defeat each other, as their conclusions are contradictory (so-called rebutting defeat).

We now need to define the two fundamental notions of conflict-freedom and defence. First, we introduce the notations S⁺ and α⁻. For a given set S of arguments, S⁺ is the set of arguments that are defeated by the arguments in S. Formally, S⁺ = {β ∈ 𝒜 : α ⇀ β for some α ∈ S}. Conversely, for a given argument α, the set α⁻ is the set of all arguments that defeat α. Formally, α⁻ = {β ∈ 𝒜 : β ⇀ α}.

Definition 2 (Conflict-freedom):  Let ⟨𝒜, ⇀⟩ be an argumentation framework and let S ⊆ 𝒜. S is conflict-free iff S ∩ S⁺ = ∅.

In other terms, a set of arguments is conflict-free if and only if no argument in that set defeats another.

Definition 3 (Defence):  Let ⟨𝒜, ⇀⟩ be an argumentation framework, let S ⊆ 𝒜, and let α ∈ 𝒜. S defends α if and only if α⁻ ⊆ S⁺. We also say that argument α is acceptable with respect to S.

In other terms, a set of arguments defends a given argument if and only if it defeats all its defeaters.

Example 2:  In the graph displayed in Fig. 1, the set {A,C} is conflict free, but the set {A,B} is not, and neither is the set {B,C}. Because the set {C} defeats all the defeaters of A, we can say that the set {C} defends argument A. In the graph displayed in Fig. 2, the only conflict-free sets (apart from trivial ones containing single arguments) are {A,C} and {A,D}. Either one of the sets, {C}, {D}, or {C,D}, defends A against all its defeaters.
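
The checks of Example 2 can be reproduced in a few lines of Python. The following sketch (ours, for illustration only) implements Definitions 2 and 3 for the floating reinstatement graph:

    # Floating reinstatement graph (Fig. 2).
    args = {"A", "B", "C", "D"}
    defeats = {("B", "A"), ("C", "B"), ("D", "B"), ("C", "D"), ("D", "C")}

    def plus(S):
        """S+ : the arguments defeated by some member of S."""
        return {y for (x, y) in defeats if x in S}

    def minus(a):
        """a- : the arguments that defeat a."""
        return {x for (x, y) in defeats if y == a}

    def is_conflict_free(S):
        return not (S & plus(S))      # Definition 2: S and S+ are disjoint

    def defends(S, a):
        return minus(a) <= plus(S)    # Definition 3: S defeats every defeater of a

    assert is_conflict_free({"A", "C"}) and not is_conflict_free({"A", "B"})
    assert defends({"C"}, "A") and defends({"D"}, "A") and defends({"C", "D"}, "A")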

We now define the characteristic function of an argumentation framework.

Definition 4 (Characteristic function):  Let AF = ⟨𝒜, ⇀⟩ be an argumentation framework. The characteristic function of AF is ℱ_AF : 2^𝒜 → 2^𝒜 such that, given S ⊆ 𝒜, we have ℱ_AF(S) = {α ∈ 𝒜 : S defends α}.

Applied to an argument set S, the characteristic function returns the set of all arguments defended by S. Because we are only dealing in this article with one argumentation framework at a time, we will use the notation ℱ instead of ℱ_AF.

We now turn to various so-called extensions that can characterize the collective acceptability of a set of arguments. Essentially, these extensions provide different possible ways to group self-defending arguments together. These extensions will be used subsequently to define the argument evaluation criteria that we study empirically in this paper.

Definition 5 (Complete/grounded/preferred extensions):  Let S be a conflict-free set of arguments in framework ⟨𝒜, ⇀⟩.

  • S is a complete extension iff S = ℱ(S).
  • S is a grounded extension iff it is the minimal complete extension with respect to set inclusion.
  • S is a preferred extension iff it is a maximal complete extension with respect to set inclusion.

S is a complete extension if and only if it contains exactly the arguments it defends (that is, if S is a fixed point of the operator ℱ). There may be more than one complete extension, each corresponding to a particular consistent and self-defending viewpoint.
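
Because the argument sets considered here are finite and small, the extensions of Definition 5 can simply be enumerated by brute force. The sketch below (ours, and obviously not how one would treat large graphs) checks every subset of the floating reinstatement graph for conflict-freedom and for being a fixed point of ℱ; its output anticipates Examples 3 and 4.

    from itertools import combinations

    args = {"A", "B", "C", "D"}
    defeats = {("B", "A"), ("C", "B"), ("D", "B"), ("C", "D"), ("D", "C")}

    def plus(S):
        return {y for (x, y) in defeats if x in S}

    def F(S):
        """Characteristic function: arguments whose every defeater is defeated by S."""
        return {a for a in args
                if {x for (x, y) in defeats if y == a} <= plus(S)}

    subsets = (set(c) for r in range(len(args) + 1)
               for c in combinations(sorted(args), r))
    complete = [S for S in subsets if not (S & plus(S)) and S == F(S)]
    print(sorted(map(sorted, complete)))
    # [[], ['A', 'C'], ['A', 'D']] -- the empty set is the minimal complete
    # extension (grounded), the two maximal ones are the preferred extensions.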

Example 3:  In the graph displayed in Fig. 1, the set {C} is not a complete extension, because it defends A without including it. The set {B} is not a complete extension because it includes B without defending it against C—see Fig. 3. The only complete extension is {A,C}. The graph displayed in Fig. 2 has two nonempty complete extensions, {A, C} and {A, D}—see Fig. 4. (The empty set is also, trivially, a complete extension of this graph, a fact that matters for the grounded extension below.)

Figure 3.

 Single (complete, grounded, and preferred) extension in simple reinstatement. Accepted arguments are shaded.

Figure 4.

 The two (complete, preferred) extensions in floating reinstatement. Accepted arguments are shaded.

The grounded extension contains all the arguments in the graph that are not defeated, as well as all the arguments that are defended, directly or indirectly, by nondefeated arguments. This can be seen as a noncommittal view (characterized by the least fixed point of ℱ). As such, there always exists a unique grounded extension.

More intuitively, computing the grounded extension can be seen as a process of labeling the nodes of the graph. First, nodes that have no defeaters are labeled "undefeated" (and included in the extension), and the nodes attacked by them are labeled "defeated" (and excluded from the extension). Then, all labeled nodes are removed and the process is repeated on the resulting subgraph, and so forth. If, at some iteration, no undefeated node can be found, all remaining unlabeled nodes are labeled "defeated" and the process terminates.
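
This labeling procedure translates directly into code. The following Python sketch (ours, matching the procedure just described) computes the grounded extension by repeatedly harvesting undefeated nodes:

    def grounded(args, defeats):
        """Grounded extension via iterative labeling."""
        args, defeats = set(args), set(defeats)
        extension = set()
        while True:
            undefeated = {a for a in args if all(y != a for (x, y) in defeats)}
            if not undefeated:
                break                          # remaining nodes count as "defeated"
            extension |= undefeated            # label "undefeated" and keep
            defeated = {y for (x, y) in defeats if x in undefeated}
            args -= undefeated | defeated      # suppress all labeled nodes
            defeats = {(x, y) for (x, y) in defeats if x in args and y in args}
        return extension

    print(grounded({"A", "B", "C"}, {("B", "A"), ("C", "B")}))  # {'A', 'C'}
    print(grounded({"A", "B", "C", "D"},
                   {("B", "A"), ("C", "B"), ("D", "B"),
                    ("C", "D"), ("D", "C")}))                   # set()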

Example 4:  The graph displayed in Fig. 1 has only one complete extension, {A, C}, which is also its grounded extension. The graph displayed in Fig. 2 has two nonempty complete extensions, {A, C} and {A, D}, but neither is the grounded extension, because no node in the graph is initially undefeated. In that case, the grounded extension is the empty set.

A preferred extension is a bolder, more committed position that cannot be extended (by accepting more arguments) without causing inconsistency. Thus, a preferred extension can be thought of as a maximal consistent set of hypotheses. There may be multiple preferred extensions, and the grounded extension is included in all of them.

Example 5:  The graph displayed in Fig. 1 has only one complete extension, {A, C}, which is therefore also its preferred extension. The graph displayed in Fig. 2 has two nonempty complete extensions, {A, C} and {A, D}, and both qualify as preferred extensions.

Now that we have defined various semantics that identify the extensions of an argument graph, we can at last define the status of an individual argument within the graph, that is, we can define criteria for accepting or rejecting each individual argument. The main question in this paper is whether people evaluate a reinstated argument sceptically or credulously, in accordance with the definition below.

Definition 6 (Argument status):  Let ⟨𝒜, ⇀⟩ be an argumentation framework, and let E₁, …, Eₙ be its extensions under a given semantics. Let α ∈ 𝒜.

  • α is accepted in the sceptical sense iff α ∈ Eᵢ for every i = 1,…,n.
  • α is accepted in the credulous sense iff there exists an Eᵢ such that α ∈ Eᵢ.
  • α is rejected iff there is no Eᵢ such that α ∈ Eᵢ.

Under the grounded semantics, any argument that belongs to the unique grounded extension is accepted both in the credulous and the sceptical sense, and any argument that does not belong to the unique grounded extension is rejected. Under the preferred semantics, an argument is sceptically accepted if it belongs to all preferred extensions; but it can also be credulously accepted if it belongs to at least one preferred extension. If an argument is neither sceptically nor credulously accepted, it is rejected.

Example 6:  The graph displayed in Fig. 1 has only one complete extension, {A, C}, which is grounded as well as preferred. As a consequence, arguments A and C are accepted by grounded as well as preferred semantics, both in the credulous and sceptical sense. The graph displayed in Fig. 2 has an empty grounded extension, which means that no argument should be accepted under a grounded semantics. Under a preferred semantics, though, two extensions are identified, {A, C} and {A, D}. From these extensions, only A can be accepted in a sceptical sense, but A, C, and D can all be accepted in a credulous sense.
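
Definition 6 also translates directly into code. The sketch below (ours, purely illustrative) reports the strongest status that applies to each argument of the floating reinstatement graph, given the list of extensions identified by a semantics:

    def status(a, extensions):
        """Argument status (Definition 6), strongest applicable label first."""
        if extensions and all(a in E for E in extensions):
            return "sceptically accepted"
        if any(a in E for E in extensions):
            return "credulously accepted"
        return "rejected"

    preferred = [{"A", "C"}, {"A", "D"}]   # preferred extensions of Fig. 2
    for a in "ABCD":
        print(a, status(a, preferred))
    # A: sceptically accepted; B: rejected; C, D: credulously accepted.
    # Under grounded semantics, the only extension of Fig. 2 is the empty
    # set, so grounded semantics reject all four arguments.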

3. What validates a semantics?

As established in the previous section, different semantics can have different takes on which arguments can be accepted within a given argumentation framework. The question then arises of evaluating the different claims made by different semantics as to what constitutes an acceptable argument. In this section, we discuss this issue in the broader context of the general sources of inspiration and validation found for these semantics in the formal argumentation literature. We discuss in turn the example-based approach, the principle-based approach, and lastly the experiment-based approach that we suggest needs more attention from the Artificial Intelligence community.

3.1. The example-based approach

Most semantics for argumentation-based reasoning in Artificial Intelligence are based on intuition as to what constitutes correct reasoning. A typical research article presents scenarios that can be hypothetical or real (e.g., from the legal domain), and that correspond to one or several argument structures (e.g., floating reinstatement). The proposed semantics is then shown to draw intuitively satisfying conclusions. The difficulty, then, is that one is often able to construct other examples with the same logical structure, in which the proposed semantics draws counter-intuitive conclusions. For example, Horty (2002) famously devoted a whole paper to demonstrating counter-intuitive results with floating conclusions in default reasoning (see also Bonnefon, 2004).

Such counter-intuitive results motivate work on new semantic criteria that capture the novel examples, and the process repeats, examples always being the main tool for comparing semantics with one another. This example-based approach (to borrow a term from Baroni & Giacomin, 2007) was, for example, the inspiration for the CF2 semantics (Baroni, Giacomin, & Guida, 2005), which deals with the odd-length cycle examples that were problematic for preferred semantics; or for the semi-stable semantics (Caminada, 2006b), which deals with cases in which no stable extension exists (semi-stable extensions are guaranteed to exist).

Baroni and Giacomin (2007) made a compelling case for the limitations of the example-based approach, noting in particular that even in relatively simple examples, there might not be a consensual intuition on what should be the correct conclusion. In parallel, Prakken (2002) observed that intuitions about given examples were helpful for generating new investigations, but less helpful as critical tests between different semantics. This recognized difficulty in relying on intuition alone as the benchmark for designing and evaluating argumentation semantics motivated a number of authors to advocate a more systematic approach to which we now turn.

3.2. The principle-based approach

To overcome the limitations of the example-based approach, a number of authors recently advocated a more systematic, axiomatic, principle-based approach (e.g., Baroni & Giacomin, 2007; Caminada & Amgoud, 2007). In this approach, alternative semantics are evaluated by analyzing whether they satisfy certain principles, or quality postulates.

Baroni and Giacomin (2007) offered, for example, the reinstatement criterion, according to which an argument must be included in any extension that reinstates it, and the directionality criterion, which requires that an argument's status be affected only by the status of its defeaters. The Baroni and Giacomin (2007) article offers many other interesting criteria that provide a comprehensive and systematic comparison between abstract argumentation semantics. In parallel, Caminada (2006a) provided postulates for the notion of reinstatement, in order to characterize the labeling of arguments in an argument graph (in, out, and undecided). One postulate states that an argument must be "in" if and only if all of its defeaters are "out." Another postulate states that an argument must be "out" if and only if at least one of its defeaters is "in." This enabled Caminada to characterize different semantics by examining the kinds of labelings they allow.

The principle-based approach provides a significant improvement over the basic example-based approach, as it enables claims that transcend individual examples and characterize semantics more generally. The source of the general postulates, however, is still the researcher's intuition as to what correct reasoning ought to be. In sum, most of the extant validation of argumentation semantics, whether example-based or principle-based, relies on normative claims grounded in intuition. We now suggest that this normative-intuitive perspective could be adequately complemented with descriptive, experimental evidence about how people actually reason from conflicting arguments.

3.3. The experiment-based approach

There is a growing concern within the Artificial Intelligence community that logicians and computer scientists ought to give serious attention to cognitive plausibility when assessing formal models of reasoning, argumentation, and decision making. For example, van Benthem (2008) strongly supports the rise of a new psychologism in logic at large, arguing that although logicians and computer scientists have tended to go by intuition and anecdotal evidence, formal theories can be modified under pressure from evidence obtained through careful experimental design. In the context of epistemic logic, Pietarinen (2003) argues for the important role of empirical findings from cognitive science in revising our logical conceptions of knowledge and belief, commenting that the interplay between logic and cognition is likely to grow increasingly wide-ranging and increasingly prominent.

Pelletier and Elio (1997, 2005) also argued extensively for the importance of experimental data when formalizing default and inheritance reasoning, arguing that default reasoning is particularly psychologistic in that it is defined by what people do. Their own results have been complemented by a dynamic experimental literature consisting of controlled tests of human default reasoning (e.g., Benferhat, Bonnefon, & da Silva Neves, 2005; Bonnefon, Da Silva Neves, Dubois, & Prade, 2008; Da Silva Neves, Bonnefon, & Raufaste, 2002; Ford, 2004; Ford & Billington, 2000; Pfeifer & Kleiter, 2005, 2009).

Finally, and in close relation to the problems of simple and floating reinstatement that we have introduced in the previous section, Horty (2002) implicitly appealed to descriptive validation when highlighting the issues that floating conclusions raise for sceptical semantics:2

There is a vivid practical difference between the two skeptical alternatives. […] Which alternative is correct? I have not done a formal survey, but most of the people to whom I have presented this example are suspicious of the floating conclusion. (p. 64)

We believe that the field of computational argumentation can indeed benefit from the same kind of formal surveys that have been conducted in the field of default reasoning, and that have been generally called for in Artificial Intelligence. To our knowledge, very few articles have explicitly sought to inform formal models of argumentation with experimental evidence, and these experimental data have only been collected in relation to the specific issue of argumentation-based decision making (e.g., Amgoud, Bonnefon, & Prade, 2005; Bonnefon, Dubois, Fargier, & Leblois, 2008; Dubois, Fargier, & Bonnefon, 2008). What we offer in this article is an experimental investigation of the basic issue of how people reason from the critical argument structures corresponding to simple and floating reinstatement, and whether one of the currently available semantics can capture their reasoning.

4. Study 1: Simple reinstatement

Study 1 investigates the basic structure of argument reinstatement. Abstractly, this structure is defined in the following argumentation framework (as displayed in Fig. 1): AF = ⟨{A, B, C}, {B ⇀ A, C ⇀ B}⟩, in which argument A is attacked by argument B but reinstated by argument C.

Study 1 seeks to answer the following questions: Does confidence in the conclusion of A decrease when A is defeated by B? Does this confidence then increase when C is introduced alongside A and B? If so, does confidence return to its initial level, that is, the level observed when A was presented alone?

4.1. Method

Twenty participants were randomly approached in offices, shopping malls, and open spaces in Dubai, to take part in Study 1. Participants read an introduction to the task, informing them that the purpose of the experiment was to collect information about how people thought, that the task included no trick question, and that they simply had to mark the answer that they felt correct. Participants were asked about their proficiency with the English language, in order to make sure that it was above a reasonable level. Participants evaluated their proficiency by choosing one of nine terms ranging from Expert to Very Limited. They then solved 18 problems each, following a three-level, six-measure within-participant design.

The three-level independent variable was the Pattern of the problem (Base, Defeated, Reinstated). In the Base pattern, participants were only presented with argument A; in the Defeated pattern, participants were presented with arguments A and B; finally, in the Reinstated pattern, participants were presented with all three arguments A, B, and C.

Participants saw six different versions of each pattern, which used six different sets of contents for the arguments A, B, and C (see Appendix A for a list of all contents). More specifically, half the participants solved the Base (argument A), Defeated (arguments A and B), and Reinstated (arguments A, B, and C) problems using the first set of contents, then the Base, Defeated, and Reinstated problems using the second set of contents, and up to the Base, Defeated, and Reinstated problems using the sixth set of contents. The other half did the same, but they started with the sixth set of contents and worked their way down to the first.

Participants had to answer every problem, in the order they appeared in the questionnaire, without peeking at the next problem in the questionnaire. For each problem, participants had to assess the conclusion of argument A, using a seven-point scale anchored at certainly false and certainly true. The scale and the phrasing of the question were similar to that used in Politzer and Bonnefon (2006).3

4.2. Manipulation check

An independent sample of 18 participants was recruited to take part in the manipulation check of Study 1. The purpose of the manipulation check was to make sure that the C arguments did a good job at defeating the B arguments. Without this precaution, we would not be able to interpret the potential effect of C arguments on A arguments in the main experiment. Participants in the manipulation check solved 12 problems, according to a two-level, six-measure design. For each of the six argument sets, participants assessed their confidence (on a seven-point scale similar to that used in the main study) in the conclusion of B when B was presented alone, and their confidence in the conclusion of B when B was presented together with C.

4.3. Results

Averaging across the six contents and 20 participants, the base confidence in the conclusion (when argument A is presented alone) was 5.9 (SD = 0.8), whereas confidence in the defeated conclusion (when argument A is attacked by argument B) was 4.0 (SD = 1.4). Confidence in the reinstated conclusion (when argument A is attacked by argument B but reinstated by argument C) went back up to 5.2 (SD = 1.0).

Confidence in the conclusion was entered as the dependent variable in a repeated-measure analysis of variance, with Pattern as a three-level predictor (Base, Defeated, Reinstated) and six measures corresponding to the six contents. The multivariate test detected a significant effect of Pattern, F(2,18) = 14.1, p < .001. This overall effect reflects both an effect of defeat and an effect of reinstatement. As shown by a contrast analysis, ratings in the Base condition were significantly higher than ratings in the Defeated condition, F(1,19) = 26.8, p < .001; and ratings in the Defeated condition were themselves significantly lower than ratings in the Reinstated condition, F(1,19) = 9.9, p = .005. Although reinstatement increased the acceptability of a conclusion, the recovery was not perfect. Indeed, the ratings in the Reinstated condition were still significantly lower than the ratings in the Base condition, F(1,19) = 9.1, p = .007.

The reliable effect of reinstatement must be related to the success of the reinstating manipulation, as shown by the results of the manipulation check. Averaging across the six contents, the base confidence in the conclusion of defeaters was 5.1 (SD = 0.8), whereas it was 4.1 (SD = 0.7) for attacked defeaters. A repeated-measure analysis of variance, with pattern as a two-level predictor and six measures corresponding to the six contents, detected a significant effect of pattern, F(6,12) = 3.8, p = .02.

Results thus support the notions of defeat and reinstatement. That is, confidence in the conclusion of a defeated argument significantly decreases, but it increases again when the defeater is itself attacked by a reinstating argument. Results also suggest, however, that a reinstated argument does not fully recover from its defeat, as confidence in its conclusion remains significantly lower than it was when the argument was presented in isolation. We defer the discussion of these results until after we report the results of Study 2, which extends Study 1 by considering the more complex case of floating reinstatement.

5. Study 2: Floating reinstatement

Study 2 offers an experimental comparison of the simple reinstatement structure to the more complex structure known as floating reinstatement, graphically displayed in Fig. 2.

In addition to replicating the findings of Study 1, Study 2 seeks to answer the following questions: Does floating reinstatement restore the confidence in the conclusion of argument A, and does it do so to the same extent as simple reinstatement? (A "yes" to both questions would go against the predictions of grounded semantics.) If so, does the effectiveness of floating reinstatement require that participants manifest a preference for either C over D, or D over C? (A "yes" would provide support to the predictions of credulous preferred semantics; a "no" would provide support to the predictions of sceptical preferred semantics.)

5.1. Method

Forty-seven participants were randomly approached in the same circumstances and following the same protocol as in Study 1. They were randomly assigned to two experimental groups corresponding to simple and floating reinstatement, respectively, and then solved 12 problems each, following a three-level, four-measure within-participant design.

The three-level independent variable was the Pattern of the problem (Base, Defeated, Reinstated). In the Base pattern, participants were only presented with argument A; in the Defeated pattern, participants were presented with arguments A and B; finally, in the Reinstated pattern, participants were presented with the three arguments A, B, and C (in the simple reinstatement group) or with the four arguments A, B, C, and D (in the floating reinstatement group).

The procedure used in Study 2 was the same as that used in Study 1, but the contents of arguments A, B, C, and D were taken from four argument sets different from those used in Study 1 (see Appendix B). In addition to the questions used in Study 1, participants rated their understanding of each problem ("How clearly did you understand the problem?") on a seven-point scale anchored at Not at all and Completely. Lastly, participants in the floating reinstatement group answered the following question about the four reinstated problems: Do you think that (a) C is a better argument than D, (b) D is a better argument than C, or (c) C and D are about equally good?

5.2. Results

Fig. 5 displays the average confidence in the conclusion of A, as a function of Pattern and Type of reinstatement, averaged across the contents and participants. The visual inspection of Fig. 5 already suggests that the results are very similar for the two groups. This preliminary intuition was confirmed by the results of a mixed-design analysis of variance, using the confidence in the conclusion as a dependent variable, pattern as a three-level within-subject predictor (Base, Defeated, Reinstated), the type of reinstatement as a two-level between-group variable (Simple, Floating), and four measures corresponding to the four linguistic contents.

Figure 5.

 Reinstatement is as effective in its floating form as in its simple form. Confidence in the conclusion of an argument decreases when the argument is defeated, and it is then imperfectly restored when its defeater is itself defeated, whether by a single argument (simple reinstatement) or by two mutually defeating arguments (floating reinstatement).

The multivariate test detected a significant effect of Pattern, F(8,38) = 6.1, p < .001. It did not, however, detect a significant main effect of Type of reinstatement, F(4,42) < 1, p = .79, nor a significant interaction between Pattern and Type, F(8,38) = 1.2, p = .32.

As in Study 1, the overall effect of Pattern reflected a successful defeat followed by a successful reinstatement. As shown by contrast analysis, confidence ratings in the Defeated condition were significantly lower than ratings in the Base condition, F(1,45) = 34.9, p < .001, and this difference was not moderated by the Type of reinstatement (there is indeed no reason that it should be), F(1,45) < 1, p = .67. The confidence ratings in the Reinstated condition were significantly greater than in the Defeated condition, F(1,45) = 13.7, p < .001, and this difference (more interestingly this time) was not moderated by the Type of reinstatement, F(1,45) < 1, p = .60. Just as in Study 1, reinstatement was not perfect, as ratings in the Reinstated condition remained significantly lower than in the Base condition, F(1,45) = 9.0, p < .01. Again, there was no evidence whatsoever of a moderation by Type of reinstatement, F(1,45) < 1, p = .92.

So far, results suggest that floating reinstatement has an effect that is identical to classic reinstatement. We further note that although subjects found the floating reinstatement problems slightly harder to understand than the simple reinstatement problems, this difference appeared to play no role in the ratings they gave for their confidence in the conclusion. The average understanding rating was 4.6 (SD = 1.1) for simple reinstatement problems, compared to 4.0 (SD = 0.9) for floating reinstatement problems, t(45) = 2.0, p = .05. However, a regression analysis seeking to predict acceptance of reinstated arguments on the basis of problem understanding, Type of reinstatement (dummy coded, 1 for floating), and the interaction term between these two predictors, failed to find any significant effect. The interaction term in particular achieved a standardized β of .19, nonreliably different from zero, t = 0.32, p = .75.

The effectiveness of floating reinstatement does not appear to result from the subjects manifesting a preference for one of the mutually defeated arguments. We conducted four repeated-measure analyses of variance, one for each argument set, with conclusion acceptance as a dependent variable, pattern as a two-level predictor (Defeated, Reinstated), and preference as a dummy coded between-group variable (0 for subjects who said the two mutually defeating arguments were equally good, 1 otherwise). The interaction term between pattern and preference did not achieve statistical significance in any of the four analyses, all Fs in the 0.5–1.5 range, all ps in the .23–.48 range.

6. General discussion

Following the introduction of Dung's (1995) influential abstract argumentation frameworks, formal argumentation has become a fertile area of research in Artificial Intelligence. An argumentation framework can be represented as a directed graph in which vertices are arguments and directed arcs characterize defeat among arguments. Within this framework, various semantics (e.g., preferred vs. grounded) have been offered that seek to establish whether each argument in the graph can be accepted. In some cases (such as simple reinstatement), preferred and grounded semantics are in agreement; but in other cases (such as floating reinstatement), the two semantics have different takes on what constitutes an acceptable argument.

When there is a conflict between the predictions of two semantics, the standard practice in Artificial Intelligence is to rely on intuition to elect one of these predictions as the normatively correct one. Although this normative-intuitive approach has its uses, we argued that it might be adequately complemented with the kind of descriptive-experimental approach that has already been used in some domains of Artificial Intelligence (e.g., default and inheritance reasoning), and that has been called for by various voices within the formal community. This descriptive experimental approach consists of using the methods of experimental psychology to run controlled studies of argument-based reasoning; and to confront the results of these studies with the predictions made by formal semantics.

In this article, we applied this approach to simple as well as floating reinstatement. Study 1 addressed the basic situation of simple reinstatement, across a varied set of linguistic contents. Participants reasoned in a way that reflected the formal notions of defeat and reinstatement: Their confidence in an argument A decreased when it was attacked by an argument B, but bounced back up when B itself was attacked by a third argument C. These findings are in agreement with grounded as well as preferred semantics (and others). What neither semantics could predict, though, is the finding (replicated in Study 2) that the recovery of argument A was not complete when it was reinstated by argument C: Confidence in A in the presence of B and C did not rise back to its former level, that is, the level observed when A was presented alone.

This is not a trivial observation. Indeed, every possibility seemed plausible a priori. We could expect, as formal semantics would have it, that A would fully regain its former status. We could also imagine that the confidence in A in presence of B and C would surpass the confidence in A when presented alone: Indeed, confidence in A might be boosted by seeing a potential objection to A being ruled out. But what happened was exactly the contrary. Seeing one objection to A, even when it was ruled out, decreased the confidence in A, possibly because the evocation of one objection prompted participants to consider other possible objections that were not explicitly ruled out in the problem.

There is indeed some sort of suspension of disbelief involved in reasoning experiments using natural language materials (see Evans & Over, 2004, Chapter 6, for a review of how to increase or decrease this suspension of disbelief by means of experimental instructions). Participants can easily generate all sorts of objections to the arguments presented to them by the experimenter, but they suspend their disbelief in these arguments for the sake of the experiment. When one objection is presented by the experimenter herself, though, suspension of disbelief is disrupted, and some participants start to let their own private beliefs leak into the way they reason from the experimental materials. The fact that simple reinstatement works, though, even if not perfectly, is good news for current semantics, and a warning for future semantics not to dispense with simple reinstatement.

Turning now to floating reinstatement, our results suggest that, empirically speaking, floating reinstatement works exactly as well as simple reinstatement. Participants’ confidence in an argument A decreased when it was attacked by an argument B, but bounced back up when B itself was attacked by two mutually defeating arguments C and D. These results clearly speak in favor of preferred semantics. Results also suggest that the sceptical version of preferred semantics might be more cognitively plausible than the credulous version, as the effect of floating reinstatement was not dependent on participants showing a preference for one of the two mutually defeating arguments. This question is not yet settled, however, because the data do not make it clear whether participants would be willing to commit to accepting one of the mutually defeating arguments C and D. Hence, this aspect of the results requires further investigation.

Besides their theoretical value, our results also have applied value for developing agents that are meant to argue with human users. We already know that artificial agents can achieve better negotiation results with human users when they do not play normative equilibrium strategies, but rather adopt boundedly rational strategies inspired from human behavioral data (Gal & Pfeffer, 2007; Lin, Kraus, Wilkenfeld, & Barry, 2008). Generally speaking, we may expect artificial agents to be similarly more successful when arguing with human users if they can anticipate human reactions to various abstract argumentation frameworks. With that goal in mind, our results suggest that artificial agents may be better off avoiding discussion that may reveal a defeater, even if the agent has a counter-argument to that defeater; but should be ready to use floating reinstatement as well as simple reinstatement in order to neutralize a defeater raised by the human user. These kinds of heuristics can be incorporated into a decision-theoretic model of a persuasive agent that interacts with users using natural language (Grasso, Cawsey, & Jones, 2000; Reed, 1998). Such agents may also be complemented by domain-specific knowledge of effective argumentation strategies (e.g., in the domain of genetic counseling [Green, 2007] or healthy diet promotion [Mazzotta, Rosis, & Carofiglio, 2007]). Going beyond our specific results, by building up a corpus of argument structures and how they are evaluated, it may be possible to use machine learning techniques to build models that predict how people will react to novel argument structures.

Independently of our specific results, we hope to have convinced the reader that the wealth of scientific methodology from psychology can give a new perspective on the problems raised when formalizing argumentation and developing argument evaluation semantics. We hope that our claims and findings can prompt researchers working on the computational modeling of argument to explore new avenues of investigation inspired by, and validated against, empirical evidence from psychology and cognitive science.

We also hope to have sparked the interest of cognitive scientists working on human reasoning in the growing literature on formal models of argumentation. These models, and their associated normative properties, have great potential for complementing existing research on human reasoning and for providing conceptual means to deal with highly complex inference structures.

Footnotes

  • 1

     Other semantics (e.g., Caminada, 2006a) introduce a more fine-grained distinction between accepted, rejected, and undecided arguments. A comprehensive review of argumentation semantics is beyond the scope of this article, but excellent reviews can be found elsewhere, for example, in Baroni and Giacomin (2007) or Rahwan and Simari (2009), Chapter 2.

  • 2

     Working within the scope of default logic, Horty gave a specific example to highlight counter-intuitive results in classical reasoning with a floating conclusion supported by two mutually conflicting pieces of evidence.

  • 3

     The question always refers to the conclusion of argument A. For example, for Argument Set 1 in the appendix, the question would be worded "Alex's car will halt is (1) certainly false; (2) much more false than true; (3) slightly more false than true; (4) as false as true; …(7) certainly true." Participants responded by checking the corresponding numeral on a graphically depicted scale.

Appendices

Appendix A: Materials used in Study 1

Argument set 1

  • (A) The battery of Alex's car is not working. Therefore, Alex's car will halt.
  • (B) The battery of Alex's car has just been changed today. Therefore, the battery of Alex's car is working.
  • (C) The garage was closed today. Therefore, the battery of Alex's car has not been changed today.

Argument set 2

  • (A) Louis applied the brake and the brake was not faulty. Therefore, the car slowed down.
  • (B) The brake fluid was empty. Therefore, the brake was faulty.
  • (C) The car had just undergone maintenance service. Therefore, the brake fluid was not empty.

Argument set 3

  • (A) Mary does not limit her phone usage. Therefore, Mary has a large phone bill.
  • (B) Mary has a speech disorder. Therefore, Mary limits her phone usage.
  • (C) Mary is a singer. Therefore, Mary does not have a speech disorder.

Argument set 4

  • (A) John has no way to know Leila's password. Therefore, Leila's e-mails are secured from John.
  • (B) Leila's secret question is very easy to answer. Therefore, John has a way to know Leila's password.
  • (C) Leila purposely gave a wrong answer to her secret question. Therefore, Leila's secret question is not very easy to answer.

Argument set 5

  • (A) Mike's laptop does not have anti-virus software installed. Therefore, Mike's laptop is vulnerable to computer viruses.
  • (B) Nowadays anti-virus software is always available by default on purchase. Therefore, Mike's laptop has anti-virus software.
  • (C) Some laptops are very cheap and have minimal software. Therefore, anti-virus software is not always available by default.

Argument set 6

  • (A) There is no electricity in the house. Therefore, all lights in the house are off.
  • (B) There is a working portable generator in the house. Therefore, there is electricity in the house.
  • (C) The fuel tank of the portable generator is empty. Therefore, the portable generator is not working.

Appendix B: Materials used in Study 2

Argument set 1

  • (A) Cody does not fly. Therefore, Cody is unable to escape by flying.
  • (B) Cody is a bird. Therefore, Cody flies.
  • (C) Cody is a rabbit. Therefore, Cody is not a bird.
  • (D) Cody is a cat. Therefore, Cody is not a bird.

Argument set 2

  • (A) Smith does not follow American spelling. Therefore, Smith writes “colour” instead of “color”.
  • (B) Smith speaks American English. Therefore, Smith follows American spelling.
  • (C) Smith was born and brought up in England. Therefore, Smith does not speak American English.
  • (D) Smith was born and brought up in Australia. Therefore, Smith does not speak American English.

Argument set 3

  • (A) The car did not slow down. Therefore, the car approached the signal at the same speed or higher.
  • (B) Louis applied the brake. Therefore, the car slowed down.
  • (C) Louis applied the accelerator instead. Therefore, Louis did not apply the brake.
  • (D) Louis applied the clutch instead. Therefore, Louis did not apply the brake.

Argument set 4

  • (A) Stephen is not guilty. Therefore, Stephen is to be free from conviction.
  • (B) Stephen was seen at the crime scene at the time of the crime. Therefore, Stephen is guilty.
  • (C) Stephen was having dinner with his family at the time of crime. Therefore, Stephen was not seen at the crime scene at the time of the crime.
  • (D) Stephen was watching football with his friends in the stadium at the time of the crime. Therefore, Stephen was not seen at the crime scene at the time of the crime.
