Self-Repair Increases Referential Coordination

When interlocutors repeatedly describe referents to each other, they rapidly converge on referring expressions which become increasingly systematized and abstract as the interaction progresses. Previous experimental research suggests that interactive repair mechanisms in dialogue underpin convergence. However, this research has so far focused only on the role of other-initiated repair and has not examined whether self-initiated repair might also play a role. To investigate this question, we report the results from a computer-mediated maze task experiment. In this task, participants communicate with each other via an experimental chat tool, which selectively transforms participants’ private turn-revisions into public self-repairs that are made visible to the other participant. For example, if a participant, A, types “On the top square,” and then before sending, A revises the turn to “On the top row,” the server automatically detects the revision and transforms the private turn-revision into a public self-repair, for example, “On the top square umm I meant row.” Participants who received these transformed turns used more abstract and systematized referring expressions, but performed worse at the task. We argue that this is due to the artificial self-repairs causing participants to put more effort into diagnosing and resolving the referential coordination problems they face in the task, yielding better grounded spatial semantics and consequently increased use of abstract referring expressions.

Convergence on referring expressions is intrinsically interactive. Each pair of participants typically creates their own, idiosyncratic conventions for the same referents depending on their specific interaction history (Garrod & Doherty, 1994; Healey, 1997). Yet, the development of abstraction is not simply due to the coordination problem of creating a novel referring expression: once referring expressions have been used successfully, they continue to develop in predictable directions (Garrod, 1999; Healey, 2004; see also Table 1). The emergence of abstract referring expressions also occurs across modalities: in spoken interaction (Pickering & Garrod, 2004), text-based messaging, gesture (Macuch Silva, Holler, Ozyurek, & Roberts, 2020; Motamedi, Schouwstra, Smith, Culbertson, & Kirby, 2019; Nölle, Staib, Fusaroli, & Tylén, 2018), whistle-based language (Verhoef, Roberts, & Dingemanse, 2015), and in graphical, mediated interaction (Galantucci, 2005).

Table 1
Global development of abstract descriptions in the maze game

Initially   You need to go to my switch which is all the way at the top right on the sticking out part on the left
5 min       That's me done, can you go two down from the large block of squares on the right
10 min      You need to go to the middle column last square
15 min      I'm on the fourth column from right 3rd square
20 min      Wait then go to 5th column topmost square
30 min      Went back to 4th column 1st square
35 min      3rd col from left, row 7 from top
40 min      Then 2c r 6. yours?
45 min      5, 7

Note. Initially, participants use descriptions that typically rely on visually salient features of the maze, for example, the "sticking out part" or "large block of squares on the right." As the task progresses, participants develop more systematized descriptions which conceive of the mazes as consisting of squares aligned in columns (e.g., "fourth column from right 3rd square"). By the end of the experiment, the most coordinated pairs tend to use extremely concise Cartesian coordinate descriptions which conceive of the mazes as consisting of rows and columns.

Further, the quality of the interaction directly affects the development of coordination. If interlocutors are prevented from providing each other with feedback, for example, by being prevented from annotating each other's drawings, this impedes the development of abstract referring expressions (Healey et al., 2007). Similarly, in multiparty interaction, convergence occurs at a different rate between fully ratified participants than between participants and overhearers who have limited opportunities for engaging in the interaction (see also Kühlen & Brennan, 2013).
Cumulatively, these findings suggest that processing that occurs in interaction places important constraints on the semantic negotiation of referring expressions (Freyd, 1983; Olsen & Tylén, 2023). However, there is currently no consensus about which mechanisms are involved.
One important source of constraints comes from individuals' cognitive biases (Kirby, Cornish, & Smith, 2008). On this view, simply being exposed to another's linguistic output should suffice to drive abstraction, for example, in an iterated learning chain (Kirby, Griffiths, & Smith, 2014; Silvey, Kirby, & Smith, 2019). But when noninteracting participants are exposed to exactly the same signs as interacting dyads, the signs that are subsequently produced by the noninteracting participants are less effective and less efficient (Fay, Walker, Swoboda, & Garrod, 2018), demonstrating the importance of interindividual, as opposed to intraindividual, processes occurring in interaction.
A parsimonious account of interindividual coordination is provided by the Interactive Alignment model (Pickering & Garrod, 2004), which proposes that convergence arises as a consequence of automatic mutual priming. But this does not fully explain convergence, since priming is intrinsically conservative (Healey, 2004): once a particular form is the most successfully and widely used by a group, there is no mechanism to explain how it might be supplanted by another. Yet, interlocutors do not settle on abbreviated forms of the initially most frequently used referring expression in a "winner-takes-all" process. Interlocutors continue to develop novel and more abstract descriptions throughout the interaction (see Table 1). The priming account also does not explain conversational routines that do not involve lexical repetition or syntactic parallelism, for example, adjacency pairs (Clark, 1996; Schegloff, 2007), which often consist of complementary pairs of different types of contribution. In fact, patterns of local imitation of turns are worse statistical predictors of dialogue coordination than patterns of different, complementary turns (Fusaroli & Tylén, 2016), while indiscriminate, local imitation is actually associated with unsuccessful dialogue (Fusaroli et al., 2012).

Miscommunication drives abstraction
An alternative account is provided by Healey (2008) and colleagues, who argue that the interactive mechanisms associated with miscommunication play a central role in the development of abstract descriptions. Although historically miscommunication has been treated as a phenomenon to be avoided by interlocutors (Healey, De Ruiter, & Mills, 2018), research in conversation analysis has revealed how miscommunication involves a family of intricate interactive "repair" mechanisms that are used by interlocutors to sustain coordination (Heesen, Fröhlich, Sievers, Woensdregt, & Dingemanse, 2022; Schegloff, 1992). For example:

Example 1
A Hey, the first time they stopped me from selling cigarettes was this morning
B From selling cigarettes
A Or buying cigarettes
Source: From Schegloff et al. (1977).

Here, B supposes that A did not intend to say "selling," and identifies the possible mistake in A's turn by repeating the problematic phrase "From selling cigarettes." Importantly, this form of repair leaves it up to A to remedy their own mistake. Similar forms of repair occur in the maze game:

Example 2
A Move to the third square second row
B third?
A from the right

In this example, participant B has trouble understanding how A is counting squares within a row. B signals this trouble by repeating the problematic element "third," which A then clarifies. Similarly, in Example 3:

Example 3
A I'm in row 6 column 7
B huh?
A next to the clump of squares that looks like an arm sticking out

In this example, participant B is unable to precisely specify the problem. So, B uses an open-class repair, "huh?" (Drew, 1997), to signal problematic understanding, leading A to fully reformulate their turn with an easier to understand, less abstract, description.
According to the repair-based account of Healey et al., such repair sequences allow interlocutors to identify potential divergences of interpretation with their conversational partner concerning the semantics of referring expressions, and then interactively resolve these divergences. Findings from a set of maze-task experiments (Healey, 1997; Mills, 2014) provide evidence for repair-driven convergence. In this task, pairs of participants collaboratively solve mazes. This presents participants with the recurrent need to refer to spatial locations (see Fig. 1 for an example of maze configuration). A consistent finding is that participants initially start out using descriptions which identify visually salient features of the maze, for example, "the sticking out part," "at the end of the arm." Over the course of the experiment, participants progressively use more abstract descriptions, for example, "longest row 5th square," while the most co-ordinated pairs converge on more complex abstract Matrix descriptions, such as "A5," "2,1," or "row 3 column 4" (see also Table 1). These descriptions are more difficult to coordinate on as their successful use requires coordinating on counting conventions (Healey, 2004). In a particular use, a description such as "D4" is a compact expression of a meaning like: "4 across from the leftmost edge of the maze window, counting the edge as zero, and counting the missing boxes and counting 3 boxes up from the lower edge of the window" (Mills, 2014). In order for participants to converge on such Matrix schemas, they first need to establish how to count rows and columns, whether to count from 0 or 1, whether "rows" can also be vertical, whether to count missing nodes, and so on (Healey, 2004). This is accomplished interactively, via repair, as it allows participants to identify, diagnose, and resolve any differences in interpretation (see also van Arkel, Woensdregt, Dingemanse, & Blokpoel, 2020; Bjørndahl, Fusaroli, Østergaard, & Tylén, 2015; Micklos, Walker, & Fay, 2020).

Fig. 1. Each participant has two windows on their screen. The top window displays the maze configuration. The lower window contains the chat interface used by participants to communicate with each other. In this dialogue, P1 originally typed "Am on top row," then subsequently deleted "top row" and replaced it with "first row." These private deletions are transformed by the server into a self-repair and sent to P2's screen.

Manipulating miscommunication
To investigate experimentally the role played by repair, Healey (2006, 2008) conducted an experiment in which participants communicated via an experimental chat-tool which inserts artificial repairs into the interaction. The repairs appear, to participants, to originate from each other. In Examples 4 and 5 below, the second turn is an artificial repair produced by the server that appears to A as originating from participant B.

Example 4
A Go to 3rd row 2nd column
B row? (produced by the server)
A yeah counting from the top

Example 5
A My switch is at 4,5
B huh? (produced by the server)
A it's next to the sticking out part

Participants who received such artificial repairs produced fewer abstract descriptions, suggesting that, when participants encounter difficulties, they resort to less abstract descriptions that rely on visually salient features of the maze, which are easier to co-ordinate on.
A similar method was used in a subsequent experiment, which used the chat-tool to automatically detect instances of naturally occurring repair and amplify their severity. For example, in the following conversation, B's repair "3rd?" is intercepted and transformed into "what?" and sent to A.

Example 6
A go to the 3rd row 2nd column
B 3rd? (intercepted by server, not sent to A)
B what? (transformed turn sent to A)
A go to 3rd row 2nd column from the right

Participants who received these manipulations produced more abstract descriptions, while the manipulations had no other discernible effect on task performance. The authors explain this pattern as being due to these interventions exacerbating the apparent severity of actual "trouble" in co-ordinating on the semantics of referring expressions. Participants respond to this increased severity by putting more effort into diagnosing and resolving the problem, yielding better grounded spatial semantics and consequently increased use of abstract descriptions.

Self-repair
In addition to the types of repair discussed above, speakers can also modify their own utterances with a so-called self-repair (Schegloff, 2007). In Example 1, if A had corrected their slip of the tongue immediately after uttering it, this could have yielded a turn such as:

Example 7
A Hey, the first time they stopped me from selling, uhh buying cigarettes was this morning

In this example, A identifies that they made a mistake, then signals the suspension of the delivery of the utterance with the editing expression "uhh" (Levelt, 1983), followed by replacing "selling" with "buying." Similarly, in the maze game, a participant might produce a self-repair such as:

Example 8
A Move to the top row uhh I mean first row.

Example 9
A My goal is at 4, 5 oops it's at 4, 6

From a cognitive perspective, self-repair can be attributed to the inexorable incrementality of processing (Gregoromichelaki, Kempson, & Howes, 2020; Hough, 2014). In addition, self-repairs are associated with better planning and coordination in effective team communication (Gervits et al., 2016), are indicative of speakers adapting their descriptions to the perspective of their partner (Clark & Wilkes-Gibbs, 1986; Clark & Krych, 2004), and can have a beneficial effect on comprehension (Brennan & Schober, 2001).

Research questions
In summary, experimental research suggests that convergence on abstract referring expressions is underpinned by participants identifying, diagnosing, and resolving differences in interpretation via repair. However, experiments that manipulated repair have focused solely on other-initiated repair, thus perhaps missing how self-repair might be an important mechanism underpinning semantic change and adaptation in interaction. To address this question, we describe an experiment which investigates whether participants who play the maze task and whose (covert) self-repair efforts are artificially upgraded to public signals will also be induced to use more abstract descriptions.

The maze task
The maze task is a computer-mediated version of the maze game experiments conducted by Garrod and Anderson (1987) and Garrod and Doherty (1994). Pairs of participants sit in different rooms in front of a computer screen which displays (1) the maze application and (2) a chat window for communicating with each other (see Fig. 1). The maze application displays a maze configuration consisting of interconnected nodes. Each participant's maze has a goal location marked with a red cross. The paths to the goal are blocked by gates which can only be opened if the other participant moves their position marker to a location that corresponds to a gray switch location that is only visible on their partner's screen. In order to get to the goal and solve the maze, participants have to open their gates by getting their partner to go onto a switch that only the player can see on their screen. This creates a recurrent co-ordination problem of participants guiding each other onto each other's switches. Participants play 12 randomly generated mazes, with a timeout of 5 min: If they fail to complete a maze in 5 min, the next maze is automatically loaded. This means that all dyads play 12 games (i.e., attempt 12 different mazes).
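The gate rule described above can be made concrete with a minimal sketch. The names used here (`PlayerView`, `gates_open_for`, `timed_out`) and the coordinate representation are illustrative assumptions, not the implementation of the maze application; whether gates remain open after the partner leaves a switch is not modeled.

```python
from dataclasses import dataclass

MAZE_TIMEOUT_SECONDS = 5 * 60  # each maze is abandoned after 5 min (from the text)


@dataclass(frozen=True)
class PlayerView:
    """What one participant sees; field names are hypothetical."""
    position: tuple      # (column, row) of this participant's position marker
    switches: frozenset  # switch locations shown only on this participant's screen


def gates_open_for(player: PlayerView, partner: PlayerView) -> bool:
    """A participant's gates open when the partner's marker is on one of the
    participant's own, privately visible, switch locations."""
    return partner.position in player.switches


def timed_out(elapsed_seconds: float) -> bool:
    """True once the 5-minute limit for the current maze has been reached."""
    return elapsed_seconds >= MAZE_TIMEOUT_SECONDS
```

The sketch shows why the task creates a recurrent coordination problem: the only way for a participant to make `gates_open_for` true is to describe a switch location that the partner cannot see.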

Manipulation: Transforming private turn-revision into self-repairs
Participants communicate with each other via a custom Artificial Intelligence-mediated (Hancock, Naaman, & Levy, 2020) instant messaging program (see Fig. 1). The instant messaging program consists of two windows. The top window shows the conversation history; the lower window is a turn-formulation window in which participants type their turn privately before sending it by pressing ENTER. All participants' keystrokes are sent to the server, which analyses what they type and automatically transforms participants' private turn-revisions into self-repairs that are made visible to the other participant. For example, suppose a participant types the following:
Participant1: Go to the square on the left, next to the big block on top.
And then, before pressing ENTER, the participant edits the turn to:
Participant1: Go to the square on the left, next to the third column
The chat server automatically identifies the deleted portion of the turn, and appends an editing expression (Levelt, 1983), such as "umm" or "uhh," followed by the new revised text. This would yield the following turn, sent to P2:
Participant1: Go to the square on the left, next to the big block on top umm next to the third column.
Importantly, participant 1 does not see the transformed text in their own chat window (see Fig. 1).
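The transformation can be illustrated with a minimal sketch. It is not the chat server's algorithm: as a simplifying assumption, the function below locates the revision by comparing the two versions of the turn word by word, whereas the server works from the keystroke log, so the amount of retraced text may differ from the example above. The function name `make_self_repair` is hypothetical, and `original` is assumed to be the turn as it stood before the deletion.

```python
def make_self_repair(original: str, revised: str, editing_expression: str = "umm") -> str:
    """Turn a private revision into a public self-repair.

    Simplifying assumption: the revision is found by a word-level
    comparison of the two versions, not from the keystroke log.
    """
    old_words, new_words = original.split(), revised.split()
    # Count the words shared at the start of both versions.
    shared = 0
    for old, new in zip(old_words, new_words):
        if old != new:
            break
        shared += 1
    replacement = new_words[shared:]
    if old_words == new_words or not replacement:
        # Nothing was replaced (no change, or a pure deletion): send as typed.
        return revised
    # Original text, then the editing expression, then the replacement text.
    return " ".join(old_words + [editing_expression] + replacement)


print(make_self_repair("Am on top row", "Am on first row"))
```

Applied to the revision shown in Fig. 1, this yields a turn of the form "original text, editing expression, replacement text", which is what the recipient sees while the sender's own window shows only the revised turn.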
G. Mills, G. Redeker / Cognitive Science 47 (2023) 9 of 24
The experiment was conducted on native Dutch-speaking participants. We used the following editing expressions, which were identified in a previous pilot study: "eeh," "eehm," "euh," "euhm," "ehm," "uh," "uuh," "uuhm," "ik bedoel" ("I mean"), "eh ik bedoel" ("uh I mean"). Interventions were performed on both members of a dyad. The editing expression was selected randomly. In order to avoid cascades of interventions, a minimum of five turns had to elapse after each intervention before a turn by the same participant would be manipulated again.
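The selection and spacing rules can be sketched as follows. The class name `InterventionScheduler` and its methods are hypothetical. The sketch also assumes that the five intervening turns are counted over turns by either participant and that every revised turn outside the waiting period is manipulated; the text does not settle either point.

```python
import random

# Editing expressions listed in the text.
EDITING_EXPRESSIONS = [
    "eeh", "eehm", "euh", "euhm", "ehm", "uh", "uuh", "uuhm",
    "ik bedoel", "eh ik bedoel",
]


class InterventionScheduler:
    """Decides, turn by turn, whether a revised turn is manipulated."""

    def __init__(self, min_gap: int = 5, seed=None):
        self.min_gap = min_gap
        self.rng = random.Random(seed)
        self.turn_index = 0
        self.last_manipulated = {}  # participant -> turn index of last intervention

    def next_turn(self, participant: str, was_revised: bool):
        """Return an editing expression if this turn is manipulated, else None."""
        self.turn_index += 1
        if not was_revised:
            return None
        last = self.last_manipulated.get(participant)
        # Assumption: at least `min_gap` turns (by anyone) must lie in between.
        if last is not None and self.turn_index - last <= self.min_gap:
            return None
        self.last_manipulated[participant] = self.turn_index
        return self.rng.choice(EDITING_EXPRESSIONS)
```

Keeping the last intervention per participant, rather than per dyad, reflects the wording that the restriction applies to "a turn by the same participant."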

Measures
The following measures were used:

Description types
Proportion of Matrix descriptions: This measure records whether a participant describes a maze location using a Cartesian coordinate schema consisting of rows and columns. Each turn was classified as one of three categories:
1. Non spatial descriptions, for example, "tell me where to go"
2. Matrix, Cartesian descriptions, for example, "4,5," "A1," "row 3 column 2"
3. Other, for example, "the sticking out row," "the part that looks like a head," "big column on the right." This corresponds to the categories Figural, Path, and Line from the original maze game (Garrod & Doherty, 1994).
Each maze description was classified independently by both authors. Any conflicting classification was discussed and resolved.

Performance measures
Task success: The number of mazes completed, which ranges between 0 and 12.

Number of turns: The number of messages typed in the private turn formulation window and sent to the other participant.
Turn-length: The length (in characters) of each message. Note that turn-length and number of turns measure different properties of the interaction. For example, if participants ground in multiple turns (installments), this would lead to more turns that are also shorter.
Edits: All turns were analyzed to establish whether they had been revised while being typed. This measures how much effort participants put into turn formulation.
Alignment: This records for each spatial description whether it is of the same type (Matrix vs. Other) as the description produced by the previous participant.
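One way to operationalize the alignment measure is sketched below. It assumes that "the previous participant" means the most recent spatial description produced by the other member of the dyad, and that non-spatial turns are skipped; the function name and the category labels are illustrative.

```python
def alignment_scores(turns):
    """For each spatial description, record whether its type matches the most
    recent spatial description produced by the other participant.

    `turns` is a sequence of (speaker, category) pairs, where category is
    "matrix", "other", or "nonspatial" (labels are illustrative).
    """
    latest = {}  # speaker -> (position in dialogue, category)
    scores = []
    for position, (speaker, category) in enumerate(turns):
        if category not in ("matrix", "other"):
            continue  # non-spatial turns are not scored
        partner_turns = [entry for spk, entry in latest.items() if spk != speaker]
        if partner_turns:
            _, partner_category = max(partner_turns)  # most recent partner turn
            scores.append(category == partner_category)
        latest[speaker] = (position, category)
    return scores


dialogue = [("A", "other"), ("B", "other"), ("A", "nonspatial"),
            ("A", "matrix"), ("B", "matrix"), ("A", "other")]
print(alignment_scores(dialogue))
```

The first spatial description in a dialogue has no predecessor and therefore receives no score under this reading.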

Additional analyses (see Discussion)
First use of Matrix description: This measure records when (i.e., on which turn number) a dyad first uses a Matrix description.
Unique words: This records the number of unique words produced by each participant.
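Both additional measures are simple counts, sketched below under stated assumptions: turn numbers are 1-based, and words are obtained by lowercasing, splitting on whitespace, and stripping surrounding punctuation. The tokenization used in the study is not specified, so this is only one plausible choice; the function names are hypothetical.

```python
import string


def first_matrix_turn(categories):
    """Return the 1-based turn number of the first Matrix description,
    or None if the dyad never produces one."""
    for turn_number, category in enumerate(categories, start=1):
        if category == "matrix":
            return turn_number
    return None


def unique_word_count(messages):
    """Count distinct word types in one participant's messages.

    Assumed tokenization: lowercase, whitespace split, and removal of
    punctuation at the edges of each token.
    """
    words = set()
    for message in messages:
        for token in message.lower().split():
            token = token.strip(string.punctuation)
            if token:
                words.add(token)
    return len(words)
```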

Participants
One hundred and twenty-two participants were recruited from first-year undergraduate classes at the University of Groningen, Department of Communication and Information Science, and participated for course credit. These classes have an approximately equal balance of genders. Participants were randomly assigned to dyads, and the dyads were randomly assigned to either the Control condition or Manipulated condition. Four pairs were discarded as it turned out they had previously participated in a maze game experiment, leaving 24 dyads in the Control condition and 33 dyads in the Manipulated condition.

Procedure
Pairs were booked for 1-h slots. They were given written instructions, and then instructed verbally. Each pair of participants was asked to complete all 12 games as fast as possible. The nature of the experimental manipulations was not disclosed to participants until the debriefing session. All procedures were in accordance with the 1964 Helsinki Declaration and were reviewed by the Faculty's Committee for the Ethical Evaluation of Research (CETO).

Hypothesis
If repair underpins the emergence of abstract descriptions, then analogously to the repair-amplification experiment described above, participants whose covert repairs are exposed should use more Matrix descriptions than participants in the Control group.

Research question
What effect will the manipulation have on performance measures? We see three possibilities. The manipulation:
1. has no discernible effect (as in the previous experiment), or
2. increases the amount of "trouble" in the interaction, having a deleterious effect on task performance, or
3. increases participants' effort in coordinating in the task, having a beneficial effect on task performance.

Results
We analyzed the results using R version 3.6.2 (R Core Team, 2022), together with the lme4 package version 1.1-26 (Bates, Maechler, et al., 2015) and the MASS package v. 7.3-54 (Ripley et al., 2013). The models included random intercepts for dyads, participants, and mazes, as well as random slopes of condition and time within mazes. The models were estimated with an unstructured covariance matrix. Since we are interested in what the participants type, the artificial, transformed turns generated by the server are excluded from the analysis; only the original unmodified turns as sent by the participants are included in the analyses. This resulted in 17,627 turns overall. Dyads took a mean of 31 min and 30 s (SD = 7 min and 4 s) to solve all 12 mazes.

Description types
In order to test whether participants in the Manipulated condition used more Matrix descriptions than participants in the Control condition, we conducted a likelihood ratio test of the model with the manipulation effect against the model without the manipulation effect. This revealed a significant difference (χ²(3) = 8.52, p = .0364). The predicted probability of Matrix descriptions in the Control group is 0.02 [95% CI: 0.00, 0.18]. The predicted probability of Matrix descriptions in the Manipulated group is 0.34 [95% CI: 0.07, 0.77]. This confirms H1 (see Fig. 2).
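As a check on how the reported p value follows from the test statistic, the upper-tail probability of a chi-square variable can be computed in closed form when the degrees of freedom are odd. The sketch below uses only the Python standard library; the function name is hypothetical, and the computation is an illustration rather than the analysis code, which was run in R.

```python
import math


def chi2_sf_odd_df(x: float, df: int) -> float:
    """Upper-tail probability P(X > x) for a chi-square variable with odd df."""
    if df < 1 or df % 2 == 0:
        raise ValueError("this closed form covers odd degrees of freedom only")
    if x <= 0:
        return 1.0
    tail = math.erfc(math.sqrt(x / 2.0))  # the df = 1 case
    term_sum, denom = 0.0, 1.0
    for r in range(1, (df - 1) // 2 + 1):
        denom *= 2 * r - 1                 # 1, 1*3, 1*3*5, ...
        term_sum += x ** (r - 0.5) / denom
    return tail + math.sqrt(2.0 / math.pi) * math.exp(-x / 2.0) * term_sum


# Likelihood ratio statistic reported for the manipulation effect.
print(round(chi2_sf_odd_df(8.52, 3), 4))
```

With the reported statistic of 8.52 on 3 degrees of freedom, the function returns a value that agrees with the reported p = .0364 to the precision given.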

Performance measures
In order to investigate the effect of the manipulations on performance measures, we compared models that included/excluded the corresponding predictors (main effects and interactions). Following Eshghi and Healey (2016), we pool the scores for the first six games (EARLY) and the last six games (LATE) to provide an index of how the measures change over time. Akaike's Information Criterion (AIC) was used for model comparison: as there was no nesting relationship between all models being compared, it was not possible to use a chi-square difference test between the different models. The model with the lowest AIC score was considered the best-fitting. We report the best-fitting model.
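The selection rule can be sketched as follows, using the standard definition AIC = 2k − 2 ln L, where k is the number of estimated parameters and L is the maximized likelihood. The model names, log-likelihoods, and parameter counts in the example are invented for illustration and are not values from this study.

```python
def aic(log_likelihood: float, n_parameters: int) -> float:
    """Akaike's Information Criterion: 2k - 2 ln L (lower is better)."""
    return 2 * n_parameters - 2 * log_likelihood


def best_by_aic(models: dict) -> str:
    """Return the name of the model with the lowest AIC.

    `models` maps a model name to (log_likelihood, n_parameters).
    """
    return min(models, key=lambda name: aic(*models[name]))


# Hypothetical candidate models (illustrative numbers only).
candidates = {
    "time": (-1203.4, 5),
    "time + condition": (-1201.9, 6),
    "time * condition": (-1201.7, 7),
}
print(best_by_aic(candidates))
```

Because AIC penalizes each additional parameter, a more complex model is preferred only if it improves the log-likelihood by more than one unit per added parameter.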

Task success
Task success was modeled with a multilevel binomial logistic regression, using glmer with a logit link function.

Turn length
The length of participants' turns was modeled with a multilevel negative binomial regression, using glmer.nb with a log link function. The model with the lowest AIC showed a significant effect of time (b = −0.210 [95% CI: −0.261, −0.158], z = −7.98, p < .001). The predicted mean turn length in the first six games is 16.9 [95% CI: 15.8, 18.1] characters, while in the last six games, the predicted turn length is 13.7 [95% CI: 12.5, 15.0] characters (see Fig. 4).
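Because the model uses a log link, the coefficient for time is interpretable as a log rate ratio: exponentiating it gives the multiplicative change in expected turn length from the first to the last six games. The sketch below works through this arithmetic from the reported values. It assumes that time is coded as a single EARLY versus LATE contrast and that the interval is a Wald interval; the function names are hypothetical.

```python
import math


def rate_ratio(b: float) -> float:
    """Multiplicative change in the expected count implied by a log-link coefficient."""
    return math.exp(b)


def wald_ci(b: float, z: float, crit: float = 1.96):
    """Approximate 95% Wald interval, recovering the standard error from b and z."""
    se = abs(b / z)
    return b - crit * se, b + crit * se


ratio = rate_ratio(-0.210)   # about 0.81, i.e., turns roughly 19% shorter
late_mean = 16.9 * ratio     # predicted LATE mean implied by the EARLY mean
low, high = wald_ci(-0.210, -7.98)
print(round(ratio, 3), round(late_mean, 1), round(low, 3), round(high, 3))
```

Multiplying the predicted early mean of 16.9 characters by exp(−0.210) gives approximately 13.7 characters, matching the predicted late mean, and the Wald interval implied by b and z is approximately [−0.262, −0.158].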

Number of turns
The number of turns produced by dyads over the course of the experiment was modeled with a multilevel negative binomial regression, using glmer.nb with a log link function. The model with the lowest AIC showed a significant effect of the manipulation (b = 0.187, [95% CI: 0.00227, 0.372], z = 1.984, p = .0473) and a significant effect of time (b = −0.512, [95% CI: −0.593, −0.431], p < .001) (see Fig. 5 and Table 2).

Deletes
The number of deletes produced by dyads over the course of the experiment was modeled with a multilevel logistic regression, using glmer with a logit link function. The model with the lowest AIC showed a significant effect of time (b = −0.211, [95% CI: −0.308, −0.116], z = −4.32, p < .01). The predicted probability of a turn containing a delete is 0.39 [95% CI: 0.36, 0.42] in the first six games and is 0.34 [95% CI: 0.32, 0.37] in the last six games (see Fig. 6).
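For the logistic model, the coefficient is a difference on the log-odds scale. The sketch below converts the predicted probabilities to log-odds and back to show how the reported values relate to each other. It assumes a single EARLY versus LATE contrast, and small discrepancies are expected because the reported probabilities are rounded.

```python
import math


def logit(p: float) -> float:
    """Log-odds of a probability."""
    return math.log(p / (1.0 - p))


def inv_logit(x: float) -> float:
    """Probability corresponding to a log-odds value."""
    return 1.0 / (1.0 + math.exp(-x))


# Difference in log-odds between the reported LATE and EARLY probabilities.
implied_b = logit(0.34) - logit(0.39)

# LATE probability implied by the EARLY probability and the reported coefficient.
late_probability = inv_logit(logit(0.39) + (-0.211))
print(round(implied_b, 3), round(late_probability, 2))
```

The implied difference in log-odds is close to the reported b = −0.211, and applying the coefficient to the early probability of 0.39 gives a late probability of approximately 0.34.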

Semantic alignment
The alignment of participants' spatial descriptions was modeled with a multilevel logistic regression, using glmer with a logit link function (see Fig. 7).

First use of matrix descriptions
The first use of Matrix descriptions by a dyad was modeled with a negative binomial regression from the MASS package. A likelihood ratio test of the model with the manipulation effect against the model without the manipulation effect did not reveal a significant difference (χ²(1) = 0.798, p = .372). The predicted number of turns that elapse before a member of a dyad produces a Matrix description is 23.7 [95% CI: 16.5, 34.3].

Number of unique words
The number of unique words was modeled with a multilevel negative binomial regression, using glmer.nb with a log link function (see Fig. 8 and Table 3).

Discussion
The results confirm the repair-driven view of co-ordination: Dyads whose covert repairs were exposed produced more abstract Matrix descriptions.
Although the changes in performance measures over time are consistent with the basic findings that interlocutors develop increasingly contracted referring expressions and become more successful as the task progresses, the effect of the manipulation on task performance is puzzling. Consistent with previous research, the manipulation appears to be having a beneficial effect on semantic co-ordination. However, unexpectedly, the manipulation also has a detrimental effect on task performance (task success and number of turns). Prima facie, this conflicts with previous research which has consistently found a positive association between abstract descriptions and task success (Castillo et al., 2019; Garrod & Doherty, 1994).
The immediate questions that arise are: Why are the interventions causing more disruption? Why are they causing more abstraction? Might the increased abstraction be causing the disruption and/or vice-versa? We identify four possible explanations of the flow of information between participants: (a) First, the editing expressions might be directly influencing participants to use more abstract descriptions. According to Arnold and Tanenhaus (2011), recipients of turns containing editing expressions are more likely to focus on less familiar items (see also Barr, 2001; Corley, MacGregor, & Donaldson, 2007). This could cause manipulated participants to consider previously unmentioned maze locations, effectively exposing participants to more exemplars, thereby creating a pressure to develop referring schemas that abstract over these exemplars (see also Raviv et al., 2022). Similarly, the editing expressions could also prompt participants to use less familiar referring expressions, stimulating participants' exploration of the space of possible descriptions. This is partially borne out in the additional analyses: manipulated dyads use more unique words, but do not appear to be introducing Matrix descriptions any earlier, suggesting that the decrease in task performance is not due to participants being induced to use Matrix descriptions prematurely, that is, before they have established sufficient co-ordination to use them successfully.
(b) Second, the deleted text might have influenced participants' conversational memory for the references produced during the interaction (Knutsen & Le Bigot, 2015) and could also have helped participants to uncover sources of misalignment (see also Schober, Suessbrick, & Conrad, 2018), in particular, concerning how to count in the maze. Consider Examples 10-15 below. In Examples 10 and 11, the deleted text shows that the sender is encountering trouble counting rows and nodes, potentially alerting the recipient to this trouble. Example 12 appears to show the sender was considering a different origo for counting (counting from bottom left vs. bottom right), while similarly Example 13 shows that the sender was originally considering counting from top to bottom, as opposed to from bottom to top. In Example 14, the edited turn refers to both horizontal and vertical row counts, whereas the original turn only shows horizontal row counts. Making the deleted text visible could be beneficial for the recipient since it provides an impetus for conceptualizing the maze as consisting of horizontal as well as vertical rows, which are the constituent elements of Matrix descriptions. Similarly, in Example 15, the sender initially conceptualized the maze as consisting of vertical rows, using ("3rd row from left") to refer to the vertical column which contains two switches and the position marker (see Fig. 1, P2's maze) but then changes and uses a different schema that conceptualizes the maze as consisting of horizontal rows ("last row third block from left").
(d) Fourth, the interventions could be causing participants to think their partner is experiencing more difficulty than they actually are, inducing them to compensate by expending more effort in grounding the referring expressions, following the "principle of least collaborative effort" (Clark & Brennan, 1991). This effect could be driven by the editing expressions, which have been shown to cause participants to appear less confident (Brennan & Williams, 1995) and as having a poorer grasp of the task (Susca & Healey, 2002). Moreover, many of the substitutions were concerned with correcting typos. In such manipulations, the transformed text is much less readable than the original text and is often garbled. This could lead participants to think more negatively of their partner's ability (Boland & Queen, 2016), leading them to "dumb down" and put more effort into their turns in order to compensate for the (apparent) decreased skill (see, e.g., Dreisbach & Fischer, 2011) or commitment (Michael, Sebanz, & Knoblich, 2016; see also Mills, Gregoromichelaki et al., 2021) of their partner. In addition, participants' responses to manipulated messages could be giving the participant whose message was modified the impression that the recipient is experiencing more difficulty in formulating their turn, and is, therefore, a less credible interlocutor.
Relatedly, and more importantly in our view, the effect of the manipulation here is more radical than in previous experiments. Here, the manipulation renders as purposefully public information signals that were not intended to be included in the message sent to the interlocutor. Public self-repair in conversation does not only have the function of correcting trouble but can also be used strategically by the speaker to perform other actions, such as marking dispreferred responses, serving identity construction, or responsively reacting to (multimodal) feedback from the addressee (see, e.g., Lerner & Kitzinger, 2007; Schegloff, 2013). Under our manipulation, such potential strategic uses appear in the public arena for the consideration of the addressee while not underpinned by the intentions of the speaker or any reasons based on the interactional common ground. For example, there is no dispreferred response that is mitigated, and the reformulations are not intended to indicate that the speaker's original description needs to be taken into account by the addressee. It is possible that such nonintended messages have both a local and downstream effect on the amount of effort that participants have to expend to disentangle what the import of each other's responses is. This might facilitate coordination in making the speaker's thought process transparent and liable to be corrected (see the discussion of editing expressions above), but, on the other hand, it might result in participants having to take more turns to achieve their goal.

Conclusions and future work
In summary, it appears that self-repairs are causing participants to put more effort into grounding their referring expressions, whether as a consequence of attributing lower confidence to the other interlocutor, or due to the edited text providing more information about problems in the task. Dyads who received the interventions typed more turns and solved fewer mazes, suggesting that they are putting more effort into co-ordinating their referring expressions, while the increased use of unique words suggests that the interventions are inducing participants to explore the space of possible referring expressions. Somewhat surprisingly, despite using more abstract descriptions, participants are not using them earlier, suggesting that the exploration process is not occurring at the level of Matrix descriptions, but is presumably occurring at a finer grain, for example, in clarifying spatial semantics, as in Examples 10-15.
However, it is unclear how the constituent components of the self-repairs contributed to the patterns observed. The interventions used a variety of editing expressions, which might have had different effects on participants (see, e.g., Clark & Fox Tree, 2002; Womack et al., 2012 for a discussion). Also, the algorithm for transforming private edits into public self-repairs was not sensitive to the content of the messages. This means that many different types of "trouble" were made visible, including typos, reformulations, and specifications. In addition, some of this "trouble" might have been introduced by the interventions themselves, as well as by participants attempting to make sense of their partner's response to the interventions. Given this complexity, it is very difficult to determine the extent to which the different types of editing expression and "trouble" types might have contributed toward the observed pattern.
To address these issues, a promising next step would be to use more sophisticated AI-mediated communication to detect and manipulate specific kinds of "trouble," for example, solely manipulating typos or reformulations of Matrix descriptions, as well as "trouble" concerning the procedural coordination in the task (Knutsen, Bangerter, & Mayor, 2019). In contrast to the present study, such experiments should use a between-participant design to avoid interactions between different types of intervention. This would allow much more fine-grained comparison of different types of self-repair, and would also allow comparison between self- and other-initiated repair: the same reformulations of referring expressions could be introduced in one dialogue in the format of artificial self-repairs, and in a different dialogue as artificial other-initiated repairs. This approach could be augmented by using an incremental WYSIWYG chat interface which displays characters as they are typed (Maraev, Mazzocconi, Mills, & Howes, 2020; Ziembowicz & Nowak, 2019), which would be amenable to artificially and automatically manipulating public turn-edits in real-time.