Social Interaction in YouTube Text-Based Polylogues: A Study of Coherence



Since YouTube was launched, its emblematic video-sharing facility has attracted considerable attention as a social networking system of cultural production. In addition to vlogging, YouTube offers a text facility through which YouTubers share and negotiate opinions. However, research into the latter is scarce, especially within language-based disciplines (Androutsopoulos & Beißwenger 2009; Zelenkauskaite & Herring 2008). This article contributes to addressing this imbalance by focusing on YouTube text-based ‘conversation’ (Herring 2010a). Specifically, it examines coherence in a corpus of YouTube postings in Spanish. Although coherence has been the object of much academic debate in other forms of computer-mediated communication, no empirical analysis of coherence in YouTube text has been undertaken to date. Results underline the conversational potential of this facility.

Launched in February 2005 under the slogan “Broadcast Yourself,” YouTube was sold to Google in 2006 with an activity rate of 100 million views and over 65,000 daily video uploads (Paolillo, 2008). Since then, its popularity has continued to grow. Although no official figures are currently available,1 processes such as the “Youtubification of politics” during the 2008 U.S. General Elections (Garcés-Conejos Blitvich, 2010a; May, 2008) are indicative of YouTube's increasing influence across a range of social domains. YouTube's popularity and influence owe principally to its video-sharing (vlogging) facility, which is regarded as its “emblematic form of […] participation” (Burgess & Green, 2009, p. 53) and cultural production (Burgess & Green, 2008; Lange, 2007; Paolillo, 2008). YouTube vlogging has attracted considerable academic attention, including research into its individual identity construction (Lange, 2007) and social networking function (Adami, 2009; Paolillo, 2008) and its role as a form of “post-television” (Lister, Dovey, Giddings, Grant & Kelly 2009; Tolson 2010) and a cultural system in which core community members struggle with platform providers and “Big Media” production (Burgess & Green, 2008, 2009).

In addition to vlogging, YouTube offers a text facility through which YouTubers are able to post comments on previously uploaded video files. YouTubers are thus able to share, negotiate, agree with, and challenge opinions, often with seemingly no other end in mind than to interact and be in touch with other, often unknown, YouTubers. Despite its popularity as an online space for social interaction, research into this YouTube text facility is scarce, especially within language-based disciplines (Androutsopoulos & Beißwenger, 2009; Zelenkauskaite & Herring, 2008). Exceptions include, inter alia, Garcés-Conejos Blitvich (2010a, Forthcoming), Garcés-Conejos Blitvich, Lorenzo-Dus, and Bou-Franch (2009), Harley and Fitzpatrick (2009), Jones and Schieffelin (2009), Lorenzo-Dus (2009), Lorenzo-Dus, Garcés-Conejos Blitvich, and Bou-Franch (2011) and Paparacharissi (2011). This article seeks to contribute to addressing this imbalance by focusing on YouTube text-based “conversation” (cf. Herring, 2010a). Starting from the dual premise that the YouTube text facility provides an online space for social interaction and that in conversation participants try to make sense of each other's contributions, i.e., strive for coherence, the specific aim of this article is to examine coherence in this online environment. Coherence in computer-mediated communication (CMC, henceforth) has been the focus of considerable academic debate (see section on coherence and CMC, below). However, to our knowledge, no empirical analysis of coherence in YouTube text has been undertaken to date. Our study examines coherence in a corpus of YouTube postings in Spanish and thus answers calls for empirical research on languages other than English within the context of a “multilingual internet” (Danet & Herring, 2003, 2007; Herring, 2004; Herring, 2010a).

The article is organized as follows. First, relevant work on coherence in different forms of online communication is reviewed. Second, the multilevel methodological design of our empirical study is explained in detail. Third, the results are presented and discussed. Finally, the conclusions underline the ways in which this paper contributes to furthering our knowledge of how YouTubers communicate through the text-commenting facility.

Coherence and Computer-Mediated Communication

From discourse-analytic and pragmatic perspectives, coherence is understood as a general process of sense-making in which individuals engage whenever they communicate. While coherence is not a property of texts, these contain “a manifestation of the participants' common making of meaning,” itself a “situated and distributed practice” (Korolija, 2000, p. 429). In the words of Gómez González (2010) “although coherence phenomena may be cognitive in nature, their (re)construction is often based on explicit linguistic signals in the text itself” (p. 600).2 Coherence, then, may be generally viewed as ‘connection building’ in and through discourse. As Gee (1999, p. 85–94) argues, discourse contains a series of cues that guide participants' understanding of the connections that they build within and across utterances and between discourse and people, ideas, texts, and institutions outside the current situation. Such connection building is simultaneously a cognitive, interactional and inter-textual achievement.

Coherence has been examined within a number of CMC studies since the 1990s. In a pioneer study of various synchronous and asynchronous CMC environments, Herring (1999) identified lack of simultaneous feedback and disrupted adjacency as two main reasons accounting for the incoherence of much online interaction, vis-à-vis communication in offline settings. Paradoxically, she argued, while the medium's incoherence was problematic for users, some of its “disjointed effects” (Herring 1999, p. 2) made communication in online environments more attractive and enjoyable to them.

Herring's (1999) study attracted much scholarly attention, which sought to test the validity of the claims that CMC was largely incoherent and that this made communication inconvenient albeit pleasurable. This research focused on a variety of technological communication tools.3 It identified two additional features that pose coherence problems in CMC: multitasking and authority in instant messaging (Woerner, Yates & Orlikowski, 2006); and multiple participation in chat rooms, discussion forums, text messaging on interactive television, and Twitter (Honeycutt & Herring, 2009; Zelenkauskaite & Herring, 2008). Importantly, it also noted the existence of varying degrees and forms of coherence across different CMC settings. What is more, it established that online users successfully overcome potential coherence problems by adapting to the social and technological affordances of particular online tools through a range of strategies, such as repetition and speaker selection in synchronous team meetings (Markman, 2006) and lexical repetition, color coding, and multiple windows in Instant Messaging (Woerner et al., 2006). However, the nature of these strategies, and of the assumed coherence problems that they seek to overcome, is yet to be determined in the context of YouTube text.

Empirical research has also shown that coherence is a multilayered, activity-specific process (Korolija, 2000). This understanding of coherence has informed most studies of discursive resources employed to achieve and maintain coherence in online environments. These resources include sequential features, like adjacency and topic development; grammatical and lexical cohesion; and turn-taking features, like backchannelling, naming, or quoting (cf. e.g. Berglund, 2009; Herring, 1999; Herring & Kurtz, 2006; Herring & Nix, 1997; Herring, Kutz, Paolillo and Zelenkauskaite, 2009; Honeycutt & Herring, 2009; Lapadat, 2007; Markman, 2006; Nilsen & Mätikalo, 2010; Simpson, 2005; Woerner et al., 2006; Zelenkauskaite & Herring, 2008).

Finally, coherence is sensitive to participation structure. In CMC settings, participation has been classified as (i) one-to-one communication; (ii) one-to-many interactions, and (iii) intergroup discussions (cf. Baron, 1998, 2010; Herring, 1996; 2007; Yates, 2000). In (ii) and (iii) participation is multiple, i.e., it belongs to the pragmatics notion of polylogal interaction (Kerbrat-Orecchioni, 2004; Marcoccia, 2004). As for the text-based YouTube participation structure, it encompasses both instances of one-to-many interaction (ii) and intergroup discussion (iii). It therefore constitutes a sui generis case of polylogal communication – one, furthermore, which is open to public, largely anonymous, multiparticipation for as long as the YouTube video-clips that trigger the interaction remain posted. Because of its open, public nature and the highly persistent (Herring, 2007) textual record it generates, YouTube interaction not only tends to involve a sizeable number of participants but also to do so over a prolonged period of time.4

It is in its open, public nature that a further defining feature of YouTube text as a particular type of polylogue lies, namely the double articulation of the interaction that it generates.5 This includes, at one level, active message-sending YouTubers, who communicate with each other within a single polylogue as part of one-to-one interactions or intergroup discussions. At another level, interaction involves the “imagined ‘mass’ of ordinary users” (Burgess & Green, 2008, p. 8), who do not make comments but passively participate in the polylogue. Those who actively interact with each other online are aware, just as those interacting in broadcasting settings are, of the distributed recipiency (Hutchby, 2006) of their postings.

Finally, when compared with dyadic interaction, YouTube text-based interaction is complex, flexible, unstable, and unpredictable – features that it shares with a number of on- and offline polylogues (cf. Kerbrat-Orecchioni, 2004). Preliminary research into its structural properties reveals YouTube polylogues to be characterized by a combination of, on the one hand, the orderly, turn-by-turn patterns typical of dyadic conversation and, on the other, ‘networked sequences’ consisting of adjacent and nonadjacent turns typical of asynchronous interaction (Lorenzo-Dus et al., 2009). Such structural properties are likely to have a direct impact on how coherence works within YouTube text-based discussions.



The corpus for this study consists of 300 consecutive postings in Spanish drawn from the larger GENTEXT digital corpus.6 The data were taken from two YouTube polylogues that were respectively triggered by a video against abortion (first 150 comments; 6,998 words) and a video about domestic violence (first 150 comments; 6,093 words). These video clips were selected because they address hotly debated topics in Spanish society and appeal to a wide audience; they were, therefore, expected to trigger interest and discussion.7 The video against abortion (henceforth, AB), entitled ¡Podemos! (We can!), was uploaded by the Christian organization Hazte Oir (Make Yourself Heard) in 2008; their YouTube channel was created in June 2006 to ‘promote citizen participation in Spain's political life’.8 The video against domestic violence (henceforth, DV) was entitled Mujer maltratada en el metro (Battered woman in the underground) and was uploaded by an individual user whose YouTube channel dates back to 2006. This video clip was produced by the Madrid section of the Spanish Trade Union UGT in collaboration with several local town halls.

Analytical framework

The analysis of coherence in text-based YouTube polylogues is methodologically complex given its multilayered nature (as explained above) and the fact that polylogues are known for posing “a challenge to all methods of formal analysis” (Marcoccia, 2004, p. 144) that can only be handled by “successive accessing approaches, from various perspectives and on different levels” (Traverso, 2004, pp. 53–54). We therefore devised an innovative, multilayered analytic framework which was specifically designed to tap into four key discourse features that previous Conversation Analysis and Computer Mediated Discourse Analysis research into coherence in online and offline contexts has identified as particularly important (cf. e.g. Sacks, Schegloff, & Jefferson, 1974; Schegloff, 1968; Halliday & Hasan, 1976; Kerbrat-Orecchioni, 2004; Herring, 2004, 2007):

  1. Participation and adjacency. Multiparty participation (cf. Herring & Nix, 1997; Herring et al., 2009; Honeycutt & Herring, 2009; Zelenkauskaite & Herring, 2008) and adjacency disruption (cf. Herring, 1999; Marcoccia, 2004; Lapadat, 2007; Woerner et al., 2006) are said to pose problems for online coherence, and are hence potentially also problematic for text-based YouTube interaction.
  2. Turn-taking and cohesion. These are seen as adaptive resources in CMC that may be part of the solution to the potential problem of lack of coherence in YouTube (Berglund, 2009; Herring, 1999; Honeycutt & Herring, 2009; Lapadat, 2007; Markman, 2006; Nilsen & Mätikalo, 2010; Simpson, 2005; Woerner et al., 2006).

Next, each of these features is described. As they, at times, include a combination of both previously identified and data-driven categories, illustrative examples from our corpus are provided where appropriate.

Participation and adjacency

Number of participants is of central importance to polylogal interaction inasmuch as establishing coherence often becomes more difficult as the number of participants increases beyond the dyad (Bruxelles & Kerbrat-Orecchioni, 2004; Herring et al., 2009; Honeycutt & Herring, 2009; Zelenkauskaite & Herring, 2008). All individuals involved in a polylogal interaction, regardless of whether they are message senders or readers, share the status of polylogue participant (Grosjean, 2004) and, therefore, play a role in its participatory structure (Herring, 2007). Participant identification in YouTube text-based interaction, however, is limited to its active (message-sending) users and the number of comments that they contribute (cf. Herring, 1999; Herring & Nix, 1997; Herring et al., 2009; Grosjean, 2004; Zelenkauskaite & Herring, 2008). These provide a tangible, though admittedly partial, view of the configuration of the participation structure of YouTube polylogues.

Adjacency, for its part, forms the basis for sequential organization in interaction and contributes to coherence (Schegloff, 1968; 1990). In our YouTube corpus, it was examined through the coding of postings according to a taxonomy of turn relations which included five categories: adjacent turn, nonadjacent turn, video turn, multiple reference turn, and mixed turn (Table 1).9

Table 1. Coding scheme for turn-types
Turn-type | Definition
Adjacent Turn (AT) | Turn referring to the immediately prior turn
Non-adjacent Turn (NAT) | Turn referring to a turn other than the immediately adjacent one
Video Turn (VT) | Turn referring to the triggering video clip
Multiple Turn (MT) | Turn referring to multiple prior turns
Mixed Turn (MXT) | Turn combining two or more of the above turn-types

The categories AT and MT are based on previous work on CMC turn-types that differentiates between turns with a clear connection to an immediately previous turn and those connected to more than one turn (Berglund, 2009; Herring, 1999; Holmer, 2008; Zelenkauskaite & Herring, 2008). The category VT is used when a turn refers explicitly to the triggering video-clip. This category, also characteristic of online environments like digital newspapers and magazines, underlines the multimodal dimension of the YouTube text facility. Example 1 provides an illustration of the VT category. In turn 11, a YouTuber comments on the video by saying very good!!, while in turn 3, an exclamation of support – great – is followed by an expression of congratulation to those responsible for the video clip.

Example 1.

1.a. AB data, turn 11.

VT-11) viviens Very good!!

1.b. DV data, turn 3

VT-3) nachovidal1 great to those responsible congratulations

Example 2 illustrates a sequence combining both adjacent and non-adjacent turns from the DV polylogue. Turn 135 is not adjacent to its immediately previous turn but it itself triggers a series of comments by three different YouTubers. Of these, turn 134 directly refers, i.e. is adjacent, to the previous turn in 135. The following two turns, 133 and 132, also directly refer to turn 135 but are not adjacent to it.

Example 2. DV data, turns 135-134-133-132

NAT-135) xmanta11 There're already places where you can file complaints on these kinds of things. The underground is not the place to make such a scene. I would make this clear to this woman, without violence because she's not my wife, but resolutely. I guess instead of doing that she could stay at home and take care of her family. That's why, if this was a real case, I wouldn't be surprised if things went badly!

  • AT-134) murcielagabat It's a pity you think so. Laws are useless these days because they think that all's well as long as you have a restrictive order banning this person from getting within 500m. (some women even live 100 m away from their ex-partners).
    Well, many women reporting abuse have been killed, too. When a woman is desperate, she'll do anything for her children even if it is in the underground any help is welcome.
  • NAT-133) Prolaski she could stay at home and take care of her family? if this was a real case you wouldn't be surprised if things went badly?
    That is justifying it. If your wife, in your opinion, was completely stupid, even if you thought so, that doesn't mean you can batter her. If you can't stand her, leave her. And so on, and so on, until you find someone who can stand you and whom you love.
    This advertisement is targeting emotion, that's why they shot it in the underground, it's a creative resource. Pay attention to the message not to the background.
  • NAT-132) Animaldn88 Well said xmanta11. You're just so right.
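
The turn-type scheme in Table 1 can be illustrated with a small sketch. It is not the authors' coding procedure, which was carried out manually and interpretively; the function name `classify_turn`, its parameters, and the simplified rule for mixed turns (a video reference combined with a reference to prior turns) are assumptions made here for illustration only. Because turns in the examples are numbered in descending order, the immediately prior turn is passed in explicitly rather than computed.

```python
from typing import Optional, Set


def classify_turn(refers_to: Set[int],
                  immediately_prior: Optional[int],
                  refers_to_video: bool = False) -> Optional[str]:
    """Return a Table 1 turn-type code under simplified, assumed rules.

    refers_to         -- numbers of the prior turns this turn responds to
    immediately_prior -- number of the turn posted immediately before it
    refers_to_video   -- True if the turn addresses the triggering video
    """
    if refers_to_video and refers_to:
        # Assumption: video reference plus turn reference counts as mixed.
        return "MXT"
    if refers_to_video:
        return "VT"
    if len(refers_to) > 1:
        return "MT"
    if len(refers_to) == 1:
        (target,) = refers_to
        return "AT" if target == immediately_prior else "NAT"
    return None  # no identifiable referent; left uncoded in this sketch


# Example 2: turns 134, 133 and 132 all respond to turn 135.
print(classify_turn({135}, immediately_prior=135))  # turn 134
print(classify_turn({135}, immediately_prior=134))  # turn 133
print(classify_turn({135}, immediately_prior=133))  # turn 132
```

Under these assumptions only turn 134 is coded as adjacent, which matches the description of Example 2 above.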

Turn-management and cross-turn cohesion

Several turn-management devices have been previously identified in the literature on CMC coherence: backchannels (Herring, 1999); cross-turn addressivity or naming (Herring, 1999; Honeycutt & Herring, 2009; Lapadat, 2007; Markman, 2006; Nilsen & Mätikalo, 2010; Woerner et al., 2006); cross-turn linking through explicit expressions (Herring, 1999; Lapadat, 2007; Woerner et al., 2006); and cross-turn quoting (Herring, 1999; Lapadat, 2007; Markman, 2006; Nilsen & Mätikalo, 2010). These were all coded in our corpus. In addition, we coded conversation-analytic, turn-constructional units, namely turn-entry and turn-exit devices (Sacks et al., 1974), which have been generally treated in the CMC literature under different names such as openings, speaker allocation techniques and forward structuring (Anderson, Beard, and Walther, 2010; Herring, 1999; Lapadat, 2007; Markman, 2006; Woerner et al., 2006). Finally, two further devices emerged from the analysis of the corpus: video addressivity and indirect addressivity. These, and the other turn-management devices coded in our study, are listed and defined in Table 2.

Table 2. Coding scheme for turn-management devices
Device | Definition
Backchannels (BC) | Short reactions to a prior comment or to the triggering video clip
Cross-Turn Addressivity | Explicitly using a user name to select the addressee
Cross-Turn Linking (CTL) | Use of explicit expressions that link a turn to another turn
Cross-Turn Quoting (CTQ) | Copying all or part of a prior message into the current message to indicate who one is responding to
Turn-Entry Devices (TEN) | Discourse markers that link a turn to a prior turn (e.g. listen, by the way, you see)
Turn-Exit Devices (TEX) | Expressions that close the turn and link it to a next turn, such as That's all, full stop; question tags; and aphorisms like sad but true
Video Addressivity (VA) | Turns addressing the video or the organisations that host it
Indirect Addressivity | Indirectly addressing individuals who may be part of the polylogue

The comment very good!! in Example 1.a. above constitutes an instance of backchannel, in which a YouTuber reacts to the video clip through a brief expression of support. Example 2 above contains several turn-management devices. Turn 132 uses cross-turn addressivity by explicitly naming a previous contributor, xmanta11, and thereby addressing the current turn to the author of turn 135. Turn 133 contains two cases of (partial) cross-turn quotation, in which a YouTuber repeats two stretches from turn 135: ‘she could stay at home and take care of her family?’ and ‘if this was a real case you wouldn't be surprised if things went badly?’ By using cross-turn quotations, this user explicitly identifies the parts of the message s/he wishes to respond to (cf. Eklundh, 2010). Turn 134 features turn-organizational units in the form of turn-entry and turn-exit devices. The turn begins with It's a pity you think so. Through the adverb so, the user links his/her turn to a prior turn. After making his/her main point, this user marks the end of the turn by stating its upshot with the expression any help is welcome. Finally, turn 135 contains an instance of video addressivity. This particular YouTuber addresses the video clip in the middle of his/her turn through the remark: I would make this clear to this woman, where this woman refers to the main character of the video clip.

Cross-turn linking is realized through expressions at the start of turns such as ‘regarding this’. In Example 3 below, a user resorts to the expression ‘in response to calpero83’ both to indicate that the message is addressed to the user named Calpero83 (cross-turn addressivity) and to explicitly mark this message as a response to the message by Calpero83 (cross-turn linking).

Example 3 – DV polylogue

AT-118) GlowGloomyOrchid the video clip is short and very good. in response to calpero83, I think she dresses like that to communicate that this problem of (domestic) violence can be found in all social and/or economic strata.

Example 4 below contains a case of indirect address: The fact that there are human beings bent on doing evil… Here, this YouTuber refers to people whom she views as ‘wrong-doers’.

Example 4 – AB polylogue

AT-115) carmen1916 The fact that there are human beings bent on doing evil, mainly for money, and sometimes for mercy (towards the mother, rather than towards the baby), doesn't turn evil into goodness, and one thing is to start a war and another very different thing is to defend yourself and to defend your country when you're attacked

Finally, cohesion was coded in our corpus at the cross-turn level. Use of cohesive devices in a text is known to “facilitate the task of recognizing its coherence” (Tanskanen, 2006, p. 21). In their seminal work, Halliday and Hasan (1976) define textual cohesion in semantic terms as referring to “relations of meaning that exist within the text, and that define it as a text” (p. 4). They identify five systematic resources for cohesion: reference, substitution, ellipsis, conjunction, and lexical cohesion. This scheme – already used in the CMC literature on coherence (Simpson, 2005; Woerner et al., 2006; Berglund, 2009) – was adopted in our study (see Table 3):

Table 3. Coding scheme for cross-turn cohesive devices
Device | Definition
Reference (REF) | Items whose interpretation depends on something else to which they refer
Substitution (SUB) | Replacing of an item with another item
Ellipsis (ELL) | Omission of an element
Conjunction (CON) | Explicit link of textual segments through use of connectors
Lexical Cohesion | Unlike the preceding grammar-based resources, lexical cohesion refers to vocabulary items used to build coherent texts

The opening line of turn 134 in example 2 above It is a pity you think so contains cross-turn reference in the second person of the verb you think which refers to a previous YouTuber, and cross-turn substitution in so, an adverb which replaces the previous YouTuber's opinion. Cross-turn ellipsis can be found in, for instance, turn 111 from the DV set which begins with is different. This comment refers to what a prior YouTuber said and its full realization would be what you're saying is different from what I mean. The sentence Well, many women reporting abuse have been killed, too from turn 134 in example 2, provides a case of conjunction with the previous turn through the use of the discourse marker Well, and several instances of lexical repetition, a frequent form of lexical cohesion, in the use of women reporting abuse and killed, as these terms had already occurred in previous turns.


Coding according to the multilayered, analytic framework described above was independently and jointly undertaken by the three authors of this paper. Intercoder differences were subsequently resolved through discussion. Once coded, the data were quantitatively analyzed. The results revealed a number of patterns, which were further explored qualitatively.

Results and discussion

Participation and adjacency analysis – The problem?


Table 4 shows the number of participants and their degree of participation in the corpus. As far as number of participants is concerned, 131 different posters were identified in the data. Of these, 36 (27.5%) contributed to the AB data set while the remaining 95 (72.5%) participated in the DV data set. The average degree of participation in the corpus was, thus, 2.29 turns per YouTuber. However, the two data sets revealed considerable differences since, on average, each participant in the AB data contributed over twice as many turns as each participant in the DV data: 4.16 and 1.57, respectively. Moreover, 101 (77%) of all users contributed only one comment. This tendency, also identified in Herring's (2010b) study of academic discussion lists, was observed in both data sets, with 25/36 (69.5%) and 76/95 (80%) one-turn contributions being identified in the AB and DV data, respectively.

Table 4. Participation structure
 | AB data | DV data | TOTAL
# of participants | 36 | 95 | 131
Mean # of turns per participant | 4.16 | 1.57 | 2.29
# of one-turn contributors | 25 | 76 | 101
# of multiple contributors | 11 | 19 | 30

A closer inspection of multiple-turn contributions revealed further differences between the two data sets. Of the 11 multiple contributors to the AB data, one single user produced 67 comments, i.e. 45% of all comments in the AB polylogue. One further participant posted 21 comments and the remaining nine participants submitted between 2 and 7 comments each. The DV data, for its part, contained a higher number of multiple contributors, none of them dominating the interactional floor as clearly as in the AB data. While two participants contributed 13 and 14 turns respectively, the remaining 17 posters made from 2 to 7 contributions each. These results underline the different degrees of involvement in the polylogue. A cline of participant involvement could be established, with multiple-contributing YouTubers at the high-involvement end of the continuum and one-message-contributing YouTubers and YouTube readers at the low-involvement end (Marcoccia, 2004). These findings further signal fluctuations in the configuration of participation patterns, with participants constantly entering and leaving the interactional space (cf. Grosjean, 2004; Marcoccia, 2004).
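
The participation figures reported above can be re-derived from the raw counts given in the text. The following is a minimal sketch, not part of the original analysis; the variable and function names are illustrative assumptions, and values computed with ordinary rounding may differ in the last digit from figures that were truncated when reported.

```python
# Counts as reported in the text (150 postings per data set).
postings = {"AB": 150, "DV": 150}
participants = {"AB": 36, "DV": 95}
one_turn_contributors = {"AB": 25, "DV": 76}


def mean_turns(n_postings: int, n_participants: int) -> float:
    """Average number of turns contributed per participant."""
    return n_postings / n_participants


def share(part: int, whole: int) -> float:
    """Percentage that `part` represents of `whole`."""
    return 100 * part / whole


for name in postings:
    multiple = participants[name] - one_turn_contributors[name]
    print(name,
          "mean turns:", round(mean_turns(postings[name], participants[name]), 2),
          "one-turn share:", round(share(one_turn_contributors[name], participants[name]), 1),
          "multiple contributors:", multiple)

# Share of the AB polylogue produced by its most active user (67 comments).
print("top AB contributor share:", round(share(67, postings["AB"]), 1))
```

The number of multiple contributors per data set is obtained here by subtracting the one-turn contributors from all participants, which yields the 11 AB contributors mentioned in the text.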

Participation in the YouTube polylogues under examination, therefore, was massive, unequal and fluid. These are the kind of properties that have been reported to pose problems to coherence making processes in other online environments. However, as we argue below, coherence did not seem to be problematic in our data.


The most common turn-type in the corpus was the adjacent turn (AT), which accounted for 188 (62.6%) comments. This was followed by the video turn (VT) with 57 (19%) occurrences, and the nonadjacent (NAT) turn which was identified in 36 (12%) posts. The multiple-reference (MT) and the mixed (MXT) turn types were the least frequent, featuring on 9 (3%) and 10 (3.3%) occasions respectively. Figure 1 shows the frequency of use and the distribution of the different turn-types in the two data sets. While there were more adjacent turns in the AB than in the DV data – and the reverse was the case regarding video-related turns – patterns of distribution were quite similar and showed a preference for adjacent turn-types in both data sets.

Figure 1. Turn-types. This figure illustrates frequency of use and distribution of turn-types in both data sets.
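
As a rough illustration of how coded postings translate into the distribution reported above, the sketch below tallies one turn-type code per posting. The flat list `codes` is a hypothetical stand-in rebuilt from the reported totals, not the coded corpus itself, and the helper `distribution` is an illustrative name.

```python
from collections import Counter

# Reported totals per turn-type (see Table 1 for the codes).
reported = {"AT": 188, "VT": 57, "NAT": 36, "MT": 9, "MXT": 10}

# Hypothetical flat list with one code per posting, rebuilt from the totals.
codes = [code for code, n in reported.items() for _ in range(n)]


def distribution(turn_codes):
    """Map each code to (count, percentage of all coded turns)."""
    counts = Counter(turn_codes)
    total = sum(counts.values())
    return {code: (n, 100 * n / total) for code, n in counts.most_common()}


for code, (n, pct) in distribution(codes).items():
    print(f"{code}: {n} ({pct:.1f}%)")
```

The five counts add up to the 300 postings in the corpus, so each posting receives exactly one turn-type code in this tally.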

The results show that the YouTube interface allows for considerable adjacency, a finding that partly contradicts prior claims about adjacency pairs being “regularly disrupted” in CMC (Herring, 1999, p. 7). In fact, the YouTube commenting facility has a means of helping users mark the relations between comments.11 When sending a message, a user may choose between ‘post a new comment’– with the resulting message being displayed on top of the list of comments as a new message – and ‘respond to a comment.’ In the latter case, the reply is displayed below the comment it is oriented to (and therefore adjacent to it) and slightly indented to the right. Although comments may be misplaced (Herring, 1999; Marcoccia, 2004), the automatic, different placement of new / responding comments and the resulting visualization of the conversation structure undoubtedly contribute to promoting adjacency and enhance coherence. It must be noted, however, that posting new messages at the top and posting replies in a different order makes the reading of the textual record of the conversation more difficult, since one needs to read up new comments and to read down replies. This would not be a problem if, as it has been argued, most YouTubers only read the most recent comments (Jones & Schieffelin, 2009). However, this is a point that needs to be further researched.

Turn-management and cohesion analysis: Adaptive resources for solving a (potential) problem of lack of coherence


The three most frequently used devices in the data were, in decreasing order, turn-entry devices (n = 103; 22.1%), cross-turn addressivity (n = 90; 19.3%), and turn-exit devices (n = 77; 16.5%). Next in frequency of use were indirect addressivity (n = 64; 13.7%), cross-turn linking strategies (n = 54; 11.6%) and video addressivity (n = 35; 7.5%). The least frequent strategies were back-channel signals (n = 23; 4.9%) and cross-turn quoting (n = 20; 4.3%). Figure 2 shows the frequency of use and distribution of these devices in each of the two data sets.

Figure 2. Turn-management devices. This figure illustrates frequency of use and distribution of turn-management devices in both data sets.

Within the AB data set, the most frequent device was the use of turn-entry signals (n = 67), followed by cross-turn addressivity (n = 36) and turn-exit (n = 32). The difference between turn-entry and both turn-exit and cross-turn addressivity devices was statistically significant at the p < .001 level.12 The most frequently used devices in the DV data set were cross-turn addressivity (n = 54) and turn-exit (n = 45), followed by turn-entry devices (n = 36). Differences in frequency of use of these three devices were not statistically significant. Statistically significant differences between the AB and the DV data sets were only found in the case of turn-entry devices.13

The high number of turn-entry and turn-exit devices points to an underlying pattern of turn-design (Sacks et al., 1974) carried over from face-to-face conversational habits. Since these devices tie in a current turn with previous and subsequent turns, they are important turn-management techniques oriented towards coherence maintenance. Cross-turn addressivity signals, for their part, were more frequent in the DV polylogue than in the AB polylogue, probably due to the former's larger number of participants. Cross-turn linking strategies further served to mark relations between messages. In contrast, strategies like video addressivity and, especially, (manually inserted) cross-turn quoting and backchannelling, which are representative of online communication (Herring, 1999), hardly featured in the data.

Cross-turn cohesion

As shown in Table 5, cross-turn cohesion was mainly achieved in our corpus through lexical means (n = 1018; 56%), followed by reference (n = 645; 35.5%). The remaining cohesion devices, namely substitution (n = 49), ellipsis (n = 56) and conjunction (n = 51), were far less frequent. Our findings support prior research that identified lexical cohesion, and more specifically lexical repetition, as a common means of inducing coherence in online environments (Woerner et al., 2006; Simpson, 2005; Berglund, 2009).

Table 5. Frequency of use of devices for cross-turn cohesion

Figure 3 shows the frequency of use and distribution of these devices for each data set.

Figure 3.

Cross-turn cohesive devices. This figure illustrates the use and distribution of cross-turn cohesive devices in both data sets.

The overall frequency of cross-turn cohesion devices in the data was high relative to the overall number of turns. Very similar patterns of use and distribution of cross-turn cohesion devices were identified in the AB and DV polylogues.

YouTube interaction patterns

In this section we report on a number of interaction patterns that emerged in the analysis and which were qualitatively explored vis-à-vis coherence-making processes. Coherence was assessed by taking into account participants' own contributions in order to identify interactional progression and/or disruption.

Example 5 depicts an interactional pattern of adjacent turns. Comment 146 triggers a series of sixteen adjacent turns by three other contributors who, in all cases, reply to, or continue with, the immediately previous message. Thus, turns in this example produced a chained pattern, as it were, of adjacent turns. Here the interaction flowed on a turn-by-turn basis and presented no apparent problems for coherence. Indeed, this pattern underlined the link noted in the literature between adjacency and coherence.

Example 5. Chained interaction pattern from AB data.


The previous interaction pattern of chained, adjacent turns was not the sole basis for coherence in our data. Networked patterns of nonlinear, i.e. nonadjacent, turns that seemed to pose no problems for coherence were also identified. This supports other CMC research that has pointed out that disrupted turn adjacency does not necessarily lead to miscommunication (Berglund, 2009; Lapadat, 2007). The following is an example of an interactional pattern in which a comment (135) by one participant receives three responding comments by three different participants (turns 134, 133, 132). Of the three responding comments, only the first, turn 134, is adjacent to its first part, while turns 133 and 132 constitute nonadjacent turns.

Example 6. Pattern of multiple responses to a turn, from DV data (reproduced in full, above, as example 2).


In the above multiple response interaction pattern, adjacency disruption did not seem to interfere with coherence. Participants' own, relevant responses indicate that this was the case, an interpretation which was reinforced by the visual representation of the interaction, in which responses were indented to the right.
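The distinction between adjacent and nonadjacent responses in this pattern can be illustrated with a short sketch. It assumes, purely for illustration, that a response counts as adjacent when its turn number differs from that of the turn it answers by exactly one; the function name `classify_responses` is hypothetical and does not describe the coding procedure actually used in the study.

```python
def classify_responses(initiating_turn, response_turns):
    """Label each response as adjacent or nonadjacent to the turn it answers.

    Assumption for this sketch: a response is adjacent when its turn
    number differs from the initiating turn's number by exactly one.
    """
    return {
        turn: "adjacent" if abs(initiating_turn - turn) == 1 else "nonadjacent"
        for turn in response_turns
    }


# Turn 135 and its three responses, as described for Example 6.
labels = classify_responses(135, [134, 133, 132])
for turn, label in labels.items():
    print(f"turn {turn}: {label}")
```

Applied to the turns described for Example 6, the sketch labels turn 134 as adjacent and turns 133 and 132 as nonadjacent, in line with the description in the text.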

Two further, apparently coherent, interaction patterns were identified, both involving the category of video-related turn: ‘serial’ and ‘sprinkled’ patterns. Video turns, the second most frequent turn-type in our data, contributed to coherence on a global level by explicitly tying YouTube's postings to the video clip that triggered them. Serial interaction patterns of video turns tended to occur at the beginning of the polylogue as initial, relevant responses to the triggering video clip.

Example 7 illustrates the serial interaction pattern in our corpus. It comes from the AB data set and opened with a series of 12 turns congratulating video-producers and/or endorsing the content of the video clip. Textual interaction among message senders only began after turn 13, which contained a message that disagreed with the contents of both the video clip and the previous turns. This turn, which marked the end of the serial pattern of video-related turns, was coded as MT, i.e. as referring to multiple turns. It explicitly addressed all previous messages through use of third person plural pronouns and verbal endings, as well as through the deployment of a cross-turn addressivity signal – the vocative gentlemen – to refer to other participants.

Example 7. The serial interaction pattern of video turns (AB Data)

MT-13) brey45

it is the first part of the video clip that should be clear to you, it's the most realistic, the second part is a utopia, gentlemen, a utopia, that's what dav should understand

VT-12) explorersl

Really great ad … turning around the empty pro-abortion arguments … great!

VT-11) viviens

very good!!

VT-10) juacomonerone

Zapatero is the enemy of life.

VT-9) miccuenta

It's great … we must turn around the social drowsiness about abortion.: Promoting such barbaric actions in the middle of the 21 century is unacceptable

VT-8) nellyhoz


VT-7) lobbero

Very good ad

VT-6) zapocabron

Congratulations HO. You're doing great.

VT-5) anukita76

I think it's great, this new corporate ad. Congratulations!!!


And Zapatero and Aido must understand we won't stand by and wait, we won't be accomplices to this child killing.

When the PP gets elected, the laws will be changed and others will be repealed, like Zapatero has done, but this time it will be for the better, for life and for the real rights of society, the real rights of citizens, not the impositions rather than rights of some radical, anti-establishment minority groups.

VT-3) Quidamt Awesome!! Congratulations to the producers and .. let's turn the law upside down!!

VT-2) pacoceacero Simply awesome !! Congratulations to the producers

VT-1) RochelleKayra Indeed:We will turn it upside down!.For life!

As for the sprinkled interaction pattern, this was identified in relation to video turns that were dispersed throughout the interaction. They acted like free turns expressing opinions and reactions to the video clip. These turns explicitly addressed the video clip but did not contribute to the topical discussion of the sequence of turns in which they were embedded. This produced different types of networked patterns, like the one observed in example 8 below. This short interactional excerpt begins with turn 121, which claimed that mothers who have an abortion should be imprisoned. This turn received two responses (turns 120 and 119) by the most frequent contributors to this data set, Carmen1916 and brey45, who were engaged in an ongoing debate. The next posting, turn 122, is a video turn endorsing the contents of the video clip, and it is inserted in the middle of the discussion between Carmen1916 and brey45. These two participants, however, ‘spoke’ past this comment and resumed their discussion in turns 124 and 123, thus ignoring the video turn. This pattern of topical discussion sprinkled with video turns that were ignored reveals that, although video-related turns did not contribute to topical talk, and hence to the local coherence of their surrounding sequences, they did not seem to interfere with, or disrupt, the discussion in the sequence, i.e. they did not pose problems for coherence. Therefore, they neither contributed to nor hindered coherence.

Example 8. Topic sequence sprinkled with video-related turns

VT-125) Quidamt Yes, let's turn it upside down!

NAT-124) carmen1916


As José Manuel de Prada wisely says, abortion is something that constitutional law considers unacceptable. Please: neither the constitution nor strong democracy can admit such attack on human beings

  • AT-123) brey45 oh really? then we've been unconstitutional for the last 22 years, but you complain now, why didn't you complain under the conservative government? why didn't you ask or why don't you ask for abortion to be punishable again?
  • jose manuel prada referred to the terms the law establishes not to the decriminalization of abortion my friend, don't get it wrong

VT-122) jemaj3 of course I agree with the video clip. We can turn around the pro abortion mentality and have a pro life society

MT-121) namorfilus


  • 120) carmen1916
  • How do you know they were not forced? And what if they're desperate … or ill informed … or think that the zygote is just a “bug”. I think we should rather blame those who facilitate this crime or make money out of abortion. If they have done it out of their own will, the suffering they'll have for the rest of their lives will be much worse than jail.
  • 119) brey45
  • See, you're not saying anything that contradicts what I say, I say that I would keep abortion laws as they are now for extreme cases, and would apply them well, what is it that you didn't understand?


We approached our analysis of coherence in text-based YouTube polylogues by looking at participation and adjacency as potential problems for coherence, and at turn-management and cohesion as coherence-inducing mechanisms which could be employed to solve such problems. The picture that has emerged regarding participation is complex, revealing different patterns for each of the two polylogues under examination. Notwithstanding these differences, participation was massive, unequal, and fluid in both data sets. These three properties are generally considered problematic for coherence-making processes. Yet in our data, they seemed to be overcome by the considerable adjacency displayed across postings. Since adjacency has been claimed to play a key role in sustaining coherence, and a great number of turns in the data were adjacent, we can conclude that communication over the YouTube text-commenting facility was far from incoherent.

This view was further confirmed by our study of turn-management signals and cohesion devices, which showed that these coherence-inducing mechanisms were very frequent in the data. Of special interest was YouTubers' preference for managing turns through turn-entry/exit devices and cross-turn addressivity signals instead of employing such strategies as cross-turn quoting and backchannelling. This finding was especially relevant within the context of online communication, since the latter two devices have often been hailed as the most representative ways in which participants adapt to the sequential requirements of text-based CMC in their search for coherence (Herring, 1999). However, in our data, YouTubers chose to adapt to the electronic context by resorting to other turn-management signals. This, therefore, revealed the existence of variability regarding turn-management signals across different forms of CMC (Bou-Franch, 2011).

Regarding devices for cross-turn cohesion, the data showed that, as in other online environments, YouTubers prominently drew on lexical means. Further, cross-turn cohesion devices were pervasive in the data. Thus, although cohesive ties alone do not determine the coherence of a text (Korolija, 2000; Simpson, 2005), if cohesion is considered to be the glue of discourse (Berglund, 2009; Erickson, Herring, & Sack, 2002), our findings suggest that the postings of YouTube polylogues are sufficiently connected to constitute a space for online interaction rather than a series of disconnected comments.

Of particular significance were the differences in participation patterns in the two polylogues mentioned above. Although participation was massive in both cases, the higher number of contributors in the DV polylogue led us to expect more problems and adaptive resources in this polylogue than in the AB data (cf. Herring et al., 2009; Honeycutt & Herring, 2009; Zelenkauskaite & Herring, 2008). Yet, patterns of adjacency, turn-management and cohesion were very similar for the two data sets. This finding is significant in that it contributes to demythologising the extent to which the number of participants, their different interactional involvement, and the changing configuration of participation interfere with coherence. As our study reveals, YouTubers found ways around the problems to produce collaborative, coherent interaction in the two polylogues that were the object of study.

YouTube interaction patterns were finally subjected to qualitative examination. This revealed a number of different interaction patterns combining both the linear, orderly features typical of face-to-face, dyadic interaction and the nonlinear patterns common in asynchronous interaction, which were described as ‘networked’ (Lorenzo-Dus et al., 2009). Interaction patterns included chained patterns of adjacent turns, multiple response patterns of adjacent and nonadjacent turns, serial patterns of video turns and interaction patterns sprinkled with video turns. The coherence of both linear and networked patterns was assessed drawing on participants' contributions. Importantly, networked patterns did not interfere with coherence, which supports the claim that “coherence is created and maintained despite disrupted turn adjacency” (Berglund, 2009, p. 4); after all, linear sequencing may not be as relevant for coherence in some online contexts as it is in oral interaction (cf. Lapadat, 2007; Berglund, 2009). Electronic users who, like YouTubers, have access to a persistent textual record of the interaction, have been argued to activate different discourse processing strategies that are more suitable for text-based CMC (Herring, 2010a). Our study of linear and networked patterns thus contributes to advancing knowledge of communication containing discontinuous, nonlinear features, knowledge that remains in short supply (Sannino, 2006).

The present study also responds to a recent call for methodological reflection in computer-mediated discourse studies. More specifically, we view our contribution in terms of the need to adapt and reconceptualize extant concepts and methods as a necessary step in the development of computer-mediated discourse (cf. Beiβwenger, 2008; Androutsopoulos & Beiβwenger, 2009). In particular, this study has applied an innovative, multilevel methodology that draws on the notion of polylogue, and has characterized text-based YouTube interaction as a doubly-articulated polylogue, thus moving away from categories designed for the study of dyadic, face-to-face conversation. Treating text-based YouTube interaction as polylogal communication, and considering its double articulation, provides a better means to study multiple-participant, mediated, intergroup communication (cf. Garcés-Conejos Blitvich, 2010b; Lorenzo-Dus et al., 2011).

Future research in other YouTube polylogues and in languages other than Spanish should explore the form that textual coherence takes, the contextual variability of coherence-maintaining resources and the extent to which these are technologically and socioculturally constrained. Moreover, our study has other implications for future research in that it lays the foundation on which to expand current understanding of coherence in this complex social network by considering both textual and audiovisual responses.

Burgess and Green (2009) recently argued that “YouTube is a potential site of cosmopolitan cultural citizenship – a space in which individuals can represent their identities and perspectives, engage with the self-representations of others, and encounter cultural difference” (p. 81). Indeed, the YouTube universe constitutes an unquestionable force within contemporary popular culture, a universe created collectively by all its participants. As we have noted throughout this paper, this universe is inhabited not only by video-producers and viewers but also by an important group of YouTubers that engage in textual interaction. Our study has shown that YouTube's text-commenting facility provides a space for coherent textual discussion, an online space, therefore, for social interaction; this needs to be seen as contributing to the participatory culture of YouTube (Burgess & Green, 2009).


  1. 1

  2. 2

    For empirical studies of Spanish that also view coherence in terms of its textual and cognitive dimensions see, inter alia, Aznar, Cros, and Quintana (1991); Esparza Torres (2006); López Alonso and Séré (2001).

  3. 3

    This contrasts with other communication technologies which constrain participation in different ways. For example, like YouTube, responses to digital newspapers are public in that all users need is an account to access them. However, unlike YouTube, in digital newspapers, participation is not left open, in the sense that responses can only be sent for a limited period of time after the article is posted (Bou-Franch, 2010).

  4. 4


  5. 5

    The notion of double articulation comes from the context of broadcasting and refers to “a communicative interaction between those participating in discussion, interview, game show or whatever and, at the same time, is designed to be heard by absent audiences” (Scannell, 1991, p. 1). The ‘imagined “mass” of ordinary users’ of YouTube polylogues resembles, in some ways, the listening and viewing audiences in broadcasting contexts. Conceiving of them as part of the wider, distributed recipients of YouTube interaction contributes to addressing calls for moving the analysis of this CMC “beyond the screen” (Androutsopoulos & Beiβwenger, 2009).

  6. 6

    GENTEXT is a research group, based at the University of Valencia - Spain, that examines gender inequality in society through the analysis of different discursive data.

  7. 7

    For a discussion of the content of the polylogues, and specifically of the discursive construction of domestic violence in YouTube, see Bou-Franch, Lorenzo-Dus, and Garcés-Conejos Blitvich (2010).

  8. 8

    All translations into English in this paper are by its authors.

  9. 9

    We use the term turn to refer to participants' messages or contributions. However, we are aware that the notion of turn was developed for the analysis of face-to-face conversation and that, therefore, turn in conversation is not equivalent to turn in YouTube interaction. For further insights into the notion of turn in CMC see Beiβwenger (2008).

  10. 10

    Since YouTube's new messages are posted in reverse chronological order, we decided to number comments also in a ‘reversed’ manner, that is, beginning with the oldest post and then moving up.

  11. 11

    Throughout 2010, and after the data collection process for this project ended, YouTube implemented several changes to its interface, which included the way of displaying replies to other comments. In replying to another comment, the tool now automatically inserts the name of the user being replied to, preceded by the @ sign. Responding comments are no longer indented to the right, although they are still placed below the comment they connect with.

  12. 12

    TEN – TEX: t(149) = 4.266, p < .001; TEN – CTA: t(149) = 1.655, p < .001.

  13. 13

    t(149) = −3.630, p < .001.


  • Patricia Bou-Franch is Associate Professor in the Department of English and German Philology at the University of Valencia, Spain. Her current research interests include computer-mediated communication, gender inequality, identity construction and impoliteness. Departamento de Filología Inglesa y Alemana. Blasco Ibañez 32. 46010 – Valencia. Spain

  • Nuria Lorenzo-Dus is Professor in the English Language and Literature Department at Swansea University, UK, where she is also director of the Language Research Centre. Her research expertise lies in the fields of interactional sociolinguistics and media discourse analysis.

  • Pilar Garcés-Conejos Blitvich is Associate Professor in the English Department at the University of North Carolina at Charlotte, USA, where she teaches applied linguistics. Her main research interests include mediated communication (both traditional and new media), genre theory, identity theory and im/politeness models.