
A holistic approach to task-based interaction


email: paul.seedhouse@ncl.ac.uk


This paper proposes that interaction generated by tasks has previously been very difficult to analyse because of its highly indexical nature. Task-related actions and non-verbal communication could not be related easily to talk. A technological solution to this problem is presented, using a combination of task-tracking hardware and software, video recording and transcription. This enables a holistic approach, i.e. one in which all elements of behaviour can be integrated in analysis. Micro-analyses of multimodal data are undertaken, which provide revealing insights into the processes of task-based learning. A framework for describing and analysing task-based interaction from a holistic perspective is outlined.


Conceptions of ‘task’ and ‘task-based interaction’

There are well-known conceptual problems involved in the numerous different definitions of what is (and is not) a ‘task’, summarized in Ellis (2003: 2–9). In this section, however, we do not consider these definitions, but rather the different ways of conceptualizing a task as it evolves in time. We employ Breen's (1989) conception of the three phases of a task: task-as-workplan, task-in-process, and task-as-outcomes. The task-as-workplan is the intended pedagogy, the plan made prior to classroom implementation of what the teachers and learners will do.¹ The task-in-process is the actual pedagogy or what actually happens in the classroom. The task-as-outcomes is whatever is physically produced. This may be a piece of writing or a sheet marking the number of differences found in a spot-the-difference task.

There are, then, three different aspects or phases to the construct ‘task’. Any framework which attempts to portray task-based interaction in a holistic way will need to track the relationship between these phases as they unfold during the implementation of a task. The relationship may be a linear one, but this is not necessarily the case. In practice, there is sometimes a difference between what is supposed to happen (task-as-workplan) and what actually happens (task-in-process). There is now ample evidence in the literature (Coughlan and Duff 1994; Donato 2000; Foster 1998; Ohta 2001; Platt and Brooks 1994; Mori 2002; Roebuck 2000; Seedhouse 2005) of tasks-as-workplan resulting in different and unexpected tasks-in-process. For example, Coughlan and Duff (1994) demonstrate that the same task-as-workplan does not yield comparable results in terms of task-in-process when performed by several individuals, or even when performed by the same individual on two different occasions.

The difference between phases is particularly important when researching task-based learning and teaching (TBLT). TBLT interaction research gathers data from the task-in-process, because that is the actual communicative event which generates interactional data. So we need to be clear at the start that when we research task-based interaction, we are essentially researching the task-in-process, although reference may and should be made to the other phases as well.

Because of the different phases of ‘task’ mentioned above, it is not such a simple matter to define task-based interaction. One possible definition which might be considered is: ‘Any interaction produced by any task conforming to an accepted definition of task.’ However, this definition is really focused on the nature of the task-as-workplan. The problem is that occasionally learners are given a task and produce interaction which has little or nothing to do with the task and/or is in L1 (Seedhouse 2004). Interactional data are always gathered from the task-in-process, because that is the actual communicative event which generates interactional data. We therefore need for present purposes a definition which focuses on the interaction generated by the task. Another possible definition is: ‘L2 interaction in which participants display an orientation to the completion of a task.’ This definition displays a task-in-process focus and is therefore more suitable for applying to interactional data. It also prompts us to examine how exactly learners display an orientation to task in the details of their talk. Moreover, it means that in cases where the learners go off-task or speak in L1, we no longer need to consider these as task-based interaction. This is therefore the definition of task-based interaction which we employ in this paper.

However, one problem which remains with this definition is that of heterogeneity. In other words, there are many varieties of tasks and these can generate rather different varieties of interaction. Is it therefore possible that disparate varieties of task-talk show some commonality, some standard orientation to task? If we examine the following extract, it displays the characteristics of minimalization and indexicality typical of convergent tasks (Duff 1986) such as information gaps.

Extract 1

 1 L1: ready?

 2 L2: ready

 3 L1: er (.) the blue oblong above the red oblong, eh? the yellow oblong.

 4 L2: (.) alright. (.) >faster, faster.<=

 5 L1: =the: red cylinder (.) beside the (.) blue oblong,

 6 L2: (.) left or right?=

 7 L1: =right.

 8 L2: (.) right yeah ( ) OK.

 9 L1: (1.0) the the red cube (.) was: (1.0)

10 L2: the red cube?

(Warren 1985: 275)

However, interaction typical of divergent tasks (Duff 1986) such as discussion and debate often looks rather different, as in the extract below, in which Norwegian schoolchildren are debating the statement ‘Intermarriage is looked upon as the key to Americanization’.

Extract 2

L2: yeah e:r (2.0) when you look at in the (1.0) ekteskap ((tr: marriage))

LL: marriage

L2: marriage yeah you just look at marriage and you think it will cause problems for the two for the two people living together?=

L1: =no for the children.

L2: the children? (1.0) yeah (1.0) I think it could be a problem for the marriage itself too e:r because I I read e:r a survey=

L1: =((unintelligible 2 sec))

L2: yeah, yeah and I just e:r how to how you behave and how which which what kind of moral you have, I read a survey from Norway e:r which said that most divorces was caused with the marriages between a Norwegian and a foreigner=

L3: mhm.

L2: =so that e:r – the marriages are more unstable (2.0) and yeah it might be that it would cause problems for the childrens

(Seedhouse 1996: 389)

Here there is much less evidence of minimalization and indexicality. The interactants are free to take the discussion in different directions as they debate. However, we still find evidence of orientation to task by the same participants. In this case it is expressed as discussion of the concepts inherent in the statement. Explicit arguments are formed and ideas developed using fairly complete sentences with multiple clauses. Orientation to task, then, may take different forms according to the type of task.

The study of task-based interaction

It is often argued that TBLT promotes language learning. In order to develop a full understanding of how this might happen, we need to be able to portray in some detail the process of learning through interaction which is involved. Swan (2005) argues that there is insufficient evidence of how TBLT operates in real teaching contexts, and studying interaction is an essential aspect of this. Another reason for studying task-based interaction is to verify what actually happens in interaction, to see whether the task-as-workplan matches the task-in-process. Since research data are gathered from the task-in-process as implemented by students, it is essential to understand how task-based interaction is organized and how it can be studied.

According to Samuda and Bygate (2008: 85), TBLT research operates on three sets of dimensions, namely systemic vs. process, macro vs. micro, and quantitative vs. qualitative. In terms of research into task-based interaction, much early TBLT research based on Long's Interaction Hypothesis used a quantitative, systemic methodology which isolated individual features in task-based interaction (e.g. clarification requests, confirmation checks) for quantitative treatment. Samuda and Bygate (2008: 96) suggest that a problem with systemic approaches is ‘that it allow less detailed analysis of the data and less attention to the perceptions and processes engaged in by the participants’. Qualitative, process studies have also been conducted using sociocultural and CA approaches. Samuda and Bygate (2008: 97) suggest that it is hard to generalize from these instances, and that they provide little information on task design and implementation by designer and teacher. We will return to these points below when considering a framework for describing and analysing task-based interaction.

In this study, we present a holistic approach to studying task-based interaction. Such an approach to TBLT would portray this variety of interaction as a whole. It would not pre-select individual features of the interaction, assuming them to be of more value than others. Rather, it would consider how all the features interrelate, how they combine and contribute to the L2 learning process. Crucially, it would have to consider non-verbal elements of both interaction and task completion, relating these to verbal elements. In a holistic approach, it cannot be assumed that any element of behaviour is irrelevant to learning processes. Such an approach would be able to provide a basis for the overall evaluation of the advantages and disadvantages of this variety of interaction, and to compare this variety with other varieties of L2 classroom interaction.

Is a holistic approach justified? Claims for TBLT are often holistic. According to Breen (1987: 161), the task-based syllabus

approaches communicative knowledge as a unified system wherein any use of the new language requires the learner to continually match choices from his or her linguistic repertoire to the social requirements and expectations governing communicative behaviour and to meanings and ideas he wishes to share.

According to Samuda and Bygate (2008: 17), a task is a holistic activity. It follows, then, that the interaction should be described, analysed, and evaluated in a holistic manner rather than a segmental, atomistic one.

This is not to suggest that there is anything inherently wrong with quantifying individual interactional features (e.g. recasts). But, as Seedhouse (2005) suggests, quantification should not be premature, but should take place after an emic, holistic microanalysis of each extract has been conducted as an instance of discourse in its own right. In the second stage, the analysed interactional data (e.g. recasts) could be used for quantitative treatment with their construct validity assured.

Conversation Analysis (CA)

In this paper we investigate task-based interaction using Conversation Analysis (CA), a methodology for the analysis of naturally occurring spoken interaction. It is a multi-disciplinary methodology which is now applied in a very wide range of professional and academic areas. CA does not treat language as an autonomous system independent of its use; rather, it treats ‘grammar and lexical choices as sets of resources which participants deploy, monitor, interpret and manipulate’ (Schegloff et al. 2002: 15) in order to perform their social acts. According to Seedhouse (2004), one way of presenting the principles of CA is in relation to the questions which it asks. The essential question which we must ask at all stages of CA analysis of data is ‘Why that, in that way, right now?’ This encapsulates the perspective of interaction as action (why that) which is expressed by means of linguistic forms (in that way) in a developing sequence (right now). In other words, CA is a holistic methodology and is therefore suitable for the analysis of task-based interaction as part of a holistic approach. Over time, CA has developed the use of video data to incorporate the study of non-verbal communication into its analyses. Recent CA studies in the area of language learning which demonstrate the significance of non-verbal communication and gaze for our understanding of interaction include Carroll (2004), Lazaraton (2004), Mori (2003), Mori and Hayashi (2006), and Olsher (2004). CA studies which investigate features of task-based interaction include Jenks (2007; 2009) and Seedhouse (2004).

Problems in studying task-based interaction

A number of problems are evident in analysing task-based interaction. The first is in relating the talk to the performance of the task. Transcripts of task-based interaction are sometimes difficult to read and analyse, as they are highly indexical, with talk relating to physical movements undertaken when completing the tasks or reacting to data received from a computer. The extract below demonstrates the qualities of minimalization and indexicality.

Extract 3

11 C: °and looking at family photographs°

12  (1.0)

13 A: and not (0.7) oh that's full stop (0.6)

14 C: yeh (0.8)

15 A: no (0.6) ok (.) [photographs]

16 C: [what is this]

17  (2.8)

18 A: ok

19  (5.4)

20 A: the story went

21  (1.3)

22 Y: the story went on yeh yeh yeh(.)

23 A: huh(.) do you  [think ]

24 C: [I think] so yeh why not

25 A: ok go ahead

26 Y: ((moves piece 5 to her left))

The interaction is very hard to understand without knowing what exactly learners are physically doing during the task. In this case, access to non-verbal communication and details of the task being performed would render the interaction comprehensible and analysable, as we show below in relation to extract 5. In order to develop a holistic perspective on task-based interaction, then, it is essential to be able to relate (1) non-verbal communication and (2) physical performance of the task to (3) the details of the talk. This paper presents a technological solution to the problems of portraying these three aspects simultaneously for analysis and study.

Another problem is comparing task-based interaction to other varieties of L2 classroom interaction. It is often claimed that TBLT provides some advantages over other teaching approaches, so it should be possible to find evidence for this in the interaction generated by the different approaches. Seedhouse (2004) provides a framework for comparing different varieties of L2 classroom interaction and suggests that some common characteristics of task-based interaction are as follows. There is a reflexive relationship between the nature of the task and the turn-taking system. There is a tendency to minimalization and indexicality. Tasks tend to generate many instances of clarification requests, confirmation checks, comprehension checks, and self-repetitions. However, there can be heterogeneity in task-based interaction, as suggested above in relation to extract 2. Clearly, much work remains to be done in the area of describing task-based interaction as a variety. This paper proposes a framework for the holistic portrayal of task-based interaction, which should facilitate its description as a variety which may then be compared with others.

A technological solution

We used a combination of technologies to relate non-verbal communication and performance of the task to the details of the talk.² We combined task-tracking hardware and software (digital tabletop), video/audio recording, and transcription. Digital tabletops (Figure 1) are multi-user, multi-touch interactive digital tables that combine face-to-face interaction with the full use of digital media. They also enable innovative task design.

Figure 1.

Digital tabletop

The type of horizontal tabletop display shown in Figure 1 has great potential in educational settings. Its strength lies in the way it combines face-to-face interaction with the use of information resources. What makes this tool especially interesting is that it can adapt well to the study of task-based learning and teaching in terms of having groups of students working on a task using an electronically enhanced shared space. This space can be manipulated, monitored, tracked, or even connected to other sources of information. Text, audio, video, and physical materials can be used on these tabletop displays and can be implemented in an interactive way.

Educational tabletop applications have recently been used for EFL learners. Morris et al. (2006) indicate that designing innovative user interfaces for digital tabletops represents a potentially powerful tool to facilitate certain pedagogical goals. They designed a number of applications (MatchingTable, PoetryTable, ClassificationTable) to handle specific educational tasks in EFL settings. Rick and Rogers (2008) showed how they adapted a single user application to be used in a shareable collaborative learning environment. Recent research investigated how shareable spaces might enhance group participation (Rogers, Lim, Hazlewood, and Marshall 2009) and whether the use of tangibles on such surfaces could promote learning (Marshall 2007).

In short, a digital tabletop of this sort satisfies Wilson's (1992) view of what an interactive multimedia learning environment should look like. He indicates that such an environment should allow:

the electronically integrated display and user control of a variety of media formats and information types, including motion video and film, still photographs, text, graphics, animations, sound, numbers and data. The resulting interactive experience for the user is a multidimensional, multi-sensory interweave of self-directed reading, viewing, listening, and interacting, through activities such as exploring, searching, manipulating, writing, linking, creating, juxtaposing, and editing. (Wilson 1992: 186)

Each digital display accepts input from three users simultaneously and provides a digital recording of how participants are actually completing a task.

It is true that not all classroom teaching and learning involves moving physical objects. However, different types of tasks can be designed for the table. A task can be making a story out of mini video clips on the table, or can be exchanging objects (digital/physical) on the table with movement constraints and feedback (key vocabulary/ price, etc.).

As Figure 2 shows, the digital recording shows which of the participants has moved the text where. Three people use three stylus pens, with different colours, and the table can sense which pen does what at any given point in time. For our purposes, what made this even more interesting was that we were able to link this data to data from the video/audio equipment around the tabletop, as shown in Figure 3.
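The kind of record that such pen tracking yields can be pictured with a minimal sketch. The record layout, field names, and sample values below are illustrative assumptions, not the actual log format of the tabletop; the sketch only shows how a time-stamped log of pen actions makes it possible to recover who moved which piece and where each piece ended up.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass(frozen=True)
class PenEvent:
    """One tracked pen action on the tabletop (hypothetical record layout)."""
    time_s: float  # seconds since the start of the recording
    pen: str       # pen colour, which identifies the participant
    piece: int     # number of the text piece that was moved
    x: float       # new horizontal position of the piece
    y: float       # new vertical position of the piece


def moves_by_pen(events):
    """List the pieces each pen moved, in chronological order."""
    grouped = defaultdict(list)
    for event in sorted(events, key=lambda e: e.time_s):
        grouped[event.pen].append(event.piece)
    return dict(grouped)


def final_positions(events):
    """Return the last recorded (x, y) position of every piece."""
    positions = {}
    for event in sorted(events, key=lambda e: e.time_s):
        positions[event.piece] = (event.x, event.y)
    return positions


# Invented sample data, for illustration only.
log = [
    PenEvent(12.4, "red", 4, 0.10, 0.30),
    PenEvent(58.9, "blue", 5, 0.45, 0.50),
    PenEvent(64.2, "green", 5, 0.10, 0.40),
]

print(moves_by_pen(log))
print(final_positions(log))
```

Keeping the raw events rather than only the final layout is what allows the order of moves, and not just the outcome, to be inspected later.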

Figure 2.

Screen capture of the jumbled text task used in this study

Figure 3.

System layout: different modalities put together

The study

The study was conducted in a classroom in the Culture Lab at Newcastle University. A digital tabletop was put in the middle of the room and two video cameras were also used during the task to capture video and audio data. Screen capture (Figure 2) was used to record the movement of the pieces of the story. The three learners had previously been introduced to the digital tabletop and the task using a dummy task.

The task in this study was a jumbled sentence text – a typical L2 classroom task which aimed to generate interaction between students. The story was taken from a textbook designed for advanced second language learners. The story was digitized and embedded in the table. The application mixes up the pieces randomly on the tabletop, where the learners can manipulate them. They can move, rotate, and maximize the pieces of text. The learners need to discuss among themselves how to rebuild the story and put the pieces in the correct order to rebuild the narrative.
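The logic of a jumbled text task of this kind can be sketched in a few lines. This is not the application used on the tabletop: the function names, the placeholder piece labels, and the idea of reporting the first point of departure from the original order are all assumptions introduced for illustration.

```python
import random


def jumble(pieces, seed=None):
    """Return the pieces in a random order that differs from the original."""
    rng = random.Random(seed)
    shuffled = list(pieces)
    if len(set(shuffled)) < 2:
        return shuffled  # nothing to jumble
    while shuffled == list(pieces):
        rng.shuffle(shuffled)
    return shuffled


def first_mismatch(arrangement, original):
    """Return the 0-based position where an arrangement first departs
    from the original order, or None if the two orders are identical."""
    for index, (placed, expected) in enumerate(zip(arrangement, original)):
        if placed != expected:
            return index
    if len(arrangement) != len(original):
        return min(len(arrangement), len(original))
    return None


# Placeholder labels stand in for the pieces of the story.
original_order = ["A", "B", "C", "D"]
on_table = jumble(original_order, seed=1)

print(on_table)
print(first_mismatch(["A", "B", "D", "C"], original_order))
```

A check of this sort only compares the task-as-outcomes with the intended solution; it says nothing about how the learners arrived at their arrangement, which is the concern of the analysis that follows.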

In terms of the distinction between convergent and divergent tasks (Duff 1986), this activity represents a convergent task as the learners seek to come to an agreement with regard to the appropriate order of the pieces. They try to rebuild the story into its original order. So the assumption here is that there is only one solution, although in reality and as the task unfolds as a task-in-process (Breen 1989), this single goal orientation might change as the interactants negotiate.

The setting in this study was not a normal classroom setting but more of a semi-experimental environment. However, the presence of the digital table and cameras did not seem to introduce any intrusive effect. The students felt comfortable, and it was apparent that they were involved enthusiastically in the task.

The participants in this study were postgraduate international students at Newcastle University. Their English proficiency as shown by IELTS scores was quite advanced, the average IELTS score for the participants being 6.5. Data were collected from four groups of students (two triads and two pairs). Three groups represented PhD students from a number of countries and one group included MA students in TESOL. Three groups shared the same first language (L1) while the fourth group included students with different L1s. Three groups were from the School of Education, Communication and Language Sciences and the fourth group was from Computer Science. All of the data shown in this article are from a single group.

Data sources

The data sources in this study are shown in Figure 3. Using Transana³ software to align the video and tabletop with the transcripts of the interaction on a single screen, we are able to make three data sources available simultaneously for multimodal analysis. Of course, audio also plays simultaneously. The first source is the detailed transcript of the interaction (transcript view). The audio recording was transcribed using CA conventions. The video view of the group around the table is another source of data (video view). The third source was the screen capture, which shows movement related to task completion on the table (screen view). So, movement of the pieces of text can be viewed simultaneously on the video view and the screen view. Speech can be heard at the same time as the transcript is highlighted on the transcript view (Figure 3). This presentation gives the analyst the convenience of examining all elements of task-based interaction as many times as needed. Moreover, the ability to review talk, non-verbal elements, and task-completion actions simultaneously enables analysis of the interdependence of these three elements, as we demonstrate below. We propose that task-based talk can only be adequately analysed in conjunction with these two other elements.
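The principle behind aligning these sources is a shared timeline. The sketch below illustrates that principle only and does not reproduce how Transana works: the record types, the tolerance parameter, and all timestamps are hypothetical, and the two transcript lines are borrowed from the extracts purely as sample text.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class TranscriptLine:
    number: int     # line number in the transcript
    speaker: str
    start_s: float  # start of the line on the shared timeline (seconds)
    end_s: float    # end of the line on the shared timeline (seconds)
    text: str


@dataclass(frozen=True)
class TabletopEvent:
    time_s: float     # time of the action on the shared timeline
    participant: str
    action: str


def events_during(line, events, tolerance_s=0.0):
    """Return the tabletop events whose timestamp falls within a line,
    optionally widened by a tolerance on either side."""
    return [
        e for e in events
        if line.start_s - tolerance_s <= e.time_s <= line.end_s + tolerance_s
    ]


def align(lines, events, tolerance_s=0.0):
    """Map each transcript line number to the events co-occurring with it."""
    return {
        line.number: events_during(line, events, tolerance_s)
        for line in lines
    }


# Timestamps are invented for illustration.
lines = [
    TranscriptLine(20, "A", 100.0, 101.2, "the story went"),
    TranscriptLine(25, "A", 108.0, 109.0, "ok go ahead"),
]
events = [
    TabletopEvent(100.5, "A", "places pen on piece 5"),
    TabletopEvent(110.4, "Y", "moves piece 5"),
]

print(align(lines, events))
print(align(lines, events, tolerance_s=2.0))
```

The tolerance is a design choice: a task-completion action often follows the talk that occasions it by a second or two, so a strict overlap test would miss it.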

Data analysis

In this section we examine episodes of task-based interaction. This has two aims: (1) to provide an example of how a holistic portrayal of interactional processes might be achieved, using the combination of task-tracking digital tabletop, video/audio recording, and transcription; (2) to demonstrate the value added by such an approach. Task-relevant activities (digital tabletop) will be related to talk (transcript) and non-verbal communication (video). We seek to demonstrate that certain aspects of TBLT processes can only be revealed by such a detailed examination of the interaction. A segmental approach which focuses on discrete phenomena may present a rather different picture from a holistic one, as we suggest below.

Uncovering task processes

Extract 4

 1 A: ((clears his throat))  [ok ]

 2 C: [accor]ding to family legends one or both of the brothers

 3  (1.1)were

 4 A: brothers::wer[e::aah ]

 5 C: [swapping] family information °s[tory] and looking at°

 6 A: [yes]

 7 A: no this swapping

 8  (1.8)

 9 A: °ok°

10  (3.1)

11 C: °and looking at family photographs°

12  (1.0)

13 A: and not (0.7) oh that's full stop (0.6)

14 C: yeh (0.8)

15 A: no (0.6) ok (.) [photographs]

16 C: [what is this]

17  (2.8)

18 A: ok

19  (5.4)

20 A: the story went

21  (1.3)

22 Y: the story went on yeh yeh yeh (.)

23 A: huh (.) do you  [think]

24 C: [I think] so yeh why not

25 A: ok go ahead

26  (1.8)

Extract 4 shows a problem in communication that leads the learners down a false trail as far as task completion is concerned. We see how an incorrect learner strategy (a guess) becomes accepted as reality and takes on a life of its own. If we had access only to the transcripts of extract 4, it would be very difficult to fathom out how the learners chose the wrong piece of text and went down the wrong path. By employing the video view and screen view as well (Figure 4), however, it is possible to uncover the task processes which led to this outcome.

Figure 4.

Introducing ‘were’ in line 4

Figure 4 shows the arrangement of the first 3 pieces of the story, piece 3 reading ‘According to family legend, one or both of the brothers’. In line 2, C reads this out aloud and at this point all three participants are searching for the next line of text to follow this. Since piece 3 ends with ‘brothers’, speaker C, having seen that the piece ends with a plural noun, employs a learning strategy and guesses that the word ‘were’ might follow ‘brothers’ and be the start of the next piece. C verbalizes this understanding in line 3. A echoes this in agreement in line 4. As ‘were’ was introduced by speaker C, the participants look for an appropriate text that would follow ‘the brothers were’ and choose ‘swapping family information’ (piece 4) as this would fit grammatically. In fact, the next piece should have been that the two brothers ‘had gotten into trouble for hunting on royal property’. In Figure 4, the shot was taken at line 3, as the nonexistent word ‘were’ was introduced. At line 5 piece 4 is moved to the space below piece 3. Figure 5 below gives the learners' final, incorrect solution to the task.

Figure 5.

Final (incorrect) arrangement of the story

It demonstrates that all participants were satisfied with the decision that piece 4 was the correct piece that should follow piece 3, although the two pieces do not actually form a grammatical sentence. They also ended up putting pieces 5 and 6 in one sequence. Piece number 6 should follow piece number 3, but that did not happen, causing this final, incorrect arrangement of pieces.

So, the verbal introduction of the word ‘were’ (which is not there in the written text) led to it becoming accepted by the participants as part of the written text. The word ‘took on a life of its own’ and led the learners to generate an incorrect sequence of written text. It is only by employing this multimodal framework that we can see how this problem arose. Furthermore, we can see that there is a reflexive relationship between task-based talk and task performance. In this case, a guess in talk becomes accepted as part of the written text and leads to failure to complete the task correctly. The employment of a learner strategy had a negative outcome in this particular case.


We noted above that task-based interaction can be heavily indexical and minimalized, particularly when involving convergent tasks. It is therefore difficult or even impossible to read and analyse transcripts of talk without knowing what the learners are physically doing. The following transcript (extract 5) shows a heavily indexical encounter.

Extract 5

11 C: °and looking at family photographs°

12  (1.0)

13 A: and not (0.7) oh that's full stop (0.6)

14 C: yeh (0.8)

15 A: no (0.6) ok (.) [photographs]

16 C: [what is this]

17  (2.8)

18 A: ok

19  (5.4)

20 A: the story went

21  (1.3)

22 Y: the story went on yeh yeh yeh (.)

23 A: huh (.) do you  [think ]

24 C: [I think] so yeh why not

25 A: ok go ahead

26 Y: ((moves piece 5 to her left))

If we had only the transcript, we might think that A and Y were telling a story in lines 20 and 22 that would go on and some events would follow. This transcript becomes much more comprehensible when we show the video view and screen capture that accompany the transcript.

The arrangement of the pieces in the screen capture shows the progress of the task. The four pieces (1–4) at the top left corner were picked as the first four pieces in the story. This frame was taken at line 20 from the extract above. Speaker A suggests that piece 5 (which reads ‘the story went . . . ’) is a potential candidate for the next move. He moves it slightly away from himself and finally, at line 26, participant Y takes over the movement and puts it in place below pieces 1–4 on the top left. A's proposal is accepted by the other participants and jointly acted upon. So the multimodal data presentation helps us to analyse line 20. By placing his pen on piece 5, gazing at it, and reading the first 3 words of the text out, A is proposing that piece 5 should be the next piece of text in the story. In line 22, Y agrees with this proposal. In line 23, A checks this proposal with C, who agrees, and in line 25, A asks Y to physically move piece 5 into position as he cannot reach over that far (see Figure 6).

Figure 6.

Juxtaposition of the visual elements (line 20)

So an adequate analysis of this extract of task-based interaction is only possible with multimodal information. From a different angle, however, the analysis demonstrates that the three interactants have developed a multimodal speech exchange system appropriate to this task, in which verbal and non-verbal elements and task-completion actions are inextricably intertwined. Because of his gaze and pen position, and because of the point they have reached in the task-completion sequence, all that is required for A to make a formal proposal that piece 5 should be the next piece of the story is for him to read out its first three words. Verbal indexicality and minimalization are therefore built into the multimodal speech exchange system which participants have constructed for this task. As Goodwin (2000: 1505) suggests, ‘participants visibly attend to such graphic fields as crucial to the organization of the events and action that make up activity reflexively situated within a setting, and which contribute structure to that action.’ The implication of this is that an emic perspective (i.e. an understanding of how participants organize their own talk) on task-based interaction is only possible if non-verbal communication and task-completion actions are available for analysis.

Self-repetitions, clarification requests, confirmation checks

In early TBLT studies based on the Interaction Hypothesis (e.g. Pica 1988), features of interactional modifications were isolated for quantitative treatment, for example self-repetitions, clarification requests, confirmation checks.

Extract 6

36 A: yeh that's right what [I mean]

37 C: [the st]ory on that these two in an attempt to escape the law

38  (1.3)

39 C: to escape the law:↑ (0.9)

40 A: [the law:↑]

41 Y: [escape ] the law::: (.)

42 A: the law:

43  (1.3)

44  and [not a se]nse here

45 A: [ok: ] >what about< something there is hiding there

46  (1.1)

Lines 39–42 could in principle be coded as any of the above features. These lines show that all the participants are repeating the same thing. It is not clear from the transcript alone what the function of these repetitions is. They could be clarification requests or confirmation checks, and C in line 39 may be doing self-repetition. However, when we look at the visual data together with the transcript, a radically different picture emerges.

The meaning of the repetition of ‘the law’ in lines 39–42 can only be properly understood by reference to visual information on the video view and screen view. The three participants are scanning the board to find a piece 6 to put after piece 5 (the law). They are visually scanning different areas of the tabletop, as can be seen in Figure 7. By repeating ‘the law’, they are displaying to each other their joint orientation: that they are engaged in exactly the same part of the task at the same time. The repetition is a kind of cueing or anchoring system which shows that they are synchronizing their task focus, even though their gazes are diverging.

Figure 7.

Gaze during repetitions

In early TBLT studies, self-repetitions, clarification requests, confirmation checks, etc. were counted because they were thought to be evidence of negotiation of meaning, which was in turn held to be conducive to SLA. The multimodal analysis here, however, suggests that the repetitions in this extract are best understood as an integral part of the multimodal speech exchange system that the participants have developed for the completion of this specific task. It follows that it would be easy to mis-code or mis-analyse any verbal action in task-based interaction unless multimodal information is integrated into the analysis.

Silent contributions

Throughout the extracts presented here, there are noticeable periods of silence. Often, these are related to the nature of the task, as there is physical manipulation and spatial placement of the pieces taking place. This gives rise to lengthy pauses in the interaction as participants engage in activities which cannot be heard, but which can be understood by looking at video and tabletop data. Moments of silence, then, were often full of task-related action in which the learners were actually involved in the practicalities of the task.

Extract 7

1 C: a decade ago I came across an individual with my same last name of Boyter

2  (2.3) and then ↑ (0.5) it's gonna be (1.0) Boyter stop after this right↑

3 Y: hmm

4 C: because he have to (0.8)

5 Y: ((moves one of the pieces))

6 C: yeh Boyer is (.) °that common (0.7) ok here come on:↑ (1.1) here we go: ↑ good boy↑°

In Figure 8 and extract 7 it is clear that speaker Y is not saying anything. However, her manipulation of the pieces on the table had consequences for the next move in the sequence of events. As speaker C finishes reading aloud piece number 1 and looks for the next piece (lines 1, 2, 4), speaker Y points to piece number 2 with her pen and moves it to her left without saying anything (line 5), as shown in Figure 8. Moving the piece to the left indicates that this piece should be the next. This attracts speaker C's attention, and C agrees that this is a potential next piece (line 6); finally, C moves it below piece number 1, where they are arranging the pieces.

Figure 8.

Silent contributions (line 5)

Without looking at Y's silent action in line 5, it would be hard to analyse fully the collaborative achievement of building the story. These physical manipulations have important consequences for the outcome of the task. Selecting piece number 2 and introducing it as the second piece in the puzzle took the story in a certain direction and imposed a certain narrative logic.

Our multimodal analysis can reveal how such instances of silent moves in fact demonstrate the mutual interplay between verbal and non-verbal elements and how they contribute to the organization of action. It also gives a different impression of speaker Y, who did not verbalize much during this task. From the transcripts alone, one might have gained the erroneous impression that she was not very engaged with the task. She uttered only 75 words (excluding reading text aloud) during a task which took 11 minutes to complete. However, her contribution in terms of task-related actions on the tabletop was equal to that of the other participants. Figure 8 and extract 7 also show that participants can creatively combine verbal and non-verbal aspects of task management. Here, Y is performing the physical movement and C is supplying the verbal accompaniment.

Successful collaboration: sorting out a reference

In many cases the students performed actions collaboratively, and sometimes they spent the time reading the pieces on the table silently while looking for connections in the puzzle. It was interesting to see how students faced a problem and how they collaborated to solve it by looking at the best matches and coordinating the process through non-verbal communication. We have seen that the students in extract 4 were not successful in locating the correct piece that goes after piece number 3 in Figure 4. It took them some time to leave that piece aside and take another line for the story. However, later on during the task they managed collaboratively to pull out the correct sequence and rearrange the pieces accordingly. Extract 8 shows how this process was collaboratively talked into being, and how students jointly worked out the solution by combining verbal and non-verbal resources.

Extract 8

1 C: he told a story of his ancestor James Boyter who in [eighteen cent]

2 A: [ninetee eighteen] transferred

3  the Atlantic

4 C: yeh yeh [trans]ferred the Atlantic Ocean from Scotland with his older brother Alec

5 A:[here]  older [brother Alexander]

6→C: [and then two of ] they two of thems(.) right

7 A: yeh

8 C: yeh the story I think the story went on that these two is an attempt to escape the

9  law right↑

10 A: the law [and my]

11 C: [because] two people=

12 A: yeh::=

13 C: just been like=

14 A: =yes

15 C: =James [Boyter and Alexander ]=

16 A: =yes [together yes ]yes=

17 C: em=

In extract 8 the students were able to collaboratively locate the problematic reference to ‘these two’ in piece 3. Figure 9 shows how this was done.

Figure 9.

Sorting out ‘these two’ problem

In extract 8 and Figure 9 speaker C suggests (line 8) that piece 3 can be aligned with piece 2 as it mentions two people. The reference to two people can be seen in line 6, where she clearly shows the existence of two people involved in that event (she uses two fingers). She then pushes piece 3 toward piece 2 (Figure 9) while indicating in lines 11–16 that the two people that piece 3 talks about are James Boyter and Alexander, and therefore shows that piece 3 should follow piece 2. The way she was able to successfully bring these two pieces together shows how the speakers managed to collaborate interactionally using verbal and non-verbal means. Lines 7, 12, 14, and 16 show how participant A was intensely involved in the moment-by-moment ratification of C's proposals and reasoning. His ‘yes/yeh’ recurs in each of these lines, showing his agreement with what she was saying. In line 16 he also extends two fingers and brings them close to each other to show that these two people were together, mimicking C's previous physical action in line 6 (see Figure 10).

Figure 10.

Two fingers gesture repeated

Participant Y, although not appearing verbally in the transcript, was involved in this process and showed non-verbal support for the other participants' agreement that the two people are James Boyter and Alexander. She nodded and looked at the piece that was the locus of talk, as can be seen in Figure 9. Together the participants were able to orchestrate this process of putting things together and reach a justifiable solution for bringing these two pieces together.

Successful collaboration: converging efforts

In extract 9 the students were trying to locate the fourth piece to fit after ‘to escape the law’ as shown by Figure 11.

Figure 11.

Resolving a problem (line 20)

Extract 9

18 A: =the law and my grandfather didn't know he was either no not this one (.) the law=

19 C: the story went on that these two is an attempt to escape the [lawr] (.)

20 Y: fled=

21 C: ye:::h↑ [fleed] to the United States oh very good [well done]

22 A: [yep] you make progress

Extract 9 shows another interesting example where the participants collaboratively succeeded in overcoming a problematic point. In lines 18–19 momentum was building as all efforts were focused on finding piece 4 to go after ‘the law’ (piece 3). They had examined this piece before and had not been able to find a match for it, as we have seen earlier. In these two lines (18–19) all participants were busy surveying the table while keeping their orientation to each other.

In line 19, participant C provided a cue for participants while they searched for text on the table by reading aloud piece 3. Participant Y spotted piece number 4 and immediately suggested (line 20) that it could be the next item of the sequence. At the same time as she was doing that (line 20), participant A was looking at the same piece and pointing to it (Figure 12). As participant Y uttered the word ‘fled’, A's facial and hand gestures show that he was also involved in reaching agreement on piece number 4. Participants Y and A were both touching piece 4 with their pens in line 20. Speaker C joins this collaborative achievement by pointing to the piece in line 21 (Figure 12). Speaker A initiated the move of piece 4 with the pen (line 20), Y moved the piece halfway (line 21), and finally C moved it fully into position. C also provided a verbal accompaniment and positive evaluation. This multimodal analysis shows that they were all involved with precisely coordinated timing in moving the story in the right direction. This example again demonstrates speaker Y's minimal verbalization (line 20), but nonetheless she made an important contribution to task completion.

Figure 12.

Joint efforts

A framework for describing and analysing task-based interaction

The data analysis has revealed some of the multimodal complexity of task-based interaction and some of the issues involved in its analysis. We now consider what an adequate framework might be for describing and analysing task-based interaction. It should have the following components, and should track the development of the task through its three phases, as described above, providing a holistic perspective. The data analysis above provides an illustration of what components (b), (c), and (d) might look like. There is no illustration of the other components, owing to considerations of space.

  • (a) Details of task-as-workplan. It is essential to have details of the workplan as stated in e.g. a coursebook or a lesson plan, with as much detail as possible on task design. It is also necessary to know exactly how the teacher has presented the task-as-workplan to the learners, preferably from a recording of that presentation. As Van den Branden (2006) points out, teachers often modify the task-as-workplan and ‘interpret’ it to the learners.
  • (b) Video and audio recording of the interaction (task-in-process).
  • (c) Transcription and CA analysis of the interaction (task-in-process).
  • (d) Tracking of task progress (task-in-process). In this study, these data were collected by capturing the table surface in video format, in effect a screen capture of the task as it is implemented on the tabletop.

It is essential that components (b), (c), and (d) can be analysed simultaneously in order to understand how they interrelate. In the data analysis above, we explained why components (b), (c), and (d) are vital to understanding task-based interaction and the learning process in particular.

  • (e) Quantification (if required). As Seedhouse (2005) suggests, once the qualitative analysis is complete, the analysed interactional data could be used for quantitative treatment with their construct validity assured.
  • (f) Task outcomes should be considered, where applicable. The task-as-outcomes is whatever is physically produced. This may be a piece of writing or, in a spot-the-difference task, a sheet showing, for example, that 7 of 8 differences had been spotted in the given time.
  • (g) Synthesis. Finally, the three phases of the task are examined to see whether the process has been a linear one in which all phases coincided neatly, or whether trouble arose during any phase and why.
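
The requirement that components (b), (c), and (d) be analysed simultaneously can be pictured as placing transcript lines and tabletop actions on one shared timeline. The following is only a minimal sketch of that idea, not the software used in this study: the `Event` record, the function names, and the channel labels are hypothetical, and the timestamps are invented for illustration. The wording of the events follows extract 7 (lines 4–6).

```python
from collections import Counter
from dataclasses import dataclass


@dataclass(frozen=True)
class Event:
    """One timed observation from any data source (hypothetical record)."""
    start: float       # seconds from the start of the recording
    participant: str   # speaker or actor label, e.g. "C" or "Y"
    channel: str       # "talk" (transcript) or "tabletop" (task tracking)
    detail: str        # transcribed words or a description of the action


def merge_timeline(*streams):
    """Interleave event streams into one list ordered by start time."""
    return sorted((e for s in streams for e in s), key=lambda e: e.start)


def contributions(timeline):
    """Count events per (participant, channel) pair."""
    return Counter((e.participant, e.channel) for e in timeline)


# Wording follows extract 7 (lines 4-6); the timestamps are invented.
talk = [
    Event(12.0, "C", "talk", "because he have to"),
    Event(14.5, "C", "talk", "yeh Boyer is (.) that common"),
]
tabletop = [
    Event(13.2, "Y", "tabletop", "moves one of the pieces"),
]

timeline = merge_timeline(talk, tabletop)
for event in timeline:
    print(f"{event.start:6.1f}s  {event.participant}  "
          f"[{event.channel}]  {event.detail}")

counts = contributions(timeline)
```

In a merged view of this kind, Y's silent move appears between C's two turns rather than being absent from the record. The per-channel counts are one possible starting point for component (e), and would be meaningful only once the qualitative analysis has established what each action is doing.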


Goodwin (2000: 490) proposes that ‘the construction of action through talk within situated interaction is accomplished through the temporally unfolding juxtaposition of quite different kinds of semiotic resources’. This study suggests that during task-based interaction, a significant amount of learners' attention is spent on coordinating verbal elements with non-verbal elements of interaction and with task-related actions in a precisely timed, multi-dimensional dance. This may be useful practice for performing real-life tasks with others in a second language, and may therefore constitute a significant advantage of TBLT. These non-verbal elements have not previously been visible in TBLT research, as the technology and methodology were not available. Studies of this nature can offer a number of insights into the realities of TBLT. In extract 4 above we saw high-level students make some wrong moves which meant they were unable to complete the task successfully. We also saw in extract 9 how learners were able to successfully collaborate to complete a task. The holistic perspective revealed that learners who do not make a great verbal contribution may still contribute significantly to task completion.

A criticism of the above holistic framework is that it is extremely time-consuming and laborious, and this is certainly the case. It is not argued that all studies of task-based interaction need to provide such a degree of detail. However, we argue that a small number of studies of this kind are necessary. It may be that such studies, focusing on all aspects of the process of language learning through tasks, will reveal that tasks, when implemented, create many learning opportunities which were not previously evident. Digital tabletops are currently very rare in classrooms, but it is likely that the technology shown will become more widely available (and less expensive!) in future. The technology makes it possible to combine innovative task design with digital recording of learner choices and moves. It is certain that there remains much more to be discovered concerning the relationship between verbal and non-verbal elements of interaction and task-related actions.


  • 1

    Coughlan and Duff (1994) use the terms ‘task’ and ‘activity’ to express the same distinction.

  • 2

    Many thanks go to Ahmed Kharuffa – a PhD student in Computer Science at Newcastle University – for his help and for allowing us to use his tabletop application.

  • 3