4.1. Three hypotheses about tutoring effectiveness
Three general hypotheses have been proposed for the learning benefit from being tutored (Chi et al., 2001). This section elaborates these three hypotheses and briefly reviews prior evidence relevant to them. The first hypothesis, referred to as the tutor-centered pedagogical hypothesis (or in brief, the tutor-centered hypothesis), assumes that learning from tutoring is enhanced because the tutor undertakes pedagogical moves (such as explaining, scaffolding, giving feedback, or motivating) that are tailored to the tutees. This hypothesis similarly underlies research that examines effective and ineffective teaching practices (Shulman, 1986), in that it assumes the effectiveness of teaching affects student learning. Such an assumption may implicitly underlie tutoring research that searched for tutors' optimal pedagogical strategies and moves (Evens, Spitkovsky, Boyle, Michael, & Rovick, 1993; Hume, Michael, Rovick, & Evens, 1993; Lepper, Woolverton, Mumme, & Gurtner, 1991; Merrill, Reiser, Merrill, & Landes, 1995; Merrill, Reiser, Ranney, & Trafton, 1992; Putnam, 1987; Sleeman, Kelly, Martinak, Ward, & Moore, 1989; VanLehn et al., 2003).
Crediting a tutor almost entirely for a tutee's learning is intuitively appealing but actually without basis in evidence. Perhaps the hypothesis arose from three observations: (a) that powerful techniques are used by a few unique and exceptional orators and tutors such as Socrates (Collins, Brown, & Newman, 1989; Collins & Stevens, 1982); (b) that a tutor is generally knowledgeable about the content domain that she is tutoring, so this domain expertise is conflated with the assumption that the tutor must also be an expert on the pedagogy of tutoring; and (c) that a tutor does typically control, lead, and dominate the tutorial conversation (Chi et al., 2001; Graesser, Person, & Magliano, 1995). Granted, one can determine that some tutorial moves might affect learning (specifically, feedback can accelerate learning in the context of problem-solving; Anderson, Corbett, Koedinger, & Pelletier, 1995). However, these three observations may be epiphenomena, in that the differential tutoring moves may not in fact be responsible for the tutees' learning.
We can evaluate this tutor-centered pedagogical hypothesis in terms of three components: frequency, quality, and adaptiveness. That is, for the first component, if we assume that a tutor's move is responsible for a tutee's learning, then the more often such a pedagogical move is undertaken, the better the tutees ought to learn. Our prior results found no support for this frequency component in terms of correlations. For instance, we found that although our novice tutors explained frequently to tutees, the tutees did not seem to learn much from these explanations (Chi et al., 2001, Study 1).
The quality component was not examined directly in our prior study. One could argue that expert tutors might give excellent explanations that novice tutors could not, assuming that good instructional explanations facilitate learning much more than poor ones do (Eisenhart et al., 1993). Since the tutoring dialogues in this study involve an experienced teacher/tutor, we can assess the quality component indirectly by analyzing the frequency component again. If this more expert-like Tutor yields the same results as our prior novice tutors did, then the evidence here would indirectly refute the quality component.
The third component of the tutor-centered hypothesis is adaptiveness. Adaptiveness is a very complex concept. It can refer to a tutor's selection of the appropriate moves, delivered at the right moment and based on a tutee's need for feedback and help (Murray & VanLehn, 2005). Thus, being adaptive can be operationalized to mean that (a) a tutor must choose the appropriate moves (or problems to be solved by the tutee) that are tailored to the tutee; (b) a tutor knows when to deliver his feedback, explanations, and scaffolding hints (such as contingent upon the correctness of a tutee's response; Wood, Wood, & Middleton, 1978); and (c) this knowledge must be gleaned from his continuous assessment of the tutee's competence and understanding. Moreover, assessment can mean from either a normative perspective or from the tutee's (or student's) perspective. For example, we found that inexperienced tutors could not assess tutees' deep understanding accurately from the students' perspective (Chi, Siler, & Jeong, 2004), whereas tutors are usually quite capable of assessing tutees' competence from the normative perspective (Putnam, 1987). In short, there is a scant set of evidence to test the many aspects of the adaptiveness component of the tutor-centered hypothesis. We will add a small piece of evidence to this scant set here.
The second hypothesis for the benefit of tutoring is the idea that a tutoring context, by definition, is one in which a tutee has greater opportunities to have a one-on-one dialogue with the tutor, as compared to a standard classroom context. This opportunity to be constructive potentially could cause the tutees to learn more. We called this the student-centered constructive hypothesis (or the student-centered hypothesis) to contrast the role of the tutees from the role of the tutor. We provided preliminary evidence in support of this hypothesis. For example, when the tutors were suppressed from giving any explanations and feedback at all, and could only give prompts (Study 2, Chi et al., 2001), the tutees learned just as effectively as when the tutors gave a substantial amount of explanations and feedback (Study 1, Chi et al., 2001). We attributed the tutees' learning to the constructive responses that they gave to tutors' scaffoldings.
Finally, the third proposed hypothesis is the interactive coordination hypothesis (or the interaction hypothesis), which states that tutoring effectiveness depends upon the joint or coordinated effort of both the tutor and the tutee. For example, our evidence showed that some tutor moves (such as scaffolding) were more beneficial for tutees' learning than other tutor moves (such as giving explanations; Chi et al., 2001, pp. 499–500). Moreover, when we encouraged the tutors to do more scaffolding than explaining, this resulted in an increase in the number of multi-turn deep tutor-tutee interactions (see fig. 9, Chi et al., 2001). Both of these results suggest that scaffolding can elicit more meaningful and elaborate joint construction. Although we could infer that these two results confirmed that some kind of interactions between a tutor and a tutee contributed more toward learning than other kinds, it was actually impossible to isolate the contributions of the tutees independently of the contributions of the tutors. That is, we could not discriminate whether the learning arose from the tutors' scaffoldings per se or from the tutees' constructive responses per se or from their interactions.
In sum, our previous studies (Chi et al., 2001; Chi et al., 2004) provided sufficient evidence to question the tutor-centered hypothesis and to highlight both the student-centered and the interaction hypotheses as potential accounts for tutees' learning. The current study hopes to provide additional evidence to support and/or refute these hypotheses, using a procedural domain (problem-solving) rather than a conceptual domain (human circulatory system), a single more expert-like tutor rather than multiple novice tutors, and college students rather than middle-school students as tutees. Moreover, by inferring the effectiveness of tutoring from the additional perspective of how the Collaborative Observers learn, we might gain better insight into all three hypotheses since the tutoring was not tailored to the Observers, nor were the Observers constructing responses to the Tutor, and moreover, the Observers heard not only what the Tutor said, but they also heard what the Tutees said.
4.2. Segmentation and grain size
Learning studies using complex materials, such as solving physics problems, can generate a massive amount of protocol data. Because of the labor-intensiveness of transcribing and coding such a massive amount of data, we tested a short-cut method and undertook duplicate coding for 20% of the data for the purpose of calculating interrater reliability. However, we did inject several “validating” analyses to gain further confidence in our coding. By this, we mean alternative codings, often at a different grain size, or with a different set of goals, to see whether the results from different codings replicate each other or whether some codings replicate robust evidence in the literature.
We begin by explaining how the 10 tutoring videotapes were segmented in a short-cut way after they were transcribed. Each transcript was first segmented according to speaker turns. A turn, by definition, is speech by a single speaker (Traum & Heeman, 1997). Then the transcript was further segmented according to either the speaker's intonation (e.g., a falling tone, a rising tone), pauses, or changes in action (e.g., from talking to writing on the board or from reading the problem statement to writing on the board). Two coders independently segmented 20% (2 of the 10) tutoring video transcripts while watching the tutoring videos and agreed on 2466 (97.03% of the total) segments.
This short-cut method of segmentation, based on the structure of speech (turns, intonation, pauses, and changes in action), can be carried out much more objectively and rapidly than segmentation based on an analysis of the content of speech, as we have routinely done in the past (Chi, 1997; Chi et al., 2001). It is therefore important to know whether segmentation based on the structure of speech is adequate in comparison to segmentation based on the content, since the former can be more easily automated. To verify this, we selected the middle 20% of each of the 10 tutoring transcripts and coded it according to the content method of segmentation (Chi, 1997). The segment boundaries were then compared across the two methods, yielding a concordance rate of 89.1% for the 2,416 coding decisions made across the two systems. Not only does this result indicate a high level of agreement between the two methods, but the disagreements between the two methods were also quite symmetric (5.4% of the time a segment boundary was indicated by the content coding but not by the structure coding, and 6.1% vice versa), indicating no systematic bias in either direction toward a coarser or finer grain size.
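The boundary comparison just described amounts to a simple computation. The sketch below is illustrative only (the function name and the boolean-list encoding of boundary decisions are our own assumptions, not part of the original coding scheme): given two codings over the same candidate boundary positions, it returns the concordance rate and the two asymmetric disagreement rates.

```python
def segmentation_concordance(content_marks, structure_marks):
    """Compare two segmentations of the same transcript.

    Each argument is a list of booleans over the same candidate boundary
    positions: True where that method placed a segment boundary.
    Returns (concordance, content-only rate, structure-only rate),
    mirroring the 89.1% / 5.4% / 6.1% figures reported in the text.
    """
    assert len(content_marks) == len(structure_marks)
    n = len(content_marks)
    agree = sum(c == s for c, s in zip(content_marks, structure_marks))
    content_only = sum(c and not s for c, s in zip(content_marks, structure_marks))
    structure_only = sum(s and not c for c, s in zip(content_marks, structure_marks))
    return agree / n, content_only / n, structure_only / n
```

Roughly equal content-only and structure-only rates, as observed here, indicate that neither method systematically segments at a finer grain than the other.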
Using the structure rather than the content of speech, a segment nevertheless roughly corresponds to what we have previously referred to as a statement; that is, to a single idea, presented by a single speaker within a turn. Thus, a segment is our smallest unit of analysis. The utterances made by the Tutor, shown below, were coded as three segments; the boundaries are marked by double slashes.
So you have therefore written the equation of motion.//
And from using the equation of motion you have been able to find out what would be the normal reaction for block A.//
Now similarly there is a equation of motion for block B.//
Segments can also be combined into interactive dialogue units when tutor-tutee responses are considered jointly, often in adjacent pairs of turns. Because a turn can contain multiple segments, the last segment within a turn (from the Tutor, for example) can be analyzed together with the response in the next turn (by the Tutee) to create a dialogue unit. Finally, segments can also be combined into an episode unit. An episode corresponds to the consecutive talk and problem-solving segments that refer to the same problem-solving node in the model of problem solutions (see Fig. 1 for an example of a problem solution model).
The results will be reported below in three sections, corresponding to the three grain sizes: segments within one turn, dialogues that are consecutive segments involving two turns, and episodes often involving multiple turns. The details of the codings for each grain size will be unpacked as each result is being described.
4.2.1. Independent segment analyses
The Tutor, on average, made a total of 686 segments, whereas the Tutees averaged 443 segments per tutoring session. This approximate 3:2 tutor-tutee ratio is typical in tutoring, in which the tutor usually dominates the conversation by talking more (Chi et al., 2001; Graesser & Person, 1994). In our prior study, the tutor-tutee ratio was even more pronounced (621 tutor statements vs. 206 tutee statements, roughly a 3:1 ratio). The smaller difference in this current study may be attributed to the experience of this Tutor in the context of our research, in that he recognized the advantage of making fewer statements himself and eliciting more responses from the Tutees. One could perhaps take the ratio of tutor-tutee contributions as an index of tutor expertise, suggesting that our Tutor in fact is more of an expert tutor.
Overall, the tutorial dialogue, considering both the Tutor and the Tutees' moves, was far more extensive with the Poor Tutees (1393 mean number of tutor-tutee segments) than with the Good Tutees (864 segments). An obvious interpretation of this result is that the Poor Tutees needed more help. This overall greater amount of dialogue with the Poor Tutees translated into a greater frequency of all types of tutor moves; therefore, it is sometimes more appropriate to calculate the proportion and other times to use the frequency of moves in the analyses to be reported below.
4.2.1.1. Learning from the tutor's moves?
Tutor moves were defined as instructional segments, uttered by the Tutor, that were relevant to the pedagogical task of tutoring. Tutor segments were categorized as an explaining move, a scaffolding move, a feedback move, or a miscellaneous other move (such as summarizing, comprehension checking, responding to tutee questions, false starts, and so forth). Tutor moves for the middle 20% of each of the 10 tutoring protocols were independently coded by two raters into these four categories. Based on the Kappa coefficient, the interrater reliability for this coding indicated substantial agreement (κ = 0.758). Subsequent analyses will focus on the three largest categories only: explanation, scaffolding, and feedback segments.
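The kappa statistic reported above corrects raw percent agreement for the agreement expected by chance, given each rater's marginal category frequencies. A minimal sketch of the computation (the function name and the single-character category labels in the comments are hypothetical, for illustration only):

```python
from collections import Counter

def cohens_kappa(codes_a, codes_b):
    """Cohen's kappa for two raters assigning one categorical code
    (e.g., explaining / scaffolding / feedback / other) to each segment."""
    assert len(codes_a) == len(codes_b)
    n = len(codes_a)
    # Observed proportion of agreement.
    p_observed = sum(a == b for a, b in zip(codes_a, codes_b)) / n
    # Chance agreement from each rater's marginal frequencies.
    freq_a, freq_b = Counter(codes_a), Counter(codes_b)
    p_chance = sum((freq_a[c] / n) * (freq_b[c] / n)
                   for c in set(freq_a) | set(freq_b))
    return (p_observed - p_chance) / (1 - p_chance)
```

For example, two raters who agree on three of four segments with these marginals get κ = 0.5, well below the κ = 0.758 reported here.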
An explanation is an utterance in which the Tutor defines physics concepts or principles, provides an interpretation of important problem situation features, or describes how to carry out a particular procedure, the conventions used to carry out a particular mathematical or physics problem-solving step, or the outcome of applying some procedure. Below is an example of a tutor explanation, generated in one turn, containing four segments:
If there is a net force F, then there will be an acceleration, a, on that object.//
If there is no force, then there is no acceleration.//
So this is the equation which tells you that whether an object—an object is accelerated or not.//
And umm, therefore, it is called the equation of motion.//
A scaffolding is defined as either a prompt that is content free (superficially, it gives away no information) or some kind of support for helping or guiding the tutees toward understanding. The support can take the form of a hint, an assertion with an expectation to fill in a blank, a direct or indirect question, and so forth. In Chi et al. (2001), we identified 14 different forms of scaffolding. Below are three examples of scaffolding taken from different contexts in the protocols, each of one segment length:
Weight is the…?//
Acceleration due to…?//
When a force acts on body, uh, how does the body react to it?//
A feedback segment can be either a short positive (e.g., “right”) or negative (e.g., “no, no”) response about the correctness or incorrectness of what the Tutees said or did, or it can be more extensive, in terms of correcting what the Tutees did incorrectly (e.g., “No, the Earth”) or elaborating further on what the Tutees did or stated (e.g., “No, it should be accelerating towards A and B”). In the latter two cases, the Tutor's feedback would be coded only as a corrective/elaborative feedback (and not double-coded as both a negative and a corrective/elaborative feedback). These feedback segments can be given to either correct or incorrect Tutee responses.
The top section of Table 3 shows the average number of segments per session and their proportion for each type of the Tutor's instructional moves. Note that scaffolding is the largest category of the Tutor's instructional moves, constituting 36% of his total statements. In contrast, in our prior study, scaffolding constituted only 5% of the total number of instructional moves. Likewise, explanations constituted 23% of the total instructional moves here, whereas in our prior study the tutors' explanations constituted 53% of the total statements (Chi et al., 2001, Fig. 2). The reversal in the ratio of explanations to scaffolding might be caused by the expertise of the current Tutor, whereas our prior study involved 11 novice tutors. In fact, one might consider the ratio of explanation to scaffolding moves as another index of our Tutor's pedagogical expertise.
Table 3. Correlation of Tutor and Tutee moves with Tutees' and Collaborative Observers' deep learning
| Move type | Mean no. of segments per session | % | Correlation with Tutees' deep learning | Correlation with Observers' deep learning |
|---|---|---|---|---|
| Tutor instructional moves | | | | |
| Feedback | 130 | 19 | Trend r = −.603, p = .065 | N.S. |
| Other | 154 | 22 | | |
| Total | 686 | 100 | | |
| Tutee learning moves | | | | |
| Substantive | 230 | 52 | Trend r = .605, p = .064 | N.S. |
| Nonsubstantive | 213 | 48 | r = −.899, p < .001 | N.S. |
| Total | 443 | 100 | | |
| Tutee substantive moves | | | | |
| Relevant follow-ups | 99 | 43 | r = .641, p = .047 | Trend r = .398, p = .082 |
| Irrelevant responses | 131 | 57 | Trend r = .620, p = .056 | N.S. |
| Total | 230 | 100 | | |
Did the frequency of Tutor moves correlate with Tutees' learning? No. Neither the average number of the Tutor's explanation segments nor of his scaffolding segments per se correlated significantly with either the Tutees' or the Collaborative Observers' matched deep step gains (see the last two columns of Table 3, top). This result replicates what was found in Chi et al. (2001): there, neither the tutors' explanations (their Table 3) nor their scaffoldings (their Table 3, Model 2) correlated with deep learning. That is, if we extract from the protocols only the Tutor's moves, in terms of the frequency of explanations and scaffoldings that the Tutor provided, then receiving and hearing those moves as independent monologues did not have an impact on either the Tutees' or the Observers' learning.
What about the Tutor's feedback? As shown in Table 3, there was a negative correlation between all types (positive, negative, corrective, elaborative) of Tutor feedback and the Tutees' learning (r = −.603) but no correlation with the Observers' learning, and the negative correlation with the Tutees' learning was marginally significant (p = .065). This marginal negative correlation between the Tutor's feedback and the Tutees' learning cannot be mediated by Tutees' errors, because there was no significant correlation between Tutees' errors and Tutees' learning. That is, it is not the case that the more errors a tutee makes (thereby eliciting more tutor feedback), the less he is likely to learn, thus accounting for the negative correlation. This puzzling negative correlation will be examined more closely later.
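The correlations in Table 3 are ordinary Pearson correlations computed over the 10 tutoring dyads. As a sketch (the function names are ours, and the per-dyad data are not reproduced here), the coefficient and its associated t test can be computed as:

```python
import math

def pearson_r(xs, ys):
    """Pearson correlation between paired per-dyad measures,
    e.g., feedback counts vs. matched deep step gains."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)

def t_statistic(r, n):
    """t for testing r against zero, with n - 2 degrees of freedom."""
    return r * math.sqrt((n - 2) / (1 - r * r))
```

With r = −.603 and n = 10 dyads, t ≈ −2.14 on 8 degrees of freedom, which falls just short of the two-tailed .05 critical value of 2.306 — consistent with the marginal p = .065 reported above.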
4.2.1.2. Learning from the tutees' moves?
The preceding section analyzed the Tutor's moves independently of the Tutees' responses. Such an analysis, in essence, treated the Tutor's moves as instructional monologues, and there were no significant correlations between the frequency of the Tutor's moves with the Tutees' or the Observers' learning (except for a marginal negative correlation with feedback for the Tutees only). If neither the Tutees nor the Observers learned by considering the Tutor's independent moves, then how did they learn? In this section, we analyze the Tutees' independent learning moves.
Tutees' segments were categorized as either a substantive or a nonsubstantive learning move on the basis of their content, regardless of the form of the segment (such as an assertion or a question). A substantive segment is defined as a meaningful contribution to an ongoing activity, such as problem solving, or a relevant response to the Tutor's explanations. For example, in the exchange below, the Tutee's response to the Tutor's explanation would be coded as substantive:
Tutor: See this equation is true for constant acceleration.//
Now the acceleration is constant here.//
Forces are not changing on the weight so the acceleration is constant.//
Tutee: The initial velocity is zero then.//
A nonsubstantive segment is defined as a continuer, a repetition, an agreement, or off-task remarks. To the Tutor's explanation shown above, if the Tutee had responded with “alright,” then that would be coded as a nonsubstantive response.
The middle section of Table 3 shows that 52% of the Tutees' segments were substantive. The correlation of substantive moves with matched deep step gain was r = .605, which approached significance (p = .064), whereas the correlation of nonsubstantive moves with matched deep step gain was strongly negative (r = −.899, p < .001). Thus, the Tutees learned only when they responded with substantive contributions, and they definitely did not learn when they constructed nonsubstantive responses, again replicating our previous results (Chi et al., 2001). Being responsive per se is not sufficient; one must construct substantive responses in order to learn.
Substantive segments can be further divided into those that are relevant or irrelevant. Relevant substantive segments are those that are responsive to the Tutor's comments in the sense of building on or following up on the Tutor's comments. The Tutee's final segment in the exchange below is an example of a relevant substantive response:
Tutor: If I push it, it's, velocity becomes some—something.//
Tutee: Mm hmm. [tutee nods yes]//
Tutor: So from zero to something, there is a change.//
Tutee: Ok, so yeah. It wouldn't be a constant.//
Irrelevant responses are those that are not responsive to the Tutor's comments but are nonetheless substantive. The Tutee's final segment below is an example of an irrelevant but substantive response:
Tutor: It seems reasonable?//
Tutee: That the Earth is accelerating.//
Tutor: Because of these masses.//
Tutee: [tutee laughs] No. Those are some pretty big masses.//
Tutees benefited from constructing substantive responses, both relevant (r = .641, p = .047) and irrelevant ones (r = .620, p = .056; see Table 3, bottom). The Observers, however, seemed to benefit somewhat only from overhearing the relevant responses (trend, r = .398, p = .082).
The fact that the Tutees could benefit from making substantive responses whether or not they were relevant replicates our overall self-explanation effect (Chi et al., 1994; McNamara, 2004), if we assume that irrelevant substantive responses are analogous to idiosyncratic self-explanations. In particular, in Chi et al. (1994), we claimed that students could learn whether they generated correct or incorrect self-explanations. A simple interpretation of this latter finding is that, although the substantive responses may seem irrelevant from the normative perspective, they can be conceived of as self-explanations that serve the Tutees' own purposes of repairing and refining their own understanding (Chi, 2000). This same interpretation explains why the Observers benefited modestly from overhearing the relevant substantive follow-ups but not the irrelevant responses: because the irrelevant responses served the Tutees' own idiosyncratic repair purposes, they would not make sense to an observer.
This result, that Tutees learned from constructing substantive responses, further reinforces our previous interpretation of Study 2 in Chi et al. (2001). There, the tutees seemed to have learned in an artificial tutoring condition in which the tutors were suppressed from explaining but encouraged to scaffold. The tutees in the suppressed condition learned just as well without tutors' explanations but with many more tutor scaffoldings (see Fig. 8, Chi et al., 2001). We had inferred then that the tutees must have learned from the benefit of constructing responses to the tutors' scaffoldings, and the result provided here further confirms that interpretation, along with some evidence in VanLehn et al. (2003, tables 11 and 12), Litman and Forbes-Riley (2006), and Jackson, Person, and Graesser (2004).
4.2.1.3. Summary of segment-level analysis
Overall, the pattern of correlation results shows that the frequency of the Tutor's moves had mostly no effect on the Tutees' (nor the Observers') learning, replicating our tentative results from the 2001 study. Moreover, the Tutor's feedback moves were somewhat detrimental to the Tutees' learning. Thus, these results do not support the frequency component of the tutor-centered hypothesis. Furthermore, because our Tutor is an experienced teacher, the lack of correlation between his instructional moves and the Tutees' learning suggests indirectly that the quality component of the tutor-centered hypothesis is not supported either. In short, there was no support for two of the three components of the tutor-centered hypothesis.
The Tutees' moves, however, did affect their own learning, but only if they constructed substantive responses (whether relevant or irrelevant). This finding, that the students' own construction is responsible for learning from tutoring, further supports the student-centered constructive hypothesis.
The Observers could not learn by overhearing either the Tutor's or the Tutees' independent moves, even if they were substantive. But there was a trend for a correlation with their learning when the Tutees' responses were not only substantive but relevant as well, most likely because substantive relevant responses are normative, whereas substantive irrelevant responses are specific to the Tutees' own mental models only. This trend will be examined in greater detail in the next section.
4.2.2. Interactive dialogue analyses
Testing the tutor-centered and the student-centered hypotheses involved analyzing Tutor's instructional moves and Tutees' learning moves independently. Even so, we often could not disambiguate whether the Tutees' learning arose from their own constructions or from receiving the Tutor's instructional moves. For example, as shown in the bottom of Table 3, the Tutees learned from constructing both relevant and irrelevant substantive responses. But we could not determine whether their learning arose from their own self-directed construction only, from receiving the Tutor's instructional moves, or from some interaction of the two. However, analyzing interactive dialogue units as well as analyzing from the perspective of the Observers' learning might allow us to differentiate the contributions of the Tutor, the Tutees, or their interactions toward the Tutees' learning. Accordingly, in this section, analyses of the tutoring protocols will take on a larger grain size, in terms of tutor-tutee dialogue units.
4.2.2.1. Tutees' relevant substantive follow-up responses to tutor's scaffolding and explanations
Since the Observers did not learn at all from overhearing the Tutees' irrelevant responses (Table 3, bottom), we focus only on the Tutees' relevant follow-up responses. Coding of relevant Tutee responses was actually interactive coding that examined adjacent pairs of turns, because relevance must be defined in the context of the content of the prior utterance. These prior Tutor utterances were either explanations or scaffoldings. There were no instances of Tutees making immediate relevant substantive responses to Tutor feedback, because feedback was immediately followed either by some other type of Tutor move before a Tutee had a chance to respond (e.g., feedback followed by a scaffolding or explanation) or by a nonsubstantive Tutee move (e.g., a continuer, a repetition of what the Tutor said without adding any new information, a request directed to the Tutor, or an assertion like “I can do this math” or “Ok I see where you are going”). Accordingly, we examined tutor-tutee dialogue units consisting of relevant Tutee responses that followed either a Tutor scaffolding or a Tutor explanation.
As shown at the bottom of Table 3, each Tutee made on average a total of 230 substantive segments per tutoring session. Of these, only 99 were relevant follow-up responses to the Tutor's scaffolding or explaining moves. Here is an example of a Tutee's relevant follow-up (the final segment) given to the Tutor's scaffolding (additional Tutor segments are provided for context):
Tutor: No M is acceleration of what?//
This force is acting on what?//
Tutee: This force is acting on—the ground.//
And here is an example of a Tutee's relevant follow-up given to the Tutor's explanation:
Tutor: So this will be pulling this object to it.//
So since A and the force that A and B are experiencing due to G is attractive force—force—directed toward G—//
so G should be experiencing a force which is directed toward A and B//
so that will be upward force//
Tutee: Oh okay—so I have this backwards.//
For the Tutees, their learning correlated significantly with constructing a relevant follow-up to the Tutor's scaffolding (r = .656, p = .039), more so than with constructing a relevant follow-up to the Tutor's explanations (r = .576, p = .082), although there is a trend for the latter too (see Table 4). However, we cannot tease apart whether the Tutees learned as a result of receiving the Tutor's scaffolding and explaining pedagogical moves or from their own construction. Since the Observers were not constructing the responses, whether or not they learned by overhearing these tutor-tutee dialogue units can help differentiate the interpretations above. Table 4 (last column) shows that the Collaborative Observers benefited only when they overheard scaffolding-relevant follow-up dialogue units (r = .434, p = .056) but not explanation-relevant follow-up dialogue units. These differentiated learning outcomes suggest that the source of the Tutees' learning might be different for the two types of instructional moves. We offer the interpretation that the Tutees learned from co-construction when the Tutor scaffolded them, and from their own self-construction when the Tutor explained to them. That is, when the Tutor explained, the Tutees might have learned from constructing their own relevant substantive responses (and not from receiving the explanations, since there was no correlation between the Tutor's explanations and Tutee learning; see Table 3 again), whereas when the Tutor scaffolded, the Tutees might have learned from jointly constructing or co-constructing with the Tutor, in the general sense of building on and/or extending (Tao & Gunstone, 1999) what the Tutor said.
Table 4. Correlation of the number of Tutor-Tutee relevant substantive interactive dialogue with Tutees' and Observers' deep learning
| Dialogue unit type | Mean no. per session | % | Correlation with Tutees' deep learning | Correlation with Observers' deep learning |
|---|---|---|---|---|
| Tutor scaffolding followed by Tutees' relevant substantive responses | 59 | 60 | r = .656, p = .039 | r = .434, p = .056 |
| Tutor explaining followed by Tutees' relevant substantive responses | 40 | 40 | Trend r = .576, p = .082 | N.S. |
| Total | 99 | 100 | | |
Why would the Tutor's scaffolding enhance joint construction more than explaining? Note that a scaffolding tends to be a question, a prompt, or a hint: a move that is brief and expects a follow-up response, whereas an explanation tends to consist of longer, more didactic assertions that do not necessarily expect a response (see the two previous protocol examples). Therefore, by its very nature of being short and anticipatory, a scaffolding move is easier to understand and to build on than an explaining move. In short, one interpretation is that the short and anticipatory nature of scaffolding invites joint construction.
One way to test our interpretation that scaffolding-relevant follow-up dialogue units are jointly constructed more so than explanation-relevant follow-ups is to analyze their coherence. Since joint construction involves building on and extending each other's utterances, one would expect jointly constructed dialogues to be more coherent than non-jointly constructed dialogues. To test this coherence hypothesis, we compared the cohesiveness of scaffolding-relevant follow-up and explanation-relevant follow-up dialogue units using Coh-Metrix, a computer tool developed for analyzing the cohesion, language characteristics, and readability of texts (McNamara, Louwerse, Cai, & Graesser, 2005). This analysis revealed that local cohesion, based on adjacent sentences, was significantly higher for scaffolding-relevant follow-up dialogue units (M = 0.28 on the LSA local cohesion measure) than for explanation-relevant follow-up dialogue units (M = 0.19; F[1,9] = 5.685, p = .041; d = 1.078). In addition, there was a strong trend for global cohesion, computed across all sentences, to be higher for scaffolding-relevant follow-up dialogue units (M = 0.26 on the LSA global cohesion measure) than for explanation-relevant follow-up dialogues (M = 0.16; F[1,9] = 4.491, p = .063; d = 1.059).
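Coh-Metrix derives these cohesion scores from LSA vectors; as a rough stand-in for the idea, one can score cohesion with bag-of-words cosine similarity, averaging over adjacent sentence pairs for local cohesion and over all sentence pairs for global cohesion. The sentences below are invented, and this simplification is ours, not Coh-Metrix's actual method:

```python
from collections import Counter
from itertools import combinations
from math import sqrt

def cosine(a, b):
    """Cosine similarity between two bag-of-words Counters."""
    dot = sum(a[w] * b[w] for w in a)
    na = sqrt(sum(v * v for v in a.values()))
    nb = sqrt(sum(v * v for v in b.values()))
    return dot / (na * nb) if na and nb else 0.0

def bow(sentence):
    return Counter(sentence.lower().split())

def local_cohesion(sentences):
    """Mean similarity of adjacent sentence pairs."""
    vecs = [bow(s) for s in sentences]
    return sum(cosine(u, v) for u, v in zip(vecs, vecs[1:])) / (len(vecs) - 1)

def global_cohesion(sentences):
    """Mean similarity over all sentence pairs."""
    vecs = [bow(s) for s in sentences]
    pairs = list(combinations(vecs, 2))
    return sum(cosine(u, v) for u, v in pairs) / len(pairs)

# Invented three-sentence dialogue fragment.
dialogue = [
    "the normal force acts on the block",
    "the normal force balances gravity on the block",
    "so the net vertical force is zero",
]
local_c = local_cohesion(dialogue)
global_c = global_cohesion(dialogue)
```

A dialogue whose turns build directly on one another scores higher on both measures than one whose turns share little vocabulary.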
In short, the fact that scaffolding-relevant response units are more coherent than explanation-relevant response units supports our interpretation that scaffolding-relevant response units were more likely to be jointly constructed. Thus, the Tutees' learning when responding to the Tutor's scaffolding may arise from joint construction, supporting the interaction hypothesis, whereas the Tutees' learning when responding to the Tutor's explanation may arise from self-construction, supporting the student-centered hypothesis.
Why might overhearing scaffolding-relevant follow-up dialogues also be better for the Observers' learning than overhearing explanation-relevant follow-up dialogue units (see Table 4 again)? The same interpretation can be applied here as well: scaffolding-relevant follow-up dialogues tend to be more easily understood by the Observers because they are shorter and more coherent than explanation-relevant follow-up dialogues. We tested whether scaffolding-relevant response units were in fact shorter than explanation-relevant response units by taking the middle 20% of each of the tutoring protocols and calculating the number of words produced. Scaffolding-relevant response dialogue units averaged 30 words, whereas explanation-relevant response dialogue units averaged 66 words. Our interpretation that shorter dialogue units are more comprehensible to an observer is compatible with some results of VanLehn et al. (2003); they also found that shorter learning opportunities were associated with more frequent gains.
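The word-count comparison can be sketched as follows, assuming a protocol represented as an ordered list of segments (the placeholder segments here are invented):

```python
def middle_20_percent(segments):
    """Return the middle 20% of an ordered list of transcript segments."""
    n = len(segments)
    start = int(n * 0.40)
    end = int(n * 0.60)
    return segments[start:end]

def mean_words(units):
    """Average word count across dialogue units."""
    return sum(len(u.split()) for u in units) / len(units)

# Invented 20-segment protocol; each segment has 5 words.
protocol = ["seg %d word word word" % i for i in range(20)]
middle = middle_20_percent(protocol)   # segments at indices 8..11
avg = mean_words(middle)
```

Running this separately over scaffolding-relevant and explanation-relevant units would yield the two average lengths compared above.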
4.2.2.2. Tutor's feedback to tutees' errors
We reported in Table 3 (in the Feedback row) that there was an overall negative correlation between the Tutor's feedback and the Tutees' learning. To understand this puzzling result, it may be more meaningful to examine the effect of feedback in an interactive way, such as by examining only the feedback that followed an error a tutee made. We restrict the analysis this way because the utility of positive feedback, which is usually given on a correct step, is more difficult to predict. The frequency of positive feedback may also be a function of a tutor's style (such as giving more positive feedback for motivational purposes; Lepper et al., 1991). Moreover, Tutees may not learn as much from being told that their actions were correct. Thus, it seems more informative to analyze the effect of the Tutor's negative feedback to errors only.
Although a majority of the studies in the tutoring literature discuss feedback in terms of negative feedback, the choice of giving negative feedback is not at the discretion of the tutor, since it obviously depends on whether an error was made in the first place. However, the Tutor does have control over whether the negative feedback contains only the correct answer, or whether it also includes elaborations and justifications. In short, feedback to errors can take one of three forms. Besides giving negative feedback saying that the response is incorrect or "No," the Tutor has the additional option of giving corrective feedback, in which the Tutor basically gives the correct answer, such as:
Tutee: [Tutee writes * g] Times gravity.//
Tutor: Times acceleration due to gravity.//
Don't say gravity.//
On the other hand, a tutor can also give elaborative feedback to an error, such as:
Tutee: FN would be//
would FN be mass of A plus mass of B? Or?//
Tutor: Again you—a force cannot be mass.//
These are two distinct quantities.//
Examining these three forms of feedback (negative, corrective, and elaborative) corresponds to analyzing interactive dialogue units consisting of an error followed by a feedback segment.
Table 5 (1st column) shows the average number of feedback-to-error dialogue units for the Good and the Poor Tutees. It is not surprising that there are almost twice as many feedback-to-error units for the Poor Tutees since, as mentioned earlier, they committed more errors during tutoring (M = 89) than the Good Tutees (M = 56). Given that errors tend to elicit feedback, the contrast in the frequency of feedback to errors between the Good and the Poor Tutees makes sense.
Table 5. The average frequency per session, correlation, and distribution of Tutor feedback to Good and Poor Tutees' errors

| | Feedback-to-error units | Correlation with Tutees' learning | Correlation with Observers' learning | Negative | Corrective | Elaborative |
|---|---|---|---|---|---|---|
| Good Tutees | 43 | N.S. | N.S. | 15 (34%) | 19 (43%) | 10 (23%) |
| Poor Tutees | 80 | r = −.882, p = .048 | r = −.835, p = .003 | 26 (32%) | 41 (52%) | 13 (16%) |
The contrastive approach clarifies a possible reason for the puzzling overall marginal negative correlation between the Tutor's feedback and the Tutees' learning in Table 3. When the Tutor's feedback to errors was correlated separately for the Good and the Poor Tutees, the overall marginal negative correlation became an even stronger negative correlation for the Poor Tutees only (r = −.882, p = .048; see column 2, Table 5). This suggests that the detrimental effect of the Tutor's feedback affected only the Poor Tutees, whereas the Tutor's feedback to errors had no effect on the Good Tutees' learning.
A similar pattern of a strong negative correlation between the Tutor's feedback to errors and learning occurred for the Collaborative Observers as well (r = −.835, p = .003; see 3rd column in Table 5). That is, the Observers' learning suffered when they observed the Poor Tapes, as a function of the frequency of feedback to errors.
The interpretation we offer is the following. Feedback to errors has no effect on the Good Tutees perhaps because they can learn even without feedback; that is, they can ignore the feedback. The Poor Tutees, on the other hand, could not benefit from the Tutor's feedback to their errors, perhaps because they could not make sense of the feedback, so that the more feedback they received to their errors, the more confused they were (and thus the less they learned); such confusion might also have affected the learning of the Observers who watched their tapes. Recall that we reported earlier that the Poor Tutees overall expressed confusion twice as frequently as the Good Tutees. Thus, the Poor Tutees had difficulty making sense of the Tutor's feedback.
But why might the Poor Tutees have difficulty making sense of the Tutor's feedback? One possible reason is that the feedback they received was less informative. For example, corrective feedback is less informative because it gives only the correct answer without the further justifications found in elaborative feedback. Table 5 (the last 3 columns) shows the distribution of the three types of feedback (negative, corrective, elaborative) to errors. Although the overall distribution is similar for the Good and the Poor Tutees (both groups received the lowest proportion of elaborative feedback and the highest proportion of corrective feedback), the Poor Tutees received proportionately more corrective than elaborative feedback (52% vs. 16%) compared with the Good Tutees (43% vs. 23%). The contrast between this corrective-elaborative difference for the Poor Tutees (36%) and for the Good Tutees (20%) was significant (F[1,8] = 5.188, p = .052). In other words, the Poor Tutees received significantly more corrective feedback than elaborative feedback (F[1,4] = 32.106, p = .005), whereas there was no significant difference between the two types of feedback received by the Good Tutees. Because corrective feedback is less elaborated and contains no justifications, the Poor Tutees may have found it more difficult to make sense of corrective feedback, the predominant kind of negative feedback that they received. Their failure to make sense of the corrective feedback in turn affected how well the Observers could learn from overhearing them.
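The percentage-point contrast reported above can be checked directly from the Table 5 distributions:

```python
# Percentages of each feedback type, taken from Table 5.
good = {"negative": 34, "corrective": 43, "elaborative": 23}
poor = {"negative": 32, "corrective": 52, "elaborative": 16}

def corrective_gap(dist):
    """Percentage-point gap between corrective and elaborative feedback."""
    return dist["corrective"] - dist["elaborative"]

good_gap = corrective_gap(good)   # 43 - 23 = 20 points
poor_gap = corrective_gap(poor)   # 52 - 16 = 36 points
```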
Thus, the earlier analyses that looked at the Tutor's feedback only as independent moves masked much stronger correlational effects, which emerged when we analyzed feedback in an interactive and contrastive way. Basically, a tutor's feedback to errors seems harmless to Good Tutees and their Observers but detrimental to Poor Tutees and their Observers. This suggests that the critical factor is not feedback per se, but what kind of feedback a tutor gives and whether or not tutees can assimilate, understand, and use that feedback, thus supporting the interaction hypothesis. As reported earlier, the Poor Tutees expressed more confusion than the Good Tutees, perhaps because the feedback they received was less elaborative. Overhearing the Poor Tutees' feedback-to-error dialogue units must have had a detrimental effect on the Observers' learning as well.
4.2.2.3. Summary of dialogue-level analyses
The first set of analyses showed that the most effective form of dialogue unit, in terms of both the Tutees' and the Observers' learning, is a scaffolding followed by a relevant substantive Tutee response (see the last 2 columns in Table 4). We surmise that the Tutees learned from these units because they could jointly construct meaningful follow-up responses to the Tutor's scaffoldings, but less so to the Tutor's explanations. We assumed that jointly constructed dialogues would be shorter and more coherent, and scaffolding-relevant response dialogue units did turn out to be more coherent than explanation-relevant response dialogue units, based on the Coh-Metrix analysis. The finding that the Observers also learned only when they overheard scaffolding-relevant response dialogue units is consistent with this coherence interpretation. Additionally, scaffolding-relevant response units may be more understandable than explanation-relevant response units because they tend to be shorter, as confirmed by the word count analysis. These findings provide evidence in support of the interaction hypothesis.
The second set of analyses examined feedback to errors. We found that feedback to errors was detrimental to the Poor Tutees and the Observers of their tapes, but not to the Good Tutees and the Observers of their tapes. The interpretation we offered was that the Poor Tutees needed the feedback yet possibly could not make sense of it, since the Tutor's feedback to them was more often of the corrective kind than the elaborative kind. Corrective feedback basically gives only the right answer, whereas elaborative feedback gives the justification as well. In short, Tutees' learning is a function both of whether or not they can make sense of a tutor's feedback and of whether a tutor gives them more elaborative feedback, again supporting the interaction hypothesis. Thus, the differential learning gains of the Good versus the Poor Tutees as a function of feedback to errors further underscore the importance of the role of the tutees, in being able to make sense of the feedback, and not merely the role of a tutor, in terms of whether the right kind of feedback was given.
4.2.3. Episode analyses
In the prior dialogue analyses, we inferred that scaffolding-relevant follow-up units were jointly constructed because they were more coherent. However, we can directly code dialogue units as either jointly constructed or independently constructed by looking at a larger grain size. This would allow us to test the interaction hypothesis more directly. Accordingly, another pass at coding the protocols was undertaken at a larger episode-level grain size.
Segments in the tutoring protocols were combined into episodes. An episode is usually a multi-turn dialogue unit bounded by utterances whose content is relevant to a specific solution node (as shown in Fig. 1). Appendix B illustrates several episodes. For example, Episode III is relevant to Node 2.2.2 in Fig. 1. The appendix in its entirety can be found at http://www.cogsci.rpi.edu/CSJarchive/Supplemental/Index.html.
4.2.3.1. Joint and independent coverage of all nodes
For each episode, we differentiated whether the substantive contributions were initiated and covered by the Tutor alone (as in Episode II), the Tutees alone (as in Episode III), or jointly by both the Tutor and the Tutees (as in Episodes IV, V, see Appendix B).
Table 6 shows that a majority of the episodes (55 per tutoring session) were jointly covered by the Tutor and the Tutee, followed by 32 episodes covered independently by the Tutor and 16 independently covered by the Tutees. If we assume that joint coverage involves more scaffolding and independent Tutor coverage involves more explaining, then this difference between the frequency of joint coverage and independent Tutor coverage mirrors the results of greater frequency of Tutor scaffolding than explaining (see Table 3 again).
Table 6. Frequency and correlations for all node episodes with Tutees' and Observers' deep learning

| Episode coverage | Frequency per session | Correlation with Tutees' learning | Correlation with Observers' learning |
|---|---|---|---|
| Tutor and Tutee (joint) | 55 | r = 0.646, p = .043 | r = 0.457, p = .043 |
| Tutees alone | 16 | r = 0.637, p = .047 | Trend: r = 0.418, p = .067 |
| Tutor alone | 32 | N.S. | N.S. |
If interacting with the Tutor facilitates learning, then there should be a significant correlation between the frequency of joint coverage and the Tutees' learning. Table 6 shows that the Tutees indeed learned when they jointly covered a node with the Tutor (r = 0.646, p = .043), thereby supporting the interaction hypothesis. Moreover, the Tutees also learned when they covered nodes independently (r = 0.637, p = .047); independent coverage obviously required them to be constructive, thereby leading to learning, thus supporting the student-centered hypothesis. The significant correlation for the Tutees' independent coverage of nodes replicates the significant correlation for the Tutees' substantive moves at the segment level (Table 3). Thus, analyses at two different grain sizes produce the same pattern of results. Finally, the Tutees did not learn when the Tutor independently covered a node (Table 6), just as they did not learn when the Tutor's scaffolding and explaining moves were considered independently (Table 3), thus weakening the tutor-centered hypothesis.
Can the Observers' learning further confirm the interaction and the student-centered hypotheses, as well as any of our prior interpretations? The Observers likewise learned from overhearing joint coverage of nodes (r = 0.457, p = .043) but not from the Tutor's independent coverage of nodes. Again, if we assume that joint coverage involves more scaffolding and independent Tutor coverage involves more explaining, then the same interpretation offered for the dialogue-level results can be applied here as well; that is, the Observers learn from overhearing joint coverage because joint coverage episodes, containing scaffolding-response dialogues, are shorter and more coherent, whereas the Tutor's independent coverage may be longer and less coherent. The Collaborative Observers also benefited somewhat from overhearing the Tutees' independent coverage of a node (a trend). Their weaker learning from overhearing the Tutees' independent coverage (as compared to joint coverage) reinforces the interpretation that a Tutee's constructions often serve his or her own purposes and may be less comprehensible to others (Chi, 2000), consistent with the lack of correlation between the Observers' learning and the Tutees' irrelevant substantive responses (see Table 3 again).
The contrast between the Observers' learning from overhearing the Tutees' independent coverage but not from overhearing the Tutor's independent coverage is related to some findings in the literature with respect to learning from an expert versus a peer. For example, Hinds, Patterson, and Pfeffer (2001) have found that learners performed better when instructed by novices than by experts in an electronic wiring task. Likewise, Cho, Schunn, and Charney (2006) found that students are far more able to incorporate feedback from their peers than from their instructor in a writing task. These findings, along with the result here of both the Tutees' and the Observers' failure to learn from the Tutor's independent coverage, are consistent with the finding that Tutees also do not learn from receiving the Tutor's explanations (as shown by a lack of correlation in Table 3).
4.2.3.2. Summary of episode analyses
In sum, the analyses at the episode level provide further evidence supporting both the student-centered and the interaction hypotheses, but no evidence supporting the tutor-centered hypothesis. The correlation of learning with joint coverage of problem-solving nodes supports the interaction hypothesis, while the correlation of learning with the Tutees' independent coverage of nodes supports the student-centered hypothesis. The lack of any correlation of learning with the Tutor's independent coverage further undermines the tutor-centered hypothesis.