Calling Iranian learners of L2 English: effect of gloss type on lexical retention and reading performance under different learning conditions
Abstract
This study sought to compare how three different gloss types (text–picture, text–audio and text–picture–audio) affected English as a foreign language (EFL) learners' reading comprehension and vocabulary acquisition. The study also compared how results on comprehension and vocabulary acquisition differed across three learning conditions (i.e., incidental, intentional and explicit instruction). A between‐groups design was employed with four groups (N = 135) of Iranian university learners of L2 English. The participants (with upper‐intermediate proficiency level) read English texts. Written recall and multiple‐choice questions were used to measure reading comprehension; vocabulary knowledge scale (VKS) and contextualized vocabulary knowledge test (CVKT) were used to assess vocabulary acquisition. Results of statistical analyses revealed that while the text–picture–audio gloss type consistently resulted in better vocabulary learning and reading comprehension, the learning conditions varied in terms of their immediate and delayed effect on vocabulary and reading scores. This study suggests that learner performances across gloss types are condition specific and provides both pedagogical and theoretical implications.
What is already known about this topic
- Electronic glosses foster reading comprehension and vocabulary acquisition.
- Positions differ on the effectiveness of form‐focused instruction in grammar, with the focus‐on‐forms approach being more widely accepted in SLA. However, this issue has rarely been researched in vocabulary acquisition.
What this paper adds
- This study supports the complementary nature of dual annotations in vocabulary learning and reading comprehension.
- This study extends the issue of form focused instruction to vocabulary learning by comparing the incidental, intentional and incidental–intentional learning orientations.
- This study evaluates the interaction between the multiple gloss types and the learning orientations.
Implications for theory, policy or practice
- This study provides both pedagogical and theoretical implications.
Introduction
Vocabulary development is a significant component of the reading process and promotes reading comprehension (Hall, Greenberg, Laures‐Gore, & Pae, 2014; Palmer, Boon, & Spencer, 2014; Simmons et al., 2010); however, vocabulary is not easily learned by all learners, particularly English as a foreign language (EFL) learners (Begler, Hunt, & Kite, 2012; Ebbers & Denton, 2008; Huang, Chern, & Lin, 2009). In most cases, second language (L2) reading constitutes the primary and most common source that learners use to learn on their own beyond the classroom. Language learners, especially those with lower levels of proficiency, are less likely to engage in independent reading activities (Becker, McElvany, & Kortenbruck, 2010; De Naeghel & Van Keer, 2013) and often do not have adequate skills and strategies to autonomously infer the meaning of new words simply through reading (Bryant, Goodwin, Bryant, & Higgins, 2003), which in turn impedes their vocabulary development (Roberts, Torgeson, Boardman, & Scammacca, 2008). To complicate matters further, the large number of new words in textbooks often exceeds the learners' reading level (De Naeghel, Van Keer, Vansteenkiste, & Rosseel, 2012; Silva, Verhoeven, & van Leeuwe, 2011), thus widening the existing gap in vocabulary development between struggling readers and their more proficient peers (Stanovich, 1986).
Applied linguists' recognition that language learners cannot succeed in language learning through entirely meaning‐centred instruction has led them to suggest that learners can benefit from form‐focused instruction (FFI). Focus on form (FonF) and focus on forms (FonFs) were introduced as the two main types of FFI. Whereas FonF attends to linguistic features that arise within a communicative task (Ellis, 2001; Long, 1991), FonFs refers to teaching discrete language components in separate lessons in an order put forward by syllabus designers (Laufer, 2006). Although this distinction has been discussed with reference to the acquisition of grammatical structures, these approaches can be adjusted easily to the context of vocabulary learning and instruction (Laufer, 2006). Laufer (2006), in accordance with Ellis's (2001) view of grammar, extends the FFI distinction to vocabulary learning by stating that FonF considers the noticed words as ‘tools for task completion’ and that FonFs ‘treats the words attended to as the objects of study’ (p. 150). Therefore, FonF emphasizes vocabulary in a communicative task environment because completing a communicative task requires comprehension of the lexical items. In contrast, FonFs teaches and uses words in non‐communicative language tasks.
Vocabulary research has not attended much to form‐focused instruction. One reason for this lack of attention may be some scholars' persistent belief in the ‘default hypothesis’ of vocabulary learning, which considers vocabulary learning as originating mainly from reading. There is, however, robust empirical evidence indicating that only a small number of L2 words can be recalled from exposure to texts without any subsequent vocabulary practice (Laufer, 2006; Loewen, 2003). Given that a meaning‐focused approach does not necessarily bring about satisfactory vocabulary development, vocabulary teaching should also integrate an FFI component. The present study was designed to examine the explicit instruction (with a FonFs emphasis), incidental (focus on meaning approach) and intentional (planned FonF approach) learning conditions with the use of electronic glosses in computer‐assisted vocabulary learning and reading comprehension.
Literature review
Intentional and incidental learning conditions
Barcroft (2009) defines the incidental learning condition as learners' learning of new words from context without intending to do so. According to Hulstijn (2003), the most general meaning of incidental learning is couched in terms of vocabulary learning through reading and/or listening. Hulstijn argues that this meaning‐based approach ‘focuses on the meaning of words and texts through reading, rather than through the conscious, intentional memorization of lists of word forms and their meanings’ (p. 358). In intentional vocabulary learning, according to Schmitt (2008), the specific purpose is instead to learn vocabulary, usually through a deliberate focus, which is believed to be the most effective and fastest approach to promoting retention and mastery.
There is compelling evidence that much of children's vocabulary acquisition in their first language takes place incidentally based on repeated exposure over time (Nagy & Scott, 2000; Suggate, Lenhard, Neudecker, & Schneider, 2013). However, this gradual approach might not address the comprehensive vocabulary needs of second and foreign language learners because of a lack of sufficient input exposure (Nation, 2013) and the slow nature of this process (Paribakht & Wesche, 1997). In response to this drawback, researchers attempted to find more appropriate ways of accelerating the process of word learning, while not forfeiting the meaning‐focused input from which incidental learning can take place.
Laufer (2005) compares several studies that fostered an explicit focus on vocabulary learning. Activities in which words were treated as the objects of learning and were related to, although not embedded in, meaning‐based tasks brought about 33–86% word gain, and those that asked learners to work with decontextualized words (not related to any meaning‐based tasks) led the learners to remember 13–99% of the words. These ranges reflect differences in design and study elements, but they compare highly favourably with the findings obtained from incidental learning. Xie (2013), using classroom observation, audio‐taping, video‐taping and stimulated reflection of four English teachers' vocabulary teaching practices, concluded that the EFL teachers showed a heavy reliance on the use of explicit vocabulary definition in preference to examples without definition during reading activities. Xie (2013), however, suggests that teachers use activities that draw attention to form while maintaining meaningful communication. Maynard, Pullen, and Coyne (2010) compared the effectiveness of ‘rich instruction’, which aimed to directly instruct the meanings of the target lexical items within the context of the story reading, ‘basic instruction’, which presented simple definitions of the encountered lexical items to the learners, and ‘incidental instruction’, in which the target lexical items were included in the story but were not subject to any teaching or definition. Results demonstrated both short‐term and long‐term gains for rich instruction over both basic and incidental instruction.
The best interpretation of these studies is probably that although the intentional condition is relatively more conducive to EFL vocabulary learning, it poses some unavoidable difficulties for teachers and material developers such as ‘teaching all the contextual types of word knowledge’ (Schmitt, 2008, p. 353). Schmitt goes on to recommend the promotion of meaning‐focused incidental exposure as an equal complement to intentional consolidation and enhancement of word knowledge. This stance is also supported by Ortega's (2009) argument that the task of reading comprehension inherently directs learners' attention to the words, thus blurring the distinction between incidental and intentional conditions.
These arguments highlight that vocabulary learning cannot depend solely on implicit incidental learning but needs to be controlled. According to Takac (2008), explicit vocabulary teaching would ensure that ‘lexical development in the target language follows a systematic and logical path, thus avoiding uncontrolled accumulation of sporadic lexical items’ (p. 18). However, the role and contribution of explicit vocabulary teaching are still controversial because some believe that learning is not as linear and systematic as the teaching of it (Lewis, 2000). The current approaches to vocabulary instruction, then, acknowledge the significance of both implicit and explicit approaches. In the present study, intentional learning involved encouraging learners to deliberately attend to the glossed lexical items during reading. This learning condition was distinguished from the explicit instruction condition, where the focus was taken a step further by devoting time to connecting L2 lexical items with their L1 equivalents before reading, and to reviewing and consolidating the presented words during the reading activity.
In sum, intentional learning needs to complement meaning‐focused incidental exposure, especially in EFL contexts where the input conditions of the first language cannot be recreated. One way to help learners take better advantage of the available exposure is glossing, that is, supplementary lexical information incorporated in the reading text.
Glosses
Jung (2016) defines a gloss as the ‘information provided about an unfamiliar linguistic item in the form of a definition, synonym, or translation in order to reduce the linguistic obscurity, and in so doing, assist reading comprehension’ (p. 93).
Glossing presents learners with supplementary information to help them compensate for the lack of adequate contextual clues in acquiring new words while reading (Ko, 2012). L2 reading texts have been glossed in different ways in past studies, including L1 versus L2 glosses (e.g., Ko, 2012; Taylor, 2006; Yoshii, 2006), electronic versus hardcopy glosses (e.g., Bowles, 2004; Lee & Lee, 2015), textual versus hypermedia glosses (e.g., Chun & Plass, 1996; Nikolova, 2002) and single versus multiple glosses (Rott, 2005; Watanabe, 1997), among others. The studies also vary in the degree of elaboration, with some providing a simple definition or synonym of the lexical item (e.g., Guidi, 2009; Hulstijn, 1992), some a definition and an example sentence (e.g., Hulstijn & Laufer, 2001) and others a text and a related picture or video clip (e.g., Al‐Seghayer, 2001; Chun & Plass, 1996a; Jones, 2003), to mention a few. These different methodological approaches to glossing have led to inconclusive results in the existing literature as to the role of glosses in reading and vocabulary gains.
Below we present the theoretical framework supporting glossing followed by a review of the relevant empirical studies on the role of glossing in vocabulary acquisition and reading comprehension.
Generative theory of multimedia learning (GTML)
Generative theory of multimedia learning (GTML) supports the efficacy of dual channels in multimedia learning and provides a theoretical basis for understanding how individuals learn in multimedia contexts. GTML favours the learners' meaningful and purposeful involvement in both verbal and visual cognitive processing. The way multimedia functions is consistent with Paivio's (1990) dual coding theory, which holds that a mix of imagery and verbal information enhances information processing. Chun and Plass (1996) have additionally suggested that individuals recall pictures better than words and that words are recalled more effectively when they are accompanied by pictures.
Dual coding theory posits two subsystems for processing information, one for verbal data and one for visual materials (Paivio, 1990). Multimedia therefore exploits the full capacity of the human information‐processing system. Mayer (2001) clarifies the favourable aspects of multimedia learning both quantitatively and qualitatively: (1) presenting the same material in two channels is better than in one channel because the information is offered twice (redundancy principle); and (2) words and pictures supplement one another in facilitating mental representations of both visual and verbal information. Research results also corroborate the superiority of dual over single glosses in promoting incidental L2 vocabulary learning (Al‐Seghayer, 2001; Chun & Plass, 1996; Jones, 2003).
Studies on glossing in reading comprehension and vocabulary acquisition
The literature regarding the use of glossing and its effectiveness in reading comprehension continues to produce mixed results. Although some studies support the positive effect of glossing on reading comprehension (e.g., Abuseileek, 2008; Gettys, Imhof, & Kautz, 2001; Ko, 2012; Lomicka, 1998; Martinez‐Fernández, 2010; Nikolova, 2002; Sakar & Erçetin, 2005), others have shown that glossing can be ineffective (Bell & LeBlanc, 2000; Jung, 2016; Lee, Lee & Lee, 2015; Yanguas, 2009). Sakar and Erçetin (2005) demonstrated that glosses, particularly the audio and video glosses, had a negative influence on intermediate learners' comprehension scores. Similarly, Ariew and Erçetin (2004) found a negative impact of visual glossing on intermediate and advanced level learners' comprehension performance. The study by Chen and Yen (2013), on the other hand, indicated the positive role of glosses for both reading comprehension and vocabulary acquisition. This study evaluated textual gloss, glossary annotation and pop‐up gloss formats on the electronic reading performance and vocabulary learning of 83 non‐English‐major college learners in Taiwan. The results showed that, for reading comprehension, the best performance came from the pop‐up gloss group, while vocabulary performance benefitted equally well from all gloss types. This result supports Morrison's (2004) and Yao's (2006) studies, which found a superior effect of pop‐up glosses on reading comprehension in comparison to in‐text and marginal glosses. Finally, Tseng, Yeh, and Yang (2015) showed that marking vocabulary and adding L1 explanatory notes could scaffold learners' comprehension to reach recognition and understanding of the meaning of unknown words.
When it comes to the effect of glosses on vocabulary acquisition, however, the majority of previous research highlights a positive role for both hard copy glosses (Alessi & Dwyer, 2008; Hulstijn, Hollander, & Greidanus, 1996; Ko, 2012; Watanabe, 1997; Webb, 2007) and multimedia glosses (De Ridder, 2002; Gettys et al., 2001; Kim & Kim, 2012). Mohsen and Balakumar's (2011) meta‐analysis of the effect of multimedia glosses on vocabulary learning during the past seventeen years showed the facilitative contribution of multimedia glosses to vocabulary learning. More specifically, they demonstrated a significant superiority of multimedia glosses over traditional glosses in supporting L2 vocabulary retention. Multiple glosses have also been found to be more effective than a single annotation or no annotation in an L2 reading context.
The present study
Previous research on L2 word glossing, specifically in a multimedia environment, shows little consistency regarding how different multiple glosses, specifically text–picture, text–audio and text–picture–audio, influence L2 word meaning learning and reading comprehension. In addition, the scarcity of studies on audio glosses relative to other modalities (e.g., Sadeghi & Ahmadi, 2012) warrants further research in this area. Furthermore, the link between different learning conditions (including incidental, intentional, and explicit instruction) and vocabulary learning in multimedia environments has not been investigated.
Motivated by former research on multimedia glossing and existing gaps in the literature, the principal objective of this study is to reveal how different multiple gloss types influence L2 vocabulary meaning learning and reading comprehension in incidental, intentional and explicit instruction conditions. This study expands the application of multimedia learning through the generative theory of multimedia learning (Mayer, 1997, 2001) which supports the complementary nature of dual channels in facilitating the registration of both visual and verbal information. In light of gaps in the literature, the following research questions were explored:
- Do multiple gloss types (text–picture, text–audio and text–picture–audio) affect the receptive vocabulary learning scores on immediate and delayed vocabulary tests?
- Do multiple gloss types (text–picture, text–audio and text–picture–audio) affect L2 reading comprehension?
- Do learning conditions (incidental, intentional and explicit instruction) affect receptive vocabulary learning scores on immediate and delayed vocabulary tests?
- Do learning conditions (incidental, intentional and explicit instruction) affect reading comprehension?
- Is there an interaction between gloss types and learning conditions in immediate and delayed receptive vocabulary tests?
- Is there an interaction between gloss types and learning conditions in reading comprehension?
Participants
The participants in this study were 135 first‐year university EFL learners (in four groups) majoring in English Literature. Participants included both male (N = 59) and female (N = 75) learners with an age range of 18–21. The four classes were randomly assigned to incidental (N = 33), explicit instruction (N = 31), intentional (N = 35) and control (N = 36) groups. All the participants were taking a reading comprehension course conducted in a language lab. They all expressed familiarity and comfort using computers and reading English texts on the screen. Both the teacher and the participants gave consent to take part in this research.
Instrumentation
Reading book
The ‘Reading and Vocabulary Development 4: Concepts and Comments’ book (Ackert & Lee, 2005) at the upper‐intermediate level was the source from which the reading texts were adopted. The book, including authentic and updated content, is a reading skills text designed for learners of English as a second or foreign language. The book consists of five units each including four lessons about a particular topic. All the texts were assumed to be equal in terms of content and difficulty because the book is written for a specific level of proficiency (i.e., upper‐intermediate). However, readability indices were also calculated using the Flesch Reading Ease Formula (Flesch, 1948), and the results met the level of participants' reading ability.
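For readers unfamiliar with the readability index mentioned above, the Flesch Reading Ease score can be sketched as follows. The formula constants are those of Flesch (1948); the function name and the word, sentence and syllable counts are hypothetical and are not taken from the study's texts.

```python
def flesch_reading_ease(total_words: int, total_sentences: int,
                        total_syllables: int) -> float:
    """Flesch (1948) Reading Ease; higher scores indicate easier text."""
    if total_words <= 0 or total_sentences <= 0:
        raise ValueError("word and sentence counts must be positive")
    words_per_sentence = total_words / total_sentences
    syllables_per_word = total_syllables / total_words
    return 206.835 - 1.015 * words_per_sentence - 84.6 * syllables_per_word


# Hypothetical counts for a single passage (illustration only).
example_score = flesch_reading_ease(total_words=100,
                                    total_sentences=5,
                                    total_syllables=150)
print(round(example_score, 3))
```

Counting syllables automatically is itself an approximation, so in practice the counts would come from whichever readability tool is used.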
Vocabulary knowledge scale (VKS)
Paribakht and Wesche's (1997) VKS was used to measure the participants' immediate and delayed vocabulary knowledge progress. The VKS was adjusted in this study by excluding the last two levels of the scale, that is, levels IV and V, because the purpose of the present study was not to measure productive vocabulary knowledge. Each of the immediate VKS tests, administered every three sessions, consisted of 24 items (8 keywords from each lesson), and the delayed one included 100 items which were selected randomly from each lesson. Participants were allowed 30 seconds to respond to each item. VKS scores were assigned on a 0–2‐point scale adapted from Paribakht and Wesche (1997). Level I received a score of 0, indicating complete unfamiliarity with the item; level II received a score of 1, showing recognition of the word without knowledge of its meaning; and level III received a score of 2 if a correct synonym or translation was provided. The maximum possible score was 16 per test (2 points × 8 words). The Cronbach's alpha reliability coefficient of the VKS was 0.86, and the factor analysis accounted for 80.52% of its total variance.
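The adapted 0–2 scoring rule described above can be illustrated with a minimal sketch. The function name and sample responses are hypothetical. The fallback to 1 point for an incorrect level III answer is an assumption, as the text specifies only the score for a correct answer.

```python
def vks_item_score(reported_level: int, answer_correct: bool = False) -> int:
    """Score one item on the adapted 0-2 scale (levels I-III only)."""
    if reported_level not in (1, 2, 3):
        raise ValueError("the adapted VKS uses levels I-III only")
    if reported_level == 1:
        return 0  # Level I: the word is completely unfamiliar
    if reported_level == 2:
        return 1  # Level II: the word is recognised, meaning unknown
    # Level III: 2 points only for a correct synonym or translation.
    # ASSUMPTION: an incorrect level III answer falls back to 1 point.
    return 2 if answer_correct else 1


# Hypothetical responses: (self-reported level, was the answer correct?)
responses = [(1, False), (2, False), (3, True), (3, False)]
total_score = sum(vks_item_score(level, ok) for level, ok in responses)
print(total_score)
```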
Contextualized vocabulary knowledge test (CVKT)
The CVKT, which presents the target words in their original text, was administered to the participants to measure their receptive vocabulary knowledge in context rather than in isolation. Each of the immediate CVKTs, administered every three sessions, included 24 items (eight keywords from each lesson), and the delayed one had 100 items; participants were asked to choose the one correct answer from among four choices within 30 seconds for each item. The CVKT was scored dichotomously: a correct answer received a score of 1 and an incorrect response received 0. The maximum score was 8 per test (1 point × 8 words). The reliability index of the CVKT was estimated by means of Cronbach's alpha (α = 0.71), and its validity accounted for 84.36% of the total variance. The VKS and CVKT post‐tests differed from the pre‐tests in presenting the finalized lexical items in a different order.
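As an illustration of the reliability index reported for the vocabulary tests, Cronbach's alpha can be computed from a participants‐by‐items score matrix as sketched below. The function name and the small binary data set are hypothetical and serve only to show the calculation; they are not the study's data.

```python
from statistics import variance


def cronbach_alpha(scores: list[list[float]]) -> float:
    """Cronbach's alpha; scores[p][i] is participant p's score on item i."""
    n_items = len(scores[0])
    if n_items < 2 or len(scores) < 2:
        raise ValueError("need at least two items and two participants")
    # Sample variance of each item across participants.
    item_vars = [variance([row[i] for row in scores]) for i in range(n_items)]
    # Sample variance of the participants' total scores.
    total_var = variance([sum(row) for row in scores])
    return n_items / (n_items - 1) * (1 - sum(item_vars) / total_var)


# Hypothetical binary (0/1) responses: four participants, three items.
toy_scores = [
    [1, 1, 1],
    [1, 1, 0],
    [0, 0, 0],
    [1, 0, 0],
]
alpha = cronbach_alpha(toy_scores)
print(round(alpha, 2))
```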
Reading comprehension test
Multiple‐choice and written recall questions were used to evaluate participants' comprehension of the practiced texts during the treatment. Here, too, the participants selected one correct answer from among the choices or provided short answers to the written recall questions within one hour. The reading test was administered after every three sessions of treatment and included a total of 22 multiple‐choice and three written recall questions. In the written recall test, participants were required to write in English about the text that they had read; however, poor English was ignored because the goal was solely to gain a measure of learners' comprehension. Learners' responses to multiple‐choice questions were scored as either correct (1 point) or incorrect (0 points). The answers to the written recall test were analysed according to the ‘main idea units’ recalled correctly for each text, and a comparison was made between these units and the original text to ascertain their occurrence or inference from the text. The multiple‐choice task was found to be reliable (α = .80) and valid (accounting for 79.17% of the variation). Moreover, the inter‐rater reliability of the written recall task using Cohen's kappa (κ = .76, p < .001) indicated high agreement between the two raters.
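The inter‐rater statistic reported above corrects raw agreement for the agreement expected by chance. A minimal sketch of Cohen's kappa for two raters follows. The function name and the ratings (1 = idea unit judged as recalled, 0 = not recalled) are hypothetical and are not the study's data.

```python
from collections import Counter


def cohens_kappa(rater_a: list, rater_b: list) -> float:
    """Cohen's kappa for two raters who judged the same units."""
    if len(rater_a) != len(rater_b) or not rater_a:
        raise ValueError("both raters must judge the same non-empty set")
    n = len(rater_a)
    # Observed proportion of units on which the raters agree.
    observed = sum(a == b for a, b in zip(rater_a, rater_b)) / n
    # Chance agreement from each rater's marginal category frequencies.
    counts_a, counts_b = Counter(rater_a), Counter(rater_b)
    expected = sum(counts_a[c] * counts_b[c] for c in counts_a) / n ** 2
    if expected == 1:
        raise ValueError("kappa is undefined when chance agreement is 1")
    return (observed - expected) / (1 - expected)


# Hypothetical judgements on ten idea units.
rater_1 = [1, 1, 0, 0, 1, 0, 1, 1, 0, 0]
rater_2 = [1, 1, 0, 0, 1, 0, 1, 0, 0, 1]
kappa = cohens_kappa(rater_1, rater_2)
print(round(kappa, 2))
```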
Programme
The Foreign Language Annotator (FLAn) Programme (Thibeault, 2014) was used to present unfamiliar words by means of hypermedia links in three formats of text–picture, text–audio and text–picture–audio. The pictures were selected from the Internet and were all colour pictures. The programme was available without any Internet connection requirement (see Figure 1).

Procedure
To ensure ecological validity, the experiment was conducted during the learners' regular course, and the target words and the contexts were drawn from the existing learning materials. The criteria for the selection of target words were their unfamiliarity to the learners (based on pre‐test results) and their status as keywords that appeared only once in the text. Prior to the treatment, the learners were tested using the VKS, CVKT and TOEFL on their knowledge of the target items and level of proficiency.
The four intact classes were randomly assigned to three experimental groups and one control group. The experimental classes differed from one another in learning condition (i.e., incidental, intentional and explicit instruction). Incidental learning was operationalized in the present study as the absence of explicit instructions to learn the target vocabulary items. Although participants in the incidental condition were not instructed to attempt to learn target words, they were informed of the glosses, and they could use them if needed. Learners in the explicit instruction group, however, were provided with a list of target words with L1 translations prior to reading the text. They were instructed to rehearse and learn the words which would occur in the text. In the intentional learning condition, participants were asked to consult the glosses in reading and comprehending the presented text as well as in answering the subsequent reading comprehension questions.
This three‐way classification of learning conditions is based on the form‐focused instruction hypothesis put forward by Long (1991), according to which focus‐on‐forms (FonFs) instruction refers to the pre‐selection of specific features based on a linguistic syllabus and the intensive and systematic treatment of those features. This approach formed the basis of the explicit instruction condition in the present study. In contrast, focus on meaning instruction, i.e., incidental learning in the current study, excludes attention to the structural components of the language (Doughty & Williams, 1998) and devotes little or no time to the discrete elements of the language. In FonF instruction, attention to form arises out of the performance of meaning‐centred tasks. According to this definition, the intentional condition in the present study falls into the planned FonF instruction type, in which a focused and intensive task concentrated on the linguistic feature (i.e., unknown words) that was the target of the task. That is, although the participants' tasks were to read for comprehension and then to answer the comprehension questions, they were invited to pay special attention to glossed words in so doing. Learners in the explicit instruction group, however, spent more time practicing the target words than did the other learners because they attended to and practiced the glossed words (by being explicitly taught) before reading.
As for glossing, participants in each class were assigned to all gloss groups of text–picture, text–audio and text–picture–audio, in random order. A counterbalanced design was used so that each class started with a different, randomly assigned type of word gloss. The computers were preset to provide a different gloss type for every six sessions of treatment. Put differently, in each class based on a particular learning condition, except for the first session, which was devoted entirely to the pre‐tests, the remaining 18 sessions were divided into three periods of six sessions, one for each gloss type. Before the reading sessions, a tutorial was given to familiarize the participants with the reading tasks and the FLAn (Thibeault, 2014). After every three sessions of treatment out of 19 total sessions, the participants in both the treatment and control groups were given the post‐tests, including the VKS, CVKT and the reading comprehension test. The VKS was administered before the CVKT to control the testing effect resulting from the participants' recognition of the words. These tests were also used after three months to evaluate the delayed effect of the treatment. No participants withdrew from the study, and all the participants also completed the delayed post‐tests. Because the 3‐month interval coincided with the summer break, it is believed that the participants did not rehearse the target words. The detailed design of the study is depicted in Figure 2.
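The counterbalanced schedule described above can be sketched as a simple rotation. This is an illustration under stated assumptions rather than the study's actual assignment: the starting gloss for each class was assigned randomly in the study, whereas the fixed rotation, the variable names and the session numbering below are hypothetical.

```python
GLOSSES = ["text-picture", "text-audio", "text-picture-audio"]
CLASSES = ["incidental", "intentional", "explicit instruction"]
SESSIONS_PER_GLOSS = 6   # 18 treatment sessions split into three periods
TEST_EVERY = 3           # post-tests follow every third treatment session


def build_schedule() -> dict[str, list[str]]:
    """Map each experimental class to its gloss type per treatment session."""
    schedule = {}
    for shift, class_name in enumerate(CLASSES):
        # ASSUMPTION: a fixed cyclic rotation stands in for the random
        # assignment of starting gloss types used in the study.
        order = GLOSSES[shift:] + GLOSSES[:shift]
        schedule[class_name] = [
            gloss for gloss in order for _ in range(SESSIONS_PER_GLOSS)
        ]
    return schedule


schedule = build_schedule()
n_sessions = SESSIONS_PER_GLOSS * len(GLOSSES)
# Treatment sessions (numbered 1-18) that are followed by post-tests.
test_points = [s for s in range(1, n_sessions + 1) if s % TEST_EVERY == 0]

for class_name, sessions in schedule.items():
    print(class_name, sessions[0], sessions[6], sessions[12])
print(test_points)
```

With six treatment sessions per gloss type and post‐tests after every third session, each gloss type contributes two immediate test points per class.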

Results
To begin with, at the outset of the study, the homogeneity of all groups in terms of English language proficiency including reading capability of participants was confirmed (p = 0.07). The Kolmogorov–Smirnov test confirmed the normal distribution of the data (p > 0.05). A series of analysis of variance (ANOVA) tests were performed in order to provide answers to the research questions.
As a first step in the analysis of the data, the effect of gloss and condition type on the VKS taken immediately after the treatment was investigated. Table 1 presents the descriptive statistics.
| Conditions | Gloss | Mean | Std. deviation | N |
|---|---|---|---|---|
| Incidental | Text–picture | 94.04 | 31.66 | 66 |
| | Text–audio | 89.00 | 22.88 | 66 |
| | Text–picture–audio | 84.22 | 16.48 | 66 |
| | Total | 89.09 | 24.68 | 198 |
| Intentional | Text–picture–audio | 124.15 | 13.14 | 70 |
| | Text–audio | 93.88 | 14.54 | 70 |
| | Text–picture | 103.11 | 17.71 | 70 |
| | Total | 107.05 | 19.79 | 210 |
| Explicit instruction | Text–picture–audio | 104.82 | 17.29 | 62 |
| | Text–audio | 89.27 | 15.17 | 62 |
| | Text–picture | 98.79 | 20.45 | 62 |
| | Total | 97.62 | 18.80 | 186 |
| Control group | No‐gloss | 83.57 | 20.74 | 216 |
| | Total | 83.57 | 20.74 | 216 |
| Total | Text–picture–audio | 104.79 | 22.75 | 198 |
| | Text–audio | 90.81 | 17.98 | 198 |
| | Text–picture | 98.73 | 24.17 | 198 |
| | No‐gloss | 83.57 | 20.74 | 216 |
| | Total | 94.23 | 22.95 | 810 |
As shown in Table 1, the intentional condition achieved the highest VKS score (M = 107.05, SD = 19.79), followed by the explicit instruction (M = 97.62, SD = 18.80), incidental (M = 89.09, SD = 24.68) and control (M = 83.57, SD = 20.74) groups. Moreover, pooled across the experimental conditions, the text–picture–audio gloss format yielded the highest mean (M = 104.79, SD = 22.75).
A two‐way ANOVA was used to check whether these differences were significant. The ANOVA indicated that the differences were statistically significant both for the learning conditions, F(2, 800) = 41.78, p < .001, η² = .095, and the types of gloss, F(2, 800) = 23.64, p < .001, η² = .056. The condition by gloss interaction was also significant, F(4, 800) = 15.87, p < .001, η² = .074. Effect sizes for the learning conditions indicated the superiority of the intentional condition over the explicit instruction (p < .001, d = .48), incidental (p < .001, d = .80) and control (p < .001, d = 1.15) groups. Moreover, the explicit instruction group showed a higher performance compared to the incidental (p < .001, d = .38) and control (p < .001, d = .70) groups. The effect sizes for glosses showed the superiority of the text–picture–audio over text–audio (p < .001, d = .68) and text–picture (p = .01, d = .25), with the text–picture gloss outperforming the text–audio gloss (p < .001, d = .37). All the gloss types led to higher VKS scores than the no‐gloss condition (p < .001, d > .3). These differences are plotted in Figure 3.
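To make the pairwise effect sizes easier to interpret, Cohen's d can be computed from the descriptive statistics in Table 1. The sketch below assumes a pooled‐standard‐deviation formulation, which the text does not specify, so the function name and the choice of formula are assumptions. Under that assumption, the intentional versus incidental contrast lands close to the reported d = .80.

```python
from math import sqrt


def cohens_d(m1: float, sd1: float, n1: int,
             m2: float, sd2: float, n2: int) -> float:
    """Cohen's d using the pooled standard deviation of two groups."""
    pooled_var = ((n1 - 1) * sd1 ** 2 + (n2 - 1) * sd2 ** 2) / (n1 + n2 - 2)
    return (m1 - m2) / sqrt(pooled_var)


# Immediate VKS totals from Table 1: intentional versus incidental.
d_intentional_vs_incidental = cohens_d(107.05, 19.79, 210,
                                       89.09, 24.68, 198)
# Explicit instruction versus incidental.
d_explicit_vs_incidental = cohens_d(97.62, 18.80, 186,
                                    89.09, 24.68, 198)
print(round(d_intentional_vs_incidental, 2),
      round(d_explicit_vs_incidental, 2))
```

Small differences from the published values are to be expected, since the inputs are rounded means and standard deviations.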

In order to determine whether condition and gloss type had independent or interactive effects on the immediate CVKT performance, another two‐way ANOVA was conducted. The homogeneity of variance assumption of the ANOVA was maintained. Descriptive results of immediate CVKT are shown in Table 2.
| Conditions | Gloss | Mean | Std. deviation | N |
|---|---|---|---|---|
| Incidental | Text–picture | 50.07 | 11.41 | 66 |
| | Text–picture–audio | 46.36 | 10.72 | 66 |
| | Text–audio | 42.89 | 10.67 | 66 |
| | Total | 46.44 | 11.27 | 198 |
| Intentional | Text–picture–audio | 49.55 | 11.26 | 70 |
| | Text–picture | 43.07 | 9.87 | 70 |
| | Text–audio | 39.44 | 9.46 | 70 |
| | Total | 44.02 | 11.007 | 210 |
| Explicit instruction | Text–picture–audio | 44.58 | 11.13 | 62 |
| | Text–picture | 36.70 | 11.88 | 62 |
| | Text–audio | 26.87 | 8.28 | 62 |
| | Total | 36.05 | 12.76 | 186 |
| Control group | No‐gloss | 25.75 | 15.25 | 216 |
| | Total | 25.75 | 15.25 | 216 |
| Total | Text–picture–audio | 46.93 | 11.17 | 198 |
| | Text–picture | 43.41 | 12.24 | 198 |
| | Text–audio | 36.65 | 11.66 | 198 |
| | No‐gloss | 25.75 | 15.25 | 216 |
| | Total | 37.91 | 15.13 | 810 |
According to Table 2, the incidental (M = 46.44, SD = 11.27) and intentional (M = 44.02, SD = 11.007) conditions both gained high CVKT scores, outperforming the explicit instruction (M = 36.05, SD = 12.76) and control (M = 25.75, SD = 15.25) groups. Table 2 also points to the overall superiority of the text–picture–audio gloss type (M = 46.93, SD = 11.17). The results of the two‐way ANOVA for the immediate CVKT showed significant main effects for gloss, F(2, 800) = 38.51, p < .001, η² = .088, and condition, F(2, 800) = 39.10, p < .001, η² = .089. The interaction effect was also significant, F(4, 800) = 7.51, p < .001, η² = .036 (see Figure 4). Pairwise comparisons showed that the incidental (p < .001, d = .86) and intentional (p < .001, d = .66) conditions scored higher than the explicit instruction condition, whereas the difference between the incidental and intentional conditions was not significant (p = .17, d = .21). All the groups performed better than the control group (p < .001, d > .7). Moreover, the text–picture–audio gloss was significantly superior to the text–audio (p < .001, d = .90), text–picture (p = .01, d = .30) and no‐gloss (p < .001, d = 1.58) conditions, and the text–picture gloss performed better than the text–audio gloss (p < .001, d = .56).
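The d values in this paragraph can be approximated from the means, standard deviations and Ns in Table 2. The sketch below uses Cohen's d with a pooled standard deviation; this is an assumption, since the text does not say which variant of d was computed, and the function name `cohens_d` and the tuples are illustrative.

```python
from math import sqrt


def cohens_d(m1: float, sd1: float, n1: int, m2: float, sd2: float, n2: int) -> float:
    """Cohen's d for two independent groups, using the pooled standard deviation."""
    pooled_var = ((n1 - 1) * sd1**2 + (n2 - 1) * sd2**2) / (n1 + n2 - 2)
    return (m1 - m2) / sqrt(pooled_var)


# (mean, SD, N) taken from the "Total" rows of Table 2 (immediate CVKT).
incidental = (46.44, 11.27, 198)
explicit = (36.05, 12.76, 186)
text_picture_audio = (46.93, 11.17, 198)
text_audio = (36.65, 11.66, 198)

# Assumption: pooled-SD Cohen's d; the original analysis may have used another variant.
d_incidental_vs_explicit = cohens_d(*incidental, *explicit)
d_tpa_vs_ta = cohens_d(*text_picture_audio, *text_audio)

print(f"incidental vs explicit: d = {d_incidental_vs_explicit:.2f}")
print(f"text-picture-audio vs text-audio: d = {d_tpa_vs_ta:.2f}")
```

With these inputs the two comparisons come out at roughly .86 and .90, in line with the values reported above. Small discrepancies of about .01 on other comparisons are possible if a different standardizer was used.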

In addition to the immediate tests of vocabulary knowledge, participants completed delayed post‐tests. The delayed post‐test data were analysed using one‐way ANOVA. Significant differences were found between the conditions in both the delayed VKS, F(3, 809) = 213.49, p < .001, and the delayed CVKT, F(3, 809) = 520.23, p < .001. There was also a significant difference between the delayed VKS and CVKT (p < .001). In the delayed VKS, as in the immediate one, the intentional group obtained the highest score (M = 157.22, SD = 24.38). Likewise, in the delayed CVKT, the intentional group obtained the highest score (M = 68.80, SD = 16.37), unlike the immediate CVKT, where the incidental group (M = 46.44, SD = 11.27) received the highest score. In sum, the ANOVA results for the first and third research questions showed differences between the immediate and delayed post‐tests and among the conditions.
A two‐way ANOVA was conducted to examine how gloss type, learning condition and their interaction contributed to the reading comprehension performance of the participants. The homogeneity of variance assumption of the ANOVA was met. First, the descriptive statistics were computed (see Table 3).
| Conditions | Gloss | Mean | Std. deviation | N |
|---|---|---|---|---|
| Incidental | Text–picture | 36.50 | 10.28 | 66 |
| | Text–picture–audio | 34.18 | 9.41 | 66 |
| | Text–audio | 28.34 | 17.06 | 66 |
| | Total | 33.01 | 9.61 | 198 |
| Intentional | Text–picture–audio | 41.20 | 8.72 | 70 |
| | Text–picture | 35.97 | 10.08 | 70 |
| | Text–audio | 33.78 | 9.90 | 70 |
| | Total | 36.98 | 10.04 | 210 |
| Explicit instruction | Text–picture–audio | 30.70 | 10.59 | 62 |
| | Text–picture | 25.70 | 7.91 | 62 |
| | Text–audio | 19.72 | 16.93 | 62 |
| | Total | 25.38 | 9.68 | 186 |
| Control group | No‐gloss | 22.18 | 9.63 | 216 |
| | Total | 22.18 | 9.63 | 216 |
| Total | Text–picture–audio | 35.57 | 10.48 | 198 |
| | Text–picture | 32.93 | 10.67 | 198 |
| | Text–audio | 27.57 | 9.95 | 198 |
| | No‐gloss | 22.18 | 9.63 | 216 |
| | Total | 29.40 | 11.41 | 810 |
As demonstrated in Table 3, the intentional condition scored highest (M = 36.98, SD = 10.04), followed by the incidental (M = 33.01, SD = 9.61), explicit instruction (M = 25.38, SD = 9.68) and control (M = 22.18, SD = 9.63) conditions. The results also indicated the overall superiority of the text–picture–audio gloss format (M = 35.57, SD = 10.48).
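The overall mean in the last row of Table 3 can be read as an N‐weighted average of the gloss‐type means. The following sketch illustrates that arithmetic using the figures from the table; the helper name `weighted_mean` and the list `gloss_totals` are illustrative only.

```python
def weighted_mean(rows: list[tuple[float, int]]) -> float:
    """N-weighted average of (mean, N) pairs."""
    total_n = sum(n for _, n in rows)
    return sum(mean * n for mean, n in rows) / total_n


# (mean, N) per gloss type from the "Total" block of Table 3.
gloss_totals = [
    (35.57, 198),  # text-picture-audio
    (32.93, 198),  # text-picture
    (27.57, 198),  # text-audio
    (22.18, 216),  # no-gloss (control)
]

total_n = sum(n for _, n in gloss_totals)
grand_mean = weighted_mean(gloss_totals)

print(f"N = {total_n}, grand mean = {grand_mean:.2f}")
```

The weighted average reproduces the tabulated overall mean of 29.40 for N = 810, so the marginal rows of the table are consistent with one another.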
The ANOVA results indicated that condition, F(2, 800) = 79.28, p < .001, η² = .165, and gloss, F(2, 800) = 39.15, p < .001, η² = .089, had significant effects on L2 reading comprehension (see Figure 5). A significant interaction between condition and gloss was also found, F(4, 800) = 4.44, p = .001, η² = .022. Pairwise comparisons indicated significant differences among all the conditions, with the intentional group outperforming the explicit instruction (p < .001, d = 1.17), incidental (p < .001, d = .40) and control (p < .001, d = 1.50) groups. In addition, the incidental condition performed better than the explicit instruction condition (p < .001, d = .79), and both fared better than the control group (p < .001; d = 1.12 and .33, respectively). As for the glosses, the text–picture–audio gloss was better than the text–audio (p < .001, d = .78) and text–picture (p = .02, d = .24) glosses, and the text–picture gloss was superior to the text–audio gloss (p < .001, d = .51). All gloss types were more effective than the no‐gloss condition (p < .001, d > .5). Overall, the ANOVA results for the second and fourth research questions showed significant differences among the gloss types and learning conditions.

Discussion
The present study investigated the effect of multiple electronic gloss types on EFL learners' reading comprehension and vocabulary achievement under different learning conditions. The interaction between the glosses and the learning conditions was also examined. Results of the analyses for the first and second research questions confirmed the positive influence of multimedia glosses in facilitating foreign language vocabulary acquisition and reading comprehension. Based on the results reported above, the text–picture–audio annotation group consistently outperformed the text–picture, text–audio and no‐gloss conditions regardless of the post‐test measure. This result is in line with both the generative theory of multimedia learning (GTML) and cognitive load theory. In other words, participants tend to learn more words when both visual and verbal annotations are used than when only one type of gloss or no gloss is used, a finding which is supported by the dual coding assumption of dual coding theory and the GTML (Mayer, 1997, 2001; Paivio, 1990; Plass, Chun, Mayer, & Leutner, 1998).
The dual channel hypothesis of the GTML, particularly its modality aspect, can be used to explain this result (Mayer, 2001). Mayer distinguishes between two channels, one for visual/pictorial data and one for auditory/verbal data. The modality effect assumes that working memory has partly autonomous processors for pictures and audio. Effective working memory capacity would therefore be expanded by using both the visual and auditory channels (Mayer & Moreno, 1998). According to GTML, text and audio glosses are both provided verbally, whereas the picture gloss is non‐verbal. Therefore, both multimedia glosses include a mix of verbal and non‐verbal data. According to the modality principle (Baddeley, 1999; Mayer, 2001), while the audio gloss is analysed by the auditory channel, the text gloss and picture gloss are handled by the visual channel. Consequently, in text–picture glosses, the concurrent registration of both pictorial and textual data overloads the visual channel. As a result, the cognitive resources available in visual working memory need to be shared between the information presented by texts and pictures, while auditory (phonological) working memory is not used. On the other hand, in text–picture–audio glosses, the auditory channel registers the audio, which is analysed in phonological working memory, while the image and text are registered by the visual channel and processed in visual working memory. This integration permits cognitive resources in both working memories to be used. Put another way, text–picture–audio glosses engaged more cognitive resources than text–picture glosses.
This theory can explain the gloss scores in the incidental, explicit instruction and intentional learning conditions. As stated previously, analyses of the data for the third research question on the effect of learning conditions on immediate and delayed word meaning learning showed that learners in the incidental group were more successful with the text–picture gloss type, while the explicit and intentional group learners achieved higher scores with the text–picture–audio format. In a hypertext context, incidental learning takes place when the learner learns information peripheral to the purpose of the task, even when the purpose is adapted during the task. Baylor (2001) suggests that external factors such as hypermedia links can negatively influence incidental learning. The assumption that the processing of text–picture glosses by learners in the incidental group required fewer cognitive resources than the text–picture–audio format suggests that incidental learning can be a means to improve learning in hypermedia contexts without putting extra demands on cognitive resources. On the other hand, the nature of intentional learning allowed the learners to focus their cognitive resources on the words and use their working memory capacities efficiently.
The superior benefits of the text–picture–audio gloss for L2 word meaning learning can also be attributed to the split‐attention principle (Mayer & Moreno, 1998). Learners who are exposed to text–picture glosses need to divide their focus in visual working memory between different kinds of visual information, such as written texts and images. Learners who are exposed to text–picture–audio annotations approach the audio as an auditory resource and the picture as a visual resource, processing them separately in auditory working memory and visual working memory, which reduces the need to divide attention within either one. Along these lines, effective working memory would be enhanced by presenting data in a blended (visual and audio) rather than a unitary (visual or audio) mode. Thus, the text–picture–audio gloss led to higher vocabulary recall than the text–picture and text–audio annotations in the present study.
The intentional learning condition led to better immediate VKS recall, and the incidental condition resulted in improved recall in the immediate CVKT. However, the intentional learning condition was more effective in both the VKS and CVKT delayed tests. This finding can be explained from two perspectives. First, the acquisition of unknown vocabulary items in a target language is closely related to the degree of mental effort the learner invests in a specific word. Because learners' attention in the intentional learning group was directed to both text comprehension and vocabulary learning, they could devote more attention to the words on the VKS, a definition‐supply test. On the other hand, incidental group learners, who attended to the meaning of the reading text, could hold the words only temporarily in short‐term memory and therefore could not register them in long‐term memory, owing to the lack of attention. As a result, learners in the incidental group were more successful in the CVKT, a receptive test that required recognition of the meaning only. This also explains the superior performance of learners in the intentional group in the delayed VKS and CVKT, because they could successfully divide their attention between form and meaning.
The analyses for the fourth research question, which concerned the effect of learning conditions on reading comprehension, showed that learners in the intentional group gained the highest score in the reading comprehension test. This finding was not surprising: the intentional condition was expected to bring about more successful reading comprehension because the participants did not intend to retain the word meaning but only checked the glosses for text comprehension, so their attention was focused on text comprehension. In contrast, incidental learners devoted their attention steadily to the text, and the explicit instruction group learners were more focused on target vocabulary learning. From another perspective, the reading comprehension test included both multiple‐choice and written recall questions. Whereas a multiple‐choice question is a discrete‐point task with a focus on segregated pieces of information, written recall is considered a global task with a focus on holistic comprehension. The combination of both tasks would require both incidental and intentional conditions, explaining the superiority of the intentional group in the current study. Finally, the significant interaction between gloss and learning condition in all measures suggests that the advantage of the text–picture–audio format over other gloss types in enhancing vocabulary recall depends on the learning condition.
Conclusion and implications
The current study sought to compare dual multimedia annotation types and their effects on immediate and delayed vocabulary performance and reading comprehension under incidental, explicit instruction and intentional learning conditions. The outcomes pointed to a consistent advantage of the text–picture–audio gloss type over other formats irrespective of the tests. While the intentional condition was more effective in promoting performance on the VKS and reading tests, the incidental condition proved to be more effective in the immediate CVKT.
This study provides both pedagogical and theoretical implications. On the pedagogical side, it offers implications relevant to the design of multimedia instruction and foreign language teaching. In particular, the results suggest that the text–picture–audio gloss encourages L2 vocabulary recall more effectively than single annotations. In developing multimedia courses or materials, this finding offers direction in deciding how to present information in diverse modes. It could likewise inform language instructors and administrators in choosing the best multimedia projects to improve vocabulary meaning learning and reading comprehension. Presenting the definition of a word along with its auditory explanation and picture simultaneously appears to foster cognitive processing of multimedia information. Consequently, material developers, as well as teachers who are interested in developing their own or supplementary materials according to their learners' needs, should take the contiguity principle into account. The contiguity principle of the generative theory of multimedia learning postulates that presenting verbal, visual and auditory information simultaneously decreases cognitive load and leads to enhanced learning. Furthermore, different learning conditions require learners to allocate their attention differently, depending on what the learning objective treats as important. Regarding vocabulary learning, the intentional condition should be preferred if the final objective is vocabulary acquisition. Learners should be helped to retain the meaning of words after the first encounter by means of external aids such as pictures, which reinforce the relationship between form and meaning. The intentional learning condition helped participants to consolidate their guesses of word meanings by encouraging them to use the glosses.
On the theoretical side, this study provides evidence for both the generative theory of multimedia learning that distinguishes between visual and verbal working memory in vocabulary meaning learning and text comprehension, and form‐focused instruction that promotes attention to forms.
Individualized instruction by means of computers is a highly beneficial method for teaching the frequently occurring keywords in a text, because language learners may know different subsets of the frequent words, which decreases the likelihood of successful group teaching (Coady, 1993). Moreover, individualized instruction on the computer can be fruitful because it can also be carried out outside of class without using valuable classroom time. Alternatively, when computers are not accessible, learners can be given word lists with definitions along with example sentences in which every word is presented in context.
While the study explained how different computerized glosses enhanced learners' vocabulary and reading achievements, some limitations were identified. The present study focused on word meaning learning and reading comprehension of Iranian L2 learners. Considering that word meaning constitutes only one dimension of word knowledge (Nation, 2013), future studies can compensate for this limitation by examining other dimensions. The audio gloss in this study provided information on the annotated word in audio without giving its pronunciation. As a result, the auditory modality in the study excluded the pronunciation of the word, which can be further investigated. Finally, the research was implemented for one semester. Longitudinal studies can be carried out to trace whether the annotation types and learning conditions have long‐term effects on the vocabulary gain and reading performance of Iranian learners of L2 English. Similar studies (with the extensions suggested) can be carried out with English as a foreign or second language (EFL/ESL) learners in other contexts to arrive at more conclusive and generalizable findings.
Biographies
Karim Sadeghi has a PhD in TESOL/Language Testing from the University of East Anglia (UK) and is an academic member of staff at Urmia University, Iran. He has 100+ publications in national and international journals on Applied Linguistics and has presented widely at local, national and international conferences. He was selected as Iran's top researcher in Humanities and Social Sciences in 2013. In addition to serving on the editorial boards of a few Applied Linguistics journals, he is also the Founding Editor and Editor‐in‐Chief of Iranian Journal of Language Teaching Research. His current research focuses on research evaluation, alternative assessment and teacher education. His most recent publications include The Idea of English in Iran (published in Journal of Multilingual and Multicultural Development, 2015) and Teaching Spoken English in Iran: Options and Issues (published in English Teaching: Practice and Critique, 2015), both co‐authored with Jack C Richards.
Sima Khezrlou received her BA in 2008 from Islamic Azad University, Tehran South Branch in English Language Translation. Right after graduation, she started an MA in English Language Teaching at the University of Tehran, Iran. She is currently a PhD student at Urmia University, Iran. Her research interests are teaching vocabulary and reading comprehension, second language acquisition, and CALL.
Sima Modirkhameneh has a PhD in TEFL from the University of Surrey (UK) and works at Urmia University, Iran.




