Correspondence should be sent to Athena Vouloumanos, Department of Psychology, New York University, 6 Washington Place, New York, NY 10003. E-mail: firstname.lastname@example.org
The roles of linguistic, cognitive, and social-pragmatic processes in word learning are well established. If statistical mechanisms also contribute to word learning, they must interact with these processes; however, there exists little evidence for such mechanistic synergy. Adults use co-occurrence statistics to encode speech–object pairings with detailed sensitivity in stochastic learning environments (Vouloumanos, 2008). Here, we replicate this statistical work with nonspeech sounds and compare the results with the previous speech studies to examine whether exclusion constraints contribute equally to the statistical learning of speech–object and nonspeech–object associations. In environments in which performance could benefit from exclusion, we find a learning advantage for speech over nonspeech, revealing an interaction between statistical and exclusion processes in associative word learning.
Learning the meaning of words is a nontrivial challenge (Bloom, 2000; Quine, 1960). For example, when a word is heard its referent is not always in view (or the focus of attention) (Gleitman, 1990), and conversely, when a referent is attended to its label is not always heard. These imperfect pairings at first glance might appear confusing; however, the relative frequency of word-referent co-occurrence can provide a valuable source of information. Adults and infants are able to track these variations, revealing a detailed sensitivity to the frequency of word–object co-occurrence (Vouloumanos, 2008; Vouloumanos & Werker, 2009). Similarly, the task of selecting which of multiple available referents is singled out by a label can be eased statistically. Learners keep track of multiple potential referents for a label across situations; this, over multiple observations, helps disambiguate the intended referent (Smith & Yu, 2008; Yu & Smith, 2007). Thus, the statistical regularities in word–object co-occurrences can be useful in learning the meanings of words.
Statistical processes alone, however, cannot fully account for children’s word learning as they leave many phenomena unexplained. For example, children can infer likely referents for words by first excluding other unlikely alternatives whose labels are already known (see Clark, 1990). This process is often referred to as mutual exclusivity (Markman, 1992); related formulations have been termed novel-name nameless-category (Mervis, Golinkoff, & Bertrand, 1994), exclusion (Dixon, 1977), and disjunctive syllogism (Halberda, 2003). In essence, these refer to an exclusion heuristic that (in the linguistic domain) leads learners to assign only one label per object, such that they assign a novel label to a novel object rather than to one whose label is already known. For example, a young child who is shown a banana (with a known label) and a whisk (with no known label) and is asked to indicate the fendle will assign this new label to the previously unlabeled object (Markman & Wachtel, 1988; Markman, Wasow, & Hansen, 2003), even in the face of conflicting pragmatic information (Jaswal & Hansen, 2006). This predisposition for exclusion fruitfully biases the application of new labels to new referents, relying, beyond bare statistics, on constraints that have been argued to be especially active in language learning (Markman, 1994; cf., Bloom, 2000).
We explored the relation between exclusion constraints and learning of stochastic sound–object associations by testing learning of nonspeech sounds in two experiments, and comparing the results with previous speech studies (Vouloumanos, 2008). The associative learning paradigm consisted of two phases. The training phase presented a series of trials pairing one novel sound and one novel object with variable and stochastic associative frequencies. The test phase presented trials featuring one sound and a forced choice between two objects. Tracking the frequency of the sound–object pairings in the training phase would allow participants subsequently to select the object more frequently paired with the sound.
Decisions in the test trial are commonly attributed to “positive” associative information, that is, the frequency with which the sound and the target object were paired in training (“The sound went with the object currently on the right”). However, “negative” exclusion information could also influence participants’ choices. That is, decisions could be influenced by which object is not the correct match: specifically, the frequency with which the foil object was paired with a different sound in training (“The left object goes with a different sound, so it must be the one on the right”). Frequent prior pairings between the foil object and a different sound would provide strong exclusion evidence, supplementing the positive associative information to bias participants toward the target object.
The prior pairing frequencies between the foil object and the alternate sound (“negative” exclusion information) affect the strength of the exclusion information available to participants during test trials. For example, if the foil co-occurs with multiple labels (e.g., furry, gerbil, white), each with lower probability, then word learners should be, and are, less likely to reject another new label for the foil (Yurovsky & Yu, 2008). However, if the foil co-occurs with one label with high probability (banana), and with other potential labels with much lower probability (e.g., Daddy’s, yummy), the more deterministic statistical information should lead word learners to be more likely to reject a new label for the foil.
The first experiment exposed participants to a number of low probability associations to probe their sensitivity to nonspeech–object co-occurrences. We expected that participants would demonstrate a high level of sensitivity to nonspeech co-occurrence frequencies and, relying on positive associative information, they would fare equivalently to previous performance with speech. Although negative exclusion information was also available in the form of weak pairings between the foil object and alternate sounds, we did not expect an effect of exclusion because all co-occurrences were of low probabilities. In the second experiment, we increased the probabilities of the sound–object pairings such that learners could form stronger (more deterministic) associations with both target and foil objects (Vouloumanos, 2008). We reasoned that more deterministic information about foil objects would allow participants to reject incorrect pairings by recruiting negative exclusion information (based on a strong foil object–alternate sound pairing) to supplement the positive statistical associative information from the target object–sound pairing. By manipulating the target and foil co-occurrence probabilities independently, we can observe whether, in cases more likely to recruit exclusion (i.e., those in which the foil object is strongly associated with an alternate sound), participants’ statistical learning of speech and nonspeech associations would benefit equally from the additional exclusion process. In contrast, if speech associations were privileged relative to nonspeech associations in this high probability environment, this would provide evidence that statistical mechanisms and exclusion constraints can work synergistically to guide learners to the most likely word meaning.
2. Experiment 1
Previous work has established that both adult (Vouloumanos, 2008) and infant learners (Vouloumanos & Werker, 2009) are sensitive to probabilistic associations between speech sounds (words) and objects. The goal of the first experiment was to establish that adult listeners are likewise sensitive to probabilistic associations between nonspeech sounds and objects. Participants were exposed to a training environment with only low associative frequencies between sounds and objects (Vouloumanos, 2008, experiment 2), to provide a strong test of their ability to track co-occurrence in a stochastic learning environment.
In this experiment, all pairings occurred with low probability (the strongest sound–object associations were paired just 60% of the time). Two sources of information were available to participants: positive associative information based on the association strength between the sound and the target object, and negative exclusion information based on the association strength between an alternate sound and the foil object. We anticipated that the weak foil associations would mean that exclusion information was too weak to be useful, but it is possible that participants could use both positive and negative sources of information.
Given the planned comparison with the published data of Vouloumanos (2008), every effort was made to replicate the original experimental conditions. Participants were drawn from the same pool of university students and the experiments were run in the same testing room, taking advantage of the original hardware equipment and computing software.
Forty undergraduate students at the University of British Columbia participated individually and were remunerated with either course credit or $10. All had normal hearing and normal (or corrected-to-normal) vision. Data from two additional participants were excluded due to performance beyond 2 SDs from the mean.
The visual stimuli were 12 novel objects featuring distinct shapes and colors (Vouloumanos, 2008). The auditory stimuli were 12 distinct, novel nonspeech sounds (available online). The sounds were chosen to be maximally distinct from one another to ease perceptual and memory load. They ranged from 434 to 963 ms in length (M = 697 ms, SD = 172 ms), comparable to the previous speech tokens, which ranged from 525 to 900 ms (M = 649 ms, SD = 120 ms).
Participants were tested on one of four orders using different sound–object combinations. The order of presentation in training and test phases was randomized individually for each participant.
The experiment was controlled from a remote room using a PowerMac G4 computer (Apple, Cupertino, CA, USA) with a custom-scripted Hypercard stack.
The training phase consisted of one hundred and twenty 3-s trials, each consisting of one sound and one object. The intertrial interval was 500 ms. Every sound and every object was presented 10 times, but the sound–object pairings varied across trials (see Fig. 1): Each sound co-occurred six times with one object, and a total of four times with three other objects (twice with one object and once each with two others), and each object co-occurred six times with one sound, and four times with three others (twice with one sound and once each with two other sounds). In this phase, participants were simply asked to pay attention to the sounds and objects.
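The frequency structure of the training phase can be sketched as a sound-by-object co-occurrence matrix. This is a minimal illustration rather than the original stimulus lists: the circulant layout (each sound's pairings shifted by one object relative to the previous sound) and the function names `build_training_counts` and `training_trials` are assumptions, since the actual assignment is given only in Fig. 1.

```python
import random

N_ITEMS = 12                 # 12 sounds and 12 objects
FREQUENCIES = (6, 2, 1, 1)   # pairings per sound: one strong, three weak


def build_training_counts(n_items=N_ITEMS, freqs=FREQUENCIES):
    """Return counts[sound][obj] = number of training co-occurrences.

    Assumed layout (hypothetical): sound s is paired with objects
    s, s+1, s+2, s+3 (wrapping around) 6, 2, 1, and 1 times.
    """
    counts = [[0] * n_items for _ in range(n_items)]
    for sound in range(n_items):
        for offset, freq in enumerate(freqs):
            counts[sound][(sound + offset) % n_items] = freq
    return counts


def training_trials(counts, seed=None):
    """Expand the matrix into a shuffled list of (sound, object) trials."""
    trials = [
        (sound, obj)
        for sound, row in enumerate(counts)
        for obj, freq in enumerate(row)
        for _ in range(freq)
    ]
    random.Random(seed).shuffle(trials)  # order randomized per participant
    return trials


counts = build_training_counts()
trials = training_trials(counts, seed=0)
```

Under this layout every sound and every object appears 10 times, giving 120 trials in total, which matches the design described above.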
The test phase consisted of 48 trials, each featuring one sound and two objects. Participants were instructed to press one of two buttons to choose the object that “went best” with the sound. The different associative frequencies (6, 2, 1, and 0) were combined exhaustively, yielding two types of test trials: (a) unambiguous trials in which only one of the two objects had co-occurred with the sound during training, 6:0, 2:0, and 1:0 (e.g., in the 6:0 pairing, the sound had been heard six times with the target object, and never with the foil; the foil had been paired with four alternate sounds 6, 2, 1, and 1 times); and (b) ambiguous trials in which both objects had been heard with the sound during training, albeit one with higher frequency, 6:1, 6:2, 2:1 (e.g., in the 6:2 pairing, the sound had occurred six times with the target object and twice with the foil; the foil had been paired with three alternate sounds 6, 1, and 1 times). In addition, we added a small number (4) of dummy trials (0:0) with no correct answer as a check against bias in the pairings, for which, as expected, performance was at chance.
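The three quantities that characterize a test trial can be computed directly from the training counts. The sketch below is illustrative only: the circulant training matrix and the name `trial_links` are assumptions, and the alternate foil link is taken to be the maximum number of pairings between the foil object and any sound other than the one presented at test.

```python
N = 12

# Assumed (hypothetical) training matrix: COUNTS[sound][obj], with sound s
# paired 6, 2, 1, and 1 times with objects s, s+1, s+2, s+3 (wrapping).
COUNTS = [[0] * N for _ in range(N)]
for s in range(N):
    for offset, freq in enumerate((6, 2, 1, 1)):
        COUNTS[s][(s + offset) % N] = freq


def trial_links(counts, sound, target, foil):
    """Summarize the associative information available on one test trial."""
    target_link = counts[sound][target]   # positive associative information
    foil_link = counts[sound][foil]       # competing association with the foil
    # Pairings of the foil object with every sound other than the test sound.
    alternates = sorted(
        (row[foil] for s, row in enumerate(counts) if s != sound and row[foil]),
        reverse=True,
    )
    if target_link == 0 and foil_link == 0:
        kind = "dummy"
    elif foil_link == 0:
        kind = "unambiguous"
    else:
        kind = "ambiguous"
    return {
        "target_link": target_link,
        "foil_link": foil_link,
        "alternate_pairings": alternates,
        "alternate_foil_link": alternates[0] if alternates else 0,
        "kind": kind,
    }


trial_6_0 = trial_links(COUNTS, sound=0, target=0, foil=5)
trial_6_2 = trial_links(COUNTS, sound=0, target=0, foil=1)
```

With this assumed matrix, the 6:0 trial has a foil paired with four alternate sounds (6, 2, 1, and 1 times) and the 6:2 trial has a foil paired with three alternate sounds (6, 1, and 1 times), as in the examples given above.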
2.2. Results and discussion
The data were collapsed across the four orders and analyzed with the corresponding speech data previously collected by Vouloumanos (2008). To investigate the effects of the frequency with which the target object had been paired with the target sound in training (target link) and the frequency with which the foil object had been paired with the target sound in training (foil link) over speech and nonspeech sounds, we ran a multilevel logistic regression with trials nested within participants using the web interface for the lme4 package in R (Bates & Maechler, 2009; Ooms, 2009). The dependent variable was performance (correct/incorrect) and the independent variables were sound condition (speech/nonspeech), target link, and foil link. All two- and three-way interactions were included as predictors.
Two significant main effects emerged, target link (β = .22, z = 11.45, p < .0001) and foil link (β = −.30, z = −4.90, p < .0001). The effect of sound condition was not significant (β = −.10, z = −0.97, p = .33), and no interactions were significant. In this experiment, thus, participants were able to use positive associative information in the form of nonspeech sound–object pairings to learn associations between sounds and objects. Moreover, the ambiguity introduced when the foil object was also paired with the target sound during training hindered performance. Performance did not differ between speech and nonspeech (see Table 1).
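One way to write out a model of this kind is given below. It is a sketch under stated assumptions rather than the exact specification that was fitted: the random-effects structure is assumed to be a single random intercept per participant, and the symbols are introduced here for illustration ($y_{ij}$ is accuracy on trial $i$ for participant $j$, $C_j$ is sound condition, $T_{ij}$ is target link, and $F_{ij}$ is foil link).

```latex
\begin{aligned}
\operatorname{logit} \Pr(y_{ij} = 1)
  &= \beta_0 + \beta_1 C_j + \beta_2 T_{ij} + \beta_3 F_{ij} \\
  &\quad + \beta_4 C_j T_{ij} + \beta_5 C_j F_{ij} + \beta_6 T_{ij} F_{ij}
         + \beta_7 C_j T_{ij} F_{ij} + u_j, \\
u_j &\sim \mathcal{N}(0, \sigma_u^2)
\end{aligned}
```

In this form, each coefficient is a change in the log-odds of a correct response, and the participant-level term $u_j$ captures the nesting of trials within participants.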
Table 1. Experiment 1 accuracy rates by sound condition (Target link: Foil link [Alternate foil link])
Note. Standard errors (SEs) have been adjusted to account for the nested data structure.
In Experiment 1’s training environment in which all associations were relatively weak, participants revealed a detailed sensitivity to the nuanced probabilities of co-occurrence between nonspeech sounds and objects. There were no significant differences between learning of speech and nonspeech sounds, suggesting that the statistical mechanism operates equally well over speech and nonspeech.
In addition to using the positive associative information between the target object and sound, participants may also have made use of negative exclusion information stemming from the prior pairing of the foil object and an alternate, unheard sound. The strength of this negative information was constant across all test trials (always 6/10), so Experiment 1 did not allow us to examine the effect of negative exclusion information. In Experiment 2, we varied negative information to allow us to examine its usefulness in tandem with positive association information.
3. Experiment 2
In Experiment 2, a mix of high- and low-probability pairings was created to introduce the possibility of exclusion processes and examine whether performance varied between speech and nonspeech. Learners could form stronger (more deterministic) associations with both target and foil objects, allowing them to use the stronger negative evidence from the foils to map the sound onto the target. Exclusion processes could also encourage participants to select the target by rejecting the foil based on a strong association between the foil object and an unheard sound (alternate foil link). We anticipated that exclusion might be used in test trials in which the foil objects were associated with an alternate sound with high probability (i.e., paired up to 8 or 10 times in the training phase). Further, the higher probabilities also increased the chance that, in the randomly ordered training phase, participants first observed the stimuli in their most frequent pairing. Because primacy plays an important role in associative learning (Yurovsky & Yu, 2008), primacy, coupled with the higher probabilities, was expected to provide significantly stronger evidence for exclusion than in the previous experiment.
This experiment replicated a previous speech study (Vouloumanos, 2008, experiment 1) with nonspeech stimuli. Performance was compared with the existing speech data to determine whether exclusion and statistical processes interact similarly or differently when learning speech and nonspeech associations.
Forty undergraduate students from the same population as Experiment 1 participated in the experiment and were remunerated with course credit or $10. All participants had normal hearing and normal (or corrected-to-normal) vision. Two additional participants were tested and their data excluded from analysis due to performance more than 2 SDs from the mean.
The stimuli and apparatus were identical to those used in Experiment 1.
The training and test phases were conducted as in Experiment 1, but the probability of sound–object co-occurrence varied more broadly (see Fig. 2): Some were perfectly predictable sound–object pairings (10 times of 10) and some were of low probability (e.g., 1 time of 10). To create test trials, target link and foil link frequencies were combined, with target link ranging from 1 to 10, and foil link ranging from 0 to 2 (10:0, 8:0, 6:0, 2:0, 1:0, 8:1, 6:1, 6:2, 2:1). Thus, target and foil co-occurrence probabilities varied independently. In addition, there was variance in the maximum number of times the foil object had been paired with another sound (alternate foil link); this varied from 6 to 10. Foil link and alternate foil link were not independent, however: When the alternate foil link was 10, foil link was necessarily 0. When the alternate foil link was 6, foil link was always >0 (see Fig. 2).
3.2. Results and discussion
The nonspeech data were collapsed across the four orders and compared with the speech data previously collected by Vouloumanos (2008) in a multilevel logistic regression (Bates & Maechler, 2009; Ooms, 2009). Recall that, in Experiment 2, we were interested not only in the pairings between the target sound and the target and foil objects (target link and foil link) but also in how the number of pairings between the foil object and another sound (alternate foil link), which provided negative exclusion information, contributed to performance. This was not a factor in Experiment 1 because foil links and alternate foil links were constant across trials. Sound condition (speech/nonspeech) was included as a fourth predictor, and all two- and three-way interactions were estimated.
As in Experiment 1, there was a main effect of target link (β = .30, z = 17.48, p < .0001) and a main effect of foil link (β = −.33, z = −2.91, p = .004). These effects confirm that performance on object–sound associations is supported by the number of times the sound and the target object have been associated in the past and is hindered by prior associations between the sound and the foil object. There was a main effect of sound condition (β = −.32, z = −2.17, p = .03) with better learning for speech than nonspeech (see Table 2). The effect of alternate foil link over and above the effect of foil link was not significant (β = .05, z = 0.84, p = .40). No interactions were significant.
Table 2. Experiment 2 accuracy rates by sound condition (Target link: Foil link [Alternate foil link])
Note. Standard errors (SEs) have been adjusted for the nested data structure.
Experiment 2 provides some evidence that in this higher probability environment more likely to recruit exclusion processes, speech associations are learned better than nonspeech. However, we found no evidence that alternate foil links contributed to learning, possibly due to the limited range of alternate foil associations provided. To directly test the contribution of negative exclusion information, we increase the variability in alternate foil links by combining data from Experiments 1 and 2.
4. Combined analysis
Thus far, Experiments 1 and 2 show that the positive associative strength between a sound and a target object guides associative learning, and that the negative association between that sound and a foil object hinders performance. Experiment 2 also suggests that speech associations are better learned in high-probability environments more likely to recruit exclusion processes. However, in order to more directly examine the contribution of exclusion processes to learning, and whether this differs for speech and nonspeech, we need to examine whether participants use negative exclusion information from the alternate foil links. Recall that in Experiment 1, alternate foil link did not vary, and in Experiment 2 it covaried strongly with foil link, so in neither of these experiments alone could its contribution be independently assessed. To fully investigate the effect of alternate foil link, we collapsed the data across Experiments 1 and 2, including both nonspeech and speech sounds. We used multilevel logistic regression to investigate the effects of sound condition (speech/nonspeech), target link, foil link, and alternate foil link.
All four main effects were significant. As in each experiment separately, there were main effects of target link (β = .27, z = 22.06, p < .0001) and foil link (β = −.28, z = −6.30, p < .0001). In addition, a main effect of sound condition emerged (β = −.19, z = −2.26, p = .023), indicating better learning of speech sounds than nonspeech sounds. Finally, there was a main effect of alternate foil link (β = .13, z = 4.96, p < .0001), indicating that performance improved as the foil object became more strongly associated with an alternate sound.
Importantly, there was a significant two-way interaction between sound condition and alternate foil link (β = −.14, z = −2.82, p = .004). We probed this interaction by computing the simple slope of alternate foil link within each sound condition. In the speech condition, the effect of alternate foil link was positive (β = .19, z = 5.33, p < .0001). In the nonspeech condition, this effect was near zero (β = .03, z = 1.00, p = .32). These results reveal that only when the sound–object pairings involved speech sounds did participants use exclusion criteria from the alternate foil link to guide their performance. This suggests that participants’ enhanced learning of stochastic speech–object mappings results from the use of exclusion learning constraints, and that exclusion preferentially benefits the statistical learning of words.
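To give a sense of scale, the simple slopes can be converted to odds ratios. The following sketch assumes that the reported coefficients are unstandardized log-odds per additional foil–alternate sound pairing, which is not stated explicitly above; the names `SIMPLE_SLOPES` and `odds_ratio` are introduced here for illustration only.

```python
import math

# Simple slopes of alternate foil link from the combined analysis.
# Assumption: these are log-odds changes per one additional pairing.
SIMPLE_SLOPES = {"speech": 0.19, "nonspeech": 0.03}


def odds_ratio(beta, delta=1.0):
    """Multiplicative change in the odds of a correct response when the
    predictor increases by `delta` units, given log-odds slope `beta`."""
    return math.exp(beta * delta)


# Change in odds for one additional foil-alternate sound pairing.
per_pairing = {cond: odds_ratio(b) for cond, b in SIMPLE_SLOPES.items()}

# Change in odds across the observed range of alternate foil link
# (6 to 10 pairings, i.e., a difference of 4).
across_range = {cond: odds_ratio(b, delta=4) for cond, b in SIMPLE_SLOPES.items()}

for cond in SIMPLE_SLOPES:
    print(f"{cond}: x{per_pairing[cond]:.2f} per pairing, "
          f"x{across_range[cond]:.2f} from 6 to 10 pairings")
```

On these assumptions, moving from the weakest to the strongest alternate foil link roughly doubles the odds of a correct response for speech, whereas the corresponding change for nonspeech is small.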
5. General discussion
How are words learned? The experimental literature has tended to approach the problem from one of two perspectives: bottom-up/statistical or top-down/constraints. Despite the likelihood that both processes underlie the complex, multifaceted nature of word learning, little empirical evidence exists for the two mechanisms acting in combination. In Experiments 1 and 2, participants accurately tracked nonspeech–object co-occurrences, performing equivalently to previous work with speech sounds (Vouloumanos, 2008). However, in Experiment 2, in conditions in which strong negative evidence from the foils should allow the operation of exclusion processes, learning was better with speech than with nonspeech. When Experiments 1 and 2 were combined to provide enough variability to allow us to more directly investigate the contribution of exclusion processes, we found that exclusion constraints preferentially benefited the statistical learning of words.
Statistical and exclusion sources of information were productively combined for speech, but not for nonspeech processing in the current study. These results suggest that exclusion constraints may operate more strongly over speech (at least in a simple associative environment) and thus may be especially active in language learning. However, this is not to say that interactive exclusion constraints are restricted to language learning. Exclusive mapping biases have been found in face–voice associations in children (Moher, Feigenson, & Halberda, 2010), as well as in nonhuman species such as California sea lions (Kastak & Schusterman, 2002). Exclusion in language might be driven by a general sensitivity to pragmatic and/or communicative intent, rather than a language-specific underlying assumption that every object can only be referred to by one label (Bloom, 2000; Clark, 1990; Diesendruck, 2005; Diesendruck & Markson, 2001; Woodward & Markman, 1998). In the current study, participants used exclusion constraints more for the speech sounds, likely because there was no basis to expect exclusive mapping of the nonspeech sounds. Without pragmatic or communicative contextual cues, there may be no reason to reliably associate the objects with multiple nonspeech sounds. If it had been plausible (perhaps if there were a functional relationship between sounds and objects, for example, if the objects emitted the nonspeech sounds upon being squeezed or shaken), similar constraints might have also applied over the nonspeech sounds. An interaction between statistical learning and exclusion processes might thus also be found in nonlinguistic domains. The synergistic relationship between mechanisms revealed here in language might be a special case of a more broadly available interactive process.
This research was supported by a grant from the Natural Sciences and Engineering Research Council of Canada (NSERC 81103) to Janet F. Werker and a grant from the Fonds de Recherche sur la Société et la Culture to Athena Vouloumanos. Katherine Yoshida was supported by graduate fellowships from the Michael Smith Foundation for Health Research and the Social Sciences and Humanities Research Council of Canada (SSHRC). Mijke Rhemtulla was supported by a Banting Postdoctoral Fellowship from SSHRC. We are particularly grateful to Janet F. Werker, without whom this research would not have been possible. For their contributions to this work, we thank Krista Newbigging, Ramesh Thiruvengadaswamy, Ferran Pons, Laurel Fais, Anouk Huizink, Cristina Rabaglia, and David Poeppel.