Correspondence should be sent to Katerina Kantartzis, School of Psychology, University of Birmingham, Edgbaston, Birmingham B15 2TT, UK. E-mail: firstname.lastname@example.org
Sound-symbolism is the nonarbitrary link between the sound and meaning of a word. Japanese-speaking children performed better in a verb generalization task when they were taught novel sound-symbolic verbs, created based on existing Japanese sound-symbolic words, than novel nonsound-symbolic verbs (Imai, Kita, Nagumo, & Okada, 2008). A question remained as to whether the Japanese children had picked up regularities in the Japanese sound-symbolic lexicon or were sensitive to universal sound-symbolism. The present study aimed to provide support for the latter. In a verb generalization task, English-speaking 3-year-olds were taught novel sound-symbolic verbs, created based on Japanese sound-symbolism, or novel nonsound-symbolic verbs. English-speaking children performed better with the sound-symbolic verbs, just like Japanese-speaking children. We concluded that children are sensitive to universal sound-symbolism and can utilize it in word learning and generalization, regardless of their native language.
The task of word learning is an important step in children’s lives. There are various challenges the child is presented with when learning a novel word. Initially, the child must identify the referent of a novel word in a complex reality (see Quine, 1960, for a more extensive discussion of difficulties at the identification stage). Following identification, the child must then store this novel word in such a way that makes it generalizable to new situations. Studies have shown that generalization is particularly difficult in verb-learning tasks (Imai, Haryu, & Okada, 2005; Maguire et al., 2002), even though children use verbs in their daily language. When 3-year-olds are presented with a novel verb while seeing Actor A doing Action X, they are not able to generalize the verb to a new situation with Actor B doing the same action (X). That is, they are unable to separate the actor or the patient object from the action in the semantic representation of the verb (Imai et al., 2005; Maguire et al., 2002) and thus are unable to correctly generalize novel verbs to new situations.
Previous research demonstrated that generalization of novel verbs becomes easier for 3-year-olds when the verbs sound-symbolically match the action they represent. In Imai et al.’s (2008) study, Japanese 3-year-olds were taught novel verbs that either sound-symbolically matched or did not match the referent actions. The novel sound-symbolic verbs were created on the basis of existing Japanese sound-symbolic words. The 3-year-olds failed to generalize a newly taught verb to an instance of the same action performed by a different actor, when the novel word did not have a sound-symbolic relation to the referent. However, they succeeded in the task when the novel verb sound-symbolically matched the action it represented. That is, Japanese children learned and stored new verbs in such a way that they were then able to correctly generalize them, when the novel word sound-symbolically matched the action it represents. These findings led the authors to propose the “sound-symbolism bootstrapping hypothesis,” which states that sound-symbolism can help children single out the referent of a novel word in the complex reality, which in turn allows them to store the semantic representation in such a way that children can correctly generalize the verb to new situations.
Given the crosslinguistic recognizability of sound-symbolism, a question arises as to whether children can use universal sound-symbolism to bootstrap their verb learning independent of their native language. The study with Japanese children by Imai et al. (2008) left this question unanswered because they tested Japanese children with novel words based on existing Japanese sound-symbolic words (mimetics). Thus, it is not clear whether the children benefited from regularities in the existing Japanese sound-symbolic lexicon or whether they accessed sound-symbolism that can be universally detected by speakers of different languages. The former possibility cannot be dismissed a priori because Japanese is a language with a very rich inventory of sound-symbolic words (Hamano, 1998; Kita, 1997, 2001). A midsize dictionary of Japanese sound-symbolic words (Atoda & Hoshino, 1995) lists more than 1,700 entries. These words are frequently used by adults and by 3-year-olds (e.g., Allen et al., 2007).
In order to determine whether children can use universal sound-symbolism in word learning and generalization, it is important to test whether the Japanese-based sound-symbolism can benefit children whose native language has words with phonological properties very different from those of Japanese. If so, we can conclude that children in general can universally detect sound-symbolism and utilize it for word learning independent of their native language.
1.1. Current study
In the present research, we investigated whether English-speaking 3-year-olds can benefit from Japanese sound-symbolism in a novel verb generalization task. In this study, English-speaking children were taught a novel word and then asked to generalize it to a new situation with the same action performed by a different actor. There were three conditions: the sound-symbolic match condition, in which the novel verbs were sound-symbolically related to the referent action, and two control conditions in which the novel verbs were not sound-symbolically related to the referent action. The sound-symbolic words used were created on the basis of Japanese sound-symbolic words and were the ones used in Imai et al.’s (2008) study with Japanese children. They were verified to be crosslinguistically recognizable by a rating experiment conducted prior to the main experiment.
Forty-five monolingual English-speaking 3-year-olds (M = 41.57 months, range = 36–48 months, 20 boys, 25 girls) were recruited from nurseries around Birmingham, UK, with prior parental consent.
The materials were word–action combinations. There were eight novel words. Four of the words were novel words created by altering Japanese mimetics (batobato, nosunosu, chokachoka, and tokutoku). The other four were nonwords with the structure of typical monosyllabic English verbs (bretting, blegging, blicking, and truffing). There were eight novel actions, which were various manners of walking. Four of the actions sound-symbolically matched one of the altered versions of Japanese mimetics, but not the English-type words. The other four did not sound-symbolically match either the mimetic-type words or the English-type words. The sound-symbolically matching word–action combinations were as follows: batobato = a large energetic movement, with outstretched arms swinging back and forward and legs making large leaping movements; chokachoka = walking quickly in very small steps with the arms swinging quickly with bent elbows; nosunosu = walking slowly in large steps with bent knees and the hands on knees (see the video screen shots for the training video and the same action video in Fig. 1); tokutoku = a small shuffling movement, with straight arms rigidly at the side and legs moving very slightly and rigidly. The same set of novel words was used in the sound-symbolic mismatch condition as in the sound-symbolic match condition; what changed was the actions the words were paired with. This change in the actions made the word–action pairs nonsound-symbolic. The sound-symbolically mismatching word–action combinations were as follows: batobato = walking slowly, with arms loosely bent and hands touching in the front; chokachoka = legs slightly bent, walking slowly and in a controlled fashion, with arms bent and held out in front of body (as if carrying a tray); nosunosu = legs making large steps forward, with a bounce, arms swinging freely from side to side; tokutoku = creeping-type walk with medium sized steps, with arms bent and held closely in front of body.
A pretest was conducted to check whether the eight actions did indeed have the presumed relationships to the four mimetic-type words and the four English-type words. First, the mimetic-type words were presented (as an audio recording) along with the eight videos to 21 English-speaking adults (without any knowledge of Japanese) and 15 Japanese-speaking adults. The four words and eight videos were paired together exhaustively, resulting in a total of 32 word–video pairs. The participants were asked to rate how well they thought each word–action combination matched on a scale from 1 (did not match) to 7 (matches very well). The mean rating was significantly higher for the sound-symbolically matching word–action combinations than for the nonsound-symbolically matching word–action combinations for English speakers (sound-symbolically matching: M = 4.4, SD = 1.02; nonsound-symbolically matching: M = 3.50, SD = 1.04), t(20) = −3.8, p < .001, d = 7.67, and Japanese speakers (sound-symbolically matching: M = 5.71, SD = 0.66; nonsound-symbolically matching: M = 2.06, SD = 0.78), t(14) = −14.7, p < .001, d = 10.79.² The videos in the sound-symbolically matching combinations later served as the same-action videos (see Fig. 1) and the videos in the nonsound-symbolically matching combinations, as the same-actor distractor videos, in the test phase for the sound-symbolic match condition in the main experiment (see below for more information about the conditions).
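The exhaustive pairing used in the pretest can be sketched as a Cartesian product of words and videos. This is only an illustration of the design arithmetic: the word list comes from the materials described above, whereas the video labels (`video_1` to `video_8`) are placeholders assumed here, because the text does not name the video files.

```python
from itertools import product

# The four mimetic-type novel words listed in the materials.
words = ["batobato", "nosunosu", "chokachoka", "tokutoku"]

# Placeholder labels (an assumption): the text describes eight walking
# actions but gives no identifiers for the corresponding videos.
videos = [f"video_{i}" for i in range(1, 9)]

# Exhaustive pairing: every word is combined with every video.
pairs = list(product(words, videos))

print(len(pairs))  # 4 words x 8 videos = 32 word-video pairs
```

Each rater would then give one rating on the 1 to 7 scale for every element of `pairs`.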
Four novel English-type words (to be used in the neutral baseline condition in the main experiment) were also pretested with the same eight action videos as above. The degree of the sound-action match was tested by 20 English-speaking adults. The words and actions were paired exhaustively, and each pair was presented together individually, as above. The results ensured that the novel verbs did not sound-symbolically match any of the actions: The degree of match was judged to be poor for all the actions. The word–action combinations were divided into two sets: those with the videos that later served as the same-action videos and those with the videos that later served as the same-actor distractor videos in the test phase of the main experiment. The first set was the same as those in the sound-symbolically matching combinations in the pretest described above, and the second set was the same as those in the nonsound-symbolically matching combinations. There was no significant difference in the rating between the two sets (same-action: M = 3.81, SD = 0.57; same-actor: M = 3.63, SD = 0.79), t(19) = −1.0.
Each child was tested individually in a quiet area of the nursery. Two warm-up trials using familiar nouns were given to establish the procedure (of indicating the referent of a word by pointing). Then, a practice trial with a familiar verb preceded the main experiment to ensure that the children understood the training-test procedure. The practice trials followed the same procedure as the experimental trials.
All conditions followed the same structure of a training phase followed by a test phase (see Fig. 1). In the training phase, children were presented with a video of an actor carrying out an action (Actor A, Action X) on a laptop computer; the experimenter simultaneously presented the novel verb in one of the two sentences, depending on the condition they were in. In the test phase, which immediately followed the training phase, the experimenter asked the children to indicate the referent of the novel verb by pointing to one of the two action videos on the screen. In one video, the action was the same but the actor was different (same-action: Actor B, Action X); in the other video the actor was the same, whereas the action was different (same-actor distractor: Actor A, Action Y).
Participants were randomly assigned to one of three conditions.
2.4.1. Sound-symbolic match condition
Fifteen children were assigned to this condition (mean age = 41.7 months, range = 33–48 months, 9 girls). The newly taught verb was embedded in the sentences, “Look! He is doing X” (training) and “Which one is doing X?” (test). The newly taught verb sound-symbolically matched the action in the training video and therefore matched the action in the same-action video but not the action in the same-actor distractor video (see Fig. 1 for an example of the videos used as same-action and same-actor distractor videos; see Section 2.2 for verification of sound-symbolism). The action used in the same-action video or same-actor distractor video did not re-appear for another word.
2.4.2. Neutral baseline condition
Fifteen children (mean age = 42.5 months, range = 35–48 months, 8 girls) were tested in this condition. The newly taught verb was embedded in the sentences, “Look he is Xing” (training) and “Which one is Xing” (test). This condition provided a baseline for 3-year-olds’ performance in this verb generalization task when the newly taught verb did not sound-symbolically match the same-action or the same-actor distractor video. The verbs were presented in a form that resembled typical English verbs (e.g., blicking). The training videos, same-action videos, and same-actor distractor videos were all identical to those in the sound-symbolic match condition.
2.4.3. Sound-symbolic mismatch condition
Fifteen children (mean age = 40.5 months, range = 33–47 months, 8 girls) were taught the same set of words as in the sound-symbolic match condition, and the words were therefore embedded in the same sentences. The two videos shown at the test phase for each word were identical to the ones in the sound-symbolic match condition. However, the newly taught verb did not sound-symbolically match the action in the training video and, consequently, the same-action video in the test phase. Instead, the verb did sound-symbolically match the action in the same-actor distractor video (see Fig. 1). Accordingly, the training videos differed from those in the sound-symbolic match condition because the same-action videos were different.
This condition allowed us to eliminate alternative explanations for the predicted finding that children would perform better in the sound-symbolic match condition than in the neutral baseline condition. Namely, if children were performing above chance in the match condition, one might suggest that the children were detecting sound-symbolism at the test phase and not learning anything in the training phase. In the mismatch condition, the children were taught a novel verb that did not sound-symbolically match the action, but they were presented with the sound-symbolically matching action as a same-actor distractor at the test phase. If children were simply detecting sound-symbolism at the test phase, then in the mismatch condition they should pick the sound-symbolically matching action, which in this condition was the same-actor distractor. If they were learning verbs despite the lack of sound-symbolism, they should be picking the same-action video. Therefore, if children’s good performance in the sound-symbolic match condition was due to the benefit of sound-symbolism in the learning phase (in line with our hypothesis), children in this condition should perform at chance and worse than those in the sound-symbolic match condition.
Another concern is that the sentential frame (i.e., He is doing X) and/or features of word forms (e.g., reduplication) used in the sound-symbolic match condition might help children identify and learn the verbs more effectively than in the neutral baseline condition. In the mismatch condition, novel verbs and their sentential frame were identical to the match condition, but the novel verbs did not sound-symbolically match the action. If the sentential frame and/or features of word form assisted children in learning the novel verbs, then children should perform equally well in the mismatch condition as they do in the match condition.
When a child correctly extended the novel verb on the basis of the same action, the response was coded as correct. For each child, the proportion of correct responses out of the four trials was calculated and served as the dependent variable. As we expected, the children performed differently across the three groups, F(2, 42) = 4.04, p < .05, η² = .161 (see Fig. 2). The children in the sound-symbolic match condition performed better than those in the sound-symbolic mismatch condition or in the neutral baseline condition (Fisher’s LSD as recommended by Howell, 2007, for three means, both ps < .05). Children more successfully learned and generalized novel verbs based on the identity of the action when the word sound-symbolically matched the action than when the word did not sound-symbolically match the action.
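The reported effect size can be cross-checked against the F statistic. The sketch below assumes a one-way between-subjects ANOVA, for which eta squared equals SS_between / SS_total and can therefore be recovered from F and its degrees of freedom (three groups of 15 children give 2 and 42). The function name is illustrative and is not taken from the authors' analysis.

```python
def eta_squared_from_f(f_value: float, df_between: int, df_within: int) -> float:
    """Eta squared for a one-way ANOVA, recovered from F and its dfs.

    Since F = (SS_between / df_between) / (SS_within / df_within),
    SS_between / SS_total = F * df_between / (F * df_between + df_within).
    """
    return (f_value * df_between) / (f_value * df_between + df_within)


# Values reported in the text: F(2, 42) = 4.04.
eta_sq = eta_squared_from_f(4.04, df_between=2, df_within=42)
print(round(eta_sq, 3))  # 0.161, matching the reported effect size
```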
Consistent with the previous findings (Imai et al., 2005, 2008; Maguire et al., 2002), the performance of the children in the two control conditions did not significantly differ from chance (where chance is 0.5), t(14) = −1.87 (sound-symbolic mismatch condition), and t(14) = −0.49 (neutral baseline condition). In sharp contrast, the children in the sound-symbolic match condition successfully generalized the novel verbs and performed significantly above chance, t(14) = 2.57, p < .05, d = 0.663.
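The comparison against chance can be sketched as a one-sample t test on each child's proportion of correct responses. The code below is a minimal illustration, not the authors' analysis script: the function names are hypothetical, and the `toy_scores` list is made-up data used only to show the call. It also uses the standard relation d = t / sqrt(n) for a one-sample test to relate the reported t(14) = 2.57 to the reported effect size.

```python
import math
import statistics


def one_sample_t(scores, chance=0.5):
    """Return (t, d) for a one-sample t test of the mean against `chance`."""
    n = len(scores)
    mean = statistics.mean(scores)
    sd = statistics.stdev(scores)  # sample SD (n - 1 in the denominator)
    t = (mean - chance) / (sd / math.sqrt(n))
    d = (mean - chance) / sd  # Cohen's d for a one-sample design
    return t, d


def d_from_t(t_value, n):
    """Cohen's d implied by a one-sample t statistic with n participants."""
    return t_value / math.sqrt(n)


# Reported values for the sound-symbolic match condition: t(14) = 2.57, n = 15.
d_reported = d_from_t(2.57, 15)
print(round(d_reported, 2))  # 0.66, in line with the reported d

# Toy proportions (NOT the study's data), each out of four trials.
toy_scores = [0.75, 0.5, 1.0, 0.25, 0.75, 0.5]
t_toy, d_toy = one_sample_t(toy_scores)
```

The small gap between the value computed here and the reported d = 0.663 is what one would expect if the published t value was rounded before printing.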
The performance in the sound-symbolic mismatch condition ruled out two possible alternative interpretations. First, the results may not have reflected the success of verb generalization but reflected success in detecting sound-symbolism between the word and the action at test. However, because the children in the sound-symbolic mismatch condition did not select the sound-symbolically matching distractor significantly more than chance, this alternative is unlikely. Secondly, the sentence structure (“doing X”) or features of word forms (e.g., reduplication) may have caused good performance in the sound-symbolic match condition. These possibilities can also be ruled out because the children were presented with the same set of novel sound-symbolic verbs in the same sentence frame in both sound-symbolic match and mismatch conditions, but only the latter group performed at chance and the difference between the two groups was significant.
It should be noted, however, that numerically (but not statistically) children chose the sound-symbolically matching distractor more often than the sound-symbolically mismatching target in the sound-symbolic mismatch condition (the proportion of correct responses is numerically slightly lower than chance, .50). This might be interpreted as the children using sound-symbolism to guide their choices in the test phase, rather than using sound-symbolism at the training phase. However, comparing the difference between the proportion of correct responses in the sound-symbolic mismatch condition and the baseline neutral condition, the difference is very small (.05) and not significant. Thus, we maintain that sound-symbolism assisted children in the training phase to form a semantic representation of the novel words based on action, which led to better performance in the task.
This study demonstrated that English-speaking children performed better in a verb generalization task when the novel verb sound-symbolically matched the referent action than when it did not. Importantly, the novel sound-symbolic words were derived from Japanese sound-symbolic words, and the sound-symbolism could be detected by English-speaking adults and utilized by English-speaking children with no knowledge of Japanese. The English-speaking participants could not have derived the sound-symbolism from sound-meaning regularities in the Japanese lexicon; therefore, the sound-symbolism is likely to have a universal basis. Thus, we conclude that children are, in general, sensitive to universal sound-symbolism and can use this sensitivity in verb learning and generalization.
The current findings suggest that English-speaking adults and children can detect universal sound-symbolism. English-speaking 2.5-year-olds matched rounded versus pointed shapes to novel words in the way compatible with Köhler’s (1947) celebrated sound-symbolism for shapes (Maurer et al., 2006), which has been identified in speakers of different languages and ages (Davis, 1961). Furthermore, adult English speakers with no knowledge of Japanese could correctly guess some aspects of the meaning of Japanese sound-symbolic words (Imai et al., 2008; Iwasaki et al., 2007). The current study went beyond the previous studies in demonstrating that children use the sensitivity to universal sound-symbolism in word learning and generalization.
Exactly how does sound-symbolism help children learn verbs? When presented with a novel verb with an actor performing an action, 3-year-olds typically assume that both actor and action are necessary for verb meaning generalization and find it difficult to separate the critical component (i.e., action) from the noncritical one (i.e., object) (Imai et al., 2005; Maguire et al., 2002). Sound-symbolism seems to help children break down the action-actor combination and identify the action as the referent. As a consequence, the semantic representation of the verb is stored in such a way that the verb can be correctly generalized to new situations with the same action regardless of the actor (see also Imai et al., 2008). It should be noted, however, that the exact nature of the sound-symbolism used in this study (i.e., exactly what sound properties of words caused sound-symbolism) is not clear. Different aspects of the sound-symbolic words (phonetic, phonotactic, and prosodic properties) may have contributed to the sound-symbolism. This would be an important topic for future research.
Why do children have the capacity to use universal sound-symbolism when learning new words? We suggest that this is because sound-symbolism is a vestige of language evolution. Some researchers have suggested that sound-symbolic words played an important role in the evolution of human language (Kita, 2008; Kita, Kantartzis, & Imai, 2010; Ramachandran & Hubbard, 2001).³ One key step in language evolution is the emergence of a system for agreeing upon the referent of a novel word. One easy way in which such an agreement could have been made is universal sound-symbolism. If an inherent sound-meaning link exists in everybody’s mind, then the listener would be able to easily identify the referent of the speaker’s novel word, making communication easier. Thus, universal sound-symbolism could facilitate a rapid growth of a shared lexicon (Kita, 2008; Ramachandran & Hubbard, 2001). Given that sound-symbolic words in modern languages can refer to information in various domains such as vision, touch, smell, taste, manners of movement, emotion, and attitude (e.g., Kita, 1997; Voeltz & Kilian-Hatz, 2001), sound-symbolic proto words of our ancestors may have had considerable expressive power (Kita, 2008). Thus, universal sound-symbolism would have had great adaptive value for our ancestors.
Universal sound-symbolism in modern languages may be the “fossils” of a sound-symbolic communication system our ancestors once used. Such fossils might have been preserved in today’s languages because children have a preference to use sound-symbolic words over nonsound-symbolic words. For example, it has been shown that Japanese children have a stronger preference than Japanese adults to use sound-symbolic words when describing the manner of motion in a narrative task (Kita, Özyürek, Allen, & Ishizuka, 2010).
We suggest that all humans are disposed to develop abilities to sense universal sound-symbolism and use it for word learning, and that the emergence of this disposition was a crucial step in language evolution. Because the capacity to use sound-symbolism in word learning is rooted in the evolutionary history, it is observable in children who are learning languages that are geographically separated and do not belong to the same language family, for example, Japanese (Imai et al., 2008) and English (the current study). It is possible that the present study tapped into the vestige of this evolutionary process still present in all children.
1. In the first edition published in 1929, the word “baluma” was used, but was changed to “maluma” in the 1947 edition.
2. Japanese speakers’ ratings were lower for the nonsound-symbolic matching pairs and higher for the sound-symbolic matching pairs than English speakers’ ratings. This may be either because Japanese speakers have stronger intuitions about how well word–action pairs matched due to extensive experience with sound-symbolic words or because the sound-symbolic words in this study were created on the basis of existing Japanese sound-symbolic words.
3. One of the reviewers questioned whether these suggestions are compatible with the fact that frequency-size sound-symbolism is shared by humans (Ohala, 1984, 1994) and other mammals and birds (Morton, 1994). Across species, high-frequency vocalizations are associated with smallness (and appeasement) and low-frequency vocalizations are associated with largeness (and hostility). We maintain that this does not necessarily undermine the possibility that sound-symbolism played a role in language evolution for two reasons. First, frequency-size symbolism is only one of many types of sound-symbolism, and other types of sound-symbolism may be specific to humans and their close evolutionary relatives. Second, even if all types of sound-symbolism are shared by humans and a wide range of species, including birds, it could still be argued that sound-symbolism is an important precursor of language, but the evolution of language required additional cognitive changes unique to the human lineage.