What does it take to learn a word?


  • This article is part of a special collection on developmental systems designed to explore the powerful array of forces shaping the individual development of brains, bodies, and behavior. The collection was created and edited by Mark S. Blumberg (University of Iowa), John P. Spencer (University of East Anglia), and David Shenk (Author, The Genius in All of Us), in conjunction with the DeLTA Center at The University of Iowa. View the full collection: How We Develop – Developmental Systems and the Emergence of Complex Behaviors
  • Conflict of interest: The authors have declared no conflicts of interest for this article.


Vocabulary learning is deceptively hard, but toddlers often make it look easy. Prior theories proposed that children's rapid acquisition of words is based on language-specific knowledge and constraints. In contrast, more recent work converges on the view that word learning proceeds via domain-general processes that are tuned to richly structured—not impoverished—input. We argue that new theoretical insights, coupled with methodological tools, have pushed the field toward an appreciation of simple, content-free processes working together as a system to support the acquisition of words. We illustrate this by considering three central phenomena of early language development: referential ambiguity, fast-mapping, and the vocabulary spurt. WIREs Cogn Sci 2017, 8:e1421. doi: 10.1002/wcs.1421



Words are deceptively simple, but profoundly important to language. The spoken form of a word is a complex sequence of articulations and acoustic cues. In the lexicon (our mental storehouse of words), these sequences must be linked to a rich set of semantic features, to syntactic properties like part of speech, and to other representations like orthography (the word's spelling). How is this complex set of information learned?

This question has engendered an enormous amount of research over the last 40 years. This research illustrates a core issue in the cognitive sciences: Is human language acquired via specialized mechanisms—or does it derive from more general developmental mechanisms that may be seen in other domains (like vision) and even in other species that lack language?

Virtually everyone agrees that children's capacity for language is amazing. To reach an average-sized vocabulary by kindergarten, children have been estimated to learn up to nine new words a day. In 1960, Quine[6] illustrated the difficulty of this feat with an example, which we paraphrase here:

Imagine you are a field linguist studying a community whose language you do not know. You go hunting with a group of tribesmen and see a rabbit hop past. One of the tribesmen shouts ‘gavagai.’ How do you determine what this new word means? It could be ‘rabbit,’ but it could also be ‘hopping,’ ‘fluffy,’ ‘dinner,’ ‘get it!’ or a host of other things.

This problem is further complicated by the relative cognitive immaturity of the very young learner: toddlers have a limited understanding of abstract concepts; they cannot do math; they cannot hop on one foot; and they are still learning to feed themselves. When the challenging problem of inferring a new word's meaning meets the limited cognitive skills of a typical toddler, the result is a mystery: How can children learn so many words so quickly?

For many years, the most widely accepted answer to this question was that young learners are imbued with specialized abilities and/or innate knowledge that guide them to the correct word meanings. For example, children may come to the word learning table with the assumption that most new words refer to whole objects (the rabbit), not parts (ears) or features (fluffy); or they may assume that words refer to the more common or ‘basic level’ of description (e.g., gavagai means ‘rabbit’), rather than a subordinate level of description (e.g., ‘eastern cottontail rabbit’ or ‘Peter Rabbit’) or a superordinate level of description (e.g., ‘animal’ or ‘mammal’). Such knowledge has been captured as constraints,[7] principles,[8] or, more recently, prior expectations,[9] and there is considerable evidence that children at some ages behave in ways that appear consistent with these kinds of language-specific abilities.

Now things are changing. The field of word learning is in the middle of a shift in viewpoint. There are new theoretical developments, such as a radical new understanding of learning (see Aslin, Statistical learning: a powerful mechanism that operates by mere exposure, WIREs Cogn Sci, also in the collection How We Develop), as well as a richer understanding of how toddlers’ own bodies play a role in cognition[10] (see Oudeyer, What do we learn about development from baby robots?, WIREs Cogn Sci, also in the collection How We Develop). These advances are being supported by data from new technologies like eye-tracking and wearable cameras.[11] Finally, sophisticated new computational tools are giving us an ever-clearer picture of the subtle information available in the child's environment[12, 13] and allowing us to implement, explore, and test complex theories of how learning works.[13, 14] Altogether, these theoretical and methodological innovations are challenging older ideas about language-specific abilities and knowledge.

In many ways, these innovations confirm earlier findings about regularities in children's behavior, and thereby support the prior work that remains a powerful and important basis for our understanding of word learning. However, these new advances offer critical insight into where these principles and biases come from, and raise the possibility that they are not the product of innate specialization. As a result, the field is shifting from identifying and characterizing specialized abilities to examining the structure and richness of the linguistic input and the often unexpected (or emergent) consequences of very simple learning mechanisms. This shift has, in turn, increased our appreciation of how children shape the input they receive and learn how to learn words as they go along.


Quine's gavagai example became a popular characterization of the problem of referential ambiguity. Figure 1 shows a typical preschool room. There are many attractive and namable objects in view: tables with things to manipulate, a tree on the wall, a fun toy with moveable beads, and so on. A new word could refer to any of these items; it could also refer to a property of any of these objects (one table is blue, the other has red legs, the tree is big); or it could express the speaker's feelings or intentions toward the objects.

Figure 1.

A typical preschool classroom features many potential referents for a new word.

In such a context, if the teacher were to say ‘wow, blicket!’, how could the child possibly figure out what the teacher was intending to communicate? Although the problem is complicated, children appear to solve it with very little effort. By 16 months of age, children have learned that ‘table’ can refer to the bright blue object in the foreground, and they can demonstrate their understanding of that word by pointing to it. Furthermore, this concept is already starting to become more complex: the same child may also understand that ‘table’ can refer to the other (red) table.

Early research[15-19] suggested that children could identify referents and create novel word-referent links in as little as one exposure (sometimes termed ‘fast-mapping’), occasionally even learning new words (perhaps not suitable for scholarly publication) that mom and dad would rather they had not. Moreover, the rate at which children add new words to their productive vocabulary appears to explode in the second postnatal year. Infants typically produce their first word between 10 and 12 months of age. The next few words are added to the vocabulary slowly, but between 18 and 24 months of age the pace quickens dramatically. At this point, children go through what is known as a vocabulary spurt, adding words to the productive vocabulary at a rate as high as 10 new words every 2 weeks.[17]

Of course, this view may be optimistic. These kinds of estimates usually tap only the surface of learning, for instance by probing word usage in only the simplest tasks. In this sense, vocabulary estimates fail to capture the considerable changes that can happen in how words are used and understood from childhood to adulthood. Nevertheless, given the scope of the word-learning problem and children's relative cognitive immaturity, it is easy to be impressed with their language-learning prowess. The hard part is to figure out how they do this.


The idea that young language learners have built-in specific knowledge they use to learn words came from a larger intellectual trend that swept developmental psychology in the 1980s and 1990s.

Before then, accounts of early cognition were often grounded in Piagetian theory, which suggested that infants’ and toddlers’ conception of the world was bound to their sensory and motor experiences, and grew more abstract as they constructed knowledge about the world around them. According to Piagetian theory, for example, children were not fully capable of logical, abstract thought until 7 years of age.

In the last part of the 20th century, research started to suggest that Piaget's theoretical account and empirical methods may have underestimated young children's abilities. New techniques were developed that used measures of infants’ looking rather than overt behavior such as reaching or verbal responses. These suggested that even very young infants understood basic principles of physics. For example, infants appeared, at some level, to understand that two solid objects cannot occupy the same physical space,[20] and could distinguish causal from noncausal motion events.[21] This work appeared to suggest that infants are endowed with a primitive understanding of objects and their mechanical interactions, agents and their goal-directed interactions, number systems, places, and spatial layouts, as well as the thinking of social partners.[22, 23]

In the field of language development, this approach was complemented by a tendency to imbue the child with language-specific knowledge and processes. It was theorized that children solved the problem of referential ambiguity through deductive hypothesis testing guided by constraints or strategies that narrow down the set of meanings considered for a novel word,[8, 24] or by understanding others' referential intent.[25] Thus, when a mom refers to her novel container full of coffee as a ‘mug,’ the child could use the whole object constraint to map the word to the drinking container. Later, when mom says ‘can you grab it by the handle?’, the child, who already knows ‘mug,’ could use the mutual exclusivity constraint to infer that the novel word (‘handle’) must map to something else.

Thus, children's systematic behaviors when learning new words were explained by a wide range of different mechanisms, often with competing proposals to explain the same behavior. Take, for example, the systematic way that children behave when they hear a novel name in the presence of both known and novel objects. The mutual-exclusivity constraint describes this as a sort of reasoning based on an assumption about how words work.[26] However, children's quick identification of novel referents in this situation could also derive from an understanding that novel names tend to go with novel categories.[16] Or children may follow a principle that no two words mean exactly the same thing, but that all word meanings contrast in some way.[27] Alternatively, it may be based on children's knowledge about others’ behaviors, e.g., an assumption that adults tend to name the most novel thing in a context.[28, 29] All of these lead to similar patterns of behavior, though from ostensibly different reasoning principles. They all, however, pin this behavior on the idea that the child comes to the table with some useful knowledge or assumptions about how to interpret new words.

In a similar way, the explanations proposed for the vocabulary spurt included many different language-specific processes. For example, there might be a shift from learning based on association to a conceptual understanding that words are not just associated with objects, but serve to refer to objects as part of a communicative system; that is, they act as symbols.[30] Alternatively, children might achieve a sudden insight about the nature of language itself, e.g., realizing that most objects tend to have names (the naming insight),[17] or that most words refer to categories of objects, not individual items.[31] These accounts offered detailed descriptions of children's behaviors when confronted with novel objects and complex learning scenarios.

One notable feature of all these specialized mechanism accounts of children's reference selection abilities and fast vocabulary growth, however, is that they are domain-specific—they rely on knowledge and processes that are tailored to the specific problems of learning words, and often to specific situations or specific sub-parts of the more general problem. A second notable feature is that they are, for the most part, static: these accounts do not suggest a mechanism for how these word learning behaviors develop. This question of developmental process—the causal events that give rise to the behaviors that support word learning—is driving a shift in the field.


Recently, researchers have begun to look more closely at the problem of referential ambiguity and to examine where these principles and biases that support word learning come from. This closer look gives fresh consideration to the possibility that domain general processes may enable word learning in the context of a sophisticated environment. This newer perspective suggests that more general learning and inference processes, processes that appear in many other domains of cognition, may underlie word learning, and sometimes even conspire to make children look and act as if they have knowledge that is highly specialized for the problem of learning words. This work opens the door to examining how word-learning behaviors are shaped by nonlinguistic aspects of the child's environment and the child's interaction with that environment. It suggests that children may be amazing word learners not because they are endowed with amazing innate abilities, but because they flexibly assemble a set of simple processes to rapidly learn many, many words.

One impetus for the emphasis on domain general processes comes from a novel view of the problem faced by children. The dominant framing of the problem of referential ambiguity largely derives from an adult-centric perspective: Adults know there are many possible ways to talk about a scene and see many possible referents for a novel word. Consequently, from the adult perspective, the problem of referential ambiguity looms large and may even seem insurmountable: there are just too many possible meanings for a new word in a new scene.

However, this may not accurately reflect the child's perspective. Recent work has examined the referent selection problem from the child's view using head-mounted cameras and eye-tracking systems.[32] It turns out that young word learners do not typically have large numbers of objects in view. Rather, because of their short arms and smaller stature, there are often only one or two objects in view when names are provided (Figure 2).[33] Thus, children confront a much narrower version of referential ambiguity than Quine assumed (contrast the top and bottom panels of Figure 2). While this does not rule in or out more abstract interpretations of a novel word (e.g., feelings and intentions), it goes a long way toward getting the child to the right object, and it raises the possibility that previously unexplored and more general factors, like the size of the visual field or the physical abilities of the child, play an important role.

Figure 2.

Differences in the number of namable objects in view from the child's (a) and parent's (b) perspective.

Similarly, children's selection of an unnamed object as the referent for a novel word might not be the outcome of a sophisticated deductive reasoning process, but instead the result of their attraction to the most novel object in a context.[34, 35] That is, even with no linguistic input, children tend to pay more attention to objects that are new.[36] Likewise, children tend to pay attention to things that are in their mother's hands; this simple attentional bias can often lead them to choose items that have recently been manipulated, offered, or touched, mimicking a process of social inference in which children appear to know what their mothers intend to refer to.[37, 38] Finally, parents often label whatever children happen to be attending to,[39] essentially solving the referential ambiguity problem for the child. All of these situations are ones that we used to think of as driven by knowledge (like mutual exclusivity) or skills (like social inference) geared to learning words. On closer inspection, however, they could also be the result of many general processes, like attentional biases, that work together to support infants’ selection of a referent in the moment. This does not, of course, rule out a role for such knowledge or skills, particularly later in development as children become cognitively and socially more advanced. However, it again highlights how rather unexpected domain-general factors could be doing much of the work, particularly early in infancy.


It is also clear that the act of referent selection is only part of the process—and perhaps not even the most important part. Children must still remember labels for new objects, they must store the visual or semantic features of the referent, and they must form a durable link between the two so that the word can be recognized. While the classic view suggested that constraints like mutual exclusivity were the basis for this learning, recent research suggests that this critical step in the process may not be as simple as it first appeared. That is, children may be able to figure out what object goes with a new word in order to respond to a parent's or experimenter's request, but that does not necessarily mean that they remember this mapping. For example, 2-year-old children are great at selecting a novel object when prompted with a novel word. However, when you test them on those same supposedly ‘fast-mapped’ words 5 min later, they are at chance.[40] This has not always been apparent because many prior studies did not test the children after a delay or failed to test their memory, instead retesting their ability to solve the mapping problem.

At the same time, though, retention is not divorced from the process of word learning. When children explore the to-be-named novel objects prior to the naming event, retention increases.[41] Retention abilities also improve over the course of early vocabulary development, such that by 2.5 years children do reliably demonstrate retention of word-referent mappings formed after brief exposure.[42] It thus appears that rather than instantaneously learning novel word–object mappings from the very earliest ages, children's word learning abilities grow as they acquire vocabulary[35, 43] and knowledge about things in the world, how they are named, how people talk about them,[33] and how people interact.[44]

But word learning involves much more than fast-mapping: learning a word is really a slow process of gradually determining what kinds of things the word refers to (see, e.g., Refs [15, 19, 35, 41, 42, 45-47]). In fact, recent studies have shown that children and adults can learn new word-object mappings even when no single encounter contains information that solves the referential ambiguity problem, that is, when all the objects in an encounter are equally novel.[48, 49] In this situation, learners appear to gradually accumulate information about how often a word is heard with each of many different objects, and to choose the most likely object for the word[50] (but see Ref [51]). This again suggests that retention is distinct from referent selection, since people appear able to retain words even without successful referent selection.
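
The accumulation process described above can be sketched in a few lines of Python. This is a minimal illustration that assumes a simple associative tally of word-object co-occurrences, which is only one of several proposed accounts (Ref [51] argues for alternatives). The function names, the toy words ‘blicket,’ ‘dax,’ and ‘wug,’ and the trials are all hypothetical. No single trial reveals which word goes with which object, yet the tallies across trials do.

```python
from collections import defaultdict


def cross_situational_counts(trials):
    """Tally how often each word is heard while each object is in view.

    Each trial is a (words, objects) pair. Nothing in a trial says which
    word goes with which object, so every word is credited to every object.
    """
    counts = defaultdict(lambda: defaultdict(int))
    for words, objects in trials:
        for word in words:
            for obj in objects:
                counts[word][obj] += 1
    return counts


def best_referents(counts):
    """Map each word to the object it has co-occurred with most often."""
    return {word: max(objs, key=objs.get) for word, objs in counts.items()}


# Hypothetical trials: each one is ambiguous on its own.
trials = [
    (["blicket", "dax"], ["ball", "cup"]),
    (["blicket", "wug"], ["ball", "shoe"]),
    (["dax", "wug"], ["cup", "shoe"]),
]

print(best_referents(cross_situational_counts(trials)))
# {'blicket': 'ball', 'dax': 'cup', 'wug': 'shoe'}
```

Each word co-occurs with its "correct" object twice and with every other object once, so the mapping emerges only from statistics pooled over encounters.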

Furthermore, longer-term learning is not quite the same as the processes children use to solve the referential ambiguity problem.[19] Rather, recent experiments on fast-mapping suggest that referent selection is not necessarily a logical inference problem. When children encounter a novel word, there are multiple possible interpretations. These compete during the short time between when the word is heard and when the child responds, and this competition is biased by a variety of domain-general processes like attention, selection, and the history of learning about the words and the objects; in older children, it may also be biased by things like their understanding of others’ intentions or their knowledge of the language.[52] At the end of this competition, the link between the word and the winning interpretation (e.g., the referent selected in that moment) is strengthened, while any links between that word and other possible referents are weakened.[13]
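
The competition-and-update cycle can likewise be sketched. This toy version assumes that each candidate referent's support is its learned association with the word plus an in-the-moment attentional bias (here, novelty); the candidate with the most support wins, its link is strengthened, and the losing links are weakened. The functions, the learning rate, the object name ‘toma,’ and the bias values are hypothetical, and a deterministic winner-take-all choice stands in for the graded, real-time competition described above.

```python
def select_referent(word, objects, assoc, bias):
    """Return the candidate object with the most combined support.

    Support for each candidate is its learned association with the word
    plus an in-the-moment bias (for example, novelty or attention).
    """
    def support(obj):
        return assoc.get((word, obj), 0.0) + bias.get(obj, 0.0)

    return max(objects, key=support)


def update_links(word, objects, winner, assoc, rate=0.1):
    """Strengthen the word's link to the winner; weaken links to the losers."""
    for obj in objects:
        old = assoc.get((word, obj), 0.0)
        if obj == winner:
            assoc[(word, obj)] = old + rate * (1.0 - old)  # move toward 1
        else:
            assoc[(word, obj)] = old - rate * old  # decay toward 0
    return assoc


# Hypothetical scene: a familiar ball and an unfamiliar object ("toma").
assoc = {("dax", "ball"): 0.05}  # weak link left over from an earlier co-occurrence
bias = {"toma": 0.2}  # the unfamiliar object draws extra attention
scene = ["ball", "toma"]

winner = select_referent("dax", scene, assoc, bias)
update_links("dax", scene, winner, assoc)
print(winner, assoc[("dax", "toma")], assoc[("dax", "ball")])
```

In this sketch the novel object wins only because of the attentional bias, yet the outcome resembles mutual exclusivity; no rule about words is involved.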

A critical insight here is that competition among potential interpretations, the basic mechanism underlying referent selection, is also the basis of behavior and development in a variety of other domains like music perception, categorization, visual search, and decision-making.[53] This suggests, then, that referent selection may fundamentally derive from general processes, though operating on linguistic, social, and visual inputs. Thus, as with the referential ambiguity problem, fast-mapping is starting to be seen as the product of multiple domain-general processes that do not contain specific knowledge about language.[19] As described next, recent theorizing on the vocabulary spurt reaches the same conclusion.


With the renewed emphasis on retention and on building links between words and meanings over many encounters, work on fast-mapping has started to focus on long-term processes that unfold over development. However, a sizeable body of research has examined even longer time scales, asking how the child's vocabulary (typically the number of words known) changes over the course of months or even years. Do we see a similar move toward domain general processes here? As it turns out, the answer is ‘yes.’

One of the most important phenomena in this domain is the so-called vocabulary spurt. The vocabulary spurt is defined by a rapid acceleration of the pace at which toddlers add new words to their productive vocabulary. As can be seen in Figure 3, in the first few months after children produce their first word, new words are added to the vocabulary slowly—one or two a week. Around the time that children have 50 words in their productive vocabulary, typically near 18 months of age, they start adding words much more quickly. Thus, there appears to be a nonlinear shift in vocabulary development.

Figure 3.

Number of words known as a function of time for individual children. (Adapted from Ref [54]. Copyright 1993 Cambridge University Press)

This phenomenon had previously been taken to indicate an underlying shift in the mechanisms supporting word learning. Explanations of this kind included the sudden onset of constraints or principles (like mutual exclusivity), the acquisition of skills for inferring other people's intentions (e.g., which object they intended to name), or a sudden insight about language like the naming insight. However, McMurray[55] demonstrated that the accelerating trajectory of the vocabulary spurt is actually a necessary consequence of two basic facts about word learning: (1) children learn multiple words at once[56] and (2) those words vary in difficulty (with most words being moderately difficult). Both facts are fairly noncontroversial.

With respect to the first fact (multiple words learned simultaneously), when a child is trying to learn the meaning of ‘cup,’ she is also trying to learn ‘dog,’ ‘run,’ ‘blue,’ ‘four,’ ‘share,’ and so on. With respect to the second (variable difficulty), ‘cup’ refers to a concrete object that is easy to individuate and is highly similar to other things called ‘cup.’ In contrast, many more words, like ‘share,’ refer to complex relational actions, abstract nouns, or properties that must be interpreted relative to an object. These are more difficult. McMurray showed mathematically that the combination of these two facts always produces an accelerating learning curve, whether the focus of the learning is words, motor patterns, or recipes. Thus, the vocabulary explosion can be explained without recourse to a change in mechanism and without the need for specialized processes.
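
The logic of this argument can be illustrated with a small simulation in the spirit of McMurray's account. It is a sketch under stated assumptions, not his published model: every word gains one unit of experience per time step (parallel learning), a word counts as learned once its experience reaches its difficulty threshold, and thresholds follow a normal distribution so that easy words are rare and moderately difficult words are common. The function name and all parameter values are hypothetical.

```python
from bisect import bisect_right
from statistics import NormalDist


def simulate_vocabulary(n_words=1000, mean=4000.0, sd=1000.0, steps=6000):
    """Return the number of words known at each time step.

    Assumptions (hypothetical, for illustration):
    - every word gains one unit of experience per step (parallel learning);
    - a word is known once its experience reaches its difficulty threshold;
    - thresholds are normally distributed, so easy words are rare.
    """
    dist = NormalDist(mu=mean, sigma=sd)
    # Evenly spaced quantiles give a deterministic, bell-shaped set of thresholds.
    thresholds = sorted(dist.inv_cdf((i + 0.5) / n_words) for i in range(n_words))
    # After `step` units of experience, every word whose threshold is <= step is known.
    return [bisect_right(thresholds, step) for step in range(steps + 1)]


vocab = simulate_vocabulary()
# Words added in each successive window of 500 steps; the learning rule never changes.
for start in range(0, 4000, 500):
    print(f"steps {start}-{start + 500}: +{vocab[start + 500] - vocab[start]} words")
```

With these settings, the number of words added per window keeps growing until roughly step 4000 (the mean difficulty), simply because more words have thresholds in the middle of the range than at the easy end.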

That is not to say that children's social skills are not also improving at this time, or that they are not developing new strategies that can assist in referent selection and/or retention. Indeed, developmental studies suggest there are big changes around this age in a number of abilities, including children's use of eye gaze,[57] their general pragmatic competence,[58] and categorization.[31] Likewise, exciting recent work demonstrates that as children's vocabularies grow, they become better able to use what they know about words,[59] how words go together,[60-62] and how people talk to each other[63] to learn even more words. However, such changes are not required to explain the spurt, which is an emergent consequence of a very simple property of learning.


New approaches to the problem of referential ambiguity, fast-mapping, and the vocabulary spurt illustrate a contemporary shift in theorizing regarding early word learning. This emerging view emphasizes the importance of domain-general processes like novelty, attention, statistical learning, association, competition, and parallel learning, as well as ecological factors like the properties of the body and communicative context.

More importantly, however, this new view suggests that all of these general processes are at the heart of early word learning and that they work together with developing social competencies—that also extend beyond the realm of word learning—to support and bootstrap both the child's initial lexical development and their growing representations of syntax and more complex linguistic mappings. Furthermore, these processes unfold dynamically over time. As a result, objects that draw the attention of a 15-month-old with a small vocabulary will present themselves differently when she is 30 months of age and knows more words and can engage in more complex linguistic interactions with others. In this way, domain-general processes that support word learning change over development to enable smart word learning to emerge from the joint action of multiple simple processes—none of which by itself is particularly smart. Thus, this perspective suggests that word learning is amazing not for being supported by domain-specific and special-purpose processes, but for the way simple, domain-general processes work together as a system to support flexibility and development.

This perspective is at its heart a developmental systems perspective: the idea that development is the product of bidirectional interactions between genes, biology, and the environment (see Blumberg, Development evolving: the origins and meanings of instinct, and Lickliter, Developmental evolution, WIREs Cogn Sci, also in the collection How We Develop), mediated by the real-time behavior of the child.[64-66] This perspective opens the door to a greater understanding both of how the child and environment influence each other and of how processes in different domains interact. For example, recent work suggests that the presence of a visual referent can boost children's ability to distinguish similar sounds[67] (but see also Ref [68]), and that 2-year-olds can use memories of what has been seen where to link names to objects.[69]

We are also starting to understand how these influences cascade over development. This is critical when we consider that word learning is not conducted in a vacuum: children must learn which words go with which meanings at the same time as they are learning how to produce and perceive speech. For example, Jana Iverson and colleagues have examined the later onset of complex babbling in children at risk for autism (see Wozniak et al., The development of autism spectrum disorders: variability and causal complexity, WIREs Cogn Sci, also in the collection How We Develop). Investigating the developmental precursors of this delay, they found that these children also show less mature visual-manual exploration, which in turn leads to less oral exploration of objects (e.g., mouthing objects), which impairs their articulatory development.[70-72] It is these kinds of developmental cascades, from primarily real-time behaviors like manual exploration and mouthing to longer-term developmental changes like the stability and precision of speech articulation, that create the articulation and auditory perception abilities that are the foundation for the word learning skills we have discussed here.

Furthermore, recent work suggests that these complex problems of development, which cross multiple levels (from perception, to action, to social interaction, to cognition) and multiple timescales (from in-the-moment behavior to learning), may actually be easier to solve simultaneously than in isolation. For example, recent computational modeling[73] suggests that as children acquire mappings between words and objects (as we have discussed here), these mappings may help early auditory organization by teaching them which sounds are meaningfully different.

This systems view may also pave the way for smarter interventions. It is well known that children's word knowledge can vary greatly with factors such as socioeconomic status, gender, and reading level. For example, children with language or hearing impairments know fewer words and know less about them.[74-76] But an overemphasis on the role of endowed knowledge and/or constraints offers little leverage when learning goes awry. For instance, if we believe the primary deficit in autism is an innate inability to understand the intentions of others, intervention must focus on changing that endowment. In contrast, Iverson's work suggests that interventions for children at risk for autism should focus on supporting early motor development: boosting infants’ abilities to manipulate and explore objects, which may cascade forward to increase oral exploration and support articulation, setting the stage for early communication. Similarly, recent research suggests that children with specific language impairment have a deficit in real-time processing such that competition between representations of words is not strong enough to resolve ambiguity during recognition. This could cascade forward to hurt future learning, because an inability to determine the correct word in the moment means that representations cannot be updated with new information. This raises the possibility that early interventions aimed at boosting competitive recognition processes may change the later course of word learning and language development in these children.

Of course, much work is needed to specify the relations between real-time behaviors, learning, and development. But recent changes in multiple aspects of the field—from experimental, observational, and statistical methods, to the theoretical view of where knowledge originates—open the door to a much richer understanding of a child's developing language system and may also offer multiple avenues for changing it.[77]


Preparation of this article was supported by National Institutes of Health (NIH) grants to LKS (HD045713) and BM (DC008089). The content is solely the responsibility of the authors and does not necessarily represent the official views of the National Institutes of Health.