Learning Phonemes With a Proto-Lexicon
Article first published online: 17 SEP 2012
Copyright © 2012 Cognitive Science Society, Inc.
Volume 37, Issue 1, pages 103–124, January/February 2013
How to Cite
Martin, A., Peperkamp, S. and Dupoux, E. (2013), Learning Phonemes With a Proto-Lexicon. Cognitive Science, 37: 103–124. doi: 10.1111/j.1551-6709.2012.01267.x
- Issue published online: 17 JAN 2013
- Received 26 May 2011; received in revised form 28 November 2011; accepted 5 March 2012
Keywords
- First language acquisition
- Statistical learning
- Allophonic rules
Before the end of the first year of life, infants begin to lose the ability to perceive distinctions between sounds that are not phonemic in their native language. It is typically assumed that this developmental change reflects the construction of language-specific phoneme categories, but how these categories are learned largely remains a mystery. Peperkamp, Le Calvez, Nadal, and Dupoux (2006) present an algorithm that can discover phonemes using the distributions of allophones as well as the phonetic properties of the allophones and their contexts. We show that a third type of information source, the occurrence of pairs of minimally differing word forms in speech heard by the infant, is also useful for learning phonemic categories and is in fact more reliable than purely distributional information in data containing a large number of allophones. In our model, learners build an approximation of the lexicon consisting of the high-frequency n-grams present in their speech input, allowing them to take advantage of top-down lexical information without needing to learn words. This may explain how infants have already begun to exhibit sensitivity to phonemic categories before they have a large receptive lexicon.
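The proto-lexicon idea described above can be illustrated with a minimal Python sketch. It is not the authors' implementation. The function names, the n-gram length range, the frequency threshold, and the toy transcriptions are all assumptions made for illustration. The sketch collects high-frequency segment n-grams as a stand-in lexicon, then lists the segment pairs that distinguish two stored forms differing in exactly one position. How that evidence is weighed when grouping allophones into phonemes is left open here.

```python
from collections import Counter
from itertools import combinations


def build_proto_lexicon(utterances, min_n=2, max_n=4, min_count=2):
    """Return the set of segment n-grams seen at least `min_count` times.

    `utterances` is a list of segment sequences. The length range and the
    threshold are illustrative assumptions, not values from the article.
    """
    counts = Counter()
    for utt in utterances:
        for n in range(min_n, max_n + 1):
            for i in range(len(utt) - n + 1):
                counts[tuple(utt[i:i + n])] += 1
    return {gram for gram, c in counts.items() if c >= min_count}


def minimal_pair_contrasts(lexicon):
    """Return segment pairs that distinguish two forms differing in one position."""
    by_length = {}
    for form in lexicon:
        by_length.setdefault(len(form), []).append(form)

    contrasts = set()
    for forms in by_length.values():
        for f, g in combinations(forms, 2):
            diffs = [(a, b) for a, b in zip(f, g) if a != b]
            if len(diffs) == 1:
                contrasts.add(frozenset(diffs[0]))
    return contrasts


if __name__ == "__main__":
    # Hypothetical toy input: one word heard with two final-segment variants.
    toy_utterances = [list("kanaR"), list("kanaX"), list("kanaR"), list("kanaX")]
    lexicon = build_proto_lexicon(toy_utterances)
    contrasts = minimal_pair_contrasts(lexicon)
    print(frozenset({"R", "X"}) in contrasts)  # True
```

Because the stored forms are raw n-grams rather than segmented words, the sketch also reports contrasts between unrelated fragments (for example, `ka` versus `na`), so any use of this evidence would need further filtering.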