In language acquisition research, a hypothesis is gaining ground that children may exploit various sources of low-level information available in the input to start individuating structural linguistic relations such as lexical categories. Because most sources of information are probabilistic, it is further hypothesized that the child must ultimately integrate them using simple learning mechanisms. Although the potential importance of word beginnings and endings has long been noted (Maratsos & Chalkley, 1980; Peters, 1985; Slobin, 1973, 1985), no empirical study has assessed their usefulness in learning syntactic categories, and we decided to make a quantitative estimate based on corpora of child-directed speech in English, French, Dutch, and Japanese. In this article, we have demonstrated that there are effective correlations between word-edge phonetic cues and lexical categories, which might potentially support the development of lexical knowledge.
We have also made suggestions for moving toward more plausible computational analyses of the early stages of language acquisition. Plausibility criteria were introduced from a developmental perspective. First, language knowledge is constructed progressively and some linguistic categorizations are learned before others. We considered that an initial state of lexical categorization may not involve fine-grained adult-like category distinctions. Rather, lexical categorization may start by distinguishing those categories that children seem to learn first, namely nouns and verbs, perhaps because these categories can map more directly to clear semantic properties of the world, such as objects for nouns and states/actions for verbs (Pinker, 1984). For this reason our analyses involved a coarse distinction between nouns, verbs, and other words, where “other” was a category on its own that would be split into finer-grained categories such as adjectives, determiners, etc. at a later stage. Another dimension of psychological plausibility pertains to the type of input representation considered: In this respect, we attempted to make minimal assumptions about the linguistic units available to the child and their perceptual accessibility at an early stage of development. After assessing the usefulness of linguistically-defined morphological affixes in Experiment 1, we showed that similarly good categorization results can be obtained by a naïve learner that simply focused on the first and last phoneme of a word—a more usable source of information because by their second year of life infants have developed a striking sensitivity to the sound patterns of their language as well as a sensitivity to word beginnings and endings.
Another crucial criterion of plausibility was generalizability from an original subset of learned word-form-to-category mappings to the rest of the lexicon, as well as generalizability of word-edge cues across different languages. Because discriminant analysis is a supervised method that requires providing a lexical category for all but one word at each step, we presented a largely unsupervised model in which the size of the pre-labeled corpus was minimal. In Experiment 3, a simple two-layer perceptron generalized its knowledge of word edges to predict the lexical category of unseen words, after being trained on word-to-category mappings for only a small subset of the whole corpus. In other words, the system only needs a limited number of labeled cases to “get off the ground.” The source of information for the labeled cases may come from consistent semantic word-to-world mappings (Pinker, 1984), from distributional information (e.g., Redington et al., 1998), from social cues (Tomasello, 2000), or from a combination of these cues. Finally, Experiments 4 through 6 suggested that the word-edge procedure extends to three languages other than English, and laid out the basis for future empirical studies with other languages.
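The idea of generalizing from a small labeled subset can be illustrated with a minimal sketch. It is not the network used in Experiment 3: it assumes the classic multiclass perceptron update rule, uses orthographic characters as stand-ins for phonemes, and relies on a tiny invented word list. The names `edge_features`, `train_perceptron`, and `predict` are hypothetical.

```python
from collections import defaultdict

def edge_features(word):
    """Return the two word-edge cues: the first and the last segment."""
    return ("B:" + word[0], "E:" + word[-1])

def predict(weights, word):
    """Pick the category whose summed cue weights are highest."""
    feats = edge_features(word)
    # sorted() keeps tie-breaking deterministic (alphabetical order)
    return max(sorted(weights), key=lambda cat: sum(weights[cat][f] for f in feats))

def train_perceptron(labeled, epochs=20):
    """Train input-to-output weights with the multiclass perceptron rule."""
    categories = sorted({cat for _, cat in labeled})
    weights = {cat: defaultdict(float) for cat in categories}
    for _ in range(epochs):
        for word, gold in labeled:
            guess = predict(weights, word)
            if guess != gold:
                # Strengthen the correct category, weaken the wrong guess
                for f in edge_features(word):
                    weights[gold][f] += 1.0
                    weights[guess][f] -= 1.0
    return weights

# Invented toy data (an assumption for illustration, not corpus data)
labeled = [
    ("walking", "Verb"), ("jumping", "Verb"),
    ("doggy", "Noun"), ("kitty", "Noun"),
    ("the", "Other"), ("this", "Other"),
]
weights = train_perceptron(labeled)
print(predict(weights, "running"))  # an unseen word sharing the "g" ending cue
```

In this toy setting the unseen word is categorized only on the basis of an ending cue learned from two labeled verbs, which is the kind of generalization the paragraph above describes.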
Some considerations are in order with respect to word-edge information: In our analyses, we chose the phoneme as the relevant unit. There is evidence that speakers of “stress-timed” languages such as English and Dutch show greater access to phonemes (e.g., Cutler, Mehler, Norris, & Segui, 1986; Cutler & Norris, 1988; Vroomen, van Zon, & de Gelder, 1996). It may be that children are sensitive to other word beginning and ending units larger than the phoneme. For instance, speakers of “syllable-timed” languages (e.g., French, Italian, Spanish, Catalan, and Portuguese) show a processing advantage for syllables (e.g., Mehler, Dommergues, Frauenfelder, & Segui, 1981; Morais, Content, Cary, Mehler, & Segui, 1989; Sebastián-Gallés, Dupoux, Segui, & Mehler, 1992), and Japanese adults use morae as the primary unit of segmentation (Otake, Hatano, Cutler, & Mehler, 1993). Future additional analyses done on such units might shed more light on this aspect. In Experiment 6 it was noted that to the extent that the mora in Japanese may correspond to a single phoneme our analysis is partly coextensive with a mora-unit analysis.
There are other reasons to believe that the phoneme as a unit has a fundamental role in language acquisition. In corpus analyses of English child-directed speech from CHILDES and other corpora, Hockema (2006) showed that the speech stream is primarily characterized by phoneme transitions that tend to be of just two kinds: those that occur within a word and those that occur between words. Preliminary results (Christiansen, Hockema, & Onnis, 2006; Christiansen, Onnis, & Hockema, 2008) suggest that a statistical learner that tracked transitional probability information between phonemes would be able to discover the boundaries of lexical units in continuous speech and, because these boundaries largely coincide with the word edges analyzed in this article, use such boundaries to start determining the lexical category of such units. Hence, statistical information about the distribution of phonemes in the speech signal can help attenuate two major language acquisition problems at the same time: speech segmentation and syntactic category assignment. These analyses thus indicate that the simplification in the current analyses involving a perfectly segmented speech corpus is not necessary for word-edge cues to be useful. Christiansen et al. (2008) outlined a plausible segmentation scenario in which the segmentation outcome was suboptimal but, more important, still provided a solid basis for the discovery of lexical categories using word-edge information. Further delineation of potential developmental trajectories of early word segmentation and lexical discovery is outside the scope of this article, although we are pursuing it in other work. Below, we expand on the results obtained in the discriminant analysis experiments and specify further post-hoc analyses.
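A rough sketch of how a learner might use phoneme transitional probabilities to posit word boundaries is given below. It is not the procedure of Christiansen et al. (2008): the threshold rule, the function names, and the toy utterances (built from three invented "words", with characters standing in for phonemes) are all assumptions made for illustration.

```python
from collections import Counter

def transitional_probabilities(utterances):
    """Estimate P(next segment | current segment) from unsegmented utterances."""
    pair_counts = Counter()
    first_counts = Counter()
    for utt in utterances:
        for a, b in zip(utt, utt[1:]):
            pair_counts[(a, b)] += 1
            first_counts[a] += 1
    return {pair: n / first_counts[pair[0]] for pair, n in pair_counts.items()}

def segment(utterance, tp, threshold):
    """Insert a boundary wherever the transitional probability is below threshold."""
    if not utterance:
        return []
    words, current = [], utterance[0]
    for a, b in zip(utterance, utterance[1:]):
        if tp.get((a, b), 0.0) < threshold:
            words.append(current)  # low-probability transition: close the word
            current = b
        else:
            current += b           # high-probability transition: stay in the word
    words.append(current)
    return words

# Invented unsegmented input made of the toy words "bi", "da", "ku"
utterances = ["bidaku", "kubida", "dakubi", "bikuda", "dabiku", "kudabi"]
tp = transitional_probabilities(utterances)
print(segment("bidaku", tp, threshold=0.75))
```

In this toy input, within-word transitions have probability 1.0 and between-word transitions 0.5, so a single threshold separates the two kinds of transition. The units recovered this way expose their first and last segments, which are the word-edge cues used in the categorization analyses.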
9.1. Further analyses I: From usefulness to usability
In discovering language structure from speech the search space is vast. Because specific mappings for a specific human language must be learned by the newborn child, a first step toward reducing the search space is to assess what statistical properties of the specific language are potentially useful and usable. Several studies reviewed here have indicated that the sound properties of words are both useful and used for grammatical categorization. In this article, we have proposed a distinction between usefulness and usability in computational models of language acquisition. Although many useful sources of information may be present in the raw input from birth, they may not become usable until later stages of perceptual development. Therefore, we introduce the notion of usability as a series of prerequisites for the psychological reality of language acquisition models. A first distinction between usefulness and usability incorporates a notable distinction between input and uptake. M. Harris (1992) defined uptake as “that part of the input that is actually attended to by the child.” To give an example, based on Santelmann and Jusczyk (1998) the uptake from the input for tracking non-adjacent relations such as is …-ing at 18 months is limited to three successive syllables (see also Gallaway & Richards, 1994, for a further distinction between uptake and intake).
Several studies have proposed that processing restrictions are actually beneficial to the child, in that they allow focusing on certain basic properties of language upon which to build further language at successive stages (see the “less is more” hypothesis; e.g., Elman, 1993; Newport, 1990). Hence, in our Experiment 2 we showed that cues that are more likely to be used by young children (phonemes at the edge of the word, as opposed to full-fledged morphological units) yield categorization comparable to that obtained with morphological information. In addition, the finding that word edges are useful in a variety of languages other than English (Experiments 4–6) lends considerable additional credence to the usability of these cues. Specific cues need not be the same across languages: A Venn diagram (Fig. 8) shows the partial overlap of the cues entered in the stepwise analyses for the various languages. More important, no single cue is universally important for all four of these languages.
In pursuing our goal of specifying usability in computational models of language, we were interested in comparing the validity of our word-edge cue procedure with other corpus-based estimates of useful phonological information. In particular, Monaghan et al. (2005) compiled extensive corpus measures of phonological information available in child-directed speech. They used 16 phonological cues that have been suggested to be relevant for discriminating between nouns and verbs and between function words and content words. The cues were at the word level (phoneme length, syllable length, presence of stress, and stress position), at the syllable level (onset complexity, syllabic complexity, reduced syllables, reduced first vowel, -ed inflection), and at the phoneme level (presence of coronals, initial unvoiced dental fricative, final voicing, nasal, stressed vowel position, vowel position, vowel height). Monaghan et al. reported classification results of 58.5% for Nouns and 68.3% for Verbs (61.3% overall classification). To compare the usefulness of our word edges directly with Monaghan et al.'s results, we ran a linear discriminant analysis on the simpler two-way (Noun/Verb) classification task. Hence, we entered only nouns and verbs in the analysis, leaving out all other words in the corpus. This resulted in 76.9% overall correct classification (Nouns = 83.5%, Verbs = 61.8%), which was significantly better than Monaghan et al.'s phonological cues (χ² = 192.12, p ≤ .001). Word-edge information requires the combination of only two very salient features, the first and the last phoneme of a word, and is arguably a simpler source of information than the 16 combined phonological cues. This fact, coupled with the better classification results of the discriminant analyses, increases the usability of word-edge information.
In the Introduction, we also argued that a working definition of usability should take into consideration the partial and non-adult-like status of children's initial lexical categories (Nelson, 1973; Tomasello, 2000), in particular when it comes to modeling the very early stages of language development. We, therefore, made the simplifying assumption that children would start by classifying the most relevant lexical categories, nouns and verbs, from the beginning, whereas other categories would be lumped together in a “super category,” namely Other. Although Monaghan et al. (2005) also made a similar starting assumption, they excluded the Other category altogether from their analysis. This is equivalent to the child being exposed to a corpus of only nouns and verbs, with all other words being completely wiped out from the input. Although not impossible, this simplification assumes a filtering process, requiring the child first to divide the lexicon into two super-categories—Nouns + Verbs on the one side, and Other words on the other. At a later stage the child would filter out the Other words in order to focus on categorizing nouns and verbs first. This step was not modeled by Monaghan et al. Conversely our three-way classification task maintained a plausible simplified non-adult-like categorization of the lexicon (Noun, Verb, Other) without the need to filter out the input from words “irrelevant” to the task. In this respect, our three-way classification task may be more psychologically plausible. For this reason, we also wanted to compare how our word-edge procedure fared with respect to the 16 phonological cues of Monaghan et al. on the more complex three-way classification task. Hence, we ran a discriminant analysis on our corpus6 using the 16 phonological cues as independent variables: We obtained an overall 46.4% classification (41.2% Nouns, 60.5% Verbs, and 44.3% Other). 
Word-edge cues of Experiment 2 were significantly better than Monaghan et al.'s cues on the three-category distinction (χ² = 186.19, p ≤ .001).
From our comparisons with Monaghan et al. (2005), we can draw a series of conclusions: First, the usefulness of Monaghan et al.'s phonological cues was confirmed even in the three-way classification, although clearly to a lesser degree than their original two-way classification. This lower performance is not problematic considering that a large portion of the lexicon was excluded in the two-way classification. Second, our simpler word-edge discovery procedure was better than the 16 phonological cues in both two-way and three-way classification tasks. However, the word-edge classification procedure has the advantage of having five times more dimensions available for carving up the word space, and this is likely to have contributed to the differences in classification performance between the two types of cues. Moreover, the phonological cues may be particularly useful for English verb classification (Christiansen & Monaghan, 2006), something that the word-edge analyses did not indicate for English, but only for Dutch, French, and Japanese. Given that there is little overlap between our word-edge cues and the 16 phonological cues used by Monaghan et al., children could potentially use both types of cues for lexical category discovery. Last, although performance on the three-way classification task was obviously lower than on the two-way classification task for both phonological cues and word edges, the three-way classification is more psychologically plausible because it does not exclude words from the input.
9.2. Further analyses II: Differential contribution of beginnings and endings
An important question in evaluating word-edge information is whether beginnings and endings contribute equally to classification or whether one of the two is more informative, and whether this is true across languages. To this end, we ran further discriminant analyses using only word-edge beginnings and only word-edge endings, respectively, as predictors of lexical category. We examined which words were correctly classified by these partial analyses, and also explored which words were correctly classified using beginning cues only but incorrectly classified using ending cues, and vice versa, the case where ending cues produced a correct classification but beginning cues resulted in incorrect classification. Finally, we also noted those words that were classified incorrectly by both analyses. There were two possibilities for the resulting classifications. It may be that the same words are correctly classified by analyses based on both cue types, or it may be that there is complementarity in the classifications: Those words incorrectly classified by, say, the beginning cues, may be correctly classified by the ending cues. Table 7 presents the results for English, showing the number of words on which the classifications agreed and disagreed. Hierarchical log-linear analyses were used to assess whether there are main effects and interactions between the classifications based on the different cue types and the Noun/Verb/Other category. One-, two-, and three-way log-linear analyses on the data shown in Table 7 were carried out. The one-way analyses refer to main effects in the table, the two-way analyses refer to interactions between two of the factors, and the three-way analysis tests whether there is a three-way interaction in the table. The one-way effect of Category (Noun, Verb, Other; χ²(24, N = 4730) = 3,104.83, p < .001) can be explained by there being more nouns than verbs and other words.
The one-way effects of beginning cues, χ²(24, N = 4730) = 3,583.89, p < .001, and ending cues, χ²(24, N = 4730) = 2,397.65, p < .001, reflected the fact that each classification assigned words to the correct category significantly more often than expected by chance. The two-way effects of Category by beginning cues, χ²(22, N = 4730) = 2,756.24, p < .001, and Category by ending cues, χ²(22, N = 4730) = 1,570, p < .001, indicate that the classifications were more successful overall for nouns and verbs than for other words, which is reasonable given that the Other category is a heterogeneous super category. The two-way effect of beginning cues by ending cues, χ²(22, N = 4730) = 2,049.05, p < .001, was due to ending cues being more effective in classifying words than the beginning cues. However, interpretation of these lower-level interactions must be moderated by the three-way interaction.
Table 7. Correct and incorrect classifications based on beginnings or endings
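[Table 7 body not recoverable from the source. For each category (Noun, Verb, Other), the table cross-tabulated correct and incorrect classifications by beginnings against correct and incorrect classifications by endings.]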
The three-way interaction (Category × Beginning × Ending; χ²(20, N = 4730) = 1,221.4, p < .001) suggests that the combination of beginning and ending information operates differently for nouns, verbs, and other words. The principal differences in the classifications in Table 7 are the number of words that the beginning and ending cues classify wrongly. For nouns, beginning cues misclassify almost twice as many words (1,282) as ending cues (706): 72% of Nouns incorrectly classified by beginnings were remedied by endings, whereas 49% of Nouns incorrectly classified by endings were remedied by beginnings. A similar trend can be seen for verbs: 62% of Verbs incorrectly classified by beginnings are remedied by endings, whereas 47% of Verbs incorrectly classified by endings are remedied by beginnings. Hence, it appears that whereas both beginnings and endings contribute to correct classification, endings contribute more for nouns and verbs. For the Other category, however, there is a preferential role for beginnings over endings: 30% of Other words misclassified by beginnings were remedied by endings, and 39% of Other words misclassified by endings were remedied by beginnings.
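The "remedied by" percentages can be read as conditional proportions over a two-by-two cross-tabulation for one category. The sketch below spells out that reading; the function name `remedy_rates` and the counts passed to it are hypothetical and are not the values of Table 7.

```python
def remedy_rates(both_right, only_beginning_right, only_ending_right, both_wrong):
    """Proportion of one cue type's errors that the other cue type classifies correctly."""
    # both_right is not needed for the rates; it is kept so the call mirrors the 2 x 2 table
    wrong_by_beginning = only_ending_right + both_wrong
    wrong_by_ending = only_beginning_right + both_wrong
    return {
        # Words missed by beginnings but correctly classified by endings
        "beginning_errors_remedied_by_endings":
            only_ending_right / wrong_by_beginning if wrong_by_beginning else None,
        # Words missed by endings but correctly classified by beginnings
        "ending_errors_remedied_by_beginnings":
            only_beginning_right / wrong_by_ending if wrong_by_ending else None,
    }

# Hypothetical counts for one category (an assumption for illustration)
rates = remedy_rates(both_right=500, only_beginning_right=100,
                     only_ending_right=300, both_wrong=100)
print(rates)
```

With these invented counts, endings remedy 75% of the beginning-based errors while beginnings remedy 50% of the ending-based errors, an asymmetry of the kind reported above for English nouns and verbs.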
How do beginnings and endings operate in Dutch, French, and Japanese? Do they contribute differentially to lexical categorization as in English, where endings seem particularly informative in determining nouns and verbs? If this were true, then a case could be made for the preferential role of ending cues across languages, as Slobin (1973) suggested. Therefore, we conducted one-, two-, and three-way hierarchical log-linear analyses for Dutch, French, and Japanese in the same way as for English above.
For Dutch, all main effects and interactions were highly significant (p < .001). Looking at the interactions, the classifications were more successful for nouns and verbs than for other words. For nouns, 62% of incorrect classifications by endings were remedied by beginnings, whereas 38% of incorrect classifications by beginnings were remedied by endings. For verbs, the opposite pattern applies: Only 26% of incorrect classifications by endings were remedied by beginnings, whereas 84% of incorrect classifications by beginnings were remedied by endings, suggesting that ending cues are very effective for classifying verbs, whereas beginning cues were particularly effective for nouns in Dutch. For Other words, there was no particular preference for beginning or ending cues, although each still corrected about 30% of the other's misclassifications. This pattern is somewhat different from English, where endings appear more effective for nouns and verbs, whereas beginnings work better for other words.
For French, all main effects and interactions were significant (p < .001). Endings overall contributed more to correct classification than beginnings, although to a lesser extent than English and Dutch. Both beginnings and endings contributed equally to correct each other's noun misclassifications (55% and 53% respectively), while endings contributed more for verbs (correcting 51% of misclassifications by beginnings) and other words (correcting 44% of misclassifications by beginnings). Again, this pattern is slightly different from English and Dutch.
For Japanese, all main effects and interactions were significant (p < .001). Beginnings were more useful for classifying nouns and other words, and endings were more useful for verbs: 51% of nouns, 64% of verbs, and 42% of other words misclassified by endings were remedied by beginnings, whereas 39% of nouns, 75% of verbs, and 32% of other words misclassified by beginnings were remedied by endings. This pattern is more similar to Dutch than to English and French, suggesting language-specific patterns of informativeness of the cues.
In pursuing these detailed investigations of word edges it is useful to keep in mind that the analyses do not tell us directly how the learning mechanism works, but they provide us with useful information to infer how a learning mechanism should work if it were to capitalize on word-edge information contained in the input. What the results of the cross-linguistic log-linear analyses reveal can be summed up as follows: First, beginnings and endings help each other in reducing misclassifications, and therefore a learning mechanism that capitalizes on this information for individuating lexical categories should be perceptually attuned to integrating both types of information simultaneously from an early stage. The integration part is particularly relevant. This supports earlier postulations of “operating principles” (Peters, 1985; Slobin, 1985) that the learning mechanism should pay attention to the boundaries of speech units. Second, there does not seem to be a universal preference in informativeness attached to the endings of words, at least based on the four languages investigated here, and this runs counter to Slobin's (1973) idea that a general cognitive bias for endings should be in place. Slobin's (1973) account was based on cross-linguistic evidence that locative markers in postverbal and postnominal positions (as in Hungarian) tend to be acquired before ones in preverbal and prenominal positions (as in Serbo-Croatian).
Our analyses and others (e.g., Fisher & Tokura, 1996) seem to suggest that a constellation of cues operates differently for different categories across languages, and as long as these cues complement each other when integrated, they will provide a solid statistical scaffolding for structure discovery that is directly available in the speech signal. Our analyses could be used in further behavioral and computational studies assessing the contribution of other sources of information integrated with word-edge information. We conclude that simple computational principles can be quite powerful even in isolation, although a complete account of language acquisition will require a combination of many simple computational principles for the detection and integration of multiple sources of probabilistic information.