1.1. Typological universals: Evidence of universal properties of the cognitive system?
A topic of debate in cognitive science since antiquity is the origin of knowledge, specifically, the relative contributions of environmental experience and learner-imposed structure. One classic field on which this debate has played out is grammatical knowledge. The existence of many statistical generalizations stating that seemingly arbitrary properties are shared by the world's languages—so-called typological universals—has been taken by some as compelling (if circumstantial) evidence that human learners are internally biased to impose those properties on the grammars they acquire, acting as agents of language change.1 That the debate is still very much alive is attested by contrary positions claimed to be superior, or generally more plausible, in recent work such as Nettle (1999), Bybee (2008), Evans and Levinson (2009), Tomasello (2009), and Levinson and Evans (2010). According to a mainstream version of the learning bias hypothesis, languages change over time because the statistical distribution of grammars acquired by one generation of learners is systematically different from the distribution of grammars deployed by the previous generation (Kroch, 2000; Lightfoot, 2006): Learners shift the distribution in favor of grammars exhibiting the properties preferred by their biases—these properties then emerge over time as typological universals (Kirby, 1999). Statistical mixtures are implicated in the variation that permeates language, at all levels. A language changes from an earlier state, predominantly exhibiting some particular pattern, to a later state, predominantly exhibiting a different pattern, as the statistical mixture of grammars used by speakers shifts away from the former pattern toward the latter.
This work addresses direct behavioral evidence that language learners do indeed shift grammar mixtures in favor of those grammars with properties observed to be favored typologically. In this article, we develop a computational learning model to formalize a notion of learning bias that can explain, quantitatively, how learners in artificial language-learning experiments systematically alter the language they are exposed to. In a particular case, we show that the bias identified by the model turns out to indeed formalize a prior preference for grammars obeying the relevant typological regularity (of word order).2
For the test case considered here, the relevant cross-linguistic regularity (see Table 1) is one that was identified by Joseph Greenberg in his groundbreaking typological work; he codified it as “Universal 18.”
Table 1. Greenberg’s Universal 18 bans languages combining the word orders Adjective–Noun and Noun–Numeral (shaded cell).
(1) Universal 18: Languages that [predominantly] order adjectives before nouns also [predominantly] order numerals before nouns—but not conversely (Greenberg, 1963, paraphrased).
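To make the implicational form of (1) concrete, the four logically possible combinations of modifier orders can be enumerated and checked. The sketch below is our own illustration, not part of the original study; the order labels are just mnemonic strings:

```python
from itertools import product

# Sketch (ours): check Greenberg's Universal 18 read as an implication,
# Adj-N => Num-N: pre-nominal adjectives require pre-nominal numerals.

def violates_universal_18(adj_order, num_order):
    """True only for the typologically rare combination banned by
    Universal 18: pre-nominal adjectives with post-nominal numerals."""
    return adj_order == "Adj-N" and num_order == "N-Num"

for adj, num in product(["Adj-N", "N-Adj"], ["Num-N", "N-Num"]):
    status = "banned" if violates_universal_18(adj, num) else "attested"
    print(f"{adj} & {num}: {status}")
```

Of the four combinations, only Adj-N together with N-Num is flagged, corresponding to the single shaded cell of Table 1.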
This may be one of those “seemingly arbitrary” typological generalizations which, to some, suggest that learners impose properties on languages. And indeed, a bias parallel to Universal 18 appeared to be at work in the artificial language-learning experiment reported in Culbertson et al. (2012). The goal of the present article is to quantify, in a formal learning model, the bias exhibited by these learners—explaining the basis of such a bias is beyond the scope of this article (but see Culbertson et al., 2012, for much discussion). We will treat the learners’ bias as crucially related to the properties of the linguistic structures involved, not entirely reducible to more general factors.
The model we propose addresses a general artificial language-learning paradigm in which participants are exposed to an artificial language displaying variation, motivated by the view of language change discussed above. This experimental paradigm was introduced by Hudson Kam and Newport (2005, 2009) in a line of research showing that, under certain conditions, adult and child learners will reduce the degree of variation in—regularize—a language. The bias of learners to shift mixtures to make them more regular and their bias to favor grammars that respect typological regularities like Universal 18 turn out to interact in a way that makes this experimental paradigm a particularly sensitive one for revealing those biases. The model is tested on learning data (reported in Culbertson et al., 2012) for languages generated by statistical mixtures of rules governing the order of adjectives, numerals, and nouns in simple, two-word phrases. The data comprise utterances produced by participants after exposure training. The fundamental hypothesis, supported by the data reviewed below, is that learners will regularize only in the direction of grammars that obey the constraints exhibited in typological generalizations.
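As a rough illustration of the paradigm (our own sketch; the probabilities and counts are made up, not those of the experiment), exposure data can be generated from a statistical mixture of word-order rules, and the rate of pre-nominal use computed for each modifier type. A learner regularizes when its production rates are more extreme, closer to 0 or 1, than the exposure rates:

```python
import random

def sample_phrases(p_adj_pre, p_num_pre, n=100, rng=None):
    """Sample two-word phrases from a probabilistic word-order mixture.
    p_adj_pre / p_num_pre: probability of pre-nominal order for each
    modifier type (illustrative values, not the experiment's)."""
    rng = rng or random.Random(0)
    phrases = []
    for _ in range(n):
        mod = rng.choice(["Adj", "Num"])
        p_pre = p_adj_pre if mod == "Adj" else p_num_pre
        order = (mod, "N") if rng.random() < p_pre else ("N", mod)
        phrases.append(order)
    return phrases

def pre_nominal_rate(phrases, mod):
    """Proportion of pre-nominal uses for a given modifier type."""
    uses = [ph for ph in phrases if mod in ph]
    return sum(ph[0] == mod for ph in uses) / len(uses)

# Illustrative exposure: adjectives mostly pre-nominal, numerals mostly
# post-nominal; a regularizing learner would push these rates toward 1 and 0.
training = sample_phrases(p_adj_pre=0.7, p_num_pre=0.3, n=200)
print(round(pre_nominal_rate(training, "Adj"), 2))
print(round(pre_nominal_rate(training, "Num"), 2))
```

Comparing a participant's production rates against the exposure rates in this way is what allows the degree and direction of regularization to be quantified.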
This article proceeds as follows: in Section 2, we review the relevant aspects of the experimental results we model as our test case; in Section 3, we provide some background, introducing the Bayesian approach to learning. In Section 4, we start with a high-level, conceptual introduction to the model and then move on to a technical description. In Section 5, we describe how the model parameters are fit and report the results of the modeling process. Section 6 summarizes our findings and presents conclusions that we would like to draw from them concerning learning and linguistic typology. First, however, we discuss models of artificial language-learning experiments that have been proposed previously and highlight the novel contribution made by this work.
1.2. Previous work
When studying any cognitive function, it is important to separate those phenomena that can be best characterized in domain-independent terms from those that cannot. This is especially true in the case of language, given the debate mentioned above concerning the extent to which language learning relies on general cognitive mechanisms rather than principles specific to the domain of language. To understand the questions at stake in the research reported here, and the nature of the original contribution, requires consideration of the domain-dependence issue, which arises in particular in Bayesian modeling of artificial language learning. Previous work in this area has addressed issues in syntax and word learning using domain-independent models, while the work we report here formalizes a problem in a domain-specific form.
A Bayesian model of an artificial language-learning experiment reported in Wonnacott, Newport, and Tanenhaus (2008) is presented in Perfors, Tenenbaum, and Wonnacott (2010) (for a similar study, see Hsu & Griffiths, 2009). The experiment examines the relationship between distributional information in the input and learners’ willingness to generalize—that is, allow novel items, here verbs, to appear in multiple syntactic constructions. Learners exposed to training data in which all verbs are presented in both constructions are contrasted with learners receiving data in which each verb is presented in only one construction. The key question is, given a novel verb presented in one construction, will it be predicted to be felicitous in the other construction as well? In the experiment, and one version of the model, the finding is that learners are more likely to predict that the verb can be used in the unpresented construction when verbs in the training data did so.
The problem Perfors et al. (2010) are fundamentally interested in concerns what learners infer about some set of data in the absence of negative evidence. Although in the Wonnacott et al. (2008) experiment, data are verbs and constructions, this question arises in a number of other domains, and Perfors et al. (2010) thus seek to explain learners’ behavior using a domain-general model. Models of this kind allow us to see precisely how part of the problem of learning which verbs may appear in which constructions—a well-studied problem in language acquisition—can be formally understood solely in the general terms of the distribution of items into categories.
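The flavor of this distributional reasoning can be caricatured in a few lines. This is a toy empirical sketch of our own, not Perfors et al.'s actual hierarchical model: the more training verbs alternate between constructions, the higher the predicted probability that a novel verb will too:

```python
def predict_generalization(verb_counts):
    """Toy sketch (not Perfors et al.'s model): estimate the probability
    that a novel verb, seen in only one construction, can also appear in
    the other, by pooling how often training verbs alternated.
    verb_counts: list of (uses in construction A, uses in construction B)."""
    alternating = sum(1 for a, b in verb_counts if a > 0 and b > 0)
    # Laplace smoothing so the estimate stays strictly between 0 and 1
    return (alternating + 1) / (len(verb_counts) + 2)

# Training where every verb alternates vs. training where none does:
print(predict_generalization([(5, 5), (6, 4), (3, 7)]))   # high
print(predict_generalization([(10, 0), (0, 10), (8, 0)]))  # low
```

The point of the sketch is only that the prediction for the novel verb depends on the category-level distribution of the training items, with no reference to anything specifically linguistic.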
This work addresses a case of what is hypothesized to be specifically a linguistic rather than domain-general structure. In particular, we ask whether cross-linguistic generalizations about this structure are psychologically real—specifically, manifest in the biases of the cognitive language-learning mechanism. In the artificial language-learning experiment being modeled (Culbertson et al., 2012), the stimuli to which learners are exposed are two-word phrases consisting of a modifier—an adjective or a numeral—and a noun. The word order of the phrase locates the modifier in either pre- or post-nominal position. For the design of the experiment and of the model, the hypothesis is that these linguistic (or substantive) dimensions are critical; the particular modifier types and word orders are not interchangeable. This is because the hypothesis under investigation is that a language in which adjectives are predominantly pre-nominal, and numerals are predominantly post-nominal, will be less learnable than the language resulting from exchanging the word orders of the modifier types, all else equal. This is just what Greenberg’s Universal 18 predicts, under our general hypothesis that if a language type is very rare among the world’s languages, then it is less learnable—or more precisely, that the human language-learning system is biased against those linguistic patterns that are observed to be typologically rare.
In addition to linguistic-structure-specific issues, the Culbertson et al. (2012) experiment also involves the domain-general question of how learners treat variation. Regularization—reduction of variation—was the subject of work by Reali and Griffiths (2009), who developed a Bayesian model of artificial language-learning experiments in which participants are exposed to novel objects which are given inconsistent labels (two novel labels are used for each object, with varying frequencies). As with the Perfors et al. (2010) study, Reali and Griffiths (2009) show that a domain-general model can capture an important aspect of learning behavior—in this case the trajectory of regularization over generations of speakers. Interestingly, Reali and Griffiths (2009) also show that not all sources of variation are subject to regularization by learners; in another experiment, they show that the variation associated with the outcome of a coin toss is not regularized. In the work presented here, we are interested in how regularization interacts with Universal 18’s specific predictions concerning the substantive linguistic biases of learners. According to our hypothesis, regularization only proceeds in directions favored by substantive biases; hence, degree of regularization serves as an index that we can use to infer the particular content of those biases.
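The regularization dynamic can be sketched as iterated Bayesian learning with a symmetric Beta prior. This is a toy model in the spirit of Reali and Griffiths (2009), with made-up parameter values: a concentration parameter below 1 places prior mass near categorical probabilities, so frequencies drift toward 0 or 1 across simulated generations:

```python
import random

def iterate_learners(p0, alpha=0.2, n=10, generations=100, rng=None):
    """Toy iterated learning with a Beta(alpha, alpha) prior.
    Each generation observes n tokens produced with the previous
    generation's probability p, samples a new p from its posterior,
    and passes data on.  alpha < 1 puts prior mass near 0 and 1,
    so variable input tends to regularize over generations."""
    rng = rng or random.Random(1)
    p = p0
    for _ in range(generations):
        k = sum(rng.random() < p for _ in range(n))    # observed count
        p = rng.betavariate(k + alpha, n - k + alpha)  # posterior sample
    return p

# Start from a variable 60/40 language (illustrative numbers)
print(round(iterate_learners(0.6), 3))
```

Because the chain's stationary distribution mirrors the prior, the final probabilities cluster near 0 and 1 even though each individual learner makes only a small inferential step; this is the sense in which degree of regularization can serve as an index of the underlying bias.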
Our focus on the substantive, specifically linguistic aspects of Universal 18 affects the formal structure of our model. The bias—prior probability distribution—must depend jointly on the relevant substantive linguistic dimensions. For according to Universal 18, there is no asymmetry between pre- and post-nominal position for adjectives alone, nor for numerals alone: Both types of modifiers occur with great frequency in both positions across the languages of the world. What is very rare is the combination, within a single language, of adjectives in a particular position—pre-nominal—with numerals in a different position—post-nominal. This demands a new type of prior distribution, which we will develop below (section 4).
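To preview the idea informally (a caricature with made-up numbers, not the prior developed in section 4): such a prior can treat each modifier type symmetrically on its own yet assign low probability to the joint combination that Universal 18 bans:

```python
import math

def log_prior(p_adj_pre, p_num_pre, penalty=5.0):
    """Unnormalized log prior over word-order probabilities (toy sketch).
    Each dimension alone is treated symmetrically, but the combination
    pre-nominal adjectives + post-nominal numerals (the pattern banned
    by Universal 18) is downweighted.  'penalty' is illustrative."""
    # Symmetric Beta(0.5, 0.5)-style terms favoring regular grammars
    base = -0.5 * (math.log(p_adj_pre * (1 - p_adj_pre))
                   + math.log(p_num_pre * (1 - p_num_pre)))
    # Interaction term: large when adjectives are pre-nominal and
    # numerals are post-nominal at the same time
    violation = p_adj_pre * (1 - p_num_pre)
    return base - penalty * violation

# The banned corner gets lower prior than its mirror image:
print(log_prior(0.9, 0.1) < log_prior(0.1, 0.9))   # True
```

The essential property is the interaction term: neither marginal distribution is asymmetric, yet one of the four corners of the space is penalized, which is exactly the shape of Universal 18.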