Testing the Limits of Long-Distance Learning: Learning Beyond a Three-Segment Window


Correspondence should be sent to Sara Finley, Department of Brain and Cognitive Sciences, University of Rochester, Meliora Hall, Rochester, NY 14627. E-mail: sfinley@bcs.rochester.edu


Traditional flat-structured bigram and trigram models of phonotactics are useful because they capture a large number of facts about phonological processes. Additionally, these models predict that local interactions should be easier to learn than long-distance ones because long-distance dependencies are difficult to capture with these models. Long-distance phonotactic patterns have been observed by linguists in many languages, who have proposed different kinds of models, including feature-based bigram and trigram models, as well as precedence models. Contrary to flat-structured bigram and trigram models, these alternatives capture unbounded dependencies because at an abstract level of representation, the relevant elements are locally dependent, even if they are not adjacent at the observable level. Using an artificial grammar learning paradigm, we provide additional support for these alternative models of phonotactics. Participants in two experiments were exposed to a long-distance consonant-harmony pattern in which the first consonant of a five-syllable word was [s] or [∫] (“sh”) and triggered a suffix that was either [-su] or [-∫u] depending on the sibilant quality of this first consonant. Participants learned this pattern, despite the large distance between the trigger and the target, suggesting that when participants learn long-distance phonological patterns, that pattern is learned without specific reference to distance.

1. Introduction

An important aspect of human cognition is how patterns and associations are learned at a distance. In language, the vast majority of patterns occur between adjacent elements (Chomsky, 1981; Culicover & Wilkins, 1984). However, there are a large number of apparent exceptions to this generalization. These long-distance patterns pose important challenges for both theoretical and computational models of human cognition (Gomez, 2002). While the majority of computational and experimental work concerning long-distance phenomena has focused on the sentence level (Misyak & Christiansen, 2007; Misyak, Christiansen, & Tomblin, 2009; Onnis, Christiansen, Chater, & Gomez, 2003; Onnis, Monaghan, Christiansen, & Chater, 2004), the same issues apply at the word level. Specifically, there are important challenges for explaining long-distance patterns in phonotactics, the rules that govern possible sound sequences in a language.

Phonotactic patterns in language have three levels of locality. The most local level is adjacency; patterns apply to sounds that are adjacent. In nasal place assimilation, a coronal nasal takes on the place of articulation of the following, adjacent stop consonant (e.g., the phrase “run pool” would be pronounced as “rum pool” after nasal place assimilation, but “run open” remains unchanged because the [n] and the [p] are non-adjacent). The second level of locality, referred to in this article as first-order non-locality, is when the segments involved in the pattern need not be adjacent, but they are local on the level of vowel or consonant. For example, in vowel harmony, vowels must agree in feature values with the nearest vowels to the left or right, but these vowels need not be adjacent, as consonants can intervene. The most extreme cases of non-locality in phonotactics, referred to in the present article as second-order non-locality (often referred to as “unbounded”1), occur when a phonotactic pattern can apply across distances that allow for intervening consonants and vowels.

The differences between first-order and second-order phonotactics are most clearly demonstrated in consonant harmony. Consonant harmony is a phonotactic pattern in which consonants must agree in some phonological feature, such as place of articulation. One of the most common varieties of consonant harmony involves sibilant fricatives [s] and [∫] (“sh”). In sibilant harmony, [s] and [∫] alternate such that if a word contains the sound [∫], all preceding (or following) [s] sounds become [∫]. For example, Navajo shows alternations in prefixes between [si-] and [∫i-] (Hansson, 2001; Heinz, 2010; McDonough, 1990; Rose & Walker, 2004; Sapir & Hoijer, 1967). The prefix [si-] appears if the stem contains the alveolar fricatives /s, z, ts, ts’, or dz/, as in [site:z] “my car,” while [∫i-] appears if the stem contains the post-alveolar fricatives /∫, t∫ (“ch”), ʒ, or dʒ/, as in [∫it∫ii:h] “my nose.”

In first-order long-distance consonant harmony, a vowel can readily intervene between two sibilant consonants (they need not be directly adjacent). In the Navajo example above, [s] and [∫] alternate in the perfective prefix, but the vowel [i] remains constant ([si-] vs. [∫i-]). In second-order long-distance consonant harmony, irrelevant consonants are ignored. For example, the form [∫ite:ʒ] “we are lying” involves agreement between the first and last consonants, skipping the middle consonant. The distances between the two relevant consonants can be quite large. Hansson (2001) cites several examples from Chumash, which has a sibilant harmony system similar to that of Navajo, in which agreement occurs across several sound segments; in [ha∫xintilawa∫] “his former Indian name,” for example, nine segments (four of which are vowels) intervene between the two sibilant consonants.

Traditional linguistic models of first-order and second-order non-locality use hierarchical, tier-based structures to explain the differences between local and non-local processes in phonology (Archangeli & Pulleyblank, 1994; Booij, 1984; Clements, 1976, 1977; Goldsmith, 1975). Tier-based models of phonotactics are based on the assumption that linguistic structures are hierarchically organized.2 Tier-based models maintain a theoretical notion of adjacency because different segments of particular feature values are placed onto different representational dimensions (or “tiers”). In Fig. 1, the only sibilant consonants in [∫ite:ʒ] are [∫] and [ʒ]. Because sibilants are given their own tier, the agreement of features can be represented locally, despite the fact that the stop consonant [t] intervenes on the surface.

Figure 1. Tiered representations.

The hierarchically structured tier-based model posed by theoretical linguists contrasts significantly with traditional computational models of phonotactics that employ a flat-structured representation for segments. The most common computational method for capturing linguistic data uses n-gram models (Jurafsky & Martin, 2008). The basic premise behind an n-gram model of phonotactics is that words are divided into chunks of length n (e.g., length of 2 for bigrams, length of 3 for trigrams, etc.). For example, the word [gredId] “graded” has four trigrams: [gre], [red], [edI], and [dId]. Statistics calculated over these strings determine the restrictions on sound sequences in words. For example, the restriction against words in English beginning with pt is captured by the fact that there are no word-initial pt bigrams. One of the major appeals of n-gram models is that they do not require abstract and hierarchical information to model phonotactic patterns. Because the majority of phonotactic patterns occur within a set of three adjacent segments, trigrams are sufficient to capture most phonotactic patterns (Albright, 2009; Hayes & Wilson, 2008; Heinz, 2007). In addition, n-gram models, while not intended to be a true model for human cognition, tend to outperform models based on linguistic theory (Chelba & Jelinek, 2000; Jurafsky & Martin, 2008).
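The chunking procedure described above can be illustrated with a minimal sketch. It assumes that each character of a transcription stands for one segment; the function name `ngrams` and the four-word toy lexicon are hypothetical and serve only as an illustration, not as a model from the literature.

```python
from collections import Counter

def ngrams(segments, n):
    """Return every contiguous chunk of length n, in left-to-right order."""
    return [tuple(segments[i:i + n]) for i in range(len(segments) - n + 1)]

# Each character stands for one segment, as in the transcription [gredId].
trigrams = ngrams(list("gredId"), 3)
print(["".join(t) for t in trigrams])  # ['gre', 'red', 'edI', 'dId']

# Hypothetical toy lexicon: count the bigram that begins each word.
toy_lexicon = ["gredId", "trAks", "stap", "plen"]
initial_bigrams = Counter(tuple(word[:2]) for word in toy_lexicon)

# A word-initial "pt" never occurs, so its count is zero.
print(initial_bigrams[("p", "t")])  # 0
```

A real model would estimate probabilities from a large lexicon rather than check raw counts, but the representation is the same: only adjacent chunks are visible to the statistics.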

It is unclear whether n-gram models are capable of accounting for long-distance patterns without reference to structural elements (Gomez, 2002; Hayes & Wilson, 2008; Heinz, 2007; Onnis et al., 2003). First, modeling long-distance patterns with large n-grams reduces the computational simplicity of the model and creates a data sparsity problem. As n increases, the statistics become less useful and require more powerful computations (Jurafsky & Martin, 2008). Second, there is a question of how n should be chosen. One could store all possible n-grams (up to the length of the longest word in the language) or limit the algorithm to a particular value of n at the risk of missing patterns active at longer distances. This lack of a principle for limiting traditional n-gram statistics leads to the hypothesis that if n-gram statistics bear psychological reality, they are not the sole method for learning and representing data. Further, it leads one to expect that n-gram models must be buttressed by some other mechanism to account for long-distance patterns, as well as the principled nature of phonotactics (e.g., the fact that the majority of phonotactic restrictions are based on phonetic principles).
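The sparsity problem can be made concrete with a rough count of how many distinct n-grams are logically possible. The sketch below assumes a hypothetical inventory of 13 segments; the inventory size and the function name `possible_ngrams` are illustrative assumptions.

```python
def possible_ngrams(inventory_size, n):
    """Number of distinct segment strings of length n over the inventory."""
    return inventory_size ** n

INVENTORY_SIZE = 13  # assumed toy inventory, e.g. 8 consonants and 5 vowels

for n in range(2, 9):
    # The space of possible n-grams grows exponentially with n, so a fixed
    # amount of training data covers an ever smaller fraction of it.
    print(n, possible_ngrams(INVENTORY_SIZE, n))
```

With this inventory there are 169 possible bigrams and 2,197 possible trigrams, while an 8-gram window already allows hundreds of millions of strings, far more than any learner could observe.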

One possibility is to incorporate a tier-based structure (Archangeli & Pulleyblank, 1994; Booij, 1984; Clements, 1976, 1977; Goldsmith, 1975) into an n-gram model of phonotactics (Hayes & Wilson, 2008). An n-gram model without tiers can describe a first-order long-distance dependency because the number of segments intervening between non-adjacent elements is restricted in most cases. However, n-gram models without tiers cannot accurately account for second-order long-distance dependencies because some consonants (or vowels in the case of vowel harmony) will also have to be erased before an n-gram model can be effective. The addition of tiers into an n-gram model would allow unbounded agreement between sibilants to be modeled with bigrams on a sibilant tier, as well as other principles of phonotactics.3
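A tier-based bigram check of the kind described above can be sketched as follows. This is a simplified illustration, not the Hayes and Wilson (2008) learner: it assumes that "S" is an ASCII stand-in for [∫], that the sibilant tier contains only [s] and [∫], and that agreement means identity of adjacent tier elements. The function names are hypothetical.

```python
SIBILANTS = {"s", "S"}  # "S" is an assumed ASCII stand-in for the "sh" sound

def project_tier(word, tier=SIBILANTS):
    """Keep only the segments that belong to the tier, preserving order."""
    return [seg for seg in word if seg in tier]

def tier_bigrams(word, tier=SIBILANTS):
    """Bigrams computed over the projected tier rather than the surface string."""
    projected = project_tier(word, tier)
    return list(zip(projected, projected[1:]))

def is_harmonic(word):
    """A word is harmonic if every pair of tier-adjacent sibilants agrees."""
    return all(a == b for a, b in tier_bigrams(word))

print(tier_bigrams("sokibosu"))       # [('s', 's')]
print(is_harmonic("sokiboSu"))        # False
print(is_harmonic("haSxintilawaS"))   # True
```

Because intervening material is removed before the bigrams are computed, the Chumash form with nine intervening segments is handled by the same single bigram as a form with one intervening vowel.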

Another alternative to traditional n-gram models is to model precedence relationships (Hayes & Wilson, 2008; Heinz, 2010; Heinz & Rogers, 2010). In precedence models, restrictions on sound sequences are based on frequencies of pairs of sounds that need not be adjacent but must be in a precedence relation (e.g., ts but not st in trucks). Precedence relations are stated in terms of two segments, but these segments need not be adjacent. For example, the consonants in the form [sokibosu] have the following precedence relations: [s-k], [s-b], [s-s], [k-b], [k-s], [b-s].4 By calculating statistics over the precedence relationships, the model can find the principled representations. For example, over a corpus of items in a sibilant harmony language, the model will find precedence relationships [s-s] and [∫-∫], but not [s-∫] and [∫-s], suggesting a sibilant harmony constraint. Precedence relations do not encode distance, meaning that segments need not be adjacent or separated by featural representations in order to be encoded in the model. This allows the model to represent unbounded phonotactic processes. It is important to note, however, that these alternatives are designed to capture long-distance patterns, not adjacent patterns. As Heinz (2010) notes, the precedence model is designed to supplement a strictly 2-local learner, rather than supplant it.
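The precedence relations listed above can be enumerated mechanically. The sketch below is an illustration of the representation only, not Heinz's (2010) learner: it assumes "S" as an ASCII stand-in for [∫], restricts attention to consonants as in the example, and uses a small toy corpus assembled from harmonic forms that appear in this article.

```python
from collections import Counter
from itertools import combinations

CONSONANTS = set("sSptkbdg")  # "S" is an assumed stand-in for the "sh" sound

def precedence_pairs(word, segments=CONSONANTS):
    """All ordered pairs (x, y) such that x precedes y, at any distance."""
    relevant = [seg for seg in word if seg in segments]
    return list(combinations(relevant, 2))

print(precedence_pairs("sokibosu"))
# [('s', 'k'), ('s', 'b'), ('s', 's'), ('k', 'b'), ('k', 's'), ('b', 's')]

# Toy corpus of harmonic forms (illustrative selection).
corpus = ["sokibosu", "SukiboSu", "sudetusu", "SibopiSu"]
counts = Counter(pair for word in corpus for pair in precedence_pairs(word))

# Agreeing sibilant pairs are attested; disagreeing pairs never occur.
print(counts[("s", "s")], counts[("S", "S")])  # 2 2
print(counts[("s", "S")], counts[("S", "s")])  # 0 0
```

Nothing in a pair records how many segments separate its two members, which is why the representation is indifferent to distance.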

The present study tests the degree to which learners encode and represent novel long-distance phonotactic patterns. Using an artificial grammar learning paradigm, it is possible to directly study the mechanisms that underlie biases in adult human learning. The test phase of an artificial grammar learning experiment can be used to determine what learners infer from the material that they are exposed to. By testing participants on material that they have not heard before, it is possible to discern the level of generality with which particular patterns are learned (Finley & Badecker, 2009a; Wilson, 2006).

Previous research has shown that adults can learn vowel-harmony (Finley & Badecker, 2008, 2009a, 2009b, in press; Pycha, Nowak, Shin, & Shosted, 2003) and consonant-harmony patterns (Finley, 2011; Wilson, 2003) with a relatively short amount of training. Thus, the first-order long-distance dependency of “skipping” vowels or consonants is not a problem for learners in an artificial grammar learning setting. Finley (unpublished data) showed that second-order non-adjacent dependencies in vowel harmony are more difficult to learn than local counterparts. However, the representations required for long-distance dependencies in vowel harmony tend to be structurally more complex than those required for consonant harmony. In Finley (unpublished data), learners were exposed to a vowel harmony pattern that, in addition to the regular first-order non-adjacency, contained an intervening vowel that was inert to harmony; the harmonic feature passed through this vowel, and the feature value of the suffix vowel was determined non-locally. Participants were only able to learn the non-local pattern after a significant increase in training and tokens of test items. This suggests that long-distance patterns can be difficult for learners when such patterns require complex representations. Consonant harmony, unlike vowel harmony, does not require the same complex representations. The reason for this difference lies in the phonetic features of intervening consonants. Unlike the features of intervening vowels in vowel harmony, the features of intervening consonants in consonant harmony are irrelevant to the specific features involved in harmony, and they can be separated onto different structural tiers (see Fig. 1). In addition, consonants in harmony tend to be underspecified, such that the phonetic implementation can be gleaned via interpolation (Keating, 1988). Because there are fewer features distinguishing vowels from other vowels, intervening vowels in vowel harmony require more complex representations to sort out which vowels participate and which do not.

Finley (2011) compared learning of a first-order long-distance harmony pattern, in which the two relevant consonants were separated only by a vowel (e.g., besosu), with learning a second-order long-distance harmony pattern, in which vowels and consonants separated the two relevant consonants (e.g., sobesu). If a language requires second-order dependencies, it will also require first-order dependencies, but there are languages that require first-order dependencies without requiring second-order dependencies (Hansson, 2001). Learners in Finley (2011) followed the same pattern in their inferences in the artificial consonant-harmony patterns: when exposed to a second-order pattern (e.g., sobesu), learners generalized to a first-order pattern (e.g., besosu), but when exposed to a first-order pattern (e.g., besosu), there was no generalization to a second-order pattern (e.g., sobesu). This suggests that learners prefer a first-order pattern if it can explain all of the data but will infer a general, second-order pattern otherwise.

While Finley (2011) demonstrated long-distance learning of consonant harmony, all of the items in those experiments could be analyzed within a three-segment consonant window (e.g., sobesu as sbs). It is unclear what occurs when the distance between the relevant consonants increases beyond a three-segment window. If learners make use of non-hierarchical, flat representations of the input, one should expect that the learning problem becomes more difficult as the distance between the relevant segments increases. While it is generally understood that n-grams are a rough approximation for human language processing, it is important to understand whether the predictions that such models make about the nature of the learning problem are appropriate for modeling human inferences in learning.
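The notion of a three-segment consonant window can be made explicit with a short sketch. It assumes "S" as an ASCII stand-in for [∫] and a five-vowel inventory; the function names are hypothetical. A word fits the window if its first and last sibilants span at most three positions once vowels are removed.

```python
VOWELS = set("aeiou")
SIBILANTS = set("sS")  # "S" is an assumed stand-in for the "sh" sound

def consonant_skeleton(word):
    """Remove vowels, leaving the consonant string (e.g., sobesu -> sbs)."""
    return "".join(seg for seg in word if seg not in VOWELS)

def fits_window(word, size=3):
    """True if the first and last sibilants fall within `size` consonants."""
    skeleton = consonant_skeleton(word)
    positions = [i for i, seg in enumerate(skeleton) if seg in SIBILANTS]
    if len(positions) < 2:
        return True  # nothing to agree with, so the window is trivially met
    return positions[-1] - positions[0] + 1 <= size

print(consonant_skeleton("sobesu"))  # sbs
print(fits_window("sobesu"))         # True
print(fits_window("sobigosu"))       # False: the skeleton is sbgs
```

A form such as [sobigosu] therefore requires at least a four-consonant window, which is the case that a trigram learner operating over consonants cannot represent.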

In the present study, we extend the artificial grammar learning paradigms used in previous harmony learning experiments. Experiment 1 tests whether it is possible for adults to learn long-distance consonant patterns that involve multiple consonant interveners. Experiment 2 tests whether learners exposed to a consonant-harmony pattern that takes place within a three-segment window will be able to extend the pattern to forms that have additional intervening consonants and are therefore outside a three-segment window. We show that adults can learn phonotactic patterns across multiple intervening segments and do not appear to encode the exact number of intervening segments in pattern learning.

2. Experiment 1

Experiment 1 tests whether adult English speakers are able to learn a consonant-harmony pattern that takes place across multiple intervening consonants. In this experiment, learners were auditorily exposed to a consonant-harmony pattern in which a suffix alternated between [-su] and [-∫u] depending on whether the first consonant of the word was [s] or [∫]. The form of the suffix was determined by the first consonant of a tri-syllabic stem. This ensured that there were two syllables intervening between the triggering consonant and the suffix (e.g., [sobigosu]). Following exposure, participants were tested on familiar and unfamiliar suffixed forms, assessing the level of learning of the consonant-harmony patterns. If learners are able to learn long-distance consonant-harmony patterns outside a three-segment window, they should be able to learn the consonant-harmony pattern presented in training. Further, if learners form a general, second-order long-distance harmony pattern, they should generalize the harmony pattern to cases that have fewer intervening segments.

2.1. Method

2.1.1. Participants

All participants were adult native English speakers with no knowledge of a consonant-harmony language and had not previously participated in a consonant-harmony learning experiment. Fifty-four University of Rochester undergraduate students and affiliates were paid $10 for participation.

2.1.2. Design

The experiment consisted of a training phase followed immediately by a forced-choice test. All phases of the experiment were presented using PsyScopeX (Cohen, MacWhinney, Flatt, & Provost, 1993). All stimuli were presented auditorily.

Participants in the Critical condition were exposed to a second-order long-distance consonant-harmony language characterized by an alternation between two suffix allomorphs, [-su] and [-∫u]. The allomorphs were conditioned on the sibilant consonant in a tri-syllabic stem form. Twelve stems were of the form sVCVCV, and 12 stems were of the form ∫VCVCV; [-su] surfaced when the initial consonant was [s], and [-∫u] surfaced when the initial consonant was [∫]. All items were presented in pairs: stem followed by stem + suffix (e.g., [sokubi∼sokubisu]), with a 500 ms pause between the stem and the stem + suffix form. The first consonant in each stem was either [s] or [∫], and the additional consonants were stops [p, t, k, b, d, g]. Vowels were drawn from the set [a, i, e, o, u]. There were 24 stem-suffix pairs repeated in a random order five times each. Examples of stimuli can be found in Table 1.
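The structure of the exposure phase can be sketched as follows. The stems generated here are random and are not the stimuli used in the experiment (those are listed in Appendix 1); "S" is an assumed ASCII stand-in for [∫], the function names are hypothetical, and the choice to shuffle the 24 pairs separately within each of five blocks is one possible reading of "repeated in a random order five times each."

```python
import random

STOPS = "ptkbdg"
VOWELS = "aieou"

def make_stem(initial, rng):
    """Build one stem of the form sVCVCV or SVCVCV with random fillers."""
    return (initial + rng.choice(VOWELS) + rng.choice(STOPS) + rng.choice(VOWELS)
            + rng.choice(STOPS) + rng.choice(VOWELS))

def make_stems(initial, count, rng):
    """Collect `count` distinct stems beginning with the given sibilant."""
    stems = set()
    while len(stems) < count:
        stems.add(make_stem(initial, rng))
    return sorted(stems)

def suffixed(stem):
    """Attach [-su] or [-Su], copying the stem-initial sibilant."""
    return stem + stem[0] + "u"

rng = random.Random(0)  # fixed seed so the sketch is reproducible
stems = make_stems("s", 12, rng) + make_stems("S", 12, rng)
pairs = [(stem, suffixed(stem)) for stem in stems]

trials = []
for _ in range(5):          # five repetitions of the 24 pairs
    block = pairs[:]
    rng.shuffle(block)      # assumed: random order within each repetition
    trials.extend(block)

print(len(trials))  # 120
```

Under these assumptions the exposure phase contains 120 stem and stem + suffix presentations, every one of which is harmonic.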

Table 1. Examples of exposure and test stimuli

Exp 1 | Training | Test
Control | ∫ibodu | See above

Exp 2 | | Old | New | Long Distance
Critical | de∫ugu∼de∫ugu∫u | desotisu∼*desoti∫u | *be∫odisu∼be∫odi∫u | *∫ukibosu∼∫ukibo∫u
 | po∫ute∼po∫ute∫u | *po∫utesu∼po∫ute∫u | gusepisu∼*gusepi∫u | sudetusu∼*sudetu∫u
Control | de∫ugu | See above

A Control condition was created to assess biases in the stimuli and/or biases that participants held prior to the experiment. In the Control condition, learners were exposed to the same sVCVCV and ∫VCVCV stems as those heard in the Critical condition. However, participants did not hear the suffixed form, which gave them no access to a harmony pattern. Providing participants with some training makes it possible to give the same instructions (training and test) as in the Critical condition (as opposed to a “no-training” Control condition5), and allows us to assess the role of the stem in influencing learners’ responses. It is possible that the sVCVCV and ∫VCVCV nature of the stems could serve to bias learners without actual training on these items. Participants in the Control condition were given identical test items to those in the Critical condition, making it possible to directly compare the Critical and the Control conditions.

Following training, participants were given a two-alternative forced-choice test in which participants chose between two stem + suffix forms. Both alternatives contained the same stem, but the suffix item was either [-su] or [-∫u] (with a 500 ms pause between each stem + suffix option), meaning that only one item would be harmonic (e.g., harmonic [sokebusu] was pitted against disharmonic [sokebo∫u]). Test items included Old Items6 (in which the stems were drawn from the training set), New Items (in which the stems were not drawn from the training set, but were of the same sVCVCV and ∫VCVCV form seen at training), and Trigram Items. Trigram Items tested learners’ ability to generalize the consonant-harmony pattern to a different number of intervening segments. The sibilant in Trigram Items always appeared as the second consonant in the stem. Trigram test items were of the form *7 CV∫VCVsu vs. CV∫VCV∫u. Given the salience of the fact that all items in the training set began with a sibilant consonant, we expect some degraded performance for Trigram Items. The full stimuli list for Experiment 1 can be found in Appendix 1.

If participants in the Critical condition learned the general harmony pattern, they should select the correct allomorph at a rate significantly greater than participants in the Control condition.
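The dependent measure, the proportion of harmonic choices, can be computed as in the sketch below. The list of chosen forms is hypothetical and is not data from the experiment; "S" is an assumed ASCII stand-in for [∫], and the function names are illustrative.

```python
SIBILANTS = set("sS")  # "S" is an assumed stand-in for the "sh" sound

def is_harmonic(word):
    """True if all sibilants in the word are identical."""
    return len({seg for seg in word if seg in SIBILANTS}) <= 1

def proportion_harmonic(chosen_forms):
    """Share of forced-choice trials on which the harmonic form was chosen."""
    return sum(is_harmonic(form) for form in chosen_forms) / len(chosen_forms)

# Hypothetical choices from one participant on four test trials.
hypothetical_choices = ["sokebusu", "SibopiSu", "sudetuSu", "beSodiSu"]
print(proportion_harmonic(hypothetical_choices))  # 0.75
```

A participant responding at random would be expected to score near 0.5 on this measure, which is why the Control condition provides the baseline for comparison.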

2.1.3. Materials

All stimuli were recorded by an adult female native English speaker in a sound-attenuated booth. While the volunteer was aware that the stimuli would be used in an artificial grammar learning experiment, she was blind to the design or purpose of the study. The speaker produced all vowel sounds without reduction, and stress was placed on the first syllable; stress was consistent across stem and suffixed items.

2.1.4. Procedure

All participants were given written and verbal instructions. Participants were told that they would be listening to words from a language they never heard before, and that their task was to listen to the way the novel language sounded, but that they need not memorize the forms. The training was followed by a forced-choice test with 36 pairs of suffixed items, one item in the pair harmonic and the other item disharmonic (e.g., [*sibopi∫u vs. ∫ibopi∫u]). Participants were told to respond as quickly and accurately as possible. Participants were given a debriefing statement upon completion of the experiment (which took approximately 15 min).

2.2. Results

Proportions of harmonic responses were recorded for each subject in the Critical and the Control conditions, and they can be found in Fig. 2. Participants in the Control condition were compared to participants in the Critical condition via a mixed design ANOVA with alpha set at 0.05. The between-subjects factor was Training, with two levels: Critical and Control. Test Items (Old Items, New Items, Trigram Items) was a within-subjects factor nested under the between-subjects factor Training. All conditions involved between-item comparisons.

Figure 2. Experiment 1 results: means (proportion harmonic) and standard errors.

There was a significant effect of Training (F(1, 52) = 17.21, p < .001), in that participants in the Critical condition were more likely to choose the harmonic option than participants in the Control condition (M = 0.69 vs. 0.50, CI ± 0.093). There was a significant interaction (F(2, 104) = 2.98, p = .054). This interaction is due to the fact that there was a significant difference between New and Trigram Items for the Critical condition (t(53) = 2.85, p = .006), but not the Control condition (t(53) = 1.11, p = .28). There was a marginal effect of Test Item (F(2, 104) = 4.62, p = .082), which was carried by the fact that there were significant differences between New and Trigram Items (F(1, 52) = 8.72, p = .005) and Old and Trigram Items (F(1, 52) = 4.04, p = .050).

To directly test whether participants generalized to the Trigram Items, we performed a t test between Critical and Control conditions for the Trigram Items. There was a significant difference between the Critical condition and the Control condition (t(52) = 2.36, p = .02), further suggesting that participants extended the harmony pattern to the Trigram Items. There was, however, a significant difference between New and Trigram Items for the Critical condition (t(26) = 3.02, p = .006). Given that the initial position was relevant to the long-distance items, but not to the Trigram Items, it is likely that some participants may have inferred that the harmony pattern only applied when the initial consonant was a sibilant. However, there are two reasons to believe that only a minority of participants made this inference. First, there was significant generalization to Trigram Items compared to the Control condition, suggesting overall generalization to the Trigram Items. Second, of the 12 participants who scored at or below chance on Trigram Items, only three scored above 50% on Old Items, suggesting that these participants failed to learn the harmony pattern. None of these participants showed any preference for suffixes containing /∫/ compared to suffixes containing /s/ (< 1).

2.3. Discussion

Participants in Experiment 1 were able to learn the consonant-harmony pattern at a distance beyond the three-segment window, supporting approaches to long-distance patterns that do not specifically encode distance. Learners were able to extend the pattern to the more local Trigram test items. While participants were less robust at generalization to the Trigram Items, this should be expected given that all of the items in the training set began with a sibilant, but none of the Trigram Items did. What is important is that there was a significant difference between the Critical and Control conditions for the Trigram Items. This suggests that learners did not specifically encode the number of intervening segments between the first and the last consonants.

It is possible that learners only form general representations when a trigram representation is insufficient. If learners use the three-segment window to learn and encode novel long-distance patterns, a learner who was exposed only to input that could be captured by a three-segment window would be unable to generalize the pattern to items that involve a greater number of intervening segments. Further, because the initial consonant was highly relevant in Experiment 1, it is possible that learners were able to encode the long-distance pattern only because of the position of the relevant consonants. These possibilities are tested in Experiment 2, in which learners were exposed to the long-distance sibilant harmony pattern within the three-segment window and the initial consonant was not relevant for the harmony pattern. Following exposure, learners were given a set of test items (parallel to Experiment 1) that included more than two segments intervening between the two sibilant consonants. If learners extend the harmony pattern to these “Long-Distance” test items, they have learned a general harmony pattern that does not encode distance.

3. Experiment 2

3.1. Method

3.1.1. Participants

All participants were adult native English speakers with no knowledge of a consonant-harmony language, and they had not previously participated in a consonant-harmony learning experiment. Forty-eight University of Rochester undergraduate students and affiliates were paid $10 for participation. There were 24 participants in each condition.

3.1.2. Design

The design of Experiment 2 was identical to that of Experiment 1, except that learners were exposed to a long-distance harmony pattern in training that fit within the three-segment consonant window and were tested on generalization to a long-distance pattern that extended the number of intervening segments between the two sibilant consonants. Examples of training and test stimuli can be found in Table 1.

3.1.3. Materials

The materials were recorded in the same manner as Experiment 1. The full stimuli list for Experiment 2 can be found in Appendix 2.

3.1.4. Procedure

The procedure was identical to Experiment 1.

3.2. Results

Proportions of harmonic responses were recorded for each subject in the Critical and the Control conditions, and they can be found in Fig. 3. Participants in the Control condition were compared to participants in the Critical condition via a mixed design ANOVA with alpha set at 0.05. The between-subjects factor was Training: Critical and Control. Test Item (Old Stems, New Stems, Long Distance) was a within-subjects factor nested under the between-subjects factor Training. All conditions involved between-item comparisons.

Figure 3. Experiment 2 results: means (proportion harmonic) and standard errors.

There was a significant effect of Training (F(1, 46) = 22.56, p < .0001), in that participants in the Critical condition were more likely to choose the harmonic option than participants in the Control condition (M = 0.69 vs. 0.45, CI ± 0.10). There was no effect of Test Item (F < 1) and no interaction (F(2, 92) = 1.98, p = .14).

To directly test whether participants generalized to long-distance patterns, we performed a t test between Critical and Control conditions for the Long-Distance items. There was a significant difference (t(46) = 2.98, p = .005), suggesting that learners generalized from the trigram training to a long-distance pattern. Further, there was no difference between New- and Long-Distance test items (< 1).

Because distance and position are confounded in the present study (i.e., the long-distance items always contained a sibilant as the first consonant in the word), we performed several cross-experiment comparisons to ensure that the role of distance and position did not differ. We compared Experiment 1 and Experiment 2 for responses to New (< 1) and New Distance (Trigram for Experiment 1 and Long Distance for Experiment 2; < 1) and found no differences. We also compared across Experiments 1 and 2 for the same items. We compared Trigram Items from Experiment 1 with New Items from Experiment 2 and found no differences (< 1), and New Items from Experiment 1 with Long-Distance Items from Experiment 2 and found no differences (< 1). This suggests that position did not directly affect the level of generalization to novel distances.

Of the 24 participants in the Critical condition of Experiment 2, only five participants were below chance for the Long-Distance items, and six were at chance. Of these 11 participants, only three were above 60% for New Items, suggesting that these participants simply failed to learn the harmony pattern. None of these participants showed any preference for suffixes containing /∫/ compared to suffixes containing /s/ (< 1).

3.3. Discussion

Participants in Experiment 2 learned the long-distance consonant-harmony pattern and extended that harmony pattern to cases requiring an even greater distance between the two sibilant consonants. Increasing the number of intervening consonants did not affect learners’ acceptance of harmonic strings at a distance. These results provide further support for a view that learners encode long-distance consonant-harmony patterns without reference to distance, in accordance with a theoretical notion of “unboundedness.”

4. General discussion

This study presented two experiments exhibiting adult humans’ ability to learn long-distance phonotactics at relatively great distances. When exposed to a consonant-harmony pattern in which the first sibilant consonant of a word determined the final sibilant consonant of the word, participants could learn such patterns even when two syllables intervened between the two sibilant consonants. Further, learners extended the consonant-harmony pattern to forms of varying numbers of intervening elements (fewer elements in Experiment 1 and more elements in Experiment 2).

These results provide important information about how phonotactic restrictions are learned and represented. Learners appear to be able to learn long-distance patterns with relative ease. If learners gave privileged status to interactions that can take place at the bigram or trigram level, one might expect that learners would display degraded performance as the number of grams required to represent the pattern increases. However, learners appear to be resilient to increases in the number of interveners. Participants in Experiment 1 and Experiment 2 showed relatively equal levels of learning with mean harmonic responses of 68% for Experiment 1 and 69% for Experiment 2. If learners had access only to trigram representations, not only would participants have failed to learn the long-distance pattern in Experiment 1, they would also have shown degraded performance as distance increased, but this did not occur. One interesting pattern in the results is that there seemed to be less robust generalization from long-distance to Trigram Items in Experiment 1 than the reverse generalization in Experiment 2. This is the opposite of the pattern that would occur if learners only used distance information in encoding phonological patterns. It is unlikely that learners made use of positional information.

The results demonstrate that bigrams and trigrams do not appear to be privileged in learning. This does not mean that learners are never sensitive to bigram and trigram information when it is relevant. Rather, it appears that learners are sensitive to a wider set of statistics. While the results are consistent with an unbounded phonotactic pattern, they are also consistent with a model that encodes consonant n-grams of up to four consonants. While it might be possible to show learning at greater distances, it is in general impossible to demonstrate experimentally the learning of truly unbounded patterns. This paper supports the notion that learners are not strictly bound to distance but can generalize beyond the distance to which they are exposed. Note that this notion of unboundedness applies only to second-order long-distance patterns. As Finley (2011) demonstrated, learners of a consonant-harmony pattern that applied without any consonants intervening between the relevant consonants did not generalize to a second-order long-distance pattern with consonants and vowels intervening.
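The consonant n-gram alternative can be illustrated by projecting a word onto its consonants and measuring the window on that projection. The following Python is a minimal sketch under stated assumptions: one character per segment, "S" as an ASCII stand-in for [∫], a five-vowel inventory, and hypothetical function names. It is not an implementation of any particular published model.

```python
# Illustrative sketch; the vowel inventory and the "S" transliteration of
# [∫] are assumptions made for this example.
VOWELS = set("aeiou")
SIBILANTS = {"s", "S"}


def consonant_tier(word):
    """Project a word onto its consonants, dropping all vowels."""
    return [seg for seg in word if seg not in VOWELS]


def consonant_ngram_size(word):
    """Size of the consonant n-gram needed to contain both sibilants.

    Returns 0 when the word contains fewer than two sibilants.
    """
    tier = consonant_tier(word)
    positions = [i for i, c in enumerate(tier) if c in SIBILANTS]
    if len(positions) < 2:
        return 0
    return positions[-1] - positions[0] + 1


print(consonant_tier("SedogiSu"))        # ['S', 'd', 'g', 'S']
print(consonant_ngram_size("SedogiSu"))  # 4
print(consonant_ngram_size("buSegoSu"))  # 3
```

On this projection, an item with two intervening consonants needs a window of four consonants, which is why a four-consonant n-gram model cannot be ruled out by items of this length.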

The results of the present study raise the question of how best to evaluate models of phonotactic learning. While it is often assumed that the model that captures the most data with the simplest representations is superior, the results of the experiments presented in this paper suggest that if models are to be used as a tool for understanding the workings of the mind, computational models of cognition must take into account how human learners actually behave. Traditional n-gram models may capture a large proportion of human performance with relatively few computational constraints, but such models do not capture the intuitions of human learners or their behavior in artificial language learning experiments like these. On the other hand, hierarchically structured n-gram and precedence models have the power to account for long-distance phonotactic patterns without the sparsity issues encountered in traditional n-gram models. While the present experiments are consistent with the predictions of these models, they do not distinguish between them.
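To make the contrast with flat n-grams concrete, the sketch below shows a simplified precedence-style learner in Python. It is an assumption-laden illustration rather than an implementation of any published model: it records every ordered pair of segments in which the first occurs anywhere before the second, ignores the adjacent versus non-adjacent distinction, assumes one character per segment, and uses "S" as an ASCII stand-in for [∫]. The function names are hypothetical.

```python
from itertools import combinations


def precedence_pairs(word):
    """All ordered pairs (x, y) such that x occurs somewhere before y.

    Distance between x and y is not recorded.
    """
    return set(combinations(word, 2))


def learn(words):
    """Grammar = the set of precedence pairs attested in the training words."""
    grammar = set()
    for word in words:
        grammar |= precedence_pairs(word)
    return grammar


def violations(word, grammar):
    """Precedence pairs in the word that were never attested in training."""
    return precedence_pairs(word) - grammar


# Three suffixed training items from Appendix 1, transliterated.
grammar = learn(["SedogiSu", "sebodisu", "segopisu"])

print(violations("Sedogisu", grammar))  # {('S', 's')}
print(violations("SeSu", grammar))      # set()
```

Because the pair ("S", "S") is stored without reference to distance, a shorter harmonic form such as the hypothetical "SeSu" is accepted after training on longer items only, while a disharmonic form is flagged by a single unattested pair. The number of possible pairs also stays quadratic in the size of the segment inventory, whatever the distance between the two sibilants.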

This suggests that goodness of fit and generalization do not, by themselves, provide the most complete measures of a model’s success. In order to evaluate computational models of cognition, experimental and computational approaches must work together. Incorporating experimental results into the implementation of computational models will invariably lead to a better understanding of how the mind is constrained, and how such constraints might be implemented.

While the present study addressed only the consequences of incorporating long-distance dependencies into n-gram models of phonotactics, there are important consequences for models of other areas of language. Because long-distance dependencies can be found at the word level, the morpheme level, and the sentence level, models of a wide range of linguistic processes may be affected by their ability to account for long-distance dependencies. For example, Newport and Aslin (2004) found that adult learners are able to segment words from speech that contained high transitional probabilities for non-adjacent segments (but not for non-adjacent syllables). This biased ability to segment speech at a distance may have important consequences for n-gram models of word segmentation (Cairns, Shillcock, Chater, & Levy, 1997). Future research will work to establish the accuracy of n-gram models in word segmentation, as well as other linguistic areas. For example, word-level trigrams have been used as a basis for learning grammatical categories (St. Clair, Monaghan, & Ramscar, 2009). It would be interesting to understand the extent to which word-level trigram models are affected by long-distance patterns at the sentence level.

5. Conclusions

The present study demonstrated that human adults are able to learn a novel harmony pattern at a distance. Participants were able to learn the harmony pattern even when the pattern fell outside a three-segment, trigram window. These results suggest that learners are not sensitive to distance when learning long-distance patterns and that consonant-harmony patterns are consistent with the notions developed in theoretical linguistics that long-distance patterns need not be bound by arbitrary word lengths.


  • 1

    It is important to note that the idea of “unbounded” in linguistics is a theoretical notion. In theory, a word or sentence may be infinitely long, but in practice, cognitive constraints on attention and memory limit the size of words.

  • 2

    For example, an autosegmental, tier-based analysis of the root and pattern morphology found in Arabic words would break up the word [samam] with a root [sm] and the inflection [a], each on a different tier.

  • 3

    Note that a richer set of structural principles would have to be implemented in order to account for transparent vowels in vowel harmony, particularly in Hungarian, which allows for multiple transparent vowels (Hayes & Londe, 2006).

  • 4

    Heinz (2010) distinguishes between adjacent and non-adjacent precedence relations with ellipses: [pz] denotes an adjacent relationship (e.g., “pzat”), while [p…z] denotes a non-adjacent relationship (e.g., “pizat”). Without this distinction the precedence model cannot distinguish between first-order and second-order patterns.

  • 5

    Previous research (e.g., Finley & Badecker, 2009a, 2009b) found no differences between “no-training” Control conditions and stem-only Control conditions.

  • 6

    All items were technically “new” for the Control condition, as these participants did not hear any suffixed items. While participants in the Critical condition could respond to Old Items based on familiarity, both harmonic and disharmonic options were equally unfamiliar to participants in the Control condition.

  • 7

    The “*” indicates the disharmonic item.


The author is grateful to Patricia Reeder, Neil Bardhan, Carrie Miller, Kelly Johnston, Lilly Schieber, Anna States, Emily Kasman, and members of the Aslin-Newport lab. In addition, I would like to thank Jeff Heinz and two anonymous reviewers for their helpful comments and suggestions. All errors are my own. This research was supported in part by NIH grant DC000167 to E. Newport, HD37082 to R. Aslin and E. Newport, and NIH Training Grant T32DC00035.


Appendix 1: Experiment 1 stimuli

Training Items

∫edogi, ∫edogi∫u    sebodi, sebodisu
∫ekobi, ∫ekobi∫u    segopi, segopisu
∫epoti, ∫epoti∫u    setoko, setokosu
∫ibodu, ∫ibodu∫u    sidegu, sidegusu
∫igepu, ∫igepu∫u    sikebu, sikebusu
∫iteku, ∫iteku∫u    sipiku, sipikusu
∫obude, ∫obude∫u    soduge, sodugesu
∫ogupe, ∫ogupe∫u    sokubi, sokubisu
∫otike, ∫otike∫u    sopute, soputesu
∫udigo, ∫udigo∫u    subedo, subedosu
∫ukebo, ∫ukebo∫u    sugipo, sugiposu
∫upito, ∫upito∫u    sutiko, sutikosu

Test Items


Appendix 2: Experiment 2 Stimuli

Training Items

besude, besudesu    bu∫ego, bu∫ego∫u
bisepu, bisepusu    di∫igu, di∫igu∫u
desoti, desotisu    di∫oge, di∫oge∫u
gosute, gosutesu    de∫ugu, de∫ugu∫u
gusiku, gusikusu    ge∫opi, ge∫opi∫u
kesobi, kesobisu    go∫ipe, go∫ipe∫u
kusigo, kusigosu    ke∫ubi, ke∫ubi∫u
pesobi, pesobisu    ko∫ide, ko∫ide∫u
piseki, pisekisu    po∫ute, po∫ute∫u
pusiko, pusikosu    pu∫eto, pu∫eto∫u
tisebu, tisebusu    ti∫oku, ti∫oku∫u
tosude, tosudesu    tu∫iko, tu∫iko∫u

Test Items

Old    New    Long Distance