Non‐adjacent Dependency Learning in Humans and Other Animals

Abstract Learning and processing natural language requires the ability to track syntactic relationships between words and phrases in a sentence, which are often separated by intervening material. These nonadjacent dependencies can be studied using artificial grammar learning paradigms and structured sequence processing tasks. These approaches have been used to demonstrate that human adults, infants and some nonhuman animals are able to detect and learn dependencies between nonadjacent elements within a sequence. However, learning nonadjacent dependencies appears to be more cognitively demanding than detecting dependencies between adjacent elements, and only occurs in certain circumstances. In this review, we discuss different types of nonadjacent dependencies in language and in artificial grammar learning experiments, and how these differences might impact learning. We summarize different types of perceptual cues that facilitate learning, by highlighting the relationship between dependent elements bringing them closer together either physically, attentionally, or perceptually. Finally, we review artificial grammar learning experiments in human adults, infants, and nonhuman animals, and discuss how similarities and differences observed across these groups can provide insights into how language is learned across development and how these language‐related abilities might have evolved.


Introduction
A central feature of syntactic processing is the ability to track structural relationships between words and phrases in a sentence. However, language has a hierarchical structure; hence, syntactic relationships exist not only between adjacent words but also across longer distances, requiring the joint processing of words separated by intervening material. Tracking such nonadjacent dependencies is more cognitively complex than processing the relationships between adjacent words, minimally placing additional demands on working memory. Moreover, the ability to detect nonadjacent dependencies is a crucial prerequisite for learning and processing some more complex syntactic relationships (like center embedded structures). Understanding how we learn to detect and process nonadjacent dependencies, and the ontogenetic and evolutionary origins of these abilities, represents an important challenge in understanding key aspects of language processing, acquisition, and evolution.
A nonadjacent dependency is a relationship between two temporally or spatially separated elements, which cannot be explained simply by the occurrence of multiple pairs of adjacent relationships. For example, in English, nonadjacent dependencies can be observed in tense agreement (e.g., "Is [talk]ing," "Has [talk]ed") or subject-verb agreement (e.g., "the dog [down the street] barks" vs. "the dogs [down the street] bark"). In both of these cases the dependent elements can be separated by intervening material, and one must track the nonadjacent dependency, holding the first element in memory until the appropriate point later in the sequence.
One productive way to study how dependencies are learned and processed is to use artificial grammar learning paradigms and structured sequence processing tasks. These experiments typically test participants' abilities to learn relationships between specific elements in sequences of auditory or visual stimuli (Reber, 1967). Such tasks do not rely on existing semantic or syntactic knowledge and are therefore an important tool with which to isolate and study the cognitive and neurobiological systems that support how specific aspects of syntax may be learned and processed (Petersson, Folia, & Hagoort, 2012). Furthermore, these tasks seem to tap into language-relevant capacities, as evidenced by correlations between nonadjacent dependency learning and natural language processing in adult humans (Misyak, Christiansen, & Tomblin, 2010), and by the fact that nonadjacent dependency learning is impaired in individuals with specific language impairment (Hsu, Tomblin, & Christiansen, 2014) or familial risk of dyslexia (Kerkhoff, De Bree, De Klerk, & Wijnen, 2013). Finally, as these tasks do not require language, they can be used to test preverbal infants, to inform us about language acquisition (e.g., G omez, 2002;Saffran, Aslin, & Newport, 1996), and nonhuman animals, which can provide insights into language origins and evolution (e.g., Fitch & Hauser, 2004;Newport, Hauser, Spaepen, & Aslin, 2004).
In this article, we first characterize different types of nonadjacent dependencies, and how they relate to nonadjacent syntactic relationships in natural languages. We summarizse the evidence for nonadjacent dependency learning in human adults and infants, and nonhuman animals, review the conditions under which learning does and does not occur, and discuss the insights these studies provide into the development and evolution of these abilities.

Characterizing different types of nonadjacent dependency
The simplest form of structural dependency that might appear within a sentence is a syntactic relationship between two adjacent words. In artificial grammar learning tasks, this type of adjacent dependency might be present between two sequentially presented elements "A" and "B" (see Fig. 1i). These adjacent dependencies are present in most artificial grammar learning studies and are rapidly learned by human adults and infants, and nonhuman animals (e.g., Chen & ten Cate, 2015;Fitch & Hauser, 2004;Gebhart, Newport, & Aslin, 2009;Pacton, Sobaco, & Perruchet, 2015;Reber, 1967;Saffran et al., 1996;Wilson, Smith, & Petkov, 2015). By contrast, a nonadjacent dependency represents a relationship between items that are separated by one or more intervening elements in a sequence (Fig. 1ii and iii, e.g., G omez, 2002;. Both adjacent and nonadjacent dependencies might require participants to learn the relationship between the same two elements ("A" and "B" in Fig. 1), yet the cognitive and memory demands might vary in relation to the distance over which this dependency must be processed. Such nonadjacent relationships might appear within a longer sequence (Fig. 1iv), removing the opportunity to rely on the edges of the sequence to identify key dependencies (e.g., Endress, Carden, Versace, & Hauser, 2010;Endress, Nespor, & Mehler, 2009;Grama, Wijnen, & Kerkhoff, 2013). Finally, many artificial grammar learning paradigms are not designed specifically to assess the learning of nonadjacent dependencies, but nevertheless require this ability as a prerequisite to process more complex dependencies. For example, many studies have used a grammar of the form A n B n to assess different forms of hierarchical sequence processing (e.g., center-embedding, crossed dependencies, which are present in some natural languages [Bach, Brown, & Marslen-Wilson, 1986], Fig. 1v, vi), which require the participant to process multiple adjacent and nonadjacent dependencies between "A" and "B" elements (De Vries, Christiansen, & Petersson, 2011;De Vries, Petersson, Geukes, Zwitserlood, & Christiansen, 2012). While these more complex tasks bring additional cognitive demands, all of these tasks require the ability to identify and process nonadjacent dependencies, which represents an important prerequisite to aspects of syntax processing.

A 1 A 2 A 3 B 3 B 2 B 1 v) A n B n Centre embedded
iv) X*AX*BX* X X B X X A Fig. 1. Adjacent and nonadjacent dependencies in several commonly used artificial grammar structures. Adjacent dependencies are shown in blue and nonadjacent dependencies are shown in red. More complex grammars (e.g., A n B n ) go beyond the requirement to learn a single nonadjacent dependency at a time and require several dependencies to be processed simultaneously.
Sequence learning experiments have also manipulated the nature of the relationships between the dependent elements. A simple type of dependency is the relationship between two identical elements, and it is known as an identity relation (see Fig. 2i). This type of identity relation is sometimes observed in natural language. For example, agreement systems in Bantu languages often involve the appearance of an identical prefix on words which agree, as in Swahili, for example, the "ki" prefix in "Ki-kapu ki-dogo ki-mefika" ("the little basket arrived"). It has been suggested that this transparent means of signaling agreement in these languages allows them to sustain unusually complex agreement systems arising from a large number of noun classes (Demuth & Weschler, 2012). In an artificial grammar learning task, a nonadjacent dependency of this type might take the form "AXA," where a dependency exists between two identical sequence elements. Learning and processing this dependency requires only that the initial element be held in memory, and that incoming stimuli be compared to this memory representation.
Bantu languages like Swahili notwithstanding, most languages rarely require us to detect repetitions of the same element within a sentence (as in Fig. 2i), but rather to track dependencies between different words or part-words that share little phonological resemblance, based on learned relationships. As in the previous example of tense agreement,  Fig. 2. Different types of stimulus classes used in nonadjacent dependency learning tasks. Several varieties of nonadjacent dependencies can be assessed in artificial grammar learning studies. These include (i) identity relations between specific elements (as can be seen in certain Bantu languages); (ii) learned relationships between specific elements (as in English tense agreement); (iii) relationships between perceptual classes; (iv) relationships between learned classes (similar to dependencies between syntactic word categories; for example, nouns and verbs). See Endress and Bonatti (2007).
one must learn the relationship between "is" and the suffix "_ing" based on their co-occurrence, before one is able to process this dependency or detect ungrammatical sentences (e.g., . This has been investigated using artificial grammars of the form "AXB" (Fig. 2ii). In such studies the first element predicts a different final element, and the dependent elements are related only by an arbitrary pairwise association, which must first be learned by the participants (e.g., G omez, 2002). Apart from relations between specific linguistic elements, natural languages feature dependencies between syntactic classes or categories of words; for example, subject-verb agreement in English requires subjects and verbs to agree for number: singular subjects must appear with singular verbs, regardless of the specific identity of the nouns and verbs involved. In these cases, it is necessary to both learn which category a specific word belongs to and to also learn the relationships between these categories. This can be assessed using an artificial grammar of the form "{A}X{B}," where {A} and {B} represent two sets of stimuli and therefore any stimulus from set {A} predicts any {B} category stimulus (Fig. 2iii,iv). To limit the requirement that participants must learn the categories associated with many different stimuli, some studies have used sets of stimuli in which perceptual cues denote category membership (see Fig. 2iii). For example, all {A} stimuli might be nonsense syllables containing the vowel "i," while {B} stimuli contain "o" vowels, thus adding a clear perceptual cue to the stimulus categories. A participant would be required to first recognize that there are two (perceptually different) categories of stimuli, and then to recognize and learn the nonadjacent dependency between them (in this case, "_i" syllables predict "_o" syllables; see Fig. 2iii). This approach has been used to assess the learning of center embedded structures, which include nonadjacent dependencies (see Fig. 1v) in humans and nonhuman animals (Bahlmann, Schubotz, & Friederici, 2008;Fitch & Hauser, 2004;Friederici, Bahlmann, Heim, Schubotz, & Anwander, 2006). One advantage of this approach is that once participants have learned these relationships, they may be able to generalize to novel stimuli as long as these can also be categorized using the perceptual cue.
Clear operationalized descriptions of specific types of nonadjacent dependencies (Fig. 2) may help us to understand how different types of dependencies are learned and processed across different populations and species. This clarity may be important in understanding the developmental and evolutionary origins of abilities that are critical to language.

Nonadjacent dependency learning in adults, infants, and nonhuman animals
In the previous section, we summarized several different types of nonadjacent dependencies found in natural language and how they have been studied using artificial grammar learning tasks. Next, we review how human adults and infants and nonhuman animals have been tested using these paradigms to better understand how different types of dependencies (Figs. 1 and 2) are learned and how these might vary across development and evolution.

Human adults and infants
Infants as young as 7 months old readily notice dependencies between identical nonadjacent items in a sequence (e.g., "ga po ga" vs. "ga po bi," represented as "AXA" in Fig. 2i) (Gerken, 2006;Gervain & Werker, 2013;Marcus, Vijayan, Rao, & Vishton, 1999). As we discussed above, these identity relations might be relatively easy to recognize because they do not require the participant to learn an arbitrary relationship between two different stimuli (as in Fig. 2ii,iii).
3.1.1. Variability and the detection of predictable dependencies G omez (2002) showed that both adults and 18-month-old infants were able to detect a nonadjacent dependency between the first and last items of a three-element sequence "AXB," where the dependency involved an arbitrary association between items (e.g., initial "pel" predicted final "rud"). However, in both groups learning only occurred when a large number of different nonsense words were presented in the second position in the sequence, making the adjacent dependencies between the first and second, and second and third elements highly unpredictable (Fig. 3). When a smaller number of "X" elements was used, participants appeared to try to learn the (uninformative) adjacent relationships and failed to learn the nonadjacent dependency (e.g., G omez, 2002;G omez & Maye, 2005). This suggests that, while both adults and infants are able to implicitly learn nonadjacent dependencies, in some cases they may fail to do so if more salient adjacent cues are present. However, in some cases both adjacent and nonadjacent dependencies may be learned simultaneously (Romberg & Saffran, 2013b;Wang & Mintz, 2018), at least in some participants (see Milne, Petkov, & Wilson, 2017;Wilson et al., 2015), suggesting that the processes involved in learning these two types of relationships are not inherently antagonistic.

Perceptual similarity between dependent stimuli
A key cue which appears to facilitate the learning of nonadjacent dependencies is perceptual similarity between the "A" and "B" elements that differentiate them from the intervening "X" elements ( Fig. 3; Creel, Newport, & Aslin, 2004;Onnis, Monaghan, Richmond, & Chater, 2005). For example, in G omez (2002), the "A" and "B" elements were represented by monosyllabic nonsense words, and the "X" items were all disyllabic, potentially aiding the learning of the nonadjacent dependency (G omez, 2002). Similarly,  showed that adults could learn dependencies between similar nonadjacent segments (e.g., between acoustically similar consonants separated by vowel sounds) but could not learn nonadjacent dependencies between arbitrarily related syllables (e.g., where initial "ba" predicted final "te"). When no perceptual similarities are present between the "A" and "B" stimuli, adjacent dependencies are easily learned, while there is often no evidence of nonadjacent dependency learning under these conditions (Gebhart et al., 2009;Onnis et al., 2005). Relatedly, the A B X 1 X 2 X 3 A B X 1 X 2 X 3 X 4 X 5 X 6 X 7 X 8 X 9 X 10 ...

A X B A X B
Perceptual similarity

A X B A B
Prosodic cues X XAXXXBX AXXXXXB

A X B A B A X B
Starting small

AXBAXBAXBAXBAXB AXB AXB AXB AXB AXB
Pauses between triplet 'words' Fig. 3. Different cues aiding nonadjacent dependency learning. The learning of nonadjacent dependencies can be improved in a number of ways. These include introducing additional variability into the possible intervening elements, thus emphasizing the nonadjacent dependency (e.g., G omez, 2002); adding pauses within streams of stimuli to denote "word" boundaries and the nonadjacent dependencies within them (e.g., Pena et al., 2002); positioning dependent stimuli on the periphery of sequences (e.g., ); initially learning the dependency between adjacent items, before introducing intervening elements (e.g., Lany & G omez, 2008); directing attention toward dependent elements (e.g., Pacton & Perruchet, 2008); using perceptually similar dependent elements (e.g., ; or the addition of prosodic cues that differentiate the dependent elements from the intervening stimuli (e.g., Grama et al., 2016). All of these different cues emphasize the relationships between nonadjacent elements and facilitate learning.
addition of prosodic cues emphasizing the relationship between the "A" and "B" elements aids the learning of nonadjacent dependencies (Grama, Kerkhoff, & Wijnen, 2016). These studies suggest phonological or perceptual similarity acts as a cue facilitating the perceptual grouping of these nonadjacent elements and thus the learning of these dependencies .

Edge effects
An additional cue that might draw attention to nonadjacent elements is positioning them at the start and the end of a sequence, as in "AXB" (Fig. 3). For example, Pena et al. (2002) presented adult participants with a continuous stream of speech stimuli grouped into trisyllabic "words" containing nonadjacent dependencies between the first and last syllables. They showed that these nonadjacent dependencies could only be learned if a brief 25 ms pause was added between each triplet "word," thus emphasizing the word boundaries (Marchetto & Bonatti, 2013;Pena et al., 2002); but also see Onnis et al, 2005). When combined with prosodic information, pauses have also been shown to aid the learning of center embedded structures, which include nonadjacent dependencies (Fig. 1vi), within a speech stream (Mueller, Bahlmann, & Friederici, 2010).  tested the importance of these "edge effects" directly, presenting adult participants with sequences in which elements "A" and "B" occurred either at the edges of a string (e.g., "AXYZB") or within a sequence (e.g., "XAYBZ"). They found that while participants were sensitive to the reversal of the positions of the "A" and "B" elements (e.g., "BXYZA") in either condition, they were only sensitive to the dependency between specific "A" and "B" elements when they occurred at the edges of the sequences. Wang, Zevin, and Mintz (2017) showed that nonadjacent dependencies can be learned even when embedded in a continuous sequence of words, with no pauses or other signaldriven edge cues. They first exposed participants to a continuous natural language stream in which there was a sentence boundary every four words. This entrained subjects to parse subsequent material into periodic subsequences. Seamlessly following this pre-exposure, subjects heard a continuous stream of an artificial language containing nonadjacent "A_B" dependencies. Participants learned the nonadjacent dependency when the "A" and "B" elements appeared within a subsequence, but not if the entrained segmentation placed the dependent elements in different subsequences. Thus, top-down cues (i.e., the entrained rhythm) can facilitate the detection of nonadjacent dependencies, in the absence of surface-level cues such as pauses.

Attentional effects
The presence of pauses between triplets of syllables, dependent elements appearing at peripheral positions within a string, or linguistic entrainment appear to direct attention toward the dependent elements, thus facilitating the learning of nonadjacent dependencies. Pacton and Perruchet (2008) found that when adult participants' attention was actively directed towards either adjacent or nonadjacent elements within a numerical sequence (by asking them to perform mathematical operations on adjacent or nonadjacent pairs of numbers), only dependencies between attended stimuli were learned. When the same task was performed with no attentional requirements, participants implicitly learned relationships between adjacent items but failed to notice the nonadjacent dependencies (Pacton et al., 2015). These studies, along with others (e.g., de Diego-Balaguer, Martinez-Alvarez, & Pons, 2016;Friederici, Mueller, Sehm, & Ragert, 2013;Toro, Sinnett, & Soto-Faraco, 2011), support the notion that attention facilitates the learning of nonadjacent dependencies.

Incremental increases in complexity: Starting small
Nonadjacent dependency learning can be facilitated by initially learning the dependencies between items in adjacent sequence positions, before adding intervening elements. For example, 12-month-old infants were shown to be sensitive to nonadjacent dependencies, but only when they were first exposed to the same "A" and "B" items in adjacent positions in a sequence (Lany & G omez, 2008), before testing with nonadjacent "AXB" sequences. Similarly, hierarchical relationships, which include nonadjacent dependencies (Fig. 1vi), have been shown to be learned better by adult participants when they are initially presented as pairs of adjacent items (Lai & Poletiek, 2011;Rey, Perruchet, & Fagot, 2012). These studies suggest that part of the challenge of learning nonadjacent dependencies might stem not from memory demands of holding the initial element in memory, but from detecting the relationship between temporally separated stimuli in the first place.
Taken together, this research suggests that human adults and infants are able to learn dependencies between nonadjacent elements, but that this learning strongly benefits from additional cues that highlight these elements or the relationships between them. These cues emphasize the dependencies by bringing the nonadjacent elements together-physically, perceptually, or attentionally (see Fig. 3).
While nonadjacent dependency learning is required for language learning and processing, it also applies to nonlinguistic material (e.g., Creel et al., 2004;Endress, 2010;Endress & Wood, 2011;Gebhart et al., 2009;Pacton & Perruchet, 2008), suggesting that this ability might not be domain-specific or restricted to language. In the next section, we will discuss evidence of nonadjacent dependency learning in nonhuman animals, how these abilities relate to those observed in human adults and infants, and how this might inform us about the evolutionary origins of these abilities.

Nonhuman animals
A number of studies have demonstrated that many nonhuman animal species can learn adjacent relationships (for recent reviews, see Santolin & Saffran, 2017;Wilson, Marslen-Wilson, & Petkov, 2017;ten Cate, 2018). However, there is also evidence that some nonhuman animals are able to learn nonadjacent dependencies in certain situations.
Nonadjacent dependency learning based on identity relations across a range of stimuli (i.e., "AXA" in Fig. 2i) has been demonstrated in squirrel monkeys (Ravignani et al., 2013) and chimpanzees (Ravignani & Sonnweber, 2017;Sonnweber et al., 2015). These studies suggest that some animals are able to detect nonadjacent dependencies, at least between identical items. Several studies have used sequences of the form "ABA," in which the first and last element of a sequence are identical (Murphy, Mondragon, & Murphy, 2008;Spierings & ten Cate, 2016). However, these experiments typically assess the ability to differentiate between "ABA" sequences and those of a different form (e.g., "ABB" or "AAB"), which can be identified on the basis of adjacent repetitions and do not provide evidence for learning a nonadjacent dependency.
A number of studies have tested whether nonhuman animals are able to learn nonadjacent dependencies between two different stimuli, using grammars of the form "AXB" (Fig. 2ii). However, as in humans, these studies have produced somewhat mixed results.  conducted a study in tamarin monkeys based on a similar human experiment . The human participants failed to learn nonadjacent dependencies between different syllables but were able to detect dependencies over both vowels and consonants (see above, . By contrast, tamarins were able to learn the dependencies based on syllables and vowels, but not consonants, suggesting that the vowel sounds (including within syllables) might be particularly salient to the monkeys . Recently, nonadjacent dependency learning has been demonstrated in the visual modality in tamarins (Versace et al., 2017) and baboons (Malassis, Rey, & Fagot, 2018), suggesting that these abilities are not limited to the auditory domain. de la Mora and Toro (2013) presented rats with sequences of nonsense words of alternating consonants and vowels of the form "CVCVCV," containing nonadjacent dependencies between either the vowels or the consonants. Rats detected the dependencies in both cases (de la Mora & Toro, 2013, although see Toro & Trobal on, 2005). These studies demonstrate that at least some nonhuman animals appear to be sensitive to these types of nonadjacent dependencies, but also point to potential cross-species differences, including between humans and nonhuman animals, in how they might be learned.
Two recent studies using mixed complexity artificial grammars, which contain both adjacent and nonadjacent dependencies, showed that macaque monkeys were able to learn relationships between adjacent stimuli in the auditory or visual modality, but they found no evidence that they were sensitive to the nonadjacent dependencies (Milne et al., 2017;Wilson et al., 2015). Although humans have been reported to simultaneously learn both adjacent and nonadjacent dependencies within a mixed-complexity grammar (Romberg & Saffran, 2013a;Wang & Mintz, 2018;Wilson et al., 2015) (although not in Milne et al., 2017), it is likely that in these studies the presence of salient adjacent relationships prevented the monkeys from learning these nonadjacent dependencies (as in infants in G omez, 2002). However, a recent EEG experiment (Milne et al., 2016) reported that violations of nonadjacent dependencies evoked similar brain potentials in macaques as had previously been reported in humans using identical stimuli (Mueller et al., 2012). These results suggest that like humans, monkeys might be sensitive to nonadjacent dependencies in some conditions.
Chimpanzees' abilities to learn nonadjacent dependencies between visual stimuli have been assessed using operant training tasks. Some chimpanzees were able to learn nonadjacent dependencies between stimuli at the start and end of a sequence over variable distances (Fig. 1ii) and based on identity relations (Fig. 2i) or arbitrary associations (Fig. 2ii, Sonnweber et al., 2015). Moreover, the presentation of structurally incongruous auditory stimuli appears to interfere with learning in the visual modality (Ravignani & Sonnweber, 2017). These studies suggest that chimpanzees are able to learn nonadjacent dependencies between stimuli at least at the edge of visual sequences. Endress et al. (2010) specifically assessed the salience of these "edge effects," asking whether chimpanzees and humans could learn nonadjacent dependencies between auditory stimuli within a sequence of distracting elements. They showed that while both species were sensitive to positional effects (an "A" or "B" element occurring in an unexpected sequence position), neither humans nor chimps learned the dependency between "A" and "B" elements, embedded within a sequence . Finally, Chen and ten Cate (2017) used a "starting small" approach ( Fig. 3) in which zebra finches first learned the dependency between two adjacent sequence elements (as in Lany & G omez, 2008). They demonstrated that, after training with stimuli of incrementally increasing complexity, the birds were capable of detecting nonadjacent dependencies both over varying distances and at different positions within the sequences.
Taken together, these studies demonstrate that the ability to learn different types of nonadjacent dependencies (Figs. 1 and 2) is not unique to humans but may be shared by some nonhuman animals. In some cases (e.g., nonhuman primates) this might suggest a common evolutionary origin of these abilities, while in other cases (e.g., songbirds) convergent evolution might have led to the independent emergence of impressive learning abilities. Evidence of similarities and differences in sequence learning across different species allows phylogenetic analysis of the evolution of these abilities, and it may further our understanding of the evolutionary origins of cognitive processes that might represent prerequisites for language processing. Moreover, if similar abilities are indeed present across species, this will allow these processes to be studied at a neurobiological level, providing insights into the mechanisms and neural computations that underpin the learning of these types of dependencies, as has been done with adjacent dependencies (Kikuchi et al., 2017;Lu & Vicario, 2014).

Conclusions
The learning of dependencies between adjacent stimuli is a fundamental cognitive ability, widely conserved in the animal kingdom. However, learning and processing nonadjacent dependencies, which is required in language, appears to be more difficult. Learning is aided by the addition of cues which highlight the relationship between the nonadjacent elements, bringing them closer together either physically (e.g., Lany & G omez, 2008), attentionally (e.g., Pacton & Perruchet, 2008), or perceptually, via acoustic similarity , prosodic cues (Grama et al., 2016), pauses between words (Pena et al., 2002), edge effects Wang et al., 2017) or by making the intervening adjacent relationships less salient (G omez, 2002). When sufficient cues are provided, nonadjacent dependency learning has been demonstrated in human adults and infants, and some nonhuman animals, suggesting that in both evolutionary and ontogenetic terms this ability appears to arise before the origins of language. However, some differences in learning are evident between human adults, and nonhuman animals (e.g., de la Mora & Toro, 2013;Wilson et al., 2015), and further research will be required to determine how similarly these groups learn and process different forms of nonadjacent dependencies. Better understanding similarities and differences across these groups will benefit from carefully considering types of stimuli and dependencies involved (Figs. 1 and 2), and the perceptual cues available (Fig. 3). In this way, artificial grammar learning paradigms and sequence processing tasks offer great potential to explore both how language is learned over development and, via comparative studies, how it may have evolved.