Vision Verbs Emerge First in English Acquisition but Touch, Not Audition, Comes Second

Words that describe sensory perception give insight into how language mediates human experience, and the acquisition of these words is one way to examine how we learn to categorize and communicate sensation. We examine the differential predictions of the typological prevalence hypothesis and the embodiment hypothesis regarding the acquisition of perception verbs. Studies 1 and 2 examine the acquisition trajectories of perception verbs across 12 languages using parent questionnaire responses, while Study 3 examines their relative frequencies in English corpus data. We find the vision verbs see and look are acquired first, consistent with the typological prevalence hypothesis. However, for children at 12–23 months, touch verbs, not audition verbs, take precedence in terms of their age of acquisition, frequency in child-produced speech, and frequency in child-directed speech, consistent with the embodiment hypothesis. Later, at 24–35 months, frequency rates are observably different, and audition begins to align with what has previously been reported in adult English data. It seems the initial orientation to verbalizing touch over audition in child–caregiver interaction is especially related to the control of physically and socially appropriate behaviors. Taken together, the results indicate children's acquisition of perception verbs arises from the complex interplay of embodiment, language-specific input, and child-directed socialization routines.


Introduction
Basic perception verbs, such as look, hear, and smell in English, label sensory experiences according to distinct modalities. As the senses provide the basis for our experience of the world, perception verbs have held a particular fascination for language scientists. On the one hand, they encapsulate the universality of our shared human physiology. On the other, cross-linguistic research has revealed substantial diversity across languages in how basic perceptual experiences are lexically represented and referred to (Norcliffe & Majid, 2024; Majid & Levinson, 2011; San Roque et al., 2015; Viberg, 1983).
One theory to emerge from the cross-linguistic study of perception verbs is that a biologically motivated hierarchy of senses constrains perceptual language (Evans & Wilkins, 2000; Viberg, 1983). Viberg (1983, 2001) proposed that despite variability across languages, vision verbs are universally linguistically privileged over other perception verbs, followed by hearing, and then the "lower" senses of touch, then taste, and smell. Verbs at the top of the hierarchy are argued to be more frequent, shorter, and more diachronically stable. While later typological studies of perception verbs have not found support for a universally fixed hierarchy of the five senses (Norcliffe & Majid, 2024; San Roque et al., 2015; see also Majid et al., 2018, for property words denoting sensory categories), research does suggest that vision and audition tend to dominate over the other senses in language. For example, see and look are the most frequent basic perception verbs in typical English conversation between adults, followed by hear and listen (Floyd, San Roque, & Majid, 2018; Winter, Perlman, & Majid, 2018), and a similar frequency ranking with respect to vision, audition, and other senses has also been found in a range of other languages (San Roque et al., 2015).
While perception verbs have been the focus of cross-linguistic typological study for many decades, our knowledge of how children acquire and use these key vocabulary items is nascent. A few studies have examined first language acquisition of perception verbs in relation to questions concerning such topics as semantics and polysemy, complementation structures, and pragmatic functions (e.g., Bloom, Rispoli, Gartner, & Hafitz, 1989; Edwards & Goodwin, 1985; Johnson, 1999; Landau & Gleitman, 1985; San Roque & Schieffelin, 2019), and some scholars have suggested the acquisition of perception verbs will follow the sensory hierarchy outlined above (e.g., Bloom et al., 1989; see also Viberg, 1994; Gentner & Bowerman, 2009): children will learn sight and then hearing verbs before other perception verbs because humans are attuned to the salience of visual and auditory meaning, and are supported in this through patterns of language use. This can be referred to as the typological prevalence hypothesis, following Gentner and Bowerman (2009).
However, it is also argued that children's early embodied experience as a whole is crucial for understanding the order in which verbs emerge in child speech, and senses other than vision and audition may be critical earlier in development. This view is based on the idea that the physicality of the body and how it acts in the world is critical to cognition, and that this embodied experience underlies conceptual representation and processing (Barsalou, 2008; Pulvermüller, 1999). Accordingly, the unfolding sensory-motor development of children may have implications for language learning too.
Consistent with this, Maouene, Hidaka, and Smith (2008) compiled a set of 101 early learned verbs in English (based on vocabulary questionnaire data) and asked adult English speakers to associate verbs with body parts (e.g., the verb kiss might call to mind the body part mouth). They hypothesized that early learned verbs would show consistent, structured associations to specific body parts, as well as a recognizable learning trajectory, reflecting infant reliance on embodied meaning. Indeed, they found that parents assessed children at younger ages as knowing more verbs relating to actions featuring the mouth (such as kiss), followed by verbs relating to the hands (e.g., take), then eyes (e.g., cry), while ear-related verbs such as hear trailed behind. They concluded this language acquisition pattern is "tantalizing in its similarity to traditional Piagetian (Piaget, 1953) descriptions of the developmental course of sensory-motor development as infants first explore relations in their world" (Maouene et al., 2008, p. 1212): that is, first exploring objects with the mouth and hands, and later moving to other ways of engaging with the world. Thus, within the domain of basic perception verbs specifically, an embodied approach to word meaning would posit touch and taste as frontrunners relative to other senses. Accordingly, the embodiment hypothesis might predict touch and taste verbs are acquired before sight and hearing.
In this paper, we examine these two possibilities: the typological prevalence hypothesis and the embodiment hypothesis. In Study 1 (Section 2), we used a set of parent-report vocabulary production checklists from the MacArthur-Bates Communicative Development Inventory (CDI; Fenson et al., 2007), in which parents are asked, for a large set of individual words, whether their child understands and produces the word at a given age. CDIs are able to measure infant vocabulary acquisition on a large scale and are generally considered reliable indicators of infant word learning (Bates, Bretherton, & Snyder, 1988; Dale, 1991; Dale, Bates, Reznick, & Morisset, 1989; Fenson et al., 1994; Styles & Plunkett, 2009). Our analysis of data for English-speaking children indicates acquisition of vision verbs first, followed by touch verbs, and then hearing and taste. In Study 2 (Section 3), we tested the cross-linguistic robustness of this ordering by examining available CDI data in a sample of 11 languages from four language families and found a similar pattern of perception verb acquisition.
While the age-of-acquisition findings are suggestive, the CDI data do not include all verbs of interest, and the method is open to response bias, since adult responses may reflect attitudes and expectations about the senses (e.g., that vision is most important) rather than accurately reflecting child language use. We therefore complemented this approach for English by examining the rate of perception verb usage in child speech over time (Study 3; Section 4), using naturalistic corpus data from the North American collection of the Child Language Data Exchange System (CHILDES; MacWhinney, 2000), a publicly available dataset of interactions involving children. This enabled us to include the missing basic perception verbs smell and feel (not present in CDI checklists) and to verify whether the order of acquisition of perception verbs, as measured by parental reports, is in agreement with the relative frequency of perception verbs in actual child speech. We found that vision verbs were produced most frequently, but that touch verbs were used earlier and more frequently than other non-visual perception verbs in under-2-year-olds. At older age ranges, productions of touch verbs were no more frequent than hearing verbs. The child frequency patterns in the earliest age bracket therefore mirror the acquisition ordering found both in English and across languages. To gain further understanding of acquisition in context, we also examined verb frequency patterns in child-directed speech in the same dataset and additionally conducted a focused qualitative analysis of perception verb usage in child-caregiver interactions.

Data and methods
We accessed the American English CDI using Wordbank (Frank et al., 2016), a structured database that archives raw CDI data across languages and labs. The data were downloaded on December 20, 2022, using the R wordbankr package (http://wordbank.stanford.edu/). In total, the dataset consisted of observations for 7,878 children. We did not exclude any children from the analyses; missing observations were treated as NA.
We estimated the age of acquisition of basic perception verbs using vocabulary data. Viberg (1983) identifies the basic perception verbs in English as see, look, hear, listen, feel, touch, taste, and smell.1 The words used in our analyses are the six verbs included in the American English CDI (Words and Sentences) surveys: see, look, hear, listen, touch, and taste. The verbs feel and smell are not included in the set of CDI vocabulary and so could not be analyzed. Our analyses were restricted to production data only, for the age range 16-30 months.
We modeled the parental report data of children's word production using Bayesian mixed effects logistic regression. Children's acquisition of perception verbs (1 = produced, 0 = did not produce) was modeled as a function of the fixed effects of Age of child (a continuous predictor, partitioned into 1-month intervals), Modality of verb (Sight, Hearing, Touch, Taste), and the interaction between the two. For the fixed effect of Modality, we collapsed over productions of see and look (such that a child's production of either or both verbs counted as 1) to create a single level, Sight. We did the same for hear and listen to create a single level, Hearing. (See Supporting Information, S1, for a complementary analysis that only considers the set of unambiguously agentive perception verbs, i.e., contrasts look vs. listen vs. touch.) The fixed effect of Modality was contrast coded, with Touch treated as the reference level. This allowed us to determine whether Sight and Hearing are acquired earlier than Touch and, secondarily, to test whether there are differences in the timing of acquisition between Touch and Taste.
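The collapsing step can be illustrated with a minimal sketch. The authors fit their models in R, so this Python fragment (with made-up data) is only an illustration of how lemma-level responses reduce to binary modality outcomes:

```python
# Illustrative sketch (not the authors' code): collapsing CDI verb lemmas
# into binary per-modality outcomes, as described for the Study 1 analysis.
MODALITY_LEMMAS = {
    "Sight": {"see", "look"},
    "Hearing": {"hear", "listen"},
    "Touch": {"touch"},
    "Taste": {"taste"},
}

def modality_outcomes(produced_words):
    """Return 1 if the child produces any verb of the modality, else 0."""
    produced = set(produced_words)
    return {m: int(bool(lemmas & produced)) for m, lemmas in MODALITY_LEMMAS.items()}

# A child who says "look" and "touch" counts as producing Sight and Touch.
outcomes = modality_outcomes({"look", "touch"})
```

Each child then contributes one 0/1 outcome per modality at a given age, which is the dependent variable of the logistic regression.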
In the model, random intercepts were included to account for by-subject variation in verb production. The magnitude of the coefficient of each modality contrast gives an estimate of its independent contribution to words being produced by more children. Interactions between Modality and Age indicate how the effect of Modality is modulated for words learned earlier or later. For example, a positive effect of sight (vs. touch) means that verbs referring to sight are acquired by more children than verbs referring to touch; a negative interaction with Age means that this advantage is more pronounced for younger children.
We used default priors from the brms package for the intercept and standard deviation, and weakly informative priors on fixed slopes (a normal distribution centered at 0 with a standard deviation of 1). As our inference criterion, we observed whether the 95% credible intervals of the posterior distributions for each predictor included zero. Credible intervals that did not contain zero were interpreted as providing strong evidence for the effect of that predictor on the dependent variable. We also report the probability of each effect being above zero.
Markov chain Monte Carlo (MCMC) sampling was performed with 4,000 iterations each for four chains (2,000 iterations discarded as warmup). Model convergence was indicated by visual inspection of the chains and Rhat values of 1. Posterior predictive checks showed a good fit to the data.

Summary
Children were more likely to acquire touch verbs than hearing verbs, consistent with the embodiment hypothesis, which holds that early child development privileges touch over hearing. However, they were also more likely to acquire vision verbs than touch verbs, consistent with the typological prevalence hypothesis, which suggests vision dominates over the non-visual senses overall. Thus, with respect to order of acquisition, there is evidence that vision verbs are learned earliest, but touch precedes hearing and taste.

Data and methods
We next looked beyond English to ascertain whether the same pattern is found cross-linguistically in child language acquisition, again drawing from Wordbank. Other than English varieties, Wordbank currently contains vocabulary data from 29 languages, though these do not have equal vocabulary coverage. To identify languages with the target perception verbs, we used Wordbank's "unilemma" ("universal lemma") coding. Unilemmas are "crosslinguistic mappings from lexical items to single (English) forms that stand for a particular conceptual abstraction" (Frank et al., 2016). We began by selecting languages that had word forms for any of the unilemmas see, look, hear, listen, touch, and taste (n = 25). Of these, 11 languages were missing word forms that corresponded to either of the unilemmas touch or taste, and one language was missing any verb for hearing (i.e., both "hear" and "listen" were absent). We excluded these languages, together with one language (Finnish) that was undersampled (only four children in total) and one language (Swedish) that was only sampled every 3 months. We also excluded one variety of French (France), because the language was already represented by Quebecois French (which had a greater number of observations). This left a final sample of 11 languages with lexical coverage of the four sense modalities targeted by the CDI: sight, hearing, touch, and taste (see Table 1). Some of these languages do not have separate verb forms for agentive and non-agentive sight or hearing, at least with respect to the concepts sampled in Wordbank, and some have multiple forms listed (Table 1).

Analyses
We modeled the data using Bayesian mixed effects logistic regression, following the same model specifications as Study 1 but now including language and subject as nested random effects to capture by-language and by-subject variability (subjects nested within languages). MCMC sampling was performed with 4,000 iterations each for four chains (2,000 iterations discarded as warmup). Visual inspection of the chains and Rhat values of 1 indicated the model converged, and posterior predictive checks showed the model fit the data well.

Summary
The cross-linguistic analysis revealed a pattern of acquisition similar to that of the English-speaking children in Study 1. Across the 11 languages in Study 2, children were more likely to acquire touch verbs than hearing verbs and were also more likely to acquire vision verbs than touch verbs. We did not find any reliable differences in the timing of acquisition of touch and taste, in contrast to the results from English-speaking children.

Data and methods
For frequency analyses of child and child-directed speech, we used corpus data from the North American collection of the Child Language Data Exchange System, CHILDES (MacWhinney, 2000). This collection includes interactional data from 48 subcorpora, recorded for a wide range of research purposes over a variety of times and contexts (e.g., conversations in the home, lab visits with toy-based play, narrative-focused recordings, etc.). The data were accessed using childes-db (version 2021.1), a database-formatted mirror of CHILDES, via the R childesr package (Sanchez et al., 2018). The collection contained a total of 10,455,042 word tokens. We excluded tokens coded as unintelligible (n = 241,100), tokens for which information about the child's age was missing (n = 1,023,125), and tokens that fell outside the age range of 1-5 years old (n = 1,472,728). This resulted in a final dataset of 7,718,089 tokens from 641 children and their parents.2 Each token was tagged for speaker role, "child" or "parent" (collapsed over mother and father), the age of the child when the word was produced, and whether the word was a perception verb or not. Perception verb tokens were coded for their lemma, which consisted of the following categories: see, look, hear, listen, touch, feel, taste, and smell. Any inflected form counted as an instance of the lemma type (e.g., see, seen, saw, seeing, and sees are all treated as instances of see).3 We were also able to include two perception verbs that were not in the CDI, feel and smell. To complement the quantitative analyses, we include a few illustrative examples of conversation between children and caregivers.
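As a quick sanity check, the exclusion counts reported above are internally consistent with the stated final token count:

```python
# The three exclusion counts reported for the CHILDES sample should sum
# with the final dataset size back to the original token total.
total_tokens = 10_455_042
excluded = 241_100 + 1_023_125 + 1_472_728  # unintelligible, no age info, outside 1-5 years
final_tokens = total_tokens - excluded  # matches the reported 7,718,089
```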

Child speech
For an initial overview of the trajectory of perception verb production in child speech, we counted the number of times an instance of a perception verb lemma was produced by each child, with counts grouped into 6-month age bins. These counts were then divided by the total words produced by each child within each age bin and multiplied by 1,000 to yield normalized frequencies (Fig. 4). Overall, children's production of see and look is more frequent than the non-visual perception verbs at all ages. Not all perception verbs are produced in the earliest age bin (12-18 months): listen, smell, and feel only appear from 18 months.
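The normalization described above can be sketched in a few lines (hypothetical counts; the authors' processing was done in R):

```python
# Sketch of the per-1,000-words normalization used for Figs. 4 and 5:
# a verb count within an age bin is divided by the child's total word
# count in that bin and scaled to a rate per 1,000 words.
def per_thousand(verb_count, total_words):
    return 1000 * verb_count / total_words

# Toy example: 12 tokens of a verb out of 3,000 words in a bin.
rate = per_thousand(12, 3000)
```

Normalizing in this way makes rates comparable across children and age bins that differ in how much speech was recorded.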
Visual inspection of Fig. 4 suggests that in the first 24 months, touch is the most frequently produced of the non-visual verbs; after 24 months, touch productions decrease in frequency. This is clearer in Fig. 5, which zooms in on the mean normalized frequencies of non-visual perception verbs. Within the earliest age band (12-23 months), children produced touch tokens over five times per thousand words on average, whereas hear was produced around one time per thousand words (Fig. 5, top left). This is also evident when collapsed across agentive and non-agentive uses of the verb (Fig. 5, top right). This pattern shifts at 24-35 months, however. The difference between touch and the other non-visual senses is reduced: all non-visual perception verbs are now produced around two times per thousand words or fewer (Fig. 5, bottom left and right).
We fit two separate negative binomial regression models to the frequency data, one for each age band. We specified a negative binomial distribution because the dependent variable was count data (Winter & Buerkner, 2021). The models included an offset term to allow us to model perception verb counts over units of exposure (the log of the child's total word count within the age range). This controls for the number of total words and gives a rate at which the target verbs are produced per word in the children's data over the specified age range. Children's rate of perception verb usage was modeled as a function of Modality, collapsed over lemmas to create a single level for each sensory modality and contrast coded with Touch as the reference level (Supporting Information, S2, considers agentive perception verb lemmas only). In the model, random intercepts were included to account for by-speaker variation in usage rates of perception verbs. To summarize, vision verbs were produced most frequently by children of all ages, while smell and taste verbs were produced less frequently across the board, consistent with the typological prevalence hypothesis. Critically, however, in the earliest productions of children at 12-23 months, reference to touch was more frequent than hearing; by 24-35 months this difference disappeared, and touch and hearing verbs were produced on par. This could be taken as tentative support for the embodiment hypothesis, since touch appears before hearing in children's early language production. The discrepancy between the child frequency data and the adult frequency data reported elsewhere (e.g., Floyd et al., 2018; San Roque et al., 2015; Winter et al., 2018) raises the question of whether English child-directed speech recapitulates perception verb production as reported in adult speech. Alternatively, it is possible caregivers speak differently in the context of child interaction and refer to touch more frequently than hearing. To disentangle these possibilities, we examined child-directed speech within the same corpus.
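The role of the offset term can be sketched as follows. With log(total words) entered as an offset, the model's coefficients describe per-word rates rather than raw counts; the numbers below are toy values for illustration, not the fitted estimates (the authors' models were fit in R):

```python
import math

# In a negative binomial (or any log-link) count model with an offset,
#   log E[count] = linear_predictor + log(exposure),
# so E[count] = exposure * exp(linear_predictor): the linear predictor
# describes a rate per word, scaled by how many words the child produced.
def expected_count(intercept, modality_effect, exposure_words):
    linear_predictor = intercept + modality_effect  # toy fixed effects
    return exposure_words * math.exp(linear_predictor)

# The same per-word rate yields proportionally more expected tokens for a
# child who produced twice as many words overall.
a = expected_count(-6.0, 0.0, 1000)
b = expected_count(-6.0, 0.0, 2000)
```

This is why the offset "controls for the number of total words": modality contrasts compare rates, independent of each child's total talk.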

Child-directed speech
Following the same procedure as for child speakers (Section 4.2.1), we counted the number of times a parent produced an instance of each of the perception verb lemmas, with counts grouped into 6-month child age bins. These counts were converted to normalized frequencies by dividing by the total words produced by each child's parents within each age bin and multiplying by 1,000.
Parents' production of see and look far outstripped non-visual perception verbs at all ages, paralleling the child production data. Focusing on the non-visual perception verbs, at 12-23 months parents produced touch most frequently, with touch tokens appearing just under twice per thousand words on average. While this mirrors children's production, the actual rate of touch was higher in the child data (5 times per thousand words). Taste had the next highest mean frequency in the parent speech. Notably, listen and hear were less frequent: children heard listen at a quarter of the rate of touch on average, and when collapsed across agentive and non-agentive uses, children heard listen/hear verbs at around half the rate of touch/feel verbs. By 24-35 months the situation was different. Parents produced hearing verbs most frequently after vision. Notably, parents' production of touch dropped to half that of the earliest age range.
Applying negative binomial regression to the child-directed speech, following the same model specifications as the models of child speech (Section 4.2.1; see Supporting Information, S3, for analysis of agentive perception verbs), revealed that the posterior distributions strongly support an overall effect of sensory modality (Sight, Hearing, Touch, Taste, Smell) on rates of perception verb usage when talking to children aged 12-23 months. Parents produced reliably more sight verbs than touch verbs (log estimate 2.67, SE = 0.09, 95% CI = [2.50, 2.86], probability of the effect being above zero = 100%). On the other hand, caregivers produced reliably fewer hearing verbs than touch verbs.

Summary
Children's production data showed tentative support for the embodiment hypothesis, with young children aged 12-23 months referring to touch more than hearing; this difference disappeared later, at 24-35 months. On the other hand, consistent with the typological perspective, vision dominated, while taste and smell appeared infrequently in the child data. The caregiver data were enlightening in this context. Unlike in previous studies of English adult-to-adult speech (e.g., Floyd et al., 2018; San Roque et al., 2015; Winter et al., 2018), adults produced more touch verbs than hearing verbs when speaking to children aged between 12 and 23 months, mirroring (or leading) child production patterns as well as the age-of-acquisition data. The dominance of touch relative to hearing in the child-directed speech shifted over time: in the 24-35 month age range, the difference between hearing and touch evened out. This appears to be driven both by a reduction in the frequency of touch and by an increase in the frequency of hearing. To understand this shift better, we qualitatively examined instances of spontaneous conversation, the primary context of first language acquisition.

Use of perception verbs in context
To understand the differential use of touch and hearing verbs in child-caregiver interactions, we examined contextualized examples of the active perception verbs from the Providence Corpus within CHILDES (Demuth, Culbertson, & Alter, 2006). This subcorpus features six children recorded monthly or bimonthly in naturalistic interactions (usually indoors at home, engaging in a range of activities such as toy play and make-believe, reading texts and examining picture books, food preparation, eating and drinking, looking at photographs, chatting, having altercations, and singing or chanting rhymes, among other activities) from around 12 months of age until into their third or fourth years (Demuth et al., 2006). These examples illustrate that perception verb usage is quite varied, suggesting a more nuanced interpretation of the child language data is required.
We have already seen that vision verbs were overwhelmingly the most frequent in both child and child-directed speech. Examples of their usage in context help to illustrate their multiple functions. As we might imagine, adults and children in the recordings commonly use look in imperative or jussive sentences to direct each other's attention in the shared environment (Examples 1-3) (see also Edwards & Goodwin, 1985). As well as commands or encouragements to look (likely a dominant function of this verb with young children), a range of other functions are represented, such as talking about somebody else's visual engagement, requesting that one be allowed to look, and discussing how something or someone looks to the speaker or addressee. In their conversations, adults and children appear to orient to looking as a desirable and prosocial behavior, for example, in the sense that looking facilitates joint attention, shared appraisals, and coordinated projects (such as filling a truck with blocks, admiring a picture, etc.).
Turning to audition, for both parents and children, listening is often verbalized as a pleasurable and interesting activity, for example, in relation to listening to music and stories. In some of the recordings, caregivers and children also treat listening as a sense to be actively discussed and explored. In Example 4, Ethan's mother even explains what listening "is" (here, in relation to the recording/playback device).

Example 4
Mother: Yeah, that's listening. When you put it into your ear, it's listening.
Ethan: Listening.
Mother: Listening. But you talk into the microphone. [Ethan 010918]

Both parents and children also use listen in relation to listening to human speech, the dominant context for audition verbs in adult-adult conversation more generally (San Roque et al., 2015). One type of this usage is "discourse-oriented" in that listen directs the addressee to attend to upcoming talk (Examples 5, 6). For parents in particular, the recordings show a further emphasis on listening to speech as being "good" and behaving well (cf. a common extension of hearing verbs is to mean "obey"; Sweetser, 1990). Listening can be suggested as a behavior that facilitates the resolution of distressing situations, and not listening is identified as "naughty" or non-cooperative (e.g., when Alex, aged 1 year and 9 months, is told you're being naughty, you're not listening). Functions of listen thus suggest that the growing frequency of audition verbs in adult speech may result from an increased focus on verbal instruction and interaction as children become more competent and metalinguistically aware conversational partners.
In contrast to the plentiful verbal encouragement and positive parent attitude toward looking and listening, acts of touch are overwhelmingly verbalized by adults in contexts of prohibition. Directives of one kind or another against touching in fact account for around 200 of the roughly 330 parent-produced examples of touch in the Providence corpus. Especially for younger children, these instructions are often simply constructed using negatives like don't or no, including expressions that are not standard in adult speech (e.g., no touch), and often involve repetition (Example 7: You can look at it later, okay? [Ethan 001104]). Simple negative commands are interspersed with more complex prohibitions (e.g., I'd rather you didn't touch…; didn't I say not to touch…), as well as those that give reasons or specify consequences. Caregivers typically only invite or instruct children to touch within specific contexts, such as action rhymes (e.g., touching one's toes) or with toys and books that have been designed with tactile engagement and textural focus in mind.
Children do, at times, themselves embrace the use of touch prohibitions, such as Don't touch it (e.g., as spoken by Ethan at 1 year and 8 months).In Example 8, a nearly 3-year-old Naima highlights the right to touch her blocks as a privilege that her toy bear, Sleepy (voiced by Naima's mother), is not allowed to share.

Example 8
Child: These blocks are, are part of the sofa, Sleepy.
Parent [as Sleepy]: Oh can I jump on them?
Child: No.
Parent: Oh okay. Sleepy you better come here.
Child: Only I get to touch them. [Naima 021123]

Overall, children's utterances seem to be less dominated by prohibitions than those of their parents. Only around one fifth of the 57 touch tokens produced by children were clear directives against touching, while the rest (barring five unclear instances) exemplified a variety of contexts, such as narrating; requesting to touch, potentially the "flipside" of a prohibition; or just contemplating touch, for example, as a possible form of engagement (Example 9). Here, Lily and her mother are discussing snail slime. In a recording made a few months later, looking together at pictures, Lily (at 2 years, 11 months, 25 days) responds to pictures of an urchin, lobster, crab, and dolphin by immediately identifying whether or not these are things that she likes to touch, suggesting this is a salient property for her and that, at least for a brief period, she may have routinized touch as a "go-to" sense in certain contexts. Lily also identifies a characteristic of touch that is rarely spontaneously verbalized by adults in the recordings: its importance to comfort, as where she comments on needing to touch blankie after hurting herself (age 2 years, 11 months, 6 days). Touch prohibitions themselves may also be the topic of extended discussion and explanation. For example, in one conversation Lily asks why one is not allowed to touch a fragile sculpture, and her mother explains, we can only look at it but not touch it [Lily 030121].
Summing up, the ways children and adults use the verbs look and listen suggest an overall positive and prosocial attitude toward these activities. In addition, listen is used in relation to attending to speech, showing cooperative behavior, and participating in a conversation. At least for parents, however, touch is strongly identified as an activity that must be carefully regulated and is fraught with risk. It may be encouraged in controlled contexts but is most commonly verbalized in prohibitions such as don't touch. Children also use touch prohibitions and orient toward touch as an acceptable or unacceptable behavior, but in this corpus their contextual range for touch appears potentially broader and more exploratory than that of their caregiving adults.

Discussion
At the outset, we asked whether young children's perception verb acquisition is swayed by early experience of touch and taste, as suggested by the embodiment hypothesis, or more closely aligns with the typological prevalence hypothesis, which identifies vision and hearing as dominant cross-linguistically. We used parent questionnaire data to analyze perception verb learning in English and other languages, and naturalistic corpus data to further examine the frequencies and functions of English perception verbs in children's and child-directed speech.
In Study 1, we found that adult assessments of children's production vocabulary in English put see and look as clear front-runners in the acquisition of basic perception verbs, with touch next, well ahead of hear and taste (note that feel and smell were not in this dataset). Study 2 established a similar pattern in 11 other languages: assessments of children's production vocabulary indicated that vision verbs preceded all other sensory verbs, and touch verbs preceded audition verbs. However, no reliable difference was found in the relative ordering of touch and taste terms. Study 3 focused on perception verbs in child speech and child-directed speech in naturalistic conversations. The usage patterns painted a similar picture: at the earliest ages, 12-23 months, vision verbs were dominant in both child and child-directed speech, and touch verbs were more frequent than hearing, taste, and smell verbs. By 24 months, the frequency of touch verbs had diminished in both children's speech and child-directed speech.
The dominance of vision observed across all three studies echoes a robust finding in the literature: across languages, vision verbs are linguistically privileged over the other senses. This has been shown with respect to their usage frequencies (Floyd et al., 2018; San Roque et al., 2015; Winter et al., 2018), as well as other dimensions of linguistic expression, for example, their relative morphological simplicity and their tendency to be lexically differentiated from other senses (Norcliffe & Majid, 2024; Viberg, 1983). For some, vision sits at the top of a biologically motivated hierarchy of senses, with audition accorded second place, followed by touch, taste, and smell (Viberg, 1983). While later work does not find support for a fixed hierarchy of senses across languages (Majid et al., 2018; Norcliffe & Majid, 2024; San Roque et al., 2015), the available evidence does suggest that verbs of hearing tend to dominate over other non-visual perception verbs in usage in English and typically (although not universally) across languages (Floyd et al., 2018; San Roque et al., 2015). Strikingly, our study reveals that the dominance of audition over other non-visual senses in adult usage frequencies is not mirrored in the relative order of acquisition of perception verbs, or in speech by or to children: in English (Study 1) and across other languages (Study 2), we found that touch verbs are learned earlier than hearing verbs. Touch verbs were also produced more than hearing verbs by children and by parents speaking to children in the first 24 months (Study 3).
This robust touch-before-hearing finding thus potentially supports Maouene and colleagues' (2008) identification of tactile experience as highly relevant to early acquired verbs (cf. also Freeborn, 2022), consistent with embodied theories of word meaning (Pulvermüller, 1999; Barsalou, 2008). However, the same perspective would also predict the comparatively early learning of taste verbs, due to the high salience of "mouthy" experiences in early development. The English data do not support this, although the cross-linguistic questionnaire data (Study 2) do keep open the possibility that taste verbs may, like touch verbs, be acquired early relative to audition verbs in some languages. This possibility requires further in-depth study.
Moreover, a closer look at the use of touch in child-directed speech suggests an alternative motivation for the use of touch verbs. Adults' increased use of the word touch appears likely to reflect its importance in behavior regulation, especially in prohibitions against touching, rather than a focus on tactile sensation per se. Within our society, 1-year-olds are likely to be moving around independently in the home (e.g., as opposed to being carried on the body for hours) but are viewed as vulnerable, unpredictable, and in frequent need of explicit and repeated direction concerning their physical behavior. Looking is safe, but touching is dangerous. There is less evidence that the comparatively high frequency of touch verbs in children's speech (as opposed to child-directed speech) also stems primarily from prohibitions. This disparity between adults and children makes sense given our expectations of social roles (e.g., adults monitor their children's safety and tell them what to do or not do) but leaves open for further exploration what motivates children to use a verb like touch themselves. Parents' concerns about tactile contact as unsafe or as a social infraction may contribute to early learning of the word touch (via frequency effects) but do not necessarily determine how children apply it. Further cross-cultural research on child and child-directed speech patterns could help to disentangle these motivating factors.
As English-speaking children move into their third year, they and their caregivers reduce the relative frequency of touch terms. In this age range, adults simultaneously increase their use of audition terms in child-directed speech, and thus start to align more closely with patterns of adult discourse. Across languages, reference to audition typically relates to the perception of speech (cf. Buck, 1949; Evans & Wilkins, 2000; San Roque et al., 2015; Sweetser, 1990). Children and caregivers also exhibit these speech-oriented uses of listen, and adults in particular tend to present listening to others as cooperative and prosocial behavior. It seems reasonable, then, that the observed shift in child-directed speech comes about through adults' declining focus on prohibiting physical touch and their increasing emphasis on promoting attention to aural stimuli, specifically spoken language. This possibility could be explored by more fine-grained examination of perceptual language in context.
To conclude, numerous studies of adult language use across cultures consistently find that see and look are the most frequent basic perception verbs, followed by hear and listen, with touch, taste, and smell verbs appearing relatively infrequently by contrast (Floyd et al., 2018; San Roque et al., 2015; Winter et al., 2018). Accordingly, the typological prevalence hypothesis would predict that children acquire and produce these verbs in the same order. Analysis of English CDI data (Study 1) and corpus data (Study 3), as well as CDI data from eleven additional languages (Study 2), shows important discrepancies from the typological prevalence hypothesis. Although vision verbs do appear first and most frequently in young children's language production, touch appears before hearing. This aligns with embodied accounts of meaning, which predict that language use is related to specific bodily experiences.

Open Materials Badge
This article has earned an Open Materials badge. Materials are available at https://osf.io/mdtb4/.

Notes
1. Viberg (1983) also includes sound as a basic audition verb (e.g., that sounds scratchy), the only verb in the group that is purely "copulative," that is, the subject is almost invariably the percept itself. We exclude this from our study and focus only on those verbs that can commonly occur as "activity" or "experiencer" verbs (i.e., the subject of the verb is potentially the perceiver).
2. We treated all instances produced by parents in the corpus as "child-directed" in relation to the target child, who is usually present throughout the entire recording session. For

Fig. 1. Proportion of children producing each of the six basic perception verbs over time.

Fig. 2. Posterior distribution of logistic regression coefficients predicting whether English-speaking infants 16-30 months have acquired perception verbs (Touch is the reference level).
Fig. 3. Posterior distribution of logistic regression coefficients predicting whether infants across languages between 16 and 30 months have acquired perception verbs (Touch is the reference level).

Fig. 4. Mean normalized perception verb frequencies in child speech as a function of age (category values are stacked).

Fig. 5. Mean normalized frequencies of non-visual perception verbs by lemma (left) and collapsed across sensory modality (right) in child speech between the ages of 12-23 months (top) and 24-35 months (bottom). Error bars indicate bootstrapped 95% confidence intervals across words in each category.

Fig. 6. Posterior distribution of negative binomial regression coefficients predicting differences in the rates of perception verb usage in child speech between 12 and 23 months (Touch is the reference level).

Fig. 7. Posterior distribution of negative binomial regression coefficients predicting differences in the rates of usage of perception verbs between 24 and 35 months (Touch is the reference level).
Listen did you tell Manuela where we're gonna go later?
Mommy.
Mother: What?
Naima: Nursie Daddy.
Mother: Nursie Daddy, that's another joke! [Naima 010604]

It's a camera sweetie. No no no no touch. No touch.
's a little yucky but it's neat too.
Child: Yeah.
Parent: Yeah.
Child: You could touch some.
Parent: Well you could touch some that's true I guess. [Lily 020728]

Table 1
List of the word forms corresponding to the target unilemmas in each language