Abstract
Speech scientists have long proposed that formant exaggeration in infant-directed speech plays an important role in language acquisition. This event-related potential (ERP) study investigated neural coding of formant-exaggerated speech in 6–12-month-old infants. Two synthetic /i/ vowels were presented in alternating blocks to test the effects of formant exaggeration. ERP waveform analysis showed significantly enhanced N250 for formant exaggeration, which was more prominent in the right hemisphere than the left. Time-frequency analysis indicated increased neural synchronization for processing formant-exaggerated speech in the delta band at frontal-central-parietal electrode sites as well as in the theta band at frontal-central sites. Minimum norm estimates further revealed a bilateral temporal-parietal-frontal neural network in the infant brain sensitive to formant exaggeration. Collectively, these results provide the first evidence that formant expansion in infant-directed speech enhances neural activities for phonetic encoding and language learning.
Introduction
Language input is assigned roles of varying importance in acquisition models and theories. The ‘poverty-of-stimulus’ argument asserts that language is unlearnable from the impoverished input data available to children (Chomsky, 1980). In contrast, speech research over the past five decades has established that enriched exposure adaptively guides language acquisition early in life (Höhle, 2009; Kuhl, Conboy, Coffey-Corina, Padden, Rivera-Gaxiola & Nelson, 2008). For example, when talking to infants, people across cultures tend to use exaggerated pitch, elongated words, and expanded vowel space with stretched formant frequencies (Ferguson, 1964; Fernald, 1992; Kuhl, Andruski, Chistovich, Chistovich, Kozhevnikova, Ryskina, Stolyarova, Sundberg & Lacerda, 1997). This special speech style undergoes important age-related changes to accommodate the communicative capacity of the developing mind (Amano, Nakatani & Kondo, 2006; Fernald & Morikawa, 1993; Kitamura, Thanavishuth, Burnham & Luksaneeyanawin, 2001; Liu, Tsao & Kuhl, 2009).
The acoustic alterations of infant-directed speech (IDS) purportedly serve vital social and linguistic functions in early learning. Prosodic exaggeration is thought to direct infants’ attention, modulate arousal and affect initially, and later fulfill more specific linguistic purposes such as lexical segmentation (Cooper & Aslin, 1994; Fernald, 1992). Formant exaggeration is phonetically associated with hyperarticulation (Johnson, Flemming & Wright, 1993), which may facilitate language learning by making the critical acoustic distinctions more salient and the phonetic categories more discriminable (Kuhl et al., 1997). Supporting evidence indicates that vowel space in maternal speech is positively correlated with infants’ speech perception – mothers who tended to ‘stretch out’ their vowels had better-performing babies in phonetic discrimination (Liu, Kuhl & Tsao, 2003). Furthermore, computer models have demonstrated robust unsupervised learning of speech sounds based on IDS input (de Boer & Kuhl, 2003; Kirchhoff & Schimmel, 2005; Vallabha, McClelland, Pons, Werker & Amano, 2007). However, little is known about the neurobiological mechanisms that promote learning by exploiting the physical properties of IDS.
Brain research studies offer new insights into speech processing and language acquisition (Dehaene-Lambertz & Gliga, 2004; Kuhl et al., 2008). Several neurophysiological indices have been shown to be associated with IDS compared to adult-directed speech (ADS), including increased frontal cerebral blood flow (Saito, Aoyama, Kondo, Fukumoto, Konishi, Nakamura, Kobayashi & Toshima, 2007), increased frontal electroencephalography (EEG) power (Santesso, Schmidt & Trainor, 2007), and enhanced event-related potentials (ERPs) in the frontal-temporal-parietal recording sites (Zangl & Mills, 2007). The IDS-induced enhancement in neural activity may work jointly with arousal, attention and affect to strengthen auditory memory for phonological, syntactic and semantic categories. However, none of the previous infant studies controlled the acoustic parameters to determine the linguistic effects of formant exaggeration specific to IDS independent of the prosodic/affective effects primarily drawn from fundamental frequency (f0) modifications that are also found in pet-directed speech (Burnham, Kitamura & Vollmer-Conna, 2002).
The present study utilized synthesized stimuli and high-density EEG to investigate neural coding of vowel formant exaggeration in infants. EEG records electrical potential signals from electrodes placed on the scalp. ERPs, which are derived from averaging EEG epochs time-locked to stimulus presentation, provide a direct noninvasive measure of postsynaptic activities with millisecond resolution, suitable for studying the online cortical dynamics of acoustic and linguistic processing (Dehaene-Lambertz & Gliga, 2004; Näätänen & Winkler, 1999). High-density EEG, which records data from 64 or more electrodes, additionally allows for reliable source estimation of high-quality ERP data (Izard, Dehaene-Lambertz & Dehaene, 2008; Johnson, de Haan, Oliver, Smith, Hatzakis, Tucker & Csibra, 2001; Reynolds & Richards, 2005). Furthermore, advancement in EEG time-frequency analysis has opened a new avenue for studying the event-related oscillations (EROs) in infants (Csibra, Davis, Spratling & Johnson, 2000; Csibra & Johnson, 2007). EROs reflect time-varying neuronal excitability and discharge synchronization at different rhythms subserving communications between neuronal populations for various attentional, memory and integrative functions (Klimesch, Freunberger, Sauseng & Gruber, 2008). Studies have shown that delta (1–4 Hz) and theta (4–8 Hz) activities, among other EROs, are closely associated with linguistic processing (Radicevic, Vujovic, Jelicic & Sovilj, 2008; Scheeringa, Petersson, Oostenveld, Norris, Hagoort & Bastiaansen, 2009). It remains to be tested how formant-exaggerated speech affects neural activation and synchronization in the infant brain.
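The derivation of an ERP by averaging time-locked EEG epochs can be illustrated with a minimal sketch. This is not the analysis pipeline of the present study; the function name `average_erp`, the epoch window (-100 to 600 ms) and the pre-stimulus baseline correction are assumptions chosen for illustration only.

```python
import numpy as np

def average_erp(eeg, onsets, sfreq, tmin=-0.1, tmax=0.6):
    """Average EEG epochs time-locked to stimulus onsets (illustrative only).

    eeg    : array of shape (n_channels, n_samples), continuous recording
    onsets : iterable of stimulus onset positions, in samples
    sfreq  : sampling rate in Hz
    tmin, tmax : epoch window in seconds relative to onset (assumed values)
    Returns (times, erp), with erp of shape (n_channels, n_times).
    """
    start = int(round(tmin * sfreq))  # negative when tmin precedes the onset
    stop = int(round(tmax * sfreq))
    n_base = max(-start, 0)           # number of pre-stimulus samples
    epochs = []
    for onset in onsets:
        lo, hi = onset + start, onset + stop
        if lo < 0 or hi > eeg.shape[1]:
            continue  # skip epochs that run off the recording
        epoch = eeg[:, lo:hi].astype(float)
        if n_base > 0:
            # Subtract the mean of the pre-stimulus interval per channel
            epoch = epoch - epoch[:, :n_base].mean(axis=1, keepdims=True)
        epochs.append(epoch)
    if not epochs:
        raise ValueError("no complete epochs found")
    times = np.arange(start, stop) / sfreq
    return times, np.mean(epochs, axis=0)
```

Averaging attenuates activity that is not time-locked to the stimulus, which is why the resulting waveform is treated as an estimate of the stimulus-related response.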
The experimental design followed a basic assumption in auditory neuroscience – the average response for repeated presentations of the same stimulus (or multiple instances of the same stimulus category) is equivalent to the neural representation of the stimulus (or the category), which codes its acoustic/perceptual features. An alternating block design with an equal stimulus ratio was adopted for this purpose. The design took into account developmental changes in infant ERPs. A number of studies have shown that speech perception and auditory ERPs change dramatically in the first year of life, and adult-like language-specific perception occurs by 6 months of age (e.g. Cheour, Ceponiene, Lehtokoski, Luuk, Allik, Alho & Näätänen, 1998; Kuhl, Williams, Lacerda, Stevens & Lindblom, 1992; Polka & Werker, 1994). The developmental changes in the latency, amplitude, polarity and scalp distribution of ERP responses have led to a better understanding of brain mechanisms that support phonetic processing and language learning. There are two salient auditory ERP components at this age, P150, a positive peak at approximately 150 ms, and N250, a negative peak at approximately 250 ms (Dehaene-Lambertz & Dehaene, 1994; Fellman, Kushnerenko, Mikkola, Ceponiene, Leipälä & Näätänen, 2004; Kushnerenko, Ceponiene, Balan, Fellman, Huotilainen & Näätänen, 2002; Novak, Kurtzberg, Kreuzer & Vaughan, 1989; Rivera-Gaxiola, Silva-Pereyra, Klarman, Garcia-Sierra, Lara-Ayala, Cadena-Salazar & Kuhl, 2007; Zangl & Mills, 2007).
Although an exact neurocognitive model is not available to test how exaggerated speech affects neural processing at the segmental level in infants, developmental studies have provided important details about the neural basis of speech perception early in life (Dehaene-Lambertz & Gliga, 2004; Kuhl et al., 2008). Magnetoencephalography (MEG) data show that phonetic discrimination in infants at 6–12 months of age activates the inferior frontal and superior temporal regions in the left brain (Imada, Zhang, Cheour, Taulu, Ahonen & Kuhl, 2006). Functional magnetic resonance imaging (fMRI) data further reveal that activation for speech stimuli in Broca’s area can be found even in 3-month-old infants (Dehaene-Lambertz, Hertz-Pannier, Dubois, Mériaux, Roche, Sigman & Dehaene, 2006). The co-activation in Broca’s and Wernicke’s areas is thought to indicate perceptual-motor binding to promote speech learning. Consistent with imaging results, ERP studies suggest that both left and right auditory regions are sensitive to coding acoustic/phonetic features of speech stimuli with striking similarities between infants and adults (Dehaene-Lambertz & Gliga, 2004). There exists limited evidence for left-hemisphere dominance for speech in infants, which may be attributable to a functional asymmetry of the auditory system in processing rapid acoustic transitions versus slow spectral changes (Poeppel, 2003; Zatorre & Belin, 2001). For instance, EEG and near-infrared spectroscopy data from newborn infants show bilateral activation for speech-like acoustic modulations and right-hemisphere dominance for acoustic modulations at a much slower rate (Telkemeyer, Rossi, Koch, Nierhaus, Steinbrink, Poeppel, Obrig & Wartenburger, 2009). The vowel stimuli in the present study did not contain rapid acoustic transitions and thus provided an opportunity to test functional asymmetry for spectral processing in infants at 6–12 months of age.
The general hypothesis was that formant exaggeration would induce enhanced neural responses for speech processing in the infant brain. Specifically, the present study examined the effects of formant exaggeration in two ERP components (P150 and N250), two ERO bands (delta and theta), and three broad regions of interest (frontal, temporal/central, parietal) in both hemispheres. There were four closely related questions. First, at what time points, or in what ERP components, did the effect occur? Second, was the hypothesized effect mediated by differences in neural synchronization? Third, what cortical regions were affected? Fourth, did the data support early functional asymmetry for spectral processing of formant exaggeration? Answers to these questions would provide an initial account of the neural mechanisms responsible for the facilitative role of formant exaggeration in speech learning and acquisition.
Discussion
Speech scientists have long stressed the importance of formant exaggeration in infant-directed speech for phonetic learning (Burnham et al., 2002; Kuhl et al., 1997). The ERP waveforms (including CSD waveforms), TFRs, and MNE data here provided three lines of evidence in support of this view. Despite striking differences in ERP waveforms due to reference choice, significant enhancement in N250 and sustaining activity following N250 for exaggerated speech was confirmed in all the analyses. The reduced P150 effect, on the other hand, was not consistently found. Unlike N250, the early P150 component presumably reflected acoustic mapping of the spectral differences between the stimuli (Rivera-Gaxiola et al., 2007). This functional distinction was partly supported by the scalp distribution of the components in all three topographical calculations using linked-mastoid reference, average reference, and the reference-free CSD approach. The P150 was dominant in the frontal sites, and the N250 extended posteriorly from frontal to temporal-parietal electrode sites. The timing and scalp distribution of the enhanced negativity in the 200–600 ms window were consistent with the notion that the N250 and sustaining negative responses are linked with phonetic and lexical processing in infants at 6 months of age or older (Mills, Prat, Zangl, Stager, Neville & Werker, 2004; Rivera-Gaxiola et al., 2007; Zangl & Mills, 2007). An alternative interpretation is that the P150 and N250 responses do not necessarily serve the strict bifurcation of auditory vs. linguistic processing. Rather, these two components co-occur and behave similarly in many experimental situations, and may thus reflect connected processes. In line with both of these interpretations, a missing or diminished N250 was found to be associated with lower level of cognitive and linguistic development and diverted central auditory processing (Ceponiene et al., 2002; Fellman et al., 2004; Tonnquist-Uhlen, 1996).
Differential patterns of neural activity for IDS and ADS have been reported in previous infant studies (Saito et al., 2007; Santesso et al., 2007; Zangl & Mills, 2007). Saito and colleagues employed near-infrared spectroscopy in examining neonates’ responses to naturally spoken sentences in the two speech styles. They found that IDS increased frontal activation in neonates, which was mainly attributable to the prosodic exaggeration of IDS and its socio-affective impact. Santesso et al. showed that in 9-month-old infants, the overall frontal activation in terms of EEG power was linearly related to affective intensity of natural sentences spoken in IDS. Zangl and Mills compared ERPs for words spoken in IDS and ADS in 6- and 13-month-old infants and found larger N600–800 responses to IDS than to ADS in both age groups. In the older infants, familiar words additionally showed enlarged N200–400 response to IDS. Given that the IDS stimuli in the previous study were significantly longer in duration and higher in fundamental frequency, maximum pitch, and frequency range than ADS stimuli (Zangl & Mills, 2007), the increased brain activity for IDS would presumably reflect a composite effect of both prosodic and linguistic exaggerations. By controlling acoustic exaggeration other than formants in IDS and ADS, the new ERP data here demonstrated that formant exaggeration alone at the segmental level could produce significant enhancement in neural activation in 6–12-month-old infants, which may serve to strengthen associations between phonetic processing and word learning (Swingley, 2009).
The mechanism for the observed enhancement in N250 and sustaining negativity appears to rely on neural synchronization of evoked EROs time-locked and phase-locked to stimulus presentation. In the literature, the adult theta activity has been linked with arousal/orienting responses and working memory of verbal stimuli (Basar, Basar-Eroglu, Karakas & Schürmann, 1999; Hwang, Jacobs, Geller, Danker, Sekuler & Kahana, 2005; Klimesch, Hanslmayr, Sauseng, Gruber, Brozinsky, Kroll, Yonelinas & Doppelmayr, 2006; Scheeringa et al., 2009; Summerfield & Mangels, 2005). In infants, delta (1–4 Hz) and theta (4–8 Hz) activities are both affected by linguistic processing, with increased theta power for affective speech (Orekhova, Stroganova, Posikera & Elam, 2006; Radicevic et al., 2008; Santesso et al., 2007). As the pitch level was controlled in the present study, the observed increases in delta activity at frontal-central-parietal sites, as well as in theta activity at frontal-central sites, could not be due to prosodic processing. Rather, they could reflect a composite effect of attentional and phonetic encoding processes in response to the acoustically more salient and phonetically more distinct speech (Kuhl et al., 1997). As attention was not controlled in the present study, it remains to be tested whether formant exaggeration alone makes speech more attractive to infant listeners.
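The notion of phase-locking across trials can be made concrete with a short sketch of inter-trial phase coherence (ITC), computed here by convolving each trial with a complex Morlet wavelet. This is a generic illustration, not the time-frequency procedure used in the present study; the function names, the three-cycle wavelet and the normalization are assumptions.

```python
import numpy as np

def morlet(freq, sfreq, n_cycles=3.0):
    """Complex Morlet wavelet centred on `freq` Hz (assumed 3-cycle width)."""
    sigma_t = n_cycles / (2.0 * np.pi * freq)
    t = np.arange(-4 * sigma_t, 4 * sigma_t, 1.0 / sfreq)
    w = np.exp(2j * np.pi * freq * t) * np.exp(-t ** 2 / (2 * sigma_t ** 2))
    return w / np.sqrt(np.sum(np.abs(w) ** 2))  # unit-energy normalization

def power_and_itc(epochs, freq, sfreq, n_cycles=3.0):
    """Total power and inter-trial phase coherence at one frequency.

    epochs : array of shape (n_trials, n_times) for a single channel;
             each trial must be longer than the wavelet.
    Returns (power, itc), each of shape (n_times,).
    """
    w = morlet(freq, sfreq, n_cycles)
    tfr = np.array([np.convolve(trial, w, mode="same") for trial in epochs])
    power = np.mean(np.abs(tfr) ** 2, axis=0)
    # Keep only the phase of each trial, then average the unit vectors:
    # values near 1 mean consistent phase, values near 0 mean random phase
    unit = tfr / np.maximum(np.abs(tfr), 1e-12)
    itc = np.abs(np.mean(unit, axis=0))
    return power, itc
```

With simulated 6 Hz trials, identical phase across trials yields ITC close to 1 away from the epoch edges, whereas trials with random phase yield ITC close to 0 even though single-trial power is unchanged.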
The MNE differences between the stimuli revealed a bilateral cortical neural network sensitive to formant exaggeration, including Broca’s area in the left brain and frontal-temporal-parietal regions in the right. Broca’s activation for speech processing has been reported in imaging studies of infants at 3–12 months of age (Dehaene-Lambertz et al., 2006; Imada et al., 2006), suggesting the existence of early perceptual-motor binding in support of language acquisition. The present MNE data further indicate that formant-exaggerated speech leads to enhanced Broca’s activation, which may drive speech learning via interactions with the perceptual-motor system involving temporal, frontal, and parietal cortices in both hemispheres. It is interesting to note that the infant MNE activation patterns for passive listening to speech show a striking resemblance to adult fMRI data during passive listening to music (Lahav, Saltzman & Schlaug, 2007). Adult imaging research has also shown that auditory listening alone can recruit production-related regions including Broca’s area (Love, Haist, Nicol & Swinney, 2006; Meyer, Steinhauer, Alter, Friederici & von Cramon, 2004; Skipper, Nusbaum & Small, 2005; Wilson, Saygin, Sereno & Iacoboni, 2004). The adult data were thought to reflect more general mnemonic and integrative functions for Broca’s area in making associations between motor actions for sound generation (not just speech) and the acoustic product. However, passive listening to nonsense syllables does not reliably elicit inferior frontal activation in adults (Zhang, Kuhl, Imada, Iverson, Pruitt, Stevens, Kawakatsu, Tohkura & Nemoto, 2009; Zhang et al., 2005). As no motor component of speech was measured for comparison in the present design, it remains purely speculative that passive listening to speech might elicit motor activities in the developing mind to mediate phonological acquisition.
The ERP, CSD and MNE data all indicated greater involvement of the right hemisphere for the N250 effect than the left. This result was consistent with a recent study that showed early functional asymmetry of spectral processing in newborns (Telkemeyer et al., 2009). There is a growing literature relating the right hemisphere with speech processing at the prelexical and paralinguistic levels (e.g. Bristow, Dehaene-Lambertz, Mattout, Soares, Gliga, Baillet & Mangin, 2009; Homae, Watanabe, Nakano, Asakawa & Taga, 2006; Scott & Wise, 2004; Simos, Molfese & Brenden, 1997). However, the laterality result directly contradicted previous findings about left-hemisphere dominance in significantly enhanced N200–400 and N600–800 responses for familiar words spoken in IDS relative to ADS (Zangl & Mills, 2007). The laterality inconsistency can be explained by the functional asymmetry model – spoken words involve fine-scale temporal processing of the rapid acoustic transitions in the left brain, and processing steady spectral cues in simple vowel stimuli primarily depends on the right brain (Poeppel, 2003; Zatorre & Belin, 2001). Nevertheless, this model did not specify the time course of functional asymmetry or the time course of interactions of cortical regions in auditory processing. A simple extrapolation would predict the same pattern of functional asymmetry regardless of the time course of brain activities, which was not supported by the current results. Further research is necessary to determine left/right functional asymmetries at different cortical regions and in different time windows and how asymmetry in brain activation varies as a function of stimulus properties, task variables, and subject characteristics.
The reference-dependent and reference-free approaches in the present study showed similarities as well as striking differences. The ERP research field has yet to adopt one standard solution regarding the choice of reference. Caution must be used in interpreting ERP results with different reference methods (Dien, 1998; Yao et al., 2005). The topographical map for common average reference was similar to the CSD map in terms of the polarity reversal pattern. As all electrical activity produces dipolar fields, measurements from the two sides of the dipolar activity will always be negatively correlated. It is noteworthy that polarity reversal in the temporal-parietal electrodes relative to frontal electrodes could potentially cause problems in channel grouping and interpretation. Compared with common average reference, the linked-mastoid reference appears to produce biophysically unrealistic unipolar voltage fields. Although the linked-mastoid reference was quite popular in the past, it is recommended that future studies adopt the common average reference. While the CSD and MNE solutions have the advantage of being reference-free, these methods are highly susceptible to noise and thus technically challenging to implement when analyzing individual subjects’ data, especially infant data, which tend to be noisier.
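The effect of reference choice can be sketched in a few lines. The two helper functions below are hypothetical illustrations, not the software used in the present study, and the channel indices for the mastoid electrodes are assumed inputs.

```python
import numpy as np

def to_average_reference(data):
    """Re-reference to the common average.

    data : array of shape (n_channels, n_times)
    Subtracts the mean across channels at every time sample.
    """
    return data - data.mean(axis=0, keepdims=True)

def to_linked_mastoid_reference(data, left_idx, right_idx):
    """Re-reference to the mean of two mastoid channels.

    left_idx, right_idx : row indices of the mastoid electrodes (assumed)
    """
    ref = 0.5 * (data[left_idx] + data[right_idx])
    return data - ref
```

Both operations subtract a single time course from every channel, so the potential difference between any two channels is identical under either reference. What changes is the waveform at each individual site, and hence the apparent polarity and topography, which is why waveforms obtained with different references must be compared with caution.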
As children learn to speak only the language(s) that they are exposed to, defining the role of input and the neurobiological mechanisms enabling this feat is central to our understanding of the perceptual and computational processes that adaptively shape both the developing brain and the language outcome. There is cumulative evidence that the acoustic and linguistic modifications in IDS have important functions in the acquisition of phonology and grammar (Burnham et al., 2002; Liu et al., 2003; Morgan & Demuth, 1996; Werker, Pons, Dietrich, Kajikawa, Fais & Amano, 2007). The present results add a neural-level account of how formant exaggeration in speech alters infants’ brain activities for phonetic processing. This account is not without its limitations in explaining the role of formant exaggeration in language acquisition. Research has shown that not all aspects of acoustic exaggeration in IDS necessarily aid speech discriminability or learning (Trainor & Desjardins, 2002). As the distributional and statistical properties of language input are embedded within an interactive social learning environment (Meltzoff, Kuhl, Movellan & Sejnowski, 2009), it seems unlikely that any single property of IDS is indispensable to normal language development.
Of particular interest to theory and practice is that the effects of enriched language exposure, including formant exaggeration, are not limited to infancy. IDS-based input manipulation is conceptualized to be an agent of neural plasticity regardless of age or experience (Zhang et al., 2009). The benefits of various input manipulations have been demonstrated in infants, children, and adults with or without learning disabilities (Bradlow, Kraus & Hayes, 2003; Kuhl et al., 2003; Tallal, 2004; Zhang et al., 2009). Given that early brain measures have predictive power for later language skills (Kuhl et al., 2008; Molfese, 1989), more developmental studies are needed to delineate the role of language input in the social context of language acquisition or effective intervention. In particular, an experimental design focusing on the different spectral and temporal aspects of IDS is necessary to build a better understanding of cortical speech processing and functional asymmetry. Both speech stimuli and nonspeech control can be applied to further investigate the effects of acoustic versus phonetic processing in populations of specific ages and neurological conditions (Dehaene-Lambertz & Gliga, 2004).
In summary, the present study examined the effects of formant exaggeration on cortical speech processing in infants at 6–12 months of age. Despite methodological differences, there was significant enhancement in N250 with right-hemisphere dominance in all reference-dependent and reference-free analysis approaches. Time-frequency analysis indicated increased neural synchronization for processing formant-exaggerated vowel stimuli in the delta band at frontal-central-parietal electrode sites as well as in the theta band at frontal-central sites. Minimum norm estimates further revealed a bilateral cortical neural network (frontal, temporal and parietal regions) in the infant brain sensitive to formant exaggeration, which may facilitate learning via cortical interactions in the perceptual-motor systems. Although there was limited support for the early functional asymmetry for spectral processing of formant exaggeration in the right hemisphere, hemispheric laterality may vary depending on the time course of neural activation.