Cortical speech sound differentiation in the neonatal intensive care unit predicts cognitive and language development in the first 2 years of life


  • Nathalie L Maitre,

    1. Division of Neonatology, School of Medicine, Vanderbilt University, Nashville, TN, USA
    • Correspondence to Dr Nathalie Maitre, Division of Neonatology, The Monroe Carell Jr Children's Hospital at Vanderbilt University, 11111 Doctor's Office Tower, 2200 Children's Way, Nashville, TN 37232-9544, USA. E-mail:

  • Warren E Lambert,

    1. Department of Biostatistics, School of Medicine, Vanderbilt University, Nashville, TN, USA
    2. Vanderbilt Kennedy Center for Research on Human Development, Vanderbilt University, Nashville, TN, USA
  • Judy L Aschner,

    1. Division of Neonatology, School of Medicine, Vanderbilt University, Nashville, TN, USA
  • Alexandra P Key

    1. Vanderbilt Kennedy Center for Research on Human Development, Vanderbilt University, Nashville, TN, USA
    2. Department of Hearing and Speech Sciences, School of Medicine, Vanderbilt University, Nashville, TN, USA



Neurodevelopmental delay in childhood is common in infants born preterm, but is difficult to predict before infants leave the neonatal intensive care unit (NICU). We hypothesized that event-related potential (ERP) methodology characterizing the cortical differentiation of speech sounds in hospitalized infants would predict cognitive and language outcomes during early childhood.


We conducted a prospective study of 57 infants in NICU (34 male, gestational age at birth 24–40wks), quantifying the amplitude of ERP responses to speech sounds before discharge (median gestational age 37.1wks), followed by standardized neurodevelopmental assessments at 12 months and 24 months. Analyses were performed using ordinary least squares linear regression.


Overall validity of constructs using all ERP variables, as well as sex, maternal education, gestational age, and age at ERP, was good and allowed significant prediction of cognitive and communication outcomes at 12 months and 24 months (R2=22–42%; p<0.05). Quantitative models incorporating specific ERPs, gestational age, and age at ERP explained a large proportion of the variance in cognition and receptive language on the Bayley Scales of Infant Development at 24 months (R2>50%; p<0.05).


This study establishes ERP methodology as a valuable research tool to quantitatively assess cortical function in the NICU and to predict meaningful outcomes in early childhood.


BSID III: Bayley Scales of Infant Development, 3rd edition
DAYC: Developmental Assessment of Young Children
ERP: Event-related potential
IQR: Interquartile range
NICU: Neonatal intensive care unit

What this paper adds

  • Efficiency of cortical speech sound differentiation predicts later communication and cognition.
  • Event-related potential methodology allows quantitative measurements of neural processing efficiency.
  • Cortical measures of sound differentiation add to the predictive value of neurodevelopmental outcomes models.

Infants born preterm are at high risk for neurodevelopmental delays, and infants of the youngest gestational age are at the greatest risk.[1] Both nervous system immaturity and adverse events in the neonatal intensive care unit (NICU) may contribute to abnormal neurodevelopment in these children.[2, 3] Use of clinical and imaging tools for predicting outcomes of extremely preterm children can be challenging in sick infants, is often qualitative in nature, and can lack direct correlation with neural function. Although these tools may be useful to clinical practice and parental counseling,[4, 5] they are often limited in their prognostic and practical research value.

To quantify neural processing in infants and children, researchers and clinicians have used event-related potential (ERP) measurements, a type of time- and stimulus-locked electroencephalography. ERP technology characterizes cortical signals in response to stimuli by time and amplitude, providing information on the strength and speed of electrical signals traveling through the brain.[6] Event-related potential measures have provided evidence of stimulus discrimination even in preterm infants.[7]

The use of auditory stimuli to study brain function is well established in the field of child development and is an attractive methodology for use in infants and other vulnerable populations because it is non-invasive and does not require active participation. Auditory signal processing in children depends on both the maturity of the primary auditory cortex and the age of the subject. We have previously shown that the ability of the preterm infant cortex to differentiate between speech sounds is affected by gestational age and post-natal age at testing.[8] The current study builds on established findings in term-born infants and children, in whom ERP measures of information processing are a strong predictor of later cognitive outcomes.[9, 10] We hypothesized that cortical differentiation of sound in the NICU, as measured by differences in the amplitude of predefined ERP responses, predicts cognitive and communication functioning on standard behavioral tests during early childhood. To test this hypothesis, we conducted a prospective study quantifying ERP responses to speech sounds in infants before discharge from the NICU, followed by standardized neurodevelopmental assessments at 12 months and 24 months of chronological age.


Patient population

This prospective study of 57 infants cared for in the NICU at the Monroe Carell Jr Children's Hospital at Vanderbilt University was conducted between January 2009 and January 2012. Infants were recruited according to Vanderbilt Institutional Review Board-approved protocols and informed consent was obtained from all parents.

Gestational age at birth (median 28wks; interquartile range [IQR] 26–34wks) was derived from the best obstetric estimate of gestation, based on the mother's last menstrual period, obstetric measurements, and ultrasound measurements in the first trimester of pregnancy. All infants underwent serial cranial ultrasonography, in accordance with the clinical protocol, during their NICU stay, as well as auditory brainstem response testing by a pediatric audiologist before ERP testing. Event-related potential testing was performed after infants reached 32 weeks gestational age equivalent and a minimum head circumference of 31cm (smallest ERP net size), and when infants were considered clinically stable (median gestational age at testing 37wks; IQR 35–42wks). Maternal education was scored according to a 7-point Likert scale with a score of 1 corresponding to less than seventh grade (public school until 11–12y) and a score of 7 corresponding to graduate studies.[11]

ERP methodology

Event-related potential recordings were performed using previously established methodology.[8, 10] Briefly, the auditory stimuli included six computer-synthesized consonant–vowel syllables (/ba/, /da/, /ga/, /bu/, /du/, /gu/; e.g. /ba/ as in ‘barge’ and /bu/ as in ‘booth’). See supplementary materials (online supporting information) for references and extensive description. A high-density array of 124 electrodes (Geodesic Sensor Net, EGI, Inc., Eugene, OR, USA) was used to record each infant's ERP with online filters of 0.1 to 30Hz, 250Hz sampling rate, and a Cz reference. A subset of these electrodes was analyzed, corresponding to frontal and temporal locations in both hemispheres typically used to document brain activity associated with auditory information processing (Fig. 1a). Each infant was tested in a single-patient room, lying in a bassinet or caregiver's arms, with both ears unobstructed and in the awake or quiet-alert state. No restraint was used beyond the infant swaddling that is routinely used in the NICU.

Figure 1.

Characteristics of event-related potential (ERP) methodology to measure speech-sound differentiation. (a) Electrode clusters on scalp locations: F3, frontal left; F4, frontal right; T5, temporal left; T6, temporal right. (b) Averaged ERP tracings in response to speech sounds. Auditory stimulus presented at time 0ms. Box represents time window for calculation of mean amplitude. Tracings are averaged over all 57 infants in the study.

The syllables were presented binaurally, in a randomly mixed order, 25 times each (150 trials total). Interstimulus intervals varied between 1600ms and 2600ms to prevent habituation to sound onset. During the entire 15-minute testing session, electroencephalography (EEG) was performed and the infant's behavior was continuously monitored so that stimulus presentation occurred only when the infant was in alignment with the speaker and the EEG was free of motor artifacts.
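As an illustration only (this is not the presentation software used in the study), the stimulus schedule described above can be sketched in Python. The uniform distribution of interstimulus intervals and the names `build_schedule`, `SYLLABLES`, and `ISI_RANGE_MS` are assumptions made for the sketch; the text specifies only the interval range, the number of repetitions, and the randomly mixed order.

```python
import random

SYLLABLES = ["ba", "da", "ga", "bu", "du", "gu"]  # six synthesized syllables
REPETITIONS = 25                                   # presentations per syllable
ISI_RANGE_MS = (1600, 2600)                        # interstimulus interval bounds


def build_schedule(seed=None):
    """Return (syllable, isi_ms) pairs in a randomly mixed order.

    Assumption: intervals are drawn uniformly within ISI_RANGE_MS;
    the source states only that they varied within this range.
    """
    rng = random.Random(seed)
    trials = SYLLABLES * REPETITIONS   # 6 x 25 = 150 trials
    rng.shuffle(trials)                # randomly mixed presentation order
    return [(syllable, rng.uniform(*ISI_RANGE_MS)) for syllable in trials]


schedule = build_schedule(seed=0)
print(len(schedule), "trials; first three:", schedule[:3])
```

Jittering the interval, rather than fixing it, is what prevents the infant from anticipating sound onset.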

ERP data analysis method

Data were referenced offline to an average reference. Each trial included a 100ms pre-stimulus baseline and a 900ms post-stimulus interval. Trials with ocular or sucking artifacts were excluded. Final averages were based on a mean of 14.89 (SD 3.78) trials per condition. Mean amplitudes were calculated for each selected electrode location by averaging ERP readings for each speech sound 250 to 400ms after stimulus onset (Fig. 1b), based on previous studies by our group and those of others studying newborn infants and older children.[12] Time windows after stimulus were chosen to reflect typical ERP peaks for term-born infant brain waves in response to sound and have been used to predict outcomes. In preterm infants, mean amplitudes during pre-specified time windows are used as they have less defined peaks.[8] As an example of the results generated using ERP, Figure 1b represents average mean amplitude tracings of all 57 infants in response to the /du/ and /gu/ sounds; the difference in mean amplitudes between the two tracings in the specified time window quantifies cortical sound differentiation. For each pair of speech sounds (/ba/ – /ga/, /da/ – /ga/, /bu/ – /gu/, and /du/ – /gu/) an absolute difference in mean amplitude was calculated. This value corresponds to the strength of differentiation for a specific pair of speech sounds and is reported in microvolts. Resulting amplitude differences for each sound contrast were used in the statistical analyses.
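The amplitude measure described above can be sketched as follows. This is a minimal illustration under stated assumptions, not the analysis pipeline used in the study: waveforms are assumed to be plain lists of microvolt values for one electrode cluster, sampled at 250Hz, with the first sample at 100ms before stimulus onset. The function names are hypothetical.

```python
SAMPLING_RATE_HZ = 250     # recording rate given in the text
BASELINE_MS = 100          # pre-stimulus baseline included in each trial
WINDOW_MS = (250, 400)     # analysis window after stimulus onset


def average_trials(trials):
    """Average artifact-free trials sample by sample into one ERP waveform."""
    return [sum(samples) / len(samples) for samples in zip(*trials)]


def mean_amplitude(waveform_uv):
    """Mean amplitude (microvolts) of the samples inside WINDOW_MS."""
    ms_per_sample = 1000 / SAMPLING_RATE_HZ
    in_window = [
        value
        for index, value in enumerate(waveform_uv)
        # sample time relative to stimulus onset, in ms
        if WINDOW_MS[0] <= index * ms_per_sample - BASELINE_MS <= WINDOW_MS[1]
    ]
    return sum(in_window) / len(in_window)


def differentiation(waveform_a, waveform_b):
    """Absolute difference in mean amplitude between two speech sounds."""
    return abs(mean_amplitude(waveform_a) - mean_amplitude(waveform_b))


# Synthetic example: two flat 1000ms waveforms (250 samples each).
du = average_trials([[3.0] * 250, [3.0] * 250])
gu = average_trials([[1.5] * 250, [1.5] * 250])
print(differentiation(du, gu))  # 1.5
```

Taking the absolute value means the measure reflects how strongly two sounds are distinguished, regardless of which one evokes the larger response.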

Developmental testing

At scheduled visits to the NICU developmental follow-up clinic (at approximately 12mo chronological age), the Developmental Assessment of Young Children (DAYC), a standardized test of infant and child development, was administered. The subscales of cognitive and communication function[13] were administered by trained examiners, with parent questionnaires corroborated by infant observation and challenge. Prematurity-adjusted standardized scores were recorded. At the 24-month clinic visit, patients were tested by trained examiners using the Bayley Scales of Infant Development, 3rd edition (BSID III) subscales for cognition and language.[14]

Statistical analysis

Our primary regression analysis assessed whether predefined ERP measurements have any predictive validity for acquisition of word differentiation and speech communication at 12 and 24 months. To avoid overfitting, all scalp locations were combined for each syllable pair, resulting in the model described below. Software included SPSS version 20 (IBM SPSS Statistics, Chicago, IL, USA) and SAS version 9.2 (Proc MIXED; SAS Inc., Cary, NC, USA). Unless noted otherwise, traditional statistical criteria were used (p<0.05, two-tailed).

We used an ordinary least squares linear regression model to define associations of the five developmental outcomes at 12 months and 24 months with ERP measures. There were two sets of predictors: four ERP response measures (four sound contrasts averaged across electrodes) and four covariates (gestational age, sex, maternal education, and age at ERP). The five outcomes included scores for (1) 12-month DAYC communication, (2) 12-month DAYC cognition, (3) 24-month BSID receptive communication, (4) 24-month BSID expressive communication, and (5) 24-month BSID cognitive composite score. To satisfy the American Psychological Association's recommendation to report effect sizes in addition to significance probabilities,[15] we estimated Cohen's d using meta-analytic formulae[16] and judged the size of effects by Cohen's criteria.

To address the potential multiple-testing problem associated with performing 40 significance tests and R2 estimates, we examined the likelihood that our results were the result of chance alone by performing a resampling test. We compared empirically observed results with mathematically generated chance results. A new regression was run on an artificially generated, normally distributed random number, Y, as the outcome. The remainder of the model remained exactly as before (four clinical covariates, four ERP measurements). This model was run 1000 times with 1000 independent random Y outcomes. All five actual R2 values were in the 99th centile of the chance R2 values. The separation of the actual R2 values from the randomly generated R2 values suggests that it is extremely unlikely that the R2 values from the ERP study outcomes resulted from chance.
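The resampling test described above can be sketched in plain Python. This is an illustration under stated assumptions, not the SPSS or SAS code used in the study: the predictor matrix is synthetic, the sample size of 30 is hypothetical, and all function names are invented for the sketch. The regression is fitted by solving the normal equations directly.

```python
import random


def solve(a, b):
    """Solve the linear system a x = b by Gaussian elimination with pivoting."""
    n = len(b)
    m = [row[:] + [b[i]] for i, row in enumerate(a)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    x = [0.0] * n
    for i in reversed(range(n)):
        tail = sum(m[i][j] * x[j] for j in range(i + 1, n))
        x[i] = (m[i][n] - tail) / m[i][i]
    return x


def r_squared(X, y):
    """R^2 of an ordinary least squares fit of y on X plus an intercept."""
    rows = [[1.0] + list(r) for r in X]
    k = len(rows[0])
    xtx = [[sum(r[i] * r[j] for r in rows) for j in range(k)] for i in range(k)]
    xty = [sum(r[i] * yi for r, yi in zip(rows, y)) for i in range(k)]
    beta = solve(xtx, xty)
    fitted = [sum(b * v for b, v in zip(beta, r)) for r in rows]
    mean_y = sum(y) / len(y)
    ss_res = sum((yi - fi) ** 2 for yi, fi in zip(y, fitted))
    ss_tot = sum((yi - mean_y) ** 2 for yi in y)
    return 1.0 - ss_res / ss_tot


def chance_r2(X, n_draws=1000, seed=0):
    """R^2 values obtained when the outcome is pure normal noise."""
    rng = random.Random(seed)
    return [r_squared(X, [rng.gauss(0, 1) for _ in X]) for _ in range(n_draws)]


def centile(observed, null_values):
    """Percentage of chance R^2 values that fall below the observed R^2."""
    return 100.0 * sum(v < observed for v in null_values) / len(null_values)


# Synthetic stand-in for four covariates and four ERP contrasts (8 columns).
demo_rng = random.Random(1)
X_demo = [[demo_rng.gauss(0, 1) for _ in range(8)] for _ in range(30)]
y_demo = [2.0 * row[0] + demo_rng.gauss(0, 1) for row in X_demo]
null_r2 = chance_r2(X_demo)
print(centile(r_squared(X_demo, y_demo), null_r2))
```

Because the predictors are held fixed and only the outcome is replaced by noise, the resulting distribution shows how large an R2 the same model produces when there is nothing to predict.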

After validating the model through assessment of the predictive ability of speech-sound differentiation by using response amplitudes of the four sound contrasts averaged across electrodes, we focused on examining effects from discrete hemiscalp locations. We used partial Pearson's correlations between the four ERP sound contrast measurements and five outcomes (cognition at 12mo and 24mo, and expressive and receptive language at 24mo); we adjusted for gestational age and age at ERP, as supported by our previous work.[8] For significant correlations, linear regressions including the ERP data, gestational age, and age at ERP were performed using the developmental outcome as the dependent variable. Analysis was limited to these three factors owing to the small sample size and concerns for overfitting.
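A partial correlation adjusted for two covariates can be computed from ordinary Pearson correlations with the standard recursive formula. The sketch below is illustrative only; the data are synthetic and the names `pearson` and `partial_corr` are hypothetical, not taken from the statistical packages used in the study.

```python
import random
from math import sqrt


def pearson(a, b):
    """Pearson correlation coefficient between two equal-length sequences."""
    n = len(a)
    mean_a, mean_b = sum(a) / n, sum(b) / n
    cov = sum((x - mean_a) * (y - mean_b) for x, y in zip(a, b))
    var_a = sum((x - mean_a) ** 2 for x in a)
    var_b = sum((y - mean_b) ** 2 for y in b)
    return cov / sqrt(var_a * var_b)


def partial_corr(x, y, covariates):
    """Correlation of x and y after adjusting for each covariate in turn."""
    if not covariates:
        return pearson(x, y)
    *rest, z = covariates
    r_xy = partial_corr(x, y, rest)
    r_xz = partial_corr(x, z, rest)
    r_yz = partial_corr(y, z, rest)
    return (r_xy - r_xz * r_yz) / sqrt((1 - r_xz ** 2) * (1 - r_yz ** 2))


# Synthetic example: x and y are related only through the covariate z.
rng = random.Random(0)
z = [rng.gauss(0, 1) for _ in range(500)]   # e.g. gestational age (standardized)
w = [rng.gauss(0, 1) for _ in range(500)]   # e.g. age at ERP (standardized)
x = [zi + rng.gauss(0, 1) for zi in z]      # stand-in for an ERP contrast
y = [zi + rng.gauss(0, 1) for zi in z]      # stand-in for an outcome score
print(pearson(x, y), partial_corr(x, y, [z, w]))
```

In this synthetic case the raw correlation is substantial while the adjusted one is close to zero, which is the kind of spurious association the adjustment is meant to remove.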


Characteristics of study population

We studied 57 infants, 34 male, with a median gestational age of 28 weeks (IQR 26–34wks) and median birthweight of 1035g (IQR 875–2010g). Median age at ERP was 2.1 months after birth (IQR 1.4–2.8mo). The median maternal education score was 4, corresponding to high-school education. Three infants had severe abnormalities on head ultrasound (intraventricular hemorrhage with ventricular dilation, periventricular echodensity) and three failed to pass their auditory brainstem response on one side in the NICU only. At 12 months, the median age-adjusted DAYC scores (n=25) were a cognitive score of 88 (IQR 85–94) and communication score of 0 (IQR 7–10). At 24 months, the median age-adjusted BSID scores (n=30) were a cognitive composite score of 90 (IQR 86–105), an expressive language scaled score of 88 (IQR 81–94) and a receptive language scaled score of 9 (IQR 7–11).

A possible confound is attrition, defined as incomplete follow-up at 12 or 24 months. To test for it, we fitted two logistic regressions predicting missing data at 12 months and at 24 months from the ERP scores listed in Table 1. The overall model was non-significant at both 12 months (χ2[4, n=57]=2.26; p=0.69) and 24 months (χ2[4, n=57]=1.16; p=0.88). These results suggest that ERP responses did not differ between infants with and without follow-up data, although the small sample cannot rule out smaller attrition biases.

Table 1. Prediction of neurodevelopmental outcomes at 12 months and 24 months using event-related potential (ERP) and clinical variables

                   Communication   Cognition   Receptive language   Expressive language   Cognition
                   12mo            12mo        24mo                 24mo                  24mo
ERP responses
  /ba/–/ga/        0.00            0.27        0.25                 0.06                  0.06
  /da/–/ga/        0.45*           0.30        0.28                 0.18                  0.18
  /bu/–/gu/        0.12            0.27        0.25                 0.36                  0.36
  /du/–/gu/        0.07            0.31        0.29                 0.48*                 0.48*
Sex                0.49*           0.00        0.00                 0.27                  0.27
Gestational age    0.46*           0.60*       0.56**               0.27                  0.27
Education          0.25            0.13        0.12                 0.57**                0.57**
Age at ERP         1.41**          0.38        0.35                 0.45*                 0.45*
Model R2 (%)       42              33          22                   24                    24

  1. Effect size is Cohen's d based on a meta-analytic formula for effect size estimation.[16] Cohen's d guidelines are small/medium/large ~0.20/0.50/0.80. Effect size d appears in bold if greater than ‘small’ (0.2). According to Cohen, small/medium/large variances explained are Cohen's f2 = 0.02/0.15/0.35. These translate to 2%/13%/26% for R2. *p<0.05, **p<0.01.
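The translation from Cohen's f2 to R2 quoted in the Table 1 footnote follows from the standard definition f2 = R2/(1−R2), which rearranges to R2 = f2/(1+f2). A short check, with the function name `f2_to_r2` chosen here for illustration:

```python
def f2_to_r2(f2):
    """Convert Cohen's f2 to the proportion of variance explained (R2)."""
    return f2 / (1.0 + f2)


for label, f2 in [("small", 0.02), ("medium", 0.15), ("large", 0.35)]:
    print(f"{label}: f2={f2:.2f} -> R2={100 * f2_to_r2(f2):.1f}%")
# small: f2=0.02 -> R2=2.0%
# medium: f2=0.15 -> R2=13.0%
# large: f2=0.35 -> R2=25.9%
```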

Predictive model

This model assessed the validity of our construct in predicting cognitive and communication outcomes using ERP measures of speech-sound differentiation and clinical variables previously shown to contribute to these outcomes (Table 1). The contributions of combined ERP responses on the predictive model varied based on timing and the domain of the assessment.

Communication scores at 12 months were significantly predicted by the covariates and ERP measures, with an overall R2 of 42%. Predictors of the 12-month communication score included sex, gestational age, age at ERP, and ERP differentiation responses, with greater differentiation correlating with better outcomes. The covariates and ERP measures also predicted cognitive ability at 12 months for an overall R2 of 33%. ERP responses contributed 34% and 21% to the R2 of 42% and 33% of the model predicting communication and cognitive scores respectively.

At 24 months, maternal education and ERP responses predicted BSID cognitive composite score with an R2 of 24%. Receptive language score was predicted by the model incorporating sex, maternal education, ERP response, gestational age, and age at ERP with an R2 of 22%. ERP responses contributed 9% and 14% to the R2 for prediction of receptive language and cognition respectively. Finally, expressive language score at 24 months was predicted with an R2 of 24% with a medium effect size of the covariates. This was the only domain in which combined ERP responses contributed less to the R2 of the model (5% of 24%). Therefore, we did not consider this outcome in developing a research tool in the secondary analysis to follow.

These results indicate that ERP responses are significantly associated with long-term cognition and communication outcomes. With regard to effect sizes, mean amplitude of differentiation between /du/ and /gu/ had a moderate effect size on cognition (0.48) and a small effect on receptive language scores at 24 months. Likewise, differentiation between /ba/ and /ga/, as well as between /da/ and /ga/, appeared to have small to moderate size effects on 12-month communication and cognition and 24-month receptive language. Therefore, we included these variables in our secondary analysis and differentiated the individual scalp locations in which responses were recorded.

Scalp location-specific analysis

Correlations between specific speech sound pairs described above and neurodevelopmental assessment scores were calculated for each of the four scalp locations. Only two pairs (/ba/–/ga/ and /du/–/gu/) consistently showed significant correlations (Table 2). In particular, the ability to differentiate the /ba/–/ga/ pair correlated with communication scores at both 12 months and 24 months. The ability to differentiate the /du/–/gu/ pair correlated with 24-month cognitive scores. Linear regression incorporating gestational age, age at ERP, and the response to specific sound contrast pairs at discrete scalp locations identified in the preceding correlation analysis was used to model the contributions of these variables to BSID scores in cognitive and language domains at 24 months (Table 3). These models explain more than 50% of the variance in outcomes scores, with ERP variables making a significant contribution in all three models (all p<0.05). In particular, a difference of 0.49μV in an infant's ability to differentiate between /du/ and /gu/ sounds, at the left frontal and right temporal locations, predicts a 1-point increase in the cognitive composite score on the BSID III. Even more significantly, a difference of 0.39μV in an infant's ability to differentiate between /ba/ and /ga/ sounds, at the right frontal location, predicts a 1-point increase in the receptive score on the BSID III on a scale of 1 to 19.

Table 2. Correlations between specified scalp locations and outcomes at 12 months and 24 months

Outcome measures              Location         Sound contrast   Partial Pearson correlation (R)   p
12-mo assessment (DAYC)
  Cognitive scale             Left frontal     /ba/–/ga/        0.26                              0.22
  Communication scale         Left frontal     /ba/–/ga/        0.41                              0.05
24-mo assessment (BSID III)
  Cognitive composite         Right frontal    /ba/–/ga/        −0.21                             0.35
  Receptive score             Right frontal    /ba/–/ga/        −0.51                             0.009
  Cognitive composite         Left frontal     /du/–/gu/        0.44                              0.02
  Receptive score             Left frontal     /du/–/gu/        0.27                              0.19
  Cognitive composite         Right temporal   /du/–/gu/        0.45                              0.02
  Receptive score             Right temporal   /du/–/gu/        0.07                              0.70

  1. p is two-tailed, adjusted for gestational age at birth and age at event-related potential. DAYC, Developmental Assessment of Young Children; BSID III, Bayley Scales of Infant Development, 3rd edition.

Table 3. Specific speech-sound differentiation event-related potentials (ERPs) before neonatal intensive care unit discharge are predictive of 24-month outcomes

                                        Standardized beta   p
Cognitive composite (R2=0.56; p=0.02)
  /du/–/gu/ (left frontal) (μV)         0.49                0.006
  Gestational age (wk)                  0.27                0.98
  Age at ERP (mo)                       −0.07               0.09
Cognitive composite (R2=0.52; p=0.04)
  /du/–/gu/ (right temporal) (μV)       0.49                0.007
  Gestational age (wk)                  0.27                0.91
  Age at ERP (mo)                       −0.07               0.45
Receptive language (R2=0.52; p=0.04)
  /ba/–/ga/ (right frontal) (μV)        0.39                0.03
  Gestational age (wk)                  0.02                0.03
  Age at ERP (mo)                       −1.14               0.46

  1. Significance and standardized beta values for linear regression analyses.


This study demonstrates, for the first time, that cortical sound differentiation in the NICU, as measured by ERP methodology, predicts cognitive and communication functioning during early childhood. It also validates the use of ERP methodology as a research tool to easily assess brain function in hospitalized infants before their discharge from the NICU.

Studies in school-aged children have demonstrated links between ERP measures of sound processing efficiency and cognitive performance.[9] Work by Fellman et al.[9] has also suggested that the ability of preterm infants to process sounds at 12 months of age is correlated with executive function and speech at 5 years of age.[9] That study used mismatch negativity, another widely accepted auditory ERP paradigm in older infants and children. The results of the present study corroborate these findings in infants of much lower gestational age at birth, who occasionally do not exhibit a mismatch negativity response; they also extend previous findings by showing that cortical function, as measured by ERP, in infants experiencing extrauterine life during the last trimester of gestation is also predictive of their long-term outcomes.

Our initial analysis examined the clinical question of whether cortical sound differentiation in intensive care neonates was a contributor to neurodevelopment. To address this question, the overall efficiency of neural processing in response to stimuli was quantified as an absolute difference in mean amplitudes averaged across four speech-sound contrasts and four scalp locations. However, neural function at near-term equivalent represents only a single aspect of brain development in childhood, a period of maximal plasticity. The construction of meaningful predictive models in infants must also acknowledge the influence of clinical and socio-economic variables on developmental trajectories. Here, the choice of these variables was driven by a multifactorial approach to outcomes modeling.

First, the development of cognition can be affected by the degree of brain immaturity at birth, here represented by gestational age,[17] and by maternal education.[18] The latter is often used as a marker of the infant's potential and context, an important consideration since the home environment can affect the development of speech sound differentiation reflected by ERP.[19] Second, the sex of the infant has been implicated in characteristics of communication development in infancy,[20] although its role, as measured by ERP, is difficult to define: studies are split on whether females have a maturational advantage.[21, 22] The fact remains that female sex is associated with decreased mortality and morbidity in the NICU and therefore must be accounted for in a study of neurodevelopmental outcomes.[23] Finally, the length of NICU hospitalization is often used as a proxy for the degree of illness of an infant, especially in those suffering from either prematurity or brain injury; it has consequences for the future acquisition of most neural processes and is represented by post-natal age at testing time in our study. Together, these variables constitute a clinical and socio-economic context in which the ERP measures of cortical function in our study play a varying role, depending on the age at follow-up and the neurodevelopmental domain studied. Overall, a 15-minute ERP recording can provide new research data that increases the prediction of childhood neurodevelopmental outcomes by 9% to 34% of the variance in high-risk infants, even if they have no gross neuroimaging abnormalities.

Our second goal was to establish a quantitative relationship between location- and sound-specific ERP amplitudes and standard developmental scores. The secondary analysis allowed us to determine a narrow subset of ERP responses and show that they are good measures of cortical processes and developmental potential. They provide a continuum of neural response amplitudes, which is likely to correspond to the spectrum of brain injury present in infants in the NICU. Although only three infants in our sample had macrostructural lesions visible on conventional imaging, up to 70% of infants in the NICU have microstructural injury evident using more demanding imaging research protocols.[24, 25] In the current study, it is possible that these infants would be the ones with decreased sound differentiation ability and poorer outcomes. Thus, ERP could prove to be a complementary methodology to neuroimaging, especially when ease of use and correlations with brain functionality are factors of interest.

Limitations of this study that address associations between brain maturity, post-natal age in the NICU, and neural processing include the use of age at ERP as a proxy for post-natal sound experience. Moreover, the study was not sufficiently powered to incorporate other clinical variables associated with poor developmental outcomes, such as ventilator duration or infectious episodes. These and other variables could affect the developing brain and contribute to the distribution of ERP measurements in infants in the NICU.

In summary, cortical speech sound differentiation by infants in the NICU, as measured by ERP, predicts cognitive and communication standardized scores during early childhood. The quantitative nature, ease of administration in preverbal populations, and strong correlation with meaningful neurodevelopmental outcomes in childhood establish ERP methodology as a valuable research tool for testing the effects of noxious or neuroprotective interventions on cortical function in the NICU.


This study was funded by a Turner–Hazinski Award, Vanderbilt Department of Pediatrics, 1R21 ES013730, the Gerber Foundation, Vanderbilt GCRC M01 RR 000095, NICHD P30HD15052 to the Vanderbilt Kennedy Center, and Vanderbilt CTSA grant 1 UL1 RR024975.