Automated cot-side tracking of functional brain age in preterm infants with routine EEG monitoring

A major challenge in the care of preterm infants is the early identification of compromised neurological development. While several measures are routinely used to track anatomical growth, there is a striking lack of reliable and objective tools for tracking maturation of early brain function; a cornerstone of lifelong neurological health. We present a cot-side method for measuring the functional maturity of the newborn brain based on routinely-available neurological monitoring with electroencephalography (EEG). We used a dataset EEG recordings from 65 infants to train a multivariable prediction of functional brain age (FBA) from EEG. Using machine learning on traditional and recently-developed computational EEG measures yielded an FBA that correlated strongly with the postmenstrual age of an infant. Moreover, individual babies follow well-defined individual trajectories. We validated the FBA predictor on independent data from a different site with different recording configuration. In a subgroup of infants with repeated EEG recordings, a persistently negative predicted age difference was associated with poor neurodevelopmental outcome. The FBA enables the tracking of functional neurodevelopment in preterm infants. Functional age assessment can be used to assist clinical management and identify infants who will benefit most from early intervention.


Introduction
Preterm birth is a substantial risk to infant health. While mortality rates have dropped considerably over recent years due to improvements in clinical care, these infants remain at significant risk of neurodevelopmental delay and a host of other chronic impairments in later life (1)(2)(3). It is therefore of critical importance to reduce the exposure of the preterm infant to neurological adversities while in the neonatal intensive care unit (NICU), and to identify those infants who will benefit most from early intervention (4). Recent advances in neurological care have stressed the need for improving early functional biomarkers of neurodevelopment to expedite cycles within clinical intervention trials.
Monitoring physiological and anatomical growth is crucial for clinicians when optimizing the care of very or extremely preterm infants. Critical time periods for the direction of care are usually the first days after birth, the time of discharge from tertiary care to step-down units, as well as the follow-up visit at near term-equivalent age. The EEG is the best available tool for cot-side assessment of brain function, and it is widely used for early therapeutic decisions as well as the prediction of neurodevelopmental outcome in preterm infants (5)(6)(7)(8)(9). However, the clinical use of EEG in the NICU is complicated by difficulties in interpretation and the availability of expertise to perform interpretation (10). Computer-assisted analysis presents an opportunity to solve both problems by providing simplified EEG measures that can be interpreted by clinical staff, on demand and in real time.
Assessing brain maturity via the visual interpretation of the EEG during an infant's stay in the NICU has been a part of clinical practice for decades (11). Its use complements traditional anatomical measures such as weight, length, and head circumference. A lag between estimated functional brain age (FBA) and the chronological age of the individualthe predicted age difference (PAD)holds potential as a functional biomarker. It is crucial that the FBA is highly correlated with chronological age to ensure that claims of 'dysmaturity' are valid and not simply alternate manifestations of dysfunction or pathology. The concept of measuring PAD has recently attracted interest for the diagnosis of a wide variety of neurological pathologies over a wide range of ages (12). We have previously shown that it is possible to construct computational measures that emulate many visually observed EEG phenomena such as interburst interval, synchrony, discontinuity, and frequency content. The combination of these computational measures can predict the EEG maturation of the brain with a high degree of accuracy (13,14).
While 'computerizing' visual classification of EEG phenomena is a useful initial stage, it suffers from two shortcomings. First, it is limited by the visual interpretation of EEG characteristics and, therefore, lacks a robust physiological, physical, and technical foundation.
Second, it is inherently susceptible to a myriad of confounders that arise from the relationship between conventionally-recognized EEG phenomenology and technical recording traditions.
Recent advances in computational neuroscience suggest that the EEG contains markers of brain function that are not readily discernible by visual EEG review (15)(16)(17)(18)(19). Key information lies within the widespread network of intermittent bursting that dominates early cortical activity (20,21). This developmentally unique activity is known to be crucial for supporting neuronal growth and guiding early brain wiring (22,23). It changes rapidly over the last trimester, is sensitive to exogenous disturbances, and is predictive of future neurodevelopment (24)(25)(26).
We recently showed that bursting activity possesses the statistical properties of crackling noise (15,27), a form of complex, non-Gaussian noise that occurs when there is a critical balance between amplifying (excitatory) and dissipating (inhibitory) influences within a system (brain) (28). Measures of this noise have been shown to be able to detect neurological challenge in preterm and term neonates (25,29,30). Here, we applied signal analysis methods derived from the analysis of crackling noise to the assessment of early brain maturation, alongside other EEG metrics. Initially, we assessed the accuracy of single variable models to predict PMA. We then built a multivariable model of PMA, trained using machine learning techniques, to yield an estimate of FBA. We validated the subsequent FBA on an independent dataset to address issues of reproducibility and robustness of the method to other EEG recording settings and environments. Finally, we evaluated the applicability of the difference between FBA and PMA (a predicted age difference) as a predictor of neurodevelopmental outcome in a subgroup of infants with multiple, serial recordings.

Results
The aim of this study was to determine the efficacy of automated EEG analysis for the assessment of PMA in preterm infants. This study employed two different datasets of serial EEG recordings of preterm infants recorded from NICUs in different countries. The first dataset (recorded in Vienna: 65 infants, 177 EEG recordings) was used to train and evaluate the FBA measure, as well as investigate the use of FBA as a predictor of neurodevelopmental outcome (Fig. 1). The second dataset (recorded in Utrecht: 42 infants, 99 EEG recordings) was used to validate the FBA measure trained on the first dataset. Infants were born before 29 weeks 5 gestation, with EEG recorded serially at 25-39 weeks PMA (Vienna) or 25-34 weeks PMA (Utrecht). We used machine learning techniques to form an estimate of FBA using quantitative EEG (qEEG) variables that can be grouped into three categories: phenomenological analysis, burst analysis, and other recently-developed analyses ( Fig. S1 and Table S1).

Figure 1:
Data acquisition, training, evaluation and testing of the FBA. The histograms depict the distribution of EEG recordings with PMA (in weeks) in each dataset. The bottom row illustrates the analyses corresponding to each dataset. PAD is the predicted age difference between functional brain age (FBA) and post-menstrual age (PMA).

PMA prediction using a single variable FBA
Across all metrics tested, the qEEG variable that had the highest correlation with PMA was the asymmetry of average burst shape ( Fig. 2A), which exhibits a strong linear relationship with bursts becoming more symmetric with increasing PMA (Fig. 2B). Several additional qEEG variables were strongly associated with PMA (see Table S1 of the supplementary Information). Metrics that were not reliably predictive of PMA were varied in nature and included several relative band powers and measures of burst duration.

PMA prediction using the multivariable FBA
Combining several qEEG variables into a multivariable model improves the prediction accuracy of the FBA (Table 1). Assessed within a leave-one-out cross-validation, the multivariable FBA model had a significantly higher correlation with PMA than a single variable model based on the single best variable (asymmetry of the burst shape) for models based on bursts, phenomenological, and other newly proposed qEEG variables (r = 0.109, 95% CI: 0.059 to 0.162; r = 0.095, 95% CI: 0.045 to 0.150; r = 0.094, 95% CI: 0.057 to 0.142; n = 177, respectively). The multivariable model using burst qEEG variables also had a significantly higher correlation with PMA than multivariable models based on phenomenological or other newly proposed qEEG variables (r = 0.030, 95% CI: 0.001 to 0.062; r = 0.027, 95% CI: 0.005 to 0.049; n = 177, respectively).
Incorporating qEEG variables into a multivariable model, via a variable selection procedure, further improved the accuracy of the FBA ( Table 1). The FBA estimator identified PMA to within 2 weeks for 90% of recordings, with a median absolute error of 0.7 weeks. A scatter plot of FBA versus PMA exhibits a clear linear trend (Fig. 3A), with a tight clustering of FBA within ±2 weeks of the PMA. The performance of this FBA, which contained a mixture of burst, phenomenological, and other recently-developed EEG variables, was significantly higher than multivariable models based on only phenomenological analysis, burst analysis or other analyses alone (r = 0.045, 95%CI: 0.020 to 0.073; r = 0.015, 95%CI: 0.002 to 0.028; and r = 0.042, 95%CI: 0.024 to 0.063, respectively; n = 177). Variable selection resulted in a median of 53 variables (IQR; 49-55; n = 65 folds). In general, the most commonly selected variables were burst shape (asymmetry, sharpness), burst number, relative spectral power, activity synchrony index, path length, multi-scale entropy, EEG amplitude envelope, and interburst intervalsee the Supplementary Information for a complete list of variables and their selection frequencies (Table S1 and Fig. S2). Table 1: The performance of several multivariable FBA models for predicting PMA in preterm infants on training (cross-validation) and validation datasets. r is the correlation coefficient, n is the number of recordings included in analysis, m is the number of qEEG variables used in the model (for variable selection this is the median number across folds of the cross-validation), 95% CI is the 95 th percentile of the confidence interval, IQR is inter-quartile range.

Validation of FBA on an independent dataset
To directly address the generalizability of our results, we validated the multivariable model in an independent dataset. We found that the multivariable model trained on Vienna data (evaluated via cross-validation) and applied to the Utrecht dataset performs near physiological limits in prediction accuracy, with 90% of epochs correctly identified to within ±2 weeks ( Table 1). The absolute error between the FBA and PMA (accuracy), when applied to the Utrecht data, was equivalent to the cross-validation results from the Vienna dataset across a similar range of PMA ( Fig. 3B: p < 0.001; TOST, equivalence boundary of ±0.5 weeks).

FBA for tracking individual growth and predicting neurodevelopmental outcomes
The accuracy of FBA in tracking cot-side development raises the idea that FBA may be useful for individualized assessment of functional maturity. We used linear mixed modelling to account for serial recordings from individual infants which resulted in an adjusted correlation of r = 0.978 (95%CI: 0.974-0.987; n = 65). The improvement in correlation over a point-wise estimate implies that individual infant trajectories are more highly correlated with PMA than the cohort average. In other words, infants tend to follow their individual growth trajectories ( Fig. 4A), and the FBA is able to track these trajectories with high accuracy. Tukey's Range Test). The estimated PAD in these infants was also significantly below 0 weeks (t-test: Cohen's D = 0.661, p = 0.035; n = 9) suggesting a persistent delay in brain maturation (i.e., negative PAD) in infants with abnormal outcome. These differences were not apparent when including infants with less than three serial EEG recordings (ANOVA: F statistic = 0.0112, df = 2, p = 0.894; n = 54), suggesting that multiple recordings may be required to assess a PAD associated with neurodevelopmental outcome. Straight dashed lines denote a difference of plus or minus 2 weeks between PMA and predicted age. (C) Subgroup analysis of EEG predicted age minus PMA with respect to outcome was graded as N -normal (minimum Bayley's score > 85), Mmildly abnormal (minimum Bayley's score between 70 and 85) and Aabnormal (minimum Bayley's score < 70). The asterisks denote p < 0.05 between outcome groups and when testing each outcome group against a null hypothesis of zero mean EEG maturity. Data points in have been shifted for clarity of presentation and are denoted with filled circles. Data points represented by triangles are infants with intra-ventricular hemorrhage.

Discussion
The brain matures rapidly in early life with a wide range of structural and functional indices changing over time spans as short as a few weeks. Here, we showed that automated analysis of preterm EEG can be used to track maturation of cortical function with high accuracy. This multivariable prediction of age from the EEG enables the estimation of functional brain maturity to within 1-2 weeks of PMA; an accuracy that generalized to an independent validation dataset acquired under a considerably different EEG recording environment. The margin of error is far lower than similar predictions in preterm infants based on functional neuroimaging with fMRI (31), and orders of magnitude lower than what is achieved over later stages of life using EEG or MRI (error margins of 5-10 years) (32)(33)(34). Our findings are also comparable to an array of somatic anatomical methods over similar preterm age ranges based on measures of femur length, head circumference, weight, and structural MRI (cortical folding, thickness) (35)(36)(37)(38). This supports the concept of rapid and distinct changes in anatomy and physiology throughout the preterm period and suggests that physiological and anatomical growth are strongly intertwined (23,26,39).
The multivariable model developed here advances previous work that was designed to capture key visual elements of EEG review for age prediction. Incorporating burst measures based on the analysis of crackling noise resulted in the most accurate single variable model, improved multivariable model accuracy, and provided a potential framework to explain the mechanistic origins of rapidly evolving preterm EEG signals. The existence of asymmetric burst shapes replicates our previous findings in independent datasets when identifying pathological changes in the EEG at or near birth (25,30). We also validated several recently proposed qEEG variables of maturation: suppression curve, mPLI, global ASI, multi-scale entropy, and path length (coherence) as excellent predictors of age prediction in the preterm period. This supports the use of automated measures of EEG (qEEG) for the extraction of useful information in excess of visual interpretation.
We also successfully validated the model's robustness on unseen data. This showed that the prediction accuracy of the multivariable model holds when translating to a dataset collected within a different clinical environment and with different recording parameters (e.g. amplifier, electrode type, number, and location). This is a crucial hurdle for the clinical translation of new methods, which is impossible to establish in a dataset acquired under uniform conditions.
Notably, the independent validation dataset was collected using a 4-channel recording montage that is commonly used in brain monitoring with the amplitude integrated EEG (40)(41)(42). This validation on heterogeneous data establishes the wider clinical applicability of determining functional brain age from EEG.
Various measures of growth are commonly used in health care. The finding that neurological dysfunction manifests as immaturity in the EEG is intuitively appealing, and indeed, has been a cornerstone of clinical EEG review for decades (5,(43)(44)(45). This hypothesis can only be accurately tested with qEEG measures that are strongly correlated with age. We show that most phenomenological measures used in clinical EEG research, such as inter-burst interval, EEG amplitude, or spectral power, are only weakly correlated with age, which challenges their applicability for maturational EEG assessment. We show that more recently proposed qEEG measures (such as asymmetry and sharpness of burst shape, suppression curve, mPLI, MSE, and path length) and multivariable models of age are strongly correlated with PMA and, therefore, more relevant for maturational analyses.
We show that infants follow individual functional maturation trajectories in a highly predictable manner. Analysis of these trajectories with measures such as PAD (the difference between FBA and PMA) can be used to predict neurodevelopmental outcome. We used serial EEG recordings to identify persistent PADs. This suggests that early neurological adversities become embedded in cortical function (EEG recordings), accumulating steadily over the course of neurodevelopmental (46). It would be clinically useful to assess whether single PAD measurements at a later stage such as term equivalent age, can provide a sufficiently accurate prediction of future outcomes and, hence, an important proxy for accumulated developmental adversities.
Limited sample sizes, a reality when studying critically ill infants, mean that the reported links between PAD and outcome cannot take into account the variety of factors that may confound this result such as physiological challenges, routine cares, and interventions experienced by preterm infants during their stay in intensive care. These factors will also confound the FBA.
There is evidence to suggest that several of these challenges, interventions (many which are designed to accelerate maturation), and other relevant clinical variables can influence EEG activity and, therefore, contribute to variability in the FBA and, therefore, PAD. These

Data Acquisition
The training dataset consisted of 67 preterm infants admitted to the NICU at the Vienna General University Hospital, Austria (see Table 2). Infants were included if they were born before 29 weeks of gestational age (GA) and parental consent was received. EEG was acquired with nine scalp electrodes using a Brain Quick / ICU EEG (MicroMed, Treviso, Italy) at a sampling frequency of 256 Hz. Electrode positions reflect the 10-20 international system (modified for neonates) and were located at Fp1, Fp2, C3, C4, T3, T4, O1, O2, with a reference at Cz. A bipolar montage (double banana) was used in analysis: Fp1-C3, C3-O1, Fp1-T3, T3-O1, Fp2-C4, C4-O2, Fp2-T4, T4-O2 (Fig. S1). The EEG was recorded as soon as possible after birth and at fortnightly intervals until term equivalent age, where possible.
Each EEG recording was split into 1 h epochs (with a 75% overlap). Epochs with excessive artefact were excluded from further analysis (see Table 2). GA was defined according to the last menstrual period (LMP). If the LMP-based assessment of gestation deviated considerably from ultrasound findings in the first trimester, ultrasound measurements were used as a A summary of the database before and after the rejection of artefactual epochs is presented in Table 2. Data collection was approved by the local ethics committee and written, informed, parental consent was received for each infant included in the database (Medical University Vienna, Austria; study protocol EK Nr 67/2008).

Data Acquisition for Independent, Validation Dataset
We used an independent dataset to validate the single and multivariable models of age prediction. This validation dataset contained EEG recordings from 43 neonates admitted to the NICU at the Wilhelmina Children's Hospital, Utrecht, Netherlands. The data were collected as part of a multi-center European study (65). Infants were included in this study if they were born less than 28 weeks GA and informed, written parental consent was received. Infants were  Prediction of Post-Menstrual Age using qEEG EEG recorded in an intensive care environment is prone to contamination from electrical activity that is not cortical in origin. All data were high pass filtered with a high pass filter (Butterworth, 4 th order, cutoff frequency at 0.5Hz) and then a low-pass filter (Butterworth, 6 th order, cutoff frequency at 16 Hz) to eliminate high frequency activity that is more commonly associated with artefacts, including muscle activity (66). EEG recordings were then segmented into 1 h epochs. Epochs were scrutinized for further artefacts and were ignored if there was significant spatial differences in amplitude, and if the amplitude was too high or too low (see Supplemental Information for details).
Single and multivariable models of PMA were calculated using regression analysis. The qEEG variables used in this study can be grouped into three categories ( Leave-one-subject-out cross-validation was used when generating models with a single or multivariable input and FBA as output. In the case of N subjects (infants), a training set consisting of the burst metrics from N-1 subjects was used to define the model parameters.

Statistical Analysis
A prediction was made on a 1 h epoch of EEG; if multiple EEG epochs exist per recording then the average predicted age per recording was used. The goodness-of-fit between predicted age (single and multivariable models) and PMA was evaluated using the correlation coefficient (Pearson's) and was used to determine the accuracy of the prediction. The bias, variance, and absolute error between predicted age and PMA were also used as measures of goodness-of-fit (13). The use of repeated (serial) measures allowed the application of a linear mixed effects model (LMM) where the model output was a fixed effect and the infant ID was a random effect.
The LMM was implemented using the fitlme function in Matlab [pma ~ model output + (1 | infant ID)]. The adjusted r value was used to assess the goodness-of-fit taking into account multiple recordings from each infant. Statistical comparisons of measures of the goodness-offit between EEG metrics for the prediction of PMA were performed using resampling methods (bootstrap). A correlation coefficient was deemed significantly different if the 95% confidence interval of differences (estimated via a bootstrap) did not span zero, i.e., was either positive or negative. Differences in prediction accuracy (absolute error) between the Vienna (training) and Utrecht (validation) sets were evaluated using equivalence testing with a two, one-sided t-tests (TOST) procedure based on Welch's t-test (73). We used a mean difference in absolute error of ±0.5 weeks as a conservative equivalence boundary based on the results of previous work which reported an absolute prediction error of approximately ±1 week (14). Differences in FBA trajectories (FBA subtracted from PMA and then averaged across all serial recordings) between outcome groups were tested using a one-way ANOVA, with Levene's test for homogeneity of group variances and a post hoc analysis performed using Tukey's Range test to correct for multiple comparisons. Furthermore, FBA trajectories in each group were assessed to determine if they were significantly different from zero using a t-test. For post-hoc analyses, Cohen's D statistic, with small sample size correction, was used to estimate the effect size between groups.
All tests were two-tailed and used a level of significance of 0.05.

Data availability
Access to the raw EEG recordings is available at request (LdV and MJNLB for the Utrecht dataset and KKS for the Vienna dataset).

Code availability
Implementations of the analysis methods are available via GitHubgithub.com/nstevensonUH/Neonatal-EEG-Analysis/tree/master/Preterm_Features_Literature. Competing interests: JAR, MB, and SV have a pending patent application on the burst metrics used in this paper. The authors declare no other competing interests.

Supplementary Information
Fig. S1: Analysis overview. Estimating FBA from the EEG signal using a 'bag of features' combined using kernel support vector regression.