Test‐retest reliability of EEG network characteristics in infants

Abstract
Introduction: Functional electroencephalography (EEG) networks in infants have been proposed as useful biomarkers for developmental brain disorders. However, the reliability of these networks and their characteristics has not been established. We evaluated the reliability of these networks and their characteristics in 10‐month‐old infants.
Methods: Data were obtained during two EEG sessions 1 week apart and were subsequently analyzed at the delta (0.5–3 Hz), theta (3–6 Hz), alpha1 (6–9 Hz), alpha2 (9–12 Hz), beta (12–25 Hz), and low gamma (25–45 Hz) frequency bands. Connectivity matrices were created by calculating the phase lag index between all channel pairs at each frequency band. To determine the reliability of these connectivity matrices, intra‐class correlations were calculated for global connectivity, local connectivity, and several graph characteristics.
Results: Across the two sessions, global connectivity and global graph characteristics (characteristic path length and average clustering coefficient) were highly reliable across multiple frequency bands, with the alpha1 and theta bands generally showing the highest reliability. In contrast, local connectivity characteristics were less reliable across all frequency bands.
Conclusions: We conclude that global connectivity measures are highly reliable over sessions. Local connectivity measures show lower reliability over sessions. This research therefore underlines the potential of these global network characteristics to serve both as biomarkers of neurodevelopmental disorders and as important factors explaining the development of typical behavior.


| INTRODUCTION
The brain is a complex network consisting of highly interconnected regions. During early childhood, these networks develop at a rapid pace. Electroencephalography (EEG) can be used to study this early development of functional networks (Boersma et al., 2013; Orekhova et al., 2014). The high temporal resolution of EEG allows for the study of high-frequency oscillatory brain activity, while the infant is relatively unrestricted in its movements. Synchronized oscillatory activity allows for optimized flow of information between two regions (Fell & Axmacher, 2011); therefore, studying oscillatory brain activity, either at rest or during a task, gives insight into the underlying functional connectivity and brain networks. Oscillatory brain activity ranges from ultraslow oscillations (0.05 Hz) to fast transient oscillations (up to 500 Hz) (Buzsáki, 2004). Infant EEG is of limited use for detecting high-frequency oscillations, as contamination with muscle-induced high-frequency artifacts is difficult to remove. Therefore, most developmental EEG researchers focus on slower oscillatory activity, including the delta (0.1-3 Hz), theta (3-6 Hz), alpha (6-12 Hz), beta (12-25 Hz), and low gamma bands. Functionally distinct networks can be found at these different frequency bands, which is most notably reflected in the spatial scale of oscillatory synchrony, ranging from several centimeters for slow oscillations (Schoffelen, 2005) to micrometers for ultrafast oscillations (Izhikevich, 2001). Functional brain networks and their characteristics have been used in the past to study differences between typical and atypical brain development.
In autism spectrum disorder (ASD), for example, global connectivity (the averaged connection strength of the whole brain network) tends to be reduced at lower frequencies, which is compensated by increased global connectivity at higher frequencies (Boersma et al., 2013; O'Reilly, Lewis, & Elsabbagh, 2017; Orekhova et al., 2014; Peters et al., 2013; Righi, Tierney, Tager-Flusberg, & Nelson, 2014). Similarly, children with attention-deficit hyperactivity disorder (ADHD) show an increase in frontal low alpha connectivity and a decrease in frontal high alpha connectivity (Murias, Swanson, & Srinivasan, 2007).
Comparing these networks at the level of global connectivity has proven useful. However, to better understand the differences between these complex networks at a detailed level, a graph theoretical framework can be used (Albert & Barabási, 2002; Bullmore & Sporns, 2009), which simplifies the network into nodes (centers of information or, in the case of EEG connectivity, EEG sensors) and edges (connections between the nodes). With this mathematical approach, several metrics can be calculated that describe certain aspects of a network. The most commonly used network metrics are the characteristic path length (Lw), the average clustering coefficient (Cw), and the small-worldness index (SWI). The characteristic path length is the average shortest path length between all pairs of nodes in the network. A shorter characteristic path length generally indicates a higher global efficiency in networks. The average clustering coefficient describes the degree to which the nodes in a network cluster together. Higher clustering generally indicates higher local efficiency in networks. Small-world networks are networks in which both short path lengths and high clustering are present. As such, small-worldness is calculated as the ratio between the normalized clustering coefficient and the normalized path length. All of these characteristics have been connected to several neurodevelopmental disorders, such as ASD (Peters et al., 2013; Rudie et al., 2013; Tsiaras et al., 2011) and ADHD (Ahmadlou, Adeli, & Adeli, 2012).
While these connectivity and graph measures show potential as biomarkers to detect atypical development, biomarkers are only useful if they have a low inter-subject variability and a high test-retest reliability (Hardmeier et al., 2014). Several studies have shown that this is the case for adult EEG/MEG networks (Deuker et al., 2009; Hardmeier et al., 2014; Kuntzelman & Miskovic, 2017).
Whether this also holds true for infants, however, is currently unknown. For the early detection of neurodevelopmental disorders, it is especially vital that network measures are reliable during infancy. Therefore, in this study, we set out to determine the test-retest reliability and inter-subject variability of functional EEG network measures derived from task-dependent continuous EEG in infants.

| Subjects & Procedure
Seventy-seven 10-month-old infants, recruited from communal registers in the Netherlands, participated in the study. The final sample consisted of 60 infants (29 males; at first visit: mean age = 301 days, range = 272-342; at second visit: mean age = 308 days, range = 279-349). During the EEG recording, infants were seated in a high chair and watched two different one-minute videos on a computer screen, three separate times. The first video depicted social stimuli with singing women as the subject; the second video depicted non-social stimuli of toys that were moving without human interference, as used earlier in a study by Jones and colleagues (Jones, Venema, Lowy, Earl, & Webb, 2015). Because most young infants lack fixed sleep patterns, the start times of the experiment could not be fixed over sessions. However, where possible, infants were tested before noon.
Eating and sleeping patterns were recorded at both sessions. These patterns could not be kept similar across sessions. The parents/guardians received information about the study beforehand and signed an informed consent form before the start of the first session. The medical ethical committee of the University Medical Center Utrecht approved the study (application number: 14-221). Children received a toy after participation.

| EEG acquisition
EEG was recorded using a cap with 32 electrodes (ActiveTwo system, BioSemi) positioned according to the international 10/20 system, at a sampling rate of 2048 Hz. A Common Mode Sense (CMS) and Driven Right Leg (DRL) electrode were used to provide an active ground. In addition, two mastoid electrodes (EXG1 & EXG2) were placed behind the ears and one ocular electrode under the eye (EXG3).
The original 2048 Hz data were downsampled to 512 Hz using chip interpolation and band-pass filtered at 0.1-70 Hz with a two-way Butterworth filter. Data were purposefully not deep cleaned, to limit subjective outside influences pushing the data into a highly reliable mold. However, clearly non-neurological signals, such as jumps, cuts, and high variability within a trial, were detected and removed. Channels were removed if more than 50 percent of the signal in a channel contained artifacts. Bad channels were removed from both sessions of a subject. The cleaned data were used for further analysis.

| Connectivity calculation
The cleaned data for each subject were bandpass filtered into six bands: delta (0.5-3 Hz), theta (3-6 Hz), alpha1 (6-9 Hz), alpha2 (9-12 Hz), beta (12-25 Hz), and gamma (25-45 Hz). Since individual theta and alpha peaks are influenced by development, the alpha1 and theta bands were chosen to encompass all theta and alpha peaks ±1 Hz. The resulting data were cut into 5-s epochs. Twenty random epochs were picked per subject per session. For each epoch, connectivity between pairs of electrodes (32*31/2 = 496) was calculated with the phase lag index (PLI) and the debiased weighted PLI, both relying on the same principle of phase locking or phase synchrony (Tass et al., 1998). The PLI, proposed by Stam and colleagues (Stam, Nolte, & Daffertshofer, 2007), describes the asymmetry of the distribution of phase differences between pairs of signals:

$$\mathrm{PLI} = \left| \left\langle \operatorname{sign}\left[ \Delta\phi(t_k) \right] \right\rangle \right|$$

where $\Delta\phi(t_k)$ is the instantaneous phase difference between the signals at time point $t_k$ for k = 1 … N per epoch (N = 5*512 = 2,560), determined using the Hilbert transformation. Here $|\cdot|$ denotes the absolute value, $\langle\cdot\rangle$ the mean over time points, and sign the signum function (the sign of the phase difference is −1, 0, or 1). The resulting PLI can range from 0 to 1. Volume conduction, the effect that multiple electrodes register activity from the same source, plays a minimal role in the PLI. Activity from a single source will appear in both electrodes with a phase difference of exactly zero. Since the PLI indexes the stability of phase leading or lagging, a phase difference of zero will lead to a PLI of zero.
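The PLI definition above can be illustrated with a minimal NumPy sketch. It is not the authors' implementation: the function names `analytic_signal` and `pli_matrix` are hypothetical, the analytic signal is obtained with the standard FFT-based Hilbert method, and the toy epoch consists of two synthetic 10 Hz sinusoids rather than EEG data.

```python
import numpy as np


def analytic_signal(x):
    """FFT-based analytic signal along the last axis (Hilbert transform method)."""
    x = np.asarray(x, dtype=float)
    n = x.shape[-1]
    spectrum = np.fft.fft(x, axis=-1)
    gain = np.zeros(n)
    gain[0] = 1.0                      # keep the DC component once
    if n % 2 == 0:
        gain[n // 2] = 1.0             # keep the Nyquist component once
        gain[1:n // 2] = 2.0           # double the positive frequencies
    else:
        gain[1:(n + 1) // 2] = 2.0
    return np.fft.ifft(spectrum * gain, axis=-1)


def pli_matrix(epoch):
    """PLI between all channel pairs of one epoch of shape (n_channels, n_samples)."""
    phase = np.angle(analytic_signal(epoch))
    # Phase difference for every channel pair at every time point.
    dphi = phase[:, None, :] - phase[None, :, :]
    # sign(sin(dphi)) is the sign of the phase difference after wrapping.
    return np.abs(np.mean(np.sign(np.sin(dphi)), axis=-1))


# Toy epoch (assumption): 5 s at 512 Hz, i.e. N = 2,560 samples per channel.
t = np.arange(5 * 512) / 512.0
reference = np.sin(2 * np.pi * 10 * t)
lagged = np.sin(2 * np.pi * 10 * t - np.pi / 4)   # constant phase lag of pi/4
pli = pli_matrix(np.vstack([reference, lagged]))
```

Because the second channel lags the first by a constant amount, the sign of the phase difference never changes and the off-diagonal PLI is close to 1. Averaging such matrices over the selected epochs would give one connectivity matrix per subject, session, and frequency band.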
The debiased weighted PLI (dwPLI) is an adjustment of the PLI developed by Vinck and colleagues (Vinck, Oostenveld, van Wingerden, Battaglia, & Pennartz, 2011). The PLI is weighted by the amount of lag between the two signals, thereby limiting the influence of near-zero phase differences. This minimizes the amount of false-positive connectivity between signals with near-zero phase differences, which could be caused by noise in the data. Since infant data are notorious for their noisiness, the dwPLI is included as well. The version of the weighted PLI that we used also debiases the connectivity based on the number of epochs, since infant data likely involve few trials. This debiasing can cause the dwPLI to be negative; it therefore ranges from −1 to 1.
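To make the debiasing step concrete, the sketch below implements the debiased estimator of the squared weighted PLI as described by Vinck and colleagues (2011), in which only products of the imaginary cross-spectrum from different epochs contribute. The function name `debiased_wpli` is hypothetical, and the input (one complex cross-spectral estimate per epoch for a single channel pair) is an assumption, since the text does not state how the per-epoch cross-spectra were obtained.

```python
import numpy as np


def debiased_wpli(cross_spectra):
    """Debiased (squared) weighted PLI from per-epoch cross-spectra.

    cross_spectra: complex array of shape (n_epochs,), one cross-spectral
    estimate per epoch for a single channel pair and frequency band
    (hypothetical input format).
    """
    imag = np.imag(np.asarray(cross_spectra))
    sum_imag = imag.sum()
    sum_abs = np.abs(imag).sum()
    sum_sq = np.sum(imag ** 2)
    # Subtracting sum_sq removes the same-epoch (j == k) terms, so only
    # pairs of different epochs contribute; this removes the sample-size bias.
    denominator = sum_abs ** 2 - sum_sq
    if denominator == 0:
        return np.nan
    return (sum_imag ** 2 - sum_sq) / denominator


# A consistent lag across four epochs versus a lag that flips sign every epoch.
consistent = debiased_wpli(np.array([0.3 + 1j, 0.1 + 1j, 0.2 + 1j, 0.0 + 1j]))
flipping = debiased_wpli(np.array([1j, -1j, 1j, -1j]))
```

With a consistent lead or lag the estimate reaches 1, whereas a lag that flips sign from epoch to epoch yields a negative value, which illustrates why the dwPLI is not bounded below by zero.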

| Graph analysis
Several graph measures were calculated using the acquired individual connectivity matrices. The complete weighted matrices were used, eliminating the need for arbitrary thresholds. The following graph measures were calculated using the Brain Connectivity Toolbox (Sporns & Rubinov, 2010) (Table 1): the average clustering coefficient (Cw), the characteristic (average shortest) path length (Lw), and the small-worldness index (SWI, calculated as the ratio between normalized Cw and normalized Lw). Both the average clustering coefficient and the characteristic path length are normalized to limit the influence of global connectivity on these characteristics.
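The three graph measures can be sketched in plain NumPy as follows. This is an independent illustration, not the Brain Connectivity Toolbox code: all function names are hypothetical, the clustering coefficient follows the Onnela-style weighted definition, edge length is assumed to be the inverse of the connection weight, and the normalization uses surrogate networks with randomly shuffled weights, which is an assumption because the text does not specify the surrogate procedure.

```python
import numpy as np


def clustering_coef_weighted(W):
    """Weighted clustering coefficient per node (Onnela-style), undirected W."""
    W = np.array(W, dtype=float)
    np.fill_diagonal(W, 0.0)
    if W.max() > 0:
        W = W / W.max()                          # scale weights to [0, 1]
    degree = np.count_nonzero(W, axis=1)
    root = np.cbrt(W)
    triangles = np.diag(root @ root @ root)      # summed triangle intensities
    possible = degree * (degree - 1)
    return np.divide(triangles, possible,
                     out=np.zeros(len(W)), where=possible > 0)


def char_path_length(W):
    """Mean shortest weighted path length, with edge length = 1 / weight."""
    W = np.array(W, dtype=float)
    n = len(W)
    dist = np.full((n, n), np.inf)
    connected = W > 0
    dist[connected] = 1.0 / W[connected]         # strong connection = short edge
    np.fill_diagonal(dist, 0.0)
    for k in range(n):                           # Floyd-Warshall shortest paths
        dist = np.minimum(dist, dist[:, [k]] + dist[[k], :])
    off_diagonal = dist[~np.eye(n, dtype=bool)]
    return off_diagonal[np.isfinite(off_diagonal)].mean()


def shuffle_weights(W, rng):
    """Surrogate network: same weights, randomly reassigned to electrode pairs."""
    W = np.asarray(W, dtype=float)
    upper = np.triu_indices(len(W), k=1)
    surrogate = np.zeros_like(W)
    surrogate[upper] = rng.permutation(W[upper])
    return surrogate + surrogate.T


def small_worldness(W, n_surrogates=20, seed=0):
    """SWI = (Cw / Cw_random) / (Lw / Lw_random)."""
    rng = np.random.default_rng(seed)
    surrogates = [shuffle_weights(W, rng) for _ in range(n_surrogates)]
    c_rand = np.mean([clustering_coef_weighted(S).mean() for S in surrogates])
    l_rand = np.mean([char_path_length(S) for S in surrogates])
    c_norm = clustering_coef_weighted(W).mean() / c_rand
    l_norm = char_path_length(W) / l_rand
    return c_norm / l_norm
```

Working on the full weighted matrix, as the text describes, means no threshold has to be chosen; the surrogate count of 20 is an arbitrary illustrative value.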

| Statistical analysis
The test-retest reliability was determined differently across three different steps of the analysis (Figure 1). At the most basic level, the connectivity matrices were correlated over sessions for each subject and each frequency band (step 1, Figure 1a). For the subsequent steps, reliability was determined with the intraclass correlation coefficient (ICC) (Shrout & Fleiss, 1979; Weir, 2005), which uses an ANOVA to determine the mean squared error (MS_e) and the between-object (subject) variance (MS_r). Shrout and Fleiss (1979) describe six distinct statistical models that carry this name, of which we use the ICC(3,1) two-way mixed-effects model, similar to other studies on the reliability of graph measures (Hardmeier et al., 2014; Hatz et al., 2016). ICC values were calculated using:

$$\mathrm{ICC} = \frac{MS_r - MS_e}{MS_r + (k - 1)\,MS_e}$$

where k is the number of measurements per subject.
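As a minimal sketch of the ICC(3,1) computation, the function below derives the between-subject mean square and the residual mean square from a subjects-by-sessions decomposition, which is how ICC(3,1) is defined by Shrout and Fleiss (1979). The function name `icc_3_1` and the toy scores are illustrative assumptions.

```python
import numpy as np


def icc_3_1(scores):
    """ICC(3,1) for scores of shape (n_subjects, k_sessions)."""
    scores = np.asarray(scores, dtype=float)
    n, k = scores.shape
    grand_mean = scores.mean()
    ss_total = np.sum((scores - grand_mean) ** 2)
    ss_subjects = k * np.sum((scores.mean(axis=1) - grand_mean) ** 2)
    ss_sessions = n * np.sum((scores.mean(axis=0) - grand_mean) ** 2)
    ms_r = ss_subjects / (n - 1)                 # between-subject mean square
    ms_e = (ss_total - ss_subjects - ss_sessions) / ((n - 1) * (k - 1))
    return (ms_r - ms_e) / (ms_r + (k - 1) * ms_e)


# Toy data (assumption): three subjects, two sessions.
shifted = icc_3_1([[1.0, 2.0], [2.0, 3.0], [3.0, 4.0]])    # same ranking, constant offset
reversed_ = icc_3_1([[1.0, 3.0], [2.0, 2.0], [3.0, 1.0]])  # ranking reversed
```

Because ICC(3,1) measures consistency rather than absolute agreement, a constant offset between sessions does not lower it, whereas a reversed ordering of subjects drives it to its minimum.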
We assessed the reliability of both global and local (dw)PLI connectivity matrices (step 2, Figure 1b). The global PLI/dwPLI (ICC_glob) was calculated by averaging over all 325 electrode pairs of each subject's matrix, creating one value per subject per frequency band per session. A single ICC value per frequency band was calculated by comparing session 1 versus session 2. The local PLI/dwPLI unit-wise reliability was determined by calculating an ICC value per electrode pair over all subjects' session 1 versus session 2, creating 325 ICC values. Since these values did not follow a normal distribution, the median was taken as the single reliability value (ICC_unit). To summarize, the reliability of the global PLI/dwPLI is the reliability of all
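The difference between the global and the unit-wise reliability can be sketched as follows. The function names and the array layout (one connectivity matrix per subject and session) are assumptions for illustration; the helper repeats the ICC(3,1) formula so that the block runs on its own.

```python
import numpy as np


def icc_3_1(scores):
    """ICC(3,1) for scores of shape (n_subjects, k_sessions)."""
    scores = np.asarray(scores, dtype=float)
    n, k = scores.shape
    grand_mean = scores.mean()
    ss_total = np.sum((scores - grand_mean) ** 2)
    ss_subjects = k * np.sum((scores.mean(axis=1) - grand_mean) ** 2)
    ss_sessions = n * np.sum((scores.mean(axis=0) - grand_mean) ** 2)
    ms_r = ss_subjects / (n - 1)
    ms_e = (ss_total - ss_subjects - ss_sessions) / ((n - 1) * (k - 1))
    return (ms_r - ms_e) / (ms_r + (k - 1) * ms_e)


def global_and_unitwise_icc(session1, session2):
    """Inputs: connectivity matrices of shape (n_subjects, n_channels, n_channels)."""
    n_channels = session1.shape[-1]
    rows, cols = np.triu_indices(n_channels, k=1)   # each electrode pair once
    pairs1 = session1[:, rows, cols]                # (n_subjects, n_pairs)
    pairs2 = session2[:, rows, cols]
    # Global: average over all pairs first, then one ICC over subjects.
    icc_glob = icc_3_1(np.column_stack([pairs1.mean(axis=1),
                                        pairs2.mean(axis=1)]))
    # Unit-wise: one ICC per electrode pair, summarized by the median.
    unit_iccs = np.array([
        icc_3_1(np.column_stack([pairs1[:, p], pairs2[:, p]]))
        for p in range(pairs1.shape[1])
    ])
    return icc_glob, np.median(unit_iccs), unit_iccs
```

Averaging before computing the ICC lets pair-specific noise cancel out, which is one reason a global value can be more reliable than the median of the pair-wise values.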
FIGURE 1 Overview of the different steps in network analysis and their respective reliabilities. This figure shows the complete steps of network analysis and graphically depicts the reliabilities calculated for each step. (a) Reliability at the most fundamental level, in which connectivity matrices are correlated over sessions for each subject, for each frequency band. (b) Reliability of global (left) and local, "unit-wise" (right), connectivity. (c) Graph theoretical representation of the network and several graph characteristics, which are compared over sessions
To test the reliability of the graph measures (Cw, Lw, and SWI), values were calculated for each subject, per session, per frequency band (step 3, Figure 1c). An ICC was used to calculate the reliability of these graph measures over sessions. In accordance with previous research on graph metrics, we report ICC values below 0.4 as low reliability, 0.4 < ICC < 0.6 as mediocre reliability, 0.6 < ICC < 0.75 as good reliability, and ICC > 0.75 as excellent reliability (Hardmeier et al., 2014; Jin, Seol, Kim, & Chung, 2011). To understand the effect of outliers, a bootstrapping procedure with replacement and 10,000 resamples was used to estimate the 95% confidence intervals for both COV and ICC values, similar to the approach of Hardmeier and colleagues (Hardmeier et al., 2014). For a clear overview of the reliability tests, please refer to Figure 1. Lastly, for both the connectivity and graph measures, the inter-subject variability was determined using the coefficient of variation (COV; the ratio between the standard deviation and the mean).
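The bootstrap confidence intervals and the coefficient of variation can be sketched as below. The function names are hypothetical, the coefficient of variation is taken in its usual form (sample standard deviation divided by the mean), and the percentile method for the interval is an assumption, since the text does not state which bootstrap interval was used.

```python
import numpy as np


def coefficient_of_variation(values):
    """Inter-subject variability: sample standard deviation divided by the mean."""
    values = np.asarray(values, dtype=float)
    return values.std(ddof=1) / values.mean()


def bootstrap_ci(statistic, data, n_boot=10_000, level=0.95, seed=0):
    """Percentile bootstrap interval; rows of `data` (subjects) are resampled."""
    data = np.asarray(data)
    rng = np.random.default_rng(seed)
    n = data.shape[0]
    estimates = np.empty(n_boot)
    for b in range(n_boot):
        resample = data[rng.integers(0, n, size=n)]   # drawn with replacement
        estimates[b] = statistic(resample)
    tail = 100 * (1 - level) / 2
    lower, upper = np.percentile(estimates, [tail, 100 - tail])
    return lower, upper


# Toy example (assumption): global connectivity of five subjects in one session.
global_pli = np.array([0.10, 0.12, 0.11, 0.15, 0.09])
cov = coefficient_of_variation(global_pli)
cov_low, cov_high = bootstrap_ci(coefficient_of_variation, global_pli, n_boot=2000)
```

For the reliability itself, `statistic` would be an ICC function applied to a subjects-by-sessions array, so that whole subjects are resampled and each subject's two sessions stay paired.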
It is common to perform spectral analyses along with the connectivity analyses to get a better overview of how power and connectivity are associated. Therefore, reliability of EEG-power metrics was calculated as well. Results and methods for this section can be found in Data S1.

| Reliability of connectivity matrices
The results of the correlation of the connectivity matrices across sessions are presented in Figure 2. Correlation coefficients range widely and the median of the coefficients is generally low. There is little difference between the reliability of dwPLI and PLI connectivity matrices, which show ranges of 0.1-0.37 and 0.03-0.33, respectively.
Also, note the wider 95% confidence intervals for the dwPLI-based global connectivity. Therefore, dwPLI is excluded from this point onwards in the results to prevent misinformation.
The reliability of local, unit-wise, PLI connectivity was lower than that of global PLI, with the median ICC showing mediocre to good reliability in the theta and alpha1 frequency bands.
Table 4 shows the reliability of graph measures calculated from the PLI matrices. The PLI average clustering coefficient (Cw) was excellently reliable across the alpha1, alpha2, and theta frequency bands (0.84 < ICC_Cw < 0.91) and showed mediocre to good reliability in the delta, beta, and gamma frequency bands (0.59 < ICC_Cw < 0.73).
During session 1, not all networks showed small-worldness (range: 0.9869 < SWI < 1.02). Average connectomes were created for both sessions for all frequency bands, which show a strong similarity in strongest connections and connection strength between sessions 1 and 2 (Figure 4).
FIGURE 2 Connectivity matrix correlation coefficients for all frequency bands. Boxplot of all individual connectivity matrix correlations for session 1 versus session 2, shown for delta, theta, alpha1, alpha2, beta, and gamma. The left graph shows the correlation coefficients for the connectivity matrices calculated with the dwPLI, the right graph shows the PLI calculated connectivity matrices. Correlations range widely, but the median of the correlations within each frequency band is low. Plotted with BoxplotR (Spitzer, Wildenhain, Rappsilber, & Tyers, 2014)

| DISCUSSION
Thirdly, the reliability of the global first-order graph metrics tested in this study ranged from moderate to excellent, with both the average clustering coefficient (Cw) and the characteristic path length (Lw) being excellently reliable across the theta, alpha1, and alpha2 frequency bands. This is also found in other EEG network reliability studies.
Hardmeier and colleagues, mentioned previously, also tested the reliability of graph metrics and found excellent reliabilities for both Cw and Lw in the theta, alpha1, and alpha2 bands (Hardmeier et al., 2014).
More recently, Kuntzelman and Miskovic tested adults during an eyes-closed resting-state EEG paradigm, comparing global and local graph measures based on coherency and dwPLI. They reported good reliability of global dwPLI metrics in the theta, alpha1, and alpha2 frequency bands (Kuntzelman & Miskovic, 2017).
Across the study, we report lower reliabilities for the delta, beta, and gamma frequency bands than for the theta, alpha1, and alpha2 frequency bands. This is in agreement with several previously mentioned studies in which lower beta and gamma reliabilities (Hardmeier et al., 2014; Jin et al., 2011; Kuntzelman & Miskovic, 2017) and lower delta reliabilities (Deuker et al., 2009; Kuntzelman & Miskovic, 2017) were found. Most commonly, the lower reliability of higher frequency bands is explained by the dichotomy between higher and lower frequency bands, where higher frequency bands are more involved in establishing cognitive representation, while lower frequencies are more anatomically constrained (Bassett & Bullmore, 2006). This constraint could aid higher reliabilities over sessions. Also, both theta and alpha have been suggested to be important for processing attention (Aftanas & Golocheikine, 2001; Klimesch, Doppelmayr, Russegger, Pachinger, & Schwaiger, 1998) and top-down control (Engel, Fries, & Singer, 2001). Since our task could specifically target these systems, the resulting higher signal-to-noise ratio in these frequency bands could result in more reliable networks. Lastly, the higher prevalence of muscle artifacts in the higher frequency bands could limit reliability, especially in children. The small-worldness index (SWI) is also less reliable in our study, which is in agreement with previous studies (Hardmeier et al., 2014; Kuntzelman & Miskovic, 2017).
Since small-worldness is calculated using both the clustering coefficient and the path length, and both these characteristics vary independently across sessions, the combination of these variances in the SWI could contribute to its lower reliability.
The overall spatial resolution has a large influence on test-retest reliability, with global connectivity characteristics being highly reliable, while local connectivity characteristics are somewhat less reliable. This study also shows that different steps of the analysis yield different reliabilities, with the connectivity matrices being less reliable than the connectivity characteristics derived from them, which can be explained in several ways. Firstly, it is possible that some weak, noisy connections are present in the full connectivity matrices, which are averaged out in global connectivity characteristics. Secondly, brain networks fluctuate in activity over time (Chang & Glover, 2010). It is possible that, comparing multiple sessions, the state of the network is different, but the underlying characteristics and anatomy are equal. Thirdly, a difference in fixing the EEG cap over sessions could lead to a rotation in connectivity matrices over sessions (Hatz et al., 2016). Lastly, an unknown covariate that remains stable over sessions could influence network characteristics, but not connectivity matrices. It is currently unknown which of these explanations (or which combination of them) is correct, and future research is needed to further understand the relationship between unreliable connectivity matrices and reliable connectivity characteristics.
It is important to note that reliability does not imply validity and that this study, therefore, does not allow conclusions on the validity of these measures. It is currently unknown how tightly these measures reflect true cortical and subcortical brain connectivity. This becomes more difficult with EEG, which is restricted to measuring activity at the sensor level. While resting-state oscillations have been found to be connected to resting-state connectivity gathered from functional MRI data (Laufs, 2008; Mantini, Perrucci, Del Gratta, Romani, & Corbetta, 2007), in our study, due to the difficulty of doing resting-state research with infants, we opted for a continuous video stimulus.
While this makes it more difficult to understand how these network characteristics are reflected in the structural connectome, it comes with the added benefit of minimizing the variance over sessions, thereby possibly improving reliability. This is also reflected in the study by Deuker and colleagues, where task-dependent connectivity measures were shown to be more reliable than resting-state connectivity measures (Deuker et al., 2009). [...] (Boersma et al., 2013). Others have noted differences in graph characteristics in adults suffering from ASD (Belmonte et al., 2004) and ADHD (Ahmadlou et al., 2012). This, together with the excellent reliability of graph and connectivity measures in the theta, alpha1, and alpha2 frequency bands in infants reported here, underlines the potential of using these measures to detect neurodevelopmental disorders at an earlier age, conceivably increasing our fundamental knowledge of how these disorders develop and how they could possibly be treated.

| CONCLUSIONS
This study showed for the first time that global and, to a lesser extent, local PLI connectivity measures in infants are reliable over a 1-week period. We recorded EEG from infants twice, one week apart, while they were watching social and non-social videos. We found that, when comparing the resulting PLI networks, global network measures are stable over time. Reliable global network measures could play a vital role in finding biomarkers for several disorders. The unrestrictive nature and the relative ease of an EEG recording make it especially useful for detecting these network characteristics at a very young age, giving us important insight into the development of these disorders and possibly making early detection and intervention possible.

ACKNOWLEDGMENTS
We would like to thank dr.

CONFLICT OF INTEREST
The authors have no conflict of interest to declare.