Neonatal brain connectivity outliers identify over forty percent of IQ outliers at 4 years of age

Abstract
Background: Defining reliable brain markers for the prediction of abnormal behavioral outcomes remains an urgent but extremely challenging task in neuroscience research. This is particularly important for infant studies given the dramatic brain and behavioral growth during infancy.
Methods: In this study, we proposed a novel prediction scheme that abstracts each newborn's whole-brain functional connectivity pattern into three outlier measures (Triple O) and tested the hypothesis that neonates identified as “brain outliers” based on Triple O were more likely to develop as IQ outliers at 4 years of age. Without need for training with behavioral data, Triple O represents a novel proof-of-concept approach to predict later IQ outcomes based on neonatal brain data.
Results: Triple O correctly identified 42.1% of true IQ outliers among a mixed cohort of 175 newborns with different term, twin, and maternal disorder statuses. Triple O also reached a high level of specificity (96.2%) and overall accuracy (90.3%). By further incorporating a demographic information indicator, the enhanced Triple O+ could differentiate between high and low 4YR IQ outliers. Validation tests against seven independent reference samples revealed highly consistent results and a minimum sample size of ~50 for robust performance.
Conclusions: Considering that postnatal brain growth and various environmental factors likely also contribute to 4YR IQ, the fact that Triple O, based purely on neonatal functional connectivity data, could identify >40% of 4YR IQ outliers is striking. Together with the very high level of specificity, each outlier predicted by Triple O represents a meaningful risk, but future efforts are needed to explore ways to identify the remaining outliers. Overall, with no need for training, a high level of robustness, and a minimal requirement on sample size, the proposed Triple O approach demonstrates great potential to predict later outlying IQ performances using neonatal functional connectivity data.


| INTRODUCTION
Over the past two decades, the successful application of the resting-state functional magnetic resonance imaging (rsfMRI) technique (Biswal, Yetkin, Haughton, & Hyde, 1995) in the infant population has dramatically improved our understanding of the fast-paced, nonlinear, and patterned development of the brain's functional networks during the first years of life (Doria et al., 2011; Fransson, Aden, Blennow, & Lagercrantz, 2011; Gao, Lin, Grewen, & Gilmore, 2016; Gao et al., 2009; Smyser et al., 2010; Thomason et al., 2013). Plastic and modifiable by a range of beneficial and risk factors, functional brain development during this period harbors both great opportunity and vulnerability, consistent with the developmental origins of health and disease (DOHaD) hypothesis (Monk, Lugo-Candelas, & Trumpff, 2019; Silveira, Portella, Goldani, & Barbieri, 2007). Therefore, one of the most important goals for developmental imaging research is to explore whether and how early brain-based biomarkers could identify risks for later adverse developmental outcomes. To this end, promising associations between early brain functional connectivity and later behavioral outcomes have been reported (Alcauter, Lin, Smith, Short, et al., 2014; Ball et al., 2015; He et al., 2018; Salzwedel et al., 2019; Strahle et al., 2019), supporting a positive answer to the "if" question, but "how" exactly accurate prediction can be achieved remains elusive.
Supporting this notion, previous reports on brain-behavioral relationships in infants often report weak to moderate correlations between a number of functional connections and a single behavioral output (Alcauter, Lin, Smith, Short, et al., 2014; He et al., 2018; Salzwedel et al., 2019), suggesting that individual functional connections/processes likely only contribute a small percentage of variance to a given behavioral phenotype. Adding to the challenge, the collection of functional connections/processes may show both positive and negative associations with the same behavioral output and may act in both additive and interactive ways, making efforts to combine them for prediction even more difficult. Intuitively, these issues may seem to suggest advanced machine learning-based algorithms as a suitable solution, but the need for "big data" to train such algorithms makes them impractical at this point, given the sparsity of available data due to the inherent challenges of research-based infant MRI.
An alternative solution lies in hypothesis-driven prediction through informed feature selection and dimension reduction. The key to the success of such approaches lies in the way of abstraction, so that the selected features capture the most salient information that is predictive of the later behavioral output in question. In this paper, we sought to derive a novel abstraction-prediction framework based on the neonatal functional connectivity pattern and test its predictive value for 4-year IQ (4YR IQ) performance. As a composite measure of general cognitive capability, IQ covers multiple functional domains including fluid reasoning, verbal and nonverbal knowledge, quantitative skills, visual-spatial processing, and working memory (Roid, 2003). Each of these functional domains is supported by a distributed set of brain regions and/or networks; thus, the brain basis of IQ is likely represented by a complex whole-brain system composed of widely distributed but coordinating functional networks (Goriounova & Mansvelder, 2019). Given this integrated and global nature of the potential brain basis of IQ, we propose a "Triple Outliers (Triple O)" approach to reduce the neonatal whole-brain functional connectivity pattern into three global outlier measures and applied them to 175 study participants to categorically predict their 4-year IQ performances as being or not being an "outlying" performer. Given the DOHaD (Monk et al., 2019; Silveira et al., 2007) and the developmental cascading hypothesis (Masten & Cicchetti, 2010), we hypothesize that newborns characterized as "global brain outliers" would more likely manifest later outlying IQ performances. This hypothesis is empirically supported by previous studies reporting a number of functional connections showing linear correlations with composite cognitive outcomes later in life (Alcauter, Lin, Smith, Short, et al., 2014; Salzwedel et al., 2019).
These studies suggest that subjects more frequently lying at the two ends of the functional connectivity continuum should also be more likely to appear at the two ends of the behavioral spectrum. We propose sign-insensitive outlier measures in Triple O since both positive and negative associations at the individual connection level have been observed (Alcauter, Lin, Smith, Short, et al., 2014; Salzwedel et al., 2019). Similarly, the sign-insensitive nature of Triple O may also mitigate the recently reported heterogeneity in brain-behavior relationships within the neonate population (Chen et al., 2020) by accounting equally for absolute deviations from the population mean regardless of the different directions of brain-behavior associations in different subgroups. Note, however, that there are likely other postnatal brain (both structural and functional) and environmental factors (e.g., parenting, family environment/enrichment activity, adverse life events, and nutrition) that also contribute to 4YR IQ outcomes, so we do not expect the proposed Triple O to capture all 4YR IQ outliers. Overall, as a novel hypothesis-driven prediction approach with no need for training with IQ outcome data, we do anticipate the proposed Triple O approach and its future extensions to have high translational potential to aid clinical identification of newborns at risk for adverse developmental outcomes later in life due to abnormal functional brain growth before the neonatal stage.

| Participants and Image Acquisition
Three hundred and ninety-three infant participants were retrospectively identified from the UNC-Chapel Hill Early Brain Development Study, characterizing early childhood brain and behavior development (Gilmore, Knickmeyer, & Gao, 2018). Based on whether there was an available 4-year-old IQ (4YR IQ) score, these infants were separated into two groups: SAMPLE 1 (N = 175) with both available neonate MRI scans and 4-year-old IQ scores, and SAMPLE 2 (N = 218) with only available neonate MRI scans but no 4YR IQ. Both SAMPLE 1 and SAMPLE 2 were composed of mixed samples with respect to sex, term status (i.e., either full-term or preterm), twin status (i.e., either singleton or twins), and maternal mental disorder status (i.e., with or without maternal disorder diagnosis, including schizophrenia, bipolar disorder, and other nonspecified psychiatric disorders). Besides these categorical variables, we also have continuous measures of gestational age (GA) at birth, GA at MRI scan/4YR IQ assessment, birthweight, birth length, maternal/paternal age, maternal/paternal education in years, and total annual family income. The detailed demographic information for both SAMPLE 1 and SAMPLE 2 is listed in Table 1. These heterogeneous samples were selected to enrich developmental outcomes as well as to test the practical applicability of our prediction approach.
IQ scores at 4 years of age were measured using the Stanford-Binet Intelligence Scales, 5th edition (Roid, 2003). The Stanford-Binet is a series of tasks administered individually in a structured setting. These scales were designed to assess intelligence across the life span, with a focus on five major domains: fluid reasoning, knowledge, quantitative reasoning, visual-spatial processing, and working memory. In the current study, the Abbreviated IQ (ABIQ) score was used as a measure of general cognitive ability. This IQ score is calculated from performance on two routing subtests: a nonverbal test involving object or sequence/pattern recognition and a verbal test of vocabulary. The Abbreviated IQ score provides a quick estimate of a child's general cognitive ability, as it requires the administration of only two subtests. Therefore, it is easier to obtain than the full-scale IQ, especially for 4-year-old children. The ABIQ score has shown strong test-retest reliability (r = .87). The Stanford-Binet scales also have strong interrater reliability (ranging from 0.74 to 0.97 across all scales). The overall study protocols were approved by both the UNC at Chapel Hill and Cedars-Sinai Institutional Review Boards.
MRI data were acquired using two scanners: a 3T Siemens Allegra scanner with a circular polarization head coil (330 neonatal scans) and a 3T Siemens Tim Trio with a 32-channel head coil (63 neonatal scans). Functional images were acquired with a T2*-weighted echo planar imaging (EPI) sequence: TR/TE = 2,000 ms/32 ms, 33 slices, voxel size = 4 mm³, 150 volumes. Structural images were acquired using a 3D MPRAGE sequence: TR/TE = 1,820 ms/4.38 ms, TI = 1,100 ms, voxel size = 1 mm³. Infant subjects were fed, swaddled, and fitted with ear protection prior to imaging. All subjects were in a natural sleep state during the imaging session.
Caption fragment: [...] (Medu), birthweight (BW), and gestational age at birth (GA at birth) were highlighted and listed in the first three panels. Gestational age at birth (GA at birth), gestational age at IQ assessment (GA at 4YR IQ), gestational age at MRI scan (GA at MRI scan), birthweight, birth length, maternal/paternal age, maternal (Medu)/paternal education (Pedu) in years, and total annual household income (THI) [...]

The 24 motion-related parameters (six motion correction parameters, their derivatives, and their quadratic terms), white matter, CSF, and global signals (also including their derivatives and quadratic terms) were included as nuisance signals. All the nuisance signals were band-pass filtered (0.01-0.08 Hz) before regression to match the frequency of the BOLD signal. Data scrubbing was performed as an added motion correction step in addition to the standard rigid-body motion correction procedures. Specifically, volumes with global signal changes >0.5% and/or framewise displacements (FD) >0.3 mm (Power et al., 2014) were excluded (plus one before and two after). Subjects with <90 volumes (=3 min) were excluded from the study. After functional image preprocessing, all functional images were registered to the age-specific anatomical template space (Shi et al., 2011) [...] anatomical images to the target images. Spatial transformations were performed in FSL. The number of volumes scrubbed and the residual framewise displacement (rFD) were compared cross-sectionally to ensure there were no differences in motion and were included as motion covariates in statistical analysis. Finally, the images were spatially smoothed with a Gaussian kernel (FWHM = 6 mm) and truncated to 90 volumes to increase consistency across subjects. The UNC-CEDARS functional parcellation atlas for neonate brains (Shi, Salzwedel, Lin, Gilmore, & Gao, 2017), including 223 regions, was used to create the whole-brain functional connectivity matrix.
One region was lost when downsampling into 4-mm spatial resolution resulting in a final matrix size of 222 × 222 for each subject. The correlation of fMRI signals between paired regions was Fisher-Z-transformed for subsequent analysis.
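The scrubbing rule and the Fisher-Z connectivity step described above can be illustrated with a minimal Python sketch. This is not the authors' pipeline: the function names `scrub_mask` and `fisher_z_connectivity`, the array layouts, the use of Pearson correlation via NumPy, and the clipping of correlations before the transform are all assumptions made for illustration only.

```python
import numpy as np

def scrub_mask(fd, gs_change, fd_thresh=0.3, gs_thresh=0.5, before=1, after=2):
    """Boolean keep-mask over volumes (hypothetical helper).

    fd: framewise displacement per volume, in mm.
    gs_change: magnitude of global signal change per volume, in percent.
    A flagged volume is dropped together with `before` preceding and
    `after` following volumes, as stated in the text.
    """
    fd = np.asarray(fd, dtype=float)
    gs_change = np.asarray(gs_change, dtype=float)
    bad = (fd > fd_thresh) | (gs_change > gs_thresh)
    drop = np.zeros(bad.shape, dtype=bool)
    for i in np.flatnonzero(bad):
        lo = max(i - before, 0)
        hi = min(i + after, bad.size - 1)
        drop[lo:hi + 1] = True
    return ~drop

def fisher_z_connectivity(ts):
    """Fisher-Z-transformed correlation matrix (hypothetical helper).

    ts: array of shape (n_volumes, n_regions) holding regional time series.
    The diagonal is set to zero; correlations are clipped slightly inside
    (-1, 1) so that arctanh stays finite (an assumption, not from the text).
    """
    r = np.corrcoef(ts, rowvar=False)
    np.fill_diagonal(r, 0.0)
    r = np.clip(r, -0.999999, 0.999999)
    return np.arctanh(r)
```

Under these assumptions, a subject's matrix would be obtained as `fisher_z_connectivity(ts[keep][:90])`, where `keep` is the mask returned by `scrub_mask` and the slice mirrors the truncation to 90 volumes.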

| Brain outlier detection
To define brain outliers, we propose to abstract the whole-brain 222 × 222 functional connectivity matrix to three simple measures quantifying three different, but related, aspects of being "outlying" against a reference population. The first measure directly counts the number of "outlying" connections against the reference group mean.
Similar to IQ outlier definition, a one-tailed significance of p < .05 was used as the threshold to define connection-level outliers. The rationale for the first measure is intuitive and straightforward; if multiple functional connections linearly correlate with a later behavioral outcome, then the number of times a particular subject sits on the extreme of these connections would likely be associated with the chance of the subject appearing on the extreme of the behavioral outcome given the underlying linear relationships. In other words, the number of "outlying" connections may be viewed as an index of the number of "at-risk" brain features that may contribute to later abnormal IQ outcomes.
Therefore, the number of "outlying" connections was chosen as the first measure. Considering that there could be other subthreshold deviations from the group mean, a second measure was defined as the average Euclidean distance of one subject's vectorized connectivity matrix to every subject's vectorized matrix in the reference group, which summarized the overall deviation of one subject's whole-brain connectivity pattern from the reference group regardless of any threshold. The second measure is complementary to the first one in that it further includes those subjects that may show connection-level subthreshold deviations from the group mean, the sum of which is nevertheless sufficient to put them at the extreme at the whole-brain level.
Since both of the first two measures evaluate subject-level deviations from the group mean, we proposed a third, within-subject measure: the standard deviation across all individual connections, with the expectation that subjects showing an extreme level of functional connectivity variability across the whole brain may also be more likely to manifest as IQ outliers.
We chose to use these whole-brain level outlier measures in the proposed approach since both positive and negative associations at the individual connection level have been observed (Alcauter, Lin, Smith, Short, et al., 2014;He et al., 2018;Salzwedel et al., 2019).
Through identifying global "outliers" relative to the group mean, the proposed Triple O approach would summarize extreme deviations from the population mean regardless of whether they are at the low or high end of the functional connectivity strength spectrums.
Therefore, Triple O is not sensitive to positive or negative associations; rather, it "harmonizes" and summarizes absolute deviations from the population mean in both types of brain-behavioral associations for better prediction. More importantly, recent evidence (Chen et al., 2020) suggests that even within an otherwise "homogeneous" neonate population, different subgroups of neonates may possess qualitatively different brain-behavior associations (i.e., in different directions). A typical way to address such relational heterogeneity in prediction would likely require subgroup-specific prediction models to better utilize the differential information. However, the proposed global brain outlier-based Triple O approach would mitigate such heterogeneity since, regardless of the different directions/signs of brain-behavior correlations, the outlier measure will equally capture the absolute deviations of a subject with respect to the group mean for effective prediction.
After the calculation of all three measures, they were normalized to Z-scores based on the mean and standard deviation from the reference group, and any value >1.645 standard deviations above the mean (i.e., one-tailed p < .05) was defined as an "outlier." Therefore, each subject would have three outlier indicators (i.e., Triple O), and those with two or more "outlier" indicators detected to be true were finally defined as "brain outliers." Based on neonatal functional connectivity data from SAMPLE 1, brain outliers were defined using the proposed Triple O scheme as described above. The correspondence between the detected "brain outliers" and 4YR IQ outliers was examined to test Triple O's performance in terms of sensitivity, specificity, and overall accuracy.
Note that for Triple O, all three brain outlier measures were defined purely based on brain rsfMRI data and no training with 4YR IQ data was needed.
Instead, the 4YR IQ outcome data were only used to test the performance of the Triple O model. Therefore, Triple O represents a data-based prediction scheme with no need for training with outcome data.
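The three measures and the two-out-of-three rule described in this section can be sketched in Python as follows. This is a minimal illustration under stated assumptions, not the authors' code: the function names are hypothetical, connection-level outliers are taken as |z| > 1.645 in either direction (reflecting the sign-insensitive description), sample standard deviations use `ddof=1`, and the reference subjects' own measures (including a zero self-distance) are used to normalize each measure.

```python
import numpy as np

Z_CUT = 1.645  # one-tailed p < .05, as stated in the text

def vectorize(mats):
    """Stack of (n, r, r) matrices -> (n, r*(r-1)/2) upper-triangle vectors."""
    mats = np.asarray(mats, dtype=float)
    iu = np.triu_indices(mats.shape[1], k=1)
    return mats[:, iu[0], iu[1]]

def raw_measures(x, ref):
    """Three raw outlier measures for each row of x against reference rows.

    x, ref: arrays of shape (n_subjects, n_connections).
    Column 0: number of connections with |z| > Z_CUT relative to the
              reference mean and SD of that connection (assumed two-sided).
    Column 1: mean Euclidean distance to every reference subject.
    Column 2: within-subject SD across all connections.
    """
    mu = ref.mean(axis=0)
    sd = ref.std(axis=0, ddof=1)
    n_out = (np.abs((x - mu) / sd) > Z_CUT).sum(axis=1)
    dist = np.array([np.linalg.norm(ref - row, axis=1).mean() for row in x])
    within_sd = x.std(axis=1, ddof=1)
    return np.column_stack([n_out, dist, within_sd])

def triple_o(x, ref, min_flags=2):
    """Flag brain outliers: at least `min_flags` of the three Z-scores > Z_CUT."""
    m_x = raw_measures(x, ref)
    m_ref = raw_measures(ref, ref)  # reference distribution of each measure
    z = (m_x - m_ref.mean(axis=0)) / m_ref.std(axis=0, ddof=1)
    flags = z > Z_CUT
    return flags.sum(axis=1) >= min_flags, z
```

Passing the same array as `x` and `ref` corresponds to using a sample as its own reference, while passing a separate `ref` corresponds to the independent-reference validation. No outcome data enter the computation, which matches the training-free design described above.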

| Comparison with demographic information-based outlier detection
For the continuous measures of gestational age (GA) at birth, GA at MRI scan/4YR IQ assessment, birthweight, birth length, maternal/paternal age, maternal/paternal education in years, and total annual family income, correlations with 4YR IQ were calculated. Among all demographic variables, we identified three that (a) had data available for all 175 subjects in SAMPLE 1 and (b) showed significant positive correlations with 4YR IQ. These three variables were maternal education, birthweight, and gestational age at birth.
All three measures were Z-transformed and tested through an outlier detection pipeline similar to that used for the three brain outlier measures to define demographic information-based outliers. The performance of these "demographic outliers" in detecting 4YR IQ outliers was compared with the brain-based Triple O results.

| Demographic information-enriched Triple O (Triple O+)
To explore whether incorporating demographic information would help specify the polarity of brain-based outlier prediction, we calculated the average of the three demographics identified as meeting the two criteria above (i.e., GA at birth, birthweight, and maternal education in years) as a Cumulative Demographic Risk Index (CDRI) in SAMPLE 1. Given their consistent positive correlations with 4YR IQ, we predicted that the detected brain "outliers" with above-average CDRI would correspond to high 4YR IQ performers, while those with below-average CDRI would correspond to low 4YR IQ performers. Therefore, CDRI was used to further provide a "sign" for Triple O prediction and this enriched model was termed as Triple O+.
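The sign assignment described above can be sketched in a few lines of Python. The function names `cdri` and `triple_o_plus` are hypothetical, and two details are assumptions rather than statements from the text: each demographic variable is Z-scored against the reference sample before averaging, and a CDRI exactly equal to the average is given a negative sign.

```python
import numpy as np

def cdri(demo, ref_demo=None):
    """Cumulative Demographic Risk Index (illustrative sketch).

    demo: array of shape (n_subjects, 3) with GA at birth, birthweight,
          and maternal education in years.
    ref_demo: optional reference sample for Z-scoring; defaults to demo.
    Returns the mean of the three Z-scores per subject.
    """
    demo = np.asarray(demo, dtype=float)
    ref = demo if ref_demo is None else np.asarray(ref_demo, dtype=float)
    z = (demo - ref.mean(axis=0)) / ref.std(axis=0, ddof=1)
    return z.mean(axis=1)

def triple_o_plus(is_brain_outlier, cdri_scores):
    """Signed prediction: +1 high IQ outlier, -1 low IQ outlier, 0 otherwise."""
    is_brain_outlier = np.asarray(is_brain_outlier, dtype=bool)
    sign = np.where(np.asarray(cdri_scores) > 0, 1, -1)
    return np.where(is_brain_outlier, sign, 0)
```

With this layout, the brain-based detection and the demographic sign remain separate steps, so the demographic information only labels the direction of outliers that Triple O has already detected.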

| Validation based on SAMPLE 2 and its subsamples
To test the robustness of the Triple O+ prediction, the 218-subject SAMPLE 2 and six subsamples from SAMPLE 2 (i.e., only males (N = 108), only females (N = 110), data from one of the two scanners (N = 181/37 for Scanner 1/2), and two randomly selected 100-subject subsamples) were used as independent reference samples to identify brain outliers in SAMPLE 1. The corresponding performances were compared with that obtained using SAMPLE 1 as its own reference to examine Triple O+'s robustness against different reference samples.

| The sample size limit and underlying mechanism of Triple O+ prediction
With the validation analyses suggesting that the sample size of the reference group may be a limiting factor for robust performance, we performed a set of random sampling analyses to explore the performance of Triple O+ with different sample sizes of the reference group. Specifically, we randomly selected 10-216 subjects 1,000 times from the SAMPLE 2 (N = 218) and tested the prediction performances on the independent SAMPLE 1 (N = 175) data. Sensitivity, specificity, and accuracy were calculated at each sample size step.
Moreover, we also calculated the mean and standard deviations of the three brain outlier measures (i.e., the number of connection-level outliers, the Euclidean distance of the overall matrix with the reference group, and the within-subject standard deviation) to further reveal the underlying mechanisms of the performance.
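The random sampling analysis can be outlined with the following Python sketch. The helper name `resample_reference` and the callable `score_fn` are hypothetical: `score_fn` is assumed to take a reference subsample and return a tuple such as (sensitivity, specificity, accuracy) evaluated on the independent test sample. Sampling without replacement is also an assumption, since the text only says subjects were randomly selected.

```python
import numpy as np

def resample_reference(score_fn, ref, sizes, n_iter=1000, seed=0):
    """Mean and SD of performance scores across random reference subsamples.

    score_fn: callable mapping a reference subsample (rows of `ref`) to a
              tuple of scores (hypothetical interface).
    ref: array whose first axis indexes reference subjects.
    sizes: iterable of reference sample sizes to test.
    Returns {size: (mean_scores, sd_scores)} across the n_iter draws.
    """
    rng = np.random.default_rng(seed)
    ref = np.asarray(ref)
    summary = {}
    for n in sizes:
        runs = np.array(
            [
                score_fn(ref[rng.choice(len(ref), size=n, replace=False)])
                for _ in range(n_iter)
            ],
            dtype=float,
        )
        summary[n] = (runs.mean(axis=0), runs.std(axis=0))
    return summary
```

For the design described in the text, `sizes` would be `range(10, 217)` and `n_iter` would be 1,000, with `ref` holding the SAMPLE 2 connectivity vectors.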

| Participants and 4YR IQ distribution
The detailed demographic information for both SAMPLE 1 and SAMPLE 2 is listed in Table 1. [...] and 11.8, respectively. The actual distribution is shown in Figure 1a (left panel). When compared between females/males, singleton/twin birth, full-term/preterm, and with/without maternal disorders, significant differences were observed between term-preterm (P/F, p = .039) and singleton-twin birth (T/S, p = .018), while marginally significant differences were observed between children with and without maternal disorder diagnosis (D/nD, p = .103, Figure 1a, middle panel), suggesting that twin status, preterm birth, and maternal disorder diagnosis can be viewed as potential risk factors for lower IQ. Therefore, in the following discussions, we primarily separate SAMPLE 1 into two cohorts: (1) the CONTROL group (i.e., singleton birth, full-term, and no maternal disorder diagnosis, N = 56) and (2) the RISK group (i.e., meeting at least one of the three criteria, including twin, preterm birth, and maternal disorder diagnosis, N = 119).
When splitting the whole histogram into the two groups, it is apparent that there were more CONTROLs at the right end (high IQ) but more RISK participants at the left end of the overall 4YR IQ distribution (Figure 1a). There were 13 "High IQ outliers" (i.e., IQ > 127, 7.43%, corresponding to the "Superior" and "Gifted" categories in SB5 classification (Roid, 2003)) and 6 "Low IQ outliers" (i.e., IQ < 85, 3.43%, corresponding to the "Low Average" and "Borderline/Mildly impaired or delayed" categories according to SB5 classification (Roid, 2003)).
Among the 13 high IQ outliers, 7 were from the CONTROL group and 6 were from the RISK group, while all 6 low IQ outliers were from the RISK group, reaffirming a higher risk of lower 4YR IQ performance from the RISK group. The detailed breakdown of the 13 high/low performers against all demographic categories is listed in Table S1, which shows a mixed distribution, and none of the IQ outliers could be identified simply by his/her demographic classification. Similarly, when the defined IQ outliers were highlighted in the scatter plots of quantitative demographic variable-4YR IQ relationships (Figure 1b), it is apparent that despite significant quantitative correlations with a subset of demographic variables, the IQ outliers could not be readily identified through simple thresholding of any of the demographic variables.

| Prediction performance of Triple O
Triple O's performance on SAMPLE 1 is shown in Figure 2. The results showed that brain outliers defined based on Triple O could identify 8 out of 19 4YR IQ outliers, translating to a 42.1% detection rate (Figure 2a).
Specifically, 2 out of 7 high IQ outliers from the CONTROL group (i.e., full-term, singleton birth, no maternal mental disorder diagnosis), 3 out of 6 high IQ outliers from the RISK group (i.e., 1 preterm, 1 twin birth, 1 preterm + twin birth, Table S1), and 3 out of 6 low IQ outliers from the RISK group (i.e., 1 with maternal mental disorder diagnosis, 2 preterm + twin birth, Table S1) were correctly identified based on Triple O (Figure 2a, Table S1). There were also 6 false positives and 11 false negatives, translating to an overall sensitivity of 42.1%, specific-
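The counts reported in this section (8 true positives, 6 false positives, and 11 false negatives among the 175 subjects of SAMPLE 1) determine the remaining confusion-matrix entry and the three performance figures. The short Python sketch below works through that arithmetic; the function name `outlier_metrics` is hypothetical and the standard definitions of sensitivity, specificity, and accuracy are assumed.

```python
def outlier_metrics(tp, fp, fn, n_total):
    """Confusion-matrix summary from reported counts (illustrative helper)."""
    tn = n_total - tp - fp - fn  # true negatives follow from the total
    return {
        "tn": tn,
        "sensitivity": tp / (tp + fn),   # detected IQ outliers / all IQ outliers
        "specificity": tn / (tn + fp),   # correct non-outliers / all non-outliers
        "accuracy": (tp + tn) / n_total,
    }

# Counts taken from the text: 8 TP, 6 FP, 11 FN, N = 175
m = outlier_metrics(tp=8, fp=6, fn=11, n_total=175)
print(m["tn"])  # 150
print(
    round(100 * m["sensitivity"], 1),
    round(100 * m["specificity"], 1),
    round(100 * m["accuracy"], 1),
)  # 42.1 96.2 90.3
```

The resulting values (8/19, 150/156, and 158/175) agree with the 42.1% sensitivity, 96.2% specificity, and 90.3% accuracy stated in the abstract.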

| Prediction performance of Triple O+
FIGURE 2 Performances of Triple O and Triple O+ on predicting 4YR IQ outliers based on neonatal functional connectivity outliers. (a) Triple O performance. The X-axis represents individual subjects, while the Y-axis indicates Z-scores of either 4YR IQ performance (blue line) or the three neonatal functional connectivity outlier measures (i.e., Triple O, gray lines). The idea is to examine whether brain outliers (defined as one-way t test of p < .05 (the gray horizontal line) for at least 2 out of three brain outlier measures) correspond to 4YR IQ outliers as defined in Figure 1a. As shown in (a), there is a general correspondence between gray line peaks and blue line peaks (either high or low peaks). The solid black dots indicate true positives (i.e., 4YR IQ outliers that were correctly identified as brain outliers based on Triple O), the crosses indicate false positives (i.e., subjects detected as brain outliers based on Triple O but not 4YR IQ outliers), and the empty circles indicate false negatives (i.e., 4YR IQ outliers that were not detected as brain outliers based on Triple O). The membership of each true positive in either the CONTROL or RISK group is also noted. (b) Triple O+ performance. Similar curve plots for the three brain outlier measures (gray lines) and their 4YR IQ (blue line), but the detected brain outliers were given either a positive (red) or negative (green) sign depending on whether the individual's Cumulative Demographic Risk Index (CDRI) was above or below the average of the reference group (i.e., incorporating Triple O with CDRI to form Triple O+). Also included in this plot are the individual functional connectivity matrices for the identified true positives, as well as the mean matrices for the true positive (high IQ), true positive (low IQ), and the true negative groups.

The prediction performance of the three demographic variables showing significant positive correlations with 4YR IQ outcomes (i.e., GA at birth, birthweight, and maternal education in years) based on the same Triple O procedure is shown in Figure S1. The results (1 true positive, 8 false positives, and 18 false negatives, Figure S1) are not above chance (p = .537) and much worse than the brain-based "Triple O."
When we further incorporated the signs of the standardized CDRI scores (i.e., + for above-average and − for below-average) into Triple O, the 14 detected brain outliers split into 8 with a positive sign and 6 with a negative sign (Figure 2b). As expected, all 5 true high 4YR IQ outliers were among the 8 positive-sign ones and all 3 true low 4YR IQ outliers were among the 6 negative-sign ones. Therefore, Triple O+ was able to separate the detected brain outliers into high IQ/low IQ categories with more specific predictions.
In practice, without information from Triple O+ (Table S2a), there would be a 12.50%/0% chance for a CONTROL baby to develop high/low outlying 4YR IQ, while there was a 5.04%/5.04% chance for a RISK baby to develop high/low outlying 4YR IQ. However, with Triple O+ (Table S2b)

FIGURE 3 Validation of Triple O+ performances using seven testing reference samples from SAMPLE 2 for prediction of 4YR IQ from SAMPLE 1: including SAMPLE 2 as a whole, the male/female subsamples, the Scanner 1/2 subsamples, and two random subsamples of 100 subjects. The curve plots are similar to those in Figure 2b.

| Robustness of Triple O+ prediction
The prediction performances from all seven testing samples (i.e., the whole SAMPLE 2, only males (N = 108), only females (N = 110), data from one of the two scanners (N = 181/37 for Scanner 1/2), and two randomly selected 100-subject subsamples) are summarized in Table S3. It is striking that 6 out of 7 testing samples produced highly significant (p < .001) and consistent prediction performances with those obtained using SAMPLE 1 as its own ref-

FIGURE 4 The effects of sample size of the reference group on Triple O prediction performance of 4YR IQ scores. (a) A random resampling (1,000 times at each sample size step) of 10-216 (step size 1) subjects from SAMPLE 2 was done at each step to form the respective reference sample, and the corresponding Triple O performance on predicting 4YR IQ scores in SAMPLE 1 was calculated and shown. (b) The means and standard deviations of the three brain outlier measures for the corresponding reference sample at each step size were calculated and shown. Blue curves represent the mean, while the gray areas represent the standard deviation across the 1,000 random samplings at each sample size step for curves in both (a) and (b). The red line corresponds to the step of 50.

| The practical limit underlying Triple O+ prediction
Results from the testing analyses seemed to suggest that variabilities related to sex and scanner were not significant contributors to prediction performance since the male-alone, female-alone, and Scanner 1 subsamples were all able to produce highly consistent prediction results in the mixed testing sample of SAMPLE 1. Indeed, the two randomly selected 100-subject subsamples also produced equivalent results. However, when the sample size dropped to 37 with the Scanner 2 subsample, performance dramatically degraded. These observations suggested that the sample size of the reference group may be a limiting factor for robust performance. To test this hypothesis and reveal the limit, we randomly selected 10-216 subjects 1,000 times from SAMPLE 2 as the reference sample and tested their prediction performances on SAMPLE 1. As expected, the sensitivity, specificity, and accuracy quickly rose with increasing sample size until around 50, after which all three measures reached plateaus (Figure 4a), suggesting that a sample size of ~50 is needed for a robust prediction based on Triple O. Based on the same resampling, we further calculated the mean and standard deviations of the three brain outlier measures at different sample size steps. It is again apparent that the first two brain outlier measures (i.e., the number of connection-level outliers and the Euclidean distance of the overall matrix with the reference group) stabilized at around sample size 50, while the third one (i.e., within-subject standard deviation) showed little change across the spectrum, providing the underlying functional connectivity stability basis for the observed prediction performance curves.

| DISCUSSION
In this study, we showed that a novel Triple O framework could abstract the neonatal whole-brain functional connectivity pattern to three Z-scores measuring the degrees of being "outliers" against a reference population. The resulting brain-based "outlier" indicator could correctly predict 42.1% of 4YR IQ outliers with high specificity (96.2%) and accuracy (90.3%) in 175 subjects with mixed demographic makeup (SAMPLE 1). After combining demographic information, Triple O+ could further differentiate prediction between high and low IQ outliers making it more informative for translational applications. With no need for training, the proposed Triple O+ approach demonstrates high levels of robustness and generalizability underscored by consistent results obtained using a range of independent datasets as the reference sample (i.e., SAMPLE 2 and its five subsamples). Our results further showed a lower limit of 50 on the reference sample size for successful Triple O+ performance.

| The importance of early prediction and Triple O/Triple O+ performances
The importance of early identification of risks for adverse developmental outcomes is well agreed upon in the field as the developmental origins of various mental disorders have been increasingly recognized (Monk et al., 2019;Silveira et al., 2007;Swanson & Wadhwa, 2008). The most direct benefits of early prediction include the possibility of early intervention, given the widely reported better outcomes associated with earlier initiation of intervention strategies (Guralnick, 2011). In neuroimaging-based predictions, the neonatal stage may represent the earliest possible timepoint given the technical/practical difficulties associated with prenatal MRI imaging (Gao, Lin, Grewen, & Gilmore, 2017), although there have been encouraging developments on this front in recent years (Thomason et al., 2013). Therefore, the derivation of neonatal brain image-based prediction schemes, as explored in this study, represents an urgent and important direction that aligns well with our ultimate goal of helping all at-risk children to reach their greatest potential.
However, one has to recognize that in addition to the neonatal brain developmental status, there are likely a range of other postnatal factors that also contribute to later developmental outcomes (e.g., the 4YR IQ in this study). These may include later structural and functional brain growth, family environment/enrichment activity, parenting, nutrition, and adverse life events, among others. Therefore, in the context of this study, one has to be mindful that the neonatal brain may only contain part of the predictive information for 4-year IQ outcomes, while later brain development and other postnatal environmental factors likely contribute the rest. As a result, we should not expect neonatal brain-based predictions, such as Triple O, to be able to predict all 4YR IQ outliers, and this may help explain the 42.1% sensitivity observed in this study. Taken together, these considerations make it striking that a simple abstraction scheme such as Triple O, which was based purely on functional connectivity measures at the neonatal stage, could identify >40% of IQ outliers 4 years down the road. This observation reaffirms the developmental cascading hypothesis stating that early brain deviations could cascade and potentially lead to far-reaching behavioral consequences. We do want to stress that, combined with the very high level of specificity (i.e., 96.2%), each and every child identified as a potential "low IQ outlier" by Triple O likely represents a meaningful risk. If independently validated, Triple O/Triple O+ and their future extensions could significantly aid in clinical decision making, particularly among those infants at higher risk for adverse developmental outcomes (e.g., infants with prenatal drug exposure (Grewen et al., 2020;Salzwedel, Grewen, Goldman, & Gao, 2016;Salzwedel et al., 2015), maternal mental health problems (Qiu et al., 2015), poverty (Kim et al., 2013), and maternal obesity (Salzwedel et al., 2018), among others).
Note that Triple O actually has a higher detection rate in the RISK group (i.e., 50% in our RISK group compared to 28.6% in our CONTROL group), making its application in at-risk infant populations more justified. Through the earliest possible identification of risks based on Triple O, early intervention becomes possible in this population to potentially rectify early abnormal growth for better outcomes.
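The overall performance figures discussed above can be cross-checked with simple arithmetic. In the sketch below, the cohort size (175), the number of true 4YR IQ outliers (19), and the number detected (8) are taken from the paper. The false-positive count of 6 is an inference rather than a reported value: it is the only whole number that reproduces the reported 96.2% specificity and 90.3% accuracy.

```python
n_subjects = 175      # SAMPLE 1 (reported)
n_iq_outliers = 19    # true 4YR IQ outliers (reported)
true_pos = 8          # outliers identified by Triple O (reported)
false_pos = 6         # ASSUMPTION: inferred from the reported specificity

false_neg = n_iq_outliers - true_pos      # outliers that were missed
n_typical = n_subjects - n_iq_outliers    # subjects who are not IQ outliers
true_neg = n_typical - false_pos

sensitivity = true_pos / n_iq_outliers
specificity = true_neg / n_typical
accuracy = (true_pos + true_neg) / n_subjects

print(f"sensitivity = {sensitivity:.1%}")  # 42.1%
print(f"specificity = {specificity:.1%}")  # 96.2%
print(f"accuracy    = {accuracy:.1%}")     # 90.3%
print(f"missed outliers = {false_neg} of {n_iq_outliers}")  # 11 of 19
```

The 11 missed outliers correspond to the roughly 60% of 4YR IQ outliers that the neonatal connectivity data alone did not capture.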
Besides postnatal factors beyond the neonatal brain features that may have contributed to 4YR IQ performance, there could be other factors that, if successfully addressed, may help improve the current prediction. These may include (a) other kinds of predictive brain outliers that are not covered by Triple O; and (b) other nonbrain outlier-based mechanisms/features. Therefore, future studies are needed to explore these possibilities for better prediction.
Beyond false negatives, false positives may arise if these subjects possess both "good" and "bad" outliers at the individual connection level whose effects on IQ cancel each other out, resulting in "middle" performance. Indeed, the individual and mean group matrices for the high IQ and low IQ outliers showed clearly different patterns (Figure 2b). However, the limited sample size in each category prevented us from deriving connectivity signatures for "good" and "bad" outliers. This limitation points to the need for future studies with larger sample sizes to further characterize different types of connection-level outliers for more accurate prediction.

| The rationales behind Triple O/Triple O+
The proposed Triple O approach was inspired by the developmental cascading hypothesis (Masten & Cicchetti, 2010;Monk et al., 2019;Silveira et al., 2007) stating that early changes in the brain could cascade and evolve with age, leading to abnormal developmental outcomes. Driven by this hypothesis and our previous empirical results showing significant associations between early functional brain connectivity measures and later behavioral outcomes (Alcauter, Lin, Smith, Short, et al., 2014;Salzwedel et al., 2019), we hypothesized that neonates identified as "brain outliers" would also be more likely to manifest as "behavioral outliers" four years down the road. This hypothesis serves as the foundation for the Triple O approach. Indeed, the detection of the three low IQ outliers, all of whom were in the RISK group (one with a maternal mental disorder diagnosis and two with premature and twin birth status), echoes this hypothesis and suggests that the genetic factors underlying the disorder diagnosis/potential maternal distress associated with it and premature/twin birth may have contributed to the detected outlying brain connectivity patterns and the subsequent outlying low IQ performance at 4 years of age (Monk et al., 2019). A detailed examination of all eight true positives against the 10 demographic variables failed to show clear signs of extremes that could potentially explain their outlying neonatal brain and 4YR IQ patterns (Figure S2).
Future studies with genetic modeling and more comprehensive environmental monitoring are needed to more rigorously examine the potential underlying factors leading to the detected "brain outliers" at the neonatal stage and the later outlying (both high and low) IQ performances.
Empirically, the choices of abstraction measures are almost unlimited, but we chose the three global abstraction measures given the multifaceted nature of IQ performance that covers a multitude of different functional domains (e.g., fluid reasoning, verbal and nonverbal knowledge, quantitative skills, visual-spatial processing, working memory) (Roid, 2003). These functions likely require coordinated functioning of a complex set of distributed functional networks throughout the whole brain (Goriounova & Mansvelder, 2019).
Note that these three measures were chosen in this proof-of-concept study to demonstrate the link between brain outliers and IQ outliers. Future expansions/revisions of the proposed approach, such as including more sophisticated graph theoretical measures (Rubinov & Sporns, 2010), are possible and deserve further exploration. As previously mentioned, another important rationale for using brain outlier measures for prediction relates to their insensitivity to the signs of brain-behavioral relationships, whether these are homogeneous (Alcauter, Lin, Smith, Short, et al., 2014;Salzwedel et al., 2019) or heterogeneous (Chen et al., 2020) within the examined population. While prediction schemes directly using functional connectivity strength measures would inevitably be affected/complicated by the different signs of brain-behavior associations between connections and/or the heterogeneity across different subgroups of subjects, the proposed Triple O approach captures the absolute deviations of different functional connectivity strength measures from the population mean regardless of the relative signs and subgroup differences. This sign insensitivity may partly contribute to the observed high level of generalizability to different reference samples as discussed below.
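The sign insensitivity described above can be made concrete with a small sketch. Two hypothetical subjects deviate from the reference mean by the same amounts but in opposite directions; measures built on absolute or squared deviations treat them identically, whereas a signed summary does not. All numbers, the Z > 3 cutoff, and the function name `deviation_measures` are illustrative assumptions, not values from the paper.

```python
import math

def deviation_measures(subject, ref_mean, ref_sd, z_threshold=3.0):
    """Sign-insensitive summaries: outlier count and Euclidean distance."""
    z = [(x - m) / s for x, m, s in zip(subject, ref_mean, ref_sd)]
    count = sum(abs(v) > z_threshold for v in z)
    distance = math.sqrt(sum((x - m) ** 2 for x, m in zip(subject, ref_mean)))
    return count, distance

def signed_sum(subject, ref_mean, ref_sd):
    """A sign-sensitive summary, shown only for contrast."""
    return sum((x - m) / s for x, m, s in zip(subject, ref_mean, ref_sd))

# Hypothetical reference statistics for four connections.
ref_mean = [0.20, 0.10, -0.05, 0.30]
ref_sd = [0.10, 0.10, 0.10, 0.10]
offsets = [0.35, -0.40, 0.05, 0.02]

subject_a = [m + d for m, d in zip(ref_mean, offsets)]
subject_b = [m - d for m, d in zip(ref_mean, offsets)]  # mirror image of A

count_a, dist_a = deviation_measures(subject_a, ref_mean, ref_sd)
count_b, dist_b = deviation_measures(subject_b, ref_mean, ref_sd)

print(count_a, count_b)                  # 2 2
print(math.isclose(dist_a, dist_b))      # True
print(signed_sum(subject_a, ref_mean, ref_sd) > 0,
      signed_sum(subject_b, ref_mean, ref_sd) > 0)  # True False
```

The two mirrored subjects receive the same outlier count and distance even though every one of their deviations has the opposite sign.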
The observed significant quantitative correlations between a range of demographic/participant variables, including parental education, birth outcomes, gestational age, family income, maternal age, and 4YR IQ outcomes (Figure 1b), are consistent with previous findings (Bacharach & Baumeister, 1998;Elgen, Sommerfelt, & Ellertsen, 2003;Eriksen et al., 2013;Tong, Baghurst, Vimpani, & McMichael, 2007). Leveraging these quantitative associations, we derived a Cumulative Demographic Risk Index (CDRI) and predicted that the detected brain "outliers" with above-average CDRI would correspond to high 4YR IQ outliers, while those with below-average CDRI would correspond to low 4YR IQ outliers. Our results confirmed this prediction, and the addition of the CDRI indicator enabled the resulting Triple O+ approach to distinguish between high and low IQ outliers (Figure 2b). Therefore, by com- since none of them develop as low IQ outliers in SAMPLE 1, the main benefit is the knowledge that the detected "brain outliers" have a higher chance of developing as high IQ outliers (Figure 2b).
This information, although not as clinically critical as predictions of low IQ outliers, could still be helpful and beneficial to identify potential gifted children.
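The way Triple O+ uses the demographic index only as a sign indicator can be sketched as a simple decision rule. The function name, the label strings, the handling of a CDRI exactly equal to the average, and the example values below are all illustrative assumptions. How the CDRI itself is computed is not covered in this section, so it is treated here as a given number.

```python
def triple_o_plus_label(is_brain_outlier, cdri, cdri_average):
    """Assign a predicted direction to a detected brain outlier.

    Follows the rule stated in the text: above-average CDRI -> high IQ
    outlier, below-average CDRI -> low IQ outlier. A CDRI equal to the
    average is grouped with "low" here, which is an arbitrary choice.
    """
    if not is_brain_outlier:
        return "not flagged"
    if cdri > cdri_average:
        return "predicted high IQ outlier"
    return "predicted low IQ outlier"

# Hypothetical subjects: (brain outlier flag, CDRI value).
cdri_average = 0.0
subjects = {
    "A": (True, 1.2),
    "B": (True, -0.8),
    "C": (False, 2.0),
}

labels = {name: triple_o_plus_label(flag, cdri, cdri_average)
          for name, (flag, cdri) in subjects.items()}
for name, label in labels.items():
    print(name, "->", label)
```

In this sketch the demographic index never changes who is flagged; it only assigns a direction to subjects that the brain-based measures have already flagged.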

| The high level of robustness and generalizability of Triple O+
The Triple O+ approach proposed in this study comes from a hypothesis-driven understanding of the infant brain and brain-behavioral relationships, and was specifically designed to promote robustness and generalizability of prediction. In particular, the three most salient features of Triple O are its high level of abstraction, its insensitivity to the signs of brain-behavioral associations (whether homogeneous or heterogeneous within the examined population), and the absence of any need for training with behavioral outcome data. All three features promote robustness and generalizability and could potentially avoid the commonly encountered "overfitting" problem in training-based machine learning approaches. Indeed, highly consistent prediction performances were observed when using the independent SAMPLE 2 data as the reference sample. Importantly, even when using the male-only or female-only subsamples within SAMPLE 2 as reference, highly consistent predictions were observed for SAMPLE 1 with mixed sexes, suggesting that the proposed scheme is not sex-dependent. Moreover, similar prediction performances could be achieved even when using data from one scanner to predict data from two scanners. These observations are encouraging and support future cross-institutional applications of Triple O. Adding to its practicality, a reference sample size as low as ~50 was shown to achieve similar performance (Figure 4). Taken together, the demonstrated robustness and the relatively minimal sample size requirement (i.e., ~50) support Triple O as a promising and practical way of identifying newborns at risk for adverse IQ outcomes.

| Limitations
Several additional limitations of this study are worth discussing. The first one reiterates the undetected 4YR IQ outliers. Other types of brain outliers, non-outlier-based brain mechanisms, and postnatal factors (e.g., brain development, family environment, adverse life events, education, and nutrition) could all underlie this observation, and future studies are needed to explore/validate these possibilities.
In particular, future studies further incorporating postnatal environmental factors may prove particularly effective for better prediction but this inclusion may inevitably delay the timing of prediction and considerations have to be given to balance between higher sensitivity and earlier detection. Related, in the current study, the demographic information was reduced to a sign indicator and incorporated in our Triple O+ to only help identify the sign of IQ outliers (i.e., high and low IQ outliers) but future efforts are needed to further explore best ways to incorporate quantitative demographic information for potential improvement in prediction sensitivity/accuracy. The second limitation relates to the relatively small number of IQ outli-

| CONCLUSIONS
In conclusion, we propose a Triple O+ approach to use brain-based outlier indicators in neonates, enriched by demographic information, to predict high and low outlying IQ performers at 4 years of age.
Our results revealed a 42.1% identification rate of 4YR IQ outliers (i.e., 8 out of 19) among a mixed cohort of 175 newborns with different term, twin, and maternal disorder statuses. Together with a high specificity of 96.2%, Triple O reached an overall accuracy of 90.3% in identifying 4YR IQ outliers. High levels of robustness and generalizability were also observed when using independent datasets as the reference samples. Characterized by no need for training, a "small-data" requirement (lower limit of ~50), straightforward interpretation, and high levels of robustness and generalizability, Triple O+ may have the potential for translational applications as a novel way for brain-based identification of newborns at risk for adverse IQ outcomes years down the road. However, one should also be mindful of the ~60% of 4YR IQ outliers that were not detected using Triple O based on the neonatal functional connectivity data. Other types of neonatal brain outliers, other-than-outlier-based mechanisms, postnatal brain growth, or other postnatal family/environmental factors could all have contributed to these false negatives, and future efforts are needed to improve upon Triple O for a higher rate of risk identification.

ACKNOWLEDGMENT
This work was supported by the National Institutes of Health (R01DA042988, R01DA043678, R34DA050255 to WG; R01MH064065 and R01HD05300 to JHG) and Cedars-Sinai Precision Health Initiative Awards to WG. The authors declare no competing financial interests.