Multivariate models of brain volume for identification of children and adolescents with fetal alcohol spectrum disorder

Abstract
Magnetic resonance imaging (MRI) studies of fetal alcohol spectrum disorder (FASD) have shown reductions of brain volume associated with prenatal exposure to alcohol. Previous studies consider regional brain volumes independently but ignore potential relationships across numerous structures. This study aims to (a) identify a multivariate model based on regional brain volume that discriminates children/adolescents with FASD versus healthy controls, and (b) determine if FASD classification performance can be increased by building classification models separately for each sex. Three‐dimensional T1‐weighted MRI from two independent childhood/adolescent datasets were used for training (79 FASD, aged 5.7–18.9 years, 35 males; 81 controls, aged 5.8–18.5 years, 32 males) and testing (67 FASD, aged 6.0–19.6 years, 38 males; 74 controls, aged 5.2–19.5 years, 42 males) a classification model. Using FreeSurfer, 87 regional brain volumes were extracted for each subject and were used as input into a support vector machine generating a classification model from the training data. The model performed moderately well on the test data with accuracy 77%, sensitivity 64%, and specificity 88%. Regions that contributed heavily to prediction in this model included temporal lobe and subcortical gray matter. Further investigation of two separate models for males and females showed slightly decreased accuracy compared to the model including all subjects (male accuracy 70%; female accuracy 67%), but had different regional contributions suggesting sex differences. This work demonstrates the potential of multivariate analysis of brain volumes for discriminating children/adolescents with FASD and provides indication of the most affected regions.


| INTRODUCTION
A diagnosis of fetal alcohol spectrum disorder (FASD) relies on the identification of physical, cognitive, and behavioral impairments related to prenatal alcohol exposure (PAE; Popova et al., 2016). Quantitative structural magnetic resonance imaging (MRI) studies have consistently reported reductions of total brain, white matter, and gray matter volumes in individuals with prenatal exposure to alcohol, many of whom are diagnosed with FASD (for reviews, see Donald et al., 2015; Lebel, Roussotte, & Sowell, 2011). Some structures may be disproportionately affected in FASD, with larger proportional reductions in specific deep gray matter structures such as the caudate and putamen (Nardelli, Lebel, Rasmussen, Andrew, & Beaulieu, 2011; Roussotte et al., 2012). Volume reductions associated with PAE have also been reported in infants and neonates, in the corpus callosum (Jacobson et al., 2017) and in gray matter (Donald et al., 2016). In addition, larger volume reductions have been observed in males with FASD, suggesting sex differences (Chen, Coles, Lynch, & Hu, 2012; Dudek, Skocic, Sheard, & Rovet, 2014; Treit et al., 2017). However, most of these studies analyze each brain region separately (i.e., univariate analysis), and regional volumes overlap considerably between groups, which makes any single volume unsuitable for diagnosing FASD in an individual.
Machine learning classification takes multiple variables as input to build a multivariate classification model capable of separating groups based on the provided input. In short, a multivariate classification model is a mathematical equation that describes a multidimensional boundary (e.g., a plane or hyperplane), where data points located on opposite sides of the boundary are classified into different groups (i.e., FASD vs. control). Machine learning classification of neuroimaging features has shown promise for discriminating individuals with brain disorders from healthy controls (Arbabshirani, Plis, Sui, & Calhoun, 2017). These techniques have been applied in pediatric populations to identify neurodevelopmental disorders such as attention deficit hyperactivity disorder (ADHD) and autism (Levman & Takahashi, 2015). Multivariate classification studies with neuroimaging data typically require large samples to achieve stable models (Nieuwenhuis et al., 2012), and to date ADHD classification studies have most often been performed on the same cohort of children and adolescents collected as part of the ADHD-200 consortium (Milham, Fair, Mennes, & Mostofsky, 2012). Classification models on the ADHD-200 data have achieved accuracies in classifying children/adolescents with ADHD ranging from 55% using structural brain features (Colby et al., 2012) to 81% using resting-state functional connectivity features (Fair et al., 2013). Similar accuracies have been reported in large cohorts (>100 participants) of children/adolescents with autism, ranging from 70% using a combination of regional brain volume and functional connectivity features (Zhou, Yu, & Duong, 2014) to 91% using functional connectivity features alone (Chen et al., 2015). To our knowledge, only one study, which also used eye tracking and psychometric data, has attempted FASD classification using neuroimaging-based features.
That study extracted features from diffusion MRI of the corpus callosum and achieved an accuracy of 65-70% in classifying children/adolescents with FASD (41 individuals with FASD, 35 controls; Zhang et al., 2019); its sample was a subset of the larger cohort used in the current study. However, no study to date has investigated the utility of multivariate classification models using regional brain volumes (notably the most consistent finding across FASD MRI studies) in FASD. Additionally, classification studies of neurodevelopmental disorders typically use a linear regression to remove sex-related variation from input features; however, when there are group-by-sex interactions (e.g., those observed in FASD), this would be suboptimal, because a single sex term cannot capture a group difference whose size depends on sex.
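The concern about group-by-sex interactions can be made concrete with a toy example. The sketch below uses made-up, noiseless cell means, chosen only so that the FASD reduction is larger in males, and regresses a single regional volume on sex alone (ordinary least squares with an intercept). It then measures the FASD-minus-control gap in the residuals within each sex. Because regressing on sex only subtracts a sex-specific constant, the adjustment removes the main effect of sex but leaves the sex-dependent size of the group difference untouched. All names and numbers here are hypothetical.

```python
import numpy as np


def sex_adjusted_group_gap(volume, is_male, is_fasd):
    """Regress one volume on sex only (OLS with intercept), then return the
    FASD-minus-control difference in the residuals separately for each sex."""
    design = np.column_stack([np.ones(len(volume)), is_male])
    beta, *_ = np.linalg.lstsq(design, volume, rcond=None)
    resid = volume - design @ beta
    gaps = {}
    for label, in_sex in (("male", is_male == 1), ("female", is_male == 0)):
        fasd_mean = resid[in_sex & (is_fasd == 1)].mean()
        control_mean = resid[in_sex & (is_fasd == 0)].mean()
        gaps[label] = fasd_mean - control_mean
    return gaps


# Hypothetical balanced cohort: 10 subjects per sex-by-group cell, noiseless
# cell means (arbitrary units) with a larger FASD reduction in males.
is_male = np.repeat([0, 0, 1, 1], 10)
is_fasd = np.repeat([0, 1, 0, 1], 10)
volume = np.repeat([100.0, 98.0, 110.0, 104.0], 10)

gaps = sex_adjusted_group_gap(volume, is_male, is_fasd)
print(gaps)  # the adjusted gap is about three times larger in males than in females
```

After adjustment the male gap is still about -6 units and the female gap about -2 units, so a single decision boundary fit to the pooled, adjusted data would still treat the two sexes differently.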
This study had two key aims: (a) to identify a multivariate model based on regional brain volume capable of discriminating children/adolescents with FASD, and (b) to determine whether FASD classification performance can be increased by building classification models separately for each sex, given the known volume differences between males and females as a group (Cahill, 2006; Cosgrove, Mazure, & Staley, 2007). The brain volume model was developed and then tested on independent FASD/unexposed control cohorts from two studies: it was trained on a four-site pan-Canadian "NeuroDevNet" cohort (79 FASD, 81 controls) and tested on a local single-site "Canadian Institutes of Health Research (CIHR)" cohort (67 FASD, 74 controls).
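As a minimal illustration of the multivariate boundary described earlier in this Introduction, a linear classifier scores each subject as f(x) = w·x + b and assigns the class by the sign of f. The weights, bias, and two-region inputs below are hypothetical toy values, not the fitted model, and the sign convention (positive means "control") is an assumption chosen to match how boundary distances are reported in the Methods.

```python
import numpy as np


def decision_value(x, w, b):
    """Signed score w . x + b; proportional to the distance from the hyperplane w . x + b = 0."""
    return float(np.dot(w, x) + b)


def classify(x, w, b):
    """Label a subject by which side of the hyperplane it falls on.

    Sign convention assumed here: positive -> "control", otherwise -> "FASD".
    """
    return "control" if decision_value(x, w, b) > 0 else "FASD"


# Hypothetical weights and bias for two standardized regional volumes.
w = np.array([0.8, 0.5])
b = -0.1

print(classify(np.array([1.0, 0.2]), w, b))    # score 0.8 -> control
print(classify(np.array([-0.5, -0.4]), w, b))  # score -0.7 -> FASD
```

With 87 regional volumes, w has 87 entries and the boundary is a hyperplane in 87-dimensional space; the classification rule is otherwise unchanged.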
2 | METHODS
2.1 | FASD/typically developing subjects training and testing datasets
Two previously collected independent MRI datasets were used to generate and validate a predictive model. The training data were collected at four different sites across Canada as part of the NeuroDevNet project on FASD (Reynolds et al., 2011); this multisite dataset was selected for training so that the resulting models would generalize across centers and scanners. A total of 181 children and adolescents (FASD and healthy controls) underwent brain MRI at the four sites, but 21 subjects (11 FASD, 10 controls) were excluded for poor structural image quality. The remaining 160 subjects included 79 children with FASD (12.7 ± 3.2 years, 35 males) and 81 healthy unexposed controls (11.9 ± 3.4 years, 32 males). Group analysis of brain volumes for the healthy control and FASD groups in this cohort has been reported elsewhere. FASD participants were recruited from six clinics across Canada and either had an alcohol-related disorder in accordance with the Canadian Guidelines for diagnosis of FASD (Chudley et al., 2005) or had confirmed PAE.
The FASD participants in the training data included seven with fetal alcohol syndrome (FAS), 13 with partial FAS (pFAS), 38 with alcohol-related neurodevelopmental disorder (ARND), and 21 with confirmed PAE. In line with updated diagnostic guidelines (Cook et al., 2016), the diagnosed subtypes were combined into two diagnostic groups: 20 FASD with sentinel facial features (FAS or pFAS) and 38 FASD without sentinel facial features (ARND). PAE subjects remained a separate group, as the diagnostic guidelines characterize this group as "at risk of neurodevelopmental disorder and FASD." For machine learning classification, all FASD subtypes were labeled as a single group.
The testing data for model validation were collected under a CIHR project on brain development. Participants with brain MRI included 67 with FASD (12.1 ± 3.3 years, 38 males) and 74 controls (11.5 ± 3.5 years, 42 males). Notably, 57 FASD and 66 control participants were included in our previous study on volumes/DTI/cortical thickness. The other 10 FASD participants were included in a much earlier diffusion MRI study (Lebel et al., 2008) and did not overlap with the FASD participants of that previous study. An additional eight controls, randomly selected males from a typical development cohort (Narvacan, Treit, Camicioli, Martin, & Beaulieu, 2017), were added to provide a similar ratio of males to females in the control and FASD groups. All three studies combined for the test data used the same three-dimensional (3D) MPRAGE protocol on the same scanner at the University of Alberta. Participants in the FASD group were recruited primarily through an FASD diagnostic clinic at the Glenrose Rehabilitation Hospital in Edmonton, AB, and were diagnosed based on Canadian guidelines (Chudley et al., 2005) and the 4-digit diagnostic code (Astley, 2004). The FASD participants in the testing data included 10 FAS, four pFAS, two ARND, one fetal alcohol effect (FAE), seven neurobehavioral disorder alcohol exposed (NBD:AE), nine static AE, or FASD) consistent with updated diagnostic guidelines (Cook et al., 2016). For testing the machine learning classification model, all FASD subtypes were labeled as a single group. Further demographic information, including ethnicity and current medication, was collected via questionnaire and is summarized for the training and testing cohorts in Tables 1 and 2, respectively.
Only behavioral tests that were conducted in the majority of participants in both the training and testing cohorts were included for analysis in the current study: the Woodcock Johnson III Tests of Achievement, which evaluated mathematical and quantitative reasoning skills (Woodcock, McGrew, & Mather, 2001), and the Woodcock Reading Mastery Tests-Revised (WRMT-R), which provided a comprehensive assessment of reading ability (Woodcock, 1998). Cognitive test results are presented for the training and testing groups in Tables 1 and 2, respectively.

| Image acquisition
The training "NeuroDevNet" MRI data were acquired at four MR time 4:29 min. Other images acquired over 25 min included T2-weighted, fluid-attenuated inversion recovery, resting-state functional (for NeuroDevNet), and DTI scans; however, none of these are the focus of the current report on brain volumes.

| Automated brain segmentation
In this study, only regional brain volumes, rather than other imaging metrics, were used as predictors for classification because reductions in regional brain volumes have been the most commonly reported differences in FASD populations relative to controls (Donald et al., 2015; Lebel et al., 2011). Regional brain volumes were extracted from T1-weighted structural images using the automated segmentation pipeline FreeSurfer version 5.3 (Fischl, 2012). Volume loss related to FASD has been observed in numerous brain regions (Donald et al., 2015), with some regions reported consistently, including subcortical gray matter, total white matter, corpus callosum, and regions of the cortex. Hence, volumes of 87 regions were selected for classification analysis, including subcortical gray matter (Morey et al., 2010); the cerebellum and brain stem were excluded due to partial coverage in many participants. Each included volume was then standardized (i.e., mean-centered to zero and scaled to unit variance over the entire training/testing datasets), because support vector machines are sensitive to the relative scale of their input features.
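A minimal sketch of the standardization step, assuming the volumes are held in a subjects × regions array. The text does not state unambiguously whether the training and testing datasets were pooled or standardized separately; the helper below z-scores whatever array it is given, so either choice is expressed by how it is called. It uses the population standard deviation, which is what scikit-learn's `StandardScaler` also uses. The function and array names are hypothetical.

```python
import numpy as np


def zscore_columns(volumes):
    """Mean-center each region (column) to zero and scale it to unit variance.

    volumes: array of shape (n_subjects, n_regions), e.g. 87 FreeSurfer volumes.
    Uses the population standard deviation (ddof=0), as StandardScaler does.
    """
    volumes = np.asarray(volumes, dtype=float)
    mean = volumes.mean(axis=0)
    std = volumes.std(axis=0)
    return (volumes - mean) / std


# Hypothetical usage (array names are illustrative):
# train_z = zscore_columns(train_volumes)   # standardize within the training set
# test_z = zscore_columns(test_volumes)     # standardize within the testing set
```

Standardizing each dataset over its own subjects also absorbs a global scale offset between the two cohorts, which is one plausible reason for the "over entire training/testing datasets" wording; that interpretation is an assumption.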

| Predictive model training
Using the brain volumes from the training data as input, a linear support vector machine (SVM) was trained to predict FASD or control using the scikit-learn machine learning toolbox version 0.18.1 (Pedregosa et al., 2011). The SVM algorithm was selected based on its accurate performance in other neurological and psychiatric disease classification studies (Orrù, Pettersson-Yeo, Marquand, Sartori, & Mechelli, 2012), and a linear kernel was used so that the brain regions contributing most to the model could be identified. The multisite data were selected for training so that the classification model generated by the SVM would be robust to between-site variation in regional brain volume measurements and would perform consistently across sites. A single classification model was generated by fitting the SVM hyperparameter "C" on the training data using leave-one-out cross-validation with an internal 10-fold cross-validation for parameter selection. Within each leave-one-out fold, the soft-margin constant "C" was selected from a list of candidate values (10⁻⁴, 10⁻³, 10⁻², 1, 10, 100) as the value with the highest mean accuracy over the internal 10-fold validation. A single value of "C" was then chosen as the mode of the values selected across all leave-one-out folds, and a single classification model was fit to the entire training data with this value. This model was then used to predict FASD or control for each subject in the test data.
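The C-selection scheme described above can be sketched with scikit-learn as follows. This is an illustrative reconstruction, not the authors' code. It assumes integer-coded labels, stratified 10-fold inner splits (what `GridSearchCV` uses for classifiers when `cv` is an integer), and ties in the mode broken by first occurrence, a rule the text does not specify. The held-out subject in each leave-one-out fold could also be scored to obtain a leave-one-out training accuracy; that step is omitted here. The helper names are hypothetical.

```python
from collections import Counter

import numpy as np
from sklearn.model_selection import GridSearchCV, LeaveOneOut
from sklearn.svm import SVC

# Candidate soft-margin constants listed in the text.
C_GRID = [1e-4, 1e-3, 1e-2, 1, 10, 100]


def select_c_by_nested_cv(X, y, c_grid=C_GRID, inner_folds=10):
    """Pick C in every leave-one-out fold by an inner k-fold grid search,
    then return the most frequently selected value (the mode)."""
    X, y = np.asarray(X), np.asarray(y)
    chosen = []
    for train_idx, _ in LeaveOneOut().split(X):
        search = GridSearchCV(
            SVC(kernel="linear"),
            param_grid={"C": c_grid},
            cv=inner_folds,  # integer cv -> stratified k-fold for classifiers
            scoring="accuracy",
        )
        search.fit(X[train_idx], y[train_idx])
        chosen.append(search.best_params_["C"])
    # Counter.most_common orders ties by first occurrence (tie rule assumed).
    return Counter(chosen).most_common(1)[0][0]


def fit_final_model(X_train, y_train):
    """Fit one linear SVM on all training data using the modal C."""
    best_c = select_c_by_nested_cv(X_train, y_train)
    model = SVC(kernel="linear", C=best_c).fit(X_train, y_train)
    return model, best_c
```

Taking the mode over folds yields one hyperparameter and therefore one final model whose weights can be inspected, at the cost of some optimism in the training-set cross-validation estimate; the independent test cohort is what guards against that.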

| Sensitivity of model to participant demographics
To test the sensitivity of the classifier to FASD subgroup, the numbers of true positives (TPs) and false negatives (FNs) were compared between the three subtypes (FASD with sentinel facial features, FASD without sentinel facial features, and confirmed PAE without an official FASD diagnosis). Next, the signed distance from the SVM decision boundary was calculated for each subject in the test data as a measure of how closely a subject matched the FASD prediction model. A positive boundary distance indicates that the subject was predicted "control," whereas a negative value indicates that the subject was predicted "FASD." For comparison between models, distance values were scaled by the maximum absolute distance of the test samples.
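The boundary-distance measure can be sketched as below, assuming a fitted scikit-learn `SVC` with a linear kernel and a hypothetical label coding of FASD = 0 and control = 1. scikit-learn's `decision_function` returns w·x + b, which is positive on the side of `model.classes_[1]`, so with this coding positive values correspond to "control" as in the text. Dividing by ‖w‖ converts the score into a geometric distance; because the subsequent scaling by the maximum absolute test distance cancels that constant, the scaled values are identical either way.

```python
import numpy as np
from sklearn.svm import SVC

FASD, CONTROL = 0, 1  # hypothetical label coding; sorted classes_ puts CONTROL second


def scaled_boundary_distance(model, X_test):
    """Signed distance of each test subject to a linear SVM boundary, scaled to [-1, 1].

    Positive -> side of model.classes_[1] (CONTROL here); negative -> FASD side.
    """
    raw = model.decision_function(X_test)       # w . x + b for each subject
    dist = raw / np.linalg.norm(model.coef_)    # geometric distance to the hyperplane
    return dist / np.max(np.abs(dist))          # scale by the maximum absolute test distance


# Usage on synthetic data (illustrative only).
rng = np.random.RandomState(1)
X = rng.normal(size=(60, 4))
y = np.array([FASD] * 30 + [CONTROL] * 30)
X[y == CONTROL, 0] += 2.5
model = SVC(kernel="linear", C=1.0).fit(X, y)
distances = scaled_boundary_distance(model, X)
```

Scaling by the maximum absolute distance makes values from differently fitted models (e.g., pooled versus sex-specific) comparable on a common [-1, 1] range.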
Regional brain volumes are known to differ between males and females (Cahill, 2006; Cosgrove et al., 2007) and change throughout childhood/adolescence with regionally specific developmental trajectories (Giedd et al., 1999; Narvacan et al., 2017).

| Sex-specific modeling
Following these primary analyses, two approaches were taken to address sex-related differences in model performance. Approach 1: Adding sex as a control variable in a linear regression is a common approach for addressing sex-related variation in classification studies (e.g., Fair et al., 2013; Nielsen et al., 2013). In this study, the entire modeling procedure was repeated with brain volumes adjusted for sex using a linear regression prior to model training.
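A minimal sketch of Approach 1, assuming an ordinary least-squares fit of each regional volume on an intercept and a 0/1 sex indicator. The text does not say whether the regression was estimated on all training subjects or on controls only, nor exactly how it was applied to the test data; the sketch assumes the coefficients are estimated on the training data and the same coefficients are then used to adjust the test data. Function names and the sex coding are hypothetical.

```python
import numpy as np


def fit_sex_adjustment(volumes, sex):
    """OLS fit of volume = b0 + b1 * sex for every region at once.

    volumes: (n_subjects, n_regions); sex: (n_subjects,) coded 0/1 (coding assumed).
    Returns coefficients of shape (2, n_regions): row 0 intercepts, row 1 sex effects.
    """
    design = np.column_stack([np.ones(len(sex)), np.asarray(sex, dtype=float)])
    coefs, *_ = np.linalg.lstsq(design, np.asarray(volumes, dtype=float), rcond=None)
    return coefs


def apply_sex_adjustment(volumes, sex, coefs):
    """Return residual volumes after removing the fitted intercept and sex effect."""
    design = np.column_stack([np.ones(len(sex)), np.asarray(sex, dtype=float)])
    return np.asarray(volumes, dtype=float) - design @ coefs


# Hypothetical usage: estimate on the training data, reuse for the test data.
# coefs = fit_sex_adjustment(train_volumes, train_sex)
# train_adj = apply_sex_adjustment(train_volumes, train_sex, coefs)
# test_adj = apply_sex_adjustment(test_volumes, test_sex, coefs)
```

With a binary sex regressor, the fitted values are simply the per-sex means, so on the training data each sex's adjusted volumes average to zero; any sex difference in the FASD effect itself is left in place.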
Approach 2: The same modeling procedure was performed on raw

| Sex-specific models
To further investigate the effect of sex on model performance, the original classification model, trained on the entire training set, was evaluated separately for males and females in the test cohort. Classification accuracies were similar for males (76%, p = .0005) and females (77%, p = .0005), but sensitivity was lower and specificity higher for males (sensitivity 53%, specificity 98%) than for females (sensitivity 79%, specificity 75%). In other words, 1/42 male controls were misclassified as FASD, whereas 8/32 female controls were misclassified as FASD. A larger sex difference was observed in the FASD groups, where 18/38 male FASD participants were misclassified as controls, whereas only 6/29 female FASD participants were misclassified.
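The per-sex percentages can be checked against confusion counts. The sketch below recomputes them with FASD as the positive class, using the counts implied by the text and by the Figure 4 caption: 20 of 38 male FASD participants correctly classified (18 misclassified, the count consistent with the reported 53% sensitivity), 41 of 42 male controls, 23 of 29 female FASD participants, and 24 of 32 female controls. Summing the two sexes should reproduce the overall test performance reported in the Abstract (sensitivity 64%, specificity 88%, accuracy 77%).

```python
def rates(tp, fn, tn, fp):
    """Sensitivity, specificity, and accuracy with FASD as the positive class."""
    sensitivity = tp / (tp + fn)
    specificity = tn / (tn + fp)
    accuracy = (tp + tn) / (tp + fn + tn + fp)
    return sensitivity, specificity, accuracy


# Counts for the unadjusted model on the test cohort, as implied by the text.
counts = {
    "male": dict(tp=20, fn=18, tn=41, fp=1),
    "female": dict(tp=23, fn=6, tn=24, fp=8),
}
counts["all"] = {k: counts["male"][k] + counts["female"][k] for k in ("tp", "fn", "tn", "fp")}

for group, c in counts.items():
    sens, spec, acc = rates(**c)
    print(f"{group}: sensitivity {sens:.0%}, specificity {spec:.0%}, accuracy {acc:.0%}")
```

The near-identical accuracies for males and females hide very different error profiles, which is why sensitivity and specificity are reported separately.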
The first approach for reducing sex-related bias in sensitivity/specificity was to fit a model based on sex-adjusted volumes. This approach performed moderately well on the test data; however, sensitivity remained low for males relative to females: male accuracy 72% (p-value = .0005), sensitivity 58%, specificity 86%; female accuracy 74% (p-value = .0005), sensitivity 72%, specificity 75%. The second approach for reducing sex-related bias in FASD prediction was to create separate models for males and females. Both models performed moderately well on the test data and had similar sensitivity and specificity between males and females: male accuracy 70% (p-value = .0005), sensitivity 68%, and specificity 71%; female accuracy 67% (p-value = .01), sensitivity 62%, and specificity 72%. Notably, creating separate models for males and females increased sensitivity in the male FASD group at the cost of decreased specificity and overall accuracy. Classification performance and classifier boundary distance, separated by sex and group, are presented in Figure 4.
FIGURE 2 Performance of the multivariate brain volume prediction model (solid red line) compared to models generated using each brain region volume separately (dashed lines). Both the accuracy of the models on the test data and leave-one-out cross-validation accuracy on the training data are shown. Models are listed from highest to lowest accuracy and are presented if they performed significantly better than chance (permutation test, p < .0005) in the test cohort. The multivariate model outperformed all univariate models in both the training and testing data. Notably, 8 of these 13 regions are deep gray matter structures, including bilateral caudate and bilateral thalamus.
Notably, the accuracy using diffusion MRI features extracted from the corpus callosum (Zhang et al., 2019) was lower than the accuracy reported in the current study using brain volumes (77%); however, the same study reported its highest accuracies using features derived from other physiological/behavioral measurements (e.g., eye tracking data 76% and psychometric data 78%).
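The permutation-test p-values quoted with the results (e.g., p = .0005, and p < .0005 in the Figure 2 caption) are not described in detail in this section. One common scheme for a held-out test set, sketched here as an assumption rather than as the authors' procedure, shuffles the training labels, refits the classifier, scores it on the untouched test set, and reports the fraction of permutations that match or beat the observed accuracy, with a +1 correction so that p is never zero. For brevity the sketch uses a fixed C instead of re-running the nested C selection inside every permutation; the function name is hypothetical.

```python
import numpy as np
from sklearn.svm import SVC


def permutation_p_value(X_train, y_train, X_test, y_test, C=1.0, n_permutations=999, seed=0):
    """Test accuracy of a linear SVM and its label-permutation p-value.

    The null distribution comes from refitting on shuffled training labels and
    scoring on the unchanged test set:
    p = (count(null >= observed) + 1) / (n_permutations + 1).
    """
    rng = np.random.default_rng(seed)
    observed = SVC(kernel="linear", C=C).fit(X_train, y_train).score(X_test, y_test)
    null = np.empty(n_permutations)
    for i in range(n_permutations):
        y_shuffled = rng.permutation(y_train)
        null[i] = SVC(kernel="linear", C=C).fit(X_train, y_shuffled).score(X_test, y_test)
    p_value = (np.sum(null >= observed) + 1) / (n_permutations + 1)
    return observed, p_value
```

Under this scheme the smallest attainable p-value is 1 / (n_permutations + 1), so the number of permutations bounds how small a reported p-value can be.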
Other studies have classified FASD participants based on other modalities, such as epigenetic DNA methylation features, where a predictive model trained on a cohort overlapping the training dataset of the current study achieved 83% accuracy in predicting FASD (Lussier et al., 2018). Additionally, features extracted from 3D facial laser scans achieved ~80-90% accuracy in identifying individuals with FAS (Fang et al., 2008), a subtype of FASD that exhibits sentinel facial features; this is the same subtype that was classified with high accuracy using multivariate brain volumes in the current study (11/14 FASD participants with sentinel facial features correctly classified in the test cohort). In a three-way classification task of FASD, ADHD, and control participants, a 77% classification accuracy was achieved using features extracted from eye tracking data collected while participants attended to videos (Tseng et al., 2013); such measures were not part of the current analysis on brain volumes. Multimodal classification of FASD using features derived from psychometric and eye tracking data achieved 83% accuracy (Zhang et al., 2019) but showed no additional gain when diffusion MRI was included; however, the sample size in that study was limited (22 controls, 24 FASD).
Changes in total brain volume, as well as unique regional trajectories of subcortical and cortical gray matter development during childhood/adolescence (Giedd et al., 1999; Narvacan et al., 2017), may affect classifier performance. In a supplementary analysis (data not shown), no difference in age was observed between correctly and incorrectly classified controls, or between correctly and incorrectly classified individuals with FASD, in the test cohort, suggesting that classification performance was not confounded by age.

| Relating multivariate and univariate analysis of FASD regional brain volumes
FIGURE 4 Distance from classification boundary for each subject in the test data, separated by group (control, circle; fetal alcohol spectrum disorder [FASD], diamond) and sex (male, blue; female, red), for the unadjusted multivariate model (a), the multivariate model using regional brain volumes adjusted for sex (b), and separate classification models for males and females (c). Positive values indicate that the subject was classified to the control group, whereas negative values indicate that the subject was classified to the FASD group. Most misclassifications in the unadjusted model were male FASD participants labeled as controls (18/38 misclassified), and a notable number of female controls were incorrectly labeled FASD (8/32 misclassified). Adjusting brain volumes for sex reduced the imbalance in specificity between males and females, whereas creating separate models reduced the sex-related imbalance in both specificity and sensitivity.
In this study, the multivariate FASD classification model outperformed all univariate models based on individual brain region volumes by ~5% in both the test and training cohorts. This result suggests that a pattern of volume change involving multiple brain structures is more discriminative of children/adolescents with FASD than any one brain region on its own. The regions whose univariate models achieved above-chance accuracy are consistent with previous studies reporting volume loss associated with FASD (Donald et al., 2015). In the current study, a model trained using only the left caudate had a test accuracy only 5% lower than the model generated from all brain regions together. Notably, the caudate was one of the first brain structures reported to differ in association with PAE (Mattson et al., 1996).
Since then, the caudate has been reported in animal models to be one of the regions more vulnerable to ethanol-induced apoptosis (Young & Olney, 2006), which may underlie the volume reductions associated with prenatal exposure to alcohol in children and adolescents (Astley et al., 2009; Cortese et al., 2006; Riikonen et al., 2005). Caudate volume has also been associated with deficits in both cognitive control and verbal learning/recall in children/adolescents with FASD (Fryer et al., 2012). A larger effect of PAE on the caudate relative to other basal ganglia structures may reflect larger deficits in FASD in higher order cognitive functions (e.g., executive function, problem solving) than in motor functions. Additionally, a more recent study has demonstrated that shape-based features of caudate asymmetry can be combined with facial morphology features to better discriminate controls from those with FAS (Suttie et al., 2018). Taken together, these findings are in line with previous studies showing that facial dysmorphic features are related to more severe volumetric reductions in FASD (Astley et al., 2009; Roussotte et al., 2012), which may reflect the timing of ethanol exposure between 3 and 4 weeks postgestation in humans, when the brain and face are early in their development (Godin et al., 2010; Godin, Dehart, Parnell, O'Leary-Moore, & Sulik, 2011). Notably, more extensive volumetric reductions in the dysmorphic FASD participants could also be related to a higher level of prenatal ethanol exposure (although exposure data were unavailable in our study), complicating the face-brain interpretation.
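The univariate-versus-multivariate comparison discussed in this section can be sketched by fitting the same kind of linear SVM once per region, on that region's volume alone, and once on all regions together, then scoring each on the test set. The function names, the fixed C, and the synthetic data below are illustrative assumptions; how C was chosen for the univariate models is not described in this section.

```python
import numpy as np
from sklearn.svm import SVC


def univariate_test_accuracies(X_train, y_train, X_test, y_test, region_names, C=1.0):
    """Fit one linear SVM per region on that region's volume alone and return
    test accuracies sorted from highest to lowest."""
    scores = {}
    for j, name in enumerate(region_names):
        model = SVC(kernel="linear", C=C).fit(X_train[:, [j]], y_train)
        scores[name] = model.score(X_test[:, [j]], y_test)
    return dict(sorted(scores.items(), key=lambda item: item[1], reverse=True))


def multivariate_test_accuracy(X_train, y_train, X_test, y_test, C=1.0):
    """Fit a single linear SVM on all regions together and return its test accuracy."""
    model = SVC(kernel="linear", C=C).fit(X_train, y_train)
    return model.score(X_test, y_test)


# Illustrative synthetic data: only the first "region" differs between groups.
rng = np.random.RandomState(0)
names = ["region_a", "region_b", "region_c", "region_d"]
y_tr = np.tile([0, 1], 100)
y_te = np.tile([0, 1], 100)
X_tr = rng.normal(size=(200, 4))
X_te = rng.normal(size=(200, 4))
X_tr[y_tr == 1, 0] += 1.5
X_te[y_te == 1, 0] += 1.5
uni = univariate_test_accuracies(X_tr, y_tr, X_te, y_te, names)
multi = multivariate_test_accuracy(X_tr, y_tr, X_te, y_te)
```

In real data the multivariate model can gain over the best single region only when several regions carry partly independent information, which is the pattern the ~5% advantage reported above suggests.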
Beyond ADHD, the participants with FASD in this study had a wide range of comorbid diagnoses (e.g., oppositional defiant disorder). Importantly, to be of clear clinical use, a classification model would need to discriminate individuals with FASD from those with other neurodevelopmental disorders. Results from this study demonstrate that individuals with FASD can be discriminated from controls using regional brain volumes; however, it is unknown whether regional brain volumes, or the same classification model, could discriminate individuals with FASD from those with other neurodevelopmental disorders.
The investigation of model weights can also aid in identifying regions that may be affected in FASD but are not detected by univariate analysis alone. In this study, both the left and right pars triangularis of the frontal lobe contributed heavily to the model. Notably, the volume of the bilateral pars triangularis has been associated with reading disorders such as dyslexia (Eckert et al., 2003), and language deficits have repeatedly been observed in complex language tasks in participants with FASD (Becker, Warr-Leeper, & Leeper, 1990; Mattson, Riley, Gramling, Delis, & Jones, 1998). Although the frontal lobe has shown volume loss in children/adolescents prenatally exposed to alcohol, to our knowledge pars triangularis volume has not previously been associated with FASD. Given that the pars triangularis regions were absent from the univariate models that performed better than chance, this result suggests that, in the context of other FASD-related regional volume changes, a multivariate model can extract additional information about structural change that univariate analysis alone does not detect.
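One way to read off heavily weighted regions, such as the pars triangularis discussed here, is to rank the coefficients of a fitted linear SVM by absolute value. This is a sketch under two assumptions: the inputs were standardized, so coefficients are on comparable scales, and the label coding is FASD = 0, control = 1, so a positive weight means that a larger volume pushes a subject toward the control side. Weight magnitude is only a rough guide to contribution, particularly when regional volumes are correlated, and the exact weight-inspection procedure used in the study is not described in this section. The region names in the example are generic placeholders.

```python
import numpy as np
from sklearn.svm import SVC


def rank_regions_by_weight(model, region_names, top_k=10):
    """Return (region, weight) pairs of a fitted linear SVM, sorted by |weight|."""
    weights = model.coef_.ravel()
    order = np.argsort(-np.abs(weights))[:top_k]
    return [(region_names[i], float(weights[i])) for i in order]


# Illustrative synthetic, standardized data; labels coded FASD = 0, control = 1.
rng = np.random.RandomState(3)
names = ["region_a", "region_b", "region_c", "region_d"]
y = np.tile([0, 1], 50)
X = rng.normal(size=(100, 4))
X[y == 1, 2] += 3.0  # controls have larger volume in region_c
model = SVC(kernel="linear", C=1.0).fit(X, y)
ranking = rank_regions_by_weight(model, names, top_k=4)
```

Here `region_c` ranks first with a positive weight, matching the synthetic setup in which controls have the larger volume.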

| FASD classification with sex-specific models
To date, the most common approach for dealing with sex-related variation in large classification studies of neurodevelopmental disorders is to perform classification on sex-adjusted volumes obtained from a multivariable linear regression with sex as a covariate (e.g., Fair et al., 2013; Zhou et al., 2014). However, in neurodevelopmental disorders such as FASD, where reductions in regional brain volumes appear to be larger in males than in females (Chen et al., 2012; Dudek et al., 2014; Treit et al., 2017), assuming the same effect of sex on volume in controls and FASD is expected to reduce, but not eliminate, between-sex bias in sensitivity/specificity. Results from the current study demonstrate experimentally that when sex is not accounted for in FASD classification, sensitivity/specificity can differ greatly for males (sensitivity 53%, specificity 98%) compared to females (sensitivity 79%, specificity 75%), and that this disparity can be reduced at the expense of accuracy by using sex-adjusted volumes (male accuracy 72%, sensitivity 58%, specificity 86%; female accuracy 74%, sensitivity 72%, specificity 75%). Furthermore, this study proposes building FASD classification models separately for males and females, which further reduced the imbalance in sensitivity/specificity, albeit with a larger decrease in accuracy (male accuracy 70%, sensitivity 68%, and specificity 71%; female accuracy 67%, sensitivity 62%, and specificity 72%). Several neurophysiological/neurochemical effects of PAE are reported to be greater in males than in females, including reductions in long-term potentiation (Sickmann et al., 2014), increases in dopamine D1R binding (Converse et al., 2014), and reduced sensitivity to testosterone (Lan, Hellemans, Ellis, Viau, & Weinberg, 2009).
The heavier weighting of cortical regions in the female model is surprising, given that previous studies have reported no significant differences in the volume of cortical regions in females with FASD (Chen et al., 2012) and less pronounced effects of PAE on measures of cortical thickness relative to subcortical volume. It appears that a multivariate pattern of cortical volume reduction may discriminate females with FASD from controls more accurately than PAE-related volume change within individual (i.e., univariate) cortical regions. Overall, results from this study suggest that there is value in modeling FASD-related regional brain volume change separately for males and females. Notably, the classification differences reported here between males and females could be confounded by sex-by-group imbalances in demographics. However, no group-by-sex interaction effects were observed in the test cohort for any of the demographic variables listed in Table 2 (data not shown), suggesting that demographic imbalances are not driving the observed classification differences between males and females. In the training cohort, a small difference in age was observed between male control (11.3 ± 3.5 years) and male FASD (13.3 ± 2.7 years) participants, potentially affecting the weightings of the male FASD classification model. However, the male classification model heavily weights subcortical regions whose volumes have been shown to change minimally over childhood/adolescence in both longitudinal and cross-sectional samples (Narvacan et al., 2017), suggesting that age differences are unlikely to be driving the model weightings presented in this study.
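The separate male and female models discussed in this section can be sketched as one classifier per sex, with each test subject routed to the model for their sex. The sketch assumes integer labels (FASD = 0, control = 1), a sex array with arbitrary codes such as "M"/"F", and a fixed C for brevity; the text indicates that the same modeling procedure (including the nested selection of C) was applied for each sex, which is omitted here. Function names and the synthetic data are hypothetical.

```python
import numpy as np
from sklearn.svm import SVC


def fit_sex_specific_models(X, y, sex, C=1.0):
    """Fit one linear SVM per sex code using only that sex's subjects."""
    return {s: SVC(kernel="linear", C=C).fit(X[sex == s], y[sex == s]) for s in np.unique(sex)}


def predict_sex_specific(models, X, sex):
    """Route each subject to the model for their sex; labels assumed to be integers."""
    missing = set(np.unique(sex)) - set(models)
    if missing:
        raise ValueError(f"no model fitted for sex codes: {sorted(missing)}")
    y_pred = np.empty(len(X), dtype=int)
    for s, model in models.items():
        in_sex = sex == s
        if in_sex.any():
            y_pred[in_sex] = model.predict(X[in_sex])
    return y_pred


# Illustrative synthetic data in which the FASD-related reduction is in a
# different region for each sex (FASD = 0, control = 1).
rng = np.random.RandomState(4)
sex = np.array(["M"] * 100 + ["F"] * 100)
y = np.tile([0, 1], 100)
X = rng.normal(size=(200, 3))
X[(sex == "M") & (y == 0), 0] -= 3.0
X[(sex == "F") & (y == 0), 1] -= 3.0
models = fit_sex_specific_models(X, y, sex)
y_pred = predict_sex_specific(models, X, sex)
```

Because each sex gets its own boundary, the models can weight different regions, which is the kind of sex-specific regional contribution described above, at the cost of training each model on roughly half the data.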

| Study limitations and future directions
There are several limitations in this study, primarily related to imbalanced demographics and comorbidities in the training/testing FASD groups relative to controls. The control groups in this study were primarily of Caucasian descent, whereas about half of the FASD participants self-identified as Indigenous, potentially confounding classification results. However, in a follow-up analysis, classification sensitivity for FASD differed minimally between ethnic categories in the testing cohort (Indigenous 63%; Caucasian 67%; other 60%; unknown 64%), suggesting that ethnicity was not influencing classification performance. ADHD is a common comorbid diagnosis within FASD populations, with an estimated prevalence of >70% (Burd, Klug, Martsolf, & Kerbeshian, 2003), and was highly prevalent in the training/testing cohorts included in the current study (training FASD: 50%; testing FASD: 49%). Additionally, a large proportion of FASD participants in this study were on medication regimens that varied widely between individuals, and these participants were not asked to refrain from taking medication during the study. Such confounds in comorbid diagnosis and medication may affect reported cognitive scores and classification results in the FASD group. A secondary analysis showed only minor differences in classification sensitivity between FASD participants with a comorbid ADHD diagnosis (67%) and those without (62%), and between FASD participants on different medications (stimulants 58%, atypical antipsychotics 59%, antidepressants 60%, other medication 67%).
This similar sensitivity across demographic categories suggests that, even though the FASD classification model was generated from imbalanced control/FASD training data, the model itself captures a discriminative pattern of brain volume difference associated with PAE that does not appear to reflect differences in ethnicity, comorbid diagnosis, or medication regimen.
The training and testing FASD cohorts of the current study contained both individuals with a formal FASD diagnosis and individuals with confirmed alcohol exposure but no diagnosis. Importantly, the classification results from the test cohort showed similar sensitivity for FASD participants without sentinel facial features (test sensitivity 60%) and for those in the PAE (nondiagnosed) group (test sensitivity 61%), suggesting that regional brain volumes were similarly affected in the diagnosed and undiagnosed individuals. In a secondary analysis excluding the PAE group (data not shown), accuracy and sensitivity in the test cohort decreased (accuracy 74%, sensitivity 53%, specificity 88%) relative to when the PAE (nondiagnosed) group was included (accuracy 77%, sensitivity 64%, specificity 88%), supporting the inclusion of the PAE group in the analysis.
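Subgroup sensitivities such as those reported here (e.g., 60% for FASD without sentinel facial features and 61% for the PAE group) can be computed by restricting the true-positive/false-negative count to FASD participants within each subgroup. The string labels, subgroup names, and toy data below are hypothetical.

```python
import numpy as np


def sensitivity_by_subgroup(y_true, y_pred, subgroup, positive="FASD"):
    """TP / (TP + FN) among positive-class subjects, computed per subgroup."""
    y_true = np.asarray(y_true)
    y_pred = np.asarray(y_pred)
    subgroup = np.asarray(subgroup)
    is_pos = y_true == positive
    result = {}
    for name in np.unique(subgroup[is_pos]):
        in_group = is_pos & (subgroup == name)
        tp = np.sum(y_pred[in_group] == positive)
        result[str(name)] = tp / in_group.sum()
    return result


# Hypothetical toy labels; controls carry the subgroup "none" and are ignored.
y_true = ["FASD", "FASD", "FASD", "FASD", "FASD", "control", "control"]
y_pred = ["FASD", "control", "FASD", "FASD", "control", "control", "FASD"]
subgroup = ["sentinel", "sentinel", "PAE", "PAE", "PAE", "none", "none"]
print(sensitivity_by_subgroup(y_true, y_pred, subgroup))
```

The same function applies unchanged to the ethnicity, ADHD-comorbidity, and medication breakdowns reported in the limitations, by passing the corresponding category for each participant.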

| CONCLUSIONS
In this study, a multivariate classification model was generated for discriminating children/adolescent controls from those with FASD. The model outperformed univariate models in discriminating FASD from controls and drew predictive contributions from regions with known volumetric reduction in FASD. Additionally, many FASD participants in the test data had strongly negative boundary distances or low left caudate volumes, values at which there was little to no overlap with controls, suggesting that these measures should be investigated as potential indicators of FASD. Models generated separately for males and females had lower accuracy than the model containing all participants, but these models were more balanced in sensitivity and specificity, suggesting that sex should be taken into account in brain volume-based classification of FASD. Overall, this study shows the value of multivariate analysis of brain volume for the classification of FASD and for identifying brain regions affected in children and adolescents prenatally exposed to alcohol.

DATA AVAILABILITY STATEMENT
Research data are not available.