Adjusting for allometric scaling in ABIDE I challenges subcortical volume differences in autism spectrum disorder

Abstract Inconsistencies across studies investigating subcortical correlates of autism spectrum disorder (ASD) may stem from small sample size, sample heterogeneity, and omitting or linearly adjusting for total brain volume (TBV). To properly adjust for TBV, brain allometry—the nonlinear scaling relationship between regional volumes and TBV—was considered when examining subcortical volumetric differences between typically developing (TD) and ASD individuals. Autism Brain Imaging Data Exchange I (ABIDE I; N = 654) data was analyzed with two methodological approaches: univariate linear mixed effects models and multivariate multiple group confirmatory factor analyses. Analyses were conducted on the entire sample and in subsamples based on age, sex, and full scale intelligence quotient (FSIQ). A similar ABIDE I study was replicated and the impact of different TBV adjustments on neuroanatomical group differences was investigated. No robust subcortical allometric or volumetric group differences were observed in the entire sample across methods. Exploratory analyses suggested that allometric scaling and volume group differences may exist in certain subgroups defined by age, sex, and/or FSIQ. The type of TBV adjustment influenced some reported volumetric and scaling group differences. This study supports the absence of robust volumetric differences between ASD and TD individuals in the investigated volumes when adjusting for brain allometry, expands the literature by finding no group difference in allometric scaling, and further suggests that differing TBV adjustments contribute to the variability of reported neuroanatomical differences in ASD.

of diagnoses reported by parents on national surveys (Kogan et al., 2018). ASD is additionally 3-4 times more prevalent in boys than girls (Fombonne, 2009) and is accompanied by an intellectual disability (intelligence quotient, IQ < 70) in one third of patients, while 25% are in the borderline IQ range (from 70 to 85; Christensen et al., 2016).
Considering that neuroanatomical markers within the brain are more closely associated with the symptoms of a condition, the present study investigated neuroanatomical differences in the Autism Brain Imaging Data Exchange I (ABIDE I; Di Martino et al., 2014; N = 1,112) between ASD and typically developing (TD) individuals in terms of their regional (i.e., subcortical and cortical) volumes and the scaling relationship between their regional volumes and total brain volume (TBV; the sum of total gray matter [GM] and white matter [WM]).
However, reported neuroanatomical group differences in this literature are largely inconsistent and difficult to replicate (Lai, Lombardo, Chakrabarti, & Baron-Cohen, 2013; Lenroot & Yeung, 2013; Riddle et al., 2017; Zhang et al., 2018). For instance, van Rooij et al. (2017) reported that ASD subjects between 2 and 64 years old in the ENIGMA cohort (N ASD = 1,571) had smaller amygdala, putamen, pallidum, and nucleus accumbens volumes, regions involved in sociomotivational, cognitive, and motor systems (Shafritz, Bregman, Ikuta, & Szeszko, 2015). Yet, in their review of the role of the amygdala in autism, Bellani, Calderoni, Muratori, and Brambilla (2013) found that ASD toddlers and young children had larger amygdala volumes, and Haar et al. (2016) did not report any subcortical group differences in subjects aged 9.5-24.9 years in the ABIDE I (N ASD = 453). Inconsistencies in regional volumetric differences between ASD and healthy individuals are thought to stem from small sample size and heterogeneity, specifically in age (Lin, Ni, Lai, Tseng, & Gau, 2015; Riddle et al., 2017; Zhang et al., 2018), sex (Lai et al., 2017; Lai, Lombardo, Auyeung, Chakrabarti, & Baron-Cohen, 2015; Mottron et al., 2015; Schaer, Kochalka, Padmanabhan, Supekar, & Menon, 2015; Zhang et al., 2018), and intelligence quotient (IQ) (Stanfield et al., 2008; Zhang et al., 2018). To address these limitations, meta-analyses and cohorts such as the ABIDE I are used to investigate the influence of sex, age, IQ, and TBV on brain volumes in ASD. But the conclusions of these studies tend to vary. For example, a meta-analysis examining total and regional brain volume variations across ages in ASD found that the size of the amygdala decreased with age compared to controls (Stanfield et al., 2008), while a recent ABIDE I study did not replicate this effect and instead reported a smaller putamen in ASD females from 17 to 27 years old (Zhang et al., 2018).
Although differences in segmentation algorithms (Katuwal et al., 2016), correction for multiple comparisons, and age range selection may contribute to these discrepancies, studies examining regional neuroanatomical differences in sex (Jäncke, Mérillat, Liem, & Hänggi, 2015; Mankiw et al., 2017; Reardon et al., 2016, 2018; Sanchis-Segura et al., 2019) and ASD (Lefebvre et al., 2015) report that different methods of adjustment for individual differences in TBV yield varying regional volumetric group differences.
Classical methods of adjustment for TBV (e.g., proportion method [regional volume/TBV], covariate approach) can lead to over and/or underestimating volumetric group differences (Reardon et al., 2016; Sanchis-Segura et al., 2019) for two reasons. First, they omit the potential group variation in the relationship between a regional volume and TBV. Second, they assume that the relationship between TBV and each regional volume is linear when the relationship can be allometric, or nonlinear. If the relationship between TBV and a regional volume was linear, the exponent (α) of the power equation $\text{Regional Volume} = b \times \text{TBV}^{\alpha}$, where b is a constant, would be equal to 1, indicating isometry. However, the exponent tends to be either hyperallometric (α > 1) or hypoallometric (α < 1) depending on the regional volume (Finlay, Darlington, & Nicastro, 2001; Mankiw et al., 2017; Reardon et al., 2016, 2018). When a region has a hypoallometric coefficient, the regional volume increases less than TBV as TBV increases and when the coefficient is hyperallometric, the regional volume increases more than TBV as TBV increases (e.g., Liu, Johnson, Long, Magnotta, & Paulsen, 2014; Mankiw et al., 2017).
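As a minimal sketch of the power relationship described above (an illustration, not the authors' code), the following Python snippet generates regional volumes from an assumed power law and recovers the exponent as the slope of a log-log regression. The function name `loglog_slope`, the constant 0.05, the exponent 0.8, and the TBV values are hypothetical choices made only for this example.

```python
import math


def loglog_slope(tbv, regional):
    """Ordinary least-squares slope of log10(regional) on log10(TBV)."""
    x = [math.log10(v) for v in tbv]
    y = [math.log10(v) for v in regional]
    mx = sum(x) / len(x)
    my = sum(y) / len(y)
    sxy = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y))
    sxx = sum((xi - mx) ** 2 for xi in x)
    return sxy / sxx


# Hypothetical TBV values and an assumed hypoallometric region:
# regional = b * TBV ** alpha with b = 0.05 and alpha = 0.8.
tbv = [900.0, 1000.0, 1100.0, 1200.0, 1300.0, 1400.0]
regional = [0.05 * v ** 0.8 for v in tbv]

alpha_hat = loglog_slope(tbv, regional)
print(round(alpha_hat, 3))  # slope of the log-log fit, i.e. the estimated exponent
```

Because the logarithm turns the power law into a straight line, the fitted slope is the allometric coefficient: a value below 1 corresponds to hypoallometry and a value above 1 to hyperallometry.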
Adjusting for differences in TBV with allometric scaling has two major implications for neuroanatomical research in ASD. First, if the allometric coefficient (α) differs between individuals with and without ASD, the relationship between regional and total volumes may serve as an additional cerebral marker to differentiate between groups. Second, allometric scaling group differences aside, adjusting for the allometric relationship of each subcortical and cortical volume with total volume yields a more precise estimate of each regional volume, and in turn, provides a more accurate evaluation of volumetric group differences.
To this day, brain allometry in ASD has only been considered in two studies that examined corpus callosum and cerebellar differences between ASD and control individuals (Lefebvre et al., 2015;Traut et al., 2018, respectively). Thus, the primary goal of this study was to investigate allometric scaling and volumetric differences between ASD and control individuals in subcortical volumes while taking into account brain allometry. The second aim was to identify whether neuroanatomical group differences depend on sex, age and/or full scale intelligence quotient (FSIQ), variables previously reported to influence group differences in brain volumes in studies where brain allometry was omitted (Stanfield et al., 2008;Zhang et al., 2018). As the first study to investigate and adjust for allometric scaling differences in regional volumes between TD and ASD individuals, no a priori hypotheses were postulated.
Subcortical allometric and volumetric group differences were investigated in the ABIDE I, a cohort which consists of 539 individuals with ASD and 573 age and sex matched controls (Di Martino et al., 2014). A multiple group confirmatory factor analysis (MGCFA), a multivariate statistical approach which advantageously tests for global group differences in brain allometry and considers the mutual relationship between regional brain structures (de Jong et al., 2017; Toro et al., 2009)
Di Martino et al. (2014) reported that 94% of the 17 sites using the ADOS and/or Autism Diagnostic Interview-Revised obtained research-reliable administrations and scorings. Data were anonymized and collected by studies approved by the regional Institutional Review Boards. Further details on participant recruitment and phenotypic and imaging data analyses are provided by Di Martino et al. (2014).

| Subsamples' descriptive statistics
In addition to the analyses on the entire sample, we ran exploratory MGCFAs and LMEMs on four sufficiently powered subsamples (Mundfrom, Shaw, & Ke, 2005). Subgroups were defined based on previous studies reporting age effects in ASD (e.g., Lin et al., 2015; Stanfield et al., 2008; Zhang et al., 2018): boys from 6 to under 12 years old (N ASD = 87, N Control = 97) and boys from 12 to under 20 years old (N ASD = 138, N Control = 141). Age did not differ between ASD and TD individuals in each group.
In light of the group differences in FSIQ and the association between FSIQ and brain volume (Maier et al., 2015; McDaniel, 2005)
Finally, since the sample size was predefined, power analyses were run a posteriori on significant LMEM main effects and interactions with the simr package (Green & MacLeod, 2016; Supporting Information 2: Power Analyses).

| Analyses
Analyses performed in R (R Core Team, 2019) were preregistered on OSF (https://osf.io/wun7s), except where indicated. The data and scripts that support the findings and figures of this study are openly available in "Subcortical-Allometry-in-ASD" at http://doi.org/10.5281/zenodo.3592884.
Since previous research either did not examine the scaling coefficients of some of the presently investigated volumes or potential hemispheric differences (de Jong et al., 2017;Liu et al., 2014;Reardon et al., 2016), we analyzed the scaling relationship between left and right regional volumes and TBV in ASD and TD individuals separately.
Although not preregistered, we reported scaling coefficients with the 95% confidence interval and tested whether the scaling coefficient of each regional volume with TBV differed from 1 with the car R package (Fox & Weisberg, 2019). Analyses were conducted with and without age, sex, and age by sex interactions to examine the extent to which these additional variables influence the scaling coefficients.
Additional analyses were also conducted without outliers, without individuals with comorbidities, and with medication use (medication vs. no medication) as a covariate to assess whether scaling coefficients were robust to these factors.
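The test of whether a scaling coefficient differs from 1 can be sketched as follows. This is an illustrative Python approximation rather than the car-based procedure used in the study: it fits a simple log-log regression without covariates and uses a normal approximation for the confidence interval and p-value (an assumption made here so that only the standard library is needed). The function name `slope_vs_isometry` and the simulated data are hypothetical.

```python
import math
from statistics import NormalDist


def slope_vs_isometry(log_tbv, log_vol, level=0.95):
    """Fit log_vol = a + alpha * log_tbv and test alpha against 1 (isometry)."""
    n = len(log_tbv)
    mx = sum(log_tbv) / n
    my = sum(log_vol) / n
    sxx = sum((x - mx) ** 2 for x in log_tbv)
    sxy = sum((x - mx) * (y - my) for x, y in zip(log_tbv, log_vol))
    alpha = sxy / sxx
    intercept = my - alpha * mx
    rss = sum((y - intercept - alpha * x) ** 2 for x, y in zip(log_tbv, log_vol))
    se = math.sqrt(rss / (n - 2) / sxx)  # standard error of the slope
    z_crit = NormalDist().inv_cdf(0.5 + level / 2)
    z = (alpha - 1.0) / se
    p = 2 * (1 - NormalDist().cdf(abs(z)))  # two-sided, normal approximation
    return alpha, (alpha - z_crit * se, alpha + z_crit * se), p


# Hypothetical log10 volumes: true slope 0.8 plus a small alternating error.
log_tbv = [2.90 + 0.01 * i for i in range(30)]
log_vol = [-1.0 + 0.8 * x + 0.005 * (-1) ** i for i, x in enumerate(log_tbv)]

alpha, ci, p = slope_vs_isometry(log_tbv, log_vol)
print(alpha, ci, p)
```

A region would be described as hypoallometric when the whole interval lies below 1, hyperallometric when it lies above 1, and isometric when the interval contains 1.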
MGCFAs and LMEMs were conducted to address the study's primary goal to investigate allometric scaling and volumetric group differences and the study's secondary goal to examine whether allometric scaling and volumetric group differences depend on age, sex, and/or FSIQ. Briefly, a MGCFA is a multivariate approach that involves simultaneous confirmatory factor analyses (CFA) in two or more groups and tests measurement invariance across groups (i.e., that the same model of equations measures the same latent construct). In a CFA, observed variables (brain volumes) are used to measure an unobserved or latent construct (TBV). A CFA in turn corresponds to a system of equations that describes the relationship between the observed variables and the latent construct they measure (TBV). MGCFAs advantageously measure group (i.e., ASD vs. Control) differences across all regional volumes simultaneously (i.e., global test) and in each regional volume (i.e., regional test), while adjusting for the mutual relationships between regional brain volumes. MGCFAs were run with the lavaan R package (Rosseel, 2012).
We additionally conducted LMEMs, which measure group differences in each regional volume separately, with the lmerTest R package (Kuznetsova, Brockhoff, & Christensen, 2017).

| Equations in the MGCFA
The observed variables estimating the latent construct (TBV) were the following 22 regional volumes (Table 2). All brain volumes were log10 transformed in order to take into account the power relationship between each regional volume and TBV within the general linear model framework. This yielded the linear allometric scaling Equation (1), where i corresponds to the investigated regional volume, α to the exponent of the power relationship (the allometric coefficient), and group to ASD or Control:

$$\log_{10}(\text{Regional Volume}_{i}) = \text{Intercept}_{i,\text{group}} + \alpha_{i,\text{group}} \times \log_{10}(\text{TBV}) \qquad (1)$$

2.2.2 | Testing for Allometric and Volumetric Group Differences: MGCFA Global and Regional Tests
First, TBV differences between groups identified by regressing TBV on group in the MGCFA models were adjusted for in the configural models of each sample. Second, configural invariance (whether the same observed variables explain the same latent construct across groups) was tested by establishing a configural model with correlated residuals between regional volumes that similarly fits both groups when the intercept and slope values of the allometric equations for each regional volume differ between ASD and Controls. Good model fit was determined using commonly used fit indices: the Tucker Lewis Index (TLI), the Comparative Fit Index (CFI), and the Root Mean Square Error of Approximation (RMSEA), with a TLI and CFI > .95 and a RMSEA ≤ .06 indicating good fit (Hu & Bentler, 1999). The TLI, CFI, and RMSEA robust fit indices were used to correct for non-normality and were obtained from the maximum likelihood robust estimator from the lavaan package (Rosseel, 2012). Although we preregistered that we would additionally use the standardized root mean square residual (SRMR), the SRMR was not used since the lavaan package (Rosseel, 2012) does not provide a robust SRMR.
Third, allometric scaling group differences were identified by testing for metric invariance (equality of slopes, or α i coefficients from Equation 1) between groups. Fourth, volumetric group differences adjusted for allometric scaling were identified by testing for scalar invariance (equality of intercepts, or Intercepts from Equation 1) between groups.
Metric and scalar invariance were tested with a global test followed by a regional test in each volume if the global test was significant. In a global metric invariance test, regional volumes are simultaneously tested for allometric scaling (slope) group differences by comparing the configural model where the intercept and slope values differ between groups to a model where the slope values are constrained (the same) across groups. In a global scalar invariance test,  (Chen, Curran, Bollen, Kirby, & Paxton, 2008;Chen, 2007;Hu & Bentler, 1999), groups respectively differ in allometric scaling (slopes) and/or volumes (intercept) in one or more of the regional volumes.
Regional volumes that differ in terms of allometric scaling and/or volume between groups are then identified by conducting a regional invariance test on each volume. In a regional invariance test, a model where the parameter (e.g., intercept, slope) values are constrained across groups is compared to a model where all but one of the parameter values of a regional volume are constrained across groups. We initially preregistered the following criteria for significant group differences in parameters in regional invariance tests based on the CFA lit-

| Testing for Allometric and Volumetric Group Differences: LMEMs Regional Tests
Corresponding LMEMs were run on each regional volume in the entire sample and, in the exploratory subsamples, on specific volumes that significantly differed between ASD and control participants in terms of allometry (as indicated by the regional metric invariance test) and/or volume (as indicated by the regional scalar invariance test).
TBV was calculated as the sum of gray and white matter. The same equation used in the MGCFA was entered in the LMEMs except that scanner site was also included in Equation (2) as a random intercept (slopes were not written out in Equation (2) for clarity):

$$\log_{10}(\text{Regional Volume}) = \text{Intercept} + \log_{10}(\text{TBV}) + \text{Group} + \log_{10}(\text{TBV}) \times \text{Group} + (1 \mid \text{Scanner Site}) \qquad (2)$$

An interaction of log10(TBV) by group indicated a significant difference in allometric scaling between groups while a significant group effect suggested a significant volumetric group difference. Additional sensitivity analyses were run excluding outliers and individuals with comorbidities, and with medication use as a covariate to ensure that findings were robust.
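How the group by log10(TBV) interaction and the group effect map onto scaling and volume differences can be illustrated with a simplified Python sketch. It deliberately omits the scanner-site random intercept and any covariates, so it is a fixed-effects illustration rather than an LMEM: in a regression with a full group by log10(TBV) interaction, the interaction coefficient equals the difference between the slopes fitted separately in each group, and the group coefficient equals the difference between their intercepts. All data values and the name `fit_loglog` are hypothetical.

```python
import math


def fit_loglog(tbv, vol):
    """Return (intercept, slope) of log10(vol) regressed on log10(TBV)."""
    x = [math.log10(t) for t in tbv]
    y = [math.log10(v) for v in vol]
    mx = sum(x) / len(x)
    my = sum(y) / len(y)
    slope = sum((a - mx) * (b - my) for a, b in zip(x, y)) / sum((a - mx) ** 2 for a in x)
    return my - slope * mx, slope


# Hypothetical groups that scale differently with TBV (assumed exponents).
tbv = [950.0 + 50.0 * i for i in range(10)]
control_vol = [0.02 * t ** 0.9 for t in tbv]  # assumed alpha = 0.9
asd_vol = [0.08 * t ** 0.7 for t in tbv]      # assumed alpha = 0.7

c_int, c_slope = fit_loglog(tbv, control_vol)
a_int, a_slope = fit_loglog(tbv, asd_vol)

# With levels coded 1: Controls and 2: ASD, effects are ASD minus Control.
interaction = a_slope - c_slope  # difference in allometric scaling
group_effect = a_int - c_int     # difference in intercepts at log10(TBV) = 0
print(interaction, group_effect)
```

Note that without centering, the intercept difference refers to log10(TBV) = 0, which lies far outside the observed range; centering and scaling the variables, as done for the standardized estimates reported in this study, makes the group effect interpretable at the average TBV.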

| Testing the dependence of allometric and volumetric group differences on age, sex, and FSIQ effects : LMEMs
To address the study's secondary goal, we ran LMEMs in the entire sample with Equation (3) where scanner site was included as a random intercept (slopes were not written out in Equation (3) for clarity).  (2) in the exploratory LMEMs for each regional volume.
Scanner site was always included as a random effect in the LMEMs. If groups in a subsample differed in terms of age and/or FSIQ, we ran LMEMs in the subsamples with age and/or FSIQ as interactive fixed effects (i.e., if groups differed in terms of age, the group × log10(TBV) × age interaction was included). Total GM volume was also investigated in the latter LMEMs. All possible interactions were maintained in all LMEMs and only significant main effects and interactions were reported. LMEMs revealing significant neuroanatomical group differences were also conducted without outliers and individuals with comorbidities, and with medication use as a covariate, to ensure that the findings were robust.

| Testing the relationship of ASD severity with neuroanatomical group differences
We additionally ran post hoc analyses, which were not preregistered, on brain regions exhibiting neuroanatomical group differences to examine whether the total ADOS score in ASD individuals was a significant predictor of the investigated volume and the allometric scaling relationship between that volume and TBV. These LMEMs were run on ASD individuals with the same fixed and random effects as the LMEMs revealing neuroanatomical group differences, except that the group fixed effect was replaced by the total ADOS score. LMEMs were additionally conducted without outliers and individuals with comorbidities and medication status was added as a covariate. Other ASD scores available in ABIDE I were not employed due to the small number of individuals in each category (Supporting Information 3: MGCFA & LMEMs Assumptions).

| Testing the influence of TBV adjustment techniques on reported neuroanatomical differences
The additional LMEMs, which were conducted to contribute to the literature suggesting that neuroanatomical group differences vary depending on the applied TBV adjustment technique, were not preregistered. We examined the influence of four types of TBV adjustment techniques by comparing results from LMEMs (a) without TBV adjustment (e.g., Zhang et al., 2018), (b) with a linear adjustment considering TBV as a covariate (most common; Prigge et al., 2013;van Rooij et al., 2017;Zhang et al., 2018), (c) with linear adjustment while considering the interaction of TBV by Group (e.g., Lefebvre et al., 2015), and (d) with an allometric scaling adjustment by considering the interaction of log10(TBV) by Group (e.g., Lefebvre et al., 2015;Mankiw et al., 2017;Sanchis-Segura et al., 2019). In the no adjustment and linear adjustment LMEMs, all volumes were standardized raw volumes.
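To illustrate why the choice of TBV adjustment can change a reported group difference, the following Python sketch compares three simplified adjustments on simulated data: no adjustment, the proportion method mentioned earlier as a classical approach, and a pooled log-log (allometric) adjustment. It is a toy example under stated assumptions, not a reproduction of the LMEMs: both hypothetical groups follow exactly the same power law (exponent 0.8), group B simply has larger brains, and there are no site effects, covariates, or noise.

```python
import math


def mean(values):
    return sum(values) / len(values)


def fit_line(x, y):
    """Return (intercept, slope) of an ordinary least-squares line."""
    mx, my = mean(x), mean(y)
    slope = sum((a - mx) * (b - my) for a, b in zip(x, y)) / sum((a - mx) ** 2 for a in x)
    return my - slope * mx, slope


# Same assumed scaling law in both groups; only the TBV distribution differs.
tbv_a = [1000.0 + 20.0 * i for i in range(10)]
tbv_b = [1150.0 + 20.0 * i for i in range(10)]
vol_a = [0.05 * t ** 0.8 for t in tbv_a]
vol_b = [0.05 * t ** 0.8 for t in tbv_b]

# No adjustment: raw volumes differ simply because TBV differs.
raw_diff = mean(vol_b) - mean(vol_a)

# Proportion method: a hypoallometric region takes up a smaller share of a
# larger brain, so the ratio differs between groups as well.
prop_diff = mean([v / t for v, t in zip(vol_b, tbv_b)]) - mean(
    [v / t for v, t in zip(vol_a, tbv_a)]
)

# Allometric adjustment: compare residuals from a pooled log-log fit.
log_tbv = [math.log10(t) for t in tbv_a + tbv_b]
log_vol = [math.log10(v) for v in vol_a + vol_b]
intercept, slope = fit_line(log_tbv, log_vol)
resid = [y - (intercept + slope * x) for x, y in zip(log_tbv, log_vol)]
allo_diff = mean(resid[10:]) - mean(resid[:10])

print(raw_diff, prop_diff, allo_diff)
```

In this constructed case the unadjusted comparison suggests a larger region in group B, the proportion method suggests a smaller one, and the allometric adjustment shows no difference, which is the only conclusion consistent with how the data were generated.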

| Testing the influence of TBV adjustment techniques on our replication of Zhang et al.'s (2018) study
We sought to replicate the study by Zhang et al. (2018), who similarly examined the subcortical correlates of ASD with ABIDE I, to assess the reliability of their findings and examine the influence of different adjustment techniques on the findings that we successfully replicated.
Dependent variables in the LMEMs were Cortical WM Volume, Total GM Volume, the caudate, the amygdala, the hippocampus, the thalamus, the pallidum, the putamen, and the accumbens. Scanner site was always included as a random intercept and subject as a random intercept when hemisphere was included in the LMEMs. Fixed effects differed based on the type of adjustment technique, as described below. Dependent and independent variables were entered in the models as raw values except for age (linear and quadratic), which was centered (i.e., demeaned). Significant group main effects and interactions were reported and compared across LMEMs with varying adjustment techniques and p-values were not adjusted for multiple comparison as in Zhang et al.'s (2018) study.

LMEMs without TBV adjustment
Fixed effects were sex, age (quadratic or linear), hemisphere (except for Cerebral WM and Total GM volumes), and group (ASD and Controls). Two replication strategies were put into place: a "result replication" and a "methodological replication." In the "result replication," models were identified based on the significant interactions reported by Zhang et al. (2018) to compare effect sizes even if group interactions and main effects were not statistically significant in our sample.
In the "methodological replication," LMEMs were identified using Zhang et al.'s (2018) technique of maintaining main effects in the model and sequentially removing nonsignificant interactions (p > .05) from the model.

LMEMs with linear TBV adjustment
As in Zhang et al.'s (2018) analyses, TBV was added as a covariate to the LMEMs identified with the "result replication" and "methodological replication" techniques. Although the authors commented on whether results were similar after covarying for TBV, they did not provide statistics (i.e., effect sizes, p values).
Comparing LMEMs with the lack of and differing TBV adjustment techniques
All brain volumes were log10 transformed prior to scaling. LMEMs identified with the "result replication" and "methodological replication" techniques were run with the interaction of group by log10(TBV).

| Testing for allometry
When examining the relationship of each regional volume with TBV, we found that cerebral WM was hyperallometric (slope > 1), cortical volume was isometric (slope = 1), and most subcortical regions were hypoallometric (slope < 1). After removing outliers and including medication as a fixed effect, all subcortical regions were hypoallometric except for the right amygdala in controls which remained isometric (α = .74, CI low = 0.73, CI high = 1.02, p = .094; Tables S6-S12). The same results were found when adjusting for the interaction and effect of sex and age (Tables S10-S13).
Medication use was not significant across regional volumes for ASD and Control individuals.

| Allometric and volumetric group differences
In the MGCFA, the variance of TBV (the latent factor) was set to one to freely estimate the factor loading of the first regional volume. As a result, all ß reported from the MGCFA correspond to standardized effect sizes where the variance of regional volume and TBV are set to 1. Group differences in the MGCFA were estimated by calculating the group difference in standardized slopes and intercepts.
In the LMEMs, standardized estimates, ß, were reported by centering and scaling dependent and independent variables. Reported p-values are not corrected for multiple comparisons in the MGCFAs and were FDR corrected for the LMEMs. Statistics were reported for the age measure (age or age²) with the largest effect size estimate. Correlated residuals slightly differed across samples (Table S14).
Since factor levels were set to 1: Controls and 2: ASD in all LMEMs conducted in this study, a negative effect size in the MGCFA suggests that the slope or intercept is greater for Controls compared to ASD individuals, while a positive effect size suggests that the slope or intercept is smaller for Controls compared to ASD individuals.
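The FDR correction applied to the LMEM p-values can be sketched as follows. The text does not name the procedure, so this example assumes the Benjamini-Hochberg step-up method, which is a common choice; the function name `bh_adjust` and the p-values are hypothetical and are not results from this study.

```python
def bh_adjust(pvalues):
    """Benjamini-Hochberg adjusted p-values, returned in the input order."""
    m = len(pvalues)
    order = sorted(range(m), key=lambda i: pvalues[i])
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest p-value down, enforcing monotonicity.
    for rank in range(m, 0, -1):
        idx = order[rank - 1]
        running_min = min(running_min, pvalues[idx] * m / rank)
        adjusted[idx] = running_min
    return adjusted


# Hypothetical uncorrected p-values from five regional tests.
raw = [0.001, 0.008, 0.039, 0.041, 0.20]
adj = bh_adjust(raw)
print(adj)
```

In this made-up set, the two tests with uncorrected p-values just below .05 no longer pass the .05 threshold after adjustment, which is the kind of change an FDR correction can produce for marginal effects.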

| MGCFA
Global metric invariance was supported in the entire sample (Δχ2 [22] = 17.4, p = 0.7395), suggesting that there was no allometric scaling (slope) difference between ASD and TD individuals.
Scalar invariance was supported in the entire sample (Δχ2 [22] = 26.1, p = .2487), suggesting that there are no regional volumetric differences between ASD and TD individuals when adjusting for individual differences in TBV by taking into account allometric scaling. Thus, a regional metric invariance test was conducted on each regional volume of these subsamples to establish where allometric scaling discrepancies between groups lay.
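The χ2 difference tests reported above compare a constrained model with a less constrained one, and the p-value is the upper-tail probability of the Δχ2 statistic at the given degrees of freedom. The following Python sketch evaluates that tail probability with a closed form that holds for even degrees of freedom, which covers the 22 df of the global tests. It only converts a reported Δχ2 into a p-value; it does not fit the MGCFA or compute the robust statistic itself. The function name `chi2_sf_even_df` is hypothetical.

```python
import math


def chi2_sf_even_df(x, df):
    """Upper-tail probability of a chi-square variate for even df.

    Uses the closed form exp(-x/2) * sum_{j=0}^{df/2 - 1} (x/2)^j / j!.
    """
    if df <= 0 or df % 2 != 0:
        raise ValueError("this closed form only covers positive even df")
    half = x / 2.0
    term = 1.0
    total = 1.0
    for j in range(1, df // 2):
        term *= half / j
        total += term
    return math.exp(-half) * total


# Global tests reported for the entire sample (22 degrees of freedom).
p_metric = chi2_sf_even_df(17.4, 22)
p_scalar = chi2_sf_even_df(26.1, 22)
print(p_metric, p_scalar)
```

Applied to the rounded statistics above, the function returns values close to the reported p = 0.7395 and p = .2487; small discrepancies are expected because the Δχ2 values are rounded to one decimal.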

| LMEMs
3.3.3 | Regional allometric scaling group differences in boys aged 12 to under 20 years old
Regional metric invariance χ2 difference test indicated that the constrained configural model significantly differed from the constrained configural model with one freed slope, when the slope was freed for the brain stem (ß = −0.06, Δχ2 (1) = 11.7, p = 6.13 × 10⁻³), the left amygdala (ß = 0.08, Δχ2 (1) = 11.3, p = 7.87 × 10⁻⁴), and the right hippocampus (ß = 0.22, Δχ2 (1) = 58.2, p = 2.34 × 10⁻¹⁴). Although the robust CFI and robust RMSEA fit indices were invariant across models according to Chen's (2007)
To examine if the allometric scaling group difference reported in the right hippocampus of boys from 12 to under 20 years old depended on FSIQ, we ran a LMEM on the right hippocampus with TBV by group by FSIQ as fixed effects and scanner site as random intercept.
Again, ASD individuals had a smaller allometric scaling coefficient compared to controls before and after outlier and comorbidity removal and medication use inclusion (Table 4a,b; a posteriori Power Analyses Table S17).
Post hoc analyses revealed that the total ADOS score did not significantly predict right hippocampal volume (ß = 10.01, SE = 0.01, p = .695) or the allometric scaling relationship (ß = 0.01, SE = 0.02, p = .695) of that volume in ASD individuals with an available total ADOS score (N = 81).
3.3.4 | Regional allometric scaling group differences in boys with an FSIQ > median (107.8)
The constrained configural model with one freed slope significantly differed from the constrained configural model, when the slope was freed for the left hippocampus (ß = 0.11, Δχ2 (1) = 9.1, p = .003), the left caudate (ß = 0.04, Δχ2 (1) = 4.84, p = .028), the left accumbens (ß = 0.21, Δχ2 (1) = 6.2, p = .013), left pallidum (ß = 0.22, Δχ2 (1) = 7.8, p = .005), and the right ventral diencephalon (ß = 0.05, Δχ2 (1) = 5.9, p = .015). Since the covariance matrix of the residuals was not positive definite in group 2, we were not able to interpret the cortical white matter freed slope model. Although the robust CFI and RMSEA fit indices were invariant across models according to Chen's (2007)
FIGURE 1 Relationship between the right hippocampus and total brain volume across groups after outlier and comorbidity removal (N Control = 137, N ASD = 123) in boys from 12 to under 20 years old. ASD, autism spectrum disorder. 95% confidence regions are given by group. Volumes were log transformed and scaled.
TABLE 3 Right hippocampus LMEM results (a) and unstandardized allometric coefficients (b) for boys from 12 to 20 years old
Note: ß corresponds to standardized beta for all main effects and interactions. α corresponds to the unstandardized allometric scaling coefficient of log10(TBV) with log10(left accumbens). C corresponds to controls, FDR to false discovery rate correction for multiple comparison, TBV to total brain volume and FSIQ to full scale intelligence quotient.
To examine if the allometric scaling group difference reported in the left accumbens of boys with an FSIQ > median depended on age, we ran a LMEM on the left accumbens with TBV by group by age (linear or quadratic) as fixed effects and scanner site as random intercept (Table 6a). Again, ASD individuals had a smaller allometric scaling coefficient compared to controls before and after outlier and comorbidity removal and medication use inclusion (Table 6a,b). Linear and quadratic age effects were similar, although the effect sizes were slightly greater in the model with quadratic age (Table S18).
Post hoc analyses revealed that the total ADOS did not significantly predict left accumbens volume (ß = −0.01, SE = 0.02, p = .770) or the allometric scaling relationship (ß = −0.02, SE = 0.02, p = .770) of that volume in ASD individuals with an available total ADOS score (N = 59).
Note: ß corresponds to standardized beta for all main effects and interactions. α corresponds to the unstandardized allometric scaling coefficient of log10(TBV) with log10(left accumbens). C corresponds to controls, FDR to false discovery rate correction for multiple comparison, TBV to Total Brain Volume and FSIQ to Full Scale Intelligence Quotient.

| Global volumetric group differences
FIGURE 2 Relationship between the left accumbens and total brain volume across groups after outlier and comorbidity removal (N Control = 167, N ASD = 85) in boys with a full scale intelligence quotient < median (107.8). ASD, autism spectrum disorder. 95% confidence regions are given by group. Volumes were log transformed and scaled.
also found for boys with an FSIQ > the median before (ß = 0.30,
Note: ß corresponds to standardized beta for all main effects and interactions. α corresponds to the unstandardized allometric scaling coefficient of log10(TBV) with log10(left accumbens). C corresponds to controls, FDR to false discovery rate correction for multiple comparison, and TBV to total brain volume.
TABLE 6 Left accumbens LMEM results with age (a) and unstandardized allometric coefficients (b) for boys with a full scale intelligence quotient > median (107.8)
Note: ß corresponds to standardized beta for all main effects and interactions. α corresponds to the unstandardized allometric scaling coefficient of log10(TBV) with log10(left accumbens). C corresponds to controls, FDR to false discovery rate correction for multiple comparison, and TBV to total brain volume.
3.4.2 | Replication of Zhang et al. (2018)
In the LMEMs without TBV adjustment, we replicated the significant interaction of group by linear age by sex in the hippocampus. We were unable to replicate the remaining group differences reported by Zhang et al. (2018; Table 8). Although Zhang et al. (2018) reported that the interaction of group by linear age by sex in the hippocampus was no longer significant when covarying for TBV (no statistics were provided), the interaction remained minimally significant in our sample (Table 8).
When comparing results from LMEMs across all brain volumes with varying TBV adjustment techniques (Table 8 and Tables S19-S27), we found that the effect size of TBV was smaller when considering allometric scaling across all volumes. Although generally consistent, there were some differences in effect size and significance across TBV adjustment techniques. For instance, the interaction of group by linear age by sex in the hippocampus previously reported in LMEMs without TBV and with linear TBV adjustment was no longer significant when adjusting for TBV with allometric scaling (Table 8).
Instead, the interaction of group by log10 (TBV) by sex was significant (ß = −0.40, SE = 0.20, p = .041, d = −0.08) when linear age was included in the model (Table S19). The interaction was no longer significant following FDR correction for multiple comparisons and was not significant when linear age was included in the model.

| DISCUSSION
The primary aim of this study was to investigate subcortical allometric scaling and volumetric differences between TD and ASD individuals from the ABIDE I, while adjusting for individual differences in TBV by taking into account brain allometry. The secondary goal of

| Allometric scaling in ABIDE I
In line with previous studies (Liu et al., 2014;Reardon et al., 2016), the right and left cortex were isometric (α = 1), cerebral white matter was hyperallometric (α > 1), and subcortical volumes in TD and ASD individuals were hypoallometric (α < 1). Yet, following outlier removal, the scaling coefficient of the right amygdala in controls were also isometric when sex and age effects were considered.
While our findings could suggest that allometry is not a characteristic of all brain regions, allometry may still be present in subcortical subregions. A recent study examining surface area scaling coefficients reported different scaling coefficients within brain regions (e.g., both negative and positive scaling in the amygdala; Reardon et al., 2018). Brain allometry should in turn be investigated in cortical and subcortical subregions (not examined in the present study) since allometric scaling across these regions may serve as a cerebral marker of ASD.

4.2 | Absence of general group differences in TBV
TBV only differed between ASD and TD individuals in the sample of boys with an FSIQ ≤ 107.8, in which TBV was greater for individuals with ASD than for their control counterparts. However, this difference in TBV between groups may be artifactual considering that IQ and brain size are differently correlated between ASD subjects (r = 0.08) and controls (r = 0.31). The study that provided the ABIDE I data simulated the impact of matching patient and control subjects by FSIQ and reported that FSIQ matching can bias TBV group differences by increasing the number of patients with a large TBV (Lefebvre et al., 2015). This biasing effect of IQ matching on TBV differences may also explain why one ABIDE I study reported a subtle TBV group difference (1-2%) after controlling for IQ in the matched but not the entire cohort (Riddle et al., 2017).

TABLE 8 Replication of the significant group effects reported by Zhang et al. (2018)
Note: Statistics are reported for the underlined effects of the model with or without adjustment for total brain volume (TBV). Zhang model corresponds to Zhang et al.'s (2018) linear mixed effects models. Hemi corresponds to hemisphere and d to Cohen's d. Groups (1: controls, 2: ASD). Hemi (1: left, 2: right). P values were not corrected for multiple comparisons (*p < .05). B are unstandardized estimates.

The lack of a general TBV difference is consistent with past ABIDE I studies examining volumetric group differences (Haar et al., 2016; Riddle et al., 2017; Zhang et al., 2018). While previous studies reported neuroanatomical differences between ASD and TD individuals across stages of development (Duerden et al., 2012; Stanfield et al., 2008), no group differences in TBV were found in children and adolescent boys in the present study. Since the studies that report a greater TBV in children with ASD suggest that TBV group differences are greater in early childhood and disappear in 10-year-old children (Courchesne, Campbell, & Solso, 2011; Lange et al., 2015), children in the present sample may be too old to exhibit TBV group differences (first quartile age = 9.3 years old).
As for adolescents, the majority of studies were either underpowered (Freitag et al., 2009; Hazlett et al., 2005) or grouped adolescents and children (Duerden et al., 2012), suggesting that their findings may be unreliable or biased by the younger children in their sample. The present study provides further evidence that enlarged TBV may not serve as a reliable biomarker of ASD after young childhood and may instead represent a bias in population norm (Raznahan et al., 2013).

4.3 | No regional group differences in the entire sample
ASD and TD individuals did not differ in terms of volume or allometric scaling across the presently investigated cortical and subcortical volumes.
Although consistent with recent large-scale studies (Riddle et al., 2017; Zhang et al., 2018), this finding contrasts with the largest study to our knowledge (N ASD = 1,571 and N Controls = 1,651; van Rooij et al., 2017) examining cortical and subcortical differences in ASD. The authors linearly adjusted for TBV (covariate approach) and reported volumetric group differences in the pallidum, putamen, amygdala, and nucleus accumbens (Cohen's d = −0.08 to −0.13).
While the absence of such small volumetric group differences may stem from our smaller sample size, the covariate approach for TBV adjustment has also been shown to yield a higher rate of false positives (Liu et al., 2014; Sanchis-Segura et al., 2019), suggesting that these results should be replicated with an allometric scaling adjustment for TBV to be judged robust.
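The mechanism behind such false positives can be sketched with a deterministic toy example. Two hypothetical groups follow exactly the same power law (volume = 0.05 × TBV^0.7) but have different TBV distributions. Adjusting linearly for TBV leaves a small residual "group difference", whereas adjusting on the log-log scale leaves none. All numbers and names are assumptions made for illustration, and the magnitude of the artifact depends entirely on them. The sketch uses the standard result that, in the model y ~ x + group with a common slope, the group coefficient equals the difference in group means of y minus the pooled within-group slope times the difference in group means of x.

```python
import math


def adjusted_group_difference(x0, y0, x1, y1):
    """Group coefficient of the least-squares model y ~ x + group (common slope)."""
    def mean(values):
        return sum(values) / len(values)

    mx0, my0, mx1, my1 = mean(x0), mean(y0), mean(x1), mean(y1)
    # Pooled within-group sums of squares and cross-products.
    sxx = sum((x - mx0) ** 2 for x in x0) + sum((x - mx1) ** 2 for x in x1)
    sxy = (sum((x - mx0) * (y - my0) for x, y in zip(x0, y0))
           + sum((x - mx1) * (y - my1) for x, y in zip(x1, y1)))
    slope = sxy / sxx
    return (my1 - my0) - slope * (mx1 - mx0)


def power_law_volume(tbv, b=0.05, alpha=0.7):
    """Hypothetical hypoallometric region shared by both groups."""
    return b * tbv ** alpha


# Hypothetical TBV values: group B has larger and less variable brains.
tbv_a = [900.0 + 4.0 * i for i in range(101)]    # 900 to 1300
tbv_b = [1200.0 + 2.0 * i for i in range(101)]   # 1200 to 1400
vol_a = [power_law_volume(t) for t in tbv_a]
vol_b = [power_law_volume(t) for t in tbv_b]

# Covariate approach: raw volume adjusted linearly for raw TBV.
linear_effect = adjusted_group_difference(tbv_a, vol_a, tbv_b, vol_b)

# Allometric approach: log10(volume) adjusted for log10(TBV).
log_effect = adjusted_group_difference(
    [math.log10(t) for t in tbv_a], [math.log10(v) for v in vol_a],
    [math.log10(t) for t in tbv_b], [math.log10(v) for v in vol_b],
)

print(f"group effect after linear TBV adjustment:     {linear_effect:.5f}")
print(f"group effect after allometric TBV adjustment: {log_effect:.2e}")
```

No true group difference exists in these simulated data, so any nonzero group effect after linear adjustment is an artifact of fitting a straight line to a curved relationship. In a large sample, even an artifact of this size could reach statistical significance.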
Volumetric group differences may lie in other cortical areas and WM volumes that make up the large-scale neurocognitive systems assumed to mediate ASD symptoms. Reported group differences in cortical regions (e.g., the insula and the prefrontal cortex (Duerden et al., 2012), thought to be involved in social cognition (Blakemore, 2008)) and in WM volumes (e.g., the corpus callosum, assumed to enable the integration of multiple sources of stimulation; Just, Cherkassky, Keller, Kana, & Minshew, 2007) must nonetheless be replicated in sufficiently powered studies (Di & Biswal, 2016; Haar et al., 2016; Lefebvre et al., 2015) that appropriately adjust for TBV (Liu et al., 2014; Sanchis-Segura et al., 2019) to be judged as robust neuroanatomical markers of ASD.

4.4 | No regional group differences depending on age, sex, and FSIQ in the entire sample
When considering age and sex effects and their interactions, we did not find group differences in allometric scaling or volume. This contrasts with several cross-sectional studies and meta-analyses on the neuroanatomical variations of ASD (Duerden et al., 2012; Greimel et al., 2013; D. Yang, Beam, et al., 2016; X. Yang et al., 2016) and the ABIDE I study we aimed to replicate (Zhang et al., 2018), which reported that ASD male adolescents and adults had smaller hippocampal volumes and that ASD female adolescents and adults had a smaller right putamen compared to their control counterparts.
These discrepancies with the literature may stem from (a) limited statistical power, (b) publication bias in favor of positive results, and (c) the lack of correction for multiple comparisons across a majority of studies, which increases the risk of false positives. Consistent with our entire sample analyses, the largest-scale ASD study to date addressing these limitations did not report age by sex or age by diagnostic effects when the linear effects of age were considered (van Rooij et al., 2017). However, based on previous findings that omitting brain allometry can lead to underestimating group differences (Reardon et al., 2016), we cannot rule out the presence of small age by sex or age by diagnostic effects on the investigated regional volumes since they would not be detectable with our current sample size.
Unlike the largest study to date on cerebral markers of ASD, which linearly corrected for TBV (covariate approach) and found volumetric sex differences in the thalamus, caudate, putamen, amygdala, and nucleus accumbens (van Rooij et al., 2017), no sex effects were found in our study. Although the absence of sex effects may be due to the small number of females (N = 106) in our sample, some significant sex effects may be false positives considering that the covariate TBV adjustment tends to overestimate volumetric sex differences (Reardon et al., 2016; Sanchis-Segura et al., 2019). In light of the numerous methodological discrepancies in the studies on the neuroanatomical group differences in ASD, more large-scale studies with an allometric scaling adjustment for TBV will be necessary to obtain unbiased estimates of cerebral differences in ASD across sexes.

4.5 | Exploratory regional group differences depending on age, sex, and FSIQ
Based on the LMEMs in the entire sample, allometric scaling and volumetric group differences did not depend on sex, age, and/or FSIQ.
Exploratory analyses were nonetheless run on previously examined ASD subsamples (e.g., Lin et al., 2015; Maier et al., 2015) to compare our findings with previous studies and to further examine result consistency between MGCFAs and LMEMs. Exploratory MGCFAs and LMEMs revealed that allometric scaling coefficients were smaller for ASD individuals in the right hippocampus for boys aged 12 to under 20 years old and in the left accumbens for boys with an FSIQ < median.
This finding suggests that although both groups had hypoallometric scaling coefficients, indicating that these regional volumes grow at a slower rate than TBV, the regional volume increased less with TBV in ASD individuals compared to controls.
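A short numerical sketch makes this interpretation concrete. Under the power law volume = b × TBV^α, multiplying TBV by a factor k multiplies the regional volume by k^α. The two exponents below are hypothetical values chosen only to show the direction of the effect; they are not the coefficients estimated in this study.

```python
def relative_volume_change(alpha, tbv_ratio):
    """Fractional change in regional volume when TBV is multiplied by tbv_ratio,
    assuming volume = b * TBV**alpha (b cancels out of the ratio)."""
    return tbv_ratio ** alpha - 1.0


# Hypothetical hypoallometric exponents (both below 1), for illustration only.
ALPHA_TD = 0.8
ALPHA_ASD = 0.6
TBV_RATIO = 1.10  # a brain that is 10% larger

change_td = relative_volume_change(ALPHA_TD, TBV_RATIO)
change_asd = relative_volume_change(ALPHA_ASD, TBV_RATIO)

print(f"TD:  regional volume changes by {100 * change_td:.1f}%")
print(f"ASD: regional volume changes by {100 * change_asd:.1f}%")
```

With both exponents below 1, the regional volume grows by less than 10% in both groups, and by less in the group with the smaller exponent.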
Hypoallometry (exponent < 1) in the right hippocampus and left accumbens regions of ASD boy subsamples did not covary with ASD severity, although previous studies suggest that the neuroanatomy of ASD is heterogeneous and varies with ASD severity (Bedford et al., 2020; H. Chen et al., 2019). One possibility is that the size of the present sample is not sufficient to detect a link between the allometric scaling coefficient and ASD severity. Another is that the severity of ASD may not correlate with allometry in the investigated subcortical structures.
While allometric scaling group differences were consistent across methods, LMEMs revealed a greater right hippocampal volume in boys from 12 to under 20 years old, which was not present in the MGCFA. Discrepancies in how parameter values are estimated in LMEMs and MGCFAs may explain inconsistencies across methods. For instance, unlike LMEMs, the MGCFA considers all regional volumes when predicting allometric scaling and volumetric group differences and takes into account correlated residuals when estimating parameter values. Yet, in light of the absence of allometric and volumetric group differences when examining the entire sample and the exploratory nature of these results, these findings must be replicated in a larger sample to be judged as robust.

4.6 | MGCFAs and LMEMs: Methodology
Although MGCFAs and LMEMs generally provided similar results, MGCFAs may not be optimal to investigate neuroanatomical differences between groups in future studies for several reasons. First, although the MGCFA can simultaneously conduct global and regional tests, the MGCFA cannot simultaneously examine FSIQ, age, and sex effects, factors thought to influence brain anatomy (Duerden et al., 2012; Mankiw et al., 2017; Reardon et al., 2016; Sacco et al., 2015; van Rooij et al., 2017; Zhang et al., 2018). The present use of the MGCFA was nonetheless appropriate considering that the primary goal was to examine neuroanatomical group differences regardless of age, sex, and FSIQ. Second, the latent construct in the MGCFA cannot be equated with log10(TBV), which is typically employed to examine allometric scaling (Finlay et al., 2001), as in LMEMs. Instead, the latent construct reflects the shared variance between the observed variables: the log-transformed regional volumes. Third, numerous correlated residuals (overlap in variance between volumes that measure something other than TBV) were included in each MGCFA to reach appropriate fit, and these correlated residuals slightly differed between the entire sample and each subsample.
Since brain regions across and within hemispheres are highly interconnected, the measurement error of one volume correlates with the measurement error of another volume. However, it is unclear to what extent the correlated residuals established in the present model reflect general relationships between brain regions, and to what extent they reflect idiosyncratic properties of the present sample.
Only a comparison with another large dataset would allow one to assess how generalizable this model is. Nonetheless, we emphasize that the model fit of all MGCFAs was similar across groups and the results between LMEMs and MGCFAs were overall consistent.
Fourth, while the number of participants included in each subsample was sufficient to provide a MGCFA factor solution in agreement with the population structure from which the sample was taken (Mundfrom et al., 2005), more MGCFA simulation studies and the development of packages to estimate MGCFA power are needed to establish the number of participants required to observe a specific group difference in parameter (slope or intercept) at 80% power.
Finally, additional simulation studies are required to ensure that the current MGCFA thresholds employed in the literature reflect "real" rather than mathematical differences (Putnick & Bornstein, 2016). In the present study, Chen's (2007) cutoff values for fit indices to determine regional metric invariance between groups were too conservative to detect the small neuroanatomical group differences reported by the χ² difference tests and the LMEMs. One possibility is that Chen's (2007)

4.7 | Limitations
The current article is limited in its capacity to study sex, age, and FSIQ effects on allometric scaling and volumetric group differences due to the insufficient number of girls, adults aged over 20, and individuals with an FSIQ < 70 in the ABIDE I sample. Further research on these populations is necessary to better understand ASD's etiology for numerous reasons. For instance, while some females exhibit symptoms similar to males at an early age, high functioning females are thought to have more efficient coping strategies than males, specifically in the social domain (Dworzynski, Ronald, Bolton, & Happé, 2012; Lai et al., 2015, 2017), which mask the severity of their ASD until later in adolescence or adulthood. By examining such individuals, who vary in ASD symptomatology, future studies may shed light on the neuroanatomical markers related to specific ASD traits. In light of the cognitive changes associated with age-related brain volume alterations in the adult population (Scahill et al., 2003; Takao, Hayashi, & Ohtomo, 2012; Vinke et al., 2018), more adults in the young adult and older adult age ranges must be scanned and studied to accurately depict how age influences neuroanatomical differences reported in ASD. Finally, given that 1/3 of ASD individuals have an FSIQ < 70 (Christensen et al., 2016) and that they have a high within-group variability at the genomic level (Srivastava & Schwartz, 2014), neuroanatomical variations in these individuals likely depend on specific genetic components, warranting the investigation of cerebral differences with an imaging genetics approach in this population (Jack & Pelphrey, 2017).

4.8 | Implications of studying allometry
Correcting for TBV with allometric scaling provides more accurate estimates of group differences in cerebral volumes and makes it possible to investigate whether allometric scaling could serve as a neuroanatomical marker for group differences in behavior and cognition. However, while numerous studies have proposed functional correlates for regional volume changes, the influence of allometric scaling on behavior and cognition remains unknown. For instance, while a reduced hippocampal volume has previously been linked to impaired episodic memory (Salmond et al., 2005; Williams, Goldstein, & Minshew, 2006) and a decrease in the left putamen volume to greater repetitive and stereotyped behavior in ASD (Cheung et al., 2010; Estes et al., 2011), atypical allometric scaling relationships may or may not translate to such cognitive and behavioral symptoms. Yet, prior to linking cerebral markers to variations in cognition and behavior, robust neuroanatomical markers that consider additional factors thought to influence cerebral diversity in the TD (e.g., sex, age) and in the ASD (e.g., minimally verbal subtype, IQ) population must be established.
There are numerous efforts aimed at identifying cerebral markers of ASD with brain imaging techniques for diagnostic purposes (e.g., Alvarez-Jimenez, Múnera-Garzón, Zuluaga, Velasco, & Romero, 2020; Kong et al., 2019; Nielsen et al., 2013). However, our study, along with the increasing literature reporting absent (Haar et al., 2016; Lefebvre et al., 2015) or very subtle (van Rooij et al., 2017) volumetric group differences, suggests that previously reported group differences in subcortical volumes are potentially false positives or that individual regions may not constitute useful cerebral markers for the diagnosis of ASD. This is consistent with the emerging literature that focuses on training classification algorithms with numerous brain regions and various methods, such as resting state functional MRI, to generate a more accurate diagnostic tool for ASD (Heinsfeld, Franco, Craddock, Buchweitz, & Meneguzzi, 2018; Plitt, Barnes, & Martin, 2015). Thus, from a clinical standpoint, our findings further support that the cerebral markers of ASD, which could be used for diagnosis, should not be restricted to a specific region in the brain.
Once robust cerebral markers that covary with cognitive abilities and disease severity are identified, mediation models can be conducted by future studies to uncover the diverse causal links of ASD that integrate genetic, environmental, cognitive, and behavioral information (Lai et al., 2013). These advances may enable the creation of more accurate ASD subgroups, offer more accurate diagnostic criteria, which are increasingly being used to automate diagnosis (H. Chen et al., 2019; Nielsen et al., 2013), as well as facilitate person-centered treatment by providing insights on ASD's complex etiology.

5 | CONCLUSION
The primary goal of this study was to identify allometric scaling and volumetric differences between TD and ASD individuals when taking into account brain allometry. The second goal was to examine whether cerebral group differences depended on age, sex, and/or FSIQ. We analyzed data from ABIDE I using a common univariate approach, LMEMs, and a multivariate approach from structural equation modeling, MGCFA.
No robust allometric and volumetric group differences were observed in the entire sample, although exploratory analyses on subsamples based on age, sex, and FSIQ suggested that allometric scaling and volume may depend on age, sex, and/or FSIQ. While the LMEMs and the MGCFA were generally consistent, we propose that LMEMs may be more efficient to examine neuroanatomical group differences in light of the encountered methodological MGCFA constraints (e.g., no interaction effects, correlated residuals inclusion). Additional LMEM analyses with different TBV adjustment techniques revealed that the effect sizes and significance of cerebral differences between TD and ASD individuals differed across TBV adjustment techniques.
In addition to being the first study to examine allometric scaling and volumetric differences between ASD and TD individuals in the presently investigated volumes, the study adds to the literature by offering reference scaling coefficients for future studies in both ASD and TD individuals and by comparing two statistical methods: the MGCFA and LMEMs. Finally, through its difficulty in replicating a recent similar study, the article contributes to the literature on the replication crisis and, through its comparison of TBV adjustment techniques, supports the consideration of brain allometry to reduce the reporting of biased estimates of neuroanatomical group differences.

ACKNOWLEDGMENTS
This work received support under the program "Investissements d'Avenir" launched by the French Government and implemented by ANR with the references ANR-17-EURE-0017 and ANR-10-IDEX-0001-02 PSL.

CONFLICT OF INTERESTS
On behalf of all authors, the corresponding author states that there is no conflict of interest.

DATA AVAILABILITY STATEMENT
The data that support the findings of this study are openly available in "Subcortical-Allometry-in-Autism" at http://doi.org/10.5281/zenodo.