Extensive brain structural heterogeneity in individuals with schizophrenia and bipolar disorder

Identifying brain processes involved in the risk and development of mental disorders is a major aim. We recently reported substantial inter-individual heterogeneity in brain structural aberrations among patients with schizophrenia and bipolar disorder. Estimating the normative range of voxel-based morphometry (VBM) data among healthy individuals using Gaussian Process Regression (GPR) enables us to map individual deviations from the healthy range in unseen datasets. Here we aim to replicate our previous results in an independent sample of patients with schizophrenia (n=166), patients with bipolar disorder (n=135) and healthy individuals (n=687). In line with previous findings, our results revealed robust group-level differences between patients and healthy controls, yet only a small proportion of patients with schizophrenia or bipolar disorder exhibited extreme negative deviations from the norm in the same brain regions. These direct replications support the view that group-level differences in brain structure disguise considerable individual differences in brain aberrations, with important implications for the interpretation and generalization of group-level brain imaging findings to the individual with a mental disorder.


INTRODUCTION
Recently, the degree of inter-individual heterogeneity in brain structure was found to be considerably larger than previously anticipated for both schizophrenia and bipolar disorder (Wolfers et al., 2018). As expected, based on the substantial body of literature reporting results from case-control comparisons, patients with schizophrenia or bipolar disorder show evidence of group-level deviations from a normative trajectory in brain structure. However, applying normative modeling (Marquand et al., 2016) to chart variation in brain anatomy across individual patients showed highly idiosyncratic patterns of deviation, suggesting that such group effects are inaccurate reflections of the brain aberrations found at the individual level (Wolfers et al., 2018). Of note, a similarly high level of heterogeneity has recently also been observed in attention-deficit/hyperactivity disorder (Wolfers et al., 2019) and autism spectrum disorder (Zabihi et al., 2019).
Given the extant literature on reproducible group-level differences in brain structure between cases and controls (Moberget et al., 2017; Van Erp et al., 2016), our initial findings of substantial heterogeneity within disorders demonstrated that moving beyond the study of group differences is highly beneficial for understanding variability within clinical cohorts and may be required to make inferences at the level of the individual. Because of these important implications, we here report an attempt to replicate our initial findings in an independent sample, following the same analytical procedure as in our previous discovery study.

RESULTS
We included 166 patients with a schizophrenia diagnosis, 135 patients with a bipolar disorder diagnosis and 687 healthy individuals (Table 1). The discovery sample comprised 256 healthy individuals, 163 patients with schizophrenia and 190 patients with bipolar disorder. All participants were recruited from the same population and catchment area in Oslo, Norway, but there was no overlap between the discovery and replication samples. We analyzed our data using a normative modeling approach, which can be understood as a statistical model that maps the population variation in a quantitative brain readout to demographic or behavioral variables (Marquand et al., 2016). We depict the spatial representation of the voxel-wise normative model, characterized by widespread gray matter decreases from age 20 to 70, with the most pronounced age differences in frontal areas (Figure 1). Figure 2 shows the results of the pairwise group comparisons, corrected for multiple comparisons using permutation testing. In gray matter, patients with schizophrenia showed stronger mean negative deviations than healthy individuals in frontal, temporal, and cerebellar regions; compared with patients with bipolar disorder, their mean deviations were also more negative, primarily in frontal brain regions. In a sensitivity analysis we matched the age distribution of the controls in the discovery and replication studies, with highly consistent findings (Supplementary Figure 1). Extreme negative deviations in people with schizophrenia were most pronounced in temporal, medial frontal and posterior cingulate regions (Figure 3). In patients with bipolar disorder the overlap was strongest in the thalamus.
In line with our previous findings, the overlap of extreme negative (Figure 3) and positive deviations (Supplementary Figure 2) is sparse across individuals with the same diagnosis, with peak voxels showing extreme negative overlap in 2% of healthy individuals, 4.44% of individuals with bipolar disorder and 5.42% of individuals with schizophrenia.
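To give a sense of scale, the peak overlap percentages can be converted into approximate participant counts. The sketch below assumes that each percentage is expressed relative to the full replication group size reported above; the helper name `approx_count` and the rounding to the nearest whole participant are illustrative choices, not part of the original analysis.

```python
# Group sizes and peak extreme-negative overlap percentages as reported
# in the text. Assumption: percentages are relative to the full group size.
GROUPS = {
    "healthy": (687, 2.0),
    "bipolar disorder": (135, 4.44),
    "schizophrenia": (166, 5.42),
}


def approx_count(n_participants, percent):
    """Approximate number of participants behind a reported percentage."""
    return round(n_participants * percent / 100.0)


counts = {name: approx_count(n, pct) for name, (n, pct) in GROUPS.items()}

for name, count in counts.items():
    print(f"{name}: about {count} participants at the peak voxel")
```

Under these assumptions, even at the voxel with the highest overlap, only about 14 healthy individuals, 6 individuals with bipolar disorder and 9 individuals with schizophrenia share an extreme negative deviation.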

DISCUSSION
Advanced brain imaging technology has allowed for probing the functional and structural brain correlates of complex human traits and mental disorders. While group-level normative deviations in brain structure in patients with a diagnosis of schizophrenia or bipolar disorder are robust and replicable (Figure 2), we observe substantial inter-individual differences in the neuroanatomical distribution of extreme deviations at the individual level (Figure 3, Supplementary Figure 2). These findings replicate our previous study (Wolfers et al., 2018) and are largely in line with evidence of large heterogeneity across mental disorders (Wolfers et al., 2019; Zabihi et al., 2019).
Our results confirm that MRI-based brain structural aberrations in patients with severe mental disorders are highly heterogeneous in terms of their neuroanatomical distribution. These findings are in line with recent evidence of substantial brain structural heterogeneity in patients with schizophrenia (Alnaes et al., 2019) and are also consistent with accumulating evidence from psychiatric genetics strongly suggesting that mental illnesses are complex and heterogeneous disorders associated with a large number of genetic variants as well as environmental risk factors (Sullivan and Geschwind, 2019). Along with documented clinical heterogeneity (Insel, 2009) and large inter-individual variability in treatment response and outcome (Kapur et al., 2012), our successful replication of considerable neuroanatomical heterogeneity supports the need for statistical approaches that allow for inferences at the level of the individual. Characterizing the magnitude and distribution of brain aberrations in individual patients is key for identifying neuronal correlates of specific symptoms across diagnostic categories and would represent an important step towards increasing the utility of brain imaging in a clinical context.
Figure caption (fragment): ... Wolfers et al. 2018, JAMA Psychiatry. We show that the overlap across studies is comparable, with only a few brain regions showing overlap in more than 2% of the individuals. Interestingly, the overlap, while consistent, is even smaller in the replication study. Note: Extreme negative deviations are defined as Z < -2.6 at the individual level.
medRxiv preprint doi: https://doi.org/10.1101/2020.05.08.20095091; this version posted May 11, 2020. The copyright holder for this preprint (which was not certified by peer review) is the author/funder, who has granted medRxiv a license to display the preprint in perpetuity. It is made available under a CC-BY-NC-ND 4.0 International license.
While the present findings are robust, it must be considered that data modalities other than structural brain imaging may be better able to capture any common pathophysiological processes in patients with schizophrenia or bipolar disorder. Thus, we may observe larger overlaps across individuals with the same mental disorder in other data domains, such as those measuring brain function, cognition or specific behaviors, or in relevant biological assays. While this possibility cannot be ruled out, the present results indicate that multiple pathophysiological processes and pathways are at play, which is also supported by the large number of identified genetic variants (Ripke et al., 2014; Smoller et al., 2013; Stahl et al., 2019).
Over the last decades it has become increasingly apparent that replication attempts in psychology, psychiatry, neuroscience and related fields frequently fail (Avinun et al., 2018; Baker and Penny, 2016; Dinga et al., 2019; Eklund et al., 2016; Hong et al., 2019; Ioannidis, 2005; Open Science Collaboration, 2015; Tackett et al., 2019), which has fueled initiatives promoting reproducible science (Munafò et al., 2017; Poldrack et al., 2017; Schooler, 2014). The neuroimaging field is no exception, and the lack of reproducibility in brain imaging studies has been attributed to the many researcher degrees of freedom, that is, the many and sometimes arbitrary analytical choices (Eklund et al., 2016). Here we strictly adhered to the analytical protocols specified in our original study (Wolfers et al., 2018). The entire analytical pipeline is made publicly available to ease replication by independent researchers and to allow for application to different cohorts and disorders.
While we believe that these analytical protocols are appropriate for testing the reproducibility of our original report, the approaches are under continuous development and will be improved in future studies. For instance, in the current study as well as the original study, all data were collected on a single MRI scanner, ruling out scanner effects as a possible confound. As we aim to build large population-based normative models and apply them to different cohorts, improving methods that deal with noise introduced through multiple sites is an important research priority (Marquand et al., 2019). Moving forward, we will scale up this work towards larger samples covering a wider age range, including neurodevelopment; incorporate different modalities and levels of information, including genetics and biomarkers; and link different experimental designs to the normative modeling framework.
With low reproducibility rates across various scientific disciplines (Baker and Penny, 2016), building confidence through replication is critical. Here, by replicating the findings of our initial report in an independent sample, we show that group differences in brain structure between healthy controls and patients with schizophrenia or bipolar disorder mask large individual differences in brain aberrations, with important implications for the generalization of group-level brain MRI findings to the level of the individual patient.

METHODS
Table 1 summarizes the demographic and clinical information of the current sample and the sample used in the original publication (Wolfers et al., 2018). All participants were recruited as part of the Thematically Organized Psychosis (TOP) study, approved by the Regional Committee for Medical Research Ethics and the Norwegian Data Inspectorate (Doan et al., 2017). Patients were recruited from in- and outpatient clinics in the Oslo area, understood and spoke a Scandinavian language, had no history of severe head trauma, and had an IQ above 70. Patients were assessed by trained physicians or clinical psychologists. Psychiatric diagnosis was established using the Structured Clinical Interview for DSM-IV Axis I Disorders (SCID). Healthy individuals were randomly sampled from national registries; neither they nor their relatives had a psychiatric or alcohol/substance use disorder, and they had not used cannabis during the last 3 months. Written informed consent was obtained from all participants.

Estimation of gray matter volume
In the same way as in our previous study, raw T1-weighted MRI volumes were processed using the Computational Anatomy Toolbox version 12 (CAT12; http://www.neuro.uni-jena.de/software/), based on Statistical Parametric Mapping version 12 (SPM12). Images were segmented, normalized, and bias-field-corrected using VBM-SPM12 (http://www.fil.ion.ucl.ac.uk/spm, London, UK) (Ashburner and Friston, 2000, 2003), yielding images containing gray and white matter segments. Prior to the estimation of the normative models, all gray and white matter volumes were smoothed with an 8 mm FWHM Gaussian smoothing kernel, and we restricted our analyses to voxels included in the gray matter mask constructed for the previous study.
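As a brief illustration of what the 8 mm FWHM kernel means in practice, the sketch below converts the full width at half maximum into the standard deviation of the corresponding Gaussian, using the standard relation sigma = FWHM / (2 * sqrt(2 * ln 2)). The voxel size of 1.5 mm is an assumption made only for this example (the text does not state the resolution), and the names `fwhm_to_sigma` and `VOXEL_SIZE_MM` are hypothetical.

```python
import math


def fwhm_to_sigma(fwhm_mm):
    """Convert a Gaussian kernel's FWHM (mm) to its standard deviation (mm)."""
    return fwhm_mm / (2.0 * math.sqrt(2.0 * math.log(2.0)))


# Kernel width reported in the text.
sigma_mm = fwhm_to_sigma(8.0)

# Assumed isotropic voxel size; not taken from the article.
VOXEL_SIZE_MM = 1.5
sigma_voxels = sigma_mm / VOXEL_SIZE_MM

print(f"sigma = {sigma_mm:.3f} mm = {sigma_voxels:.3f} voxels")
```

The kernel's standard deviation is therefore about 3.4 mm; its extent in voxels depends on the image resolution actually used.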

Normative modeling
As in our previous article, we estimated the normative model using Gaussian Process Regression (GPR) to predict VBM-based regional gray matter volumes across the brain from age and sex. To avoid overfitting of the normative models, it is crucial to estimate predictive performance out of sample. Therefore, we estimated the normative range for this model in healthy individuals under 10-fold cross-validation, and then applied a single model, estimated on all healthy individuals, to patients with schizophrenia and bipolar disorder. GPR yields coherent measures of predictive confidence in addition to point estimates. This is important in normative modeling because this uncertainty measure is needed to quantify the deviation of each patient from the group mean at each brain locus. We can thus statistically quantify deviations from the normative model with regional specificity by computing, for each voxel, a Z-score reflecting the difference between the predicted and the observed gray matter volume, normalized by the uncertainty of the prediction (Marquand et al., 2016).
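The two ingredients described above, out-of-sample folds for healthy individuals and a Z-score normalized by predictive uncertainty, can be sketched as follows. This is a minimal illustration rather than the pipeline used in the study: it assumes that a fitted model already supplies a predictive mean and a predictive variance per voxel, and the function names `kfold_indices` and `deviation_z`, the round-robin fold assignment, and the toy numbers are all hypothetical.

```python
import math


def kfold_indices(n_subjects, n_folds=10):
    """Assign subject indices 0..n_subjects-1 to n_folds held-out folds."""
    folds = [[] for _ in range(n_folds)]
    for i in range(n_subjects):
        folds[i % n_folds].append(i)  # simple round-robin assignment
    return folds


def deviation_z(observed, predicted_mean, predictive_variance):
    """Deviation from the normative prediction in units of predictive SD."""
    if predictive_variance <= 0:
        raise ValueError("predictive variance must be positive")
    return (observed - predicted_mean) / math.sqrt(predictive_variance)


# Folds for the 687 healthy individuals of the replication sample.
folds = kfold_indices(687, n_folds=10)

# Toy values for one voxel of one participant (made up for illustration).
z = deviation_z(observed=0.42, predicted_mean=0.55, predictive_variance=0.0025)
print(round(z, 2))
```

In a full analysis, each healthy individual's Z-score would come from a model fitted without that individual's fold, whereas patients' Z-scores would come from the model fitted on all healthy individuals.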
In line with our previous article, we thresholded the individual normative probability maps at p < .005 (i.e. |Z| > 2.6), and extreme positive and extreme negative deviations from the normative model were defined based on this threshold. All extreme deviations were combined into scores representing the percentage of extreme positively and negatively deviating voxels for each participant, relative to the total number of voxels in the brain mask. We tested for associations between diagnosis and those scores using a non-parametric test, corrected for multiple comparisons using the Bonferroni-Holm method (Holm, 1979). To assess the spatial extent of those extreme deviations, we created individualized maps and calculated the voxel-wise overlap between individuals from the same group. In the main text we report this overlap for healthy individuals and for patients diagnosed with schizophrenia and bipolar disorder. All analyses were performed in Python 3.6 (www.python.org) and scripts are available on GitHub (https://github.com/RindKind/). Also in line with our previous article, we fed the normative probability maps into PALM (Winkler et al., 2015) to test for mean differences between groups by means of a general linear model framework and permutation-based inference.
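The thresholding, per-participant scores and voxel-wise overlap described above can be illustrated with a small sketch. It is not the released analysis code: Z-maps are represented as plain lists of per-voxel values, the toy data are made up, and the function names `extreme_masks`, `percent_extreme` and `overlap_percent` are hypothetical.

```python
import math

THRESHOLD = 2.6  # |Z| cut-off used in the text

# Per-tail probability of exceeding the cut-off under a standard normal.
per_tail_p = 0.5 * math.erfc(THRESHOLD / math.sqrt(2.0))


def extreme_masks(z_map, threshold=THRESHOLD):
    """Return (negative, positive) boolean masks of extreme voxels."""
    negative = [z < -threshold for z in z_map]
    positive = [z > threshold for z in z_map]
    return negative, positive


def percent_extreme(mask):
    """Percentage of voxels in the mask flagged as extreme."""
    return 100.0 * sum(mask) / len(mask)


def overlap_percent(masks):
    """Per voxel: percentage of participants with an extreme deviation."""
    n_participants = len(masks)
    n_voxels = len(masks[0])
    return [
        100.0 * sum(m[v] for m in masks) / n_participants
        for v in range(n_voxels)
    ]


# Toy data: 4 participants x 5 voxels (values invented for illustration).
z_maps = [
    [-3.0, 0.1, 2.7, -0.5, -2.7],
    [-2.9, 0.0, 0.3, -0.2, 1.0],
    [0.2, -3.1, 0.1, 0.4, 0.0],
    [0.5, 0.3, -0.1, 2.8, -1.0],
]

neg_masks = [extreme_masks(z_map)[0] for z_map in z_maps]
neg_scores = [percent_extreme(mask) for mask in neg_masks]
neg_overlap = overlap_percent(neg_masks)

print("per-participant % extreme negative voxels:", neg_scores)
print("per-voxel % overlap:", neg_overlap)
print("peak overlap:", max(neg_overlap))
```

The maximum of the per-voxel overlap corresponds to the "peak voxel" percentages reported in the Results; the per-tail probability for a cut-off of 2.6 is just under .005.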

ACKNOWLEDGEMENTS AND FUNDING
The study is supported by the Research Council of Norway (223273, 249795, 298646, 300768, 276082), the South-Eastern Norway Regional Health Authority (2014097, 2015073, 2016083, 2019101), a Wellcome Trust Innovator award ('BRAINCHART', 215698/Z/19/Z), and the European Research Council under the European Union's Horizon 2020 Research and Innovation program (ERC StG Grant 802998 and Grant 847776). This work was performed on the TSD (Tjenester for Sensitive Data) facilities, owned by the University of Oslo, operated and developed by the TSD service group at the University of Oslo, IT-Department (USIT), with resources provided by UNINETT Sigma2, the National Infrastructure for High Performance Computing and Data Storage in Norway. TW gratefully acknowledges the Niels Stensen Fellowship for supporting this work.

Supplementary Figure 1: We depict the contrasts between healthy individuals, patients with bipolar disorder and patients with schizophrenia in a subsample of the replication sample in which the healthy individuals have an age distribution similar to that of the discovery sample (mean age 34.89 ± 11.11; males = 50.86%). All contrasts are corrected for multiple comparisons. Our results are robust when age and sex in the healthy group are more similar to the distribution in the discovery sample. The mean and standard deviation reported above can be compared with the corresponding values in Table 1.
Supplementary Figure 3: Extreme positive deviations for healthy individuals, patients with bipolar disorder and patients with schizophrenia. We show that the overlap across studies is comparable, with only a few brain regions showing overlap in more than 2% of the individuals diagnosed with the same mental disorder. Peak voxels show extreme positive overlap in 4.66% of healthy individuals, 8.15% of individuals with bipolar disorder and 6.13% of individuals with schizophrenia. In the lower panel we depict results based on the data reported in Wolfers et al. 2018, JAMA Psychiatry.