Multimodal MRI assessment for first episode psychosis: A major change in the thalamus and an efficient stratification of a subgroup

Multi‐institutional brain imaging studies have emerged to resolve conflicting results among individual studies. However, adjusting multiple variables at the technical and cohort levels is challenging. Therefore, it is important to explore approaches that provide meaningful results from relatively small samples at institutional levels. We studied 87 first episode psychosis (FEP) patients and 62 healthy subjects by combining supervised integrated factor analysis (SIFA) with a novel pipeline for automated structure‐based analysis, an efficient and comprehensive method for dimensional data reduction that our group recently established. We integrated multiple MRI features (volume, DTI indices, resting state fMRI—rsfMRI) in the whole brain of each participant in an unbiased manner. The automated structure‐based analysis showed widespread DTI abnormalities in FEP and rs‐fMRI differences between FEP and healthy subjects mostly centered in thalamus. The combination of multiple modalities with SIFA was more efficient than the use of single modalities to stratify a subgroup of FEP (individuals with schizophrenia or schizoaffective disorder) that had more robust deficits from the overall FEP group. The information from multiple MRI modalities and analytical methods highlighted the thalamus as significantly abnormal in FEP. This study serves as a proof‐of‐concept for the potential of this methodology to reveal disease underpins and to stratify populations into more homogeneous sub‐groups.


| INTRODUCTION
Although neuroimaging abnormalities in patients with first episode psychosis (FEP) have been demonstrated, their quantitative distribution is still are under debate. Different patterns of atrophy in the frontal cortex, cingulate cortex, parahippocampal gyri, the basal ganglia, and the thalamus has been reported from multiple groups in the past (Buchy, Makowski, Malla, Joober, & Lepage, 2018;Calvo, Delvecchio, Altamura, Soares, & Brambilla, 2019;Castro-de-Araujo & Kanaan, 2017;Kuang et al., 2017;Makowski et al., 2019;Nakamura et al., 2007;Schubert, Clark, & Baune, 2015;Tordesillas-Gutierrez et al., 2018). The inverse trend (gray matter increase) has also been reported (Dukart et al., 2017). Data in resting state functional MRI (rs-fMRI) that compare FEP patients with healthy controls (HC) are also controversial, varying from no differences between groups to regional or diffuse differences (Alonso-Solis et al., 2012;Argyelan et al., 2015;Bang et al., 2018;Choe et al., 2018;Ganella et al., 2018;Gohel et al., 2018;Huang et al., 2020;Tang et al., 2019). Previous observations of white matter changes have not been consistent. Differences in diffusion tensor imaging (DTI) were erratically reported in various brain areas such as the anterior limb of the internal capsule, the corpus callosum, the superior longitudinal fasciculus, and the uncinate. There was no specific cluster of white matter abnormalities that were unquestionably related to FEP Di Biase et al., 2017;Kuswanto, Teh, Lee, & Sim, 2012;Lei et al., 2015;Ren et al., 2017;Serpa et al., 2017;Zhou et al., 2017). Recently, a largescale study from the ENIGMA group identified widespread white matter microstructural abnormalities in chronic schizophrenia (Kelly et al., 2017). The reproducibility of this finding in individuals with FEP is still unknown.
While these studies focused on patients with established diagnosis of schizophrenia, mostly far from the clinical onset, this assessment was not fully applied to study patients in initial stages of illness, when this characterization is likely to have greater utility. Patients with FEP were mostly assessed in with limited number of ROIs and few MRI modalities Keymer-Gausset et al., 2018;Lei et al., 2015;Peruzzo, et al., 2015;Zhao et al., 2018).
Although its strengths, the implementation of this multimodal assessment is not straightforward: simply combining larger number of variables leads to multiple comparison issues and stress limitations of the sample size (Arbabshirani, Plis, Sui, & Calhoun, 2017). Multiinstitutional brain imaging studies have recently emerged to overcome conflicting results among individual studies (Thompson et al., 2020).
However, adjusting multiple variables at the technical and cohort levels remains a continuous challenge (Levin-Schwartz, Calhoun, & Adali, 2017). Developing strategies to reduce the dimensions of data, while preserving the information is a field in current development (Bassett, Xia, & Satterthwaite, 2018;Lottman et al., 2018;Miller, Vergara, & Calhoun, 2018;Qi et al., 2019;Sui, Adali, Yu, Chen, & Calhoun, 2012;Tu et al., 2019;Xia et al., 2018). A strong basis on biological knowledge is needed to develop and implement the algorithms in a comprehensive and practical way, so the research can eventually be translated to clinical field.
In order to overcome these challenges, we analyzed multiple MRI characteristics in the FEP and HC groups through whole-brain automated segmentation in a fully data-driven integrative approach that we have recently established Rezende et al., 2019). This approach aims to reduce the dimensions of image data in a biologically meaningful way, increasing the statistical power and offering comprehensive results about the brain structure (Faria, Liang, Miller, & Mori, 2017;Miller et al., 2013;. We combined this approach with supervised integrated factor analysis (SIFA) (Li & Jung, 2017) to examine multiple MRI features (volume, DTI indices, rs-fMRI) in the whole brain of FEP participants. We also accessed whether this multimodal approach would be efficient on classification of participants in subgroups of individuals with schizophrenia and schizoaffective disorder (S-FEP) and individuals with bipolar disorder and major depressive disorder with psychotic features (M-FEP). We investigated how this novel analytical pipeline may provide evidence of pathological abnormalities in the early stage of illness and potentially aid to stratify groups of clinical relevance.  (Kamath et al., 2019;Kamath, Lasutschinkow, Ishizuka, & Sawa, 2018;Wang et al., 2019). In the present study, the participants included individuals with FEP (n = 87) [SZ (n = 47), schizoaffective disorder (n = 14), bipolar disorder with psychotic features (n = 20), major depressive disorder with psychotic features (n = 6)] and 62 HC. We included individuals with schizophrenia and schizoaffective disorder in the schizophrenia-associated psychosis group (S-FEP) and individuals with bipolar disorder with psychotic features and major depressive disorder with psychotic features in the mood-associated psychosis group (M-FEP). This decision was based on two meta-analyses (Pagel, Baldessarini, Franklin, & Baethge, 2013;Rink, Pagel, Franklin, & Baethge, 2016) that found patients with schizoaffective to have illness characteristics that align more closely with patients with schizophrenia than with those with bipolar disorder and major depression.

| Imaging
The multimodal MRI was performed on a 3 T scanner, and included T1 high-resolution-weighted images (T1-WI), diffusion weighted images (DWI), and resting state functional MRI (rs-fMRI We analyzed multiple MRI contrasts in an atlas-based, structurally focused, integrative, and non-biased whole-brain approach (Figure 1).
The images were automatically segmented and postprocessed through MRICloud (www.MRICloud.org) (Mori et al., 2016), a public web-based service for multi-contrast imaging segmentation and quantification.
In MRICloud, the process for segmenting the T1-WI, used for volumetric analysis, involves: (a) orientation and homogeneity correction, (b) twolevel brain segmentation, (c) image mapping based on a sequence of linear algorithms and Large Deformation Diffeomorphic Mapping (LDDMM), and (d) a final step of multi-atlas labeling fusion (MALF), adjusted by PICSL (Tang et al., 2013). For the DWI postprocessing, the tensor reconstruction and quality control follows the algorithm used by DtiStudio (www.MRIStudio.org). The automated DTI segmentation is similar to that used for T1-WIs and differs in the use of complementary contrasts (mean diffusivity [MD], fractional anisotropy [FA], and eigenvector such as fiber orientation) and a diffeomorphic likelihood fusion algorithm (Tang et al., 2014) for multi-atlas mapping.
For the rs-fMRI postprocessing (Faria et al., 2012), the T1-WI and its respective segmentations are co-registered to the motion and slice timing-corrected resting-state time points. Intensity and motion "outliers" are extracted with ART (SPM toolbox). Time courses are extracted from all the cortical and subcortical gray matter regions defined in the atlases and regressed for physiological nuisance. For "denoising" the time courses, we used the six motion parameters as regressors, as well as CompCor (Behzadi, Restom, Liau, & Liu, 2007) to regress nonneuronal physiological activity. These procedures, automatically performed in MRICloud, are similar to those adopted by major rs-fMRI postprocessing pipelines (e.g., SPM) and are detailed described in our previous publication (Faria et al., 2012). Furthermore, we calculated frame-wise displacement (FD) using six motion parameters (Power, Barnes, Snyder, Schlaggar, & Petersen, 2012). No participant had FD > 0.5, and only one participant had FD > 0.3. Both groups had small average values of head motion (mean FD < 0.1). Still, the FEP group showed larger FD than HCs (p = .02), justifying our procedure of using the motion-corrected time courses for the analysis.
Seed-by-seed correlation matrices are obtained from the "nuisancecorrected" time courses and z-transformed by the Fisher's method.
Note that MRICloud pipelines include well accepted protocols to minimize artifacts, as those created by motion, in DWI and rsfMRI. These and other technical procedures are detailed in the original publications (Faria et al., 2012;Jiang, van Zijl, Kim, Pearlson, & Mori, 2006;Tang et al., 2014).
After the multimodal brain segmentation and quantification, each individual's brain was represented by a vector of image features: pairwise resting-state z-correlations from 70 seeds in the superficial gray matter (i.e., cortex) and the deep gray matter (i.e., basal ganglia plus thalamus). This process is represented in Figure 1.

| Statistical analyses
We investigated group differences between FEP groups and HC (HC vs. FEP, HC vs. S-FEP, and HC vs. M-FEP) in the various imaging modalities by using two-sample t tests. The False Discovery Ratio (FDR) (Benjamini & Hochberg, 1995) was used to correct for multiple comparisons at a "q" ("p-corrected") level of significance <.05.
The SIFA was implemented to integrate data collected from multiple imaging modalities while facilitating characterization with auxiliary covariates. For each modality, it is assumed that the data approach. Similarly to what is done in a sparse principal components analysis (Zou, Hastie, & Tibshirani, 2006), loading profiles were sparsified through a penalized regression to achieve the purpose of feature selection. The covariates for adjustment included group (FEP, HC, S-FEP, M-FEP) as well as race, sex, and age. Because in our study the data dimension is slightly unbalanced among modalities, we implemented the "SIFA-B" approach (Li & Jung, 2017), which is robust to unbalanced dimensions due to orthogonal and equal norm identifiability conditions. To test the significance of the coefficients in the regression models, confidence intervals were obtained using 500 bootstrap samples. Details about SIFA, the procedure to draw inference, and the classification approach are in Section A of the Supplementary material.
While group features may reveal pathological mechanisms, it is important to know if the multimodal features revealed by SIFA are expressed at an individual level. Therefore, we investigated the power of these image features to classify individuals within their respective diagnostic groups. It is also important to detect the effectiveness of models using multiple modality features, as compared with those using singular modality features for individual classification. For this purpose, we used leave-one-out cross-validated receiver operating characteristic (ROC) curves. The logistic classification models were trained using the latent factors estimated from the factor analysis.
The area under the curve (AUC) and the 95% confidence interval were calculated. We also calculated the sensitivity, specificity, and F1 score of the classification performance using the latent factors estimated from SIFA. We compared the performance with the support vector machine (SVM) approach considering three different types of kernel functions: linear, polynomial and radial kernels. For all approaches, five sets of data were considered to train the prediction model: (a) data of all modalities, (b) rs-fMRI data, (c) FA (from DTI), (d) MD (from DTI), and (e) T1-volumetric data. The data analyzed in this study and the analytical code are available under request to the authors.
F I G U R E 1 Schematic representation of the automated image parcellation using MRICloud (www.MRICloud.org). Each brain image is mapped to a set of multiple atlases and the pre-defined labels are applied to each original brain. T1-weighted images (for volumetric analysis) and DTI pipelines run in parallel. For the low-resolution modalities (e.g., rs-fMRI), the labels are brought to the original space by co-registering the T1-WIs. Through this process, the multiple MRI modalities are converted to a matrix of structures by image features, which represent each individual 3 | RESULTS

| Demographic analysis
Clinical and demographic variables are presented in Table 1. Since covariates were group-matched based on study design, the FEP group, its subgroups, and the HC group did not differ with respect to age, sex, race, and parental education. The FEP subgroups did not differ in illness duration and antipsychotic medication dosage.  Table 2). Association areas also showed abnormal DTI indices. Compared with the HC group, the FEP group showed lower FA and higher MD in the corona radiata and the inferior fronto-occipital fasciculus; lower FA in the white matter beneath the right superior frontal gyrus; as well as higher MD in the uncinate fasciculus, cingullum, and in the white matter beneath the inferior temporal and middle and inferior frontal gyri. In the deep nuclei, FEP showed lower FA in the globus pallidus, higher FA in the caudate, and higher MD in the thalamus and the putamen when compared with HC.

| Group comparison (FEP vs. HC) of imaging characteristics in each modality
With respect to rs-fMRI differences, the FEP group showed higher rs-fMRI z-correlations than the HC group between several pairs of regions. Regions most often detected as seeds of abnormal correlations were the thalamus, the cerebellum, the somato-sensorial cortex (parietal, post-and pre-central), and the cingulate cortex ( Figure 2, Table 2).

| Subgroup comparison (S-FEP and M-FEP vs. HC) of imaging characteristics in each modality
Given that the FEP group and the HC group display anatomical differences, we next addressed the possible FEP subgroup that might contribute most to these differences. As described in the Methods section, we subdivided the FEP subjects into S-FEP and M-FEP groups and compared these individual groups with the HC group.
The differences detected between S-FEP and HC were more widespread and had a higher effect size than those detected between FEP and HC. This is of great importance, considering the fact that S-FEP is a subset, and therefore a smaller group than the more general FEP group. Volumetric differences did not overcome the T A B L E 1 Demographic and clinical characteristics  threshold for multiple comparison correction, possibly due to the inclusion of individuals in early disease stage, whose brain structure was under minimum effect of the treatment and disease chronicity Vita, De Peri, Deste, Barlati, & Sacchetti, 2015).
In addition to what was observed in the FEP versus HC comparison, the DTI abnormalities spread to the putamen and the white matter beneath middle temporal ( Figure 2, Table 2). The rs-fMRI abnormalities had, in general, a higher effect size than in the FEP versus HC comparison (Table 2).
No differences in volume were detected between M-FEP and HC. Differences in DTI and rs-fMRI between M-FEP and HC were more constrained to a few regions and had a lower effect-size than those observed between S-FEP and HC ( Figure 2, Table 2).

| Multimodal characterization (SIFA) of FEP group
Given that the group comparison in the present study still involved many features from multiple MRI modalities (hundreds of volumes and DTI indices, and thousands of rs-fMRI), we next applied SIFA for data integration. As described in the Materials and Methods section, SIFA allows us to identify the latent factors (i.e., the combination of features) related to the different groups, as well as leverage information across modalities.
SIFA identified two common latent factors as different between FEP and HC; one factor in the S-FEP versus HC comparison, and one factor in the M-FEP versus HC comparison. The weights (sparsified loadings) of these factors are shown in Figure 3 and Table 3. The corresponding model coefficients and the 95% bootstrap confidence intervals are in Supplemental Figure S1.  Due to the sample size and lack of comparable external dataset for testing, we used leave-one-out cross validation to minimize overfitting.
Using the factors identified by SIFA yielded more robust and better performance compared with the SVM approaches. Note: For the DTI analysis of FA and MD, the white matter is categorized in projection, association, and commissural (corpus callosum) fibers. We also analyzed deep nucleae and the white matter as a whole ("total white matter"  Figure S1 were obtained following a bootstrap procedure.

| DISCUSSION
The goal of this study is to establish a pipeline that allows us to obtain meaningful brain imaging results from a relatively small sample size.
We demonstrated a successful case of utilizing an unbiased, datadriven, structure-based analysis to characterize FEP patients. We reduced the dimensionality of the data while still preserving the individual variability, and enhanced the statistical sensitivity and power (Glasser et al., 2016). As an example, with a sample size 50 times smaller than that recently used by the ENIGMA group (27) (Faria et al., 2015), a problem aggravated by the data-driven design. Therefore, the structure-based analysis is a complementary approach, rather than an alternative, to the voxel-based analysis.
In the same way, SIFA is a supervised approach that facilitates the association with auxiliary covariates compared with other methodologies for dimension reduction, such as joint ICA (Moosmann, F I G U R E 3 Characterization of FEP group and subgroups (S-FEP and M-FEP), compared with controls, by the SIFA. Representation of the regional loadings of the common factors that show significant difference between groups (two in the all FEP vs. HC, one in the S-FEP vs. HC, and one in the M-FEP vs. HC), in a glass brain. The loading values are reported in Table 3. Visualization with the BrainNet Viewer (http://www.nitrc. org/projects/bnv/, by Xia et al., 2013) T A B L E 3 Sparsified loadings of the common factors that show significant difference between groups in the supervised integrated factor analysis (SIFA)    Eichele, Nordby, Hugdahl, . A limitation of SIFA is that the estimation procedure was derived from the normal likelihood function, which assumes the data follow Gaussian distributions.
Joint ICA was designed for nonnormal data. Therefore, in our study, when implementing SIFA, proper data transformation was done to satisfy the normal assumption, for example, the functional connectivity was Fisher z-transformed. With strong associations between the covariates and the latent structure of the multimodal imaging data, incorporating the supervised effects from covariates improves estimation accuracy and interpretability. By its particular comprehensibility and power, the combination of SIFA with the structure-based approach is particularly relevant for translational studies, hypothesisgeneration, and for multimodal characterization of modest samples.
The power of this multimodal characterization is evidenced by the fact that multiple modality classifiers were shown to be more efficient than single modalities in classifying S-FEP individuals. Previous MRI studies applying multivariate machine-learning algorithms in neuroimaging have shown the potential to discriminate between individuals with schizophrenia and HCs (Cabral et al., 2016;Cetin, Houck, Vergara, Miller, & Calhoun, 2015;Du et al., 2012;Lei et al., 2020;Peruzzo, et al., 2015;Qureshi et al., 2017;Sui et al., 2013). The large range of discrimination accuracy previously reported (72-100%) is explained by the variably in samples sizes, differences in the populations and type of images analyzed, differences in validation approaches, and the "publication bias" (only the best performances are published). However, conclusions draw from patients with established schizophrenia, whose brain structure is known to be affected by the disease chronicity and long term treatment, In the present study, the accuracy of the classifier was increased by the combination of rs-fMRI abnormalities, which were more specific to the S-FEP group, with more "general" DTI abnormalities, which were greater in the S-FEP group. This points to the value in using multimodal data integration to stratify a heterogeneous population (e.g., FEP) into subgroups of potential clinical relevance. Our group previously reported greater cognitive impairment in individuals with schizophrenia as compared with those with bipolar disorder (Schretlen et al., 2007). Other studies attempted to perform subtype prediction (Arribas, Calhoun, & Adali, 2010;Calhoun, Maciejewski, Pearlson, & Kiehl, 2008;Costafreda et al., 2011;Ota et al., 2013;Pardo et al., 2006;Rashid et al., 2016;Sacchet, Livermore, Iglesias, Glover, & Gotlib, 2015;Schnack et al., 2014;Yang et al., 2018;Schretlen et al., 2007. Although most of these studies used single modality classifiers and focused on different subgroups, they all show predictive value in modeling of "spectrum-like" mental illness. The subgroup distinction supported by SIFA in the present study is in accordance to this notion, and adds proof of the potential value method for stratification in early disease stage. T A B L E 4 Area under the curve (95% confidence interval) for the leave-one-out cross-validated receiver-operating characteristic (ROC) curves classifying of FEP and controls Note: "All" includes features from all the modalities (T1-based volumes, DTI metrics (fractional anisotropy -FA and mean diffusivity -MD) and resting state fMRI.

F I G U R E 4
Leave-one-out cross-validated ROC curve for the classification of S-FEP and controls. The logistic models were trained using the factors (both common and individual) estimated from the SIFA. The model including multimodality-imaging features (volumes, FA, MD, and rs-fMRI synchrony) was the most effective on correctly classifying individuals with S-FEP, achieving an accuracy of 77% (Table 4) The comprehensive characterization of the FEP population and its subgroups highlights brain areas that may represent an important locus of the pathology. One of our main findings point to widespread abnormalities in DTI (FA increase and/or MD decrease) in projection and commissural pathways. This was a very robust finding, as diverse anatomically related segments showed the same pattern, agreeing with previous findings of single modality studies in FEP (Cheung et al., 2008;Faria et al., 2019;Lyall et al., 2018;Mitelman et al., 2007;Perez-Iglesias et al., 2010;Price et al., 2007;Schmidt et al., 2015;Wang et al., 2011;Whitford et al., 2010;Zhou et al., 2017) and in data-driven, large sample studies of schizophrenia patients (Kelly et al., 2017;Oestreich et al., 2017). Although stronger in S-FEP, most of the DTI features were shared in S-FEP and M-FEP and suggest involvement with common pathology.
In contrast to the widespread DTI features, within the functional modality we found more localized effects, and the thalamus was among the areas providing the greatest contribution to classification. The particular pattern of connectivity between the thalamus and the somato- that thalamus was identified as an important structure for classification "cross-modalities," adding evidences to its core involvement in psychosis.
The connections between thalamus and temporal areas, and among temporal areas and basal frontal areas, were also identified as important features for classification of FEP individuals, in agreement with similar reports in patients with established schizophrenia (Lei et al., 2020). This is physiologically reasonable given the role of these areas for cognitive functions and sensory integration. Although we are tempted to draw direct correlations between brain regions and function, results from multimodal integration must be interpreted as a spatially distributed pattern rather than focusing in individual regions or features. Together, our findings indicate that both functional and physical characteristics (note that volumes of different structures were identified as important features by SIFA, despite of the lack of volumetric group differences) are implicated in FEP at the individual level.
Although our methodology is optimized for relatively modest samples, increasing the cohort would allow us to test the models in independent data, as well as cluster patients into more specific groups. A second limitation is that most of the patients were receiving psychiatric treatment at the time of the scans. The value of these findings must ultimately be proved in drug-naïve cohorts. Information about self-education level, handedness, disease stage, and nonantipsychotic medications was not quantitatively available; these factors were not included in our analysis. Finally, the DWI was acquired with nonisotropic voxels, which may introduce issues related to partial volume effects. Despite these limitations, it is reasonable to infer that multimodal imaging features carry information about psychosis overall, FEP subgroups and FEP individuals. The present study may serve as a proof-of-concept for the potential of this methodology to be used in the study of a broader range of neurological and psychiatric disorders. The combination of multiple observables within neuroimaging and across nonimage domains is crucial for conditions like FEP and most other psychiatric disorders in which there is no single dominant discriminating feature. In these cases, the subgroup and individual characterization is more likely to reside in multiple features of small effect size that capture different aspects of the condition.

ACKNOWLEDGMENTS
We thank study participants and the recruitment team staff members

DATA AVAILABILITY STATEMENT
The data analyzed in this study and the analytical code are available under request to the authors.