Integrating machining learning and multimodal neuroimaging to detect schizophrenia at the level of the individual

Abstract Schizophrenia is a severe psychiatric disorder associated with both structural and functional brain abnormalities. In the past few years, there has been growing interest in the application of machine learning techniques to neuroimaging data for the diagnostic and prognostic assessment of this disorder. However, the vast majority of studies published so far have used either structural or functional neuroimaging data, without accounting for the multimodal nature of the disorder. Structural MRI and resting‐state functional MRI data were acquired from a total of 295 patients with schizophrenia and 452 healthy controls at five research centers. We extracted features from the data including gray matter volume, white matter volume, amplitude of low‐frequency fluctuation, regional homogeneity and two connectome‐wide based metrics: structural covariance matrices and functional connectivity matrices. A support vector machine classifier was trained on each dataset separately to distinguish the subjects at individual level using each of the single feature as well as their combination, and 10‐fold cross‐validation was used to assess the performance of the model. Functional data allow higher accuracy of classification than structural data (mean 82.75% vs. 75.84%). Within each modality, the combination of images and matrices improves performance, resulting in mean accuracies of 81.63% for structural data and 87.59% for functional data. The use of all combined structural and functional measures allows the highest accuracy of classification (90.83%). We conclude that combining multimodal measures within a single model is a promising direction for developing biologically informed diagnostic tools in schizophrenia.

(mean 82.75% vs. 75.84%). Within each modality, the combination of images and matrices improves performance, resulting in mean accuracies of 81.63% for structural data and 87.59% for functional data. The use of all combined structural and functional measures allows the highest accuracy of classification (90.83%). We conclude that combining multimodal measures within a single model is a promising direction for developing biologically informed diagnostic tools in schizophrenia.

K E Y W O R D S
functional connectivity, graph theoretical analysis, machine learning, neuroimaging, schizophrenia 1 | INTRODUCTION Schizophrenia is a severe psychiatric disorder, characterized by delusions, hallucinations and disorganized thinking (Hu et al., 2016), which affects about 1% of the world's population (Dhindsa & Goldstein, 2016;Lieberman, Scott Stroup, & Perkins, 2006;Nowak, Sabariego, Switaj, & Anczewska, 2016;Rajji, Miranda, & Mulsant, 2014). Its etiology and neuropathology are not well understood, even though neuroimaging studies have revealed distributed structural and functional brain abnormalities (Karlsgodt, Sun, & Cannon, 2010;Oh et al., 2015;Oh et al., 2017). Schizophrenia is usually diagnosed by a clinical examination carried out by psychiatrists. However, accurate diagnosis can take up to 2 years due to its heterogeneous and fluctuating presentation. Given the importance of providing the right treatment to patients in the early stages of the illness, there is an urgent clinical need for an objective diagnostic test that could be used to detect the illness and reduce the risk of misdiagnosis without the need for a long follow-up.
While a small number of studies have reported higher accuracies using sMRI (96%; Pardo et al., 2006) and resting-state fMRI (100%; Fekete et al., 2013), these tended to include a small number of subjects (e.g., 18 subjects in total in Pardo et al., 2006 and 28 subjects in total in Fekete et al., 2013) and therefore the reliability of the findings is unclear. Considering that patients with schizophrenia show both structural and functional abnormalities (Fitzsimmons, Kubicki, & Shenton, 2013;Karlsgodt et al., 2010), accuracy might be improved by combining different neuroimaging modalities using a multimodal ML framework.
So far, a total of four studies have attempted to do this with the aim of detecting schizophrenia at the level of the individual patient (Du et al., 2012;Ota et al., 2013;Qureshi, Oh, Cho, Jo, & Lee, 2017;Sui et al., 2013). However, the results of these multimodal studies have been inconsistent with accuracies ranging 93-98% (Du et al., 2012), 72-88% (Ota et al., 2013), 99.29% (Qureshi et al., 2017), and 79% (Sui et al., 2013)  In this study, we aimed to classify patients with schizophrenia and healthy controls by combining structural (sMRI) and resting-state functional (rs-fMRI) data. For sMRI, gray matter and white matter volume were extracted and used as input for classification, whereas for rs-fMRI we used two most widely used resting-state metrics, amplitude of low-frequency fluctuation (ALFF) and regional homogeneity (ReHo). ALFF captures fluctuations in spontaneous, low-frequency oscillations; in contract ReHo reflects the temporal homogeneity of regional BOLD signals regardless of their intensities. Consequently, these two measures can be used to extract different types of information from the BOLD signal during the resting state. In addition, in light of current neurobiological models of schizophrenia as a dysconnectivity syndrome (Lynall et al., 2010;van den Heuvel, Mandl, Stam, Kahn, & Hulshoff Pol, 2010;Yu et al., 2011) and recent advances in the application of graph-based theoretical approaches to the human brain (E. Bullmore & Sporns, 2009;E. T. Bullmore & Bassett, 2011), we extracted connectome-wide based metrics from the sMRI (H. Wang, Jin, Zhang, & Wang, 2016) and rs-fMRI data (J. Wang, Zuo, & He, 2010) and used these as additional inputs for classification.
In order to assess the reliability of the findings, we used five independent datasets resulting in a total sample of 295 patients with schizophrenia and 452 healthy controls. Each dataset comprised highresolution T1-weighted images and rs-fMRI, allowing us to examine classification accuracy for each modality as well as their integration.
Because schizophrenia is considered to involve structural as well as functional brain abnormalities (Karlsgodt et al., 2010;Oh et al., 2015;Oh et al., 2017), we hypothesized that (a) both structural and functional data would allow single-subject classification with statistically significant accuracy. In addition, given the existing evidence for both regional and network-level alterations in schizophrenia (Y. Liu et al., 2008;Micheloyannis et al., 2006;Shen, Wang, Liu, & Hu, 2010), we hypothesized that, within each modality, (b) the combination of voxel-wise images and connectome-wide based matrices would be superior to the use of either voxel-wise images or connectome-wide based matrices by themselves. Finally, based on (a) and (b) (Poldrack et al., 2016). After receiving a verbal explanation of the study, participants gave written informed consent following procedures approved by the Institutional Review Boards at UCLA and the Los Angeles County Department of Mental Health.
Dataset 3 was acquired at Maastricht University, The Netherlands. Patients were recruited through clinicians working in selected representative geographic areas in the Netherlands and Belgium.
Diagnosis of schizophrenia was based on DSM-IV criteria (American Psychiatric Association, 2000), assessed with the Comprehensive Assessment of Symptoms and History (CASH) interview (Andreasen, Flaum, & Arndt, 1992). Exclusion criteria included confirmed or suspected pregnancy, any history of neurological disorders, a history of intellectual disability and/or a history of substance abuse/dependence within the last 12 months. The ethics committee of Maastricht University approved the study, and all the participants gave written informed consent in accordance with the committee's guidelines and the Declaration of Helsinki. pregnancy, and any chronic physical illness such as a brain tumor, hepatitis, or epilepsy, as assessed by clinical evaluations and medical records.
The study was approved by the ethics committee of West China Hospital, and written informed consent was obtained from all participants.

| MRI data acquisition
At each site, the high-resolution three-dimensional T1-weighted images and rs-fMRI images were acquired. Dataset 1 was acquired using a 3T Siemens scanner. The sequence parameters were as fol-

| MRI data analysis
Six individual measures were analyzed using support vector machine, including structural covariance matrix, gray matter and white matter volume, which are extracted from sMRI data, and functional connectivity matrix, ALFF and ReHo, which were extracted from rs-fMRI data ( Figure 1).

| Extraction of voxel-wise measures from structural data
Structural images were processed using Statistical Parametric Mapping software (SPM8; http://www.fil.ion.ucl.ac.uk/spm). In brief, individual structural images were first segmented into gray matter (GM) and white matter (WM) using the unified segmentation model (Ashburner & Friston, 2005). The resulting GM and WM maps were then normalized to the Montreal Neurological Institute (MNI) space using a high-dimensional "DARTEL" approach and subjected to nonlinear modulation to compensate for spatial normalization effects.
Finally, the GM and WM data were resampled to 1.5 mm 3 voxels and spatially smoothed (Gaussian kernel with a full width at half maximum of 6 mm). The preprocessed GM and WM volume images would then be used as features for the ML analyses.

| Extraction of covariance matrix from structural data
Following the preprocessing of the structural data, we constructed large-scale morphological brain networks for each participant based on their GM volume images. First, to define the network nodes, we parcellated the brain into different regions of interests (ROIs) in terms of automated anatomical labeling (AAL) 90 atlas. Next, to estimate internodal network edges, we utilized a Kullback-Leibler divergencebased (Kullback & Leibler, 1951) similarity measure to quantify morphological connectivity between two regions (Kong et al., 2014), which generated a 90 × 90 morphological brain networks matrix for each individual (H. Wang, Jin, et al., 2016).

| Functional data preprocessing
Image preprocessing was performed using SPM8 and DPARSF soft- the "head motion scrubbing" method proposed by Power and colleagues (Power et al., 2014) to ensure that motion artifacts were not contributing to the group differences we observed. For each participant, volumes with framewise displacement (FD) greater than 0.5 mm were identified and excluded. After these corrections, the images were spatially normalized to a 3 × 3 × 3 mm 3 MNI 152 template and then linearly detrended and temporally bandpass filtered (0.01-0.08 Hz) to remove low-frequency drift and high-frequency physiological noise. Finally, the global signal, the white matter signal, the cerebrospinal fluid (CSF) signal and the motion parameters (1.5 translational and 1.5 rotational parameters) were regressed out (Fox, Zhang, Snyder, & Raichle, 2009). None of the subjects included in the present investigation showed excessive head motion during scanning (defined as translational movement >1.5 mm and/or rotation >1.5 ).

| Extraction of voxel-wise measures from functional data
ReHo maps were extracted from the preprocessed images using DPARSF software. After removing linear trends in the unsmoothed images and applying a bandpass filter (0.01 < f < 0.08 Hz) to the data to reduce low-frequency drift and high-frequency respiratory and cardiac noise, ReHo maps were generated by calculating the concordance of Kendall's coefficient (with values ranging from 0 to 1) of the time series of a given voxel with those of its 26 nearest neighbors.
The ReHo value of each voxel was then standardized by dividing this value by the global (within the brain) mean ReHo value.
The ALFF was also calculated using DPARSF software. After application of a band-pass filter (0.01-0.08 Hz) and removal of linear trends, the time series were transformed to the frequency domain using fast Fourier transforms (FFTs). The square root of the power spectrum was calculated and was then averaged across 0.01-0.08 Hz for each voxel. This averaged square root was referred to as ALFF.
Finally, the ALFF of each voxel was standardized by dividing it by the global (within the brain) mean ALFF value for further statistical analysis.

| Extraction of functional connectivity matrices from functional data
The graph theoretical network analysis was performed using GRETNA software (http://www.nitrc.org/projects/gretna/) (J. . First, the whole brain was divided into 90 cortical and subcortical ROI-each representing a network node-using the AAL atlas.
Next, to define the edges of the network, we extracted the mean time series of each region and calculated Pearson's correlations of the mean time series between all pairs of nodes. This process resulted in a 90 × 90 weighted correlation matrix for each subject.

| Support vector machine
We performed all machining learning analyses using Python programming language, and made the scripts publicly available on https:// github.com/Warvito/integrating-multi-modal-neuroimaging. For each dataset, we used support vector machine (SVM) (Cortes & Vapnik, 1995) to perform single-subject classification. SVM maps the input vectors to a feature space using a set of mathematical functions known as kernels. In this feature space, the model finds the optimum separation surface that can maximize the margin between different classes within a training dataset. Once the separation surface is determined, it can be used to predict the class of new observations using an independent testing dataset. Here a linear kernel was preferred to a nonlinear one to minimize the risk of overfitting. The model was based on LIBSVM (Chang & Lin, 2011) and implemented by the Scikit-Learn library (Pedregosa et al., 2012).
During the multiple measure analysis, we combined the SVM predictions of single measures using a weighted averaging method (i.e., soft voting), as a previous study had reported that this method appeared to be slightly more effective than either sum of kernels or multikernel learning (MKL) (Pettersson-Yeo et al., 2014). In this approach, we first trained each SVM using a single measure; this allowed us to estimate the likelihood of an individual belonging to the patient or control group (the likelihood was calculated using Scikit-Learn library default method). Then, we calculated the weighted probabilities of each specific measure by multiplying its predicted probabilities by a coefficient (see Section 2.4.1 for how this was optimized). Finally, we calculated the average of the predicted weighted probabilities, and the group with the highest score was defined as the predicted class for a given subject.

| Evaluation of the support vector machines
For each SVM model, we used an independent set of individuals to perform a nonbiased assessment the performance. Specifically, a 10-fold stratified cross-validation scheme was used to separate the original samples (of each dataset) in 10 nonoverlapping partitions. In each iteration of the scheme, one partition was considered as the independent test set (where the performance metric is calculated), and the remaining subjects were defined as the training sample.
Within each training set, we performed an internal crossvalidation (i.e., 10-fold stratified nested cross-validation) to select the optimal set of hyperparameters of the ML models. The linear SVM has only one hyperparameter (the soft margin parameter C) that controls the trade-off between reducing training errors and having a larger separation margin. This parameter was optimized performing a grid search in the following range of values: C = 10 −3 , 10 −2 , 10 −1 , 1, 10 1 , 10 2 , 10 3 , 10 4 . At the end of this internal cross-validation, we had the optimum C value for each input measure.
When we performed a multiple measure analysis, after the grid searches for the C parameter, a second nested cross-validation was performed to optimize the coefficient of each specific measure for the soft voting. Each coefficient was evaluated using a grid search with a coefficient search space assuming an integer value between 1 and 10.
This second nested cross-validation was also performed using a scheme of 10-fold stratified cross-validation. In both nested cross-validation, the highest mean balanced accuracy of the model was used to find the best hyperparameter value.
After these nested cross-validation steps, an SVM model with the optimal set of values of the hyperparameters was trained using the whole training set. Its performance was assessed on the test set in terms of balanced accuracy, specificity, and sensitivity. The reported balanced accuracy, specificity, and sensitivity are the mean values of the metrics calculated on each partition of the crossvalidation scheme. Statistical significance was estimated using permutation testing (1,000 permutations). The whole training process was performed 1,000 times with the subject labels permuted.
The p-values were then obtained by dividing the number of times that the permuted version was better than the original performance by the number of permutations.

| Controlling for age and sex effects
For each measure of brain function (i.e., whole brain images, connectome-wide matrices or graph-based metrics) in each dataset, we build a regression model that represented how the measure varied with age and sex, and subtracted age-and gender-related variance from the actual measures. This was done using the Gaussian process regression method and kernel function that were used in a previous investigation (Kostro et al., 2014), with the regression model based on control subjects only. The resulting residuals would then be used as features for the ML analyses.

| Using different datasets for training and testing
In this study, the five datasets were acquired using different scanners and different scanning parameters. Because site-related differences are likely to be larger than differences between patients and controls, we did not expect it would be feasible to use subjects from one site as training sample and subjects from another site as testing sample.
Nevertheless, we explored the feasibility of this by using different datasets as training and testing data, and 10-fold cross-validation was used to assess the performance of the model.

| Which brain regions provided the greatest contribution to single-subject classification?
In order to explore which brain regions contributed to single-

| Multimeasure integration within a single modality
When combining the three measures extracted from structural images, namely GM, WM, structural covariance matrix, the balanced accuracy reached statistical significance for each of the five datasets (Table 2). Here the mean balanced accuracy of the five datasets was 81.63%. Likewise, when combining the three measures extracted from functional images, namely ReHo, ALFF and functional connectivity matrix, the balanced accuracy reached statistical significance for each of the five datasets.
Here the mean balanced accuracy of the five datasets was 87.59%. This pattern of results suggests that, within structural or functional modalities, the combination of connectome-wide based matrices and voxel-wise images allows a marginally higher accuracy of classification than the use of either connectome-wide based matrices or voxel-wise images alone.

| Multimodal and multimeasure integration
When combining all measures across structural and functional modalities, balanced accuracies reached statistical significance for each of the five datasets (

| Using different datasets for training and testing
The results are reported in Table 3; it can be seen that, as we expected, an algorithm developed using one dataset does not perform T A B L E 4 Ten brain regions making the greatest contribution to single-subject classification across the different measures well when applied to other datasets. Therefore, in our manuscript, we have opted to analyze and report the five datasets separately; this allowed us to detect reliable effects that were expressed across the different datasets.

| Brain regions providing the greatest contribution to single-subject classification
The 10 brain regions with the highest mean values across the five datasets are reported in Table 4 and represented graphically in Figure 2. It can be seen that, within the structural modality, the brain regions contributing to single-subject classification varied across our three measures of interest. Only the inferior temporal gyrus and middle temporal gyrus were detected consistently across GM and WM measures. In contrast, within the functional modality, the thalamus featured consistently across our three measures of interest.
In addition, the inferior temporal gyrus (functional matrices and ReHo), middle temporal gyrus (functional matrices and ALFF) and putamen (ReHo and ALFF) were detected in two of our three measures of interest. Taken collectively, these results suggest that the pattern of regions contributing to single-subject classification is dependent on the specific structural or functional measure being employed.

| DISCUSSION
In the present article, we aimed to classify patients with schizophrenia and healthy controls by combining sMRI and rs-fMRI data. In order to assess the reliability of the findings, we used five independent datasets. However, our investigation extends the results of these previous studies by showing for the first time that functional data allow a higher accuracy of classification than structural data. Interestingly, the single measure that achieved the highest performance of classification was functional connectivity matrix (83.07%). This is consistent with the notion that schizophrenia cannot be explained in terms of localized dysfunction within specific brain areas and is better understood as a disruption of network-level functional properties (Rish & Cecchi, 2017). In their examination of the relationship between functional and structural brain networks, Wang and colleagues have reported that functional connectivity profiles are largely shaped but not fully Note: All brain regions are identified using AAL (automated anatomical labeling). For matrix-based measures (i.e., Struct M and Func M), the vectors are absolute values of the weights for the connectivity between each brain region and the remaining 89 regions across the different folds of the crossvalidation. For voxel-wise measures (i.e., GM, WM, ReHo, and ALFF), the vectors are computed using a template mask based on the AAL atlas to extract the absolute value of weight for each brain regions across the different folds of the cross-validation. Abbreviations: ALFF, amplitude of low-frequency fluctuation; Func M, functional connectivity matrix; GM, gray matter; L, left hemisphere; R, right hemisphere; ReHo, regional homogeneity; Struct M, structural covariance matrix; WM, white matter.
determined by structural pathways (Z. Wang, Dai, Gong, Zhou, & He, 2015). This suggests that measures of structural connectivity may be complementary to measures of functional connectivity in the classification of individual patients (Cabral et al., 2016;Mitra et al., 2016). In the present investigation, the use of structural covariance matrix resulted in a mean balanced accuracy of 75.11%. This may confirmed the promising results of previous studies that had investigated this measure in individuals at familial risk for schizophrenia (Tijms et al., 2015) and patients with Alzheimer's disease (Tijms, Moller et al., 2013;Tijms et al., 2014;Tijms et al., 2016).
Consistent with our second hypothesis, within each modality the combination of voxel-wise images and connectome-wide based matrices marginally improved performance. Specifically, combining voxelwise images and connectome-wide based matrices within the structural modality improved accuracy to 81.63%, whereas combining voxel-wise images and connectome-wide based matrices within the functional modality improved accuracy to 87.59%. To our knowledge, no previous studies have combined voxel-wise images and connectome-wide based matrices to detect psychiatric or neurological disorders at the level of the individual. However, we know that the vast majority of psychiatric and neurological illnesses are associated with a combination of the regional and network-level brain (Fornito & Bullmore, 2015;Worbe, 2015). The results of our investigation, therefore, raise the possibility that the integration of these two types of information might also improve detection in other psychiatric and neurological illnesses.
Different neuroimaging modalities may capture different aspects of neuropathology and therefore may provide complementary information for detecting schizophrenia at the level of the individual patient. Recent studies had shown the advantages of using a multimodal approach for classifying Alzheimer's disease (Dai et al., 2012;D. Zhang, Wang, Zhou, Yuan, & Shen, 2011), Parkinson's disease (Long et al., 2012), PTSD (Q. Zhang et al., 2016) and schizophrenia (Qureshi et al., 2017). Consistent with our third hypothesis, we found that the highest accuracy (90.83%) of classification was achieved when combining all structural and functional measures within a multimodal, multimeasure model. Therefore, combining multimodal measures within a single model appears to be a promising direction for improving classification of individual patients with schizophrenia.
However, we note that the higher accuracy of classification resulting from multimodal integration does not necessarily imply clinical utility in real-world clinical practice. The clinical utility of a clinical test depends on several aspects such as the ability to generate a "divergent prediction" and inform subsequent interventions (Mechelli, Prata, Kefford, & Kapur, 2015). The eventual development of tools for detecting schizophrenia at the level of the individual, therefore, will ultimately require higher levels of diagnostic and prognostic F I G U R E 2 Regions providing the greatest contribution to single-subject classification. Ten brain regions making the greatest contribution to single-subject classification for each of our six measures of interest. In addition, we explored which regions provide the greatest contribution to single-subject classification. Within the structural modality, the brain regions providing the greatest contribution varied across our three measures (i.e., GM, WM, and structural covariance matrix).
In contrast, within the functional modality, we found that the thalamus was among the areas providing the greatest contribution to classification based all three functional measures (i.e., ReHo, ALFF, and functional connectivity matrix). Alterations in thalamic functional connectivity are a key feature of psychotic disorders (Woodward & Heckers, 2016) and include both reduced prefrontal-thalamic connectivity and increased sensorimotor-thalamic connectivity (Woodward, Karbasforoushan, & Heckers, 2012). These alterations are also evident in individuals at clinical high risk for schizophrenia, especially those who later go on to convert to psychosis (Anticevic et al., 2015), and therefore are thought to represent a marker of future risk. In addition to the thalamus, the inferior temporal gyrus and middle temporal gyrus contributed to classification in most of our functional measures of interested. The middle temporal gyrus and inferior temporal gyrus subserve a range of cognitive functions, including language processing, semantic memory, and multimodal sensory integration, that are impaired in patients with schizophrenia relative to healthy controls (Kuroki et al., 2006;Onitsuka et al., 2004). Interpretation of these findings must take the multivariate nature of our analytical method into account. While standard mass univariate techniques consider each voxel as a spatially independent unit, multivariate methods such as support vector machine may be additionally based on inter-regional correlations. An individual region may therefore display high discriminative power due to two possible regions: (a) a difference in volume between groups in that region; (b) a difference in the correlation between that region and other areas between groups. Thus, discriminative networks should be interpreted as a spatially distributed pattern rather than as individual regions. Taken collectively, these findings confirm that both subcortical and cortical networks are implicated in the neuropathology of schizophrenia at the individual level-consistent with current neurobiological models of the disease (Howes & Murray, 2014).
In the present study, when we pooled the five datasets, use sitestratified cross-validation and used leave-one-dataset-out cross-validation, the performance was very poor. This can be explained by the fact that the five datasets were acquired using different scanners and different scanning parameters, resulting in site-related differences larger than the differences between patients and controls. This aspect of our findings indicates that intersite differences remain a critical challenge in the development of imaging-based clinical tools and their translational implementation in real-world psychiatry. Future multisite imaging studies might benefit from the use of novel methods for removing site-related differences, such as feature harmonization (Xia et al., 2019;Yamashita et al., 2019).
The present study has several limitations. First, our data were acquired at five different sites using different scanners and acquisition parameters; on the other hand, the use of independent datasets allowed us to demonstrate the replicability of our findings. Second, the graph theoretical analysis of sMRI data was implemented using so-called spatial similarity methods (Kong et al., 2014;H. Wang, Jin, et al., 2016); however, there are alternative graph analytic methods based on intracortical similarity (Tijms, Series, Willshaw, & Lawrie, 2012) that could be used in the future to confirm our findings. In addition, our graphic theoretical analyses were based on the use of Pearson's correlations. Again, there are alternative approaches, such as partial correlation matrices and binary topology metrics, that could be considered in future studies. Third, since our investigation included both structural and functional data, we performed node selection using the AAL atlas, which can be applied to both modalities. Future studies could use alternative approaches such as newly developed functional parcellation (Gordon et al., 2016) to assess the reliability of our findings. Fourth, antipsychotics medication may lead to changes in brain structure (Ho, Andreasen, Ziebell, Pierson, & Magnotta, 2011) and function (Vogel et al., 2016). However, our results were consistent across the five datasets including Dataset 5 in which all patients were medication-naive; this suggests that our findings are unlikely to be explained by the effects of antipsychotic medication. Finally, a major challenge in the application of ML to high-dimensional neuroimaging data is the risk of overfitting that is, the learning of irrelevant fluctuations within a dataset that limits generalizability to other datasets. Here we minimized such risk through the use of region-level features, which are associated with less noise and lower risk of overfitting, rather than voxel-level data, which are associated with more noise and higher risk of overfitting (Vieira, Pinaya, & Mechelli, 2017).
In addition, the fact that our results were replicated in five independent samples, with the use of 10-fold cross-validation in each dataset, provides some reassurance about the reliability of the findings.
In conclusion, the present study demonstrates that functional measures allow classification of schizophrenia at the individual level with greater accuracy than structural measures, and that multimodal integration of voxel-wise images and connectome-wide based matrices improves accuracy relative to single-modality classification. These findings are consistent across five datasets with a multimodal model of the disease that includes structural and functional alterations that are expressed at regional and network-level. We propose that combining multimodal measures within a single model appears to be a promising direction for improving classification of individual patients with schizophrenia. However, the eventual development of clinical tools for detecting schizophrenia and informing treatment will ultimately require higher levels of accuracies than those reported in the present investigation. This might be achieved by combining neuroimaging measures with other types of data within a multivariate supervised ML framework.

Collection of Dataset 5 was funded by the National Natural Science
Foundation of China (grant nos. 81621003，81761128023 and 81227002). The funders had no role in the design and conduct of the study; collection, management, analysis, and interpretation of the data; preparation, review, or approval of the manuscript; and decision to submit the manuscript for publication.

CONFLICT OF INTEREST
The authors declare no competing financial interests.

DATA AVAILABILITY STATEMENT
Datasets 1 and 2 are available in the public domain, whereas Datasets 3, 4 and 5 are unavailable due to research data confidentiality. This complies with the requirements of the funding bodies and institutional ethics approval.