Integrating proteomic and clinical data to discriminate major psychiatric disorders: Applications for major depressive disorder, bipolar disorder, and schizophrenia

We report that integrating proteomic and clinical data enables objective differentiation between major depressive disorder (MDD), bipolar disorder (BD), and schizophrenia (SCZ). These major psychiatric disorders are associated with mortality and life-long disability. 1 However, objective discrimination of these disorders remains a formidable challenge. Thus, this study aimed to distinguish MDD, BD, and SCZ by integrating targeted/untargeted proteomic data obtained from liquid chromatography-mass spectrometry (LC-MS) and clinical data.

showed consistent expression level patterns across disease types, low inter-correlation with covariates (Figure S5A-C), and low interdependence between each other (Figure S6A-C).
Multiprotein-marker (MPM) models were constructed by LASSO (least absolute shrinkage and selection operator) with 100-times repeated 5-fold cross-validation, together with feature extraction and weighted model averaging, 2 in the training sets (Table S6 and Figure S7A-C). After model performance was evaluated in the validation sets as a function of selection fraction, the simplest models (selection fraction = 1) were selected, because performance increased only mildly for selection fractions ≥.8 (Figure 1A-C; Figure S8A-C). The final MPM models for differentiating MDD versus BD, MDD versus SCZ, and BD versus SCZ consisted of 17, 20, and 17 proteins, with AUROC values of .74, .82, and .78, respectively, in the independent test sets (Figure 1A-C). Owing to different analytical methods, the proteins in the MDD versus BD model differed from those in our previous study, except for ITIH2. 2 However, the current models were constructed with larger samples and expanded targets and were validated in an independent set, implying greater reproducibility. For each MPM model, the direction of each average coefficient corresponded to the alteration in expression (fold-change) (Figure 1A-C). The MPM models performed similarly in differentiating MDD, BD, and SCZ across different subgroups (Figure S9A-F), all of the proteins were less influenced by psychotropic medication (Figure S10), and only a few proteins showed associations with specific symptoms (Table S7). For BD in particular, the proteins were unrelated to depressive or manic symptoms. The mass spectral information of the proteins in the MPM models is presented in Table S8, and the alterations in their expression are presented in Table S9 and Figure S11. No protein overlapped across all three MPM models.
FIGURE 1 Development of multiprotein marker (MPM) models to discriminate disease types by machine learning. For each pairwise comparison, the selection fraction for proteomic candidate features (proteins), weighted average coefficient, and discriminatory performance are presented. The selected features (selection fraction = 1) in the MPM models are shown as pink bars. Weighted average coefficients corresponding to the selected features and their directions for disease types are presented. Discriminatory performance of each MPM model is presented as AUROC value in the training, validation, independent test, and total sets. Results of MPM models for (A) MDD versus BD, (B) MDD versus SCZ, and (C) BD versus SCZ. MDD, major depressive disorder; BD, bipolar disorder; SCZ, schizophrenia; MPM, multiprotein marker; AUROC, area under the receiver operating characteristic curve
FIGURE 2 Discriminatory and diagnostic performances of ensemble (ES) models combining MPM and SCLB models and comparison of the performances between ES and CRSB models. For each ES model, discriminatory performance is presented as AUROC value in the training, validation, independent test, and total sets. Diagnostic performance with the independent test sets is presented as accuracy, sensitivity, specificity, PPV, and NPV at optimal cutoff (Youden index).
FIGURE 3 Integrated protein networks and associated canonical pathways for proteins in MPM models. Integrated protein networks and the corresponding canonical pathways were generated. Two networks with network score ≥20 were integrated. For edge information, direct and indirect interactions are presented by solid and dashed lines, respectively. Canonical pathways associated with proteins in the network are presented as dotted lines (light pink). Regarding node information, shapes signify the molecular class of proteins defined in the legend, and colours surrounding the nodes represent expression patterns for each disease type. Overlapping proteins between MPM models are denoted by an asterisk. Each protein is presented as a gene name and the corresponding protein entry in parentheses. Alterations in protein expression are presented as fold-change for each disease type. MDD, major depressive disorder; BD, bipolar disorder; SCZ, schizophrenia; CP, canonical pathway; MPM, multiprotein marker
Symptom checklist-based (SCLB) models were constructed by generalized linear models (GLMs). Among all combinations of the dimensions of the Symptom Checklist-90-Revised (SCL-90-R), 3 the models with the highest discriminatory power were selected (Table S10 and Figure S12A-C). Then, ensemble (ES) models were constructed by combining the MPM and SCLB models through the stacking ensemble strategy. 4 Finally, clinician rater score-based (CRSB) models were constructed by GLMs, combining the total scores of the Brief Psychiatric Rating Scale (BPRS), 5 Hamilton Anxiety Scale (HAM-A), 6 Montgomery-Asberg Depression Rating Scale (MADRS), 7 and Young Mania Rating Scale (YMRS) 8 (Table S10). The discriminatory and diagnostic performances of the ES and CRSB models were overall comparable (Figure 2A-F and Figure S13A-C).
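The diagnostic measures reported for the models (accuracy, sensitivity, specificity, PPV, and NPV at the Youden-index cutoff) can be illustrated with a minimal sketch. It assumes binary labels (1 for the disorder treated as positive) and model scores where higher values favour the positive class. The function names and the toy data are hypothetical and are not taken from the study, and the sketch does not reproduce how the authors chose or applied their cutoffs.

```python
from typing import Dict, List, Tuple


def confusion_counts(labels: List[int], scores: List[float],
                     cutoff: float) -> Tuple[int, int, int, int]:
    """Return (TP, FP, TN, FN), calling a case positive when score >= cutoff."""
    tp = fp = tn = fn = 0
    for y, s in zip(labels, scores):
        positive_call = s >= cutoff
        if positive_call and y == 1:
            tp += 1
        elif positive_call and y == 0:
            fp += 1
        elif not positive_call and y == 0:
            tn += 1
        else:
            fn += 1
    return tp, fp, tn, fn


def youden_cutoff(labels: List[int], scores: List[float]) -> Tuple[float, float]:
    """Return the cutoff maximizing J = sensitivity + specificity - 1, and J.

    Assumes both classes are present. Ties keep the lowest cutoff.
    """
    best_cutoff, best_j = min(scores), -1.0
    for cutoff in sorted(set(scores)):
        tp, fp, tn, fn = confusion_counts(labels, scores, cutoff)
        j = tp / (tp + fn) + tn / (tn + fp) - 1.0
        if j > best_j:
            best_cutoff, best_j = cutoff, j
    return best_cutoff, best_j


def diagnostic_metrics(labels: List[int], scores: List[float],
                       cutoff: float) -> Dict[str, float]:
    """Accuracy, sensitivity, specificity, PPV, and NPV at a given cutoff."""
    tp, fp, tn, fn = confusion_counts(labels, scores, cutoff)
    return {
        "accuracy": (tp + tn) / (tp + fp + tn + fn),
        "sensitivity": tp / (tp + fn),
        "specificity": tn / (tn + fp),
        # PPV and NPV are undefined when no case is called positive/negative.
        "ppv": tp / (tp + fp) if (tp + fp) else float("nan"),
        "npv": tn / (tn + fn) if (tn + fn) else float("nan"),
    }


# Toy data for illustration only (not study data).
toy_labels = [0, 0, 0, 0, 1, 1, 1, 1]
toy_scores = [0.1, 0.3, 0.35, 0.6, 0.4, 0.7, 0.8, 0.9]
toy_cutoff, toy_j = youden_cutoff(toy_labels, toy_scores)
print(toy_cutoff, toy_j, diagnostic_metrics(toy_labels, toy_scores, toy_cutoff))
```

Because PPV and NPV depend on the proportion of positive cases in the evaluated set, values obtained this way describe the sample at hand rather than a general clinical population.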
For 43 proteins from all MPM models, an integrated network comprising up to two networks was predicted (Table S11 and Figure 3). Diseases/functions associated with the network included cellular movement (p = 7.87 × 10⁻²¹ to 1.61 × 10⁻⁷), cell-to-cell signalling and interaction (p = 9.14 × 10⁻¹⁰ to 1.61 × 10⁻⁷), immune cell trafficking (p = 2.3 × 10⁻¹² to 1.3 × 10⁻⁷), neurological disease (p = 7.47 × 10⁻¹² to 8.17 × 10⁻⁸), and psychological disorder (p = 6.09 × 10⁻¹² to 3.89 × 10⁻²). Furthermore, the network was related to significant canonical pathways, including complement and coagulation cascade dysregulation, neural signalling, and oxidative and inflammatory pathways, which have been replicated in previous studies (Figure 3). 2,9 In particular, reelin signalling, which is known to regulate neuronal migration and synaptogenesis in the brain and has been linked to MDD, BD, and SCZ, was a significant canonical pathway. 10
FIGURE 4 Proteins in MPM models with overlapping and consistent expression patterns between targeted proteomics and proteomic profiling. Clusters originating from DEPs of proteomic profiling analysis and their corresponding expression levels (represented as Z-score)
Through proteomic profiling, an analytically stable plasma proteome (902 quantified proteins) was constructed in each pooled sample for the four groups (Table S12 and Figure S14A-D). Subsequently, 267 differentially expressed proteins (DEPs) with 4 clusters, 347 DEPs with 5 clusters, and 339 DEPs with 4 clusters were determined for MDD versus BD versus healthy controls (HC), MDD versus SCZ versus HC, and BD versus SCZ versus HC, respectively (Table S13). The DEPs that had consistent significance and expression patterns in both targeted proteomics and proteomic profiling were as follows: ITIH2 for the MPM model of MDD versus BD, TFPI1 and ITIH2 for MDD versus SCZ, and C1RL for BD versus SCZ (Table S14; Figure 4A-C). The overall alterations in abundance of these 3 DEPs in each group are presented in Figure 4D. Further discussion of these key proteins is provided in the Supporting Information.
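The idea of keeping only proteins with consistent significance and expression patterns in both targeted proteomics and proteomic profiling can be sketched as a simple filter. Everything specific here is an assumption made for illustration: the significance threshold `alpha = 0.05`, the use of log2 fold-change sign as the "expression pattern", the function and type names, and the placeholder proteins P1 to P4 with made-up values. The study's actual criteria are described in its Supporting Information and may differ.

```python
from typing import Dict, List, NamedTuple


class ProteinResult(NamedTuple):
    """Per-protein comparison result from one platform (hypothetical layout)."""
    log2_fold_change: float
    p_value: float


def consistent_proteins(targeted: Dict[str, ProteinResult],
                        profiling: Dict[str, ProteinResult],
                        alpha: float = 0.05) -> List[str]:
    """Proteins significant on both platforms with the same direction of change.

    alpha is an assumed threshold, not a value reported in the study.
    """
    kept = []
    # Only proteins measured on both platforms can be compared.
    for name in sorted(targeted.keys() & profiling.keys()):
        t, p = targeted[name], profiling[name]
        both_significant = t.p_value < alpha and p.p_value < alpha
        same_direction = t.log2_fold_change * p.log2_fold_change > 0
        if both_significant and same_direction:
            kept.append(name)
    return kept


# Made-up values for placeholder proteins (not study data).
toy_targeted = {
    "P1": ProteinResult(0.8, 0.01),
    "P2": ProteinResult(-0.5, 0.03),
    "P3": ProteinResult(0.4, 0.20),
}
toy_profiling = {
    "P1": ProteinResult(0.6, 0.02),   # significant, same direction: kept
    "P2": ProteinResult(0.3, 0.01),   # significant, opposite direction: dropped
    "P3": ProteinResult(0.5, 0.01),   # not significant in targeted data: dropped
    "P4": ProteinResult(1.2, 0.001),  # absent from targeted data: dropped
}
print(consistent_proteins(toy_targeted, toy_profiling))
```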
Our study has limitations regarding sample size; the possibility of other potential confounders (including the duration of the current episode and medication dosage/duration) and proteomic targets; the cross-sectional study design; the biological interpretation of proteins in peripheral blood; and limited practicality as a diagnostic tool in clinical practice (Supporting Information). Nevertheless, we demonstrated the viability of integrating proteomic and clinical data to discriminate MDD, BD, and SCZ. We developed MPM and ES models for each pairwise comparison of groups and report their potential for differentiating and diagnosing these disorders.

CONFLICT OF INTEREST
The authors declare that there is no conflict of interest that could be perceived as prejudicing the impartiality of the research reported.

FIGURE 4 (continued) (D) Alterations in expression of the three selected proteins, which showed consistent statistical significance and expression patterns between targeted proteomics and proteomic profiling, across all disease groups and HCs are presented as a heatmap and line graphs. Alterations in protein expression are indicated by a red line, and average protein expression for each group is signified by a purple line. TFPI1 was upregulated in MDD and SCZ but downregulated in BD compared with HCs (MDD > SCZ > HC > BD). ITIH2 showed no difference in expression between MDD and HCs but was downregulated in BD and SCZ versus HCs (MDD≈HC > SCZ > BD). C1RL showed no difference in expression between BD and HCs but was downregulated in SCZ and MDD compared with HCs (BD≈HC > MDD > SCZ). MDD, major depressive disorder; BD, bipolar disorder; SCZ, schizophrenia; HC, healthy control; MPM, multiprotein marker; DEP, differentially expressed protein