Separating generalized anxiety disorder from major depression using clinical, hormonal, and structural MRI data: A multimodal machine learning study

Abstract

Background: Generalized anxiety disorder (GAD) is difficult to recognize and hard to separate from major depression (MD) in clinical settings. Biomarkers might support diagnostic decisions. This study used machine learning on multimodal biobehavioral data from a sample of GAD, MD and healthy subjects to differentiate subjects with a disorder from healthy subjects (case‐classification) and to differentiate GAD from MD (disorder‐classification).

Methods: Subjects with GAD (n = 19), MD without GAD (n = 14), and healthy comparison subjects (n = 24) were included. The sample was matched regarding age, sex, handedness and education and free of psychopharmacological medication. Binary support vector machines were used within a nested leave‐one‐out cross‐validation framework. Clinical questionnaires, cortisol release, gray matter (GM), and white matter (WM) volumes were used as input data separately and in combination.

Results: Questionnaire data were well‐suited for case‐classification but not disorder‐classification (accuracies: 96.40%, p < .001; 56.58%, p > .22). The opposite pattern was found for imaging data (case‐classification GM/WM: 58.71%, p = .09/43.18%, p > .66; disorder‐classification GM/WM: 68.05%, p = .034/58.27%, p > .15) and for cortisol data (38.02%, p = .84; 74.60%, p = .009). All data combined achieved 90.10% accuracy (p < .001) for case‐classification and 67.46% accuracy (p = .0268) for disorder‐classification.

Conclusions: In line with previous evidence, classification of GAD was difficult using clinical questionnaire data alone. Particularly cortisol and GM volume data were able to provide incremental value for the classification of GAD. Findings suggest that neurobiological biomarkers are a useful target for further research to delineate their potential contribution to diagnostic processes.


| INTRODUCTION
Generalized anxiety disorder (GAD) is among the most prevalent anxiety disorders in the general population (Beesdo, Pine, Lieb, & Wittchen, 2010; Kessler, Petukhova, Sampson, Zaslavsky, & Wittchen, 2012) and is associated with considerable burden for the individual and for society (Andlin-Sobocki & Wittchen, 2005; Hoffman, Dukes, & Wittchen, 2008). Concurrent comorbidity with major depression (MD) is high (Kessler, Chiu, Demler, Merikangas, & Walters, 2005) and typically between 40% and 60% (Carter, Wittchen, Pfister, & Kessler, 2001; Hunt, Issakidis, & Andrews, 2002). Previous research has shown both insufficient sensitivity in detecting a GAD patient as a case in real-world clinical settings and low specificity when separating a GAD diagnosis from MD (Calleo et al., 2009; Wittchen et al., 2002). Wittchen et al. (2002) found that only about two-thirds of all primary care patients with GAD but no depression were identified by their primary care physician as cases with any mental disorder, whereas case recognition was 85% in patients with comorbid GAD and MD. Only 34% of pure GAD cases and 43% of comorbid GAD cases were diagnosed with GAD. Calleo et al. (2009) reported that only 28% of all elderly GAD patients presenting in specialty medical clinics received a diagnosis of any anxiety or mood disorder and only 1.5% were correctly diagnosed with GAD. In other primary care studies, between 30% and 55% of GAD patients were recognized and correctly diagnosed (Munk-Jorgensen et al., 2006; Vermani, Marcus, & Katzman, 2011). GAD is particularly difficult to separate from MD: Calleo et al. (2009) report that the number of GAD patients receiving a diagnosis of a depressive disorder is more than twice the number of GAD patients receiving a diagnosis of an anxiety disorder.
In primary care settings, GAD recognition is facilitated by further clinical information such as the presence of a higher number of disorder symptoms, the presence of comorbid mental disorders, and patients primarily reporting nonsomatic symptoms to their physician (Wittchen et al., 2002). Additionally, detection of GAD in primary care might be supported using screening measures (Herr, Williams, Benjamin, & McDuffie, 2014) such as the GAD-7 (Spitzer, Kroenke, Williams, & Lowe, 2006) or the Anxiety Screening Questionnaire (ASQ; Wittchen & Boyer, 1998; Wittchen & Perkonigg, 1997). The correct diagnosis is of vital importance as it largely determines the choice of psychotherapeutic or pharmacologic treatment.
Improving the differentiation of GAD and MD during the diagnostic process is therefore essential to support clinical decisions.
The use of biomarkers based on the neurobiological differences in disorders has been proposed as one option for increasing diagnostic accuracy (for a review see Wolfers, Buitelaar, Beckmann, Franke, & Marquand, 2015). A useful biomarker has to provide sufficient sensitivity and specificity to predict a given patient's status on the individual level (Lueken et al., 2016; Savitz, Rauch, & Drevets, 2013). Machine learning algorithms have shown predictive potential for single-subject diagnostic purposes and may thus support personalized medicine approaches. Supervised machine learning algorithms such as support vector machines (SVM) have been used to investigate the potential use of these biomarkers for separating different disorders based on their neural correlates (Grotegerd et al., 2013; Lim et al., 2013; Lueken, Hilbert, Wittchen, Reif, & Hahn, 2015; MacMaster, Carrey, Langevin, Jaworska, & Crawford, 2014; Ota et al., 2013; Pantazatos, Talati, Schneier, & Hirsch, 2014; Schnack et al., 2014; Serpa et al., 2014; Takizawa et al., 2014). Given that GAD and MD show not only common but also separate neural correlates (Beesdo et al., 2009; Canu et al., 2015; Etkin & Schatzberg, 2011; Oathes, Patenaude, Schatzberg, & Etkin, 2015), machine learning might also be successfully applied to the problem of recognizing GAD patients and separating them from MD patients.
This study aims to use machine learning on multimodal biobehavioral data from a sample of subjects exhibiting GAD, MD, both disorders, or no disorder. In a first step, supervised machine learning based on SVM was used on the entire sample aiming to detect cases versus noncases, that is, subjects with a disorder versus healthy comparison subjects (case-classification). In the second step, SVM was used on patients only in order to detect GAD, that is, to differentiate subjects with GAD only or GAD with comorbid MD from those with MD only (disorder-classification). Clinical questionnaire data, cortisol release, and structural brain data including gray matter (GM) and white matter (WM) volumes were used separately and in combination. Classification based on clinical data was hypothesized to perform well for case-classification but not for disorder-classification. Given inconsistent results related to cortisol release in GAD and MD, no specific hypotheses were formulated for the hormonal data. Previous work in GAD and MD alone, however, suggested abnormalities related to cortisol for each of these disorders (for reviews see Dedovic & Ngiam, 2015; Hilbert, Lueken, & Beesdo-Baum, 2014; Staufenbiel, Penninx, Spijker, Elzinga, & van Rossum, 2013); therefore, classification based on hormonal data was expected to perform above chance level for both classification problems. Brain imaging data were hypothesized to perform well for both classification problems and thus provide incremental value for the detection and classification of GAD. GM volume input data were restricted to anatomically defined brain regions repeatedly reported in the GAD and MD literature (reviews and meta-analyses from Bora, Fornito, Pantelis, & Yucel, 2012; Bora, Harrison, Davey, Yucel, & Pantelis, 2012; Du et al., 2012; Hilbert et al., 2014; Kempton et al., 2011; Lai, 2013; Sacher et al., 2012) as recommended in Chu et al. (2012).
As no brain regions were identified that were consistently reported in the GAD and MD literature for WM volume, WM input data were only restricted to anatomically defined WM areas in the brain in general.

| Subjects
A convenience sample of subjects with GAD and/or MD as well as healthy comparison subjects was recruited from the outpatient centre for psychotherapy at the Institute of Clinical Psychology and Psychotherapy at TU Dresden and the general public. Inclusion criteria were a current diagnosis of GAD and/or MD according to DSM-IV-TR criteria (APA, 2000) for the clinical groups or no lifetime diagnosis of a mental disorder for the healthy comparison group. Subjects were excluded due to psychotropic medication, a nonremitted diagnosis of substance dependence, smoking of more than 10 cigarettes per day, or inability to safely obtain an MRI scan. As a result, n = 19 subjects with a diagnosis of GAD (n = 12 with comorbid MD), n = 14 subjects with a diagnosis of MD without GAD and n = 24 healthy comparison subjects were included. Current and lifetime diagnoses were determined using the Munich Composite International Diagnostic Interview (DIA-X/M-CIDI; Wittchen & Pfister, 1997) and confirmed by experienced clinicians. Appendix S1 provides an overview of comorbid disorders within the clinical groups. The Penn State Worry Questionnaire (PSWQ; Meyer, Miller, Metzger, & Borkovec, 1990), Beck Depression Inventory-II (BDI; Beck, Steer, & Brown, 1996), Intolerance of Uncertainty Scale-12 (IUS-12; Carleton, Norton, & Asmundson, 2007) and the trait version of the State-Trait Anxiety Inventory (STAI-T; Spielberger, Gorsuch, Lushene, Vagg, & Jacobs, 1983) were used as additional dimensional measures for characterizing all groups. MRI data of the GAD and healthy subjects included in this analysis have been used previously to investigate structural alterations in GAD. While the previous analysis aimed at informing neurostructural disease models, the present analysis chose a complementary view by testing the predictive value of brain morphology as a putative differential diagnostic marker for the indi-

| Analysis of demographic and clinical data
Chi-square tests and univariate analyses of variance were used for the analysis of demographic and clinical data as appropriate. Subsequent post hoc tests were used for pairwise comparisons. The level of significance was set at p < .05. SPSS 23 (IBM, New York, NY, USA) was used for all calculations. Clinical questionnaire sum scores of the PSWQ, BDI, IUS-12, and STAI-T were subsequently used as input data for classification.

| Acquisition and analysis of cortisol data
To determine cortisol release, saliva samples were acquired using Salivettes "code blue" (Sarstedt, Nümbrecht, Germany) at six time points over the course of the experimental procedure, including samples ca. 10 min before scanning, directly before scanning, and after four different MRI scans including three different tasks and a structural scan, covering a total of 100 min. Samples were stored at −20°C until being assayed using a commercial chemiluminescence immunoassay (IBL RE 62011) at the Chair of Biopsychology of the TU Dresden (Prof. Dr. Clemens Kirschbaum). Cortisol values were log-transformed to approximate a normal distribution. For an estimation of the total cortisol release we calculated the area under the curve with respect to the ground (Fekedulegn et al., 2007; Pruessner, Kirschbaum, Meinlschmid, & Hellhammer, 2003). One subject was excluded from further analyses due to an incomplete cortisol profile.
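The area under the curve with respect to the ground can be illustrated with the trapezoid rule. The following is a minimal sketch only: the function name `auc_ground`, the sampling times, and the cortisol values are hypothetical and are not taken from the study data.

```python
import math


def auc_ground(times_min, values):
    """Trapezoidal area under the curve with respect to ground (zero).

    times_min: sampling times in minutes, strictly increasing.
    values:    one measurement per time point.
    """
    if len(times_min) != len(values) or len(values) < 2:
        raise ValueError("need at least two matching time points and values")
    area = 0.0
    for i in range(len(values) - 1):
        dt = times_min[i + 1] - times_min[i]
        if dt <= 0:
            raise ValueError("time points must be strictly increasing")
        # Area of one trapezoid between consecutive samples.
        area += (values[i] + values[i + 1]) * dt / 2.0
    return area


# Hypothetical example: six samples spread over 100 minutes (assumed spacing).
example_times = [0, 10, 30, 55, 80, 100]
example_raw = [12.0, 11.0, 9.5, 8.0, 7.0, 6.5]  # made-up raw cortisol values
example_log = [math.log(v) for v in example_raw]  # log-transform first
example_auc = auc_ground(example_times, example_log)
```

A subject with a missing sample has no complete profile for this summation, which is one reason an incomplete profile leads to exclusion rather than to a shortened curve.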

| Structural MRI data acquisition and preprocessing
Imaging data were acquired on a 3-Tesla Trio-Tim MRI wholebody scanner (Siemens, Erlangen, Germany) with a 12 channel head coil located at the Neuroimaging Center of the TU Dresden.

| Pattern recognition
A total of eight separate classification analyses were conducted, depending on four input data modalities (clinical scores, cortisol data, GM data, WM data) and two classification problems: First, a classifier was trained to correctly classify subjects from both clinical groups (GAD and MD groups) as cases and subjects from the healthy comparison group as noncases. This included 33 subjects with a disorder and 24 HC subjects. Second, only GAD and MD subjects were used and the classifier was trained to correctly classify subjects according to their diagnostic category as GAD subjects (independently of whether comorbidity was present) or MD subjects. This included 19 subjects with GAD and 14 MD-only subjects. The following procedure was applied for all separate analyses: clinical questionnaire scores, cortisol release, GM maps, and WM maps were used as input for the PRoNTo toolbox (http://www.mlnl.cs.ucl.ac.uk/pronto/; Schrouff, Rosa, et al., 2013). For the MRI data analyses, an overall mask was used to restrict analyses to voxels for which every subject was able to provide data. An additional region-of-interest (ROI) mask restricting analysis to the anterior cingulate cortex (ACC), amygdala, prefrontal and orbitofrontal areas, the putamen and caudate nucleus, and the hippocampus and thalamus was applied for classification based on GM data given the recommendation to use feature selection based on prior knowledge if prior knowledge is available (Chu et al., 2012). Please see Appendix S2 for an additional whole-brain approach. These regions were anatomically defined according to the automated anatomical labeling atlas (AAL; Tzourio-Mazoyer et al., 2002) as implemented in the WFU PickAtlas toolbox (Maldjian, Laurienti, & Burdette, 2004; Maldjian, Laurienti, Kraft, & Burdette, 2003).
An additional ROI mask restricting analysis to WM according to the Talairach Daemon as implemented in the WFU PickAtlas toolbox (Lancaster, Summerlin, Rainey, Freitas, & Fox, 1997; Lancaster et al., 2000; Maldjian et al., 2003, 2004) was applied for classification based on WM data. Given the relative lack of prior studies reporting WM volume data in GAD, no additional ROIs were used for WM data. Input data were mean centered and normalized for all analyses. SVMs were used for classification within a leave-one-out cross-validation (LOOCV) framework. Sensitivity, specificity, and balanced accuracy of the resulting classification solution were calculated and permutation tests based on 5,000 iterations were used to assess the level of statistical significance set at p < .05. Weight maps and rank orders of the regional weight averages were calculated for the GM data as described in Schrouff, Cremers, et al. (2013).
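The evaluation logic described above (LOOCV, balanced accuracy, and a label-permutation test) can be sketched as follows. This is an illustration under stated assumptions, not the PRoNTo implementation: a nearest-class-mean rule on a single feature stands in for the SVM, all function names and the toy data are hypothetical, only 200 permutations are run to keep the example fast, and permutations scoring at least as well as the observed result are counted (a slightly conservative choice).

```python
import random


def balanced_accuracy(y_true, y_pred, positive=1):
    """Return (sensitivity, specificity, balanced accuracy)."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == positive and p == positive)
    fn = sum(1 for t, p in pairs if t == positive and p != positive)
    tn = sum(1 for t, p in pairs if t != positive and p != positive)
    fp = sum(1 for t, p in pairs if t != positive and p == positive)
    sensitivity = tp / (tp + fn)
    specificity = tn / (tn + fp)
    return sensitivity, specificity, (sensitivity + specificity) / 2.0


def nearest_mean_predict(train_x, train_y, test_x):
    """Stand-in for the SVM: pick the class whose training mean is closest."""
    means = {}
    for label in set(train_y):
        vals = [x for x, y in zip(train_x, train_y) if y == label]
        means[label] = sum(vals) / len(vals)
    return min(sorted(means), key=lambda label: abs(test_x - means[label]))


def loocv_predictions(xs, ys):
    """Leave each subject out once, train on the rest, predict the left-out one."""
    preds = []
    for i in range(len(xs)):
        train_x = xs[:i] + xs[i + 1:]
        train_y = ys[:i] + ys[i + 1:]
        preds.append(nearest_mean_predict(train_x, train_y, xs[i]))
    return preds


def permutation_p_value(xs, ys, n_perm=200, seed=0):
    """Share of label permutations that score at least as well as the true labels."""
    rng = random.Random(seed)
    observed = balanced_accuracy(ys, loocv_predictions(xs, ys))[2]
    at_least_as_good = 0
    for _ in range(n_perm):
        shuffled = ys[:]
        rng.shuffle(shuffled)
        score = balanced_accuracy(shuffled, loocv_predictions(xs, shuffled))[2]
        if score >= observed:
            at_least_as_good += 1
    return observed, at_least_as_good / n_perm


# Hypothetical one-dimensional feature with two well-separated groups.
toy_x = [1.0, 1.2, 0.9, 1.1, 1.3, 3.0, 3.2, 2.9, 3.1, 2.8]
toy_y = [0, 0, 0, 0, 0, 1, 1, 1, 1, 1]
observed_acc, p_value = permutation_p_value(toy_x, toy_y)
```

Balanced accuracy is the mean of sensitivity and specificity, so it is not inflated when groups differ in size, as they do with 33 cases and 24 healthy comparison subjects.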
Beyond classification based on a single input data modality, all available data were also integrated into a single decision on group membership and tested. Weight-adjusted voting for ensembles of classifiers (WAVE; Kim, Kim, Moon, & Ahn, 2011) weights the results from the single classifiers according to which classifiers performed better on difficult cases (i.e. cases which are often misclassified) and allows for the calculation of classifier-weights and case-weights. The classifier-weights can be used to achieve a final decision. For applying WAVE, however, data in every modality for every subject are needed.
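The idea of weighting classifiers by their performance on difficult cases can be sketched with a simple mutual-reinforcement iteration. This is a simplified illustration in the spirit of WAVE and should not be read as the exact algorithm published by Kim et al. (2011), whose update rule may differ in detail. The function names and the toy performance matrix are hypothetical.

```python
def wave_like_weights(performance, n_iter=100):
    """Iteratively derive classifier- and case-weights.

    performance[i][j] is 1 if classifier j classified case i correctly, else 0.
    Returns (classifier_weights, case_weights), each summing to 1.
    """
    n_cases = len(performance)
    n_clf = len(performance[0])
    clf_w = [1.0 / n_clf] * n_clf
    case_w = [1.0 / n_cases] * n_cases
    for _ in range(n_iter):
        # Cases missed by highly weighted classifiers count as more difficult.
        raw_case = [
            sum((1 - row[j]) * clf_w[j] for j in range(n_clf))
            for row in performance
        ]
        total = sum(raw_case)
        if total == 0:  # no classifier ever fails: keep current weights
            break
        case_w = [v / total for v in raw_case]
        # Classifiers that are correct on difficult cases gain weight.
        raw_clf = [
            sum(performance[i][j] * case_w[i] for i in range(n_cases))
            for j in range(n_clf)
        ]
        total = sum(raw_clf)
        if total == 0:
            break
        clf_w = [v / total for v in raw_clf]
    return clf_w, case_w


def weighted_vote(votes, clf_w):
    """Return the label with the largest summed classifier weight."""
    scores = {}
    for label, weight in zip(votes, clf_w):
        scores[label] = scores.get(label, 0.0) + weight
    return max(sorted(scores), key=lambda label: scores[label])


# Hypothetical performance matrix: 4 cases (rows) x 3 classifiers (columns).
toy_performance = [
    [1, 1, 1],  # easy case: every classifier is correct
    [1, 1, 0],
    [1, 0, 0],  # difficult case: only the first classifier is correct
    [0, 1, 1],
]
clf_w, case_w = wave_like_weights(toy_performance)
decision = weighted_vote(["GAD", "MD", "MD"], clf_w)
```

In this toy example the first classifier receives the largest weight because it alone solves the most difficult case. Because the weights depend on observed performance, they have to be estimated on subjects other than the one being classified.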
The subject with incomplete cortisol data was therefore excluded from all following analyses. As WAVE requires assessing the performance of the classifiers before the resulting weights can be used on a new case, a nested LOOCV framework was applied for the integration of classifiers, thus guaranteeing independence of predictions. Classifiers were trained and tested using LOOCV to assess the performance of each classifier for each subject and derive the classifier- and subject-weights in an inner fold. Afterwards, classifiers and their corresponding weights were used to classify a new subject that was part of neither the training nor the test sets in an outer fold. This procedure was again rotated in a LOOCV scheme. For significance testing of the combined classification, permutation testing was used as well: the classifier-weights resulting from the inner fold were used on permuted labels and the frequency of resulting predictions that were more accurate than the true prediction was counted. This procedure was done for 5,000 iterations. The p-value was subsequently calculated by dividing the number of better predictions during permutation testing by the number of permutations.

Table 1 depicts the sample characteristics per group. GAD (with and without MD), MD and healthy comparison subjects were comparable regarding sex, age, handedness, and education. They were also comparable in smoking status and overall cortisol release. GAD and MD groups showed significantly higher scores compared to the healthy comparison group in all clinical questionnaires. Clinical groups revealed comparable scores in each questionnaire except the Intolerance of Uncertainty Scale-12 (Carleton et al., 2007), for which GAD subjects scored significantly higher than MD subjects.

95.83%; see Figure 1). Case-classification using GM data resulted in a balanced accuracy of 58.71% (p = .09, sensitivity: 75.76%, specificity: 41.67%). Table 3 shows the averaged weights of the brain regions according to AAL.

| Case-classification
Particularly the putamen and amygdala were important for case-classification while areas such as the hippocampus or thalamus were ranked as comparably less important for both classification problems. Combining data from all four modalities using WAVE resulted in a nominally lower balanced accuracy of 67.46% (p = .0268, sensitivity: 77.78%, specificity: 57.14%) than cortisol or GM accuracy alone.

| Disorder-classification
Classifiers based on cortisol data and GM data were weighted significantly higher than classifiers based on WM and clinical questionnaire data (ps < .001), which were of comparable size (p = .579). Classifiers based on cortisol data were also weighted significantly higher than classifiers based on GM data (p < .001; mean-weights: clinical data: 0.17, cortisol data: 0.37, GM data: 0.28, WM data: 0.18).
Results from the additional analyses using whole-brain data were less accurate in classification than the results based on ROIs for separate classifiers but comparable for the combined approach. Please see Appendix S2 for further details.

| DISCUSSION
Generalized anxiety disorder is a common and impairing disorder, but its recognition, diagnosis, and differentiation from depression are well-known problems that hamper treatment decisions. This proof-of-  Contrary to clinical data, case-classification using cortisol and MRI data yielded only poor results. Nonsignificant results were achieved for GM data, WM data, and cortisol. This finding is in line with previous research. Previous studies reported differences between GAD or MD compared to healthy controls for cortisol (Bhagwagar et al., 2005; Hek et al., 2013; Hinkelmann et al., 2012; Mantella et al., 2008; Phillips et al., 2011; Steudte et al., 2011; Ulrike, Reinhold, & Dirk, 2013; Vreeburg et al., 2009; Wei et al., 2015) but the exact nature of these differences was mixed and some studies did not find such differences (Burke, Davis, Otte, & Mohr, 2005; see also the meta-analysis by Vythilingam et al., 2004). The heterogeneity of prior studies can be attributed to methodological differences in data collection (e.g. diurnal profiles, awakening response, and stress response) or sample characteristics (e.g. age or comorbidities). We here assessed the cortisol release in a 100 min window during the experimental investigation.
Because MRI scanning can be perceived as a stressful situation, including a stress-related cortisol reaction (Muehlhan, Lueken, Wittchen, & Kirschbaum, 2011), our data are more comparable to a challenge situation than to "baseline" release or diurnal profiles. Few data are available for GAD patients exposed to challenging situations. A study in adolescents with different anxiety disorders including GAD likewise indicated no differences between GAD and healthy subjects (Gerra et al., 2000). The result was contrary to hypotheses for MRI data as brain anatomical differences in GAD and MD in areas such as the amygdala or parts of the basal ganglia have been repeatedly reported (Bora, Fornito, et al., 2012; Bora, Harrison, et al., 2012; Hilbert et al., 2014; Kempton et al., 2011). These areas were also indicated as most important for case-classification in this study. The inability of the SVM to classify cases versus noncases based on cortisol and structural imaging data with better accuracy than questionnaire data alone might be related to the fact that subjects with different mental disorders were combined in one group. SVMs as applied here are linear classification algorithms. The inclusion of subjects with different disorders in one group also leads to the inclusion of brain scans with anatomical changes in different directions, for example, in the case of GAD and MD, increased and decreased GM volumes in certain frontal areas. The difficulty of finding a linear decision function that reliably separates both disorder groups, with their partly diverging abnormalities, from the mean of HC subjects might explain the poor results achieved for the classification using MRI data here and may likewise apply to the cortisol data.

| Differentiating GAD from MD (disorder-classification)
Separation of GAD and MD subjects based on clinical questionnaire data only resulted in poor accuracy. GAD and MD groups were very comparable regarding the range of questionnaire scores with significant differences being present only in the IUS-12. Hence, the IUS-12 most prominently contributed to the disorder-classification. This is in line with the hypotheses derived from studies reporting that GAD is difficult to diagnose in primary care settings (Calleo et al., 2009; Munk-Jorgensen et al., 2006; Vermani et al., 2011; Wittchen et al., 2002). We are not aware of studies in more specialized settings or in settings using more standardized diagnostic instruments, where diagnostic classification may be more accurate. On the other hand, while both the PSWQ and IUS-12 measure constructs closely related to GAD, neither worrying nor intolerance of uncertainty is exclusively related to GAD; both are also present in MD (Carleton et al., 2012; Chelminski & Zimmerman, 2003; Gentes & Ruscio, 2011; Starcevic, 1995). Screening questionnaires designed specifically for GAD such as the GAD-7, the ASQ or dimensional ratings such as the dimensional anxiety scales for DSM-5 (dimensional scale for GAD: GAD-D; Beesdo-Baum et al., 2012; Lebeau et al., 2012) may therefore be better suited for the task of detecting GAD and might have supported diagnostic classification. Generally, the integration of 14 studies in a meta-analysis by Plummer, Manea, Trepel, and McMillan (2016) indicated good sensitivity and specificity for the detection of GAD in different settings. Clinical interviews such as the SCID (First, Spitzer, Gibbon, & Williams, 1997) or MINI (Sheehan et al., 1998) served as reference. Fewer data are available for the GAD-D, for which good sensitivity but only moderate specificity have been reported. Particularly screening instruments such as the GAD-7 might therefore have provided additional information beyond the PSWQ and IUS-12.
Classifiers based on cortisol and MRI data performed better for the disorder-classification. Correct GAD classification rates of 74.60%, 68.05%, and 58.27% were achieved for cortisol, GM, and WM data.
While these accuracies are not sufficient for clinical use at this stage and the proof-of-concept nature of this study has to be kept in mind as well, these findings still provide first evidence that (neuro-)biological markers may provide incremental value supplementing clinical information. Accuracy for disorder-classification is overall comparable to the results of proof-of-concept studies based on structural MRI data in other mental disorders such as MD, which were reported to range from 67.6% to 90% (Costafreda, Chu, Ashburner, & Fu, 2009; Mwangi, Ebmeier, Matthews, & Steele, 2012; Patel et al., 2015). It is important to note that classification in this study was successful although a substantial proportion of GAD patients exhibited a comorbid depressive disorder. Inspection of the features related to classification revealed for cortisol that GAD subjects showed significantly lower cortisol release during the investigation. This is in line with the meta-analysis by Burke et al. (2005) indicating cortisol release in response to psychological stress in MD being comparable to healthy subjects but contrary to a report indicating also comparable cortisol release in GAD (Gerra et al., 2000). However, interpretation of the GAD result is somewhat difficult as this is the only study on cortisol release in response to stress in GAD, and it included only adolescent subjects as well as other anxiety disorders besides GAD. To the best of our knowledge, cortisol data have not been used to support classification of individual subjects so far.
Inspection of the brain areas associated with classification accuracy based on GM data suggests that mainly frontal and prefrontal areas provided information for differentiating GAD and MD. There is no study directly comparing GM volumes in pure GAD and pure MD, but results from work investigating structural correlates within one of the disorders suggest decreased GM volume in frontal areas for MD (Bora, Fornito, et al., 2012; Bora, Harrison, et al., 2012; Du et al., 2012; Kempton et al., 2011; Sacher et al., 2012), whereas increased GM volume has been reported for GAD (Schienle, Ebner, & Schafer, 2011).
Findings from separate classification were overall in line with the previous literature: clinical information performed well for case-classification but not for disorder-classification, while the reversed pattern was found for cortisol and MRI data. As a consequence, it seemed reasonable to combine both types of information. Accuracy rates resulting from this combined approach were comparable to accuracy rates resulting from the best respective modality. The combined approach classified more than ninety percent of all subjects correctly as cases and noncases and about two-thirds of all clinical subjects correctly as GAD subjects or pure MD subjects. Additionally, for both the case-classification and the disorder-classification, sensitivity was higher than specificity, that is, most cases and GAD subjects were recognized as such. This is advantageous given the consequences which would follow this decision under real-world circumstances, such as intervention. These results indicate that diagnostic markers based on biological information such as cortisol levels or brain anatomy might be helpful for complementing clinical data in the future, mostly in situations where classification is difficult.

| Limitations
There are limitations to the results obtained in this study. The aim of this proof-of-concept paper was to demonstrate how (neuro-)biological data might provide incremental value for the classification of GAD on an individual subject level. The results warrant further attention but also indicate the need for improving accuracy rates in future studies. At the moment, the potential biomarkers investigated here would likely be outperformed by standardized clinical interviews in terms of accuracy and cost-efficiency. Still, biomarkers bear potential for use in clinical contexts of mental disorders as, for example, first studies have provided promising findings for the prediction of treatment outcomes (Levine, Rabinowitz, Uher, & Kapur, 2015; Uher, Tansey, Malki, & Perlis, 2012). The sample size was small in this study, groups were unbalanced and GAD subjects had to be separated only from one other related disorder. Future studies should try to employ larger balanced samples and might want to include other disorders that share some characteristics with GAD as well, such as other anxiety disorders or somatoform disorders. This way, the task of recognizing GAD and separating GAD from other disorders would be harder and show more resemblance to the task in real-world clinical settings. While the inclusion of comorbidity in general made the clinical groups more heterogeneous and might therefore have reduced the accuracy of the classification, it also enhanced the similarity of the study samples to clinical GAD populations and therefore ensures the ecological validity of this investigation. Testing the classifiers on a second and independent dataset instead of in a LOOCV scheme would additionally increase the degree to which results can be generalized and are externally valid.
Classification accuracy might also be improved by including specific screening instruments for GAD such as the GAD-7 or GAD-D in the questionnaire modality or by including more sophisticated methods of measuring WM characteristics such as diffusion tensor imaging in the neuroimaging modality (Wen, Steffens, Chen, & Zainal, 2014). Furthermore, future studies could include further modalities such as functional MRI data, (epi)genetic data, or behavioral data.

| CONCLUSIONS
In this proof-of-concept study, we investigated the ability to accurately classify subjects according to the presence of a mental disorder and according to the presence of GAD using clinical questionnaire data, cortisol data, and structural MRI data. Results showed that cortisol and MRI data were particularly able to provide incremental value to the disorder-classification of GAD subjects beyond clinical questionnaire data alone. Classification based on combined data resulted in significant accuracy rates as well. Thus it seems possible that MRI data might be able to facilitate the correct diagnosis of GAD in the future. Further research on this question is warranted.