Identifying resting‐state effective connectivity abnormalities in drug‐naïve major depressive disorder diagnosis via graph convolutional networks

Abstract Major depressive disorder (MDD) is a leading cause of disability; its symptoms interfere with social, occupational, interpersonal, and academic functioning. However, the diagnosis of MDD is still made by phenomenological approach. The advent of neuroimaging techniques allowed numerous studies to use resting‐state functional magnetic resonance imaging (rs‐fMRI) and estimate functional connectivity for brain‐disease identification. Recently, attempts have been made to investigate effective connectivity (EC) that represents causal relations among regions of interest. In the meantime, to identify meaningful phenotypes for clinical diagnosis, graph‐based approaches such as graph convolutional networks (GCNs) have been leveraged recently to explore complex pairwise similarities in imaging/nonimaging features among subjects. In this study, we validate the use of EC for MDD identification by estimating its measures via a group sparse representation along with a structured equation modeling approach in a whole‐brain data‐driven manner from rs‐fMRI. To distinguish drug‐naïve MDD patients from healthy controls, we utilize spectral GCNs based on a population graph to successfully integrate EC and nonimaging phenotypic information. Furthermore, we devise a novel sensitivity analysis method to investigate the discriminant connections for MDD identification in our trained GCNs. Our experimental results validated the effectiveness of our method in various scenarios, and we identified altered connectivities associated with the diagnosis of MDD.

Globally, the total years lived with disability (YLD) of depressive disorders was 7.5% among all YLD, which has been ranked the highest of all disease (World Health Organization, 2017). Hence, depressive disorders are the leading cause of disability.
Despite the debilitating effects of MDD, the diagnosis of MDD is still made by phenomenological approach. Given the proximity to the psychiatric symptoms in terms of mood and cognitive dysregulation, brain MRI has been used to investigate the neural mechanisms of MDD (Kempton et al., 2011). Specifically, resting-state functional magnetic resonance imaging (rs-fMRI) has been widely used for the diagnosis of MDD by investigating altered functional networks while a subject is at rest (Anand et al., 2005;Craddock, Holtzheimer, Hu, & Mayberg, 2009;Greicius et al., 2007). In the meantime, more recently, the investigation of dynamic changes between connections beyond simple correlations has been attracting increasing interest (Geng, Xu, Liu, & Shi, 2018;Rolls et al., 2018). The notion of effective connectivity (EC) describes the influence of one neural system on another (Friston, Ungerleider, Jezzard, & Turner, 1994), in contrast to functional connectivity (FC) that denotes intrinsic correlations.
Several studies have revealed that the EC may be used as an efficient biomarker for the diagnosis of MDD. Specifically, (Schlösser et al., 2008) found that adolescents suffering from MDD exhibited a significant difference in EC between the amygdala and subgenual anterior cingulate cortex (ACC) during an emotion-relevant task. In addition, Geng et al. (2018) directly utilized both FC and EC measures as features for the diagnosis of MDD and established that the discriminative power of EC features is higher than that of FC features. More recently, using a large sample size (336 patients with MDD and 350 control subjects), Rolls et al. (2018) identified significantly altered EC measures in MDD, such as reduced connectivity from temporal lobe areas to the medial orbitofrontal cortex. These findings imply that the EC measures are beneficial for determining if it is altered in neurological disorders, in addition to FC in the resting-state paradigm in neuroimaging.
Several approaches such as dynamic causal modeling (DCM) (Park & Friston, 2013) and Granger causality (GC) (Granger, 1969) have been suggested for estimating EC. DCM is a commonly used approach; however, it requires the selection of seed regions of interest (ROIs) that are widely known as discriminant biomarkers in relevant literature rather than the whole brain connectivity due to computational complexity (Geng et al., 2018). GC, owing to its simplicity and ease of implementation, has been widely used to estimate the EC (Hamilton, Chen, Thomason, Schwartz, & Gotlib, 2011;Liao et al., 2011;Wu & Marinazzo, 2015). However, studies have shown that EC estimations given by GC cannot correctly determine the intensity of the actual causality in the time domain (Hu et al., 2012). In the meantime, structural equation modeling (SEM) (McIntosh, Rajah, & Lobaugh, 1999) has been successfully used as a statistical approach for investigating the EC (Büchel & Friston, 1997;Penny, Stephan, Mechelli, & Friston, 2004;Suk, Wee, Lee, & Shen, 2015;Wee, Yap, Zhang, Wang, & Shen, 2014;Zhuang, Peltier, He, LaConte, & Hu, 2008). The original work of SEM requires a large sample size to model complex relationships between brain activities.
In recent years, beyond the group-level analyses, there has been growing interest in using machine learning (ML) techniques to identify clinically meaningful phenotypes for clinical diagnosis. A typical ML pipeline for the diagnosis of MDD can be summarized as follows: feature extraction, feature selection, model training, classification, and performance evaluation. In studies that differentiate MDD patients from healthy controls (HC), the following have been used as features extracted from rs-fMRI: spatial independent components (Ramasubbu et al., 2016;Wei et al., 2013), the Hurst exponent (Jing et al., 2017), degree centrality , and regional homogeneity (Ma, Li, Yu, He, & Li, 2013). In addition, many previous studies also applied graph theory approaches (Bhaumik et al., 2017;Cao et al., 2014;Drysdale et al., 2017;Guo et al., 2014;Lord, Horn, Breakspear, & Walter, 2012;Sundermann et al., 2017;Wang, Ren, & Zhang, 2017;Yoshida et al., 2017;Zeng, Shen, Liu, & Hu, 2014;Zhong et al., 2017) to the preestimated FC for investigating the disrupted functional brain networks in MDD patients. A small number of MDD classification studies have utilized EC as the feature. In Geng et al. (2018), EC was estimated using spectral DCM with predefined ROIs, and then, it was used as the feature for MDD classification; in this case, four supervised learning classifiers are used: linear support vector machine (SVM), nonlinear SVM, linear regression, and k-nearest neighbor.
Recently, graph-based approaches have gained popularity in medical applications owing to their ability to accommodate complex pairwise similarities in imaging/nonimaging features between subjects (Parisot et al., 2018). They model individuals as vertices and associations or similarities between them as edges, which have been widely used for supervised (e.g., classification (Tong et al., 2017)) and unsupervised tasks (e.g., manifold learning (Brosch & Tam, 2013;Wolz et al., 2012) and clustering (Parisot et al., 2016)). In this study, we focus on disease classification using a graph-based model. In particular, a generalization of convolutional neural networks (CNNs) to an irregular graph domain, called spectral graph convolutional networks (GCNs), has been successfully applied to perform brain disease classification (Parisot et al., 2018). Specifically, (Parisot et al., 2018) utilized a population graph for GCNs, where a vertex represents a subject and an edge encodes pairwise similarities of phenotypic data and/or imaging features between subjects. This combines imaging and nonimaging data in a single framework and delivers competitive classification performance.
In this study, we go beyond the FC toward an EC-based approach using a group sparse representation leveraged with SEM in an unsupervised manner. Specifically, this group-constrained sparsity imposes similar connectional patterns among subjects but maintains individual differences in correlation weights. To identify MDD, inspired by Parisot et al. (2018), we exploit the spectral GCNs based on the population graph to successfully integrate our EC features and nonimaging demographic features. Furthermore, we devise a sensitivity analysis (SA) method for our learned GCNs to investigate discriminant EC measures for MDD identification. Through various scenarios, our experimental results validate the effectiveness of the proposed method in terms of extracted features, feature selection, and classifiers. Our main contributions can be summarized in two aspects as follows: • We estimated EC by using a whole-brain data-driven approach with low computational costs through group-constrained sparsity leveraged with SEM-like mechanism and used it for the diagnosis of MDD via GCNs for the first time.
• In addition to superior experimental results for MDD identification, through an SA for our learned GCNs, we successfully identified meaningful connectivities associated with the diagnosis of MDD that have been reported in psychiatry literature.

| Participants
We collected the rs-fMRI from 29 drug-naïve MDD patients recruited from the outpatients of the Korea University Anam Hospital (Seoul, Republic of Korea). These patients included 8 males and 21 females; their ages ranged from 19 to 60 years, and the mean age was 43.79 years (±13.06). The outpatients were prospectively recruited as participants who agreed to visit the clinic after 4 weeks, 8 weeks, and 6 months. We defined drug-naïve MDD patients based on the following two criteria: (a) those who were consistently diagnosed with MDD over the visits, and (b) those who had no record of prescribed medicine due to depressive symptoms at their first visit. The diagnosis was determined by board-certified psychiatrists based on the Structured Clinical Interview from the Diagnostic and Statistical Manual of Mental Disorders, Fourth Edition (DSM-IV) Axis I disorders. Basic demographic and clinical information such as family history of MDD and education level were acquired during the psychiatric interview at the clinic. The severities of depressive symptoms in all the participants were assessed using the 17-item Hamilton Depression Rating Scale (HDRS-17) (Hamilton, 1960) that reflects the degree of depression.
The participants, at each visit, were assessed using the HDRS-17, and MRI scanning was performed at the first visit.
A total of 44 HCs (17 males; 27 females) were recruited from the community; their ages ranged from 21 to 58 years. The recruitment was made with the help of an advertisement for those who voluntarily responded. The similar psychiatric diagnosis was carried out for HCs who were confirmed with none of any current symptoms and past history of psychiatric disorders. For both the groups, the participants who satisfied the criteria such as comorbidity of any other major psychiatric disorders, expressing psychotic features (i.e., delusion, hallucination), having a history of a serious or unstable medical illness including any primary neurological illness, and exhibited any contraindication to MRI scanning (e.g., metal implants) were considered inapplicable to the study. The protocol of the study was approved by the Institutional Review Board of Korea University Anam Hospital. In accordance with the Declaration of Helsinki, all the 73 participants signed a written informed consent prior to participating in the study.
All participants were acknowledged thoroughly to drop out of the study at any stage, but there was no participant who dropped out.
The demographic information is summarized in Table 1.
There have been consistent evidences that patients with MDD had lower educational attainment as compared to HCs (Lorant et al., 2003). This means that lower educational level is one of the essential components of MDD which could not be separable from the diagnosis of MDD. So, in regard to the significant difference (p-value = .018) between two groups in the education level, the distribution of the educational level between the two groups seems to appropriately reflect real-world clinical situations. The unbalanced distribution of the educational level between the two groups would influence the classification results. However, there is no reason not to utilize nonneuroimaging data with neuroimaging data in one classification model. In clinical psychiatry, ML-based approach primarily aims to build pragmatic model so that it can help psychiatrists to diagnose and treat mental disorders (Steele & Paulus, 2019 to take full advantage of available data and maximize the performance of the classification model. In our method, we combine imaging and phenotypic data such as educational level in a single framework by constructing GCNs to enhance the classifying performance.

| Data acquisition
Volumetric structural MRI scans were acquired using a 3.0 Tesla Siemens Trio whole-body imaging system (Siemens Medical Systems, Iselin, NJ). A T1-weighted magnetization-prepared rapid gradient-

| Preprocessing
We preprocessed data samples using the Data Processing Assistant

| METHODS
In this section, we describe our experimental approaches for distinguishing drug-naïve MDD patients from HCs based on rs-fMRI time series. As shown in the overall procedure ( Figure 1), we first estimate EC by a group sparse representation along with SEM in an unsupervised manner. This allows to impose similar connectional patterns among subjects but maintain individual differences in their network characteristics. We transform the estimated connectivity map into a vectorial feature space and further reduce its dimension based on statistically significant features while eliminating the redundant and less informative features in a univariate manner. The selected imaging feature vector and the phenotypic information (e.g., age, gender, etc.) of the subjects are incorporated into a population graph that F I G U R E 1 Overall framework of the proposed method for MDD identification. Test samples were marked with gray boxes to indicate that the test sample labels are never used during training. GCNs, graph convolutional networks; GSL, group-constrained Sparse LASSO; MDD, major depressive disorder; SEM, structural equation model forms the basis for our GCNs. A vertex represents each subject's acquisition, and an edge weight encodes the pairwise similarities of phenotypic information. By operating the spectral graph convolutions through the layers, the GCNs perform a binary classification between the MDD patients and HCs. In addition to MDD identification, we further introduce an SA method for our trained GCNs to detect discriminative EC measures.

| Sparse estimation of EC
To estimate the fMRI-derived features in the ML pipeline of MDD diagnosis, FC coefficients have been typically used (Bhaumik et al., 2017;Sundermann et al., 2017;Wang et al., 2017;Yoshida et al., 2017;Zhong et al., 2017). However, to validate the potential of the EC as a biomarker, we estimate the EC coefficients by leveraging the concept of SEM (Suk et al., 2015;Wee et al., 2014). Assume that a sequence of T-length mean time series of rs-fMRI from R ROIs is provided for subject n, that is, In this study, we hypothesize that the response of an ROI can be represented by a linear combination of those of other ROIs. That is, given the time course of the other ROIs excluding a target rth ROI, X nr n R T × R−1 ð Þ , we can formulate the time course of the target ROI as x r n = X nr n w nr n + e , where w nr n R R− 1 is a regression coefficient vector, and e is a zero-mean Gaussian distributed error vector. It should be noted that these learnable regression cate the causal relations between a target ROI and the other ROIs.
Further, motivated by a recent study (Supekar, Menon, Rubin, Musen, & Greicius, 2008) that validated the effect of sparsity constraints for detecting robust connections from noisy connectivities, we apply a group-constrained sparse least absolute shrinkage and selection (LASSO) (Wee, Yap, Zhang, Wang, & Shen, 2012) into our estimation of the EC. This sparse representation through ℓ 1 -norm penalization can provide a biologically plausible interpretation, following the fact that a brain region typically forms relatively few numbers of connections.
Hence, the objective function, ℒ(W \r ), is defined as follows: where α > 0 is a regularization parameter that indicates the magnitude of sparsity and k Á k 2,1 denotes an ℓ 2,1 -norm. The ℓ 2,1 -norm is derived from the summation of ℓ 2 -norms of kw nr n k 1 that is an individually imposed ℓ 1norm for each subject. This group-constrained sparsity not only captures the consistent characteristics among subjects, but also retains intersubject variability. It is noteworthy that self-to-self connections are ignored by filling the rth element with zeros for each ROI, where we newly defineŴ nr 1:N R R × N . The resulting unsupervised representation,Ŵ  In this study, we define f n described in Section 3.1 as our feature vector for each vertex. Regarding the adjacency matrix, we consider the similarities of both imaging and nonimaging phenotypic features (e.g., age, gender) between subjects (Parisot et al., 2018). Given a set of H phenotypic measures p n = p h n È É H h = 1 for subject n, each weight W ij between subject i and j is defined as follows:

| Population graph construction
where σ is a predefined kernel width of a Gaussian similarity function.
With respect to δ(Á), it depends on the type of phenotypic measure.
For example, δ(Á) is defined as the Kronecker delta function for categorical measures (e.g., subject's gender) or the unistep function for quantitative measures (e.g., subject's age) satisfying 1 iff j p h i − p h j j < γ; 0 otherwise, where γ is a threshold to be determined. Therefore, according to Equation (2), the edge weights increase when two subjects have a high similarity of vertex feature vectors and/or phenotypic measures. It is noteworthy that this population graph incorporates not only nonimaging features, but also imaging features, compared with many existing studies that use only imaging features for brain disease prediction.

| Graph convolutional networks for MDD identification
After constructing the population graph represented in Section 3.2, we learn the GCNs to predict the target labels of MDD/HC. To this end, we introduce a spectral graph convolution as the main building block in GCNs, which generalizes the conventional convolution operation in the Euclidean domain to irregular graphs. It requires the eigendecomposition of the graph Laplacian (Chung & Graham, 1997) to be computed, followed by a graph Fourier transform (GFT) (Shuman, Narang, Frossard, Ortega, & Vandergheynst, 2013).
First, our population graph is represented by its Laplacian matrix . Particularly, it can be normalized as ℒ = I N − D − 1=2 WD − 1=2 , where I N R N × N is an identity matrix, and the eigenvalues belong to the range of [−1, 1]. Accordingly, ℒ contains information about the connections between subjects and their similarities.
Following the property of the GFT, given vertex features F and a filter g θ that is a diagonal matrix parameterized with Fourier coefficients θ R N , the spectral convolutions are operated in the Fourier domain as g θ * F = g θ (ℒ)F = g θ (UΛU > )F = U g θ (Λ)U > F. Specifically, in this study, we apply filter approximation by representing g θ (Λ) as a Kth order Chebyshev polynomial function of the eigenvalues (Defferrard, Bresson, & Vandergheynst, 2016;Hammond, Vandergheynst, & Gribonval, 2011) is a set of polynomial coefficients. This provides the benefits of K-localization and cost-effective computation of convolution. Thus, the convolution can be rewritten as follows: On the basis of the spectral graph convolution, the overall model comprises multiple convolutional layers and a fully connected layer for the final prediction. In terms of the convolutional layer, layer-wise activations are propagated, thus resulting in the representation of the jth output graph for the (l + 1)th layer activation from the lth layer activation, as follows: where σ(Á) is a nonlinear activation function such as a rectified linear unit (ReLU) and θ i,j k is the (F in × F out ) vector of polynomial coefficients to be learned, and b l ð Þ j denotes the (1 × F out ) bias vector in the lth layer. Here, we assume that by the GCN training, the vertices connected with high edge weights become more similar as they pass through multiple layers .
Finally, the final prediction layer comprises the fully connected layer followed by a softmax activation function. That is, the GCNs output a prediction labelŷ n that describes the brain state (e.g., MDD or HC) of a subject n.
After training the GCNs, during the test, test samples are predicted with labels that maximize the probabilities of the softmax output.

| Sensitivity analysis for interpretation of GCNbased prediction
Many previous works have developed the methods to explain the predictions of deep learning models such as SA (Baehrens et al., 2010;Simonyan, Vedaldi, & Zisserman, 2013) and layer-wise relevancepropagation (Bach et al., 2015), and so forth. Recently, SA has been used in various applications such as medical diagnosis (Khan et al., 2001) and ecological modeling (Gevrey, Dimopoulos, & Lek, 2003), and so forth. However, to the best of our knowledge, interpretation techniques for GCNs have not been investigated yet.
Thus, we devise a novel SA method for analyzing our trained GCN model. That is, in addition to the diagnosis, it provides an interpretation of what enables the GCNs to reach their individual predictions, thus allowing the identification of significantly altered EC measures in MDD patients.
SA is a gradient-based model interpretation method. As shown in the Figure 2, it computes the norm k Á k q over partial derivatives for a differentiable prediction function with respect to the input (i.e., a sensitivity of the prediction based on the changes in the input). Given our prediction function g and the vertex feature input f n for subject n, relevance scores in SA are defined as follows: where k Á k q is the norm of the partial derivative. To represent the magnitude to which variations of the input contribute to the output, the ℓ 1 or ℓ 2 -norm can be used (Kardynska & Smieja, 2016). A high relevance score implies that changes in the EC value influence the diagnosis of MDD significantly.

| EXPERIMENTAL SETTINGS AND RESULTS
In this section, we validate the effectiveness of the proposed method for MDD identification by considering the following scenarios: (a) using FC or EC as features, (b) applying the feature selection or not, and (c) using GCNs or other ML method as a classifier.
Furthermore, we identify the discriminant connectivities from the magnitude of resulting relevance scores in our SA method. All the codes are available at "https://github.com/ejju92/EC_GCN."

| Experimental settings
For performance evaluation, we took a 10-fold stratified crossvalidation technique (Bishop, 2006 SVM is exploited, which is a widely used classifier for brain disease diagnosis (Chen et al., 2016;Craddock et al., 2009;Fan et al., 2011).
The SVM estimates an optimal hyperplane that best separates the two classes. We selected the model parameter C that balances between a regularization term in the set of {10 −5 , 10 −4 , …, 10 4 } by nested cross-validation.

F I G U R E 2 A schematic diagram of sensitivity analysis (SA) for our trained graph convolutional networks (GCNs). Gray lined arrows represent forward computation for major depressive disorder (MDD)/healthy control (HC) prediction, and purple dashed arrows denote gradient backpropagation of prediction with respect to input, resulting in the relevance scores
For the deep learning method, we evaluated BrainNetCNN (Kawahara et al., 2017) and discriminative/generative long short-term memory (LSTM-DG) (Dvornek, Li, Zhuang, & Duncan, 2019). The BrainNetCNN (Kawahara et al., 2017) is based on a CNN framework to capture the topological locality of structural brain networks. By taking the connectivity matrix as input, it uses novel edge-to-edge, edge-to-node, and node-to-graph convolutional filters for neurodevelopment prediction. With respect to the LSTM-DG (Dvornek et al., 2019), i.e., joint LSTM-DG network, it performs a multi-task learning of brain disorder identification and rs-fMRI time-series data generation, given the rs-fMRI ROI time-series as input.
When calculating the relevance scores in the SA, we used the ℓ 1 -norm that is the absolute of the partial derivative.

| Performance results and analysis
For a quantitative evaluation of the comparable scenarios illustrated in Section 4.1, we considered the following metrics: • ACCuracy (ACC) = (TP + TN)/(TP + TN + FP + FN).
where TP, TN, FP, and FN denote true positive, true negative, false positive, and false negative, respectively. Specifically, higher values of the sensitivity and specificity represent the lower chances of misdiagnosing each clinical label. We summarized the experimental results under various conditions in Table 2.
As presented in Table 2, our method of GCNs w/LASSO demonstrated the best performance with respect to all the metrics, compared to other competitive methods including SVM, BrainNetCNN (Kawahara et al., 2017), and LSTM-DG (Dvornek et al., 2019). From the experimental results, the following findings can be inferred: feature selection helps improve the performance in all scenarios. In particular, the effect of feature selection resulted in significant performance gains for high dimensional (R 2 ) EC feature vector, which is approximately twice higher than that of FC (R × (R − 1)/2) given To demonstrate the statistical power of our method, we conducted a power (1-probability of Type II error) analysis with R package (Kohl, 2019) that is based on a previous research (Flahault, Cadilhac, & Thomas, 2005). As shown in Table 2, the mean sensitivity (SD) of our classifier generated from 10-fold cross-validation is 0.566 ± 0.300. As the formula of a confidence interval is mean AE Z SD ffiffi In addition, in order to validate whether any observed difference between the proposed method and others is statistically significant, we conducted the McNemar' statistical test. We observed that the proposed method outperformed statistically (p − value < .05); the competing methods of BrainNetCNN (Kawahara et al., 2017) and GCNs for EC feature, SVM, GCNs, GCNs w/LASSO for FC feature, and LSTM-DG (Dvornek et al., 2019).
We compared the computational time 4 of the proposed method with that of our comparative methods in terms of training and test time (second) per epoch, as presented in Table 3. We measured the time on a NVIDIA GTX 1070 GPU. It is noteworthy that as our GCNs are tuning network parameters in a transductive manner, basically the learning process occurs in a testing phase only. Thus, the training and test time is identical.
Furthermore, we conducted a comparative experiment to estimate EC through GC analysis (GCA) for comparison with that of our proposed method. By using the estimated EC as feature, we performed MDD identification using GCNs, SVM, and BrainNetCNN (Kawahara et al., 2017) as classifier. The results are summarized in Table 4. It is noteworthy that with the GCA features, our proposed method was still superior to the competing methods in ACC, SEN, and AUC.

| SA-based interpretation
As described in Section 3.4, we conducted the SA for our GCNs to identify significantly altered EC measures in MDD patients compared to HCs. From the SA, we obtained the relevance scores estimated for N subjects, R = R n f g N n = 1 . Here, after averaging them over all subjects, the mean relevance scoresR were considered for analysis. Specifically, to investigate the discriminative EC measures, we selected the connectivities whose relevance scores were higher than (μ + 1.5 * σ), where μ and σ denote the mean and SD of the mean relevance scores, respectively. The selected connections are presented in Table 5 and  Table A2.
We examined the resulting LASSO coefficients for 13 connectivities chosen in the SA, as presented in Table 5. Considering that the mean coefficient for 107 connectivities is −0.00024, it is noteworthy that the coefficients for 13 connectivities have significantly high values and thus we believe that our GCNs well captured the informative features and their relations.

| DISCUSSIONS
In this study, we successfully distinguished drug-naïve MDD patients from HCs using GCNs. Hitherto, ML algorithms have been widely used for diagnosing MDD (Gao, Calhoun, & Sui, 2018). The accuracies of the performances ranged from good to excellent. For example, Lord et al. (Lord et al., 2012) and Wang et al. (Wang et al., 2017) reported 99.0 and 95.0% accuracy, respectively. Therefore, from the sheer number of reported accuracies, the difference in performance between ours and previous studies appears slight.
However, two distinguished features ensure the intrinsic reliability of our results. One is that we conducted a diagnostic evaluation of participants in the drug-naïve state. Measuring neuroimaging materials in the drug-naïve state is substantially important because drugs such as antidepressants have substantial effects on the structural (Dusi, Barlati, Vita, & Brambilla, 2015) and functional (Wessa &  indexes. To avoid the potential pitfall of cross-sectional design, it is necessary to ensure longitudinal diagnostic stability. However, if the participation in the study is postponed until 1 or 2 years after the initial diagnosis, the confounding effects of the antidepressants can become problematic. Therefore, as suggested in a recent review (Kim & Na, 2018), we partially solved this issue using the MRI of participants whose diagnostic stability were confirmed for at least 6 months. Many previous ML studies did not provide reliable information of these critical methodological issues. Both the aforementioned studies that reported better discriminating performances than our results (Lord et al., 2012;Wang et al., 2017) did not mention the selection procedure of participants in terms of longitudinal diagnostic instability. Regarding antidepressants medication, one study reported that all the participants were taking antidepressants (Lord et al., 2012), and another study did not provide medication-related information. We believe that the well-defined selection process of the participants rendered our results more reliable than those of previously conducted studies.  (Epstein, 2008;Mitchell, Czajkowski, Zhang, Jeffery, & Nelson, 2018); they are crucial in emotion regulation (Bubb, Kinnavane, & Aggleton, 2017;Maddock, 1999). Animal studies revealed that the retrosplenial cortex receives inputs primarily from the parahippocampal and prefrontal cortex (Sugar, Witter, van Strien, & Cappaert, 2011;Suzuki & Amaral, 1994). Indeed, the retrosplenial cortex is activated more than other regions in response to negative emotional words (Maddock & Buonocore, 1997). A possible mechanism by which the disturbed connectivity between the retrosplenial and parahippocampal cortices contribute to the MDD is through associative functions. Both the retrosplenial and parahippocampal cortices play a key role in the processing of contextual associations in MDD (Harel, Tennyson, Fava, & Bar, 2016). Broad scope and lively association exhibit a reciprocal relationship with positive mood and increased activity; narrow scope and ruminative pattern of thoughts tend to be associated with depressed mood, pessimistic thoughts of the future, and decreased energy (Bar, 2009;Harel et al., 2016;Nolen-Hoeksema, 2000). We speculate that the decoupling of the retrosplenial and parahippocampal can result in inappropriate associative processing that, in turn, contributes to the negative view of future.

| Limitations
This study has a few limitations that must be noted. First, the sample size (29 MDD patients and 44 HCs) may not be sufficiently large. Indeed, a recent study reported the characteristics of EC from the rs-fMRI of MDD patients (n = 336) as compared to HC (n = 350) (Rolls et al., 2018). However, a fundamental difference exists between the previous study and our study. Whereas the F I G U R E 3 Discriminative effective connectivities from the sensitivity analysis (SA) of our graph convolutional networks (GCNs). Each color denotes the following brain networks: (1) central visual network, (2) peripheral visual network, (3) somatomotor network, (4) dorsal attention network, (5) salience/ventral attention network, (6) limbic network, (7) control network, (8) default network, and (9) temporal parietal network. All the above networks follow 17 brain networks defined in the study of Thomas Yeo et al. (2011) previous study primarily examined the characteristics of EC in MDD via group-level analysis, we aimed to discriminate MDD patients from HCs using the individual-level approach. To the best of our knowledge, a GCN-based deep learning model for distinguishing MDD patients from the HCs has not been developed. Second, detailed sociodemographic variables (e.g., marital status, cohabitation, and socioeconomic status) and clinical variables (e.g., current and past suicide attempt, family history of psychiatric disorder, and/or suicide death) were not fully obtained in the MDD group.
Third, we discussed abnormal EC (e.g., disturbed bidirectional connectivity between parahippocampal and retrosplenial cortices) in relation with the characteristic symptoms of MDD (e.g., negative scope and rumination). However, we could not directly confirm such connections between EC and symptomatology in the case of MDD.
Future studies require a larger sample size and relevant instruments for the investigation of symptoms.

| CONCLUSION
In this study, we successfully estimated EC from rs-fMRI and developed the GCN model for discriminating drug-naïve MDD patients from HCs. We empirically exhibited the superiority of our method in various MDD classification scenarios, in terms of extracted features, feature selection, and classifiers. Because the performance ability did not provide any insight into the discriminant connectivity for the diagnosis of MDD, we devised a novel interpretation approach of our trained GCNs. Specifically, we applied the SA for the GCNs and selected the connectivities with high relevance scores. From the results of the SA, we could successfully identify regions that were previously identified as those associated with the MDD symptoms in the psychiatry literature. Thus, our results showed that EC may be promising for building deep learning-based models in the field of neuroimaging. Further studies with a larger sample size are required to validate our findings.