Predicting escitalopram monotherapy response in depression: The role of anterior cingulate cortex

Abstract Neuroimaging biomarkers of treatment efficacy can be used to guide personalized treatment in major depressive disorder (MDD). Escitalopram is recommended as first‐line therapy for MDD and severe depression. An interesting hypothesis suggests that the reconfiguration of dynamic brain networks might provide important insights into antidepressant mechanisms. The present study assesses whether the spatiotemporal modulation across functional brain networks could serve as a predictor of effective antidepressant treatment with escitalopram. A total of 106 first‐episode, drug‐naïve patients and 109 healthy controls from three different multicenters underwent resting‐state functional magnetic resonance imaging. Patients were considered as responders if they had a reduction of at least 50% in Hamilton Rating Scale for Depression scores at endpoint (>2 weeks). Multilayer modularity framework was applied on the whole brain to construct features in relation to network dynamic characters that were used for multivariate pattern analysis. Linear soft‐threshold support vector machine models were used to separate responders from nonresponders. The permutation tests demonstrated the robustness of discrimination performances. The discriminative regions formed a spatially distributed pattern with anterior cingulate cortex (ACC) as the hub in the default mode subnetwork. Interestingly, a significantly larger module allegiance of ACC was also found in treatment responders compared to nonresponders, suggesting high interactivities of ACC to other regions may be beneficial for the recovery after treatment. Consistent results across multicenters confirmed that ACC could serve as a predictor of escitalopram monotherapy treatment outcome, implying strong likelihood of replication in the future.

modulation across functional brain networks could serve as a predictor of effective antidepressant treatment with escitalopram. A total of 106 first-episode, drug-naïve patients and 109 healthy controls from three different multicenters underwent resting-state functional magnetic resonance imaging. Patients were considered as responders if they had a reduction of at least 50% in Hamilton Rating Scale for Depression scores at endpoint (>2 weeks). Multilayer modularity framework was applied on the whole brain to construct features in relation to network dynamic characters that were used for multivariate pattern analysis. Linear soft-threshold support vector machine models were used to separate responders from nonresponders. The permutation tests demonstrated the robustness of discrimination performances. The discriminative regions formed a spatially distributed pattern with anterior cingulate cortex (ACC) as the hub in the default mode subnetwork. Interestingly, a significantly larger module allegiance of ACC was also found in treatment responders compared to nonresponders, suggesting high interactivities of ACC to other regions may be beneficial for the recovery after treatment. Consistent results across multicenters confirmed that ACC could serve as a predictor of escitalopram monotherapy treatment outcome, implying strong likelihood of replication in the future.

| INTRODUCTION
Major depressive disorder (MDD) affects approximately 216 million individuals during their lifetime (Katikireddi, 2017) and constitutes the second leading course of years lived with disability worldwide (Global Burden of Disease Study, 2015). Antidepressants are commonly used to treat MDD, especially for those with moderate to severe depression.
Escitalopram is an allosteric selective serotonin reuptake inhibitor (SSRI) that exhibits a superior efficacy and tolerability compared to other SSRIs in several randomized controlled trials (Cipriani et al., 2009;Sanchez, Reines, & Montgomery, 2014). On average, it takes 8-12 weeks for evaluating the treatment efficiency of individuals with MDD. The response rates of initially administered antidepressants normally ranges 50-75% (Listed, 2000). However, numerous studies have suggested that treatment with antidepressants may lead to a response rate of only 30-40%, resulting in a large number of patients with continued "alternative lifestyle" (Crane et al., 2017;Holtzheimer & Mayberg, 2011;Trivedi et al., 2005;Williams, 2017). In cases where patients do not respond to one SSRI, treatment can either be switched to another antidepressant (Whooley & Simon, 2000) or to an atypical antidepressant, which could conservatively be more effective than SSRIs (Papakostas, Thase, Fava, Nelson, & Shelton, 2007;H. R. Wang et al., 2015). However, it is noteworthy that every medication trial takes weeks, prolonging the disability. In the meantime, frustration due to delays in remission increases the risk of adverse outcomes such as suicide. If treatment response of individuals could be predicted, clinical intervention would be optimized.
Convergent neuroimaging biomarker studies focused on the main underlying pathophysiologic processes in depression provided a better understanding of treatment mechanism and a possibility for the development of personalized treatment strategies (Phillips et al., 2015). A systematic review (Dichter, Gibbs, & Smoski, 2015) linking restingstate functional magnetic resonance imaging (fMRI) with treatment response reported an increase in functional connectivity (FC) between frontal and limbic regions in responders. A higher connectivity within cognitive control network (CCN) and a negative correlation of the anterior cingulate cortex (ACC) with the subcallosal cortex (Kozel et al., 2011) were reported to be able to identify the preferred treatments for individuals with MDD. Cerebellar connectivity  and interaction among visual recognition circuits (L.-J. Wang, Kuang, Xu, Lei, & Yang, 2014) were also suggested to be potential predictors. In addition, intra/inter hyperconnectivity in relation to default mode network (DMN) has been identified in treatment resistant depression compared to treatment sensitive depression. This potentially suggested that lower DMN related connectivity is a feasible predictor of effective antidepressant medications (Ma et al., 2012).
Beata and others (Godlewska et al., 2018) have also shown that pretreatment pgACC activity is predictive of response to escitalopram after 6 weeks. FC of subcallosal cingulate cortex could identify individuals' treatment outcome to cognitive behavioral therapy (CBT) and antidepressant medicine .
However, it is important to note that most of these studies that examined the antidepressant effects on the connectivity or circuit level rarely explored the underlying dynamic characteristics. Recent studies have revealed brain modular-level abnormalities in MDD patients (Tao et al., 2013;Tian et al., 2019;Zheng et al., 2017), suggesting that modular-related properties may be more sensitive than regional properties in reflecting brain alterations. In addition, the dynamic brain network reconfiguration of schizophrenic patients with antipsychotic medication was reported to have a significant "network hyper-flexibility" (Braun et al., 2015). Therefore, the topic is of particular interest in light of growing understanding that MDD is not only associated with abnormalities of a single or independent brain region, but also with systems level dysfunction affecting discrete functionally integrated neural circuits. An advanced study should be able to extend the conventional methodological framework for a better understanding of the discrepancy concerning mechanisms of MDD.
Truly, the two complementary principles of large-scale brain networks namely, functional segregation and dynamic integration, became of considerable interest. However, it is challenging to take them into account simultaneously. In 2010, Mucha and Onnela (2010) developed a generalized framework to study the community structure of time-dependent, multiscale, and multiplex networks. It has been introduced on neuroimaging data for better understanding our brain (Bassett et al., 2011;Braun et al., 2015Braun et al., , 2016. After modularizing the multislice community structure over time, the node flexibility as the network parameters can be used to characterize the dynamic community structure. The node flexibility was defined as the number of times that node changed its modular allegiance normalized by the total number of changes that were possible across the scanning time (Bassett et al., 2011). It was suggested that successful brain function might partly depend on a set of regions whose allegiance to putative functional modules is flexible through time to smooth the function transition (Hermundstad et al., 2014). In this study, we try to find a personalized escitalopram monotherapy treatment marker for treatment prediction, with an interesting hypothesis that reconfiguration of dynamic brain network might provide important insights into the depressive disorder and also be potential for effective antidepressant treatment prediction.

| Study population
In this cohort study, we examined a total of 106 first-episode, drugnaïve patients from three different hospitals. Thirty-five Han Chinese depressive inpatients were recruited from Nanjing Brain Hospital between May 2010 and October 2017 (Sample 1). Thirty-six outpatients were enrolled from the Nanjing Drum Tower Hospital between September 2014 and October 2017 (Sample 2). Thirty-six Chinese outpatients were recruited from the Peking University Institute of Mental Health between May 2010 and July 2016 (Sample 3). In addition, a total of 109 healthy controls from these centers were recruited for comparing analyses.

| Inclusion and exclusion criteria
First-episode patients suffering from an acute episode of depression, with a HDRS-17 total score > 17, were included. The patients needed to have experienced symptoms of depression for ≤24 months but ≥1 month. None of the subjects have repetitive transcranial magnetic stimulation (rTMS), CBT, or other forms of psychotherapy during the study period. Subjects with a history of alcohol and substance abuse were also excluded. Patients that had comorbidity with other Axis I or Axis II disorders psychiatric illnesses were ruled out. Pregnant patients and those who are unable to undergo a MRI scans were excepted.
The patients who changed medications due to serious side effects were not considered for further analysis. Similar exclusion criteria were made for controls.
In this study, responders were conventionally defined as those having a reduction of at least 50% in HDRS-17 scores at endpoint (≤8 weeks). These patients showed a clinical improvement to escitalopram treatment, with a more than 20% decrease from the baseline HDRS-17 scores at 2 weeks. Patients who changed medications or received electroconvulsive therapy due to poor improvement (a reduction of less than 20% in HDRS-17 scores) after at least 2 weeks escitalopram administration or had a reduction of less than 50% in HDRS-17 scores at endpoint were defined as nonresponders. One participant who had large head motion from Site 1 and one who had cerebral cysts from Site 2 were discarded from further neuroimaging analysis.

| Escitalopram administration
All participants underwent a baseline functional MRI scan, following which they consented to commence escitalopram treatment at a dose of 10 mg/day. Afterward, the dose for each individual was determined by each patient's response and tolerance. After 7 days, patients had their escitalopram dosage increased during the study period. The final escitalopram dosage in the three multicenters (Nanjing Brian Hospital/Drum Tower Hospital/Peking University Institute of Mental Health) were 20 mg/day (n = 15/4/11), 15 mg/day (n = 15/20/19), and 10 mg/day (n = 4/10/6). The average dose (±SD) at the endpoint was 16.6 ± 3.4/14.1 ± 3.1/15.7 ± 3.4 mg.

| MRI data acquisitions
Participants were not allowed to carry any piece of metal in the magnetically shielded room. They were instructed to keep their eyes closed, not fall asleep and minimize movement. Head motion was confined to less than 1.5 mm in any direction. No subjects were reported to fall asleep during the scanning.
Data from the first and second cohorts were collected by a 3.0T Siemens MRI Scanner (Siemens Medical solutions, Germany) equipped with a 12-channel neurovascular array coil. Resting-state functional images were recorded using echo-planar imaging sequence, yielding wholebrain coverage in all participants. Images from the cohort Sample 3 were also acquired by a 3.0T Siemens MRI Scanner with the following parameters: TR/TE = 2,000ms/30 ms, FA = 90 , matrix = 64 × 64, thickness/ gap = 4.0 mm/0.8 mm, number of slices = 30, and lasting scan time = 7 min. The parameters for the T1-weighted anatomic images were TR/TE/FA = 2,300 ms/3.01 ms/9 .

| Image preprocessing
The first 10 functional volumes were deleted for signal equilibrium and to allow the participants' adaptation to the machine noise. Data preprocessing was handled via the Data Processing Assistant for Resting-State fMRI (DPARSF) (Yan & Zang, 2010)  The functional images were then normalized into the Montreal Neurological Institute (MNI) space and the parameter was set as 3 × 3 × 3 mm 3 . After that, functional images were smoothed with 6-mm full-width at half-maximum Gaussian kernel and band-pass filtered (0.01-0.08 Hz) to reduce the effects of low-frequency drift and high-frequency noise.

| Data analyses
The signals of each region were extracted via a sphere with a 6 mm radius based on a template in a previously published article by Allen et al. (2014). A table for the definition of these regions by node location and MNI coordinates has been added in the Table S2. The whole brain dynamic FC was determined using pair-wise Pearson correlation between 95 regions of interest with sliding window width = 45 TRs, sliding step = 1 TRs. The resulting correlation matrices of each subject were partitioned into time-dependent communities using a multilayer community detection algorithm (approach details can be referred to the Supporting Information and our previous work; Shao et al., 2019;Tian et al., 2019). This framework embodies the notion of segregation and integration, as the presence of modules/communities reflects the balanced interactive between intra-and interclusters. Hence, community detection methods enable the investigation of the interplay between functional segregation and integration of brain, together with numerous dynamic characteristics that other standard graphtheoretical measures such as degree, rich club, and small-word property cannot offer.
Subsequently, module allegiance (MA) matrices, demonstrated the probability of two nodes being assigned to the same module/communities, were utilized to analyze the dynamic integration across largescale networks. We obtained MA matrices by making contingency tables (1 if two regions were assigned in the same community; and 0 otherwise) and averaging all the variables such as time and iterations. The larger the MA presented, the stronger connection the pair of nodes possessed over the time domain. As was suggested via the findings of Bassett, Yang, Wymbs, and Grafton (2015), the intercluster interaction can be better delineated by the MA matrices than done by FC matrices.
In order to capture the temporal variability for nodes in MA, we calculated a time-dependent system flexibility vector F. After normalized by the total number of time windows, each element F i indicates the times that a node i changes its modules between two consecutive time windows in personal level. In this way, the representative flexibility resulted in a total of 95 features (one for each region) for predicting individual treatment outcome.

| Multivariate pattern analysis
Multivariate pattern analysis (MVPA) was a well-suitable method and was conducted to discern slight differences underlying the F values (Mohr, Wolfensteller, & Ruge, 2017). The input features for the MVPA were constructed by identifying subsets of data that are most relevant to the treatment outcome using minimum redundancy maximum relevance (mRMR) (Peng, Long, & Ding, 2005). As a popular approach over neuroimaging studies, the linear soft-threshold support vector machine (SVM) model (linear kernel, soft margin C = 1) (Schrouff et al., 2013) was used to separate responders from nonresponders.
The model accuracy was tested using leaving-one-out cross validation (LOOCV) (Cawley & Talbot;Varoquaux et al., 2016). In this procedure, each subject was left out once and used to test the prediction model trained on the other subjects. In addition, we used a discriminative mapping approach to visualize the relative contribution of the different brain regions for the classifier decision. To eliminate site effects, the leave-one-group-out analyses were also applied.

| Permutation test
The statistical significance of the accuracy and weight was tested by randomly permuting the labels of the training samples with the derived models. This process was repeated 1,000 times to determine the null-distributions of accuracies/weights. The p value of accuracy/ weight was calculated from the permutation procedure that had a larger accuracy/weight compared to the null model. The value and significance of the weights provided an indication of the relative importance of the respective features for classification and prediction.
It should be noted, however, that these values must be interpreted with care. The weights indicated the direction and quantified the contribution of each node to classifier decision rather than unfolded the difference between classes.
F I G U R E 1 Dynamic modular structure and functional connectivity.

| Statistical analysis
The interaction among meaningful discriminative regions was compared between responders and nonresponders according to twosample two-tailed t tests. In order to account for multiple comparisons and counteract the likelihood of false positives, false-discovery rate (FDR) correction was applied (Storey, 2003). The statistically significant difference was set at p < .05.

| Demographic and clinical characteristics
The demographic and clinical characteristics of the subjects were summarized in Table 1. There were no statistical differences in the demographic variables between responder and nonresponder individuals within the three separate cohorts, respectively, including age, gender, years of schooling, and symptom scores. There were no statistical differences in the demographic variables between depressive individuals and healthy controls as well (see Table S1). The changes of depression severity were scatter plotted in Figure S1.

| Dynamic network module
Dynamic modules, in this case, represent the groups of mutually correlated brain regions, which are weakly connected to the rest of network along time. We investigated modular organization across the brain networks in both responders and nonresponders using multilayer modularity framework. The multilayer community assignment of one randomly selected responder and nonresponder were showed in

| SVM predictors for effective escitalopram treatment
There was no significant difference in flexibility between the responders and nonresponders (see details in Figure S4). Feature wrapping approach of mRMR selected the related flexibility features while SVM models were applied to differentiate responders from the other patients. The real LOOCV accuracy was 79.41%, while sensitivity and specificity were 84.21% and 73.33%, respectively (see Figure 2a).
Sample balance is really an important point in modeling, especially for small sample cases (Scheinost et al., 2019). In order to retest the robustness of our model, one participant from each of the two groups was taken away while the above framework was reapplied to the remaining subjects. Then, we cross-checked all the 285 (19*15) sample combination possibilities. Figure 2b illustrates the distribution of optimum classification results over difference sample combination, with a peak possibility concentrated around the range of 70-80%.
These leave-two-out results implied that the discrimination performance was robust and not just a mere coincidence arisen by limited samples.

| Map of discriminating regions
The first eight discriminative regions of the optimal classier were displayed on Figure

| MA of responders and nonresponders
In further analyses, the interaction of these eight significantly discriminative regions was explored on a system or subsystem level to assess the differences between responders and nonresponders. The statistical t test showed that the MA of ACC with all other nodes was lower in patients with MDD comparing with healthy controls (t = 4.932, df = 67,  regions formed spatially distributed system whereby ACC was the structural core (Figure 3c). It suggests that ACC acts as a pivotal part in facilitating the communication with a network that is particularly germane to escitalopram treatment.

| Multicenter comparison
Parallel individual treatment response model for both Center 2 and Center 3 were trained and tested independently as well. The similar performances can be found in both centers as were showed in Besides above satisfying model performance in each site, we further tried the leave-one-group-out analysis to test the site effects. We trained in two sites and tested in third one. The performance of leaveone-group-out (69% for leaving Site 1 out, 71% for leaving Site 2 out, 72% for leaving Site 3 out) was comparable with that of the withingroup LOOCV performance (79%). It supported the robustness of our model frame across sites.

| DISCUSSIONS
Using a baseline fMRI and SVM models, we presented the flexibility of some special regions, such as MFG, right insula, and ACC for accurate prediction of escitalopram treatment response. We specifically chose to characterize functional connection using MA since it displays behaviorally or pathophysiologically relevant interactions. A larger MA of ACC was found in responders compared to nonresponders. Further investigation revealed a system level dysfunction whereby the ACC was found to be the crucial discriminative region.

| The flexible brain
A clear interaction between regions congruent with FC showed temporal changeability (Figure 1b,c). The flexible brain was also found in humans during the learning process (Bassett et al., 2011). Flexibility is deemed as a trustworthy indicator of a given subject's biological process in response to learning or neurorehabilitation. Treatment response of 8 weeks prescription with escitalopram could be predicted by the baseline flexibility of brain regions (Figure 2c,d), suggesting the flexible brain could be beneficial for depressed individuals' pharmacological intervention.

| ACC effects on antidepressant outcome
The ACC is involved in the processing of specific modules which is responsible for reward anticipation (Bush et al., 2002), attention, error detection, conflict monitoring (Bush, Luu, & Posner, 2000;Shenhav, Botvinick, & Cohen, 2013), social cognition (Apps, Rushworth, & Chang, 2016;Tolomeo et al., 2016), and emotional response (Bush et al., 2000;Etkin, Büchel, & Gross, 2015;Tolomeo et al., 2016). Its abnormalities were associated with diverse symptoms such as suicidal thoughts (Holmes et al., 2017), negative bias, poor cognition, and comorbid anxiety in depression. Fox, Buckner, White, Greicius, and Pascualleone (2012) applied rTMS to depressed patients and found that a more anticorrelated FC of ACC predicted better treatment response. Weigand et al. (2017) also applied rTMS to predict antidepressant efficacy but targeted the subgenual ACC. In addition, CBT trials found that ACC hypoactivation was linked to better treatment outcomes (Ball, Stein, & Paulus, 2014). These findings confirmed that rTMS and CBT targeted directly or indirectly the ACC to rehabilitate the individuals who are predisposed to less activation (Fox et al., 2012).
In contrast, the most consistent finding in mediation studies was that greater ACC activation was associated with better outcomes (Ball et al., 2014). This finding indicated that medication treatment may not target the ACC activity and may therefore be most beneficial for indi-

| The ACC and DMN
The DMN can be divided into the dorsal medial and medial temporal subsystem (Andrewshanna, Smallwood, & Spreng, 2014), which broadly overlaps with the anterior and posterior subsets (Buckner, Andrewshanna, & Schacter, 2008;Raichle et al., 2001). The part of ACC connecting with the medial prefrontal cortex is the main brain region of the anterior DMN subnetwork. In recent years, the dissociation of DMN is of great research interest in MDD. Li et al. (2013) found an increased FC within both anterior and posterior DMN, but only the posterior DMN was normalized after treatment. A study which investigated the relationships between DMN subsystems and rumination showed opposite connectivity alteration within different subnetworks (Zhu, Zhu, Shen, Liao, & Yuan, 2017). Consequently, DMN subsets may provide new insights into the pathophysiology and antidepressant response efficiency in MDD.
The different DMN subsystems, of which the ACC was a hub, distinctively impacted escitalopram treatment efficacy. As a result, the strong functional integration of ACC may be regarded as a predictor of escitalopram monotherapy outcome in major depressive patients.
This implies that the pretreatment modular integration of ACC may be used to assess patients' recovery progress. Our findings have a huge potential to be replicated in the future.

| Exploring the robust biomarks for treatment response prediction
Many studies demonstrated that even a single dose of escitalopram is sufficient to alter the brain's functional anatomy even in a short term (Komulainen et al., 2018). Studies of brain response to a single dose, such as escitalopram, would seem to be noteworthy and relevant. However, each MA value between ACC to some special regions was still significant. Therefore, we can conclude the important role of ACC-core subnetwork in treatment response prediction.
The main limitation of this experiment is a relatively small sample size. Our results should be further validated by larger cohorts together with multimodal data and other therapy methods. One limitation of this study in the design was an open-labeled and not placebo-controlled. The response to antidepressant treatment could be interpreted by a combination of specific and nonspecific effects, which in addition to placebo neurobiological effects, may include variations in the natural history of illness, regression to the mean, and reporting biases.
In conclusion, the present study is the first to demonstrate individual response of MDD patients with escitalopram by use of baseline resting state fMRI and then validate over different datasets. A more advanced analysis showed significant hyperinteraction within the ACC-core subnetwork in first-episode, drug-naïve responders. We confirm that ACC could serve as a predictor for better treatment outcomes in individuals prescribed with escitalopram monotherapy.
Overall, our findings suggest that there is a huge potential in combining fMRI data with machine learning techniques. This technique will further evolve to identify biomarkers which may be used as a reliable biomarker for prognosis, optimizing therapeutic decisions and assisting psychiatric research of depression.

ACKNOWLEDGMENTS
We would like to express our sincere gratitude to the Department of Psychiatry & Mental Health and the Department of Radiology at Nanjing Brain Hospital and Peking University Institute of Mental Health & Sixth Hospital for helping patients during the neuroimaging procedures. We are also thankful to our healthy controls, patients and patient's family for their generous support, cooperation, and participation.