Clinical predictors of non-response to lithium treatment in the Pharmacogenomics of Bipolar Disorder (PGBD) study

Background: Lithium is regarded as a first-line treatment for bipolar disorder (BD), but partial response and non-response commonly occur. There is a need to identify lithium non-responders prior to initiating treatment. The Pharmacogenomics of Bipolar Disorder (PGBD) Study was designed to identify predictors of lithium response. Methods: The PGBD Study was an eleven-site prospective trial of lithium treatment in bipolar I disorder. Subjects were stabilized on lithium monotherapy over 4 months and gradually discontinued from all other psychotropic medications. After ensuring a sustained clinical remission (defined by a score of ≤3 on the CGI for 4 weeks) had been achieved, subjects were followed for up to 2 years to monitor clinical response. Cox proportional hazard models were used to examine the relationship between clinical measures and time until failure to remit or relapse. Results: A total of 345 individuals were enrolled into the study and included in the analysis. Of these, 101 subjects failed to remit or relapsed, 88 achieved remission and continued to study completion, and 156 were terminated from the study for other reasons. Significant clinical predictors of treatment failure (p < 0.05) included baseline anxiety symptoms, functional impairments, negative life events and lifetime clinical features such as a history of migraine, suicidal ideation/attempts, and mixed episodes, as well as a chronic course of illness. Conclusions: In this PGBD Study of lithium response, several clinical features were found to be associated with failure to respond to lithium. Future validation is needed to confirm these clinical predictors of treatment failure and their


| INTRODUCTION
Lithium is regarded as a first-line treatment for bipolar disorder (BD), 1-7 but it does not work for all patients. The modern use of lithium for treatment of BD was first introduced by John Cade in 1949, and it has been widely studied since. Although findings from these studies have been at times controversial, the evidence for the efficacy of lithium in acute mania and maintenance treatment is now well established. In a meta-analysis of five randomized controlled trials of BD comparing prophylactic lithium therapy with placebo, Geddes and colleagues found that lithium is more effective than placebo in preventing recurrence of illness, with 60% in the lithium group remaining well over 1-2 years compared with 40% in the placebo group. 8 In a subsequent meta-analysis of six studies of lithium in the treatment of acute mania, Yildiz and colleagues found that 48% of patients responded to lithium compared to 31% for placebo. 9 While these seminal reviews unequivocally demonstrate the efficacy of lithium for both acute mania and maintenance treatment of BD, they also highlight that anywhere from 40%-50% of patients do not respond adequately over a 2-year period and require either the addition of or a change to another psychotropic drug. 9 These findings are consistent with observational data from longitudinal cohort studies. [10][11][12] There is considerable continued interest in identifying predictors of response to lithium before starting treatment in order to avoid the typical trial and error process of finding the right medication for a particular patient during which time he or she may continue to experience devastating symptoms and be at risk for suicide. This is the goal of precision medicine (also referred to as individualized or personalized medicine). 
Although the promise of precision medicine has garnered a great deal of attention recently, 13 the search for predictors of lithium response dates back to the very first studies of its prophylactic effect in mood disorders. 14 Indeed, there is a long history of searching for clinical predictors of response to lithium treatment that can help guide treatment decisions. In 2005, Kleindienst and colleagues 15,16 carried out two comprehensive systematic reviews of predictors of lithium response in which they identified nearly 2,000 studies published between 1966 and 2003 on this topic. In one review, they focused on studies that examined psychosocial and demographic predictors and identified nine that emerged as consistently associated with lithium response.
Four were associated with good response (high social status, social support, good compliance, and "dominance" personality trait), while five were associated with poorer response (stress, high expressed emotion, neurotic personality trait, unemployment, and high number of life events). In the other review, they focused on studies that examined clinical predictors of lithium response and identified five that were consistently associated with lithium response across studies. These included a pattern of mania-depression-interval in bi-phasic episodes (so-called MDI polarity sequence) and older age at onset associated with better response, and high number of hospitalizations, a pattern of depression-mania-interval (i.e., DMI polarity sequence), and continuous cycling associated with poorer response.
Both reviews concluded that the effect sizes of these factors on treatment response were relatively small.
In 2019, Hui and colleagues 17 carried out a subsequent meta-analysis of clinical predictors of lithium response that included more recent data from 71 studies with over 12,000 patients. They identified six predictors of good lithium response, some of which overlapped with the earlier reviews by Kleindienst and colleagues: 15,16 a mania-depression-interval pattern, absence of rapid cycling, absence of psychotic symptoms, family history of bipolar disorder, shorter pretreatment illness duration, and later age at onset. They noted, however, that the included studies tended to have small sample sizes and there was considerable heterogeneity in results.
The Pharmacogenomics of Bipolar Disorder (PGBD) Study (www.clinicaltrials.gov, NCT01272531) was a large multi-center study designed to prospectively identify clinical and molecular predictors of lithium response. We report here the results of an analysis of clinical data from this study to examine clinical predictors of lithium response. The advantage of this study over previous ones is that patients were prospectively followed on lithium monotherapy for up to 2 years to better identify predictors of long-term treatment response specifically to lithium.

| Study overview
The PGBD was one of 14 research projects in the Pharmacogenetic Research Network funded by the National Institutes of Health to support multi-disciplinary, collaborative research on how genetic factors contribute to inter-individual differences in responses to medications. The PGBD set out to conduct a multi-site prospective study of lithium monotherapy in the treatment of BD.
The details of the trial have been described elsewhere. 18 Briefly, the goal of the study was prevention of illness recurrence by lithium monotherapy. All patients were observed in an observation phase lasting 4 weeks to confirm they were in remission defined by having a Clinical Global Impression of Severity Scale (CGI-S) score of ≤3 (mildly ill) for at least 4 weeks. After the observation phase, the patients entered a 2-year maintenance phase, during which they were assessed every 2 months to monitor their on-going clinical response.
Patients who came into the trial clinically unstable and/or not on lithium monotherapy were first transitioned to lithium monotherapy in a stabilization phase that lasted a maximum of 16 weeks which included visits every other week for the first 8 weeks and one visit per month for the next 2 months. The treatment dosage of lithium was not fixed by study protocol but instead was titrated by the treating clinicians as clinically indicated. Throughout the follow-up, patients were allowed to take a benzodiazepine for anxiety and/or zolpidem for sleep. A range of clinical measures (described below) was collected at the screening and subsequent visits to monitor clinical progress and enable investigation of clinical predictors of response.

| Participants
Patients were enrolled into the study from outpatient psychiatry clinics in academic medical centers at nine sites within the United States and two international sites. The nine domestic sites included:
Patients were excluded if they: (1) were unwilling or unable to comply with study requirements; (2) had renal impairment (serum creatinine >1.5 mg/dL); (3) had a thyroid stimulating hormone (TSH) level >20% above the upper normal limit or, if on thyroid medication, had not been euthyroid for at least 3 months before the first visit; (4) were currently in crisis such that inpatient hospitalization or other crisis management should take priority; (5) met criteria for physical dependence requiring acute detoxification from alcohol, opiates or barbiturates; (6) were pregnant or breastfeeding; (7) had participated in a clinical trial of an investigational drug within the past 1 month; or (8) had a history of lithium toxicity, not due to mismanagement or overdose, that required treatment.
All study procedures were approved by local Institutional Review Boards (IRBs), and all patients provided written informed consent.
This analysis included data on the first 345 BD patients who enrolled into the study and had sufficient follow-up of at least 4 weeks as of the date of data freeze on June 26, 2017. There were four patients who were still active in the study but had not yet reached the maintenance phase by the time of this data freeze and were not included in these analyses.

| Clinical outcomes
Patients were followed until they: (1) completed all study visits over 2 years of the maintenance phase (or had achieved the maintenance phase and were still active in the on-going study by the date of the data freeze), (2) were terminated from the study before completion of all visits because of failure to achieve (i.e., failure to remit) or maintain (i.e., relapse) stabilization on lithium, or (3) were terminated from the study for other reasons. Failure to remit was defined by the inability to achieve clinically sustained remission (where remission was documented as described above) by the end of the observation phase or based on clinical judgment that the patient was unable to adequately stabilize on lithium monotherapy. Relapse was evaluated using the Mood Episode Checklist, which summarizes DSM-IV criteria for mania and depression and was collected at each visit during the maintenance phase. Relapse was defined by any of the following: (1) meets criteria for mania and has a CGI-S of 5 (markedly ill) or greater; (2) meets criteria for a major depressive episode with 4-week duration; (3) meets criteria for a mixed episode with CGI-S of 5 or greater; (4) psychiatric hospitalization for a mood episode is required; or (5) in the physician's judgment the patient cannot be managed on monotherapy and a change in medication is required. Episodes of hypomania without impairment of function were not considered relapses. These criteria were designed to be stringent so as to detect clear failures of prophylaxis, rather than brief episodes that might not require a medication change in clinical practice. Serum lithium levels were routinely monitored as clinically recommended over the course of follow-up. On average, lithium levels were maintained at appropriate therapeutic levels 19 and were, in fact, slightly higher for those who failed to remit or relapsed than for others (0.68 vs 0.63 mEq/L, p = 0.05).
See Table 1 for a full list of variables that were examined.

| Statistical analyses
Differences in socio-demographic factors between patients who completed all study procedures, those who failed to remit or experienced a relapse, and those who were terminated from the study for other reasons were compared using chi-square tests for categorical variables and one-way ANOVA for continuous variables. We then used survival analysis with Cox Proportional Hazard models to examine the relationship between clinical predictors measured at baseline and the time from study entry to treatment failure, which was defined as the time of the last visit at which the patient was determined to have failed to remit or to have relapsed. All other patients were censored at the time of their last visit in the on-going study. We examined each clinical predictor individually in models that additionally controlled for potential confounders including age at study entry, sex, race, and lithium status upon entry into the study. These variables were selected from the available data because they are important socio-demographic factors that experience indicated may be relevant and/or they were found to differ with treatment outcome. Race was captured as a categorical variable for Whites, Blacks, Asians, or other. Lithium status upon entry into the study was captured as a categorical variable to distinguish those who entered the study stable on lithium monotherapy, on lithium plus other psychotropic medications, or not on lithium. We used two-tailed p < 0.05 to declare associations statistically significant. We did not correct for multiple testing because the clinical predictors were carefully selected based on prior hypotheses that they may be relevant to treatment response.
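To make the survival model concrete, the following is a minimal sketch, not the study's own code, of fitting a Cox proportional hazards model for a single predictor by Newton-Raphson on the partial likelihood. The function name `cox_fit_one_covariate`, the Breslow handling of tied event times, the omission of the confounders (age, sex, race, lithium status), and the toy data are all assumptions made for illustration.

```python
import math

def cox_fit_one_covariate(times, events, x, n_iter=50, tol=1e-9):
    """Estimate the log hazard ratio for one covariate.

    times  : follow-up time for each subject
    events : 1 if treatment failure was observed, 0 if censored
    x      : covariate value for each subject (e.g. 1 = predictor present)
    """
    beta = 0.0
    for _ in range(n_iter):
        score, info = 0.0, 0.0
        for t_i, d_i, x_i in zip(times, events, x):
            if not d_i:
                continue  # censored subjects contribute only through risk sets
            s0 = s1 = s2 = 0.0
            for t_j, x_j in zip(times, x):
                if t_j >= t_i:  # subject j is still at risk at time t_i
                    w = math.exp(beta * x_j)
                    s0 += w
                    s1 += w * x_j
                    s2 += w * x_j * x_j
            mean = s1 / s0
            score += x_i - mean            # first derivative of the log partial likelihood
            info += s2 / s0 - mean * mean  # negative second derivative
        if info == 0.0:
            break  # no information about beta (e.g. no events)
        step = score / info
        beta += step
        if abs(step) < tol:
            break
    return beta

# Hypothetical toy data: 8 subjects, 5 observed failures.
times = [2, 4, 6, 8, 10, 12, 14, 16]
events = [1, 1, 0, 1, 1, 0, 1, 0]
x = [1, 1, 1, 0, 1, 0, 0, 0]
beta = cox_fit_one_covariate(times, events, x)
hazard_ratio = math.exp(beta)  # > 1 means earlier failure when x = 1
```

In practice a multivariable fit with the confounders included would be obtained from an established statistical package rather than hand-written code; the sketch only shows what the model estimates.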
To determine if the associations with treatment response of the clinical predictors identified through the above procedures differed in the initial versus later phases of follow-up, we stratified the survival analyses and looked first at survival over the stabilization/ observation phases among all patients who entered the study, and then separately over the maintenance phase among patients who entered the maintenance phase. To formally test for differences in association, we combined the stratified survival data and included in the Cox Proportional Hazard models an interaction term between the specific predictor and an indicator variable for the stabilization/ observation versus maintenance phases.
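One way to write the interaction model is sketched below under assumed notation; none of these symbols appear in the original. Let $x$ be the clinical predictor, $p$ an indicator equal to 1 for the maintenance phase and 0 for the stabilization/observation phases, and $\mathbf{z}$ the vector of confounders. The text does not state whether phase entered as a main effect or as a stratification factor; a main effect is assumed here.

```latex
h(t \mid x, p, \mathbf{z}) = h_0(t)\,
  \exp\!\left(\beta_1 x + \beta_2 p + \beta_3\, x p
  + \boldsymbol{\gamma}^{\top}\mathbf{z}\right)
```

Under this parameterization, $\exp(\beta_1)$ is the hazard ratio for the predictor during stabilization/observation, $\exp(\beta_1 + \beta_3)$ is the hazard ratio during maintenance, and the formal test for a difference between phases is a test of $\beta_3 = 0$.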
To assess the robustness of observed associations to the assumptions of the survival analysis, we carried out two additional analyses. We defined two alternative but related response variables for analysis: (1) an acute response variable based on whether patients proceeded to the maintenance phase or not; and (2) a prophylactic response variable which contrasted patients who completed all study visits or who had reached the maintenance phase and were still active on study as of the data freeze on June 26, 2017 versus those who failed to remit or who relapsed on lithium monotherapy before completing all study visits. We then used logistic regression to examine the association between the clinical predictors and the two different dichotomous response variables in models that controlled for the same potential confounders as in the survival analysis.
The inferences drawn from these two alternative logistic regression analyses were nearly identical to those from the survival analysis, so we report here the results from the survival analysis because it uses more of the available information provided by the prospective data and it provides a unified framework for analyzing the data over the entire time course of the study.
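The two dichotomous response variables can be illustrated with a short sketch. The outcome labels and function names below are hypothetical, and treating patients terminated for other reasons as excluded from the prophylactic contrast is our reading of the definition rather than something stated explicitly.

```python
def acute_response(reached_maintenance):
    """1 if the patient proceeded to the maintenance phase, else 0."""
    return 1 if reached_maintenance else 0

def prophylactic_response(outcome):
    """1 = completed or still active in maintenance; 0 = treatment failure.

    Returns None for patients terminated for other reasons, who are assumed
    here not to enter this contrast.
    """
    if outcome in ("completed", "active_in_maintenance"):
        return 1
    if outcome in ("failed_to_remit", "relapsed"):
        return 0
    return None

# Hypothetical outcome labels for four patients.
outcomes = ["completed", "relapsed", "terminated_other", "failed_to_remit"]
responses = [prophylactic_response(o) for o in outcomes]
```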
Finally, to evaluate the predictive ability of a model that included all clinical predictors individually found to be significantly associated with treatment failure, we carried out a receiver operating characteristic (ROC) curve analysis specifically for survival data. We first carried out multiple imputation to fill in missing covariate data and maximize the available data for the ROC analysis. We note that we only used the multiple imputation procedure for this and not the primary analyses described above, and we used it only after confirming that analyses with the imputed dataset yielded results that were consistent with those reported from the primary analyses described above. Multiple imputation was performed on the predictor dataset with the mi command in STATA to generate 35 imputed datasets. A consensus imputed dataset was generated by taking the median (for continuous covariates) or modal (for categorical covariates) values across the 35 imputed datasets. We note that this procedure does not take into account the uncertainty in the consensus imputed estimates, but we reasoned it would be sufficient for obtaining reasonable estimates from the ROC analysis. We then proceeded to compare the ROC curves of nested models, including a base model that included the base variables controlled for in all analyses (age at study entry, sex, race, and lithium status upon entry into the study) and a full model that included the base variables plus all clinical predictors that were individually associated with treatment failure (see Table 3). The consensus imputed dataset was randomly split into ten non-overlapping subsets of approximately equal size, with approximately the same proportion of censored and event observations across all subsets. Cox models for the nested models were then fit using nine out of ten subsets, leaving the tenth subset as a hold-out set. Using the results of the fitted models, linear predictor scores were obtained for observations in the hold-out set.
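The consensus imputation and the balanced ten-way split can be sketched as follows. This is an illustration in Python under stated assumptions, not the procedure as implemented for the study: the function names, the row-dictionary data layout, and the round-robin fold assignment are all hypothetical.

```python
import random
from statistics import median, mode

def consensus_impute(imputed_datasets, categorical):
    """Collapse m imputed datasets into one consensus dataset.

    imputed_datasets : list of datasets, each a list of row dicts in the same
                       subject order
    categorical      : set of column names summarized by the mode; all other
                       columns are summarized by the median
    """
    consensus = []
    for rows in zip(*imputed_datasets):  # the same subject across imputations
        merged = {}
        for key in rows[0]:
            values = [r[key] for r in rows]
            merged[key] = mode(values) if key in categorical else median(values)
        consensus.append(merged)
    return consensus

def stratified_folds(events, k=10, seed=0):
    """Assign subjects to k folds with similar event/censored proportions."""
    rng = random.Random(seed)
    folds = [None] * len(events)
    for status in (0, 1):  # handle censored and event subjects separately
        idx = [i for i, e in enumerate(events) if e == status]
        rng.shuffle(idx)
        for pos, i in enumerate(idx):
            folds[i] = pos % k  # deal shuffled subjects out round-robin
    return folds

# Hypothetical example: three imputations of two subjects.
imputed = [
    [{"age": 30, "sex": "F"}, {"age": 41, "sex": "M"}],
    [{"age": 34, "sex": "F"}, {"age": 45, "sex": "M"}],
    [{"age": 32, "sex": "M"}, {"age": 43, "sex": "M"}],
]
consensus = consensus_impute(imputed, categorical={"sex"})

# Hypothetical example: 30 failures and 70 censored subjects in ten folds.
events = [1] * 30 + [0] * 70
folds = stratified_folds(events, k=10)
```

Each fold in turn then serves as the hold-out set while the models are fit on the other nine. As the text notes, collapsing the imputations to a single dataset ignores imputation uncertainty, so this approach suits descriptive ROC estimates better than formal inference.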
Model fitting and prediction were repeated ten times, with a different subset of data held out each time. Predicted survival ROC curves over 2 years were estimated for the linear predictions using the CoxWeights function from the risksetROC R package. 20,21 The areas under the curve (AUC) for the ROC curves of the nested models were calculated, and the differences in AUC were recorded. This process was repeated across 10,000 permutations of survival status and time of censoring pairings. The p-value for the AUC difference between models was derived as the proportion of permuted AUC differences that were greater than the unpermuted AUC difference.

| RESULTS
Table 2 shows basic socio-demographic characteristics of the study sample broken down by the final outcome status of the patients: whether they completed the study (or were stabilized in maintenance and still active on the study), experienced a treatment failure, or were terminated for other reasons. There were no significant differences in age, sex or race between these three broad outcomes. Patients who entered the study stable on lithium monotherapy were significantly more likely to complete the study compared with those who either were on lithium and other psychotropic medications or were not on lithium on study entry. There were also significant differences between the sites in the outcomes achieved by the patients. These differences were largely explained by the proportion of patients at each site that entered the study stable on lithium, highlighting the importance of controlling for this potential confounder in subsequent analyses.
We then examined the association between hypothesized clinical predictors of lithium response and treatment response. Table 1 shows the list of clinical predictors that were selected a priori for investigation and the self- and clinician-rated scales from which they were derived. We examined each predictor individually in survival models controlling for factors that we reasoned may confound the relationship with treatment response because they are important socio-demographic factors or were found to differ with outcome status, including age at study entry, sex, race, and lithium status upon entry into the study.
To evaluate how well a model that included the significant clinical predictors identified above could predict lithium treatment failure over a 2-year period, we carried out an additional ROC analysis  Table S2 for parameter estimates of the mul-

| DISCUSSION
We report here results from the PGBD Study, in which we examine clinical predictors of response to lithium treatment for bipolar disorder (BD). Lithium is a first-line treatment for BD and can be remarkably effective in controlling the devastating symptoms of BD. However, it is not effective in everyone, and between 40% and 50% of patients, or even more depending upon the length of follow-up, may require alternative therapeutic regimens. We identified several clinical markers that are associated with failure to respond to lithium treatment. These include current anxiety symptoms, functional impairments, negative life events and certain lifetime clinical features such as a history of migraine, suicidal ideation/attempts, and mixed episodes, as well as co-morbid personality disorder and a chronic course of illness. Future validation will be required to confirm whether these clinical markers are associated with treatment failure and whether they can be used clinically to effectively distinguish who will and will not do well on lithium before starting therapy.
FIGURE 2 ROC curves for the prediction of lithium treatment failure over a 2-year period for a base model that included the base factors controlled for in all analyses (age at study entry, sex, race, and lithium status upon entry into the study) and a full model that included these base factors and all clinical predictors that were individually associated with treatment failure as reported in Table 3
to examine associations with episode pattern, which is a compelling observation that has been implicated by previous studies.
Unique to our study, we found intriguing associations of treatment response with current anxiety symptoms as well as a history of migraine, suicidal ideation/attempts, and mixed episodes. The observation that symptom level, but not lifetime diagnosis, of anxiety was associated with treatment response echoes findings from the NIMH Collaborative Depression Study that the severity of anxiety is predictive of long-term morbidity in BD. 22 With regard to migraines, it has been shown that the prevalence of migraines in patients with BD is 2-3 times higher than in the overall population. Moreover, antiepileptic drugs, such as valproate, are used to treat migraines whereas lithium has no indication in their prophylactic treatment. Thus, comorbid migraine could mark an etiologically distinct sub-type of BD that is less responsive to lithium treatment. 23
(median = 68%), with higher study withdrawals for lithium compared to other mood stabilizers. 27,28 The survival analysis we carried out assumed the risk of treatment failure for patients lost to follow-up was the same as for those who stayed on the study per protocol. It is possible this assumption did not hold, and that patients who did not complete the study per protocol differed in some way, possibly experiencing complications that were a precursor to treatment failure. Consistent with this, we did observe differences in certain baseline characteristics for those who did not complete the study per protocol. These individuals tended to be younger (p = 0.091), non-white (p = 0.027), and not stable on lithium monotherapy upon study entry (p < 0.001). However, we carried out two alternative analyses of the data and the findings were remarkably similar, suggesting the findings were robust to assumptions made by the survival analysis.
Second, in order to broaden the available population for study, we included patients who were naïve to lithium as well as those who may have taken lithium in the past or were currently on it. It is likely the response trajectories while on study would be different for these patients. Indeed, over one-quarter of the patients entered the study stable on lithium monotherapy and their treatment outcomes were notably better. To account for these differences, we tightly controlled for lithium status in the analysis, so that inferences about the associations with treatment response would not be confounded by these differences.
Third, the sample size may not have provided sufficient power to detect significant associations with important clinical predictors with smaller effect sizes. However, we emphasize this is one of the largest prospective studies specifically designed from inception to investigate predictors of lithium response. Indeed, it is the only such study that sought to treat patients with monotherapy in order to more firmly link treatment predictors with lithium response unclouded by the use of other psychotropic medications that are frequently taken by patients with BD. This is a unique and noteworthy strength of this study.
Finally, given the limitations of the current study, we could not distinguish whether the identified clinical predictors are specific for non-response to lithium versus other medications or reflect the natural course of illness. This is a limitation that is common to many previous studies of lithium treatment in BD. The ultimate goal of precision medicine is to identify predictors that can predict response to one treatment versus another so that they can be used to make clinical decisions about starting one over the other. We are unable to make such recommendations from the current findings. However, they do offer hypotheses that can be tested in other samples to determine if they can predict response to other treatments.
Given the devastating burden of BD, there is considerable motivation to develop more effective strategies for treating the disorder. Lithium is an inexpensive and effective treatment, but it does not work for everyone. It would be of tremendous clinical benefit if we could identify predictors of who will and will not respond to lithium before starting treatment. This study provides new evidence that certain clinical factors could be used to help with such predictions and help inform decisions about whether patients presenting with BD should be started on lithium or alternative medications.
Interestingly, we found that a model which included these clinical factors could predict lithium treatment failure with an AUC of 0.74 that was significantly better than the null. The AUC observed in our study is similar to, albeit lower than, the one achieved in a recent report on predicting lithium response using a machine learning method against 180 clinical predictors. 29 Another recent study reported on a clinical prediction model that explained 17.4% of the variance in observed outcome scores in response to treatment with lithium over a 6-month period in a randomized comparative effectiveness trial. 30 The hope is that eventually we will be able to improve these prediction models by incorporating both clinical and biological (e.g., neuroimaging, neurophysiology, and molecular) factors. This is the goal of the PGBD, and this report is a first step toward this goal.