Aligning the many definitions of treatment resistance in anxiety disorders: A systematic review

Abstract Anxiety Disorders often show a chronic course, even when treated with one of the various effective treatments available. Lack of treatment effect could be due to Treatment Resistance (TR). Consensus on a definition for TR Anxiety Disorders (TR‐AD) is highly needed as currently many different operationalizations are in use. Therefore, generalizability in current TR‐AD research is suboptimal, hampering improvement of clinical care. The objective of this review is to evaluate the currently used definitions of TR‐AD by performing a systematic review of available literature. Out of a total of n = 13 042, 62 studies that operationalized TR‐AD were included. The current review confirms a lack of consensus on TR‐AD criteria. In 62.9% of the definitions, TR was deemed present after the first treatment failure. Most studies (93.0%) required pharmacological treatment failures, whereas few (29.0%) required psychological treatment failures. However, criteria for what constitutes “treatment failure” were not provided in the majority of studies (58.1%). Definitions for minimal treatment duration ranged from at least 4 weeks to at least 6 months. Almost half of the TR‐AD definitions (46.8%) required elevated anxiety severity levels in TR‐AD. After synthesis of the results, the consensus definition considers TR‐AD present after both at least one first‐line pharmacological and one psychological treatment failure, provided for an adequate duration (at least 8 weeks) with anxiety severity remaining above a specified threshold. This definition could contribute to improving course prediction and identifying more targeted treatment options for the highly burdened subgroup of TR‐AD patients.

These active treatments should represent evidence-based treatment regimes, provided at an adequate dosage and for an adequate duration (Fava, Rafanelli, & Tomba, 2012; Roy-Byrne, 2015).
However, the absence of anxiety symptoms does not always indicate full disorder remission (Bystritsky, 2006; Chen & Tsai, 2016). A substantial amount of residual disease burden may be present in persisting behavioural changes such as avoidance, or in altered cognitive functioning, for instance in excessive rumination.
Additional emphasis on functional recovery is therefore advocated by a number of authors when assessing TR-AD (Bystritsky, 2006; Chen & Tsai, 2016). No systematic review of the definition of TR-AD has yet been performed.
The aim of this study is twofold. First, we summarize and discuss the different criteria used for TR-AD by means of a systematic literature review. Second, by comparing the different criteria used for TR in anxiety disorders, we aim to propose a consensus definition for TR-AD.

| METHODS
The methods for this systematic review were specified in advance in a study protocol which was documented in the PROSPERO database (reference number CRD42017055864). The current paper was drafted in accordance with the PRISMA guidelines for reporting on systematic reviews (Liberati et al., 2009 DSM-5, American Psychiatric Association, 2013) in combination with various free-text synonyms for "treatment resistance" (see Panel 1 for the full search query).
All publication types in English were included with the exception of conference summaries, editorials, columns, book reviews and manifestos as these were unlikely to include a full description of a TR-AD definition.
Studies were selected when they included adults or elderly persons with anxiety disorders (Panic Disorder (with or without Agoraphobia, PD(A)), Social Anxiety Disorder (SAD), Generalized Anxiety Disorder (GAD),

| Eligibility assessment
Eligibility assessment on title and abstract was performed independently by two reviewers (WB, GW, JG) by using the Cochrane-supported review program Covidence (www.covidence.org). Disagreements were resolved by consensus after discussion. A flow chart for inclusion of eligible studies according to PRISMA guidelines is provided in Figure 1. Full-text screening was performed independently by two reviewers (WB and JG). During the full-text screening phase, articles were excluded if a full-text version could not be retrieved or if any of the exclusion criteria were present. Studies were also included if their definition for TR-AD could be implicitly deduced from the inclusion criteria used in the study. Reviews, meta-analyses and book chapters were included if they provided their own definition for TR-AD but were excluded if they repeated other studies' definitions without providing a rationale for choosing this definition over others. As the vast majority of studies used TR and "refractory" interchangeably, we chose to regard them as synonyms and will refer to these phenomena as TR-AD.

| Data extraction
From trials we extracted data on study characteristics: number of subjects, population of interest, intervention, comparator condition, follow-up period, primary outcomes and results; from reviews we extracted data on study design and population of interest. With regard to the definitions for TR-AD, we extracted data on nine predefined putative criteria for the definition, based on criteria used in the Maudsley Staging Method for treatment resistant depressive disorders (Fekadu et al., 2009). In addition, we extracted one TR-AD criterion (treatment response), that was not predefined in our study protocol. The ten criteria were: minimal number of failed treatments, failed psychotherapy trials, failed pharmacological trials, failed other biological treatments, minimal length of treatment, treatment response criterion (i.e., which posttreatment change constitutes response/failure), minimal duration of anxiety disorder, severity of symptoms, presence of functional impairment, and presence of comorbidity. We evaluated which of these ten criteria were present in TR-AD definitions across included studies (yes/no). Specific values for each criterion were extracted as well.

FIGURE 1 PRISMA flow chart for study inclusion

| Quality of definitions
We assessed the definition quality in each included study. As there are no formal risk of bias tools available for the purpose of our study, and as we are not interested in potential sources of study outcome bias, we assessed definition quality in two ways: first, by counting the total number of TR-AD criteria included in each study's definition; second, by determining the degree of precision with which the definition for TR-AD is presented in each paper. The total number of TR-AD criteria was a count variable, counting the presence of each of the ten dichotomized TR-AD criteria. The degree of precision was categorized as "high", "medium" or "low". Precision was considered "high" if a study provided an explicit definition for TR-AD, as in the study by De Salas-Cansado et al. (2013):
Refractory was defined as subjects with persistent symptoms/suboptimal response, a Hamilton-anxiety (HAM-A) scale score ≥ 16 and a Clinic Global Impression (CGI) score ≥ 3 at baseline, after a standard dose regimen of any anti-anxiety drug, alone or in combination, for at least 6 months, given before the baseline study visit. (p987).
The degree of precision was deemed "medium" if the criteria were only implicitly attributable to the concept of TR-AD, or if multiple terms were used interchangeably, for instance in a study by Lohoff, Etemad, Mandos, Gallop, and Rickels (2010) in patients with "refractory GAD":
Subjects also had to have treatment failure of at least 1 adequate trial of an SSRI, an SNRI, a BZ, or a combination of these agents. Patients who were on an SSRI, an SNRI, a BZ, or a combination of these agents before enrollment had to be on a stable dose for 4 weeks. Inclusion further required a total score of 16 or higher on the Hamilton Anxiety Scale (HAM-A) and a score of 4 or greater on the Clinical Global Impression Severity of Illness Scale (CGI-S) (p186).
Finally, if the study only provided a description of the concept of TR-AD, without operationalizing it in specific criteria, the degree of precision was deemed "low", for instance: "failure of an adequate clinical trial of medication" (Stein, 2004).

| Data synthesis
To synthesize the results of the systematic review into a new operationalization for TR-AD, frequencies for presence of each individual TR-AD criterion were assessed. The most frequently used values for each individual criterion were considered the most appropriate operationalization for that criterion and were chosen for the consensus definition. However, if an unspecified category for a certain criterion (e.g., "unspecified type of pharmacological treatment") was the most frequently used value, we did not consider this category for the new definition if a more specified value was available. In addition, criteria that were included only in a small minority (<10%) of the studies were not used for the new definition, as they were then judged to be lacking a convincing empirical basis.
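The synthesis rules above can be illustrated with a minimal Python sketch. It is not the authors' procedure or software: the function name `synthesize`, the `"unspecified"` label, the dictionary layout and the toy data are all assumptions made for illustration, and the handling of ties is not addressed in the text.

```python
from collections import Counter

UNSPECIFIED = "unspecified"  # hypothetical label for an unspecified category


def synthesize(definitions, min_share=0.10):
    """Pick one consensus value per criterion.

    definitions: list of dicts, one per study, mapping a criterion name to
    the value that study used (a missing key means the criterion was absent).
    """
    n_studies = len(definitions)
    criteria = sorted({name for d in definitions for name in d})
    consensus = {}
    for name in criteria:
        values = [d[name] for d in definitions if name in d]
        # Criteria used by a small minority (<10%) of studies are dropped.
        if len(values) / n_studies < min_share:
            continue
        counts = Counter(values)
        # Prefer specified values whenever at least one is available.
        specified = {v: c for v, c in counts.items() if v != UNSPECIFIED}
        pool = specified or counts
        # Most frequent value wins (ties fall to the first value seen).
        consensus[name] = max(pool, key=pool.get)
    return consensus


# Invented toy data: 11 hypothetical studies, not taken from the review.
toy = (
    [{"pharmacological": UNSPECIFIED}] * 5
    + [{"pharmacological": "SSRI/SNRI", "min_weeks": 8}] * 4
    + [{"pharmacological": "benzodiazepine", "min_weeks": 4}]
    + [{"comorbidity": "yes"}]
)
print(synthesize(toy))
```

In the toy data the unspecified pharmacological category is the most frequent one, but it is passed over in favour of the most frequent specified value, and the comorbidity criterion is dropped because only 1 of 11 studies used it.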

| Statistical analyses
To test associations between total number of criteria provided in definitions, degrees of precision and publication year, we performed different type of "resistance" (e.g., "resistance" in the psychodynamic paradigm), 7 were previously unrecognized duplicates and 2 reported on a different patient population. This resulted in the final inclusion of 62 studies (for a flow chart see Figure 1).

| Definition quality
The total number of criteria per study ranged from one to six (mean = 3.58; SD = 1.31). With respect to the assessment of the degree of precision for TR-AD definitions it appeared that 13 studies (21.0%) provided a high degree of precision, 44 (71.0%) a medium degree, and 5 (8.1%) a low degree of precision.
There was a significant association between total number of criteria and year of publication (χ²(df = 5) = 13.01; p = 0.02): the studies with the highest number of criteria were, on average, the most recent. For degrees of precision no association with publication date existed (χ²(df = 2) = 2.13; p = 0.34). Neither studies with a higher total number of criteria, nor studies with a higher degree of precision provided a different perspective on the ten TR-AD criteria.
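As a plausibility check, the p-values can be recomputed from the reported test statistics and degrees of freedom alone. The sketch below uses the closed-form upper-tail probability of the chi-squared distribution for integer degrees of freedom and only the Python standard library. The helper name `chi2_sf` is an assumption, and the text does not state which software the authors used.

```python
import math


def chi2_sf(x, df):
    """Upper-tail probability P(X >= x) of a chi-squared variable, integer df >= 1."""
    if x <= 0:
        return 1.0
    half = x / 2.0
    if df % 2 == 0:
        # Even df: exp(-x/2) * sum_{j=0}^{df/2-1} (x/2)^j / j!
        term, total = 1.0, 1.0
        for j in range(1, df // 2):
            term *= half / j
            total += term
        return math.exp(-half) * total
    # Odd df: erfc(sqrt(x/2)) + sqrt(2/pi) * exp(-x/2) * sum_j x^(j+1/2) / (1*3*...*(2j+1))
    total = 0.0
    term = math.sqrt(x)
    for j in range((df - 1) // 2):
        if j > 0:
            term *= x / (2 * j + 1)
        total += term
    return math.erfc(math.sqrt(half)) + math.sqrt(2.0 / math.pi) * math.exp(-half) * total


p_criteria = chi2_sf(13.01, 5)   # number of criteria vs. publication year
p_precision = chi2_sf(2.13, 2)   # degree of precision vs. publication year
print(round(p_criteria, 2), round(p_precision, 2))
```

Both values round to the p-values reported in the text (0.02 and 0.34).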
Since definition quality did not change operationalizations for TR-AD, all studies were used in the synthesis of results. When the frequencies for each of the ten extracted TR-AD criteria were compared across included studies, some distinctive patterns arose (see Table 2). Impression Improvement scale (CGI-I) score greater than two (i.e., "minimal improvement", at best). Severity of anxiety symptoms was often included in definitions (n = 29; 46.8%), with cut-off scores commonly provided: a HAM-A score above 15 (for any Anxiety Disorder), a Clinical Global Impression Severity Scale (CGI-S) score of four or higher (for any Anxiety Disorder), a total score above 3 or any item above 1 on the Panic Disorder Severity Scale (PDSS) for PD, and a score at or above 60 on the Liebowitz Social Anxiety Scale (LSAS) for SAD. For GAD, no disorder-specific measurement instrument was reported in TR-AD definitions.

| Main results
Finally, minimal disease duration (n = 2; 3.2%), presence of functional impairments (n = 5; 8.1%) and presence of comorbidity (n = 1; 1.6%) were sparsely included in definitions for TR-AD. See Table 2 for a summary per TR-AD criterion, and eTable 4 for a full overview of included TR-AD criteria per study.

Notes to Table 2:
a Including studies with minimal treatment duration of "2 months."
b The most often used criteria were: ΔHAM-A < 50% or CGI-I > 2.
c The most often used criteria for severe symptomatology were HAM-A ≥ 16 or CGI-S ≥ 4 (for all Anxiety Disorders), PDSS > 3 or any PDSS item > 1 (for PD(A)), LSAS > 60 (for SAD).
d One study used SDS > 1 on each item as criterion for functional impairments.

| Synthesis of results
To propose a consensus definition for TR-AD that reflects the current literature, we included the most prevalent values for all criteria that were provided consistently across studies into the new TR-AD definition. Failed SSRI/SNRI trials were most often considered as criterion for TR-AD. Studies typically referred to SSRI/SNRI trials as "first-line" treatment. Therefore, failure of at least one first-line treatment (SSRI/SNRI) was included in the new definition. Although psychotherapeutic treatment failure was less often incorporated in TR-AD definitions, CBT was usually referred to as "first-line"

This is the first study to systematically assess different criteria for TR-AD. A systematic approach was complicated by the absence of a risk of bias assessment tool for the purpose of the current study. Tools such as the Cochrane risk of bias tool for randomized studies (Higgins & Green, 2011) (Fava et al., 2012; Roy-Byrne, 2015). Also, in some studies it was not possible to assess whether previous treatment failures that were counted towards presence of TR-AD consisted of evidence-based antianxiety treatments. Finally, although psychological treatments like CBT were repeatedly proven effective in Anxiety Disorders (Bandelow et al., 2015; Carpenter et al., 2018), in many parts of the world they are not readily available (Saxena, Thornicroft, Knapp, & Whiteford, 2007). Therefore, generalizability of our findings may be limited in these regions.
Furthermore, for the purpose of this study we regarded TR-AD, "refractory anxiety" and other related terms as synonyms. Even though this approach is in line with the majority of the studies, a minority consider TR-AD and "refractory anxiety" to be different entities. For instance, in a Cochrane review, Ipser et al. (2006) propose the term TR for Anxiety Disorder patients who failed one pharmacologic treatment, whereas "refractory anxiety" refers to Anxiety Disorder patients with more than one failed treatment. Their approach can be viewed as a staging approach, distinguishing patients with end-stage TR-AD from those with early-stage TR-AD. This approach is also advocated by Cosci and Fava (2013), who propose a staging model for TR Panic Disorders. In their model, the level of TR increases when more treatment regimens within pharmacologic, psychological and combination treatment have failed. In a number of treatment algorithms, a stepped-care approach hints at the authors' underlying assumption of a staging model for levels of TR (National Institute for Health & Clinical Excellence, 2011). In staging models, treatment decision making is based on the stage of disease progression in which the patient is currently classified. This could lead to evidence-based stepped-care treatment algorithms. We did not incorporate this staging paradigm for TR-AD into the current paper, as no consensus exists for definitions of TR-AD, nor for staging approaches in TR-AD.
Future studies could empirically investigate the consensus definition for TR-AD. A first step could be to apply the proposed TR-AD definition to an Anxiety Disorder cohort and evaluate the longitudinal course of patients with TR-AD compared to patients without TR-AD. Possibly, this could also yield risk factors for development of TR-AD. Further research could also focus on the validity of a staging approach in TR-AD, as suggested by Cosci and Fava (2013) and Ipser et al. (2006).
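To make the suggested first step concrete, the sketch below shows how the consensus definition from the abstract (at least one first-line pharmacological and one psychological treatment failure, each provided for at least 8 weeks, with anxiety severity remaining above a threshold) might be applied to cohort records. It is a minimal illustration under stated assumptions, not a validated instrument. The record layout, the names `TreatmentTrial` and `is_tr_ad`, and the use of HAM-A ≥ 16 or CGI-S ≥ 4 as the severity threshold are assumptions; the cut-offs are simply the most commonly reported ones in the reviewed definitions.

```python
from dataclasses import dataclass
from typing import Iterable, Optional

MIN_WEEKS = 8          # adequate duration in the consensus definition
HAM_A_CUTOFF = 16      # assumption: most commonly reported HAM-A cut-off
CGI_S_CUTOFF = 4       # assumption: most commonly reported CGI-S cut-off


@dataclass
class TreatmentTrial:
    kind: str          # "pharmacological" or "psychological" (hypothetical coding)
    first_line: bool   # e.g. SSRI/SNRI for pharmacological trials
    weeks: float       # duration of the trial
    failed: bool       # whether the trial counted as a treatment failure


def is_tr_ad(trials: Iterable[TreatmentTrial],
             ham_a: Optional[float] = None,
             cgi_s: Optional[float] = None) -> bool:
    """Return True if a patient record meets the sketched TR-AD definition."""
    trials = list(trials)
    # At least one failed first-line pharmacological trial of adequate duration.
    pharm = any(t.kind == "pharmacological" and t.first_line
                and t.failed and t.weeks >= MIN_WEEKS for t in trials)
    # At least one failed psychological trial of adequate duration
    # (assumption: the first-line flag is not checked for psychotherapy).
    psych = any(t.kind == "psychological"
                and t.failed and t.weeks >= MIN_WEEKS for t in trials)
    # Severity must remain elevated on at least one available scale.
    severe = ((ham_a is not None and ham_a >= HAM_A_CUTOFF)
              or (cgi_s is not None and cgi_s >= CGI_S_CUTOFF))
    return pharm and psych and severe


ssri = TreatmentTrial("pharmacological", True, 12, True)
cbt = TreatmentTrial("psychological", True, 10, True)
print(is_tr_ad([ssri, cbt], ham_a=20))   # both failures plus elevated severity
print(is_tr_ad([ssri], ham_a=20))        # no psychological treatment failure
```

Keeping the three components as separate booleans makes it straightforward to report which part of the definition a patient fails to meet, which may be useful if the individual criteria are later studied as components of a staging method.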
In depression, a staging paradigm for TR is in use with the Maudsley Staging Method (Fekadu et al., 2009; Peeters et al., 2016; van Belkum et al., 2018). A similar approach could be beneficial for Anxiety Disorders. The criteria comprising TR-AD that were described in the current paper could be studied on their merits as individual components in a staging method for TR-AD, to reflect the various degrees of TR-AD.

| CONCLUSIONS
The majority of studies on treatment resistant Anxiety Disorders ). This consensus definition should be regarded as a first step to advance the field further. The definition provided in this paper could contribute to harmonizing the evaluation of the presence of TR-AD, which is a necessary first step towards improving the prognosis for TR-AD patients.