Assessing the unwanted: A systematic review of instruments used to assess negative effects of psychotherapy

Abstract
Objective: While the efficacy of psychotherapy in the treatment of mental disorders is well examined, systematic research into negative effects of psychotherapy seems comparatively rare. Therefore, this review evaluates instruments for assessing negative effects of psychotherapy in order to create a consensus framework and make recommendations for their assessment.
Methods: The study selection procedure followed current best-practice guidelines for conducting systematic reviews, with 10 studies included from searches of three databases (PsycINFO, PubMed, and Web of Science). The nine instruments identified were each critically reviewed with regard to their theoretical orientation (including the assessed domains of negative effects), psychometric properties, and diagnostic characteristics.
Results: Seventeen domains of negative effects of psychotherapy were identified, but they were inconsistently assessed by the nine instruments. Most instruments provide some initial data on their psychometric properties. Regarding diagnostic characteristics, different item-response formats are used, often with reference to "attribution to therapy."
Conclusion: This review indicates that the existing instruments for assessing negative effects of psychotherapy cover a wide range of relevant domains without any consensus on the most important ones, and their psychometric properties are usually unsatisfactory. A consensus framework, building on the definition and conceptualization of negative effects, is synthesized, and recommendations for improving the assessment are derived.

[…] in order to improve the assessment of negative effects of psychotherapy.

LIMITATIONS
• The main limitation is that this review comprises a relatively small number of eligible studies and instruments that investigate and assess negative effects.
• A further limitation is that this review focuses on the psychometric properties of each instrument and therefore does not consider their clinimetric properties.

| INTRODUCTION
The efficacy of psychotherapy for treating mental disorders has been well examined over several decades (Huhn et al., 2014; Schefft, Guhn, Brakemeier, Sterzer, & Köhler, 2019). In particular, the evidence base of cognitive behavioral therapy (CBT) is considered robust and strong (Butler, Chapman, Forman, & Beck, 2006; David, Cristea, & Hofmann, 2018; Hofmann, Asnaani, Vonk, Sawyer, & Fang, 2012). However, compared with research on positive effects supporting the efficacy of psychotherapy, research on negative effects is still rare. At the beginning of the 21st century, attention to the negative effects of psychotherapeutic interventions increased ([…] Scott, 2017). Psychotherapists as well as researchers emphasize that negative effects are common in face-to-face care, for example, in group psychotherapy (Schneibel et al., 2017), as well as in Internet-based interventions (Boettcher, Rozental, Andersson, & Carlbring, 2014). Linden et al. (2018) define negative effects as adverse events (AEs) related to treatment, comprising side effects (SE), malpractice (MP), and unethical conduct (UC). In contrast to MP and UC, SE are AEs caused by correctly performed psychotherapy, that is, treatment delivered lege artis, and they can affect different life domains (e.g., transient symptom deterioration, conflicts in interpersonal relationships, and stigmatization concerns).
According to this definition, SE may be not only unexpected but also expected, and sometimes even intended, effects. Consistent with this broad definition, research suggests that 58.7% of all patients in psychiatric hospitals, 45.2% in psychosomatic hospitals, and 93.8% in a convenience sample of former psychotherapy patients report at least one negative effect during psychotherapy (Ladwig, Rief, & Nestoriuc, 2014; Rheker, Beisel, Kräling, & Rief, 2017). This high prevalence emphasizes the importance of evaluating negative effects not only once at the end of treatment but also during the course of treatment and after its completion. However, recent reviews have shown that instruments assessing negative effects are heterogeneous and that negative effects are not systematically reported in randomized controlled trials (Jonsson, Alaie, Parling, & Arnberg, 2014), for example, in studies on persistent depressive disorder (Meister et al., 2016). In line with this result, the Consolidated Standards of Reporting Trials (CONSORT) group states that the monitoring of negative effects in clinical studies on behavioral health is limited (Ioannidis et al., 2004).
Thus, despite their high prevalence, there are comparatively few systematic research studies on negative effects of psychotherapeutic interventions. Systematic research into their occurrence is hindered by conflicting definitions of negative effects (Parry, Crawford, & Duggan, 2016) as well as by the diversity of terms and their inconsistent use (Linden, 2013), which makes it difficult to develop adequate instruments for assessing negative effects.
In this context, researchers use different terms such as "deterioration effects," "side effects," "negative effects," "negative outcome," "unwanted/undesirable effects," "adverse events/effects," "harm," "mistakes," and "treatment-emergent reactions" synonymously, fostering confusion among researchers and psychotherapists. Consequently, there is no instrument that is accepted worldwide as a "gold standard" and used consistently across studies. In conclusion, there is little systematic research on negative effects of psychotherapy, and current research methods increasingly need to be improved on a sound theoretical basis.
To close the gap between the relevance of negative effects and their evaluation, the most recent methodological recommendations for trials of psychological interventions explicitly emphasize that negative effects of psychotherapy should be assessed using suitable methods of evaluation (Guidi et al., 2018). Thus, the main objective of this systematic review is to summarize and examine the available instruments for assessing negative effects in psychotherapy. To date, no review has focused on these assessment tools, their theoretical foundation, or their psychometric quality, which underscores the contribution of this review to an often-neglected research field. The secondary objectives of the present review are (a) to create a framework of negative effects on an empirical basis, (b) to give recommendations for improving the assessment instruments, and (c) to provide an outlook on the development of future instruments, including theoretical considerations on the framework.

| Study selection
The entire study selection process followed current guidelines for meta-analyses and systematic reviews (Cuijpers, 2016 […] PsycINFO, 233 in PubMed, and 1,204 in Web of Science). In addition, six articles were identified through reference list screening, resulting in a total of 1,792 articles. These records were carefully screened by title and abstract, and 1,741 articles were excluded because their contents were unsuitable for this review. Of the remaining 51 hits, 19 duplicates were identified and removed, reducing the number of articles for full-text analysis to 32. These 32 articles were read by the first two authors, and any that did not include instruments for assessing negative effects of psychotherapy were excluded from further analysis. In total, 10 studies describing nine instruments were selected, and all of them were included in the qualitative synthesis of this review.
The search was limited to full-text articles published from 1986 to 2018, because the Vanderbilt Negative Indicators Scale (VNIS; Suh, Strupp, & O'Malley, 1986) was the first published structured assessment scale of negative effects of psychotherapy.
The objective of this study was to conduct a comprehensive review of current practice in the evaluation of all published qualitative and quantitative research on negative effects. Therefore, studies were not excluded because of weak psychometric properties, such as missing data on reliability, validity, or theoretical foundation. In addition, no restrictions were imposed on the place of origin of the studies, the year of publication within this period, or the type of mental disorders represented in the sample. However, the database search was limited to full texts available in English and German.
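The selection funnel above is plain arithmetic, so it can be checked mechanically. The following is a minimal sketch, not part of the original study: the constant and function names are hypothetical, and the PsycINFO count, which is cut off in the text, is derived here as the remainder of the stated totals, so it is an inference rather than a reported figure.

```python
# Hypothetical sketch: re-derive each stage of the study-selection funnel
# from the counts stated in the text. Names are illustrative only.

TOTAL_RECORDS = 1_792            # all records, including reference-list hits
REFERENCE_LIST_HITS = 6          # articles found by reference list screening
DATABASE_HITS = {"PubMed": 233, "Web of Science": 1_204}
EXCLUDED_ON_TITLE_ABSTRACT = 1_741
DUPLICATES = 19
INCLUDED_STUDIES = 10


def selection_flow() -> dict[str, int]:
    """Return each stage of the selection funnel as a named count."""
    database_total = TOTAL_RECORDS - REFERENCE_LIST_HITS
    # Not stated in the text: inferred as the remainder of the database total.
    psycinfo_derived = database_total - sum(DATABASE_HITS.values())
    after_screening = TOTAL_RECORDS - EXCLUDED_ON_TITLE_ABSTRACT
    full_text = after_screening - DUPLICATES
    return {
        "database_total": database_total,
        "psycinfo_derived": psycinfo_derived,
        "after_title_abstract": after_screening,
        "full_text_assessed": full_text,
        "excluded_at_full_text": full_text - INCLUDED_STUDIES,
    }


if __name__ == "__main__":
    for stage, n in selection_flow().items():
        print(f"{stage:>22}: {n:,}")
```

With the stated counts, the funnel leaves 51 records after title and abstract screening and 32 for full-text analysis, matching the text; the remaining 22 full texts were excluded before the 10 studies were retained.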

| Instrument assessment
This systematic review is based on review frameworks for the evaluation of instruments developed through the integration of Reporting Standards, 2008) and guidelines for the evaluation of test instruments (Cicchetti, 1994). In addition, criteria for the evaluation of instruments according to Groth-Marnat (2009) were used.
The main objective of the review framework was to identify the theoretical orientation and psychometric characteristics of the relevant instruments. With regard to theoretical orientation, the underlying theoretical construct of each instrument was identified and clustered in relation to […]

Figure 1. Flow diagram of included studies according to PRISMA (Moher et al., 2009)

All included studies were independently coded by the first two authors to allow complete extraction of the relevant characteristics of each instrument and to ensure cross-checking. Coding proceeded as follows: each reviewer read the identified studies and encoded all information related to the above review framework; the information extracted from the individual studies was then discussed and systematically entered into the review framework. Discrepancies and misunderstandings in coding were resolved through discussion aimed at reaching a consensus between the first two authors.

[…] the UE-G is a patient-rated instrument in the context of group psychotherapy. The EI (Epstein & Simon, 1990) was excluded from further analyses because the authors had no access to the full text in the common databases and their full-text request was unsuccessful. Furthermore, the UE-ATR (Linden, 2013) was excluded from the evaluation of psychometric properties because its authors describe the checklist only as a useful, informative, and attention-grabbing tool for recognizing negative effects, rather than as a scale with solid psychometric properties.

| Theoretical orientation
The choice of an adequate instrument for measuring negative effects of psychotherapy often depends on how well its theoretical orientation matches the intended use. A total of 17 domains were identified in evaluating the theoretical constructs of the individual instruments.

2. No domain was covered by all instruments; and
3. Three domains were assessed by all but one instrument.
"Therapeutic misconduct" was assessed by all instruments except the UE-ATR, "deterioration/emergence of symptoms" by all instruments except the ETQ, and "quality of therapy" by all except the INEP. Furthermore, "stigma" was recorded by five out of eight instruments and "therapeutic relationship (e.g., dependency, idealization)" by four out of eight, which indicates that they are relevant domains.
Other domains were assessed by only some of the instruments reviewed. "Treatment response" was assessed by the NEQ, SEPS, and PANEPS; "changes and strains in life areas (e.g., work, family, relationship)" by the UE-ATR and INEP; and "wanted effects" by the ETQ, SEPS, and PANEPS. A further visual analysis revealed that some domains were assessed by only one instrument: "expectation towards therapy" only by the VNIS; "intrapersonal changes" only by the INEP; "therapy setting (e.g., room size)," "relationship to other patients," "global experience," and "hopelessness" only by the UE-G; and "well-being of the patient," "noncompliance to treatment," and "prolongation of the treatment" only by the UE-ATR. The UE-ATR and PANEPS have the largest overlap of negative effect indicators, sharing 8 of the 17 indicators.

Table 3 summarizes the psychometric properties of seven instruments (excluding the UE-ATR). Three types of validity were considered relevant (content-related, construct, and criterion validity), together with three types of reliability (internal consistency, test-retest, and inter-rater reliability).
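The domain coverage described in the preceding paragraphs can be written down as set membership, which makes statements such as "no domain was covered by all instruments" checkable. This is a hypothetical sketch, not the authors' coding scheme: it encodes only the 15 domains whose covering instruments the text names explicitly. "Stigma" (five of eight) and "therapeutic relationship" (four of eight) are omitted because the text does not say which instruments cover them, so the sketch cannot reproduce table-level results such as the UE-ATR/PANEPS overlap.

```python
# Hypothetical sketch: instrument coverage per domain as sets, using only the
# memberships stated explicitly in the text (15 of the 17 domains).

ALL_INSTRUMENTS = frozenset(
    {"VNIS", "INEP", "NEQ", "UE-G", "UE-ATR", "ETQ", "SEPS", "PANEPS"}
)


def all_except(*excluded: str) -> frozenset[str]:
    """Coverage for a domain assessed by every instrument but the named ones."""
    return ALL_INSTRUMENTS - set(excluded)


COVERAGE: dict[str, frozenset[str]] = {
    "therapeutic misconduct": all_except("UE-ATR"),
    "deterioration/emergence of symptoms": all_except("ETQ"),
    "quality of therapy": all_except("INEP"),
    "treatment response": frozenset({"NEQ", "SEPS", "PANEPS"}),
    "changes and strains in life areas": frozenset({"UE-ATR", "INEP"}),
    "wanted effects": frozenset({"ETQ", "SEPS", "PANEPS"}),
    "expectation towards therapy": frozenset({"VNIS"}),
    "intrapersonal changes": frozenset({"INEP"}),
    "therapy setting": frozenset({"UE-G"}),
    "relationship to other patients": frozenset({"UE-G"}),
    "global experience": frozenset({"UE-G"}),
    "hopelessness": frozenset({"UE-G"}),
    "well-being of the patient": frozenset({"UE-ATR"}),
    "noncompliance to treatment": frozenset({"UE-ATR"}),
    "prolongation of the treatment": frozenset({"UE-ATR"}),
}


def domains_covered_by_all() -> list[str]:
    """Domains assessed by every instrument (expected to be empty)."""
    return [d for d, s in COVERAGE.items() if s == ALL_INSTRUMENTS]


def single_instrument_domains() -> dict[str, str]:
    """Map each domain assessed by exactly one instrument to that instrument."""
    return {d: next(iter(s)) for d, s in COVERAGE.items() if len(s) == 1}


if __name__ == "__main__":
    for domain, covered in sorted(COVERAGE.items(), key=lambda kv: -len(kv[1])):
        print(f"{len(covered)}/8  {domain}: {', '.join(sorted(covered))}")
```

Sets were chosen because coverage questions ("who assesses X?", "which domains do two instruments share?") reduce to membership, intersection, and size checks.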

Content-related validity was defined as the representativeness and relevance of the assessment tool for the underlying construct […] provided information on the factor structure of the questionnaires, while one instrument, the ETQ, showed significant correlations with related constructs.
Criterion validity was defined as the comparison of scores on the instrument with performance on another external measure (Groth-Marnat, 2009). Only one instrument, the INEP, published data on criterion validity, using regression analysis on an external criterion, namely "satisfaction with therapy." The reliability of an instrument has been defined as the extent to which a score is stable, consistent, predictable, and accurate over time (Groth-Marnat, 2009). Relevant indices of reliability are internal consistency (assessed by Cronbach's α), test-retest reliability, and inter-rater reliability. With the exception of the UE-G, all instruments provided some reliability data, indicating moderate to high reliability. Alpha coefficients of .70-.79 are considered "fair," .80-.89 "good," and .90 or higher "excellent" (Cicchetti, 1994), while reliability should be at least […]
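Cronbach's α compares the sum of the item variances with the variance of the total score: α = k/(k - 1) · (1 - Σσᵢ² / σₓ²), where k is the number of items, σᵢ² the variance of item i, and σₓ² the variance of the summed scores. The following is a minimal standard-library sketch; the function names and toy data are hypothetical, and the labels follow the Cicchetti (1994) bands quoted above, with anything under .70 reported simply as below the "fair" band.

```python
from statistics import variance


def cronbach_alpha(scores: list[list[float]]) -> float:
    """Cronbach's alpha; scores[p][i] is respondent p's score on item i."""
    k = len(scores[0])
    if k < 2 or len(scores) < 2:
        raise ValueError("need at least two items and two respondents")
    items = list(zip(*scores))                           # one tuple per item
    item_var_sum = sum(variance(item) for item in items)
    total_var = variance([sum(row) for row in scores])   # variance of sum scores
    if total_var == 0:
        raise ValueError("total scores have zero variance")
    return k / (k - 1) * (1 - item_var_sum / total_var)


def cicchetti_band(alpha: float) -> str:
    """Label alpha using the bands quoted in the text (Cicchetti, 1994)."""
    if alpha >= 0.90:
        return "excellent"
    if alpha >= 0.80:
        return "good"
    if alpha >= 0.70:
        return "fair"
    return "below fair"
```

For the toy responses `[[1, 2], [2, 3], [3, 5]]` (three respondents, two items), α = 18/19 ≈ .95, which falls in the "excellent" band. The sample variance is used throughout; because the same estimator appears in numerator and denominator, switching to the population variance would not change α.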

| Diagnostic characteristics
In terms of their practical use in diagnostics, the choice of instru[…] category. What they all have in common is that attribution to therapy is important when recording negative effects of psychotherapy (cf. Linden et al., 2018). In line with this, the INEP, NEQ, and UE-ATR query the relationship between the negative effect and treatment.

Note. The summary does not include the UE-ATR checklist (Linden, 2013), as its authors did not aim to develop a scale with psychometric properties, or the Exploitation Index (Epstein & Simon, 1990), as the full-text publication could not be accessed.

Most instruments have been developed in English- and German-speaking countries and are therefore only available in English and/or German, but one instrument, the NEQ, has already been translated into several languages.

| DISCUSSION
The main objective of this study was to conduct a systematic review of the current instruments for assessing the negative effects of psychotherapy by evaluating their theoretical orientation and psychometric properties, including diagnostic characteristics. This is intended to help researchers and practitioners select appropriate tools for evaluating negative effects for their respective purposes, as proposed by Guidi et al. (2018). A secondary objective was to derive a bottom-up framework of negative effects from the available data in order to refine the conception and definition of negative effects in psychotherapy and to give recommendations for improving assessment.
Overall, the results of this systematic review indicate that the

| Defining negative effects of psychotherapy: toward a consensus framework
This review investigated the theoretical orientation of negative effect instruments. No instrument is able to record all derived domains, and most studies lack clear definitions of negative effects.
When definitions are given, they vary between studies. As there is no consensus on a model that covers all positive and negative ef[…]tive effects of psychotherapy) should also be evaluated in order to minimize negative priming. Negative priming may create negative expectations about the occurrence of side effects of a particular treatment, even in psychological interventions (Bootzin & Bailey, 2005), and may therefore be associated with reported side effects, a phenomenon known as the "nocebo effect," which so far has been studied mainly in psychopharmacological trials (Colloca & Miller, 2011). In recent years, more and more researchers have considered negative expectations a key feature of mental disorders (Rief et al., 2015). It is therefore conceivable that, when side effects of psychological interventions are assessed, the assessment itself at least partially triggers them through the nocebo effect. Several authors have offered initial suggestions on how to deal with the nocebo effect (Webster, Weinman, & Rubin, 2016), for example, by reducing expectations of symptoms or limiting symptom suggestions. In this context, informed consent procedures could be adapted (Cohen, 2014). It should be noted, however, that this hypothesis has not (yet) been supported by independent studies and needs further empirical data in the context of psychological treatments. Patients might be "nocebo-susceptible" to side effects, which may be one of many patient characteristics that increase the risk of side effects. Future research should pay more attention to risk factors for side effects.
In addition, researchers debate whether positive side effects and by-products should also be included in the framework of negative effects (Hoyer, 2016). Proponents of this view argue that the classic model of side effects in psychological interventions was derived from pharmacological models of side effects, with their focus on symptom deterioration, and therefore cannot capture the complexity of the biopsychosocial model of medicine and psychological interventions.
The spectrum of potential negative effects in psychological interventions is greater than in pharmacological treatments, as it also includes negative events in social interactions (Szapocznik & Prado, 2007). However, research on this concept is not yet well established.
For example, an improvement in quality of life has been considered one of these positive side effects, whereas other authors argue that it should always be addressed as a goal of therapy and therefore considered a (secondary) outcome (Caspar & Jacobi, 2007). In addition, the concept of positive side effects may be misleading, as most instruments also covered areas other than symptom deteriora[…] those unrelated to treatment (Linden, 2013; Moritz et al., 2018).
This classification is displayed in Figure 2. On this basis, the authors attempt to create a consensus definition that is consistent with a recently published article by Linden et al. (2018). When these findings are integrated and synthesized within one framework, negative effects can be defined as unwanted events caused by psychotherapy. In addition, an attempt is made to distinguish between side effects and malpractice/unethical conduct. While side effects are unwanted events caused by lege artis psychotherapy (i.e., psychotherapy per[…] has been a topic of discussion for decades (e.g., May, 1971 […]
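The distinction drawn here, together with the definition by Linden et al. (2018) given in the introduction, can be read as a small decision procedure: an adverse event counts as a negative effect only if it is related to treatment, and it is a side effect only if the treatment was delivered lege artis. The sketch below is hypothetical and is not an instrument from the reviewed literature. The field names are illustrative, and the `ethical_violation` flag that separates unethical conduct from malpractice is an assumption, because the text does not specify how the two are distinguished in practice.

```python
from dataclasses import dataclass
from enum import Enum


class Category(Enum):
    UNRELATED_ADVERSE_EVENT = "adverse event unrelated to treatment"
    SIDE_EFFECT = "side effect"
    MALPRACTICE = "malpractice"
    UNETHICAL_CONDUCT = "unethical conduct"


@dataclass(frozen=True)
class AdverseEvent:
    description: str
    caused_by_treatment: bool  # attribution of the event to psychotherapy
    lege_artis: bool           # was the treatment delivered correctly?
    ethical_violation: bool    # hypothetical flag separating UC from MP


def classify(event: AdverseEvent) -> Category:
    """Assign an adverse event to one category of the framework (sketch)."""
    if not event.caused_by_treatment:
        return Category.UNRELATED_ADVERSE_EVENT
    if event.ethical_violation:        # assumption: checked before lege artis
        return Category.UNETHICAL_CONDUCT
    if not event.lege_artis:
        return Category.MALPRACTICE
    return Category.SIDE_EFFECT        # unwanted event of correct treatment


def is_negative_effect(event: AdverseEvent) -> bool:
    """Negative effects are adverse events related to treatment."""
    return classify(event) is not Category.UNRELATED_ADVERSE_EVENT
```

For instance, a transient symptom increase during correctly delivered treatment would be classified as a side effect, whereas the same event following a treatment error would count as malpractice.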

| Improving the assessment of negative effects
Our analysis has shown several ways to improve the assessment of negative effects. Recommendations are delineated in Figure 3.
In summary, the use and development of instruments for assessing negative effects must be based on a strong theoretical background and a sound underlying conceptual model that includes a clear definition and classification of positive and negative effects, and the above-synthesized framework might be a useful tool to comply with this. In particular, on the basis of the results of this review, the following recommendations for evaluating negative effects in psychotherapy can be derived. First, instruments need to take into account different domains of side effects (in particular stigma, symptom change, changes and strains in life areas, dependence, or idealization of the therapeutic relationship). Second, the results highlight the distinction between side effects, malpractice, and unethical conduct.
Accordingly, the authors recommend using one instrument to assess side effects and another to assess malpractice and/or unethical conduct; this recommendation can be implemented with various instruments. It should be noted that the correct assessment of unethical behavior and misconduct, through both self-report and the practitioner's report, is difficult.
Thirdly, the instruments need to assess the level of burden to evaluate the relevance (and therefore impact) of side effects and also assess the attribution to psychotherapy; therefore, future studies […]apy (Lambert & Harmon, 2018; Lambert, Whipple, & Kleinstäuber, 2018). ROM therefore offers considerable merit for the implementation of evidence-based practice in routine care, and the assessment of side effects may broaden and enrich current ROM strategies. Moreover, further empirical examination of existing measures through different types of studies is needed, that is, qualitative studies (interviews) to determine the main criteria for improving content validity and quantitative studies to determine psychometric and clinimetric properties with respect to different validity aspects (construct, predictive, and criterion) and the reliability of both self- and therapist/observer-rated instruments.

Figure 3. Recommendations for the assessment of negative effects

Recommendations for the assessment of negative effects in psychotherapy
Use of an instrument with a sound underlying conceptual model oriented towards the following criteria:
• Incorporating different domains of side effects (e.g., symptomatology, stigmatization, dependence on or idealization of the therapeutic relationship)
• Distinguishing side effects and malpractice/therapeutic misconduct
• Measuring the degree of burden and evaluating its relevance to treatment outcome (e.g., by using quality of life questionnaires)
• Measuring the attribution to psychotherapy
• Considering different settings (individual vs. group treatment, outpatient vs. inpatient, face-to-face vs. internet- or mobile-based, etc.)
• Considering different perspectives (patient, therapist, relative)
• Considering different therapeutic orientations (cognitive-behavioral treatment, psychodynamic treatment, etc.)

Further empirical examination of existing measures through different types of studies:
• Qualitative studies (interviews) to determine the main criteria for improving content validity
• Quantitative studies to determine psychometric properties with respect to different validity aspects (construct, predictive, criterion) and the reliability of self- and therapist-rated instruments
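Several of the criteria listed in Figure 3 (domain of side effect, degree of burden, attribution to psychotherapy, perspective, and setting) map naturally onto the fields of a single item response. The sketch below is hypothetical: the 0-4 burden scale, the field names, and the default burden cutoff of 2 are illustrative assumptions and are not taken from any of the reviewed instruments. It only shows how such a record could separate events that occurred from those that are both attributed to therapy and burdensome.

```python
from dataclasses import dataclass
from enum import Enum


class Perspective(Enum):
    PATIENT = "patient"
    THERAPIST = "therapist"
    RELATIVE = "relative"


@dataclass(frozen=True)
class SideEffectRating:
    """One hypothetical item response structured along the Figure 3 criteria."""
    domain: str                  # e.g., "stigmatization", "symptom deterioration"
    occurred: bool               # did the event occur in the rating period?
    burden: int                  # hypothetical scale: 0 = none ... 4 = very severe
    attributed_to_therapy: bool  # rater attributes the event to psychotherapy
    perspective: Perspective     # who provided the rating
    setting: str                 # e.g., "group, inpatient, face-to-face"

    def __post_init__(self) -> None:
        if not 0 <= self.burden <= 4:
            raise ValueError("burden must lie on the 0-4 scale")


def relevant_side_effects(
    ratings: list[SideEffectRating], min_burden: int = 2
) -> list[SideEffectRating]:
    """Keep events that occurred, are attributed to therapy, and reach the cutoff."""
    return [
        r for r in ratings
        if r.occurred and r.attributed_to_therapy and r.burden >= min_burden
    ]


def count_by_perspective(ratings: list[SideEffectRating]) -> dict[Perspective, int]:
    """Count relevant side effects separately for each rating perspective."""
    counts = {p: 0 for p in Perspective}
    for r in relevant_side_effects(ratings):
        counts[r.perspective] += 1
    return counts
```

Keeping occurrence, burden, and attribution as separate fields, rather than a single "negative effect" score, lets the same responses be summarized under different cutoffs or perspectives without re-collecting data.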

| Limitations of the review
The first limitation of this systematic review concerns the relatively small number of eligible studies and instruments that investigate and assess negative effects. Secondly, this review included all available assessment tools of negative effects, including setting-specific ones (e.g., the UE-G, which only measures the negative effects of group psychotherapy). Since there is no consensus on negative effects so far, the heterogeneity of the examined instruments may be considered a limitation of this review. However, the authors chose this approach in order to conduct an exhaustive search, examine all relevant underlying theoretical foundations, extract the diagnostic features, and synthesize a comprehensive model. Thirdly, this review did not examine the clinimetric properties of each instrument (Bech, 2016), which might be especially important in research on psychological interventions (Fava, Rafanelli, & Tomba, 2012). Fourthly, in psychotherapy outcome research, the reliable change index (RCI) is used extensively to define deterioration on standardized rating scales (Jacobson, Follette, & Revenstorf, 1984). The RCI was not considered in this review because its scope was assessment tools for negative effects during the course of therapy or after its completion. In general, negative effects were considered more as a process variable than as an outcome variable. Within this framework, deterioration as one potential side effect might be seen less as an outcome than as a transient, short-term effect that may occur during the therapy process. Finally, the limited variability of the patients participating in these studies could be another limiting factor, restricting to some extent the use of such instruments across highly heterogeneous mental health issues.
For example, patients with more severe psychiatric disorders (such as personality disorders or schizophrenia) may experience more serious side effects than patients with less severe disorders (such as mild depression without comorbidities). In this context, initial studies suggest that inpatients, who are usually more severely ill, report more side effects than outpatients (see Brakemeier et al., 2018; Rheker et al., 2017).
Future studies should therefore specifically include severely ill patient groups in order to identify specific negative effects and compare them between different patient groups.

| Future research directions
There are several future directions for improving the assessment of negative effects of psychotherapy. First, existing instruments need to be evaluated with regard to their psychometric properties (see Figure 3). Of note, psychometrics was mainly developed outside the clinical field; although it has been used successfully in clinical psychology research and has led to some advances in assessment, it has also led research to rely heavily on its advantages while neglecting its disadvantages. Thus, when developing new assessment tools, future research should consider clinimetric properties (Bech, 2016; Fava et al., 2012; Feinstein, 1987): those features of an instrument that identify clinically relevant changes in mental health over time (discrimination properties such as responsiveness/sensitivity; Fava, Tomba, & Bech, 2017) and show long-term incremental predictive validity within the clinical decision-making process (Fava et al., 2012). In the case of negative effects, besides evaluating their psychometric properties, researchers should link negative effects to treatment outcome in order to determine their impact (relevance) on the individual patient's life. For example, within process-outcome research, future studies could relate the occurrence of negative effects to treatment outcome as classified by the RCI (Jacobson & Truax, 1991). Initial attempts have been made to address the relevance of negative effects for treatment in an inpatient cognitive behavioral analysis system of psychotherapy (CBASP) sample. In line with this, current methodological recommendations for trials of psychological interventions support the usefulness of clinimetrics (Guidi et al., 2018).
Second, there is a considerable need to develop new instruments for assessing negative effects in specific populations (e.g., children and adolescents) and for different settings (e.g., short forms and specific items for group therapy and inpatient use). Third, most instruments are self-rated; thus, validated clinician-rated instruments would be valuable in providing therapists with a standardized tool for monitoring negative effects during treatment. A promising approach is the UE-ATR, which should be validated in future studies. Fourth, longitudinal research designs could provide insights into the predictive validity of instruments (including clinimetric, discriminant, and incremental validity) and improve our understanding of the influence of negative effects on treatment outcome (i.e., response, remission, relapse, and dropout) in order to determine the relevance of negative effects (cf. Brakemeier et al., 2018). The prevalence of negative effects seems to vary widely from study to study, depending on the instrument selected (Ladwig et al., 2014; Moritz et al., 2015, 2018; Rheker et al., 2017); therefore, an instrument recognized worldwide as the "gold standard" and used in most studies would be desirable to make study results comparable. In addition, current methodological guidelines for trials call for the assessment of negative effects of psychotherapy using suitable evaluation methods (Guidi et al., 2018). In order to monitor and counteract negative effects, in particular side effects, further studies need to develop a process scale that assesses negative effects during therapy within a clear time frame. This would strengthen the clinimetric properties, and thus the clinical usefulness, of an instrument in clinical practice. Furthermore, such a process scale that regularly assesses side effects of psychological interventions may be a useful extension of ROM (Lambert & Harmon, 2018).
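The RCI of Jacobson and Truax (1991) divides the pre-post difference by the standard error of the difference: RC = (x2 - x1) / S_diff, with S_diff = √(2 · SE²) and SE = s1 · √(1 - r_xx), where s1 is the pretreatment standard deviation and r_xx the test-retest reliability of the scale; |RC| > 1.96 is taken as reliable change. The following is a minimal sketch for linking negative effects to such an outcome classification. The function names and example values are hypothetical, and by default it assumes a symptom scale on which higher scores mean more severe symptoms, so a reliable increase counts as deterioration.

```python
import math


def reliable_change_index(pre: float, post: float, sd_pre: float,
                          reliability: float) -> float:
    """Jacobson-Truax RC: (post - pre) divided by the SE of the difference."""
    if sd_pre <= 0:
        raise ValueError("sd_pre must be positive")
    if not 0.0 <= reliability < 1.0:
        raise ValueError("reliability must lie in [0, 1)")
    se = sd_pre * math.sqrt(1.0 - reliability)  # standard error of measurement
    s_diff = math.sqrt(2.0 * se ** 2)           # standard error of the difference
    return (post - pre) / s_diff


def classify_change(pre: float, post: float, sd_pre: float, reliability: float,
                    higher_is_worse: bool = True, critical: float = 1.96) -> str:
    """Label a pre-post change as reliable improvement, deterioration, or neither."""
    rc = reliable_change_index(pre, post, sd_pre, reliability)
    if abs(rc) <= critical:
        return "no reliable change"
    worsened = rc > 0 if higher_is_worse else rc < 0
    return "reliable deterioration" if worsened else "reliable improvement"
```

With hypothetical values pre = 20, post = 30, s1 = 8, and r_xx = .84, SE = 3.2 and RC ≈ 2.21, so `classify_change(20, 30, 8, 0.84)` returns "reliable deterioration", whereas a rise to 24 (RC ≈ 0.88) would not count as reliable change. Set `higher_is_worse=False` for scales on which higher scores indicate better functioning.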
The aim should always be to carry out effective psychotherapies with as few side effects as possible.

ACKNOWLEDGMENTS
None.

CONFLICT OF INTEREST
The authors declared no conflicts of interest.

AUTHORS' CONTRIBUTION
All authors have been significantly involved in the research and/or article preparation. All authors have approved the final article.

DATA AVAILABILITY STATEMENT
Data sharing is not applicable to this article as no new data were created or analyzed in this study.