Cognitive behavioral therapy for eating disorders: A map of the systematic review evidence base

Abstract
Objective: To map and examine the systematic review evidence base regarding the effects of cognitive-behavioral therapy (CBT) for eating disorders (EDs), especially against active interventions.
Method: This systematic review is an extension of an overview of CBT for all health conditions (CBT-O). We identified ED-related systematic reviews from the CBT-O database and performed updated searches of EMBASE, MEDLINE, and PsychInfo in April 2021 and September 2022.
Results: The 44 systematic reviews included (21 meta-analyses) were of varying quality. They focused on “high intensity” CBT, delivered face-to-face by qualified clinicians, in BN, BED and mixed, not specifically low-weight samples. ED-specific outcomes were studied most, with little consensus on their operationalization. The often insufficient reporting of sample characteristics did not allow assessment of the generalizability of findings. The meta-analytic syntheses show that high intensity one-to-one CBT produces better short-term effects than a mix of active controls, especially on ED-specific measures for BED, BN, and transdiagnostic samples. There is little evidence favoring group CBT or low intensity CBT against other active interventions.
Discussion: While this study found evidence consistent with current ED treatment recommendations, it highlighted notable gaps that need to be addressed. There were insufficient data to allow generalizations regarding sex and gender, age, culture and comorbidity, and to support CBT in AN samples. The evidence for group CBT and low intensity CBT against active controls is limited, as it is for the longer-term effects of CBT. Our findings identify areas for future innovation and research within CBT.
Public Significance: This study provides a comprehensive mapping and quality assessment of the current large systematic review research base regarding the effects of cognitive behavioral therapy (CBT) for eating disorders (EDs), with a focus on comparisons to other active interventions. By transcending the more limited scope of individual systematic reviews, this overview highlights the gaps in the current evidence base, and thus provides guidance for future research and clinical innovation.


KEYWORDS
anorexia nervosa, binge-eating disorder, bulimia nervosa, CBT, cognitive behavioral therapy, eating disorder, EDNOS, OSFED, overview, systematic review

| INTRODUCTION
Cognitive behavioral approaches to the understanding and treatment of the eating disorders (EDs) were first developed in the early 1980s (Fairburn, 1981; Fairburn et al., 1986; Garner & Bemis, 1982). Since then, theory and treatment have evolved to focus on the mechanisms proposed to maintain eating disorder psychopathology across the full range of EDs (Cooper & Fairburn, 2011; Fairburn et al., 2003). In addition, evidence supporting cognitive behavioral therapy (CBT) for these disorders has accumulated from randomized controlled trials (RCTs) and has been synthesized in a number of systematic reviews (e.g., Bulik, Berkman, Brownley, Sedway, & Lohr, 2007; Hay, 2013; Linardon et al., 2017e). Further support has come from the use of evidence-supported CBT in real-world settings (Weissman et al., 2017).
CBT is now recommended by the majority of evidence-based national guidelines (Hilbert et al., 2017) as the first line of treatment for bulimia nervosa (BN) and binge eating disorder (BED), and to a lesser extent (due to less robust evidence) for the other specified feeding and eating disorders (OSFED); this DSM-5 diagnosis partially overlaps with the previously used DSM-IV category "eating disorder not otherwise specified" (EDNOS). While CBT is clearly regarded as the treatment of choice for these latter disorders that do not involve significantly low weight (Weissman et al., 2017), it is one among a number of options for the treatment of adults with anorexia nervosa (AN) (Mulkens & Waller, 2021). Three main approaches are currently recommended for the treatment of AN (e.g., NICE, 2017), among them the approach described by Schmidt et al. (2015) and CBT (CBT-E; Fairburn et al., 2013).
In sum, there is evidence from RCTs investigating the effectiveness of CBT in comparison to both active (other treatment interventions) and inactive control conditions in the various distinct eating disorder presentations as well as those investigating various delivery methods for CBT (e.g., group, individual, guided self-help). Individual systematic reviews including meta-analyses (MA) have been conducted for different treatment intervention types and different delivery modes for each of the distinct eating disorder presentations. To our knowledge there has been no comprehensive critical synthesis or overview of this large systematic review literature to map the extent and strength of the available evidence and to identify gaps in the systematic review evidence. Recently, an overview of CBT systematic reviews across all health conditions (CBT-O) was published (Fordham et al., 2021a) that identified the large systematic review evidence base for EDs. However, due to the heterogeneity of the clinical presentations and outcomes, the overview did not focus specifically on reporting the evidence for CBT for EDs. The present study addressed this omission by conducting a continuation of the CBT-O focusing exclusively on EDs. It aimed to provide a critical synthesis of this systematic review evidence to identify the extent and strength of the evidence for CBT, to identify gaps in the evidence drawn from individual systematic reviews and examine the quality of the reviews from which it is drawn. Critically synthesizing and evaluating the evidence at this higher level of generalization allows a meta-perspective of all the systematic review evidence free of the more limited scope of any individual systematic review. An overview of systematic reviews also highlights the focus of the majority of past research in the area.
The present synthesis of evidence from systematic reviews regarding the effects of CBT for EDs aimed to examine:

| Search strategy
The present study is an extension of an overview of CBT for all health conditions (CBT-O; Fordham et al., 2021a, 2021b).
The full methods of the CBT-O study have been published previously (Fordham et al., 2021a). Where applicable, we adhered to the CBT-O protocol and did not produce a separate protocol for this ED-focused study. One author (MK) identified systematic reviews with ED-related outcomes from the original search conducted for the CBT-O study. We subsequently conducted two updated searches of EMBASE, MEDLINE and PsychInfo, in April 2021 and September 2022, using the same search strategy as the original with two additional restrictions: (1) results were limited to EDs and (2) publication dates were limited to January 2019 to April 2021 and April 2021 to September 2022, respectively. Only systematic reviews were included since they are widely considered the gold standard method for evidence synthesis. Only papers written in English were included due to the authors' limited proficiency in other languages. A list of excluded papers with reasons for their exclusion (S1) and the details of the search strategy (S4) are provided in the Supplementary material.

| Inclusion criteria
Inclusion criteria were those used in the CBT-O study with minor modifications in line with our research questions. The criteria were as follows:
1. As in the CBT-O study, reviews fulfilled at least four of the five criteria outlined by the widely accepted Centre for Reviews and Dissemination (CRD), as part of the Database of Abstracts of Reviews of Effects (DARE). These are: reporting of inclusion/exclusion criteria; adequacy of search; synthesis of included studies; assessment of quality of included studies; and presentation of sufficient details about the individual studies (Khan et al., 2001).
2. Interventions studied were CBT treatments (excluding CBT in combination with other treatments, and prevention interventions).
3. CBT treatments were compared to non-CBT control conditions.
4. Participants studied met full or subthreshold criteria for an ED (excluding the newly recognized feeding disorders).
5. Outcomes of the RCTs included in the reviews were qualitatively or quantitatively summarized.
6. Reviews were in English.
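
The six criteria above can be read as one screening rule applied to each candidate review. The following is a minimal sketch of that reading, not the authors' actual screening tool: the `Review` record, its field names, and the labels in `CRD_CRITERIA` are hypothetical and chosen only for illustration.

```python
from dataclasses import dataclass, field

# Hypothetical labels for the five CRD/DARE criteria named in criterion 1.
CRD_CRITERIA = frozenset({
    "inclusion_exclusion_reported",
    "adequate_search",
    "studies_synthesized",
    "quality_assessed",
    "study_details_presented",
})


@dataclass
class Review:
    """Illustrative record of the screening decisions for one review."""
    crd_met: set = field(default_factory=set)  # CRD criteria the review fulfils
    cbt_only: bool = False             # criterion 2: CBT alone, not prevention
    non_cbt_comparator: bool = False   # criterion 3
    ed_population: bool = False        # criterion 4: full or subthreshold ED
    outcomes_summarized: bool = False  # criterion 5
    language: str = ""                 # criterion 6


def is_eligible(review: Review) -> bool:
    """Return True only if all six inclusion criteria are satisfied."""
    crd_ok = len(review.crd_met & CRD_CRITERIA) >= 4  # at least four of five
    return (
        crd_ok
        and review.cbt_only
        and review.non_cbt_comparator
        and review.ed_population
        and review.outcomes_summarized
        and review.language == "English"
    )
```

Under this sketch, a review that meets four CRD criteria and all other conditions is eligible, while the same review meeting only three CRD criteria is not.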

| Data extraction
We based our data extraction template on a pre-designed set of data intervention. If the review did not report the intensity of the intervention, it was assumed to be high intensity CBT. Also, if length of follow-up was not clearly reported, it was assumed to be short.
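
The two default assumptions stated above (unreported intensity treated as high, unclear follow-up treated as short) could be applied at extraction time as sketched below. This is an illustration only: the function name, the dictionary keys, and the use of `None` for "not reported" are assumptions, and the flags recording which values were assumed are an addition for traceability rather than something the text describes.

```python
from typing import Optional


def apply_extraction_defaults(intensity: Optional[str],
                              follow_up: Optional[str]) -> dict:
    """Fill unreported fields with the stated defaults and flag them.

    `None` stands for "not reported in the review" (an assumption of this sketch).
    """
    return {
        "intensity": intensity if intensity is not None else "high",
        "follow_up": follow_up if follow_up is not None else "short",
        # Flags keep assumed values distinguishable from reported ones.
        "intensity_assumed": intensity is None,
        "follow_up_assumed": follow_up is None,
    }
```

Keeping the flags alongside the values would allow a sensitivity check that excludes reviews whose intensity or follow-up length was assumed rather than reported.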

| Quality assessment
As in the CBT-O study, the quality of the systematic reviews was assessed using the widely accepted AMSTAR-2 (Shea et al., 2017).
The individual item descriptions are provided in Table 1 in the Results.
Reviews included in the original study had been previously assessed by the CBT-O study authors (one of whom was BF), and reviews added to the present review were assessed by the authors (MK, ZC, and BF) with each paper being assessed by two reviewers. Any disagreements were resolved through discussion.

| Data synthesis
The qualitative data describing the study details extracted from the reviews are displayed in a PICO (population, intervention, comparison, outcome) format to demonstrate the extent of the current evidence and any possible gaps. All outcomes were included and categorized into eight outcome groups (ED behaviors, ED psychopathology, remission/abstinence, weight/BMI, depression, other psychological outcomes, quality of life and percentage of dropouts). The definitions of "remission" and "abstinence" varied between the systematic reviews and often overlapped; thus, these two types of outcomes were grouped in one category.
The MA syntheses extracted from the included systematic

| Included reviews
As can be seen in Figure 1, we included 44 systematic reviews, 37 from the original study and 7 further reviews from the updated searches. For a list of the included reviews, please see the References.
The included reviews are presented in Table 2 together with a brief description of their characteristics using PICO criteria.

| Quality of the reviews
Of the 44 reviews, 18 (40.9%) were graded as of high/moderate quality while 26 (59.1%) were graded as of low/critically low quality. Table 1 presents the AMSTAR-2 items (Shea et al., 2017) and the number of reviews that failed to meet each criterion. The most common items that reviews failed to meet were: publishing a protocol prior to conducting the review, reporting the sources of funding and reporting the reason for the study designs selected for inclusion.
Inter-rater reliability (calculated as % of agreement on individual AMSTAR items) was 74.5% between ZC and MK, and 81.3% between BF and MK. The reviews rated as moderate or high quality are bolded.
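
Inter-rater reliability is described above as the percentage of agreement on individual AMSTAR items. A minimal sketch of that calculation follows; the rating labels in the usage note are invented for illustration and are not the study's data.

```python
def percent_agreement(rater_a, rater_b):
    """Percentage of items on which two raters gave the same rating."""
    if len(rater_a) != len(rater_b) or len(rater_a) == 0:
        raise ValueError("Both raters must rate the same non-empty set of items.")
    matches = sum(a == b for a, b in zip(rater_a, rater_b))
    return 100.0 * matches / len(rater_a)
```

For example, `percent_agreement(["Y", "N", "Y", "PY"], ["Y", "N", "N", "PY"])` gives 75.0, because the raters agree on three of four items. Simple percent agreement does not correct for agreement expected by chance.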
3.3 | Qualitative synthesis of the reviews

| Participants
Most reviews (n = 24, 54.5%) combined data collected from participants with different EDs (i.e., at least partly transdiagnostic samples, referred to as "mixed" from here on). Where men were included (n = 14, 56% of the reviews that reported sex), the percentage of men in the samples studied ranged between 2% and 41%; the remaining reviews that reported sex included only female participants. Other gender identities were not addressed in any of the reviews. Two reviews reported the ethnicity of some participants, but also included RCTs that did not report ethnicity. Participants in these two reviews were 57%-98% White.
Sixteen reviews reported that the included studies were conducted in high middle income countries (HMIC), while the others did not report these data. Fifteen reviews mentioned comorbid conditions, but only four reviews reported and/or included any analysis of these conditions.
Table note: (2) N: dietary restraint, shape concern, weight concern; Y: cognitive symptoms. (3) Y: weight concern, dietary restraint; N: shape concern (12 RCTs), cognitive symptoms (1 RCT). (4) Y: shape concern, weight concern; N: dietary restraint.
Note (for Tables 4 and 5): Where the results are difficult to interpret, the differences between the syntheses are explained, and the outcomes categorized under "Other psychological" are specified (see numbers 1-7 in Table 4, and numbers 1-6 in Table 5).

| Interventions
Twenty-two (50%) reviews combined data from both high and low intensity CBT interventions (see Table 2). Nineteen (43.2%) included only high intensity (individual or group CBT). The three (6.8%) reviews studying exclusively low intensity CBT combined guided and unguided self-help.
Often reviews did not explicitly state whether the CBT intervention was ED-focused or generic CBT, and this had to be inferred. Ten (22.7%) reviews only included ED-focused CBT interventions. The remainder included both ED-focused and generic CBT or provided no information regarding CBT type.

| Comparison interventions
Most reviews (n = 34, 77.3%) combined data from RCTs that compared CBT to both active and inactive control groups (Table 2). Eight (18.2%) reviews included RCTs with only active comparisons and two reviews (4.5%) compared CBT exclusively with inactive comparators.
The active comparator interventions included other forms of psychological treatments (e.g., interpersonal psychotherapy, behavioral treatment, supportive psychotherapy, or psychodynamic psychotherapy) and pharmacotherapy (most often antidepressant medication) amongst other active interventions.

| Types of outcome reported in the MAs
The most common outcome studied in the MAs was ED behaviors. Potential harms related to treatment were addressed in nine reviews, but the lack of detail in the RCTs prevented their synthesis and analysis.
Note: (1) outcome = interpersonal functioning; (2) outcome = interpersonal functioning and general psychiatric score; (3) N (7 RCTs) close to statistical significance (see Hay et al., 2009; Linardon et al., 2017b); (4) outcome = interpersonal functioning; (5) outcome = general psychiatric score; (6) outcome = general psychiatric score; (7) outcome = interpersonal functioning, self-concept, general psychiatric score. X = the result is from a low/critically low quality review. Abbreviation: NR, not reported.
Almost half of the reviews (n = 19, 43.2%) reported short and long term outcomes with only three (6.8%) reviews reporting exclusively on studies of long-term outcome. Meta-analytic syntheses at long term follow up were performed in five (23.8% of all MAs) reviews, all on high or mixed intensity CBT compared to active control conditions.

| Quantitative results from the meta-analytic syntheses
Of the 44 reviews, 21 (47.7%) included at least one meta-analytic synthesis comparing CBT with a non-CBT intervention or an inactive control. Tables 3-6 present the findings from the MAs including active controls. Effect sizes are reported for the syntheses with statistically significant effects that included more than one RCT in comparison with an active control. If more than one synthesis studied a similar type of comparison, the effect size is reported for the synthesis that included more RCTs; if the syntheses included a similar number of RCTs, effect sizes from both are reported. The effect sizes are presented as they were reported in the original paper. Table 3 presents the data from reviews comparing CBT to any active control group. Tables 4 and 5 present the data comparing CBT to specific active control groups, and Table 6 presents the MAs of long-term follow-up data. Data from reviews that compared CBT to inactive control groups are presented in Tables 7a and 7b in the Supplementary material. Of the reviews with a MA, 16 (76.2%) were rated as moderate or high quality. Results extracted from the critically low/low quality reviews are marked in the tables.
The results from MA syntheses that compared CBT to active and inactive controls pooled together are reported in the summary of results but not in the tables.
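
The reporting rule for effect sizes described in this section can be sketched as a small selection function over syntheses that address the same type of comparison. This is one possible reading rather than the authors' procedure: the `Synthesis` record is hypothetical, and the `similar_within` threshold is an assumption, since the text does not define what counts as a "similar number" of RCTs.

```python
from dataclasses import dataclass


@dataclass
class Synthesis:
    """Illustrative record for one meta-analytic synthesis."""
    review: str
    n_rcts: int
    significant: bool
    effect_size: str  # kept verbatim, as reported in the original paper


def select_effect_sizes(syntheses, similar_within=1):
    """Pick which syntheses of one comparison type get an effect size reported.

    similar_within is an assumed tolerance for "a similar number of RCTs".
    """
    # Only statistically significant syntheses with more than one RCT qualify.
    eligible = [s for s in syntheses if s.significant and s.n_rcts > 1]
    if not eligible:
        return []
    largest = max(s.n_rcts for s in eligible)
    # Report the largest synthesis, plus any of similar size.
    return [s for s in eligible if largest - s.n_rcts <= similar_within]
```

With syntheses of 12, 11 and 4 RCTs that are all significant, this sketch would report the first two; a significant synthesis based on a single RCT would never be reported.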

| Quality of RCTs in the MAs
Only three of the MAs (14.3%) were conducted solely with RCTs rated as having low or moderate risk of bias. Ten (47.6%) reviews assessed the moderating effect of higher and lower quality RCTs and reported that the quality of the RCTs did not moderate the effects reported. Six (28.6%) reviews assessed RCT quality but neither analyzed nor discussed its effects on their reported results. Four (19.0%) reviews did not assess RCT quality.
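
The percentages in this paragraph are proportions of the 21 reviews that included a meta-analysis. As a brief worked check, the sketch below recomputes a few of them to one decimal place; the helper name `pct` is hypothetical.

```python
def pct(count, total):
    """Percentage of `total` represented by `count`, rounded to one decimal."""
    return round(100 * count / total, 1)


TOTAL_MA_REVIEWS = 21  # reviews with at least one meta-analytic synthesis

moderation_assessed = pct(10, TOTAL_MA_REVIEWS)     # 47.6
quality_not_analyzed = pct(6, TOTAL_MA_REVIEWS)     # 28.6
quality_not_assessed = pct(4, TOTAL_MA_REVIEWS)     # 19.0
```

The same helper applies to the review-level figures elsewhere in the Results, where the denominator is the 44 included reviews.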

CBT compared to active controls
High intensity CBT was more effective compared to mixed active controls in reducing ED behaviors and psychopathology (see Table 3). It was also more effective than behavioral weight loss in reducing ED behaviors, and more effective than interpersonal psychotherapy in reducing ED behaviors and psychopathology. The results are mixed when CBT was compared against pharmacological interventions and a mixed group of various psychotherapies.
CBT has not been shown to be more effective than behavioral or supportive therapy on any of the outcomes. Comparisons against other forms of psychotherapy have included few RCTs (see Tables 4 and 5). Aside from ED-specific outcomes, the only effects favoring CBT against active controls were for depression and number of dropouts, and only against certain specific control interventions (see Table 5).
In syntheses that pooled active and inactive controls (not in the data tables), high intensity CBT was effective in increasing self-esteem (9-14 RCTs per synthesis). There was also some support for high and low intensity CBT leading to improvements in quality of life outcomes in both those with BN and in mixed ED samples (3-13 RCTs/synthesis), but no significant effect for those with BED (3-4 RCTs/ synthesis).

Effects of different forms of CBT
Low and high intensity CBT. We identified a large evidence base for high intensity CBT. MAs of low intensity CBT (pure or guided self-help) against active controls did not study AN populations, report long-term outcomes, or examine the effect of RCT quality. One MA synthesis found an effect in favor of low intensity CBT on certain ED psychopathology features (see Table 3). In MA syntheses where active and inactive controls were pooled, the only significant favorable effects of low intensity CBT were on quality of life (3-5 RCTs/synthesis).
Group-based CBT. We found evidence that group CBT is more effective than inactive control conditions but not more effective than active control conditions for BN and BED populations (see Table 3). There are no data for AN. No effect favoring group CBT was found when it was compared to active and inactive interventions pooled together (quality of life, three RCTs/synthesis).
Specific CBT protocols. The data support CBT-BN (Fairburn et al., 1993) and its enhanced transdiagnostic form (CBT-E) as more effective than active and inactive control conditions for mixed diagnostic and BN populations with regard to ED behavior and psychopathology, although not all syntheses produced consistent results. CBT-BN was not favored when it was described as "adapted" (see Linardon et al., 2017a; Table 3). CBT-BN/E has shown a favorable effect on reported quality of life (six RCTs/synthesis) but no effect on health-related quality of life (two RCTs/synthesis) in syntheses that pooled active and inactive control interventions.

Effects across various eating disorder presentations
As we have seen, benefits for CBT have been reported on a variety of outcomes in those with BED, BN, and for mixed diagnostic groups, but no significant effects have been demonstrated for CBT as compared to other active treatments in those with AN, other than one MA synthesis that found a statistically significant effect in favor of CBT in the percentage of dropouts from treatment.

Effects across various demographic groups
The systematic review data are almost exclusively generated from adults, predominantly women, who are White and live in HMI countries. Comorbid conditions, with the exception of depression, have received relatively little attention in the MA syntheses. There is a lack of data to explore whether sex and gender, ethnicity, country of origin or age moderate the effectiveness of CBT on ED outcomes.

Longer term effects
The effectiveness of CBT at follow-up of 12 months or longer is unclear (see Table 6), with some reviews finding positive effects on ED behaviors and psychopathology in BED and BN populations, while others did not.

Range of outcomes
In comparison to active control conditions, the strongest evidence for CBT has been found for the outcomes of ED behaviors and ED psychopathology.

| DISCUSSION
The current overview of the evidence supporting CBT for EDs aimed to provide a critical synthesis of the large and growing systematic review literature in the field. It was undertaken with the explicit aim of synthesizing evidence drawn from previously conducted systematic reviews to identify the extent and strength of the evidence for CBT, identify any gaps in the evidence and to examine the quality of the systematic reviews from which it is drawn. By undertaking a review of the evidence at this higher level of generalization, we aimed to transcend the more limited scope of individual systematic reviews.
Consistent with current guidelines (Hilbert et al., 2017), our review confirmed that CBT produces benefits for people with symptoms of binge eating and/or purging (generally those with BED, BN and EDNOS/OSFED). More particularly, it made clear that CBT is most effective in producing good outcomes for these groups on both ED behavior and psychopathology and, to a lesser extent, abstinence/remission, when delivered face-to-face on an individual basis. Guided by a number of specific research aims, the review also highlighted significant gaps in the systematic review evidence that need to be addressed in future research.
First, the current evidence base shows statistically significant effects for individually delivered high intensity CBT over a mixed group of active control interventions (as well as over certain specific psychological and psychotherapeutic approaches) with regard to certain ED-specific outcomes, although the effect sizes are small. Overall, however, the current evidence base does not fully support CBT as generally more effective than other specific psychotherapeutic treatments. Further investigation of for whom, and under what circumstances, this form of CBT might be the treatment of choice has the potential to greatly enhance clinical benefit for patients.
Second, as regards the form of CBT, our overview of systematic reviews shows that low intensity CBT has received significantly less attention than high intensity CBT. The evidence supporting low intensity CBT, as well as group CBT, was relatively weak as there was only support for these forms of CBT when compared to inactive control interventions. A possible exception was for a particular form of low intensity CBT, guided self-help, which has the potential to be made much more widely available with the possibility of helping to bridge the well-documented treatment gap in EDs (Kazdin et al., 2017). CBT in guided self-help form produced benefits for certain features of ED psychopathology even when compared to active interventions, and benefits to quality of life in comparison to active and inactive controls pooled together.
The MA syntheses studying specific manualized approaches supported CBT-BN (Fairburn et al., 1993) and its enhanced transdiagnostic form (CBT-E).
Sixth, our review highlighted the limited range of outcomes reported for CBT, and the lack of consensus on how they are operationalized in the syntheses. Other than ED-specific outcomes, there is relatively little evidence of any other effects produced by CBT for EDs. For example, there are limited data on impairment of functioning, an important omission given that impairment is often a key factor in making a diagnosis of a disorder and in patients seeking treatment.
An important strength of our study was that we assessed the quality of the reviews being synthesized using AMSTAR-2, a widely used instrument for critically appraising systematic reviews. We found that the quality of the reviews included varied, as did the quality of the RCTs included in these reviews. The most common shortcomings of the systematic reviews were not registering a protocol before commencing the review, not reporting the funding sources of the RCTs included and, importantly, not providing an explanation for the selection of studies reviewed. Most of the reviews did not exclude poor quality RCTs from their MA syntheses, although many did study the moderating effect of their quality.
A limitation of our report is that we have only provided the main quantitative results and have left out some of the more detailed information that is usually included when reporting a MA. Data were only extracted at the review level and so some data from relevant RCTs were not explicitly represented. We excluded reviews not published in English (n = 10), which might have addressed one of the evidence gaps identified, namely the small number of RCTs performed in settings outside of Western cultures. We also excluded consideration of the newly recognized feeding disorders in DSM-5, and studies that directly compare different forms of CBT, limiting our ability to draw conclusions about the treatment of these feeding disorders and the efficacy of different forms of CBT. The MAs studied various outcome variables and diagnostic groups, and the operationalization of variables and subthreshold disorders was not always explicit and varied between the systematic reviews. This was particularly an issue with the outcome of abstinence/remission, as noted earlier. To synthesize a large amount of sometimes disparate data, we had to combine outcomes in a way that may obscure some finer-grained details. Lastly, a single researcher performed the data extraction and synthesis.
In assessing the qualitative and meta-analytic syntheses in our overview it is important to remember that many include the same RCTs; i.e. certain RCTs are "recycled" and studied in different syntheses. We suggest further study of the quality and generalizability of any possible RCTs that might play a key role in determining the statistical significance of the various MA syntheses.
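
One way the "recycling" of RCTs across reviews could be quantified is the corrected covered area (CCA), an overlap index sometimes used in overviews of reviews. The authors do not report using it; the sketch below is offered only as an illustration, and the review and trial identifiers in the usage note are invented. With N the total number of RCT inclusions counted across reviews, r the number of unique RCTs and c the number of reviews, CCA = (N - r) / (r × c - r).

```python
def corrected_covered_area(inclusion):
    """Corrected covered area for a mapping of review id -> set of RCT ids.

    Returns a value between 0 (no RCT shared) and 1 (every review
    includes every RCT).
    """
    c = len(inclusion)                                # number of reviews
    unique_rcts = set().union(*inclusion.values()) if inclusion else set()
    r = len(unique_rcts)                              # unique RCTs
    n = sum(len(rcts) for rcts in inclusion.values())  # inclusions incl. repeats
    if c < 2 or r == 0:
        raise ValueError("Need at least two reviews and one RCT.")
    return (n - r) / (r * c - r)
```

For instance, with three hypothetical reviews including `{"t1", "t2", "t3"}`, `{"t2", "t3", "t4"}` and `{"t3"}`, there are 7 inclusions of 4 unique RCTs, so the CCA is (7 - 4) / (4 × 3 - 4) = 0.375.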
In summary, while the results of our current overview of the systematic review evidence provide support for some particular forms of CBT for certain ED presentations, an important finding was that there are major gaps in the current evidence that need to be addressed in the future. One of the most important and pressing concerns is the limited data on those who receive a diagnosis of AN. Although currently there are three main approaches recommended by the clinical guidelines (e.g., NICE, 2017), there is limited evidence to support CBT as the treatment of choice and there is an urgent need to address the relatively poor outcomes to date of treatment for this group (Mulkens & Waller, 2021). In addressing the gaps in the existing evidence, there is a need to ensure that future research is conducted in careful alignment with quality criteria to produce RCTs with low risk of bias and systematic reviews of high quality. We have identified that the quality of the synthesized evidence base to date has not been uniformly high.
It is important to note that while our overview highlights limited evidence or the absence of certain evidence, it does not constitute evidence against CBT, but rather points to areas for future innovation and research.

CONFLICT OF INTEREST
No conflicts of interest. No specific funding was received for this work.

DATA AVAILABILITY STATEMENT
The data that support the findings of this study are available from the corresponding author upon reasonable request.