The acceptability, effectiveness, and durability of cognitive analytic therapy: Systematic review and meta-analysis.

OBJECTIVES
This paper sought to conduct a meta-analysis of the effectiveness and durability of cognitive analytic therapy (CAT) and assess the acceptability of CAT in terms of dropout rates.


DESIGN
Systematic review and meta-analysis.


METHODS
PROSPERO registration: CRD42018086009. Searches identified CAT treatment outcome studies eligible to be narratively synthesized. Pre-post/post-follow-up effect sizes (ESs) were extracted and synthesized in a random-effects meta-analysis. Variations in effect sizes were explored using moderator analyses. Dropout rates were extracted. Secondary analyses synthesized between-group ES from trials of CAT.


RESULTS
Twenty-five studies providing pre-post CAT treatment outcomes were aggregated across three outcome comparisons of functioning, depression, and interpersonal problems. CAT produced large pre-post improvements in global functioning (ES = 0.86; 95% CI 0.71-1.01, N = 628), moderate-to-large improvements in interpersonal problems (ES = 0.74, 95% CI 0.51-0.97, N = 460), and large reductions in depression symptoms (ES = 1.05, 95% CI 0.80-1.29, N = 586). All these effects were maintained or improved upon at follow-up. Limited moderators of CAT treatment effect were identified. CAT demonstrated small-moderate, significant post-treatment benefits compared to comparators in nine clinical trials (ES = 0.36-0.53; N = 352). The average dropout rate for CAT was 16% (range 0-33%).


CONCLUSIONS
Patients with a range of presenting problems appear to experience durable improvements in their difficulties after undergoing CAT. Recommendations are provided to guide the further progression of the CAT outcome evidence base.


PRACTITIONER POINTS
Large pre-post reductions in global functioning and depression outcomes and moderate-large reductions in interpersonal problems are evident after CAT. The effects of CAT appear durable, and interpersonal functioning significantly improves over follow-up time. CAT produces small-moderate benefits compared to trial comparators. CAT appears to be an engaging psychotherapy that maintains patients in treatment.

conducted meta-analysis of pre-post and follow-up CAT outcomes, moderators of ES observed, and produce an initial evaluation of the efficacy of CAT. The study also sought to differentiate the effect of CAT on different clinical outcomes: global functioning, interpersonal problems, and depression. Due to limitations of uncontrolled pre-post ES, conclusions about the effectiveness of CAT throughout this review in the PBE studies should be considered in the context that patients undergoing CAT have experienced improvements in their difficulties, but the improvements cannot be specifically attributed to CAT. As such, a secondary aim was to provide an initial estimate of between-groups treatment effects on overall outcomes when CAT is compared to any comparator from the available controlled trials. Finally, CAT has been presented previously as an engaging psychotherapy that creates low dropout rates across diagnoses, and so this study sought to assess the average dropout rate for CAT.

Method
The review was preregistered on PROSPERO (CRD42018086009) and is reported according to PRISMA guidelines (Moher et al., 2015).

Study selection
Electronic searches of four databases (PsycINFO, MEDLINE, CINAHL, and Web of Science) were conducted using the search term 'cognitive analytic' (allowing for wildcard variations modified for each database), replicating the previous CAT review. No search limiters were applied, with the searches identifying studies of CAT treatment outcome published up until 2019 (final search was performed on 23 August 2019). Reference lists of identified studies and the list of published CAT studies on the Association of Cognitive Analytic Therapy (ACAT) website were cross-referenced to identify any additional studies not identified by the database search. Finally, key authors were emailed to enquire about further studies of CAT and authors' publication lists were checked. After duplicates were removed, the primary reviewer screened titles and abstracts and reviewed the identified full texts against the inclusion and exclusion criteria. An independent reviewer (trainee clinical psychologist) screened 25% of the full-text articles and checked 25% of the included studies to ensure they met eligibility criteria, with discrepancies resolved through discussion (rater agreement = 81%). Overall, consensus on study eligibility queries was achieved via discussion amongst all members of the review team.

Eligibility criteria
Articles were eligible for inclusion if they were a treatment outcome study that reported pre- and post-treatment scores on a validated outcome measure (i.e., means and standard deviations [SDs]) for adults (mean sample age 16+) treated with CAT for any psychiatric diagnosis or problem related to psychological factors. Studies were included if they used an RCT, a non-randomised controlled trial, an uncontrolled (pre-post) design, or a case-series design. Both group and individual delivery of CAT were eligible, with no restriction on treatment setting. Studies that combined CAT with another treatment modality (e.g., CAT combined with CBT) were excluded, as were the CAT consultancy studies and all the single-case experimental design studies. Included studies were required to be available in English and to be published or have undergone peer review, to ensure a minimum standard of study quality and replicable search procedures. Unpublished dissertations and conference papers were excluded. For inclusion in the primary quantitative synthesis, pre-post treatment outcomes were required to have been assessed using a validated measure of global functioning, interpersonal difficulties, and/or depression. A measure was considered validated if adequate psychometric properties had been demonstrated in a published study. Studies were excluded if they did not provide sufficient information to calculate ES (i.e., means and SDs). For inclusion in the secondary between-groups synthesis, studies were required to have used an RCT design and assessed post-treatment between-groups outcomes on one of the specified outcome categories or any other psychiatric outcome measure for CAT versus any comparator condition.

Primary analyses
The three outcomes of interest were self-report measures of global functioning (e.g., Symptom Checklist-90-Revised [SCL-90-R], Clinical Outcomes in Routine Evaluation-Outcome Measure [CORE-OM], Brief Symptom Inventory [BSI]), interpersonal difficulties (e.g., Inventory of Interpersonal Problems [IIP], Persons Related to Others Questionnaire-2 [PROQ-2]), and depression (e.g., Beck Depression Inventory [BDI/BDI-II], Hospital Anxiety and Depression Scale [HADS-D], Patient Health Questionnaire-9 [PHQ-9]). Originally, additional outcomes relating to anxiety measures and remission rates were included in the protocol; however, insufficient eligible studies assessed these outcomes, so the analyses were not performed. To ensure each study contributed only one ES per outcome, the most frequently used measure was selected when multiple measures of one outcome were reported. Dropout rates were also evaluated as a proxy for CAT treatment acceptability. Dropout was determined as the percentage of patients who did not complete treatment, according to the definition of completion used in the original study.

Secondary analyses
Due to limited comparability between types of outcomes assessed in the eligible RCTs, between-groups comparisons assessed overall treatment outcome as measured by any of the outcome measures from the primary analysis or a disorder-specific outcome measure (if one of the primary outcome measures was not assessed) rather than across each outcome separately (as specified in the original protocol). ESs from individual studies that assessed more than one of the three primary outcomes were first aggregated in a mini meta-analysis to capture the aggregated treatment outcome and ensure only one ES per study was included.

Quality assessment
The Downs and Black (1998) 27-item quality checklist was used to assess methodological quality as it incorporates randomized and non-randomized study designs. This tool assesses five aspects of methodological quality: reporting (10 items); internal validity, confounding (6 items); internal validity, bias (7 items); external validity (3 items); and power (1 item). Due to reported difficulties in applying the item relating to power, this item was modified to a yes/no response to indicate whether the study had adequate power (0.8) to detect a moderate pre-post ES (0.6; based on the lower bound of the confidence interval of the previous CAT meta-analysed aggregated ES) at α < .05 (O'Connor et al., 2015). Studies were therefore assessed on a scale of 0-28 (higher scores indicated higher quality) and categorized according to the following quality thresholds: poor (<14), fair (14-18), good (19-23), and excellent (24-28). The primary reviewer assessed all included studies, with 30% double-rated by an independent rater (trainee clinical psychologist). Inter-rater agreement was assessed with Cohen's kappa statistic (κ), interpreted as .21-.40 indicating fair agreement, .41-.60 as moderate agreement, .61-.80 as substantial agreement, and .81-1.0 as almost perfect agreement (Cohen, 1960; Landis & Koch, 1977). There was substantial agreement between the raters (κ = 0.77). Any discrepancies between raters were discussed and a consensus reached to produce an agreed quality score for each study.
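To make the agreement statistic concrete, the following is a minimal sketch of Cohen's kappa for two raters who each assign one categorical rating per item. The function name and the example ratings in the usage note are hypothetical illustrations, not data or code from the review.

```python
from collections import Counter


def cohens_kappa(rater_a, rater_b):
    """Cohen's kappa for two raters' categorical ratings of the same items."""
    if len(rater_a) != len(rater_b) or not rater_a:
        raise ValueError("Both raters must rate the same non-empty set of items")
    n = len(rater_a)
    # Observed agreement: proportion of items given the same rating.
    p_observed = sum(a == b for a, b in zip(rater_a, rater_b)) / n
    # Chance agreement: product of the raters' marginal proportions,
    # summed over categories.
    counts_a, counts_b = Counter(rater_a), Counter(rater_b)
    p_chance = sum(counts_a[c] * counts_b[c] for c in counts_a) / (n * n)
    if p_chance == 1.0:
        # Both raters used a single identical category; kappa is undefined.
        return float("nan")
    return (p_observed - p_chance) / (1.0 - p_chance)
```

For example, with hypothetical yes/no item ratings `[1, 1, 0, 0]` and `[1, 0, 0, 0]`, observed agreement is 0.75 and chance agreement is 0.50, so kappa is 0.50, which would fall in the "moderate" band cited above.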

Data extraction
An a priori extraction tool was designed to code data on the following criteria: (1) methodological characteristics (study design, study quality, assessment of therapist's competence using CCAT [yes/no], mean CCAT score, and indication of therapist competence [CCAT > 20 competent CAT/CCAT < 20 incompetent CAT]), (2) intervention characteristics (number of sessions, format [group/individual], treatment setting, and therapist qualification [CAT qualified/trainee]), (3) patient characteristics (age, gender [% male], and specified difficulties), and (4) outcomes (pre- and post-treatment means, SDs, and dropout rates). Data on whether studies assessed adverse effects (hospital admissions, reliable deterioration rates) during CAT were also extracted. Where data were not reported, the information was requested from authors by email.

Effect sizes
Pre-post effect sizes
For the primary analyses, pre-post ES was calculated for global functioning, interpersonal difficulties, and depression outcomes. ES was calculated by subtracting the mean post-treatment score from the mean pre-treatment score and dividing by the pre-treatment standard deviation (SD). To account for the violation of independence, calculations for pre-post ES require an estimate of the correlation between pre- and post-scores (r). Where pre-post correlations were reported, the actual value of r was used to calculate individual study ES. In the absence of a reported pre-post correlation, an imputed value of 0.6 was used based on the median within-group correlation extracted from 811 measures of pre-post clinical trial arms (Balk et al., 2012), and sensitivity analyses were conducted to evaluate the effect of different imputed r values (see moderator and sensitivity analyses section). ES was converted to Hedges' g using the J-correction to account for small study sample biases (Hedges & Olkin, 1985). If a study had used an RCT or controlled trial design, pre-post ES was calculated for the CAT treatment arm only. Positive ES indicated symptom reductions after treatment (negative ES indicated symptom deterioration) and was interpreted as 0.2 indicating a small effect, 0.5 indicating a moderate effect, and 0.8 indicating a large effect size (Cohen, 1992).
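The calculation described above can be sketched as follows. The point estimate and the J-correction follow the text; the sampling-variance formula is an assumption (a commonly used large-sample approximation for a pre-post standardized mean change scaled by the pre-treatment SD), since the text does not state which variance formula was applied. The function name and the input values in the usage note are hypothetical.

```python
def prepost_hedges_g(mean_pre, mean_post, sd_pre, n, r=0.6):
    """Pre-post standardized mean change with small-sample (J) correction.

    Returns (g, var_g). Positive g means scores fell after treatment,
    i.e. symptom reduction on a measure where higher scores are worse.
    """
    if n < 3 or sd_pre <= 0:
        raise ValueError("Need n >= 3 and a positive pre-treatment SD")
    # Mean change standardized by the pre-treatment SD.
    d = (mean_pre - mean_post) / sd_pre
    # Hedges' small-sample correction factor with df = n - 1.
    j = 1.0 - 3.0 / (4.0 * (n - 1) - 1.0)
    # ASSUMPTION: approximate variance that uses the pre-post correlation r;
    # r defaults to the imputed value of 0.6 described in the text.
    var_d = 2.0 * (1.0 - r) / n + d ** 2 / (2.0 * n)
    return j * d, j ** 2 * var_d
```

With hypothetical inputs of a pre-treatment mean of 20, a post-treatment mean of 14, a pre-treatment SD of 6, and n = 20, the uncorrected effect is 1.00 and the corrected g is 0.96.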

Between-groups effect sizes
For the secondary analyses, pre-/post-control group effect sizes were calculated for included studies that had used an RCT design to calculate the effect of CAT versus comparator groups. ES was calculated by subtracting the mean pre-post change of the comparison group from the mean pre-post change in the CAT group and dividing by the pooled pre-treatment SD to account for pre-treatment group differences (Morris, 2008). Where there was a follow-up assessment, post-treatment to follow-up control group ES was calculated using the same procedure. All ESs were adjusted using the small study J-correction (Hedges & Olkin, 1985). Positive ES indicated a treatment effect in favour of CAT.
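A minimal sketch of the between-groups calculation described above is given below. It assumes that higher scores indicate more severe symptoms, so change is computed as pre minus post and a positive result favours CAT. The function and parameter names are hypothetical, and the degrees of freedom used in the J-correction (n_cat + n_cmp - 2) are an assumption about how the correction was applied.

```python
import math


def between_groups_g(pre_cat, post_cat, sd_pre_cat, n_cat,
                     pre_cmp, post_cmp, sd_pre_cmp, n_cmp):
    """Pre-post-control effect size using the pooled pre-treatment SD."""
    df = n_cat + n_cmp - 2
    if df < 2:
        raise ValueError("Combined sample is too small")
    # Pooled pre-treatment SD across the CAT and comparator arms.
    sd_pooled = math.sqrt(
        ((n_cat - 1) * sd_pre_cat ** 2 + (n_cmp - 1) * sd_pre_cmp ** 2) / df
    )
    # Symptom reduction in each arm (pre minus post).
    change_cat = pre_cat - post_cat
    change_cmp = pre_cmp - post_cmp
    d = (change_cat - change_cmp) / sd_pooled
    # Small-sample correction (ASSUMPTION: df = n_cat + n_cmp - 2).
    j = 1.0 - 3.0 / (4.0 * df - 1.0)
    return j * d
```

For instance, if both hypothetical arms start at a mean of 20 (SD 5, n = 21 each) and the CAT arm falls to 12 while the comparator falls to 16, the uncorrected effect is (8 - 4) / 5 = 0.80 before the J-correction shrinks it slightly.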

Data synthesis
Effect sizes were synthesized via a random-effects meta-analysis with inverse variance weighting (based on the DerSimonian & Laird, 1986 estimator) using Meta-Essentials workbooks (Suurmond, van Rhee, & Hak, 2017). Overall pooled treatment estimates, 95% confidence intervals (CIs), and prediction intervals (PIs) were produced. Between-study heterogeneity was assessed using the I² statistic to indicate the percentage of variation attributable to heterogeneity and the accompanying Q statistic to assess significance. Study heterogeneity was grouped into low (25%), moderate (50%), and high (75%) thresholds (Higgins, Thompson, Deeks, & Altman, 2003).
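The pooling step can be illustrated with a minimal sketch of the DerSimonian and Laird estimator, together with the Q and I² heterogeneity statistics. This is not the Meta-Essentials implementation: the function name is hypothetical, the confidence interval uses a normal approximation (z = 1.96) as a simplifying assumption, and prediction intervals are omitted.

```python
import math


def random_effects_dl(effects, variances):
    """Random-effects pooling with the DerSimonian-Laird tau^2 estimator."""
    k = len(effects)
    if k < 2 or k != len(variances):
        raise ValueError("Need at least two studies with matching variances")
    # Fixed-effect (inverse-variance) weights and pooled estimate.
    w = [1.0 / v for v in variances]
    sum_w = sum(w)
    fixed = sum(wi * yi for wi, yi in zip(w, effects)) / sum_w
    # Cochran's Q: weighted squared deviations from the fixed-effect estimate.
    q = sum(wi * (yi - fixed) ** 2 for wi, yi in zip(w, effects))
    df = k - 1
    # Method-of-moments between-study variance, truncated at zero.
    c = sum_w - sum(wi ** 2 for wi in w) / sum_w
    tau2 = max(0.0, (q - df) / c)
    # I^2: percentage of total variation attributed to heterogeneity.
    i2 = max(0.0, (q - df) / q) * 100.0 if q > 0 else 0.0
    # Random-effects weights add tau^2 to each study's variance.
    w_re = [1.0 / (v + tau2) for v in variances]
    pooled = sum(wi * yi for wi, yi in zip(w_re, effects)) / sum(w_re)
    se = math.sqrt(1.0 / sum(w_re))
    return {
        "pooled": pooled,
        "se": se,
        "ci95": (pooled - 1.96 * se, pooled + 1.96 * se),  # ASSUMPTION: z-based
        "tau2": tau2,
        "q": q,
        "i2": i2,
    }
```

Calling `random_effects_dl` with one effect size and one sampling variance per study returns the pooled estimate alongside τ², Q, and I², which can then be compared against the 25%, 50%, and 75% thresholds described above.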

Moderator and sensitivity analyses
In the primary analyses, the anticipated between-study heterogeneity was explored using pre-specified subgroup and meta-regression analyses. Subgroup analysis was used to explore four categorical variables: study type [PBE/RCT], therapist experience [CAT qualified/not qualified], format [1:1/group], and diagnosis [personality disorder/other]. Due to the range of presentations treated, diagnosis classifications were collapsed into two groups: (1) cases (or over half of included cases) presenting with a personality disorder and (2) all other diagnoses. Meta-regression was used to explore four continuous variables: study quality, gender [% male], mean age, and treatment duration [number of sessions]. To account for multiple testing, the alpha threshold for significance was adjusted to p < .0125 (α = .05/4) for both between-subgroup differences and meta-regression beta-coefficients. A minimum of 10 studies was required to perform moderator analyses (Cochrane Collaboration, 2011).
Due to typically low rates of reporting for pre-post r values, sensitivity analyses were employed to investigate the effect of imputing different values for r in ES calculations on the overall estimated CAT treatment effect (Borenstein, 2009). Four separate overall CAT treatment effects were aggregated using study ES calculated with the value of r imputed as 0.0, 0.25, 0.5, and 0.75, respectively (Balk et al., 2012). Where possible, ES for studies that reported a true value for r was also aggregated and compared to the aggregated imputed ES (using r = 0.6 from the main analyses) for the same studies to assess the potential bias that might be present in the findings in the absence of knowing the true r values.
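The logic of this sensitivity analysis can be sketched for a single study as below. The variance formula is an assumption (a common large-sample approximation for a pre-post effect standardized by the pre-treatment SD); under it, r changes only the precision of each study, and hence its weight in the pooled estimate, whereas other formulations let r enter the point estimate itself. The function names and default r values are illustrative.

```python
import math


def prepost_variance(d, n, r):
    """ASSUMED approximate sampling variance of a pre-post effect size."""
    return 2.0 * (1.0 - r) / n + d ** 2 / (2.0 * n)


def sensitivity_to_r(d, n, r_values=(0.0, 0.25, 0.5, 0.75)):
    """Return {r: (lower, upper)} 95% CIs for one study under each imputed r."""
    intervals = {}
    for r in r_values:
        se = math.sqrt(prepost_variance(d, n, r))
        # Normal-approximation interval; higher r gives a narrower interval.
        intervals[r] = (d - 1.96 * se, d + 1.96 * se)
    return intervals
```

Repeating the pooling step with each set of variances would then show how far the aggregated estimate moves across the imputed values of r.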

Publication bias
Publication bias was assessed via three approaches: (1) Funnel plots of study ES plotted against standard errors (SE) were visually inspected for asymmetry (indication of reporting biases in the included studies); (2) Egger's regression was used to statistically test for the presence of publication bias (Egger, Smith, Schneider, & Minder, 1997); and (3) 'Trim and Fill' imputation was conducted to estimate treatment effects adjusted for publication bias (Duval & Tweedie, 2000).
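As an illustration of the second approach, the following is a minimal sketch of Egger's regression test: each study's standardized effect (ES/SE) is regressed on its precision (1/SE), and an intercept that differs from zero suggests funnel plot asymmetry. The function name is hypothetical, and the sketch stops at the t statistic because converting it to a p-value requires a t distribution, which the Python standard library does not provide.

```python
import math


def eggers_regression(effects, standard_errors):
    """Egger's test: OLS of ES/SE on 1/SE. Returns (intercept, slope, t, df)."""
    k = len(effects)
    if k < 3 or k != len(standard_errors):
        raise ValueError("Need at least three studies with matching SEs")
    x = [1.0 / se for se in standard_errors]                     # precision
    y = [es / se for es, se in zip(effects, standard_errors)]    # standardized effect
    x_bar, y_bar = sum(x) / k, sum(y) / k
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    if sxx == 0:
        raise ValueError("Standard errors must not all be identical")
    sxy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    slope = sxy / sxx
    intercept = y_bar - slope * x_bar
    # Residual variance and the standard error of the intercept.
    df = k - 2
    sse = sum((yi - intercept - slope * xi) ** 2 for xi, yi in zip(x, y))
    se_intercept = math.sqrt(sse / df * (1.0 / k + x_bar ** 2 / sxx))
    t_value = intercept / se_intercept if se_intercept > 0 else math.nan
    return intercept, slope, t_value, df
```

The degrees of freedom returned are k - 2, so a test on 22 studies would be reported as t(20).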

Study selection
The search strategy produced a combined total of 28 CAT outcome articles (Figure 1). After the removal of duplicates, a total of 576 articles were screened with 100 identified for full-text review. Thirty articles were initially deemed eligible for inclusion, although two studies were subsequently excluded, as they were reports on the same data of another included study. This left k = 28 articles for qualitative synthesis, with k = 25 studies eligible for the primary pre-post quantitative synthesis and k = 9 RCTs for the secondary between-groups quantitative synthesis.

Study characteristics
Characteristics of the included studies are presented in Table 1. Of the 28 included studies, k = 10 (36%) were RCTs and the remaining k = 18 (64%) were PBE studies (pre-post design k = 10; case series k = 8). Mean study quality was 12.68 (SD = 4.50; maximum score 28), ranging between 5 and 21 (see supplementary materials for full study quality ratings). Overall study quality was poor, with only 11 studies classified as fair and three as good. The RCTs of CAT generated higher ratings of methodological quality (mean = 16.6, SD = 3.75) than the PBE studies (mean = 10.22, SD = 3.44). Study quality by sub-domain highlighted that studies scored highest in terms of methodological reporting (with the exception of reporting adverse events). Study quality was lowest for internal validity related to confounding in selection biases, particularly in terms of lack of randomization and lack of adequate adjustment for confounding in the analyses. In terms of power, fewer than half (k = 12) of the studies were sufficiently powered to detect a moderate pre-post effect.
All studies bar one were conducted in public health services. CAT was typically delivered individually (k = 26; 93%) or occasionally in groups (k = 2; 7%). Treatment duration ranged from 5 to 30 sessions. CAT was employed in the standard eight-session (k = 2; 7%), 16-session (k = 10; 35%), or 24-session (k = 8; 28%) versions, with k = 5 (18%) studies using a combination of treatment versions. Six-session (k = 1), 12-session (k = 1), and 20-session (k = 1) versions were used in the remaining studies (k = 1 study did not report treatment duration; total studies exceed number of included studies as one study included separate 16- and 24-session CAT groups). In k = 13 studies, therapists were qualified, with the remaining k = 15 studies comprising either trainee CAT therapists or therapists without accreditation as CAT practitioners. Just k = 5 studies (18%) reported competency ratings, with all reporting mean Competence in Cognitive Analytic Therapy (CCAT) scores exceeding the competence cut-off (>20; Bennett & Parry, 2004).

Primary meta-analysis of pre-post CAT treatment effects
Meta-analytic comparisons were conducted on pre-post treatment and from post-treatment to follow-up global functioning, interpersonal problems, and depression outcomes. Three studies [26-28 in Table 1] were not eligible for inclusion due to outcomes being assessed with measures that were not relevant to the specific outcome classifications (i.e., outcomes were assessed with disorder-specific measures; anorexia nervosa (k = 2) and OCD (k = 1) presentations). The remaining k = 25 studies evaluated CAT based on outcomes of global functioning, interpersonal problems, and/or depression symptoms. The Clinical Outcomes in Routine Evaluation-Outcome Measure (CORE-OM) was the most commonly used measure of global functioning (k = 8), followed by the Symptom Checklist-90-Revised (SCL-90-R; k = 6). All but k = 2 studies that assessed interpersonal difficulties used the Inventory of Interpersonal Problems (IIP; k = 9). The most common depression measure was the Beck Depression Inventory (BDI-I or II; k = 9).

Figure 1. PRISMA flow chart of study selection. (Final boxes: articles excluded from the between-groups synthesis, with reasons, n = 19 [not an RCT design, n = 18; comparator a CAT deconstruction, n = 1]; eligible studies included in the secondary between-groups quantitative synthesis, n = 9.)

Publication bias. Visual inspection of the funnel plot in Figure 3a and statistical testing using Egger's regression did not indicate substantial asymmetry in study distribution for reporting of pre-post global functioning outcomes (B = 0.62, t(20) = 0.58, p = .567). Trim and Fill imputation accounted for two smaller missing studies with minimal to small CAT treatment effects, producing a slightly reduced overall pooled treatment estimate (ES = 0.81; 95% CI 0.65-0.96) that was still representative of a large effect. Taken together, the analyses suggest minimal impact of reporting bias on the pooled treatment estimate.

Effect of CAT on interpersonal difficulties
Publication bias. Visual inspection of the pre- to post-treatment funnel plot of interpersonal difficulty ES (see Figure 3b) suggested some slight asymmetry in the distribution of studies; however, statistical testing with Egger's regression did not detect significant reporting bias (B = 0.28, t(11) = 1.18, p = .265). Trim and Fill imputed data for one missing smaller study with a small deterioration effect after CAT, resulting in minimal change in the CAT ES estimate (ES = 0.73, 95% CI 0.53-0.92).
Publication bias. Visual inspection of the pre- to post-treatment funnel plot of depression ES (see Figure 3c) suggested there was some asymmetry in the distribution of studies, indicating larger effects of CAT on depression outcomes were more likely to be reported by small N studies. Egger's regression did not detect a statistically significant influence of reporting bias (B = 0.72, t(13) = 0.60, p = .561). Trim and Fill imputed three studies with minimal to small CAT treatment effects, producing a reduced effect estimate of 0.88 (95% CI 0.62-1.14) and suggesting some impact of reporting bias, albeit still representing a large effect.

Moderator analyses
Meta-regressions (Table 2) and subgroup analyses (Table 3) investigating moderators of CAT treatment effects explored the significant between-study heterogeneity identified in the pre-post treatment comparisons for each outcome. Variations in treatment effects for global functioning, interpersonal problems, and depression symptoms were not explained by differences in participant age or gender. Initial analyses suggested smaller ESs were associated with longer CAT treatment for global functioning and higher study quality for interpersonal problems; however, after accounting for multiple testing both effects were no longer significant at the Bonferroni-adjusted p-value (< .0125). Analysis of ESs for categorical subgroups found no significant differences based on the presence of PD cases versus other presentations, or whether the study was practice-based or an RCT. Effects for global functioning and depression symptoms were not associated with variations in ESs for different formats of CAT, or when therapy was delivered by a qualified CAT therapist. However, larger ESs for interpersonal difficulties were associated with individual treatment delivered by CAT qualified therapists, explaining 53% and 20% of the observed variance, respectively (although after adjustment, the effect of therapist qualification fell just short of the Bonferroni-corrected significance threshold). Moderate-to-large heterogeneity was still present in over half the subgroups.

Sensitivity analyses
Sensitivity analyses explored the impact of imputed pre-post correlation values (r) on overall CAT treatment estimates and are presented in Table 4. For each outcome, the magnitude of the CAT treatment effect with different r imputations ranged between moderate and large effects in favour of a significant improvement following CAT. This suggests imputation of missing values had a minimal to low impact on the overall conclusions that could be drawn. Depression outcome had a wider range of potential effect magnitude (0.60-1.30) than global functioning (0.72-0.97) and interpersonal difficulties (0.57-0.87), thus indicating interpretations may be slightly less reliable for depression outcomes. In addition, eight studies reported r for global functioning scores, enabling a comparison of pooled treatment estimates using the true value or an imputed value (0.6) of r. The pooled treatment estimate was 0.94 (95% CI 0.54-1.33; Z = 5.65; p < .001) when using an imputed estimate of r (0.6) and 0.98 (95% CI 0.65-1.30; Z = 7.05; p < .001) when using the true study value of r.

Secondary meta-analysis of RCT between-groups treatment effects
Meta-analytic comparisons were conducted to aggregate the effect of CAT compared to a comparator condition in RCTs at post-treatment, and where available, at follow-up. One RCT could not be included as CAT was compared to a dismantled version of CAT [19]. In the remaining nine RCTs, CAT was compared to another intervention in k = 3 studies, treatment as usual (TAU) in k = 5 studies, and no treatment in k = 1 study (see supplementary materials for between-group comparator characteristics). Outcomes assessed were disorder-specific outcomes (k = 3 [Anorexia Nervosa k = 2; OCD k = 1]), interpersonal problems (k = 2), global functioning (k = 1), or a combination of global functioning, interpersonal problems, and/or depression outcomes (k = 3). Dropout rates for CAT during clinical trials ranged between 0 and 38% (mean = 23%) compared to 6-44% (mean = 26%) for comparators.

Discussion
CAT is defined through its relational and collaborative approach, in working with the past, using enactments within the therapeutic relationship and associated analysis of habitual relationship patterns, via analysis of reciprocal role dynamics and intra- and interpersonal procedures (Ryle & Kellett, 2018). The present study has built on previous reviews of the effectiveness of CAT (Ryle et al., 2014) to provide a contemporary quantitative synthesis of the state of the CAT treatment outcome evidence base. This was achieved by specifying CAT pre-post treatment effects for global functioning, interpersonal difficulties, and depression, and also quantifying the longer-term effects of CAT for the first time. The uncontrolled ES found was comparable to the pre-post ES found in the previous quantitative review (d = 0.83; Ryle et al., 2014). Moderators of CAT treatment effects were explored, alongside consideration of the impact of largely PBE studies on the confidence in overall treatment estimates. An aggregated controlled CAT effect size has also been estimated to provide a preliminary indication of CAT effectiveness, in the context of the acknowledged limitations of uncontrolled pre-post treatment effects. CAT continues to be used to treat a variety of typically severe and complex psychological disorders; 12 separate diagnoses were present in the current review, with evaluations of CAT for disorders including psychosis, morbid jealousy, and chronic pain recently completed. However, fewer than 25% of studies assessed adverse event rates during CAT, and so this needs to become a more widely assessed safety outcome.
The findings overall show patients who undergo CAT experience improvements across a range of clinical difficulties, seeing moderate-to-large pre-post reductions in global symptoms (g = 0.86), interpersonal difficulties (g = 0.74), and depression (g = 1.05) and small-moderate beneficial effects compared to comparators (g = 0.36-0.53). Although the pre-post improvements cannot be specifically attributed to CAT, the superior between-groups outcomes suggest CAT is effective. Evidence of outcome reductions was maintained at medium-/long-term follow-up in functioning and depression, and it is noteworthy that interpersonal difficulties significantly reduced over follow-up time. Dropout was low (15%), suggesting that CAT is acceptable to patients.
Overall, 28 CAT treatment studies were synthesized, with 25 studies included in a synthesis of pre-post outcomes and nine studies in an RCT between-groups synthesis. This encompassed an additional N = 12 studies (including four RCTs) that had been published since the CAT reviews in 2014 called for urgent development of the evidence base, and in particular the conduct of RCTs. The increase in the number of RCTs (i.e., 36% of studies were RCTs) indicates that the CAT evidence base is developing. However, there is still a tendency towards favouring PBE-style evaluations, typically in complex clinical populations. It is encouraging that CAT has developed translations of the model in terms of group delivery (Calvert, Kellett, & Hagan, 2015), a consultancy version for patients in secondary care that community teams struggle to engage (Kellett et al., 2019c), an 8-session version suitable for step 3 delivery in Improving Access to Psychological Therapies (IAPT) services, forensic CAT (Kellett et al., 2019d), and a psychoeducational version for step 2 delivery in IAPT services (Meadows & Kellett, 2017).
Assessment of the specific impact of CAT on interpersonal difficulties was particularly indicated, as CAT presents itself as a relationally informed psychotherapy, capable of conceptualizing and changing interpersonal processes (Ryle & Kellett, 2018). In terms of pre-post change, large improvements in global functioning and depression and a moderate improvement in interpersonal difficulties were observed following CAT. The follow-up meta-analysis revealed little further change in functioning and depression (i.e., neither significant improvements nor deterioration, indicating that treatment gains were maintained), but significant improvements in interpersonal distress occurred over the follow-up period. This suggests that the relational nature of CAT (Ryle & Kellett, 2018) is impacting on interpersonal dynamics usefully over time. The results concerning interpersonal change therefore provide support for the underlying theory and approach of CAT.
The high presence of heterogeneity between studies warranted exploration through meta-regression and subgroup analyses; however, limited moderators of CAT treatment effect were identified. While it is important to remember moderator analysis cannot be used to infer causality, it provides insight into potential treatment effects (Cochrane Collaboration, 2011). ESs across all outcomes did not differ according to participant age, gender, diagnosis, or for the outcomes in RCT versus PBE studies. However, it is likely these analyses were hampered by small sample sizes and the limited number of studies within subgroup analyses creating a lack of power to detect small effects (Cochrane Collaboration, 2011). All outcomes showed a trend of larger effects for individually delivered CAT compared to group formats; however, only interpersonal outcomes had multiple studies of group CAT to enable subgroup comparisons. One-to-one CAT produced larger effects, whereas group CAT outcomes were significantly more modest. This may be a reflection of the relative lack of theoretical work concerning group delivery of the model, as only two group CAT studies were available and were in complex patient groups (trauma/forensic) that could be confounding the effect. Smaller global functioning effects were associated with longer treatment contracts, which is possibly an artefact of more complex clinical presentations being offered the 24-session version of the model in routine services (Marriott & Kellett, 2009).

Limitations
The findings need to be considered alongside methodological limitations. Many of the limitations of the current review reflect the limitations of the evidence base, rather than the methods used in this study. CAT is clearly frequently used in the treatment of highly varied patient groups, and this limited the specificity of the review, particularly through the lack of bodies of disorder-specific CAT outcome research. CAT has generated most evidence around the treatment of borderline personality disorder. Restricting the search to peer-reviewed studies may have overlooked eligible studies in the grey literature. The review would have been improved through more consistent efforts to access the grey literature. The review was limited by the wide variety of measures used in the CAT outcome literature, so the creation of outcome classification clusters could be challenged and may have introduced bias (Puhan, Soesilo, Guyatt, & Schünemann, 2006). There was a lack of studies comparing CAT to active treatments, and so the results cannot be used to comment reliably on the comparative efficacy of CAT (Bucher, Guyatt, Griffith, & Walter, 1997).
The number of included studies within outcomes and the within-study sample sizes were small, which can inflate ESs and provide inaccurate evaluations of between-study heterogeneity (Inthout, Ioannidis, Borm, & Goeman, 2015). In addition, subsequent moderator analyses were subject to low power and insufficient subgroups to be able to reliably detect variation in effects (Guolo & Varin, 2017). For example, the number of studies of adults within specific diagnoses was low, especially when compared to similar meta-analyses for other treatment modalities (Tolin et al., 2015). The methodological quality of included studies was generally sub-optimal, and there were indications that for some outcomes, lower quality studies might have produced larger effects. Poor study quality is commonly criticized for contributing inflated ESs (Altman, 1994). Few studies employed the CCAT (16%), so the certainty with which CAT was actually being delivered is questionable. Assessing dropout by calculating an average is problematic, as precision and variance in study estimates are not accounted for. The number of studies assessing serious adverse events was low (23%).
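The limitation of a simple average of dropout rates can be illustrated with a short sketch that contrasts it with an inverse-variance pooled proportion on the logit scale. This is an illustrative common-effect calculation under stated assumptions, not the method used in this review. The function names, the 0.5 continuity correction, and the counts are all hypothetical.

```python
import math


def simple_average(events, totals):
    """Unweighted mean of per-study dropout proportions (ignores precision)."""
    return sum(x / n for x, n in zip(events, totals)) / len(events)


def pooled_dropout(events, totals):
    """Inverse-variance pooled dropout proportion on the logit scale.

    Returns the pooled proportion and an approximate 95% CI. Studies with
    zero (or complete) dropout receive a 0.5 continuity correction, which
    is an assumption made here so that the logit is defined.
    """
    logits, variances = [], []
    for x, n in zip(events, totals):
        if x == 0 or x == n:
            x, n = x + 0.5, n + 1.0
        p = x / n
        logits.append(math.log(p / (1.0 - p)))
        # Large-sample variance of a logit-transformed proportion
        variances.append(1.0 / x + 1.0 / (n - x))

    weights = [1.0 / v for v in variances]
    pooled_logit = sum(w * l for w, l in zip(weights, logits)) / sum(weights)
    se = math.sqrt(1.0 / sum(weights))

    def inv_logit(z):
        return 1.0 / (1.0 + math.exp(-z))

    ci = (inv_logit(pooled_logit - 1.96 * se), inv_logit(pooled_logit + 1.96 * se))
    return inv_logit(pooled_logit), ci


if __name__ == "__main__":
    # Hypothetical dropout counts and sample sizes for three studies
    dropouts = [0, 5, 20]
    samples = [20, 40, 60]
    print("simple average:", simple_average(dropouts, samples))
    print("pooled (logit, inverse-variance):", pooled_dropout(dropouts, samples))
```

Unlike the simple average, the pooled estimate gives more weight to studies whose logit-transformed rates are estimated more precisely, and it yields a confidence interval. A random-effects version would additionally allow for between-study variation in dropout.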
Finally, the type of evidence available means that ESs may have been susceptible to uncontrolled error, thus biasing their interpretation. As the CAT evidence base has a strong reliance on PBE, pre-post rather than between-group ESs were used in the primary analysis. Critics argue such uncontrolled ESs should be avoided in meta-analyses, as they may index change caused by external factors (Cuijpers, Weitz, Cristea, & Twisk, 2017). To control for violation of independence in pre-post ESs alongside limited reporting of the correlation of pre-post scores, a fixed correlation value was used when data were missing. However, when this fixed value is considerably different from the true correlation, it has been shown to inflate ES estimates (Cuijpers et al., 2017). While attempts were made to manage these biases through the use of Hedges' g correction for small sample sizes, accounting for pre-post correlation and assessing the impact of different imputed values, this does raise questions as to whether these samples were suitable to be combined.
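The role of the imputed pre-post correlation can be sketched as follows. The example assumes one common formulation: the mean change is standardized by the pre-treatment SD, a small-sample correction is applied, and a large-sample approximation is used for the variance. The review does not report its exact formula, so the function `prepost_g`, the variance approximation, and the input values are assumptions for illustration only.

```python
def prepost_g(mean_pre, mean_post, sd_pre, n, r):
    """Bias-corrected pre-post standardized mean change and its variance.

    mean_pre, mean_post: sample means before and after treatment
    sd_pre: pre-treatment standard deviation (used as the standardizer)
    n: number of participants with paired scores
    r: pre-post correlation (observed, or imputed when unreported)
    """
    df = n - 1
    j = 1.0 - 3.0 / (4.0 * df - 1.0)  # small-sample correction factor
    g = j * (mean_pre - mean_post) / sd_pre  # positive = symptom reduction
    # Large-sample approximation; the variance shrinks as r increases
    var = 2.0 * (1.0 - r) / n + g ** 2 / (2.0 * n)
    return g, var


if __name__ == "__main__":
    # Hypothetical study: depression scores fall from 20 to 10, SD 10, n = 30
    for r in (0.3, 0.5, 0.7, 0.9):
        g, var = prepost_g(20.0, 10.0, 10.0, 30, r)
        print(f"imputed r = {r:.1f}: g = {g:.3f}, variance = {var:.4f}")
```

Under this particular formulation the point estimate does not depend on r, but its variance does. An imputed r that is too high therefore makes a study look more precise than it is and gives it too much weight when effects are pooled. Other formulations, such as those standardizing by the SD of change scores, also alter the point estimate.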

Research, clinical, and organizational implications
Researchers still need to strive to generate evidence, particularly using randomized and controlled methods, and produce CAT evidence for specific diagnoses. CAT needs to generate more evidence for its acceptability and effectiveness with common mental health problems. The evidence base consists mostly of one-to-one CAT delivery, and more evaluations of group CAT are indicated. It is essential that CAT outcome studies consistently report adverse event rates. Competency assessment should be the norm rather than the exception in PBE- and EBP-style studies. Evaluation of CAT is hampered by a lack of consensus on the types of outcome used to evaluate treatment, largely, it appears, because of the complexity and variety of disorders treated. Use of a generic measure of symptoms and functioning, such as the CORE-OM (Barkham, Gilbert, Connell, Marshall, & Twigg, 2005), would enable more robust and widespread comparisons across studies. The more frequent use of the Personality Structure Questionnaire (PSQ; Pollock, Broadbent, Clarke, Dorrian, & Ryle, 2001) is also indicated, as this is based on the CAT model and has recently been cross-culturally validated (Berrios, Kellett, Fiorani, & Poggioli, 2016).
Given the frequency of PBE-style CAT studies, it would be useful for future pre-post studies to report correlations between pre-post scores as standard. Short- and long-term follow-up needs to be routinely built into the design of any future CAT PBE or EBP studies to clarify the true durability of CAT treatment effects. Although the current study provides a basic commentary on dropout rates as a proxy for treatment acceptability, the evidence base would benefit from a meta-analysis of dropout from routine service delivery and clinical trials of CAT (see Imel, Laska, Jakupcak, & Simpson, 2013 for an example). Future studies need to enable increased patient choice, and so completion of patient preference trials is indicated and is underway (Kellett et al., 2019a). The present meta-analysis was based on nomothetic psychometric outcomes. However, as change in idiographic 'target problems' specific to the patient is at the core of evaluating change in the model (Ryle & Kerr, 2002), a meta-analysis of the large CAT single-case experimental design literature is also indicated.
Both CAT-accredited therapists and therapists without CAT accreditation produced statistically similar, large treatment effects for symptomatic and functioning changes. However, reductions in interpersonal difficulties when CAT was delivered by unaccredited therapists were moderate and were significantly lower than the large effects observed for accredited CAT therapists, although this difference fell just short of significance after adjusting for multiple testing. Given that the implementation of methods to help patients change their interpersonal roles and procedures is thought to play a crucial role in the benefits experienced (Ryle & Kellett, 2018), this implies that the interpersonal work of CAT needs to be supported through formal training and associated clinical supervision.
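A comparison can be significant on its own yet fall short once several moderator tests are adjusted for, as the following minimal sketch shows. It assumes the Holm step-down procedure, but the review does not state which correction was applied. The function name and the p-values are hypothetical and are not results from this study.

```python
def holm_adjust(p_values):
    """Return Holm step-down adjusted p-values in the original order."""
    m = len(p_values)
    # Indices of the p-values sorted from smallest to largest
    order = sorted(range(m), key=lambda i: p_values[i])
    adjusted = [0.0] * m
    running_max = 0.0
    for rank, idx in enumerate(order):
        # The smallest p is multiplied by m, the next by m - 1, and so on
        candidate = min(1.0, (m - rank) * p_values[idx])
        # Enforce monotonicity so adjusted values never decrease
        running_max = max(running_max, candidate)
        adjusted[idx] = running_max
    return adjusted


if __name__ == "__main__":
    # Hypothetical unadjusted p-values for three moderator comparisons
    raw = [0.02, 0.04, 0.30]
    for p, p_adj in zip(raw, holm_adjust(raw)):
        print(f"raw p = {p:.2f}, Holm-adjusted p = {p_adj:.2f}")
```

In this hypothetical case the first comparison is below .05 before adjustment but above it afterwards, which mirrors the pattern described for the accreditation comparison.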

Conclusions
The aims of the study were met, and a contemporary and methodologically improved meta-analysis of the CAT evidence base was produced. The findings add to a growing body of evidence suggesting relationally informed therapies can be beneficial in reducing psychological distress and interpersonal difficulties (Fonagy, 2015; Jakobsen, Hansen, Simonsen, & Gluud, 2011). The results of this review highlight an emerging, but still relatively small diagnostically specific CAT evidence base. Despite methodological limitations, the within-groups and preliminary between-groups findings taken together suggest CAT is useful across a range of clinical presentations. Patients particularly appear to benefit interpersonally, and the findings reported here do support the commissioning of CAT for disorders with a significant interpersonal element. In conclusion, these findings should provide sufficient impetus for a coordinated research strategy to move the CAT evidence base forward in a targeted and productive manner.

Supporting Information
The following supporting information may be found in the online edition of the article: Table S1. Rejected full-text studies and reason for rejection.