Effectiveness of stress control large group psychoeducation for anxiety and depression: Systematic review and meta-analysis

Objectives. This review sought to evaluate the effectiveness of the ‘Stress Control’ (SC) large psychoeducational 6-session group programme developed to increase access to treatment for patients with anxiety and depression. Design. Systematic review and meta-analysis (PROSPERO registration: CRD42020173676). Methods. Pre-post and post-treatment follow-up effect sizes were extracted and synthesized in a random effects meta-analysis, and variations in effect sizes were investigated via moderator analyses. Secondary analyses synthesized between-group effect sizes from controlled studies containing comparator treatments and calculated the average dropout rate. The quality of the meta-analysis was assessed using the GRADE approach. Results. Nineteen studies with pre-post treatment outcomes were included. The average group size was N = 39, and the average dropout rate was 34%. Pooled effect sizes indicated moderate pre-post treatment reductions in anxiety (ES = 0.58; CI 0.41 to 0.75; N = 5597; Z = 7.13; p < .001), moderate reductions in depression (ES = 0.62; CI 0.44 to 0.80; N = 5538; Z = 7.30; p < .001), and large reductions in global distress (ES = 0.86; CI 0.61 to 1.11; N = 591; Z = 7.41; p < .001). At follow-up, improvements in anxiety, depression, and global distress were maintained. When SC was compared to active and passive controls, outcomes were equivalent for anxiety (ES = 0.12, 95% CI −

Secondly, the intervention has evolved from being primarily concerned with the management of generalized anxiety (White, Keenan, & Brookes, 1992) to also include, in more recent versions, the management of depressive mood (Delgadillo et al., 2016). Therefore, SC is delivered for the commonly occurring and mild-to-moderate mental health problems that are a feature of community and primary care settings: the range of anxiety disorders, trauma, obsessive compulsive disorders, and depression (Sundquist, Ohlsson, Sundquist, & Kendler, 2017). Thirdly, SC assumes poor stress management is a common underlying process that serves to maintain various common mental health problems. Fourthly, the vast majority of the SC evidence base evaluates outcomes in routine practice and therefore uses practice-based methodologies that do not feature diagnostic interviewing, randomization, treatment adherence monitoring, strict inclusion/exclusion criteria, and independent assessment (Barkham, Hardy, & Mellor-Clark, 2010). There are therefore fewer examples of randomized and controlled methods of evaluation being applied (for a controlled SC example, see Kitchiner et al., 2009). As a result, considerable uncertainty still exists as to the diagnostic status of SC participants.
Nevertheless, SC is now a widely available treatment option for patients with mild-to-moderate anxiety and/or depression (Delgadillo et al., 2016), and therefore, an evaluation of the evidence base is indicated to inform future commissioning and the organization of service delivery. Accordingly, the present study sought to provide a contemporary quantitative synthesis of the current SC evidence base, by conducting a meta-analysis of treatment outcomes and by also disaggregating and evaluating the impact of SC on anxiety, depression, and global distress. The review sought to supplement the synthesized quantitative outcomes with the corresponding numbers needed to treat (NNT) information and to explore possible moderators of SC treatment effects. Secondary aims were to appraise the durability of SC effectiveness and the relative effectiveness of SC (i.e., compare outcomes against other interventions, including one-to-one psychotherapies) and to summarize the cross-study dropout rate.

Method
The systematic review and meta-analysis was registered on PROSPERO (CRD42020173676) prior to conducting formal literature searches and is reported in line with PRISMA guidelines (Moher et al., 2015).

Study selection
Firstly, a comprehensive and systematic electronic search was conducted to identify literature published after the original conceptualization of the SC programme (White, 1995). The searches were modified for each of six databases: Web of Science, Scopus, MEDLINE, PsycINFO, Google Scholar, and OpenGrey. Secondly, search terms (expanded using alternative synonyms, and both US and UK spelling) for (a) cognitive behavioural therapy, (b) large group psychoeducation, (c) stress control, (d) anxiety, and (e) depression were combined using a mixture of MeSH, title, abstract, keywords, and text word searches. Filters restricting results to treatment outcomes and human populations were applied. The final searches were run on 20/04/2020. Thirdly, reference lists from the identified articles were manually searched to identify any additional studies. Fourthly, to address the issue of potential publication bias, attempts were made to contact clinical services from across the United Kingdom known to have delivered SC in order to gain access to the grey literature.

Eligibility criteria
Articles were eligible for inclusion if they were a treatment outcome study that reported pre- and post-SC treatment scores (and follow-up scores if available), as means and standard deviations (SD), on a validated outcome measure for adults (16 years +). Studies were included if they used either a randomized controlled trial (RCT), a non-randomized controlled trial or an uncontrolled design, or were a service evaluation (i.e., grey literature). Only studies written in English were included. Unpublished dissertations and conference papers were included. Inclusion in the primary meta-analysis required that pre-post outcomes were assessed using a validated psychometric measure of anxiety, depression, or global distress. If sufficient statistical information to calculate effect sizes (ES) was not available, corresponding authors of potentially eligible studies were contacted via email and given 4 weeks to supply statistical information. For inclusion in the secondary meta-analysis investigating controlled studies, eligible studies included a comparator condition (i.e., passive or active control) and assessed outcomes in both conditions at post-treatment (and if available at follow-up).

Primary and secondary analyses
The three outcomes of interest were anxiety, depression, and global distress. Where studies reported multiple measures of one outcome (e.g., anxiety), the most common measure used across all studies was chosen to ensure each study contributed only one ES per outcome. Due to limited global distress outcome data, between-treatment comparisons assessed anxiety and depression outcomes only. These outcomes could be measured by any of the outcome measures from the primary analysis and were compared at post-treatment only.

Data extraction
A bespoke data extraction tool was designed and piloted. Any issues were resolved through consensus in the research team. Data were extracted across the following criteria: (a) intervention characteristics (recruitment method, group sizes, and number of sessions), (b) methodological characteristics (study design and study quality), (c) patient characteristics (age, gender [% female], and presenting problems), and (d) outcomes (pre-post and post-follow-up means, SDs and dropout rates). Where data were available, outcomes for anxiety, depression, and global psychological distress were extracted at pre-post treatment and at follow-up.

Within- and between-group effect sizes, numbers needed to treat, and dropout
For the primary analyses, pre-post treatment effect sizes were calculated for anxiety, depression, and global psychological distress. A separate effect size was calculated for each of the outcome measures. Effect sizes were computed for the difference between pre-post treatment by subtracting the pre-treatment mean from the post-treatment mean and then dividing the result by the pre-treatment standard deviation. To be able to calculate the variance (and therefore the standard error) of pre-post changes, which is required for inverse-variance meta-analyses, the correlation between the pre-post scores is required. The majority of included studies did not report the pre-post correlations. Therefore, an imputed value of 0.6 was used based on recommendations informed by the median within-group correlation extracted from 811 measures of pre-post clinical trial arms (Balk, Early, Patel, Trikalinos, & Dahabreh, 2012). To account for any small study sample biases, effect sizes were converted to Hedges' g using the J correction (Wasserman, Hedges, & Olkin, 1985). In the primary analysis, positive effect sizes were an index of symptom improvement following treatment, whereas negative effect sizes were an index of symptom deterioration. Effect sizes were interpreted according to Cohen's criteria, where 0.2 is considered a small effect, 0.5 is considered a moderate effect, and 0.8 is considered a large effect (Cohen, 1992). NNT is traditionally calculated from the inverse of the absolute risk reduction (ARR), which is the amount of risk that is reduced by the treatment studied, compared with those participants in an RCT who were allocated to the control condition (Sedgwick, 2015). Because the SC evidence base is predominantly drawn from uncontrolled practice-based studies, the NNT was calculated from the pre-post effect size.
As such, the NNT estimated the number of SC patients needed to be treated in the large groups in order for one patient to experience symptomatic improvements relative to their pre-treatment level of impairment (rather than relative to a control group).
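The pre-post calculation described above can be sketched as follows. This is a minimal illustration, not the authors' workbook: the function names are hypothetical, the sign is oriented as pre minus post so that positive values indicate improvement (matching the stated interpretation), and the variance formula is one common large-sample approximation for a standardized mean change scaled by the pre-treatment SD. The effect-size-to-NNT conversion shown is the widely used 1 / (2Φ(ES/√2) − 1) form; the text does not name the conversion used, although this form does reproduce the NNTs reported in the Results (3.14, 2.95, and 2.19).

```python
import math


def prepost_hedges_g(mean_pre, mean_post, sd_pre, n, r=0.6):
    """Pre-post standardized mean change with small-sample correction.

    r is the pre-post correlation; 0.6 is the imputed value described in
    the text. Sign convention (an assumption): positive = improvement.
    """
    d = (mean_pre - mean_post) / sd_pre
    # Large-sample variance approximation for a change score standardized
    # by the pre-treatment SD (an assumed formula, not given in the text).
    var_d = 2.0 * (1.0 - r) / n + d ** 2 / (2.0 * n)
    # Hedges' J correction with df = n - 1 for a single group.
    j = 1.0 - 3.0 / (4.0 * (n - 1) - 1.0)
    return j * d, j ** 2 * var_d


def nnt_from_es(es):
    """Convert a positive effect size to an NNT.

    Uses NNT = 1 / (2 * Phi(es / sqrt(2)) - 1), which simplifies to
    1 / erf(es / 2).
    """
    if es <= 0:
        raise ValueError("NNT is only defined here for a positive effect size")
    return 1.0 / math.erf(es / 2.0)


if __name__ == "__main__":
    # Hypothetical study: mean falls from 15 to 10, pre SD 5, n = 40.
    g, var_g = prepost_hedges_g(15.0, 10.0, 5.0, 40)
    print(f"g = {g:.3f}, SE = {math.sqrt(var_g):.3f}")
    for es in (0.58, 0.62, 0.86):
        print(f"ES = {es:.2f} -> NNT = {nnt_from_es(es):.2f}")
```

Scaling by the pre-treatment SD rather than the SD of change scores keeps the effect size on the same metric as a between-group d, which is why the pre-post correlation enters only through the variance.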
For the secondary analyses, pre-post control group effect sizes were calculated for those studies that had used a control, active treatment, or wait-list trial design (i.e., all comparators). These comparisons enabled an estimate of the relative effectiveness of SC. Effect sizes were calculated by using the difference between the mean pre-post change in the SC and comparators divided by the pooled pre-treatment SD to account for pretreatment group differences (Morris, 2008). Where SC was compared against more than one comparator, the SC group sample size was divided by the number of treatment comparators, so that patients were not included more than once in the analysis (Cochrane Collaboration, 2011). To calculate the average dropout rate for SC, the dropout rate in the original study was extracted and the mean and SD calculated.
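The between-group calculation can be illustrated with a short sketch of the pre-post-control effect size described by Morris (2008), in which the difference in mean change is divided by the pooled pre-treatment SD and a small-sample correction is applied. The function names are hypothetical, and the orientation (positive values favour SC, i.e., a larger symptom reduction in SC than in the comparator) is an assumption chosen to match how the results are reported. The helper for a shared SC arm simply divides the SC sample size by the number of comparators, as described above.

```python
import math


def morris_prepost_control_es(pre_sc, post_sc, sd_pre_sc, n_sc,
                              pre_cmp, post_cmp, sd_pre_cmp, n_cmp):
    """Difference in mean pre-post change over the pooled pre-treatment SD."""
    df = n_sc + n_cmp - 2
    sd_pre_pooled = math.sqrt(
        ((n_sc - 1) * sd_pre_sc ** 2 + (n_cmp - 1) * sd_pre_cmp ** 2) / df
    )
    # Symptom reduction in each arm (pre minus post); positive = improvement.
    change_sc = pre_sc - post_sc
    change_cmp = pre_cmp - post_cmp
    # Small-sample bias correction.
    c_p = 1.0 - 3.0 / (4.0 * df - 1.0)
    return c_p * (change_sc - change_cmp) / sd_pre_pooled


def split_shared_n(n_sc, n_comparators):
    """Share one SC arm across several comparisons without double counting."""
    return n_sc / n_comparators


if __name__ == "__main__":
    # Hypothetical data: SC improves by 10 points, comparator by 5.
    es = morris_prepost_control_es(20.0, 10.0, 5.0, 20, 20.0, 15.0, 5.0, 20)
    print(f"ES = {es:.3f}")
    print(f"SC n per comparison = {split_shared_n(60, 3):.0f}")
```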

Data synthesis and quality
Available data were synthesized using the Meta-Essentials workbooks (Suurmond, van Rhee, & Hak, 2017) and the package 'forestplot' in RStudio (version 1.2.5019). Pooled effect sizes and 95% confidence intervals (CI) were computed using the inverse of the variance to weight the effect. Due to the expected level of heterogeneity resulting from differences between study types, a random effects model was used to account for within- and between-study variances. Within- and between-study heterogeneity was assessed using the I² statistic to indicate the percentage of variation, and the Q-statistic was used to assess statistical significance. Higgins, Thompson, Deeks, and Altman's (2003) criteria were used to identify low (25%), moderate (50%), and high (75%) levels of study heterogeneity.
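To make the pooling step concrete, the sketch below shows an inverse-variance random effects synthesis with Q and I². It is illustrative only: the text does not state which between-study variance estimator or confidence interval method was used, so the DerSimonian-Laird estimator and a normal-based 95% CI are assumptions here, and the function names are hypothetical.

```python
import math
from statistics import NormalDist


def i_squared(q, k):
    """I² (%) from Cochran's Q and the number of comparisons k."""
    df = k - 1
    return max(0.0, (q - df) / q) * 100.0 if q > 0 else 0.0


def random_effects_pool(effects, variances, level=0.95):
    """Inverse-variance random effects pooling (DerSimonian-Laird, assumed)."""
    k = len(effects)
    w = [1.0 / v for v in variances]
    sum_w = sum(w)
    fixed = sum(wi * e for wi, e in zip(w, effects)) / sum_w
    # Cochran's Q: weighted squared deviations from the fixed-effect mean.
    q = sum(wi * (e - fixed) ** 2 for wi, e in zip(w, effects))
    c = sum_w - sum(wi ** 2 for wi in w) / sum_w
    tau2 = max(0.0, (q - (k - 1)) / c)
    # Random effects weights add tau² to each study's sampling variance.
    w_re = [1.0 / (v + tau2) for v in variances]
    pooled = sum(wi * e for wi, e in zip(w_re, effects)) / sum(w_re)
    se = math.sqrt(1.0 / sum(w_re))
    z_crit = NormalDist().inv_cdf(0.5 + level / 2.0)
    return {
        "pooled": pooled,
        "se": se,
        "ci": (pooled - z_crit * se, pooled + z_crit * se),
        "z": pooled / se,
        "q": q,
        "tau2": tau2,
        "i2": i_squared(q, k),
    }


if __name__ == "__main__":
    # Hypothetical effect sizes and variances for three comparisons.
    result = random_effects_pool([0.2, 0.4, 0.6], [0.01, 0.01, 0.01])
    print(result)
    # Q and k as reported for the pre-post anxiety analysis.
    print(round(i_squared(168.02, 21)))
```

As a check on the I² definition, the Q value and number of comparisons reported for pre-post anxiety (Q = 168.02, 21 comparisons) give (168.02 − 20) / 168.02 ≈ 88%, matching the reported figure.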
All meta-analytic comparisons were assessed by two reviewers using the grading of recommendation assessment, development, and evaluation (GRADE) tool (Dijkers, 2013). The quality of evidence for each comparison was evaluated across five aspects of synthesis quality: limitations of the individual included studies, inconsistency in treatment estimates, imprecision of treatment estimates, indirectness of treatment estimates, and publication bias. Evidence quality (either high, moderate, low, or very low) could be downgraded by one or more levels based on the perceived influence of limitations on overall evidence quality.

Moderator and subgroup analyses
In the primary analysis, anticipated between-study heterogeneity was explored using prespecified subgroup and meta-regression analyses to explore variation in ES. Subgroup analyses were used to explore three categorical variables: study type (RCT/non-RCT), setting (primary care/community), and presenting problem (diagnosis/no diagnosis). Meta-regression was used to explore three continuous variables: age, gender (% female), and study quality. A minimum of 10 studies were required to conduct moderator analyses (Cochrane Collaboration, 2011). To account for multiple testing, a Bonferroni correction was applied to minimize type 1 errors. The alpha was adjusted to p < .017 (α = .05/3) for between-subgroup differences and meta-regression beta coefficients.
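The adjustment above amounts to dividing the nominal alpha by the number of tests in each family of moderators. A minimal sketch, with hypothetical function names and hypothetical p values:

```python
def bonferroni_alpha(alpha, n_tests):
    """Per-test significance threshold under a Bonferroni correction."""
    return alpha / n_tests


def flag_significant(p_values, alpha=0.05):
    """Flag each p value against the Bonferroni-adjusted threshold."""
    threshold = bonferroni_alpha(alpha, len(p_values))
    return [p < threshold for p in p_values]


if __name__ == "__main__":
    # Three moderators tested per family, as in the text: .05 / 3 ≈ .0167.
    print(round(bonferroni_alpha(0.05, 3), 4))
    # Hypothetical p values for three moderators.
    print(flag_significant([0.040, 0.010, 0.300]))
```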

Publication bias
Publication bias was assessed via visual inspection of funnel plot asymmetry. Egger's regression was used to statistically test for the presence of publication bias (Egger, Smith, Schneider, & Minder, 1997). 'Trim and Fill' imputed any missing studies and provided an adjusted effect estimate, accounting for publication bias (Duval & Tweedie, 2000).
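For readers unfamiliar with Egger's test, the sketch below shows its usual form: the standardized effect (ES divided by its standard error) is regressed on precision (1 divided by the standard error), and an intercept that departs from zero suggests funnel plot asymmetry. This is a plain ordinary least squares illustration with a hypothetical function name; it returns the intercept and its t statistic (df = k − 2) and leaves the p value to a t-distribution lookup. It is not the Meta-Essentials implementation.

```python
import math


def egger_regression(effects, std_errors):
    """Egger's regression: standardized effect regressed on precision."""
    y = [e / s for e, s in zip(effects, std_errors)]
    x = [1.0 / s for s in std_errors]
    k = len(y)
    mean_x = sum(x) / k
    mean_y = sum(y) / k
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    # Residual variance with k - 2 degrees of freedom.
    resid_ss = sum((yi - (intercept + slope * xi)) ** 2 for xi, yi in zip(x, y))
    s2 = resid_ss / (k - 2)
    se_intercept = math.sqrt(s2 * (1.0 / k + mean_x ** 2 / sxx))
    t_stat = intercept / se_intercept if se_intercept > 0 else float("nan")
    return {"intercept": intercept, "slope": slope,
            "se_intercept": se_intercept, "t": t_stat, "df": k - 2}


if __name__ == "__main__":
    # Hypothetical data in which smaller studies show larger effects.
    es = [0.95, 0.70, 0.62, 0.55, 0.50]
    se = [0.40, 0.25, 0.20, 0.12, 0.08]
    print(egger_regression(es, se))
```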

Study selection
After the removal of duplicates, the search strategy produced a combined total of 70 articles (see Figure 1), including one record from unpublished grey literature searches (Love, 2020). Title and abstract screening identified 25 articles for full-text review. Upon review, six articles were excluded leaving a total of 19 studies which met inclusion criteria for synthesis. All 19 studies were included in the primary pre-post and post-follow-up quantitative synthesis, of which five studies (containing 11 comparisons) were also included in the secondary comparator synthesis. Table 1 summarizes the characteristics of 19 included studies. SC is typically delivered (k = 18) in public psychological health services, with one study set in a custodial setting.

Study characteristics
Studies were typically UK based (k = 14), with k = 2 studies set in Ireland, k = 2 studies set in Belgium and a single study conducted in China. In terms of design, k = 2 were randomized controlled trials (RCTs) and the remaining k = 17 studies were variations on practice-based evidence (PBE) designs. These included controlled pre-post (k = 1) and controlled pre-post-follow-up designs (k = 2) that included control conditions, single-group pre-post (k = 7) and pre-post-follow-up designs (k = 6), and a service evaluation (k = 1). Overall, five studies compared SC to a comparator/s (across 11 separate comparisons). SC was compared to another psychological intervention in k = 4 studies containing seven separate comparisons (i.e., cognitive therapy, behaviour therapy, placebo-subconscious retraining, individual CBT, individual psychodynamic interpersonal therapy, anxiety management, and mindfulness-based cognitive therapy). SC was compared to usual care in a single study and wait-list or no treatment controls in k = 3 studies. Overall, risk of bias was fair, with four studies classified as poor and 14 studies classified as fair. Mean study quality was 18.21 (SD = 4.48; maximum score 44). Risk of bias typically arose from studies not conducting blinding, assessor training, treatment adherence, or therapist competence checks. The two RCTs were of mixed methodolog-

Meta-analysis of stress control outcomes
Meta-analytic comparisons were performed to aggregate the treatment effects of SC on (i) anxiety, (ii) depression, and (iii) global distress symptoms from pre-post treatment and from end of treatment to follow-up. GRADE assessments are reported for each meta-analysis to denote the quality of evidence. All comparisons were mostly based on practice-based studies with generally low quality of evidence. Across the primary pre-post comparisons, studies were identified as having study limitations (due to lack of control conditions increasing risk of confounding and poor follow-up), inconsistency in estimates due to high levels of heterogeneity (large I² values), and some evidence of publication bias (for depression outcomes). As a result, the quality of evidence was downgraded to very low quality. For the secondary between-group comparisons, studies were identified as having some study limitations (lack of randomization to control confounding and poor follow-up), some inconsistency from moderate heterogeneity, and some imprecision evident in the wide confidence intervals of study estimates. The between-group comparisons were therefore also downgraded to very low evidence quality.

Effectiveness for anxiety at end of treatment and follow-up
The primary pre-post treatment meta-analysis was conducted on 21 comparisons (extracted from 19 studies) of pre-post treatment anxiety outcomes, totalling N = 5597 participants. Figure 2 presents the pooled effect size (ES) showing moderate, significant reductions in anxiety following SC (ES = 0.58; 95% CI 0.41 to 0.75; Z = 7.13; p < .001; GRADE = very low). There was high and statistically significant between-study heterogeneity (I² = 88%; Q = 168.02; p < .001). The anxiety NNT was 3.14. Funnel plot symmetry (see Figure 5a) and a non-significant Egger's regression suggested no significant influence of publication bias for pre-post treatment anxiety outcomes (B = 0.57, p = .988). Trim and fill imputation did not account for any missing studies, and as such, the overall pooled treatment estimate (ES = 0.58; 95% CI 0.41 to 0.75) remained the same, representing a moderate effect. End of treatment to follow-up data were provided in 12 comparisons (2 studies provided 2 groups), totalling N = 327 participants. There was a significant, minimal effect size for anxiety at follow-up (see Figure 2), indicating that anxiety symptoms minimally improved over follow-up time (ES = 0.14; 95% CI 0.00 to 0.28; Z = 2.14; p = .032; GRADE = very low). There was low-to-moderate, non-significant heterogeneity (I² = 36%; Q = 17.19; p = .102).

Effectiveness for depression at end of treatment and follow-up
The primary pre-post treatment meta-analysis was conducted on 21 comparisons (3 studies provided 2 groups) from N = 5538 participants providing pre-post treatment outcomes. Figure 3 presents the pooled effect size, showing moderate, significant improvements in depression following SC (ES = 0.62; 95% CI 0.44 to 0.80; Z = 7.30; p < .001). There was significant, high between-study heterogeneity detected (I² = 84%; Q = 125.69; p < .001; GRADE = very low). The depression NNT was 2.95. Asymmetry in the funnel plot (see Figure 5b) and a significant Egger's test (B = −0.57, p = .042) indicated some reporting bias for pre-post depression outcomes. Trim and fill imputed data for three missing smaller studies, which resulted in a small reduction in the SC effect size estimate (ES = 0.46; 95% CI 0.23 to 0.69). End of treatment to follow-up data were provided in 12 comparisons (2 studies provided 2 groups), totalling N = 322 participants. There was a non-significant, minimal effect size for depression at follow-up, indicating that levels of depression symptoms were maintained over follow-up time (ES = 0.02; 95% CI −0.08 to 0.13; Z = 0.52; p = .600; GRADE = very low). There was minimal between-study heterogeneity (I² = 0%; Q = 9.93; p = .536).

Note. Abbreviations defining the study identifiers of different independent subsamples for studies with more than one effect size: G1 = group 1; G2 = group 2; MBS = mindfulness body scan; PMR = progressive muscle relaxation; GAD = generalized anxiety disorder; PD = panic disorder.

Effect for global psychological distress at end of treatment and follow-up
The primary pre-post treatment meta-analysis was conducted on 15 comparisons (2 studies provided 2 groups) from N = 591 participants providing pre-post treatment outcomes for global distress. Figure 4 presents the pooled effect size, showing large, significant reductions in global psychological distress after SC (ES = 0.86; 95% CI 0.61 to 1.11; Z = 7.41; p < .001; GRADE = very low). There was significant between-study heterogeneity (I² = 87%; Q = 95.82; p < .001). The global psychological distress NNT was 2.19. The global distress outcome funnel plot (see Figure 5c) was asymmetrical, but Egger's regression was not significant (B = −2.03, p = .098). Trim and fill imputed data for one missing smaller study with a small deterioration effect after SC, resulting in a small reduction in the SC effect size estimate that still represented a large effect (ES = 0.82; 95% CI 0.56 to 1.07). Taken together, these results suggest a small impact of publication bias in the included studies. End of treatment to follow-up data were provided in 7 comparisons (1 study provided 2 groups), totalling N = 209 participants. There was a non-significant, minimal effect size for global distress symptoms at follow-up, indicating that improvements in global distress were maintained over follow-up time (ES = 0.08; 95% CI −0.06 to 0.23; Z = 1.46; p = .145; GRADE = very low). There was non-significant between-study heterogeneity (I² = 0%; Q = 5.43; p = .490).

Moderator analyses
Meta-regression (see Table 2) and subgroup analyses (see Table 3) investigated moderators of SC treatment effects by exploring heterogeneity in pre-post treatment anxiety, depression, and global distress outcomes. Variations in treatment effects for anxiety, depression, and global distress symptoms were not explained by differences in participants' age, gender, or study quality. No significant differences for anxiety, depression, or global distress were found based on study design or whether participants in studies had a diagnosis versus no diagnosis. Significantly larger effects were observed for anxiety, depression, and global distress symptoms in studies that recruited patients within primary care clinical settings compared to community settings, although only the effect for anxiety symptoms remained significant after adjusting the p value for multiple testing.

The effectiveness of stress control relative to control groups
Meta-analytic comparisons were conducted for depression and anxiety to compare the aggregated effect of SC vs comparators at post-treatment only, as there was insufficient longer term follow-up data in the studies. Eleven comparisons from k = 5 studies, totalling N = 560 participants, evaluated post-treatment SC anxiety outcomes with a comparator (SC N = 157; comparator N = 403). The pooled effect size (see Figure 6) indicates a minimal, non-significant treatment effect in favour of SC (ES = 0.12, 95% CI −0.25 to 0.49, Z = −0.70; p = .482; GRADE = very low). There was significant heterogeneity (I² = 65%; Q = 28.30, p = .002). Subgroup analysis indicated a minimal, non-significant effect in favour of active controls compared to SC with minimal heterogeneity (ES = −0.16, 95% CI −0.41 to 0.09, I² = 14%, Q = 6.96, p = .325) and a moderate significant effect in favour of SC compared to passive controls with minimal-to-low heterogeneity (ES = 0.68, 95% CI 0.33 to 1.03, I² = 21%, Q = 3.78, p = .286).
Eleven comparisons from five studies, totalling N = 560 participants, evaluated post-treatment depression outcomes against comparators (SC N = 157; comparator N = 403). The pooled effect size (see Figure 6) indicates a minimal, non-significant treatment effect in favour of SC (ES = 0.15, 95% CI −0.24 to 0.54, Z = 0.84; p = .401; GRADE = very low). There was a statistically significant and high level of heterogeneity (I² = 71%; Q = 34.11, p < .001). Subgroup analysis indicated a minimal-to-small, non-significant effect in favour of active controls compared to SC with minimal heterogeneity (ES = −0.19, 95% CI −0.42 to 0.05, I² = 2%, Q = 6.12, p = .410) and a moderate-to-large significant effect in favour of SC compared to passive controls with moderate heterogeneity (ES = 0.74, 95% CI 0.27 to 1.21, I² = 54%, Q = 6.47, p = .091). Sensitivity analyses comparing the aggregated effects of all eligible studies (reported as the main analyses) with the aggregated effects when only including studies that used an RCT design are reported in Appendix S1.

Note. k = number of comparisons; ES (g) = effect size Hedges' g; CI = confidence intervals; SE = standard error; R² = percentage of variation explained. *Significant at p < .05; *Significant at Bonferroni adjusted p < .0166 threshold for multiple testing.

Discussion
This review investigated the effectiveness of SC group-based psychoeducational interventions on anxiety, depression, and global distress. The depression and anxiety ES were comparable to, though slightly lower than, the pre-post treatment ES reported in a recent meta-analysis of practice-based evidence from the IAPT programme (Wakefield et al., 2021), where SC has been widely implemented at a national level. The recruitment setting (primary care- versus community-based recruitment) was the only moderating factor of outcomes, such that community samples tended to have lower effect sizes. The comparator analysis found a moderate-to-large effect favouring SC compared to passive controls, but no significant difference between SC and active controls at post-treatment. Although SC appears to have similar depression and anxiety outcomes to other psychological interventions examined in controlled studies, these findings need to be considered in light of the high patient-to-therapist ratios that SC enables (White & Keenan, 1990). The follow-up results generally demonstrate that treatment gains from SC were maintained over time, and this is in line with previous single-study evidence demonstrating maintained improvements at 1-year (Van Daele, van Audenhove, Vansteenwegen, Hermans, & Van der Bergh, 2013b) and 2-year follow-ups (White, 1998). In terms of acceptability, the average dropout rate across SC studies (34%) appears comparable to other group psychological interventions, as approximately 25-50% of patients drop out of groups in routine practice settings (Batch, 2018; Simon et al., 2012). Prior research indicates that high socio-economic deprivation predicts dropout from SC groups (Burns et al., 2016; Firth, Delgadillo, Kellett, & Lucock, 2020). The average size of a SC group was N = 38 in the current review, suggesting the routine delivery of relatively large groups. The pre-post treatment NNT was approximately three across all outcome measures.
Hence, it might be expected that around 12.6 people in a typical group of N = 38 SC participants are likely to experience considerable symptomatic improvement relative to their pretreatment functioning. As such, SC represents a low-intensity psychoeducational intervention that is clinically effective and organizationally efficient. When SC was compared to seven active treatments, few significant differences were found, and so this challenges criticisms that SC is merely a means of services managing waiting lists (Gaudiano, 2008).

Limitations of the current review
The methodological quality rating tool used (POSMRS; Ost, 2008), whilst being selected on the basis that it could rate both randomized and non-randomized study designs, tended to assign low-quality ratings to practice-based studies and may therefore be less appropriate for such designs compared to other tools that are designed to rate the quality of uncontrolled observational studies. As the included studies were made up of mostly practice-based evidence, this subsequently contributed to a lower quality rating of the overall meta-analysis as indicated by the GRADE assessments. GRADE highlighted additional issues with inconsistency in estimates and some evidence of publication bias (for depression outcomes), so the conclusions that can be drawn from the results do have some caveats. It is acknowledged that the screening of titles and abstracts was only completed by one reviewer, potentially introducing bias into study selection. Furthermore, we were only able to access data from one unpublished study from routine service evaluations, and it is possible that other SC data sources exist but were not successfully obtained through our searches and communications with corresponding authors of eligible studies.
The pre-post treatment ES analyses were limited, due to the need to account for the correlation data that were often not reported and a lack of a randomized control comparator. Treatment effects in uncontrolled studies cannot be fully attributed to SC, as some of the outcome may partially arise from spontaneous recovery or other such factors. However, attempts were made to minimize these biases in the form of imputing correlations in effect sizes and performing a preliminary comparison of between-group effects to provide context. Some of the included studies had small study sample sizes (especially for the between-group comparisons) which can produce inflated ES and result in imprecise evaluations of between-study heterogeneity (IntHout, Ioannidis, Borm, & Goeman, 2015). Such small study sample sizes are curious, as they appear in contrast to the large group philosophy of the SC intervention. Moreover, moderator analyses were subject to low power and inadequate subgroups, because the number of observations for some variables was too low to detect reliable variation in effects (Guolo & Varin, 2017). The comparator analysis was limited due to the small number of studies, of which not all were randomized, and as such the results cannot be considered robust (Bucher, Guyatt, Griffith, & Walter, 1997).

Research implications
Nine key points are proposed to further enhance the SC evidence base: (a) consistent primary outcomes need to be agreed by researchers to enable better comparisons across services, (b) consistent reporting of attendance and dropout outcomes, (c) consistent reporting on basic characteristics such as presenting symptoms, mean age, and gender, (d) taking a measure of treatment adherence, (e) routinely following up SC completers over the short and long term, (f) comparing SC outcomes against other bona fide psychological interventions, (g) measuring patient preferences, (h) testing interventions to reduce SC dropout (see Avishai, Oldham, Kellett, & Sheeran, 2018 for an RCT example), and (i) testing interventions to reduce the impact of socio-economic deprivation on attendance and outcome.

Service implications
This review has indicated that SC is an effective group-based LI intervention, and therefore, the manner in which the intervention is offered and integrated into services is important to consider. In services that follow stepped care principles, the patient journey starts with a LI intervention and only non-responsivity to that intervention would indicate that the patient requires a more intensive intervention (Bower & Gilbody, 2005). However, failure of an intervention may be demoralizing in terms of seeking future psychological care, and therefore, the decision to allocate to SC is important to consider. Whilst the large group SC approach is normalizing through attendance (Kellett, Clarke, & Matthews, 2007), it is clear through the acceptability data presented here that SC is neither effective nor acceptable for every patient. SC should only be routinely offered for those presenting with mild-to-moderate anxiety and depression, as that is what the intervention was designed for and that is what the evidence base rests upon. Well-designed, easy-to-understand, and clear patient information leaflets contrasting and comparing LI interventions need to be developed and made routinely available in order to enable informed patient choice and to support patient preferences.
Group psychoeducation is recommended in the National Institute for Health and Care Excellence (NICE) guidelines for common mental health problems (NICE, 2011a, 2011b). Guidelines are limited to group psychoeducation for generalized anxiety disorder (GAD), panic disorder, and, under certain limited circumstances, obsessive compulsive disorder (NICE, 2011a). NICE guidelines for GAD specify that CBT-based psychoeducational groups are initiated with a group contract, and that the six two-hour groups are facilitated by trained professionals (in a ratio of 1 facilitator to 12 patients) who deliver presentations, encourage participants to complete in-session exercises, and include 'homework' elements. Group-based psychoeducational interventions, computerized CBT, and one-to-one guided self-help all need to be equal components of the LI offer in routine services, thus offering choices to maximize acceptability and access to care for patients with common mental disorders.

Conclusions
The present findings support earlier work suggesting that the SC large group psychoeducational approach appears beneficial in reducing psychological distress (Delgadillo et al., 2016). Although there were methodological limitations, the combined evidence from within-group and between-treatment analyses suggests that SC is effective in treating psychological distress in primary care and community settings. Whilst patient preferences for group approaches need to be considered, SC appears an acceptable, efficient, and effective LI treatment option which should be offered at the early steps of stepped care services for mild-to-moderate anxiety and depression. Further, more controlled evaluations of SC would always be a welcome addition to the evidence base.