The effectiveness of Internet‐delivered treatment for generalized anxiety disorder: An updated systematic review and meta‐analysis

Abstract
Background: Generalized anxiety disorder (GAD) is a highly prevalent, chronic disorder associated with impaired quality of life, societal burden, and poor treatment rates. Internet‐delivered interventions may improve the accessibility of treatments and are increasingly being used. This study aimed to update a previous meta‐analysis to determine the effectiveness of available Internet‐delivered interventions in treating symptoms of GAD.
Method: Systematic literature searches were conducted (through April 2020) using Embase, PubMed, PsycINFO, and Cochrane to find randomized controlled trials of Internet‐delivered interventions for GAD. Risk of bias was evaluated, and Hedges' g was calculated at posttreatment and follow‐up.
Results: Twenty studies met eligibility criteria and were included in the meta‐analysis. Random‐effect models detected large effect sizes for the primary outcomes of anxiety (g = 0.79) and worry (g = 0.75), favoring treatment. Effect sizes for depression, functional impairment, and quality of life were moderate to large. Maintenance of effects at follow‐up seems likely.
Conclusions: Results support the effectiveness of Internet‐delivered treatments for GAD. Considerable heterogeneity between studies appeared to be moderated by variability in the interventions themselves, highlighting the importance of further investigating the characteristics that may optimize treatment outcomes. Overall, Internet delivery appears to be a viable mode of treatment for GAD, with the potential to relieve existing gaps in the provision of treatment.

the possibility of fast dissemination of available treatments, we wanted to examine the effectiveness of available Internet-delivered GAD treatments. Specifically, we aimed to systematically review all evidence for Internet-delivered interventions targeting GAD and thereby update a systematic review and meta-analysis conducted by Richards et al. (2015). Several recent meta-analyses of psychological treatments for GAD have included Internet-delivered interventions (e.g., Carl et al., 2020; Chen et al., 2019; Hall et al., 2016). However, none specifically set out to evaluate all available evidence for Internet-delivered interventions. For example, some included only iCBT, some focused only on specific subgroups such as older adults, and others included Internet-delivered interventions alongside other self-help and face-to-face interventions.
Internet-delivered treatments are developing fast, and we were aware of a number of new developments in Internet-delivered treatments for GAD (e.g., Dahlin et al., 2016; Richards et al., 2016). We therefore wanted to build on the work of Richards et al. (2015) and take a fresh look at the available treatment options and their effectiveness. In doing so, we also sought to offer an in-depth look at moderators that may influence the effectiveness of Internet-delivered treatments for GAD. These include the type of control group used (Zhu et al., 2014), specific intervention characteristics such as theoretical orientation, composition, or length of treatment (Zhang et al., 2019), sample characteristics (Carl et al., 2020), and the support offered (Wright et al., 2019). Two questions therefore guided our review (Eilert et al., 2019).

| Literature search
This systematic review and meta-analysis was conducted according to the PRISMA statement (Moher et al., 2009; for the corresponding checklist, see Appendix A). A systematic literature search for English-language articles was conducted across four prominent electronic databases: Embase, PubMed, PsycINFO, and Cochrane. The search was carried out in two stages: the first in December 2018 and an update in April 2020. Only articles published after June 1, 2013 (the cut-off for articles included in the original review) were included in the searches. To replicate the procedures of the original meta-analysis, we used the same two key search phrases, "Internet treatment for generalized anxiety disorder" and "Internet treatment for anxiety," in each of the four databases, giving a total of eight searches. In addition, where the searches identified protocols meeting eligibility criteria, we checked the corresponding trial registration for recent publications associated with it. We also cross-checked the reference lists of other relevant review papers.
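The search design is a full crossing of the two phrases with the four databases, followed by a publication-date filter. The following is a minimal Python sketch of that plan, not part of the original procedure. The helper names are hypothetical, and reading "after June 1, 2013" as strictly later is an assumption.

```python
from datetime import date
from itertools import product

# Databases and key phrases as reported in the text.
DATABASES = ["Embase", "PubMed", "PsycINFO", "Cochrane"]
PHRASES = [
    "Internet treatment for generalized anxiety disorder",
    "Internet treatment for anxiety",
]
# Cut-off of the original review; "after" is read as strictly later (assumption).
CUTOFF = date(2013, 6, 1)


def build_search_plan(databases, phrases):
    """Return one (database, phrase) pair for every search to run."""
    return list(product(databases, phrases))


def after_cutoff(published, cutoff=CUTOFF):
    """Keep a record only if it was published after the original review's cut-off."""
    return published > cutoff


plan = build_search_plan(DATABASES, PHRASES)
for database, phrase in plan:
    print(f"{database}: {phrase!r}")
print(len(plan))  # 8 searches in total
```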
EILERT ET AL. | 197

| Selection of studies
After duplicates were removed, initial search results were screened at the title/abstract level by one researcher (N. E. for the December 2018 searches; R. W. for the April 2020 searches), and entirely off-topic studies were excluded. The full texts of the remaining papers were then assessed for eligibility, and grounds for exclusion were recorded according to a predefined hierarchy. Eligibility assessments were carried out independently by two researchers (N. E. and A. E. for the December 2018 searches; R. W. and N. E. for the April 2020 searches). Discrepancies in study selection were resolved through discussion and consultation with a senior researcher (D. R.).
Eligibility criteria were established to include only studies that:
(1) evaluated an Internet-delivered intervention for symptoms of GAD;
(2) were randomized controlled trials (RCTs);
(3) used a wait-list, placebo, or attention control as the comparator;
(4) had a sample with a confirmed clinical diagnosis of GAD, whose participants may have had comorbidities and/or functional impairment (studies that included a minority of participants with significant subthreshold GAD symptoms were also accepted);
(5) had a sample of adults over 18 years of age;
(6) were published in English in a peer-reviewed journal; and
(7) included reliable and valid outcome measures of GAD symptoms (i.e., measures of anxiety and/or worry).
Some studies were transdiagnostic, or reported outcomes only in aggregate across several anxiety disorders. For these, we contacted the authors to request separate outcome data for participants with a confirmed diagnosis of GAD. Where authors were unable to provide the required data, the study was excluded from the meta-analysis.
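The eligibility rules above amount to a set of checks that must all pass. The Python sketch below expresses them as such. The `TrialRecord` fields are hypothetical stand-ins for judgments the reviewers made by reading full texts; they do not represent an automated tool used in the review.

```python
from dataclasses import dataclass

# Comparator types accepted by criterion (3).
ACCEPTED_CONTROLS = {"wait-list", "placebo", "attention"}


@dataclass
class TrialRecord:
    """Hypothetical screening record; field names are illustrative only."""
    internet_gad_intervention: bool
    randomized: bool
    control: str
    confirmed_gad_diagnosis: bool  # a minority of subthreshold cases is tolerated
    min_age: int
    peer_reviewed_english: bool
    valid_gad_outcome: bool  # anxiety and/or worry measure
    gad_data_available: bool = True  # False if only pooled anxiety-disorder data exist


def failed_criteria(t: TrialRecord) -> list[str]:
    """Return the labels of all checks the trial fails (an empty list means eligible)."""
    checks = [
        ("1: Internet-delivered GAD intervention", t.internet_gad_intervention),
        ("2: RCT", t.randomized),
        ("3: wait-list, placebo, or attention control", t.control in ACCEPTED_CONTROLS),
        ("4: confirmed GAD diagnosis", t.confirmed_gad_diagnosis),
        ("5: adult sample", t.min_age >= 18),
        ("6: peer-reviewed, English", t.peer_reviewed_english),
        ("7: valid GAD outcome measure", t.valid_gad_outcome),
        ("GAD-specific data obtainable", t.gad_data_available),
    ]
    return [label for label, passed in checks if not passed]


# A transdiagnostic trial whose authors could not supply GAD-only data.
trial = TrialRecord(True, True, "wait-list", True, 18, True, True, gad_data_available=False)
print(failed_criteria(trial))  # ['GAD-specific data obtainable']
```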

| Data extraction
We extracted the following data from included studies:
(a) country where the study was conducted;
(b) participant/sample characteristics (diagnostic measure used, mean age, baseline symptom levels, method of referral);
(c) scope of intervention (transdiagnostic or disorder-specific);
(d) intervention type (e.g., iCBT, Internet-delivered psychodynamic therapy);
(e) type of control group;
(f) support during the intervention;
(g) length of the treatment phase and number of modules in the intervention;
(h) engagement metrics (percentage of the program completed, average supporter time spent per participant); and
(i) means and standard deviations to calculate posttreatment and, where relevant, follow-up effect sizes.
Some trials used multiple instruments to measure outcomes related to our five constructs of interest (primary outcomes: anxiety and worry; secondary outcomes: depression, impaired functioning, and quality of life). A hierarchy of instruments was therefore developed before data analysis, as recommended by the Cochrane Handbook, to facilitate uniformity where possible (B. Johnston et al., 2019).
Within each construct, a list of relevant outcome measures was generated and ranked on reliability, validity, and interpretability. Given our primary aim of evaluating intervention effects on GAD symptoms, anxiety measures relating specifically to GAD, such as the Generalized Anxiety Disorder 7-item scale (GAD-7), were ranked higher than generic anxiety measures such as the Beck Anxiety Inventory. Table 1 (columns 9-13) shows which outcome measure informed each construct in each study. The specific measure used by each study was also coded, so that the potential effects of using different outcome measures within each construct could be evaluated in sensitivity analyses. We contacted the authors of 13 studies, from both the original and updated searches, to obtain outcome data pertaining to GAD participants (as described above) and/or additional data on potential moderators of effects. Data were extracted by one researcher (N. E.) and checked for accuracy by a second (R. W.).
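Choosing one measure per construct from a ranked list can be sketched as follows. The hierarchy shown is a hypothetical, abbreviated example built only from measures named in this article: the GAD-7 ranked above the Beck Anxiety Inventory (BAI) for anxiety, and the PSWQ for worry. The review's full ranked lists are not reproduced here.

```python
# Hypothetical, abbreviated hierarchy: highest-ranked measure first.
HIERARCHY = {
    "anxiety": ["GAD-7", "BAI"],  # GAD-specific measure ranked above a generic one
    "worry": ["PSWQ"],
}


def select_measure(construct, reported):
    """Return the highest-ranked measure a study reports for a construct, or None."""
    for measure in HIERARCHY.get(construct, []):
        if measure in reported:
            return measure
    return None


study_measures = {"BAI", "GAD-7", "PSWQ"}
print(select_measure("anxiety", study_measures))     # GAD-7
print(select_measure("anxiety", {"BAI"}))            # BAI
print(select_measure("depression", study_measures))  # None (construct not in this sketch)
```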

| Quality assessment of included studies
To determine the risk of bias of each study, we used the CLEAR NPT checklist (Boutron et al., 2005), which was designed to assess the quality of RCTs evaluating nonpharmacological treatments (NPTs). The protocol for the meta-analysis had specified the revised Cochrane Risk-of-Bias tool (RoB 2). However, the investigators became aware of issues raised in relation to RoB 2 (i.e., the complexity of the tool and its relatively low inter-rater reliability; Minozzi et al., 2020), and the tool was still in its validation phase. It was therefore decided to use another well-established standard for assessing the quality of RCTs. The CLEAR NPT checklist has been used successfully in previous meta-analyses of Internet-delivered interventions for depression (Wright et al., 2019).

The checklist assesses studies on 10 key questions and 5 subquestions. Most items require an answer of yes, no, or unclear, depending on whether the study meets the criterion for that item. The items cover:
- the adequacy of randomization;
- the availability of details of the interventions;
- the appropriateness of supporter skills;
- the measurement of treatment adherence;
- whether everyone involved was blinded or, if not, what steps were taken to prevent bias;
- the uniformity of follow-up schedules across conditions; and
- whether an intention-to-treat analysis was performed.

Two researchers (R. W. and O. M.) completed the CLEAR NPT checklist independently for all studies in relation to our primary outcomes (anxiety and/or worry). Conflicts were discussed and resolved in consultation with N. E. The CLEAR NPT checklist does not provide an overall study quality rating. Risk-of-bias results were therefore used to inform evaluations of the body of literature as a whole, rather than being incorporated into the meta-analytic models for individual studies.

| Meta-analytic procedures
All analyses, including the calculation of effect sizes, were conducted in R using the "metafor" and "dmetar" packages (Harrer et al., 2019; Viechtbauer, 2010). Hedges' g was calculated for each construct addressed within each study, to assess posttreatment standardized mean differences between treatment and control groups. Some three-arm trials had two active arms that both met inclusion criteria. In these trials, control group sample sizes were halved so that a separate effect size could be calculated for each trial arm. Effect sizes were interpreted using the following cut-off points: 0-0.32 for a small effect, 0.33-0.55 for a moderate effect, and 0.56-1.2 for a large effect (Lipsey & Wilson, 1993).
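To illustrate the calculation, here is a minimal Python sketch of Hedges' g computed from group means and standard deviations. It uses the usual pooled-SD formula with the small-sample correction J = 1 - 3/(4(n1 + n2) - 9). This is an illustration only, not the metafor/dmetar code used in the review. Two assumptions are made explicit:
- the sign is set so that lower symptom scores in the treatment group give a positive g (favoring treatment);
- in a three-arm trial, the shared control group's n is halved while its mean and SD are left unchanged.

The `label` helper applies the cut-offs quoted above. Labeling values above 1.2 as large is also an assumption, because the text names no category beyond that.

```python
import math


def hedges_g(m_t, sd_t, n_t, m_c, sd_c, n_c, split_control=False):
    """Hedges' g and its variance for a symptom measure where lower scores are better.

    split_control=True halves the control n, as for three-arm trials in which
    both active arms are compared with the same control group.
    """
    if split_control:
        n_c = n_c / 2
    df = n_t + n_c - 2
    sd_pooled = math.sqrt(((n_t - 1) * sd_t**2 + (n_c - 1) * sd_c**2) / df)
    d = (m_c - m_t) / sd_pooled  # positive when the treatment group scores lower
    j = 1 - 3 / (4 * df - 1)  # small-sample correction, equal to 1 - 3/(4N - 9)
    g = j * d
    var_g = j**2 * ((n_t + n_c) / (n_t * n_c) + d**2 / (2 * (n_t + n_c)))
    return g, var_g


def label(g):
    """Classify |g| with the Lipsey & Wilson (1993) cut-offs used in the text."""
    size = abs(g)
    if size < 0.33:
        return "small"
    if size < 0.56:
        return "moderate"
    return "large"  # 0.56-1.2 per the text; larger values are also labeled large here


g, var_g = hedges_g(8.0, 4.0, 30, 12.0, 4.0, 30)
print(round(g, 3), label(g))  # 0.987 large
```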
Random-effect models were used to pool effect sizes, because moderate-to-high between-study heterogeneity was anticipated and in line with the PROSPERO protocol. Between-study variance was estimated with restricted maximum likelihood (REML). Heterogeneity was assessed through the Q-value, the I² statistic, and prediction/credibility intervals. According to Higgins and Thompson (2002), an I² value of 0% indicates no heterogeneity, 25% indicates low heterogeneity, and 50% and 75% indicate moderate and high levels, respectively. Model fit and the presence of outliers were assessed through diagnostic plots and statistics.
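The pooling step can be illustrated with a compact Python sketch that uses only the standard library. The sketch:
- estimates τ² with a common fixed-point iteration for the REML estimator;
- computes Cochran's Q with inverse-variance weights;
- computes I² = (Q - df)/Q as defined by Higgins and Thompson;
- forms a normal-based 95% prediction interval, μ ± z·√(τ² + SE²).

This is not the metafor implementation, whose computations differ in detail. For example, metafor reports an I² derived from τ² rather than directly from Q, so figures can differ. The example inputs are made up.

```python
import math
from statistics import NormalDist


def reml_tau2(y, v, tol=1e-10, max_iter=500):
    """Between-study variance via a fixed-point iteration for the REML estimate."""
    tau2 = 0.0
    for _ in range(max_iter):
        w = [1 / (vi + tau2) for vi in v]
        mu = sum(wi * yi for wi, yi in zip(w, y)) / sum(w)
        num = sum(wi**2 * ((yi - mu) ** 2 - vi) for wi, yi, vi in zip(w, y, v))
        new = max(0.0, num / sum(wi**2 for wi in w) + 1 / sum(w))
        if abs(new - tau2) < tol:
            return new
        tau2 = new
    return tau2


def random_effects(y, v, level=0.95):
    """Pool effect sizes y with sampling variances v under a random-effects model."""
    k = len(y)
    tau2 = reml_tau2(y, v)
    w = [1 / (vi + tau2) for vi in v]
    mu = sum(wi * yi for wi, yi in zip(w, y)) / sum(w)
    se = math.sqrt(1 / sum(w))
    z = NormalDist().inv_cdf(0.5 + level / 2)

    # Cochran's Q uses fixed-effect (inverse sampling variance) weights.
    w_fe = [1 / vi for vi in v]
    mu_fe = sum(wi * yi for wi, yi in zip(w_fe, y)) / sum(w_fe)
    q = sum(wi * (yi - mu_fe) ** 2 for wi, yi in zip(w_fe, y))
    df = k - 1
    i2 = 100 * max(0.0, (q - df) / q) if q > 0 else 0.0

    half_pi = z * math.sqrt(tau2 + se**2)
    return {
        "g": mu,
        "ci": (mu - z * se, mu + z * se),
        "tau2": tau2,
        "Q": q,
        "I2": i2,
        "pi": (mu - half_pi, mu + half_pi),
    }


def describe_i2(i2):
    """Verbal label following the 25/50/75% benchmarks quoted in the text."""
    if i2 >= 75:
        return "high"
    if i2 >= 50:
        return "moderate"
    if i2 >= 25:
        return "low"
    return "negligible"  # assumption: values below 25% are not named in the text


res = random_effects([0.2, 0.8, 1.4], [0.01, 0.01, 0.01])
print(round(res["tau2"], 4), round(res["I2"], 1), describe_i2(res["I2"]))  # 0.35 97.2 high
```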
Follow-up between-group effects were assessed with the same models as posttreatment effects where this was feasible (i.e., where a sufficient number of studies included relevant data). Where one study reported multiple follow-up time points, we selected the time point closest to those used by the other comparisons within the analysis, to keep the comparisons as coherent as possible. For example, where all but one study included only follow-up time points under 6 months, the remaining study's 6-month follow-up was selected rather than its 12-month follow-up.

Two sets of sensitivity analyses were conducted to assess the robustness of the conclusions:
- The first addressed studies that may act as confounders within the meta-analysis, namely studies in which not all participants had a GAD diagnosis and some had only subthreshold symptoms. Random-effect models with and without these studies were compared.
- The second addressed potentially debatable methodological decisions, namely the selection of outcome measures within constructs according to the predefined hierarchy. Mixed-effect models controlling for those decisions were fitted, with the outcome measure used in each study included as a predictor of effects.
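The rule for choosing among multiple follow-up time points can be made concrete. In the minimal sketch below, "closest to those used by the other comparisons" is operationalized as closest to the median follow-up length (in months) of the other comparisons. Ties are broken toward the shorter time point. Both choices are assumptions, and the function name is hypothetical.

```python
from statistics import median


def pick_follow_up(options, others):
    """Pick the follow-up (months) closest to the median of the other comparisons.

    Ties go to the shorter follow-up (an assumption of this sketch).
    """
    reference = median(others)
    return min(options, key=lambda months: (abs(months - reference), months))


# One study offers 6- and 12-month follow-ups; all other comparisons stop earlier.
print(pick_follow_up([6, 12], [3, 3, 6, 2, 3]))  # 6
```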
To explore potential moderators, mixed-effect models of the primary outcomes (anxiety and worry) were fitted using the Knapp-Hartung method to reduce the chance of Type I error. Following the recommendations of Fu et al. (2011), and as an absolute lower limit, moderators were explored statistically only if enough data were available. For a continuous moderator, at least six moderate-to-large studies were required. For a categorical moderator, at least four moderate-to-large studies per subgroup were required. For baseline symptom severity, relative severity was calculated separately for each outcome measure. Each study's observed baseline score was compared with the average baseline score across the studies reporting that measure, and the average was subtracted from the observed score (Chaimani, 2015).
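The feasibility thresholds and the relative-severity transformation can be sketched as below. The sketch simply counts studies with available data. It does not judge whether a study is "moderate-to-large" in size, which would need an additional criterion not specified here. Requiring at least two subgroups for a categorical moderator is also an assumption, and all names are hypothetical.

```python
from collections import Counter
from statistics import mean

MIN_CONTINUOUS = 6     # studies with data for a continuous moderator
MIN_PER_SUBGROUP = 4   # studies per level of a categorical moderator


def continuous_feasible(values):
    """True if enough studies report the continuous moderator (None = missing)."""
    return sum(v is not None for v in values) >= MIN_CONTINUOUS


def categorical_feasible(levels):
    """True if there are at least two subgroups and every one is large enough."""
    counts = Counter(level for level in levels if level is not None)
    return len(counts) >= 2 and min(counts.values()) >= MIN_PER_SUBGROUP


def relative_severity(baselines):
    """Center baseline means on one measure: observed score minus the cross-study average."""
    average = mean(baselines.values())
    return {study: score - average for study, score in baselines.items()}


print(categorical_feasible(["iCBT"] * 14 + ["other"] * 4))  # True
print(categorical_feasible(["iCBT"] * 20 + ["other"] * 3))  # False
print(relative_severity({"A": 12.0, "B": 14.0, "C": 16.0}))  # {'A': -2.0, 'B': 0.0, 'C': 2.0}
```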
Publication bias was examined using funnel plots. Egger's regression test (Egger et al., 1997) was used to evaluate possible funnel plot asymmetry.
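Egger's test regresses each study's standardized effect (g divided by its standard error) on its precision (1/SE). An intercept far from zero indicates funnel plot asymmetry. The standard-library sketch below returns the intercept and its t statistic. The t statistic would then be compared with a t distribution on k - 2 degrees of freedom, for instance via `scipy.stats.t.sf`. The p-value step is left out to keep the example dependency-free, and the input values are made up.

```python
import math


def egger_test(g, se):
    """Egger's regression g/se = intercept + slope * (1/se); returns (intercept, t, df)."""
    k = len(g)
    x = [1 / s for s in se]                 # precision
    z = [gi / s for gi, s in zip(g, se)]    # standardized effect
    x_bar, z_bar = sum(x) / k, sum(z) / k
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    sxz = sum((xi - x_bar) * (zi - z_bar) for xi, zi in zip(x, z))
    slope = sxz / sxx
    intercept = z_bar - slope * x_bar
    residuals = [zi - (intercept + slope * xi) for xi, zi in zip(x, z)]
    df = k - 2
    s2 = sum(r**2 for r in residuals) / df  # residual variance of the OLS fit
    se_intercept = math.sqrt(s2 * (1 / k + x_bar**2 / sxx))
    return intercept, intercept / se_intercept, df


b0, t_stat, df = egger_test([1.0, 0.75, 1.0, 0.625], [0.5, 0.25, 0.2, 0.125])
print(round(b0, 2), df)  # 1.28 2
```

With only a handful of studies the test has little power, so a nonsignificant intercept is weak evidence of symmetry.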

| Selection and inclusion of studies
The electronic database searches yielded 4165 records, and four further records were identified through other sources. After duplicates were removed, 2401 records remained. Of these, 2252 were excluded after review of the title and abstract, leaving 149 potentially eligible records. Full-text versions of these articles were obtained and examined for eligibility. Of those considered eligible, three transdiagnostic studies had to be excluded because the authors did not provide the requested data for GAD participants. Finally, nine RCTs fulfilled all eligibility criteria and were included. In addition, 11 articles were carried over from the original meta-analysis (Richards et al., 2015), bringing the total number of studies included in the analysis to 20. Figure 1 shows the results of the systematic searches, the flow of exclusions, and the reasons for exclusion.
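The record counts in the flow of study selection can be checked with simple arithmetic. The short sketch below only restates numbers reported in this paragraph; the variable names are ours.

```python
identified = 4165 + 4            # database records + records from other sources
after_deduplication = 2401
excluded_at_screening = 2252     # title/abstract exclusions

duplicates_removed = identified - after_deduplication
full_text_assessed = after_deduplication - excluded_at_screening
total_studies = 9 + 11           # new RCTs + articles carried over from Richards et al. (2015)

print(duplicates_removed, full_text_assessed, total_studies)  # 1768 149 20
```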

| Description of included studies
The main characteristics of the included studies are presented in Table 1. Across the 20 studies, a total of 1333 participants were included: 767 in treatment conditions and 566 in control conditions. Sample size ranged from 13 (Bell et al., 2012) to 199 (Richards et al., 2020), with a mean sample size of 52. All participants were adults, ranging were conducted to ensure the small numbers of participants without a full GAD diagnosis across these studies did not unduly influence meta-analytic outcomes.
In 18 comparisons, regular support was provided by a qualified or soon-to-be-qualified therapist or clinician. In five, support came from psychological well-being practitioners, nonspecialist psychologists, technicians, or researchers, and another study was unsupported. Support was mainly offered to reinforce engagement, normalize difficulties, and provide guidance throughout program completion. It consisted of individual feedback, answers to questions, and feedback on assignments. Support was predominantly given by email or telephone, but also by text messaging, webpage messaging, video conferencing, and online dis-

| Program completion and support utilization
Some articles did not allow exact percentages to be calculated, so for each study the average percentage of the intervention program completed by participants was estimated within four categories (0%-25%, 26%-50%, 51%-75%, 76%-100%). Program completion was addressed in 19 of the 20 studies; one study did not include measures of program completion (Christensen et al., 2014). For most interventions (n = 15), the researchers reported an average program completion of 76%-100%. Six reported 51%-75%, and one reported 26%-50%. The average time supporters spent per person over the course of treatment was 77.04 min across the 15 studies providing this information. This ranged from 18.15 min (Newby et al., 2013) to 130 min (Titov et al., 2009).
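Because some articles did not allow exact percentages, completion was recorded in bands. Below is a small sketch of that binning. It assumes an estimated completion percentage is available and that fractional values fall into the next band up (e.g., 25.5% into 26%-50%); the function name is hypothetical.

```python
def completion_band(percent):
    """Map an average completion percentage to the four categories used in the review."""
    if not 0 <= percent <= 100:
        raise ValueError("completion must lie between 0 and 100")
    if percent <= 25:
        return "0%-25%"
    if percent <= 50:
        return "26%-50%"
    if percent <= 75:
        return "51%-75%"
    return "76%-100%"


print([completion_band(p) for p in (12, 50, 68, 91)])
# ['0%-25%', '26%-50%', '51%-75%', '76%-100%']
```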

| Intervention characteristics
Overall, nine interventions were disorder-specific and 15 were transdiagnostic. The interventions were based on the following approaches:
- conventional (second wave) CBT: 18 interventions, seven of which were disorder-specific;
- cognitive bias modification: two;
- psychodynamic therapy: one (disorder-specific);
- extinction therapy: one;
- acceptance-based behavior therapy: one (disorder-specific);
- affect-focused psychodynamic psychotherapy: one.

Most interventions consisted of eight modules (i.e., units of content delivered to participants). Specifically, there were 12 interventions with 8 modules, 4 with 6 modules, 2 with 10 modules, 3 with 7 modules, 1 with 5 modules, 1 with 4 modules, and 1 with a range of 3-6 modules.

Figure 1. Flowchart of study inclusion and exclusion. GAD, generalized anxiety disorder; RCT, randomized controlled trial.
With the exception of the two interventions based on cognitive bias modification, most interventions included some combination of the following nonspecific intervention components: psychoeduca-

However, some studies did not blind participants and care providers. In these studies, the provision of all other treatments and care, as well as the number of withdrawals and losses to follow-up, were the same in each randomized group, which may have helped to minimize the risk of bias associated with lack of blinding. Among studies in which outcome assessors were not blinded, none reported specific methods used to avoid ascertainment bias (i.e., systematic differences in outcome assessment). Only 35% (7/20) of studies adhered to the same follow-up schedule for all randomized groups. Discrepancies often involved shorter follow-up schedules for wait-list groups, possibly because of ethical considerations surrounding the withholding of treatment. Quality ratings for included studies are displayed in Figure B1 (see Appendix B).

| Quality of studies
| Meta-analysis of primary and secondary outcomes

| Random-effect model for anxiety

| Random-effect model for quality of life
In terms of quality of life, outcomes across seven comparisons (six studies) suggested a small-to-moderate effect in favor of treatment (g = 0.33; 95% CI, 0.06, 0.59; p = .024; see Appendix B, Figure B4). While the I² and Q-value metrics suggested little hetero-

| Sensitivity analysis
Sensitivity analyses were conducted to evaluate the potentially con- ). In addition, mixed-effect models (using REML estimation and the Knapp-Hartung method) were used to evaluate the possibility that outcomes were dependent on which outcome measure was used to assess each construct within each study.
Omnibus tests across these mixed-effect models were nonsignificant. This indicates that the outcome measure used within each study did not significantly predict effects for any of the five constructs (see Appendix B and Table B1 for details).

| Moderator analyses
Several potential moderators of effects were explored across the primary outcomes of anxiety and worry using mixed-effect models.
Other study-level moderators were deemed not feasible for inclusion Table B2. Average baseline symptom scores across studies are reported in Appendix B and Table B3.

| Publication bias
A funnel plot generated across studies assessing anxiety suggested some possible asymmetry; however, given the heterogeneity found within the random-effect model, as described above, and Egger's test

| DISCUSSION
The main aim of the current study was to evaluate the effectiveness of Internet-delivered interventions in treating symptoms of GAD. We These findings confirm and extend Richards et al.'s (2015) findings and are in line with recent meta-analyses evaluating psychological treatments for GAD (Carl et al., 2020) and iCBT for depression and anxiety (Andrews et al., 2018). Both studies found comparable large effect sizes for their primary outcomes (g = 0.76, n = 39; g = 0.80, n = 64). Crucially, however, neither focused specifically on the effects of Internet-delivered interventions on GAD. The former also included non-Internet-delivered interventions, and the latter included only Internet-delivered interventions based on CBT.
This distinction is key. A focus on the Internet as a treatment modality, irrespective of long-debated psychotherapeutic models, is what may help to fill existing gaps in the provision of treatment to those with GAD (Alonso et al., 2018; Richards et al., 2020). Among the secondary findings, large beneficial effects of psychological interventions for GAD on depressive symptoms have also been reported previously (Carl et al., 2020; Cuijpers et al., 2016). These effects are encouraging given the frequent comorbidity between MDD and GAD and the close associations between anxiety and depressive symptoms (Jacobson & Newman, 2017; Moffitt et al., 2007).
Similarly, our findings of large co-occurring functional improvements are important in the context of close ties between anxiety, depression, and functioning, especially in the longer term (Iancu et al., 2014; Lukat et al., 2017).
Nevertheless, meta-analytic results also suggested significant variability in effect sizes among the included studies. At least in part, this may reflect substantial variability among the Internet-delivered interventions implemented within these studies. Interventions varied in how long participants had access to them. More importantly, they included different components and targeted different underlying mechanisms, in line with the theoretical model each adhered to. Interestingly, and independent of those varying intervention characteristics, our analyses suggested that longer treatment durations are associated with better outcomes for GAD symptoms. More research is needed to understand the intricacies of dose-response relationships in Internet-delivered interventions (e.g., McVay et al., 2019). Even so, this finding may suggest that time, or at least some minimum amount of time, is essential for the various "active ingredients" of specific Internet-delivered interventions to work. This would be consistent with established face-to-face treatments for GAD, which usually span 16 or more weeks (American Psychological Association, 2016).
Internet-delivered interventions drawing on a wealth of psychological theories have been developed and implemented. Still, conventional "second wave" CBT-based interventions continue to dominate the field. As a result, the only feasible comparisons were between iCBT and other interventions (drawing on various theoretical models, including "third wave" CBT). These comparisons suggested somewhat larger effects for conventional CBT-based interventions. Importantly, however, interventions following other theoretical models were also found to be effective. Users do not necessarily choose iCBT when offered a choice, and unmet user preferences can affect treatment (Lindegaard et al., 2020; Williams et al., 2016). The further development and implementation of interventions grounded in different paradigms therefore seems advisable. As more research becomes available, it will be particularly interesting to compare the effectiveness of "second wave" CBT interventions with that of their "third wave" counterparts, as the latter have become increasingly popular in the treatment of GAD in recent years.

| Limitations
The study has several limitations.

ACKNOWLEDGMENTS
We wish to thank all authors who provided us with additional data for their studies.

In terms of baseline symptom severity, moderator analyses were feasible for only two outcome measures (GAD-7: n = 15; PSWQ: n = 17). Too few studies reported the remaining outcome measures to support mixed-effect models.