Identification and selection of studies
Several strategies were used to identify relevant studies. We searched four major bibliographical databases (PubMed, PsycINFO, EMBASE and the Cochrane database of randomized trials), combining terms indicative of each of the disorders with terms indicative of psychological treatment (both MeSH terms and text words) and of randomized controlled trials. We also checked the references of 116 earlier meta-analyses of psychological treatments for the included disorders. Details of the searches and the exact search strings are given in Figure 1.
We included randomized trials in which the effects of a psychological treatment were directly compared with the effects of antidepressant medication in adults with depressive disorder, panic disorder with or without agoraphobia, GAD, SAD, OCD, or post-traumatic stress disorder (PTSD). Only studies in which subjects met diagnostic criteria for the disorder according to a structured diagnostic interview – such as the Structured Clinical Interview for DSM-IV (SCID), the Composite International Diagnostic Interview (CIDI) or the Mini International Neuropsychiatric Interview (MINI) – were included. Comorbid mental or somatic disorders were not used as an exclusion criterion. Studies on inpatients, and on adolescents and children (below 18 years of age), were excluded. We also excluded maintenance studies, which are aimed at people who have already recovered or partly recovered after an earlier treatment, as well as studies of other types of medication, such as benzodiazepines for anxiety disorders. Studies in English, German, Spanish and Dutch were considered for inclusion.
Quality assessment and data extraction
We evaluated the quality of included studies using the Cochrane Collaboration “risk of bias” assessment tool. This tool assesses possible sources of bias in randomized trials, including the adequate generation of the allocation sequence, the concealment of allocation to conditions, the prevention of knowledge of the allocated intervention (masking of assessors), and the handling of incomplete outcome data (rated as positive when intention-to-treat analyses were conducted, meaning that all randomized patients were included in the analyses). The assessment was conducted by two independent researchers, and disagreements were resolved through discussion.
We also coded participant characteristics (disorder, recruitment method, target group); the type of antidepressant used (selective serotonin reuptake inhibitor (SSRI), tricyclic antidepressant (TCA), monoamine oxidase inhibitor (MAOI), other, or a protocolized treatment including several antidepressants); and the characteristics of the psychotherapy (format, number of sessions, and type of psychotherapy). The types of psychotherapy we identified were cognitive-behavioral therapy (CBT), interpersonal psychotherapy (IPT), problem-solving therapy, non-directive supportive counselling, psychodynamic psychotherapy, and others. Although the CBTs used a mix of different techniques, we clustered them together in one group. We rated a therapy as CBT when it included cognitive restructuring or a behavioral approach (such as exposure and response prevention). When a therapy used a mix of CBT and IPT, we rated it as “other”, along with other therapeutic approaches.
For each comparison between a psychotherapy and a pharmacotherapy, we calculated the effect size indicating the difference between the two groups at post-test (Hedges' g). Effect sizes were calculated by subtracting (at post-test) the average score of the psychotherapy group from the average score of the pharmacotherapy group, and dividing the result by the pooled standard deviation. Because some studies had relatively small sample sizes, we corrected the effect size for small sample bias.
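For illustration, the computation described above can be sketched in a few lines of Python. This is a generic sketch of the standard Hedges' g formula, not the software used in the study; the function and variable names are our own:

```python
import math

def hedges_g(mean_pharm, mean_psy, sd_pharm, sd_psy, n_pharm, n_psy):
    """Hedges' g: standardized difference (pharmacotherapy minus
    psychotherapy) at post-test, corrected for small-sample bias."""
    df = n_pharm + n_psy - 2
    # Pooled standard deviation across the two groups
    sd_pooled = math.sqrt(((n_pharm - 1) * sd_pharm ** 2 +
                           (n_psy - 1) * sd_psy ** 2) / df)
    d = (mean_pharm - mean_psy) / sd_pooled  # uncorrected Cohen's d
    j = 1 - 3 / (4 * df - 1)                 # small-sample correction factor
    return j * d
```

With equal group sizes the correction factor shrinks d only slightly (e.g., by about 2% at n=20 per arm), but it matters for the smallest trials in a meta-analysis.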
In the calculations of effect sizes in studies of patients with depressive disorders, we used only those instruments that explicitly measured symptoms of depression. In studies examining anxiety disorders, we only used instruments that explicitly measured symptoms of anxiety. If more than one measure was used, the mean of the effect sizes was calculated, so that each study provided only one effect size. If means and standard deviations were not reported, we used the procedures of the Comprehensive Meta-Analysis software (version 2.2.021) to calculate the effect size from dichotomous outcomes; and if these were not available either, we used other statistics (such as a t value or p value). To calculate pooled mean effect sizes, we also used the Comprehensive Meta-Analysis software. Because we expected considerable heterogeneity among the studies, we employed a random effects pooling model in all analyses.
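The pooling itself was performed in Comprehensive Meta-Analysis; purely for illustration, a common random effects estimator (DerSimonian-Laird) can be sketched as follows. The choice of estimator here is our assumption for the sketch, not a statement about the software's internals:

```python
import math

def pool_random_effects(effects, variances):
    """Pool study effect sizes under a DerSimonian-Laird random effects
    model; returns (pooled effect, standard error, tau^2)."""
    w = [1.0 / v for v in variances]  # fixed-effect weights
    fixed = sum(wi * g for wi, g in zip(w, effects)) / sum(w)
    # Cochran's Q: weighted squared deviations from the fixed-effect mean
    q = sum(wi * (g - fixed) ** 2 for wi, g in zip(w, effects))
    c = sum(w) - sum(wi ** 2 for wi in w) / sum(w)
    tau2 = max(0.0, (q - (len(effects) - 1)) / c)  # between-study variance
    w_re = [1.0 / (v + tau2) for v in variances]   # random-effects weights
    pooled = sum(wi * g for wi, g in zip(w_re, effects)) / sum(w_re)
    return pooled, math.sqrt(1.0 / sum(w_re)), tau2
```

When the studies are homogeneous, tau² is truncated at zero and the estimate reduces to the fixed-effect (inverse-variance) mean.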
We only examined the differential effects at post-test and did not look at the longer-term effects. The types of outcomes reported at follow-up and the follow-up periods differed widely between studies. Furthermore, some studies reported only naturalistic outcomes, while others delivered booster sessions and maintenance treatments during the whole follow-up period or part of it. Because of these large differences, we decided it was not meaningful to pool the results of these outcomes.
As a test of the homogeneity of effect sizes, we calculated the I2 statistic. A value of 0% indicates no observed heterogeneity, and higher values indicate increasing heterogeneity, with 25% as low, 50% as moderate, and 75% as high heterogeneity. We calculated 95% confidence intervals around I2 using the non-central chi-squared-based approach within the Heterogi module for Stata.
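I2 is a simple transformation of Cochran's Q. A minimal sketch, using the standard Higgins and Thompson formula (the function name is ours):

```python
def i_squared(q, df):
    """I^2 (%) from Cochran's Q and its degrees of freedom
    (number of studies minus one); truncated at zero."""
    if df <= 0 or q <= df:
        return 0.0
    return 100.0 * (q - df) / q
```

For example, Q twice its degrees of freedom gives I2 = 50% (moderate heterogeneity), and Q four times its degrees of freedom gives 75% (high).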
We conducted subgroup analyses according to the mixed effects model, in which studies within subgroups are pooled with the random effects model, while tests for significant differences between subgroups are conducted with the fixed effects model. For continuous variables, we used meta-regression analyses to test whether there was a significant relationship between the continuous variable and the effect size, as indicated by a Z value and an associated p value.
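The fixed effects comparison between two subgroups amounts to a Z-test on the difference between their pooled effects. A sketch under that assumption (our own, not the software's documented procedure):

```python
import math

def subgroup_difference(g1, se1, g2, se2):
    """Fixed effects test of the difference between two subgroup
    pooled effect sizes; returns (Z, two-sided p value)."""
    z = (g1 - g2) / math.sqrt(se1 ** 2 + se2 ** 2)
    # Two-sided p from the standard normal CDF (via the error function)
    p = 2.0 * (1.0 - 0.5 * (1.0 + math.erf(abs(z) / math.sqrt(2.0))))
    return z, p
```

The same Z-and-p logic applies to the meta-regression slope for a continuous moderator: the coefficient divided by its standard error is referred to the standard normal distribution.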
We tested for publication bias by inspecting the funnel plot of primary outcome measures and by Duval and Tweedie's trim and fill procedure, which yields an estimate of the effect size after publication bias has been taken into account. We also conducted Egger's test of the intercept to quantify the bias captured by the funnel plot and to test whether it was significant.
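Egger's test regresses each study's standardized effect on its precision; an intercept far from zero indicates funnel plot asymmetry. A minimal sketch of the intercept computation (our own implementation, for illustration only):

```python
def egger_intercept(effects, standard_errors):
    """Intercept of Egger's regression of (g / se) on (1 / se).
    A non-zero intercept suggests funnel plot asymmetry."""
    x = [1.0 / se for se in standard_errors]              # precision
    y = [g / se for g, se in zip(effects, standard_errors)]  # standardized effect
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    # Ordinary least squares slope and intercept
    slope = (sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y)) /
             sum((xi - mean_x) ** 2 for xi in x))
    return mean_y - slope * mean_x
```

In a perfectly symmetric funnel (effects unrelated to study precision), the intercept is zero; the significance test divides the intercept by its standard error, which is omitted here for brevity.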
Multivariate meta-regression analyses were conducted with the effect size as the dependent variable. To decide which variables should be entered as predictors in the regression model, we first defined a reference group within each category of variables. To avoid collinearity among the predictors, we then calculated the correlations between all predictors (except the reference variables). Because no correlation was higher than r=0.60, all predictors could be entered into the regression models. Multivariate regression analyses were conducted in Stata MP, version 11 for Mac.
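The collinearity screen described above amounts to computing pairwise Pearson correlations among the candidate predictors and flagging any pair above the cut-off. A sketch (the function name and the r=0.60 threshold usage are ours; the threshold value itself is from the text):

```python
import itertools
import math

def max_abs_correlation(predictors):
    """Largest |Pearson r| among all pairs of predictor columns.
    Pairs exceeding a chosen cut-off (e.g., r = 0.60) would be flagged."""
    def pearson(a, b):
        n = len(a)
        ma, mb = sum(a) / n, sum(b) / n
        cov = sum((x - ma) * (y - mb) for x, y in zip(a, b))
        va = sum((x - ma) ** 2 for x in a)
        vb = sum((y - mb) ** 2 for y in b)
        return cov / math.sqrt(va * vb)
    return max(abs(pearson(a, b))
               for a, b in itertools.combinations(predictors, 2))
```

If the maximum absolute correlation stays below the cut-off, all predictors can enter the regression model together, as was the case here.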