Results of the search
Initially, we identified 303 references. After reading the abstracts, 265 references were considered relevant for our review and retrieved for more detailed evaluation. The search found 37 additional studies written in Chinese. We commissioned a professional translator to translate these papers in full. Translation is still ongoing, so in the present review we classified all Chinese studies as awaiting assessment (we will include them in the next update of the review, which is expected in two years' time). An additional four studies were considered as awaiting assessment because the papers reported insufficient information to decide about inclusion or exclusion (Ahlfors 1988; Galecki 2004; Moeller 1986; Thomas 2008). We contacted the corresponding authors and, at the time the review was submitted, were still waiting for their reply and further information. We identified two ongoing studies. Although the search was thorough, it is possible that some unpublished studies have not been identified.
A total of 37 studies were included in this systematic review. Of these, four trials were unpublished (29060/785; Lu 10-171, 83-01; Lu 10-171,79-01; SCT-MD-02). Attempts to contact authors for additional information were successful in seven cases (with additional data provided by authors) and unsuccessful in 13.
The mean sample size per arm was 107 participants (range 17-303). Sixteen studies recruited fewer than 100 participants overall.
The great majority of included studies were reported to be double-blind (28 out of 37 RCTs, that is 75.7%).
The great majority of included studies had been carried out in Europe or in the US (29 out of 37 RCTs, that is 78.4%). Two studies randomised patients in China (Hsu 2011; Ou 2010), three in India (Khanzode 2003; Lalit 2004; Matreja 2007) and one in Russia (Yevtushenko 2007).
Four studies randomised only elderly patients (Allard 2004; Karlsson 2000; Kyle 1998; Navarro 2001) and 22 studies randomised only patients aged between 18 and 65 years (59.5%). The remaining studies either randomised both adult and elderly patients or did not clearly report the age range.
Only three studies (8.1%) included patients with bipolar disorder (Bougerol 1997a; Hosak 1999; Timmerman 1993). As per protocol, RCTs were included in the present review only if patients with bipolar disorder made up less than 20% of the participants in each study.
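The percentages quoted for these study characteristics are simple proportions of the 37 included RCTs. As a minimal sketch of that arithmetic (the helper name `pct` and the rounding to one decimal place are our own illustrative choices, not part of the review's methods):

```python
# Proportion of the 37 included RCTs showing a given characteristic.
# `pct` is a hypothetical helper introduced only for this illustration.
TOTAL_INCLUDED = 37

def pct(n_studies, total=TOTAL_INCLUDED):
    """Return n_studies as a percentage of total, rounded to one decimal."""
    return round(100 * n_studies / total, 1)

characteristics = {
    "double-blind": 28,
    "conducted in Europe or the US": 29,
    "patients aged 18 to 65 years only": 22,
    "included patients with bipolar disorder": 3,
}

for label, n in characteristics.items():
    print(f"{label}: {n}/{TOTAL_INCLUDED} = {pct(n)}%")
```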
Twenty trials enrolled only out-patients, four studies only in-patients (Andersen 1986; de Wilde 1985; Hosak 1999; Lu 10-171,79-01), seven recruited both in- and out-patients (Bougerol 1997a; Gravem 1987; Karlsson 2000; Lu 10-171, 83-01; Navarro 2001; Ou 2010; Shaw 1986) and three studies enrolled patients from general practice (Bougerol 1997b; Ekselius 1997; Lewis 2011). In the remaining three studies the setting was unclear. About two thirds of the participants were women. In 31 RCTs patients had a formal diagnosis of major depression (or major depressive disorder) according to DSM-III, DSM-III-R, DSM-IV or ICD-10 criteria. In six studies the diagnosis was based on different standardized research criteria (i.e., Feighner criteria).
Interventions and comparators
We found RCTs comparing citalopram with TCAs (amitriptyline, imipramine and nortriptyline), tetracyclics (mianserin and maprotiline), other SSRIs (escitalopram, fluoxetine, sertraline, fluvoxamine and paroxetine), one SNRI (namely, venlafaxine), one MAOI (moclobemide), other conventional ADs (mirtazapine and reboxetine) and one non-conventional AD (St John's wort, or hypericum). Hypericum, a member of the Hypericaceae family, has long been used in folk medicine for a range of indications including depressive disorders. It is licensed and widely used in Germany for the treatment of depressive, anxiety and sleep disorders, and in recent years it has also become increasingly popular in other European and non-European countries (Linde 2008).
Details on the included studies are as follows: nine studies (overall 1277 participants) comparing citalopram with TCAs (four studies versus amitriptyline, two versus imipramine, two versus nortriptyline and one versus clomipramine); three studies (overall 477 participants) comparing citalopram with tetracyclics (two studies versus mianserin and one study versus maprotiline); 18 studies (overall 4200 participants) comparing citalopram with SSRIs (seven studies versus escitalopram, four studies versus fluoxetine, four studies versus sertraline, one study versus fluvoxamine, one study versus paroxetine and one study versus either escitalopram or sertraline); and six studies (overall 1137 participants) comparing citalopram with SNRIs (one study versus each of the following drugs: venlafaxine and mirtazapine), with an MAOI (one study versus moclobemide), with other conventional psychotropic drugs (two studies versus reboxetine) and with non-conventional antidepressants (one study versus hypericum).
There were four three-arm trials: one study comparing citalopram (20 mg/day) with escitalopram 20 mg/day or escitalopram 10 mg/day; one study comparing citalopram (20-60 mg/day) with amitriptyline (150-300 mg/day) or fluoxetine (20-60 mg/day); one study comparing citalopram 10-30 mg/day with citalopram 20-60 mg/day or imipramine (50-150 mg/day); and one study comparing citalopram (20 mg/day) with escitalopram 10 mg/day or citalopram 10 mg/day. One four-arm trial compared citalopram 20 mg/day with citalopram 40 mg/day, paroxetine controlled-release 12.5 mg/day or paroxetine controlled-release 25 mg/day.
Of the 37 included studies, one study (Andersen 1986) did not report efficacy data and one study reported data split according to different genotypes (Lewis 2011). We were unable to contact the authors of these trials by any means and therefore could not obtain further data. By contrast, all 37 studies reported tolerability/acceptability data that could be entered into a meta-analysis. The great majority of the identified studies (34 out of 37 RCTs) used the MADRS or HRSD as the rating scale of choice for primary or secondary outcome measures. Among the 35 studies reporting dropouts due to any reason, 31 reported dropouts due to side effects. Twenty-eight studies reported the number of patients experiencing individual side effects.
Of the 265 references retrieved for more detailed evaluation, 214 articles did not meet our inclusion criteria and were excluded for one of the following reasons: duplicate publication (eight articles), wrong diagnosis (24 articles), wrong population (51 articles), wrong comparison or intervention (63 articles) and non-randomised or wrong design (68 articles). Fourteen additional studies were considered as awaiting assessment (overall we found 51 studies awaiting assessment - see above).
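The study flow reported above can be reconciled with a short tally. The sketch below is illustrative only: the variable names are ours, and it assumes that the 37 Chinese-language studies were counted separately from the 265 references retrieved for detailed evaluation, which is our reading of the text rather than something the review states explicitly.

```python
# Tally of the study flow using the counts reported in the text.
# Assumption: the 37 Chinese studies are NOT part of the 265 retrieved references.
retrieved = 265

excluded_by_reason = {
    "duplicate publication": 8,
    "wrong diagnosis": 24,
    "wrong population": 51,
    "wrong comparison or intervention": 63,
    "non-randomised or wrong design": 68,
}

included = 37                 # studies included in the review
awaiting_from_retrieved = 14  # awaiting assessment among the retrieved references
chinese_awaiting = 37         # Chinese studies awaiting translation

excluded = sum(excluded_by_reason.values())
accounted_for = excluded + included + awaiting_from_retrieved
total_awaiting = chinese_awaiting + awaiting_from_retrieved

print(f"Excluded: {excluded}")
print(f"Accounted for: {accounted_for} of {retrieved} retrieved")
print(f"Total awaiting assessment: {total_awaiting}")
```

Under this reading, the exclusion reasons sum to the 214 excluded articles, the 265 retrieved references are fully accounted for, and the total awaiting assessment matches the 51 studies reported.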