The basic method of meta-analysis is to combine data from different studies, weighting each study by its sample size and by the information it provides (effect estimates and estimates of error). The result is a common odds ratio and confidence interval that summarizes all the studies, rather than a crude average or a vote count (e.g., six studies were positive, three were negative). Consequently, meta-analysis has the potential to bring summarizing clarity to a confusing literature (1). Sometimes meta-analysis identifies a finding that was not statistically detectable in the individual studies, usually because of small sample size (type II error). At other times, as in the paper by Tondo et al. in this issue (2), meta-analysis refutes a common clinical belief; in this case, the belief that anticonvulsants are more effective than lithium for rapid-cycling bipolar disorder (RCBD). In fact, this meta-analysis suggests that RCBD is a globally treatment-refractory condition, in which treatment response (to any mood stabilizer) is worse than in non-rapid-cycling bipolar disorder.
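The weighting scheme described above can be made concrete. The following is a minimal sketch of fixed-effect (inverse-variance) pooling on the log-odds scale; the odds ratios and confidence intervals are invented purely for illustration and do not come from any real trial.

```python
import math

# Hypothetical per-study results: odds ratios with 95% confidence intervals.
# These numbers are illustrative only.
studies = [
    {"or": 0.8, "ci": (0.5, 1.3)},
    {"or": 1.1, "ci": (0.7, 1.7)},
    {"or": 0.9, "ci": (0.6, 1.4)},
]

def pooled_odds_ratio(studies):
    """Fixed-effect (inverse-variance) pooling of log odds ratios."""
    num = den = 0.0
    for s in studies:
        log_or = math.log(s["or"])
        lo, hi = s["ci"]
        # Standard error recovered from the 95% CI width on the log scale:
        # (ln hi - ln lo) / (2 * 1.96)
        se = (math.log(hi) - math.log(lo)) / (2 * 1.96)
        w = 1.0 / se**2          # weight = inverse variance: precise studies count more
        num += w * log_or
        den += w
    pooled = num / den
    se_pooled = math.sqrt(1.0 / den)
    ci = (math.exp(pooled - 1.96 * se_pooled),
          math.exp(pooled + 1.96 * se_pooled))
    return math.exp(pooled), ci
```

Note that the pooled estimate is a precision-weighted average, not a crude mean of the three odds ratios, and its confidence interval is narrower than any single study's: that is the sense in which meta-analysis can recover a signal the individual studies were too small to detect.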

Unfortunately, meta-analysis seems a bit esoteric. In fact, it is simply a quantitative analogue of what clinicians and researchers do when evaluating the scientific literature. In light of their experience and of the published evidence, they come to believe that a treatment is more or less effective. Instead of personal judgment alone or a merely narrative review of what has been published, a systematic review attempts to assess, in a way that others can reproduce, all the available evidence; a meta-analysis simply adds a numerical dimension where appropriate, without devaluing the narrative approach. Meta-analytical summaries do not constitute metaphysical truths, but they seek to present the most objective way of reviewing the evidence. Previous articles in this journal are good examples of the benefits of this technique (3, 4).

Yet meta-analysis has its critics, primarily because of the risk of three biases: the 'garbage in, garbage out' problem, where poorly performed individual studies are seen as unworthy of assessment in systematic reviews; heterogeneity, where studies with different methods are seen as too dissimilar to allow quantitative summing; and publication bias (the file drawer problem), where negative studies are less likely to be published (5). It is often argued that, because of these biases, meta-analysis of treatment studies should be limited to randomized clinical trials (RCTs), as in the Cochrane Collaboration. Although RCTs represent the most reliable way to establish a cause-effect relationship, we believe that this perspective represents a reification of the concept of randomization.

The problem of rapid-cycling bipolar disorder (RCBD) is exactly the type of question that demonstrates the need to apply meta-analysis, carefully and cautiously, to observational data. There are essentially no randomized data to meta-analyze, and it is likely that there never will be, because of the expense and difficulty of studying this population. Yet there are many clinical misconceptions, which meta-analysis clears up by correcting two methodological errors. First, there is the 'apples and oranges' error, in which one compares results from different studies, based on different samples, instead of only making comparisons within samples. In this meta-analysis, the reader is guided to the four comparative studies in which lithium and anticonvulsants were compared in the same population, and no differences were found. Secondly, there is a lack of recognition of the concept of levels of evidence. As discussed in the evidence-based medicine (EBM) literature, this concept allows one to weigh the evidence, putting more weight on more rigorous, larger, better designed studies. This central classification, based primarily on the epidemiological concept of validity and secondarily on power (sample size), constitutes an indispensable framework for understanding the quality of the inferences we draw.

In Table 1, we offer our version of the five levels of evidence, derived from the work of leaders in the EBM movement (6) and adapted by us to the state of the psychiatric literature. The data suggesting benefit with anticonvulsants involve level III non-randomized evidence, and thus carry less weight than the level I randomized data suggesting inefficacy of lithium in rapid-cycling bipolar disorder; definitive judgments of the superiority of one agent over another are therefore precluded.

Table 1.  Levels of evidence
Level I: Double-blind randomized trials
 Ia: Placebo-controlled monotherapy
 Ib: Non-placebo-controlled comparison trials, or placebo-controlled  add-on therapy trials
Level II: Open randomized trials
Level III: Observational studies
 IIIa: Nonrandomized, controlled studies
 IIIb: Large non-randomized, uncontrolled studies (n > 100)
 IIIc: Medium-sized non-randomized, uncontrolled studies (100 > n > 50)
Level IV: Small observational studies (non-randomized, uncontrolled) (50 > n > 10)
Level V: Case series (n < 10), case report (n = 1), expert opinion

Nonetheless, skepticism about meta-analysis of observational studies persists. For instance, we observed a reviewer seriously argue that observational studies should simply be ignored, and that if not enough was known about an issue, patients should be offered randomization where available, or not treated at all. One imagines this academician quite literally flipping a coin at the bedside to decide on a treatment for a particular patient. This kind of extremism, a misinterpretation of EBM, vulgarizes the concept of levels of evidence. It is as if level I were the only important level, and all the rest were equally useless. In fact, the differentiation among levels II to V is meant to indicate that they differ from each other, and that, for instance, large level III non-randomized datasets are much more meaningful than a clinical anecdote from level V. Statistical techniques, such as multivariate regression, allow for fine-tuning of non-randomized datasets to reduce some (though not all) confounding, thus strengthening the validity of non-randomized data. Indeed, there is some evidence that high-quality observational studies may produce results that are similar in magnitude and direction to randomized studies (6). Given the expense and difficulty of conducting randomized studies, especially in psychiatry, where patients and the public are often wary of research studies, it makes sense to make as much use of observational data as possible.

There are also certain philosophical assumptions underlying this controversy. Those who think that only RCTs provide useful information take a 'frequentist' approach, believing that a single study should be considered in isolation based on its own methods, and further, that a single study can provide either definitive confirmation or refutation of a hypothesis. The alternative approach, loosely labeled 'Bayesian' (though there are many varieties of it), holds that no single study is definitive and that all studies serve as modifiers of previous viewpoints ('prior probability') leading to newer viewpoints ('posterior probability') (8). Meta-analysis can be thought of as, loosely speaking, a quantitative way to put all studies on a topic into context and then to draw conclusions, much as in the Bayesian conceptual framework of statistics. We think that most clinicians function as intuitive Bayesians (9). They have certain opinions based on their own clinical experiences and their understanding of the scientific literature, and they use new studies to alter those opinions. Meta-analysis is consistent with the practice of clinical medicine in this sense, whereas the frequentist insistence on the ideal perfect randomized study is far from clinical reality. Perhaps it is this division that leads to the wariness of many clinicians regarding EBM (10). This need not be the case if proponents of EBM avoided reifying randomization, and if clinicians were more aware of the concept of levels of evidence.
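The intuitive-Bayesian picture described above can be sketched numerically. The following toy example, with probabilities invented purely for illustration, shows a clinician's belief that a treatment works being revised study by study via Bayes' theorem: each posterior becomes the prior for the next study.

```python
# Illustrative Bayesian updating of a clinical belief.
# All probabilities below are invented for illustration.

def update(prior, p_result_if_works, p_result_if_not):
    """Bayes' theorem: posterior = P(works | observed study result)."""
    numerator = prior * p_result_if_works
    return numerator / (numerator + (1 - prior) * p_result_if_not)

belief = 0.5  # start agnostic: treatment as likely to work as not

# Each tuple: P(this study's result | treatment works),
#             P(this study's result | treatment does not work).
# Two supportive studies, then one negative study.
study_results = [(0.8, 0.3), (0.7, 0.4), (0.2, 0.6)]

for p_works, p_not in study_results:
    belief = update(belief, p_works, p_not)
    # belief rises after supportive results, falls after the negative one
```

No single study settles the question: the negative third study pulls the belief back down without erasing the evidence of the first two, which is exactly the contrast with the frequentist view that one study can definitively confirm or refute a hypothesis.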

We can be scientific physicians, and still humanistic, in the tradition of great clinician/scientists like William Osler. Where feasible, randomized studies will remain the gold standard. But we should not let the perfect be the enemy of the good.

Supported by NIMH Research Career Award (MH-64189) (SNG).
