In a recent article published in Arthritis Care & Research, Hall et al (1) performed a meta-analysis of data from 6 randomized controlled trials (RCTs) of Tai Chi for individuals with chronic arthritis, and the authors concluded that "Tai Chi has a small positive effect on pain and disability in people with arthritis." The baseline-weighted mean pain scores of the included trials ranged from 18 to 44 out of a maximum of 100 (based on the 4 of 6 studies for which English-language articles were available). Contrary to the authors' characterization of the overall effect as small, the 10.1-point average decrease in pain was actually quite substantial given the low baseline pain scores. Indeed, when converted to a standardized mean difference effect size, the resulting Hedges' g of 0.67 is typically interpreted as moderate to large. Hall and colleagues noted that the included trials were small and of low methodologic quality, and that reviewing only published studies may have led to an overestimation of the effect of Tai Chi. They did not, however, appear to recognize the degree to which combining the effects of a group of small, underpowered studies can inflate effect estimates. Cell sizes in the RCTs included in the meta-analysis by Hall et al ranged from 8 to 56 subjects, with only 1 treatment cell having more than 32 subjects.
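For readers wishing to reproduce the conversion, the standardized mean difference referred to above is Hedges' g, the between-group mean difference divided by the pooled standard deviation, with a small-sample correction. A minimal sketch (the group means, standard deviations, and sizes below are generic symbols, not values taken from the reviewed trials):

```latex
% Hedges' g: standardized mean difference with small-sample correction.
% M_T, M_C = group means; s_p = pooled SD; n_T, n_C = group sizes.
g \;=\; \left(1 - \frac{3}{4(n_T + n_C) - 9}\right)\frac{M_T - M_C}{s_p},
\qquad
s_p \;=\; \sqrt{\frac{(n_T - 1)\,s_T^2 + (n_C - 1)\,s_C^2}{n_T + n_C - 2}}
```

Note that a 10.1-point mean decrease corresponding to g = 0.67 implies a pooled standard deviation of roughly 15 points on the 100-point pain scale.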
Kraemer and colleagues have shown that reliance on small, underpowered trials in meta-analyses results in substantially overestimated effect sizes, owing to the tendency for higher publication rates among studies reporting positive and statistically significant effects (2). Small trials that do not reach statistical significance are usually not published, and small trials that are published generally must have sizeable effect sizes just to meet the minimum threshold for statistical significance. The studies included in the meta-analysis by Hall et al, for instance, would have needed effect sizes of 0.4 to 1.1 for pain reduction just to meet the minimum threshold for statistical significance (P < 0.05). The problem is even worse than that, however, because small studies that cross the P < 0.05 threshold do so by varying degrees, with some producing quite large effect sizes even when the null hypothesis of no treatment effect is true. Kraemer et al showed that when the true effect of a treatment is zero and there are 20 subjects per subgroup (e.g., in the treatment and control groups), the estimated standardized effect size for the mean difference in a meta-analysis of only statistically significant RCTs will be between 0.90 and 1.00. With n = 50 per subgroup, the expected effect size for such a group of null studies is ∼0.60.
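As a rough sketch of how such thresholds arise: for a two-sample t test with n subjects per cell and a two-sided α of 0.05, the smallest standardized effect size that can reach statistical significance is approximately

```latex
% Minimum detectable standardized effect for a two-sample t test,
% two-sided alpha = 0.05, n subjects per cell (df = 2n - 2):
d_{\min} \;=\; t_{0.975,\,2n-2}\,\sqrt{\frac{2}{n}}
```

With n = 56 per cell, $d_{\min} \approx 1.98\sqrt{2/56} \approx 0.4$; with n = 8 per cell, $d_{\min} \approx 2.14\sqrt{2/8} \approx 1.1$, consistent with the 0.4 to 1.1 range cited above.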
Not all of the RCTs included in the meta-analysis by Hall and colleagues reported statistically significant reductions in pain. Most did, however, and not surprisingly, the results of the meta-analysis conformed to what would be expected based on the work of Kraemer et al and similar work by others (3, 4). Meta-analysis is an important technique that increasingly informs health care policy as well as clinical practice guidelines and recommendations, and researchers and clinicians need to be aware of the pitfalls that can lead to significant biases in reported results. Hall et al would have done well to conclude that the literature on Tai Chi for chronic pain was not yet ready for meta-analysis.