Contemporary Challenges in Deriving Summary Estimates of Comparative Effectiveness Using Meta-Analysis


  • Edward J. Mills has no conflicts to declare.

Edward J. Mills, Canada Research Chair in Global Health, Faculty of Health Sciences, University of Ottawa, Ottawa, ON, Canada V6Z 1Y6. E-mail:


A meta-analysis involves taking data from more than one study and analyzing them to derive a pooled estimate, commonly referred to as a summary estimate. Meta-analyses range from very simple to very complex, and a variety of software packages is available to aid the process. As with any good research, the most important thing is asking the right question. My own perspective on meta-analysis is very broad: I believe it is relevant to pool as much data as possible (i.e., for the most part, all apples are oranges and vice versa), and then to use the extensive data set to try to explain any differences in outcomes. The philosophy behind this approach is that if we expect a drug class to provide a largely similar direction of effect across a broad range of patients, then pooling data from studies of different drugs and different dosages is reasonable and should facilitate subsequent subgroup analyses. It remains most important, however, to define the questions to be addressed a priori.

Consistency of Effects

The consistency of the direction of effect across trials is the first parameter to consider when deciding whether to pool data. Ideally, studies examining interventions for a particular indication would all demonstrate a similar estimate of the size of the effect, and meta-analysis would further corroborate this. In reality, this does not happen very often, so a visual assessment of heterogeneity should be undertaken to decide whether meta-analysis is likely to be useful and to determine which data to pool. To put it very simply, a meta-analysis of four studies, two of which show a beneficial effect and two of which show a harmful effect, is not going to be informative without some understanding of why these studies have such disparate results. In the “real world,” some studies generally give positive results while others give negative results, but the overall direction of the effect tends to be relatively consistent (Fig. 1). Once the data are pooled, additional analyses based on prior knowledge, and planned in advance, can help to explore and explain the discrepancies.

Figure 1.

Visual assessment of heterogeneity (artificial data). CI, confidence interval.

Assumptions Prior to Pooling Data

A very narrow or very broad approach can be taken to pooling data. In the latter approach, data may be pooled across a variety of different interventions and patient populations, which allows for extensive exploration of the data that may help to identify causes of variability in outcomes and facilitate treatment comparisons. Causes of variability in outcomes across studies can be assessed in a variety of ways. For example, sensitivity analysis can be used to determine the impact of predefined parameters on the results of the meta-analysis. Taking statins for the secondary prevention of cardiovascular events as an example, studies primarily involving diabetic patients may be excluded from the pooled data set because these patients are a particularly high-risk group that is expected to have worse outcomes than the general study population. Data pooling may also be influenced by the logistics of predefined subgroup analyses of specific patient populations, interventions, or outcomes. Various tests can be applied to assess data heterogeneity and to gauge the consistency of subgroup analyses. The most common of these is the I-squared statistic, a measure of the proportion of the inconsistency in an analysis that cannot be explained by chance. It ranges between 0% and 100%, with lower values representing less heterogeneity. Meta-regression, a more sophisticated method for investigating heterogeneity of effects across studies, examines the relationship between one or more study parameters and the sizes of effect observed in the studies; again, software is available to perform this function.
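To make the I-squared statistic concrete, the following is a minimal sketch of how it is commonly computed from Cochran's Q, assuming each study contributes an effect estimate on an additive scale (for example, a log relative risk) and its within-study variance. The function name and the input data are illustrative assumptions, not values from the statin analysis.

```python
import math


def i_squared(effects, variances):
    """Return (Q, I-squared in percent) for a set of study results.

    effects   -- per-study effect estimates on an additive scale
                 (e.g. log relative risks)
    variances -- the corresponding within-study variances
    """
    weights = [1.0 / v for v in variances]  # inverse-variance weights
    pooled = sum(w * y for w, y in zip(weights, effects)) / sum(weights)
    # Cochran's Q: weighted squared deviations from the pooled estimate
    q = sum(w * (y - pooled) ** 2 for w, y in zip(weights, effects))
    df = len(effects) - 1
    # I-squared: share of Q in excess of what chance alone would give
    i2 = max(0.0, (q - df) / q) * 100.0 if q > 0 else 0.0
    return q, i2


# Artificial data: four hypothetical trials
log_rr = [math.log(rr) for rr in (0.70, 0.85, 0.95, 1.10)]
var = [0.02, 0.01, 0.015, 0.03]
q, i2 = i_squared(log_rr, var)
print(f"Q = {q:.2f}, I-squared = {i2:.0f}%")
```

Because I-squared is truncated at zero, a set of studies whose scatter is no larger than chance would predict reports 0% rather than a negative value.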

Comprehensive Meta-Analysis—Statins

Two models commonly used in meta-analysis are the fixed-effects and random-effects models. The former assumes that there is a single true value underlying all study results, so that differences between studies reflect sampling error alone. The latter assumes that the true effect varies from study to study; by allowing for greater variability across studies, it places greater relative weight on smaller studies. In our hands, meta-analysis of data from 41 studies of the efficacy of statins in cardiovascular disease, involving more than 41,000 participants, led to an estimated relative risk of all-cause mortality of 0.85 (confidence interval [CI], 0.81 to 0.90) with a fixed-effects model, and 0.83 (CI, 0.78 to 0.90) with the more conservative random-effects model. In general, because it is more conservative, the random-effects model is favored by statisticians; however, there are instances when the fixed-effects approach is more reasonable. For example, some of the statin trials included in the analysis had 10,000 participants, while others had fewer than 100. In this case, it could be assumed that the larger trials are of better quality than the smaller trials, so a fixed-effects model may be more appropriate. Statins have been extensively studied in clinical trials, and it is generally accepted that they are an effective drug class for the prevention of cardiovascular events in patients at increased risk of cardiovascular disease or with established cardiovascular disease. Consistent with this, both of our analytical models confirmed that statins are indeed an effective intervention that significantly reduces all-cause mortality. Notably, however, upon examining the data from the individual trials, it appears that in most cases statins do not have a statistically significant effect on this endpoint (Fig. 2). So, despite the fact that many of these trials included thousands of participants, many were underpowered to demonstrate an intervention effect of this nature because of the relatively low number of events.
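The difference between the two pooling models can be sketched in a few lines. The example below uses inverse-variance weighting for the fixed-effects estimate and the DerSimonian-Laird estimator of between-study variance for the random-effects estimate. This is one common choice and an assumption on my part; the source does not say which estimator was used for the statin analysis. The function name and the two-study data are illustrative only.

```python
import math


def pool(effects, variances):
    """Pool study effects (additive scale, e.g. log relative risks)
    under fixed-effects and DerSimonian-Laird random-effects models."""
    w = [1.0 / v for v in variances]
    sw = sum(w)
    fixed = sum(wi * yi for wi, yi in zip(w, effects)) / sw

    # Between-study variance (tau-squared) by the method of moments
    q = sum(wi * (yi - fixed) ** 2 for wi, yi in zip(w, effects))
    df = len(effects) - 1
    c = sw - sum(wi ** 2 for wi in w) / sw
    tau2 = max(0.0, (q - df) / c) if c > 0 else 0.0

    # Random-effects weights add tau-squared to every study's variance,
    # which flattens the weights and favours smaller studies
    w_star = [1.0 / (v + tau2) for v in variances]
    sw_star = sum(w_star)
    random = sum(wi * yi for wi, yi in zip(w_star, effects)) / sw_star

    return {
        "fixed": fixed,
        "se_fixed": math.sqrt(1.0 / sw),
        "random": random,
        "se_random": math.sqrt(1.0 / sw_star),
        "tau2": tau2,
    }


# Artificial data: one large (precise) and one small (imprecise) study
result = pool(effects=[0.0, 1.0], variances=[0.01, 0.04])
print(result)
```

In this artificial example the fixed-effects estimate sits close to the large study, while the random-effects estimate moves toward the simple average of the two studies and has a wider standard error, which is the sense in which the random-effects model is described as more conservative.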

Figure 2.

Effect of statins on all-cause mortality. CI, confidence interval.

Subgroup Analysis

A basic assumption in meta-analysis is that the event rates relative to the control group are expected to be similar across trials. The absolute risk differences, on the other hand, are less likely to be consistent, because some trials involve individuals from very high-risk populations, while others consider only lower-risk populations. A number of rules exist to guide the valid parsing of data by subgroup analysis. The first is that the questions must be defined a priori, so as to avoid observing apparently significant effects that have occurred by chance, the probability of which increases with multiple analyses. Given this increasing probability of finding a significant result by chance, each subgroup analysis should be one of only a small number of hypothesized effects, a ballpark number being one subgroup analysis per 10 trials. Another consideration in subgroup selection is whether the subgroup is importantly different from other subgroups. Going back to the example of statins, atorvastatin and lovastatin have relative risks of all-cause mortality of 0.78 and 0.82, respectively. In this case, the question becomes whether or not expert opinion considers this to be a particularly important difference in the context of the disease area. It is also important to assess the consistency of the subgroup effect, and whether or not there is a biological or social reason to support the subgroup analysis; for example, a subanalysis by drug type may be revealing if the drugs have different mechanisms of action or issues with tolerability.
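The way chance findings accumulate with multiple subgroup analyses can be illustrated with a short calculation. The sketch below assumes that the analyses are independent and that each is tested at a significance level of 0.05; both are simplifying assumptions of mine, and the function names are hypothetical. It also encodes the rule of thumb of one subgroup analysis per 10 trials.

```python
def prob_any_false_positive(n_tests, alpha=0.05):
    """Probability that at least one of n_tests independent tests is
    'significant' by chance alone when no true effect exists."""
    return 1.0 - (1.0 - alpha) ** n_tests


def suggested_max_subgroups(n_trials, trials_per_subgroup=10):
    """Rule of thumb: about one subgroup analysis per 10 trials."""
    return n_trials // trials_per_subgroup


for k in (1, 5, 10):
    print(k, round(prob_any_false_positive(k), 3))

# For a meta-analysis of 41 trials, the rule of thumb suggests
# no more than about four pre-specified subgroup analyses.
print(suggested_max_subgroups(41))
```

Under these assumptions, ten unplanned subgroup analyses carry roughly a 40% chance of producing at least one spurious significant result, which is why the number of analyses should be kept small and specified in advance.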

Statin Subgroup Analysis

Pooled estimates for the individual statins are shown in Table 1. Fluvastatin and pravastatin demonstrated a significant effect in both fixed- and random-effects models. As indicated by the tau-squared (Tau2) and I-squared (I2) statistics, atorvastatin and simvastatin were associated with variability across trials. However, because of the large number of individuals randomized, simvastatin still demonstrated an effect in the fixed-effects analysis, although this was lost in the random-effects model. It is worth noting that both atorvastatin and pravastatin have had a considerably larger number of patients randomized to them, in a larger number of trials, than the other drugs of the class. Thus, although rosuvastatin did not demonstrate a significant effect (only two trials were included in the data set), it is most likely that, because of the low event rate observed for this endpoint, this subgroup analysis is underpowered to show a significant effect. Similarly, very few patients have been randomized to lovastatin. In this case, the pooled estimate headed in the wrong direction, but with very wide confidence intervals; as such, it cannot be ruled out that lovastatin works well within one secondary population but may have a harmful effect in another.

Table 1.  Subgroup analysis of the efficacy of statins in the prevention of all-cause mortality

Statin         Fixed               Random              Tau2    I2 (%)
Atorvastatin   0.85 (0.69–1.05)    0.77 (0.55–1.08)    0.17    16
Fluvastatin    0.68 (0.48–0.96)    0.68 (0.48–0.96)    0       0
Lovastatin     1.11 (0.32–3.86)    1.11 (0.32–3.86)    0       0
Pravastatin    0.81 (0.74–0.89)    0.81 (0.74–0.89)    0       0
Rosuvastatin   0.95 (0.87–1.03)    0.95 (0.87–1.03)    0       0
Simvastatin    0.71 (0.60–0.85)    0.70 (0.39–1.28)    0.18    39
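Reading significance off Table 1 amounts to checking whether each confidence interval excludes a relative risk of 1. The sketch below applies that check to the point estimates and intervals reported in the table; the data structure and function name are illustrative assumptions.

```python
# (estimate, lower CI bound, upper CI bound) as reported in Table 1
TABLE_1 = {
    "Atorvastatin": {"fixed": (0.85, 0.69, 1.05), "random": (0.77, 0.55, 1.08)},
    "Fluvastatin":  {"fixed": (0.68, 0.48, 0.96), "random": (0.68, 0.48, 0.96)},
    "Lovastatin":   {"fixed": (1.11, 0.32, 3.86), "random": (1.11, 0.32, 3.86)},
    "Pravastatin":  {"fixed": (0.81, 0.74, 0.89), "random": (0.81, 0.74, 0.89)},
    "Rosuvastatin": {"fixed": (0.95, 0.87, 1.03), "random": (0.95, 0.87, 1.03)},
    "Simvastatin":  {"fixed": (0.71, 0.60, 0.85), "random": (0.70, 0.39, 1.28)},
}


def excludes_no_effect(ci_lower, ci_upper, null_value=1.0):
    """True when the confidence interval does not contain the null value."""
    return ci_upper < null_value or ci_lower > null_value


significance = {
    statin: {
        model: excludes_no_effect(lower, upper)
        for model, (_, lower, upper) in models.items()
    }
    for statin, models in TABLE_1.items()
}

for statin, flags in significance.items():
    print(f"{statin:13s} fixed={flags['fixed']!s:5s} random={flags['random']}")
```

Simvastatin is the one drug for which the two models disagree: its interval excludes 1 under the fixed-effects model but not under the random-effects model.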

Meta-analysis: The “Gold Standard”?

Over the last decade, the work of the Cochrane Collaboration and others has established meta-analysis as an approach that can help resolve questions of uncertainty about medical interventions and disease effects, and it has acquired the reputation of being somewhat of a “gold standard.” However, if we are to attain definitive evidence, it is important that the meta-analysis itself is appropriately powered. The recommendation for achieving the appropriate power, also known as the optimal information size, is that the meta-analysis be at least as well powered as, if not better powered than, a good-quality clinical trial. For meta-analyses, the optimal information size can be calculated using the realistic event rates that occurred in the clinical trials included in the pooled sample and the minimal effects of treatment that would be considered significant to patient care. Incidentally, it is worth noting here that the overestimation of event rates in the power calculations used to determine sample size for prospective clinical trials is often the source of error that leads to their being underpowered and unable to conclusively demonstrate effects. In our meta-analysis of the effects of statins, we found a very low rate of all-cause mortality of only 10% among control groups and 9.2% among intervention groups. As such, a clinical trial would require a very large sample size in order to demonstrate a significant difference between these rates; indeed, based on these numbers, 26,000 participants would have to be included in the study. The broad sample size of 41,000 patients in our meta-analysis permits us to make a strong inference on whether or not statins as a drug class are effective in secondary prevention of all-cause mortality; however, it is not particularly well powered to discriminate the effects of individual statins.
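An optimal information size calculation of this kind can be sketched with the standard normal-approximation formula for comparing two proportions. The two-sided significance level of 0.05 and the power of 80% used below are my assumptions, as is the function name; the source does not state the settings behind its figure of 26,000, so this sketch should not be expected to reproduce that number exactly.

```python
import math
from statistics import NormalDist


def sample_size_per_group(p_control, p_treat, alpha=0.05, power=0.80):
    """Participants needed in each arm to detect p_control vs p_treat
    with a two-sided test (normal approximation, equal allocation)."""
    z_alpha = NormalDist().inv_cdf(1 - alpha / 2)  # critical value
    z_beta = NormalDist().inv_cdf(power)           # power quantile
    variance_sum = p_control * (1 - p_control) + p_treat * (1 - p_treat)
    n = (z_alpha + z_beta) ** 2 * variance_sum / (p_control - p_treat) ** 2
    return math.ceil(n)


# Event rates reported in the text: 10% (control), 9.2% (intervention)
n_per_group = sample_size_per_group(0.10, 0.092)
total = 2 * n_per_group
print(f"per group: {n_per_group}, total: {total}")

# Overestimating the control event rate (with the same relative effect)
# makes the trial look much cheaper than it really is.
print(sample_size_per_group(0.20, 0.184))
```

The second call illustrates the point made about prospective trials: assuming a 20% control event rate instead of 10%, with the same relative effect, yields a far smaller planned sample size, and a trial sized on that basis would be underpowered if the true rate were 10%.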

Closing Remarks

Meta-analysis provides a compelling approach to evaluating data across many clinical trials. It offers the option of applying a very broad pooling of data that provides sufficient power to draw strong inferences, and it facilitates investigations according to subgroups determined a priori. Of note, however, a degree of knowledge of the patient population, the disease area, and the intervention outcome is required to maximize the utility of the data through informed probing. As a final note, the application of the concept of optimal information size in meta-analyses may well qualify their reputation as the “gold standard” in clinical intervention research in the future.

Source of financial support: Oxford Outcomes, the National Pharmaceutical Council, and Shire Pharmaceuticals.