Keywords:

  • CMH weighting;
  • meta analysis;
  • Simpson's Paradox;
  • study-size adjustment method

Abstract

  1. Top of page
  2. Abstract
  3. 1. BACKGROUND
  4. 2. A LITTLE THEORY
  5. 3. A SECOND EXAMPLE
  6. 4. DISCUSSION
  7. Acknowledgements
  8. References

Experience has shown us that when data are pooled from multiple studies to create an integrated summary, an analysis based on naïvely-pooled data is vulnerable to the mischief of Simpson's Paradox. Using the proportions of patients with a target adverse event (AE) as an example, we demonstrate the Paradox's effect on both the comparison and the estimation of the proportions. While meta analytic approaches have been recommended and increasingly used for comparing safety data between treatments, reporting proportions of subjects experiencing a target AE based on data from multiple studies has received little attention. In this paper, we suggest two possible approaches to report these cumulative proportions. In addition, we urge that regulatory guidelines on reporting such proportions be established so that risks can be communicated in a scientifically defensible and balanced manner. Copyright © 2010 John Wiley & Sons, Ltd.


1. BACKGROUND

Houston, we have a problem!

A management team met to review a high-level safety summary based on data from multiple studies supporting a new drug application. The tables were produced by standard programs routinely used to summarize adverse events (AEs). One of the tables contained the proportions of patients with various AEs coded in MedDRA preferred terms. The team zeroed in on one particular event of interest. The proportion of patients with the event among the 1000 patients receiving the investigational drug in 6 trials was 13.0%. The corresponding proportion among the 750 patients receiving the control was 9.5%. A simple chi-squared test comparing the two proportions produced a two-sided p-value of 0.023, a difference that could be judged statistically significant at the two-sided 5% level. An alarm was raised. Why didn't the team see even a hint of this disparity in any of the individual trials?

Project statisticians and clinicians were called to assess the situation. Detailed data from individual trials were examined and contrasted with the pooled results. Data at the individual study level are displayed in Table I. One of the studies (Study 5 in Table I) had much higher proportions of patients with the event for both groups. This same study had a 2:1 randomization ratio with more subjects receiving the investigational drug. This imbalance appeared to have led to a much higher proportion for the investigational drug when event occurrences were added across studies to produce the cumulative proportions.

Table I. Proportions of subjects experiencing a target adverse event while receiving one of two treatments in six studies.

Study    New treatment (%)    No. of pts    Control (%)    No. of pts
1        8                    100           4              100
2        7                    100           6              100
3        1                    100           1              100
4        1                    100           2              100
5        21                   500           20             250
6        8                    100           10             100
Total    13                   1000          9.5            750

Armed with this new insight, the statisticians immediately pointed out the folly of comparing the cumulative proportions directly. They argued that a meta analytic approach should be employed to account for differences between studies. A meta analysis was conducted where differences within a study were combined and weighted by the inverse of the variance associated with the estimated differences. This analysis, a common approach among meta analysts, led to a point estimate for the difference in the proportions of −0.001 with a 95% confidence interval of (−0.017, 0.018). The two-sided p-value was 1.00. These findings showed that the proportions are quite comparable between the two groups, a conclusion one might have reached by carefully inspecting the data from each study in Table I.

Once the team uttered a collective sigh of relief, the statisticians decided to use this opportunity to explain the phenomenon commonly known as Simpson's Paradox [1, 2]. They used the simple example in Table II. The numbers in Table II were chosen to simplify the calculations.

Table II. Number and proportion of subjects experiencing a target adverse event while receiving one of two treatments in two studies.

                              New treatment                 Control
Study                    Event    No event   Total     Event    No event   Total
1                        180      120        300       60       40         100
                         (60%)    (40%)                (60%)    (40%)
2                        60       140        200       60       140        200
                         (30%)    (70%)                (30%)    (70%)
Total (naïve pooling)    240      260        500       120      180        300
                         (48%)    (52%)                (40%)    (60%)

Two studies are included in Table II. In the first study, the event of interest was reported in 60% of the patients in both treatment groups. The same event was reported in 30% of the patients in both groups in the second study. The first study includes three doses of the new treatment. An equal randomization to the three doses and the control results in a 3:1 ratio of the number of patients receiving the new treatment to the number receiving the control. Combining three dose groups in an integrated safety summary is not unusual if all three doses demonstrate efficacy and a sponsor seeks regulatory approval for all three doses. In the second study, patients were randomized to the highest dose and the control in a 1:1 ratio. In practice, unequal randomization to treatment groups is not uncommon in HIV and oncology trials, where a better than 50% chance of receiving a promising new treatment is often perceived to encourage enrollment.

By construction, the new treatment and control in Table II have identical proportions of patients with the event. If one pools the results in Table II by adding the event occurrences and the total patients over the two studies, one arrives at the last row in Table II, which shows a cumulative proportion of 48% for the new treatment and 40% for the control. A comparison between these two proportions yields a two-sided p-value of 0.028 (chi-square test). Now, if one does not know the origin of the data, one is likely to conclude a significant difference in the proportions between the two treatments. The latter is clearly an erroneous conclusion in this case. This phenomenon, the statisticians explained, can also be viewed as confounding between study and treatment where a study with uniformly higher proportions in both groups has more patients randomized to the new treatment.
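The arithmetic behind the last row of Table II can be checked with a short sketch. It is illustrative only: the function name `chi2_two_proportions` is ours, and we assume the uncorrected Pearson chi-squared statistic, since the text does not say whether a continuity correction was applied.

```python
import math

def chi2_two_proportions(x1, n1, x2, n2):
    """Uncorrected Pearson chi-squared test (1 df) comparing x1/n1 with x2/n2."""
    pooled = (x1 + x2) / (n1 + n2)
    variance = pooled * (1 - pooled) * (1 / n1 + 1 / n2)
    stat = (x1 / n1 - x2 / n2) ** 2 / variance
    # With 1 degree of freedom, P(chi2 > stat) equals erfc(sqrt(stat / 2)).
    return stat, math.erfc(math.sqrt(stat / 2))

# Table II as (events, total) pairs: new treatment first, then control.
table2 = [((180, 300), (60, 100)), ((60, 200), (60, 200))]

# Naive pooling simply adds events and totals over the studies.
x_new = sum(new[0] for new, _ in table2)
n_new = sum(new[1] for new, _ in table2)
x_ctl = sum(ctl[0] for _, ctl in table2)
n_ctl = sum(ctl[1] for _, ctl in table2)

stat, p_value = chi2_two_proportions(x_new, n_new, x_ctl, n_ctl)
print(f"pooled: {x_new / n_new:.0%} vs {x_ctl / n_ctl:.0%}, p = {p_value:.3f}")
```

Under these assumptions the pooled proportions come out as 48% and 40% with a p-value of about 0.028, even though the two arms are identical within each study.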

Now, the team saw the problem of pooling data indiscriminately and drawing inferences based on the combined results. One member on the team said, ‘This all makes sense to me. But, how should we report the proportions of patients with the event? In the example of Table II, should we report 48% vs 40%? In doing so, we will risk leading others to the same wrong conclusion as we almost did. If we don't report 48% vs 40%, what proportions should we report?’

Indeed, what proportions should the team report? This is important since the package insert of a product routinely lists AEs reported in ≥1% and ≥2% of the patients receiving the product in randomized clinical trials, along with the proportions reported by patients receiving a control. This side-by-side display urges consumers to use the proportions for the control to ‘judge’ the magnitude of those associated with the new product. As such, conclusions drawn from examining proportions obtained from naively pooled data could be misleading.

In this paper, we will look at two options for reporting the cumulative proportions that are not affected by Simpson's Paradox.

2. A LITTLE THEORY

2.1. Testing the hypothesis of equal proportions

Let $\pi_{ij}$ denote the proportion of subjects with a target AE among those receiving the ith treatment in the jth study. For simplicity, we assume there are two treatments, where i = 1 represents the new treatment and i = 2 represents the control. Let $n_{ij}$ denote the number of subjects receiving the ith treatment in the jth study and $x_{ij}$ the number of individuals (among these $n_{ij}$ individuals) who experienced the AE. Let $p_{ij} = x_{ij}/n_{ij}$ denote the observed proportion, which is an estimate for $\pi_{ij}$. A quantity often used to assess treatment effect on the proportion in the jth study is the risk difference $d_j = \pi_{1j} - \pi_{2j}$. An estimate for $d_j$ is $\hat{d}_j = p_{1j} - p_{2j}$.

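A common approach to test for an ‘average’ treatment effect on the proportion is to first form a weighted average $\hat{d}$ of the study-level estimates $\hat{d}_j$ of the risk differences $d_j$, using non-negative weights $w_j$, as in (1) and then construct a test statistic as in (2) [3].

  • $\hat{d} = \sum_j w_j \hat{d}_j \big/ \sum_j w_j$ (1)
  • $X^2 = \hat{d}^{\,2} \big/ \widehat{\mathrm{Var}}(\hat{d})$, where $\widehat{\mathrm{Var}}(\hat{d}) = \sum_j w_j^2\, \widehat{\mathrm{Var}}(\hat{d}_j) \big/ \bigl(\sum_j w_j\bigr)^2$ (2)

$X^2$ in (2) has an asymptotic chi-squared distribution with 1 degree of freedom if $\sum_j w_j d_j = 0$.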

Two choices for the weights $w_j$ are common among researchers performing meta analyses. One is to set $w_j$ as the inverse of the sample variance of the estimated risk difference $\hat{d}_j$ in the jth study. When the risk difference $d_j$ is the same across all studies, i.e. $d_j = d$, inverse variance (IV) weighting produces the minimum variance estimate for the common risk difference d. This estimate is unbiased for large samples. Under this assumption, the null hypothesis of $\sum_j w_j d_j = 0$ is the same as d = 0. The IV method is the most commonly used method among those who regard treatment effect as a fixed effect in this case.
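As a rough illustration of IV weighting, the sketch below computes a fixed-effect inverse-variance estimate of a common risk difference from the Table I data. The function name `iv_risk_difference` is ours, the variance of each study-level difference is assumed to be the usual binomial sample variance, and the inputs are the rounded percentages printed in Table I, so the output need not agree with the figures quoted in Section 1 to the last digit.

```python
import math

# Table I as (observed proportion, number of patients) per study and arm.
new = [(0.08, 100), (0.07, 100), (0.01, 100), (0.01, 100), (0.21, 500), (0.08, 100)]
ctl = [(0.04, 100), (0.06, 100), (0.01, 100), (0.02, 100), (0.20, 250), (0.10, 100)]

def iv_risk_difference(arm1, arm2):
    """Fixed-effect inverse-variance estimate of a common risk difference."""
    raw, diffs = [], []
    for (p1, n1), (p2, n2) in zip(arm1, arm2):
        var_j = p1 * (1 - p1) / n1 + p2 * (1 - p2) / n2  # sample variance of d_j
        raw.append(1 / var_j)
        diffs.append(p1 - p2)
    total = sum(raw)
    d_hat = sum(w * d for w, d in zip(raw, diffs)) / total
    se = math.sqrt(1 / total)  # standard error of the IV-weighted average
    ci = (d_hat - 1.96 * se, d_hat + 1.96 * se)
    return d_hat, ci, [w / total for w in raw]

d_hat, (lower, upper), weights = iv_risk_difference(new, ctl)
print(f"estimate {d_hat:.4f}, 95% CI ({lower:.4f}, {upper:.4f})")
print("normalized weights:", [round(w, 3) for w in weights])
```

The estimate is close to zero and the interval comfortably covers zero, in line with the conclusion drawn from the meta analysis in Section 1.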

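Another choice for $w_j$ is $n_{1j}n_{2j}/(n_{1j}+n_{2j})$, which is proportional to the harmonic mean of $n_{1j}$ and $n_{2j}$. This weighting strategy is generally referred to as the Cochran–Mantel–Haenszel (CMH) weighting in the literature because the test statistics proposed by Cochran and by Mantel and Haenszel, shown in that order in (3), use such weights [3]. A continuity correction could be added to both test statistics. In (3), $\bar{p}_j$ is the observed proportion in the jth trial with the two treatment groups combined, that is, $\bar{p}_j = (n_{1j}p_{1j} + n_{2j}p_{2j})/N_j$, where $p_{ij}$ is the observed proportion on the ith treatment and $N_j = n_{1j} + n_{2j}$ is the total number of subjects in the jth study.

  • $X^2_{\mathrm{C}} = \bigl[\sum_j w_j (p_{1j} - p_{2j})\bigr]^2 \big/ \sum_j w_j\, \bar{p}_j (1 - \bar{p}_j)$
  • $X^2_{\mathrm{MH}} = \bigl[\sum_j w_j (p_{1j} - p_{2j})\bigr]^2 \big/ \sum_j w_j\, \bar{p}_j (1 - \bar{p}_j)\, N_j/(N_j - 1)$ (3)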
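A minimal sketch of the two stratified statistics, written in their textbook forms without continuity correction, is given below. The function name `cochran_mh` is ours. Applied to Table II, where the arms are identical within each study, both statistics are exactly zero, in contrast to the naïve pooled comparison.

```python
# Table II as (events, total) pairs: new treatment first, then control.
table2 = [((180, 300), (60, 100)), ((60, 200), (60, 200))]

def cochran_mh(studies):
    """Return (Cochran, Mantel-Haenszel) chi-squared statistics, uncorrected."""
    numerator = 0.0
    var_cochran = 0.0
    var_mh = 0.0
    for (x1, n1), (x2, n2) in studies:
        total = n1 + n2
        w = n1 * n2 / total              # CMH weight for this study
        p_bar = (x1 + x2) / total        # proportion with both arms combined
        numerator += w * (x1 / n1 - x2 / n2)
        var_cochran += w * p_bar * (1 - p_bar)
        var_mh += w * p_bar * (1 - p_bar) * total / (total - 1)
    return numerator ** 2 / var_cochran, numerator ** 2 / var_mh

x2_cochran, x2_mh = cochran_mh(table2)
print(x2_cochran, x2_mh)
```

The two statistics differ only by the factor N_j/(N_j − 1) in the variance term, so the Mantel–Haenszel version is never larger than Cochran's.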

A third approach to combine results in a stratified analysis of a binary endpoint is the minimum risk approach proposed by Mehrotra and Railkar [4]. This approach, designed to increase the power in hypothesis testing, lets data decide the weights used to combine results from individual studies. According to Mehrotra and Railkar, the minimum risk approach is not recommended if the primary objective of the analysis is estimation in a meta analysis where the strata are defined by studies. Because we are interested in estimation in this paper, we will not pursue this approach further in the remainder of the paper.

2.2. Reporting cumulative proportions

Assuming a fixed-effect model and $d_j = d$ for all j, a meta analyst can use the weighted average $\hat{d}$ in (1) with IV weighting as the basis for constructing a confidence interval for d. Similarly, the CMH approach will lead to a point estimate and a confidence interval for d. Because the IV weighting produces the minimum variance estimate in this case, the confidence interval produced by the IV approach is narrower than that produced under the CMH approach.

We will now focus on methods to report the cumulative proportion of subjects reporting the target AE in multiple studies. Because pij's are likely to differ among studies, an important question concerns the desirable properties of the resulting cumulative proportion. In our opinion, at a minimum, an acceptable strategy should lead to comparable cumulative proportions between treatment groups if the proportions are comparable within each study.

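The above motivated us to first look at the cumulative proportions in (4), where the observed proportions $p_{ij}$ in the individual studies are combined using either the IV or the CMH weights. To differentiate the proportions in (4) from that obtained from the pooled data (i.e. $\sum_j n_{ij}p_{ij} \big/ \sum_j n_{ij}$), we will call the cumulative proportions in (4) ‘adjusted’ cumulative proportions.

  • $\tilde{p}_i^{\mathrm{IV}} = \sum_j w_j^{\mathrm{IV}} p_{ij} \big/ \sum_j w_j^{\mathrm{IV}}$, with $w_j^{\mathrm{IV}} = \bigl[p_{1j}(1-p_{1j})/n_{1j} + p_{2j}(1-p_{2j})/n_{2j}\bigr]^{-1}$
  • $\tilde{p}_i^{\mathrm{CMH}} = \sum_j w_j^{\mathrm{CMH}} p_{ij} \big/ \sum_j w_j^{\mathrm{CMH}}$, with $w_j^{\mathrm{CMH}} = n_{1j}n_{2j}/(n_{1j}+n_{2j})$ (4)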

Applying the above concept to Table II, we obtained the results in Table III. Even though both studies have 400 subjects, they do not have the same weight in forming the adjusted cumulative proportions under either approach. Study 2, with an equal number of subjects on the two treatments, enjoys a heavier weight than Study 1. This reflects the fact that given a fixed total sample size, equal allocation generally produces greater information value than most other randomization ratios. Compared with the CMH weighting, the IV weighting gives even less weight to Study 1. This is because sampling variability associated with a 60% proportion is higher than that associated with a 30% proportion. By design, the adjusted cumulative proportions are the same for the two treatment groups under either weighting scheme. Their values are 42% (IV) and 43% (CMH).
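The weights and adjusted cumulative proportions for Table II can be reproduced with a few lines. This is a sketch under our own naming (`iv_raw`, `cmh_raw`, `normalize`, `adjusted`), with the IV weight taken as the reciprocal of the binomial sample variance of the within-study difference.

```python
# Table II as (p_new, n_new, p_ctl, n_ctl), one tuple per study.
table2 = [(0.60, 300, 0.60, 100), (0.30, 200, 0.30, 200)]

def normalize(raw):
    """Scale raw weights so that they sum to one."""
    total = sum(raw)
    return [w / total for w in raw]

def adjusted(weights, column):
    """Weighted average of one column of observed proportions (0 = new, 2 = control)."""
    return sum(w * row[column] for w, row in zip(weights, table2))

# Inverse of the sample variance of the difference in observed proportions.
iv_raw = [1 / (p1 * (1 - p1) / n1 + p2 * (1 - p2) / n2) for p1, n1, p2, n2 in table2]
# Cochran-Mantel-Haenszel weights depend on the arm sizes only.
cmh_raw = [n1 * n2 / (n1 + n2) for _, n1, _, n2 in table2]

iv_w, cmh_w = normalize(iv_raw), normalize(cmh_raw)
print("IV  weights", [round(w, 2) for w in iv_w], "adjusted", round(adjusted(iv_w, 0), 2))
print("CMH weights", [round(w, 2) for w in cmh_w], "adjusted", round(adjusted(cmh_w, 0), 2))
```

Because the two arms have the same observed proportion in each study, passing column 2 (control) to `adjusted` returns the same values as column 0.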

Table III. (a) Weights applied to Studies 1–2 in Table II to construct adjusted cumulative proportions under the inverse variance (IV) and the CMH methods and (b) Weights applied to Studies 1–6 in Table I to construct adjusted cumulative proportions under the IV, CMH, and SS-based methods.

(a)
Study                             IV       CMH
1                                 0.40     0.43
2                                 0.60     0.57
Adjusted cumulative proportion
  New treatment                   42%      43%
  Control                         42%      43%

(b)
Study                             IV       CMH      Study size
1                                 0.07     0.12     0.114
2                                 0.07     0.12     0.114
3                                 0.43     0.12     0.114
4                                 0.29     0.12     0.114
5                                 0.09     0.40     0.430
6                                 0.05     0.12     0.114
Adjusted cumulative proportion
  New treatment                   4.1%     11.4%    11.9%
  Control                         4.0%     10.8%    11.2%

For studies in Table I, the weights given to the six studies as well as the adjusted cumulative proportions under the IV and CMH approaches are given in Table III. Table III exposes the greatest drawback of the IV method in combining proportions. Recall that the IV weights are inversely proportional to the variance of the differences in the observed proportions. For a fixed sample size, studies that have smaller proportions will weigh more heavily than studies with higher proportions as long as they are all less than 50%. The opposite is true if the proportions are greater than 50%. As a result, Study 3 and (to a lesser extent) Study 4 in Table I have high weights under the IV approach, resulting in much lower adjusted cumulative proportions. A second disadvantage of the IV method is the inclusion of observed proportions associated with the other treatment in constructing the adjusted cumulative proportions. For these reasons, we do not recommend using IV weights to obtain adjusted cumulative proportions.
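The contrast between the IV and CMH columns for Table I can be sketched as follows. The helper name `adjusted_pair` is ours, and the inputs are the rounded percentages printed in Table I, so individual weights may differ from the tabulated ones in the second decimal place.

```python
# Table I as (p_new, n_new, p_ctl, n_ctl), one tuple per study.
table1 = [
    (0.08, 100, 0.04, 100),
    (0.07, 100, 0.06, 100),
    (0.01, 100, 0.01, 100),
    (0.01, 100, 0.02, 100),
    (0.21, 500, 0.20, 250),
    (0.08, 100, 0.10, 100),
]

def adjusted_pair(raw_weights):
    """Normalize the weights and return (weights, adjusted new, adjusted control)."""
    total = sum(raw_weights)
    w = [r / total for r in raw_weights]
    new = sum(wj * row[0] for wj, row in zip(w, table1))
    ctl = sum(wj * row[2] for wj, row in zip(w, table1))
    return w, new, ctl

# IV: reciprocal of the sample variance of the within-study difference.
iv = adjusted_pair([1 / (p1 * (1 - p1) / n1 + p2 * (1 - p2) / n2)
                    for p1, n1, p2, n2 in table1])
# CMH: n1 * n2 / (n1 + n2), which ignores the observed proportions.
cmh = adjusted_pair([n1 * n2 / (n1 + n2) for _, n1, _, n2 in table1])

for name, (w, new, ctl) in (("IV", iv), ("CMH", cmh)):
    print(name, [round(x, 2) for x in w], f"new {new:.1%}", f"control {ctl:.1%}")
```

The IV weights concentrate on Studies 3 and 4, where the observed proportions are smallest, which is what pulls the IV-adjusted proportions down to about 4%.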

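Another sensible approach is to weight the observed proportion in a study by the percentage of subjects in that study among the pooled population. In other words, $w_j$ will be set to $N_j/N$, where $N_j = n_{1j} + n_{2j}$ is the total number of subjects in the jth study and $N = \sum_j N_j$, as in (5).

  • $\tilde{p}_i^{\mathrm{SS}} = \sum_j (N_j/N)\, p_{ij}$ (5)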

For convenience, we will call this the SS-based method (SS for study size). The adjusted cumulative proportion in (5) has the flavor of the cumulative proportions if patients in the pooled population all received treatment i. In other words, the cumulative proportion for each treatment group is ‘normalized’ to the composition of the pooled population. In doing so, we avoid the potential impact of Simpson's Paradox.

The CMH-adjusted proportion $\tilde{p}_i^{\mathrm{CMH}}$ in (4) has the advantage that the difference between $\tilde{p}_1^{\mathrm{CMH}}$ and $\tilde{p}_2^{\mathrm{CMH}}$ is the point estimate for the risk difference between the two treatments under the CMH approach. On the other hand, the SS-based proportion $\tilde{p}_i^{\mathrm{SS}}$ in (5) has a natural interpretation. When there is an equal allocation of subjects to the treatment groups in all studies, weights under the SS-based method are approximately the same as those under the CMH method. Applying the SS-based method to studies in Table I, we obtained the adjusted cumulative proportions in the last column in Table III. For data in Table II, weighting studies by their size leads to equal weights for the two studies, resulting in a 45% adjusted cumulative proportion for both treatments.
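The study-size weighting is simple enough to check by hand, but a short sketch makes the calculation explicit for both data sets. The function name `ss_adjusted` is ours, and Table I is again entered from its rounded percentages.

```python
def ss_adjusted(data):
    """Study-size weighting: each study counts in proportion to its total size.

    data holds one (p_new, n_new, p_ctl, n_ctl) tuple per study.
    """
    pooled_size = sum(n1 + n2 for _, n1, _, n2 in data)
    weights = [(n1 + n2) / pooled_size for _, n1, _, n2 in data]
    new = sum(w * row[0] for w, row in zip(weights, data))
    ctl = sum(w * row[2] for w, row in zip(weights, data))
    return weights, new, ctl

table1 = [
    (0.08, 100, 0.04, 100), (0.07, 100, 0.06, 100), (0.01, 100, 0.01, 100),
    (0.01, 100, 0.02, 100), (0.21, 500, 0.20, 250), (0.08, 100, 0.10, 100),
]
table2 = [(0.60, 300, 0.60, 100), (0.30, 200, 0.30, 200)]

w1, new1, ctl1 = ss_adjusted(table1)
w2, new2, ctl2 = ss_adjusted(table2)
print("Table I :", [round(w, 3) for w in w1], f"{new1:.1%} vs {ctl1:.1%}")
print("Table II:", [round(w, 3) for w in w2], f"{new2:.1%} vs {ctl2:.1%}")
```

For Table I this gives roughly 11.9% versus 11.2%, and for Table II 45% in both arms, matching the values quoted in the text.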

In Table III, the SS-based method produces higher adjusted cumulative rates for both groups than those under the CMH method. In general, these two approaches produce very similar adjusted cumulative proportions; but one approach does not always produce values higher than the other. A simple example is illustrated in Table IV where data from three studies were used to calculate the adjusted cumulative proportion for the new treatment under three hypothetical scenarios. Results in the table show that depending on the randomization ratio and the observed proportions, the CMH approach produces adjusted cumulative proportions that could be higher or lower than their counterparts under the SS-based method.

Table IV. Adjusted cumulative proportions for the new treatment based on data from three studies with 1:1, 2:1 and 3:1 randomization ratios.

                                        Observed proportion for new treatment
Sample size (new treatment, control)    Scenario 1       Scenario 2       Scenario 3
(100, 100)                              5%               10%              15%
(200, 100)                              10%              15%              5%
(300, 100)                              15%              5%               10%

Adjusted cumulative proportion (standard deviation)
CMH approach                            0.107 (0.012)    0.098 (0.013)    0.096 (0.013)
SS-based approach                       0.111 (0.013)    0.094 (0.012)    0.094 (0.012)
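The entries of Table IV can be recomputed with the sketch below. The standard deviation formula is our assumption: the weights are treated as fixed and each observed proportion is given its binomial sample variance, which reproduces the tabulated values to three decimals. The helper names are ours.

```python
import math

sizes = [(100, 100), (200, 100), (300, 100)]   # (new treatment, control)
scenarios = [(0.05, 0.10, 0.15), (0.10, 0.15, 0.05), (0.15, 0.05, 0.10)]

def normalize(raw):
    total = sum(raw)
    return [w / total for w in raw]

cmh_w = normalize([n1 * n2 / (n1 + n2) for n1, n2 in sizes])
ss_w = normalize([n1 + n2 for n1, n2 in sizes])

def adjusted_with_sd(weights, props):
    """Adjusted proportion for the new treatment and its standard deviation."""
    estimate = sum(w * p for w, p in zip(weights, props))
    # Assumed variance: sum of w_j^2 * p_j * (1 - p_j) / n_1j with fixed weights.
    variance = sum(w ** 2 * p * (1 - p) / n1
                   for w, p, (n1, _) in zip(weights, props, sizes))
    return estimate, math.sqrt(variance)

cmh_rows = [adjusted_with_sd(cmh_w, props) for props in scenarios]
ss_rows = [adjusted_with_sd(ss_w, props) for props in scenarios]
for label, rows in (("CMH", cmh_rows), ("SS ", ss_rows)):
    print(label, "  ".join(f"{est:.3f} ({sd:.3f})" for est, sd in rows))
```

In the first scenario the SS-based value exceeds the CMH value, while in the other two the order is reversed.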

3. A SECOND EXAMPLE

The statisticians, through networking with their colleagues, soon found another case where naïve pooling led to an inappropriate conclusion in another new drug application. The case is included in Table V, where the original data were slightly modified from another real case to aid illustration. The six studies in the table were conducted to support two indications. Studies 1–3 compared a low dose of a new investigational drug vs placebo in patients with a disorder generally common in the younger population. Studies 4–6 compared the low dose and a high dose of the new drug vs placebo in patients with an illness that more often affects older patients. Older patients are more prone to declining visual acuity in general. This is the AE of interest here.

Table V. Numbers and proportions of patients with adverse visual effects in six studies.

                                     Placebo                   Low dose                  High dose
Study                                Pts    Events   Prop.     Pts    Events   Prop.     Pts    Events   Prop.
1                                    200    2        1%        200    2        1%
2                                    100    2        2%        100    2        2%
3                                    200    3        1.5%      200    3        1.5%
4                                    100    5        5%        100    5        5%        200    10       5%
5                                    100    6        6%        100    5        5%        200    12       6%
6                                    100    7        7%        100    8        8%        200    14       7%
Total (naïve pooling)                800    25       3.1%      800    25       3.1%      600    36       6.0%
Low risk* (adjusted proportions)     500    7        1.4%      500    7        1.4%
High risk (adjusted proportions)     300    18       6.0%      300    18       6.0%      600    36       6.0%

Pts: number of patients. Events: number of patients with the event. Prop.: proportion of patients with the event.
*Low risk: Studies 1–3.
High risk: Studies 4–6.

Naïve pooling of the six studies yields a cumulative proportion of 6.0% for the high dose group and a proportion of 3.1% for the placebo group. Based on these proportions, one might infer that the high dose group is twice as likely to suffer from declining visual acuity when compared with the placebo group. This is clearly not the case if one looks at patient experience at the individual study level.

An appropriate analysis in this case would be to report the proportions for the two populations separately. For the low-risk population, the comparison is between the low dose and the placebo. The CMH weights for the three studies are 0.4 (Study 1), 0.2 (Study 2) and 0.4 (Study 3). For the high-risk population, the CMH approach weights studies 4–6 equally when comparing either low dose vs placebo or high dose vs placebo. The SS-based approach yields the same weights as the CMH approach for both the low-risk and the high-risk patients. Adjusted cumulative proportions for the low-risk and high-risk populations under these two approaches are given in the last two rows in Table V.
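A sketch of the two ways of summarizing Table V follows: naïve pooling over all six studies and CMH-adjusted proportions computed separately within the low-risk and high-risk populations. The dictionary layout and the function names `naive` and `cmh_adjusted` are ours.

```python
# Table V as study -> arm -> (patients, patients with event).
table5 = {
    1: {"placebo": (200, 2), "low": (200, 2)},
    2: {"placebo": (100, 2), "low": (100, 2)},
    3: {"placebo": (200, 3), "low": (200, 3)},
    4: {"placebo": (100, 5), "low": (100, 5), "high": (200, 10)},
    5: {"placebo": (100, 6), "low": (100, 5), "high": (200, 12)},
    6: {"placebo": (100, 7), "low": (100, 8), "high": (200, 14)},
}

def naive(arm, studies):
    """Pooled proportion: total events over total patients for one arm."""
    patients = sum(table5[s][arm][0] for s in studies)
    events = sum(table5[s][arm][1] for s in studies)
    return events / patients

def cmh_adjusted(arm, other, studies):
    """CMH-weighted proportion for `arm` when compared with `other`."""
    raw = []
    for s in studies:
        n_a, n_b = table5[s][arm][0], table5[s][other][0]
        raw.append(n_a * n_b / (n_a + n_b))
    total = sum(raw)
    return sum(w / total * table5[s][arm][1] / table5[s][arm][0]
               for w, s in zip(raw, studies))

low_risk, high_risk = [1, 2, 3], [4, 5, 6]
print("naive placebo, all studies:", f"{naive('placebo', low_risk + high_risk):.3%}")
print("naive high dose           :", f"{naive('high', high_risk):.3%}")
print("low risk : placebo", f"{cmh_adjusted('placebo', 'low', low_risk):.1%}",
      "low dose", f"{cmh_adjusted('low', 'placebo', low_risk):.1%}")
print("high risk: placebo", f"{cmh_adjusted('placebo', 'high', high_risk):.1%}",
      "high dose", f"{cmh_adjusted('high', 'placebo', high_risk):.1%}")
```

Restricting the placebo data to Studies 4–6 removes the apparent doubling of risk for the high dose.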

In this example, including placebo data from the low-risk population in the data pool to compare the high dose vs the placebo is the culprit for the erroneous conclusion concerning high dose's effect on visual acuity.

4. DISCUSSION

When the risk difference is the same across all studies (Section 2.1) and the randomization ratio is identical across studies, the risk difference is collapsible over studies [5]. In this case, the risk difference estimate obtained from the collapsed data will be an unbiased estimate for the common risk difference even though this estimate is often not efficient. Also, proportions obtained from the collapsed data will not suffer from the ill effect of Simpson's Paradox. Even if the risk difference is not the same across studies, as long as the randomization ratio is the same across all studies, reporting the pooled proportions $p_i$ based on the collapsed data will not suffer from the phenomenon affecting Tables I and II. In this regard, the proposed CMH-adjusted and SS-based cumulative proportions, $\tilde{p}_i^{\mathrm{CMH}}$ in (4) and $\tilde{p}_i^{\mathrm{SS}}$ in (5), apply to situations where the randomization ratio is not identical across studies.

In the first story, a team member asked whether adjustment, when necessary, will always make a new treatment look better. The answer is ‘NO’. Suppose that in Table II, the proportion for both treatments in Study 1 is 30% and the proportion for both treatments in Study 2 is 60%. Naïve pooling will produce a 42% proportion for the new treatment and 50% for the control. Since the CMH weighting depends only on the number of patients on each treatment in the two studies, the same CMH weights apply to this new situation. The CMH weights lead to an adjusted cumulative proportion of 47% for both treatment groups. The SS-based method weights both studies equally, resulting in an adjusted cumulative proportion of 45% for both groups. In this example, the adjusted cumulative proportion for the new treatment under either approach is higher than the naïve proportion.
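The reversed version of Table II described above can be verified with a small sketch. The variable names are ours. The sample sizes are those of Table II and only the event proportions are swapped between the two studies.

```python
# Table II sample sizes with the proportions swapped: 30% in Study 1, 60% in Study 2.
swapped = [(0.30, 300, 0.30, 100), (0.60, 200, 0.60, 200)]  # (p_new, n_new, p_ctl, n_ctl)

def weighted(weights, column):
    """Weighted average of one column of proportions after normalizing the weights."""
    total = sum(weights)
    return sum(w / total * row[column] for w, row in zip(weights, swapped))

# Naive pooling weights each arm by its own sample sizes.
naive_new = weighted([n1 for _, n1, _, _ in swapped], 0)
naive_ctl = weighted([n2 for _, _, _, n2 in swapped], 2)

# CMH and study-size weights are shared by the two arms.
cmh_weights = [n1 * n2 / (n1 + n2) for _, n1, _, n2 in swapped]
ss_weights = [n1 + n2 for _, n1, _, n2 in swapped]
cmh_new, cmh_ctl = weighted(cmh_weights, 0), weighted(cmh_weights, 2)
ss_new, ss_ctl = weighted(ss_weights, 0), weighted(ss_weights, 2)

print(f"naive {naive_new:.0%} vs {naive_ctl:.0%}")
print(f"CMH   {cmh_new:.0%} vs {cmh_ctl:.0%}")
print(f"SS    {ss_new:.0%} vs {ss_ctl:.0%}")
```

Here the adjustment raises the reported proportion for the new treatment (42% naïve versus 47% or 45% adjusted) and lowers it for the control, so adjustment does not systematically favor the new treatment.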

Some researchers might feel uncomfortable with the idea of reporting adjusted cumulative proportions. An interesting question is – have we always reported observed results without making any adjustment? The answer is ‘NO’. For example, it is a common practice to report model-based least-squares means when employing the analysis of covariance model. Then, why haven't we paid more attention to the reporting of cumulative proportions in the face of treatment by study confounding? The answer may lie in the fact that analytical focus, whether the rigor of endpoint definitions, or the sample size estimation, or appropriate model selection, or concerns about drop out bias or missing data, has been directed primarily to the efficacy side of treatment evaluation. The amount of methodological attention paid to safety assessment, including the proportions of patients with AEs, is often less rigorous and somewhat cursory. Yet, quantifying risk and communicating risk of a pharmaceutical product are critically important to patients and health-care providers.

We propose two approaches to report cumulative AE proportions in this paper. Whether regulators will accept either one of them is unclear. Regardless of regulatory acceptance, we think it is important to bring this issue to the public's attention and encourage open debate. Whether the CMH or the SS-based approach is eventually selected to report cumulative proportions, we feel it is imperative that pragmatic guidelines beyond those currently existing [6, 7] be established to provide specific recommendations and guide decisions concerning the AE reporting section of a product label.

The public has a right to information on the risks of medicinal products. Pharmaceutical sponsors have an obligation to communicate the findings from clinical research in the most transparent way possible. To do this, we need a common platform that is scientifically accurate and balanced.

Acknowledgements

The authors thank Professors Stephen Lagakos, Gary Koch, and Byron Jones as well as Drs. Mike Gaffney, Joe Cappelleri, and Ha Nguyen for their insightful comments on an earlier draft of this article. The authors also thank two anonymous reviewers whose comments have helped improve the quality of this paper.

References
