Reflecting Heterogeneity in Patient Benefits: The Role of Subgroup Analysis with Comparative Effectiveness


Mark Sculpher, Centre for Health Economics, University of York, York YO10 5DD, UK.


In analyzing randomized clinical trials, trialists typically focus on treatment effects relative to a comparator and consider specific outcomes individually. In practice, however, there is systematic variation in the benefit that patients derive from medical treatments. The question is therefore how to incorporate multidimensional outcomes and multifactorial heterogeneity into comparative effectiveness so as to capture the incremental benefit across all outcomes. The role of subgroup analysis in the context of clinical trials, cost-effectiveness analysis, and comparative effectiveness is discussed.

Subgroup Analysis in Clinical Trials

In general, the clinical trial paradigm assesses individual end points separately and discretely, and focuses on relative treatment effects relating to subgroups. Clinical trials are usually powered to test one or two primary end points (e.g., an efficacy end point and a safety end point). Subgroup analysis is used to determine whether statistically significant interactions exist between characteristics of the patients and the relevant measures of relative treatment effects.

An example of a classic clear-cut subgroup analysis is seen in the OASIS-5 clinical trial, which compared two antithrombotic treatments (enoxaparin and fondaparinux) in patients with non-ST-elevation acute coronary syndrome [1]. In this study, the investigators focused on the hazard ratio as an indicator of the relative treatment effect with respect to the primary measures of efficacy (a composite of death, myocardial infarction [MI], and refractory ischemic pain at 9 days) and safety (major bleeding at 9 days). Subgroup analysis focused on the interaction of the safety and efficacy end points with patients' baseline age, gender, creatinine levels relative to the median, use of heparin, and need for revascularization in the previous 9 days. The subgroup analysis showed no statistically significant interaction in terms of efficacy and, at 9 days, there was no overall difference between the two groups. With respect to safety, a marked overall benefit of the new agent (fondaparinux) was demonstrated in terms of reducing major bleeding episodes at 9 days, with a consistent direction of effect for all of the subgroups that were considered.

In practice, moving from clinical trial data to a decision-making context is often complicated by inconsistency in the direction of the treatment effect across subgroups. Specifically, the difficulty comes when there are trade-offs; for example, a gain in efficacy but some loss in safety, or vice versa. In this context, some basis for evaluating the trade-off is required, for example, to assess whether the gains in efficacy are worth the loss in safety. This requires a move from the focus on relative treatment effects to a consideration of the benefits and disbenefits of treatment on an absolute scale (e.g., expressed in terms of mortality risk or bleeding adverse events). Furthermore, to assess the net benefit of one treatment versus another (i.e., whether benefits are greater than disbenefits), it is necessary to place positive and negative effects onto a single scale. There are various approaches to this form of “net-benefit analysis” or “risk-benefit analysis” [2], including the use of patients' or public values to generate a preference-weighted overall measure of (net) benefit.
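The idea of placing positive and negative effects onto a single scale can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not a method taken from the trial: the event risks, the preference weights, and the function name `net_benefit` are all hypothetical.

```python
# Hypothetical absolute event risks (probability per patient) under each treatment.
# The new treatment gains on efficacy but loses on safety.
risks_comparator = {"death_or_mi": 0.060, "major_bleed": 0.020}
risks_new = {"death_or_mi": 0.050, "major_bleed": 0.030}

# Hypothetical preference weights: health lost per event on one common scale.
weights = {"death_or_mi": 5.0, "major_bleed": 0.5}


def net_benefit(new, comparator, preference_weights):
    """Preference-weighted net benefit of the new treatment, per patient.

    Each absolute risk reduction (comparator minus new) is multiplied by the
    weight of that event; a positive total means benefits exceed disbenefits.
    """
    return sum(
        preference_weights[event] * (comparator[event] - new[event])
        for event in preference_weights
    )


nb = net_benefit(risks_new, risks_comparator, weights)
print(f"Net benefit per patient: {nb:.3f}")
```

With different weights, the same absolute risk differences could produce a negative net benefit, which is why the source of the preference weights matters.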

Subgroup Analysis of Cost-Effectiveness Data

In cost-effectiveness analysis (CEA), the central question to address is whether the additional benefits generated by a given intervention, compared with an alternative, are worth any additional cost [3]. Relative treatment effects are therefore insufficient for CEA because there is a need to know how much more beneficial the intervention is, and not just that it is more beneficial. The need to make trade-offs between different outcomes is also commonplace, and preferences are frequently used to provide a weighted measure of benefit on a single scale. Outside the field of health, these preferences are often expressed in terms of individuals' willingness to pay. Within health, the most widely used form of preference-weighted benefit is the quality-adjusted life-year (QALY). Subgroup analysis is increasingly used with CEA to establish in which types of patients an intervention provides the greatest value for money.

Taking the data from the study of antithrombotic treatments described previously, Sculpher et al. undertook a cost-effectiveness analysis of the interventions [4]. To assess subgroups, the analysis started by determining the relative variation in risk of the composite efficacy and safety events according to a range of patient characteristics including age, sex, and history of previous coronary events (as well as treatment allocation) [4]. Table 1 shows the results of this analysis for two end points at 180 days: death and nonfatal MI. These data clearly show the systematic variation in risk depending on the type of patient. For example, the risk of death by 180 days after an episode of acute coronary syndrome is higher in men, older patients, and diabetics.

Table 1.  Modeling of underlying risk of events at 180 days in patients with non-ST-elevation acute coronary syndrome in the OASIS-5 study

| Explanatory baseline variable | Death: HR | Death: 95% CI | Nonfatal MI: HR | Nonfatal MI: 95% CI |
|---|---|---|---|---|
| Use of fondaparinux | 0.892 | 0.797–0.999 | 0.945 | 0.833–1.073 |
| Age at study entry (y) | 1.060 | 1.053–1.067 | 1.022 | 1.015–1.028 |
| History of heart failure | 1.876 | 1.652–2.130 | 1.025 | 0.856–1.227 |
| History of diabetes | 1.436 | 1.274–1.618 | 1.239 | 1.077–1.428 |
| History of hypertension | 0.979 | 0.857–1.119 | 1.038 | 0.898–1.200 |
| ST depression at study entry | 1.785 | 1.581–2.015 | 1.232 | 1.083–1.401 |
| Serum creatinine 2 | 1.054 | 0.870–1.277 | 0.857 | 0.708–1.037 |
| Serum creatinine 3 | 1.286 | 1.070–1.546 | 1.044 | 0.868–1.256 |
| Serum creatinine 4 | 1.865 | 1.566–2.220 | 1.102 | 0.911–1.333 |
| Constant (on the log scale) | −10.32 | −10.85 to −9.79 | −6.972 | −7.48 to −6.47 |
| Ancillary parameter | 0.505 | 0.478–0.534 | 0.429 | 0.403–0.456 |

Adapted from Sculpher et al. [4].
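To show how estimates such as those in Table 1 could be turned into an absolute risk for a given type of patient, the following Python sketch predicts the 180-day risk of death. It rests on assumptions that the text does not confirm: that the model is a Weibull proportional hazards model with survival function S(t) = exp(−exp(xβ) t^p), that the ancillary parameter is the shape p, that time is measured in days, and that age enters uncentered in years. The patient profiles and the function name `risk_of_death` are hypothetical.

```python
import math

# Hazard ratios for death at 180 days, copied from Table 1.
HAZARD_RATIOS_DEATH = {
    "fondaparinux": 0.892,
    "age_years": 1.060,
    "heart_failure": 1.876,
    "diabetes": 1.436,
    "hypertension": 0.979,
    "st_depression": 1.785,
    "creatinine_2": 1.054,
    "creatinine_3": 1.286,
    "creatinine_4": 1.865,
}
CONSTANT_LOG_SCALE = -10.32  # Table 1, death
ANCILLARY_PARAMETER = 0.505  # Table 1, death; ASSUMED to be a Weibull shape


def risk_of_death(patient, days=180.0):
    """Cumulative probability of death by `days` under the ASSUMED Weibull model.

    `patient` maps covariate names to values (1/0 for indicators, years for age).
    """
    linear_predictor = CONSTANT_LOG_SCALE + sum(
        math.log(HAZARD_RATIOS_DEATH[name]) * value
        for name, value in patient.items()
    )
    cumulative_hazard = math.exp(linear_predictor) * days ** ANCILLARY_PARAMETER
    return 1.0 - math.exp(-cumulative_hazard)


# Hypothetical patient profiles that differ only in diabetes status.
without_diabetes = {"fondaparinux": 1, "age_years": 70, "diabetes": 0}
with_diabetes = {"fondaparinux": 1, "age_years": 70, "diabetes": 1}
risk_without = risk_of_death(without_diabetes)
risk_with = risk_of_death(with_diabetes)
print(f"Without diabetes: {risk_without:.4f}; with diabetes: {risk_with:.4f}")
```

If the original model was parameterized differently, the predicted risks would change, but the general point would remain: the same hazard ratios imply different absolute risks for different patient profiles.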

These estimates of differential event risk by patient characteristics were used to undertake a subgroup analysis in terms of the cost-effectiveness of the two antithrombotic treatments (enoxaparin and fondaparinux). This showed how cost-effectiveness varies according to the underlying (or baseline) risk of the composite outcome of death, nonfatal MI, and nonfatal stroke. The analysis assumed that treatment effectiveness is common across all subgroups (as shown in the clinical analysis of the OASIS-5 trial). As illustrated in Table 2, the analysis showed that, at a cost-effectiveness threshold of $50,000 per QALY, treatment with the new agent (fondaparinux) is cost-effective for patients who have the average characteristics seen in the OASIS-5 trial, with similar results obtained for patients at low or high baseline risk of the composite outcome. Although fondaparinux's cost-effectiveness was consistent across subgroups, the application of a common relative treatment effect to a varying baseline risk resulted in subgroup differences in the absolute benefit of treatment. As shown in Table 2, the absolute lifetime QALYs for patients at low baseline risk of events are much higher than for the average patient in both treatment groups, but the additional benefit of fondaparinux is slightly lower than for the average patient. Patients who are at high baseline risk have relatively low lifetime QALY estimates, but the additional benefit of fondaparinux is greater.
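The way a common relative treatment effect translates into different absolute benefits can be illustrated with a short Python sketch. It uses the hazard ratio for death with fondaparinux from Table 1 and assumes proportional hazards, so that survival on treatment equals baseline survival raised to the power of the hazard ratio. The three baseline risks and the function name `absolute_risk_reduction` are hypothetical and are not taken from the study.

```python
HAZARD_RATIO = 0.892  # death, use of fondaparinux (Table 1)


def absolute_risk_reduction(baseline_risk, hazard_ratio=HAZARD_RATIO):
    """Absolute reduction in event risk when a common hazard ratio is applied.

    Under proportional hazards, treated survival = baseline survival ** HR.
    """
    treated_risk = 1.0 - (1.0 - baseline_risk) ** hazard_ratio
    return baseline_risk - treated_risk


# Hypothetical baseline risks for low-, average-, and high-risk patients.
baseline_risks = {"low": 0.02, "average": 0.06, "high": 0.15}
reductions = {
    label: absolute_risk_reduction(risk) for label, risk in baseline_risks.items()
}
for label, reduction in reductions.items():
    print(f"{label}: {1000 * reduction:.1f} events avoided per 1,000 patients")
```

The relative effect is identical in all three cases, yet the number of events avoided grows with baseline risk, which is the pattern of absolute benefit described in the text.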

Table 2.  Example of a subgroup analysis of cost-effectiveness

| | Enoxaparin | Fondaparinux | Fondaparinux vs. Enoxaparin |
|---|---|---|---|
| Patient with average characteristics | | | |
| Expected cost | $79,905 | $79,717 | |
| Expected quality-adjusted life-years | 7.06 | 7.10 | |
| Probability most cost-effective at a threshold of $50,000 per QALY | 0.7% | 99.3% | |
| Probability of cost saving | 17.6% | 82.4% | |
| Difference in costs per 1,000 patients | | | −$188,000 |
| Difference in QALYs per 1,000 patients | | | 40 |
| Incremental cost-effectiveness | | | Fondaparinux dominates |
| Patient at low risk of composite event over 180 d (2.5th percentile) | | | |
| Expected cost | $115,163 | $114,998 | |
| Expected quality-adjusted life-years | 12.95 | 12.98 | |
| Probability most cost-effective at a threshold of $50,000 per QALY | 0.1% | 99.9% | |
| Probability of cost saving | 14.0% | 86.0% | |
| Difference in costs per 1,000 patients | | | −$165,000 |
| Difference in QALYs per 1,000 patients | | | 30 |
| Incremental cost-effectiveness | | | Fondaparinux dominates |
| Patient at high risk of composite event over 180 d (97.5th percentile) | | | |
| Expected cost | $57,968 | $57,643 | |
| Expected quality-adjusted life-years | 3.38 | 3.48 | |
| Probability most cost-effective at a threshold of $50,000 per QALY | 0.8% | 99.2% | |
| Probability of cost saving | 17.5% | 82.5% | |
| Difference in costs per 1,000 patients | | | −$325,000 |
| Difference in QALYs per 1,000 patients | | | 100 |
| Incremental cost-effectiveness | | | Fondaparinux dominates |

Adapted from Sculpher et al. [4].
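The incremental results in Table 2 can be reproduced from the expected costs and QALYs per patient, as in the Python sketch below. The dominance rule and the net monetary benefit (threshold × incremental QALYs − incremental cost) are standard CEA calculations; net monetary benefit is not reported in the table and is added here only for illustration. The function name `incremental_analysis` is hypothetical.

```python
THRESHOLD = 50_000  # dollars per QALY, the threshold used in Table 2

# (expected cost in dollars, expected QALYs) per patient, from Table 2.
subgroups = {
    "average": {"enoxaparin": (79_905, 7.06), "fondaparinux": (79_717, 7.10)},
    "low risk": {"enoxaparin": (115_163, 12.95), "fondaparinux": (114_998, 12.98)},
    "high risk": {"enoxaparin": (57_968, 3.38), "fondaparinux": (57_643, 3.48)},
}


def incremental_analysis(new, old, threshold=THRESHOLD):
    """Return incremental cost, incremental QALYs, a verdict, and net monetary benefit."""
    d_cost = new[0] - old[0]
    d_qaly = new[1] - old[1]
    if d_cost <= 0 and d_qaly >= 0:
        verdict = "dominates"  # no more costly and no less effective
    elif d_cost >= 0 and d_qaly <= 0:
        verdict = "dominated"  # no less costly and no more effective
    else:
        verdict = f"ICER ${d_cost / d_qaly:,.0f} per QALY"
    net_monetary_benefit = threshold * d_qaly - d_cost
    return d_cost, d_qaly, verdict, net_monetary_benefit


results = {
    name: incremental_analysis(arms["fondaparinux"], arms["enoxaparin"])
    for name, arms in subgroups.items()
}
for name, (d_cost, d_qaly, verdict, nmb) in results.items():
    print(
        f"{name}: cost difference per 1,000 = {1000 * d_cost:,.0f}, "
        f"QALY difference per 1,000 = {1000 * d_qaly:.0f}, "
        f"fondaparinux {verdict}, NMB per patient = {nmb:,.0f}"
    )
```

Because fondaparinux is both less costly and more effective in every subgroup, no cost-effectiveness ratio needs to be reported; the size of the absolute gain still differs across subgroups.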

Subgroup Analysis with Comparative Effectiveness

If we are explicit about the need to focus on absolute benefit to inform clinical decision-making, and about the sources of heterogeneity that might cause the absolute benefit to vary between patients, then we can apply essentially the same approach used in determining cost-effectiveness to determine comparative effectiveness, simply by leaving out the cost. To illustrate this, Table 3 shows the expected QALYs from the cost-effectiveness subgroup analysis of enoxaparin and fondaparinux described previously. These represent a measure of net clinical benefit on the absolute QALY scale at different levels of baseline risk of clinical events. It is also possible to calculate the probability of a net gain rather than a net loss in health if a patient takes fondaparinux instead of enoxaparin (shown in Table 3); this reflects the uncertainty in the underlying evidence base.

Table 3.  Possible comparative effectiveness subgroup analysis

| | Enoxaparin | Fondaparinux |
|---|---|---|
| Patient with average characteristics | | |
| Expected quality-adjusted life-years | 7.06 | 7.10 |
| Probability of net clinical benefit | 0.01 | 0.99 |
| Patient at low risk of composite event over 180 days (2.5th percentile) | | |
| Expected quality-adjusted life-years | 12.95 | 12.98 |
| Probability of net clinical benefit | 0.01 | 0.99 |
| Patient at high risk of composite event over 180 days (97.5th percentile) | | |
| Expected quality-adjusted life-years | 3.38 | 3.48 |
| Probability of net clinical benefit | 0.01 | 0.99 |
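A probability of net clinical benefit of this kind is typically obtained from simulation, as the share of draws in which one treatment yields more QALYs than the other. The Python sketch below illustrates that calculation only. The mean incremental QALY of 0.04 matches the average patient in Table 3, but the standard deviation, the normal distribution, and the function name `probability_of_net_benefit` are assumptions made for illustration and do not come from the original analysis.

```python
import random


def probability_of_net_benefit(incremental_qaly_draws):
    """Share of simulated incremental QALYs that are greater than zero."""
    draws = list(incremental_qaly_draws)
    return sum(draw > 0 for draw in draws) / len(draws)


# Hypothetical uncertainty around the incremental QALYs for the average patient.
MEAN_INCREMENTAL_QALY = 0.04  # 7.10 - 7.06, from Table 3
ASSUMED_STANDARD_DEVIATION = 0.017  # illustrative value, not from the source

rng = random.Random(0)  # fixed seed so the sketch is repeatable
simulated = [
    rng.gauss(MEAN_INCREMENTAL_QALY, ASSUMED_STANDARD_DEVIATION)
    for _ in range(10_000)
]
p_benefit = probability_of_net_benefit(simulated)
print(f"Probability of net clinical benefit: {p_benefit:.2f}")
```

In a full analysis the draws would come from a probabilistic model of the evidence base rather than from a single assumed distribution.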

In this example, there is full consistency in terms of the impact of the new drug on patient health at different risk levels. Nevertheless, there are many clinical examples in which there is a significant trade-off between efficacy end points and safety end points, and in which consistency across subgroups may not be observed. In these cases, a measure of absolute benefit may be most informative. For example, Minelli et al. considered the net clinical benefit of hormone replacement therapy (HRT) in terms of QALYs for women at different baseline risks of breast cancer [5]. They assessed how the expected (or mean) net clinical benefit of HRT changed as the baseline risk of breast cancer increased. With the use of a preference-weighted scale that incorporates risks and benefits, we can also reflect uncertainty in the evidence base. Minelli et al. used Bayesian methods to present the 95% credibility interval around net clinical benefit for the average patient, as well as the probability of net harm for a given baseline risk of breast cancer [5].

The Impact of Patient Preferences

In addition to systematic variation in underlying risks and treatment effect, patient preferences can be a major source of heterogeneity in health outcomes. This is because individuals often have different attitudes to the trade-offs between efficacy end points and side-effect end points that affect the success of treatment. These may include, for example, considerations of pain, physical function, mental well-being, social function, and life expectancy. As with underlying risk, the benefit that individual patients derive from a treatment varies with their preferences. A preference-weighted measure of benefit, such as a QALY, can reflect this heterogeneity in preferences.

Figure 1 illustrates the idea of a distribution of net health benefit across patients receiving minimal access surgery rather than open surgery. On the left of the distribution are those patients who derive less benefit from minimal access surgery, and on the right are those who derive more benefit. In terms of preferences, those on the left of the distribution may be, for example, patients who would prefer a “once and for all” solution, on the basis that minimal access surgery may require a return for further treatment, or they may be more worried about the nonfatal complications of minimal access surgery than about the higher mortality risk associated with open surgery. In contrast, patients to the right of the distribution expect to gain more in terms of benefit, perhaps because they prefer the shorter convalescence associated with minimal access surgery, or because they are less worried about the failure risk of this procedure and more worried about the higher mortality risk associated with open surgery.

Figure 1.  Example of a distribution of net health benefit reflecting heterogeneity in preferences.

A specific example of an analysis of the effects of patient preferences on expected QALYs is provided by the comparison of a minimal access technique (transcervical resection) with an open surgical technique (abdominal hysterectomy) in the treatment of menorrhagia [6]. Eliciting women's preferences for aspects of these two treatments revealed considerable variation in how those preferences influenced the QALYs gained from each type of treatment. On average, a higher number of expected QALYs was associated with open surgery (the traditional and more invasive procedure). Nevertheless, the distribution of QALYs for these two procedures indicated that not all women would benefit more from open surgery. In fact, it appeared that many women would get more benefit from transcervical resection, not because of any underlying clinical difference between the women or any treatment-modifying effect, but because their preferences are different. This example serves to emphasize the need to reflect the heterogeneity in preferences when making clinical decisions.
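The distinction between the average result and the distribution across individuals can be sketched in Python as follows. All values are hypothetical and are not data from the menorrhagia study [6]: the two preference weights per woman, the weeks of convalescence avoided, the probability of retreatment, and the function name `qaly_gain_from_resection` are assumptions chosen only to show how identical clinical outcomes can yield different net benefits under different preferences.

```python
from statistics import mean

# Hypothetical clinical inputs, identical for every woman.
CONVALESCENCE_WEEKS_AVOIDED = 6  # shorter recovery with resection
RETREATMENT_PROBABILITY = 0.25  # chance that resection needs further treatment

# Hypothetical preference weights (QALYs lost per week of convalescence,
# and QALYs lost if retreatment is needed) for five women.
women = [
    {"convalescence_weight": 0.005, "retreatment_weight": 0.40},
    {"convalescence_weight": 0.010, "retreatment_weight": 0.40},
    {"convalescence_weight": 0.020, "retreatment_weight": 0.20},
    {"convalescence_weight": 0.004, "retreatment_weight": 0.60},
    {"convalescence_weight": 0.015, "retreatment_weight": 0.30},
]


def qaly_gain_from_resection(preferences):
    """Expected QALY gain of resection over hysterectomy for one woman."""
    gain = preferences["convalescence_weight"] * CONVALESCENCE_WEEKS_AVOIDED
    loss = preferences["retreatment_weight"] * RETREATMENT_PROBABILITY
    return gain - loss


gains = [qaly_gain_from_resection(w) for w in women]
mean_gain = mean(gains)
share_better_off = sum(g > 0 for g in gains) / len(gains)
print(f"Mean QALY gain from resection: {mean_gain:.3f}")
print(f"Share of women better off with resection: {share_better_off:.0%}")
```

In this illustration the mean favours hysterectomy, yet some women would expect to gain more from resection purely because of how they weigh convalescence against the risk of retreatment.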

Closing Remarks

Two fundamental assumptions that are routinely applied in cost-effectiveness analysis are transferable to the quantification of comparative effectiveness:

  1. The need to express health benefit on a single scale in light of the positive and negative dimensions of effects. As the most widely used and understood weighted measure of benefit, the QALY is perhaps the most obvious candidate for such a scale as it applies to comparative effectiveness. Nevertheless, other methods exist, such as discrete choice methods [7].
  2. The need to recognize that heterogeneity in clinical trial data is not only associated with the treatment effect on selected outcomes; it also encompasses absolute health benefit. Comparative effectiveness analysis needs to accommodate the potential effects on absolute risk reduction of variation in baseline risk across different types of patients, as well as of patient preferences.

In an environment with or without resource constraints, absolute measures of net benefit are the key to clinical decision-making, whether for an individual patient or a group of patients.

Source of financial support: Oxford Outcomes, the National Pharmaceutical Council, and Shire Pharmaceuticals.