National Collaborating Centre for Mental Health, Centre for Outcomes Research and Effectiveness, Research Department of Clinical, Educational and Health Psychology, University College London, London, UK;
A.E. Ades, Department of Community Based Medicine, University of Bristol, Cotham House, Cotham Hill, Bristol BS6 6RT, UK. E-mail: email@example.com
Background: Cost-effectiveness analysis often requires information on the effectiveness of interventions on multiple outcomes, and commonly these take the form of competing risks. Nevertheless, methods for synthesis of randomized controlled trials with competing risk outcomes are limited.
Objective: The aim of this study was to develop and illustrate flexible evidence synthesis methods for trials reporting competing risk results, which allow for studies with different follow-up times, and that take account of the statistical dependencies between outcomes, regardless of the number of outcomes and treatments.
Methods: We propose a competing risk meta-analysis based on hazards, rather than probabilities, estimated in a Bayesian Markov chain Monte Carlo (MCMC) framework using WinBUGS software. Our approach builds on existing work on mixed treatment comparison (network) meta-analysis, which can be applied to any number of treatments, and any number of competing outcomes, and to data sets with varying follow-up times. We show how a fixed effect model can be estimated, and two random treatment effect models with alternative structures for between-trial variation. We suggest methods for choosing between these alternative models.
Results: We illustrate the methods by applying them to a data set involving 17 trials comparing nine antipsychotic treatments for schizophrenia including placebo, on three competing outcomes: relapse, discontinuation because of intolerable side effects, and discontinuation for other reasons.
Conclusions: Bayesian MCMC provides a flexible framework for synthesis of competing risk outcomes with multiple treatments, particularly suitable for embedding within probabilistic cost-effectiveness analysis.
It is common for randomized controlled trials (RCTs) to report more than one outcome. For purposes of designing a trial, it is generally felt that a single outcome should be prespecified to be the “primary” outcome. It is also recognized, however, that when pooling results from trials in a meta-analysis, there are several reasons why it may be appropriate to combine information on different outcomes. First, one might wish to “gather strength” by combining several similar outcomes, or to be able to combine results from trials that report different, but similar, outcomes [1,2]. Second, in a decision-making context, the different outcomes recorded may each have separate implications for estimating quality of life or economic consequences of each treatment.
A key requirement in the synthesis of multiple outcomes is that the correlation structures are appropriately represented [1–4]. In a meta-analysis, the correlations may occur at either or both of two levels. At the between-patient within-trial level, a patient's outcome on one measure may be positively or negatively correlated with their outcome on another. At the between-trial level, trials in which there is a larger treatment effect on one measure may tend to be the trials on which there is a larger treatment effect on another (a positive correlation), or possibly a smaller one (a negative correlation).
Competing risk outcomes represent a special type of multiple outcome structure in which there are several different failure time outcomes that are considered mutually exclusive. Once a patient has reached any one of these end points, they are considered to be out of the risk set. Censoring may also be occurring. When results from these trials are pooled in a meta-analysis, the competing risk structure should be taken into account so that the statistical dependencies between outcomes are correctly reflected in the analysis. These dependencies are essentially within-trial, negative correlations between outcomes, applying in each arm of each trial. They arise because the occurrence of outcome events is a stochastic process, and if more patients should by chance reach one outcome, then fewer must reach the others. The importance of these correlations in the context of meta-analysis of competing risk outcomes has been recognized by Trikalinos and Olkin, who suggest an approach based on normal approximation of the variances and covariances arising from multinomial data, and illustrate it with an application to a two-treatment meta-analysis with two competing outcomes.
In this article, we present an alternative approach based on hazards rather than probabilities, to more appropriately take account of time at risk. We use Bayesian Markov chain Monte Carlo (MCMC) estimation, which we believe is more flexible in a situation with large numbers of treatments and outcomes. We begin by describing the data set, and we then explain the “proportional competing risks” assumption that underlies our approach. We next propose three alternative models: one fixed effect analysis and two random effect analyses. We also suggest some methods for model selection. In our discussion of the results, we compare our proposed method to previously described approaches, and consider some possible extensions.
Illustrative Data set
Figure 1 shows the network of comparisons of trials of antipsychotic medication for the prevention of relapse in people with schizophrenia. Each “edge” in the network indicates that the treatments at either end have been compared in an RCT, and the number on the edge indicates the number of trials. The data set includes 17 trials comparing nine treatments including placebo; eight of the trials are placebo controlled. There are 36 possible pairwise contrasts between the nine treatments, and the present data set provides direct evidence on 11 of them. The methods used to identify these studies, and the criteria for inclusion and exclusion in the data set, have been described previously. Briefly, a systematic search of the literature was undertaken to identify double-blind RCTs of antipsychotics used for relapse prevention in people with schizophrenia who are in remission. The review was conducted during the update of a clinical guideline on schizophrenia, commissioned by the National Institute for Health and Clinical Excellence in the UK. Although ziprasidone was not considered during the formulation of guideline recommendations, as it is not licensed in the UK, ziprasidone trials were included in the systematic review (and subsequently this analysis) to strengthen inference about the relative effect between other treatments. The analysis described in this article was similar to the one used to populate the decision-analytic economic model that informed the guideline recommendations.
The data available from each trial are the number of patients in each of the three outcome states at the end of follow-up. The outcome states are: relapse, discontinuation of treatment because of intolerable side effects, and discontinuation for other reasons, which might include inefficacy of treatment that did not fulfill all criteria for relapse, or loss to follow-up. Patients not reaching any of these end points at the end of follow-up were considered as censored observations, and still in remission. Individual patient data with times of transition were not available. Study follow-up varied from 26 to 104 weeks. The available data are shown in full in Table 1. Three trials comparing olanzapine and haloperidol were pooled as if they were a single study, because the original publication did not report all three outcomes separately for each trial.
Table 1. Trials of treatments for schizophrenia
Each cell gives: numbers of patients who relapse, discontinue because of side effects, discontinue for other reasons; and the number initially at risk. Citations for included studies can be found in . The Tran 1998 data represent pooled results from three trials in which discontinuation data were not reported separately for each trial.
Trial | Arm 1 | Arm 2
1. Beasley 2003 | 28, 12, 15; 102 | 9, 2, 19; 224
2. Dellva 1997 – 1 | 7, 0, 4; 13 | 10, 2, 16; 45
3. Dellva 1997 – 2 | 5, 2, 5; 14 | 6, 10, 15; 48
4. Loo 1997 | 5, 5, 39; 72 | 4, 1, 26; 69
5. Cooper 2000 | 21, 4, 24; 58 | 4, 16, 21; 61
6. Pigott 2003 | 85, 13, 12; 155 | 50, 16, 18; 155
7. Arato 2002 | 43, 11, 7; 71 | 71, 19, 28; 206
8. Kramer 2007 | 52, 1, 7; 101 | 23, 3, 17; 104
9. Simpson 2005 | 11, 6, 44; 71 | 8, 5, 33; 55
10. Tran 1998 | 87, 54, 170; 627 | 34, 20, 50; 180
11. Study S029 | 28, 9, 26; 141 | 29, 14, 25; 134
12. Tran 1997 | 20, 17, 36; 172 | 53, 17, 18; 167
13. Speller 1997 | 5, 3, 2; 29 | 9, 5, 2; 31
14. Csernansky 2000 | 65, 29, 80; 188 | 41, 22, 60; 177
15. Marder 2003 | 8, 0, 4; 30 | 4, 3, 4; 33
We begin with a general formulation for competing risks based on standard results from survival analysis. If $\lambda_m(t)$ is the cause-specific hazard at time t for outcome m, then the conditional probability that failure at time t is of type m, given there is a failure at time t, is
$$\pi_m(t) = \frac{\lambda_m(t)}{\sum_{m'} \lambda_{m'}(t)} \tag{1}$$
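The probability that failure occurs before time D and is of type m is: (the probability of surviving to t, times the probability of failure at t, times the conditional probability that failure is of type m), integrated over all times t between zero and D. This is:
$$p_m(D) = \int_0^D \exp\!\left(-\int_0^t \sum_{m'} \lambda_{m'}(u)\,du\right) \left[\sum_{m'} \lambda_{m'}(t)\right] \pi_m(t)\,dt \tag{2}$$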
A form of “proportional hazards” assumption can now be made, which might be better termed “proportional competing risks,” in which the ratio (Equation 1) is constant over all t (i.e., $\pi_m(t) = \pi_m$). Under this restriction, Equation 2 becomes:
$$p_m(D) = \pi_m \left[1 - \exp\!\left(-\int_0^D \sum_{m'} \lambda_{m'}(u)\,du\right)\right] \tag{3}$$
which is the probability of failure before time D times the probability that failure was of type m. In what follows, we assume constant underlying hazards, but Equation 3 shows that with proportional competing risks we are free to fit more complex survival distributions. This suggests some useful extensions to which we return in the discussion.
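The step from Equation 2 to Equation 3 can be checked numerically. The sketch below is illustrative only and is not part of the original analysis: it assumes three cause-specific hazards that share a Weibull-type time dependence (so their ratios are constant over time), integrates the survival-weighted hazard directly, and compares the result with the closed form "probability of any failure before D, times the probability that failure is of type m". The scale and shape values are hypothetical.

```python
import math

def incidence_numeric(hazard_fns, m, D, steps=20000):
    """Midpoint-rule integral of S(t) * lambda_m(t) over [0, D]."""
    dt = D / steps
    cum_hazard = 0.0  # integral of the all-cause hazard up to the start of the step
    total = 0.0
    for i in range(steps):
        t = (i + 0.5) * dt
        all_cause = sum(h(t) for h in hazard_fns)
        # survival to the midpoint of the current step
        survival = math.exp(-(cum_hazard + 0.5 * all_cause * dt))
        total += survival * hazard_fns[m](t) * dt
        cum_hazard += all_cause * dt
    return total

def incidence_closed_form(scales, shape, m, D):
    """pi_m * (1 - S(D)) when lambda_m(t) = scales[m] * shape * t**(shape - 1)."""
    pi_m = scales[m] / sum(scales)
    return pi_m * (1.0 - math.exp(-sum(scales) * D ** shape))

# Hypothetical values, chosen only for illustration.
SCALES = [0.5, 0.1, 0.3]
SHAPE = 1.5
HAZARDS = [lambda t, c=c: c * SHAPE * t ** (SHAPE - 1) for c in SCALES]

if __name__ == "__main__":
    for m in range(3):
        print(m + 1,
              incidence_numeric(HAZARDS, m, 2.0),
              incidence_closed_form(SCALES, SHAPE, m, 2.0))
```

The two columns printed for each outcome should agree to several decimal places, which is the sense in which proportional competing risks leaves the underlying survival distribution free.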
We now number the treatments from 1 to 9 (as shown in Table 1 and Fig. 1). Placebo is selected as the reference treatment 1. This is an arbitrary choice, but made to ease interpretation. We define the three outcomes as: m = 1 relapse, m = 2 discontinuation caused by side effects, and m = 3 discontinuation for other reasons. Then, each outcome is modeled separately on the log hazard rate scale. Considering a trial j that compares treatments k and b, the cause-specific log hazard for outcome m for treatment T is:
$$\theta_{j,T,m} = \log(\lambda_{j,T,m}) = \begin{cases} \mu_{j,m} & \text{if } T = b \\ \mu_{j,m} + \delta_{j,b,k,m} & \text{if } T = k,\ k > b \end{cases} \tag{4}$$
where $\delta_{j,b,k,m}$ is the trial-specific log hazard ratio of treatment k relative to treatment b for outcome m. This can be interpreted as meaning that the b arm of the trial estimates the baseline log hazard $\mu_{j,m}$, while the k arm estimates the sum of the baseline log hazard and the log hazard ratio. Note that b is not necessarily treatment 1, nor is it the same treatment in every trial; instead, it is simply the treatment with the lowest index in that trial. Thus, in a trial comparing treatments 2 and 3, b = 2. The trial-specific log hazard ratios are assumed to come from a common normal distribution, following the standard “random effects” approach:
$$\delta_{j,b,k,m} \sim N\!\left(d_{k,m} - d_{b,m},\ \sigma_m^2\right) \tag{5}$$
The mean of this distribution is a difference between mean relative effects $d_{k,m}$ and $d_{b,m}$, which are the mean effects of treatments k and b, respectively, relative to (placebo) treatment 1, for outcome m, and we define $d_{1,m} = 0$. This formulation of the problem expresses the consistency equations, by which the 11 treatment contrasts on which there are direct data (Table 1 and Fig. 1) are reduced to functions of the eight contrasts between the active treatments and placebo. The between-trial variance of the random effect distribution, $\sigma_m^2$, is specific to each outcome m. Three models for the variance are considered below.
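The consistency equations can be illustrated with a short sketch: given the eight basic parameters (each active treatment relative to placebo), every one of the 36 pairwise contrasts follows by subtraction. The function name and the numerical values below are hypothetical and serve only to show the bookkeeping for a single outcome.

```python
from itertools import combinations

def all_contrasts(d_basic):
    """Return every pairwise log hazard ratio d_k - d_b (b < k).

    d_basic maps treatment index k (2..9) to d_k, the mean effect of
    treatment k relative to treatment 1; d_1 is fixed at zero.
    """
    d = {1: 0.0}
    d.update(d_basic)
    return {(b, k): d[k] - d[b] for b, k in combinations(sorted(d), 2)}

# Hypothetical basic parameters for one outcome, for illustration only.
D_BASIC = {k: -0.1 * k for k in range(2, 10)}

if __name__ == "__main__":
    contrasts = all_contrasts(D_BASIC)
    print(len(contrasts))      # number of pairwise contrasts among nine treatments
    print(contrasts[(2, 3)])   # treatment 3 relative to treatment 2
```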
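We may write the model as Equation 4 because all the trials in this example are two-arm trials. An advantage of our approach, however, is that it can be readily extended to multi-arm trials, and Equation 5 should in fact be interpreted as a “fragment” of a multivariate normal distribution:
$$\boldsymbol{\delta}_{j,m} = \begin{pmatrix} \delta_{j,1,2,m} \\ \vdots \\ \delta_{j,1,9,m} \end{pmatrix} \sim N_8\!\left( \begin{pmatrix} d_{2,m} \\ \vdots \\ d_{9,m} \end{pmatrix},\ \begin{pmatrix} \sigma_m^2 & \sigma_m^2/2 & \cdots & \sigma_m^2/2 \\ \sigma_m^2/2 & \sigma_m^2 & \cdots & \sigma_m^2/2 \\ \vdots & \vdots & \ddots & \vdots \\ \sigma_m^2/2 & \sigma_m^2/2 & \cdots & \sigma_m^2 \end{pmatrix} \right) \tag{6}$$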
The underlying assumption in Equations 5 and 6 is therefore that every trial may be considered as if it were a multiarm trial on all nine treatments, that trial-specific relative treatment effects are sampled from the multivariate normal in Equation 6, and that treatments are missing at random. (Note that missing at random means missing without regard to treatment efficacy; it does not mean that treatment arms are equally likely to be included in a trial).
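One way to see why a single variance parameter per outcome is compatible with a multivariate normal over all eight placebo contrasts is sketched below. It assumes the usual homogeneous-variance structure from the mixed treatment comparison literature (variance $\sigma_m^2$ on the diagonal and covariance $\sigma_m^2/2$ off the diagonal); that structure is an assumption of this sketch. Under it, any contrast between two active treatments again has variance $\sigma_m^2$, matching Equation 5.

```python
def homogeneous_cov(n_treatments, sigma2):
    """Covariance of (delta_{1,2}, ..., delta_{1,n}): sigma2 on the diagonal,
    sigma2 / 2 elsewhere (assumed homogeneous-variance structure)."""
    size = n_treatments - 1
    return [[sigma2 if r == c else sigma2 / 2.0 for c in range(size)]
            for r in range(size)]

def contrast_variance(cov, b, k):
    """Variance of delta_{b,k} = delta_{1,k} - delta_{1,b}, for 2 <= b < k."""
    i, j = b - 2, k - 2
    return cov[i][i] + cov[j][j] - 2.0 * cov[i][j]

if __name__ == "__main__":
    cov = homogeneous_cov(9, 0.04)  # hypothetical between-trial variance
    print(contrast_variance(cov, 2, 3))
```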
The linking function that relates the arm-specific log hazards $\theta_{j,T,m}$ to the likelihood is developed as follows. Figure 2 shows a Markov transition model with a starting state (remission) and three absorbing states (relapse, discontinuation caused by side effects, and discontinuation for other reasons). Based on Equation 3, if we assume constant hazards $\lambda_{j,T,m}$ acting over the period of observation $D_j$ in years, the probability that outcome m had occurred by the end of the observation period for treatment T in trial j is:
$$p_{j,T,m}(D_j) = \frac{\lambda_{j,T,m}}{\sum_{m'=1}^{3} \lambda_{j,T,m'}} \left[1 - \exp\!\left(-D_j \sum_{m'=1}^{3} \lambda_{j,T,m'}\right)\right], \quad m = 1, 2, 3 \tag{7}$$
The probability of staying in the remission state (m = 4) is now simply 1 minus the sum of the probabilities of arriving at the three absorbing states, that is,
$$p_{j,T,4}(D_j) = 1 - \sum_{m=1}^{3} p_{j,T,m}(D_j) \tag{8}$$
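The data for each trial j and treatment T constitute a multinomial likelihood with four outcomes: moving to one of the three absorbing states, or remaining in the initial remission state. If $r_{j,T,m}$ is the number of patients on treatment T observed to reach end point m, and $n_{j,T}$ is the total number at risk on treatment T in trial j, then:
$$(r_{j,T,1}, r_{j,T,2}, r_{j,T,3}, r_{j,T,4}) \sim \text{Multinomial}\!\left(p_{j,T,1}, p_{j,T,2}, p_{j,T,3}, p_{j,T,4};\ n_{j,T}\right) \tag{9}$$
where $r_{j,T,4} = n_{j,T} - \sum_{m=1}^{3} r_{j,T,m}$ is the number still in remission at the end of follow-up.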
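The link from constant hazards to the multinomial likelihood can be sketched in a few lines of Python. This is a minimal illustration and not the WinBUGS code used for the analysis. The counts are the first arm of Beasley 2003 from Table 1 (28, 12, 15 events out of 102, leaving 47 in remission); the one-year follow-up is a hypothetical value used only to make the example run, and `crude_hazards` is a helper introduced here that inverts the link for a single arm.

```python
import math

def outcome_probs(hazards, duration):
    """Probabilities of the three absorbing states plus remaining in remission,
    for constant cause-specific hazards acting over `duration` years."""
    total = sum(hazards)
    any_event = 1.0 - math.exp(-total * duration)
    probs = [h / total * any_event for h in hazards]
    probs.append(1.0 - sum(probs))
    return probs

def multinomial_loglik(counts, probs):
    """Log multinomial probability of the observed counts."""
    ll = math.lgamma(sum(counts) + 1)
    for r, p in zip(counts, probs):
        ll -= math.lgamma(r + 1)
        if r > 0:
            ll += r * math.log(p)
    return ll

def crude_hazards(counts, duration):
    """Single-arm hazards that reproduce the observed proportions exactly
    (requires at least one event and at least one censored patient)."""
    n = sum(counts)
    events = counts[:-1]
    total = -math.log(counts[-1] / n) / duration
    return [total * r / sum(events) for r in events]

BEASLEY_ARM = [28, 12, 15, 47]   # relapse, side effects, other, still in remission
FOLLOW_UP_YEARS = 1.0            # hypothetical duration, for illustration only

if __name__ == "__main__":
    lam = crude_hazards(BEASLEY_ARM, FOLLOW_UP_YEARS)
    p = outcome_probs(lam, FOLLOW_UP_YEARS)
    print(lam, p, multinomial_loglik(BEASLEY_ARM, p))
```

In the full model the hazards are not estimated arm by arm in this way; they are functions of the trial baselines and treatment effects, and the same two functions would then supply the likelihood contribution of each arm.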
Three different models were fitted, differing solely in the specification of the between-trial variation in relative treatment effects, $\sigma_m^2$. In the fixed effects model, $\sigma_m^2 = 0$, and the model collapses to: $\theta_{j,k,m} = \mu_{j,m} + d_{k,m} - d_{b,m}$, b = 1, 2, ..., 8, k ≥ b.
In the random effect single-variance model, the between-trial variance is $\sigma_m^2 = \sigma^2$, reflecting the assumption that the between-trial variation is the same for each outcome: a vague gamma prior was put on the precision, $1/\sigma^2 \sim \text{Gamma}(0.001, 0.001)$. In the random effect different variances model, each outcome has a different between-trial variation, and the same vague prior is put on each: $1/\sigma_m^2 \sim \text{Gamma}(0.001, 0.001)$. A sensitivity analysis based on uniform priors was also examined: $\sigma \sim \text{Uniform}(0, 5)$. This gave virtually identical posteriors for the treatment effects, but resulted in posterior distributions with “spikes” at $\sigma_m$ values at or close to zero and spikes in the posterior mean treatment effects. Gamma priors, which give zero weight to infinite precision and hence to zero SD, were therefore used in the primary analyses reported below.
Finally, in each of the three models, vague $\text{Normal}(0, 100^2)$ priors were put on all the trial baselines $\mu_{j,m}$ and mean treatment effects $d_{k,m}$. The model for treatment effects (Equations 4 and 5) is therefore identical to that previously proposed for mixed treatment comparisons (MTCs) except that the multinomial likelihood (Equation 9) and linking function (Equation 7) are used, as is appropriate for the data at hand, in place of the binomial likelihood and logit link function proposed in most previous work on these kinds of evidence structures [9–13].
Choice of models was based on the deviance information criterion (DIC). This is a deviance measure of goodness of fit, $\bar{D}$, equal to the posterior mean of minus twice the log likelihood, penalized by an estimate of the effective number of parameters in the model, pD. The DIC can be seen as a Bayesian measure analogous to the Akaike information criterion used in classical analysis, but which can also be applied to hierarchical models. Here, we adjust the standard deviance formula by subtracting the deviance of the saturated model (a constant). The contribution of each multinomial observation (trial j treatment T) to the deviance is:
$$\text{Dev}_{j,T} = 2 \sum_{m=1}^{4} r_{j,T,m} \log\!\left(\frac{r_{j,T,m}}{n_{j,T}\,\hat{p}_{j,T,m}}\right) \tag{10}$$
where $\hat{p}_{j,T,m}$ is an estimate of the probability of outcome m, for example, the estimate generated on some MCMC cycle, and $\bar{D}$ is the posterior mean of the sum of the deviance contributions over all data points, $\sum_{j,T} \text{Dev}_{j,T}$. In a model that fits well, $\bar{D}$ is expected to roughly approximate the number of data points. In this data set, with 15 two-arm trials each reporting three outcomes (the fourth, “censored” outcome is predicted from the number at risk less the other outcomes), the number of independent data points is 90; pD is equal to $\bar{D} - \hat{D}$, where $\hat{D}$ is the sum of the deviance contributions, evaluated at the posterior mean of the fitted values $n_{j,T}\,\hat{p}_{j,T,m}$.
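The deviance bookkeeping described above can be sketched as follows. This is an illustration under stated assumptions and not the authors' code: the deviance contribution is taken to be the multinomial deviance relative to the saturated model, with the convention that a cell with zero count contributes zero, and DIC is computed as the posterior mean deviance plus pD. The numbers in the usage lines are hypothetical.

```python
import math

def multinomial_deviance(counts, probs):
    """Deviance of one multinomial observation relative to the saturated model.
    Cells with a zero count contribute nothing."""
    n = sum(counts)
    return 2.0 * sum(r * math.log(r / (n * p))
                     for r, p in zip(counts, probs) if r > 0)

def dic_summary(deviance_samples, deviance_at_posterior_mean):
    """Return (mean deviance, pD, DIC) from per-iteration total deviances and
    the total deviance evaluated at the posterior mean of the fitted values."""
    d_bar = sum(deviance_samples) / len(deviance_samples)
    p_d = d_bar - deviance_at_posterior_mean
    return d_bar, p_d, d_bar + p_d

if __name__ == "__main__":
    counts = [28, 12, 15, 47]                  # one arm from Table 1
    print(multinomial_deviance(counts, [0.3, 0.1, 0.15, 0.45]))
    print(dic_summary([10.0, 12.0, 14.0], 9.0))  # hypothetical MCMC output
```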
Models were estimated using the freely available Bayesian MCMC software WinBUGS 1.4.3. Convergence for all models occurred within 10,000 to 25,000 iterations as assessed by the Brooks–Gelman–Rubin criteria. Results are based on 150,000 samples, from three separate chains with disparate starting values for the fixed effect model, and five chains for the random effect models, taken after the first 60,000 were discarded. We also established that, in each model, all the chains converged to the same posterior. The code for each model is available on the ISPOR Web site as Supporting Information, in Appendix A at: http://www.ispor.org/Publications/value/ViHsupplementary/ViH13i8_Ades.asp.
The global goodness-of-fit statistics rule out the fixed effects model (Table 2), which fits relatively poorly: the mean posterior deviance is 119.8 compared to the number of data points, 90. This model also has the highest DIC because the relatively poor fit is not sufficiently compensated by the fact that it has fewer effective parameters. The DIC results slightly favor the three-variance model over the single-variance model.
Table 2. Goodness of fit, relative effects (posterior mean log hazard rates dk,m relative to placebo), and between-trial heterogeneity (posterior median and credible intervals for between-trial SD), for fixed and random effect models
[Table 2 layout. Columns: fixed effect; random effect single variance; random effect three variances. Row groups: goodness of fit statistics; relapse, discontinuation caused by intolerable side effects, and discontinuation caused by other reasons, each listing treatments from 1. Placebo (ref); between-trial SD, median (95% CI).]
Posterior summaries of the log hazard ratios $d_{k,m}$ for each treatment relative to placebo are shown in Table 2. For each of the three outcomes, the ranking of the treatments shows a high degree of consistency regardless of the choice of model. Zotepine is the most effective treatment in preventing relapse, followed by olanzapine; amisulpride leads to the fewest discontinuations because of intolerable side effects, followed by risperidone and olanzapine; risperidone appears to cause the fewest discontinuations for other reasons, followed by amisulpride and then olanzapine. Nevertheless, the rankings of the posterior mean treatment effects do not take uncertainty into account.
Figure 3 shows the probabilities that each treatment is ranked jth (j = 1, 2, ..., 9) for each of the three outcomes, based on the different variances model. As with Table 2, a rank of 1 indicates that the treatment is “best” at avoiding that (unwanted) outcome. These rankograms, introduced by Cipriani et al., provide an “at a glance” summary that is hard to achieve from numbers alone, as they simultaneously demonstrate not only the relative rankings of treatments on each outcome, but also the very considerable uncertainty in inferences about relative efficacy. For example, although Table 2 shows that zotepine is ranked first in reducing relapse, Figure 3 reveals that the probability that zotepine is best in this outcome is only about 0.6. Olanzapine has a relatively high probability of being ranked among the highest three places for all three outcomes.
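Rank probabilities of this kind are simple to compute from posterior samples: at each MCMC iteration the treatments are ranked on the sampled log hazard ratios, and the proportion of iterations in which each treatment occupies each rank is tallied. The sketch below assumes lower values are better (fewer unwanted events) and uses a tiny hand-made set of draws; it is not the code used to produce Figure 3.

```python
def rank_probabilities(samples):
    """samples: list of draws, each a list of effects (one per treatment,
    lower = better). Returns P where P[t][r] is the proportion of draws in
    which treatment t has rank r + 1."""
    n_treat = len(samples[0])
    counts = [[0] * n_treat for _ in range(n_treat)]
    for draw in samples:
        order = sorted(range(n_treat), key=lambda t: draw[t])
        for rank, t in enumerate(order):
            counts[t][rank] += 1
    n_draws = len(samples)
    return [[c / n_draws for c in row] for row in counts]

if __name__ == "__main__":
    # Two hypothetical draws for three treatments (index 0 is the reference).
    draws = [[0.0, -1.0, -0.5],
             [0.0, -0.2, -0.9]]
    for row in rank_probabilities(draws):
        print(row)
```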
Although the mean effects dk,m are similar across models, their posterior uncertainty as assessed by the posterior SD depends strongly on the model used. As one would expect, the random effect summaries are much more uncertain than the fixed effect summaries. Nevertheless, in the random effect models, posterior uncertainty is greater in the three-variance model than in the single-variance version, but only for the relapse and discontinuation caused by intolerable side effect outcomes. The reason for this can be seen in the summaries of the between-trial SDs (Table 2). Although the single-variance model produces an average between-trial SD with relatively narrow credible limits, the three-variance model has a particularly low SD for the relative effects on discontinuation for other reasons, and higher SDs with wide credible intervals for the other two outcomes. We have no ready explanation for this apparent difference in between-trial variability between outcomes, which impacts on the uncertainty in the mean treatment effect measures. Alternative uniform (0, 5) priors for the between-trial SD parameters were used in sensitivity analyses (not shown). These result in posterior estimates of σ and σm that were higher, but this had only minor effects on posterior distributions of dk,m.
We have described and illustrated a simple approach to meta-analysis of trials with multiple, mutually exclusive event outcomes. The special feature of such “competing risk” data is the negative correlations between outcomes within trial arms. The Bayesian MCMC framework, particularly with WinBUGS, allows the user to specify a multinomial likelihood along with conventional fixed or random effect models for relative treatment differences.
Almost invariably, when meta-analyses have been applied to competing risk outcomes, they have looked at outcomes one at a time, and have not attempted an analysis that looks at the multiple outcomes simultaneously within a single coherent framework. Methods for competing risk meta-analysis appear not to have been previously described, although recently Trikalinos and Olkin have presented methodology for what they describe as “mutually exclusive binary outcomes”. They propose synthesis of log odds ratios, log relative risk, or risk difference outcomes from multinomial data arising from what is in fact a competing risk situation. Their approach is to take the estimated log odds ratios (or risk differences, or log relative risks) as data, and to develop expressions for the variances and covariances across outcomes based on normal theory. We believe that our approach has several advantages over this scheme. First, the use of the multinomial likelihood avoids the approximations required when the normal likelihood is used, and also avoids the need to add a constant to zero cells, a maneuver that generates bias [19,20]. Second, although calculation of the variances and covariances is relatively easy when there are two competing outcomes, it has the potential to be error prone if there are more than four or five. Similarly, the covariances would become still more complex to specify in multiarm trials, whereas our Bayesian approach can be readily extended [9,11], as shown in Equation 6. A third advantage of the Bayesian simulation framework is that it is suitable both for inference and for use in probabilistic decision models, or within cost-effectiveness analyses. Frequentist methods would need to be supplemented by, for example, bootstrap sampling, in order to be used in such environments.
It is important for users to be aware of the underlying assumptions being made in applications of MTC synthesis. The key assumption is that, if all trials had included all nine treatments, the relative effects $\delta_{j,b,k,m}$ would, for each outcome m, be exchangeable across trials. This is equivalent to assuming that treatments are missing at random, which means that the absence of data on a treatment is independent of the treatment effect. Such assumptions, commonly made in evidence synthesis, are very hard to verify, and one must generally rely on experts with knowledge of the trials and the subject area to confirm their plausibility. This ensemble of trials was considered sufficiently homogeneous to form the basis for a comparative assessment of the eight active treatments in the context of clinical guideline development.
The main advantage of the methods proposed here, however, is that they are based on hazard rates and their ratios, and as a result they can correctly accommodate sets of trials that have been run to different follow-up times, as is the case in the present data set. Methods based on odds ratios, risk ratios, and risk differences, by contrast, cannot give consistent results for trials with different follow-up times, and it would be mathematically impossible to construct survival time distributions that would be compatible with constant measures at different durations. Use of these measures in trials with different follow-up times not only introduces unwanted heterogeneity to the estimates, but will also result in estimates that cannot, strictly speaking, be applied to other follow-up times. The Trikalinos and Olkin proposals could, however, be extended to hazard ratios, by introducing expressions for variances and covariances of log hazard ratios. Although this would yield coherent rate models with comparable estimates to those derived from our approach, it would still suffer from the disadvantages of normal approximations and an explosion of complexity with multiple competing outcomes. These factors, along with the intrinsic compatibility of MCMC posterior sampling with probabilistic decision analysis, make the modeling approach proposed here the obvious choice in cost-effectiveness analysis.
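The dependence of odds ratios on follow-up time is easy to demonstrate. The sketch below assumes a single event type with constant hazards and a fixed hazard ratio between two arms (all values hypothetical), and computes the implied odds ratio at two follow-up times spanning the range seen in the data set (26 and 104 weeks, i.e., 0.5 and 2 years).

```python
import math

def odds_ratio_at(control_hazard, hazard_ratio, years):
    """Odds ratio for the event by `years`, under constant hazards."""
    p_control = 1.0 - math.exp(-control_hazard * years)
    p_treated = 1.0 - math.exp(-control_hazard * hazard_ratio * years)
    odds_control = p_control / (1.0 - p_control)
    odds_treated = p_treated / (1.0 - p_treated)
    return odds_treated / odds_control

if __name__ == "__main__":
    # Hypothetical control hazard of 1 event per person-year, hazard ratio 0.5.
    for years in (0.5, 2.0):
        print(years, odds_ratio_at(1.0, 0.5, years))
```

Although the hazard ratio is the same at both time points, the two odds ratios differ, so pooling odds ratios across trials of different lengths mixes quantities that are not comparable.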
The analyses proposed here can be extended in a number of ways. The purpose of our models is to take into account the negative correlations between outcomes at the patient level. Nevertheless, it may also be worth considering the additional possibility of correlations between outcomes at the trial level. For example, it may be plausible that trials in which treatments are more effective in preventing relapse are also those in which treatments are more likely to lead to intolerable side effects and hence to treatment discontinuation. This suggests a class of extensions that might focus attention on the possibility of correlations between the trial-specific treatment efficacy and discontinuation rates, in which the treatment effects are drawn from an extended multivariate normal distribution. If we write Equation 6 as $\boldsymbol{\delta}_{j,m} \sim \text{MVN}(\mathbf{d}_m, \Sigma_{mm})$, we might consider, for example:
$$\begin{pmatrix} \boldsymbol{\delta}_{j,1} \\ \boldsymbol{\delta}_{j,2} \\ \boldsymbol{\delta}_{j,3} \end{pmatrix} \sim \text{MVN}\!\left( \begin{pmatrix} \mathbf{d}_1 \\ \mathbf{d}_2 \\ \mathbf{d}_3 \end{pmatrix},\ \begin{pmatrix} \Sigma_{11} & \Sigma_{12} & \Sigma_{13} \\ \Sigma_{21} & \Sigma_{22} & \Sigma_{23} \\ \Sigma_{31} & \Sigma_{32} & \Sigma_{33} \end{pmatrix} \right) \tag{11}$$
where the off-diagonal covariance matrices carry terms for correlations between treatment effects on different outcomes. The present model would then be a special case where these correlations were assumed to be zero. Experience with heterogeneous variance models in MTC suggests that very large amounts of data, especially from multiarm trials, would be required to identify the off-diagonal covariances, and that the impact on the posterior distributions of mean treatment effects may be limited. Informative priors could of course be placed on the correlations, although it is no trivial exercise to ensure the matrix is positive definite. A useful parameterization has been suggested by Lu and Ades.
Because of the underlying rate parameterization, it would be possible to incorporate additional data that are reported in the form of number of events and time at risk. It is also relatively easy to incorporate data where outcomes are reported at more than one time point. This is best achieved by conditioning the data for subsequent intervals on survival to the end of the previous interval, in order to achieve independent Bernoulli samples, as illustrated by Lu et al. The WinBUGS code provided in the Appendix (http://www.ispor.org/Publications/value/ViHsupplementary/ViH13i8_Ades.asp) would not need to be altered to accommodate additional data structured in this way.
The existence of data at multiple time points would facilitate further extensions of the competing risk analyses to more complex underlying survival distributions, such as the Weibull or other distributions. This represents a very substantial liberalization of the modeling assumptions. Depending on how much data are available, Weibull shape parameters could be held constant or allowed to vary across trials. Alternatively, a piecewise constant underlying hazard model offers considerable flexibility.
These extensions, of course, all require not only a proportional hazard assumption for relative treatment effects, but also the proportional competing risk assumption described in the introduction. The latter is a strong assumption, but it can be relaxed, for example, in a piece-wise constant hazard framework. This could accommodate the possibility that, for example, the proportional risk of discontinuation caused by side effects could be higher in an initial period. It is also possible to incorporate data from trials that fail to report a subset of end points separately, for example, by aggregating discontinuation caused by side effects with other reasons for discontinuing treatment.
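A piecewise constant hazard framework of the kind mentioned above can be sketched as follows. Within each interval the constant-hazard formula is applied to those still in remission at the start of the interval, and the cause-specific hazards are allowed to change between intervals, so that, for example, the share of failures caused by side effects can be higher early on. The cut points and hazards are hypothetical and the function is an illustration, not part of the published analysis.

```python
import math

def piecewise_incidence(cuts, hazards_by_interval, D):
    """Cumulative incidence of each outcome by time D, and the probability of
    remaining event free, under piecewise constant cause-specific hazards.

    cuts: interval start times, beginning at 0.0 (the last interval is open).
    hazards_by_interval[i]: cause-specific hazards (total > 0) in interval i.
    """
    n_out = len(hazards_by_interval[0])
    incidence = [0.0] * n_out
    surviving = 1.0
    for i, start in enumerate(cuts):
        if start >= D:
            break
        end = cuts[i + 1] if i + 1 < len(cuts) else D
        length = min(end, D) - start
        hz = hazards_by_interval[i]
        total = sum(hz)
        p_event = 1.0 - math.exp(-total * length)
        for m in range(n_out):
            incidence[m] += surviving * hz[m] / total * p_event
        surviving *= math.exp(-total * length)
    return incidence, surviving

if __name__ == "__main__":
    # Hypothetical: side-effect discontinuation (second hazard) is higher
    # during the first half year than afterwards.
    inc, remaining = piecewise_incidence(
        [0.0, 0.5], [[0.4, 0.6, 0.3], [0.4, 0.1, 0.3]], 2.0)
    print(inc, remaining)
```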
Source of financial support: No specific funding was received for this work. A.E.A., S.D., and N.J.W. were supported by Medical Research Council funding to the Health Services Research Collaboration, transferred to University of Bristol.