Indirect comparisons of treatments based on systematic reviews of randomised controlled trials


  • Disclosures
    Steve Edwards and John Borrill are employees of AstraZeneca UK Ltd. Mike Clarke and Sarah Wordsworth are funded by the UK Department of Health and receive no support or funding from AstraZeneca UK Ltd.

Steven J. Edwards,
Kellogg College, University of Oxford, 62 Banbury Road, Oxford OX2 6PN, UK
Tel.: + 44 1865 612 000
Fax: + 44 1865 612 001


Background:  Randomised controlled trials are the most effective way to differentiate between the effects of competing interventions. However, head-to-head studies are unlikely to have been conducted for all competing interventions.

Aim:  Evaluation of different methodologies used to indirectly compare interventions based on meta analyses of randomised controlled trials.

Methods:  Systematic review of Cochrane Database of Systematic Reviews, Cochrane Methodology Register, EMBASE and MEDLINE for reports including meta analyses that contained an indirect comparison. Searching was completed in July 2007. No restriction was placed on language or year of publication.

Results:  Sixty-two papers identified contained indirect comparisons of treatments. Five different methodologies were employed: comparing point estimates (1/62); comparing 95% confidence intervals (26/62); performing statistical tests on summary estimates (8/62); indirect comparison using a single common comparator (20/62); and mixed treatment comparison (MTC) (7/62). The only methodologies that provide an estimate of the difference between the interventions under consideration and a measure of the uncertainty around that estimate are indirect comparison using a single common comparator and MTC. The MTC might have advantages over other approaches because it is not reliant on a single common comparator and can incorporate the results of direct and indirect comparisons into the analysis. Indirect comparisons require an underlying assumption of consistency of evidence. Utilising any of the methodologies when this assumption is not true can produce misleading results.

Conclusions:  Use of either indirect comparison using a common comparator or MTC provides estimates for use in decision making, with the preferred methodology being dependent on the available data.

Review Criteria

  • A systematic review of the following bibliographic databases was carried out for papers and abstracts:
    • (i) Cochrane Database of Systematic Reviews;
    • (ii) Cochrane Methodology Register;
    • (iii) Excerpta Medica Database;
    • (iv) Index Medicus database.
  • The methodologies for the different indirect comparisons performed were assessed and evaluated. They were then reported in distinct categories to highlight the strengths and weaknesses of the different approaches.

Message for the Clinic

  • In the absence of direct comparison in randomised controlled trials, indirect comparison can be performed to assess the relative effect of different treatment interventions.
  • The interpretation of the results from indirect comparisons is dependent on the methodology employed. Only an adjusted indirect comparison using a single common comparator and a mixed treatment comparison provide an estimate of the difference between treatments and a measure of the uncertainty around that estimate.


The randomised controlled trial has been established as the most reliable way to differentiate the effects of healthcare interventions (1) and for assessing their cost-effectiveness (2). This is because randomised controlled trials are designed to minimise bias by techniques such as: randomised allocation to the different interventions (which also provides the different intervention groups with similar patient populations) and, where appropriate, blinding clinicians, patients, data analysts and others so that they are unaware of the intervention used (3).

Randomised controlled trials also need to be adequately powered to detect the minimum clinically meaningful difference between the interventions under examination (1). If a trial is executed poorly, the results can be significantly confounded. For example, inadequate random allocation to interventions has been found to increase the odds ratio between treatments by up to 41% (3), inadequate blinding has been found to increase the odds ratio between interventions by 17% (3), and underpowering a trial decreases the likelihood of identifying a clinically relevant difference (1).

The challenges of conducting clinical trials are increasing as new interventions seek to demonstrate a clinically meaningful benefit over established interventions rather than placebo and as the regulations governing trials increase. This can also add to the costs of doing a randomised controlled trial and, in 2004, the estimated cost of bringing a new drug to market, when the cost of failed research is factored in, was $1.7 billion (4). As such, it is unlikely that all of the interventions for a particular condition have been, or ever will be, compared within randomised controlled trials.

However, Health Technology Assessments conducted by bodies responsible for providing national guidance on the prevention and treatment of medical conditions, such as the National Institute for Health and Clinical Excellence (5), often have to make recommendations despite this absence of data.

One pragmatic solution for this problem might be the use of indirect comparisons of interventions using the currently available data supplemented by new randomised controlled trials if and when they occur. The research reported in this paper was conducted to identify what approaches are currently used to perform an indirect comparison of the effects of treatments and to assess the strengths and weaknesses of the different methodologies.


The protocol for this systematic review was developed following guidelines from the Centre for Reviews and Dissemination (6).

Searching for evidence

This systematic review was designed to review the different methodologies for indirect comparisons of treatments and evaluate how they have been applied in the published literature.

Searching for papers

The following bibliographic databases were searched for papers and abstracts:

  • the Cochrane Database of Systematic Reviews (CDSR);
  • the Cochrane Methodology Register (CMR);
  • the Excerpta Medica Database (EMBASE);
  • the Index Medicus database (MEDLINE).

The search strategies employed for the CDSR, EMBASE and MEDLINE had to be tailored to accommodate the functionality of each individual database and included search terms for ‘clinical trials’, ‘meta analysis’ and ‘indirect comparison’. The CMR was searched using the broad free text search terms ‘indirect comparison’, ‘network analysis’ and ‘mixed treatment comparison’.

No restriction was made on publication year or on the language of the papers identified, beyond the inherent English-language focus of the databases used (7).

Hand searching of journals was not carried out as the methodological journals identified as potentially containing relevant publications (e.g. Statistics in Medicine, Controlled Clinical Trials and Journal of Clinical Epidemiology) have already been hand searched with relevant articles incorporated in the CMR.

As we were investigating how different methodologies have been employed, we focused our efforts on full papers only. If conference abstracts were uncovered by the literature search, we contacted the authors for additional details. The reference lists in papers uncovered in the literature search were examined for any additional references. All database searching was completed by July 2007.

Inclusion criteria

Systematic reviews suitable for inclusion were defined as:

  • papers including a meta analysis of randomised controlled trials;
  • indirect comparison of treatments.

Methodological papers suitable for inclusion were defined as:

  • methods for comparing treatments in the absence of direct comparisons of randomised controlled trials;
  • methods for identifying and assessing bias in indirect comparisons.

The list of paper titles and abstracts obtained from the searches was assessed against these criteria and only those that met the criteria (or if it was unclear if they met the criteria) were obtained and assessed in full. This was performed independently by two authors (SJE and JB) with any differences in opinion discussed.


Report flow

The results of implementing the search strategies in CDSR, EMBASE and MEDLINE produced 1018 abstracts for initial review (Figure 1). Duplicate papers and papers clearly not relevant were excluded at this stage. A total of 105 papers were retrieved in full.

Figure 1. Flow of papers through the systematic review

The search of the CMR produced 43 records, of which 28 were deemed relevant; 11/28 were duplicates of papers found in the CDSR, EMBASE and MEDLINE literature search and 17/28 were new. Of the new papers, 12/17 were conference abstracts and 5/17 were full papers. We contacted the authors of all the abstracts identified to determine if full publications were available. We were told that a full publication was underway for four of the abstracts (8–11) and that the abstract was the only available publication in one case (12); we did not receive a response for the other abstracts. Of the five full papers identified in the CMR only, four were systematic reviews and one was a methodological paper.

Methods of conducting indirect comparisons

Table 1 shows the 62 papers detailing systematic reviews of randomised controlled trials that included some form of indirect comparison.

Table 1. Indirect comparisons of treatments based on systematic reviews of randomised controlled trials

References | Condition/Procedure | Outcome | Treatments compared | Number of trials used in indirect comparison
Pignon et al. (13) | Small cell lung cancer | Mortality | Chemotherapy vs. radiotherapy | 27
Gøtzsche (14) | Rheumatoid arthritis | Tender joints | NSAIDs vs. placebo | 20
Arriagada et al. (15) | Rheumatoid arthritis | Mean changes in WOMAC pain scores | Cox-2 inhibitors vs. placebo | 21
Lowenthal and Buyse (16) | Secondary prevention of stroke following TIA and/or stroke | Mortality | Antiplatelet agents vs. placebo | 9
Matcher et al. (17) | Non-valvular atrial fibrillation | Stroke prevention | Antiplatelet or anticoagulant agent vs. placebo | 7
Gould et al. (18) | Panic disorder | Effect size | Antidepressants vs. placebo | 18
Lefering and Neugebauer (19) | Sepsis and septic shock | Mortality | Corticosteroids vs. placebo | 8
NSCLC Collaborative Group (20) | Early non-small cell lung cancer | Mortality | Chemotherapy agents plus surgery vs. surgery alone | 17
Piccinelli et al. (21) | Obsessive compulsive disorder | Effect size | Neurotransmitter re-uptake inhibitors vs. placebo | 18
Poynard et al. (22) | Duodenal ulcer | Endoscopic healing at 4 weeks | H2-receptor antagonists vs. lansoprazole | 5
Rossouw (23) | Coronary artery disease | Angiographic evidence of change | Drug therapy, lifestyle or surgery vs. placebo | 15
Tramér et al. (24) | Paediatric strabismus surgery | Absence of vomiting | Anti-emetic vs. control | 11
Boersma et al. (25) | Acute myocardial infarction | Mortality | Timing of fibrinolytic therapy vs. placebo | 22
Koch et al. (26) | Osteo- or rheumatoid arthritis | Short-term prevention of gastric lesion | Histamine type 2 antagonist or misoprostol vs. placebo | 12
Leizorovicz (27) | Deep vein thrombosis | Mortality | Low-molecular weight heparin vs. unfractionated heparin (at home or in hospital) | 20
Poynard et al. (28) | Viral hepatitis C | Complete ALT response | Timing of interferon treatment vs. control | 13
Zalcberg et al. (29) | Colorectal cancer | Mortality | Dose of 5-fluorouracil iv in 3 months or oral formulation vs. control | 17
Zhang and Po (30) | Postoperative acute pain | Total pain relief | Paracetamol alone or in combination with caffeine or codeine vs. placebo | 103
Bucher et al. (31) | Pneumonia in HIV patients | Prevention of pneumonia | Prophylactic regimens vs. aerosolised pentamidine | 13
Po and Zhang (32) | Postsurgical pain, arthritis and musculoskeletal pain | Sum of the pain intensity difference | Paracetamol alone or in combination with dextropropoxyphene vs. placebo | 26
Moore and McQuay (33) | Postoperative patients | Total pain relief | Acetaminophen, aspirin, codeine, propoxyphene or tramadol vs. placebo | 34
Moore et al. (34) | Postoperative patients | Total pain relief | Paracetamol alone or in combination with codeine vs. placebo | 76
Srisurapanont and Maneeton (35) | Schizophrenia | Response rates | Atypical antipsychotic agents vs. placebo | 9
Einarson et al. (36) | Open-angle glaucoma | Intraocular pressure | Latanoprost or brimonidine eye drops vs. betaxolol or brimonidine | 9
van der Heijden et al. (37) | Symptomatic venous thromboembolism | Recurrent VTE and major bleeds | Low molecular weight heparins vs. unfractionated heparin | 13 VTE/15 MB
Horn and Limburg (38) | Acute ischaemic stroke | Death or dependency | Nimodipine vs. placebo | 23
Song et al. (39) | Surgical wound infection | Eradication | Antibiotics vs. co-amoxiclav | 2
Otto et al. (40) | Panic disorder with or without agoraphobia | Effect size | Serotonin selective uptake inhibitors vs. placebo | 12
Sauriol et al. (41) | Schizophrenia | BPRS total score | Atypical antipsychotic agents vs. haloperidol | 11
ATC (42) | Vascular disease | Serious vascular events in high-risk patients | Antiplatelet agents vs. placebo | 214
Ferrari et al. (43) | Migraine | Headache response | Triptans vs. placebo | 73
Coomarasamy et al. (44) | Tocolysis in preterm labour | Neonatal respiratory distress syndrome | Atosiban or nifedipine vs. β-agonist | 10
Hind et al. (45) | Central venous catheterisation (internal jugular vein or subclavian vein) | Failed catheter placements | Ultrasound guidance methods vs. landmark method or surgical cut-down procedure | 12 IJV/4 SCV
Hochberg et al. (46) | Rheumatoid arthritis | Clinical improvement | Biologic agents vs. placebo | 4
Lim et al. (47) | Vein grafts | Occlusion of vein graft | Low- or medium-dose aspirin vs. placebo | 5
Psaty et al. (48) | Hypertension | Coronary heart disease | Antihypertensive agents vs. placebo or β-blocker or calcium channel blocker or low-dose diuretic | 47
Rice and Stead (49) | Smoking | Smoking cessation | High or low intensity nurse intervention vs. usual care | 20
Wehren et al. (50) | Osteoarthritis | Vertebral fractures | Anti-resorptive agents vs. placebo | 29
Ballesteros (51) | Dysthymia | Treatment response | Tricyclic antidepressants or selective serotonin inhibitors or monoamine oxidase inhibitors vs. placebo | 15
Berner et al. (52) | Erectile dysfunction | Erectile dysfunction domain score | PDE5 inhibitors vs. placebo | 14
Caldwell et al. (53) | Acute myocardial infarction | Mortality | Thrombolytic therapies vs. streptokinase or alteplase | 37
Dodwell and Vergote (54) | Postmenopausal women with advanced breast cancer (who have progressed with tamoxifen) | Time to progression | Aromatase inhibitors vs. megestrol acetate | 4
Mandema et al. (55) | Migraine | Pain relief | Triptans vs. placebo | 27
Otoul et al. (56) | Patients with drug-resistant partial epilepsy not controlled by ≥ 1 antiepileptic drugs | Seizure reduction | Antiepileptic agents vs. placebo | 36
Richy et al. (57) | Osteoporosis | Bone mineral density | Vitamin D and its two analogues vs. placebo | 33
Abou-Setta (58) | Embryo transfer | Pregnancy rate | Firm embryo transfer catheters vs. tight difficult transfer catheter | 2
Chou et al. (59) | HIV infection | Death or disease progression | Triple therapies vs. dual therapy | 24
Cooper et al. (60) | Non-valvular atrial fibrillation | Ischaemic strokes | Anticoagulant or antiplatelet therapy vs. placebo or warfarin | 30
Davies et al. (61) | Elective non-urgent surgery | Transfusion with allogeneic blood | Allogeneic transfusion or recombinant human erythropoietin vs. usual care | 39
Eckert and Lançon (62) | Major depressive disorder | Change in depression scale score | Antidepression agents vs. placebo | 39
Gartlehner et al. (63) | Rheumatoid arthritis | Clinical improvement | Biologic agents vs. placebo | 17
Jensen (64) | Type 2 diabetes mellitus | Change from baseline in HbA1c | Self-monitoring of blood or urine glucose vs. no self-monitoring | 14
Kyrgiou et al. (65) | Ovarian cancer | Survival | Platinum- or taxane-based chemotherapy vs. non-platinum or non-taxane-based chemotherapy | 59
Lip and Edwards (66) | Non-valvular atrial fibrillation | Ischaemic strokes or systemic embolism | Anticoagulant or antiplatelet therapy vs. warfarin | 17
Purkayastha et al. (67) | Colorectal cancer | Diagnostic odds ratio | Computed or magnetic resonance colonography vs. conventional colonoscopy | 19
Small et al. (68) | Solid organ transplants | Prevention of cytomegalovirus disease | Pre-emptive or universal prophylaxis vs. standard care | 25
Stettler et al. (69) | Artery stenosis | In-stent restenosis | Drug-eluting stents vs. bare metal stents | 10
Zhou et al. (70) | Cardiovascular disease | Major coronary events | Statins vs. placebo | 8
Elliott and Meyer (71) | Hypertension | Incident diabetes | Antihypertensive agents vs. placebo or β-blockers or calcium channel blockers or diuretics | 32
Kamphuisen and Agnelli (72) | Acute ischaemic stroke | Deep vein thromboembolism prophylaxis | Low-molecular-weight heparin or unfractionated heparin vs. placebo | 16
Nixon et al. (73) | Rheumatoid arthritis | Clinical improvement | Biologic agents vs. placebo | 13
Yazdanpanah et al. (74) | AIDS | Progression to a new AIDS defining disease or death | Triple therapies vs. dual therapy | 14

Abbreviations: TIA, transient ischaemic attack; VTE, venous thromboembolism; IJV, internal jugular vein; SCV, subclavian vein; BPRS, brief psychiatric rating scale; WOMAC, Western Ontario and McMaster Universities index; ALT, alanine aminotransferase; MB, major bleeds; NSAID, non-steroidal anti-inflammatory drugs; Cox, cyclooxygenase; PDE, phosphodiesterase.

The methods described for conducting indirect comparison of treatments fell into five categories: comparing point estimates; comparing 95% confidence intervals (CIs); performing a statistical test of summary estimates; indirect comparison using a single common comparator; and mixed treatment comparison (MTC) of a network of connected trials.

In four of the papers identified, the method of indirect comparison was based on two previously published systematic reviews (16,51,53,56). Different researchers conducted the two systematic reviews, which could be a potential source of bias, as it is unlikely that the two research groups would have followed a consistent approach to trial selection, data extraction and meta analysis.

The five different approaches to indirect comparisons of treatments are explained in more detail below with examples drawn from the most recent published use of each particular method.

Comparing point estimates

One paper (1/62) adopted the approach of directly comparing the mean summary estimates from the meta analyses with a common comparator without performing any additional statistical assessment (37). The paper is used as an example of this methodology below:

Example: van der Heijden et al. (37) compared the harm/benefit profile of low-molecular weight heparins (LMWH) using unfractionated heparin (UFH) as a common comparator in the initial treatment of symptomatic venous thromboembolism (VTE). Recurrent VTE during 3 months of follow up was compared with major bleeds (MB) during the same 3 months of treatment.

Randomised controlled trials were identified comparing certoparin with UFH (one trial – VTE and MB, 538 patients), dalteparin with UFH (two trials – VTE, 452 patients; three trials – MB, 705 patients), enoxaparin with UFH (three trials – VTE and MB, 1034 patients), nadroparin with UFH (three trials – VTE, 716 patients; four trials – MB, 882 patients), reviparin with UFH (two trials – VTE and MB, 1784 patients) and tinzaparin with UFH (two trials – VTE and MB, 1044 patients). No trials were identified comparing the different LMWHs with one another. Meta analyses provided the following summary effect estimates (log odds ratios): certoparin (VTE −0.456, MB −0.398), dalteparin (VTE 0.483, MB −0.824), enoxaparin (VTE −0.137, MB −0.057), nadroparin (VTE −0.237, MB −0.387), reviparin (VTE −0.143, MB 0.041) and tinzaparin (VTE −0.268, MB −0.523).

Log odds ratio point estimates of both outcomes were plotted on a scatterplot, and the harm/benefit profile of each LMWH was assessed visually. While a systematic review of the different LMWHs compared with UFH was conducted and enough information was provided to enable calculation of the summary log odds ratios, the scatterplot was based on the results from the individual randomised controlled trials rather than the summary estimates of the meta analyses.

This approach to indirect comparison is flawed as it is based on individual mean values from randomised controlled trials with no attempt to weight studies (e.g. by the size of the trial). Even if the scatterplot had been based on summary estimates, there was no indication of how it would take into account the uncertainty around the point estimates used. A visual evaluation is a subjective assessment, which could be more prone to observer bias than a formal statistical approach. As an approach to an assessment of harm/benefit, it is potentially flawed by the implicit underlying assumption that the outcomes assessed as a benefit and a harm have equal weight (i.e. they can be traded off against each other in a 1 : 1 relationship).
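To illustrate what weighting studies involves, the following is a minimal sketch of inverse-variance (fixed-effect) pooling of log odds ratios. It is not the analysis performed by van der Heijden et al.; the function name `pool_fixed_effect` and the three trial results are hypothetical and chosen only to show how a weighted summary can differ from an unweighted mean of trial results.

```python
import math


def pool_fixed_effect(estimates):
    """Inverse-variance (fixed-effect) pooling of log odds ratios.

    `estimates` is a list of (log_odds_ratio, variance) pairs, one per trial.
    Returns the pooled log odds ratio and its standard error.
    """
    weights = [1.0 / variance for _, variance in estimates]
    weighted_sum = sum(w * log_or for w, (log_or, _) in zip(weights, estimates))
    pooled = weighted_sum / sum(weights)
    standard_error = math.sqrt(1.0 / sum(weights))
    return pooled, standard_error


# Hypothetical trials (NOT taken from any paper in this review):
# each entry is (log odds ratio, variance of the log odds ratio).
hypothetical_trials = [(-0.5, 0.04), (-0.1, 0.01), (-0.3, 0.09)]

pooled_log_or, pooled_se = pool_fixed_effect(hypothetical_trials)
unweighted_mean = sum(y for y, _ in hypothetical_trials) / len(hypothetical_trials)

print(f"Pooled log OR: {pooled_log_or:.3f} (SE {pooled_se:.3f})")
print(f"Unweighted mean of trial log ORs: {unweighted_mean:.3f}")
```

In this made-up dataset the most precise trial dominates the pooled estimate, which is why the weighted summary sits closer to its result than the unweighted mean does.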

Comparing 95% confidence intervals

The approach of comparing 95% CIs was taken in 26/62 of the papers identified (13,15,17,19–28,30,32–35,38,42,43,49,54,61,66,72). It is simply a case of directly comparing the 95% CIs of summary estimates using a common comparator.

Example: Kamphuisen and Agnelli (72) conducted research into the effects of UFH and LMWH at low and high doses on the prophylaxis of deep vein thrombosis (DVT). Randomised controlled trials were identified comparing low-dose UFH vs. placebo (four trials, 533 patients), low-dose LMWH vs. placebo (four trials, 381 patients) and high-dose LMWH vs. placebo (four trials, 1570 patients). No trials were identified comparing high-dose UFH vs. placebo or comparing any of the active treatments with one another. Meta analyses provided the following summary effect estimates (odds ratios): low-dose UFH (0.17, 95% CI: 0.11–0.26); low-dose LMWH (0.34, 95% CI: 0.19–0.59) and high-dose LMWH (0.07, 95% CI: 0.02–0.29). All the active treatments were found to significantly reduce the risk of DVT compared with placebo but, as the 95% CIs for all three comparisons overlap with each other, there is no evidence that any one treatment is more effective.

In the absence of direct comparison of treatments in randomised controlled trials, this would appear to be a reasonable approach, based on a standard pair-wise meta analysis from a systematic review of the literature. It provides a dichotomous answer: either ‘yes’, the 95% CIs do not overlap and there is a significant difference in treatment effects, or ‘no’, there is no evidence of a difference in treatment effects. However, it may provide erroneous answers. While it is correct to assume that treatments are significantly different, at least at the 5% significance level, if the 95% CIs do not overlap, it is not always the case that treatments are not significantly different when the 95% CIs do overlap (75). This is because comparing distributions is not the same as testing for statistically significant differences between two mean values. In addition, this approach does not provide an effect estimate for the different treatments or a measure of the level of uncertainty around that estimate.
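The point that overlapping 95% CIs do not rule out a significant difference can be checked numerically. The sketch below takes the two LMWH summary odds ratios quoted in the example above and applies a simple z-test to the difference in log odds ratios. This test is an illustration added here and was not part of the analysis by Kamphuisen and Agnelli. It assumes that the estimates are normally distributed on the log scale, that the two estimates are independent (although both share the placebo comparator), and that the standard errors can be back-calculated from the rounded CIs; the function names are hypothetical.

```python
import math
from statistics import NormalDist

Z95 = NormalDist().inv_cdf(0.975)  # two-sided 95% critical value, about 1.96


def log_se_from_ci(lower, upper):
    """Back-calculate the standard error of a log ratio from its 95% CI."""
    return (math.log(upper) - math.log(lower)) / (2 * Z95)


def intervals_overlap(ci_1, ci_2):
    """Return True if two (lower, upper) intervals share any values."""
    return ci_1[0] <= ci_2[1] and ci_2[0] <= ci_1[1]


def z_test_of_ratios(estimate_1, ci_1, estimate_2, ci_2):
    """Two-sided z-test of the difference between two independent log ratios."""
    difference = math.log(estimate_1) - math.log(estimate_2)
    se = math.sqrt(log_se_from_ci(*ci_1) ** 2 + log_se_from_ci(*ci_2) ** 2)
    z_value = difference / se
    p_value = 2 * (1 - NormalDist().cdf(abs(z_value)))
    return z_value, p_value


# Summary odds ratios vs. placebo as quoted in the example (rounded values).
low_dose_lmwh = (0.34, (0.19, 0.59))
high_dose_lmwh = (0.07, (0.02, 0.29))

overlap = intervals_overlap(low_dose_lmwh[1], high_dose_lmwh[1])
z, p = z_test_of_ratios(low_dose_lmwh[0], low_dose_lmwh[1],
                        high_dose_lmwh[0], high_dose_lmwh[1])

print(f"95% CIs overlap: {overlap}")
print(f"z = {z:.2f}, two-sided p = {p:.3f}")
```

Under these assumptions the two intervals overlap, yet the z-test gives a two-sided p-value of roughly 0.03, which is the kind of discrepancy described above. Because the inputs are rounded, the figure should be read as indicative only.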

Statistical test of summary estimates

The approach of performing a statistical test of the summary estimates was taken in 8/62 of the papers identified (14,16,18,29,36,40,57,68). Here a statistical test of the two summary effect estimates is carried out which results in a p-value, where p < 0.05 is considered statistically significant. The most common statistical tests were parametric tests that made an assumption about the distribution of the datasets, i.e. a t-test (14,18,29,40,57,68) or a z-test (36) of summary effect estimates. There was one example of a non-parametric test using the chi-squared statistic (16). Non-parametric tests have the advantage of not assuming an underlying distribution of the dataset but tend to have less power efficiency than parametric tests (76) (i.e. a non-parametric test requires a larger dataset to have the same statistical power as a parametric test).

Example: Small et al. (68) conducted research into the prevention of cytomegalovirus (CMV) disease in postorgan transplant patients using ganciclovir as universal prophylaxis or pre-emptive treatment. Randomised controlled trials were identified comparing universal prophylaxis vs. standard care (16 trials, 1509 patients) and pre-emptive treatment vs. standard care (nine trials, 457 patients). The definition of standard care was taken from the Canadian Society of Transplantation (77). A single trial was identified directly comparing universal prophylaxis with pre-emptive treatment (78) but this was not incorporated into the analysis. Meta analyses provided the following summary effect estimates [relative risks (RR)]: universal prophylaxis (0.49, 95% CI: 0.39–0.60); and pre-emptive treatment (0.30, 95% CI: 0.15–0.60). Both methods were found to significantly reduce the risk of CMV disease. The results were assessed using Student’s t-test: universal prophylaxis vs. pre-emptive treatment (p < 0.07). The analysis, therefore, found no evidence of significant differences in treatment effects.

Similar to the 95% CI approach, the statistical test approach appears to be reasonable in the absence of direct comparison of treatments in randomised controlled trials. However, it provides a similar dichotomous answer – either the difference is statistically significant or it is not. It does have the advantage of not producing the potentially misleading results that would arise if overlapping CIs were assumed to demonstrate no significant difference in treatments.

Indirect comparison using a single common comparator

In 20/62 papers, a formal indirect comparison using a single common comparator was used that produced a point estimate and some measure of uncertainty (95% CIs, p-values or both) of the indirect comparison of interest (31,39,41,44–47,50–52,55,56,58,59,62,63,67,69,70,74).

While several researchers have published details of the methods of this approach (79–82) it has changed little from the original publication by Bucher et al. (31).

That is, consider a situation with three treatments A, B and C, where A and B have been compared in randomised controlled trials and B and C have been compared in other randomised controlled trials. The log RR of A compared with C can be calculated by:

\[ \log RR_{AC} = \log RR_{AB} - \log RR_{CB} \]

The exponential of log RRAC will give the RR for A vs. C. If the RRAB and RRCB are independent of one another (as they would be if they are derived from different sets of trials), the variance of log RRAC can be calculated by:

\[ \operatorname{Var}(\log RR_{AC}) = \operatorname{Var}(\log RR_{AB}) + \operatorname{Var}(\log RR_{CB}) \]

This variance is on the log scale. It can then be used to calculate the 95% CI for log RRAC, the limits of which are exponentiated to give the 95% CI for the point estimate, RRAC.

Example: Yazdanpanah et al. (74) investigated the efficacy of antiretroviral combination therapy based on protease inhibitors (PI) or non-nucleoside analogue reverse transcriptase inhibitors (NNRTI) in the prevention of progression to a new AIDS defining disease or death. Randomised controlled trials were identified comparing PI-based triple therapy vs. PI-based dual therapy (seven trials, 4686 patients) and NNRTI-based triple therapy vs. PI-based dual therapy (seven trials, 2099 patients). No trials were identified that directly compared the two triple therapy strategies. Meta analyses provided the following summary effect estimates (odds ratios): PI-based triple therapy (PI-bt; 0.49, 95% CI: 0.41–0.58); and NNRTI-based triple therapy (NNRTI-bt; 0.90, 95% CI: 0.71–1.15). PI-based triple therapy was found to significantly reduce the risk of progression to a new AIDS defining disease or death compared with PI-based dual therapy (PI-bd); the 95% CI for NNRTI-based triple therapy includes 1, so the reduction seen with that strategy is not statistically significant at the 5% level. The indirect comparison was carried out as follows:


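\[ \log OR_{\text{PI-bt vs. NNRTI-bt}} = \log OR_{\text{PI-bt vs. PI-bd}} - \log OR_{\text{NNRTI-bt vs. PI-bd}} = -0.618 \]

The ORPI-bt vs. NNRTI-bt is the exponential of −0.618, which is 0.539.

The variance of the log ORPI-bt vs. NNRTI-bt was calculated as:

\[ \operatorname{Var}(\log OR_{\text{PI-bt vs. NNRTI-bt}}) = \operatorname{Var}(\log OR_{\text{PI-bt vs. PI-bd}}) + \operatorname{Var}(\log OR_{\text{NNRTI-bt vs. PI-bd}}) \]

The 95% CI needs to be calculated using the log variance before the exponential is taken to convert back to the natural scale.

Therefore, the 95% CI of the log ORPI-bt vs. NNRTI-bt is given by:

\[ \log OR_{\text{PI-bt vs. NNRTI-bt}} \pm 1.96 \times SE(\log OR_{\text{PI-bt vs. NNRTI-bt}}) \]

where the standard error of the log ORPI-bt vs. NNRTI-bt is estimated by the square root of the variance of the log ORPI-bt vs. NNRTI-bt.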

The exponential of which gives the OR 95% CI, ranging from 0.400 to 0.726. The detail of the preceding calculations was not provided in the published paper but was recreated from the raw data presented.
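The calculation described above can be sketched in a few lines of code. The following is an illustrative implementation of an adjusted indirect comparison through a single common comparator, not code from any of the papers reviewed. It assumes that each log-scale variance can be back-calculated from the reported 95% CI under a normal approximation; the function names are hypothetical. Because it starts from the rounded summary odds ratios quoted in the example, its output (approximately 0.54, 95% CI: 0.40–0.73) differs slightly from the 0.539 (0.400 to 0.726) recreated from the raw data.

```python
import math
from statistics import NormalDist

Z95 = NormalDist().inv_cdf(0.975)  # two-sided 95% critical value, about 1.96


def log_variance_from_ci(lower, upper):
    """Variance of a log ratio, back-calculated from its 95% CI (assumption)."""
    standard_error = (math.log(upper) - math.log(lower)) / (2 * Z95)
    return standard_error ** 2


def indirect_comparison(estimate_ab, ci_ab, estimate_cb, ci_cb):
    """Indirect estimate of A vs. C through the common comparator B.

    Works on the log scale: the log ratios are subtracted, the variances are
    added, and the result is exponentiated back to the natural scale.
    Returns (point estimate, lower 95% limit, upper 95% limit).
    """
    log_ac = math.log(estimate_ab) - math.log(estimate_cb)
    variance_ac = log_variance_from_ci(*ci_ab) + log_variance_from_ci(*ci_cb)
    half_width = Z95 * math.sqrt(variance_ac)
    return (math.exp(log_ac),
            math.exp(log_ac - half_width),
            math.exp(log_ac + half_width))


# Rounded summary odds ratios vs. PI-based dual therapy, as quoted above.
or_indirect, ci_lower, ci_upper = indirect_comparison(
    0.49, (0.41, 0.58),   # PI-based triple therapy
    0.90, (0.71, 1.15),   # NNRTI-based triple therapy
)

print(f"Indirect OR: {or_indirect:.3f} (95% CI: {ci_lower:.3f} to {ci_upper:.3f})")
```

Working on the log scale throughout and exponentiating only at the end keeps the interval asymmetric around the point estimate on the natural scale, as expected for a ratio measure.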

The ‘Bucher method’ as it has become known, addresses the problems highlighted for the other methods for indirect comparisons, by providing a summary estimate and 95% CIs for the difference between the treatments being compared. Its limitations are that it can only incorporate a single indirect comparator and cannot integrate any direct comparative data that may be available from randomised controlled trials of the comparison of interest.

Mixed treatment comparison of a network of connected trials

The remaining 7/62 papers identified used an approach called either an MTC or a network meta analysis (48,53,60,64,65,71,73). For consistency, this method is referred to in this paper as MTC. The MTC has evolved from the Confidence Profile Method (CPM), which was advocated by Eddy et al. (83) for use in indirect comparisons of treatments as early as 1990. The CPM is a Bayesian approach to statistical inference.

Bayesian statistical inference has its roots in Bayes’ theorem, which is the formalisation of the work by Thomas Bayes, published after his death in 1763 (84). Bayes’ theorem for general quantities can be summarised as follows:

New data ‘y’ is available on the outcome of interest ‘θ’. The previous evidence suggests that the outcome has a probability of occurring of ‘p(θ)’ (called the ‘prior probability’) and a likelihood of y occurring given the previous evidence of ‘p(y|θ)’. These values can be used to calculate an updated probability of the outcome occurring of ‘p(θ|y)’ (called the ‘posterior probability’).


\[ p(\theta \mid y) \propto p(y \mid \theta) \, p(\theta) \]

That is, the posterior probability is directly proportional to the likelihood of the observed data multiplied by the prior probability. In other words, the probability of the outcome occurring is ‘updated’ based on the new evidence.
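The updating described above can be illustrated with a small numerical sketch. The example below evaluates the prior multiplied by the likelihood over a grid of candidate values of θ and rescales the result so that it sums to one. It is not taken from any of the papers reviewed: the data (7 responders out of 20 patients), the binomial likelihood, the flat prior and the function name `grid_posterior` are all assumptions made for illustration.

```python
def grid_posterior(successes, trials, prior, grid):
    """Posterior over a grid of theta values for binomial data.

    Each grid point gets prior * likelihood, and the results are rescaled
    to sum to one (the binomial coefficient cancels in the rescaling).
    """
    unnormalised = [
        (theta ** successes) * ((1 - theta) ** (trials - successes)) * weight
        for theta, weight in zip(grid, prior)
    ]
    total = sum(unnormalised)
    return [value / total for value in unnormalised]


# Candidate values of theta from 0.00 to 1.00 in steps of 0.01.
grid = [i / 100 for i in range(101)]

# A flat prior: every candidate value is equally plausible before the data.
flat_prior = [1.0] * len(grid)

# Hypothetical data: 7 responders out of 20 patients.
posterior = grid_posterior(7, 20, flat_prior, grid)
posterior_mode = grid[posterior.index(max(posterior))]

print(f"Most probable theta on the grid: {posterior_mode:.2f}")
```

With a flat prior the posterior is driven entirely by the data, so its peak sits at the observed proportion (7/20 = 0.35). Replacing `flat_prior` with weights that favour particular values of θ would pull the posterior towards them.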

Bayesian statistical inference requires considerably more computer processing power than standard statistical methods (often termed ‘frequentist’ or classical methods) (75) and this may be why the work carried out by Eddy et al. has only become more widely used in recent years when technological advancements have made processing power less of a barrier to implementing the techniques.

The CPM is different from other forms of meta analysis because it allows for the combination of data to estimate a particular outcome from a multitude of different sources, e.g. not only incorporating direct comparisons of treatments in randomised controlled trials but also indirect comparisons from other randomised controlled trials. It can also accommodate other forms of data about interventions (e.g. from observational studies) and, in theory, could account for any methodological issues within trials (e.g. inappropriate randomisation). However, as the appropriate weighting to give to other forms of data or to correct for a high risk of bias is unknown, these supplementary sources are rarely used in practice. Similarly, the difficulty in establishing credible informed priors has resulted in most published Bayesian analyses using uninformed priors (also referred to as ‘flat’ priors). For example, an informed prior could be based on existing observational data for an analysis of randomised controlled trials – however, this would have similar problems as including the observational data in the analysis (i.e. how much influence should observational data have once results from randomised controlled trials are available?). An uninformed prior assumes that there is no existing data on which to base a prior distribution and so allows the results of the analysis to be based on the data used in the analysis.

The MTCs are a formalisation of the CPM approach to provide a series of pair-wise comparisons between treatments in a network of connected treatments with a measure of the heterogeneity contained within the pair-wise comparisons and a formal assessment of the heterogeneity (or incoherence) contained within the evidence from different pairs of treatments (85).

The limitation of the approach taken by Lumley (85) is that it can only accommodate two-arm trials. Further research by Ades et al. has provided a generalised framework for the MTC for trials containing any number of treatment groups with adjustments made for multi-arm trials within the analysis (53,86,87).

The most commonly used form of the MTC uses a Bayesian Markov chain Monte Carlo method (53), implemented in the freely available WinBUGS software (88). As Bayesian statistical inference provides the probability that a parameter will take a certain value, results are presented with 95% credible intervals (95% CrI) rather than 95% CIs. In this context, a 95% CrI is the range within which the parameter of interest lies with 95% probability, given the data and the prior.
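As a minimal sketch of how a 95% CrI can be read off Markov chain Monte Carlo output, the Python below takes the 2.5th and 97.5th percentiles of a set of posterior draws. The function name `equal_tailed_cri`, the nearest-rank percentile rule and the simulated draws are all assumptions made for illustration; they are not taken from WinBUGS or from any of the reviewed papers.

```python
import random


def equal_tailed_cri(draws, level=0.95):
    """Equal-tailed credible interval from posterior draws (nearest-rank rule)."""
    if not draws:
        raise ValueError("need at least one posterior draw")
    ordered = sorted(draws)
    tail = (1.0 - level) / 2.0
    last = len(ordered) - 1
    lower = ordered[round(tail * last)]
    upper = ordered[round((1.0 - tail) * last)]
    return lower, upper


# Hypothetical posterior draws of a log odds ratio (simulated, not trial data).
rng = random.Random(2007)
log_or_draws = [rng.gauss(-0.02, 0.35) for _ in range(20000)]

lo, hi = equal_tailed_cri(log_or_draws)
# Share of posterior draws that fall inside the interval (close to 0.95).
inside = sum(lo <= d <= hi for d in log_or_draws) / len(log_or_draws)
print(f"95% CrI for log OR: {lo:.2f} to {hi:.2f}; share of draws inside: {inside:.3f}")
```

The interval is a direct probability statement about the parameter: about 95% of the posterior draws lie between its limits.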

Within an MTC, there are two likely sources of heterogeneity: between-trial heterogeneity within pair-wise comparisons (which is a similar measure to heterogeneity in a standard pair-wise meta analysis) and heterogeneity between pair-wise comparisons (called ‘incoherence’). Heterogeneity is measured in an MTC by the standard deviation of the trials within the model (called ‘tau’ to distinguish it from the standard deviation of other distributions within the model). As an example, on the odds ratio scale, tau can be interpreted as the following degrees of heterogeneity: < 0.1 none/little; 0.1–0.5 some; 0.5–1.0 moderate; 1.0–2.0 high (89). Incoherence is assessed by the residual deviance in the model, which should approximate the number of unconstrained data points used in the MTC.
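The two checks described above can be sketched in Python as follows. The function names are hypothetical. Because the quoted bands share their end points (0.1, 0.5 and 1.0), assigning each boundary value to the higher band is an assumption, and the example inputs are invented for illustration.

```python
def classify_tau(tau):
    """Map tau to the descriptive bands quoted in the text (89).

    Assumption: a value on a shared boundary goes to the higher band.
    """
    if tau < 0:
        raise ValueError("tau is a standard deviation and cannot be negative")
    if tau < 0.1:
        return "none/little"
    if tau < 0.5:
        return "some"
    if tau < 1.0:
        return "moderate"
    if tau <= 2.0:
        return "high"
    return "above the quoted range"


def deviance_ratio(residual_deviance, n_data_points):
    """Residual deviance per unconstrained data point; values near 1 suggest adequate fit."""
    if n_data_points <= 0:
        raise ValueError("number of data points must be positive")
    return residual_deviance / n_data_points


# Hypothetical model output, for illustration only.
print(classify_tau(0.3))
print(deviance_ratio(23.4, 22))
```

No cut-off for the deviance ratio is given here because the text states only that the residual deviance should approximate the number of data points.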

Example: Nixon et al. (73) investigated the use of biological agents [tumour necrosis factor alpha (TNF-α) and interleukin 1 (IL-1)] in the treatment of patients with rheumatoid arthritis. Randomised controlled trials were identified comparing a biological agent vs. methotrexate and/or placebo: the TNF-α biological agents adalimumab (four trials, 2123 patients), etanercept (four trials, 1637 patients) and infliximab (two trials, 1432 patients), and the IL-1 biological agent anakinra (three trials, 1392 patients). No trials were found directly comparing two or more biological agents. A network of trials was formed, as depicted in Figure 2. Results are presented for responder status at 6 months for the TNF-α biological agents as follows (odds ratios): adalimumab vs. etanercept (0.98, 95% CrI: 0.45–1.93); adalimumab vs. infliximab (0.94, 95% CrI: 0.50–1.62); infliximab vs. etanercept (0.98, 95% CrI: 0.45–1.93).

Figure 2. Network of randomised controlled trials depicting the mixed treatment comparison performed by Nixon et al.

The MTC method can be seen as an extension of traditional pair-wise meta analysis. It has advantages over the indirect comparison using a common comparator as it does not rely on a single common comparator but requires only that a new treatment joining the network has been compared with an existing treatment already contained within the network. It can also integrate direct and indirect evidence within the network of trials. Overall, the MTC provides a framework for evidence synthesis where multiple treatments are required to have comparable estimates of treatment effects based on all of the available data from randomised controlled trials.
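The requirement that a new treatment be linked to a treatment already in the network can be illustrated with a simple graph check. This is a sketch only: the function names and the treatment labels (A, B, C, D, E, F and placebo) are hypothetical and do not reproduce the network in Figure 2.

```python
from collections import deque


def reachable(edges, start):
    """Treatments reachable from `start` through chains of head-to-head trials."""
    adjacency = {}
    for a, b in edges:
        adjacency.setdefault(a, set()).add(b)
        adjacency.setdefault(b, set()).add(a)
    seen = {start}
    queue = deque([start])
    while queue:
        node = queue.popleft()
        for neighbour in adjacency.get(node, ()):
            if neighbour not in seen:
                seen.add(neighbour)
                queue.append(neighbour)
    return seen


def is_connected(edges):
    """True if every treatment is linked to every other, directly or indirectly."""
    treatments = {t for edge in edges for t in edge}
    if not treatments:
        return True
    return reachable(edges, next(iter(treatments))) == treatments


def can_join(edges, new_trial):
    """True if a new two-arm trial shares at least one treatment with the network."""
    existing = {t for edge in edges for t in edge}
    return any(t in existing for t in new_trial)


# Hypothetical network: each pair is a comparison made in at least one trial.
network = [("A", "placebo"), ("B", "placebo"), ("C", "B")]
print(is_connected(network))          # A and C are linked via placebo and B
print(can_join(network, ("D", "C")))  # D enters through its trial against C
print(can_join(network, ("E", "F")))  # neither E nor F is in the network
```

In this hypothetical network, A and C share no single common comparator, yet both sit in one connected network and can therefore be compared within an MTC.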

As stated earlier, WinBUGS is the most commonly used software for conducting an MTC as it offers the greatest flexibility for fitting models. However, more general software packages like SAS® (SAS Institute, Cary, NC), Stata® (StataCorp LP, College Station, TX), S-PLUS® (TIBCO Software Inc., Palo Alto, CA) and R (R Foundation for Statistical Computing, Vienna, Austria) may also be used.


This systematic review of the use of indirect comparisons identified 62 papers, which used five different approaches. These are: comparing point estimates, comparing 95% CIs, performing a statistical test of summary estimates, indirect comparison using a single common comparator and MTC of a network of connected trials.

None of the approaches identified used what Glenny et al. (82) described as a ‘naive comparison’, that is, pooling the data from the treatment groups of separate clinical trials and then comparing these groups directly as though they had been randomised against each other. This is unreliable, even if the trials followed identical designs, as it loses the benefits of randomisation within the individual trials (81,82). However, our failure to identify any studies that used this type of comparison might be attributable to our inclusion criteria. These required that the indirect comparison be based on a meta analysis of randomised controlled trials, which, even in the most simplistic approach of comparing point estimates, still requires an adjusted indirect comparison using a common comparator.

All of the approaches identified are based on consistency of data. That is, there is an underlying assumption, whether implicit or explicitly stated, that all of the trials used within the comparison are similar and that the relative effectiveness of treatments between trials is similar. To illustrate this further, in a simple indirect comparison of three treatments A, B and C, the trials of A vs. B and of B vs. C are assumed to have been potentially trials of A vs. B vs. C but where one arm from each trial is missing at random.

It has been argued that this assumption is not different from the practice within standard pair-wise meta analysis of combining similar trials (53,60,83–87). However, when attempting to compare treatments which have not been compared against one another directly in randomised controlled trials, there is the potential for introducing additional bias. Examples are given below:

  • Temporal bias: when comparing treatments using trials conducted over an extended period of time, the efficacy of treatments may be dependent on a changing baseline risk (90). In this instance, it may be inappropriate to combine trials of the new treatment with trials of the older treatments unless the changing baseline risk is incorporated into the analysis.
  • Opportunity bias: there might be an active choice for a participant to enter a trial of A vs. B rather than a trial of B vs. C, and so combining both datasets as if the patient could have gone into either trial may be flawed.

Comparing the different approaches to indirect comparison of treatments

The five different approaches can be categorised based on increasing complexity and robustness of analysis. These are discussed below in two categories: Simple and Complex.


The simplest approach of directly comparing point estimates is easy to perform but cannot include a statistical analysis (other than the subtraction of one point estimate from the other). Therefore, it is not possible to estimate the likelihood that any differences identified could be merely because of chance. Comparing the 95% CIs of summary estimates from different meta analyses using a common comparator is a straightforward approach and does provide a method of identifying statistically significant differences between treatments (when the CIs do not overlap). However, it does not quantify the magnitude of the difference, and overlapping CIs may be taken to mean that there is no significant difference when a significant difference between treatments actually exists (75). The statistical test for differences between summary estimates similarly does not provide a method of quantifying the magnitude of the difference between treatments. However, it does protect against the potentially misleading conclusions that can be drawn from overlapping CIs.
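The pitfall of overlapping CIs can be shown with a short Python sketch. The two summary estimates are hypothetical log odds ratios against a common comparator, the function names are invented for illustration, and the z-test assumes that the two estimates are independent and approximately normally distributed.

```python
import math
from statistics import NormalDist

Z95 = NormalDist().inv_cdf(0.975)  # about 1.96


def ci95(estimate, se):
    """95% confidence interval for an estimate with a given standard error."""
    return (estimate - Z95 * se, estimate + Z95 * se)


def intervals_overlap(a, b):
    """True if the two intervals share at least one point."""
    return a[0] <= b[1] and b[0] <= a[1]


def z_test_difference(est1, se1, est2, se2):
    """Two-sided z-test for the difference between two independent estimates."""
    diff = est1 - est2
    se_diff = math.sqrt(se1 ** 2 + se2 ** 2)
    z = diff / se_diff
    p = 2.0 * (1.0 - NormalDist().cdf(abs(z)))
    return diff, z, p


# Hypothetical summary log odds ratios vs. a common comparator.
est_a, se_a = 0.0, 0.1
est_b, se_b = 0.3, 0.1

overlap = intervals_overlap(ci95(est_a, se_a), ci95(est_b, se_b))
diff, z, p = z_test_difference(est_b, se_b, est_a, se_a)
print(f"CIs overlap: {overlap}; z = {z:.2f}; p = {p:.3f}")
```

With these inputs the two 95% CIs overlap, yet the test of the difference gives a p-value below 0.05, which is why reading overlap as "no difference" can mislead.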


The indirect comparison using a common comparator could be considered the ‘gold standard’ approach of performing indirect comparisons. It is a commonly used approach, featuring in 20/62 papers identified in the systematic review and provides point estimates and 95% CIs for the indirect comparison of interest.

This approach is also the only one to have been validated empirically. Song et al. (81) performed a review of 44 meta analyses (from 26 systematic reviews) demonstrating that in 41 of the 44 comparisons assessed, the indirect analysis resulted in the same conclusion as a direct comparison of treatment. The different conclusions drawn in three of the 44 comparisons were possibly due to random error, such as the wider CIs of the indirect comparison (i.e. a significant result in a direct comparison was a non-significant result in the adjusted indirect comparison) (81).

Performing an indirect comparison using a single common comparator (the Bucher method) is relatively inefficient in mathematical terms. It has been estimated that four times the amount of data is required to provide the same precision around an indirect comparison as would be required for a direct comparison (82).
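The calculation behind this approach, and the reason for the loss of precision, can be sketched in Python. The function name and the input values are hypothetical. The arithmetic at the end assumes that the variance of an estimate is inversely proportional to the amount of data and that both comparisons with the common comparator are the same size.

```python
import math
from statistics import NormalDist


def bucher_indirect(log_or_ac, se_ac, log_or_bc, se_bc, level=0.95):
    """Adjusted indirect comparison of A vs. B through common comparator C.

    Inputs are log odds ratios and standard errors from two independent
    meta analyses (A vs. C and B vs. C).
    """
    z = NormalDist().inv_cdf(0.5 + level / 2.0)
    log_or_ab = log_or_ac - log_or_bc
    # Variances add because the two meta analyses are independent.
    se_ab = math.sqrt(se_ac ** 2 + se_bc ** 2)
    return {
        "or": math.exp(log_or_ab),
        "se_log_or": se_ab,
        "ci": (math.exp(log_or_ab - z * se_ab), math.exp(log_or_ab + z * se_ab)),
    }


# Hypothetical inputs: OR 0.5 for A vs. C and OR 0.8 for B vs. C.
result = bucher_indirect(math.log(0.5), 0.2, math.log(0.8), 0.25)
print(result)

# Why four times the data: a direct comparison with variance v ...
v_direct = 0.04
# ... versus an indirect comparison where each leg has the same data (variance v each).
v_indirect_same_data = v_direct + v_direct
# Doubling the data in each leg halves each variance and restores the precision,
# which means four times the data of the direct comparison in total.
v_indirect_double_data = v_direct / 2 + v_direct / 2
print(v_indirect_same_data, v_indirect_double_data)
```

The indirect estimate is always less precise than either of the two comparisons that feed it, because their variances are summed.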

The other potential drawback of this approach is the need for a single common comparator. This limits the range of randomised controlled trials available for inclusion in the analysis as only three treatments can be compared in any single analysis. It also potentially introduces the bias caused by ‘lumping’ individual interventions, e.g. drugs, into convenient categories, e.g. drug classes, to achieve maximum statistical power with the available randomised controlled trials (53). That is, it is unlikely that each intervention in a category is identical and so taking the average overall effect estimate of the category does not provide insights into the relative effectiveness of individual interventions.

The principal advantage of the MTC approach is that it provides a framework for integrating multiple comparators within a unified analysis of a network of randomised controlled trials. It provides the opportunity to calculate relative treatment effects for all treatments, provided that the underlying assumption of exchangeability is met and there is a common comparator for each pair-wise treatment comparison in the network. There is no danger of ‘lumping’, as there is with the indirect comparison method, because all treatments can be individually represented within the MTC.

The MTC also supports the inclusion of the findings from direct comparison of treatments. It has been suggested by Higgins and Whitehead (91), that the results from direct comparisons could have the precision of their results enhanced by being analysed with any available indirect comparison of the same treatments.
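The gain in precision can be illustrated with a simple inverse-variance combination of a direct and an indirect estimate of the same comparison. This is a fixed-effect, frequentist stand-in chosen for brevity, not the Bayesian MTC model itself. The function name and input values are hypothetical, and the sketch assumes that the two estimates are independent and consistent with one another.

```python
import math


def pool_direct_indirect(est_direct, se_direct, est_indirect, se_indirect):
    """Inverse-variance weighted average of a direct and an indirect estimate."""
    w_direct = 1.0 / se_direct ** 2
    w_indirect = 1.0 / se_indirect ** 2
    pooled = (w_direct * est_direct + w_indirect * est_indirect) / (w_direct + w_indirect)
    pooled_se = math.sqrt(1.0 / (w_direct + w_indirect))
    return pooled, pooled_se


# Hypothetical log odds ratios for the same pair of treatments.
pooled, pooled_se = pool_direct_indirect(0.2, 0.2, 0.4, 0.4)
print(f"pooled log OR = {pooled:.3f}, SE = {pooled_se:.3f}")
```

The pooled standard error is smaller than that of the direct estimate alone, and the less precise indirect estimate receives the smaller weight.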

The MTC also provides a formal way of assessing incoherence between pair-wise comparisons, which would be likely to exist if the exchangeability assumption were invalid (87). However, a weakness of the MTC approach is the lack of empirical evidence to substantiate its use.

Potential limitations of our research

A systematic review is only as comprehensive as the literature search upon which it is based. While our search strategies identified more than a thousand potentially relevant papers, it is possible that some additional papers were not identified. However, as all of the methods of indirect comparison that we found fell into five main categories, it would seem unlikely that we missed any published methodology that would be considered substantially different from those identified.

The 62 papers we identified can be compared with the previous research on the frequency of indirect comparisons conducted by Glenny et al. (82), where 36 papers were identified as containing indirect comparisons, with 11 being ‘naive indirect comparisons’ and 25 being adjusted indirect comparisons.


Use of either indirect comparison using a common comparator or MTC provides estimates for use in decision making, with the preferred methodology being dependent on the available data.


SJE and JB are employees of AstraZeneca UK Ltd, who provided some support for their work on this review. SW and MJC are both funded by the UK Department of Health and received no support or funding from AstraZeneca UK Ltd.