Consistency and inconsistency in network meta-analysis: model estimation using multivariate meta-regression‡

Network meta-analysis (multiple treatments meta-analysis, mixed treatment comparisons) attempts to make the best use of a set of studies comparing more than two treatments. However, it is important to assess whether a body of evidence is consistent or inconsistent. Previous work on models for network meta-analysis that allow for heterogeneity between studies has either been restricted to two-arm trials or followed a Bayesian framework. We propose two new frequentist ways to estimate consistency and inconsistency models by expressing them as multivariate random-effects meta-regressions, which can be implemented in some standard software packages. We illustrate the approach using the mvmeta package in Stata. Copyright © 2012 John Wiley & Sons, Ltd.


Introduction
Network meta-analysis (NMA) in its standard form makes an assumption of 'consistency' (Lu and Ades, 2006), also called 'coherence' (Lumley, 2002), which means that estimates of treatment effects from direct and indirect evidence are in agreement, subject to the usual variation under the random-effects model for meta-analysis. Consider the case with three treatments A, B and C. If trials comparing B with A, C with A, and C with B estimate parameters d AB , d AC and d BC , respectively, then consistency means that d AB + d BC = d AC : the effect of B relative to A, plus the effect of C relative to B, equals the effect of C relative to A.
The consistency assumption is often questionable. For example, the trials that do not include treatment A may include a population for whom A is inappropriate and hence, their results may differ systematically from trials that do include A (Song et al., 2009). Alternatively, trials that include A may be older and therefore use different implementations of other treatments. Wrongly assuming consistency can lead to misleading conclusions, so procedures to investigate the possibility of inconsistency are important.
Some authors have proposed ways to assess consistency directly from fitting consistency models (Dias et al., 2010, Lu et al., 2011. This paper instead assesses consistency by fitting both consistency and inconsistency models. The companion paper (Higgins et al., 2012) reviews the meaning of inconsistency and methods for assessing it (Salanti et al., 2008;Lu and Ades, 2006;Lumley, 2002) and argues that inconsistency is best modelled by a design-bytreatment interaction. We follow that paper in using design to refer only to the set of treatments compared in a trial.
Models for consistency and especially inconsistency are complex. Lumley (2002) proposed a frequentist approach that was valid when all trials had only two arms. Lu and Ades (2006) proposed a Bayesian approach that accommodated multi-arm trials and could be implemented in WinBUGS software (Lunn et al., 2000). This paper proposes two new computational strategies for estimating consistency and inconsistency models when the data include multi-arm trials. The new strategies differ from the Bayesian approach in two ways. First, they are frequentist methods, so they aim to speed up computation, avoid sensitivity to the choice of priors and avoid Monte Carlo error. Second, they are two-stage estimation procedures, unlike the one-stage Bayesian procedure. Both strategies formulate the consistency and inconsistency models as multivariate random-effects meta-regressions which can now be easily fitted in standard software (White, 2009(White, , 2011.
The paper is set out as follows. We first introduce data from an NMA of thrombolytic drugs and introduce multivariate meta-analysis and meta-regression in general terms. We then formulate a model for NMA allowing for heterogeneity and inconsistency and describe our new proposals for two simple estimation methods and the alternative Bayesian analysis. We compare the three methods using the thrombolytic drugs data and end with a discussion.

Example: thrombolytic drugs
As an example, we use a dataset consisting of 28 trials comparing eight thrombolytic treatments after acute myocardial infarction: streptokinase (A), accelerated alteplase (B), alteplase (C), streptokinase plus alteplase (D), tenecteplase (E), reteplase (F), urokinase (G) and anti-streptilase (H) (Lu and Ades, 2006). The data are shown in Table 1. Two of the studies are three-arm trials and the rest are two-arm trials. The usefulness of NMA is emphasised by examples such as this, where many pairs of treatments are not compared head to head in any trial. For example, the only way to assess the effect of reteplase (F) relative to urokinase (G) is by utilising the indirect evidence from trials in designs 5 and 6 which compare F and G with A and from designs 9 and 10 which compare F and G with B, although design 12, which compares G with C, also contributes to the indirect evidence. We aim to re-analyse these data to explore and test for consistency, and if consistency is not rejected, to rank the treatments.

Multivariate random-effects meta-regression
Suppose each study yields estimates of p different quantities of interest: in this paper, these will be different between-treatment contrasts, but in other applications of multivariate random-effects meta-regression, they may represent different study outcomes (Jackson et al., 2011). We write these p estimates for study i as the (1 Â p) vector y i . We assume we also know S i , the 'within-study' (p Â p) variance-covariance matrix of y i . The meta-regression model is where X i is a (q Â p) design matrix containing the covariates (White, 2011). In this model, the unknown parameters are a, a (1 Â q) vector of regression coefficients and Σ, a (p Â p) between-studies variance-covariance matrix, assumed the same for all studies. Usually, the different quantities in y i have separate regressions, with the jth quantity regressed on q j covariates. Then, the mean of the jth quantity is modelled as a j x ij , where a j is a (1 Â q j ) vector and x ij is a (q j Â 1) vector. In this case, q ¼ P p j¼1 q j and (1) holds with a ¼ a 1 ; a 2 ; . . . ; a p À Á and The multivariate meta-analysis model is the special case of (1) with X i equal to the (p Â p) identity matrix, so that and a is a (1 Â p) vector of the overall means of the estimates (White, 2009;Jackson et al., 2011). The standard meta-regression model (Thompson and Sharp, 1999) is the special case of (1) with p = 1, so that y i , S i and Σ are scalars.
Multivariate random-effects meta-analysis is implemented in Stata as mvmeta (White, 2009), which has recently been extended to implement multivariate random-effects meta-regression (White, 2011). Restricted maximum likelihood (REML) estimation is usually used for Σ to avoid the negative bias associated with maximum likelihood estimation of variance components. Likelihood-based estimation can straightforwardly accommodate studies in which some elements of y i are missing.

NMA model formulation
In this section, we state a model assuming that one of the treatments, which we call treatment A, is included in every trial. This is rarely true in practise and is not true for the thrombolytic drugs data. For the alternative case where the trials have no common treatment, we later describe different modifications and show how they can be expressed by the multivariate meta-regression model (1). We consider a network including a total of T treatments A, B, C, and so on. Let d = 1, . . ., D index the designs (the sets of treatments compared in a trial). Let there be n d trials of the dth design, each comparing T d treatments. Thus, in Table 1, d = 1 corresponds to the design comparing A, B and D; this is a three-arm design, so T 1 = 3, and only one such trial is present, so n 1 = 1. Salanti et al. (2008) compared 'arm-based' models describing arm-specific parameters, such as the log odds, and 'contrast-based' models describing contrasts of arm-specific parameters, such as log odds ratios. We initially use a contrast-based approach.
Let y AJ di be the observed contrast of treatment J (J = B, C, . . .) with treatment A in the ith trial (i = 1 to n d ) in the dth design (d = 1 to D). y AJ di may represent any measure, such as a mean difference, a standardised mean difference, a log risk ratio or a log odds ratio. Our model for the observed data is where d AJ represents a contrast (a summary effect) between J and A, b AJ di represents heterogeneity in the J-A contrast between studies within designs, o AJ d represents inconsistency in the J-A contrast (heterogeneity between designs), and e AJ di is a within-study error term. Equivalently, in vector notation, For the within-study error terms, we assume « di $ N(0, S di ) where S di is assumed to be known. The treatment contrasts d are regarded as fixed parameters. We consider the specification of the heterogeneity terms b di and the inconsistency terms v d below.
For treatments J not used in design d, y AJ di is missing. This is not a problem; we simply use the model implied by (6) for the observed subvector of y di .

Modelling heterogeneity
Heterogeneity, represented by the b AJ di terms, refers to variation between true treatment effects in trials of the same design. We regard it as a random effect as in the conventional random-effects model for meta-analysis. We refer to model (7) without constraint on Σ as the unstructured model. In this model, the between-studies variance is var b AJ di À Á ¼ Σ JJ for the J-A contrast and var b AK di À b AJ di À Á ¼ Σ JJ À 2Σ JK þ Σ KK for the K-J contrast (J,K 6 ¼ A); between-studies variances of different contrasts are distinct. Σ is identified under consistency if every pair of treatments has been compared in at least two studies, but estimation is typically very imprecise with small numbers of studies.
We now describe two possible structured heterogeneity models for Σ, which are more readily identified (Salanti et al., 2008;Lu and Ades, 2009).
In a fully structured heterogeneity model, we assume that all treatment contrasts have the same betweenstudies variance t 2 . By considering the between-studies variance of the J-A contrasts, it follows that Σ JJ = t 2 for all J. By considering the between-studies variance of the K-J contrasts, it also follows that Σ JK = t 2 /2 for J 6 ¼ K. Thus, where P(r) is a matrix with all diagonal entries equal to 1 and all off-diagonal entries equal to r. Partially structured heterogeneity models are also possible. For example, the choice Σ = t 2 P( r) for unknown r might be useful, because it implies that all treatment contrasts involving treatment A have variance t 2 and all treatment contrasts not involving treatment A have a different variance 2(1 À r)t 2 .

Modelling inconsistency
Inconsistency is represented by the o AJ d terms. These could be treated as random effects with mean zero (Lumley, 2002;Lu and Ades, 2006) or as fixed effects (Lu and Ades, 2006). In the companion paper, we argue that they are best treated as fixed effects (Higgins et al., 2012). Here, we have an additional practical reason for treating them as fixed effects; if we treated them as random effects, then model (6) would have three variance components (within-study, between-study and inconsistency) so could not be expressed in the meta-regression framework (1).
We model inconsistency using the design-by-treatment interaction model described in the companion paper. This specifies no structure for the o AJ d terms. Thus, each treatment contrast is allowed to differ freely across designs; for example, the B-A contrast in trials with design AB can differ from that in trials with design ABD. Inconsistency models with fewer parameters are possible, such as the model of Lu and Ades (2006), which depends on a particular ordering of the treatments (Higgins et al., 2012).
Only a limited number of o AJ d parameters is usually needed in the model; for example, o AC d is not needed in design AB. In the design-by-treatment interaction model, the number of identifiable o AJ d parameters (the number of degrees of freedom (DOF) for inconsistency, df inc ) is the difference between the number of identified fixed parameters in the design-by-treatment interaction model, P d ( T d À 1), and the number of identified fixed parameters in the consistency model, T-1, so df inc = P d ( T d À 1) À (T À 1). There are various equivalent ways to parameterise the model. A simple approach, illustrated below, considers the designs in order. It adds a parameter for inconsistency in any design which includes a pair of treatments whose contrast can be estimated either directly from a previous design or indirectly under the consistency model from two or more previous designs. For multi-arm designs, the number of inconsistency parameters added to a design depends on how many of its treatment contrasts can be estimated from previous designs.
Below we show two ways to estimate models (6) and (7) with fixed inconsistency parameters using a two-stage approach.

Testing consistency
Once an inconsistency model has been fitted, using the methods to be described below, it is possible to test the hypothesis of consistency. This is best performed by globally testing all the inconsistency parameters using the global Wald test statisticô 0v arô ð Þ À1ô which under consistency follows a w 2 distribution on df inc DOF. Because we use REML estimation, likelihood ratio tests are not available for comparing models with different covariates. Different parameterisations for the modelsdifferent choices of the reference treatment or different ways to construct the inconsistency parametersmay be used, but the global Wald test should give the same results for all parameterisations.
It is also possible to examine and test individual o parameters. However, it is important to remember that a different parameterisationfor example, a different choice of reference treatmentwould lead to a different interpretation. This point is illustrated below.
Like any global test, the global Wald test may lack power. Thus, when the hypothesis of consistency is not rejected, inconsistency may nevertheless be present. It is important to use one's understanding of the nature and design of the studies in the NMA to decide how plausible is consistency. If large inconsistency is plausible, then the NMA should be broken downby restricting the treatments or the trialsinto a smaller problem where consistency is more plausible. If the hypothesis of consistency is not rejected and is plausible on subject-matter grounds, then it may be reasonable to base inferences on the consistency model.
If the significance test rejects the hypothesis of consistency, then treatment effects estimated from the consistency model are especially suspect. Instead, it is important to try to understand the source(s) of inconsistency, just as the sources of heterogeneity should be explored in standard meta-analysis (Thompson, 1994). Examination of theô parameters may help here. Possible explanations should be sought in terms of differences in study-level covariates such as the date of the study or the nature of the population (Salanti et al., 2009). If successful, these covariates could be included in the NMA model or used to stratify the analysis. Another possible explanation is that particular studies are outliers. If outliers are also of inferior methodological quality, then they might be excluded from the analysis, but if they are of equal or superior quality, then there is no alternative to a careful understanding of the nature and reliability of the data.

Estimation by the standard approach
We now consider model estimation where no treatment is common to all trials. The standard contrast-based model (Salanti et al., 2008) designates trial-specific reference treatments. For example, sensible choices of reference treatment in the data of Table 1 would be A for designs 1-7, B for designs 8-11 and C for designs 12-13. Contrasts of non-reference treatments with the reference treatment and their standard errors are then estimated.
First, consider the case when only two-arm trials are present. Define y Ã di as the single estimated treatment contrast in trial i of design d. If design d compares treatments J and K, then model (6) implies We now describe how this model can be expressed as a standard univariate meta-regression (model (1)  The first term in (9) involves fixed parameters and enters the aX i term of model (1) by constructing covariates x B di ; x C di ; . . . for each treatment except A. For the JK design, the covariate for K is x K di ¼ þ1, the covariate for J is x J di ¼ À1, and all other covariates x B di ; . . . are 0. The second term in (9) is a random variable representing heterogeneity. Under the unstructured heterogeneity model (7), its variance is Σ JJ À 2Σ JK + Σ KK , which varies between designs. Under the fully structured heterogeneity model (8), its variance is t 2 in all designs. To use the meta-regression framework (1), which imposes a common between-studies variance for all studies, we must therefore assume the fully structured heterogeneity model (8).
The third term, representing inconsistency, involves fixed parameters and requires covariates constructed carefully as described above. Lumley (2002) proposed a similar model for NMAs comprising only two-arm trials with the inconsistency parameters as random effects.
The fourth term is a random variable, representing within-study error in the K-J contrast, and its variance can be estimated directly from the study-level data. Now, consider the case with three-arm trials; each yield two estimated treatment contrasts with within-study covariances assumed known. Standard univariate meta-regression software is not now appropriate and to date, only Bayesian estimation has been described (Lu and Ades, 2006). We now express this case in the multivariate meta-regression framework (1). Define In general, each outcome is regressed on T-1 covariates constructed to equal À1 for the reference treatment, + 1 for the treatment compared with the reference and 0 for the other treatments. The coefficients d are the same for all outcomes, so the structure differs from that in equations (2) and (3).
With this reformulation, the model can be fitted by a two-stage procedure using the Stata programme mvmeta (White, 2009(White, , 2011 and the code in Appendix A.1. Imposing common coefficients across outcomes requires a previously unpublished option commonparm which was implemented for this paper. Extension to trials with four or more arms is straightforward.

Estimation by data augmentation
We now propose an alternative two-stage approach for data that include multi-arm trials; here, the same treatment is taken as the reference treatment in all designs. We accommodate designs which do not include the reference treatment by a data augmentation technique, which introduces an artificial reference treatment arm containing a very small amount of information. Data augmentation is a computational device to enable easy model estimation in a two-stage framework. The advantages of this approach over the standard approach are that model (6) can be estimated directly and intuitively; modelling and estimation are simplified; and more flexible modelling of heterogeneity, as in equation (7), is possible.
We use an arm-based approach to motivate the data augmentation approach in the contrast-based model. We take arm A as the reference category in all trials, irrespective of their design. For trials without arm A, we augment the data by introducing a very small amount of information in arm A. If the trial outcome is binary, the augmenting information is h individuals with success fraction m, for a small value of h. In the analyses below, we use h = 0.001 and set m equal to the overall event fraction pooling all treatments and all trials (0.08 in the thrombolytics data). Sensitivity to choices of reference treatment, h and m is explored below. If the trial outcome is quantitative, the augmenting information could be a mean equal to the overall mean and a standard error M ≫ 1 times the largest arm-specific standard error. Table 2 shows why the augmentation method works, using studies 2, 6 and 27 from the thrombolytics data which compare three treatments A, C and H. Study 2 includes all three treatments and hence yields estimates of both contrasts, which are assembled into the vector y di with the variance-covariance matrix S di shown. Both variances in S di are small, because there is information about both contrasts. The X di matrix shows that y di estimates the two basic parameters. Study 6 includes A and C but not H, so it only yields an estimate for the first component of y di ; the second component is missing, and correspondingly, S di has just one non-missing element, and X di has just one non-missing column. Study 27 includes C and H but not the reference treatment A. For the standard approach, y di has just one non-missing component corresponding to the H-C contrast. The X di matrix shows that it estimates the difference of the two basic parameters. For the data augmentation approach, an arm A is introduced. The estimated S di contains very large variances, representing the lack of information in this trial about the C-A and H-A contrasts. However, the equally large covariance in S di means that the information about the H-C contrast is correctly conveyed; setting c = (À1, 1) T , the H-C contrast is c T y di = À 0.262 with variance I. R. WHITE ET AL. c T S di c = 0.19, the same as with the standard approach. The X di matrix shows that y di estimates the two basic parameters as in study 2. In the case of binary outcomes, zero counts in observed treatment arms require additional treatment; the standard procedure is to add 1/2 to all cells in observed treatment arms for any trial with a zero count (Sweeting et al., 2004). A better but more computationally demanding, the way to handle binary data is to use the full binomial likelihood (Hamza et al., 2008).

Fitting the models to the augmented dataset
The consistency model (equation (5) with all o terms equal to zero) can be written as This is a multivariate random-effects meta-analysis as in equation (4). With the o parameters as fixed effects, the inconsistency model can be written as a multivariate random-effects meta-regression The random effects in such a model represent heterogeneity within designs. For trials missing one or more of arms B, C and so on, y di is incomplete, but likelihood-based procedures naturally accommodate this.

Ranking in the consistency model
If the consistency model is supported by the data, then it is useful to report the probability that each treatment is the best treatment (Caldwell et al., 2005). This is most naturally performed in the Bayesian approach but can also be approximated in the frequentist approach using a parametric bootstrap procedure implemented in mvmeta (White, 2011). We describe the procedure in the case where positive elements of d identify treatments that are worse than the reference treatment. Parameter vectors d Ã b , for b = 1, . . ., B, are drawn from the approximate posterior Nd; V , where V ¼v ard . For each b, the best treatment is identified as treatment A if all elements of d Ã b are positive and otherwise as the treatment corresponding to the lowest element of d Ã b . The probability that each treatment is best is estimated by the fraction of the B draws for which that treatment was best. Other summaries, such as rankograms or the probability that each treatment is the worst (Salanti et al., 2011) can be estimated similarly but are usually of less clinical interest.
As described, the method ranks the overall treatment effects d; the predicted treatment effects in a new trial, Higgins et al., 2009;Riley et al., 2011).

Estimation by the Bayesian approach
We also consider a Bayesian version of our model, which we implement for the thrombolytics data in WinBUGS. This model uses binomial within-study distributions in a one-stage analysis and does not require data augmentation or any approximation. We use an arm-based approach, where for each trial, we model a baseline treatment outcome m di and other treatment outcomes as comparisons to the baseline treatment. For convenience, we directly model the treatment comparison in each design, d AJ þ o AJ d , and reparameterise to obtain results for the inconsistency parameters o AJ d . A Bayesian version of the Wald test statistic for inconsistency can be computed. Alternatively, consistency and inconsistency models can be compared using the deviance information criterion (Spiegelhalter et al., 2002).
We use prior distributions for modelling the baseline treatment outcomes, m di $ N(0, 100), and treatment comparisons, If Σ is unconstrained, a Wishart prior could be used for Σ À 1 . For the constrained model (8), an appropriate prior distribution is t $ U(0, 2). Results may be sensitive to the prior distribution for the heterogeneity parameters (Lambert et al., 2005).
For each WinBUGS run, we used a burn-in of 30 000 updates. Convergence was checked using the Gelman-Rubin statistic as modified by Brooks and Gelman (1998), calculated for three chains with different initial parameter values. Every 20th update was sampled to reduce auto-correlation. Results were calculated from 150 000 sampled updates, ensuring that the estimated Monte Carlo error for all parameters was less than 0.005. Table 1 shows that only eight contrasts are found in more than one trial, and only seven contrasts are found in more than one trial of the same design. The arguments above show that we cannot estimate an unstructured heterogeneity model. Instead, we fit the fully structured heterogeneity model (8), which assumes the same heterogeneity variance between each pair of treatments. The partially structured heterogeneity model Σ = t 2 P(r) gives restricted log likelihoods that vary by no more than 0.02 as r varies from 0 to 1 (results not shown).

Consistency model
The consistency model (Table 3) shows that treatments B, E and G are most likely to be the best, with similar results for REML and Bayesian estimation. Point estimates were similar between the two estimation methods, except that the estimated heterogeneity was very small for REML but larger for Bayesian estimation. Standard errors were larger for the Bayesian analysis because of the larger estimated heterogeneity and because Bayesian analysis better allows for uncertainty in the heterogeneity parameter.

Inconsistency model
For these data, the design-by-treatment interaction model has 15 fixed parameters (two for each of the two threearm designs and one for each of the 11 two-arm designs), and the consistency model has seven fixed parameters. There are thus 15 À 7 = 8 DOF for inconsistency.
We parameterise the design-by-treatment inconsistency model by working down the designs in Table 1. Bold font in Table 1 indicates designs to which we attach inconsistency parameters. Designs 1 and 2 have no potential for inconsistency. Design 3 introduces potential for inconsistency because the C-A contrast can differ between designs 2 (ACH) and 3 (AC); we call this 'design inconsistency' in the companion paper (Higgins et al., 2012). We therefore attach an inconsistency parameter o AC 3 to arm C in design 3. We similarly attach inconsistency parameters to D in design 4 and H in design 7. Design 9 introduces potential for inconsistency in a different way, because it directly estimates the F-B contrast which is indirectly estimated under consistency by combining designs 1 (ABD) and 5 (BF); we call this 'loop inconsistency' in the companion paper (Higgins et al., 2012). We therefore attach an inconsistency parameter o AF 9 to arm F in design 9. We similarly attach inconsistency parameters to G and H in designs 10-13 because the contrasts in these designs (BG, BH, CG and CH) are all estimable under consistency from earlier designs. Table 3 reports results for REML estimation via data augmentation and for Bayesian estimation. The REML Wald test for inconsistency was 8.60 on 8 DOF, showing no overall evidence for inconsistency. The Wald-like statistic for Table 3. Thrombolytic drugs data: results from consistency and inconsistency models. 'REML' is the data augmentation approach using h = 0.001, m = 0.08 and with 10 000 parametric bootstrap samples to compute P(best). 'Bayes' is the Bayesian approach and estimates are posterior means. the Bayesian analysis was similar at 7.80. The deviance information criterion was lower for, and hence shows greater support for, the consistency model. In the design-by-treatment inconsistency model (Table 3), seven of the eight inconsistency parameters are smaller than their standard errors, butô AH 11 is slightly more than twice its standard error. Given the multiple testing and the overall lack of evidence for inconsistency, this is likely to be a chance finding. However, to illustrate interpretation as in Table 5 of the companion paper (Higgins et al., 2012), we now takeô AH 11 at face value; it represents a discrepancy between the direct estimate of the H-B contrast from design 11 and the indirect estimate from the other designs, especially designs 1 (AB) and 2 and 7 (AH), suggesting inconsistency around loop ABH. Indeed, design 11 seems to show large benefit of B over H, whereas designs 1, 2 and 7 show little difference between A, B and H. Different parameterisations can lead to different conclusions about individual parameters but not overall. For example, if we parameterise the design-by-treatment inconsistency model by working up, instead of down, the designs in Table 1, then the overall test for inconsistency changes by less than 0.001 (code in Appendix A.2, results in Table 4). However, now, two parameters (ô AB 1 andô AF 5 ) are larger than twice their standard errors. Largeô AB 1 suggests inconsistency around loop ABH, as with the first parameterisation, whereasô AF 5 suggests a second form of inconsistency which was not seen with the first parameterisation.
Surprisingly, the estimated heterogeneity is much larger in the inconsistency model than in the consistency model. Because the contrasts between designs contribute to the heterogeneity in the consistency model and not in the inconsistency model, this may arise because one or more contrasts between designs are smaller than expected by chance. Univariate analysis shows that the C-A contrast, which probably contributes most information about heterogeneity, is very similar between designs 2 (AC) and 3(ACH).
It is possible to fit other inconsistency models using our framework, as described in the companion paper (Higgins et al., 2012).
Choice of reference categories, h and m Table 5 explores sensitivity of the data augmentation approach to different choices of the parameters h and m and different choices of reference category. The results of the Wald test of consistency are very similar in all cases except that mild differences are seen with h = 0.1 and m = 0.5. Estimates of treatment contrasts AB and AG showed excellent agreement in all cases. Similar comparisons for the other treatment contrasts showed even closer agreement (results not shown). In the data augmentation approach, we calculated matrices S di using double precision computation (that is, with accuracy to about 16 significant figures); by contrast, using single precision Table 4. Thrombolytic drugs data: results from two different parameterisations of the inconsistency model, using the data augmentation approach with h = 0.001, m = 0.08. (that is, with accuracy to about eight significant figures), agreement between approaches was poorer and computation failed for h < 0.001. These results confirm that the choice of reference category is unimportant, support the argument for taking m to be the observed mean of the data and show that h = 0.001 was a good choice in these data.

Discussion
We have demonstrated two frequentist estimation procedures for consistency and inconsistency models for NMA. Our procedures are approximate in two ways. First, they are two-stage procedures which summarise the data from each study as a point estimate and variance, so they implicitly approximate the within-study log-likelihood by a quadratic function of the parameters. This is typically a good approximation except with very sparse data, that is, where many study arms have no events (or no non-events). Second, the data augmentation approach approximates by adding a small amount of information in the reference category. The comparisons in Table 5 shows that this is an excellent approximation.
The alternative Bayesian procedure avoids both of these approximations, but it has its own difficulties. First, it is computationally intensive; fitting the inconsistency model took about 2 h for the Bayesian procedure and 8-14 s for each frequentist procedure on a Windows Server running on a multi-user Sun computer with 2.59-GHz processors. Second, results can be very sensitive to the prior distribution chosen for the heterogeneity parameters.
The two frequentist procedures fit the same basic model but have different strengths and weaknesses. The disadvantage of the data augmentation approach is the extra approximation involved. Its advantages are (i) the between-studies variance matrix Σ need not follow the fully structured model (8), unlike in the standard approach, and (ii) the modelling framework is more intuitive, with treatment effects appearing as overall means rather than as coefficients of design variables. To allow treatment effects to vary with a study-level covariate W, the data augmentation approach enters W as a covariate, whereas the standard approach requires the interaction between W and the design variables to be entered in the model.
If the data augmentation approach is used for binary outcomes, m should be the observed mean and h should take a small value such as 0.001. Although the choice of reference category should not affect the model fit, it may be convenient in terms of interpretation to use an untreated control or placebo group as reference. Alternatively, computational speed might be improved by using a treatment found in many designs as reference category. Finally, when substantial data augmentation has been performed, it may be wise to check that results are similar between different choices of the reference treatments, h and m.
Although we have illustrated the methods using Stata, other software can be used. In R, the data augmentation approach can be implemented using the mvmeta package (Gasparrini, 2011). In SAS, both frequentist approaches can be implemented using proc mixed (van Houwelingen et al., 2002), and a referee has pointed out that the models can also be fitted in a one-stage procedure using conditional logistic regression (Stijnen et al., 2010).
Future research should explore the interpretation of individual parameters in the design-by-treatment interaction model; the power to test for inconsistency in this model; possible ways to assess inconsistency by Table 5. REML estimation of thrombolytic drugs data: Wald x 2 statistics testing consistency, selected estimated treatment effects and their standard errors, comparing standard approach with data augmentation approach for various choices of h and m and the reference treatment.  .197 À0.197 À0.197 À0.197 À0.198 À0.205 B À0.197 À0.197 À0.198 À0.197 À0.198 À0.202 C À0.197 À0.197 À0.197 À0.197 À0.198  posterior predictive checks; possible reduced inconsistency models with fewer DOF and greater power under specific forms of inconsistency; and partially structured models for heterogeneity. Network meta-analysis is increasingly popular. This paper has contributed a frequentist analysis approach based on multivariate meta-regression, which should make analysis easier and faster in the presence of multiarm trials, gives a global test for consistency, and is able to rank the treatments in a way that was formerly only possible with a Bayesian analysis.