A REML method for the evidence‐splitting model in network meta‐analysis

Checking for possible inconsistency between direct and indirect evidence is an important task in network meta‐analysis. Recently, an evidence‐splitting (ES) model has been proposed, that allows separating direct and indirect evidence in a network and hence assessing inconsistency. A salient feature of this model is that the variance for heterogeneity appears in both the mean and the variance structure. Thus, full maximum likelihood (ML) has been proposed for estimating the parameters of this model. Maximum likelihood is known to yield biased variance component estimates in linear mixed models, and this problem is expected to also affect the ES model. The purpose of the present paper, therefore, is to propose a method based on residual (or restricted) maximum likelihood (REML). Our simulation shows that this new method is quite competitive to methods based on full ML in terms of bias and mean squared error. In addition, some limitations of the ES model are discussed. While this model splits direct and indirect evidence, it is not a plausible model for the cause of inconsistency.


| INTRODUCTION
A key issue in network meta-analysis (NMA) is the potential occurrence of inconsistency between direct and indirect treatment comparisons. Several approaches have been proposed for detecting such direct-indirect evidence inconsistency, as nicely reviewed by Shih and Tu. 1 These authors pointed out that previous proposals do not fully separate the direct and indirect evidence on a pairwise comparison of interest. They therefore proposed a new model, which they called the evidence-splitting (ES) model and which is based on the principle of independence between direct and indirect evidence. Splitting of direct and indirect evidence is accomplished through inclusion of a regression term that depends on the variance for heterogeneity between trials (studies). The consequence of the presence of this variance parameter in both the mean and the variance structure of the model is that standard methods of estimation based on linear mixed models do not apply. The full maximum likelihood (ML) method is a natural choice for estimation in this setting. 1 This method can be implemented in different ways, including structural equation modeling. 2 A well-known downside of ML estimation in linear mixed models is the bias of variance component estimates, 3,4 which can be substantial, especially in arm-based network meta-analysis. 5 Hence, residual (or restricted) maximum likelihood (REML) is the method of choice for such models. However, as the variance is present in both the mean and the variance structure of the ES model, REML is not directly applicable.
Network meta-analysis is most commonly implemented using a contrast-based approach, 6 whereby in every trial each treatment arm is compared to a baseline treatment. The alternative is to use an arm-based approach. This approach amounts to fitting a two-way analysis-of-variance model with fixed effects for trials and treatments, and a random effect for the trial-by-treatment interaction. 7 The random effect represents heterogeneity between trials. Taking the trial main effect as fixed ensures that the principle of concurrent control is observed, that is, all inference on treatment effects is solely based on within-trial information, whereas between-trial information is not recovered. 8,9 The ES model proposed by Shih and Tu 1 is based on the contrast-based approach. Splitting the direct and indirect evidence involves linear combinations among all contrasts within a trial, ensuring that direct and indirect evidence are uncorrelated. 1 However, the authors also give an arm-based representation of the model, which is particularly convenient for network meta-analysis in case some trials involve a larger number of treatments. This is because the arm-based approach requires focusing only on the two treatment arms of interest, whereas the contrast-based approach requires forming linear combinations among all treatments per trial for the purpose of evidence splitting.
The present paper proposes an iterative approximate method for estimating the ES model with both the contrast-based and arm-based representations, which allows REML estimation to be used. To assess the merit of this new method, we will compare it to ML-based estimation methods. The models and methods are laid down in Section 2. Two examples are considered in Section 3. In Section 4, we report on a small simulation study comparing the estimation methods. Some limitations of the ES model are pointed out in Section 5. The paper ends with some conclusions in Section 6.

| THE EVIDENCE-SPLITTING MODEL
Without loss of generality, we here give the models for a three-arm study involving treatments A, B, and C. This case is the simplest and relevant to the example considered in Section 3.1. For the purpose of exposition, the response is binomial, and empirical log-odds are analyzed under the assumption of approximate normality. However, the approach also applies for other response distributions and link functions, including the normal for continuous responses in combination with an identity link, as considered in the example in Section 3.2. The justification we give for the ES model (in Appendices A and B) is somewhat different from that given by Shih and Tu, 1 who considered conditional distributions. We give our own compact derivation based on expected values here because this helps us focus on the direct connections between the contrast-based and the arm-based form of the model. Furthermore, our derivation shows that the evidence can be split without using the ES model.

| The contrast-based basic model for NMA
Consider a network of trials with treatments A, B, and C. In trial $i$, at most three log-odds ratios can be observed: $y_{iAB}$, $y_{iAC}$, and $y_{iBC}$ for contrasts B versus A, C versus A, and C versus B, respectively. In a three-arm trial, the contrast-based model assumes that
$$\begin{pmatrix} y_{iAB} \\ y_{iAC} \end{pmatrix} \sim N\left( \begin{pmatrix} d_{AB} \\ d_{AC} \end{pmatrix}, \begin{pmatrix} \tau^2 + v_{iA} + v_{iB} & \tau^2/2 + v_{iA} \\ \tau^2/2 + v_{iA} & \tau^2 + v_{iA} + v_{iC} \end{pmatrix} \right)$$
(Model 1. Contrast-based basic model for network meta-analysis), where $d_{AB}$ and $d_{AC}$ are the expected values of the two contrasts $y_{iAB}$ and $y_{iAC}$, $\tau^2$ is the heterogeneity variance, and $v_{iA}$, $v_{iB}$, $v_{iC}$ are the outcome variances of A, B, and C, respectively. The variance for heterogeneity, $\tau^2$, is due to trial-by-treatment interaction. Note that the covariance among contrasts involves half the variance for heterogeneity, $\tau^2/2$, as well as the outcome variance that the two contrasts share. By comparison, the variance of each contrast involves the full variance for heterogeneity, $\tau^2$, as well as two outcome variances, since each contrast has contributions from two treatments. The model for the third contrast, $y_{iBC}$, can be written as
$$y_{iBC} = y_{iAC} - y_{iAB} \sim N\left( d_{AC} - d_{AB},\; \tau^2 + v_{iB} + v_{iC} \right).$$
The outcome variances $v_{iA}$, $v_{iB}$, $v_{iC}$ may differ between trials and also between treatments in the same trial, depending on the individual trial design and the kind of summary statistic used as the response. They are known quantities obtained from the analyses of the individual trials. In a two-arm trial, just one of the three contrasts $y_{iAB}$, $y_{iAC}$, and $y_{iBC}$ is observed. This contrast is still normally distributed with expected value and variance according to Model 1.
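The variance and covariance statements above can be checked with a few lines of arithmetic. The sketch below assumes (this is our reading, not a statement taken from the text) that the arm-level log-odds within a trial are independent with variance $\tau^2/2 + v_{ij}$, which is what produces the shared term $\tau^2/2 + v_{iA}$ in the covariance. The function names are hypothetical.

```python
def contrast_covariance(tau2, v_a, v_b, v_c):
    """Covariance matrix of (y_iAB, y_iAC) in a three-arm trial.

    Assumption: arm-level log-odds are independent with variance
    tau2/2 + v_ij, so both contrasts share the contribution of arm A.
    """
    s_a, s_b, s_c = (tau2 / 2 + v for v in (v_a, v_b, v_c))
    var_ab = s_a + s_b   # equals tau2 + v_a + v_b
    var_ac = s_a + s_c   # equals tau2 + v_a + v_c
    cov = s_a            # equals tau2/2 + v_a (shared arm A)
    return [[var_ab, cov], [cov, var_ac]]


def variance_bc(tau2, v_a, v_b, v_c):
    """Variance of y_iBC = y_iAC - y_iAB implied by the matrix above."""
    (var_ab, cov), (_, var_ac) = contrast_covariance(tau2, v_a, v_b, v_c)
    return var_ab + var_ac - 2 * cov


V = contrast_covariance(tau2=1.0, v_a=0.2, v_b=0.3, v_c=0.4)
V_BC = variance_bc(tau2=1.0, v_a=0.2, v_b=0.3, v_c=0.4)
```

The variance of the third contrast comes out as $\tau^2 + v_{iB} + v_{iC}$: the contribution of the shared arm A cancels in the difference.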

| The contrast-based evidence-splitting model
Following Shih and Tu, 1 we consider evidence splitting for the C versus B contrast. For this comparison, the contrast-based ES model for the $i$-th three-arm trial can be stated as (Model 2. Contrast-based ES model for network meta-analysis). In this model, $w$ is the discrepancy between the direct and the indirect evidence on the C versus B contrast. The role of the latent covariates $x_{iAB}$ and $x_{iAC}$ in Model 2 is to enable evidence splitting, that is, estimation of the effect $w$. We denote these covariates as latent because they depend on unknown variance parameters of the model. Note that the latent covariates are only needed for three-arm trials, but not for two-arm trials. 1 The variance components are the same as in Model 1. Specifically, the variance for heterogeneity, $\tau^2$, is unknown and must be estimated, whereas the outcome variances $v_{iA}$, $v_{iB}$, $v_{iC}$ are known. Model 2 is readily extended to multi-arm trials with more than three treatments, but all treatments must be considered. For details, see Shih and Tu. 1 In Appendix A, we provide an alternative derivation of the contrast-based ES model and show that evidence splitting does not strictly require the ES model.
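To illustrate how latent covariates of this kind can arise, the following sketch derives one possible coding from two requirements: the direct contrast $y_{iBC} = y_{iAC} - y_{iAB}$ must carry the full shift $w$ (so $x_{iAC} - x_{iAB} = 1$), and the within-trial combination that is uncorrelated with $y_{iBC}$ must not involve $w$. The explicit formulas, the sign convention, and the function names are assumptions made for this illustration; they are not quoted from Shih and Tu or from the model statement above.

```python
def es_latent_covariates(tau2, v_b, v_c):
    """Hypothetical latent covariates (x_iAB, x_iAC) for the C vs B split.

    Assumption: arm variances are s_j = tau2/2 + v_ij. The combination
    (1 - c) * y_iAB + c * y_iAC with c = s_B / (s_B + s_C) is uncorrelated
    with y_iBC, and the covariates are chosen so that w drops out of it.
    """
    s_b = tau2 / 2 + v_b
    s_c = tau2 / 2 + v_c
    c = s_b / (s_b + s_c)
    return -c, 1 - c


def cov_direct_indirect(tau2, v_a, v_b, v_c):
    """Covariance between L1 = y_iAC - y_iAB and L2 = (1-c) y_iAB + c y_iAC."""
    s_a, s_b, s_c = (tau2 / 2 + v for v in (v_a, v_b, v_c))
    var_ab, var_ac, cov = s_a + s_b, s_a + s_c, s_a
    x_ab, _ = es_latent_covariates(tau2, v_b, v_c)
    c = -x_ab
    return (1 - c) * cov + c * var_ac - (1 - c) * var_ab - c * cov


X_AB, X_AC = es_latent_covariates(tau2=1.0, v_b=0.3, v_c=0.4)
COV_L1_L2 = cov_direct_indirect(tau2=1.0, v_a=0.2, v_b=0.3, v_c=0.4)
```

Under these assumptions the covariates depend on $\tau^2$, which is why they cannot be treated as ordinary known regressors.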

| The arm-based basic model for NMA
If $y_{iA}$, $y_{iB}$, $y_{iC}$ denote the observed log-odds for treatments A, B, C in a three-arm trial $i$, then an arm-based model can be written as (Model 3. Arm-based basic model for network meta-analysis), where $\mu$ is an overall intercept, $\beta_i$ is the fixed main effect of the $i$-th trial, and $\gamma_A$, $\gamma_B$, $\gamma_C$ are the fixed main effects for the treatments A, B, C. This representation of the arm-based model differs slightly from that given by Shih and Tu 1 in that we use a form that does not require specifying a baseline treatment. It is of the form of a two-way analysis-of-variance model with factors treatment and trial and as such, in our experience, is very convenient for implementation in a linear mixed model package. 7 Note that even though the linear predictor is over-parameterized, and hence restrictions on estimates of the effects need to be imposed, the predictions are unique.

| The arm-based evidence-splitting model
We here focus on the B versus A contrast for better comparability with Shih and Tu. 1 For the $i$-th three-arm trial, the arm-based ES model can be written as 1 (Model 4. Arm-based ES model for network meta-analysis), where $w$ is the discrepancy between the direct and indirect evidence on the B versus A contrast and $x_{iA}$, $x_{iB}$ are the latent covariates. Note that the latent covariate term is present only for the two treatments for which the evidence is split (A and B), but not for the third treatment (C). Also, the latent covariate is not needed for the two-arm trials. For trials with more than three treatments, the coding of the latent covariates is the same as for three-arm trials, regardless of the number of treatments in the trial. 1 Thus, whenever a trial has more than two treatments and comprises both treatments A and B, the latent covariates are added in the linear predictor for A and B; otherwise the latent covariate is set to zero. This makes this representation of the model particularly convenient for meta-analyses involving trials with a larger number of treatments, which are very common in agriculture. In Appendix B, we provide a derivation of the arm-based ES model.
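The coding rule described here (non-zero covariates only for A and B, and only in trials with more than two treatments that contain both) can be sketched as follows. The explicit formulas are an assumption for illustration: they follow from requiring $x_{iB} - x_{iA} = 1$ and that the precision-weighted mean of the two arms, which is uncorrelated with their difference, does not involve $w$, with arm variances taken as $\tau^2/2 + v_{ij}$. Function names are hypothetical.

```python
def arm_latent_covariates(tau2, v_a, v_b):
    """Hypothetical latent covariates (x_iA, x_iB) for the B vs A split."""
    s_a = tau2 / 2 + v_a
    s_b = tau2 / 2 + v_b
    total = s_a + s_b
    return -s_a / total, s_b / total


def covariates_for_trial(tau2, outcome_variances, focal=("A", "B")):
    """Map each arm of one trial to its latent covariate value.

    outcome_variances: dict mapping treatment label -> known variance v_ij.
    The covariate is zero unless the trial has more than two arms and
    contains both focal treatments.
    """
    first, second = focal
    x = {arm: 0.0 for arm in outcome_variances}
    if len(outcome_variances) > 2 and first in x and second in x:
        x[first], x[second] = arm_latent_covariates(
            tau2, outcome_variances[first], outcome_variances[second]
        )
    return x


X_THREE_ARM = covariates_for_trial(1.0, {"A": 0.3, "B": 0.3, "C": 0.9})
X_TWO_ARM = covariates_for_trial(1.0, {"A": 0.3, "B": 0.3})
X_UNEQUAL = covariates_for_trial(1.0, {"A": 0.2, "B": 0.6, "C": 0.4})
```

With equal variances for the two focal arms the covariates reduce to -0.5 and 0.5, so in that special case they no longer depend on $\tau^2$.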

| Methods of estimation for the contrast- and arm-based models
The basic contrast-based Model 1 can be estimated by full ML and by REML. The ES model has the heterogeneity variance parameter $\tau^2$ involved in both the latent covariate $x$ and the variance structure. Thus, the model is no longer of the linear mixed form. As the parameter $\tau^2$ is present both in the mean and the variance, standard REML is not an option, but full ML is. Shih and Tu 1 implemented this method using structural equation modeling. Here, we employ the NLMIXED procedure of SAS to implement the full ML method of Shih and Tu. 1 Our approach requires specifying the contrast-based Model 2 in the syntax of the procedure, based on which the full likelihood is then set up and maximized iteratively using adaptive Gaussian quadrature. 10 An alternative is to employ an approximate iterative scheme using a linear mixed model in which the current value of the latent covariate $x$ is considered a known constant. The latent covariate is computed for the current estimate of $\tau^2$, which is updated in the next iteration. These steps are repeated until the values of $\tau^2$ in the latent covariate $x$ and the current estimate based on the linear mixed model fit agree. If ML is used during iterations, we obtain an approximation of the full ML estimation. It is also possible to use REML in the iterations, and this is expected to be preferable due to a smaller bias in the estimation of $\tau^2$. 3 We implemented these iterative methods using the GLIMMIX procedure of SAS (see Supplemental Information). The iterative scheme is summarized in Figure 1.
The description above is focused on the contrast-based models. For the arm-based models, we have exactly the same options as for the contrast-based models. In particular, the algorithm in Figure 1 can be applied in the same way, replacing the contrast-based Models 1 and 2 with the arm-based Models 3 and 4.
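The iterative scheme described in this section can be sketched as a fixed-point loop. The sketch below is not the SAS implementation: `fit_lmm` is a hypothetical user-supplied callable that computes the latent covariate from the supplied $\tau^2$, fits the linear mixed model with that covariate treated as known (by ML or REML), and returns the resulting estimate of $\tau^2$. The starting value, tolerance, and stopping rule are likewise assumptions.

```python
def iterate_es_fit(fit_lmm, tau2_start=1.0, tol=1e-8, max_iter=100):
    """Iterate until tau2 used in x and tau2 estimated by the fit agree.

    Returns (tau2, number_of_iterations, converged).
    """
    tau2 = tau2_start
    for iteration in range(1, max_iter + 1):
        # One linear mixed model fit with x computed from the current tau2.
        tau2_new = fit_lmm(tau2)
        if abs(tau2_new - tau2) <= tol * max(1.0, abs(tau2)):
            return tau2_new, iteration, True
        tau2 = tau2_new
    return tau2, max_iter, False


def toy_fit(tau2_for_x):
    """Stand-in for a real model fit, used only to exercise the loop."""
    return 0.5 + 0.1 * tau2_for_x


TAU2_HAT, N_ITER, CONVERGED = iterate_es_fit(toy_fit)
```

Passing the model fit as a callable keeps the loop identical for the contrast-based and the arm-based representation, and for ML and REML; only the inner fit changes.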

| The Sclerotherapy NMA data
We consider the Sclerotherapy NMA data, 11 also used by Shih and Tu 1 to illustrate their method. The trial network has two three-arm trials and 24 two-arm trials. The treatments were a control group (A), sclerotherapy (B), and beta-blocker (C). The focus of our analysis is on the direct-indirect evidence inconsistency for the C versus B contrast. The binomial outcome is the number $r$ of participants suffering from bleeding. Trials with zero events or zero non-events in any of the treatment arms were handled by adding a value of 0.5 to the counts in both categories. 12 The outcome was transformed to log-odds ($y_{iA}$, $y_{iB}$, $y_{iC}$) and analyzed assuming approximate normality. The outcome error variance for the log-odds of a treatment in a given trial was computed as $v = 1/r + 1/(n - r)$, where $n$ is the binomial sample size.
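The transformation described in this paragraph can be sketched as follows. The variance uses the usual large-sample formula for an empirical log-odds, 1/events + 1/non-events. Applying the 0.5 correction arm by arm, only where a zero cell occurs, is an assumption of this sketch; the text does not say whether the correction is applied to all arms of an affected trial. The function name is hypothetical.

```python
import math


def empirical_log_odds(r, n):
    """Return (log-odds, variance) for r events among n participants.

    Assumption: 0.5 is added to both cells only when r == 0 or r == n.
    """
    if r == 0 or r == n:
        events, non_events = r + 0.5, (n - r) + 0.5
    else:
        events, non_events = float(r), float(n - r)
    y = math.log(events / non_events)
    v = 1.0 / events + 1.0 / non_events
    return y, v


Y_REGULAR, V_REGULAR = empirical_log_odds(10, 40)
Y_ZERO, V_ZERO = empirical_log_odds(0, 20)
```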
Table 1 shows the estimates based on the basic Models 1 and 3. The REML-based analysis agrees between both models, whereas the ML-based analyses differ. The agreement of the REML estimates of $\tau^2$ is expected, because the likelihood function being maximized is based on contrasts taking out all fixed effects in both cases. 7 It is noteworthy that the ML estimate of $\tau^2$ is rather small for the arm-based model, indicating substantial downward bias (also see Section 4).
Table 2 shows the results for the ES models. The results for the contrast-based representation agree well with those in Shih and Tu 1 (see their Table 2) when full ML is used. Iterated ML provides a very good approximation. The arm-based and contrast-based results differ notably when full or iterated ML is used. By contrast, results agree between the arm-based and contrast-based models when iterated REML is used. The heterogeneity variance estimate with ML methods is considerably smaller than with REML, again suggesting that ML methods are fraught with downward bias, particularly when an arm-based parameterization is used (see Section 4).

| A multi-environment variety trial
We here consider a series of 85 crop variety trials evaluating 13 varieties of spring wheat, conducted in Sweden during the five-year period 2002-2006 at multiple locations. 13 The purpose of these kinds of trials is to assess the value for cultivation and use (VCU) for each variety. The series of trials comprised one reference variety (with ID 9601). There was particular interest in comparing all candidate varieties to the reference variety. The dataset is rather unbalanced, with some varieties tested only in years 2005 and 2006. A salient feature of this dataset is that for some of the comparisons the mean of observed direct differences with the reference variety is quite different from the generalized least squares estimate of the same difference based on a suitable linear mixed model, as will be described below. This apparent inconsistency between direct and indirect evidence hampered the communication of generalized least squares estimates of variety means to growers, which led to the proposal of a new method to estimate the difference exclusively based on direct comparisons. 13 Here, we will investigate the magnitude and significance of the inconsistency.
Value for cultivation and use trials are routinely analyzed using only data of the current year using an unweighted two-stage approach. 13,14 In addition, a joint analysis across several years may be performed. We here consider both models, starting with the single-year analysis. We only use the arm-based approach, which is the more convenient one due to the relatively large number of treatments. The available data consist of variety means per trial, each computed across four observations. In Swedish variety testing, information on the precision of the means is not carried forward. Thus, network meta-analysis for a single year was based on the model (Model 5. Arm-based basic model for year-wise analysis of series of crop variety trials), where $y_{ij}$ is the mean of the $j$-th variety in the $i$-th trial and $e_{ij}$ is a random residual comprising both trial-by-variety interaction (heterogeneity) and residual error associated with the mean. We assume normality for both the response and the residual. As the outcome error variances $v_{ij}$ were not available, the residual was modeled as an independent random normal deviate with constant variance $\sigma^2_e$. 13 Note that if the error variances $v_{ij}$ were known, the residual variance would depend on $v_{ij}$; a constant residual variance therefore implies the assumption that the error variances $v_{ij}$ are constant. Assuming that the number of replications was the same in each trial, this assumption has some justification. While it is often found that there is some heterogeneity of error variance between variety trials, the effect of accounting for such heterogeneity in a joint analysis over trials is often found to be relatively small. 14 Also, the constant variance assumption has a justification in randomization theory for multi-environment trials. 15
As the residual has constant variance under the assumed model, the covariate $x$ takes values −0.5 and 0.5 for the two varieties to be compared when both are in the same trial; otherwise the value is zero. This is very simple to implement because $x$ does not depend on the variance for heterogeneity here (i.e., $x$ is not latent), and analysis by REML is therefore exact. The extended model is (Model 6. Arm-based ES model for year-wise analysis of series of crop variety trials), where $w$ is the fixed effect for inconsistency.
Next, consider analysis across years. Locations varied between years, so we model trials as nested within years 13 using
$$y_{ijk} = \mu + \gamma_j + a_k + g_{jk} + \beta_{ik} + e_{ijk}$$
(Model 7. Arm-based basic model for analysis over years of series of crop variety trials), where $y_{ijk}$ is the yield of the $j$-th variety in the $k$-th year and $i$-th trial, $\mu$ is an overall intercept, $\gamma_j$ is the main effect of the $j$-th variety, $a_k$ is the main effect of the $k$-th year, $g_{jk}$ is the variety-by-year interaction of the $j$-th variety with the $k$-th year, $\beta_{ik}$ is the effect of the $i$-th trial nested within the $k$-th year, and $e_{ijk}$ is a residual comprising both trial-by-variety interaction and error associated with $y_{ijk}$. The effects $a_k$ and $g_{jk}$ are modeled as random for routine analysis. If, however, we test for inconsistency within years, then all effects except the residual are fixed. In this case, we can make the replacement $\gamma_{jk} = \gamma_j + a_k + g_{jk}$ and take this effect for the $j$-th variety and $k$-th year as fixed. With this parameterization, the joint model for simultaneously assessing inconsistency in the different years is (Model 8. Arm-based ES model for analysis over years of series of crop variety trials), where $w_k$ is the fixed effect for inconsistency in the $k$-th year. The covariate $x_{ijk}$ takes values −0.5 and 0.5 for the two varieties to be compared when both are in the same trial, and zero otherwise.
For comparison, we conducted the same analyses using full ML (Table 3). This yielded considerably smaller residual variance estimates, reflecting the large number of model degrees of freedom (123 in the across-years analysis) relative to the size of the dataset (498 observations). The bias can be calculated exactly in this case, observing that the REML estimator of the residual variance equals the residual mean square $\mathrm{MS}_{\mathrm{residual}}$ of an analysis of variance for Model 6 or Model 8, hence is unbiased, and the ML estimator equals $(n - p)\,n^{-1}\,\mathrm{MS}_{\mathrm{residual}}$, where $n$ is the sample size and $p$ is the number of fixed-effect parameters, meaning that the bias equals $-n^{-1} p \times 100\%$ of $\mathrm{MS}_{\mathrm{residual}}$. For the across-years dataset we have $n = 498$, $p = 124$ such that $p/n \approx 1/4$ and hence a bias of approximately −25%. The ratio $p/n$ is substantial yet mild by comparison with the usual setting in medical meta-analysis, where the number of treatments is often two in the majority of studies. For the Sclerotherapy NMA data in Section 3.1, we have $p/n = 28/54 \approx 1/2$. The REML-based estimator there is not a residual mean square, because the outcome variances $v_{iA}$, $v_{iB}$, $v_{iC}$ enter the estimation procedure and the ES model is no longer of the linear mixed form, but the high $p/n$ ratio still gives some rough indication of the order of magnitude of the discrepancy to be expected between iterated REML and ML estimates in an arm-based analysis. By the same token, bias in the contrast-based analysis is expected to be smaller, because trial effects are removed such that $p$ equals only the number of treatment arms minus one and hence $p/n$ is rather smaller for contrast-based analyses. This will be investigated further by simulation in the next section.
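The bias calculation in this paragraph is simple arithmetic and can be reproduced directly. The sketch below only encodes the relation stated in the text (the ML estimator equals $(n - p)/n$ times an unbiased residual mean square); the function name is hypothetical.

```python
def ml_relative_bias(n, p):
    """Relative bias of the ML residual variance estimator.

    The ML estimator equals (n - p) / n times the unbiased residual
    mean square, so its relative bias is (n - p) / n - 1 = -p / n.
    """
    return (n - p) / n - 1.0


# Across-years variety trial dataset: n = 498, p = 124.
BIAS_VARIETY = ml_relative_bias(498, 124)   # about -0.249

# p/n for the Sclerotherapy NMA data (arm-based analysis), used only
# as a rough indication since that estimator is not a mean square.
RATIO_SCLERO = 28 / 54                      # about 0.519
```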

| SIMULATION
To investigate the bias and accuracy of alternative estimation methods for the ES model, we conducted a Monte Carlo simulation using the Sclerotherapy NMA data 1 to parameterize the arm-based model. The data were simulated under the null hypothesis of no inconsistency between direct and indirect evidence ($w = 0$) for the C versus B contrast ($d_{BC}$). Thus, assuming Model 3, we set the fixed effects in the linear predictor equal to the estimates obtained by iterated REML. Random effects for heterogeneity were simulated from a normal distribution with variance $\tau^2/2$. The value of the parameter $\tau^2$ was set to 0.3, 1.0 (close to the iterated REML estimate), and 3.0. The binomial response was generated using the binomial probability obtained by applying the inverse logit link to the linear predictor and using the sample size $n$ as given in the example for the respective treatment and study. When non-convergence occurred, this was invariably because the estimator of $\tau^2$ approached zero. In these cases, we replaced the analysis by one that fixed $\tau^2$ at zero from the start. This way, convergence could be achieved in all cases. The percentage of cases where convergence failed when $\tau^2$ was a parameter to be estimated was recorded. The performance of the full ML method and the iterated ML and REML methods for the ES model was assessed in terms of the bias and mean squared error (MSE) of the estimators of $\tau^2$, $d^{\mathrm{dir}}_{BC}$, $d^{\mathrm{ind}}_{BC}$, and $w$ (see Appendices A and B). Each of the three scenarios for $\tau^2$ was assessed based on 10,000 simulation runs. The methods were assessed under both the contrast-based and the arm-based representation of the model for analysis. For comparison, we also investigated the analysis based on the consistent Models 1 and 3. Results are shown in Tables 4 and 5.
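The data-generating step described above can be sketched for a single trial as follows. This is an illustration only: the linear predictor values and sample sizes in the usage lines are made up, not the estimates or sample sizes from the Sclerotherapy data, and the function name is hypothetical.

```python
import math
import random


def simulate_trial(linear_predictors, sample_sizes, tau2, rng):
    """Simulate binomial event counts for the arms of one trial.

    For each arm, a heterogeneity effect with variance tau2/2 is added to
    the fixed-effects linear predictor, the inverse logit gives the event
    probability, and the count is drawn as a sum of Bernoulli trials.
    """
    counts = []
    for eta, n in zip(linear_predictors, sample_sizes):
        u = rng.gauss(0.0, math.sqrt(tau2 / 2))
        prob = 1.0 / (1.0 + math.exp(-(eta + u)))
        r = sum(rng.random() < prob for _ in range(n))
        counts.append(r)
    return counts


rng = random.Random(2024)
SIZES = [50, 60, 40]                      # illustrative sample sizes
COUNTS = simulate_trial([-1.0, -0.5, 0.0], SIZES, tau2=1.0, rng=rng)
```

Each simulated count would then be converted to an empirical log-odds and analyzed by the competing methods.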
Generally, for the ES model, the iterated REML method performs best in terms of both MSE and bias for $\tau^2$ and effect estimates as well (Table 5). Overall, the performance of the iterated REML method is quite comparable to that of REML for analysis by the consistent model (Table 4). The method produces the same results for both the contrast-based and the arm-based approach, as is expected from corresponding results for the consistent model. 5 The key reason for this agreement between contrast-based and arm-based analysis lies in the fact that REML makes use of the residual contrasts, which remove any fixed effects, thus duly accounting for the degrees of freedom lost to fixed effects. 3 For the theoretical details we refer readers to Piepho et al. 5 The result carries over to our iterated REML method for the ES model, because on each iteration, a linear mixed model is fitted, conditioning on the current estimate of $w$. The ML-based methods suffer from larger bias for $\tau^2$, which also translates into larger MSE. The problem is most pronounced for the arm-based approach, which is due to the large number of fixed study effects. Generally, for both ML and REML methods, the MSE increased with the true value of $\tau^2$, which is expected by way of analogy with sample variances in simple random samples. 3 Interestingly, the point estimates of $w$ have comparable bias and MSE for ML and REML methods. Note, however, that estimates of uncertainty for $w$ estimates (model-based standard errors and confidence intervals) will be adversely affected by the inferior performance in estimating $\tau^2$. In summary, the results show that the iterated REML approach is quite competitive. These results are restricted to smaller networks as investigated here. Performance in larger networks is certainly worth investigating in future.

| LIMITATIONS OF THE EVIDENCE-SPLITTING MODEL
The arm-based ES model has a single term $w x_{ij}$ to detect the inconsistency between direct and indirect evidence. The latent covariate $x_{ij}$ takes non-zero values only in trials with direct evidence, so we will focus on such trials here. If we consider the direct evidence of the B vs. A contrast, the relevant latent covariate values for the $i$-th trial are $x_{iA}$ and $x_{iB}$. The values of $x_{iA}$ and $x_{iB}$ depend on the variance components. While this model does lead to a clean separation of the direct and indirect evidence under the consistent two-way model, and as such is very useful, it does not represent a plausible data-generating mechanism under inconsistency. To explain in more detail, consider the arm-based linear predictor under consistency, excluding the random effect for heterogeneity for clarity:
$$\eta_{ij} = \mu + \beta_i + \gamma_j$$
(Model 9. Arm-based linear predictor under consistency, excluding random effect for heterogeneity for clarity). This gives the expected value of the $j$-th treatment in the $i$-th trial on the linear-predictor scale under the consistent arm-based model. Adding the term for evidence splitting, the linear predictor becomes
$$\eta_{ij} = \mu + \beta_i + \gamma_j + w x_{ij}$$
(Model 10. Arm-based linear predictor for ES model, excluding random effect for heterogeneity for clarity). What this model would imply, if taken at face value, is that in trials where both treatments A and B are present, the mean of treatment A is shifted by the amount $w x_{iA}$ compared to trials not having both treatments, whereas simultaneously treatment B is shifted by the amount $w x_{iB}$. The parameter $w$ itself equals the corresponding shift in the difference between the means of A and B.
Thus, observing that $x_{iB} - x_{iA} = 1$ under the ES model, the difference in the $i$-th trial, assumed to have both treatments A and B, is given by
$$\gamma_B - \gamma_A + w\,(x_{iB} - x_{iA}) = \gamma_B - \gamma_A + w.$$
The key point here is that $x_{iB} - x_{iA} = 1$ regardless of the variance components. But under the ES model, the variance components additionally determine the exact values of $x_{iA}$ and $x_{iB}$. Hence, the variance components dictate how large the shift of each of the two treatments is. Moreover, the shifts for A and B are further restricted to be opposite in sign, and the shifts also depend on the trial. While this is perfectly fine for the purpose of splitting the direct and indirect evidence under the consistent Model 9, it is too restrictive as a model explaining how any inconsistency may have come about in multi-arm trials, where treatments other than A and B do not experience a shift. Hence, the ES model is best regarded as a convenient tool for tracing the separate flows of direct and indirect information for a particular contrast under the consistent Model 9, but not as a model describing a plausible data-generating mechanism under inconsistency.
As the focus of the analysis is on just the two treatments A and B in multi-arm trials where both are present, the most obvious choice for a plausible model under inconsistency is one that allows the direction and amounts of shift of A and B to be independent of any variance components. Clearly, the variance components cannot be causal agents in the genesis of inconsistency. The causal agents must be related to the conduct of a trial, its specific settings for the individual treatments, and how this affects the treatment means. If there is an inconsistency regarding treatments A and B, then at least one of their effects must be shifted when they are in the same trial, as compared to the effect when they are not in the same trial. If that is a reasonable assumption, then it is also plausible to assume that both effects are shifted, and independently so, unless one has very specific evidence to show that only one of the two treatment effects would sustain a shift under inconsistency. In this scenario, one possible model under inconsistency may be formulated using the linear predictor $\eta_{ij}$ given by
$$\eta_{ij} = \mu + \beta_i + \gamma_j + \delta_A z_{1ij} + \delta_B z_{2ij}$$
(Model 11. An arm-based linear predictor with two parameters for inconsistency, excluding random effect for heterogeneity for clarity), where $z_{1ij} = z_{2ij} = 0$ except for $z_{1iA} = 1$ for the observation on treatment A and $z_{2iB} = 1$ for the observation on treatment B if the $i$-th trial has both treatments A and B.
The effect $\delta_A$ is the shift of the effect for treatment A when in the same trial with B, and $\delta_B$ is defined analogously. Apart from its realism, this Model 11 has two important advantages over the ES Model 10: (i) the covariates ($z_{1ij}$ and $z_{2ij}$) no longer depend on variance components, so the linear predictor is truly linear in the parameters and REML is directly available, and (ii) this model can also be fitted to individual-person (or individual-plot) data using a generalized linear mixed model (GLMM) for observed binomial counts, whereas the ES model only works for a linear mixed model (LMM) assuming approximate normality for the empirical logits. The same goes for other distributions and links with GLMM.
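Because the covariates of Model 11 are plain indicators, they can be coded without reference to any variance component. The sketch below shows one way to do this; the function name and the `min_arms` argument are hypothetical. The default follows the literal wording "if the $i$-th trial has both treatments A and B"; setting `min_arms=3` would restrict the coding to multi-arm trials, which is an alternative reading rather than something the text states.

```python
def inconsistency_covariates(trial_arms, focal=("A", "B"), min_arms=2):
    """Return {treatment: (z1, z2)} for the arms of one trial.

    z1 = 1 only for the first focal treatment and z2 = 1 only for the
    second focal treatment, and only if the trial contains both focal
    treatments (and at least min_arms arms). All other values are 0.
    """
    first, second = focal
    active = (
        first in trial_arms
        and second in trial_arms
        and len(trial_arms) >= min_arms
    )
    coded = {}
    for arm in trial_arms:
        z1 = 1 if active and arm == first else 0
        z2 = 1 if active and arm == second else 0
        coded[arm] = (z1, z2)
    return coded


Z_ABC = inconsistency_covariates(["A", "B", "C"])
Z_AC = inconsistency_covariates(["A", "C"])
Z_AB_MULTI_ONLY = inconsistency_covariates(["A", "B"], min_arms=3)
```

These two columns can be added as ordinary fixed-effect regressors to a standard arm-based mixed model fit.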
The one missing degree of freedom in the fixed effects for the ES Model 10 in comparison with Model 11 means that under this latter realistic scenario for inconsistency, the ES model cannot fully capture the inconsistency and hence the variance estimate for heterogeneity ($\tau^2$) will sustain an upward bias. This, in turn, raises the question as to the best model to estimate $\tau^2$. The answer partly depends on the purpose of the analysis. The models considered in this section provide three options: (i) the consistent Model 9, (ii) the ES Model 10 with one degree of freedom for inconsistency, and (iii) the extended Model 11 with two degrees of freedom for inconsistency. Using Model 9 provides a valid estimate of $\tau^2$ under the null hypothesis of consistency. This estimate is fine if we assume that consistency holds and we just want a numerical check on the contributions of the direct and indirect evidence to the overall estimate of the B versus A contrast. This estimate is also fine for testing the null hypothesis of consistency, but it is not appropriate for obtaining valid standard errors of estimates of $w$ under Model 10, which is why Shih and Tu proposed an ML estimation method based on Model 10. 1 In Section 2, we have continued along the same lines and developed alternative methods of estimation under the same Model 10, which we then compared by simulation under consistency in Section 4. The discussion in the present section suggests, however, that Model 10 may not be the best model to obtain a realistic estimate of $\tau^2$ under inconsistency, because it only has one degree of freedom for inconsistency, whereas two degrees of freedom are required for a realistic representation of any inconsistency in relation to the two focal treatments. Hence, one may argue that Model 11 is preferable for estimating $\tau^2$ if it is thought necessary to allow for inconsistency. Moreover, Model 11 may also be the better model for detecting inconsistency.
It should be reiterated that the two-parameter Model 11 for inconsistency proposed here does not achieve the separation of direct and indirect evidence arising when the consistent Model 9 is assumed to hold, whereas this is achieved perfectly well with the ES Model 10. For example, if there are two three-arm trials, as in Example 1, then in addition to the direct evidence there is also indirect evidence on the treatment effects between the two three-arm trials due to the heterogeneity in the response variances $v_{ij}$. Model 11 cannot split that evidence, because the effects $\delta_A$ and $\delta_B$ simultaneously capture both sources of information, but the ES Model 10 can achieve the splitting via the single parameter $w$, which by construction captures the discrepancy between direct and indirect evidence. The ES Model 10 introduces the single additional parameter $w$ as a device to trace and separate the flow of direct and indirect evidence on the B versus A contrast in fitting the consistent Model 9, without representing a plausible model for the genesis of any inconsistency. In all fairness, this is the main stated purpose of the ES model. By contrast, Model 11 introduces two parameters $\delta_A$ and $\delta_B$ in order to represent a realistic mechanism for inconsistency and simplify the estimation of the variance for heterogeneity. It may have better power for detecting inconsistency, but the model cannot cleanly separate the direct and indirect evidence flowing in the consistent Model 9.
In this section, we have considered one specific model for inconsistency. There are, of course, many other models for inconsistency, and these are nicely reviewed in Shih and Tu.¹ Any of these models can be used as an alternative to Model 11 in analyses aimed at modeling inconsistency. Our purpose in introducing that model was not to advocate one specific model for inconsistency, but to illustrate the limitations of the ES model in representing a plausible mechanism for inconsistency.
In relation to option (i), it may be added that if consistency is assumed to hold, the split of the direct and indirect evidence could also be achieved as follows. First, fit the consistent Model 9 using all the data and obtain an estimate of the B versus A contrast and its associated variance. … This reverse-engineering approach splits the evidence in exactly the same way as the ES model does when the same estimate of $\tau^2$ is used. This equivalence makes it clear that the ES model is a tool for splitting the evidence under the consistent model.
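The reverse-engineering idea can be illustrated numerically. The following is a minimal Python sketch, assuming that the overall estimate equals the inverse-variance weighted average of independent direct and indirect estimates obtained with a common estimate of $\tau^2$. The function name `split_evidence` and all input numbers are hypothetical and serve only as an illustration.

```python
def split_evidence(d_all, var_all, d_dir, var_dir):
    """Back out the indirect estimate and its variance.

    Assumes d_all is the inverse-variance weighted average of the
    independent estimates d_dir and d_ind, so that the precisions add:
    1/var_all = 1/var_dir + 1/var_ind.
    """
    if var_dir <= var_all:
        raise ValueError("direct variance must exceed the overall variance")
    prec_all = 1.0 / var_all      # precision of the overall estimate
    prec_dir = 1.0 / var_dir      # precision of the direct estimate
    prec_ind = prec_all - prec_dir
    d_ind = (prec_all * d_all - prec_dir * d_dir) / prec_ind
    return d_ind, 1.0 / prec_ind


# Hypothetical inputs, not taken from any table in the paper.
d_ind, var_ind = split_evidence(d_all=-0.70, var_all=0.20,
                                d_dir=-0.80, var_dir=0.50)
print(d_ind, var_ind)
```

The sketch only rearranges the weighted-average identity; it does not fit Model 9 and presupposes that both input estimates were computed with the same value of $\tau^2$.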

| CONCLUSION
The ES model is useful for detection of inconsistency between direct and indirect evidence. As the model has variance parameters in both the mean and the variance structure, full ML is the most natural approach to estimation. The downside of this method, as supported by our simulation, is substantial bias of the variance parameter estimator. Our proposed iterated REML estimator provides an efficient and less biased alternative. While the ES model is a convenient tool for comparing direct and indirect evidence, it does not provide a plausible model for the genesis of inconsistency. For this purpose, other models may be preferred.

APPENDIX A: DERIVATION OF THE CONTRAST-BASED ES MODEL
In this section, we show that evidence splitting does not strictly require the ES model and also give an alternative derivation of the contrast-based ES model. The example that is considered in Section 3.1 uses treatment A as a baseline. Following Shih and Tu,¹ we focus on the contrast C versus B. Before moving to three-arm trials, consider the simpler case where all trials involve just two of three treatments. If the interest is in the C versus B contrast $d_{BC}$, then two-arm trials comprising treatments B and C (in short: BC-trials) provide direct evidence. Two-arm trials with treatments A and B (AB-trials) provide direct evidence on the B versus A contrast $d_{AB}$, and two-arm trials with treatments A and C (AC-trials) provide direct evidence on the C versus A contrast $d_{AC}$. In a consistent network, both AB- and AC-trials together provide indirect evidence on the C versus B contrast because $d_{BC} = d_{AC} - d_{AB}$. Hence, evidence splitting amounts to comparing the direct evidence provided by the BC-trials with the indirect evidence provided by the AB- and AC-trials.
Now assume that the network includes at least one three-arm trial (ABC-trial). Such trials comprise both direct and indirect evidence on the C versus B contrast. Hence, the evidence needs to be split within such trials. In a three-arm trial $i$, the direct evidence is given by $L_{i1} = y_{iBC}$, where $y_{iBC}$ is the observed contrast (direct difference) between C and B. Now in the same three-arm trial we are looking for a second linear combination $L_{i2}$ of $y_{iAB}$ and $y_{iAC}$, with coefficient $c_i$ chosen so that $L_{i1}$ and $L_{i2}$ are uncorrelated, and hence independent under normality, which we assume throughout. The reasoning here is that $L_{i1}$ and $L_{i2}$ provide a one-to-one transformation of $y_{iAB}$ and $y_{iAC}$, representing the complete evidence provided by the $i$-th trial, and that since $L_{i1}$ captures all direct evidence but no indirect evidence, the uncorrelated $L_{i2}$ must capture all the indirect evidence provided by the $i$-th trial.¹
Note that the indirect evidence provided by $L_{i2}$ comes from all the indirect comparisons between C and B that are available across the network of trials. Clearly, an indirect comparison always involves data from more than one trial, so $L_{i2}$ in isolation is not informative about the C versus B contrast. For example, if the $i$-th trial provides a direct comparison of B versus A ($y_{iAB}$) and another trial $k$ provides a direct comparison of C versus A ($y_{kAC}$), then the difference of these two direct comparisons ($y_{iAB} - y_{kAC}$) provides an indirect comparison of B versus C. A large network of trials can provide a large number of such indirect comparisons. NMA seeks to exploit all of these indirect comparisons and combine them in an optimal way with the direct evidence.
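The decorrelation requirement can be checked numerically. The following Python sketch assumes that the two linear combinations take the form $L_{i1} = y_{iAC} - y_{iAB}$ and $L_{i2} = y_{iAB} + c_i y_{iAC}$; this form of $L_{i2}$ is an assumption chosen to agree with the statements about $c_i$ in this appendix, and the function names and input values are hypothetical.

```python
def decorrelating_coefficient(var_ab, var_ac, cov_ab_ac):
    """Return c such that L1 = y_AC - y_AB and L2 = y_AB + c * y_AC
    are uncorrelated (assumed form of L2, see lead-in)."""
    denom = var_ac - cov_ab_ac
    if denom <= 0:
        raise ValueError("var(y_AC) - cov(y_AB, y_AC) must be positive")
    return (var_ab - cov_ab_ac) / denom


def covariance_l1_l2(var_ab, var_ac, cov_ab_ac, c):
    """cov(L1, L2) for L1 = -y_AB + y_AC and L2 = y_AB + c * y_AC."""
    a1, b1 = -1.0, 1.0   # coefficients of (y_AB, y_AC) in L1
    a2, b2 = 1.0, c      # coefficients of (y_AB, y_AC) in L2
    return (a1 * a2 * var_ab
            + (a1 * b2 + b1 * a2) * cov_ab_ac
            + b1 * b2 * var_ac)


# Hypothetical variances and covariance for one three-arm trial.
c = decorrelating_coefficient(var_ab=1.3, var_ac=0.9, cov_ab_ac=0.4)
print(c, covariance_l1_l2(1.3, 0.9, 0.4, c))
```

Under the contrast-based model the variances and the covariance involve $\tau^2$, so $c_i$ would have to be recomputed whenever the estimate of $\tau^2$ is updated.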
Continuing our derivation, we find the value of $c_i$ [note that $c_i$ is always defined because $\mathrm{var}(y_{iAC}) - \mathrm{cov}(y_{iAB}, y_{iAC}) > 0$ always]. If there are $n_{ABC}$ ABC-trials and $n_{BC}$ BC-trials, then the direct evidence is given by $n_{ABC} + n_{BC}$ equations, one for each $L_{i1}$ and each $y_{iBC}$, whose errors $e_{i1}$ and $e_{iBC}$ are distributed according to Model 1. From these equations, the contrast $d_{BC}$ can be estimated using generalized least squares. This provides the direct-evidence estimate $\hat{d}^{dir}_{BC}$. The indirect evidence is provided by the $n_{ABC}$ ABC-trials, the $n_{AB}$ AB-trials, and the $n_{AC}$ AC-trials, through $n_{ABC} + n_{AB} + n_{AC}$ equations.
As we have seen, the ES model by Shih and Tu is not needed for splitting the evidence. However, their model allows for inconsistency between the direct and indirect evidence.¹ This is accomplished by the assumption that $d^{dir}_{BC} = d^{ind}_{BC} + w$, where $w$ is the discrepancy between the direct and indirect evidence for the C versus B contrast. Under the null hypothesis of consistency between the direct and indirect evidence, we have $w = 0$. We shall now provide a simple justification for the ES model.
Using the ABC-trial, … As a result, the expected value for the contrast between C and A is $(1 + c_1)^{-1} w$ units larger in an ABC-trial than in an AC-trial.
Similarly, we may consider the ABC-trial ($i = 1$) and an AB-trial. … The expected difference between the direct- and indirect-evidence estimates is $w$. In conclusion, the expected value for the contrast between B and C is …, where the variances and covariances are the same as given in Model 1. These definitions of the latent covariates are needed only for the three-arm trials in the example in Section 3.1. In all other studies, involving only two treatments, the covariate $x$ equals 0, except when the two treatments are B and C, in which case $x$ equals 1. Thus, the contrast-based ES model is different from Model 1, since a latent covariate $x$ is added to assess the inconsistency between direct and indirect evidence for the C versus B contrast. Hence, the contrast-based ES model for the $i$-th three-arm trial can be stated as in Model 2. The variance components are the same as in Model 1. Specifically, the variance for heterogeneity, $\tau^2$, is unknown and must be estimated, whereas the outcome variances $v_{iA}$, $v_{iB}$, $v_{iC}$ are known. Model 2 is readily extended to multi-arm trials with more than three treatments, but all treatments must be considered. For details, see Shih and Tu.¹
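The latent covariates for a three-arm trial can be worked out explicitly. The following display is a sketch under two assumptions: the indirect combination has the form $L_{i2} = y_{iAB} + c_i y_{iAC}$, and the direct and indirect expectations differ by $w$. The symbols $x_{iAB}$ and $x_{iAC}$ for the covariate values are notation assumed here. The result agrees with the statement above that the C versus A contrast is $(1 + c_1)^{-1} w$ units larger in an ABC-trial than in an AC-trial.

```latex
\begin{align*}
E(L_{i1}) &= E(y_{iAC}) - E(y_{iAB}) = d_{AC} - d_{AB} + w, \\
E(L_{i2}) &= E(y_{iAB}) + c_i\,E(y_{iAC}) = d_{AB} + c_i\,d_{AC}. \\
\intertext{Solving these two equations for the expected contrasts gives}
E(y_{iAB}) &= d_{AB} - \frac{c_i}{1 + c_i}\,w, &
E(y_{iAC}) &= d_{AC} + \frac{1}{1 + c_i}\,w, \\
\intertext{so that the latent covariates in a three-arm trial are}
x_{iAB} &= -\frac{c_i}{1 + c_i}, &
x_{iAC} &= \frac{1}{1 + c_i}.
\end{align*}
```

As a check, $x_{iAC} - x_{iAB} = 1$, which matches the covariate value of 1 used for a two-arm BC-trial.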

APPENDIX B: DERIVATION OF THE ARM-BASED ES MODEL
Here, we provide a derivation of the arm-based ES model. Assume we want to split the evidence for the comparison of treatments A and B. Here, we focus on this contrast, rather than on the C versus B contrast considered in the example in Section 3.1, for better comparability with the derivations given by Shih and Tu.¹ In a multi-arm trial comprising both A and B, we consider two linear combinations, that is, $M_{i1} = y_{iB} - y_{iA}$ and $M_{i2} = y_{iB} + f_i y_{iA}$, with $f_i$ chosen so that $\mathrm{cov}(M_{i1}, M_{i2}) = 0$. $M_{i1}$ contains the direct evidence and $M_{i2}$ the indirect evidence on the B versus A contrast. Under the assumption of independence between $y_{iA}$ and $y_{iB}$, $\mathrm{cov}(M_{i1}, M_{i2}) = \mathrm{var}(y_{iB}) - f_i\,\mathrm{var}(y_{iA})$. Thus, to ensure $\mathrm{cov}(M_{i1}, M_{i2}) = 0$ we require $f_i = \mathrm{var}(y_{iB}) / \mathrm{var}(y_{iA})$. With this choice of $f_i$, $M_{i1}$ is uncorrelated with $M_{i2}$, the observed responses of treatment C in the $i$-th trial, and data from any other trials. Note that one could separately analyze $M_{i1}$ from all three-arm trials together with all AB-trials to extract all direct evidence on the B versus A contrast. Likewise, $M_{i2}$ and data on C from all three-arm trials together with the data from all AC- and BC-trials could be analyzed separately to extract all indirect evidence. However, a joint analysis is needed to assess the inconsistency between both estimates. To develop a model for this purpose, we now consider the expected values for $M_{i1}$ and $M_{i2}$. First, we express the expectations in terms of the contrast between A and B and, as with the contrast-based approach, define $d^{dir}_{AB}$ to be the expected contrast based on direct evidence and $d^{ind}_{AB}$ the expected contrast based on the indirect evidence.
Note that for the indirect evidence $M_{i2}$ we are using the definitions $E(y_{iA}) = \gamma_A$ and $E(y_{iB}) = \gamma_B$ (again ignoring intercept and trial effects). By comparison, we cannot use these definitions in the model for the direct evidence $M_{i1}$, because this also involves $w$. This specification is analogous to that for the contrast-based ES model in Section 2.2. From this model for the expected values of $M_{i1}$ and $M_{i2}$, we may obtain the expected values for the observed treatment responses as $E(y_{iA}) = \gamma_A + x_{iA} w$ and $E(y_{iB}) = \gamma_B + x_{iB} w$. It can be shown that the latent covariates $x_{iA}$ and $x_{iB}$ depend only on $\mathrm{var}(y_{iA})$ and $\mathrm{var}(y_{iB})$, where the variances are given in Model 3. The latent covariate $x$ is set to zero in all trials comprising only one of the two treatments A and B, and also for treatments other than A and B in the multi-arm trials involving both A and B, because the corresponding observations are not involved in the evidence splitting. The treatment effects given above for $y_{iA}$ and $y_{iB}$, that is, $\gamma_A + x_{iA} w$ and $\gamma_B + x_{iB} w$, are also given by Shih and Tu.¹ These effects differ from those in Model 3, as a latent covariate $x$ has been added to assess any inconsistency between direct and indirect evidence for the B versus A contrast. The full linear predictor for the expectation of the $j$-th treatment in the $i$-th trial is $\mu + \gamma_j + \beta_i + x_{ij} w$. Hence, for the $i$-th three-arm trial, the arm-based ES model can be written as in Model 4.
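The arm-based latent covariates can be derived in a few lines. The display below is a sketch that assumes $M_{i2} = y_{iB} + f_i y_{iA}$, which is the form implied by the covariance expression $\mathrm{var}(y_{iB}) - f_i\,\mathrm{var}(y_{iA})$, and that ignores the intercept $\mu$ and trial effects $\beta_i$ as in the text.

```latex
\begin{align*}
E(M_{i1}) &= E(y_{iB}) - E(y_{iA}) = \gamma_B - \gamma_A + w, \\
E(M_{i2}) &= E(y_{iB}) + f_i\,E(y_{iA}) = \gamma_B + f_i\,\gamma_A,
\qquad f_i = \frac{\mathrm{var}(y_{iB})}{\mathrm{var}(y_{iA})}. \\
\intertext{Subtracting the first equation from the second and solving gives}
E(y_{iA}) &= \gamma_A - \frac{1}{1 + f_i}\,w, &
E(y_{iB}) &= \gamma_B + \frac{f_i}{1 + f_i}\,w, \\
\intertext{that is,}
x_{iA} &= -\frac{\mathrm{var}(y_{iA})}{\mathrm{var}(y_{iA}) + \mathrm{var}(y_{iB})}, &
x_{iB} &= \frac{\mathrm{var}(y_{iB})}{\mathrm{var}(y_{iA}) + \mathrm{var}(y_{iB})}.
\end{align*}
```

Under these assumptions $x_{iB} - x_{iA} = 1$, so the expected within-trial difference between B and A exceeds $\gamma_B - \gamma_A$ by exactly $w$.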

FIGURE 1 Algorithm for obtaining iterated ML or REML estimates for the contrast-based ES Model 2 or the arm-based ES Model 4.
The REML-based F-test for inconsistency is not significant, both for the individual year analyses (p = 0.3386 on 1 d.f. for 2005, p = 0.9801 on 1 d.f. for 2006) and the across-years analysis (p = 0.5690 on 2 d.f.), giving no indication that the direct and indirect evidence for this comparison are inconsistent. In each of the two years, the direct and indirect difference estimates ($\hat{d}^{dir}$ and $\hat{d}^{ind}$; see Appendix B) are such that the corresponding generalized least squares estimates of the difference based on a model with fixed variety and trial effect ($\hat{d} = -107$ in 2005, $\hat{d} = -1053$ in 2006) fall between them as expected. In fact, since $\hat{d}^{dir}$ and $\hat{d}^{ind}$ based on the ES model are independent under normality, this generalized least squares estimate of the contrast, $d$, can be computed as the weighted average of $\hat{d}^{dir}$ and $\hat{d}^{ind}$, with weights given by the inverses of the squared standard errors: $\hat{d} = \{[\mathrm{s.e.}(\hat{d}^{dir})]^{-2}\,\hat{d}^{dir} + [\mathrm{s.e.}(\hat{d}^{ind})]^{-2}\,\hat{d}^{ind}\} / \{[\mathrm{s.e.}(\hat{d}^{dir})]^{-2} + [\mathrm{s.e.}(\hat{d}^{ind})]^{-2}\}$. Using this method, $\hat{d} = -82$ (s.e. = 334) for the year 2005 and $\hat{d} = -1048$ (s.e. = 195) for 2006. These estimates coincide exactly with the estimates using the standard NMA model. The standard errors, however, differ slightly between the two methods due to different residual variance estimates. Using the same residual variance estimate, the standard errors coincide exactly (not shown).
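The weighted-average computation can be written compactly. The following Python sketch assumes independent direct and indirect estimates with known standard errors; the function name `pool_direct_indirect` and the input values are hypothetical and are not the entries of Table 3.

```python
import math


def pool_direct_indirect(d_dir, se_dir, d_ind, se_ind):
    """Inverse-variance weighted average of two independent estimates.

    Returns the pooled estimate and its standard error.
    """
    w_dir = se_dir ** -2   # weight = inverse squared standard error
    w_ind = se_ind ** -2
    d_pooled = (w_dir * d_dir + w_ind * d_ind) / (w_dir + w_ind)
    se_pooled = math.sqrt(1.0 / (w_dir + w_ind))
    return d_pooled, se_pooled


# Hypothetical direct and indirect estimates with their standard errors.
d_hat, se_hat = pool_direct_indirect(d_dir=-100.0, se_dir=400.0,
                                     d_ind=-50.0, se_ind=600.0)
print(d_hat, se_hat)
```

The pooled estimate always lies between the two inputs, which is the behaviour described above for the direct and indirect difference estimates.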
Denote these as $\hat{d}_{AB}$ and $\mathrm{var}(\hat{d}_{AB})$. Also, save the estimate of $\tau^2$ from this analysis and denote this as $\hat{\tau}^2_{all}$. Second, only use the data on treatments A and B and only include trials having data on both A and B. Plug in the estimate $\hat{\tau}^2_{all}$ and obtain the estimate of the B versus A contrast. This will be the estimate representing the direct evidence, $\hat{d}^{dir}_{AB}$, with associated variance $\mathrm{var}(\hat{d}^{dir}_{AB})$. Third, noting that $\hat{d}_{AB}$ must equal the weighted average of $\hat{d}^{dir}_{AB}$ and $\hat{d}^{ind}_{AB}$ with weights given by $w_{dir} = \ldots$
Let $\hat{d}^{ind}_{AB}$ and $\hat{d}^{ind}_{AC}$ denote the generalized least-squares estimates of $d_{AB}$ and $d_{AC}$, respectively, based on these equations. The indirect-evidence estimate $\hat{d}^{ind}_{BC}$ is computed as $\hat{d}^{ind}_{BC} = \hat{d}^{ind}_{AC} - \hat{d}^{ind}_{AB}$. Since the direct- and indirect-evidence estimates $\hat{d}^{dir}_{BC}$ and $\hat{d}^{ind}_{BC}$ are uncorrelated, the generalized least-squares estimate of $d_{BC}$ in Model 1 is the weighted average of $\hat{d}^{dir}_{BC}$ and $\hat{d}^{ind}_{BC}$, using their inverse variances as weights. We have thus achieved the split of the evidence for $d_{BC}$ in Model 1. By Model 1, the direct- and indirect-evidence estimators are consistent, that is, $E(\hat{d}^{dir}_{BC}) = E(\hat{d}^{ind}_{BC})$.
Parameter estimates (standard errors) for the basic models for network meta-analysis using different methods with the Sclerotherapy NMA data.¹
Parameter estimates (standard errors) for ES models for C versus B contrast using different methods with the Sclerotherapy NMA data.¹
TABLE 1 −0.825 (0.935) −0.825 (0.936) −0.841 (1.005) −0.638 (0.616) −0.640 (0.613) −0.841 (1.005)
a Using the NLMIXED procedure of SAS. b Using the GLIMMIX procedure of SAS. c For definitions see Appendices A and B.
TABLE 3 Estimates (standard errors) of direct effects ($d^{dir}$), indirect effects ($d^{ind}$), and inconsistency ($w$) in comparison of variety 20549 with reference variety 9601 and F-test for inconsistency in variety trial data.¹² Residual variance estimated by REML and ML.
To illustrate, consider the comparison of variety 20549 with the reference variety 9601. Variety 20549 was tested only in years 2005 and 2006, whereas the reference variety was tested in all years. Analyses were done both per year using Model 6 and across years using Model 8. The results are shown in Table 3. The direct difference estimates $\hat{d}^{dir}$ (see Appendix B) agree exactly with those reported by Forkman¹³ in his Table 2.
Simulated bias and mean squared error (MSE, reported in brackets) of estimators for $\tau^2$ and $d_{BC}$ under consistent model using the Sclerotherapy NMA data¹ to parameterize the models (Models 1 and 3) for simulation, assuming $\tau^2$ = 0.3, 1.0 and 3.0.
TABLE 4
a Using the GLIMMIX procedure of SAS.
Simulated bias and mean squared error (MSE, reported in brackets) of estimators for $\tau^2$, $d^{dir}_{BC}$, $d^{ind}_{BC}$ and $w$ under the ES model using the Sclerotherapy NMA data¹ to parameterize the model for simulation, assuming $\tau^2$ = 0.3, 1.0 and 3.0.
TABLE 5
… smaller in an ABC-trial than in an AB-trial.
Again, under consistency $d^{dir}_{AB} = d^{ind}_{AB}$, whereas under inconsistency $d^{dir}_{AB} \neq d^{ind}_{AB}$. As in the contrast-based ES model (Section 2.2), we use the parameterization $d^{dir}_{AB} = d^{ind}_{AB} + w$, thus setting $d^{dir}_{AB} = \gamma_B - \gamma_A + w$ and $d^{ind}_{AB} = \gamma_B - \gamma_A$, where $\gamma_A$ and $\gamma_B$ are the effects of treatments A and B (see Section 2.3). Hence, the expectations for $M_{i1}$ and $M_{i2}$ are (ignoring intercept $\mu$ and trial effects $\beta_i$ for simplicity) $E(M_{i1}) = \gamma_B - \gamma_A + w$ and $E(M_{i2}) = \gamma_B + f_i \gamma_A$.