Methods to assess evidence consistency in dose‐response model based network meta‐analysis

Network meta-analysis (NMA) simultaneously estimates multiple relative treatment effects based on evidence that forms a network of treatment comparisons. Heterogeneity in treatment definitions, such as dose, can lead to a violation of the consistency assumption that underpins NMA. Model-based NMA (MBNMA) methods have been proposed that allow functional dose-response relationships to be estimated within an NMA, which avoids lumping different doses together and thereby reduces the likelihood of inconsistency. Dose-response MBNMA relies on appropriate specification of the dose-response relationship as well as consistency of relative effects. In this article we describe methods to check for inconsistency in dose-response MBNMA models. Global and local (node-splitting) tests for inconsistency are described that account for studies with ≥3 arms that are typical in dose-finding trials. We show that consistency needs to be assessed with respect to the choice of dose-response function. We illustrate the methods using a network comparing biologics for moderate-to-severe psoriasis. By comparing results from an Emax and an exponential dose-response function we show that failure to correctly characterise the dose-response can introduce apparent inconsistency. The number of comparisons for which node-splitting is possible is also shown to be dependent on the complexity of the selected dose-response function. We highlight that the nature of dose-finding studies, which typically compare multiple doses of the same agent, provides limited scope to assess inconsistency, but these study designs help guard against inconsistency in the first place. We demonstrate the importance of assessing consistency to obtain robust relative effects to inform drug development and policy decisions.


INTRODUCTION
Network meta-analysis (NMA) is a well-established methodology for the combination of results from randomized controlled trials (RCTs) where the evidence forms a connected network of treatment comparisons. 1,2 The popularity of NMA in healthcare has been driven by its ability to simultaneously compare the relative efficacy of multiple treatment options, which is essential to inform healthcare policy and for the development of clinical guidelines. NMA relies on the consistency assumption, which can be thought of as agreement between the direct and indirect evidence for a particular comparison between two treatments (defined as a contrast). 3 If there is an imbalance in treatment effect modifiers (patient characteristics or other study-level factors that change the relative treatment effects) between the included studies, then this can lead to heterogeneity if these differences are between studies informing the same contrast, and/or inconsistency if there are differences between studies informing different contrasts. 2 Differences in treatment characteristics are one factor that may induce inconsistency and/or heterogeneity. Key characteristics of pharmacological treatments that may vary between studies include drug formulation, route/frequency of administration, and administered dose. It is therefore important to account for these differences in NMA to help avoid inconsistency and heterogeneity. There are two main approaches to account for dose in NMA. The simplest is to define each agent and dose combination as a distinct treatment in the NMA. However, this may lead to sparse or disconnected evidence networks which yield imprecise estimates, or no estimates at all for some comparisons. 4 An alternative is to model the dose-response relationship with a parametric function.
A general model-based NMA (MBNMA) framework has been proposed to fit parametric dose-response functions to combine evidence on multiple agents at multiple doses whilst respecting the randomization in the RCTs. 4 Compared with a standard NMA where each dose and agent combination is considered a distinct treatment, dose-response MBNMA models bring the potential to connect disconnected networks and increase precision of estimates for treatments of interest. 5 Dose-response MBNMA is particularly useful in the context of drug-development to inform decisions as to which dose of an agent to take forwards for further development, where it is crucial to understand not just the dose-response for the compound being developed, but also that of the key competitor drug(s). Whilst accounting for dose helps to reduce inconsistency and heterogeneity that may arise from differences in treatment characteristics, they may not be completely eliminated, and methods are needed to check for it within the dose-response MBNMA framework. A thorough assessment of the consistency assumption is therefore important to increase confidence that the resulting treatment effect estimates provide a firm basis for decision-making, whether for reimbursement decisions, clinical guidelines, or for quantitative drug development.
Standard methods to check for inconsistency in NMA include a simple z-test to compare direct and indirect estimates from a single "loop" of evidence (the Bucher method 6 ), an extension to multiple independent loops of evidence (repeated Bucher method 7 ), fitting models that relax the consistency assumption globally (unrelated mean effects model 3,8 ) or locally for each comparison (node-splitting 9 ), and cross-validation. 10 Assessing consistency in dose-response MBNMA is complicated because consistency depends on the functional form assumed for the dose-response relationship, and so is inextricably linked to model fit of the dose-response relationship. Furthermore, dose-response MBNMA typically includes many RCTs with three or more arms, which are problematic for standard methods of assessing consistency in NMA.
The aim of this article is to present methods for assessment of evidence consistency in a dose-response MBNMA framework. We begin by introducing a network of evidence comparing different biologics at different doses for the treatment of psoriasis, as an example with which to motivate and illustrate the methods. 11 We then give some background on the dose-response MBNMA modeling framework before describing how methods to assess consistency in NMA can be extended to dose-response MBNMA. We apply the methods to the psoriasis dataset, and end with a discussion and recommendations for practice.

MOTIVATING EXAMPLE: BIOLOGICS FOR PSORIASIS
We motivate the methods with a network of trials comparing the efficacy of biologics for the treatment of moderate to severe psoriasis. 11 The original dataset, which is available within the MBNMAdose R package, 12 included nine different agents (biologics) given at various doses, and placebo. Two agents (Adalimumab and Guselkumab) were only investigated at a single dose. As inclusion of these agents would limit the complexity of different dose-response functions that could be fitted, these agents were excluded for the purposes of illustrating consistency checking on a multiparameter dose-response function. The final dataset therefore included seven agents and placebo, investigated in 22 RCTs (Figure S1 and Table S1). Only five of these did not include a placebo arm. Figure 1A,B show the network of evidence at the agent-level and treatment-level (agent and dose combination) respectively.

FIGURE 1 Network diagram showing the comparisons made by the RCTs included in the moderate-to-severe psoriasis dataset at (A) agent-level and (B) treatment-level (dose and agent combination). Where direct trial evidence exists between a pair of agents/treatments they are joined by a line. Each agent/treatment is represented by a node, coloured by agent with agent and dose indicated alongside each node. Width of lines is proportional to the number of studies that make a given agent/treatment comparison. Size of nodes is proportional to the number of participants randomised to a given agent/treatment. Interventions are coded according to the first two letters of the agent and dose (mg/week). So Br 70 is Brodalumab at 70 mg/week, and so forth. See Tables S1 and S2 for the full dataset and list of comparisons made in the studies.
If we were to ignore dose (agent-level network Figure 1A) then we would be likely to introduce heterogeneity and inconsistency, whereas if we were to consider each treatment (dose-agent combination) as having separate relative effects (agent-by-dose network Figure 1B), then the network becomes very sparse giving imprecise estimates, and only allows comparisons to be made between the specific dose-agent combinations that are connected in the network. Dose-response MBNMA achieves more precision by accounting for dose, however we still need to check for inconsistency.
The outcome of interest is the proportion of patients who achieved ≥75% improvement in the Psoriasis Area and Severity Index (PASI) at 12 weeks follow-up. The dose for each drug is given in total mg/week to account for different dosing regimens. Two studies reported the dose in mg/kg, so the total dose was calculated by multiplying this by the mean body weight of participants in each arm. 13,14 An additional arm in one of these studies that was not included in the original review was also added to our dataset. 14

Dose-response model-based network meta-analysis (MBNMA)
Mawdsley et al proposed the following framework for dose-response MBNMA. 4 Let t_{i,k} be the treatment given in arm k of study i. Each treatment t is defined according to its dose x_t and agent a_t, where the agent takes values {A, B, C, …}. Placebo is defined as any agent at zero dose, x_t = 0, and it is assumed this is equivalent across agents and denoted as a_t = P when x_t = 0. However, the inclusion of placebo in the network is not a requirement for the analysis, and these methods can be used in the absence of studies investigating placebo. For ease of notation we will denote x_{i,k} = x_{t_{i,k}} to indicate the dose of the treatment on arm k of study i. Similarly, we denote a_{i,k} = a_{t_{i,k}}. When x and a have two indices they refer to a specific arm and study, and when they have one index this refers to a specific treatment. Each study reports a summary measure for each arm, which, following Dias et al, is modeled using a generalized linear model, where the likelihood is a function of a parameter θ_{i,k} and the MBNMA model is specified on the scale of an appropriate "link" function g(θ_{i,k}). 2,15 For example, the summary measure may be the number of events r_{i,k} out of n_{i,k} randomized individuals, which has a Binomial likelihood with probability of response parameter θ_{i,k}, modeled on the logit scale, so that g(θ_{i,k}) = logit(θ_{i,k}). Other link functions can be used for other likelihoods/outcome measures. 2,15 The modeled outcome on arm 1 (the study reference treatment) of each study, μ_i, is treated as a nuisance parameter, with treatment effects δ_{i,k} estimated relative to this:

$$g(\theta_{i,k}) = \mu_i + \delta_{i,k}, \qquad \delta_{i,1} = 0$$

The general random effects model is:

$$\delta_{i,k} \sim N\left( d_{(x_{i,1},a_{i,1}),(x_{i,k},a_{i,k})},\ \sigma^2 \right) \tag{1}$$

where for any pair of treatments c and k, d_{(x_c,a_c),(x_k,a_k)} is the mean relative effect for treatment k (agent a_k at dose x_k) compared with treatment c (agent a_c at dose x_c), and σ² is the between-studies variance in relative effects, assumed to be constant across treatment comparisons.
Methods have been proposed to estimate separate between-studies variances for different treatment comparisons, but these require more studies than are typically available in an NMA. 16 The common effect model is obtained by setting σ = 0. Trials with more than two arms (multiarm trials) are very common in early phase trials that explore dose-response, but they add complications because they provide estimates of multiple relative effects (compared with arm 1) which will be correlated. Equation (1) needs adjusting as follows, to account for correlations in effects for a trial with K arms:

$$\begin{pmatrix}\delta_{i,2}\\ \vdots \\ \delta_{i,K}\end{pmatrix} \sim N\left( \begin{pmatrix} d_{(x_{i,1},a_{i,1}),(x_{i,2},a_{i,2})} \\ \vdots \\ d_{(x_{i,1},a_{i,1}),(x_{i,K},a_{i,K})}\end{pmatrix},\ \begin{pmatrix} \sigma^2 & \cdots & \sigma^2/2 \\ \vdots & \ddots & \vdots \\ \sigma^2/2 & \cdots & \sigma^2 \end{pmatrix}\right) \tag{2}$$

We define a dose-response curve E_0 + f(x, β_a) for each agent a, governed by agent-specific parameters β_a, where E_0 corresponds to a placebo effect (Figure 2). Examples of dose-response relationships, E_0 + f(x, β_a), include linear, exponential, or Emax models. The choice of an appropriate functional form of dose-response relationship will typically be guided by visual inspection of the data, together with model fit and pharmacological considerations (see model critique and selection). The Emax model is perhaps the most common model used to describe dose-response relationships, and has been shown to be applicable to the majority of compounds and outcome measures. 17,18 The hyperbolic form is given by:

$$f(x, \beta_a) = \frac{E_{\max,a}\, x}{ED_{50,a} + x} \tag{3}$$

where E_{max,a} can be interpreted as the maximum efficacy on the corresponding link scale of agent a relative to placebo, and ED_{50,a} the dose at which half of the E_{max,a} response has been accrued for agent a. On the logit scale, E_{max,a} would be the log-odds ratio for agent a relative to placebo. A simpler model with only a single dose-response parameter that is commonly used in pharmacometrics is the exponential model, f(x, β_a) = λ_a(1 − e^{−x}), where λ_a is the rate on the corresponding link scale of agent a relative to placebo. 17

In a Bayesian analysis, priors are given to the parameters of the dose-response function (eg, for the Emax model these are E_{max,a} and ED_{50,a}, noting that ED_{50,a} must be positive), the study-specific baseline parameters μ_i, and the between-study SD σ.

FIGURE 2 Schematic illustration of dose-response functions for agents A and B, illustrating that the comparison of agent A dose x_2 with agent B dose x_3 adheres to the consistency model (Equation (5)) and thus is informed both by direct and indirect evidence. Using these dose-response functions a comparison between any doses of these agents is possible by extrapolation/interpolation. θ represents the response.
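As a concrete illustration of the two dose-response functions above, a minimal Python sketch might look as follows (the parameter values are hypothetical; the analyses in this article are implemented in R with the MBNMAdose package):

```python
import math

def emax(x, emax_a, ed50_a):
    """Hyperbolic Emax dose-response (Equation (3)): effect of dose x relative to placebo."""
    return emax_a * x / (ed50_a + x)

def exponential(x, lambda_a):
    """Single-parameter exponential dose-response: lambda_a * (1 - exp(-x))."""
    return lambda_a * (1.0 - math.exp(-x))

# Illustrative (hypothetical) parameters on the log-odds scale for one agent:
# maximum log-odds ratio 2.5 vs placebo, half of that effect reached at 30 mg/week.
e_max, ed50 = 2.5, 30.0
print(emax(0, e_max, ed50))    # 0.0 at zero dose (placebo)
print(emax(30, e_max, ed50))   # half of Emax: 1.25
print(emax(1e6, e_max, ed50))  # approaches Emax at very large doses
```

Note that both functions pass through zero at x = 0, which is why the E_0 placebo terms cancel in the consistency equations below.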

Consistency equations
At the treatment level we assume consistency of relative effects so that all comparisons can be written in terms of comparisons with the reference treatment 1, (x_1, a_1):

$$d_{(x_c,a_c),(x_k,a_k)} = d_{(x_1,a_1),(x_k,a_k)} - d_{(x_1,a_1),(x_c,a_c)} \tag{4}$$

for all treatments (x_c, a_c), (x_k, a_k).
Equations (1) and (4) define the standard NMA model where each agent and dose combination are treated as a distinct treatment (eg, Figure 1B).
Dose-response MBNMA constrains this model by fitting a functional relationship for the dose-response. For a given dose-response function, f(x, β_a), the consistency Equations (4) become (noting that the E_0 terms cancel):

$$d_{(x_c,a_c),(x_k,a_k)} = f(x_k, \beta_{a_k}) - f(x_c, \beta_{a_c}) \tag{5}$$

for treatments (x_c, a_c) and (x_k, a_k). Note that consistency will therefore depend on the appropriateness of the dose-response function and may hold for one dose-response function but not another. The MBNMA model (1) under the treatment-level consistency assumptions for a given dose-response function is therefore:

$$\delta_{i,k} \sim N\left( f(x_{i,k}, \beta_{a_{i,k}}) - f(x_{i,1}, \beta_{a_{i,1}}),\ \sigma^2 \right) \tag{6}$$

where x_{i,k} is the dose on arm k of study i. Equation (6) needs adjusting for multiarm trials, to account for correlations in effects for a trial with K arms, as follows:

$$\begin{pmatrix}\delta_{i,2}\\ \vdots \\ \delta_{i,K}\end{pmatrix} \sim N\left( \begin{pmatrix} f(x_{i,2}, \beta_{a_{i,2}}) - f(x_{i,1}, \beta_{a_{i,1}}) \\ \vdots \\ f(x_{i,K}, \beta_{a_{i,K}}) - f(x_{i,1}, \beta_{a_{i,1}}) \end{pmatrix},\ \begin{pmatrix} \sigma^2 & \cdots & \sigma^2/2 \\ \vdots & \ddots & \vdots \\ \sigma^2/2 & \cdots & \sigma^2 \end{pmatrix}\right) \tag{7}$$

where the correlation between within-study relative effects is 0.5 under the homogeneous variance model. 1 Setting σ = 0 in Equations (6) and (7) gives the common effect model. For example, if study 1 is a 3-arm study comparing placebo and treatments 1 and 2, which are different doses of agent A, (0, P) vs (x_1, A) vs (x_2, A), then study 1 estimates f(x_1, β_A) and f(x_2, β_A), whereas if study 2 is a 3-arm study comparing treatment 1, a single dose of agent A, with treatments 2 and 3, which are two doses of agent B, (x_1, A) vs (x_2, B) vs (x_3, B), then study 2 estimates f(x_2, β_B) − f(x_1, β_A) and f(x_3, β_B) − f(x_1, β_A).
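A small sketch, under assumed Emax parameters for two hypothetical agents, showing that relative effects generated by the dose-response consistency equations (Equation (5)) automatically satisfy the treatment-level consistency relations, so that any loop of comparisons closes exactly:

```python
def emax(x, pars):
    """Hyperbolic Emax dose-response for one agent."""
    e, ed50 = pars
    return e * x / (ed50 + x)

# Assumed (Emax, ED50) parameters per agent; doses in mg/week (illustrative only).
beta = {"A": (2.0, 25.0), "B": (3.0, 80.0)}

def d(tc, tk):
    """Mean relative effect of treatment tk = (dose, agent) vs tc = (dose, agent),
    as implied by Equation (5): f(x_k, beta_ak) - f(x_c, beta_ac)."""
    (xc, ac), (xk, ak) = tc, tk
    fc = 0.0 if xc == 0 else emax(xc, beta[ac])  # placebo (zero dose) has f = 0
    fk = 0.0 if xk == 0 else emax(xk, beta[ak])
    return fk - fc

P, A50, B160 = (0, "P"), (50, "A"), (160, "B")
# Consistency: the loop P -> A50 -> B160 -> P sums to zero.
loop = d(P, A50) + d(A50, B160) + d(B160, P)
print(abs(loop) < 1e-12)  # True
```

Inconsistency in dose-response MBNMA is therefore a question of whether the *data* conform to these functional constraints, not a property of the model-implied effects themselves.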

Model critique and selection
The appropriateness of different dose-response functions and the choice of common or random effect models can be assessed statistically by comparing model fit statistics across competing models. In a Bayesian analysis one would compare the posterior mean residual deviance, Dbar (a measure of fit), the effective number of parameters, pD (a measure of complexity), and the deviance information criterion, DIC (the sum of Dbar and pD), preferring smaller values, where differences of at least 3 (or even 5) in Dbar and DIC are considered meaningful. 19 In addition to this, the contribution of each data-point to the posterior mean residual deviance can be plotted against dose as a form of residual analysis to identify any missing structure in the dose-response relationship. 4
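The selection rule above can be sketched as follows, with purely illustrative Dbar and pD values (not the psoriasis results):

```python
# Compare competing models by DIC = Dbar + pD, preferring smaller values,
# and treat a difference of >= 3 points as meaningful. Numbers are hypothetical.
models = {
    "Emax MBNMA":        {"Dbar": 75.6, "pD": 31.0},
    "Exponential MBNMA": {"Dbar": 90.2, "pD": 38.5},
}
for name, m in models.items():
    m["DIC"] = m["Dbar"] + m["pD"]
    print(name, "DIC =", m["DIC"])

best = min(models, key=lambda n: models[n]["DIC"])
worst = max(models, key=lambda n: models[n]["DIC"])
meaningful = models[worst]["DIC"] - models[best]["DIC"] >= 3
print("preferred:", best, "| difference meaningful:", meaningful)
```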

Checking the consistency assumption in dose-response MBNMA
In this section we describe how the methods for assessing evidence consistency in NMA can be extended to dose-response MBNMA. Note that the consistency equations under a dose-response MBNMA model are given by Equation (5), which depends on the dose-response function f (x, a ). If each agent and dose combination is treated as a distinct treatment in a standard NMA, then the consistency equations are given by Equation (4) which does not depend on the dose-response relationship. To fully explore the consistency assumption in dose-response MBNMA we need to consider the fit of the dose-response model alongside assessment of consistency.

Bucher method to assess consistency in dose-response MBNMA
The simplest approach to assess inconsistency is the Bucher method, 6 which can be applied to a triangle of evidence. Suppose we have a simple triangle of evidence comparing placebo, (0, P), agent A at dose x_2, (x_2, A), and agent B at dose x_3, (x_3, B). An indirect estimate of the (x_2, A) vs (x_3, B) contrast can be obtained from the direct (x_2, A) vs (0, P) and (x_3, B) vs (0, P) estimates via the consistency relation for standard NMA (Equation (4)). A test for inconsistency is constructed by forming a z-score for the difference between the direct and indirect estimates. The method requires that the studies that form the direct comparison are different to those that form the indirect comparison. 2,3 The Bucher method can also be applied to larger loops of evidence (eg, a quadrilateral with four treatments), and can also be extended to the case where there are multiple independent loops of evidence (where each loop is based on different sets of studies with the only exception being a single common edge). 2,3 This allows the construction of a global test for inconsistency across all the independent loops. 7 In the context of dose-response MBNMA, the Bucher method can also be applied in the situation where we have the same loop of evidence as in Figure 3, but where the (x_2, A) vs (0, P) estimate is based on a dose-response MBNMA for a network of at least one study comparing various doses of agent A and placebo (Study A), and similarly the (x_3, B) vs (0, P) estimate is derived from a dose-response MBNMA for a network of at least one study comparing various doses of agent B and placebo (Study B), and there is a separate study that directly compares (x_2, A) vs (x_3, B) (Study C), as illustrated in Figure 3. Separate dose-response models can be fitted for agents A and B to obtain estimates of d_{(0,P),(x_2,A)} and d_{(0,P),(x_3,B)}, from which an indirect estimate of the (x_2, A) vs (x_3, B) contrast can be formed and compared with the direct estimate using the Bucher method.

FIGURE 3 An illustrative network and dose-response relationships of agents A and B at various doses, and placebo (P), on which the Bucher test for inconsistency could be used in dose-response MBNMA. Straight black lines connecting points represent comparisons within each study for which direct evidence is available. The network contains at least one study comparing different doses of A and placebo (Study 1), one study comparing different doses of B and placebo (Study 2), and one study comparing (x_2, A) vs (x_3, B) (Study 3). θ represents the response.
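A minimal sketch of the Bucher z-test, assuming hypothetical direct and indirect estimates (eg, log-odds ratios) with their standard errors:

```python
import math

def bucher_test(d_dir, se_dir, d_ind, se_ind):
    """Bucher z-test comparing direct and indirect estimates of the same contrast.
    Assumes the two estimates come from non-overlapping sets of studies and are
    therefore independent."""
    diff = d_dir - d_ind
    se_diff = math.sqrt(se_dir**2 + se_ind**2)
    z = diff / se_diff
    # Two-sided p-value from the standard normal distribution via erf.
    p = 2.0 * (1.0 - 0.5 * (1.0 + math.erf(abs(z) / math.sqrt(2.0))))
    return z, p

# Hypothetical estimates for the (x2, A) vs (x3, B) contrast.
z, p = bucher_test(d_dir=0.90, se_dir=0.25, d_ind=0.40, se_ind=0.30)
print(round(z, 2), round(p, 3))  # here p > 0.05: no evidence of inconsistency
```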
As with NMA, this method can also be used for multiple independent loops of evidence in this format (where each loop is based on different sets of studies with the only exception being a single common edge). For applications of the Bucher method with random effects models the between studies variance parameter should be shared across comparisons. 3,8 This can be achieved by fitting a MBNMA model to the indirect evidence simultaneously with a pairwise meta-analysis for the direct evidence, with a shared parameter for the between studies SD. In practice this requires fitting the node-splitting models described below.
The application of the Bucher method is likely to be of limited use in assessing consistency in dose-response MBNMA because networks of evidence are typically complex with many RCTs with three or more arms, such that loops of evidence are not independent.

Global assessment of inconsistency at treatment-level

In NMA the consistency assumption can be relaxed by fitting an inconsistency model where there is a separate parameter for each comparison based only on the direct evidence, whilst sharing the between-studies SD parameter across comparisons. This is known as the unrelated mean effects (UME) model. 2,3 A global test for inconsistency can be obtained by comparing model fit statistics for the inconsistency and consistency models, and inspecting the estimates of the between-studies SD σ. If σ is much smaller for the inconsistency model, then this indicates evidence of inconsistency (even in the absence of differences between model fit statistics). This UME model can also be fitted in dose-response MBNMA by estimating separate relative effects d_{(x_c,a_c),(x_k,a_k)} for each treatment contrast with a shared between-studies SD (Equations (1) and (2)), but without making the additional consistency assumption (Equation (5)). 4 By comparing Equations (2) and (7) it can be seen that the UME model relaxes both the dose-response relationship and the consistency assumption at the same time. This means that differences in model fit and heterogeneity estimates could be explained by lack of fit of the functional dose-response function f(x, β_a), inconsistency of the evidence sources, or both. Furthermore, inconsistency could mask lack-of-fit of the dose-response relationship to the extent that we detect neither lack-of-fit nor inconsistency (this phenomenon has been described in the network meta-regression literature 20 ). We therefore recommend that three different models are compared:

1. Model 1: The dose-response MBNMA consistency model (Equations (5) and (7)), which assumes a dose-response relationship and assumes consistency.
2. Model 2: The treatment-level NMA consistency model (Equations (1), (2), and (4)), where each treatment (agent and dose combination) is considered a separate treatment. This model does not assume any dose-response relationship, but does assume consistency.
3. Model 3: The UME model (Equation (2)), which relaxes both the dose-response relationship and the consistency assumption.
Comparing model fit and estimates across these three models allows both the dose-response and consistency assumptions to be checked ( Figure 4). Note that the network may be disconnected at the treatment-level, meaning that estimates for disconnected treatments may not be valid. When this occurs, model fit statistics for a treatment-level NMA (Model 2) can still be compared to the dose-response MBNMA consistency model (Model 1), but only relative effects between connected treatments are valid and can be interpreted. There may also be convergence issues in disconnected networks, in which case treatment-level NMAs may have to be fitted separately to different subnetworks.
To further explore potential causes of lack of fit and inconsistency, deviance-deviance plots can be useful. 21 The contribution of each data point to the residual deviance can be plotted for (i) the dose-response MBNMA consistency model against the UME model (Model 1 vs Model 3); (ii) the dose-response MBNMA consistency model against the treatment-level NMA consistency model (Model 1 vs Model 2); and (iii) the treatment-level NMA consistency model against the UME model (Model 2 vs Model 3). These plots can identify which data-points are leading to lack of fit, inconsistency, or both. Note, however, that this does not necessarily mean that those data points are "wrong". Inconsistency is a property of loops of evidence, and identifying inconsistency in a loop does not tell us which piece(s) of evidence are causing the problem, merely that the evidence within that loop does not agree.
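The comparison behind a deviance-deviance plot can be sketched numerically: for each data point, compare its posterior mean residual deviance contribution under two models and flag points that fit markedly worse under one of them (all values and the flagging threshold below are hypothetical):

```python
# Hypothetical per-data-point posterior mean residual deviance contributions
# under two competing models (keys are arbitrary study/arm labels).
dev_model1 = {"study1.arm2": 0.8, "study15.arm2": 3.1, "study16.arm3": 2.7}
dev_model3 = {"study1.arm2": 0.9, "study15.arm2": 1.0, "study16.arm3": 0.9}

# Points well above the line of equality (fit worse under Model 1):
flagged = [pt for pt in dev_model1
           if dev_model1[pt] - dev_model3[pt] > 1.0]  # threshold is illustrative
print(flagged)
```

In practice one inspects the plot rather than applying a hard threshold, but the flagged points are those that would sit visibly off the line of equality.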

Node-splitting methods for MBNMA

We may wish to explore whether there is inconsistency for specific pairwise contrasts. This could be because the global test identified inconsistency and we wish to know which evidence is contributing to this. Alternatively, attention may be focussed on particular contrasts where there is a priori doubt about the consistency assumption, for example due to older studies or differences in study design/populations. In NMA, node-splitting is a method that can be applied to a specific contrast, where the relative effect parameter for that contrast is "split" into two separate parameters estimated by two different sets of evidence. 9 The first is estimated by the direct head-to-head evidence, and the second is estimated indirectly from the NMA model applied to the remaining evidence (ie, excluding the head-to-head evidence for that contrast). The direct and indirect estimates for the contrast can then be compared by computing the difference between them, Diff = d^Dir_{(x_c,a_c),(x_k,a_k)} − d^Ind_{(x_c,a_c),(x_k,a_k)}, and model fit can be compared for the "split" model with that from the consistency model. d^Dir_{(x_c,a_c),(x_k,a_k)} is estimated by including it directly as a parameter in Equation (1) and giving this parameter a prior distribution, rather than applying the consistency relations (Equation (5)). d^Ind_{(x_c,a_c),(x_k,a_k)} is estimated by applying the consistency relations (Equation (5)) to the remaining evidence. Model fit measures and heterogeneity can be compared for the "split" model with those from the consistency model. In addition, a Bayesian p-value can be obtained as 2 × min{Pr(Diff > 0), 1 − Pr(Diff > 0)}. 22 To illustrate, suppose we were interested in node-splitting for (x_1, A) vs (x_2, B). A 2-arm study, study 1, which compares (x_1, A) vs (x_2, B) estimates d^Dir_{(x_1,A),(x_2,B)}, whereas a 3-arm study 2 comparing (0, P) vs (x_3, A) vs (x_4, A) contributes to the estimation of the dose-response curve for agent A, estimating f(x_3, β_A) and f(x_4, β_A), and a 3-arm study 3 comparing (0, P) vs (x_5, B) vs (x_6, B) contributes to the estimation of the dose-response curve for agent B, estimating f(x_5, β_B) and f(x_6, β_B).
These can be used together to obtain an indirect estimate:

$$d^{Ind}_{(x_1,A),(x_2,B)} = f(x_2, \beta_B) - f(x_1, \beta_A)$$

Node-splitting requires both direct and indirect evidence to exist for a particular treatment-level contrast. The R packages gemtc 23 and MBNMAdose 12 include functions for finding comparisons in the "treatment-level" network (ie, Figure 1B for the psoriasis example) where both direct and indirect evidence exist. However, in dose-response MBNMA there may be additional treatment-level contrasts where indirect estimates can be obtained via the dose-response relationship. This is because the dose-response model can allow indirect comparisons to be made via interpolation/extrapolation (Figure 2). Due to the nature of phase-II studies exploring dose-response, we expect to be able to obtain indirect estimates for many of the treatment-level contrasts where we have direct evidence. However, the extent to which indirect estimates will be available via the dose-response relationship will depend on the availability of remaining dose-response information once direct evidence has been removed, and on the complexity of the dose-response relationship fitted. Where indirect estimates are not available, the model will simply predict a very uncertain estimate or may fail to converge. The inconsistency.loops() function in MBNMAdose can be used to identify contrasts in which both direct and indirect estimates are available, and to indicate whether the indirect estimate arises from pathways of head-to-head evidence (as for node-splitting in a treatment-level network) or only from the dose-response relationship. It also provides the number of doses of each agent in a comparison on which the indirect dose-response relationship must be estimated, which can help identify the complexity of function that could be used to estimate it.
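The Bayesian p-value above can be computed directly from the MCMC samples of Diff; a sketch using simulated (fake) posterior samples:

```python
import random

def nodesplit_pvalue(diff_samples):
    """Two-sided Bayesian p-value for node-splitting:
    2 * min(Pr(Diff > 0), 1 - Pr(Diff > 0)),
    where diff_samples are MCMC draws of Diff = d_dir - d_ind."""
    pr_gt0 = sum(1 for d in diff_samples if d > 0) / len(diff_samples)
    return 2.0 * min(pr_gt0, 1.0 - pr_gt0)

# Fake posterior samples for illustration: direct and indirect estimates
# broadly agree (posterior of Diff centred near zero).
random.seed(1)
samples = [random.gauss(0.05, 0.3) for _ in range(20000)]
p = nodesplit_pvalue(samples)
print(0.0 <= p <= 1.0)  # True; a large p-value gives no evidence of inconsistency
```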
Node-splitting is complicated by the existence of multiarm trials (trials with three or more arms). A K-arm trial provides (K-1) relative effects, but the choice of the "reference" arm 1 determines whether the trial contributes to the direct estimate or not, and this choice is somewhat arbitrary. For example, suppose that we have a 4-arm study i comparing Placebo with agent A at doses x_1 and x_2 and agent B at dose x_3, and we were interested in node-splitting for (x_2, A) vs (x_3, B). If we take Placebo as the reference arm and define three relative effects for each active arm compared to Placebo, then the study does not contribute to the direct estimate for d^Dir_{(x_2,A),(x_3,B)}. Rules have been proposed for the selection of reference arms for multiarm trials when node-splitting in NMA. 24 Following these rules and using Placebo as the reference arm, the relative effects for (x_1, A) and (x_3, B) relative to (x_2, A) in the example above would be discarded, since a multiarm trial cannot be inconsistent with itself. However, in dose-response MBNMA there may be inconsistencies within a multiarm trial due to departures from the assumed dose-response relationship. We therefore propose that the following approach be taken for inclusion of multiarm trials in node-splitting for dose-response MBNMA.
1. If the multiarm trial contains the contrast (x_c, a_c) vs (x_k, a_k) that is being split, then reorder the arms with (x_c, a_c) as the reference arm 1 and (x_k, a_k) as arm 2, so that the multiarm trial always contributes to d^Dir_{(x_c,a_c),(x_k,a_k)} if possible.
2. Model all the relative effects from the multiarm trial as usual, but replace the relative effect for δ_{i,2} with d^Dir_{(x_c,a_c),(x_k,a_k)}.

The covariance structure reflects the fact that the multiarm trial cannot be inconsistent with itself: the trial contributes to the direct estimate for the contrast that is split, and the remainder of the trial arms contribute to the estimation of the dose-response MBNMA model which is used to form the indirect estimate.
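The covariance structure referred to above (Equation (7): variance σ² on the diagonal with correlation 0.5 between within-study relative effects under the homogeneous variance model) can be constructed as follows, for illustration:

```python
def multiarm_cov(K, sigma):
    """Covariance matrix for the (K-1) within-study random effects of a K-arm
    trial under the homogeneous variance model: sigma^2 on the diagonal and
    sigma^2 / 2 off-diagonal (ie, correlation 0.5)."""
    n = K - 1  # a K-arm trial provides K-1 relative effects vs the reference arm
    return [[sigma**2 if i == j else sigma**2 / 2.0 for j in range(n)]
            for i in range(n)]

cov = multiarm_cov(K=4, sigma=0.2)
for row in cov:
    print(row)
print(cov[0][1] / cov[0][0])  # off-diagonal correlation: 0.5
```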

Implementation
The models are implemented using the MBNMAdose package version 0.3.1 12 in R version 4.0.2, which uses Bayesian Markov chain Monte Carlo (MCMC) simulation in JAGS. 25 R code for all analyses can be found on GitHub (https://github.com/hugaped/Dose-response-MBNMA-consistency). We illustrate a global assessment of consistency and methods for node-splitting at the treatment-level in the psoriasis dataset. We focus our analysis on two different dose-response functions, to demonstrate the impacts of a good fitting (Emax) and a badly fitting (exponential) dose-response model. We also include results for a simple linear dose-response model (very badly fitting), as readers will be most familiar with this. For global testing of inconsistency using UME, we also fit an agent-level NMA that lumps different doses of agents together assuming a common effect. Common and random treatment effect models were fitted and compared. Vague normal prior distributions (N(0, 1000)) were given to direct treatment parameters d^Dir_{(x_c,a_c),(x_k,a_k)}, nuisance parameters μ_i, and the dose-response parameters λ_a for exponential MBNMA models. For Emax MBNMA models, a correlation was modeled between dose-response parameters by assigning (E_{max,a}, log(ED_{50,a})) a multivariate normal prior with mean 0 and covariance matrix Σ. For the ED_{50,a} parameters it was necessary to ensure that they only took positive values, so prior distributions were specified on the log scale. The inverse of the covariance matrix Σ (the precision matrix) was assigned a minimally informative Wishart prior with 2 degrees of freedom and scale matrix Ω (Σ^{−1} ∼ Wishart(Ω, 2)). The between-study SD, σ, was given a uniform prior distribution (U(0, 5)). Sensitivity of the results to the choice of prior distributions is reported in the Supplementary Materials. Models were run for 50 000 burn-in iterations and monitored for 50 000 iterations on 4 chains. Trace plots, autocorrelation and Brooks-Gelman-Rubin plots were used to assess convergence. 26 Unless otherwise stated, results are presented as posterior medians and 95% credible intervals (95% CrI).
All models were compared using DIC, with a difference of three points or greater being considered as meaningful. DIC was calculated using the Kullback-Leibler divergence for estimation of pD. 27

RESULTS
Model fit statistics for all models are reported in Table 1. Heterogeneity was found to be very low when fitting a treatment-level NMA, leading to the selection of a common effect model based on DIC. MCMC chains struggled to converge for the between-study SD in the random effects Emax MBNMA; however, model fit statistics did converge and indicated that the common effect model gave a lower DIC and that model fit was adequate (posterior mean residual deviance 75.6 compared with 70 data points). Lumping doses at the agent-level or using an exponential or linear dose-response resulted in high heterogeneity and the selection of random treatment effect models in all three instances. Among selected models, DIC was higher in those with higher heterogeneity due to the additional number of effective parameters. Both the Emax MBNMA and treatment-level NMA models had similar DICs, which were lower than for the other models fitted. The slightly lower residual deviance in the treatment-level NMA model was offset by the fewer parameters required in the Emax MBNMA model. This suggested that the Emax dose-response relationship gave an adequate fit to the data. Predicted probabilities of response for the common effect treatment-level NMA, random effects agent-level NMA, random effects exponential MBNMA, random effects linear MBNMA, and common effect Emax MBNMA are shown at different doses for each agent in Figure 5.
Points below the line of equality in the deviance-deviance plots for the common effect treatment-level NMA model against each of the other fitted common effect models indicate lack of fit for the assumed dose-response model. For the Emax MBNMA the deviance-deviance plot showed generally good fit, though there was some lack of fit for studies 15 and 16, which compared different doses of Ustekinumab (Figure 6A). In contrast, the plots showed poor fit for the exponential MBNMA and the agent-level NMA models, suggesting that these do not explain the dose-response relationship. However, the random effects versions of these models fitted well because the lack of fit is captured as heterogeneity in these models (Figure S2). Deviance-deviance plots are shown separately for common and random effects linear MBNMA models (Figure S3). These showed a very similar trend to the exponential MBNMA models.

Global assessment of inconsistency
For global assessment of inconsistency, the posterior mean residual deviances were similar for all selected models and the treatment-level UME model that relaxed the consistency assumption (Table 1). However, whilst there was no evidence of inconsistency for the treatment-level NMA and the Emax MBNMA models based on DIC, for the exponential MBNMA and agent-level NMA the DICs were considerably higher than for the UME model, suggestive of inconsistency in these models driven by the high levels of heterogeneity. Deviance-deviance plots are shown for each model compared to the treatment-level UME (Figure 6B). Whilst the treatment-level NMA model showed no evidence of substantially poorer fit for any data points, most data points in all other models showed considerably higher contributions to the posterior mean deviance compared to the treatment-level UME than when compared to the treatment-level NMA (Figure 6A), which may be suggestive of inconsistency but may also be due to incorrect specification of the dose-response relationship.

FIGURE 5 Predicted responses for different agents in the network based on results from a common effects treatment-level NMA (thick vertical lines and points), a random effects agent-level NMA (solid horizontal line with red 95% CrI), a common effects Emax MBNMA (dotted curve with blue 95% CrI), a random effects exponential MBNMA (short dashed curve with green 95% CrI) and a random effects linear MBNMA (long dashed curve with purple 95% CrI). Predictions were calculated from relative effects applied to a probability of placebo response of 0.05. Predicted responses are shown as posterior medians and 95% CrIs.
There were two data points in particular in the Emax MBNMA which appeared to have a substantially higher contribution to the posterior mean deviance than in the treatment-level UME. These points correspond to studies comparing placebo with different doses of Ustekinumab (study numbers 15 and 16), which were also identified in Figure 6A as having a poorer fit for the dose-response relationship than other data points.
For the exponential MBNMA and agent-level NMA there were a large number of data points for which deviance contributions were higher when common treatment effects models were compared to a common effects UME model (Figure 6B). However, these differences were no longer discernible when random treatment effects models were compared (Figure S1), since they were masked by high heterogeneity, which resulted in lower posterior mean deviances. This could be identified by examining the reduction in between-study SD and DIC for the random effects UME model compared to the random effects exponential MBNMA and treatment-level NMA models (Table 1). For common and random effects linear MBNMA models, the trend was similar to that observed for common and random effects exponential MBNMA models, but with more extreme differences in deviance contributions from UME models arising from poorer fit of the dose-response function (Figure S3).

FIGURE 6 Deviance-deviance plots showing the contribution to the posterior mean deviance for each data point from different common treatment effect models compared to treatment-level NMA (A) and UME (B) models. Comparisons in (B) are to UME models. For better visibility, only points that contribute up to 15 to the posterior mean deviance are plotted, meaning that 10 data points contributing >15 to the posterior mean deviance have been excluded from the MBNMA exponential and NMA agent-level plots. Red stars represent data points described in the text, corresponding to arm 1 in studies 15 and 16, which compare different doses of Ustekinumab. The red diagonal line represents the line of equality, at which deviance is equal in both consistency and UME models.
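The per-point comparison that the deviance-deviance plots visualise can be sketched as follows, using hypothetical deviance contributions. A point that fits much worse under the consistency model than under the UME model is a candidate for inconsistency, or for lack of fit of the dose-response function.

```python
def flag_suspect_points(dev_consistency, dev_ume, threshold=1.0):
    """Return indices of data points whose posterior mean deviance
    contribution is higher under the consistency model than under the
    UME (inconsistency) model by more than `threshold`."""
    return [i for i, (dc, du) in enumerate(zip(dev_consistency, dev_ume))
            if dc - du > threshold]

# Hypothetical per-point deviance contributions from the two models:
suspects = flag_suspect_points([1.1, 4.2, 0.9], [1.0, 1.3, 1.1])  # [1]
```

As the text notes, such flagged points cannot by themselves distinguish inconsistency from dose-response lack of fit; node-splitting provides the local follow-up test.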
Given that there is poor dose-response fit for these models, it is difficult to further confirm if there is also inconsistency in these models. Node-splitting can be used to provide a local test to further explore the potential inconsistency identified.

Node-splitting
In the psoriasis dataset there was direct evidence for 34 different comparisons. However, by parameterising comparisons in each study as relative effects compared to the study reference arm (such that a 3-arm study is parameterised by two relative effects and a study reference treatment effect) this resulted in 20 different comparisons for which direct evidence was available. Both direct and indirect evidence were available for 4 of these 20 comparisons, meaning that node-splitting was possible for these in both MBNMA and NMA (Table S3 and Figure S4). For these comparisons, indirect estimates made use of evidence from both the consistency relationship via a pathway of direct comparisons, and from the dose-response relationship. For all four comparisons both Emax MBNMA and treatment-level NMA models showed no evidence of inconsistency between direct and indirect evidence (Figure 7). For three of the comparisons (comparisons 1, 2, and 4) direct and indirect estimates from the Emax MBNMA were more similar than in the treatment-level NMA. However, inconsistency was identified in the exponential and linear MBNMAs for the comparison of Ixekizumab 40 mg/wk vs Ixekizumab 20 mg/wk (comparison 1) (Figures 7 and S5). For the other three comparisons (comparisons 2-4) there was no evidence of inconsistency in the exponential or linear MBNMAs.
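The Bayesian p-values reported for each node-split can be computed directly from the MCMC draws of the direct and indirect estimates. A minimal sketch, with simulated Gaussian draws standing in for real posterior samples:

```python
import random

def nodesplit_pvalue(direct_draws, indirect_draws):
    """Two-sided Bayesian p-value for a node-split: the posterior
    probability that the direct effect exceeds the indirect effect,
    folded about 0.5. Small values suggest inconsistency."""
    p_gt = sum(d > i for d, i in zip(direct_draws, indirect_draws)) / len(direct_draws)
    return 2 * min(p_gt, 1 - p_gt)

# Simulated draws standing in for real MCMC output:
rng = random.Random(1)
direct = [rng.gauss(0.50, 0.2) for _ in range(10_000)]
indirect = [rng.gauss(0.45, 0.2) for _ in range(10_000)]
p = nodesplit_pvalue(direct, indirect)  # heavily overlapping posteriors: large p
```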
A further nine comparisons (comparisons 5-13) made use of indirect evidence that was only available via the dose-response relationship, providing additional potential for node-splitting in MBNMA that was not possible in the treatment-level NMA (Table S3 and Figure S4). For estimating indirect evidence via the dose-response relationship, the number of available doses is critical, as this will limit the complexity of the dose-response function that can be fitted. The number of doses for each agent in the comparison are shown in Table S3. To fit an Emax dose-response function for all agents, we would need at least three doses of each agent available in the indirect evidence. Given that evidence for the direct effect is separated from evidence used to estimate the indirect effect, this reduces the information available to estimate each of the effects. This means that fitting an Emax MBNMA to the indirect evidence is only possible for comparisons 1 to 4 (for which indirect evidence is also available via a pathway of direct comparisons, as discussed above) and additionally for Ustekinumab 5.62 mg/wk vs Ixekizumab 30 mg/wk (comparison 5). For this comparison, the Emax MBNMA did not show any evidence of inconsistency (Figure 7). However, if we fit an exponential or linear dose-response function, only two doses for each agent in the indirect evidence are needed to estimate dose-response functions for all agents, which means that we can estimate indirect evidence for all nine indirect dose-response comparisons. Out of the nine comparisons that could be analyzed using a single-parameter dose-response relationship, inconsistency was identified in two comparisons (comparisons 6-7) that could be analyzed using an exponential MBNMA and one comparison (comparison 10) that could be analyzed using a linear MBNMA (Figures 7 and S5). Notably, the comparisons in which inconsistency was identified for the different MBNMAs were not the same. For the other comparisons there was no evidence of inconsistency.
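The feasibility rule described above (at least three doses per agent in the indirect evidence for the two-parameter Emax function; at least two for the single-parameter exponential or linear functions) can be expressed as a simple check. The agent names and dose counts below are hypothetical.

```python
# Minimum number of doses per agent needed in the *indirect* evidence
# to estimate each dose-response function, as described in the text:
MIN_DOSES = {"emax": 3, "exponential": 2, "linear": 2}

def can_split(doses_per_agent, fun):
    """True if every agent in the comparison has enough doses in the
    indirect evidence to estimate the chosen dose-response function,
    and hence the comparison can be node-split under that function."""
    return all(n >= MIN_DOSES[fun] for n in doses_per_agent.values())

# Hypothetical comparison: one agent has only two doses available
ok_emax = can_split({"Ustekinumab": 2, "Ixekizumab": 4}, "emax")         # False
ok_exp = can_split({"Ustekinumab": 2, "Ixekizumab": 4}, "exponential")   # True
```

This mirrors the trade-off in the text: simpler functions allow more comparisons to be split, but at the risk of bias if the true dose-response is more complex.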
Overall, this suggested no inconsistency in treatment-level NMA or Emax MBNMA, but that fitting an exponential or linear MBNMA could lead to inconsistency in existing node-splits in which indirect evidence was estimated by both the consistency assumption and dose-response relationship, and that this could also lead to inconsistency in additional comparisons for which indirect evidence was only informed by the dose-response relationship.

DISCUSSION
Ignoring the effects of different doses is a common cause of heterogeneity and inconsistency in NMA. Dose-response MBNMA can reduce heterogeneity and inconsistency by modeling a dose-response relationship. However, it is still important to critique model fit and check consistency assumptions where possible. We have presented several methods to assess model consistency both globally and for specific contrasts. The methods were illustrated with a dataset of trials comparing different agents for the treatment of moderate-to-severe psoriasis, which had features that are typical in dose-response MBNMA, comprising (i) phase 2b dose-ranging studies in which multiple doses of a single agent are compared with placebo in multiarm trials, perhaps with a single (licensed) dose of a competitor also included in the study; and (ii) phase 3 studies that compare two active agents head to head, each at a single dose. Although in this dataset there were reasonable opportunities for assessing consistency, networks comprised more exclusively of phase 2b dose-ranging studies may have more limited scope for this. The very features of such studies (multiarm trials, comparisons with placebo) are important mechanisms that guard against inconsistency, and so should not be viewed as a problem, but rather as part of the solution to avoid inconsistency in the first place. Only 5 out of 23 studies in the network were 2-arm studies; the other 18 were multi(≥3)-arm studies comparing multiple doses of one or more agents.

FIGURE 7 Estimates of direct, indirect and overall evidence for each comparison from each model for which node-splitting was possible. Bayesian p-values, which show the agreement between direct and indirect estimates, are given to the right of the plot. Interventions are coded according to the first two letters of the agent and dose (mg/week).
We did not find any evidence of inconsistency in the psoriasis dataset when models were used that fitted the data appropriately (treatment-level NMA, Emax MBNMA), but the Emax MBNMA had the advantage that at several doses relative effects were estimated more precisely. However, when an exponential or linear dose-response relationship was used this led to a high level of heterogeneity and the identification of inconsistency within multiple comparisons in the network. Tests for consistency are inextricably linked to model fit and therefore depend on the dose-response relationship assumed. Thorough assessment of model fit for a range of alternative dose-response relationships is therefore recommended, and we advise selection of the best fitting dose-response function prior to comparing common vs random treatment effect models, as the impacts of an inappropriate dose-response function can be masked by fitting random effect models with resulting high heterogeneity.
In the presence of high heterogeneity (such as in exponential MBNMA, linear MBNMA, and agent-level NMA models) a global assessment by comparison with treatment-level NMA and UME models may fail to clearly identify evidence of inconsistency, since inconsistency can instead manifest as high heterogeneity. 3,28 Node-splitting can be a useful tool for highlighting the difference between these forms of heterogeneity by specifically testing for inconsistency on the set of comparisons for which direct and indirect evidence contributions are available.
In MBNMA node-splitting models, indirect estimates can sometimes be obtained via the dose-response relationship even when a pathway of direct evidence is not available (comparisons 5-13) and indirect estimates are not possible for standard NMA. The dose-response relationship can be used in the same way to estimate effects in otherwise disconnected networks. 5 This does not specifically require the inclusion of placebo data, and the methods can be used in the absence of this, although placebo arms typically provide a lot of information to the dose-response relationship, particularly for parameters relating to the efficacy at lower doses (eg, ED50_a). In networks with limited evidence at lower doses the indirect estimates via the dose-response relationship may be biased in some circumstances, which may impact the assessment of inconsistency using node-splitting. 5 Simpler dose-response functions provide more potential to obtain indirect estimates via the dose-response relationship and therefore increase the availability of comparisons on which node-splitting is possible, yet if the true underlying dose-response relationship requires a more complex function, fitting a simpler function may introduce bias. This likely explains the inconsistency identified by node-splitting in several comparisons for exponential and linear MBNMA models. Ensuring that comparisons are only split if the indirect evidence contains sufficient information with which to estimate the same dose-response function as used in the MBNMA consistency analysis limits this bias. In the psoriasis example for our selected Emax MBNMA we would therefore in practice only choose to split comparison 5 (Ustekinumab 5.62 mg/wk vs Ixekizumab 30 mg/wk) in addition to the comparisons for which indirect evidence is available via consistency relationships (comparisons 1-4), given the doses available in the dataset for both agents in this comparison.
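The way the dose-response relationship supplies an indirect estimate can be sketched as follows, assuming an Emax function: the relative effect between any two agent-dose combinations is simply the difference of their fitted curves, so no pathway of direct comparisons is needed. The fitted (Emax, ED50) values below are hypothetical.

```python
def emax_effect(dose, emax, ed50):
    """Fitted Emax curve: relative effect of an agent at `dose` vs placebo."""
    return emax * dose / (ed50 + dose)

def indirect_contrast(dose_a, pars_a, dose_b, pars_b):
    """Indirect relative effect of agent A at dose_a vs agent B at dose_b,
    obtained purely via the dose-response curves:
    d(A at dose_a vs B at dose_b) = f_A(dose_a) - f_B(dose_b)."""
    return emax_effect(dose_a, *pars_a) - emax_effect(dose_b, *pars_b)

# Hypothetical fitted (Emax, ED50) parameters for two agents:
d_ind = indirect_contrast(5.62, (1.8, 2.0), 30.0, (1.5, 6.0))
```

In the actual analysis these parameters are estimated jointly from the network with uncertainty propagated through the MCMC draws, rather than plugged in as point values.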
We have also noted that the effects of inconsistency and poor dose-response fit on deviance contributions for individual data points can be aggregated, leading to substantially higher contributions to the posterior mean deviance in a MBNMA model compared to the treatment-level UME. This does not appear to affect the total fit of the model, suggesting that it is balanced by improved fit of other data points. However, it is important to be aware of the causes of this when inspecting deviance-deviance plots and to identify to what degree the difference is caused by inconsistency vs fit of the dose-response curves. Given that there was no evidence of inconsistency in node-splitting for the Emax MBNMA, the issues identified in deviance-deviance plots for studies 15 and 16 that included Ustekinumab were most likely due to poor dose-response fit rather than inconsistency, and it may be that the dose-response relationship for this agent differs from that of other biologics. However, given that the dose-response function for Ustekinumab implied by the treatment-level NMA results would be nonmonotonic (Figure 5; the posterior median for the relative effect of Ustekinumab 5.62 mg/wk is lower than at both 3.75 and 7.5 mg/wk), there may be some argument to suggest that the results of the Emax MBNMA at these doses are more biologically plausible.
Whilst we describe methods for identifying inconsistency in dose-response MBNMA, we have not yet suggested steps to take if it is found. Given the link between inconsistency and dose-response fit that we have shown here, our first suggestion would be to ensure that the fit of the dose-response function is reasonable. If this is not the source of inconsistency then other causes can be investigated. 3 Inconsistency can arise from the impact of effect modifiers (variables that influence the treatment effect) that differ between comparisons. Impacts of these effect modifiers may also differ within a comparison, resulting in heterogeneity. One common source of heterogeneity/inconsistency is from "lumping" different doses of agents together, and this is likely to be avoided by explicit modeling of the dose-response relationship using MBNMA. However, other possible variables that could be effect modifiers and may differ between comparisons are age, severity at baseline, and previous treatments. 3 Ensuring strictly defined inclusion/exclusion criteria for a systematic review will help limit heterogeneity/inconsistency, as well as early identification of potential effect modifiers prior to commencing synthesis.
In this study, we have restricted attention to parametric functional forms for the dose-response relationship. Hierarchical models, where different doses are assumed exchangeable, have also been proposed, 29,30 including a hierarchical model that constrains a monotonic relationship with dose. 30 The methods described here can extend naturally to nonparametric dose-response functions and some of these models are implemented in the R package MBNMAdose. 12 However, methods for assessing inconsistency in such models have not yet been fully explored.
Design-by-treatment interaction models have been proposed to assess inconsistency in the NMA literature, where the effect from AB studies is distinguished from those from ABC studies, ACD studies, etc. 31 However, these models do not naturally extend to dose-response models, because dose defines both the study design and the explanatory variable for the dose-response model. Furthermore, there are typically many different study designs in dose-response MBNMA (as seen for the psoriasis example, Table S2), making the design-by-treatment interaction models infeasible to estimate in this context. Cross-validation has been proposed to identify potential outliers, 32,33 but can also be used to explore inconsistency in NMA. 10,21,34 The idea is to omit a subset of the data, fit the consistency model (Equations (7) and (8)) to the remaining data, and then compare the predictions from the model with the observed omitted data. This allows the computation of a cross-validation p-value, which represents the probability that the model predicts a value as, or more, extreme than that observed. If all the evidence on a specific contrast is excluded, then cross-validation becomes a form of test for inconsistency for that contrast.
Although the idea of cross-validation is very similar to node-splitting, the methods differ in how the comparisons are made between the direct and indirect evidence. In node-splitting both direct and indirect estimates of treatment effects are computed and then compared. In cross-validation only an indirect estimate is obtained which is then used to predict the observed data accounting for both sampling error and predictive uncertainty. As a consequence, in node-splitting all evidence can be used to estimate the between studies variance jointly, whereas in cross-validation the omitted data do not contribute to the estimate of the between-studies variance. Because the cross-validation method uses the predictive distribution from the random effects model, and the estimate of the between-studies variance is less certain (due to being based on less data), cross-validation is less likely to detect inconsistency than node-splitting. Because there is often little power to detect inconsistency in NMA, and we wish to be confident that we can identify it if it is present, we prefer the node-splitting approach to cross-validation, although this may result in some false positive findings.
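The cross-validation p-value described above can be computed directly from posterior predictive draws. A minimal sketch, using simulated draws in place of real MCMC output from a model refitted without the omitted data:

```python
import random

def crossval_pvalue(predictive_draws, y_obs):
    """Two-sided cross-validation p-value: the probability that the
    posterior predictive distribution (from a model fitted with this
    data point omitted) produces a value as, or more, extreme than
    that observed. Small values suggest the omitted data conflict
    with the rest of the evidence."""
    p = sum(y >= y_obs for y in predictive_draws) / len(predictive_draws)
    return 2 * min(p, 1 - p)

# Simulated posterior predictive draws standing in for real MCMC output:
rng = random.Random(7)
pred = [rng.gauss(0.0, 1.0) for _ in range(20_000)]
p_mid = crossval_pvalue(pred, 0.0)  # observation near the centre: large p
p_ext = crossval_pvalue(pred, 3.5)  # extreme observation: small p
```

Because the predictive draws include both between-study heterogeneity and sampling uncertainty, these p-values are typically more conservative than node-splitting p-values, as discussed above.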
Sparsity of evidence may require additional modeling assumptions to be made, such as exchangeable agent effects (within class) for the Emax_a and/or ED50_a parameters, if pharmacologically plausible. 4 Note that such hierarchical models may mask inconsistency which, as mentioned above, can alternatively manifest as heterogeneity. A simulation study to fully explore the ability of the methods to detect inconsistency under a range of modeling assumptions would be useful future work.

CONCLUSIONS
Testing for inconsistency is important when synthesizing direct and indirect evidence to obtain robust relative effects to inform drug-development and policy decisions. We have proposed methods to assess consistency for dose-response MBNMA models and highlight the importance of also considering the fit of the dose-response function as part of this assessment. Poorly fitting dose-response functions are associated with higher heterogeneity and may introduce inconsistency. More complex dose-response functions can allow for better model fit, but sufficient data at multiple doses of different agents are required to fit them. This highlights the importance of including phase II dose-finding evidence in systematic reviews, rather than simply focusing on the licensed doses of different agents. Furthermore, the inclusion of phase II multiarm placebo-controlled studies guards against inconsistency arising in the first place.