Modelling time‐course relationships with multiple treatments: Model‐based network meta‐analysis for continuous summary outcomes

Background: Model-based meta-analysis (MBMA) is increasingly used to inform drug-development decisions by synthesising results from multiple studies to estimate treatment, dose-response, and time-course characteristics. Network meta-analysis (NMA) is used in Health Technology Appraisals for simultaneously comparing effects of multiple treatments, to inform reimbursement decisions. Recently, a framework for dose-response model-based network meta-analysis (MBNMA) has been proposed that combines, often nonlinear, MBMA modelling with the statistically robust properties of NMA. Here, we aim to extend this framework to time-course models.

Methods: We propose a Bayesian time-course MBNMA modelling framework for continuous summary outcomes that allows for nonlinear modelling of multiparameter time-course functions, accounts for residual correlation between observations, preserves randomisation by modelling relative effects, and allows for testing of inconsistency between direct and indirect evidence on the time-course parameters. We demonstrate our modelling framework using an illustrative dataset of 23 trials investigating treatments for pain in osteoarthritis.

Results: Of the time-course functions that we explored, the Emax model gave the best fit to the data and has biological plausibility. Some simplifying assumptions were needed to identify the ET50, due to few observations at early follow-up times. Treatment estimates were robust to the inclusion of correlations in the likelihood.

Conclusions: Time-course MBNMA provides a statistically robust framework for synthesising evidence on multiple treatments at multiple time points. The use of placebo-controlled studies in drug-development means there is limited potential for inconsistency. The methods can inform drug-development decisions and provide the rigour needed in the reimbursement decision-making process.


| INTRODUCTION
In drug development, a key decision is whether to proceed to a further clinical trial and if so, which follow-up outcomes to measure and which comparator agent(s) to include. To inform this decision, it is important to understand the onset and duration of action of not just the agent under development, but also competitor agents. In addition, understanding the time-course of pharmaceutical agents is useful for licensing agencies when making decisions on safety and efficacy of a new agent, and for reimbursement agencies making policy decisions based on manufacturer submissions where different studies and agents have reported results at different follow-up times.
Model-based meta-analysis (MBMA) is a technique for synthesising results of multiple studies, usually randomised controlled trials (RCTs), to gain understanding of the pharmacodynamic profile of different agents during drug development. 1 Key characteristics within this profile are the dose-response and time-course relationships of an agent, and how they compare with competitors. MBMA has been used to understand these relationships across studies. [2][3][4] Some MBMAs have used indirect evidence to compare several agents simultaneously. [5][6][7] However, these types of analyses typically model each agent separately, based on pooling individual study arms. Whilst this approach allows the inclusion of nonrandomised and/or single-arm studies and has been proposed in the meta-analysis literature, 8 it breaks the randomisation within RCTs and ignores within-study comparisons, effectively losing the advantages of the RCT design and potentially inducing bias in the resulting estimates. 9 Network meta-analysis (NMA) allows evidence on multiple treatments to be combined provided they form a connected network of treatment comparisons (where treatment refers to a distinct formulation, such as agent and dose combination). [10][11][12][13][14] NMA pools evidence from RCTs in a way that respects the randomisation in the design of the included studies. It strengthens inference by combining all evidence (both direct and indirect) on the comparisons of interest, under the assumption of consistency: agreement between direct and indirect evidence on the same treatment comparison. When conducting an NMA, it is essential to test for inconsistency where possible and consider results accordingly, as the validity of the resulting estimates rests upon the consistency assumption. Methods have been developed that formally test for inconsistency in NMA, 15,16 something which has not previously been possible when making indirect comparisons in MBMA.
However, recently, MBMA has been extended to incorporate a formal consistency framework by combining ideas from NMA and MBMA in the context of dose-response models with multiple agents. 17 This model-based network meta-analysis (MBNMA) approach respects the randomisation in the included RCTs and allows for formal testing of inconsistency in the network of evidence.
Mawdsley et al 17 described their method for dose-response models with an outcome at a single time-point. In phase II and III trials, there are often multiple follow-up times reported within a study, which represent repeated measures on the same individuals and so will be correlated. This correlation needs to be accounted for either by modelling the time-course, [18][19][20] or with a multivariate likelihood, [21][22][23][24] or both. 2,25 Typically for meta-analysis, only aggregated data are available in published RCTs. This can lead to additional complexities when accounting for correlations between repeated measurements, as the correlation structure may not be known.
Whilst repeated observations over time may be considered discrete observations, from the perspective of drug development, where the focus is on predicting relative efficacy at different time points, it is desirable to estimate a continuous function that describes the relationship between relative effects over time. In this paper, we describe the MBNMA approach with time-course models for repeated observations within the included studies. We also present methods to assess whether the included evidence exhibits inconsistency. Note that in this paper we focus on the time-course relationship only, and so the estimates are obtained for each distinct treatment regimen (agent and dose combination).
The paper is organised as follows. We begin by describing the MBNMA framework to incorporate multiple follow-up times for continuous summary outcomes using models for time-course. We present a general framework, indicating some of the common functional relationships typically seen in practice and accounting for residual correlation between time points. We also outline an approach for model selection and critique and present methods to assess consistency in the time-course setting. We illustrate the methods with an example dataset of treatments for pain in osteoarthritis. Finally, we end with a discussion of the methods in the context of earlier work and indicate areas for further developments.

| Likelihood for continuous summary outcomes
Although individual patient data (IPD) may be available from a manufacturer's own study, it is likely that only aggregate level data from publications are available from other studies. We therefore develop our model at the level of study summaries (eg, mean differences). We assume that for each study we have a continuous summary outcome, such as mean outcome or log-odds of response, y i,k,m , together with standard errors, se i,k,m , reported for each study i, arm k = 1, … ,K i , and at time point m = 1, … ,M i , where study i has K i arms and reports at M i time points and s i,m gives the actual time corresponding to the m th time point in study i. This formulation allows for different studies to report at different times. Typically, m = 1 represents a baseline observation at time s i,1 = 0. The treatment given in study i, arm k, is indicated by t i,k .
Because we have repeated measures from the same individuals within each study, the observations may be correlated, which can be captured with a multivariate normal likelihood:

y_{i,k} ~ N(θ_{i,k}, Σ_{i,k}),   (1)

where y_{i,k} is a vector of the observed summary measures over time points, θ_{i,k} is a vector of modelled outcomes, and Σ_{i,k} is an M_i × M_i covariance matrix with elements

Σ_{i,k}[m1, m2] = se_{i,k,m1} se_{i,k,m2} ρ_{i,k,m1,m2},

where ρ_{i,k,m1,m2} is the within-study correlation between summary measures at time points m1 and m2 for study i arm k (with ρ_{i,k,m,m} = 1, so the diagonal elements are the variances se²_{i,k,m}). In practice, correlations are rarely reported in the literature and will only be available from studies where we have IPD. In addition, the correlations estimated from any available IPD will be at the individual level, which may be different to correlations at the summary level. 26 One approach to deal with unknown within-study correlations at the summary level, if IPD are available or information on this correlation can be obtained from external data, is to assume that the correlations seen between time points for individual patients are the same as those seen for summary measures, and also that correlations in the study for which correlation information is available also apply in the aggregate data trials. 27,28 However, this might lead to ecological bias as there is no guarantee that correlations at the individual level will be the same as at the aggregate level. An alternative is to estimate within-study correlations based on the aggregate data summaries by giving prior distributions to the ρ_{i,k,m1,m2}. This approach allows the possibility of using informative prior distributions based on information gained from external data. In order to identify correlation parameters, some constraints will be required, such as assuming a particular covariance structure. For example, a compound symmetry structure can be assumed, in which a single parameter, ρ, is estimated for the correlation between all time points (assumed to be the same across all studies): ρ_{i,k,m1,m2} = ρ.
Alternatively, it might be more reasonable to assume an autoregressive AR(1) structure, in which correlations depend on the amount of time between observations:

ρ_{i,k,m1,m2} = ρ^|s_{i,m1} − s_{i,m2}|.

When studies that report both mean change from baseline and final values at each time point are included, they can contribute information to ρ_{i,k,m1,m2} by modelling separate likelihoods for both pieces of data (see below). 29
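As an illustration of the two covariance structures above, the following Python sketch builds Σ_{i,k} for one study arm from the reported standard errors and a single correlation parameter ρ. The paper's own implementation is in JAGS; this helper, its name, and its arguments are our own illustrative assumptions.

```python
import numpy as np

def cov_matrix(se, rho, times=None, structure="cs"):
    """Covariance matrix for one study arm's repeated summary measures.
    se: standard errors se_{i,k,m}; rho: correlation parameter;
    times: observation times s_{i,m} (needed for AR(1));
    structure: "cs" (compound symmetry) or "ar1" (autoregressive)."""
    se = np.asarray(se, dtype=float)
    M = len(se)
    if structure == "cs":
        # same correlation rho between every pair of time points
        corr = np.full((M, M), rho)
    elif structure == "ar1":
        # correlation decays with the time elapsed between observations
        t = np.asarray(times, dtype=float)
        corr = rho ** np.abs(t[:, None] - t[None, :])
    else:
        raise ValueError("unknown structure: " + structure)
    np.fill_diagonal(corr, 1.0)  # each observation has correlation 1 with itself
    return np.outer(se, se) * corr  # element (m1, m2) = se_m1 * se_m2 * rho_m1m2
```

In a Bayesian fit, ρ would be a parameter with a prior distribution rather than a fixed input; here it is fixed only so the construction of Σ is visible.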

| Mean change from baseline by time
Where aggregated summaries are reported as mean change from baseline (baseline corresponding to m = 1), we have summaries for time points m = 2, … , M_i defined as y^change_{i,k,m} = y_{i,k,m} − y_{i,k,1}, with standard errors se^change_{i,k,m}. Covariances between mean changes from baseline across time points m1 and m2 (dropping the i,k subscripts for ease of exposition) are

Cov(y^change_{m1}, y^change_{m2}) = Σ[m1, m2] − Σ[m1, 1] − Σ[1, m2] + Σ[1, 1].

By writing the model for mean change from baseline in terms of the model for mean outcomes, we can combine studies where some report mean outcomes and some report mean change from baseline by giving each type of data the appropriate likelihood and using a shared-parameter model. 30 If some studies report both mean outcomes and mean change from baseline, then both pieces of data can be included. Modelling both outcomes simultaneously provides sufficient evidence to estimate the correlations, ρ_{i,k,m1,m2}. 29
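The covariance identity above can be checked numerically. This small Python helper (an illustrative sketch, not code from the paper) derives the covariance matrix of the changes from baseline from the covariance matrix of the mean outcomes:

```python
import numpy as np

def change_cov(sigma):
    """Covariance matrix of mean changes from baseline (time points
    m = 2..M), derived from the covariance matrix `sigma` of the mean
    outcomes via
    Cov(y_m1 - y_1, y_m2 - y_1) = S[m1,m2] - S[m1,1] - S[1,m2] + S[1,1]."""
    S = np.asarray(sigma, dtype=float)
    # drop the baseline row/column and apply the identity above,
    # using broadcasting for the two cross-covariance terms
    return S[1:, 1:] - S[1:, :1] - S[:1, 1:] + S[0, 0]
```

For a 3-time-point compound-symmetry matrix, for example, the resulting 2 × 2 matrix gives the variances and covariance of the two change-from-baseline summaries.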

| Time-course model
We put the time-course model on the aggregate-level means:

θ_{i,k,m} = f(s_{i,m}; λ_{i,k}),

where f defines a functional relationship over time s, and λ_{i,k} = (λ_{0,i}, λ_{1,i,k}, λ_{2,i,k}, …) are a set of parameters that describe the relationship in mean outcomes over time.
In all time-course models, there will be a "nuisance parameter" λ_{0,i} which represents the "intercept" at time s = 0, common across arms. Note that for many time-course models the λ_{0,i} parameters will cancel out when using change from baseline data. We put our modelling assumptions on the remaining parameters, λ_{1,i,k}, λ_{2,i,k}, …, leaving the λ_{0,i} unconstrained (achieved in a Bayesian analysis by giving independent vague prior distributions to the λ_{0,i} parameters).

| Exponential model
One of the most commonly used models is the exponential model, which has intercept λ_{0,i} and a single parameter of interest, λ_{1,i,k}, which represents the rate at which the mean outcome falls over time, assuming a constant rate of growth/decay:

θ_{i,k,m} = λ_{0,i} exp(λ_{1,i,k} s_{i,m}).   (2)

| Linear model
Another model with a single parameter of interest is the linear model:

θ_{i,k,m} = λ_{0,i} + λ_{1,i,k} s_{i,m},   (3)

where λ_{0,i} is the intercept and λ_{1,i,k} the fall in mean outcome for a unit increase in time.

| E max model
A functional form commonly used in pharmacometrics, which has two parameters of interest, is the E max model:

θ_{i,k,m} = λ_{0,i} + (λ_{1,i,k} s_{i,m}) / (λ_{2,i,k} + s_{i,m}),   (4)

where the intercept λ_{0,i}, often referred to as E 0, is the initial outcome at baseline (time = 0), λ_{1,i,k}, typically referred to as E max, is the maximum possible effect of a treatment relative to baseline, and λ_{2,i,k}, typically referred to as ET 50, is the time point at which 50% of the maximum treatment effect has been achieved.

| Piecewise linear model
Piecewise models can allow for considerable flexibility, though they may not accurately resemble true biological relationships and may not be appropriate when the intention is to predict values close to where the pieces meet (the "knots"). The simplest example of this is a two-piece linear model with a single knot at s = S:

θ_{i,k,m} = λ_{0,i} + λ_{1,i,k} s_{i,m}   if s_{i,m} ≤ S,
θ_{i,k,m} = (λ_{0,i} + λ_{1,i,k} S) + λ_{2,i,k} (s_{i,m} − S)   if s_{i,m} > S,   (5)

where λ_{0,i} is the intercept, λ_{1,i,k} the change in mean outcome for a unit change in time during the period (0, S), and λ_{2,i,k} the change in mean outcome for a unit change in time during the period after S. The intercept for the second piece (λ_{0,i} + λ_{1,i,k} S) ensures that the two regression lines meet at the knot.
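The four candidate time-course functions can be written down as plain Python functions, as a minimal sketch. Names are illustrative, and the exact multiplicative form of the exponential model is our assumption, chosen to be consistent with the log-scale NMA model described in the next section.

```python
import numpy as np

# Candidate time-course functions theta = f(s; lambda), with
# l0 = intercept lambda_0, and l1, l2 = parameters of interest.

def linear(s, l0, l1):
    return l0 + l1 * s

def exponential(s, l0, l1):
    # constant growth/decay rate l1 (assumed multiplicative form)
    return l0 * np.exp(l1 * s)

def emax(s, l0, l1, l2):
    # l1 = Emax (max effect vs baseline), l2 = ET50 (time to half-effect)
    return l0 + l1 * s / (l2 + s)

def piecewise_linear(s, l0, l1, l2, knot):
    # the second piece starts where the first ends, so the lines meet
    return np.where(s <= knot,
                    l0 + l1 * s,
                    l0 + l1 * knot + l2 * (s - knot))
```

Note that at s = ET50 the Emax function returns exactly half of the maximum effect, which is the defining property of the ET 50 parameter.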

| Network meta-analysis model
The NMA model describes the impact of treatments on one or more of the parameters of the time-course model, λ_{1,i,k}, λ_{2,i,k}, …. If the NMA model is given for a single time-course model parameter, λ_{1,i,k}, we have

g(λ_{1,i,k}) = μ_i + δ_{i,k},   with δ_{i,1} = 0,

for a given link function g which transforms the outcome to a scale where relative treatment effects may be expected to be additive. μ_i is the time-course model parameter (on the transformed scale) for arm 1 of study i, and δ_{i,k} the study-specific relative effect for the treatment used in arm k relative to arm 1 of study i. For example, for an exponential time-course model (Equation 2), it would be natural to put the NMA model on the log-scale:

log(λ_{1,i,k}) = μ_i + δ_{i,k},

where μ_i is the log growth/decay rate on arm 1, and δ_{i,k} is the log rate-ratio for treatment arm k compared with treatment arm 1, of study i. The μ_i are nuisance parameters and given independent vague prior distributions in a Bayesian analysis to allow these to be unconstrained. By treating these as nuisance parameters, the focus of this modelling strategy is on estimating relative treatment effects rather than on characterising the time-course on the reference treatment (eg, placebo effect). In fact, because different studies may have included different control (arm 1) treatments, the μ_i do not have a consistent interpretation across studies.
Treatment effects can be either assumed common ("fixed") or similar/exchangeable ("random") across studies. For the random effects model, study-specific treatment effects are assumed to be normally distributed around a mean treatment effect that adheres to the consistency relationships, with common between-studies variance τ² across treatment comparisons:

δ_{i,k} ~ N(d_{t_{i,1},t_{i,k}}, τ²).   (6)

The consistency relationships reflect the comparison made between the treatment t_{i,k} used on arm k and the treatment t_{i,1} used on arm 1 of each study. The fixed effect model is obtained by setting τ² = 0. The model estimates "basic parameters" d_{1,k}, the pooled mean relative effect for treatment k relative to treatment 1 (the reference treatment for the NMA). The d_{1,k} are each given independent vague normal priors in a Bayesian analysis. All other relative effects for treatment k relative to treatment c, d_{c,k}, can then be derived from the consistency relationships 12,30 :

d_{c,k} = d_{1,k} − d_{1,c}.   (7)

Time-course functions with multiple (nonintercept) time-course parameters may have NMA models specified for one or more of these parameters, although a relatively rich dataset is required to estimate NMA models with more than one treatment effect parameter. Suppose we expect the treatments to influence two parameters of the time-course model, λ_{1,i,k} and λ_{2,i,k} (for example, these could represent E max and ET 50). The NMA model proceeds as for a single parameter; however, for a random effects model, we need to allow for correlations between the study-specific treatment effects on the two time-course parameters. Note that the link functions g do not have to be the same for the different parameters:

g_1(λ_{1,i,k}) = μ_{1,i} + δ_{1,i,k},   g_2(λ_{2,i,k}) = μ_{2,i} + δ_{2,i,k}.   (8)

The random effects model for δ_{1,i,k}, δ_{2,i,k}, etc needs to be multivariate to allow for correlations between relative effects on the different time-course parameters.
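Deriving any contrast from the basic parameters via the consistency relationships is mechanical; a minimal Python sketch (the helper and the example values are illustrative assumptions, not estimates from the paper):

```python
def relative_effect(d_basic, c, k):
    """Derive d_{c,k} from basic parameters via consistency:
    d_{c,k} = d_{1,k} - d_{1,c}. `d_basic` maps each treatment to its
    pooled effect relative to the network reference treatment, with
    the reference treatment itself mapped to 0."""
    return d_basic[k] - d_basic[c]
```

For example, with hypothetical basic parameters for Placebo, Naproxen, and Celebrex, the Naproxen-vs-Celebrex effect falls out of the two placebo contrasts.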
For example, for the E max model, the E max parameter λ_{1,i,k} can be positive or negative, and so we can put the model on the natural scale, whereas the ET 50 parameter λ_{2,i,k} may only take positive values, and so it makes sense to model this on the log scale, giving

λ_{1,i,k} = μ_{1,i} + δ_{1,i,k},   log(λ_{2,i,k}) = μ_{2,i} + δ_{2,i,k},

with a bivariate random effects distribution:

(δ_{1,i,k}, δ_{2,i,k})' ~ N( (d_{1,t_{i,1},t_{i,k}}, d_{2,t_{i,1},t_{i,k}})', [ τ_1², ρ_δ τ_1 τ_2 ; ρ_δ τ_1 τ_2, τ_2² ] ),   (9)

where all parameters are as before, with an extra subscript to indicate whether they relate to λ_{1,i,k} or λ_{2,i,k}. The correlation between the treatment effects on the two parameters is given by ρ_δ. Different parameterisations are available for the between-studies covariance matrix that may be more computationally stable, such as a Cholesky parameterisation or a spherical parameterisation. 31 Fixed effect models can be obtained by setting the between-study variance parameters to 0.
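One of the more computationally stable parameterisations mentioned above, the Cholesky factorisation, can be sketched in Python for the 2 × 2 between-studies covariance matrix (an illustrative helper under our own naming, not code from the paper):

```python
import numpy as np

def between_study_cov(tau1, tau2, rho_delta):
    """2x2 between-studies covariance for treatment effects on the two
    time-course parameters, built from its Cholesky factor L so that
    any tau1, tau2 > 0 and -1 < rho_delta < 1 yield a valid (positive
    semi-definite) matrix: Sigma = L @ L.T."""
    L = np.array([[tau1, 0.0],
                  [tau2 * rho_delta, tau2 * np.sqrt(1.0 - rho_delta**2)]])
    return L @ L.T
```

Sampling the unconstrained elements of L (rather than Σ directly) is what makes this parameterisation attractive in MCMC software, since every draw corresponds to a valid covariance matrix.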

| Multi-arm trials
When including multi-arm trials, it is important to account for correlation between relative effects within a trial, as all relative effects will have the same comparator.
For a common between-study variance, the correlation between these relative effects will be 0.5. 11 For MBNMAs with a single nonintercept time-course parameter, this can be done either using a multivariate normal distribution to model a vector of random effects, 30 or, for the purposes of writing more generic code, using a conditional univariate distributions formulation for the random effect of arm k > 2, given all arms from 2 to k − 1 32 :

δ_{i,k} | δ_{i,2}, …, δ_{i,k−1} ~ N( d_{t_{i,1},t_{i,k}} + (1/(k−1)) Σ_{j=2}^{k−1} (δ_{i,j} − d_{t_{i,1},t_{i,j}}),  (k/(2(k−1))) τ² ).

When modelling multiple nonintercept time-course parameters, the correlation between relative effects can be modelled simultaneously with the correlation between the parameters ρ_δ, using a multivariate normal distribution on a vector of random effects, δ_i, whose length is equal to the number of parameters multiplied by the number of arms, K_i, in study i. For models with two parameters of interest, this is a multivariate normal distribution over the stacked vector (δ_{1,i,1}, …, δ_{1,i,K_i}, δ_{2,i,1}, …, δ_{2,i,K_i}).
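For the multivariate-normal formulation with a single time-course parameter, the between-study covariance matrix over the K_i − 1 relative effects of a multi-arm trial has τ² on the diagonal and τ²/2 off it. A Python sketch of its construction (illustrative, with assumed names):

```python
import numpy as np

def multiarm_cov(tau, n_arms):
    """Between-study covariance of the relative effects delta_{i,2..K}
    in a multi-arm trial with common heterogeneity variance tau^2:
    variance tau^2 on the diagonal, covariance tau^2 / 2 off the
    diagonal (correlation 0.5 from the shared comparator arm)."""
    K = n_arms - 1  # number of relative effects versus arm 1
    return tau**2 * (0.5 * np.ones((K, K)) + 0.5 * np.eye(K))
```

A two-arm trial reduces to a single variance τ², recovering the univariate random effects distribution.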

| Simplifying modelling assumptions
For models with many parameters, there may be insufficient data to be able to estimate all parameters (ie, the parameters may not be identifiable). To aid identifiability when there are two or more non-nuisance parameters, simplifying assumptions can be made to constrain the parameters. In this way, a model can be constrained to be as complex or simple as the data allow, provided there is biological plausibility for any simplification. Note that any shared parameters will be more influenced by studies/treatments with more information (ie, those with more observations within studies). One or more of the following may be considered:

| Fixed effect models
One or more of the modelled parameters could be modelled as a fixed treatment effect, reflecting an assumption of homogeneity where different studies of the same comparison estimate a common effect. So, if there is a fixed effect model on the second parameter, the random effect δ_{2,i,k} in Equation 9 is replaced by the fixed value

δ_{2,i,k} = d_{2,1,t_{i,k}} − d_{2,1,t_{i,1}},

and if there is a fixed effect model on both parameters, δ_{1,i,k} in Equation 9 is likewise replaced by

δ_{1,i,k} = d_{1,1,t_{i,k}} − d_{1,1,t_{i,1}}.

| Class-effect models
Relative treatment effects for one (or more) of the parameters could be assumed to come from a hierarchical model with a common mean, which may depend on class 13,33-35 :

d_{2,1,k} ~ N(D_{2,class(k)}, σ_2²).   (10)

For example, in the E max model, it may be reasonable to assume that treatments within the same class have a similar onset of action (and so have similar, exchangeable treatment effects on ET 50) but reach different maximum effects (E max).
An even more constrained model fixes the treatment effects within a class to be equal:

d_{2,1,k} = D_{2,class(k)}.   (11)

| Constrain the baseline effect
To further aid identifiability of treatment effects, it may be necessary to impose further constraints on the baseline for one (or more) of the time-course parameters, for example with an exchangeable model:

μ_{2,i} ~ N(η_2, σ_η²).

This would imply that the reference treatment effects for each study were assumed to be distributed about a single common mean effect, η_2, and would therefore only be suitable in networks for which all included trials have the same reference treatment (eg, placebo).

| Reduce to a single treatment effect
For models with multiple (nonintercept) time-course parameters, a further simplification is to only model treatment effects on one of the time-course parameters.
The other parameters are assumed to be treatment independent and modelled on an absolute, rather than relative, scale:

g_2(λ_{2,i,k}) = μ_{2,i,k}.

The μ_{2,i,k} could be left unconstrained or assumed exchangeable for each treatment within a class.

| Testing for inconsistency
To test whether the consistency assumption (Equation 7) holds, several approaches have been proposed for identifying inconsistency between direct and indirect evidence that arises within a closed "loop" of treatments for which independent sources of information are available. 15,36 It is important to note that available loops of treatments to test for inconsistency will depend on the choice of reference treatment used in the network. 36,37 In addition, the evidence provided by studies with three or more arms is not independent (due to the common reference arm), and within-study relative effects must be internally consistent. Therefore, loops of evidence consisting only of studies with three or more arms will always be consistent. Furthermore, the choice of reference treatment for a study with three or more arms can affect whether it is possible to test for inconsistency. For example, suppose we have three different studies providing evidence on P (Placebo) vs N (Naproxen 1000 mg/d), P vs C (Celebrex 200 mg/d), and P vs N vs C, respectively. If we take P as the reference for the three-arm trial, then the three-arm trial provides estimates of P vs N and P vs C, so the model only estimates P vs N and P vs C directly (see Figure S3A, Supporting Information). The N vs C effect is derived from the P vs N and P vs C estimates. If, on the other hand, we take N as the reference for the three-arm trial, then the three-arm trial provides estimates of P vs N and N vs C, which together with the two-arm evidence provides independent estimates of P vs N, P vs C, and N vs C, and we can test for inconsistency. We use the convention that we take placebo as reference for all studies that include a placebo arm. In studies without placebo, we use the first drug alphabetically, at its lowest dose.
An unrelated mean effects (UME) model does not include constraints forced by the consistency equations and is equivalent to fitting separate pairwise meta-analyses to each direct comparison whilst sharing treatment-independent parameters across all the comparisons, 16 such as between-study heterogeneity or treatment-independent time-course parameters. The results from this model can then be compared with those from the MBNMA. A better model fit (lower deviance) or lower standard deviations (SD) for exchangeable parameters or random treatment effects would suggest that inconsistency may be present in the network.
A more explicit method for testing inconsistency for specific comparisons is the node-splitting method. 15 This technique involves splitting the evidence for a given comparison within a loop of treatments into "direct" evidence from head-to-head RCT comparisons and "indirect" evidence that arises from the consistency relationships. A Bayesian P-value can be calculated for the treatment effects estimated using the direct and indirect evidence, which represents the proportion of the two posterior distributions that overlap.
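One common way to turn paired posterior samples of the direct and indirect estimates into a two-sided Bayesian p-value is to double the smaller tail probability of their difference. A Python sketch under that assumption (definitions of the node-split p-value vary across implementations; names are illustrative):

```python
import numpy as np

def node_split_p(direct_samples, indirect_samples):
    """Two-sided Bayesian p-value for inconsistency from paired MCMC
    samples of the direct and indirect estimates of one comparison:
    twice the smaller posterior tail probability of their difference
    crossing zero. Small values suggest inconsistency."""
    diff = np.asarray(direct_samples) - np.asarray(indirect_samples)
    p_gt = float(np.mean(diff > 0))
    return 2.0 * min(p_gt, 1.0 - p_gt)
```

When the direct and indirect posteriors sit on top of each other the p-value is near 1; when one is shifted entirely above the other it approaches 0.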
Note that when performing pairwise meta-analyses to estimate direct evidence, the sharing of parameters across direct comparisons for which limited information is available can make tests for inconsistency conservative, and this should be borne in mind when interpreting them.
For models with multiple time-course parameters (Equation 8), it is important to consider that inconsistency may be present for treatment effects on either or both of the time-course parameters.
Donegan et al 38 present inconsistency models to explore consistency on two parameters. We suggest testing for inconsistency on each time-course parameter separately, because if inconsistency is identified in either parameter for a given comparison, then this should be a cause for concern and should prompt further investigation of the included studies to identify the potential cause.
Multiple testing may also be an issue here, as the number of tests in a typical network will be multiplied by the number of time-course parameters in the MBNMA model. However, inconsistency tests are typically underpowered, and we advise erring on the side of caution as it is better to incorrectly identify inconsistency when there is none present than to incorrectly fail to identify inconsistency when true inconsistency is present.

| Treatment ranking for time-course relationships
In NMA, it is common to calculate ranking probabilities (probability of being first best, second best, etc) for each treatment within a network, as this is an easily interpretable measure for decision-makers to use. In time-course MBNMA, we can rank on any function of the time-course model, which can allow for an extremely flexible decision-making framework. This could include ranking based on any one of multiple time-course parameters or ranking on the predicted response at a desired follow-up time. For time-course functions with multiple parameters, note that the ranking of treatment effects may differ for different time-course parameters. For example, we could have a treatment that ranks highest for ET 50 indicating that it acts more quickly than other treatments but ranks lowest for E max indicating that the overall response is lower than for other treatments.
For models with less easily interpretable time-course parameters, it may also be beneficial to have an overall ranking that takes into account the full time-course relationship. Calculating the Area Under the Curve (AUC) for the time-course relationship for each treatment using parameters estimated from the model is a pragmatic way of doing this. However, care must be taken when choosing the duration of time-course over which to calculate AUC, as treatment rankings may be sensitive to this choice.
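An AUC-based ranking could be computed from the fitted time-course curves as in the following Python sketch; the trapezoidal grid, the "lower is better" direction (appropriate for a pain outcome), and all names are our illustrative assumptions.

```python
import numpy as np

def rank_by_auc(curves, t_max, n_grid=1001):
    """Rank treatments by the area under their predicted time-course
    up to t_max, computed by the trapezoidal rule on a fine grid.
    `curves` maps treatment name -> callable f(s) giving the predicted
    outcome at time s. Returns treatment names ordered best-first,
    taking lower AUC as better (lower outcome, e.g. pain, preferred)."""
    s = np.linspace(0.0, t_max, n_grid)

    def trap(y):
        # trapezoidal rule: average adjacent ordinates times the step
        y = np.asarray(y, dtype=float)
        return float(np.sum((y[1:] + y[:-1]) * np.diff(s) / 2.0))

    auc = {name: trap(f(s)) for name, f in curves.items()}
    return sorted(auc, key=auc.get)
```

As the text notes, the ranking can be sensitive to t_max, so it is worth recomputing over several durations.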

| Measures of model fit
Models are implemented using a Bayesian approach, and therefore we use the posterior mean of the deviance to compare the goodness-of-fit of the models, 39 where smaller values of deviance are preferred. Model selection is based on the Deviance Information Criterion (DIC) which represents a compromise between model fit and model complexity, 39,40 defined as the sum of the posterior mean deviance (a measure of fit) and the effective number of parameters (a measure of complexity). We use pD calculated using the Kullback-Leibler information divergence as the effective number of parameters. 41 For the selected final model (based on a univariate likelihood), we also report the posterior mean of the residual deviance (defined as the deviance for the model minus the deviance for a saturated model), which can be compared with the number of unconstrained data points to give an overall measure of model fit. Lack of fit is explored by plotting an appropriate posterior summary (median if skewed) of the contribution to the residual deviance for each datapoint against time. Note, we do not compute residual deviances for models with a multivariate likelihood in which the correlation between time points is estimated from the data, as the saturated model is not uniquely defined.
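The DIC trade-off described above can be sketched in a few lines; note that the paper uses a Kullback-Leibler-based pD, 41 whereas this sketch uses the more common plug-in form pD = Dbar − D(θ̂) as a simple stand-in (an assumption, not the paper's exact calculation):

```python
def dic(dbar, d_hat):
    """Deviance Information Criterion: DIC = Dbar + pD, with plug-in
    effective number of parameters pD = Dbar - D(theta_hat), where
    Dbar is the posterior mean deviance and D(theta_hat) the deviance
    evaluated at the posterior means of the parameters.
    Returns (DIC, pD)."""
    pd = dbar - d_hat  # complexity: how much fit improves at the plug-in
    return dbar + pd, pd
```

Lower DIC is preferred; a large pD relative to a competing model signals that extra fit is being bought with extra effective parameters.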

| Model selection strategy
We propose a step-by-step approach for model selection of time-course relationships in MBNMA, recognising that the available evidence may not be sufficient to be able to estimate some of the more complex, but less restrictive, models.

| Illustrative example-Pain in osteoarthritis
The methodology is illustrated using a dataset of RCTs investigating treatments for pain in patients with osteoarthritis. Pain was measured on the Western Ontario and McMaster Universities Arthritis Index (WOMAC) scale 42 and was recorded at multiple time points up to a maximum of 24 weeks. In order to maintain a consistent imputation method for missing data across studies, only those with last observation carried forward (LOCF) analyses were included, as this was the imputation method reported in the majority of papers. Agents with multiple doses were split to form the network of treatments, meaning that each combination of agent and dose was considered to be a separate treatment. Although SDs were typically available at baseline, they were missing for 269 out of 345 observations and were therefore imputed accounting for changes over time using the method of Boucher. 43 We acknowledge that this is a high proportion of data points for which to impute SD, but our aim here is to illustrate the method rather than to provide clinically useful treatment estimates. We note, however, that in pharmacometrics SD is not always reported, as weighting is often performed using sample size; in practice, we would always recommend that SDs be measured and reported.
The illustrative dataset consists of 23 RCTs comparing 29 treatments. Studies have a median of 3.5 (range: 2-7) follow-up measurements, and all use LOCF imputation for analyses. Figure 1 shows the network of comparisons in the data, and Figure 2 shows mean WOMAC pain in each study arm plotted over time for each treatment. The dataset is freely available in the Supporting Information (OsteoarthritisData.csv). 44

| Implementation
Models were estimated using JAGS 4.2.0 (code in the Supporting Information). All fixed treatment effect models were run on three independent chains for 20 000 iterations following 80 000 burn-in iterations with a thinning parameter of 10. For random treatment effect models, convergence required more iterations: models were run for 500 000 iterations following 400 000 burn-in iterations with a thinning parameter of 15. Gelman's R statistic 45 and visual inspection of the chains were used to assess convergence. Vague normal prior distributions (N(0, 1000)) were given to the basic parameters d_{1,k}, nuisance parameters μ_i, and treatment-effect-independent time-course parameters. For ET 50 parameters in E max models, it was necessary to ensure that they only took positive values, so priors for these were specified on the log-scale. The between-study SD and SDs on exchangeable parameters were given uniform prior distributions (U(0, 5)). We inspect the posterior for the SDs to ensure they are not being constrained by the prior. We also assessed sensitivity of results to the prior by fitting the same models using half-normal prior distributions (N(0, 6.25)). In models with a multivariate likelihood, ρ was given a uniform prior distribution (U(0, 1)) to reflect the belief that outcomes at different time points in the same study are likely to be positively correlated. For bivariate models with two nonintercept parameters, the correlation between these parameters, ρ_δ, was given a uniform prior distribution (U(−1, 1)). For piecewise time-course models, knot location was selected through trial and error by fitting models with different knot values (allowing a minimum of 0.1 weeks between knots in different models) and selecting the value from the best fitting (lowest mean posterior deviance) model.
For all treatments, including placebo, visual inspection of the data shows that WOMAC scores decrease over time in a nonlinear fashion, with a rapid decline in pain during the first 1 to 2 weeks that quickly levels out (Figure 2). This suggests that a simple linear model will not fit the data well, but that exponential, piecewise linear, or E max models may be more suitable. Both exponential and E max models have good biological plausibility and are frequently used for modelling pharmacodynamic properties of drugs. However, the limited number of observations at earlier follow-up times suggests that there may not be enough information to identify parameters that model the rapid decline, such as ET 50 parameters or parameters in the first piece of piecewise models. For two treatments (Duloxetine 90 mg/d and Lumiracoxib (NR-dose not reported)), there are only data from studies with a baseline measurement and a single follow-up time compared with placebo, which is insufficient to identify any nonlinear candidate time-course function. We assumed that these treatments followed the same time-course function (though with different time-course parameter values) as the other treatments in the network, and we consider the impact of this assumption in the discussion.
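The candidate shapes described above can be written down directly. Below is an illustrative Python sketch of the linear, exponential, and E max time-course functions (parameter names are ours; in the pain dataset the effects are negative, reflecting a reduction in WOMAC pain):

```python
import math

def linear(t, slope):
    # Linear time-course: effect changes without bound over time
    return slope * t

def exponential(t, emax, rate):
    # Exponential approach to an asymptote emax
    return emax * (1.0 - math.exp(-rate * t))

def emax_model(t, emax, et50):
    # Emax time-course: half the maximal effect is reached at t = ET50
    return emax * t / (et50 + t)

# The Emax curve reaches exactly half of emax at t = ET50
print(emax_model(0.6, -10.0, 0.6))  # -> -5.0
```

The ET 50 parameter controls how quickly the curve approaches its asymptote, which is why identifying it requires observations at early follow-up times.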
Step 2. Compare time-course models with univariate likelihood and fixed treatment effects Table 1 shows model fit statistics for linear, exponential, piecewise linear, and E max models (with various simplifying assumptions). As expected, a linear time-course model gave a very poor fit to the data (Equation 3, Figure 3; posterior mean deviance = 6935.2, Table 1). An exponential time-course was also a poor fit (Equation 2, Figure 3; posterior mean deviance = 5856.3, Table 1), as it did not capture the fast decline in WOMAC scores that occurred within the first 2 weeks. A piecewise linear time-course with a knot at week one showed substantially better model fit (Equation 5, Figure 3; posterior mean deviance = −189.3, Table 1) than the linear or exponential models. However, by far the best fitting time-course was an E max model (Equation 4; posterior mean deviances less than −441, Table 1). Figure 3 shows that whilst the posterior mean contributions to the deviance display a pattern for the linear and exponential models (suggesting the time-course is not adequately captured), there is no discernible systematic pattern for the piecewise linear model or the best-fitting E max model (see below), and the deviance contributions are much lower for the E max model than for the piecewise linear model. It was not possible to fit an E max model with separate treatment effects on the ET 50 parameter, due to the limited number of observations at earlier time points in the dataset. We were therefore obliged to make some simplifying assumptions for ET 50 . We fitted class-effect models (Equation 10) with agent-specific means (E max model 1) or with a common mean for all treatments (E max model 3). We also fitted class-effect models with fixed ET 50 treatment effects (Equation 11) equal to agent-specific values (E max model 2) or with an equal value for all treatments (E max model 4).
For all of E max models 1 to 4, we were also obliged to further assume an exchangeable model for the reference treatment effect (placebo in all studies), μ ET 50 ;i (Equation 12), to allow estimation.
Model fit was similar for E max models 1 to 4, but the model that assumed an equal treatment effect for ET 50 for all treatments compared with placebo, common across studies (E max model 4), had the lowest DIC.
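Model selection here uses the deviance information criterion (DIC), which penalises the posterior mean deviance by the effective number of parameters. A minimal sketch for a normal likelihood (an illustrative standalone implementation, not the JAGS output used in the paper):

```python
import numpy as np

def deviance(y, mu, se):
    # -2 log-likelihood for independent normal observations
    return float(np.sum(((y - mu) / se) ** 2 + np.log(2 * np.pi * se ** 2)))

def dic(y, se, mu_samples):
    """DIC = Dbar + pD, where pD = Dbar - D(posterior mean fit)."""
    devs = [deviance(y, mu, se) for mu in mu_samples]
    dbar = float(np.mean(devs))                          # posterior mean deviance
    dhat = deviance(y, np.mean(mu_samples, axis=0), se)  # deviance at posterior mean
    return dbar + (dbar - dhat)                          # Dbar + pD

# A model whose fitted values track the data has a lower DIC than one that does not
rng = np.random.default_rng(0)
y, se = np.zeros(20), np.ones(20)
good = rng.normal(0.0, 0.1, size=(200, 20))  # posterior draws centred on the data
poor = rng.normal(3.0, 0.1, size=(200, 20))  # posterior draws far from the data
print(dic(y, se, good) < dic(y, se, poor))  # -> True
```

Because pD rewards parsimony, models with similar fit (as for E max models 1 to 4) are separated by how many effective parameters they spend.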
Step 3. Compare random and fixed treatment effect models for the selected time-course model. We provide the JAGS code for this model in the Supporting Information. Based on these results, we selected the fixed effects E max model 4.
Step 4. Assess overall fit of selected univariate model The posterior mean residual deviance for E max model 4 was 288.1, which is lower than the number of data points (341), indicating a good fit to the data.
Step 5. Fit the selected time-course model using a multivariate likelihood Accounting for residual correlation between time points using a multivariate likelihood (Equation 1) for E max model 4 gave an estimated correlation of ρ = 0.28 (95%CrI 0.10, 0.41) when using a multivariate compound symmetry covariance structure, and ρ = 0.50 (95%CrI 0.19, 0.65) when using a multivariate AR(1) covariance structure ( Table 2). Figure S1 (Supporting Information) compares the univariate E max model 4 treatment effects with those from the equivalent multivariate specifications with compound symmetry and autoregressive AR(1) covariance structures. Estimates and their 95% CrIs appear to be reasonably consistent between these models, indicating that accounting for correlation leads to only marginal differences in treatment estimates. Although the differences are very slight, it is interesting to note that use of a multivariate likelihood with compound symmetry covariance structure typically leads to increased precision of treatment estimates compared with the univariate likelihood model, whilst use of a multivariate likelihood with AR(1) covariance structure has more of an effect on the point estimate.
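The two covariance structures can be constructed explicitly. Below is a Python sketch of compound symmetry (a single correlation ρ shared by all pairs of time points) and AR(1) (correlation decaying with the number of measurements between observations); the helper names are illustrative:

```python
import numpy as np

def compound_symmetry(sds, rho):
    """Covariance with equal correlation rho between all pairs of time points."""
    sds = np.asarray(sds, dtype=float)
    corr = np.full((len(sds), len(sds)), rho)
    np.fill_diagonal(corr, 1.0)
    return np.outer(sds, sds) * corr

def ar1(sds, rho):
    """AR(1) covariance: correlation decays as rho**|i - j| with lag."""
    sds = np.asarray(sds, dtype=float)
    idx = np.arange(len(sds))
    corr = rho ** np.abs(idx[:, None] - idx[None, :])
    return np.outer(sds, sds) * corr

# With three time points and rho = 0.5, the first and third measurements
# have correlation 0.25 under AR(1) but 0.5 under compound symmetry
V = ar1([1.0, 1.0, 1.0], 0.5)
print(V[0, 2])  # -> 0.25
```

This difference in how distant time points are treated is consistent with the observation that the two structures affect the results in different ways.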
Predictions from the selected model (E max model 4) AUC for time-course relationships (E max model 4) Table 3 shows the median ranks and their 95%CrIs from E max model 4 for the E max treatment effects, and the AUC for each treatment calculated over 24 weeks of follow-up (the latest follow-up time in any of the included studies). As ET 50 was constrained to be equal across all treatments, the rankings depend only on E max , and therefore the E max rankings match the AUC rankings. Etoricoxib 60 mg/d was the highest median-ranked treatment for both AUC and E max .
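Because the E max time-course has a closed-form integral, the AUC can be computed analytically rather than numerically. A sketch under the assumption of a common ET 50 (the value of 0.6 weeks is the approximate estimate mentioned in the discussion; the treatment labels and E max values here are purely illustrative):

```python
import math

def emax_auc(emax, et50, t_max):
    """Closed-form AUC of the Emax time-course emax*t/(et50 + t) over [0, t_max]."""
    return emax * (t_max - et50 * math.log((et50 + t_max) / et50))

# With a common ET50, ranking by AUC matches ranking by Emax: the AUC is
# Emax multiplied by a factor depending only on ET50 and follow-up length
aucs = {k: emax_auc(e, 0.6, 24.0) for k, e in {"A": -8.0, "B": -10.0}.items()}
print(aucs["B"] < aucs["A"])  # -> True (more negative AUC = larger pain reduction)
```

If ET 50 were allowed to differ between treatments, the AUC and E max rankings could diverge, since a treatment with a smaller maximal effect but faster onset could accumulate more benefit over a short follow-up.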

Step 6. Test for inconsistency
In the osteoarthritis dataset, all studies included a placebo arm. Within the contrast-based NMA approach, the relative effects within a study are only estimated for each treatment versus the study reference treatment; it is not necessary to estimate relative effects between nonreference treatments within a multi-arm study because these are defined by the difference between the relative effects for each treatment versus the study reference, as each study must be internally consistent. Therefore, there were no closed loops of treatments in the network that were made up of independent sources of evidence, and as a result it was not possible to test for inconsistency (Figure S3A, Supporting Information).

FIGURE 3 Median posterior residual deviance contributions over time from univariate fixed treatment effects models with linear, exponential, piecewise linear, and E max (model 4) time-course relationships in the pain in osteoarthritis dataset. Residual deviances closer to 0 indicate a better fitting model. Nonlinearity in these plots indicates that the effect of time has not been properly accounted for. The scales for residual deviance vary between the upper and lower panels.
For illustrative purposes, to create a dataset in which it is possible to test for inconsistency, we expanded our inclusion criteria to all studies irrespective of their method of imputation. This added an extra seven studies to create an "augmented dataset" (30 studies in total). The augmented dataset is freely available in the Supporting Information (AugmentedInconsistencyData.csv). 44 One of these additional studies 47 compared Celebrex 200 mg/d, Rofecoxib 25 mg/d, and Naproxen 1000 mg/d but not Placebo. This created two loops in the network in which direct and indirect estimates came from independent sources, meaning it was possible to test for inconsistency (Figure S3B, Supporting Information).

TABLE 2 Model fit statistics (as in Table 1), comparing univariate and multivariate likelihoods, fitted to the osteoarthritis pain dataset. For the exchangeable baseline parameters, standard deviations (SD) are reported as posterior mean SD (95% credible interval). Correlation is reported as posterior mean (95% credible interval).
In the augmented dataset, results from the UME model were very similar to those from the MBNMA model (E max model 4). Posterior mean residual deviance was 374.0 for the UME model compared with 370.3 for the MBNMA model, whilst the between-study SD for the reference treatment effect for ET 50 was almost identical for the UME (0.69 (95%CrI: 0.43, 1.18)) and MBNMA (0.69 (95%CrI: 0.44, 1.18)) models. There is therefore no evidence to invalidate the consistency assumption. A node-splitting model was fitted for the two closed loops of treatments in the network that comprised independent data sources, giving two comparisons on which to node split (Figure S3B, Supporting Information).
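Node-splitting compares the posterior distributions of the direct and indirect estimates of the same treatment contrast. A minimal sketch of the two-sided Bayesian p-value computed from posterior samples (the function name is ours; the actual node-split model is given in the Supporting Information code):

```python
import numpy as np

def node_split_pvalue(direct, indirect):
    """Two-sided Bayesian p-value for a node-split: twice the smaller tail
    probability of the posterior of (direct - indirect)."""
    diff = np.asarray(direct) - np.asarray(indirect)
    p_gt = np.mean(diff > 0)
    return 2 * min(p_gt, 1 - p_gt)

rng = np.random.default_rng(0)
# Consistent evidence: direct and indirect posteriors overlap heavily
consistent = node_split_pvalue(rng.normal(-3, 1, 5000), rng.normal(-3, 1, 5000))
# Inconsistent evidence: opposite directions of effect
inconsistent = node_split_pvalue(rng.normal(-3, 1, 5000), rng.normal(3, 1, 5000))
print(consistent > 0.5, inconsistent < 0.01)  # -> True True
```

Small p-values indicate that the direct and indirect posteriors place their mass in different regions, flagging possible inconsistency on that contrast.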

| Model checking
For all models for which results are reported, Gelman's r statistic and visual inspection of the chains were indicative of convergence. Convergence diagnostic plots for the final model (E max model 4) and for a nonconverging "ideal" E max model with no simplifying assumptions on ET 50 are given in the Supporting Information. Posterior densities were not constrained or strongly influenced by the priors. In the final model (E max model 4), the SD for the exchangeable study reference treatment effect was 0.64 (95%CrI: 0.39, 1.15) when using a uniform (U(0,5)) prior and 0.64 (95%CrI: 0.39, 1.16) when using a half-normal (N(0,6.25)) prior (Figure S4, Supporting Information). For the random E max treatment effects model (E max model 5), the between-study SD for random E max treatment effects was 0.09 (95%CrI: 0.00, 0.23) when using either a uniform (U(0,5)) or a half-normal (N(0,6.25)) prior, suggesting the choice of prior is not influential.

| DISCUSSION
We have presented a method to pool evidence from trials that form a network of comparisons across multiple treatments, reporting continuous summary outcome measures over multiple time points. The method respects the randomisation in RCTs, can include a variety of different functional forms for the time-course, allows for testing of consistency of the evidence, and demonstrates how a multivariate likelihood can be used to account for residual correlation between time points.
In the pain in osteoarthritis dataset, we found that the E max model provided the best fit and allowed for the greatest degree of flexibility, both in the time-course shape and in the specification of various time-course parameters (E max and ET 50 ). The E max model with the lowest DIC (E max model 4) assumed a fixed effect across all nonplacebo treatments on ET 50 (Equation 12). In this model, we estimated AUC of the time-course function over 24 weeks follow-up and found Etoricoxib 60 mg/d to have the highest ranked AUC.
Whilst WOMAC pain in this dataset was measured as a continuous outcome, MBNMA can be performed on any data provided they can be summarised as a continuous outcome that can be assumed to have a normal likelihood. This allows binary data (ie, the proportion of responders at each follow-up time) to be analysed using MBNMA if they are summarised as log-odds and are not near the boundary probabilities (0% or 100%).
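For example, a binary summary can be placed on the log-odds scale with an approximate standard error before entering the normal-likelihood model. A sketch (the normal approximation breaks down near 0% or 100% responders, as noted above):

```python
import math

def log_odds_summary(responders, n):
    """Convert a responder count to log-odds with an approximate (Woolf) SE.

    Valid as a normal approximation only away from 0% and 100%.
    """
    p = responders / n
    log_odds = math.log(p / (1 - p))
    se = math.sqrt(1 / responders + 1 / (n - responders))
    return log_odds, se

lo, se = log_odds_summary(30, 100)
print(round(lo, 3))  # -> -0.847
```

The resulting (log-odds, SE) pairs can then be analysed exactly as the continuous WOMAC summaries are in the main example.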

| Time-course function
Previous methods for performing NMA on longitudinal data have typically accounted for the dependency between different follow-up times by either modelling an appropriate time-course function 18,19 or using nonparametric approaches to account for responses at each time point. 21,34 To our knowledge, none of the methods for modelling longitudinal continuous data in NMA have described methods for testing inconsistency.
Jansen et al 19 used fractional polynomials to model a nonlinear trend, with an inflated variance for each time point to approximate correlations between observations. These allow for a very flexible shape for modelling the time-course relationship. However, fractional polynomials are difficult to interpret and may not have a clear biological justification, making them less desirable for use in pharmacometric studies, where the objective is to define how the efficacy of a treatment changes over time rather than to simply account for it. For modelling time-course in pharmacometrics, exponential and E max functions are typically used, as these have a biological basis in describing the mechanism of drug action and elimination from the body. 48 Taking this into account, Ding and Fu 18 modelled an exponential shape in NMA and described how their model could be adapted for a sigmoid shape similar to that of an E max function.
Nonparametric models that do not specify a particular time-course relationship have also been proposed, although these do not allow for any interpolation or extrapolation of treatment effects at unmeasured points in time, which makes them less applicable to drug development. Ishak et al 21 used a multivariate likelihood to account for the dependency between different time points. Dakin et al 34 used a saturated model to estimate treatment effects separately for different time bins, although for this method more data are required to reliably estimate the treatment effects for each bin.
One of the strengths of our MBNMA framework is that information on time-course characteristics can be inferred from other treatments or agents by assuming varying degrees of similarity, thus improving identifiability. However, sharing information on time-course parameters across a network assumes exchangeability, and it is important to be aware of the implications of this assumption and to consider whether it holds across different agents or classes of treatments.

| Correlation between time points
Having selected an appropriate time-course relationship in the pain in osteoarthritis dataset, we found that estimated residual correlation was reasonably high. This was in contrast to our expectation that explicitly modelling the time-course should have generated conditional independence. However, accounting for this correlation had only a slight impact on treatment effect estimates or 95% CrIs. This suggests that for MBNMA where the focus is on summary estimates, whilst accurately characterising the within-study correlation and covariance structure may be important, it is likely to be less critical than accurately characterising the time-course. With regards to the importance of modelling within-study correlation, there is some debate in the literature. A simulation study by Ishak et al 49 suggests that the impact of ignoring within-study correlation on treatment estimates may typically be small, even in cases where there has been no specific modelling of a time-course function. However, Riley 50 has shown that this is only the case when between-study variation is large relative to within-study variation, or when there are complete data with only small differences in the within-study covariance matrices across studies. Ahn and French 2 support this position, demonstrating that ignoring correlation in longitudinal MBMA led to inflated residual variance. We are currently performing a simulation study to further examine the relationship between time-course fit and correlation in MBNMA.
Note that the estimated covariance matrix in the multivariate likelihood will depend on the time-course model fitted, and the strength and importance of correlations between time points are also likely to depend considerably on how close together follow-up measurements are in time, with closer measurements expected to be more strongly correlated.
Previous longitudinal MBMA methods that account for correlated residuals have been developed in a frequentist framework, using NONMEM software to allow for modelling of interarm variability in addition to interstudy variability. 2 The authors used an exponential model for the time-course and also accounted for nonlinear dose-response in their model. Although we follow a Bayesian approach and our model is formulated somewhat differently, our multivariate model with a compound symmetry covariance structure is similar to their method. However, the key difference is that our approach respects randomisation and allows for inconsistency testing. Without these features, the methods are unlikely to meet the requirements of reimbursement agencies.

| Modelling assumptions
For our selected E max model in the pain in osteoarthritis dataset (E max model 4), assuming a fixed effect across all nonplacebo treatments for ET 50 , whilst allowing all treatment effects to be different for E max implies that the onset of action is the same for the different treatments relative to placebo, but that treatments differ in the maximal change in outcome achieved. In practice, this might be considered an unusual modelling assumption, as one might expect ET 50 to differ between active treatments, particularly for those acting via different biological pathways. However, the onset for all these treatments was very rapid, and there were insufficient observations at early time points to reliably estimate this. In fact, none of the included studies report WOMAC scores within a week from baseline. Given that this is later than the estimated ET 50 (approximately 0.6 weeks), it is not surprising that this parameter is difficult to estimate. For this example, we would therefore caution against making inferences at very early time points. Using informative prior distributions for the τ 2 ET 50 or for d ET 50 ;1;k parameters for which information is sparse may improve estimation. Information from noncomparative pharmacodynamic studies of different agents may be useful to provide support for specific prior distributions. Another approach may be to incorporate information from other treatments in a more biologically plausible manner, such as by using the dose-response relationships between treatments within an agent. 17 Explicitly modelling the bivariate correlation between E max and ET 50 may in some circumstances also provide additional information to help identify ET 50 and reduce the need for such strong simplifying assumptions. This is likely to be the case when correlation between E max and ET 50 is high. 
However, in the pain in osteoarthritis example, this was still not sufficient to identify ET 50 , even when alternative parameterisations for the covariance matrix were used. 31 Within time-course MBNMA, it is necessary to assume the same time-course function for the included set of treatments. Whilst the mean responses over time for most treatments supported the use of an E max function, there were only two observations (baseline and one follow-up measurement) for studies comparing Lumiracoxib (NR-dose not reported) vs Placebo and Duloxetine 90 mg/d vs Placebo.
For Lumiracoxib NR, it was reasonable to assume that the time-course function will be similar to that of other doses of Lumiracoxib, for which there are more observations. However, there are no other doses of Duloxetine with which to make an equivalent assumption, and as the mechanism of Duloxetine also differs from that of any other agent in the dataset, it may follow a different time-course function. Yet, as this treatment did not contribute any indirect evidence to the rest of the network (which could otherwise induce bias in other treatment estimates if modelled inappropriately) and there was no evidence to suggest a different time-course function would be applicable, we feel it is reasonable to assume a similar time-course function to the other treatments, provided treatment effect estimates for Duloxetine 90 mg/d are interpreted with caution. This case highlights the importance of understanding the underlying pharmacometrics of treatments in the data, and of dialogue between clinicians, pharmacometricians, and analysts.
An additional assumption made in all the E max models due to the inclusion of Lumiracoxib NR and Duloxetine 90 mg/d was that of an exchangeable placebo (Equation 12), as the lack of multiple follow-up measurements made separate estimation of all three parameters for the time-course (E 0 , ET 50 , and E max ) impossible for these comparisons. This is likely to have caused a certain degree of shrinkage and may therefore induce bias in treatment estimates for ET 50 due to back-propagation of the information on the reference treatment. 9

| Inconsistency
In NMA, two approaches are typically used for dealing with longitudinal studies. A single consistent time point may be used for analysis across studies, ignoring evidence from other time points. Alternatively, the final time point from studies with different follow-up times may be "lumped" together to allow for networks to be connected, yet this lumping can often be a source of inconsistency and/or heterogeneity. 16 Whilst MBNMA solves the issue of lumping together studies with different follow-up times or discarding information on multiple time points, the choice of model will affect the presence of inconsistency. We suspect that a poorly fitting time-course model may induce inconsistency. It is therefore important to explore different functional forms and identify a good model before testing for inconsistency.
For the purposes of drug development, the potential for inconsistency testing in MBNMA may in fact be relatively rare. The typical design of Phase II trials is multi-arm placebo-controlled, meaning that there are no closed loops of treatments that are not made up of multi-arm trials (as in the illustrative osteoarthritis dataset). As these trials must inherently be internally consistent, this provides no means to test for inconsistency. However, we are still relying on the consistency assumptions to make indirect comparisons, so although in these cases we cannot formally test for inconsistency it is important to consider whether these assumptions are valid.
For the pain in osteoarthritis augmented dataset, we did not find any evidence of inconsistency when including non-LOCF studies, and parameter estimates were robust to their inclusion. In practice, we would recommend careful consideration of inclusion criteria to ensure that only studies on which the consistency assumption is expected to hold are included. 13 It is worth noting that a standard NMA performed by "lumping" the latest time point in each of the studies (an approach frequently used, but not one that we would recommend) highlights the benefit of performing MBNMA when dealing with different follow-up times. In terms of DIC, a random treatment effects NMA was preferred over a fixed treatment effects NMA, with a nonzero between-study SD (0.25 (95%CrI: 0.17, 0.36)). The Bayesian p-value for the node-split of Celebrex 200 mg/d vs Naproxen 1000 mg/d was 0.011 and for Celebrex 200 mg/d vs Rofecoxib 25 mg/d was 0.100, suggesting evidence of inconsistency in both comparisons; this was particularly concerning given that the direct and indirect evidence for both showed opposite directions of effect. Therefore, by accounting for time-course using MBNMA, we have explained heterogeneity and inconsistency that can arise when using standard NMA methodology.

| Limitations
There are a few limitations to the methodology that we seek to investigate further in simulation studies. The first is that the quantity of data required for MBNMA may be significant, particularly for more complex time-course functions. This therefore means that analyses may typically require strong simplifying assumptions within the modelling that are difficult to test. This could relate to the assumption that time-course functions are the same across all treatments within the network, but also to the need to interpolate or extrapolate over the time-course when few observations are available for a particular treatment.
We also do not fully understand the importance of correctly accounting for the correlation between time points, and how failure to do this might affect estimates for different parameters in the model. We believe that the impact of this is likely to depend on the key parameters of interest in the model. Simulation can help to shed light on this issue.

| Future direction
In future work, we plan to incorporate simultaneous dose-response and time-course modelling into the MBNMA framework, 17 and to develop simulation studies to explore the robustness and data requirements of dose-response and time-course models, and of methods for assessing inconsistency in MBNMA. We are also developing an R package for MBNMA to facilitate its implementation.

| CONCLUSION
MBNMA combines the strengths of both MBMA and NMA, leading to a statistically robust framework for synthesising evidence on multiple treatments at multiple time points whilst preserving randomisation and allowing for assessment of consistency. By unifying these statistical techniques, the methods can provide both the information needed to inform drug-development decisions, and also the rigour required by reimbursement agencies to incorporate valuable evidence from drug development into the decision-making process.

| Highlights
Within drug development, MBMA is increasingly used to inform drug decisions such as whether to proceed to further clinical trials, and if so, what the design of the study should be. However, these types of analyses typically model each drug separately by pooling individual study arms, which breaks the randomisation and ignores within-study comparisons, effectively losing the advantages of the RCT design and potentially inducing bias in the resulting estimates.
Our MBNMA framework preserves randomisation by modelling relative effects and allows for testing of inconsistency between direct and indirect evidence. This manuscript extends previous methodology on dose-response MBNMA to allow the modelling of nonlinear time-course characteristics, incorporating multiple study time points and accounting for correlation between them. MBNMA combines techniques from two different disciplines, pharmacometrics and evidence synthesis, thereby acting as a bridge between early phase clinical research and Health Technology Appraisal.

| Glossary
Agent = an intervention/compound/drug
Identifiability = the capacity for parameters in a model to be reliably estimated
Treatment = a specific dose and agent combination