A matrix-based method of moments for fitting multivariate network meta-analysis models with multiple outcomes and random inconsistency effects

Random-effects meta-analyses are very commonly used in medical statistics. Recent methodological developments include multivariate (multiple outcomes) and network (multiple treatments) meta-analysis. Here we provide a new model and corresponding estimation procedure for multivariate network meta-analysis, so that multiple outcomes and treatments can be included in a single analysis. Our new multivariate model is a direct extension of a univariate model for network meta-analysis that has recently been proposed. We allow two types of unknown variance parameters in our model, which represent between-study heterogeneity and inconsistency. Inconsistency arises when different forms of direct and indirect evidence are not in agreement, even having taken between-study heterogeneity into account. However the consistency assumption is often assumed in practice and so we also explain how to fit a reduced model which makes this assumption. Our estimation method extends several other commonly used methods for meta-analysis, including the method proposed by DerSimonian and Laird (1986). We investigate the use of our proposed methods in the context of a real example.


Introduction
Meta-analysis, the statistical process of pooling the results from separate studies, is commonly used in medical statistics and now requires little introduction. The univariate random-effects model is often used for this purpose. This model has recently been extended to the multivariate (multiple outcomes; Jackson et al., 2011) and network (multiple treatments; Lu and Ades, 2004) meta-analysis settings. In a network meta-analysis, more than two treatments are included in the same analysis. The main advantage of network meta-analysis is that, by using indirect information contained in the network, more precise and coherent inference is possible, especially when direct evidence for particular treatment comparisons is limited. Here, we describe a new model that extends the random-effects modeling framework to the multivariate network meta-analysis setting, so that both multiple outcomes and multiple treatments may be included in the same analysis.
Other multivariate extensions of univariate methods for network meta-analysis have previously been proposed. For example, Achana et al. (2014) analyze multiple correlated outcomes in multi-arm studies in public health. Efthimiou et al. (2014) propose a model for the joint modeling of odds ratios on multiple endpoints. Efthimiou et al. (2015) develop another model that is a network extension of an alternative multivariate meta-analytic model that was originally proposed by Riley et al. (2008). A network meta-analysis of multiple outcomes with individual patient data has also been proposed by Hong et al. (2015) under both contrast-based and armbased parameterizations, and Hong et al. (2016) develop a Bayesian framework for multivariate network meta-analysis. These multivariate network meta-analysis models are based on the assumption of consistency in the network, extending the approach introduced by Lu and Ades (2004). In contrast to these previously developed methods, the method proposed here relaxes the consistency assumption. This assumption is sometimes found to be false across the entire network (Veroniki et al., 2013). We model the inconsistency using a design-by-treatment interaction, so that different forms of direct and indirect evidence may not agree, even after taking between-study heterogeneity into account. However, we assume that the design-by-interaction terms follow normal distributions, and so conceptualize inconsistency as another source of random variation. This allows us to achieve the dual aim of estimating meaningful treatment effects while also allowing for inconsistency in the network.
Although we allow inconsistency in the network, we propose a relatively simple model. Our preference for a simple model is because the between-study covariance structure is typically hard to identify accurately in multivariate meta-analyses (Jackson et al., 2011) and also because network meta-analysis datasets are usually small (Nikolakopoulou et al., 2014). The new model that we propose for multivariate network meta analysis is a direct generalization of the univariate network meta-analysis model proposed by Jackson et al. (2016), which is a particular form of the design-by-treatment interaction model (Higgins et al., 2012). In addition to proposing a new model for multivariate network meta-analysis, we also develop a corresponding new estimation method. This estimation method is based on the method of moments and extends a wide variety of related methods. In particular, we extend the estimation method described by DerSimonian and Laird (1986) by directly extending the matrix based extension of DerSimonian and Laird's estimation method for multivariate meta-analysis (Chen et al., 2012;Jackson et al., 2013). We adopt the usual two-stage approach to meta-analysis, where the estimated study-specific treatment effects (including the within-study covariance matrices) are computed in the first stage. We give some information about how this first stage is performed but our focus is the second stage, where the meta-analysis model is fitted.
The article is set out as follows. In Section 2, we briefly describe the univariate model for network meta-analysis to motivate our new multivariate network meta-analysis model in Section 3. We present our new estimation method in Section 4 and we apply our methods to a real dataset in Section 5. We conclude with a short discussion in Section 6.

A Univariate Network Meta-Analysis Model
Here, we describe our univariate modeling framework for network meta-analysis Law et al., 2016). Without loss of generality, we take treatment A as the reference treatment for the network meta-analysis. The other treatments are B, C, etc. We take the design d as referring only to the set of treatments compared in a study. For example, if the first design compares treatments A and B only, then d = 1 refers to two-arm studies that compare these two treatments. We define t to be the total number of treatments included in the network, and t d to be the number of treatments included in design d. We define D to be the number of different designs, N d to be the number of studies of design d, and N = D d=1 N d to be the total number of studies. We will use the word "contrast" to refer to a particular treatment comparison or effect in a particular study, for example, the "AB contrast" in the first study.
We model the estimated relative treatment effects, rather than the average outcomes in each arm, and so perform contrast based analyses. We define Y di to be the c d × 1 column vector of estimated relative treatment effects from the ith study of design d, where c d = t d − 1. We define n d = N d c d to be the total number of estimated treatment effects that design d contributes to the analysis, and n = D d=1 n d to be the total number of estimated treatment effects that contribute to the analysis. To specify the outcome data Y di , we choose a baseline treatment in each design d. The entries of Y di are then the estimated effects of the other c d treatments included in design d relative to this baseline treatment. For example, if we take d = 2 to indicate the "CDE design" then c 2 = 2. Taking C as the baseline treatment for this design, the two entries of the Y 2i vectors are the estimated relative effects of treatment D compared to C and of treatment E compared to C. For example, the entries of the Y di could be estimated log-odds ratios or mean differences.
We use normal approximations for the within-study distributions. We define S di to be the c d × c d within-study covariance matrix corresponding to Y di . We treat all S di as fixed and known in analysis. Ignoring the uncertainty in the S di is acceptable provided that the studies are reasonably large and is conventional in meta-analysis, but this approximation is motivated by pragmatic considerations because this greatly simplifies the modeling. We do not impose any constraints on the form of S di other than they must be valid covariance matrices. The lead diagonal entries of the S di are within-study variances that can be calculated using standard methods. Assuming that the studies are composed of independent samples for each treatment, the other entries of the S di are calculated as the variance of the average outcome (for example the log odds or the sample mean) of the baseline treatment.
We define δ AB 1 , δ AC 1 , · · · , δ AZ 1 , where Z is the final treatment in the network, to be treatment effects relative to the reference treatment A, and call them basic parameters (Lu and Ades, 2006). We use the subscript 1 when defining the basic parameters to emphasize that they are treatment effects for the first (and in this section, only) outcome. We define c = t − 1 to be the number of basic parameters in the univariate setting. Treatment effects not involving A can be obtained as linear combinations of the basic parameters and are referred to as functional parameters (Lu and Ades, 2006). For example, the average treatment effect of treatment E to treatment C, δ CE 1 = δ AE 1 − δ AC 1 , is a functional parameter. We define the c × 1 column vector δ = (δ AB 1 , δ AC 1 , · · · , δ AZ 1 ) T and design specific c d × c design matrices Z (d) . We use the subscript (d) in these design matrices to emphasis that they apply to each individual study of design d; we reserve the subscript d for design matrices that describe regression models for all outcome data from this design. If the ith entry of the Y di are estimated treatment effects of treatment J relative to the reference treatment A then the ith row of Z (d) contains a single nonzero entry: 1 in the (j − 1)th column, where j is the position of J in the alphabet. If instead the ith entry of the Y di are estimated treatment effects of treatment J relative to treatment K, K = A, then the ith row of Z (d) contains two nonzero entries: 1 in the (j − 1)th column and −1 in the (k − 1)th column.
Our univariate model for network meta-analysis is , all di , d , and di are independent, and P c d is the c d × c d matrix with ones on the leading diagonal and halves elsewhere. We refer to τ 2 β and τ 2 ω as the between-study variance, and the inconsistency variance, respectively. The term di is a study-by-treatment interaction term that models betweenstudy heterogeneity. The model di ∼ N(0, τ 2 β P c d ) implies that the heterogeneity variance is the same for all contrasts for every study regardless of whether or not the comparison is relative to the baseline treatment (Lu and Ades, 2004). Other simple choices of P c d , such as allowing the off-diagonal entries to differ from 0.5, violate this symmetry between treatments. For example, in the case d = 2 indicating the CDE design, the between-study variances for the CD and CE effects in this study are given by the two diagonal entries of τ 2 β P c d , which are both τ 2 β . The between-study variance for the effect of E relative to D is (−1, 1)τ 2 β P c d (−1, 1) T , which is also τ 2 β . The d are design-by-treatment interaction terms that model inconsistency in the network. The model d ∼ N(0, τ 2 ω P c d ) implies that the inconsistency variance is the same for all contrasts for every design; other simple choices of P c d violate this symmetry.
To describe all estimates from all studies, we stack the Y di from the same design to form the n d × 1 column vector Jackson et al. (2016) then use three further matrices that we also define here because they will be required to describe the estimation procedure that follows. The matrix M 1 is defined as a n × n square matrix where m 1ij = 0 if the ith and jth entries of Y , i, j = 1, · · · n, are estimates from different studies; otherwise m 1ii = 1, and m 1ij = 1/2 for i = j. The matrix M 2 is defined as a n × n square matrix where m 2ij = 0 if the ith and jth entries of Y , i, j = 1, · · · n, are estimates from different designs; otherwise m 2ij = 1 if the ith and jth entries of Y are estimates of the same treatment comparison (for example, treatment A compared to treatment B) and m 2ij = 1/2 if these entries are estimates of different treatment comparisons. The supplementary materials show a concrete example showing how these two matrices are formed. Jackson et al. (2016) also define a n × c univariate design matrix Z, where if the ith entry of Y is an estimated treatment effect of treatment J relative to the reference treatment A then the ith row of Z contains a single nonzero entry: 1 in the (j − 1)th column, where j is the position of J in the alphabet. If instead the ith entry Y is an estimated treatment effect of treatment J relative to treatment K, K = A, then the ith row of Z contains two nonzero entries: 1 in the (j − 1)th column and −1 in the (k − 1)th column. Defining S d = diag(S d1 , · · · , S dN d ), and then S = diag(S 1 , · · · , S D ), model (1) can be presented for the entire dataset as

A Multivariate Network Meta-Analysis Model
We now explain how to extend the univariate model in Section 2 to the multivariate setting to handle multiple outcomes. We define p to be the number of outcomes, and so the dimension of the network meta-analysis, so that we now consider the case where p > 1. The Y di are now pc d × 1 column vectors, where the Y di contain c d column vectors of length p. For example, in a p = 5 dimensional meta-analysis and continuing with the example where d = 2 indicates the CDE design, we have c 2 = 2. The Y 2i are then 10 × 1 column vectors where, taking C as the baseline treatment for this design, the first five entries of the Y 2i are estimated relative treatment effect of D compared to C and the second five entries are the same estimate of E compared to C. We define the pc × 1 column vector δ = . . , δ AZ p ) T , so that this vector contains the basic parameters for each outcome in turn. When p = 1 the vector δ reduces to its definition in the univariate setting, as given in Section 2.
We define β and ω to be p × p unstructured covariance matrices that are multivariate generalizations of τ 2 β and τ 2 ω . These two matrices contain the between-study variances and covariances, and the inconsistency variances and covariances, respectively, for all p outcomes. We refer to β and ω as the between-study covariance matrix, and the inconsistency covariance matrix, respectively. We continue to treat the within-study covariance matrices S di as if fixed and known in analysis but these are now pc d × pc d matrices. The entries of the S di matrices that describe the covariance of estimated treatment effects for the same outcome can be obtained as in the univariate setting. However, the other entries of S di , that describe the covariance between treatment effects for different outcomes, are harder to obtain in practice. A variety of strategies for dealing with this difficulty have been proposed (Jackson et al., 2011;Wei and Higgins, 2013).

The Proposed Multivariate Model for Network
Meta-Analysis In the multivariate setting, to allow correlations between estimated treatment effects for different outcomes, both within studies and designs, we propose that model (1) is generalized to , where all di , d , and di are independent, and ⊗ is the Kronecker product. The random di and d continue to model between-study heterogeneity, and inconsistency, respectively. Recalling that δ contains the basic parameters for each outcome in turn, the design matrices X (d) provide the correct linear combinations of basic parameters to describe the mean of all estimated treatment effects in Y di . Model (2) reduces to model (1) in one-dimension. The definition of P c d means that β and ω are the between-study covariance matrix, and inconsistency covariance matrix, for all contrasts. We continue define Y as in the univariate setting, where Y contains n column vectors of estimated treatment effects that are of length p, so that Y is a np × 1 column vector in the multivariate setting. We define the multivari- (2) can be presented for the entire dataset as where we continue to define S as in the univariate case. Matrices M 1 and M 2 are the same as in the univariate setting, and so continue to be n × n matrices. Model (3) is a linear mixed model for network meta-analysis and is conceptually similar to other models of this type (Piepho et al., 2012).
If ω = 0 then all d = 0 and there is no inconsistency; we refer to this reduced model as the "consistent model." If both β = 0 and ω = 0 then all studies estimate the same effects to within-study sampling error and we refer to this model as the "common-effect and consistent model." Missing data (unobserved entries of Y) are common in applications as not all studies may provide data for all outcomes and contrasts. When there are missing outcome data, the model for the observed data is the marginal model for the observed data implied by (3), where any rows of Y that contain missing values are discarded. We will use a non-likelihood based approach for making inferences and so assume any data are missing completely at random (Seaman et al., 2013). We define the diagonal np × np missing indicator matrix R, where

Multivariate Estimation: A New Method of Moments
Our estimation procedure is motivated by the univariate method proposed by DerSimonian and Laird (1986). This was developed in the much simpler setting where each study provides a single estimate Y i , and where the random-effects is the pooled estimate under the commoneffect model (τ 2 = 0). Now consider an alternative representation of this Q statistic. Taking Y = (Y 1 , · · · , Y n ) T , S = diag(S 1 , · · · , S n ), and W = S −1 means that Q = tr(W(Y −Ŷ)(Y −Ŷ) T ), whereŶ is obtained under the common-effect model. To obtain a p × p matrix generalization of Q for multivariate analyses, we replace the trace operator with the block trace operator in this expression . The block trace operator is a generalization of the trace that sums over all n of the p × p matrices along the main block diagonal of an np × np matrix. This produces a p × p matrix. In the absence of missing data we can write our multivariate generalization of the Q statistic, btr(W(Y −Ŷ)(Y −Ŷ) T ), as a weighted sum of outer products of p × 1 vectors of residuals under the common-effect and consistent model. Hence, the distribution of btr(W(Y −Ŷ)(Y −Ŷ) T ) depends directly on the magnitudes of unknown variance components.

A Q Matrix for Multivariate Network Meta-Analysis
We define a within-study precision matrix W corresponding to S. If there are no missing outcome data in Y then we define W = S −1 , where S is taken from model (3). If there are missing data in Y then the entries of W that correspond to observed data are obtained as the inverse of the corresponding entries of the within-study covariance matrix of reduced dimension (equal to that of the observed data) and the other entries of W are set to zero. For example, consider the case where Y is a 6 × 1 vector but only the second and fifth entries are observed; this corresponds to much less outcome data than would be used in practice but provides an especially simple example. Then we define S r , where the subscript r indicates a dimension reduction, as a 2 × 2 matrix whose entries are the within-study variances and covariances of the two observed entries of Y. The 6 × 6 precision matrix W then has all zero entries in the first, third, fourth, and sixth rows and columns. However, the remaining entries of W are the entries of the 2 × 2 matrix S −1 r , so that W 22 = (S −1 r ) 11 , W 25 = (S −1 r ) 12 , W 52 = (S −1 r ) 21 , and W 55 = (S −1 r ) 22 . We defineŶ to be the fitted value of Y under the common-effect and consistent model ( β = ω = 0), so thatŶ = HY where H = X(X T WX) −1 X T W. We also define an asymmetric np × np matrix Our definitions of W and R mean that WR = W, which results in the simplified version of Q in (4). From the first form given in (4), we have that the residuals Y −Ŷ are pre-multiplied by R, so that any residuals that correspond to missing outcome data do not contribute to Q. Furthermore, missing outcome data do not contribute toŶ because they have no weight under the common-effect and consistent model. Hence, we can impute missing outcome data with any finite value without changing the value of Q. This is merely a convenient way to handle missing data numerically and has no implications for the statistical modeling.

Design Specific Q Matrices for Multivariate Network Meta-Analysis
In order to identify the full model, we will require designspecific versions of Q that only use data from a particular design. As in the univariate setting, we stack the outcome data from design d to form the vector In the multivariate setting, the vector Y d contains n d estimated effects each of length p, so that Y d is now a pn d × 1 column vector. We define the design specific n d × n d matrix M d 1 , where m d 1ij = 0 if the ith and jth estimated effect (of length p) in Y d , i, j = 1, · · · , n d , are from separate studies; otherwise m d 1ii = 1 and m d 1ij = 1/2 for i = j. We define the pn d × pc d design matrix X d which is obtained by stacking identity matrices of dimension pc d , where we include one such identity matrix for each study of design d. Hence, X d = 1 N d ⊗ I pc d , where 1 N d is the N d × 1 column vector where every entry is one. We also define the pc d × 1 column vector An identifiable design-specific marginal model for outcome data from design d only, that is implied by model (2), is where S d = diag(S d1 , · · · , S dN d ). We can also calculate design specific versions of (4) where we calculate all quantities, including the fitted values, using just the data from studies of design d. We define these pn d × pn d design specific matrices as where W d , R d , andŶ d in (6) are defined in the same way as W, R andŶ in (4) but where only data from design d are used. Hence, R d and W d are the missing indicator matrix, and the within-study precision matrix, of Y d , respectively. We When computing H d we take the matrix inverse to be the Moore-Penrose pseudoinverse (Searle, 1971). This is so that any design-specific regression corresponding to this hat matrix that is not fully identifiable (due to missing outcome data) can still contribute to the estimation. We use model (5) to derive the properties of Q d in equation (6).

The Estimating Equations
We base our estimation on the two p × p matrices btr(Q) and D d=1 btr(Q d ), where Q and Q d are given in (4) and (6), respectively. Specifically, we match these quantities to their expectations to estimate the unknown variance parameters using the method of moments.

Evaluating E[btr(Q)] and deriving the first estimating equation.
We define A = (I np − H) T W and B = (I np − H) T R, which are known np × np matrices. We also divide the matrices A and B into n 2 blocks of p × p matrices, and write A i,j and B i,j , i, j = 1, · · · n, to mean the ith by jth blocks of A and B respectively. Hence, A i,j and B i,j are both p × p matrices. In the supplementary materials, we show that We apply the vec(·) operator to both sides of the previous equation and use the identity vec(AXB) = (B T ⊗ A)vec(X) (see Henderson and Searle, 1981) Upon substituting E[btr(Q)] = btr(Q), β =ˆ β and ω =ˆ ω in equation (7), the method of moments gives one estimating equation in the vectorized form of two unknown covariance matrices.

Evaluating E[btr(Q d )]
and deriving the second estimating equation. Model (5) depends upon one unknown covariance matrix, β . The intuition is that, upon using all D of the Q d matrices in (6) and the method of moments to estimate β , we will then be able to estimate the other unknown covariance matrix ω using the first estimating equation. We define design specific btr(Q d ) and β =ˆ β in (8), we obtain a second estimating equation from the method of moments.

Solving the Estimating Equations and Performing Inference
We solve the estimating equation resulting from (8) for vec(ˆ β ) and substitute this estimate into the estimating equation resulting from (7) and solve for vec(ˆ ω ).

4.4.1.
Estimating Σ β under the consistent model. Some applied analysts may prefer to assume the consistent model ( ω = 0). As in the univariate case , we have two possible ways of estimating β under the consistent model: we can use the estimating equation resulting from (7) with ω = 0 or the estimating equation resulting from (8) as in the full model. Also as in the univariate case, we suggest the former option because it uses the information made by assuming consistency when estimating β . However, this first option is valid only under the consistent model.

"
Truncating" the estimates of the unknown covariance matrices so that they are symmetric and positive semi-definite. As in the univariate case, there is the problem that the point estimates of the two unknown covariance matrices are not necessarily positive semi-definite. The method of moments does not even initially enforce the constraint that the point estimates of the unknown covariance matrices are symmetrical (Chen et al., 2012;Jackson et al., 2013). We produce symmetric estimators corresponding to an estimated covariance matrix ofˆ as (ˆ T +ˆ )/2 (Chen et al., 2012;Jackson et al., 2013). This also corresponds to taking the average of estimates that result from our Q and Q d matrices and their transposes . We then write these symmetric estimators in terms of their spectral decomposition (Chen et al., 2012;Jackson et al., 2013) and truncate any negative eigenvalues to zero to provide the final symmetric positive semi-definite estimated covariance matrices. Specifically, we define the truncated estimate corresponding to the symmetricalˆ as where λ i is the ith eigenvalue of the symmetricˆ and e i is the corresponding normalized eigenvector.

Inference for δ.
Inference for δ then proceeds as a weighted regression where all weights are treated as fixed and known. WritingV as the estimated variance of Y in (3), in the absence of missing outcome data we haveδ = (X TV−1 X) −1 X TV−1 Y where Var(δ) = (X TV−1 X) −1 . In the presence of missing data we can, under our missing completely at random assumption, apply these standard formulae for weighted regression to the observed outcomes. Alternatively and equivalently, we can impute the missing outcome data in Y with an arbitrary value and replaceV −1 with the precision matrix corresponding toV, calculated in the way explained for S in Section 4.1 (Jackson et al., 2011). Approximate confidence intervals and hypothesis tests for all basic parameters for all outcomes then immediately follow by takingδ to be approximately normally distributed. Inferences for functional parameters follow by taking appropriate linear combinations ofδ.

Special Cases of the Estimation Procedure
In the supplementary materials, we show that the proposed method reduces to two previous methods in special cases. If all studies are two arm studies and consistency is assumed then the proposed method reduces to the matrix based method for multivariate meta-regression . The proposed multivariate method reduces to the univariate DerSimonian and Laird method for network meta-analysis  when p = 1.

Model Identification
If the necessary standard matrix inversions resulting from the estimating equations from (7) and (8) cannot be performed then both unknown variance components cannot be identified using the proposed method. A minimum requirement for any multivariate modeling is that the common-effect and consistent model must be identifiable. This means that there must be some information (direct or indirect) about each basic parameter for all outcomes. Two or more studies of the same design must provide data for all possible pairs of outcomes to identify β . Two or more studies of different designs must provide data for all possible pairs of outcomes to identify ω . If these conditions are satisfied then the model will be identifiable. In situations where our model is not identifiable we suggest that simpler models should be considered instead. Possible strategies for this include considering models of lower dimension or the consistent model. In practice, it is highly desirable to have more than the minimum amount of replication required, both within and between designs, so that the model is well identified. We make some pragmatic decisions in the next section for our example to provide sufficient replication within designs, in order to estimate β with reasonable precision.

Example
The methodology developed in this article is now applied to an illustrative example in relapsing remitting multiple sclerosis (RRMS). Multiple sclerosis (MS) is an inflammatory disease of the brain and spinal cord and RRMS is a common type of MS. The effectiveness of a new treatment is typically measured to assess its impact on relapse rate and odds of disease progression. Magnetic Resonance Imaging (MRI) allows measurement of the number of new or enlarging lesions in the brain. Three outcomes are included in our analyses, so that p = 3 in the full three-dimensional network meta-analysis. These three outcomes are: (i) the log rate ratio of new or enlarging MRI lesions; (ii) the log annualized relapse rate ratio; and (iii) log disability progression odds ratio. Relapse is defined as appearance of new, worsening or recurrence of neurological symptoms that can be attributable to MS, accompanied by an increase of a score on the Expanded Disability Status Scale (EDSS) and also functional-systems score(s), lasting at least 24 hours, preceded by neurologic stability for at least 30 days. Disability progression is defined as an increase in EDSS score that was sustained for 12 weeks, with an absence of relapse at the time of assessment. Negative basic parameters indicate that treatments B-F are beneficial compared to treatment A throughout. Data in this illustrative example were obtained from ten randomized controlled trials of six treatment options  (2) BD Relapse rate and disability progression only Mikol CD All three outcomes measured FREEDOMS 1 AEF All three outcomes measured FREEDOMS 2 AEF All three outcomes measured TRANSFORMS CEF All three outcomes measured Left-hand-side network corresponds to studies reporting the log annualized relapse rate ratio and log disability progression odds ratio (y 2 and y 3 ) for which data are complete. The right-hand-side network corresponds to studies reporting the log rate ratio of new or enlarging MRI lesions (y 1 which is not reported in four studies). The numbers shown on the network edges are the number of direct comparisons of each pair of treatments; the absence of an edge indicates that there is no direct comparison. Three of the thirteen studies are three arm trials which are each taken to provide three direct comparisons (a direct comparison between each treatment pair). Hence, there are 19 direct comparisons in the left-hand-side network where there is no missing data.
(coded in the network data as treatments A-F); placebo (A), interferon beta-1b (B), interferon beta-1a (C), glatiramer (D), and two doses of fingolimod; 0.5 mg (E) and 1.25 mg (F). Three trials of fingolimod were three-arm (two doses and a control) and are included as three-arm studies. Three trials of interferon beta (one 1a and two 1b) were three-arm (also two doses and a control), and these were included as separate two-arm trials (each dose against the control, with the number of participants in each control arm halved). This ignores the differences in doses of interferon beta and was a pragmatic decision to help provide an identifiable network. Briefly, in this example there is very little replication within designs, so that identifying β well is very difficult without making pragmatic decisions such as this. Sormani et al. (2010) also treat these particular studies as two separate studies in this way, which helps them to identify their meta-regression models. Treating these three studies as separate two-arm trials means that the data are analyzed as being from thirteen studies and a summary of the resulting data structure is shown in Table 1. There are eight different designs in Table 1 and so there is relatively little replication within designs, even when including three of the three-arm studies as separate two arm studies. Full details of the dataset that are relevant to this article are described in the supplementary materials and see also Bujkiewicz et al. (2016). Figure 1 provides network diagrams that show the number of comparisons between each pair of treatments on the edges. In these diagrams, the three arm studies (Table 1) are taken to contribute three comparisons, for example, the CEF study contributes CE, CF, and EF comparisons. Two estimates of treatment effect from this study contribute to analyses however because C is taken as the baseline; the study's estimated EF treatment effect contains no additional information once its CE and CF contrasts are included in the analysis.   Table 2 shows the estimates of the basic parameters (treatment effects relative to the reference treatment, placebo) obtained from univariate network meta-analyses, bivariate analyses for all three combinations of pairs of outcomes and the trivariate analysis. The results are similar across all analyses, and conclusions from univariate and multivariate analyses are the same. This is disappointing because multivariate analyses have not resulted in more precise inference. The entries ofˆ β andˆ ω are shown in Table 3. The positive estimates obtained for the unknown variance components suggest that this example exhibits some between-study heterogeneity and inconsistency. In order to assess the impact of the unknown variance components, we also fitted the consistent model and the common-effect and consistent model (results not shown) using all three outcomes (p = 3). On average, the standard errors of the fifteen basic parameters from the full model are 35% greater (range: 13-84%) than those from the consistent model, which in turn are 58% (range: 8-128%) greater than those from the common-effect and consistent model. Both the between-study heterogeneity and inconsistency have notable impact.
The multivariate analysis adds to the univariate analyses in two main ways. Firstly, the finding that the multivariate analysis is in good agreement with the univariate analyses is a particularly important finding for treatment effects on MRI where a substantial proportion of data were missing. It has been demonstrated by Kirkham et al. (2012) that a multivariate approach to meta-analysis can help obtain more accurate estimates in the presence of outcome reporting bias. Hence, the multivariate analysis reduces concerns that this univariate analysis is affected by reporting bias. Secondly joint inferences for all three outcomes are possible under the multivariate model. For example, and as we might anticipate, in our example the estimated log annualized relapse rate ratios and log disability progression odds ratios are highly positively correlated; from Var(δ) in our three-dimensional multivariate meta-analysis, the correlations between the five pairs of estimated basic parameters for these two outcomes are all between 0.63 and 0.75. Medical decision making based jointly on these two outcomes should take this high positive correlation into account, and this is only possible by using a multivariate approach. For example, a formal decision analysis involving these two outcomes should be based on their joint distribution rather than their two marginal distributions. In the supplementary materials, we perform a simulation study to further explore how the proposed methodology performs.

Discussion
We have proposed a new model for dealing with both multiple treatment contrasts and multiple outcomes, to provide a framework for conducting multivariate network meta-analysis. By using a matrix-based method of moments estimator, our methodology naturally builds on previous work (such as the well-known DerSimonian and Laird approach) and is computationally very fast, relative to other potential estimation approaches such as REML or MCMC; this is especially the case in very high dimensions and so our methodology is particularly advantageous for ambitious analyses of this type. The main disadvantage is that, as a necessary consequence of its semi-parametric nature, the method of moments is not based on sufficient statistics and so is not fully efficient. The loss in efficiency relative to maximum likelihood estimation awaits investigation but we anticipate that this will be less serious for inferences about the average effects than the unknown variance components. Furthermore, the within-study normal approximations used in our model are not necessarily very accurate even in moderately sized studies.
Since our analysis uses a general design matrix, the modeling may easily be extended by adding study level covariates to describe and fit multivariate network meta-regressions. In the network meta-analysis setting, these regressions have the potential to explain the reasons for inconsistency and model multiple dose level responses. Our method of moments estimation can be combined with approaches that "inflate" confidence intervals from a frequentist random effects metaanalysis (Hartung and Knapp, 2001;Jackson and Riley, 2014).
In conclusion, we have developed a new model and estimation method for multivariate network meta-analysis, which can describe multiple treatments and multiple correlated outcomes. An R function is available in the web supplementary materials that implements the proposed methodology.

Supplementary Materials
Web appendices referenced in Sections 2, 4, and 5 are available with this article at the Biometrics website on Wiley Online Library. Computing codes are also available.