Multivariate meta-analysis of mixed outcomes: a Bayesian approach

Multivariate random effects meta-analysis (MRMA) is an appropriate way for synthesizing data from studies reporting multiple correlated outcomes. In a Bayesian framework, it has great potential for integrating evidence from a variety of sources. In this paper, we propose a Bayesian model for MRMA of mixed outcomes, which extends previously developed bivariate models to the trivariate case and also allows for combination of multiple outcomes that are both continuous and binary. We have constructed informative prior distributions for the correlations by using external evidence. Prior distributions for the within-study correlations were constructed by employing external individual patent data and using a double bootstrap method to obtain the correlations between mixed outcomes. The between-study model of MRMA was parameterized in the form of a product of a series of univariate conditional normal distributions. This allowed us to place explicit prior distributions on the between-study correlations, which were constructed using external summary data. Traditionally, independent ‘vague’ prior distributions are placed on all parameters of the model. In contrast to this approach, we constructed prior distributions for the between-study model parameters in a way that takes into account the inter-relationship between them. This is a flexible method that can be extended to incorporate mixed outcomes other than continuous and binary and beyond the trivariate case. We have applied this model to a motivating example in rheumatoid arthritis with the aim of incorporating all available evidence in the synthesis and potentially reducing uncertainty around the estimate of interest. © 2013 The Authors. Statistics inMedicine Published by John Wiley & Sons, Ltd.


Introduction
When the synthesis of data from studies reporting multiple outcomes is required, multivariate random effects meta-analysis (MRMA) can be used instead of performing meta-analyses on each outcome separately, which has an advantage of taking into account the correlations between the outcomes. Methods for MRMA have been developed for a number of purposes, for example, to estimate multiple outcomes from clinical trials [1], to model relationships between surrogate endpoints [2], or to evaluate diagnostic tests [3]. Attention has mainly focussed on frequentist approaches, but methods, which use a Bayesian framework, have also been developed [2,4]. In a recent review, Jackson et al. [5] describe advances in the development of the methodology of multivariate meta-analysis and discuss the advantages and disadvantages of the use of these methods. One of the advantages of integrating data on multiple outcomes IPD is used to construct prior distributions for the within-study correlations between the HAQ and the alternative outcomes whereas external summary data (ESD) from a systematic review allows us to derive prior distributions for the between-study correlations.
The outline of this article is as follows. In Section 2, we introduce the motivating example and describe the data and the logic behind the meta-analysis model. Section 3 contains the details of trivariate meta-analysis model, including methods of constructing the prior distributions for the correlations. In Section 4, we briefly describe the implementation of these models in WinBUGS [18]. We include the results of applying the model to the RA example in Section 5, which is followed by the discussion in Section 6. We describe further technical details, including the description of the multivariate model for any number of outcomes, in the Appendices.

Systematic review: 'Lloyd data'
A systematic review and meta-analysis was carried out [17] to investigate the effectiveness of tumor necrosis factor-alpha inhibitors such as etanercept, infliximab, and adalimumab used as second line treatment in patients with RA. Standard instruments for measuring response to treatment in RA were considered: the HAQ and Disease Activity Score (DAS-28) measures, and the American College of Rheumatology (ACR) response criteria. The results of the meta-analyses of all three outcomes individually showed that the biologic interventions are effective when used sequentially. Data collected in this systematic review are used to investigate how multivariate meta-analysis can be applied to incorporate multiple outcomes, such as 20% response according to the ACR criteria (ACR20; a binary outcome used on log odds scale in MRMA) and changes from baseline of DAS-28 and HAQ scores (continuous outcomes), in evidence synthesis, which aims to estimate the change from baseline of the HAQ score. The estimate of the HAQ is of a particular interest to clinicians and decision-makers in health care as it is often used to estimate quality of life of patients following treatments of RA. Table I gives details of the three outcomes, which were reported in each of the studies within the systematic review. As indicated in the footnote of the table, standard errors (and occasionally the mean values) were often obtained from other measures or imputed. These unreported values could be treated as unknown in a Bayesian analysis. However, for the simplicity and exposition of this model, we chose to treat them as constant. The details of how these values were obtained can be found in Appendix 2 of the original meta-analysis of these data [17]. We will refer to these data as the Lloyd data throughout this paper.

External individual patient data
None of the studies included in the Lloyd data reported correlations between outcomes, whereas such study-level correlations are required to fully specify a model of correlated outcomes. These correlations cannot be obtained directly from summary data, such as the Lloyd data, and none of the studies listed the IPD; thus, external data were required to estimate the correlations.
The IPD was obtained from the British Rheumatoid Outcome Study Group (BROSG) trial, which was designed to assess the benefit of aggressive disease-modifying antirheumatic drug treatments in patients with established RA, conducted by Symmons et al. [19]. It was a randomized trial, which recruited 466 patients with stable RA. The trial assessed clinical outcomes (i.e., HAQ, DAS-28, and ACR20) in two cohorts of patients managed using either a regime focussed on symptomatic control of pain and stiffness in the shared care setting or a more aggressive regime focussing on control of symptoms and joint inflammation in the hospital setting.
The BROSG trial found no difference between the aggressive versus symptomatic treatment arms. Therefore, this data set may be used as a single cohort of 466 patients with established RA whose condition deteriorated modestly over a 3 year period. The data from the BROSG trial were used to construct prior distributions for the within-study correlations between outcomes in the multivariate models. Given the complexity of the models in this paper, only those 293 patients for whom all three clinical outcomes (HAQ, DAS-28, and ACR20) were reported were included in the analysis (to avoid further data imputation).

External summary data
In contrast to the within-study correlations, we can estimate the between-study correlations from the model using study level summary data. In a Bayesian framework, this could entail placing noninformative prior distributions on these correlations. However, one of the advantages of Bayesian approach we want to exploit here is the possibility of incorporating external information in the analysis. We can achieve this by using external information to construct informative prior distributions. To illustrate this, we conducted an ad hoc meta-analysis of ESD to obtain estimates of the between-study correlations, which can be used as prior distributions in our models (as described in Sections 3.4 and 3.5). The ESD included studies of the same type of treatment as in the Lloyd data but used as the first-line treatment. Supporting Information ‡ include further details with the full list of studies included in the ESD.

Logic of the meta-analysis model and notation
In our motivating example, we aim to model the summary data of the correlated outcomes from the Lloyd data using a multivariate meta-analysis in a Bayesian form. To do so, we need to place prior distributions on the within-study and the between-study correlations (which are not known in the Lloyd data). We use the IPD, described in Section 2.2, to construct the prior distributions for the within-study correlations and the ESD, described in Section 2.3, to construct the prior distributions for the betweenstudy correlations. Figure 1 illustrates this data structure and the role of each element within it. We use the external data to construct the prior distributions for the within-study and between-study correlations only. The remaining parameters of the model, such as the pooled effects and the between-study standard deviations, are given noninformative prior distributions [6]. Note that the external data set used in this example was not very large. However, in more general circumstances, the relevance and rigor of the external evidence can be taken into account. For example, the variance of the prior distribution can be adjusted to construct a less informative distribution [6,20]. In addition, when there are multiple external data sources, we can carry out a random effects meta-analysis. A number of authors have advocated ‡ Supporting information may be found in the online version of this article.

Trivariate random-effects meta-analysis
For the purpose of simplicity and direct link to the Lloyd data, the model presented here includes only three outcomes. The full multivariate model is described in Appendix A. Suppose that we have summary data available on at least one of three outcomes (Y 1 , Y 2 , and Y 3 ). To be able to combine data from studies reporting these outcomes, we can carry out a meta-analysis of the three outcomes simultaneously. If we assume that all three outcomes are normally distributed (i.e., they are continuous outcomes or binomial outcomes on the log scale as for example log odds of response), we can model them simultaneously in the TRMA model (by extending bivariate models discussed by van Houwelingen et al. [22] and Riley et al. [23]) as follows: 0 In the previous model, we assume outcomes Y 1i , Y 2i , and Y 3i to be estimates of correlated effects 1i , 2i , and 3i with corresponding within-study covariance matrices † i of the estimates. These studylevel effects follow a trivariate normal distribution with means .ˇ1;ˇ2;ˇ3/ and covariance T in this hierarchical framework. We will refer to (1) as the within-study model and (2) as the between-study model. Prior distributions need to be specified for missing data (which we describe in Section 3.1), the within-study correlations (constructed in Section 3.2), and the elements of the between-study covariance matrix. The latter we propose to construct for an alternative parameterization of the between-study model as described in Section 3.3 with the choice of prior distributions discussed in Section 3.4 and finally constructed in Section 3.5.
Note that the bivariate random-effects meta-analysis (BRMA) is a special case of TRMA in (1)-(2), which can be obtained simply by reducing the dimension of TRMA, as described in Appendix B. If all the within-study correlations in TRMA, that is, the within-study correlations 12 wi , 13 wi , and 23 wi , and the between-study correlations 12 b , 13 b , and 23 b are equal to zero, then the model (1)-(2) reduces to three univariate random-effects meta-analysis (URMA) models for Y 1i , Y 2i , and Y 3i : which in a Bayesian framework require specification of prior distributions such as the normal prior distribution for the mean effectˇj N.0; 1000/ and the half normal for the between-study standard deviation j N.0:0; 1000/I.0; / (where N. ; /I. ; / denotes the half normal distribution [6]). Examples of alternative prior distributions for URMA can be found elsewhere [24].

Missing data
In our model, we assume that not all the studies report all the outcomes Y 1 , Y 2 , and Y 3 ; that is, there are missing data for at least one of the outcomes for some of the studies. The advantage of conducting this meta-analysis in a Bayesian framework is that we give all the outcomes (whether known or missing) a distribution, which leads to these missing values being estimated directly from the model through the Markov Chain Monte Carlo (MCMC) simulation. Defining the trivariate model, as in (1), already provides distributions for missing values of Y 1i , Y 2i , and Y 3i . However, corresponding missing standard deviations 1i , 2i , and 3i still need to be estimated. By assuming exchangeability of the variances, we can assume the corresponding population variances (rather than the variances of the mean) to come from the same distribution, for example, and 2 1i D var 1i =N i , 2 2i D var 2i =N i , and 2 3i D var 3i =N i . By 'borrowing of information' from the studies reporting the Y 1i (Y 2i , Y 3i ) and their standard deviations, the variances (and hence standard deviations) for studies with missing Y 1i (Y 2i , Y 3i ) are predicted by the MCMC simulation, conditional upon both the data and the posterior estimates of the model parameters.

Prior distributions for the within-study correlations
Clinical studies very rarely report the correlations between clinical outcomes, that is, the within-study correlations 12 wi , 13 wi , and 23 wi ; therefore, in the model (1), we assume them to be unknown. In the Bayesian framework, however, we must give the correlations prior distributions. These prior distributions can be either noninformative (such as a uniform distribution over range 1 to 1) or more informative prior distributions potentially based on external information, for example, IPD containing observations on all outcomes from another, but relevant, study such as the IPD described in Section 2.2.
Suppose we have data available from an external study with individual participants for whom data on all three outcomes have been collected. We want to use IPD to construct prior distributions for the withinstudy correlations 12 wi , 13 wi , and 23 wi . If all the outcomes in IPD were continuous, we could assume that they follow a common trivariate normal distribution and modeling them this way would allow us to evaluate the within-study correlation directly from the covariance matrix. Often, however, outcomes are a mixture of continuous ones and others, for example, binary ones (as is the case of ACR20, in our motivating example, which in the summary data is taken as log odds of ACR20 response, but in IPD, it is a binary outcome). Then, we cannot assume that the three variables follow a common trivariate normal distribution (unlike in the case of aggregate data in (1), we are unable to estimate log odds of response for individual patients).
There are two ways of overcoming this problem of constructing the prior distributions for the withinstudy correlations in TRMA. One is to use an approximation where the correlation between the mean of the continuous variable and the log odds of the binary one equals the correlation between the continuous observation and the binary observation. Another approach is to carry out a double bootstrap analysis. We have chosen the latter as it may be applied to other types of mixed outcomes. Double bootstrap methods have been developed to, for example, estimate confidence intervals [15,25]. We have carried out a simulation, described in Appendix C, to show that this method can be applied to estimate summary statistics and the correlations between them.
We applied the double bootstrap method in the following way. From the IPD (containing observations on all three outcomes), we sampled N b1 D 500 bootstrap samples of size equal to that of the IPD, allowing for repetitions (first level bootstrap samples). From each of the bootstrap samples, we sampled another N b2 D 500 bootstrap samples (second level bootstrap samples). For each of the second level bootstrap samples, we calculated the mean values of the continuous variables and log odds of the binary ones. For the sets of summary statistics corresponding to each of the first level bootstrap samples, we then calculated the correlations between them. We used the simulated series of 500 sets of the three correlations (in the form of the 500 3 matrix) as a set of empirical prior distributions for the within-study correlations. In our main TRMA model, we sampled rows of this matrix (containing three correlations) to preserve the between-outcome correlation structure to ensure the nonsingularity of the within-study covariance matrices. In our example, the between-study variability exceeds the variability within the included studies; therefore, the estimates of the within-study correlations have little impact on the final results of TRMA [13], and hence, the same prior distributions for all within-study correlations in each of the studies could be used as in Nam et al [4] where the correlations do not vary with the study suffix i. However, in a more general scenario, this may not be appropriate. Therefore, for each of the studies in meta-analysis, we sample the three within-study correlations from different copies of the empirical prior distribution, thus allowing wi to have a different value for each study i, in every MCMC iteration.
Note that, as mentioned previously, if all three variables were continuous, it would be possible to estimate the within-study correlation directly (without the need for the bootstrap simulation). This is described for the bivariate version of the model in Appendix B.

Parameterization of the between-study model
For the between-study model (2), we need to specify prior distributions for the between-study variances 2 1 , 2 2 , and 2 3 , and the between-study correlations 12 b , 13 b , and 23 b . One approach, often adopted, is to give the inverse of the covariance matrix in (2), that is, the precision matrix T 1 , a Wishart prior distribution [26]. An alternative approach is to parameterize the between-study model (2) in the product normal formulation, that is, in the form of a product of a series of univariate conditional normal distributions [27]. We chose this latter approach as it is more intuitive and allows us to use an explicit formulation of prior distributions for 12 b , 13 b , and 23 b on the basis of external evidence. This approach is easier to implement, however, it requires making an additional assumption of independence of Y 2 and Y 3 conditional on Y 1 . This assumption means that elements .2; 3/ and .3; 2/ of the precision matrix are zero as the partial correlation coefficient between Y 2 and Y 3 is zero [28]. We express this formulation as follows 8 which can then be used to estimate the parameters of the between-study model (2): 8 < :ˇ1 Note that the variable 1i in (5) has been centered to avoid high autocorrelation in the MCMC simulation.

Choice of the prior distributions for the between-study correlations
The formulae in (6) show the interdependencies between the parameters (i.e., the correlations, regression coefficients, and the standard deviations). Because they are inter-related, placing prior distributions on such parameters requires caution to ensure that they are plausible and realistic. For example, placing noninformative prior distributions on the standard deviations 1 , 2 , and 3 and the regression coefficients  Figure 2. Examples of constructing prior distribution for the between-study correlation using independent noninformative prior distributions for the standard deviations and the regression coefficient, which can lead to implausible prior distribution for the correlation. 20 , 21 , 30 , and 31 in (5) may appear to be an appropriate approach; however, it can result in inadequately constructed prior distributions for the between-study correlations, 12 b , 13 b and 23 b , as shown in Figure 2 for the correlation between Y 1 and Y 2 , 12 b . For example, using a normal prior distribution for the regression coefficient 21 and an independent half normal distributions for the between-study standard deviations 1 and 2 can lead to a W-shaped prior distribution for the correlation (top row in Figure 2), whereas using a normal prior distribution for the regression coefficient and independent uniform distributions for the between-study standard deviations can lead to a U-shaped prior distribution for the correlation (bottom row in Figure 2). This choice of the hyper parameter for the prior distribution leads to sampling from extreme values of the correlation, also leading to poor convergence.
We found that in this case, it is more suitable to give the correlations an appropriate prior distribution first. Defining them together with the prior distributions for the between-study standard deviations 1 , 2 , and 3 enables evaluation of the regression coefficients 21 and 31 from (6) Figure 3 shows examples of such parameterizations of the prior distributions. Using an informative prior distribution for the between-study correlation (such as a normal prior distribution for the Fisher transformation of a correlation) and 'vague'; for example, half normal distributions for the standard deviations 1 and 2 gives a more plausible implied prior distribution for the regression coefficient 21 (top row in Figure 3). Another possible approach is to use the vague uniform prior distribution for the correlation as well as the between-study standard deviations, which also gives a plausible prior distribution for the regression coefficient (bottom row in Figure 3). Placing prior distributions first on 12 b and 13 b and on the between-study standard deviations 1 , 2 , and 3 and then estimating the implied prior distributions for the coefficients 21 and 31 largely improves the convergence of all the parameters, that is, the between-study correlations 12 b and 13 b and the coefficients 21 and 31 . We give the coefficients 20 and 30 noninformative prior distributions as they do not affect the correlations.

Construction of the prior distributions for the between-study correlations
We can use external data to obtain the prior distributions for the between-study correlations. We can obtain such data, for example, from a systematic review of studies (i.e., involving patients with the same condition and under comparable treatment) reporting all outcomes such as the ESD introduced in Section 2.3. A trivariate meta-analysis (1)-(2) is carried out on the three outcomes in ESD simultaneously in the same way as the main TRMA described in this section. Here, however, we choose to include only studies reporting all three outcomes; hence, there will be no missing data. We use vague prior distributions for the within-study and between-study correlations and the between-study standard deviations (obtaining informative prior distributions for these parameters would require another external data meta-analysis leading to an infinite process). We can use directly the posterior between-study correlations from this ESD analysis, 12 ESD and 13 ESD , as the prior distributions for 12 b and 13 b in (7). In the MCMC simulation of the main TRMA model, we sampled pairs of 12 b and 13 b (thinned out to 500 pairs) to preserve the correlation structure between the conditionally dependent variables.
Having defined all prior distributions, we can obtain relevant estimates (posteriorˇ1,ˇ2, andˇ3 and the corresponding variances and correlations) from MCMC of the trivariate hierarchical model (defined in (1) and (2)).

Implementation in WinBUGS
We implemented all models in WinBUGS [18] where the estimates were obtained using MCMC simulation using 100,000 iterations (including 50,000 burn-in). We checked convergence by visually assessing the history, chains, and autocorrelation using graphical tools in WinBUGS and using the Geweke method in the BOA package [29]. We present all posterior estimates as means with the 95% highest probability density intervals (HPDIs) except for the estimates displayed in the forest plots, which are means with the 95% credible intervals (CrI).

Results
We have applied the TRMA model to the Lloyd data to simultaneously model data on the HAQ, DAS-28, and ACR20 (with the IPD and ESD used to construct the prior distributions for the within-study and the between-study correlations between the outcomes, as described in Sections 3.2 and 3.5). In the Lloyd data, the HAQ and DAS-28 estimates represent the mean change from baseline of the outcomes, and ACR20 is a proportion of responders, which is transformed to log odds of response to be able to follow the normal distribution in TRMA. To investigate the impact of including more data in the analysis on uncertainty around the HAQ estimate, we explored results of the meta-analyses on three levels: using URMA of HAQ (as in (3)), bivariate random-effects meta-analysis (BRMA) combining the HAQ and the DAS-28 (as in (B1)-(B2) in Appendix B), and finally, TRMA by extending the data by ACR20. Table II lists the parameters for the prior within-study and between-study correlations used in both BRMA and TRMA. Table III shows the results obtained from all the models.

Results for bivariate random-effects meta-analysis of Health Assessment Questionnaire and Disease Activity Score
We carried out BRMA of HAQ and DAS-28 with the prior distributions for the within-study correlations estimated from the IPD (methods described in Appendix B) and for the between-study correlation estimated from the ESD (as in Section 3.5). The BRMA allowed us to include 10 cohorts (from the eight studies reporting DAS-28 but not HAQ in Table I) in addition to those eight that could be used in URMA of HAQ (studies reporting HAQ in Table I). The posterior correlation between DAS-28 and HAQ obtained from the IPD (used as a prior distribution for the within-study correlation in each study, as discussed in Section 2.2) was relatively weak, with mean das;haq The HAQ estimate shifted toward a more extreme value when using BRMA (change in HAQ from baseline shifts from 0.25 in URMA to 0.28 in BRMA) with uncertainty reduced by 25% of the width of the 95% HPDI compared with the interval from URMA. The estimate of DAS-28, however, moved to a less extreme value when using BRMA (from 1.57 to 1.51), with reduced uncertainty. Adding more data to the meta-analysis by using BRMA reduced uncertainty but not heterogeneity; the between-study variances H and D remain almost the same, with only reduced uncertainty for H . However, the direct comparison of the variances with those from URMA is confounded by the different prior distributions being used: half normal prior distributions are used for H and D in URMAs, whereas in BRMA, half normal prior distributions are used for H and D ( 1 and 2 in (5) for TRMA); hence, the implied probabilities for H and D are different. Figure 4 shows three forest plots representing estimates of the HAQ from URMA (left) and BRMA (middle), and DAS-28 from BRMA (right). As in all forest plots (Figures 4 and 5), black solid lines correspond to the 'shrunken' estimates H i and pooled estimatesˇH , the grey solid lines show the estimates obtained from the systematic review (data used in this meta-analysis), and the grey dashed lines (estimates also marked with a on their right) correspond to the predicted estimates for the studies that did not report the HAQ but reported the DAS-28 (or reverse in plots for the DAS-28). Dashed and dotted black lines below the pooled estimates represent the pooled estimates obtained from meta-analyses of reduced numbers of outcomes, for example, estimates from URMA below the estimates obtained from BRMA for comparison.
In the forest plot of the estimates from URMA, the estimates are shrunken toward the mean, which is especially noticeable for those studies with higher uncertainty around the known estimates (i.e., Bennet, Haroui, and Iannone).    Forest plots for estimates of (from left to right) 20% response according to the American College of Rheumatology (ACR) criteria, Disease Activity Score (DAS-28), and Health Assessment Questionnaire (HAQ) from trivariate random-effects meta-analysis (TRMA) of HAQ, DAS-28, and ACR20. Graph shows estimates from the systematic review with 95% confidence intervals (grey solid lines), predicted missing estimates from TRMA with 95% credible intervals (CrIs; grey dashed lines), 'shrunken' estimates with 95% CrIs (black solid lines), and the pooled estimates with 95% CrIs (black solid lines for pooled effect from each of the TRMAs and black dotted (dashed) lines representing results from bivariate random-effects meta-analysis (univariate random-effects meta-analysis) for comparison).
(which for URMA is 0.26) with shrunken estimate equal 0.38 (95% CrI [ 0.63, 0.15]). Borrowing of strength across studies leads to both the shift toward the mean and decrease in uncertainty reducing the width of the credible interval by 14% for the Haroui estimate. Note that the uncertainty corresponding to the pooled estimate is higher compared with the estimate obtained using the frequentist univariate meta-analysis [17]; 0.25 (95% CI [ 0.4, 0.11]). This is due to the between-study variance having a probability distribution (fixed in the frequentist approach) adding to the overall uncertainty in the model. In BRMA, in addition to borrowing of strength across studies, there is also borrowing of strength between outcomes. In the middle forest plot in Figure 4 (representing estimates of HAQ from BRMA), the predicted estimates contribute to the pooled estimate even though there is considerable uncertainty associated with them. This additional borrowing of information across outcomes leads to the aforementioned reduction in uncertainty around the pooled estimate of the HAQ.
As can be seen in Figure 4 (middle and right-hand-side forest plots of estimates from BRMA), the predicted estimates of the HAQ (DAS-28) follow the heterogeneity pattern of the corresponding known estimates from the DAS-28 (HAQ) because of the informative (and of high mean) prior between-study correlation DH b = 0.86 (95% HPDI [0.46,0.999]). Studies that reported the DAS-28, but not the HAQ, had on average relatively high positive estimates, which led to the high positive predicted estimates for the HAQ. Especially extreme values predicted for the HAQ were those for studies by Cohen, Di-Poi, Nikas, and one cohort of the Wick study, which had extreme values for the DAS-28. Also, two studies out of those three reporting the HAQ, but not the DAS-28, had more extreme estimates, showing little or no effect (only small improvement in HAQ in the Hyrich study and no improvement in Iannone), which led to the more extreme predicted estimates for the DAS-28 and, in consequence, reduced the pooled effect measured by the change from baseline of the DAS-28.

Results for trivariate random-effects meta-analysis of Health Assessment Questionnaire, Disease Activity Score, and 20% response according to the American College of Rheumatology criteria
We were able to include three further studies (which reported the ACR20, but not the HAQ or DAS-28 scores) by extending BRMA to TRMA. By incorporating an additional outcome, ACR20, we not only extended the data by those three studies but also incorporated the data on ACR20 from eight studies already in BRMA, which reported the HAQ and/or DAS-28 as well as ACR20 (the details can be found in Table I). Adding ACR20 in TRMA did not, however, lead to any change in the point estimate of HAQ or its uncertainty; they remained almost the same as in BRMA. The between-study heterogeneity increased slightly ( H increased from 0.21 to 0.22) after inclusion of the three studies reporting the ACR20. This is likely due to the relatively high between-study heterogeneity of studies reporting the ACR20 and the prior correlation between the ACR20 and the HAQ being of rather low mean and relatively noninformative. Figure 5 shows forest plots for all three outcomes: the ACR20, DAS-28, and HAQ. The predicted estimates (grey dashed lines) for the HAQ and DAS-28 follow the heterogeneity pattern of the corresponding known estimates of the DAS-28 and HAQ, respectively, as these are highly correlated outcomes, with the posterior between-study correlation DH b D 0:83 (95% HPDI [0.45,0.99]). However, the predicted estimates of the HAQ and DAS-28, for studies where neither of these outcomes were reported, are close to the overall mean with considerable associated uncertainty (as are the predicted estimates for ACR20 for studies reporting HAQ and/or DAS-28 but not ACR20). This is likely due to the lack of correlation between the ACR20 and the HAQ. This resulting level of uncertainty around the predicted estimates and the high between-study heterogeneity of studies reporting the ACR20 is the most likely explanation for the lack of a further reduction in uncertainty around HAQ when extending the analysis to TRMA. An additional forest plot showing all estimates of HAQ (obtained from all the models: URMA, BRMA, and TRMA) plotted together for comparison can be found in the Supporting Information.

Discussion
We have developed a Bayesian multivariate meta-analytic framework for incorporating data from multiple sources of evidence: studies reporting multiple mixed endpoints and additional external data, which are used to inform the parameters of the model in the form of prior distributions. The developed model gives a flexible way for combining data on mixed outcomes beyond the bivariate case. In this framework, a product normal formulation is adopted to describe the between-study variability of correlated outcomes. The advantage of this approach is the possibility of an explicit incorporation of external information to inform the prior distribution for the between-study correlation. However, as we have demonstrated, careful specification of the univariate prior distributions is essential so that the resultant prior distributions that are induced for the correlations are both plausible and realistic. We also show how individual patient data, consisting of mixed outcomes, can be used to construct empirical prior distributions for the within-study correlations. Our model comprised three parts as the prior distributions for the correlations resulted from the analyses of the two external data sets. It would be possible to carry out all components of the analysis within a single comprehensive Bayesian analysis. However, the flow of information would need to be restricted when putting the parts of the model together to prevent the main analysis from feeding back to, and influencing, the part of the model in which the empirical prior distributions were constructed [18]. In our model, we used directly the posterior distributions of the relevant correlations, obtained from the analysis of external data, as the empirical prior distributions for the correlations. By sampling directly from the posterior distributions, we avoided the potential problem of having to make distributional assumptions to summarize the prior external evidence.
It has been previously discussed [30] that applying a multivariate approach to meta-analysis makes little difference to the results compared with those from the univariate approach. However, our findings show that this is clearly not always the case, and the multivariate approach can be of practical importance. We found that employing this Bayesian approach to evidence synthesis of outcomes in RA by combining the HAQ endpoint with the DAS-28 leads to a significant reduction in the uncertainty around the HAQ from 0.25 (95% HPDI [ 0.43, 0.09]) obtained from URMA to 0.28 (95% HPDI [ 0.41, 0.14]) obtained from BRMA. We have obtained similar results when using TRMA of the HAQ, DAS-28, and ACR20. The inclusion of the third outcome, ACR20, did not contribute to the HAQ estimate or reduction of uncertainty around it.
We cannot predict the extent of the gain in the precision of the pooled estimate, and the inclusion of additional outcomes through a multivariate approach does not always lead to reduced uncertainty around the estimate of the outcome of interest. In fact, in our example, extending the bivariate meta-analysis to the three-dimensional case did not contribute to a further reduction in uncertainty around the clinical outcomes. The additional number of studies included in the meta-analysis, by extending the analysis to multiple outcomes, can also lead to additional heterogeneity and therefore potentially increase the level of uncertainty. Sturtz et al. [31], have discussed a similar effect in the context of mixed treatment comparison meta-analysis where extending a network (by inclusion of additional interventions and studies) can sometimes lead to increased uncertainty around the estimates because of increased heterogeneity. However, there are other potential advantages of using a multivariate approach. It has been recently suggested by Kirkham et al. [32] that under some circumstances, a multivariate approach to meta-analysis can lead to a more appropriate estimate of the clinical outcome in the presence of outcome reporting bias. This may apply to the case of RA (discussed previously) where, because of the availability of a number of instruments for measuring the disease activity or response to treatment, authors may choose to report only those outcomes whose estimates are positive and/or significant. Although, in our model, we make no assumption regarding outcome reporting bias, this multivariate approach may reduce the effect of such bias if it existed.
One of the limitations of this particular example in RA is that the IPD comes from a study with treatments, which are very different from those used in the Lloyd data. It is possible that not only is the treatment difference an issue, but also that patients did not respond particularly well to the treatment and there was relatively little variability between patients leading to a low correlation. However, despite this limitation, we chose to use these prior distributions as they serve the purpose of illustrating how IPD can be used to construct a set of empirical prior distributions for the whole correlation structure of the within-study correlations. However, it is not always possible to obtain relevant individual patient data. Nevertheless, the Bayesian approach has another advantage as it allows elicitation techniques to be used to incorporate experts' opinions in the form of prior distributions in a model [33]. For example, methods for inclusion of expert opinion on bias in the synthesis of studies have been developed [20].
We can apply the modeling framework presented in this article to a range of settings in evidence-based medicine. For example, alternative outcomes such as surrogate endpoints are increasingly considered in decision-making in health care, especially in the early stages of the drug development process when extended follow-up time is required to obtain the main clinical outcome of interest. In addition to the multivariate framework as a way of combining data on multiple correlated outcomes, the Bayesian approach gives another level of integrating diverse sources of evidence, as it allows for incorporation of external information in the form of prior distribution. In our view, this approach is an important step toward the coherent synthesis of multiple sources of evidence, which is vital in policy decisions in health care [7].

Appendix A. Multivariate random-effects meta-analysis
Suppose that we have summary data available on at least one of N outcomes (Y 1 , Y 2 , : : : , Y N ). If we assume that all outcomes are normally distributed (i.e., they are continuous outcomes or transformations of other outcomes), we can model them simultaneously in the MRMA model (by extending TRMA model described in Section 3) as follows: : : : : : : : : : : : : : : : : : : : : : : : : : : : : : : In the this model, outcomes Y 1i , Y 2i , : : : , Y N i are assumed to be estimates of correlated effects 1i , 2i , : : : , N i with corresponding within-study covariance matrices † i of the estimates. These studylevel effects follow a multivariate normal distribution with means .ˇ1;ˇ2; : : : ;ˇN / and covariance T in this hierarchical framework. Equations (A1) and (A2) describe the within-study and the between-study models, respectively.

A.1. The between-study model in the product normal formulation
Similarly as for TRMA, by assuming independence of Y 2 , : : : , Y N conditional on Y 1 , we can parameterize the between-study model (A2) in the form of product normal formulation as follows: Rearranging the formulas in (A4) leads to the following relationships between the regression coefficients, standard deviations, and the between-study correlations: which can be used to define inter-related prior distributions in a similar manner as described in Section 3.4 for TRMA.

Appendix C. Simulation to test the double bootstrap method for summary statistics
We have carried out a simulation to show that a double bootstrap method [15,25] can be applied to estimate correlation between summary statistics with uncertainty from individual subject data containing correlated observations. For the simulation, we chose to use the bivariate case of normally distributed data as in the normal case we know what the correlations between the summary statistics are: The correlation between the normally distributed individual observations equals the correlation between the means. We simulated data, of size N , containing two correlated variables: For these data, we calculated the correlation r and corresponding confidence interval by transforming the correlation (using the Fisher transformation): We obtained the standard error using the formula: Fcorr_SE D 1= p N 3, and then, the confidence interval for the transformed correlation is This confidence interval is then back-transformed to obtain the confidence interval for the correlation: r lci D exp.2Fcorr lci / 1 exp.2Fcorr lci / C 1 and r uci D exp.2Fcorr uci / 1 exp.2F corr uci / C 1 (C4) Using the simulated data, we then performed a double bootstrap simulation in the following way. From the simulated data, we sampled N b1 bootstrap samples. From each of the bootstrap samples, we sampled another N b2 bootstrap samples (second level bootstrap samples). For each of the second level bootstrap samples, we calculated the mean values of the outcomes. We calculated the correlations between the means: boot Y 1 Y 2 . We obtained confidence interval using the Fisher transformation in the same manner as for the correlation r previously. We then compared boot Y 1 Y 2 with r. We ran this simulation for the following example: N D 1000, y 1 N.0; 5/, y 2 N.2 C 4 y 1 ; 10/. We obtained the following results: r D 0:946, (95% CI: [0.939;0.952]) for the correlation obtained from the data directly and D 0:946, (95% CI: [0.939;0.953]) for the correlation obtained from the double bootstrap simulation with N b1=b2 D 5000 bootstrap samples on both levels.