A Bayesian approach for two‐stage multivariate Mendelian randomization with mixed outcomes

Many research studies have investigated the relationship between baseline factors or exposures, such as patient demographic and disease characteristics, and study outcomes such as toxicities or quality of life, but results from most of these studies may be problematic because of potential confounding effects (eg, the imbalance in baseline factors or exposures). It is important to study whether the baseline factors or exposures have causal effects on the clinical outcomes, so that clinicians can have better understanding of the diseases and develop personalized medicine. Mendelian randomization (MR) provides an efficient way to estimate the causal effects using genetic instrumental variables to handle confounders, but most of the existing studies focus on a single outcome at a time and ignores the correlation structure of multiple outcomes. Given that clinical outcomes like toxicities and quality of life are usually a mixture of different types of variables, and multiple datasets may be available for such outcomes, it may be much more beneficial to analyze them jointly instead of separately. Some well‐established methods are available for building multivariate models on mixed outcomes, but they do not incorporate MR mechanism to deal with the confounders. To overcome these challenges, we propose a Bayesian‐based two‐stage multivariate MR method for mixed outcomes on multiple datasets, called BMRMO. Using simulation studies and clinical applications on the CO.17 and CO.20 studies, we demonstrate better performance of our approach compared to the commonly used univariate two‐stage method.

toxicities resulted from a treatment can be accepted to a certain degree if the treatment can significantly improve the chance of survival, 6,7 and many of these toxicities may be correlated, 8 of which the types of toxicities and frequencies may largely depend on the patients' characteristics (eg, some baseline blood measures like magnesium level). For instance, a patient may have a higher chance of experiencing adverse events such as vomiting and diarrhea if some of their baseline blood variables are higher than normal range.
Studying such relationships can help clinicians make better treatment decisions and be more prepared for the management of the adverse events, and it has become more feasible given the increasing collection of detailed outcome data from clinical trials. 9,10 One example is the CO.17 trial which was conducted by the Canadian Cancer Trials Group (CCTG). As a phase III randomized, placebo-controlled study, its primary objective was to examine the effect of cetuximab on colorectal cancer patients compared to placebo in terms of survival. Besides the primary analysis on survival, the study data also suggested that there may be a difference in toxicities experienced by different patient groups. 11 Another example, the CO.20 trial, also conducted by CCTG, found that the addition of brivanib (BRI) to cetuximab was associated with increased toxicities and did not significantly improve the overall survival of patients. 1,12,13 These findings raise a natural question of whether some toxicities can be attributed to the patients' baseline characteristics.
To answer this kind of question, an intuitive way is to examine the effect of an exposure variable on an outcome using a regression model. However, the presence of potential confounders, either measured or unmeasured, may render such analysis invalid. 14 This is usually the case in clinical studies where the patients are not randomly assigned by treatment. Meanwhile, measures of toxicities or quality of life scores usually come with a mixture of different types of variables (eg, binary and continuous) that may be correlated, and there may be multiple datasets available, each containing some of the outcomes of interest. 15 It is challenging to analyze multiple mixed variables from multiple datasets jointly in an efficient way. 16,17 Over the past few years, the Mendelian randomization (MR) approach has become a popular approach to handle the confounding problem in clinical investigations, especially where observational data are collected and analyzed, with the help of genetic instrumental variables, [18][19][20] which are genetic variants (eg, single-nucleotide polymorphism [SNP]) in the most cases. Many different methods of MR have been proposed with the majority of them relying on certain instrumental variable assumptions. When those conditions are met, MR methods can efficiently estimate the causal effect of an exposure on a single outcome as well as make statistical inference. Some methods are more robust to the instrumental variable assumptions, while some are extensions that can analyze multiple exposures. [21][22][23][24][25][26][27] Nevertheless, most of the established methods focus on univariate analysis, meaning that they analyze one outcome at a time. When research interests are on multiple outcomes, it may be beneficial to conduct multivariate analysis, which jointly models different outcomes simultaneously. Since multivariate analysis can make use of the correlation information and avoid the Bonferroni correction that is known to be conservative, 28 it can be more powerful than the univariate analysis, especially when testing the overall causal effect, whether the exposure has significant effect on any of the outcomes. 29 To conduct multivariate analysis on mixed outcomes, several methods have been developed, though most of them are computationally challenging due to the complexity of the likelihood function. 15,[30][31][32][33][34] Besides, none of these methods use the MR framework to handle confounders with instrumental variables. Deng et al. 29 proposed a two-stage MR method with multivariate analysis on mixed outcomes (binary and continuous), combining the MR framework with the composite likelihood approach used in Bai et al. 34 This approach, called MRMO, has been shown to have higher power than the standard two-stage univariate MR analysis in most cases. However, MRMO was developed for studies with single dataset, it cannot be applied to the case where multiple datasets are available, especially when some datasets do not contain all the outcomes of interest.
In this article, we propose a Bayesian approach for two-stage multivariate MR with mixed outcomes, denoted by BMRMO, which combines the composite-likelihood based two-stage multivariate MR method with a Bayesian framework to handle multiple datasets for mixed outcome variables. While we focus on testing the causal effect in this article, BMRMO can also provide good estimation when the required assumptions are met, especially for continuous outcomes. The main novelty of our approach is not only the methodology development of innovative MR framework on mixed responses, but also the use of Bayesian algorithm to integrate multiple datasets, which is methodologically a promising approach to extend the existing analytic framework on multiple datasets. We apply the Metropolis-Hastings algorithm to obtain the posterior samples of the causal effects. 35 In terms of examining the overall causal effect, we propose three different ways based on the adjusted credible intervals, the Wald test and Bayes factor, respectively. 36 We evaluate the performance of our proposed multivariate method through simulations and the application to the CO.17 and CO.20 data.

Two-stage univariate analysis
Before introducing our new approach, we briefly describe the basic framework of traditional MR analysis and the difference between univariate and multivariate MR analyses. As depicted in Figure 1A, traditional MR analysis focuses on one outcome (eg, Y 1 ) at a time, using a single or multiple instrumental variables (IVs), usually chosen as some genetic variants (G), to examine the causal effect of the exposure (X) on this outcome while avoiding the confounding problem brought by unmeasured or unincluded confounders (U). In this case, the causal estimand is the average change in an outcome (eg, Y 1 ) for a unit change in X, which can be written as E ( . 37 For MR, three IV assumptions need to be met: (a) each IV should have an effect on X; (b) IVs should not be associated with U; (c) the pathway for IVs to affect Y has to go through X. Traditional MR usually uses univariate analysis, meaning that it analyzes each outcome at a time. For study with multiple outcomes, several univariate MR analysis will be conducted on each outcome separately. This may lead to loss of power, especially when testing the overall causal effect (the exposure's effect on any of the outcomes), since the correlation information between multiple outcomes is not used. On the other hand, building joint models to analyze different outcomes jointly, known as multivariate analysis, may make hypothesis testing more powerful by incorporating the possible correlations between different outcomes, as shown in Figure 1B.
To describe the two-stage approach of univariate MR, we consider a simple scenario with one dataset (D 0 ) for the exposure (n 0 subjects with G (0) = ( G 0,il ) n 0 ×p and X (0) = ( X 0,i ) n 0 ×1 ) and one dataset (D 1 ) for the outcomes (n 1 subjects with . G m,il , X m,i and Y m,ij stand for the lth IV value, exposure value and the jth outcome value of the ith subject in dataset D m . We have p instruments and q outcomes in total.
Two-stage univariate MR builds the first-stage model X 0,i = 0 + ∑ p l=1 l G 0,il + e 0,Xi based on dataset D 0 , where l is the effect of the lth SNP on the exposure, and then use this model to predict the exposure values for dataset D 1 , denoted bŷ X 1,i . Next, the second-stage models can be constructed as follows: To test the exposure effect on a single outcome j, we need to test whether coefficient j1 is significant. Note that there are other variations of the two-stage MR approach, but we only focus on the more commonly used standard approach. According to the literature, 22 for both continuous and binary outcomes, standard two-stage MR can provide a valid way to test the causal effect of an outcome on the exposure. For binary outcomes, we consider the probit link instead of the logistic link in the article, since it is more comparable to the mixed response model that we will introduce.
To test whether 11 = 21 = … = q1 = 0 (ie, the exposure does not have any effect on any outcome), univariate MR usually applies the minP test with Bonferroni correction. It compares q times the smallest P-value of any coefficient j1 to the significant threshold (eg, 0.05), which may give conservative results. 28

Multivariate mixed response model
To model multiple outcomes jointly, which may give us power benefit, Deng et al. 29 proposed a two-stage multivariate Mendelian randomization method, called MRMO, which accounts for mixed outcomes. The general idea is that in the second stage concerning dataset D 1 , instead of modeling each outcome separately, a joint model is used, with MRMO uses the pairwise likelihood, defined as L jk ( 35,36 to estimate j1 's and their covariance matrix, which can be then used to test each single j1 as well as the overall causal effect by the Wald test.

Univariate analysis with multiple outcome datasets
In a scenario with multiple datasets which contain possibly different number of outcomes, applying the traditional univariate MR analysis is straightforward. We only need to combine all subjects that have a certain outcome measured when analyzing this outcome. Suppose we still have p IVs and q outcomes. Dataset D 0 contains n 0 subjects with their IV information and exposure values. Datasets D m (m = 1, 2, … , M) each contains n m subjects with their IV information and some of the q outcomes. We assume that the subjects of different datasets are independent and do not have any overlaps. Figure 1C shows an example, where dataset D 1 contains G and all three outcomes, while datasets D 2 , D 3 , and D 4 have different dimensions of outcome variables. In this case, to conduct univariate analysis, we use D 0 to build the first-stage model to predict the exposure values for each outcome dataset D m (m = 1, 2, 3, 4), denoted byX m,i . Then, we can construct the second-stage model for outcome 1 using those datasets that have this outcome (D 1 , D 2 , and D 4 ) to estimate and test the causal effect. Similarly, we can use D 1 , D 2 , and D 3 to analyze outcome Y2, D 1 , D 3 to analyze outcome Y 3 . To be more specific, the first stage of our two-stage model can be written as follows: The second stage of our two-stage model can be written as follows: To avoid confusion, unless otherwise specified, subscripts m, i, j, and l are used to denote the mth dataset, ith subject, jth outcome, and lth SNP, respectively.

Bayesian multivariate mixed response model
When dealing with multiple datasets containing possibly different outcome variables, we cannot directly apply the MRMO method for multivariate analysis, since it's pairwise likelihood is based on complete data (ie, each subject in the outcome dataset [s] should have all of the q outcomes measured). One possible approach is that when calculating the pairwise likelihood of a pair of outcomes, we may combine all the subjects that have this pair of outcomes recorded across different datasets. However, doing so will require each outcome dataset to have at least two outcomes, or it will not be applicable to calculate the pairwise likelihood.
To solve this problem, we propose a Bayesian two-stage multivariate MR model, denoted by BMRMO, which incorporates the idea of Bayesian update. Firstly, we build the first-stage model using D 0 to predict the exposure valuesX m,i for each outcome dataset D m , similar to what was described for the univariate analysis. Then, the second-stage model is defined using pairwise distributions, similar to what was done in MRMO. Suppose Y m,ij is the jth outcome for subject i in dataset D m .
If outcomes j, k are both continuous, following Cox et al., 31 then we assume , with m,ij = j0 + j1Xm,i and m,ik = k0 + k1Xm,i . This means for subject i in dataset m, the two outcomes follow a multivariate normal distribution with correlation jk . If outcomes j, k are both binary, then we use the probit models, with are a pair of latent variables for subject i in dataset m, m,ij = j0 + j1Xm,i and m,ik = k0 + k1Xm,i . If one outcome j is binary, and the other outcome k is continuous, then we combine the latent variable model with the linear regression model, using .
It can be derived that for a pair of continuous outcomes, For a pair of binary outcomes, For one continuous and one binary outcomes, , and Φ is the cumulative distribution function of the standard normal distribution. Our parameters include intercepts j0 (j = 1, … , q), exposure effects j1 (j = 1, … , q), standard errors j (j = 1, … , q) and correlations jk (j = 1, … , q; k = 1, … , q; j ≠ k). The full composite likelihood can be written as follows: In the Bayesian framework, we assume the prior distribution of the parameters has density p( ). Suppose for dataset D m , the relevant density function is p m , and the recorded outcome data is Y (m) . We can apply the Bayesian update procedure illustrated in Figure 2A to obtain the posterior density , which can then help us make inference on certain parameters (eg, the causal effects j1 's).
, we only consider the available outcome variables in dataset D m . If D m has a set of multiple outcomes, denoted by S m , then we propose to use the composite likelihood based on these outcomes, which means is equal to the part of CL( ) whose outcomes are available in D m . When updating with p m ( , since it only considers outcomes that belong to S m , only the parameters related to those outcomes are updated. If D m only has one outcome, denoted as outcome O m , then we can only apply the marginal model, which means we consider building is defined as the likelihood based on this marginal model. In practice, mixing pairwise likelihood and marginal likelihood in p * ( ) may be problematic because data points may be used multiple times in pairwise likelihood, while datapoints used in marginal likelihood are only used once. We propose to mitigate this problem by replacing contains multiple outcomes. Note that even though we are able to obtain p * ( ) using the above procedure, this posterior distribution can be very complex. As a result, we propose to use the Metropolis-Hastings (MH) algorithm to generate posterior samples from p * ( ), based on which we can make inference about our parameters of interest. 35 Our algorithm is described as follows: 1. Choose the starting point for , denoted by (1) . Choose a distribution with density g based on the current parameters to generate the next candidate. 2. For each iteration t, generate a candidate ( * ) from the proposal distribution g ( ( * ) | (t) ) . Calculate the acceptance ratio . Generate a random number u from the standard uniform distribution. Then After obtaining T samples, we burn in the first B samples. Then, we can use (t) (t = B + 1, … , T) to infer about the parameters. Note that the efficiency of MH decreases with the increase of parameters. To reduce the number of parameters, for simplicity, we propose a single correlation parameter = jk rather than having different jk 's, which if unrestricted, will drastically increase the number of parameters. We will show in our simulation study that this choice of simplification is relatively robust under different values of jk 's. In terms of choosing the starting point and the proposal distribution, we use the estimates from the marginal models (described in Section 2.3) and a univariate distribution with parameter . For example, if the current estimate for j0 is (t) j0 , then the next candidate will be generated from Unif j0 + ). We choose = 0.1 by default, which yields robust results.

Inference procedure of BMRMO
To infer whether a single causal effect j1 is 0, we can use the MH posterior samples to estimate its (1 − ) (eg, 95%) credible interval. If the credible interval contains 0, then we can conclude that the exposure does not have a causal effect on outcome j. If it does not contain 0, then we can conclude that the exposure has a causal effect on outcome j. To examine the overall causal effect (whether 11 = 21 = … = q1 = 0), we have different possible options. A widely used approach for Bayesian analysis is using the Bayes factor. We can build the full model A 1 (as described in Section 2.4) and estimate its likelihood Pr(D 1 , … , D M |A 1 ) as p * ( )∕p( ) as well as the null model A 0 (fixing 11 = 21 = … = q1 = 0) and its likelihood Pr(D 1 , … , D M |A 0 ). Then, the Bayes factor is calculated as Pr(D 1 , … , D M |A 1 ) ∕ Pr(D 1 , … , D M |A 0 ). If the Bayes factor is greater than 10, then we can conclude that there is strong evidence showing that the exposure has a causal effect on at least one of the outcomes. 38 One alternative option is to use the credible intervals while adjusting for multiple testing. For each j1 , we can use the MH posterior samples to estimate its c * confidence interval instead of the (1 − ) credible interval, where c * = (1 − ) 1∕q . Another alternative option, assuming the posterior distribution of the causal effects is close to normal, is to apply the Wald test. We can use the MH posterior samples to estimate the means of the causal effects as well as their variance-covariance matrix, and then we can carry out the Wald test. Though the Bayes factor may seem more fitting in our Bayesian framework, we will demonstrate the effectiveness of different options in our simulations.

2.6
Proposed general procedure In this section, we present the general procedure of applying our approach to examine the causal effect of an exposure on multiple outcomes using multiple datasets, illustrated by Figure 2B. For this article, we focus on the last stage, comparing the use of BMRMO for multivariate analysis to the standard approach of two-stage univariate MR analysis. As discussed in the previous research, 29 compared to the various MR techniques that use summary statistics, two-stage methods that use individual-level data can easily incorporate moderately correlated IVs, which may be beneficial when there are not many available IVs that are independent. Also, it will be much more challenging to build joint models involving mixed outcomes if we use summary statistics. We also limit our discussion to independent and non-overlapped samples for the exposure and different outcomes, since overlapping samples may lead to biased estimates and more false discovery, as pointed out by literature. 18, 39 We will discuss more about the strengths, limitations and possible improvement of our approach in the discussion section.

Assumptions
In this section, we give a brief summary of the assumptions that are needed for BMRMO. First of all, as an MR approach, BMRMO requires the basic MR assumptions to hold: (a) each IV should have an effect on X; (b) IVs should not be associated with U; and (c) the pathway for IVs to affect Y has to go through X. These three assumptions are also usually referred to as the relevance assumption, independence assumption and exclusion restriction assumption. 40 In addition, since BMRMO uses a mixed response model with latent variables in the second stage of two-stage MR, it brings the corresponding distributional assumption on the outcome variables. Specifically, we assume that the binary outcomes originate from some latent continuous variables that follow normal distributions. Detailed structures of the distributions are provided in Section 2.4. For the different datasets, we assume the individuals do not overlap, and they are from a single population of interest, which is crucial for applying the two-stage methods. Note that the above assumptions are required for testing the causal effect, which is the focus of our article. Under the same assumptions, BMRMO can obtain consistent effect size estimates for continuous outcomes. More details regarding estimation are provided in Appendix B of the supplementary materials.

Simulations
We simulate datasets D 0 , D 1 , … , D M with M = 6. Each outcome dataset D m (m = 1, 2, … , 6) has sample size n m = 120, and the exposure dataset D 0 has sample size 720. For the IVs, we simulate p = 10 independent SNPs with minor allele frequencies (MAFs) generated from Unif(0.3, 0.5). Suppose we have q 1 binary outcomes and q 2 continuous outcomes. We generate U, X and Y using where G m,ik , U m,i , X m,i , and Y m,ij are the genotype of the kth SNP, confounder, exposure, and jth outcome for subject i in dataset D m . Z m,ij is the jth latent outcome, which is connected to the binary outcome by the probit link, and the continuous outcomes by the identity link. GU,k , GX,k represent the kth SNP's effect on U and X, while XZ,j and UZ,j stand for the effects of X and U on the jth latent variable. UX  Following relevant studies, 29,41 GX,k 's are generated from a truncated normal distribution with mean zero and SD 0.15. We take GX,k > 0.08 to ensure the IV strength assumption is met. We also scale GX,k 's so that the IVs explain about 20% of the variation in the exposure. We set GU,k = 0, UX , UZ,j ∼Unif(−0.5,0.5).
After we simulate the seven whole datasets, we remove certain variables to make sure dataset D 0 only contains information on the IVs and the exposure, datasets D 1 , … , D 6 each contains the IVs and a subset of the outcomes. We apply the BMRMO method and the two-stage univariate MR method to these datasets to examine the difference between their performances. By default, we choose T = 25000 and B = 1000 for BMRMO. A brief discussion on the validity of these numbers is provided in Appendix A of the supplementary materials. Table 1 shows a summary of the differences between our examined major scenarios. When comparing the proportion of falsely identified causal effect at the presence of no causal effect, we set XZ,1 = XZ,2 = XZ, 3  Before introducing the results, we would like to specify some terms and abbreviations. "UVA" and "MVA" stand for standard two-stage univariate MR and BMRMO, respectively. "minP" corresponds to the minP test with Bonferroni correction for UVA. "CI," "Wald," and "BF" refer to the overall tests based on the adjusted credible intervals, Wald test and Bayes factor, respectively for BMRMO.
In Scenario 1, we assume D 1 has outcomes 1, 2, 3; D 2 has outcomes 1, 2; D 3 has outcomes 1, 3; D 4 has outcomes 2, 3; D 5 has outcome 1; D 6 has outcome 2. This means we have a mixture of datasets that do not contain all of the outcomes. As shown in Figure 3, both univariate and multivariate analyses are able to control the rates of falsely identified causal effect when testing a single outcome or testing the overall causal effect. In terms of power, as shown in Figure 4, the overall tests of BMRMO usually have higher power than the minP test, especially when some correlations are negative. The Wald test tends to have the highest power, but the Bayes factor approach also has decent power. Note that when the three outcomes are uncorrelated, BMRMO is still able to boost power over univariate analysis, especially when the exposure is affecting more than one outcome. When there is only one outcome affected by the exposure, the Bayes factor approach does not show much advantage compared to the minP test. These results are consistent with the previous findings. 42,43 The minP test with Bonferroni correction is similar to the SPU(Inf) test, proposed by Pan et al., 42 and usually works well when the signal is sparse, meaning that most of the outcomes are not affected by the exposure. Meanwhile, the Wald test and the TA B L E . 1 Summary of different scenarios.

4
Yes D 1 has outcomes 1, 2, 3; D 2 has outcomes 1, 2; D 3 has outcomes 1, 3; D 4 has outcomes 2, 3; D 5 has outcome 1; D 6 has outcome 2.  Bayes factor approach learn more toward the SPU (2) test, 42 which means they are more advantaged when the signal is relatively dense, with multiple outcomes affected by the exposure. Next, we examine another scenario where for the outcome data, we only have datasets D 1 , D 2 and D 3 , and they all contain all three outcomes. This means we are looking at an ideal situation with complete data. For power comparison, in Scenario 2, we choose the following cases: Case 2A: XZ,1 = 0, XZ,2 = 0, and XZ,3 = 0.4. The exposure affects only one outcome. Case 2B: XZ,1 = 0.2, XZ,2 = 0, and XZ,3 = 0.2. The exposure affects two outcomes. Case 2C: XZ,1 = XZ,2 = XZ,3 = 0.15. The exposure affects all three outcomes. As shown in Figures 5 and 6, results are very similar to those in the previous setting. Again, both univariate and multivariate analyses are able to control the proportion of falsely identified causal effect, while the overall tests of BMRMO tend to have higher power than the univariate overall test, especially when there are multiple causal effects. The increase of power brought by BMRMO is the largest when the correlation between different outcomes is negative. The results of Scenarios 3-5 are provided in Appendix B of the supplementary materials, including examples showing that BMRMO is able to handle moderately correlated instruments as well as the situation where none of the outcome datasets contain all of the outcomes.

Real data application
To illustrate how BMRMO and the univariate MR analysis perform differently in a real setting, we apply them to the CO.17 and CO.20 data. [11][12][13] The CO.17 and CO.20 trials were two independent phase III-randomized trials aimed to study the treatment effect of cetuximab compared to placebo and the treatment effect of cetuximab plus brivanib alaninate compared to cetuximab alone, respectively, for colorectal cancer patients. A total of 78 subjects who received cetuximab and 80 subjects who received placebo in the CO.17 trial as well as 284 subjects who received cetuximab plus brivanib alaninate and 300 subjects who received cetuximab alone in the CO.20 trial were genotyped and passed quality control (there is no subject who only took placebo in the CO.20 trial as the main objective of this trial was to compare the combined treatment with the cetuximab only treatment). At the beginning of our analysis, 533 631 SNPs are genotyped using the Illumine Oncoarray platform. We would like to point out that even though this application uses the data from two randomized trials, when combining both CO.17 and CO.20 data, we are treating our data as observational data, as the subjects were randomized based on different treatment groups instead of the exposure variable of interest. Our approach can be used on observational studies without randomization, which is what MR methods are usually applied to. Figure 7 shows our study process. For the CO.20 data, we use the subjects who received cetuximab plus brivanib alaninate as dataset D 0 , and the subjects who received cetuximab plus placebo as dataset D 1 . For the CO.17 data, we use the subjects who received cetuximab as dataset D 2 . As a result, we have three independent datasets D 0 , D 1 , and D 2 . We are interested in examining whether baseline magnesium level (a continuous variable), an exposure variable that is associated with certain genetic variants, has a significant effect on at least one of the three toxicity outcomes: diarrhea, aspartate aminotransferase (AST), and lactate dehydrogenase [LDH] for patients treated by cetuximab. Diarrhea is recorded as a binary variable (whether a patient experienced it within 8 weeks after allocation), while AST and LDH are two continuous variables defined as the worst (maximum) values within 8 weeks. We log-transform AST and LDH and exclude outliers.
For our two-stage MR analysis, we build a first-stage model using dataset D 0 , and then use this model to predict the exposure values for datasets D 1 and D 2 . Next, we build second-stage models using D 1 and D 2 to analyze the causal effect of baseline magnesium level on diarrhea, AST and LDH. We compare the Wald test and Bayes factor approach based on BMRMO with the standard two-stage univariate MR approach. The P-value of the univariate overall test is 0.076, meaning that if we rely on univariate analysis, then we will conclude that the exposure has a marginal effect on any of the outcomes. Meanwhile, the P-value for the Wald test based on BMRMO with 200 000 iterations and 20 000 burn-ins is 0.009, and the Bayes factor is 15.3, which means we have stronger evidence to conclude that F I G U R E 7 Workflow for the CO.17/CO.20 data application. the baseline magnesium level has a significant effect on at least one of the outcomes. These results are consistent with our simulation findings, showing that multivariate analysis may give us more power to detect a significant causal effect compared to the standard univariate analysis. Besides, based on the credible intervals, baseline magnesium' effect on AST is significant with a negative posterior mean (−1.85). In conclusion, having a higher baseline magnesium level may lower the risk of elevated AST levels. More information on this application, including checking the MR assumptions and the posterior distributions of the causal effects, is available in Appendix C of the supplementary materials.

DISCUSSION
We propose a novel approach to conduct two-stage MR in a Bayesian framework with multivariate analysis. Incorporating the composite likelihood method and the Metropolis-Hastings algorithm, this new method can be applied to situations where researchers have a mix of binary and continuous outcomes from multiple datasets. We have also proposed three different ways to conduct hypothesis testing based on our multivariate modeling, including the adjusted credible interval method, Wald test and the Bayes factor approach. Our simulation studies show that in terms of the overall test, while both multivariate and univariate MR analyses can control the proportion of falsely identified causal effect, BMRMO has consistently higher power than the univariate MR method. The increase of power is largest when multiple outcomes are affected by the exposure, and there is negative correlation between the study outcomes. Besides, the Wald test based on BMRMO tends to show slightly higher power than the Bayes factor approach, while the Bayes factor approach may seem more appropriate in a Bayesian setting. Note that even though we have incorporated various scenarios in our simulations, the number of configurations is still limited, which means more simulations can be explored for future complex data structures. In practice, we recommend conducting both tests and comparing their results. If the results are inconsistent (eg, the Wald test has significant result while the Bayes factor is not large), we should not accept the significant result. Instead, more investigation needs to be conducted, which may involve collecting more data to increase the sample size.
After applying the univariate and multivariate MR methods to the CO.17 and CO.20 data, we have found stronger evidence from the proposed method than that from the univariate method that baseline magnesium has a significant effect on at least one of the three toxicity outcomes of interest (diarrhea, AST and LDH), confirming the potential power advantage of multivariate analysis over the univariate analysis. We also found that the significant signal comes from AST, and the causal effect on AST is negative. This is clinically relevant since for patients treated with cetuximab, hypomagnesemia is potentially associated with improved outcomes, and our result suggests that predisposition to low magnesium may lead to increased liver toxicities. Whether this relationship is a result of low magnesium leading to increased levels of cetuximab (pharmacokinetic association) or increased efficacy of cetuximab on the metastatic cancer (pharmacodynamic association) has yet to be confirmed. Nevertheless, since it is usually more likely to observe AST abnormalities occurring with drug toxicity than liver metastases, our result may suggest though does not prove that the relationship between hypomagnesemia and elevated AST may be related more to drug toxicities.
Note that since the current version of BMRMO is based on the Metropolis-Hastings algorithm, the computation burden can be heavy when we have a large number of parameters in our models. In the future, we may explore more computational efficient ways to carry out the analysis, with Gibbs sampling for instance, 44 though this can be challenging given the complexity of the composite likelihoods. Another possible extension for our method is to incorporate the Bayesian framework's ability to handle missing data, which may involve specifying more parameters to estimate. Meanwhile, our method uses the 2SPS (two-stage predictor substitution) approach, which can provide consistent estimates for continuous outcomes but may have biased estimates for some binary outcomes as the second-stage model is not linear, while other methods like 2SRI (two-stage residual inclusion) may provide better estimates in certain scenarios. 45 Nevertheless, we choose to use the standard 2SPS since 2SPS may still perform as good as or even better than 2SRI in some scenarios, 39 and 2SPS is valid for testing the causal effect, 18 which is the focus of this article. In the future, we may explore other estimation algorithms to reduce bias and find extra assumptions needed to acquire consistent estimates for binary outcomes. Following Zou et al., 46 we may also extend our method to the situation with overlapping samples, which will involve more methodology considerations to account for compared to independent samples.
We would also like to point out that the focus of this article is on comparing the multivariate analysis to the standard two-stage MR analysis using individual-level data. In MR research, it is of importance to select valid IVs, since violation of the MR assumptions may lead to problematic results, unless a robust method that addresses invalid IVs is carefully applied. In the presence of a growing emphasis on invalid IVs, it will be very beneficial to develop a robust multivariate approach that can handle invalid IVs and compare it to the robust MR methods, which are usually based on summary statistics. However, due to the complicated likelihoods for mixed outcomes, developing an efficient and robust multivariate approach using summary statistics only (eg, a multivariate approach parallel to the MR-Egger regression) may be particularly challenging if we still consider mixed outcomes instead of only one type of outcome. Meanwhile, we may also explore the possibility of extending the multivariate analysis to other types of outcomes such as survival outcomes.

ACKNOWLEDGMENTS
The authors would like to acknowledge the clinical contributions of Lillian Siu as well as CCTG (Canadian Cancer Trials Group) and AGITG (Australasian Gastro-Intestinal Cancer Trials Group's) contributions for the CO.17 and CO.20 studies.

FUNDING INFORMATION
This work was supported by the Alan Brown Chair in Molecular Genomics, the Lusi Wong Family Fund, and the Posluns Family Fund, all through the Princess Margaret Cancer Foundation.

DATA AVAILABILITY STATEMENT
R code for our simulation studies is available at https://github.com/yangq001/BMRMO. The data related to CO.17 and CO.20 are not publicly available due to privacy or ethical restrictions.