Extending the MR-Egger method for multivariable Mendelian randomization to correct for both measured and unmeasured pleiotropy

Methods have been developed for Mendelian randomization that can obtain consistent causal estimates while relaxing the instrumental variable assumptions. These include multivariable Mendelian randomization, in which a genetic variant may be associated with multiple risk factors so long as any association with the outcome is via the measured risk factors (measured pleiotropy), and the MR-Egger (Mendelian randomization-Egger) method, in which a genetic variant may be directly associated with the outcome not via the risk factor of interest, so long as the direct effects of the variants on the outcome are uncorrelated with their associations with the risk factor (unmeasured pleiotropy). In this paper, we extend the MR-Egger method to a multivariable setting to correct for both measured and unmeasured pleiotropy. We show, through theoretical arguments and a simulation study, that the multivariable MR-Egger method has advantages over its univariable counterpart in terms of plausibility of the assumption needed for consistent causal estimation, and power to detect a causal effect when this assumption is satisfied. The methods are compared in an applied analysis to investigate the causal effect of high-density lipoprotein cholesterol on coronary heart disease risk. The multivariable MR-Egger method will be useful to analyse high-dimensional data in situations where the risk factors are highly related and it is difficult to find genetic variants specifically associated with the risk factor of interest (multivariable by design), and as a sensitivity analysis when the genetic variants are known to have pleiotropic effects on measured risk factors.

The same IVW estimate using summary statistics can be obtained using weighted linear regression (equation 3): thetaUI = summary(lm(bY~bX1-1, weights=bYse^-2))$coef [1] se_thetaUI.fixed = summary(lm(bY~bX1-1, weights=bYse^-2))$coef [1,2]/ summary(lm(bY~bX1-1, weights=bYse^-2))$sigma se_thetaUI.random = summary(lm(bY~bX1-1, weights=bYse^-2))$coef [1,2]/ min(summary(lm(bY~bX1-1, weights=bYse^-2))$sigma, 1) In the fixed-effect model we divide the standard error of the causal estimate by the estimated residual standard error to force the residual standard error to be 1. For the multiplicative random-effect model the standard error is divided by the estimated residual standard error when the variability in the genetic associations is less than expected by chance (underdispersion). When there is evidence of heterogeneity between the causal estimates (overdispersion) the standard error is unaltered. The multiplicative random-effects model will result in a larger standard error compared to the fixed-effect model if there is heterogeneity between the causal estimates. The causal estimate obtained from the fixed-and multiplicative random-effects models will be the same.

Multivariable IVW:
The multivariable IVW method expands the IVW method using weighted linear regression by estimating the causal effects of the additional risk factors on the outcome. We will include additional two risk factors and assume the causal estimate of interest is the effect of risk factor 1 on the outcome. Either fixed-or multiplicative random-effects can be used to estimate the standard error of the causal effect. theta1MI = summary(lm(bY~bX1+bX2+bX3-1, weights=bYse^-2))$coef [1] se_theta1MI.fixed = summary(lm(bY~bX1+bX2+bX3-1, weights=bYse^-2))$coef [1,2]/ summary(lm(bY~bX1+bX2+bX3-1, weights=bYse^-2))$sigma se_theta1MI.random = summary(lm(bY~bX1+bX2+bX3-1, weights=bYse^-2))$coef [1,2]/ min(summary(lm(bY~bX1+bX2+bX3-1, weights=bYse^-2))$sigma,1) Multivariable MR-Egger: The multivariable MR-Egger method is equivalent to the multivariable IVW method using weighted linear regression except the intercept is estimated rather than being set to zero. Testing whether the intercept term is equal to zero is equivalent to testing for directional pleiotropy and the validity of the InSIDE assumption. As with univariable MR-Egger, the standard errors should be calculated from the multiplicative random-effects model. The genetic associations should be orientated with respect to the risk increasing or decreasing allele of the risk factor of interest. In this sample code we will assume the causal effect of risk factor 1 is of primary interest. Since the residual standard error is estimated for the multivariable MR-Egger model we use the t-distribution with J − (K + 1) degrees of freedom for inference.

MR-Egger
In this section, we compare the precision of the causal estimates from the univariable (θ 1U E ) and multivariable (θ 1M E ) MR-Egger models. For the multivariable model, we consider the genetic associations β X k with two risk factors (k = 2), where the variance of the multivariable MR-Egger estimateθ 1M E is given by: Under a fixed-effect model, the variance of the univariable MR-Egger estimate is proportional to the inverse of var(β X 1 ). 1 The estimate from the multivariable MR-Egger modelθ 1M E will be more precise than its univariable counterpartθ 1U E if: From the above inequality,θ 1U E will always be more precise thanθ 1M E when β X 1 and β X 2 are correlated. Under a multiplicative random-effects model (used throughout this paper), the variance of the residual error is estimated under the univariable MR-Egger model (φ 2 U E ) and the multivariable MR-Egger model (φ 2 M E ). Forθ 1M E to be more precise thanθ 1U E , we require: If β X 2 explains additional independent variability in the genetic associations with the outcome β Y , and β X 1 and β X 2 are independent, then the estimate from multivariable MR-Egger will be more precise than the estimate from univariable MR-Egger. If β X 1 and β X 2 are correlated, then the precision ofθ 1M E will depend upon the strength of the correlation between β X 1 and β X 2 , and the amount of additional independent variability β X 2 explains in β Y . As the correlation between β X 1 and β X 2 increases, and β X 2 explains no additional independent variability in β Y , the precision of the multivariable MR-Egger estimateθ 1M E will decrease.

A3 Summary statistics from the simulation study
The IVW and MR-Egger methods do not account for uncertainty in the genetic associations with the risk factor, referred to by Bowden et al as NO Measurement Error (NOME). 1 If there is substantial uncertainty in these association estimates and in a two-sample setting, the causal effect estimate from univariable MR-Egger may be biased towards the null. Bowden et al have shown that the relative attenuation in the MR-Egger estimate is approximately equal to the I 2 statistic from the meta-analysis of the weighted associations with the exposureβ The I 2 statistic lies between 0 and 1, with smaller values corresponding to more biased MR-Egger estimates. If the I 2 statistic is close to 1, then there should be little or no attenuation of the causal estimate from the univariable MR-Egger method. Bowden et al recommend that methods to account for this uncertainty be considered if the I 2 statistic is less than 90%. 1 The F-statistic is often reported in Mendelian randomization studies as a measurement of the strength of the instrumental variables, with larger values representing stronger instruments. For a two-sample Mendelian randomization analysis with summarized data, the F-statistic for each genetic variant j can be approximated by We use this approximation below. The data-generating model used in the simulation study did not provide the standard errors of the genetic associations with the three risk factors se(β X k ), as they were not required for the methods considered. To estimate the mean values of the F-statistics and I 2 statistics, we must make assumptions about the values of these standard errors. We assume that the genetic associations with the risk factors are provided on the standard deviation scale. If the associations were estimated from a sample size of 10 000, this results in a standard error of 0.01. Assuming that the standard errors of the genetic associations with the three risk factors are 0.01 across the 185 genetic variants, we obtain the mean F-statistics and I 2 statistics displayed in Table A1 and Table A2. The I 2 statistics (reported as a %) are close to 100% across the different scenarios. These results are consistent with the simulation study where the causal estimates from the univariable and multivariable MR-Egger methods showed no attenuation towards the null.  To investigate the behaviour of the multivariable MR-Egger method when causal relationships between risk factors exist, additional simulations were performed where X 2 was causally dependent on X 1 . We assume that X 2 is causally dependent on X 1 , and the causal effect of X 1 on X 2 is γ. The total causal effect of X 1 on Y is θ 1 + γθ 2 ; consisting of the direct effect (θ 1 ) and the indirect effect via X 2 (γθ 2 ). The simulations outlined in Section 4 were repeated with the second line in the data generating model replaced with: The causal effect of X 1 on X 2 (γ) was set to 0.5. All other parameters were taken as in the original simulation study. |β X 1j |, (β X 2j + γ|β X 1j |), and β X 3j were the covariates included in the multivariable IVW and multivariable MR-Egger models. Note that the functional relationship between X 1 and X 2 induces a correlation structure between the covariates |β X 1j | and (β X 2j + γ|β X 1j |) included in the multivariable models, even when β X 1 and β X 2 are generated independently. To account for the additional uncertainty in β Y j , the weights for univariable MR-Egger are given by equation 5, while the weights for multivariable IVW and multivariable MR-Egger were the same as the original simulation study (equation 15).

Results
The results from the simulations that included a causal relationship between X 1 and X 2 , using 10 000 simulated datasets, are presented in Web Table A3 (β X k generated independently, with the functional relationship between X 1 and X 2 inducing a correlation structure between |β X 1j | and (β X 2j +γ|β X 1j |)) and Web Table A4 (β X k correlated). β X k generated independently, with a correlation structure between the covariates |β X 1j | and (β X 2j + γ|β X 1j |): In scenarios where there was no bias in the original set of simulations, the multivariable IVW and multivariable MR-Egger methods consistently estimated the direct effect of X 1 on Y (θ 1 ), whilst the univariable MR-Egger method consistently estimated the total causal effect of X 1 on Y (θ 1 + γθ 2 ).
Bias for the multivariable IVW method was present in scenarios 3 and 4 only, as in the original simulation study (Tables 3 and 4). Compared to the results in Table 3, precision and power to detect a causal effect were reduced for the multivariable IVW and multivariable MR-Egger methods. This reduction in power may be due to the correlation structure between |β X 1j | and (β X 2j + γ|β X 1j |), and the multivariable models conditioning on a mediator. Univariable and multivariable MR-Egger methods produced biased estimates of the total and direct causal effects in scenario 4 (InSIDE violated) only. Unlike the original simulation study, precision and power to detect a causal effect were always better for the univariable MR-Egger method.
β X k correlated: The multivariable IVW and multivariable MR-Egger methods estimated the direct effect of X 1 on Y , as in the independently generated setting. As with the original simulations (Tables 3 and 4), the InSIDE assumption for univariable MR-Egger was violated for all four scenarios, resulting in biased point estimates. However, as with the original simulation study, the multivariable InSIDE assumption was satisfied for scenarios 1,2 and 3, and so causal estimates from multivariable MR-Egger were unbiased. There was a more noticeable reduction in the precision and power to detect a causal effect for the multivariable IVW and multivariable MR-Egger methods under the correlated setting. Table A3: Performance of multivariable IVW, univariable MR-Egger and multivariable MR-Egger with respect toθ 1 for a null (θ 1 = 0) and positive (θ 1 = 0.3) causal effect where β Xk are generated independently for all k (with a correlation structure between the covariates |β X1j | and (β X2j +γ|β X1j |)), with a causal effect of X 1 on X 2 (γ = 0.5). All tests were performed at the 5% level of significance.

A5 Correlated genetic variants
The methods discussed in this article have assumed that the genetic variants are uncorrelated (not in linkage disequilibrium). There may, however, be cases where using multiple correlated variants from the same gene region will be more efficient than using uncorrelated variants from different gene regions. 2 If the genetic variants are in partial linkage disequilibrium, and each variant explains independent variation in the risk factor, then the inclusion of these variants will increase the power of the MR study.
The precision of a MR study will not increase, however, if the variants are perfectly correlated.
If correlated variants are included in an MR study, using summarized level data, the analysis should account for the correlation structure of the variants. If the correlation of the variants is not taken into consideration, the causal estimate will be too precise and this may lead to inappropriate inferences. To account for the correlation between the genetic variants for the univariable and multivariable IVW methods, we can use generalized weighted linear regression of the genetic associations, where the correlations of the variants are included in the weighting matrix, with the intercept set to zero. 2,3 If Ω st = se(β Ys ) se(β Yt )ρ st , where ρ st is the correlation between variants s and t, then the causal estimate from a weighted generalised linear regression for univariable MR is:θ with the standard error of the causal estimate: Whilst the univariable MR-Egger estimates can be obtained by fitting the same generalized weighted linear regression model, but allowing the intercept term to be estimated, the effect of using correlated genetic variants in the univariable MR-Egger method has not been considered in detail. Further investigation into the impact correlated variants may have on the interpretation of the direct effect, and the InSIDE assumption, must be considered at the univariable level first, and then expanded to multivariable MR-Egger.