Consistent Estimation in Mendelian Randomization with Some Invalid Instruments Using a Weighted Median Estimator

ABSTRACT Developments in genome‐wide association studies and the increasing availability of summary genetic association data have made application of Mendelian randomization relatively straightforward. However, obtaining reliable results from a Mendelian randomization investigation remains problematic, as the conventional inverse‐variance weighted method only gives consistent estimates if all of the genetic variants in the analysis are valid instrumental variables. We present a novel weighted median estimator for combining data on multiple genetic variants into a single causal estimate. This estimator is consistent even when up to 50% of the information comes from invalid instrumental variables. In a simulation analysis, it is shown to have better finite‐sample Type 1 error rates than the inverse‐variance weighted method, and is complementary to the recently proposed MR‐Egger (Mendelian randomization‐Egger) regression method. In analyses of the causal effects of low‐density lipoprotein cholesterol and high‐density lipoprotein cholesterol on coronary artery disease risk, the inverse‐variance weighted method suggests a causal effect of both lipid fractions, whereas the weighted median and MR‐Egger regression methods suggest a null effect of high‐density lipoprotein cholesterol that corresponds with the experimental evidence. Both median‐based and MR‐Egger regression methods should be considered as sensitivity analyses for Mendelian randomization investigations with multiple genetic variants.


Introduction
Over the past decade, Mendelian randomization has become an established tool for probing questions of causality when characterizing the etiology of disease (Burgess and Thompson, 2015; Davey Smith and Ebrahim, 2003). The requirement for such approaches stems from a fundamental limitation of observational data, namely that causation cannot automatically be inferred from an association between an exposure and a disease. The association could be due to unobserved confounding between the exposure and the outcome, or to reverse causation (the outcome affects the exposure) (Davey Smith and Ebrahim, 2004). These limitations are generally of no consequence when the aim is merely to predict the likelihood of future outcomes. However, if an exposure has a noncausal association with an outcome, then public health or pharmaceutical interventions targeted at the exposure will realize no material benefit and represent a waste of resources.
The basic premise of Mendelian randomization relies on genetic variants that explain variation in the exposure, but do not affect the disease outcome except possibly through the exposure. Such genetic variants are known as instrumental variables (IVs) (Greenland, 2000). Subgroups of individuals with differing numbers of alleles of a genetic IV can be thought of as having been randomized to receive a different mean level of the exposure during their life course (Davey Smith and Ebrahim, 2005; Nitsch et al., 2006). If the randomization is indeed uncontaminated (in the sense that a person's genetic subgroup is independent of all factors, except the exposure and any causal consequence of the exposure), then differences in the outcome between genetic subgroups can be causally attributed to the exposure (Didelez and Sheehan, 2007). The following three assumptions are necessary for a genetic variant to be a valid IV (Martens et al., 2006):

IV1: the variant is associated with the exposure;
IV2: the variant is independent of all confounders of the exposure-outcome association;
IV3: the variant is independent of the outcome conditional on the exposure and confounders.

Figure 1. Illustrative diagram representing the hypothesized relationships between genetic variant G_j, exposure X, disease Y, and confounders U when G_j is a valid instrumental variable (IV). Crosses indicate violations of assumptions IV2 and IV3 that potentially lead to invalid inferences from conventional methods.
The IV assumptions cannot be fully tested empirically, and so causal inferences must by necessity rest on untestable assumptions. The validity of a causal conclusion from a Mendelian randomization analysis depends on the plausibility of these assumptions.
Early implementations of the Mendelian randomization approach were largely constrained by limitations of power, as investigations were undertaken in small sample sizes and used only a handful of genetic variants (each explaining a small proportion of the variance in the exposure). However, a revolution in the field is under way, led by the identification of increasing numbers of genetic variants robustly associated with particular traits, and the public release by many large consortia of summary association estimates for hundreds of thousands of genetic variants with exposures and disease outcomes (Burgess et al., 2015b), such as the Global Lipids Genetics Consortium (GLGC) for lipid fractions (GLGC, 2013) and the CARDIoGRAM consortium for coronary artery disease (CAD) risk (CARDIoGRAMplusC4D Consortium, 2013). The availability of such summary data has enabled powerful Mendelian randomization investigations to be conducted in a two-sample framework by providing genetic associations precisely estimated in large sample sizes.
The inclusion of multiple variants in a Mendelian randomization analysis typically leads to increased statistical power (Freeman et al., 2013), but presents new challenges (Glymour et al., 2012). First, if there is substantial overlap in the datasets from which the association estimates with the exposure and with the outcome were obtained, then the resulting analysis suffers from bias and inflated Type 1 error rates when the included variants are "weak" (i.e., they do not explain a substantial proportion of variation in the exposure in the dataset under analysis) (Burgess et al., 2011; Pierce and Burgess, 2013). Second, it may not be the case that all included genetic variants are valid IVs.
In this paper we propose a new method for Mendelian randomization using summary data that offers protection against invalid instruments: the weighted median estimator. This approach can provide a consistent estimate of the causal effect even when up to 50% of the information contributing to the analysis comes from genetic variants that are invalid IVs. We explore the statistical properties of the weighted median estimator in a realistic simulation study, and compare it with an alternative summary data analysis method that is also robust to some violations of the IV assumptions, MR-Egger regression (Bowden et al., 2015). We explain how the two approaches differ in their assumptions, and when they each work well or fail. We provide an illustrative example of two-sample Mendelian randomization using summary data on the associations of 185 genetic variants with high-density lipoprotein cholesterol (HDL-c), low-density lipoprotein cholesterol (LDL-c), and triglycerides from the GLGC, and with CAD risk from the CARDIoGRAM consortium. We conclude with a discussion of the issues raised and the potential for future research.

Methods
Consider data from a Mendelian randomization study on J genetic variants G_1, ..., G_J, a continuous exposure X, and a continuous outcome Y. All confounding variables are subsumed into a single variable U. We initially assume that all genetic variants are valid IVs, and further assume that all the relationships between variables in Figure 1 are linear without heterogeneity or effect modification. Assumption IV1 tells us that all variants are associated with the exposure, so the gene-exposure associations γ_j ≠ 0 for all j. Assumptions IV2 and IV3 tell us that the genetic associations with the outcome Γ_j are equal to the genetic associations with the exposure γ_j multiplied by the causal effect of the exposure on the outcome β: that is, Γ_j = β γ_j. The error terms ε_Xj and ε_Yj are assumed to be normally distributed and contain contributions from the confounder U and all genetic variants except G_j. In a one-sample setting, the exposure and outcome data are collected on the same individuals, in which case ε_Xj and ε_Yj are correlated. If exposure and outcome data are collected on different sets of individuals (known as two-sample Mendelian randomization (Pierce and Burgess, 2013)), then these error terms are independent. We assume throughout this manuscript that all genetic variants are uncorrelated (i.e., not in linkage disequilibrium), so that the information provided by each genetic variant is independent. Extensions to allow for correlated variants in a straightforward application of Mendelian randomization have been developed, but the situation of uncorrelated variants is usual in applied practice.

Inverse-Variance Weighted Method
The causal effect of the exposure on the outcome can be estimated using the jth variant as the ratio of the gene-outcome and gene-exposure association estimates (Lawlor et al., 2008): β̂_j = Γ̂_j / γ̂_j. If the IV assumptions are satisfied for genetic variant j, then Γ_j = β γ_j and the ratio estimate is asymptotically consistent. Furthermore, if the genetic variants are uncorrelated (not in linkage disequilibrium), then the ratio estimates from each genetic variant can be combined into an overall estimate using a formula from the meta-analysis literature (Johnson, 2013):

β̂_IVW = (Σ_j γ̂_j² σ_Yj⁻² β̂_j) / (Σ_j γ̂_j² σ_Yj⁻²) = (Σ_j γ̂_j σ_Yj⁻² Γ̂_j) / (Σ_j γ̂_j² σ_Yj⁻²),

where σ_Yj is the standard error of the gene-outcome association estimate for variant j. This is referred to as the inverse-variance weighted (IVW) estimator. Provided that the genetic variants are uncorrelated, the IVW estimate is asymptotically equal to the two-stage least squares estimate commonly used with individual-level data. If all genetic variants satisfy the IV assumptions, then the IVW estimate is a consistent estimate of the causal effect (i.e., it converges to the true value as the sample size increases), as it is a weighted mean of the individual ratio estimates.
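To make the estimator concrete, here is a minimal sketch in Python; the summary statistics below are invented for illustration and are not taken from any dataset in this paper:

```python
import numpy as np

# Hypothetical summary statistics for J = 4 uncorrelated variants:
# gene-exposure associations gamma_hat, gene-outcome associations
# Gamma_hat, and standard errors se_Gamma of the latter.
gamma_hat = np.array([0.20, 0.15, 0.30, 0.25])
Gamma_hat = np.array([0.021, 0.014, 0.031, 0.024])
se_Gamma  = np.array([0.005, 0.006, 0.004, 0.005])

# Ratio estimate for each variant: beta_j = Gamma_j / gamma_j.
ratio = Gamma_hat / gamma_hat

# IVW estimate: weighted mean of the ratio estimates with inverse-variance
# weights gamma_j^2 / se_Gamma_j^2 (first-order delta method).
w = gamma_hat**2 / se_Gamma**2
beta_ivw = np.sum(w * ratio) / np.sum(w)
se_ivw = np.sqrt(1.0 / np.sum(w))
print(beta_ivw, se_ivw)
```

The same weighted mean is what a fixed-effect meta-analysis of the per-variant ratio estimates would return.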

Simple Median Estimator
The IVW estimate is an efficient analysis method when all genetic variants are valid IVs. Unfortunately, it will be biased even if only one genetic variant is invalid. For this reason, the IVW estimate can be said to have a 0% breakdown level. However, an estimator exists that enjoys a 50% breakdown level; that is, it provides a consistent estimate of the causal effect when up to (but not including) 50% of genetic variants are invalid. This simple estimator is the median ratio estimate (Han, 2008). Specifically, let β̂_j denote the jth ordered ratio estimate (arranged from smallest to largest). If the total number of genetic variants is odd (J = 2k + 1), the simple median estimator is the middle ratio estimate β̂_(k+1). If it is even (J = 2k), the median is interpolated between the two middle estimates: (β̂_(k) + β̂_(k+1))/2. In terms of notation, we assume in this manuscript that genetic variants are ordered according to their ratio estimates.
In order to understand why the median estimator achieves a 50% breakdown level, we consider a fictional analysis using 10 genetic variants, six of which are valid IVs and four of which are invalid. Figure 2 (left) shows a scatter plot of 10 gene-exposure (γ̂_j) and gene-outcome (Γ̂_j) association estimates for a Mendelian randomization study with a finite sample size (Kang et al., 2015). The ratio estimate for each genetic variant is the gradient of the line connecting the relevant datapoint for that variant to the origin. The true causal effect is shown by the dotted black line, the median estimate by the dashed line, and the IVW estimate by the solid line. Estimates from valid IVs are shown by hollow circles, estimates from invalid IVs by solid circles. Although the valid IVs follow the true slope, the IVW estimate is pulled away from the true value by the invalid instruments, which yield biased estimates of the causal effect. Figure 2 (right) shows the same scatter plot for an infinite sample size. Now the six valid instruments lie perfectly on the true line, and all yield the same true causal estimate. The median ratio estimate (in this case, an average of the fifth and sixth ratio estimates) is the true causal effect. In contrast, the IVW estimate remains biased even with infinite data, as the ratio estimate from each genetic variant always contributes toward the overall IVW estimate.
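The simple median estimator can be sketched in a few lines; the ratio estimates below are hypothetical, chosen so that the four largest come from invalid IVs, in the spirit of the example above:

```python
import numpy as np

# Ten hypothetical ratio estimates; the four largest come from invalid
# IVs whose estimates are biased upwards, as in the example of Figure 2.
ratio = np.array([0.09, 0.10, 0.10, 0.11, 0.11, 0.12, 0.25, 0.28, 0.30, 0.33])

# Simple median estimator: the middle ordered ratio estimate for odd J,
# or the mean of the two middle estimates for even J (here J = 2k, k = 5).
ordered = np.sort(ratio)
k = len(ordered) // 2
simple_median = 0.5 * (ordered[k - 1] + ordered[k])
print(simple_median)  # equals np.median(ratio)
```

Because six of the ten estimates cluster near the true slope, the median lands among the valid IVs even though the IVW mean would be dragged upwards.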

Weighted Median Estimator
The simple median estimator is inefficient, especially when the precision of the individual estimates varies considerably. In order to account for this, a weighted median can be defined as follows. Let w_j be the weight given to the jth ordered ratio estimate, and let s_j = Σ_{k=1}^{j} w_k be the sum of the weights up to and including that of the jth ordered ratio estimate. Weights are standardized, so that the sum of the weights s_J is 1. The weighted median estimator is the median of a distribution having estimate β̂_j as its p_j = 100(s_j − w_j/2)th percentile. For all other percentile values, we interpolate linearly between the neighboring ratio estimates. The contribution of the jth genetic variant to the empirical distribution is proportional to its weight w_j. The simple median estimator can be thought of as a weighted median estimator with equal weights. Although the simple median provides a consistent estimate of the causal effect if at least 50% of the IVs are valid, the weighted median will provide a consistent estimate if at least 50% of the weight comes from valid IVs. We assume that no single IV contributes more than 50% of the weight, otherwise the 50% validity assumption is equivalent to assuming that this IV is valid (in which case, an analysis should simply be based on this one IV). Some technical remarks on weighted medians are given in Supporting Information Appendix 1.

Table 1. Weights and percentiles of the empirical distribution function assigned to the ordered ratio instrumental variable estimates (β̂_j) for the hypothetical examples given in Figure 3.

Figure 3. Empirical distribution functions of ordered ratio instrumental variable estimates (β̂_j) used for calculation of the simple median estimate (black) and two weighted median estimates (shown in red and blue) using the weights given in Table 1.
As an illustration, two sets of weights are given in Table 1, and percentiles are calculated for each set of weights as well as for the simple median (equal weights). As the first set of weights is symmetric, the weighted median in this case equals the simple median. However, less weight is given to outlying estimates, and the empirical distribution function (Fig. 3, red line) is close to the median value across a wider range of the distribution. Confidence intervals for the weighted median, which can be obtained by a parametric bootstrap method, should therefore be narrower. In the second set of weights, smaller estimates happen to receive more weight (Fig. 3, blue line). The weighted median estimate will be interpolated between ratio estimates β̂_3 and β̂_4, but will be closer to β̂_4, as the percentile p_4 is closest to 50%. The exact weighted median estimate in this case will be β̂_3 + (β̂_4 − β̂_3) × (50 − p_3)/(p_4 − p_3). The weighted median can also be thought of as the simple median from a set of values (a pseudopopulation) in which the ratio estimate β̂_1 for variant 1 appears 100 × w_1 times, the ratio estimate β̂_2 for variant 2 appears 100 × w_2 times, and so on. R code to calculate weighted median estimates, confidence intervals, standard errors, and P-values is provided in Supporting Information Appendix 2.
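The percentile-and-interpolation definition above translates directly into code. The sketch below uses hypothetical ratio estimates and weights (not the values of Table 1), and assumes no single variant carries more than 50% of the weight:

```python
import numpy as np

def weighted_median(ratio, weights):
    """50th percentile of an empirical distribution that places the jth
    ordered ratio estimate at percentile p_j = 100 * (s_j - w_j / 2),
    with linear interpolation between these points."""
    order = np.argsort(ratio)
    b = np.asarray(ratio, dtype=float)[order]
    w = np.asarray(weights, dtype=float)[order]
    w = w / w.sum()                    # standardize weights so that s_J = 1
    p = 100 * (np.cumsum(w) - w / 2)   # percentile of each ordered estimate
    return np.interp(50, p, b)         # interpolate to the 50th percentile

# Hypothetical ratio estimates and unstandardized weights.
ratio = [0.09, 0.10, 0.11, 0.12, 0.25, 0.30]
weights = [4.0, 6.0, 5.0, 5.0, 1.0, 1.0]
print(weighted_median(ratio, weights))
# With equal weights the function reduces to the simple median:
print(weighted_median(ratio, [1.0] * 6))
```

A parametric bootstrap (resampling the summary associations from normal distributions centered on their estimates) can be wrapped around this function to obtain standard errors and confidence intervals.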
Analogously to the IVW method, we suggest using the inverse of the variance of the ratio estimates as weights. From the first-order delta method for the variance of the ratio of two random variables, Var(β̂_j) ≈ σ_Yj² / γ̂_j², so the (unstandardized) inverse-variance weight for variant j is w_j' = γ̂_j² / σ_Yj² (Thomas et al., 2007). Standardized weights are w_j = w_j' / Σ_k w_k'. The unstandardized weights are identical to those used in the IVW estimator. Only the first-order term from the delta expansion is used here; further terms could be considered, although we found that they did not substantially affect estimates or standard errors from the weighted median method.
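The delta-method weights amount to two array operations; the summary statistics below are again hypothetical:

```python
import numpy as np

# First-order delta method: Var(beta_j) ~ se_Gamma_j^2 / gamma_j^2, so the
# inverse-variance weight is w_j' = gamma_j^2 / se_Gamma_j^2.
gamma_hat = np.array([0.20, 0.15, 0.30])
se_Gamma  = np.array([0.005, 0.006, 0.004])

w_unstd = gamma_hat**2 / se_Gamma**2   # identical to the IVW weights
w_std = w_unstd / w_unstd.sum()        # standardized weights, summing to 1
print(w_std)
```

Note that stronger instruments (larger γ̂_j) and more precisely estimated outcome associations (smaller σ_Yj) both increase a variant's share of the weight.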

Penalized Weighted Median Estimator
In Figure 2 (left side), although the invalid IVs do not contribute directly to the median estimate, they do influence it. The simple median estimate in this case is an average of the fifth and sixth ratio estimates. If the invalid IVs were not present, the median estimate would be the average of the third and fourth ratio estimates. The presence of invalid instruments does not affect the median estimate asymptotically, but in this example it will bias the estimate in finite samples, as the fifth and sixth ratio estimates will always be larger than the third and fourth ratio estimates. This is likely to be a problem when the estimates from invalid IVs are not balanced about the true causal effect (as in this example, all four invalid estimates are greater than the true causal effect).
One way of minimizing this problem is to downweight the contribution to the analysis of genetic variants with heterogeneous ratio estimates. Heterogeneity between estimates can be quantified by Cochran's Q statistic:

Q = Σ_j Q_j = Σ_j w_j' (β̂_j − β̂)²,

where we take β̂ to be the IVW estimate (Greco et al., 2015). The Q statistic has a chi-squared distribution on J − 1 degrees of freedom under the null hypothesis that all genetic variants are valid IVs and the same causal effect is identified by all variants. Under this null hypothesis, the components of the Q statistic corresponding to the individual genetic variants (Q_j) approximately have chi-squared distributions with 1 degree of freedom. So as not to distort the weightings of the majority of variants, we propose penalization using the one-sided upper P-value (denoted q_j) of Q_j on a chi-squared distribution with 1 degree of freedom, multiplying the weight by 20 times the P-value (or by 1 if the P-value is greater than 0.05). The (unstandardized) penalized weights are therefore

w_j* = w_j' × min(1, 20 q_j).

This means that most variants will be unaffected by the penalization, but outlying variants will be severely downweighted.

Genetic Epidemiology, Vol. 40, No. 4, 304-314, 2016
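The penalization step can be sketched as follows, again with hypothetical data; `erfc` supplies the chi-squared(1) upper tail probability without requiring scipy:

```python
import numpy as np
from math import erfc

# Hypothetical ratio estimates and inverse-variance weights for six
# variants; the last variant is an outlier relative to the others.
ratio = np.array([0.10, 0.11, 0.09, 0.10, 0.12, 0.40])
w = np.array([900.0, 800.0, 700.0, 850.0, 750.0, 100.0])

beta_ivw = np.sum(w * ratio) / np.sum(w)   # IVW estimate
Q_j = w * (ratio - beta_ivw)**2            # per-variant contribution to Q

# One-sided upper P-value of Q_j on chi-squared(1): the survival function
# of a chi-squared variable with 1 df at x equals erfc(sqrt(x / 2)).
q = np.array([erfc(np.sqrt(x / 2)) for x in Q_j])

# Penalized weights: unchanged when q_j > 0.05, otherwise scaled by 20 q_j.
w_pen = w * np.minimum(1.0, 20 * q)
print(w_pen)  # only the outlying sixth variant is downweighted
```

The factor 20 makes the penalty continuous at the 0.05 threshold: a variant with q_j just below 0.05 keeps almost all of its weight, while a clear outlier is downweighted severely.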

MR-Egger Regression
An alternative robust method for Mendelian randomization with summary data has recently been proposed by Bowden et al. (2015), referred to as "MR-Egger regression." This approach was motivated by a method in the meta-analysis literature for the assessment of small-study bias (often called "publication bias") (Egger et al., 1997). It performs a weighted linear regression of the gene-outcome coefficients Γ̂_j on the gene-exposure coefficients γ̂_j:

Γ̂_j = β_0E + β_E γ̂_j,

in which all the γ̂_j associations are orientated to be positive (the orientation of the Γ̂_j associations should be altered if necessary to match the orientation of the γ̂_j parameters), and the weights in the regression are the inverse variances of the gene-outcome associations (σ_Yj⁻²). Reorientation of the variants is performed because the orientation of genetic variants is arbitrary (i.e., estimates can be presented with reference to either the major or minor allele), and different orientations of genetic variants change the estimate of the intercept, as well as the sign and magnitude of the pleiotropic effect of each genetic variant. If there is no intercept term in the regression model, then the MR-Egger slope estimate β̂_E will equal the IVW estimate (Burgess et al., 2015a).
The value of the intercept term β̂_0E can be interpreted as an estimate of the average pleiotropic effect across the genetic variants (Bowden et al., 2015). The pleiotropic effect is the effect of the genetic variant on the outcome that is not mediated via the exposure. An intercept term that differs from zero is indicative of overall directional pleiotropy; that is, pleiotropic effects do not cancel out and the IVW estimate is biased.
MR-Egger regression additionally provides an estimate for the true causal effect β̂_E that is consistent even if all genetic variants are invalid due to violation of IV3, but under a weaker assumption known as the InSIDE (instrument strength independent of direct effect) assumption. If the association of the jth genetic variant with the outcome is Γ_j = β γ_j + α_j, where α_j is the pleiotropic (direct) effect of the variant, then the InSIDE assumption states that the pleiotropic effects α_j must be distributed independently of the instrument strength parameters γ_j (Kolesár et al., 2014). (Formally, the consistency property holds as both the sample size and the number of instruments increase. For a fixed number of instruments, consistency only holds asymptotically if the correlation between the α_j and γ_j parameters is zero.) The InSIDE assumption is likely to be satisfied if pleiotropic effects on the outcome are direct (i.e., not via a confounder). There is some empirical evidence supporting the proposition that genetic effects on separate exposures are independent (Pickrell, 2015). However, if the pleiotropic effects of genetic variants are all via a single confounder, then they will be correlated with instrument strength, and the InSIDE assumption will be violated.
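MR-Egger regression is simply a weighted least-squares fit with an intercept. A sketch on hypothetical summary data, where `b0` and `bE` play the roles of β̂_0E and β̂_E:

```python
import numpy as np

# Hypothetical summary data with all gene-exposure associations orientated
# to be positive (Gamma_hat would be re-orientated to match in real data).
gamma_hat = np.array([0.10, 0.15, 0.20, 0.25, 0.30, 0.35])
Gamma_hat = np.array([0.015, 0.020, 0.024, 0.031, 0.034, 0.040])
se_Gamma  = np.array([0.005, 0.005, 0.004, 0.004, 0.003, 0.003])

# Weighted least squares of Gamma_j on gamma_j with an intercept, using
# weights 1 / se_Gamma_j^2: solve (X' W X) b = X' W y for b = (b0, bE).
w = 1.0 / se_Gamma**2
X = np.column_stack([np.ones_like(gamma_hat), gamma_hat])
WX = X * w[:, None]
b0, bE = np.linalg.solve(X.T @ WX, WX.T @ Gamma_hat)
print(b0, bE)
```

Here the data were constructed so that the intercept is positive, mimicking directional pleiotropy: the slope `bE` is the pleiotropy-adjusted causal estimate, while forcing `b0 = 0` would recover the IVW estimate.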

Simulation Study
In order to investigate the performance of the weighted median method in realistic settings, as well as to determine in which scenarios it performs well or badly in comparison with the IVW and MR-Egger regression methods, we perform a simulation study. We assume there are 25 genetic variants that are candidate IVs, and consider three scenarios:

1. Balanced pleiotropy, InSIDE assumption satisfied: pleiotropic effects are equally likely to be positive as negative, and these effects are uncorrelated with the instrument strength.
2. Directional pleiotropy, InSIDE assumption satisfied: only positive pleiotropic effects are simulated, and these effects are uncorrelated with the instrument strength.
3. Directional pleiotropy, InSIDE assumption not satisfied: pleiotropic effects act via a confounder; their effects on the outcome are therefore positive and are correlated with the instrument strength.
The status of a genetic variant as an invalid IV is determined by a random draw for each variant. The probability of being an invalid variant is taken as 0.1, 0.2, and 0.3. We consider cases with 10,000 and 20,000 participants.
We generated 10,000 simulated datasets for each scenario in a two-sample setting with two values of the causal effect (β = 0, null causal effect; β = 0.1, positive causal effect); the simulations are repeated in Supporting Information Appendix 3 in a one-sample setting. Only the summary data on the genetic associations with the exposure and with the outcome (and their standard errors) are used as data inputs in the analysis methods (individual-level data are not used). Details of the data-generating model and the parameters used in the simulation study are given in Supporting Information Appendix 3.
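A toy version of a single simulated replicate in the spirit of Scenario 2 (directional pleiotropy, InSIDE satisfied) is sketched below; all parameter values are illustrative assumptions, not those of the data-generating model in Supporting Information Appendix 3:

```python
import numpy as np

rng = np.random.default_rng(2016)

# One toy replicate: 25 variants, null causal effect, 7 invalid IVs with
# positive (directional) pleiotropic effects uncorrelated with strength.
J, beta = 25, 0.0
gamma = rng.uniform(0.05, 0.15, J)       # instrument strengths
invalid = np.zeros(J, dtype=bool)
invalid[:7] = True                       # 7 of 25 variants are invalid
alpha = np.where(invalid, rng.uniform(0.01, 0.02, J), 0.0)

se_G = 0.005                             # SE of the summary associations
gamma_hat = gamma + rng.normal(0, se_G, J)
Gamma_hat = beta * gamma + alpha + rng.normal(0, se_G, J)

# IVW estimate: pulled upwards by the directional pleiotropy.
ratio = Gamma_hat / gamma_hat
w = gamma_hat**2 / se_G**2
beta_ivw = np.sum(w * ratio) / np.sum(w)

# Weighted median estimate: the invalid variants occupy the upper tail of
# the ordered ratio estimates, so the median is far less affected.
order = np.argsort(ratio)
ws = w[order] / w.sum()
p = 100 * (np.cumsum(ws) - ws / 2)
beta_wm = np.interp(50, p, ratio[order])
print(beta_ivw, beta_wm)
```

Repeating such a replicate many times and recording the mean estimate and rejection rate of each method gives the quantities reported in the tables below.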

Simulation Results
The simulation results in the two-sample setting are given in Table 2 (null causal effect) and Table 3 (positive causal effect).

Table 2. Mean estimates, mean standard errors, and power of the 95% confidence interval to reject the null hypothesis for the inverse-variance weighted, weighted median, and MR-Egger regression methods in the simulation study for two-sample Mendelian randomization with a null (β = 0) causal effect. Abbreviations: IV, instrumental variable; SE, standard error.

In Scenario 1 (balanced pleiotropy), the methods all give close to unbiased causal estimates, and have reasonable Type 1 error rates (power under the null is equal to the Type 1 error rate). (If all genetic variants had the same magnitude of association with the exposure, then the MR-Egger regression estimate would not be identified.) In Scenario 2 (directional pleiotropy, InSIDE assumption satisfied), estimates from the IVW method are biased, with inflated Type 1 error rates. Estimates from the weighted median methods are less biased, although there is a consistent bias in the direction of the pleiotropic variants. Nominal Type 1 error rates are maintained when only 10% of genetic variants are invalid IVs, although Type 1 error rates are above the expected 5% rate when 20% or more genetic variants are invalid IVs. However, even though they are inflated, Type 1 error rates for the median-based methods are far lower than those from the IVW method. The penalized weighted median estimates are less biased when a few genetic variants are invalid, but when 30% of genetic variants are invalid, it seems that the invalid IVs are often retained in the analysis (as their estimates are homogeneous with each other) and valid IVs are excluded. Estimates from MR-Egger regression are close to unbiased, and Type 1 error rates are at nominal levels, suggesting its utility as a sensitivity analysis method when the InSIDE assumption is satisfied.
However, the wide confidence intervals and low power to detect a true causal effect (close to 5%) mean that MR-Egger regression is not likely to be a discerning sensitivity analysis, but rather to give conservative findings as to whether a causal effect is present or not. Although power to detect a causal effect is greater for the IVW method, this is achieved at the cost of the Type 1 error rate; comparisons of power that do not make reference to the Type 1 error rate are meaningless.
In Scenario 3 (directional pleiotropy, InSIDE assumption not satisfied), all methods suffer from bias and inflated Type 1 error rates. Most concerning are results from the MR-Egger method, which are more biased than those from the IVW method and have similar Type 1 error rates. The weighted median methods, and in particular the penalized weighted median method, give lower Type 1 error rates, suggesting their potential use as a sensitivity analysis method when InSIDE is not satisfied, as well as when it is satisfied.
In Supporting Information Tables A1 and A2, results are presented in a one-sample setting. Each of the methods suffers from weak instrument bias in this setting, and Type 1 error rates are inflated. Estimates from MR-Egger regression are substantially more biased than those from the other methods. However, the median-based methods remain a reasonable sensitivity analysis, as Type 1 error rates are similar to those of the IVW method in Scenario 1, and substantially lower in Scenarios 2 and 3. In Supporting Information Table A3, results are presented in a two-sample setting for the simple median estimator, and for a weighted median estimator using inverse-standard error weights rather than the inverse-variance weights used above. Simple median estimates are slightly less precise than those from the weighted median methods, and have similar bias and Type 1 error properties in Scenarios 1 and 2, but much improved Type 1 error rates in Scenario 3. This is because the invalid variants in this scenario are stronger than the valid variants, and so receive more weight in the weighted analyses. It is unclear whether this would occur in applied practice, although it suggests that the simple median method is a worthwhile additional sensitivity analysis. The inverse-standard error weighted median estimator has improved Type 1 error properties compared with the inverse-variance weighted median estimator, particularly in Scenario 3 (although not uniformly), indicating the potential to use different choices of weights in median-based methods to provide further sensitivity analyses.

Example: Lipid Concentrations and Coronary Artery Disease Risk
LDL-c is observationally positively associated with CAD risk (Di Angelantonio et al., 2009). Evidence from randomized trials of pharmaceutical interventions to lower LDL-c concentrations (such as statins (Pedersen et al., 1994; Cheung et al., 2004)), and from Mendelian randomization (Linsel-Nitschke et al., 2008; Do et al., 2013), suggests that this association reflects a causal effect of LDL-c on CAD risk. In contrast, although HDL-c is observationally inversely associated with CAD risk, interventions to raise HDL-c concentrations (Schwartz et al., 2012) and focused Mendelian randomization investigations (Voight et al., 2012) suggest that there is not even a moderate causal effect of HDL-c on CAD risk. However, a more liberal Mendelian randomization analysis including genetic variants associated with other lipid fractions (in particular, triglycerides) did suggest a causal effect of HDL-c on CAD risk (Holmes et al., 2015). As a proof-of-concept example, we use summary data from the GLGC on genetic associations with lipid fractions, and from CARDIoGRAM on associations with CAD risk, and perform the analysis methods discussed in this paper to see whether the causal effect of LDL-c is preserved by the weighted median and MR-Egger regression methods, as well as whether the spurious causal effect of HDL-c is contradicted by the robust methods. We use data provided by Do et al. (2013) on 185 genetic variants for this analysis. The genetic associations with the lipid fractions are in standard deviation units, and with the outcome are log odds ratios, so the causal effects represent log odds ratios per 1 standard deviation increase in the lipid fraction. Further details of the analysis are provided in Supporting Information Appendix 4.
Taking each of LDL-c, HDL-c, and additionally triglycerides as the exposure variable in turn, we consider two analysis strategies. First, we take all genetic variants associated with the exposure at a genome-wide level of significance (taken as P < 10⁻⁸). Second, we restrict to genetic variants for which the P-value for association with the target exposure (say, HDL-c) is less than the P-values for association with the nontarget exposures (say, LDL-c and triglycerides). We perform each of the methods presented in the manuscript (IVW, simple median, weighted median, penalized weighted median, MR-Egger regression). The data are presented as scatter plots in Figure 4 and Supporting Information Figure A1, and as funnel plots in Supporting Information Figure A2.
Results are given in Table 4. The causal effects of LDL-c and triglycerides are robustly detected by all the analysis methods. In contrast, for HDL-c, both MR-Egger regression and the weighted median methods suggest that the "causal effect" of HDL-c on CAD risk detected by the IVW method is suspect. With the exception of the simple median estimates, all other robust analysis methods give estimates compatible with the null. The test for directional pleiotropy in the MR-Egger regression method suggested pleiotropy for HDL-c, but not for LDL-c or triglycerides. Although similar results for HDL-c can be obtained by filtering out genetic variants that are suspected to have pleiotropic effects, omitting genetic variants entails a loss of power (e.g., in Holmes et al., a liberal genetic risk score explained 3.8% of the variance in HDL-c, whereas a conservative score omitting potentially pleiotropic variants explained only 0.3%).
By way of comparison between the robust methods, similar results were seen in this example as in the simulation study. The median-based methods were consistently and substantially more precise than the MR-Egger regression method, with standard errors reduced by around 30-50%. The precision of the weighted median methods, in particular the penalized weighted median method, was not much worse than that of the IVW method, and in some cases was slightly better. The simple median method was not as impressive, suggesting a causal effect of HDL-c on CAD risk, and giving less precise estimates than those from the weighted median methods. The MR-Egger regression method performed well despite doubts about the InSIDE assumption in the case of HDL-c; many variants associated with HDL-c are also associated with LDL-c and triglycerides, and these associations are approximately proportional (see Supporting Information Figure A1), hence pleiotropic effects on CAD risk may operate via LDL-c and triglycerides. This may be the reason why the MR-Egger estimate changed sign in the analysis using variants having primary association with HDL-c.

Discussion
In this paper, we have introduced a weighted median method for the estimation of a causal effect using multiple IVs. Unlike other methods commonly used in Mendelian randomization, this method can give consistent estimates even when up to 50% of the information in the analysis comes from invalid IVs. We have shown how the method performs in a simulation study and in an applied example. A summary of the median-based and other methods considered in this paper is given in Table 5. Of particular interest is the comparison between the weighted median approach and MR-Egger regression. MR-Egger regression can give consistent estimates when 100% of genetic variants are invalid IVs, whereas the weighted median method requires 50% of the weight to come from valid IVs. However, the weighted median approach allows the IV assumptions to be violated in a more general way for the invalid IVs, whereas MR-Egger regression replaces one set of untestable assumptions (IV2 and IV3) with a weaker, but still untestable, assumption (the InSIDE assumption). Additionally, weighted median estimates were substantially more precise than those from MR-Egger regression in the simulation study and applied example. MR-Egger regression estimates are likely to be particularly imprecise if all genetic variants have similar magnitudes of association with the exposure.
Although the only mechanism for generating invalid IVs considered in this paper was pleiotropy, median-based methods are agnostic to the mechanism by which the invalid IVs violate the IV assumptions. Consistent estimates would still be guaranteed if some genetic variants were invalid IVs due to other mechanisms, such as linkage disequilibrium, population stratification, or differential survival (Bochud et al., 2008). One potential source of bias that may not be resolved by the proposed robust methods is selection bias (Hernán et al., 2004). This could arise from differential selection of individuals into the datasets from which the genetic associations are obtained, from the selection of genetic variants based on their strength in the dataset under analysis, or if genetic variants were discovered as associated with the exposure in the dataset under analysis. Selection bias may affect all genetic variants in a given analysis, and so is unlikely to be addressed by the use of the median-based methods proposed in this paper.
We clarify that the objective of this paper is not to provide guidance on how to choose genetic variants for a Mendelian randomization analysis. This is an important question, but not one that can be answered in a general way, as it requires biological as well as statistical consideration of the specific analysis question in each case. The robust analysis methods presented in this paper weaken the assumptions necessary for the consistent estimation of a causal effect. However, they do not eliminate the need to assess the validity of the IV assumptions. This is particularly true for the median-based approaches, which still require that 50% of the information in the analysis comes from valid IVs. As a corollary, these robust methods should not be used to promote whole-genome analysis approaches for causal inference without further justification.
One assumption that we have made in this paper is that of no causal effect heterogeneity. This means that the same change in the outcome is expected no matter how the exposure is intervened on, and so the same causal effect is identified by different genetic variants that are valid IVs. In practice, it may be that genetic variants influence the exposure via different biological mechanisms. However, empirical data on the causal effect of LDL-c on CAD risk provide no indication of causal effect heterogeneity, despite the genetic variants having different mechanisms and substantially different magnitudes of association with LDL-c (Davey Smith and Hemani, 2014).
Other approaches for the estimation of causal effects with some invalid IVs have been proposed. Kang et al. have proposed a method based on penalized regression for detecting and accounting for invalid instruments that provides a consistent estimate of the causal effect if at least 50% of the genetic variants are valid (Kang et al., 2015). Han has proposed a similar penalized estimator within the generalized method of moments framework, again with a 50% breakdown level (Han, 2008). Kolesár et al. have proposed a method within the framework of k-class estimators with a 100% breakdown level under the InSIDE assumption (Kolesár et al., 2014). The first two methods are similar to the median-based approaches proposed here, and the final method is similar to MR-Egger regression. An alternative approach was proposed by Windmeijer et al. based on Hansen's overidentification test, a test of the homogeneity of the ratio estimates from different candidate IVs (Windmeijer et al., 2015). The basic idea of this approach is to report a causal estimate based on genetic variants whose ratio estimates are mutually similar. A related approach based on step-wise selection using a heterogeneity test statistic was proposed by Johnson (2013). Although the exclusion of certain variants to consider their potential impact on the overall estimate may be a reasonable sensitivity analysis, a potential danger of such an approach is that of post hoc or data-driven analysis, in which genetic variants are cherry-picked for inclusion in the analysis model and dissenting variants are filtered out.
[Table 5 fragment, MR-Egger regression entry (IV2 ✗, IV3 √): Consistent when 100% of genetic variants are invalid, but requires variants to satisfy a weaker assumption (the InSIDE assumption). This assumption is not automatically violated by an association between a genetic variant and a confounder, but it would be violated if several variants were associated with the same confounder. Substantially less efficient than IVW and median-based methods, and more susceptible to weak instrument bias in a one-sample setting.]
Table 5 note: Breakdown refers to the breakdown level, the proportion of information that can come from invalid instrumental variables (IVs) before the method gives biased estimates. IV2 and IV3 refer to whether violations of the second (no association with confounders) and third (no direct effect on the outcome) instrumental variable assumptions are allowed (√) or not allowed (✗).
We have chosen in this paper to focus on the two-sample situation using summary data, as this is most relevant to Mendelian randomization; all of the other methods were developed for use with individual-level data in a one-sample setting, and hence are not considered further in this manuscript. We look forward to further methodological developments in this area. One area that may be fruitful for such developments is bias-correction methods from the meta-analysis literature; two particular examples are the trim-and-fill method (Duval and Tweedie, 2000) and the use of pseudodata (Bowden et al., 2006). Equivalently, the median-based methods proposed in this paper may find application in the field of meta-analysis, to provide a more robust pooled estimate.
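The heterogeneity-based selection idea can be sketched with Cochran's Q statistic computed from the per-variant ratio estimates. The summary data below are invented for illustration; the per-variant contributions show how a dissenting variant would be flagged, and also why iterating such exclusions invites exactly the post hoc filtering cautioned against above.

```python
import numpy as np

def cochran_q(ratio, se_ratio):
    """Cochran's Q for heterogeneity among per-variant ratio estimates,
    returning the total and each variant's contribution."""
    w = 1.0 / se_ratio ** 2
    b_ivw = np.sum(w * ratio) / np.sum(w)  # IVW pooled estimate
    contrib = w * (ratio - b_ivw) ** 2     # per-variant contributions to Q
    return contrib.sum(), contrib

# illustrative ratio estimates: the fourth variant dissents from the rest
ratio    = np.array([0.50, 0.48, 0.52, 1.50])
se_ratio = np.array([0.05, 0.06, 0.05, 0.10])

q, contrib = cochran_q(ratio, se_ratio)
flagged = int(np.argmax(contrib))  # the variant such schemes would exclude first
```

Under homogeneity, Q is approximately chi-squared with (number of variants - 1) degrees of freedom; here the fourth variant dominates the statistic.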
From a practical perspective, it is important to acknowledge the limitations of all methods for obtaining causal inferences. Our aim in presenting the median-based methods in this paper is not to recommend a single authoritative method for all Mendelian randomization analyses. Rather, examining the results from different methods that make different assumptions (IVW, simple median, weighted median, MR-Egger regression) provides a sensitivity analysis that either adds to or questions the robustness of a finding from a Mendelian randomization investigation. If, as in the case of the effect of LDL-c on CAD risk, a causal effect is reported across all methods, then a causal finding is far more plausible than if the methods give contradictory findings. Our advice in Mendelian randomization investigations using multiple genetic variants where the IV assumptions are in doubt for some or all genetic variants, would therefore be to perform and report results from a range of sensitivity analyses using robust methods, including the simple median, weighted median, and MR-Egger regression methods, in addition to the main analysis result.
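As one component of such a sensitivity analysis, MR-Egger regression can be sketched as a weighted linear regression of the variant-outcome associations on the variant-exposure associations with an unconstrained intercept. This is a simplified sketch, not the authors' full implementation: variants are assumed to be oriented so that the exposure associations are positive, and simple first-order inverse-variance weights are used.

```python
import numpy as np

def mr_egger(beta_x, beta_y, se_y):
    """MR-Egger regression: weighted least squares of beta_y on beta_x with an
    intercept. The slope estimates the causal effect; an intercept away from
    zero indicates directional pleiotropy. Assumes variants oriented so that
    beta_x > 0 for all variants."""
    w = 1.0 / se_y ** 2
    X = np.column_stack([np.ones_like(beta_x), beta_x])  # [intercept, slope] design
    WX = X * w[:, None]
    intercept, slope = np.linalg.solve(X.T @ WX, WX.T @ beta_y)
    return intercept, slope
```

When the variant-outcome associations are exactly proportional to the variant-exposure associations (no pleiotropy), the intercept is zero and the slope recovers the causal effect; comparing this slope with the IVW and median-based estimates is the sensitivity analysis advocated above.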