A framework for the investigation of pleiotropy in two‐sample summary data Mendelian randomization

Mendelian randomization (MR) uses genetic data to probe questions of causality in epidemiological research, by invoking the Instrumental Variable (IV) assumptions. In recent years, it has become commonplace to attempt MR analyses by synthesising summary data estimates of genetic association gleaned from large and independent study populations. This is referred to as two‐sample summary data MR. Unfortunately, due to the sheer number of variants that can be easily included into summary data MR analyses, it is increasingly likely that some do not meet the IV assumptions due to pleiotropy. There is a pressing need to develop methods that can both detect and correct for pleiotropy, in order to preserve the validity of the MR approach in this context. In this paper, we aim to clarify how established methods of meta‐regression and random effects modelling from mainstream meta‐analysis are being adapted to perform this task. Specifically, we focus on two contrastin g approaches: the Inverse Variance Weighted (IVW) method which assumes in its simplest form that all genetic variants are valid IVs, and the method of MR‐Egger regression that allows all variants to violate the IV assumptions, albeit in a specific way. We investigate the ability of two popular random effects models to provide robustness to pleiotropy under the IVW approach, and propose statistics to quantify the relative goodness‐of‐fit of the IVW approach over MR‐Egger regression. © 2017 The Authors. Statistics in Medicine Published by JohnWiley & Sons Ltd


Introduction
The fundamental aim of Epidemiology is to determine the root causes of illness, with the focus of many epidemiological analyses being to examine whether an environmental exposure modifies the severity, or the risk of, disease. Of course, it is well known that causal conclusions cannot strictly be drawn from mere statistical associations between an exposure and outcome, unless all possible confounders of the association are identified, perfectly measured and appropriately adjusted for. Mendelian randomization (MR) [1,2] offers an alternative way to probe the issue of causality in epidemiological research, by using additional genetic information satisfying the Instrumental Variable (IV) assumptions. Briefly, the genetic data must both predict the exposure and predict the outcome only through the exposure. Figure 1 shows an illustrative causal diagram relating a single nucleotide polymorphism (SNP) G j to an exposure, X and outcome, Y, in the presence of an unmeasured confounder, U. Here, a single U is used to denote the combined influence of all unmeasured confounders.
To state the IV assumptions more formally: the jth SNP must be: associated with X (IV1); independent of U (IV2) and independent of Y given X and U (IV3). These assumptions are encoded by the arrows in Figure 1. If IV1-IV3 are satisfied for variant G j , then traditional IV methods can be employed to reliably test for a causal effect using G j , X and Y alone, without any attempt to adjust for U at all.  of, extend and connect the work of Rücker et al. [14] and Henmi and Copas [15] for probing the issue of small study bias in meta-analysis with the work of Del Greco et al [16] and Bowden et al [9,17] for detecting and accounting for pleiotropy in summary data MR analysis. In Section 2, we clarify the assumptions necessary for a two-sample summary data MR analysis, by starting from an underlying model for data at the individual level. In Sections 3 and 4, we explore how the IVW and MR-Egger methods can be applied to pleiotropy-affected summary data, by drawing on the meta-analytical framework of Rücker et al [14]. In Section 5, we look at the issue of estimating the causal effect in practice, when some of the crucial assumptions are violated. In Section 6, we illustrate the methods discussed using an application that probes the causal effect of plasma urate levels on cardiovascular disease risk. We conclude with a discussion and point to future research.

Modelling assumptions
We start by writing the assumed data generating model for subject i out of N, linking genetic data on L uncorrelated variants G i1 , … , G iL ≡ G i. , to their exposure X i and outcome Y i in the presence of an unobserved summary confounder U i , as represented in Figure 2. X, Y and U are assumed to be continuous and G ij represents the number of alleles of SNP j assigned to subject i (0,1 or 2) at the point of conception. For simplicity, we ignore intercept terms in the models below: Here, represents the causal effect of X on Y that we wish to estimate and j represents the underlying true association between SNP j and the exposure X (i.e. not through U). We assume that the genetic data has been coded to reflect the number of exposure increasing alleles (be they 'minor' or 'major'), so that j ⩾ 0. We define IV1 to be satisfied for variant j when j is non-zero. Conversely, when they are nonzero, the parameters j and j summarise all violations of IV2 and IV3 for SNP j, respectively. x and y represent the influence of all confounders on the exposure and outcome, respectively. Only when x and y are both non-zero can confounding take effect, thus biasing a traditional observational analysis so that the association between X and Y is not equal in general to (hence the need for IV methods). In the main body of this paper, we will interpret all parameters indexing these (and subsequent) models as fixed values in the 'classical' sense. Alternative Bayesian formulations are of course possible, a topic we return to in the discussion.
The focus of this paper is to consider the case where only summary data estimates of the SNP-exposure and SNP-outcome associations and their standard errors are available for each of the L variants. In order to derive models for these two sets of L summary data estimates, we firstly define G −j as the (L−1)-length vector of genetic variables without G j . We now re-write models (2) and (3) as models for X|G j and Y|G j only, by marginalising over (U, G −j ) and (X, U, G −j ), respectively: Now the coefficients of G ij in (4) and (5)

Two-sample Mendelian randomization
It is becoming increasingly common to perform summary data MR analyses using sets of genetic association estimates with the exposure and outcome that have been gleaned from separate samples. The most natural way of justifying this approach is to assume that the SNP-exposure and SNP-outcome samples, whilst being independent, are homogeneous in the sense that models (4) and (5) hold for both. We will refer to them as sample '1' and '2', with N 1 and N 2 subjects, respectively. For clarity, we now write explicit models for the two sets of L SNP-exposure and SNP-outcome association estimates from the two samples. We refer to the jth SNP-exposure association estimate aŝj (with variance 2 Xj ) and the jth SNP-outcome association estimate asΓ j (with variance 2 Yj ): From (4) and (5), the magnitude of 2 Xj and 2 Yj are affected by the allele frequencies of SNPs in G −j and the sample size (N 1 or N 2 ). Because they relate to homogeneous but independent samples, Xj and Yj can be considered as mutually independent. Finally, although they must be estimated from the data, it is common practice to assume that the variances 2 Xj and 2 Yj are known. We will refer to the various assumptions and simplifications made in Section 2.1 as the 'Two-Sample Assumptions' (TSA).
We now state some additional assumptions that are commonly utilised by the IVW or MR-Egger approaches for causal inference (vis estimation of ), based on models (6) and (7) and the TSA. All of these assumptions are re-stated for clarity in Table I Xj is small enough to be treated as zero. This would be strictly true if N 1 were infinite, but is frequently reasonable because large study sizes (often over 100 000) are now common and rising year-on-year. This is referred to as the NO Measurement Error (NOME) assumption in Bowden et al [17].

The InSIDE assumption.
If assumptions IV2 and IV3 are satisfied for variant j ( j = j = 0), then the total pleiotropic effect of variant j on the outcome is also zero. However, their violation does not necessarily preclude valid causal inference: consistent estimation of the causal effect is still possible if the  (6) and (7) known. NOME assumption 2 Xj = 0: Negligible uncertainty in SNP-exposure assxociation.

Perfect InSIDE
Sample covariance cov( j , j ) = 0 for data at hand.
VIS assumption (under IV2) j ≠ i for some i and j not equal to i Some variation in instrument strengths present.
magnitude of the pleiotropy in equation (7), j +k y j , is independent of the SNP-exposure associations, j + k x j . This was first identified as a crucial assumption for causal inference in the econometrics literature by Kolesar et al [18] and was independently derived for use in MR by Bowden et al [9], who termed it the InSIDE assumption (Instrument Strength Independent of Direct Effect). Because j is a common factor of both the instrument strength and pleiotropy terms, by far the most natural way to imagine that InSIDE could hold (even in principle) is if IV2 holds ( j = 0) and the magnitude of the pleiotropy not via U ( j ) is independent of j across the instruments. We will therefore adopt this convention when referring to InSIDE from now on. If the InSIDE assumption is satisfied then the sample covariance of the j s and j s will tend to zero as the number of variants, L, increases. We will refer to this as 'general' InSIDE. However, if the sample covariance of the j s and j s is exactly zero for the data at hand, we will say that InSIDE is 'perfectly' satisfied.

The VIS assumption.
If the SNP-exposure association parameter estimateŝ1, … ,̂L were all identical, then their sample variance would be zero. If this were the case, it would rule out using thêjs as an explanatory variable in any regression model (specifically MR-Egger regression). We therefore define the VIS (Variation in Instrument Strength) assumption as being satisfied when the aforementioned sample variance of the true parameter values 1 , … , L is non-zero.

NOME, InSIDE and VIS under a weighted analysis.
In practice when implementing the IVW and MR-Egger regression approaches, the analysis has traditionally been weighted by the standard error of the SNP-outcome association estimates, Yj , to improve efficiency. For consistency, this is the approach we will subsequently follow. In order to define the NOME, InSIDE and VIS assumptions within the context of this weighted analysis (or any other weighted analysis), it is necessary to divide the relevant parameter by the chosen weight. For example, for VIS to be satisfied in a analysis weighted by Yj , we require some variation between the weighted instrument strength terms j ∕ Yj . Similar transparent modifications would also be required for NOME and InSIDE.

Mendelian randomization under violations of IV3
In this section, we discuss the motivation for using the meta-analytical methods of IVW and MR-Egger regression, when all the assumptions outlined in Table I are satisfied except IV3. We will therefore use the following simplified versions of models (6) and (7) as our starting point: Data generating model under: IV1-2, TSA, NOME, InSIDE and VIS SNP-exposure:̂j = j (8) In order to easily discuss violations of IV3, we define the sample mean and variance of the terms as and 2 , respectively. Specifically, we shall investigate four important special cases of model (9) that are defined and differentiated by the chosen values of and 2 . Under model (9), the j term is the sole source of additional variation in the SNP-outcome association estimates (given the SNP-exposure estimates), apart from the residual error term Yj . There may of course be factors inducing additional heterogeneity in the SNP-outcome associations other than pleiotropy, so in practice we can think of j as representing the combination of all these sources. For simplicity, we will continue to refer to this as pleiotropy.
We start by assuming that IV3 actually holds for all variants in model (9), which is consistent with j = = 2 = 0. We call this case (a) and it is the natural starting point for an MR analysis. In this case, model (9) can be simplified to the following linear model with no intercept: Both sides of the equation are divided by Yj in order to make the variance homoscedastic and clarify the mechanics of model fitting under a weighted analysis. An overall estimate can be calculated from model (10) using standard regression theory, which yields the well known meta-analytic formulae: In the MR context, this is referred to as the IVW estimate [6,12]. Formula (11) can alternatively be derived by calculating a weighted average of the set of ratio estimates of causal effect,̂j =Γ j ∕ j , obtained using each SNP in turn, where the weight given tôj is the inverse of its variance, 2 Yj ∕ 2 j , as in a fixed effect meta-analysis.
It is precisely assumptions IV2 and IV3 that dictate the intercept in (10) should be constrained to zero. The NOME assumption dictates the simple form for var(̂j) because this means that the denominator of j can be treated as a constant (even though in truth it is the random variablêj).

Testing and accounting for balanced pleiotropy
The variance of the IVW estimate under case (a) (the fixed effect model), is given by: The value of 1 the numerator in (12) is a direct consequence of the fact that the variance of the residual error in model (10) is equal to 1. As suggested by Del Greco et al [16], the data can be scrutinised to assess this assumption by calculating Cochran's Q statistic with respect to the L IV estimates as follows: If true, Q follows a 2 distribution on L-1 degrees of freedom. If the L estimates exhibit over-dispersion or heterogeneity (as will often happen) then, under model (9), this must be due to pleiotropy and an extension to the basic model is required. The most natural first extension, we argue, would be to allow for 'balanced' pleiotropy. That is, we relax IV3 to allow non-zero j s but assume they have zero mean (e.g. = 0, 2 > 0). We call this case (b) and re-write model (10) as: Model (14) is justified for any L under Perfect InSIDE, because this allows pleiotropy to be treated as independent residual error with respect to the explanatory variable j . It is only strictly valid under General InSIDE as L → ∞. Fitting model (14) is equivalent to performing a standard additive random effects meta-analysis [19,20]. That is, we can obtain a new IVW point estimate and variance for the causal effect under model (14) by applying formulae (11) and (12) and replacing w j = 2 j / 2 Yj with w * j = 2 j /( 2 Yj + 2 ) using a plug-in estimate for 2 (for example the DerSimonian and Laird estimate [19]). We will refer to the IVW estimate obtained by fitting model (14) aŝA RE IVW . The two IVW estimateŝI VW and ARE IVW do differ, because under the latter the variance component 2 is used to re-weight the contribution of the jth IV estimate. The variance of̂A RE IVW is additionally inflated via w * j to account for heterogeneity due to pleiotropy. When a large value of 2 is estimated from the data, this will have the effect of evening out the weight given to each ratio estimate in the analysis by making them a less direct function of each estimate's precision.

Multiplicative random effects models.
The following multiplicative random effects model [21,22] could instead be applied to the summary data estimates in the presence of observed heterogeneity: If adopted, model (15) returns the same point estimate for as under the fixed effect IVW model,̂I VW . That is, the variance component 2 is not allowed to influence its point estimate. However, the addition of the scale parameter 1∕2 allows the variance of̂I VW to increase when heterogeneity is detected. Specifically, is estimated from the variance of the residual error usinĝI VW = Q L−1 where Q is defined in (13). The variance of the IVW estimate under model (15) is then set to so that it iŝI VW times the variance under the fixed effect model -case (a).
If the additive pleiotropy model (14) is actually correct then the residual error of the multiplicative pleiotropy model (15) is miss-specified, because the constant term 1∕2 should be replaced with , which will usually result in a small loss of efficiency. One reason to prefer the multiplicative model in the general meta-analysis context, is that̂I VW is known to be more robust to small study bias than̂A RE IVW [15]. We will subsequently show that the same principle holds true in the MR context, when 'directional' pleiotropy is present in the data, a notion to which we now turn our attention.

Testing and accounting for directional pleiotropy
'Directional' pleiotropy occurs when the mean value of the pleiotropy distribution, , is non-zero [9]. We will show how this can be addressed by performing MR-Egger regression. We first consider the special case where is non-zero but 2 = 0. This means that j = for all j in model (7). Although this appears highly unlikely, it is actually what MR-Egger regression (as proposed in [9]) assumes in its simplest form. We call this case (c), which can be written aŝ MR-Egger regression fits model (16) under case (c) to give an estimate for the intercept and causal effect . In order to distinguish them from the IVW estimate, we will refer to the MR-Egger estimate for the causal effect aŝ1 E and the mean pleiotropic effect aŝ0 E , where The variance of the MR-Egger estimate is equal to the lower diagonal element of (Z T Z) −1 but when Y1 , … , YL are all exactly equal, this reduces to This simple formula indicates that the variance of̂1 E is essentially dictated by the variance of the SNP-exposure associations. It also reinforces the importance of the VIS assumption for MR-Egger: its causal estimate is simply undefined when no such variation exists.
Case (c) allows for directional pleiotropy but assumes that the pleiotropic effect is the same across all variants. This means that pleiotropy induces bias but no additional heterogeneity, which implies that the residual error about model (16) equals 1. We can test the plausibility of this assumption via a simple adaptation of the Q ′ statistic due to Rücker et al [14] to our context: where w j = 2 j 2 Yj as before. If case (c) is correct then Q ′ should follow a 2 distribution with L-2 degrees of freedom. If a non-zero intercept̂0 E is estimated and the Q ′ statistic is significantly larger than L − 2, a more reasonable model might be that directional pleiotropy is inducing bias and additional heterogeneity into the MR summary data estimates ( ≠ 0, 2 >0). We call this case (d) and re-write model (16) accordingly as an additive random effect model [14]: As for case (b), model (19) is valid for any L under perfect InSIDE and as L → ∞ under General InSIDE. Updated estimateŝA RE 0E and̂A RE 1E could be obtained by changing the definition of the design matrix Z so that the jth row is equal to and where 2 is substituted with a suitable estimate. For the reasons already outlined in the case of the IVW model, we have so far preferred not to use the additive pleiotropy model (19) in practice to fit MR-Egger regression. We have instead opted to preserve the original fixed effect MR-Egger point estimates, and to scale up their variance if heterogeneity is detected, by fitting the multiplicative model: Under model (20), additional heterogeneity can be taken into account by estimating viâE = Q ′ ∕ (L − 2) and scaling up the variance of the MR-Egger estimates to bêE times their value under case (c).

When will the IVW and MR-Egger methods be unbiased?
Table II attempts to clarify the additional assumptions that are required for unbiased estimation of by the IVW and MR-Egger methods under cases (a) to (d). We use the word 'additional' to distinguish them from baseline assumptions (which we define as TSA, NOME and IV2) as well as those that are implied by each case (such as IV3 in case (a)). If a specific case automatically invalidates a method, so that unbiased estimation is not possible in general, we mark it with an '×'. Under case (a) (no pleiotropy), assumptions IV2 and IV3 hold which means that Perfect InSIDE is satisfied because cov(0, j ) ≡ 0. The IVW method therefore only requires IV1 to hold and MR-Egger only requires VIS to hold in order to be unbiased.
Under case (b) (balanced pleiotropy), both methods are consistent (asymptotically unbiased as L → ∞) under the General InSIDE assumption because heterogeneous pleiotropy contributes to the residual error. Under cases (c) and (d) (directional pleiotropy), the IVW estimate cannot consistently estimate in general because it does not adjust for a non-zero mean pleiotropic effect. Under case (c), Perfect InSIDE is satisfied because cov( , j ) = 0. The only additional assumption required by MR-Egger for unbiased estimation is VIS. Under case (d), MR-Egger requires both VIS and General InSIDE to be consistent.

Can we navigate between models using Rücker's framework?
Moving from the underlying model in case (a) to those in (b), (c) and finally (d) represents a natural way to progressively relax assumption IV3 using heterogeneity statistics when performing MR. Following the approach of Rücker et al in the general meta-analysis context, we would start by assuming all genetic variants are valid -case (a) -but move to case (b) if Cochran's Q is sufficiently extreme with respect to a 2 L−1 distribution, in order to allow for balanced pleiotropy. We would then assess whether directional pleiotropy is present by fitting model (c) and calculate the heterogeneity about this model via Rücker's Q ′ . If the difference Q − Q ′ is sufficiently extreme with respect to a 2 1 distribution, we would infer that directional pleiotropy is an important factor, and adopt model (c). Finally, we would assess whether Q ′ is sufficiently extreme with respect to a 2 L−2 distribution. If so, we would opt for model (d) to account for the fact that pleiotropy is inducing heterogeneity and bias.
Unfortunately, this framework does not seamlessly translate to the MR setting, because IVW model (10) is not strictly nested within MR-Egger model (16). This means that Q − Q ′ is only approximately 2 1 distributed. Q ′ may in fact be larger than Q, making Q − Q ′ negative. The equivalent IVW and Egger regression models assumed by Rücker et al, are nested so that the equivalent Q − Q ′ difference has the required distribution.
Therefore, we can only informally use a Q − Q ′ to decide whether MR-Egger regression represents a better fit than IVW. We therefore suggest the user considers moving from IVW model's (a) and (b) to MR-Egger model's (c) and (d) when Q − Q ′ is large and positive, and the MR-Egger intercept parameter estimate is sufficiently far from zero and precise.  for over-dispersion due to balanced pleiotropy by moving to model (b). Because the MR-Egger estimate is identical to the IVW estimate, this means that̂0 E = 0 and Q − Q ′ = 0. Therefore, no evidence exists to support moving to model (c) and the MR-Egger estimate. Figure 3 (right) shows an illustrative scatter-plot of SNP-outcome and SNP-exposure associations consistent with case (c) (solid black dots, homogeneous directional pleiotropy) and case (d) (hollow black dots, heterogeneous directional pleiotropy). Under both cases (c) and (d) the points are consistent with a regression line with the same non-zero intercept. Under case (c),̂E is equal to 1 whereaŝI VW is much larger (4 in this instance). Under case (d)̂E is equal to 2 whereaŝI VW is again larger (5 in this instance). Now, the solid black data would support a move from model (a) (and the IVW estimate) to model (c) (and the MR-Egger estimate) because Q,̂0 E and Q−Q ′ are 'large', but Q ′ on its own is not. The hollow black dots would additionally yield a large enough Q ′ in order to opt for model (d), thereby suggesting inference should be based on the MR-Egger estimate accounting for heterogeneity and bias due to pleiotropy.

An illustrative example
As in the general meta-analysis context, heterogeneity statistics such as Q and Q ′ are only strictly valid under the assumption that the precision of SNP-exposure and SNP-outcome estimates are known, despite being estimated from the data. Likewise, their power to 'detect' heterogeneity will be small when the number of genetic variants, L, is small. This could mean that, even if model (d) were true, it may often not be possible to infer this by rejecting models (a), (b) and (c) at the historically popular 5% significance level. Furthermore, choosing a model (and by extension a causal estimate) that depends on the value of a statistical test (i.e 'testimation' [23,24]) could also induce unwanted bias. Additional research is required to fully understand the operating characteristics of the schema suggested here in realistic data settings, especially the use of heterogeneity statistic estimators, before they can be confidently used in practice. Figure 3 (left), we see strong evidence that the IVW estimate is a reliable measure of causal effect. Because it is also guaranteed to be more precise, there is no reason whatsoever to prefer the MR-Egger estimate. Conversely, in Figure 3 (right), we see strong evidence that the IV assumptions have been violated in a meaningful way and the MR-Egger approach has more appeal. We therefore seek a statistic that will favour the IVW approach a priori unless MR-Egger regression is a demonstratively better fit to the data. A simple statistic, Q R encapsulates this reasoning, and is again closely connected to the work of Rücker et al [14]. It simply measures the relative value of the residual heterogeneity under both the IVW and MR-Egger methods:

A measure of relative fit. In
Q R will generally lie in the unit interval, but because the MR-Egger and IVW models are non-nested, in rare cases the inequality 0 ⩽ Q ′ ⩽ Q will not hold. In Figure 3 (left), we see that (under case (a) or (b)) Q R is approximately equal to 1, which tells us that the IVW method is to be preferred over MR-Egger. In Figure 3 (right), we see that Q R is equal to 1/4 in case (c) and 2/5 in case (d), indicating that inferences from the MR-Egger approach should be given due consideration. In the limit, if the IVW approach was an increasingly bad fit to the data relative to MR-Egger, then Q R would tend to zero and MR-Egger should be the preferred choice.

Estimation under violations of IV2 and IV3
In Section 3, we assumed that IV2 held to demonstrate how established meta-analytic methods can, in theory, be transparently adapted to the MR setting. We now explore, from a theoretical standpoint, how IV2 and IV3 violation distort the causal estimand identified by the IVW and MR-Egger approaches. To do this, we return to and assume models (8) and (9) hold, set x = y = 1, but allow the j terms to be non-zero. When this is the case, the true estimand for the causal effect identified by variant j using the ratio method is: We now make an additional simplifying assumption that the SNP-outcome standard error is constant across variants ( Yj = Y ). When this is true, the fixed effect IVW estimand is equal to and the fixed effect MR-Egger estimand is equal to where E[.], cov(.) and var(.) refer to sample expectations, covariances and variances, respectively. If IV2 were satisfied ( j = 0 for all j,) then the MR-Egger estimand tends to as L grows large under General InSIDE because the numerator of the bias term in (24) will tend to zero. If, in addition, the sample mean of the j s were zero (General InSIDE plus balanced pleiotropy) then the IVW estimand would also tend to as L grew large because the numerator of the bias term in (23) would tend to zero. In order to investigate the bias of the IVW and MR-Egger estimators under increasing IV2 violation, we explore the value of their underlying estimands, by firstly generating three independent fixed vectors of L = 50 , and parameters with means 1/2, 2 and 1/2, respectively. For clarity, care was taken to ensure that the sample covariance of all 50 and parameters was very close to zero, so that Perfect InSIDE is satisfied across all variants when the j terms are 0 (IV2 satisfied). The full parameter vectors are given in Table A1 in the Appendix. In addition, Figure A.1 in the Appendix plots the bias term of the MR-Egger causal estimand in equation (24) under IV2 when the included set of instruments is increased sequentially from 2 to 50. This is shown to illustrate the point that, even when j and j are generated by independent processes (i.e. under General InSIDE), a sufficiently large number of variants is needed to ensure that their scaled sample covariance (i.e. the MR-Egger bias term) settles at 0. In Figure 4 (left), we plot the vector versus the vector to show the instrument strength and pleiotropic effect parameters are uncorrelated under IV2 and Perfect InSIDE. We label the axes with the addition of '0 × ' to stress why this is the case.
In Figure 4 (right), we plot + 3 versus + 3 to show the instrument strength and direct pleiotropic effect parameters when IV2 and InSIDE are violated. In this case, non-zero parameters induce a strong positive correlation between the combined instrument strength and pleiotropy terms. We now define the sensitivity parameter That is, A equals the correlation between the instrument strength and direct effect due to pleiotropy when j is set to A × j . In Figure 4 (left), 0 = 0 and in Figure 4 (right), 3 = 0.62. In Figure 5 (left), we plot the value of IVW and 1E in equations (23) and (24) as a function of A and A for = 0. Their values can therefore be interpreted as a bias. Because the mean pleiotropic effect is nonzero, IVW is biased even under InSIDE, whereas 1E is not. However, we see that 1E is more strongly  affected by violations of InSIDE than IVW . When A ⩾ 4.2 and A ⩾ 0.8, 1E is actually further away from the truth than IVW . In Figure 5 (right), we plot the value of the IVW and MR-Egger estimands as in Figure 5 (left) when j is replaced with j −̄, in order to make the pleiotropy balanced at A=0. Consequently, when 0 = 0, IVW is unbiased. Furthermore, the IVW estimand is less biased than the 1E for A ⩾ 2.5 and A ⩾ 0.65.

Fixed versus additive random effects for IVW
In Section 3, we cautioned against using the additive random effects IVW estimatêA RE IVW as opposed to the fixed effect estimatêI VW when directional pleiotropy is present. In order to explain the reason for our caution, we plot the value of the additive random effects IVW estimand, ARE IVW versus its fixed effect counterpart (at =0) using our fixed parameter constellation under increasing violations of the InSIDE assumption as before.
We calculated ARE IVW using equation (11) by plugging in the estimand for j from equation (22) and using random effects weights w * j = ( j + A j ) 2 ∕( 2 Yj + 2 ). In order to specify these weights, we set Yj equal to a constant value of 0.3, and then used the DerSimonian and Laird method to estimate 2 for each given value of A. Figure 6 shows the results for the two previously discussed scenarios of directional pleiotropy using ( , , ) and balanced pleiotropy using ( −̄, , ). In order to aid interpretation, the x-axis includes scales for A and, additionally, the amount of heterogeneity as measured by Higgins' I 2 statistic [25]: (Q − (L − 1))/Q.
Under the directional pleiotropy scenario (Figure 6, top-left), both ARE IVW and IVW are biased even when the InSIDE assumption is satisfied ( A = 0). The directional pleiotropy manifests itself as heterogeneity about the fixed effect IVW estimate (as measured by Cochran's Q statistic), and at A =0, I 2 =0.45. As increases IVW is more robust than ARE IVW , staying closer to the true value of zero for A ⩽ 0.8. This occurs because the observed heterogeneity is caused, disproportionately, by j terms emanating from variants with a weaker SNP-exposure association, because their bias terms in equation (22) will (in general) be the largest.
When it detects heterogeneity, the additive random effects estimate compounds the problem by reweighting the ratio estimates more evenly across the board, thereby giving more weight to the weaker (and more biased) ratio estimates. The fixed effect estimate, and by extension the multiplicative random effects estimate, naturally gives ratio estimates from variants with weaker SNP-exposure associations little weight in the analysis, regardless of whether heterogeneity is detected or not.
Under the balanced pleiotropy example (Figure 6, top-right), both IVW and ARE IVW are unbiased at A =0. At A =0, I 2 is very close to zero also, indicating that almost all of the heterogeneity observed in the directional pleiotropy case discussed earlier at A =0 was due to bias (i.e. non-zero ). As A increases, IVW and ARE IVW remain close together. This is because there is no longer a directional trend in the j s as a function of the jth SNP-exposure association. In order to stress this point, Figure 6 (bottom) shows funnel plots of the inverse standard errors versus their causal estimands j at A = 0 under directional (bottom- left) and balanced (bottom-right) pleiotropy. Inverse standard errors used in the y-axis are calculated under NOME. The funnel plot is used as a graphical tool to aid detection of small study bias in meta-analysis [26] and has been transferred to the MR setting by Bowden et al [9]. Only in the directional pleiotropy case will asymmetry be observed in the funnel plot. That is, imprecise causal effect estimates are not seen to 'funnel' in towards the more precise causal estimates equally from either side, but appear to be skewed in one particular direction. Therefore, funnel plot asymmetry is a sure sign that a large difference will exist between the fixed and additive random effects IVW estimates, as well as between the IVW and MR-Egger estimates.

Estimation under violations of NOME
So far, in this paper, we have assumed that the NOME assumption is satisfied. In truth, however, NOME will never hold exactly because the variance of the SNP-exposure association, 2 Xj will always be positive, so that̂j ≠ j . In this context, we can think of IVW and MR-Egger approaches as fitting regression models using an explanatory variable that is affected by measurement error. This is known to induce an attenuation in the subsequent effect estimates towards zero due to 'regression dilution' [27,28]. The implications of NOME violation on the two approaches is discussed at length in separate work by Bowden et al [17], but are summarised below.
For clarity, suppose that the pleiotropy distribution and additional assumptions defining either case (a) or (b) holds (Table II) so that both the IVW and MR-Egger methods yield unbiased estimates for under NOME (i.e. when 2 Xj = 0 for all j). When NOME is in fact violated for variant j ( 2 Xj > 0), the expected attenuation in its individual ratio estimate,̂j, can be approximated by: where F j =̂2 j ∕ 2 Xj . Because it is a weighted average of ratio estimates, the expected attenuation in the IVW estimate,̂I VW , can therefore be approximated by substituting F j in the above formula withF = ∑ L j=1 w j F j ∕ ∑ L j=1 w j , where w j are as defined in (11). The attenuation in the IVW estimate can be alleviated by increasing the strength of each instrument through F j so that IV1 is more strongly satisfied.
In contrast, the extent of attenuation in the MR-Egger estimate,̂1 E , due to NOME violation is not governed by the same F statistic. It can instead be accurately gauged using a modification of the I 2 statistic applied directly to the entire set of SNP-exposure associations (and referred to as I 2 GX in [17]) as below: wherēis the mean of the weighted̂js. The I 2 GX statistic measures the spread of variation in the estimated instrument strengths (thêjs) relative to their average uncertainty. It lies between 0 and 1 but is generally reported as a percentage. An I 2 GX close to 100% ensures that the attenuation will be minimal, and means that the VIS assumption is strongly satisfied.
In most circumstances, the extent of attenuation is likely to be far worse for MR-Egger than for the IVW estimate. Bowden et al [17] employ the method of Simulation Extrapolation (SIMEX) [29,30] to adjust the MR-Egger estimate for this dilution. Briefly, this involves generating pseudo-data sets based on the original summary data estimates, but under increasingly strong violations of the NOME assumption, to obtain a series of increasingly diluted parameter estimates. A statistical model is then fitted to this series in order to extrapolate back to the estimate that would have been obtained if NOME had been satisfied. In the next section, we show the SIMEX approach applied in practice to both the IVW and MR-Egger estimates.

Assessing the causal role of plasma urate on CHD risk
Although traditional epidemiological studies have suggested that urate levels are associated with a range of cardiometabolic diseases, it is not certain whether the relationship is causal. Recently, Mendelian randomization has been applied to assess the possibility of a causal link [31,32]. Specifically, White and colleagues [32] conducted a two-sample summary data MR analysis to assess the causal role of plasma urate levels in influencing cardiovascular disease risk. Summary data estimates of the association of 31 uncorrelated genetic variants with plasma urate concentration were obtained from the genome wide association studies catalogue http://www.genome.gov/gwastudies/ with p-values less than 5×10 −7 . Estimates for the association of the 31 variants with cardiovascular disease risk were obtained from a seperate study population using CARDIoGRAM [3], currently the largest genetic association study measuring cardiovascular outcomes. Here, we repeat and extend the analysis of White et al's for the IVW and MR-Egger approaches, using the methods described thus far. Full results are shown in Table III. Figure 7 (left) shows a scatter plot of the SNP-exposure and SNP-outcome association estimates, where the genetic data has been recoded in order to make the SNP-exposure associations positive. The SNPoutcome association estimates represent log-odds ratios of CHD for a 1 standard deviation increase in plasma urate. Figure 7 (right) shows the corresponding funnel plot, which can be interpreted as discussed in Section 5. The vertical (and horizontal) lines show the causal effect estimates (and 95% confidence intervals) inferred by IVW and MR-Egger regression.
We start our MR analysis by assuming that all variants are valid IVs and that the NOME, IV2, IV3 and InSIDE assumptions hold, which justifies the use of the fixed effect IVW model (10). The NOME assumption seems reasonable for these data under the IVW method, due a large mean F statistic across the genetic variants of 247 (with the weakest being 22 and the strongest being 4886). The IVW estimatê IVW suggests that a one unit increase in plasma urate increases the log-odds ratio of CHD by 0.163. However, there is evidence of heterogeneity among the 31 ratio estimates, Cochran's Q statistic on 30 degrees of freedom is 63.9, and the scale factor estimatêI VW = 2.13. We therefore relax assumption IV3 to allow each variant to have a pleiotropic effect (assuming it is 'balanced' under InSIDE) and perform random effects analysis. Applying the multiplicative random effects model (15) and so scaling up the variance of the fixed effect IVW estimate bŷI VW , the p-value for̂I VW from a resulting t-test is equal to 0.02. Applying the additive random effects model (14), we estimate the pleiotropic variance parameter 2 to be 0.093, which yields an adjusted point estimate for̂A RE IVW of 0.222 (an increase of 35%). Because both the fixed and additive random effects estimates should be targeting the same quantity under balanced pleiotropy plus InSIDE, this large difference suggests that the pleiotropy could have a significant directional component.
Given the large and significant Q statistic, we now apply MR-Egger regression to these data to probe if there is indeed evidence for directional pleiotropy. We firstly assume that the pleiotropy is directional but constant and fit the fixed effect MR-Egger model (16) under the NOME assumption. The NOME assumption seems reasonable for these data under the MR-Egger method, because the I 2 GX statistic is equal to 99.2% and so a less than 1% dilution in its causal estimate is expected.
Under this model, MR-Egger estimates a non-zero mean pleiotropic effect of 0.008 and, because this is in the same direction, a reduced causal effect estimate,̂1 E , of 0.048. Calculating the residual heterogeneity about the MR-Egger fit using Rücker's Q ′ statistic yields Q ′ = 58.6 with a p-value of 9×10 −4 . We next allow for directional and heterogenous pleiotropy under the InSIDE assumption by fitting multiplicative random effects model (20), and scale up the variance of the fixed effect MR-Egger estimates by a factor of̂E = 2.02. Under this analysis, the p-value for the mean pleiotropic effect (or intercept) parameter estimatê0 E is 0.12 and the p-value for the causal effect estimatê1 E is 0.62. The ratio statistic Q R is 0.917 indicating that the MR-Egger model explains approximately 8% more of the variation in the SNP-outcome association estimates than the IVW approach and the difference Q − Q ′ is also large. Although the NOME assumption appears to be sensible for these data, for completeness, we use the SIMEX method to adjust for NOME violation in the IVW and MR-Egger fits (under multiplicative random effects models only). R code to perform this method can be found in [17]. The results are shown in Figure 8 and Table III. In summary, careful analysis of these data suggest that the large positive causal association of urate concentration with CHD risk is potentially unreliable due to the presence of directional pleiotropy manifesting itself as heterogeneity among the causal effect estimates. Whilst the application of an additive random effects model accounting for balanced pleiotropy (via the IVW estimate) increases the causal effect estimate further still, MR-Egger regression suggests that the pleiotropy has a positive directional element, and consequently adjusts the causal estimate down towards zero. For these data, there is good evidence that MR-Egger provides a better fit than IVW.

Discussion
In this paper, we have attempted to explain the many and varied assumptions that are necessary to apply standard meta-regression and random effects methodology from mainstream meta-analysis to summary data MR. We have also tried to explore the consequences for each method when some of the key assumptions (e.g. IV2, IV3, InSIDE and NOME) are violated, and clarified cases where the violation of one assumption (e.g. IV2 and VIS) acts to promote violation of another (e.g. InSIDE and NOME). We hope that those applying the IVW and MR-Egger regression approaches can use this work to gain a more informed understanding of their strengths and limitations. One such limitation of MR-Egger regression is that it is known to be far less efficient than the IVW method. For this reason, we also gave particular emphasis to examining the properties of the more established IVW estimate under both additive and multiplicative random effects models. We therefore view it as an important finding that the additive approach is less robust to directional pleiotropy, echoing the results of Henmi and Copas [15] for mainstream metaanalyses affected by small study bias. This point was indeed highlighted in the analysis of the urate data affected in Section 6, whereby the IVW estimate actually increased under an additive random effects model in response to positive directional pleiotropy, rather than decreasing towards zero.
In practice, we would encourage users to report the full set of analyses described here, using the various heterogeneity and instrument strength statistics introduced as a guide to the relative importance that could be placed on each method for the data at hand.

Limitations
A major limitation of this work is that we have assumed throughout that the two-sample assumptions hold, in particular that the SNP-exposure and SNP-outcome association estimates gleaned in separate populations provide information on a common (i.e. identical) set of parameters. This will not be true, for example, in the presence of gene-environment interactions if the distribution of the environmental factor differs in the two samples. As further work, it is vital to understand the consequences for inference if this assumption is violated. In particular, it would be interesting to see whether the approaches discussed would remain valid under weaker assumptions. For example, if the parameters indexing the two population models were not identical, but were instead generated from a common distribution. A Bayesian implementation would then seem natural.
Another major limitation of this work is that we assumed additive linear models with no interaction terms for the SNP-outcome association estimates. As shown by Didelez and Sheehan [33], such a model is required for consistent estimation of the causal effect. In practice (and in our real data example) MR analyses are generally performed with respect to a binary outcome yielding log-odds ratios for the SNPoutcome associations. In this case, when a causal effect between the exposure and outcome exists, ratio estimates of causal effect gleaned from individual variants will be attenuated towards the null by varying amounts due to the non-collapsibility of the odds-ratio [34]. This is enough to invalidate our models in its own right. Furthermore, if the data were collected using case-control sampling, as is also common, then log-odds ratio estimates are also susceptible to ascertainment bias [35]. It will be important to develop summary data methodology to properly account for these two issues as well.
A further limitation of our proposed framework for investigation of pleiotropy, is that it is purely based on statistical evidence, and ignores any a priori biological knowledge of possible pleiotropy for individual variants. As more is learned about the association between individual variants and multiple health outcomes, analyses that ignore this additional information will appear increasingly uninformed as well as inefficient.
Finally, we return to the InSIDE assumption. When explaining this assumption and exploring the impact of its violation on the IVW and MR-Egger approaches, we assumed for simplicity that IV2 held. However, it is still possible in theory for InSIDE to hold even when IV2 is violated, albeit in fairly contrived circumstances. For example, suppose that IV2 and IV3 are violated for all variants but the pleiotropy via one route is perfectly negatively correlated with pleiotropy via the other, so that j = − j . Alternatively, suppose that IV3 is satisfied for all variants ( j = 0) but the magnitude of violation of IV2 is the same across variants ( j = ). In both cases, the covariance between the instrument strength and direct effect is zero because the pleiotropy is constant, and Perfect InSIDE holds. Whether these facts could somehow be exploited to improve the performance of MR-Egger regression is a subject for further investigation.   Table A1, as the number of included SNPs is increased sequentially from 1:2 to 1:50. Note: the parameters of Table A1 are generated independently so that General InSIDE is satisfied under IV2 for any subset of the 50 SNPs, but that Perfect InSIDE is satisfied when all 50 are considered.