Conditional inference in cis‐Mendelian randomization using weak genetic factors

Abstract Mendelian randomization (MR) is a widely used method to estimate the causal effect of an exposure on an outcome by using genetic variants as instrumental variables. MR analyses that use variants from only a single genetic region (cis-MR) encoding the protein target of a drug are able to provide supporting evidence for drug target validation. This paper proposes methods for cis-MR inference that use many correlated variants to make robust inferences even in situations where those variants have only weak effects on the exposure. In particular, we exploit the highly structured nature of genetic correlations in single gene regions to reduce the dimension of genetic variants using factor analysis. These genetic factors are then used as instrumental variables to construct tests for the causal effect of interest. Since these factors may often be weakly associated with the exposure, size distortions of standard t-tests can be severe. Therefore, we consider two approaches based on conditional testing. First, we extend results of commonly-used identification-robust tests for the setting where estimated factors are used as instruments. Second, we propose a test which appropriately adjusts for first-stage screening of genetic factors based on their relevance. Our empirical results provide genetic evidence to validate cholesterol-lowering drug targets aimed at preventing coronary heart disease.


Introduction
Mendelian randomization (MR) is a widely-used method to estimate the unconfounded effect of an exposure on an outcome by using genetic variants as instrumental variables. An emerging area of clinical research concerns MR studies with genetic variants drawn from gene regions of pharmacological interest. The potential effect of a drug can be investigated by an MR analysis of a genomic locus (cis-MR) encoding protein targets of medicines (Walker et al., 2017). As a result, the cis-MR approach is being increasingly used to provide valuable evidence which can inform designs of expensive clinical trials (Gill et al., 2021).
Furthermore, cis-MR approaches that integrate expression data, with proteins acting as the exposure, are more likely to satisfy the exclusion restriction required for instrument validity (Schmidt et al., 2020). The exclusion restriction requires that any association between instruments and the outcome is only through their effects on the exposure, which is more plausible when the exposure is a direct biological product of the instrument.
A starting point for any MR analysis is to choose appropriate instruments. Since genome-wide association studies (GWASs) have been able to identify many strong genetic signals for a wide range of traits, the typical practice in polygenic MR, where variants may be chosen from multiple gene regions, is to select uncorrelated variants with strong measured associations with the exposure.
In contrast, the potential pool of instruments that we can consider for cis-MR is more limited in two respects. First, the instruments are typically in highly structured correlation, owing to how genetic variants in the same region tend to be inherited together. Secondly, when the exposure of interest is a gene product, genetic associations are typically measured from much smaller sample sizes than usual GWASs, which would leave cis-MR analyses more vulnerable to problems of weak instrument bias (Andrews, Stock, and Sun, 2019). Therefore, for our cis-MR focus, we will need to make use of many weak and correlated instruments.
One intuitive option is to filter out variants such that only a smaller set of uncorrelated (or weakly correlated) instruments remains. However, for cis-MR analyses that involve only weak genetic signals, it would seem important to utilise the explanatory power from all available variants. Another option is to select only variants with a strong measured association with the exposure. While this might avoid problems relating to weak instruments, estimation could be more vulnerable to a winner's curse bias (Goring et al., 2002), resulting in poor inferences if the additional uncertainty from instrument selection is not accounted for (Mounier and Kutalik, 2021).
In this paper, we do not propose selecting specific genetic variants as instruments, but rather genetic factors. We consider a two-stage approach. In the first stage, we exploit the highly structured nature of genetic correlations in single gene regions to reduce the dimension of genetic variants. Following Bai and Ng (2002)'s approximate factor model, the variation in a large number of genetic variants is assumed to be explained by a finite number of latent factors. The estimated genetic factors are particular linear combinations of genetic variants which aim to capture all systematic variation in the gene region of interest. In the second stage, these estimated genetic factors are used as instrumental variables.
In terms of estimation, this two-stage approach is similar to Burgess et al. (2017)'s inverse-variance weighted principal components (IVW-PC) method. However, our focus in this paper is not on estimation, but on making valid inferences. In this regard, a major drawback with the IVW-PC method is that it fails to provide robust inferences under weak instruments. This is a concern not only due to the potentially smaller sample sizes involved in cis-MR analyses, but because the first-stage dimension reduction of genetic variants is based on their mutual correlation, and not on the strength of their association with the exposure. Thus, there is no guarantee that the estimated genetic factors will be strong instruments.
To provide valid inferences when estimated genetic factors are weak instruments, we consider two different approaches based on conditional testing. The first approach generalises popular identification-robust tests (Moreira, 2003; Wang and Kang, 2021) for our setting with estimated genetic factors as instruments. Similarly to Bai and Ng (2010)'s analysis under strong instruments, the asymptotic null distributions of the identification-robust test statistics can be established even when the true genetic factors are not identified.
One drawback with the identification-robust approaches is that they are unable to provide point estimates of the causal effect. In situations where a few instruments are considerably stronger than others, it is natural to ask whether it might be better to discard those instruments which are almost irrelevant. In our case, if some of the estimated genetic factors appear to have very weak associations with the exposure, then we may consider dropping them, and then proceed with usual point estimation strategies. Therefore, in the second approach, we propose a test which appropriately adjusts for first-stage screening of genetic factors based on their relevance. The test controls the selective type I error: the error rate of a test of the causal effect given the selection of genetic factors as instruments (Fithian et al., 2017).
Our empirical motivation concerns a potential drug target of growing interest. Cholesteryl ester transfer protein (CETP) inhibitors are a class of drugs that increase high-density lipoprotein cholesterol and decrease low-density lipoprotein cholesterol (LDL-C) concentrations. A recent cis-MR analysis by Schmidt et al. (2021) suggests that CETP inhibition may be an effective drug target for reducing coronary heart disease risk.
A simulation study based on real CETP gene summary data illustrates how both factor-based conditional testing approaches offer reliable inferences under realistic problems faced in practice: small samples, invalid instruments, and mismeasured instrument correlations.
Our application complements Schmidt et al. (2021)'s findings by providing robust genetic evidence that an LDL-C-lowering effect of CETP inhibitors is associated with a lower risk of coronary heart disease.
We use the following notation and abbreviations. For sequences a_n and b_n, we write a_n = Θ(b_n) if there exist constants 0 < c_1 ≤ c_2 and an integer N such that c_1 b_n ≤ a_n ≤ c_2 b_n for all n ≥ N. Let (A)_j denote the j-th element of any vector A, and (B)_jk denote the (j, k)-th element of any matrix B. Let ‖·‖ denote the Euclidean norm of a vector. For any positive integer A, [A] = {1, . . ., A}. The proofs of the theoretical results are given in Supplementary Material, and R code to apply our methods is available at https://github.com/ash-res/con-cis-MR/.

Approximate factor model and summary data
Let X denote the exposure, Y the outcome, and Z = (Z_1, . . ., Z_p)' a vector of p genetic variants which we assume are valid instrumental variables. For each variant j ∈ [p], let β_Xj denote the true marginal association of variant j with the exposure, and β_Yj denote the true marginal association of variant j with the outcome. The causal effect of the exposure on the outcome is denoted θ_0, and is described by the linear model

β_Yj = θ_0 β_Xj, j ∈ [p]. (1)

Although this specification does not explicitly allow for variants to have direct effects on the outcome that are not mediated by the exposure, we will later discuss how this can be relaxed.

Two-sample summary data
We work within the popular two-sample summary data design, where estimated variant-exposure associations are obtained from a non-overlapping, but representative, sample from estimated variant-outcome associations. For each variant, we observe the estimated variant-exposure association β̂_Xj and a corresponding standard error σ_Xj, from the marginal regression of X on Z_j, j ∈ [p]. Likewise, we define β̂_Yj and σ_Yj for variant-outcome associations. These association estimates are often publicly available from GWASs. For any two variants j and k, we also observe the genetic correlation ρ_jk, which can be obtained from popular MR software packages (Hemani et al., 2018).
Let n_X denote the size of the sample used to compute variant-exposure associations, and n_Y denote the sample size used to compute variant-outcome associations. We assume that n := n_X = c n_Y, for some constant 0 < c < ∞; this ensures that, for our two-sample setting, the sampling uncertainty from one association study is not negligible relative to the other.
Assumption 1 (summary data on genetic associations). For β_X = (β_X1, . . ., β_Xp)' and β_Y = (β_Y1, . . ., β_Yp)', the association estimates satisfy β̂_X ∼ N(β_X, Σ_X) and β̂_Y ∼ N(β_Y, Σ_Y). Under the two-sample design, β̂_X and β̂_Y are independent. Furthermore, Σ_X and Σ_Y are assumed known, and satisfy (Σ_X)_jk = ρ_jk σ_Xj σ_Xk and (Σ_Y)_jk = ρ_jk σ_Yj σ_Yk for j, k ∈ [p]. We assume that the genetic association estimates are normally distributed around the true associations, which can be justified by large random sampling in GWASs. The specific form of the variance-covariance matrices Σ_X and Σ_Y relies on an assumption that the true genetic associations are quite weak. The assumption that the standard errors σ_Xj and σ_Yj, j ∈ [p], are known, and decrease at the usual parametric rate, is common in practice (Zhao et al., 2020, Assumption 1, p.3), and is thought not to be too restrictive (Ye et al., 2021, Theorem 4.1, p.9). The remaining components in Σ_X and Σ_Y are the pairwise genetic correlations ρ_jk. We assume these are known, but our results can be extended to allow for consistently estimated genetic correlations.

Approximate factor model
Our asymptotic framework considers the setting where p → ∞, since we aim to incorporate information from many genetic variants. For our cis-MR focus, we would expect a large number of genetic variants to be in highly structured correlation. Therefore, we assume genetic variants in the region of interest follow an approximate factor model structure (Bai and Ng, 2002). For the vector of genetic variants Z = (Z_1, ..., Z_p)', we have

Z = Λf + e, (2)

where Λ = (λ_1, ..., λ_p)' is an unobserved p × r matrix of factor loadings, f is an r-vector of unobserved factors, and e = (e_1, ..., e_p)' is a p-vector of idiosyncratic errors. The component λ_j'f describes the systematic variation in any j-th genetic variant Z_j. Although p → ∞, r is considered to be finite; the systematic variation of p variants can be explained by a much smaller set of r latent factors. Thus, instead of using p genetic variants as instruments, we will aim to use the information of these r latent factors to construct instruments.
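To illustrate the structure this model describes, the following sketch simulates genotype-like data from Z = Λf + e with toy sizes (in Python/NumPy rather than the authors' R code; all names and dimensions are illustrative) and shows the r largest eigenvalues of the variant correlation matrix separating sharply from the rest:

```python
import numpy as np

rng = np.random.default_rng(0)
n, p, r = 500, 40, 2   # individuals, variants, latent factors (toy sizes)

Lam = rng.normal(size=(p, r))       # factor loadings Λ
f = rng.normal(size=(n, r))         # latent factors f
e = 0.3 * rng.normal(size=(n, p))   # idiosyncratic errors e
Z = f @ Lam.T + e                   # approximate factor model: Z = Λf + e

# With r strong factors, the r largest eigenvalues of the variant
# correlation matrix dominate the remainder; this eigenvalue gap is
# the structure that the dimension reduction exploits.
rho = np.corrcoef(Z, rowvar=False)
eigvals = np.sort(np.linalg.eigvalsh(rho))[::-1]
print(eigvals[:4])
```

In this toy example the first two eigenvalues are an order of magnitude larger than the third, mirroring the scree-plot behaviour described for real gene regions below.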
Assumption 2 (approximate factor model). (i) The unobserved factors and idiosyncratic errors are identically and independently distributed across individuals; (ii) the idiosyncratic errors may have some dependence across variants; (iii) the idiosyncratic errors may have some correlation with the sampling errors of genetic association estimates; (iv) the factors satisfy E[‖f‖^4] = O(1) and Σ_F = E[ff'] is an r × r positive definite matrix; (v) ‖λ_j‖ ≤ C_λ for some constant C_λ > 0, and p^{-1} Λ'Λ → Σ_Λ, where Σ_Λ is a positive definite, non-random matrix, as p → ∞.
Assumption 2 implies Assumptions A-F from Bai (2003, pp.141-4); the assumptions imply that r strong factors exist, and that, as p → ∞, there is a significant difference between the r-th and (r + 1)-th eigenvalues of the genetic correlation matrix ρ. Compared with classical factor models, the assumptions maintained in an approximate factor model are weak enough to prevent separate identification of factors and factor loadings; however, both can be estimated up to an r × r rotation matrix. This should involve no loss of information since, in terms of retaining the same explanatory power, we only require that the estimated factors span the same space as the true factors (Bai and Ng, 2002). For details on the dependence allowed between idiosyncratic errors across variants, see Bai and Ng (2010, Assumption A(c), p.1581).
Further details on the correlations permitted between the idiosyncratic errors in (2) and the sampling errors of genetic association estimates are provided in Supplementary Material.

Weak genetic associations
The focus of this paper is to develop methods for cis-MR inference which are robust to weak genetic signals in the gene region of interest. We believe such methods are particularly important for the potential of cis-MR to identify novel therapeutic targets, since strong genetically-proxied effects of the exposure may not yet be established, and studies may be under-powered due to small samples. Our asymptotic analysis relies on the assumption that many variants have weak genetic associations with the exposure; see, for example, Zhao et al. (2020).
Assumption 3 (many weak genetic associations). p/n = Θ(1) and ‖β_X‖ = O(1). Assumption 3 implies the average explanatory power of any individual variant is decreasing with the total number of variants. Since we do not directly use individual variants as instruments, this does not necessarily imply we face a weak instruments problem. We will take r linear combinations of all variants to use as instruments; as discussed by Bai and Ng (2010), it is possible for these linear combinations to be strong instruments even if the explanatory power of individual variants is limited.
However, for our focus, we deem this to be quite unrealistic: our dimension reduction of genetic variants will be based on their correlation structure, not their association with the exposure. Hence, we could end up using instruments that are able to summarise nearly all genetic variation in a gene region, but are still weakly associated with the exposure. For this reason, we should focus on inferential methods that are robust to weak instruments.

Example: the effect of statins on coronary heart disease
There is a well-established association between high low-density lipoprotein cholesterol (LDL-C) levels and greater risk of coronary heart disease (CHD); see, for example, Mihaylova et al. (2012). Statins are 3-hydroxy-3-methylglutaryl-CoA reductase (HMGCR) inhibitors prescribed to lower LDL-C levels. A cis-MR analysis using genetic variants from the HMGCR gene region provides a way to study the genetically-proxied effect of statin intake on CHD (Ference et al., 2016).
The left panel of Figure 1 shows how genetic variants in HMGCR are in strong correlation.
Nearly 98 percent of the total variation of 2883 genetic variants can be summarised by 14 principal components. With this in mind, we might consider 14 estimated factors as instrumental variables. Since the construction of factor-based instruments does not use the information of variant-exposure associations, the estimated genetic factors may be weak instruments. We consider two options to make valid inferences in this setting. The first option is to use all 14 genetic factors as instruments, and conduct identification-robust tests which are designed to control type I error rates under weak instruments. This approach does not return a point estimate for the causal effect, but provides valid inferences regardless of instrument strength.
The second option is to discard any genetic factors with very weak measured associations with the exposure, and to proceed with usual point estimation strategies assuming that the retained genetic factors are strong instruments.To make honest inferences with this second option, we can conduct conditional tests which account for first-stage instrument selection.
Conditional inference in cis-MR with genetic factors

Estimating the factor loadings
We start by estimating the p × r matrix of factor loadings Λ using the variant correlation matrix. Let ρ denote the variant correlation matrix, so that its (j, k)-th element is given by ρ_jk. For a given number of factors r, the estimated (re-scaled) factor loadings Λ̂ are given by the p × r matrix whose columns are the eigenvectors corresponding to the largest r eigenvalues of ρ, each multiplied by √p. For our analysis, the number of factors r is assumed to be known, for example, by inspecting the scree plot of the variant correlation matrix. In practice, one could also use data-driven methods to determine the number of factors (Bai and Ng, 2002; Onatski, 2010).
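In code, this step amounts to an eigendecomposition of ρ; a minimal NumPy sketch (the function name and the toy one-factor correlation matrix are illustrative, not from the authors' implementation):

```python
import numpy as np

def estimate_loadings(rho: np.ndarray, r: int) -> np.ndarray:
    """Estimated (re-scaled) p x r factor loadings from the variant
    correlation matrix rho: eigenvectors of the r largest eigenvalues,
    each multiplied by sqrt(p)."""
    p = rho.shape[0]
    eigval, eigvec = np.linalg.eigh(rho)             # ascending order
    top = eigvec[:, np.argsort(eigval)[::-1][:r]]    # r leading eigenvectors
    return np.sqrt(p) * top

# toy correlation matrix with an exact one-factor structure
p = 10
lam = np.full(p, 0.9)
rho = np.outer(lam, lam) + np.diag(1 - lam**2)       # unit diagonal
L_hat = estimate_loadings(rho, r=1)
print(L_hat.shape)  # (10, 1)
```

Because each eigenvector has unit norm, the squared entries of Λ̂ sum to p·r, matching the p^{-1}Λ'Λ normalisation in Assumption 2.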

Point estimation under strong factor associations
Under the linear model β_Y = β_X θ_0, we can use our variant-exposure and variant-outcome associations to construct a vector of estimating equations, β̂_Y − β̂_X θ = 0. Here, there are p estimating equations for 1 unknown.
Given our estimated factor loadings, we can effectively reduce the degree of over-identification. Let ĝ(θ) = Λ̂'(β̂_Y − β̂_X θ), so that ĝ(θ) = 0 provides r estimating equations for θ_0. In other words, ĝ(θ) = 0 are the estimating equations implied by using the linear combinations of variants Λ̂'Z as instruments. For brevity, we will refer to Λ̂'Z as the estimated factors.
A consistent estimator of the variance-covariance matrix of ĝ(θ) is given by Ω̂(θ) = Λ̂'(Σ_Y + θ^2 Σ_X)Λ̂. Then, a limited information maximum likelihood (LIML; Anderson and Rubin, 1949) estimator is given by θ̂_F = argmin_θ ĝ(θ)' Ω̂(θ)^{-1} ĝ(θ). We call θ̂_F the F-LIML estimator; the LIML estimator which uses the entire vector of estimated factors as instruments.
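A runnable sketch of this objective, in Python with SciPy rather than the authors' R implementation: ĝ(θ) = Λ̂'(β̂_Y − β̂_X θ) is taken from the text, while Ω̂(θ) = Λ̂'(Σ_Y + θ²Σ_X)Λ̂ is the natural two-sample variance of β̂_Y − θβ̂_X and is assumed here; all names and the toy data are illustrative.

```python
import numpy as np
from scipy.optimize import minimize_scalar

def f_liml(bx, by, Sx, Sy, L):
    """Sketch of an F-LIML-type estimator: minimise the quadratic form
    g(theta)' Omega(theta)^{-1} g(theta), with g(theta) = L'(by - bx*theta)
    and Omega(theta) = L'(Sy + theta^2 * Sx) L (assumed form)."""
    def q(theta):
        g = L.T @ (by - bx * theta)
        Om = L.T @ (Sy + theta**2 * Sx) @ L
        return float(g @ np.linalg.solve(Om, g))
    return minimize_scalar(q, bounds=(-10.0, 10.0), method="bounded").x

# toy two-sample summary data with true causal effect theta0 = 0.5
rng = np.random.default_rng(1)
p, r, theta0 = 30, 2, 0.5
L = rng.normal(size=(p, r))                       # stand-in for estimated loadings
beta_x = L @ np.array([1.0, -1.0]) / np.sqrt(p)   # true variant-exposure associations
Sx = Sy = 0.01 * np.eye(p)
bx = rng.multivariate_normal(beta_x, Sx)
by = rng.multivariate_normal(theta0 * beta_x, Sy)
theta_hat = f_liml(bx, by, Sx, Sy, L)
print(theta_hat)
```

With factors this strong the minimiser lands close to θ_0 = 0.5; the weak-factor settings of interest in the paper are precisely those where this point estimator becomes unreliable.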
Theorem 1 (estimation with strong instruments). Under Assumptions 1-3 and the condition that all estimated factors are strong instruments, θ̂_F is consistent and asymptotically normally distributed. Under this strong-instrument condition, we can directly use Theorem 1 to construct asymptotic confidence intervals and tests for the causal effect θ_0.

Identification-robust tests under weak factor associations
Standard t-tests based on Theorem 1 will not be valid when the estimated factors are weak instruments. This is because, under weak instrument asymptotics, the distribution of t-tests will depend on a measure of instrument strength (Stock et al., 2002). Instead, identification-robust tests offer a way to make valid inferences in this setting. The basic idea behind this approach is to construct pivotal test statistics conditional on a sufficient statistic for instrument strength. Then, since the conditional distributions of these test statistics do not depend on instrument strength under the null hypothesis, the size of the tests can be controlled under weak instruments.
We can follow previous works by constructing these test statistics as a function of two asymptotically mutually independent statistics S and T, where S carries the information of the estimated factors being valid instruments, and where T incorporates information on the strength of these instruments. Specifically, under the null hypothesis H_0 : θ = θ_0,

S = Ω̂(θ_0)^{-1/2} ĝ(θ_0), T = (Ω̂_X − ∆̂_G' Ω̂(θ_0)^{-1} ∆̂_G)^{-1/2} (Ĝ − ∆̂_G Ω̂(θ_0)^{-1} ĝ(θ_0)),

where Ĝ = −Λ̂'β̂_X, Ω̂_X = Λ̂'Σ_X Λ̂, ∆̂_G = Ω̂_X θ_0, and where Ω̂(θ_0) is defined in Section 3.2.
Using S and T, we can construct three commonly-used identification-robust test statistics which will be asymptotically pivotal conditional on T: the factor-based Anderson-Rubin, Lagrange multiplier, and conditional likelihood ratio statistics,

F-AR = S'S, F-LM = (S'T)^2 / (T'T),

F-CLR = (1/2) { S'S − T'T + [ (S'S + T'T)^2 − 4( (S'S)(T'T) − (S'T)^2 ) ]^{1/2} }.

Theorem 2 (identification-robust test statistics). Suppose that Λ'β_X = Θ(1) and H_0 : θ = θ_0 holds. Then, conditional on T = t, F-AR → χ²_r, F-LM → χ²_1, and F-CLR → (1/2) { χ²_1 + χ²_{r−1} − t't + [ (χ²_1 + χ²_{r−1} + t't)^2 − 4 χ²_{r−1} t't ]^{1/2} } in distribution, as n, p → ∞, where χ²_1 and χ²_{r−1} denote independent chi-square random variables.
Since the F-AR statistic is not a function of T , it does not incorporate the identifying power of instruments.As a result, when the model is over-identified (r > 1), the F-AR test may have relatively poor power properties compared with the F-LM and F-CLR tests (Andrews, Stock, and Sun, 2019).Of the three methods, CLR-based tests are widely regarded as the most powerful, due to simulation evidence (Andrews, Moreira, and Stock, 2007) and favourable theoretical properties (Andrews, Moreira, and Stock, 2006).
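Given the vectors S and T, the three statistics are simple algebraic functions; the sketch below uses the standard forms from the weak-instrument literature (Moreira, 2003), which we assume carry over to their factor-based counterparts here (the function name is illustrative):

```python
import numpy as np

def robust_stats(S, T):
    """AR, LM, and CLR statistics built from the r-vectors S and T
    (standard weak-instrument forms, assumed for the factor-based versions)."""
    SS, TT, ST = float(S @ S), float(T @ T), float(S @ T)
    f_ar = SS                 # F-AR: ignores the strength statistic T
    f_lm = ST**2 / TT         # F-LM: score in the direction of T
    f_clr = 0.5 * (SS - TT + np.sqrt((SS + TT)**2 - 4 * (SS * TT - ST**2)))
    return f_ar, f_lm, f_clr

rng = np.random.default_rng(2)
S, T = rng.normal(size=4), rng.normal(size=4)
f_ar, f_lm, f_clr = robust_stats(S, T)
# By the Cauchy-Schwarz inequality, F-LM <= F-CLR <= F-AR always holds,
# reflecting how CLR interpolates between the two as T'T varies.
print(f_ar, f_lm, f_clr)
```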

Conditional tests that adjust for factor selection
While identification-robust tests are designed to control type I error rates for any level of instrument strength, in a sparse effects setting, where a few estimated factors are strong instruments and most other estimated factors are very weak instruments, it would be tempting to proceed with F-LIML point estimation after removing very weak instruments. Without any instrument selection, we could use the entire r-vector of estimated factors as instruments, as in Sections 3.2 and 3.3. In contrast, here we wish to filter out certain elements if they are demonstrably weak instruments.
To this end, we construct pre-tests to identify a subset of estimated factors that pass a threshold of relevance; only this subset is then used as instruments. By Bai (2003), Λ̂ estimates ΛH^{-1}, where H is a rotation matrix. Hence, the estimated factor associations Λ̂'β̂_X actually estimate G := (H^{-1})'Λ'β_X, and not Λ'β_X as we may intuitively expect. Therefore, for each j ∈ [r], we will test the null hypothesis H_0j : (G)_j = 0 against the alternative H_1j : (G)_j ≠ 0. Simple t-tests are used to screen for relevant estimated factors, using the asymptotic approximation Λ̂'β̂_X ≈ N(G, Ω̂_X). In particular, to conduct a two-sided asymptotic δ-level test for the significance of each estimated factor j ∈ [r], we compare the test statistic |T̂_j|, where T̂_j = (Ω̂_X)_jj^{-1/2} (Λ̂'β̂_X)_j, with the 1 − δ/2 quantile of the standard normal distribution. Let S denote the selection event according to these pre-tests. For example, if r = 3 and only the first and third estimated factors pass the pre-test of relevance, then S is the event that |T̂_1| and |T̂_3| exceed the critical value while |T̂_2| does not. Only using the subset of estimated factors that have passed the pre-test of relevance, we will test the null hypothesis H_0 : θ = θ_0 against the general alternative H_1 : θ ≠ θ_0. To appropriately account for pre-testing of relevant estimated factors, we seek to construct a conditional test which controls the selective type I error (Fithian et al., 2017). That is, for an α-level test, P(reject H_0 | S) ≤ α under H_0; i.e. we control the error rate of the test at α given the selection event S.
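The screening rule itself can be sketched in a few lines; the t-statistic form is reconstructed from the text, and the function name and toy values are illustrative:

```python
import numpy as np
from scipy.stats import norm

def screen_factors(G_hat, Omega_X, delta=0.05):
    """Pre-test each estimated factor j: retain it when
    |T_j| = |(G_hat)_j| / sqrt((Omega_X)_jj) exceeds the two-sided
    delta-level standard normal critical value."""
    t_stats = np.abs(G_hat) / np.sqrt(np.diag(Omega_X))
    crit = norm.ppf(1 - delta / 2)
    return np.flatnonzero(t_stats > crit)   # indices R of selected factors

# toy example: r = 3 estimated factor associations with unit variances
G_hat = np.array([4.0, 0.3, -3.1])
Omega_X = np.eye(3)
print(screen_factors(G_hat, Omega_X))   # first and third factors pass
```

This reproduces the r = 3 example from the text: factors 1 and 3 clear the 1.96 critical value while factor 2 does not.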
Suppose r̃ is the number of selected estimated factors, and let R denote the indices of the selected set of estimated factors, so that the selection event is S = { |T̂_j| > z_{1−δ/2} for all j ∈ R, and |T̂_j| ≤ z_{1−δ/2} for all j ∉ R }, where z_{1−δ/2} is the 1 − δ/2 quantile of the standard normal distribution. We also let Γ_S denote an r × r̃ selection matrix which is constructed such that Λ̂Γ_S is the p × r̃ matrix whose r̃ columns are the columns of Λ̂ that correspond to the selected factors R. We call the resulting LIML estimator which uses only the selected factors 'Selected LIML' (S-LIML), which is denoted θ̂_S.
To make honest inferences, we will need to consider how the distribution of θ̂_S is impacted by the uncertainty of the pre-test results S. The asymptotic conditional distribution of θ̂_S given S depends on an r-dimensional nuisance parameter D^{-1/2}G, where D is the diagonal matrix with its (j, j)-th element given by (Ω_X)_jj. Thus, along with the selection event S, we will condition on a sufficient statistic for the nuisance parameter, which will cause it to drop from the conditional distribution of θ̂_S (see, for example, Sampson and Sill, 2005).
We define some additional quantities in order to introduce a sufficient statistic for the nuisance parameter. The conditional covariance of the vector of pre-test statistics (T̂_1, . . ., T̂_r) and the estimate θ̂_S is denoted C, and U denotes the corresponding sufficient statistic. Theorem 3 (conditional distribution of S-LIML estimators). Under Assumptions 1-3 and Equation (1), the asymptotic conditional distribution of θ̂_S given U = u and S is that of a suitably scaled standard normal random variable K ∼ N(0, 1), truncated to the region of values compatible with the selection event S.
Intuitively, this conditional distribution of θ̂_S reveals what the likely values of θ̂_S should be under H_0 : θ = θ_0, given the results of the pre-tests S and the observed value U = u. If the S-LIML estimate θ̂_S does not lie in a suitable likely region, then we interpret this as evidence against H_0.
Let D̂ denote the r × r diagonal matrix with its (j, j)-th element given by (Ω̂_X)_jj. We can then construct consistent estimators of the quantities above by replacing the population terms with their sample counterparts.

Simulation study
This section presents the performance of the identification-robust and conditional test statistics in a simulation study based on real genetic data. The simulation design aims to explore the robustness of our empirical results in Section 5, where we investigate cholesteryl ester transfer protein (CETP) inhibitors as a potential drug target for coronary heart disease (CHD). CETP inhibitors are a class of drugs which increase high-density lipoprotein cholesterol and decrease low-density lipoprotein cholesterol (LDL-C). Therefore, our exposure X is LDL-C, our outcome Y is CHD, and our potential pool of instruments Z is a set of 196 genetic variants taken from a neighborhood of the CETP gene which are associated with LDL-C at a p-value less than 5 × 10^{-4}.
The genetic associations with LDL-C and CHD are generated from two independent GWASs using the normality assumptions in Assumption 1. The true variant-exposure associations β_X are set as the measured variant associations with LDL-C, and the true variant-outcome associations β_Y are set as the measured variant associations with CHD. The covariance matrix for β̂_X is formed as Σ_X = ρ ∘ (σ_X σ_X'), where ∘ denotes the element-wise product, ρ is an estimated genetic correlation matrix of the 196 variants, and σ_X is the vector of standard errors associated with the measured variant associations with LDL-C. The covariance matrix Σ_Y for β̂_Y is formed analogously.
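This data-generating step can be sketched as follows (a toy Python stand-in for the real CETP summary inputs; the dimensions, effect sizes, and variable names are illustrative):

```python
import numpy as np

rng = np.random.default_rng(3)
p, theta0 = 5, 0.4   # toy number of variants and causal effect

# stand-ins for the measured LDL-C associations, their standard errors,
# and the variant correlation matrix (real inputs come from GWAS data)
beta_x = rng.normal(scale=0.05, size=p)
se_x = se_y = np.full(p, 0.02)
rho = np.corrcoef(rng.normal(size=(p, 50)))   # a valid p x p correlation matrix

# covariance matrices: Sigma = rho, element-wise times the outer product
# of the standard-error vectors, as in Assumption 1
Sigma_x = rho * np.outer(se_x, se_x)
Sigma_y = rho * np.outer(se_y, se_y)

# one simulated two-sample summary dataset
bx_hat = rng.multivariate_normal(beta_x, Sigma_x)
by_hat = rng.multivariate_normal(theta0 * beta_x, Sigma_y)
print(bx_hat, by_hat)
```

Because ρ has a unit diagonal, the diagonal of Σ_X recovers the squared standard errors, so each β̂_Xj has exactly the reported sampling variance.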
Across the range of scenarios we considered, the CLR-based tests were generally the most reliable and powerful. Therefore, our discussion here focuses on the results of the CLR tests, and we omit presenting the results of the AR- and LM-based tests for clarity.
Our proposed methods are based on dimension reduction of all variants rather than selecting specific variants. Instead of using estimated factors as instruments, we might wonder if it is better to simply omit highly correlated variants and conduct identification-robust tests with a smaller number of moderately correlated variants as instruments. This is the approach studied by Wang and Kang (2021). Thus, a natural source of comparison with our factor-based methods are CLR tests which filter out variants if they are correlated with an already included variant at some pre-specified threshold. In MR terminology, this is called pruning.
The tests CLR-20, CLR-40, CLR-60, and CLR-80 will denote CLR tests computed with pruned variants such that their correlations are bounded at R² thresholds of 0.2, 0.4, 0.6, and 0.8, respectively. We also compute CLR-01, which is a CLR test using a set of mutually almost uncorrelated variants (R² ≤ 0.01), after choosing the variant most strongly associated with LDL-C.
To conduct our S-LIML and F-CLR tests, we need to decide on an appropriate number of factors r. For our simulations, we chose r = 8 since there was a noticeable gap between the 8-th and 9-th eigenvalues of the variant correlation matrix ρ.
Our simulation study focuses on studying the performance of tests under three practical problems of interest in cis-MR analyses: small sample sizes, invalid instruments, and mismeasured instrument correlations.

Smaller sample sizes
Since proteins are the drug target of most medicines, cis-MR analyses often use proteins as the exposure of interest. Genetic associations with protein or gene expression are typically measured with smaller sample sizes than usual GWASs. In practice, this can result in a weak instruments problem since the reported standard errors will be larger. To mimic a small sample size problem, instead of taking the reported standard errors of variant-LDL-C and variant-CHD associations at face value, we inflate the covariance matrices by dividing them by a fixed constant 0 < η ≤ 1. Hence, we generate two-sample summary data as β̂_X ∼ N(β_X, η^{-1} Σ_X) and β̂_Y ∼ N(θ_0 β_X, η^{-1} Σ_Y), where β_X are the reported variant-LDL-C associations. A smaller value of η implies a smaller sample size. Figure 2 shows that when all variants are valid instruments, using very correlated individual variants as instruments performs well; the CLR test with a pruning threshold of R² = 0.8 (CLR-80) performed best. Using estimated factors as instruments provides reliable inference for both F-CLR and S-LIML approaches, with both tests performing better than the CLR-01 test which only uses uncorrelated variants as instruments, although S-LIML is over-sized in very small samples (η = 0.25). For this, and later simulation designs, the S-LIML test screened the estimated factors using δ = 0.1, δ = 0.05, and δ = 0.01 level pre-tests for the η = 0.25, η = 0.5, and η = 1 scenarios, respectively. With these thresholds, most of the 8 estimated factors were retained as instruments, but usually not all of them. All 8 factors were selected in 34.4% of cases under η = 1, 17.9% of cases under η = 0.5, and just 5.6% of cases under η = 0.25.
The inflated type I error rates of the S-LIML approach under very small samples are perhaps not a surprise; while the CLR-based approaches are robust to weak instruments, LIML-based point estimators are not. Even if we appropriately screen out very weak instruments, if the strongest instrument is still quite weak, then the usual asymptotic approximations used to carry out inferences can be poor (Stock et al., 2002).

Invalid instruments
While the linear model (1) is thought to provide a reasonable approximation in practice, we may be concerned that proportionality of genetic associations β_Yj = θ_0 β_Xj may not hold exactly over all variants j ∈ [p]. Fortunately, the factor model approach may provide some robustness to the inclusion of invalid instruments. For example, under Bai and Ng (2010)'s analysis, a finite number of variants would be permitted to have direct effects on the outcome as long as the total number of variants p grows sufficiently faster than n. Here we study finite-sample behaviour under the model β_Yj = θ_0 β_Xj + τ_j, where the direct effects are generated as τ_j ∼ U[−0.005τ, 0.005τ], τ > 0, j ∈ [p]. This is similar to an assumption of balanced pleiotropy often maintained in polygenic MR (Hemani et al., 2018).
When the direct effects τ_j are random around zero, their impact is to inflate the variance of the resulting estimate. Therefore, when the distribution of the random effects τ_j is known, valid inferences can be obtained by estimating a variance correction term to account for the extra heterogeneity (Zhao et al., 2020; Burgess et al., 2020). In contrast, here we set the direct effects τ_j to be fixed, so that they directly impact the bias of the resulting estimates, with no de-biasing adjustment possible without imposing further restrictions.
Figure 3 highlights the robust performance of the F-CLR and S-LIML tests under invalid instruments. F-CLR was the best performing test, with only small size distortions. S-LIML was slightly more over-sized, with seemingly no power advantage over F-CLR for our experiments based on CETP gene data. The pruning approaches were generally much less robust to invalid instruments, showing substantially inflated type I error rates. Intuitively, we would expect that more liberal pruning thresholds might perform better, as they allow the direct effects to average out. Of the 5 pruning thresholds we considered, an R² = 0.6 threshold performed the best, with CLR-60 making use of 40 of the 196 variants available. However, even CLR-60 was heavily over-sized when the direct effects were large (τ ≥ 2).

Mismeasured variant correlations
Our final simulation design investigates robustness to a very common problem in cis-MR analyses. It is often the case that the variant correlation matrix ρ is not provided alongside summary genetic association data. In such situations, researchers who want to make use of correlated variants need to obtain estimates of the variant correlation matrix from a different set of subjects. This was the case for our CETP gene analysis, where we obtained a variant correlation matrix from a reference panel (1000 Genomes Project, Auton et al., 2015) using the MR-Base platform (Hemani et al., 2018).
Discrepancies between the variant correlation matrix estimated from the reference sample and the true variant correlation matrix ρ underlying the two-sample summary data may arise for at least two reasons. First, the size of the reference sample may be significantly smaller than the sample sizes of the GWASs, thus allowing more room for sampling error. Secondly, while all samples are assumed to be drawn from the same population, in practice the two correlation estimates may be based on heterogeneous samples.
To study the problem of mismeasured variant correlations, for any distinct variants j, k, we assume the variant correlation estimate available to the researcher satisfies ρ̂_jk = ρ_jk + κ_jk, where ρ_jk is the true variant correlation used to construct the two-sample summary associations, and κ_jk is a fixed effect generated as κ_jk ∼ ε · U[0, κ], κ ≥ 0, where ε follows the Rademacher distribution (it equals 1 with probability 0.5, and −1 with probability 0.5).
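The perturbation scheme above can be sketched as follows; the function name is ours, and the symmetrization and clipping steps are assumptions needed to keep the perturbed matrix a valid correlation matrix.

```python
import numpy as np

def perturb_correlations(rho, kappa, seed=0):
    """Add fixed, Rademacher-signed errors to the off-diagonal variant
    correlations: rho_hat_jk = rho_jk + eps_jk * U[0, kappa]."""
    rng = np.random.default_rng(seed)
    p = rho.shape[0]
    eps = rng.choice([-1.0, 1.0], size=(p, p))   # Rademacher signs
    mag = rng.uniform(0.0, kappa, size=(p, p))   # error magnitudes
    err = np.triu(eps * mag, k=1)                # upper triangle only
    err = err + err.T                            # symmetrize
    rho_hat = np.clip(rho + err, -1.0, 1.0)      # keep entries in [-1, 1]
    np.fill_diagonal(rho_hat, 1.0)               # diagonal stays exactly 1
    return rho_hat
```

Setting kappa = 0 recovers the true matrix, while kappa = 0.15 corresponds to the large-misspecification scenario considered in Figure 4.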
Figure 4 shows that F-CLR and CLR-01 were best able to control type I errors under mismeasured variant correlations. S-LIML was slightly more over-sized, and CLR-20 much more so, especially under large misspecification (κ = 0.15). We also note that CLR tests using pruned variants at correlation thresholds greater than R^2 = 0.2 were very unstable. In a polygenic MR setting, Wang and Kang (2021)'s analysis suggests that using uncorrelated variants is likely to be a sensible choice when the quality of variant correlation estimates is in doubt. While CLR-01 is certainly able to control type I errors, in our situation with balanced errors (variant correlations are equally likely to be under- or over-estimated), the S-LIML test and, in particular, the F-CLR test may have significant power advantages. Overall, we believe that the S-LIML and F-CLR tests provide a useful balance between alternative pruning-based approaches, offering robust inferences in several scenarios of practical concern without compromising too much on power. CLR tests with uncorrelated variants appear to be less powerful than the S-LIML and F-CLR tests, and may perform poorly when variants have fixed and balanced direct effects. On the other hand, CLR tests which use individual correlated variants as instruments can be highly sensitive to the pruning threshold chosen, and while they can be very powerful, their performance may degrade badly under misspecification.

Empirical application: CETP inhibition and CHD
Cholesteryl ester transfer protein (CETP) inhibitors are a class of drugs which increase high-density lipoprotein cholesterol and decrease low-density lipoprotein cholesterol (LDL-C) concentrations. At least three CETP inhibitors failed to provide sufficient evidence of a protective effect on coronary heart disease (CHD) in clinical trials, before the successful trial of anacetrapib showed marginal benefits alongside statin therapy (Bowman et al., 2017). However, with further trials currently ongoing, cis-MR analyses can offer important supporting evidence to complement experimental results. For example, in recent work, Schmidt et al. (2021)'s cis-MR analysis suggests that CETP inhibition may be an effective drug target for CHD prevention.
From a statistical perspective, we may have a few concerns regarding the criteria used by Schmidt et al. (2021) to select instruments. First, to guard against weak instrument bias, they select variants based on an in-sample measure of instrument strength (F-statistic > 15), which could leave the analysis vulnerable to winner's curse bias (Mounier and Kutalik, 2021). Secondly, to guard against heterogeneity of genetic associations, they use a measure of instrument validity to remove outliers (Cochran's Q statistic; see, for example, Bowden et al., 2019), which can result in size-distorted tests (Guggenberger and Kumar, 2012). Finally, for those correlated variants with strong measured associations with the outcome, they allow variants up to a pruning threshold of R^2 = 0.4; our simulation results show that inference can be sensitive to the choice of pruning threshold.
Here we apply conditional inference techniques to investigate the genetically-predicted LDL-C lowering effect of CETP inhibition on the risk of CHD. Genetic associations with LDL-C were taken from a GWAS of 361,194 individuals of white-British genetic ancestry in the UK Biobank and were in standard deviation units (Sudlow et al., 2015). Genetic associations with CHD, measured in log odds ratio (LOR) units, were taken from a meta-GWAS of 48 studies with a total of 60,801 cases and 123,504 controls from a majority European population, conducted by the CARDIoGRAMplusC4D consortium (Nikpay et al., 2015). Genetic variant correlations were obtained from a reference panel of European individuals (1000 Genomes Project, Auton et al., 2015) using the twosampleMR R package (Hemani et al., 2018). A total of 368 genetic variants were drawn from the CETP region, with variant positions within ±100 kb of the CETP gene position indicated on GeneCards (Stelzer et al., 2016). The variant correlation matrix was highly structured, with 10 principal components explaining over 96 percent of the total variation of the 368 genetic variants. Noting the eigenvalue gap between the 10-th and 11-th eigenvalues, we selected r = 10 estimated factors as instruments for the F-AR, F-LM, F-CLR, and S-LIML methods. We also present results for CLR tests which use individual variants as instruments according to different pruning thresholds.
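The eigenvalue-gap reasoning used to pick r = 10 can be sketched as below; this is a minimal heuristic, not the paper's implementation, and the function name and variance threshold are our own choices.

```python
import numpy as np

def select_num_factors(rho, var_threshold=0.95):
    """Choose the number of factors r from a variant correlation matrix,
    either by the largest gap in the sorted eigenvalues (eigengap/scree
    heuristic) or by the smallest r whose leading eigenvalues explain a
    target share of total variation."""
    evals = np.linalg.eigvalsh(rho)[::-1]          # eigenvalues, descending
    gaps = evals[:-1] - evals[1:]                  # successive eigenvalue gaps
    r_gap = int(np.argmax(gaps)) + 1               # split at the largest gap
    cum = np.cumsum(evals) / np.sum(evals)         # cumulative variance share
    r_var = int(np.searchsorted(cum, var_threshold)) + 1
    return r_gap, r_var
```

For a correlation matrix with a clear block structure, both criteria point to a small number of dominant components, mirroring the CETP region where 10 components explained over 96 percent of the variation.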
Only 4 of the 10 estimated factors were retained by the S-LIML method after pre-testing for relevant factors at the δ = 0.01 level. Table 1 shows that the S-LIML method gives a point estimate of θ̂_S = 1.075 for the LOR change in CHD associated with a 1 standard deviation change in LDL-C, with a corresponding 95 percent asymptotic confidence interval. The results were reasonably robust to the choice of pre-testing threshold used to select relevant estimated factors. In particular, pre-tests at the δ = 0.1 and δ = 0.05 levels resulted in 6 estimated factors being selected, and returned 95 percent confidence intervals of θ_0 ∈ [0.292, 1.801] and θ_0 ∈ [0.245, 1.816], respectively. The F-LIML estimator, which uses all 10 estimated factors, offers a tighter interval than the conditional approaches, although its type I error rate can be more sensitive to weak instruments.

Table 1. 95 percent confidence intervals for pruning and factor-based approaches. θ̂ gives the point estimate of the method, if applicable. Q gives the p-value associated with testing the null of no heterogeneity in instrument-LDL-C and instrument-CHD associations using Cochran's Q-statistic; see the discussion below.
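The relevance pre-test retains estimated factor j when its first-stage statistic exceeds the two-sided normal critical value in magnitude; a minimal sketch is below, with the helper name ours rather than the paper's.

```python
from statistics import NormalDist

def screen_factors(t_stats, delta=0.01):
    """Keep the indices of estimated factors whose first-stage
    t-statistics satisfy |t_j| > inverse-Phi(1 - delta/2)."""
    c_delta = NormalDist().inv_cdf(1 - delta / 2)  # about 2.576 when delta = 0.01
    return [j for j, t in enumerate(t_stats) if abs(t) > c_delta]
```

Loosening delta enlarges the retained set, which is consistent with the 6 factors selected at δ = 0.1 and δ = 0.05 versus the 4 selected at δ = 0.01.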
Table 1 also shows that the 95 percent confidence intervals obtained by inverting the F-CLR and F-LM tests are tighter than those of the S-LIML method, while the F-AR approach is much less precise and unable to reject the null hypothesis of no causal association (F-AR p-value: 0.508). For CLR tests with pruned variants, although the confidence intervals are considerably narrower than those of the factor-based methods, they appear to be sensitive to the correlation threshold chosen. For the case of R^2 = 0.6, the CLR test rejected all plausible values of θ_0 and so failed to return a confidence interval.
Our simulation study illustrated how our factor-based approach was relatively robust to biases from direct variant effects on the outcome. The heterogeneity plots in Figure 5 can provide a limited insight into a potential invalid instruments problem. We note that while some individual variants deviate quite far from the trend line in Figure 5 (center), the heterogeneity appears to be reduced when considering estimated factor associations (Figure 5, right).
A more formal method to test for excessive heterogeneity uses Cochran's Q-statistic. Table 1 shows that the F-LIML and S-LIML approaches provide strong evidence of no heterogeneity when using the estimated factors as instruments. Since identification-robust methods do not provide point estimates, for the starred entries in the last row of Table 1 we evaluated the Q-statistic at the mid-point of the relevant confidence interval. We made no 'degrees of freedom' correction for this substitution, which should lead to more conservative p-values (i.e. we are less likely to reject the null of no heterogeneity). Despite this, pruning-based approaches show evidence of greater heterogeneity when considering individual variants as instruments, particularly at the threshold R^2 = 0.4 (CLR-40 heterogeneity p-value: 0.006).
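For intuition, a standard uncorrelated-instrument form of Cochran's Q-statistic is sketched below; the version used for Table 1 must additionally account for instrument correlation, so treat this as an illustrative simplification with names of our choosing.

```python
import numpy as np

def cochran_q(ratio_estimates, standard_errors):
    """Cochran's Q for heterogeneity of per-instrument ratio estimates:
    Q = sum_j w_j (theta_j - theta_IVW)^2 with inverse-variance weights
    w_j = 1/se_j^2; under homogeneity, Q is approximately chi-squared
    with (number of instruments - 1) degrees of freedom."""
    est = np.asarray(ratio_estimates, dtype=float)
    w = 1.0 / np.asarray(standard_errors, dtype=float) ** 2
    theta_ivw = np.sum(w * est) / np.sum(w)   # IVW pooled estimate
    Q = np.sum(w * (est - theta_ivw) ** 2)
    return Q, est.size - 1                    # statistic and degrees of freedom
```

A large Q relative to the chi-squared reference distribution indicates heterogeneity in the instrument-exposure and instrument-outcome associations.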
We conclude that the factor-based conditional testing approaches (F-CLR and S-LIML) provide robust genetic evidence that the LDL-C lowering effect of CETP inhibitors is associated with a lower risk of CHD.

Conclusion
There is an increasing focus on using cis-MR analyses to guide drug development; genetic evidence may be crucial to support novel targets, precision medicine subgroups, and the design of expensive clinical trials (Gill et al., 2021). While there are many methods available for robust inference in polygenic MR using uncorrelated variants, powerful cis-MR investigations need to make sensible use of many weak and correlated genetic associations from the gene region of interest. Using only uncorrelated variants as instruments may lead to imprecise inferences which can be vulnerable to heterogeneity in genetic associations. On the other hand, using highly correlated variants can result in unstable inferences which are particularly sensitive to common problems of misspecification, such as mismeasured genetic correlations.
With the goal of making reliable and powerful cis-MR inferences, we have developed two conditional testing procedures based on the use of genetic factors as instrumental variables. The methods strike a balance relative to identification-robust testing procedures which use individual variants as instruments, offering more robust inferences without compromising too much on power.
by a strong instruments assumption and Lemma 1, so that Part (i) follows by Lemma 3(ii). Therefore, Part (ii) follows by Lemma 6(i) and Slutsky's lemma, since by Part (i) and Lemma 6(ii).

First, note that, by Lemmas 3(v) and (vi), and Lemmas 5(i) and (ii), this leads to the joint asymptotic normality result. By Lemmas 3(vi) and 4(i), we have the following. To remove the dependence of θ̂_S on T, we condition on a sufficient statistic for the unknown nuisance parameter, where K ∼ N(0, 1). Thus, if S is the selection event, and R is the set of indices of the estimated factor loadings corresponding to the retained instruments, then, approximately, conditional on S, and P_m(·) denotes the empirical distribution based on m draws of K ∼ N(0, 1). In practice, we also replace the variance components V_S, C_G and V_G with β²_Xj Var(z_j) − β²_Xk Var(z_k) + β_Xj β_Xk Var(z_j)^{1/2} Var(z_k)^{1/2} ρ_jk = Var(x) + O(n_X^{−2κ}), and so (Σ_X)_jk = n_X^{−1} Var(z_j)^{−1/2} Var(z_k)^{−1/2} ρ_jk Var(x) + O(n_X^{−(1+2κ)}). Therefore, the covariance term (Σ_X)_jk satisfies (Σ_X)_jk = σ_Xj σ_Xk ρ_jk + O(n_X^{−(1+κ)}). Since σ_Xj σ_Xk ρ_jk = Θ(n_X^{−1}), the o(n_X^{−1}) remainder term is negligible.
If γ̂_j = (α̂_Yj, β̂_Yj)' is the logit regression estimate for the model P(y = 1|z_j) = L(w_j'γ_j), then, by standard asymptotic arguments, where σ²_γj = n_Y^{−1} E[w_j w_j' L_1]^{−1}. Note that the asymptotic variance (Σ_Y)_jj of the estimated slope coefficient β̂_Yj corresponds to the (2, 2)-th element of the covariance matrix σ²_γj. In particular,

Overall, Cov(z_j, z_k) is as displayed above, where the last line follows by noting that L(1 − L) = L_1.
Thus, the covariance of β̂_Yj and β̂_Yk is approximately given by σ_Yj σ_Yk ρ_jk, where ρ_jk is the correlation between z_j and z_k.
F Simulation results: pruned CLR with invalid IVs

For the simulation design of Section 4.2 in the main text, we presented simulation results for the CLR test using individual variants as instruments up to the correlation threshold R^2 = 0.6. Here we present results which include the other pruning thresholds that we considered. In particular, Figure S.1 highlights that under invalid instruments the CLR test can be quite sensitive to the pruning threshold chosen. On the one hand, we would like to include more variants as instruments to 'even out' the invalid instruments problem in this particular case of balanced direct variant effects. However, the inflated type I error rates under the R^2 = 0.8 threshold suggest that including highly correlated variants may also lead to unreliable tests. Overall, we suggest it may be difficult in practice to determine a suitable pruning threshold for robust and precise inferences.
For any sequences a_n and b_n, if a_n = O(b_n), then there exist a positive constant C and a positive integer N such that for all n ≥ N, b_n > 0 and |a_n| ≤ C b_n.

Figure 1. Scree plot of genetic associations (left), and genetic associations of LDL-C and CHD (right) in the HMGCR gene region.
and H is a rotation matrix. Let Q_S = S'S, Q_ST = S'T, and Q_T = T'T. Then, the Anderson and Rubin (1949) statistic with estimated factors is given by F-AR = Q_S, Kleibergen (2005)'s Lagrange multiplier statistic with estimated factors is given by F-LM = Q²_ST/Q_T, and Moreira (2003)'s conditional likelihood ratio statistic with estimated factors is given by F-CLR = (1/2)(Q_S − Q_T + ((Q_S − Q_T)² + 4Q²_ST)^{1/2}). For the pre-test, we compare each |T̂_j|, j ∈ [r], against the critical value c_δ := Φ^{−1}(1 − δ/2), where Φ(·) is the standard normal cumulative distribution function. For each j ∈ [r], if |T̂_j| > c_δ, then we have evidence to reject H_0j, and thus we include estimated factor j as an instrument. Any estimated factors with |T̂_j| ≤ c_δ are deemed to be weak instruments, and are thus discarded.
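Given S and T, the three test statistics reduce to simple quadratic forms; a minimal sketch is below, using the standard closed form for the conditional likelihood ratio statistic, with function and variable names of our choosing.

```python
import numpy as np

def conditional_test_statistics(S, T):
    """Compute the AR, LM, and CLR statistics from vectors S and T,
    via Q_S = S'S, Q_ST = S'T and Q_T = T'T."""
    Q_S = float(S @ S)
    Q_T = float(T @ T)
    Q_ST = float(S @ T)
    f_ar = Q_S                                  # Anderson-Rubin
    f_lm = Q_ST ** 2 / Q_T                      # Lagrange multiplier
    # Standard conditional likelihood ratio closed form (Moreira, 2003)
    f_clr = 0.5 * (Q_S - Q_T + np.sqrt((Q_S - Q_T) ** 2 + 4.0 * Q_ST ** 2))
    return f_ar, f_lm, f_clr
```

Note that F-LM ≤ F-CLR ≤ F-AR always holds, which reflects how the CLR test adapts between the two as instrument strength (measured through Q_T) varies.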

Figure 5. Scree plot (left), 368 genetic associations with LDL-C and CHD (center), and 10 estimated factor associations with LDL-C and CHD (right), in the CETP gene region.

Figure S.1. Sensitivity to pruning thresholds: inflated type I errors with invalid instruments when testing the null hypothesis H_0: θ_0 = 0.