Supplemental Materials : Constrained Instruments and their Application to Mendelian Randomization with Pleiotropy

Abstract In Mendelian randomization (MR), inference about causal relationship between a phenotype of interest and a response or disease outcome can be obtained by constructing instrumental variables from genetic variants. However, MR inference requires three assumptions, one of which is that the genetic variants only influence the outcome through phenotype of interest. Pleiotropy, that is, the situation in which some genetic variants affect more than one phenotype, can invalidate these genetic variants for use as instrumental variables; thus a naive analysis will give biased estimates of the causal relation. Here, we present new methods (constrained instrumental variable [CIV] methods) to construct valid instrumental variables and perform adjusted causal effect estimation when pleiotropy exists and when the pleiotropic phenotypes are available. We demonstrate that a smoothed version of CIV performs approximate selection of genetic variants that are valid instruments, and provides unbiased estimates of the causal effects. We provide details on a number of existing methods, together with a comparison of their performance in a large series of simulations. CIV performs robustly across different pleiotropic violations of the MR assumptions. We also analyzed the data from the Alzheimer’s disease (AD) neuroimaging initiative (ADNI; Mueller et al., 2005. Alzheimer's Dementia, 11(1), 55–66) to disentangle causal relationships of several biomarkers with AD progression.


A Controlling Z as Covariates or as Exogenous variables
Consider the following two ways to control Z in TSLS: (1) The 2SLS estimate of β 1 (causal effect of X on Y), by adjusting Z as covariates, is obtained from the regression models: whereX is the fitted value from first stage regression (1a), i.e.X = α 1 G + α 2 Z. We will call this estimateβ 1 .

B Solution to the Constrained Instrumental Variable Problem
Let M = X(X X) −1 X , then c G Xv = c G X(X X) −1 X Xv = c G MXv, max c∈R p ,v∈R r c G Xv = max c∈R p ,v∈R r c G MXv ≤ ||c G M|| ||Xv||, If we have rank(A) = p ≥ rank(Z) = k (columns are uncorrelated), then there exists a solution for w since this is a quadratic optimization problem with quadratic/linear constraints (Golub, 1973).
Consider the QR decomposition of B: where R is a k by k upper triangular matrix with positive diagonal elements. Q is an orthogonal matrix. S is a k by p − k matrix and represents the column permutation matrix (Gu and Eisenstat, 1996) to ensure that the diagonal elements of R are positive and non-increasing, i.e. R is invertible.
R is then unique under these conditions (Golub and Van Loan, 1996).

3
Constrained Instruments for Mendelian Randomization The problem then becomes: subject to conditions: We now know that the solution for d is any eigenvector corresponding to the largest eigenvalue of A 2,2 . There are at most p − k of them.
In conclusion, when n > p the (unique) solution of the constrained instrumental variable where Q is an orthogonal matrix defined by (4) and d is an eigenvector defined by (5).
C Algorithm for obtaining approximately sparse constrained CIV solutions with L 0 penalty 1. Initialization: For a given value of λ , start from an initial guessc and initial L 0 penalty where A − is a generalized inverse of A = Z G and µ is a step-size parameter in gradient descent algorithm.
iii Set c * = c/ √ c G Gc as the updated solution.
3. Update σ with σ = 0.5 σ prev , where σ prev is the previous value of σ used in step 2. If σ > σ min repeat all items in step 2. If not, stop the algorithm and record the last iteration of c as the final solution.
The maximization problem of Equation (8) is solved by repeatedly taking gradient descent steps (i), and then projecting the possible solution back into constrained set ((ii),(iii)). The step (ii) restricts the solution to be on the constrained set (7b) and step (iii) restricts it to the boundary of the constrained set (7a). The unconstrained gradient descent step followed by projection to the feasible set is equivalent to a direct gradient descent step on the feasible set (Cui et al., 2010). The parameters for step-size (µ) and number of iterations (T ) should be carefully chosen to achieve balance between computation cost and precision. That is, the states discovered by this algorithm may not achieve the global maximum value of Equation (6) even with a large number of iterations 5 Constrained Instruments for Mendelian Randomization if we use a step size that is too large. The decreasing list of values for σ is chosen to ensure that the approximation accuracy will gradually increase.

D Non-unique Solutions to the CIV smooth Problem
There may be multiple local solutions to the smoothed problem (Eq 8), which is not a convex optimization problem. Note that the maximization of a convex function over a convex set is not necessarily a convex problem. As a result, a local maximum solution of c may not be the global maximum solution, and numerical optimization techniques may get trapped into a local maximum. Therefore, we start from multiple (e.g. 100) initial points randomly sampled from a multivariate normal distribution N(0, I p ), and let the smoothed L 0 algorithm converge to a set of solutions, possibly arriving at multiple local modes. After examining correlations between all pairs of solutions, highly correlated solutions (≥ 0.9) are removed. The remaining solutions are combined into a matrix c * (of p rows). Finally, we construct new instruments Gc * and refer to them as the CIV smooth instruments.
An example of this potential multi-modal problem is shown in Figure S1, where one simulated dataset from Series II was analyzed. The hierarchical cluster dendrogram shows the solution space for this simulation, demonstrating that there exist multiple different CIV smooth solutions.
However, a principal component analysis of the solutions shows that in most simulations only 1 unique solution stands out. So although multiple distinct solutions do occur, they tend to be similar to each other. To obtain our CIV smooth estimator, we sample possible solutions by starting our converging iterations from multiple initial points, and combining all distinct solutions into a matrix c * . This approach provides a set of the potential instruments with strong association with X and low correlation with Z.

6
Constrained Instruments for Mendelian Randomization Figure S1: Cluster dendrogram (a) and principal component analysis (b) from a one-sample set-up, with X → Z and α x = α z = 0.1 across 200 simulations.
(a) Cluster dendrogram of all (100) converged CIV smooth solutions in one simulation. Red block denotes the identified hierarchical clusters using the number of clusters determined by the silhouette coefficient (Rousseeuw, 1987). The tuning parameter λ controls the amount of regularization, and each value of it corresponds to a specific fitted model. In general, the value of tuning parameter in regularization is chosen to achieve (i) prediction accuracy and (ii) recovering the valid model. The first goal is straightforward and can be easily attained by optimizing the prediction error Y − Xβ * . The latter is more important for our approach in the existence of pleiotropy because pleiotropic genotypes could be "informative" for a prediction model, but render the whole MR analysis invalid. Thus we choose the projected prediction error (Kang et al., 2016), Y − Xβ * , as the measure to obtain optimal tuning parameter value λ . Moreover, 10-fold crossvalidation is employed to estimate the target projected prediction error on a given dataset, in order to obtain consistent choice of λ .
We demonstrate the advantage of choosing λ according to minimized projected prediction error in CIV smooth with a simple simulation. The simulated data contains 9 invalid SNPs and 1 valid SNP, and CIV smooth is implemented on this data to obtain coefficient u j for each SNP j on 7 Constrained Instruments for Mendelian Randomization different values of λ . The regularization path of CIV smooth (the solution values of coefficient c with respect to λ ), is demonstrated in Figure S2. Notice that the coefficient of invalid SNPs could grow with increasing levels of regularization at the beginning of the path ( 0 → 0.1). Moreover,λ 2 , the tuning parameter value corresponding to minimized projected prediction error leads to a more sparse solution c thanλ 1 , which corresponds to minimized prediction error.

Z Contains Only Part of Pleiotropic Phenotypes
The purpose of this section is to evaluate the performance of CIV smooth method and its competitors, as we vary the proportions of observed pleiotropic phenotypes included in Z. Our simulation design follows the structure equations in Section 4.1, including both Series I (γ xz = γ zx = 0) and series II (γ zx = 0). Also, the association parameters α Z were varied to study the impact of strong (α z = 1) or weak (α z = 0.1) pleiotropic effects on performance. In each of the 2 × 2 = 4 scenarios described above, the number (p z = 20 or 50) of genotypes associated with Z and the proportion (0.1, 0.4, 0.6, 0.9) of pleiotropic phenotypes observed were varied to assess CIV smooth performance. We adopt one-sample set-up, as described in Section 2.5, in all simulations.
Specifically, we simulated a set of independent genotypes G in the same way as Section 4.1.
(X, Z Y) were simulated using the following equations: Simulation Series I : Standard Pleiotropy where is the set of all pleiotropic phenotypes for ith sample. S z is a randomly selected subset of (1, ..., p z ) corresponding to a proportion of all pleiotropic phenotypes (|S z |/p z ∈ {0.1, 0.4, 0.6, 0.9}). Z i contains the correspondingly observed pleiotropic phenotypes with respect to S z . See Table 2 for details about the rest parameters (α x , α z , n, p, p z ).
We  Figure S6: Sensitivity analysis results for proportion of pleiotropic phenotypes observed. Boxplots of the bias of the causal effect estimates,β − 1, are shown using the same parameter settings as in simulation Series II, in the scenario of strong pleiotropy (α x = 1, α z = 1). The rows display results corresponding to different values of the proportion of observed pleiotropic phenotypes. the columns show the number of genotypes associated with Z out of 100 genotypes associated with X. 14