Implementing MR‐PRESSO and GCTA‐GSMR for pleiotropy assessment in Mendelian randomization studies from a practitioner's perspective

Abstract With the advent of very large scale genome‐wide association studies (GWASs), the promise of Mendelian randomization (MR) has begun to be fulfilled. However, whilst GWASs have provided essential information on the single nucleotide polymorphisms (SNPs) associated with modifiable risk factors needed for MR, the availability of large numbers of SNP instruments raises issues of how best to use this information and how to deal with potential problems such as pleiotropy. Here we provide commentary on some of the recent advances in the MR analysis, including an overview of the different genetic architectures that are being uncovered for a variety of modifiable risk factors and how users ought to take that into consideration when designing MR studies.

Mendelian Randomization (MR) is an approach which uses genetic data to infer if a risk factor is causally related to an outcome. It utilizes the random assortment of variants at meiosis to mimic a pseudo-randomised controlled trial. Essentially, if variants associated with a risk factor are also associated with the outcome of interest then, subject to some assumptions, a causal relationship may be inferred (Lawlor, 2016). Power is a major rate-limiting step in MR. Many early MR studies used a one-sample approach where the SNP-exposure and SNP-outcome associations were determined within a single data set. However, such an approach fails to capitalize on the power now available via consortia scale genome-wide association studies (GWASs), where more SNP instruments can be identified in the SNP-exposure step and/or where the size of the SNPoutcome data set is increased. It is hence now common to employ two sample approaches, which uses the largest possible datasets, with the inverse variance weighted (IVW) method used to combine estimates across SNPs (Pierce & Burgess, 2013). However, the IVW approach can yield biased estimates in the presence of horizontal pleiotropy. Two classes of approach try to address this; first approaches on the basis of for example, the median (Bowden, Davey Smith, Haycock, & Burgess, 2016) or mode (Hartwig, Davey Smith, & Bowden, 2017) can provide more robust estimates. Second, approaches on the basis of Egger regression can be used, resulting in valid inference in a broader set of scenarios (InSIDE assumption (Bowden, Davey Smith, & Burgess, 2015)). Whilst these approaches complement IVW estimates, allowing better triangulation of evidence on causality, these estimators are less efficient, reducing power (wider confidence intervals on the causal estimates).
An overview of these approaches has been previously described (Burgess, Timpson, Ebrahim, & Davey Smith, 2015;Zheng et al., 2017), although this is an active area of research, with new ongoing methods development. Whilst a fully updated review of the literature may be seen as premature, two high profile methodological approaches were published recently, which have the potential to address the pleiotropy issue more reliably. Hence, in this article, we focus on two approaches: (a) the Genome-wide Complex Trait Analysis-Generalized Summary Mendelian Randomization (GCTA-GSMR; Zhu et al., 2018) and (b) Mendelian Randomization Pleiotropy RESidual Sum and Outlier (MR-PRESSO; Verbanck, Chen, Neale, & Do, 2018) and provide some perspective on their utility from a MR practitioner's point of view.
The GCTA-GSMR framework developed by Zhu et al. (2018) is a generalized model to draw MR causal inference between any modifiable exposure and outcome of interest, expanded from its previous SMR framework, which was originally developed to evaluate causality between the gene expression and disease outcomes. GSMR builds on previous approaches for modeling multiple correlated SNPs in MR , by estimating the Linkage Disequilibrium (LD) between SNPs from a reference samplethis avoids the power loss inherent in only using uncorrelated SNPs. The GCTA-GSMR model borrows the GCTA-SMR heterogeneity in dependent instrument (HEIDI) test (Zhu et al., 2016) for assessing heterogeneity in the causal estimates across instruments; the test removes outliers, which may be associated with confounding factors. In addition, GCTA-GSMR models the error in the SNPexposure estimate, a term that was left out in conventional 2-sample MR models as it was assumed to be negligible in the delta-approximation when the F-statistic for the SNPexposure association is large (Stephen Burgess, Butterworth, & Thompson, 2013). The method also implemented a multivariate MR framework to investigate mediation and marginal contribution(s) of multiple risk factors on disease outcomes. The software is easy to apply, requiring only SNPexposure and SNP-outcome genetic association estimates and T A B L E 1 Comparison of SNP-Heterogeneity tests across MR-PRESSO, gSMR, and classical 2-sample MR methods

Formulation of heterogeneity test
Test statistics Description The test statistic T i converges to χ df ( = 1) 2 Computes SNP-level heterogeneity only, and uses the HEIDI test to discard outliers (e.g. T i p value < 0.01). However, theoretically possible to implement a global test.

Empirical
Relies on bootstrap to generate empirical distribution for the causal estimates. The main difference is that it uses a "leave-oneout" approach to obtain unbiased RSS values. Evaluates both SNP-level and global heterogeneity.
Note that this can also be rewritten as Mode, median, inverse variance weighted models Fits an additional intercept term before adjusting for directional pleiotropy. Also models global heterogeneity Abbreviations: GSMR, generalized summary mendelian randomization; HEIDI, heterogeneity in dependent instrument; MR-PRESSO, mendelian randomization pleiotropy residual sum and outlier. β zx and β zy refer to the SNP-exposure and SNP-outcome association estimate. β i refers to the wald-type estimator for SNP i, given by β β best is the GSMR causal estimate for the SNP at top 25-th percentile of log(p value) on the SNP-exposure association. The reason for not using the SNP with the highest log(p value) is to avoid potential SNP-pleiotropy generating a bias on the test statistics. β IVW is the inverse-variance weighted estimate for all SNP instruments. β j − refers to the IVW estimate for all SNP instruments excluding SNP j. Finally, β 0 is the MR-Egger intercept of the regression. Each model in the above also assumes that β values follow Gaussian distributions. In other 2-sample MR models, heterogeneity is often quantified via the Cochran Q test statistics (or Q' for modified MR Egger). a LD-matrix to account for correlation between SNPinstruments. On the other hand, MR-PRESSO by Verbanck and colleagues (Verbanck et al., 2018) assesses pleiotropy from a different viewpoint. MR-PRESSO adopts a "leave-oneout" approach to evaluate whether a specific SNP-instrument is driving the difference in computed residual sum of squares (RSS) against simulated expectations. Briefly, the model incorporates three stages to examine the extent of horizontal pleiotropy. First, a global test is conducted to test whether the total RSS (computed by excluding one SNP each turn) is consistent with that expected by chance. The second stage uses the RSS of individual SNP-instruments to identify outliers. The third stage employs a distortion test to determine the extent to which outliers change the MR causal estimates. Because simulation is used to derive p values, the computational requirements are not trivial.
Required user-input is similar to GCTA-GSMR, although as the approach does not, however, handle correlated SNPs, there is no requirement for a LD-matrix; instead, SNPs must be pre-screened for LD.
Although the MR-PRESSO and GSMR HEIDI approaches tackle pleiotropy within a different framework, they are conceptually similar. Both assume that most SNPs are not strongly affected by horizontal pleiotropy and attempt to control SNP-heterogeneity by removing SNP-outliers. In conventional 2-sample MR techniques, heterogeneity of the causal estimates derived from SNPs is often quantified by the Cochran Q test statistics (Bowden et al., 2018). For MR-PRESSO and GSMR, the methodological differences mainly come from the choice of formulation of the teststatistics to quantify statistical heterogeneity and reliance of parametric/non-parametric solutions (see Table 1). Abbreviations: GCTA-GSMR, genome-wide complex trait analysis-generalized summary mendelian randomization; MR-PRESSO, mendelian randomization pleiotropy residual sum and outlier; SNP, single nucleotide polymorphism.
Causal estimate refers to the estimated effect size (log(OR) on coronary artery disease (CAD) risk per standard deviation increase in genetically predicted LDL-cholesterol (LDL-c). SE refers to the respective standard errors of the causal estimate. Nb denote the number of simulation replicates required to generate the null distribution used in the MR-PRESSO outlier tests. The data for these traits were extracted from publicly available GWAS summary statistics (LDL-c from http://csg.sph.umich.edu/willer/public/lipids2013/; CAD from http://www.cardiogramplusc4d.org/data-downloads/).
Despite MR-PRESSO and GSMR being promising additions to the MR sensitivity toolbox, there are some important limitations. As previously mentioned, the MR-PRESSO model does not incorporate LD (although theoretically it could be implemented using a multivariate normal simulation framework), which may mean the variance explained in the modifiable risk factor is lower than that attainable when correlated SNPs are included. Although GCTA-GSMR does allow correlated SNPs to be included, the model relies on the LD reference panel used to inform LD between SNP instruments being reflective of the target sample (Vilhjálmsson et al., 2015) in a 2-sample MR design. This is unlikely to be an issue for quantitative outcomes but may impact findings for disease outcomes if the LD pattern is substantially different between cases and controls. MR-PRESSO applies a global distortion test, to evaluate whether the removal of the potentially pleiotropic instrument makes a meaningful difference to the overall causal estimate, whereas GCTA-GSMR filters SNP-outlier one at a time and does not apply a global test. Because of its reliance on simulation, the MR-PRESSO runtime varies. In our test F I G U R E 1 Illustrative scenarios for the genetic architecture of modifiable risk factors. The figure above shows the Manhattan plots (left panel) illustrating the different type of genetic architecture for modifiable risk factors used in MR studies. The red line (at y = log10(5e-8)) indicates the genome-wide significance (GW) threshold, where variants with a -log10(p value) above the line are deemed to be genome-wide significant. As genome-wide significant SNPs have F-statistics > 30, they can be used as viable instruments given the other MR-assumptions hold. The GWAS for trait A was modeled after coffee consumption; trait B modeled after Alcohol intake; trait C modeled after BMI. Note that the plots above are illustrative and do not represent the current state of knowledge for these traits. GWASs, genome-wide association studies; SNP, single nucleotide polymorphism example using publicly available GWAS data (Table 3), 10,000 simulation replicates were sufficient (runtime about 6 minutes). The GCTA-GSMR runtime was faster although in practice runtime is not a major issue for either method. In common with other MR sensitivity models (Zheng et al., 2017;median, mode, MR Egger), although both approaches can theoretically be applied to a relatively small number of SNP instruments (>5 say), their power to identify outliers is likely to be limited in such scenarios.
Before we further evaluate the feasibility of these MR approaches in practice, it is useful to give some thought to the likely genetic architecture of the exposure of interest. To examine how practical these recently developed models are in terms of modeling pleiotropy, let us first consider three illustrative scenarios for the genetic architecture of the modifiable risk factor of interest (Figure 1) modeled after real traits (coffee consumption; Coffee & Caffeine Genetics Consortium et al., 2015), alcohol intake (Liu et al., 2019), and body mass index (Locke et al., 2015). Power is a key issue in MR and power is directly related to the variance explained by the chosen SNP instruments (Brion, Shakhbazov, & Visscher, 2013). Hence a key consideration is the increase in cumulative r 2 as increasingly weaker instruments are added (Figure 2). With the total instrument r 2 calculated, the statistical power for the MR analysis can then be easily estimated using the online MR power calculator, mRnd (http://cnsgenomics.com/shiny/mRnd/) web interface (Brion et al., 2013).
For modifiable risk factors from scenario 1 (Figure 1, top panel), power will be adequate with only a few SNPs; most statistical pleiotropy evaluation methods (including GCTA-GSMR and MR-PRESSO) will not be effective in such situationsinstead, it is usual to include the few SNP instruments on the basis of biological grounds. In addition, Phenome-wide Association Studies (Hebbring, 2014;PheWAS) can be conducted to evaluate whether the SNPs affect putative confounders of the exposureoutcome relationship.
On the basis of scenario 2a (Figure 1, middle panel), if the confidence intervals on the causal odds ratios are sufficiently narrow with just a few SNPs of large effect, then theoretically, the MR analysis can proceed as per scenario 1. However, if additional polygenes are available then these can be used alongside the genes of large effect in a GCTA-GSMR or MR-PRESSO analysisthis will provide a statistical evaluation of whether pleiotropy will potentially bias causal inference. This statistical approach may be used to supplement the information on the biological function of specific SNP instruments, where this data is available for the risk factor of interest.
In scenario 2b (Figure 1, bottom panel), power is very unlikely to be sufficient with just a few top SNPs and pleiotropy evaluation for a large number of required polygenes will again be critical. With a large number of SNPs, incorporating biological/functional information for each SNP is unlikely to be tractable and the statistical approaches to identify and remove outliers in GSMR and MR-PRESSO will be useful. In the flowchart (Figure 3), depending on the anticipated genetic architecture of the modifiable risk factor we provide guidance on the preferred MR approach. An overview of the broad selection of modifiable risk factors commonly used in MR studies is given in Table 2 where MR analyses involving a low number of variants remain prevalent in the literature. Furthermore, Table 2 also shows no clear relationship between total variance explained by instruments and the number of instruments, hence the need to evaluate the genetic architecture for our trait of interest before deciding a sensible approach (Table 3).
The MR flowchart (Figure 3) provide potential guidelines on how to utilize various sensitivity analyses at different stages of the MR analysis. Although we attempt to streamline the process for display in the chart, in practice every step requires critical consideration. First, plotting the cumulative r 2 (Figure 2) can help breakdown the distribution of variance explained by instruments to evaluate whether sufficient variance can be captured by several SNPs to allow a well-powered MR. The choice and F I G U R E 2 Distribution of cumulative SNP variance explained based on different forms of polygenicity in genetic architecture. The x-axis represents the cumulative variance explained by SNPs (commonly denoted as r^2) for the underlying trait of interestan important indicator of power for MR analyses. While the y-axis refers to the number of instruments starting from the SNP with the largest r^2 on the underlying trait. The mixed form is analogous to Scenario 2a in the main text. The change in cumulative variance explained by instruments can be used to evaluate whether there is any marginal benefit (on power) for including more SNP instruments. SNP, single nucleotide polymorphism proposed method for pleiotropy assessment will then depend on whether substantial variance can be captured by only a few SNPs. Although approaches, such as GCTA-GSMR and MR-PRESSO offer a statistical approach for dealing with outliers, in some scenarios biological information is available and should be used sensibly (e.g. if one of the SNP instruments explains a high proportion of variance and that SNP has strong pleiotropic effects on putative confounders then it may make sense to drop the SNP before outlier screening in e.g. MR-PRESSO). Where applicable, bidirectional MR (Davey Smith & Hemani, 2014) can be conducted to clarify horizontal pleiotropy from mediation/vertical pleiotropy. Note that statistical methods (for 2-sample MR) available in the literature are not limited to those described in Figure 3.
It is becoming clear that many modifiable risk factors of interest in causal inference studies have a polygenic architecture. Although power remains a ratelimiting step in some applications, with the advent of very large bio-bank scale studies, there are scenarios where power is reasonable provided one is willing to use large numbers of SNPs as MR instruments. In such situations, the pleiotropy assessments provided by tools, such as MR-PRESSO and GSMR will be invaluable in enabling effective and robust causal inference using MR.