## Introduction

Mendelian randomization is a technique for using genetic variants to estimate the causal effect of a modifiable risk factor from observational data [Davey Smith and Ebrahim, 2003]. It has recently been used to strengthen the evidence for causal roles of interleukin-6 [Swerdlow et al., 2012] and lipoprotein(a) [Kamstrup et al., 2009] in coronary heart disease. A limitation of Mendelian randomization is that genetic variants often explain only a small fraction of the variation in the risk factor of interest [Davey Smith and Ebrahim, 2004], so that assessing some causal associations requires sample sizes running into tens of thousands to obtain adequate power [Schatzkin et al., 2009]. This problem can be partially redressed by the use of multiple genetic variants [Palmer et al., 2011]. If each variant explains additional variation in the risk factor, then a combined causal estimate using all of the variants will have greater precision than the estimate from any of the individual variants [Pierce et al., 2011].

One potential source of data on multiple genetic variants is genome-wide association (GWA) studies, which examine the associations of many genetic variants with a trait. Many large GWA study consortia have been assembled, with sample sizes in some cases running into hundreds of thousands [Ehret et al., 2011]. Individual-level data on study participants are not always available, because sharing data on such a large scale raises issues of practicality and confidentiality. Presentations of results from GWA studies often report the summary associations of all variants that have reached a certain *P*-value threshold, and recently the release of association estimates in published GWA studies for all measured variants has been advocated [Editorial, 2012]. We investigate methods for using these summarized genetic associations with a risk factor and an outcome to estimate the causal effect of the risk factor on the outcome.

For the causal effect to be consistently estimated, each variant used in a Mendelian randomization analysis must satisfy three assumptions [Didelez and Sheehan, 2007]:

- it is associated with the risk factor,
- it is not associated with any confounder of the risk factor–outcome association,
- it is conditionally independent of the outcome given the risk factor and confounders.

A variant satisfying these assumptions is known as an instrumental variable (IV) [Greenland, 2000]. With a single genetic variant used as an IV and a continuous outcome, assuming all associations are linear, the causal effect of the risk factor on the outcome can be estimated as the ratio of the change in the outcome per additional variant allele to the change in the risk factor per additional variant allele [Thomas and Conti, 2004]. With individual-level data, each of these changes can be estimated using linear regression. For a binary outcome, such as disease, a log-linear or other appropriate regression model can be used in the regression of the outcome on the variant [Didelez et al., 2010]. If summarized (aggregated) data are available in the form of these regression coefficients, the ratio estimate of the causal effect can be calculated without recourse to individual-level data [Harbord et al., 2013]. However, with multiple variants, it is not clear how to combine these genetic association estimates into a single estimate of the causal effect.
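
As an illustration of the single-variant ratio estimate described above, the following minimal Python sketch computes it from two summarized regression coefficients. The function name `ratio_estimate`, its parameter names, and the numerical inputs are hypothetical and chosen only for illustration. The standard error shown is a simple first-order approximation that ignores uncertainty in the variant–risk factor association; it is an assumption of this sketch rather than a formula taken from the text.

```python
import math


def ratio_estimate(beta_outcome, beta_risk_factor, se_outcome=None):
    """Ratio estimate of the causal effect from a single genetic variant.

    beta_outcome     -- change in the outcome per additional variant allele
    beta_risk_factor -- change in the risk factor per additional variant allele
    se_outcome       -- optional standard error of beta_outcome
    """
    if beta_risk_factor == 0:
        # The first IV assumption requires an association with the risk factor.
        raise ValueError("variant must be associated with the risk factor")

    estimate = beta_outcome / beta_risk_factor

    # Assumption (not from the text): first-order approximation that treats
    # beta_risk_factor as known without error.
    se = None if se_outcome is None else se_outcome / abs(beta_risk_factor)
    return estimate, se


# Hypothetical summarized associations for one variant.
estimate, se = ratio_estimate(beta_outcome=0.05, beta_risk_factor=0.25,
                              se_outcome=0.01)
print(f"causal estimate = {estimate:.3f}, approximate SE = {se:.3f}")
```

Only the two per-allele coefficients are needed, which is why the calculation can be carried out from published summary results; the sketch does not address how estimates from several variants should be combined.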