Semiparametric methods for estimation of a nonlinear exposure‐outcome relationship using instrumental variables with application to Mendelian randomization

ABSTRACT Mendelian randomization, the use of genetic variants as instrumental variables (IV), can test for and estimate the causal effect of an exposure on an outcome. Most IV methods assume that the function relating the exposure to the expected value of the outcome (the exposure‐outcome relationship) is linear. However, in practice, this assumption may not hold. Indeed, often the primary question of interest is to assess the shape of this relationship. We present two novel IV methods for investigating the shape of the exposure‐outcome relationship: a fractional polynomial method and a piecewise linear method. We divide the population into strata using the exposure distribution, and estimate a causal effect, referred to as a localized average causal effect (LACE), in each stratum of population. The fractional polynomial method performs metaregression on these LACE estimates. The piecewise linear method estimates a continuous piecewise linear function, the gradient of which is the LACE estimate in each stratum. Both methods were demonstrated in a simulation study to estimate the true exposure‐outcome relationship well, particularly when the relationship was a fractional polynomial (for the fractional polynomial method) or was piecewise linear (for the piecewise linear method). The methods were used to investigate the shape of relationship of body mass index with systolic blood pressure and diastolic blood pressure.


INTRODUCTION
Often the shape of association between an exposure and an outcome is nonlinear. For example, the observed association between body mass index (BMI) and all-cause mortality in a Western context is J-shaped (or U-shaped), as risk of mortality is increased for individuals at both ends of the BMI distribution (Flegal, Kit, Orpana, & Graubard, 2013). However, particularly for underweight individuals, this could reflect either reverse causality or confounding, rather than a true causal effect of low BMI increasing mortality risk. Instrumental vari-This is an open access article under the terms of the Creative Commons Attribution License, which permits use, distribution and reproduction in any medium, provided the original work is properly cited.
© 2017 The Authors Genetic Epidemiology Published by Wiley Periodicals, Inc. able (IV) methods can be used to distinguish between correlation and causation. However, these methods typically assume that the exposure-outcome relationship is linear when estimating a causal effect (Hernán & Robins, 2006). In many cases, investigating the shape of the exposure-outcome relationship is the primary aim of a study. This can be used to define treatment thresholds for pharmaceutical interventions or health guidelines.
A natural way of tackling the nonlinearity problem in IV analysis is to perform a two-stage analysis similar to the well-known two-stage least squares method, except fitting a nonlinear function in the second stage (Horowitz, 2011;Newey & Powell, 2003). However, this approach requires the instrument and any covariates included in the first-stage model to explain a large proportion of variance in the exposure, as information for assessing the shape of relationship between the exposure and outcome will only be available for the fitted values of the exposure from the first-stage regression. If the proportion of variance in the exposure explained by the IV is small, then observing nonlinearity for this limited range of values is unlikely. In Mendelian randomization, the use of genetic variants as IV, genetic variants typically only explain a small percentage of the variance in the exposure (usually in the region of 1-4%; Ebrahim & Smith, 2008).
Two approaches for addressing nonlinearity in the context of Mendelian randomization have recently been proposed (Burgess, Davies, & Thompson, 2014;Silverwood et al., 2014). Burgess et al. (2014) assessed the consequences of performing a linear IV analysis when the exposure-outcome relationship truly was nonlinear, as well as stratifying individuals using the exposure distribution to obtain IV estimates, referred to as localized average causal effects (LACE), in each stratum. Silverwood et al. (2014) performed metaregression of LACE estimates across strata to examine whether a quadratic rather than a linear model was a better fit for relationships between alcohol consumption and a variety of cardiovascular markers.
In this paper, we present two novel semiparametric methods for investigating the shape of the exposure-outcome relationship using IV analysis developed for use in Mendelian randomization. The first is based on fractional polynomials (Royston & Altman, 1994;Royston, Ambler, & Sauerbrei, 1999), whereas the second fits a piecewise linear function. We also propose a test for nonlinearity based on the fractional polynomial method, and assess the impact of varying the number of strata of the exposure distribution used to test for nonlinearity and to estimate nonlinear relationships. We illustrate the methods using data from UK Biobank (Sudlow et al., 2015), a large UK-based cohort, to investigate the shape of the relationship between BMI and blood pressure using Mendelian randomization.

Stratifying on the IV-free exposure
We define the exposure-outcome relationship as the function relating the exposure to the expected value of the outcome. We initially assume that this function is homogeneous for all individuals in the population, and return to its interpretation in case of heterogeneity in the discussion.
To assess the shape of association between exposure and outcome using a single instrument , we first stratify the population using the exposure distribution. If we were to strat-ify on the exposure directly, then an association between the IV and outcome might be induced even if it were not present in the original data, thus invalidating the IV assumptions (Didelez & Sheehan, 2007). This problem can be avoided by instead stratifying on the residual variation in the exposure after conditioning on the IV, assuming that the effect of the IV on the exposure is linear and constant for all individuals across the entire of the exposure distribution (Burgess et al., 2014). In econometrics, this residual is known as a control function (Arellano, 2003). We calculate this residual by performing linear regression of the exposure on the IV, and then setting the value of the IV to 0. We refer to this as the IV-free exposure. It is the expected value of the exposure that would be observed if the individual had an IV value of 0, and can be interpreted as the nongenetic component of the exposure.
In each stratum of the IV-free exposure, we estimate the LACE as a ratio of coefficients: the IV association with the outcome divided by the IV association with the exposure.
The assumption that the effect of the IV on the exposure is constant is a stronger version of the monotonicity assumption (Angrist, Imbens, & Rubin, 1996), and hence the LACE are local average treatment effects (also called complier-averaged causal effects; Yau & Little, 2001) for each stratum (Imbens & Angrist, 1994). We then proceed to estimate the exposureoutcome relationship from these LACE estimates using two approaches: the first based on fractional polynomials, and the second a piecewise linear function.

Fractional polynomial method
The fractional polynomial method consists of metaregression of the LACE estimates against the mean of the exposure in each stratum in a flexible semiparametric framework (Bagnardi, Zambon, Quatto, & Corrao, 2004;Thompson & Sharp, 1999). Fractional polynomials are a family of functions that can be used to fit complex relationships for a single covariate (Royston & Altman, 1994). The standard powers used when modeling using fractional polynomials are = {−2, −1, −0.5, 0, 0.5, 1, 2, 3}, where the power of 0 refers to the (natural) log function. These powers are used throughout this paper. Fractional polynomials of degree 1 are defined as where ∈ . Similarly, fractional polynomials of degree 2 are defined as where 1 , 2 ∈ . In both cases, 0 is interpreted as log( ).
As fractional polynomials of degree larger than 2 are rarely required in practice, these were not considered in this paper (Royston & Altman, 1994). Because a causal effect is an estimate of the derivative of the exposure-outcome relationship (Small, 2014), we fit the LACE estimates using the derivative of the fractional polynomial function (from either (1) or (2)). The method proceeds as follows. First, we calculate the IVfree exposure, and stratify the population based on quantiles of its distribution. Second, the LACE estimate is calculated in each stratum as a ratio of coefficients (the LACE estimate for stratum iŝ| , | , wherê| , is the estimated association of the IV with the outcome in stratum and̂| is the estimated association of the IV with the exposure in the whole population), and the standard error of the LACE estimate is computed as (̂| , ) | (the first term of the delta method approximation; Thomas, Lawlor, & Thompson, 2007). Third, these LACE estimates are metaregressed against the mean of the exposure in each stratum using the derivative of the fractional polynomial function as the model relating the LACE estimates to the exposure values. The original fractional polynomial function then represents the exposure-outcome relationship. As this function is constructed from the LACE estimates, the intercept of the exposure-outcome curve cannot be estimated and must be set arbitrarily. If it is set to 0 at a reference value (for instance, the mean of the exposure distribution), then the value of the function represents the expected difference in the outcome compared with this reference value when the exposure is set to different values.
Confidence intervals (CIs) for the exposure-outcome curve can be computed arithmetically under a normal assumption either using the estimated standard errors from the metaregression or by bootstrapping the second and third steps from above (we maintain the strata and estimate of the IV on the exposure as in the original data, and estimate the associations of the IV with the outcome in bootstrapped samples for each stratum).
To explore a range of possible parametric forms, we fit all possible fractional polynomial models of degrees 1 and 2, and select the best-fitting one based on the likelihood. A fractional polynomial of degree 2 is preferred over one of degree 1 if the twice the difference in the log-likelihood is greater than the 95th percentile point of a 2 2 distribution for the best-fitting fractional polynomial in each class (Royston & Altman, 1994).

Piecewise linear method
Another way of estimating the exposure-outcome relationship is to use a piecewise linear approach. The exposure-outcome relationship is estimated as piecewise linear function with each stratum contributing a line segment whose gradient is the LACE estimate for that stratum. The function is constrained to be continuous, so that each line segment begins where the previous segment finished. As in the fractional polynomial method, although the intercept for each line segment is fixed by the previous line segment, the overall intercept of the exposure-outcome curve cannot be estimated and must be set arbitrarily.
CIs are estimated by bootstrapping the IV associations with the outcome as in the fractional polynomial approach. For a 95% CI, the piecewise linear method is performed for each bootstrapped dataset, and then the 2.5th and 97.5th percentiles of the function are taken at selected points across the exposure distribution; we chose the mean exposure values in each of the strata.

Tests of nonlinearity
There are already two proposed methods in this framework for testing whether a nonlinear exposure-outcome model fits the data better than a linear model. The first is a heterogeneity test using Cochran's Q statistic to assess whether the LACE estimates differ more than would be expected by chance. The second is a trend test where the LACE estimates are metaregressed against the mean value of the exposure in each strata; this is equivalent to fitting a quadratic exposureoutcome model.
A more flexible version of this method is to test the bestfitting fractional polynomial of degree 1 against the linear model. This can be achieved by comparing twice the difference in the log-likelihood between the linear model and the best-fitting fractional polynomial of degree 1 with a 2 1 distribution.

SIMULATION STUDY
To assess the performance of these methods in realistic scenarios for Mendelian randomization, we performed a simulation study. We simulated data for 10,000 individuals for an IV , a continuous exposure that takes only positive values, a continuous outcome , and a confounder (assumed to be unmeasured). The data-generating model for individual is (1), ∼ Unif(0,1), ∼ (0, 1), and ℎ( ) is the function relating the exposure to the outcome (the exposure-outcome relationship). Exposure values were taken to be positive and away from zero so that the outcome takes sensible values for log and negative power functions. The IV explains 2.6% of the variance in the exposure.

Choice of exposure-outcome model
For the fractional polynomial method, all possible fractional polynomials of degrees 1 and 2 were considered as the functional form of the exposure-outcome relationship.
Combinations of effect sizes for the parameters were chosen ranging from 0 to 2. For fractional polynomials of degree 2, we also considered effects in opposing directions for 1 and 2 ; these simulations yielded similar results to those discussed here (results not shown). Fixed-effects metaregression was used in the simulations, however, random-effects metaregression yielded similar results (results not shown).
For the piecewise linear method and comparisons between methods, linear, quadratic, square-root, and logarithmic functions were considered as the functional form of the exposure-outcome relationship, as well as a threshold model:

Evaluating the performance of the methods
To evaluate the fractional polynomial method, we first fitted the correct fractional polynomial model (i.e., with the correct degree and powers) and assessed the bias and coverage of the effect parameter estimates. Subsequently, we fitted all fractional polynomials of the same degree and selected the best-fitting polynomial based on the likelihood. We assessed the proportion of simulations where the best-fitting fractional polynomial was the correct fractional polynomial. If the correct fractional polynomial was not the best-fitting fractional polynomial, we tested whether it was in the group of fractional polynomials that fit the data almost as well as the best-fitting polynomial; defined as those fractional polynomials where twice the difference in the log-likelihood (compared with the best-fitting polynomial) was less than the 90th percentile point of a 2 distribution, where = 1 for comparing fractional polynomials of degree 1 and = 2 for comparing polynomials of degree 2.
To evaluate the piecewise linear method, we first compared the outcome estimates at the mean exposure value in each quantile to the values of the true model at the same points. The coverages of the bootstrapped 95% CIs were also evaluated at these points.
For comparing the fit of the fractional polynomial and piecewise linear models, we used the following heuristic function: where summation is across the quantile groups, and̄is the expected value of the outcome evaluated at the mean value of the exposure in each quantile group.

Varying the number of strata
In the initial simulations, the population was split based on the IV-free exposure into decile groups. Further simulations were performed varying the number of strata using 5, 10, 50, and 100 quantile groups. Tests of nonlinearity were performed to assess the impact of the number of strata on the empirical power of each test. The empirical power of each test was reported as the proportion of simulation replicates withvalue less than 0.05. The heuristic function (3) was calculated based on 10 deciles for each number of strata. For each simulation and set of parameters, 500 replications were performed. Bootstrap 95% CIs were generated using 500 bootstrap samples. All analyses were performed using R version 3.0.2.

Additional simulations to assess impact of violations of assumptions
We performed additional simulations in which the underlying assumptions that the effect of the IV on the exposure and the effect of the exposure on the outcome are fixed and independent were relaxed. In these simulations, we assessed both modeling assumptions by allowing the effect of the IV on the exposure to vary (by drawing the effect parameter from a normal distribution (0.25, 0.1 2 ) for each individual in the population), and allowing the exposure-outcome relationship to vary (by drawing the causal parameter from a normal distribution ( , 0.2 2 ) for each individual in the population). We assessed the impact of allowing each of these parameters to vary separately and both to vary together. In addition, we also allowed variation in both parameters to be correlated by drawing the parameters from a bivariate normal distribution with correlation 0.2. For fractional polynomials of degree 2, only the causal parameter for the second polynomial was allowed to vary across individuals.
We also performed further simulations using a lowfrequency genetic variant having a large effect on the exposure (minor allele frequency = 0.03, linear effect on the exposure = 0.75), and using the original genetic variant but having a superadditive (first allele increases exposure by 0.1 units, second by 0.3 units) and a subadditive (first allele increases exposure by 0.3 units, second by 0.1 units) effect on the exposure in the data-generating model.

Fractional polynomial method
Comparisons of fractional polynomials for all powers are provided in Table S1 (degree 1) and Table S2 (degree 2); a summary of results for the most commonly encountered powers is given in Table 1.
For fractional polynomials of degree 1, when fitting the correct fractional polynomial model, the causal estimate was Notes. Results for all the fractional polynomials of degree 1 (all effect sizes) and degree 2 ( 1 =1 and 2 =2) are presented in Tables S1 and S2; this table is a summary of results for the most commonly encountered powers. are the powers and are the effect parameters. Coverage refers to the number of replications where the true value of was contained within the corresponding 95% CI. The power(s) was correctly chosen (Correct) if the best-fitting fractional polynomial was also the correct fractional polynomial, whil the correct model was within the set of powers that fit the data equally, as well as the best-fitting fractional polynomial (Set) if the difference between twice the log-likelihood for the correct model and the best-fitting model was less than the 90th percentile of the relevant 2 distribution. SD, standard deviation; SE, standard error; FP, fractional polynomial; CI, confidence interval.
generally unbiased (Table 1). Coverage estimates were close to the nominal 95% rate, except for fractional polynomials of power 2 (and power 3; Table S1), where causal estimates were slightly biased, and this small bias led to undercoverage. However, under the null, causal estimates were unbiased and correct coverage rates were maintained. For fractional polynomials of degree 2, a similar pattern was observed except that small biases and resulting undercoverage was more common, although correct coverage rate under the null was always maintained.  When fitting all the fractional polynomial models, the correct fractional polynomial model was fitted more often for a fractional polynomial of degree 1, and when the power of the fractional polynomial differed substantially from 0. Although the power to detect the correct functional form was low for the logarithmic and square-root functions, this is to some extent an artifact of the choice of powers; if a basis with only one concave function (either a logarithmic or a square-root function) were used, then the correct model would be chosen more often. As the causal parameter in the model increased, the correct model was chosen more frequently. However, in all cases, the correct fractional polynomial was in the set of bestfitting fractional polynomials in at least 89% of simulations. The fractional polynomial test for nonlinearity rejects the null exactly when the linear model is not in the set of best-fitting fractional polynomials. With a null causal effect ( = 0), the probability of fitting the "correct" fractional polynomial was not estimated as all fractional polynomials with zero coefficients would describe the data equally well. In reality, the true exposure-outcome relationship is unlikely to have an exact functional form, so the ability to estimate the shape of the relationship is more important that the precise identification of the function.

Piecewise linear method
The piecewise linear method performed well when the true model was piecewise linear (such as a linear or a threshold relationship), with the predicted mean values of the outcome similar to their true values at the mean value of the exposure within each decile of the IV-free exposure ( Table 2). The bootstrapped CIs also had approximately 95% coverage at these points, except for the quantiles at or either side of the point of inflection of the threshold model. However, when the true model was not piecewise linear (in particular, for a quadratic relationship), estimates were biased and coverage was below nominal levels.
Using the heuristic function (3) to compare between the estimates for the best-fitting fractional polynomial and the piecewise linear model, the models performed similarly under a linear model. For a quadratic model, the fractional polynomial method outperformed the piecewise linear method, whereas the opposite was true for a threshold model. This is unsurprising, as the fractional polynomial method performed best when the true model was a polynomial, and likewise for the piecewise linear method when the model was piecewise linear.

Varying the number of strata
The best-fitting fractional polynomial method had a similar or slightly better model fit (judged by the heuristic function) when a greater number of strata were used (Table 3). However, the piecewise linear method fitted the data better when fewer strata were used. Although the fractional polynomial method ensures that the estimate of the exposureoutcome relationship is a smooth function regardless of the number of strata, the estimate from the piecewise linear method becomes increasingly jagged as the number of strata increases.
The coverage under the null (i.e., a linear model) was not overly inflated for any of the tests. In general, the T A B L E 3 Varying the number of strata and tests of nonlinearity fractional polynomial and quadratic tests were more powerful than the Cochran Q test across the simulations. The power of the Cochran Q test also decreased as the number of strata increased, whereas the power of the other tests either remained the same or increased. The quadratic test slightly outperformed the fractional polynomial test when the true model was a quadratic or a threshold model; the fractional polynomial test was slightly superior when the true model was a logarithm or a square-root model.

Additional simulations to assess impact of violations of assumptions
In the simulations where we relaxed the assumptions that the IV-exposure and the exposure-outcome effects are the same for all individuals, we found that the fractional polynomial models of degree 1 and the piecewise linear method both performed well in terms of bias and coverage (Tables  S3 and S4). The only concern was that tests of nonlinearity had slightly inflated Type I error rates when the IV-exposure and exposure-outcome effects were varied in a correlated way; Type I error rate inflation was not observed when the effects were varied either separately or independently.
In the simulations with a low-frequency genetic variant having a large effect on the exposure (Tables S5 and  S6), there was some bias in estimates. This is likely to be the result of weak instrument bias (Burgess & Thompson, 2011): with a low-frequency variant, the variation in instrument strength between the strata is much larger, and so the chances of weak instrument bias affecting the results in specific strata are increased. However, nominal Type I error rates for tests of nonlinearity were maintained. With a superadditive or subadditive model for the genetic association with the exposure (Tables S5 and S6), estimates of the causal parameter in the fractional polynomial method were unbiased, but the power to detect nonlinearity was somewhat reduced. Estimates in the piecewise linear method for a threshold exposure-outcome relationship were somewhat biased. This finding is consistent with previous work on measurement error in the independent variable: here, the IV-free exposure is estimated with error resulting from misspecification of the genetic association with the exposure (see Section 6).

APPLICATION OF METHODS TO THE RELATIONSHIP BET WEEN BMI AND BLOOD PRESSURE IN UK BIOBANK
We illustrate the methods proposed in this paper in an applied example considering the shape of relationship between BMI and blood pressure in the UK Biobank study. The observational relationship between BMI and blood pressure has been investigated previously in a variety of contexts: populations of lean individuals (Kaufman et al., 1997), Danish adolescents (Nielsen & Andersen, 2003), and Iranian adolescents (Hosseini et al., 2010). The relationship has been demonstrated to be monotonically increasing, with inconclusive evidence for or against nonlinearity due to limited sample sizes.
UK Biobank is a prospective cohort study of 502,682 participants recruited at 22 assessment centers across the United Kingdom between 2006 and 2010 (Sudlow et al., 2015). Participants were aged between 40 and 69 at baseline. Extensive health, lifestyle, biological, and genetic measurements were taken on all participants. At the time of writing this paper, genetic information was only available for 133,687 individuals of European ancestry. For individuals on antihypertensive medication, 15/10 mmHg were added to their SBP/DBP (where SBP is systolic blood pressure and DBP is diastolic blood pressure) measurement, respectively. A sensitivity analysis was performed in individuals who had no history of hypertension.
To create an allele score (also called a genetic risk score) of variants related to BMI to be used as an IV, we extracted the 97 variants previously associated with BMI at a genome-wide level of significance by the GIANT consortium (Locke et al., 2015). A proxy variant (rs751414; 2 = 0.99) was used instead of rs2033529, as this variant was not available in UK Biobank; the linkage disequilibrium information was calculated using the European samples from 1000 Genomes (Abecasis et al., 2012). All of the variants were either directly genotyped or well-imputed (INFO > 0.9). The allele score for each individual was computed by multiplying the number of BMI-increasing alleles for each variant by the effect of the variant on BMI (as estimated in the GIANT consortium) and summing across the 97 variants . Overall, this score explained 1.7% of the variance in BMI. We performed both fractional polynomial and piecewise linear methods for estimating the relationships of BMI with SBP and DBP. The fractional polynomial method was implemented using 100 strata, whereas the piecewise linear method was implemented using 10 strata to avoid the exposure-outcome curve being overly jagged. The reference point was set at 25 kg/m 2 .
To account for the multiple centers, we standardized the measure of BMI by stratifying individuals based on their residual value of BMI (the IV-free exposure) after regression of BMI on the allele score, age, sex, and center (as a categorical variable). Adjustment for age, sex, and center was also made in the regressions to obtain the LACE estimates in each quantile group. If additional population stratification were expected, we could additionally adjust for genetic principal components to minimize the effect of population stratification in biasing IV estimates.
To assess the assumption that the effect of the IV on BMI is constant over the entire distribution of BMI, we also considered BMI as the outcome and calculated the associations of the IV with BMI in each of the strata. We then conducted tests (trend and Cochran Q tests) to investigate heterogeneity in the IV associations with BMI in different strata.

Results of applied example
The exposure-outcome relationships for BMI with SBP and DBP estimated using the fractional polynomial and piecewise linear methods are presented in Figure 1. There were strong causal effects of BMI on both SBP and DBP ( -value < 1 × 10 −5 for the causal estimates differing from zero in the fractional polynomial methods). For comparison, the standard two-stage least squares linear estimate was 0.527 mmHg per 1 kg/m 2 increase in BMI (95% CI: 0.363, 0.691) for SBP and 0.433 mmHg (95% CI: 0.338, 0.528) for DBP.
There was strong evidence that the association between BMI and SBP was nonlinear, with the quadratic test yielding a -value of 0.0026 (fractional polynomial test -value = 0.0164, Cochran Q test -value = 0.0346). The best-fitting fractional polynomial of degree 1 for the relationship between BMI and SBP had power −0.5, and there was no evidence to suggest that a fractional polynomial of degree 2 fitted the data better ( -value = 0.135). The estimate of the exposureoutcome relationship from the piecewise linear method visually suggested a threshold-type relationship, with a steep slope up to a BMI value of about 32 kg/m 2 , and a slightly negative slope from 32 kg/m 2 onwards. The relationship between BMI and SBP was similar in individuals with no history of hypertension (Fig. S1).
The association between BMI and DBP was also nonlinear (quadratic test -value = 0.0005, fractional polynomial test -value = 0.0114, Cochran Q test -value = 0.0049), and there was strong evidence that the best-fitting fractional polynomial of degree 2 (with 1 and 2 = 3) fitted the data better than the best-fitting fractional polynomial of degree 1 ( -value = 0.0062). There was no evidence of a different relationship between BMI and DBP for underweight individuals, with the exposure-outcome curve increasing almost linearly up to a BMI of around 40 kg/m 2 . But for hyperobese individuals (BMI > 40 kg/m 2 ), DBP seemed to decrease sharply. This was particularly evident in the fractional polynomial method, which used a greater number of strata and hence had more resolution to consider the shape of the exposureoutcome relationship at the extremes of the BMI distribution. One potential reason for this finding is that hyperobese individuals with high DBP are less likely to be enrolled in UK Biobank, perhaps due to differential survival probability. Another reason could be the difficulties in estimating blood pressure in hyperobese individuals (Leblanc et al., 2013). However, there was no evidence that the relationship between BMI and DBP was nonlinear in individuals with no history of hypertension ( -value >0.05 for all tests; Fig. S1).
There was no evidence that the associations of the IV with BMI varied between the different strata (trend test -value = 0.135, Cochran Q test -value = 0.901).

DISCUSSION
In this paper, we have proposed and tested two novel methods for examining the relationship between an exposure and an outcome using IV analysis in the context of Mendelian randomization. Both methods rely on stratifying the population based on the IV-free exposure; the exposure minus the effect of the IV. A causal effect, referred to as a LACE, is estimated in each stratum of population. The first method performs metaregression on these LACE estimates using fractional polynomials. The second method estimates a continuous piecewise linear function, the gradient of which in each stratum is the LACE estimate for that stratum. Both methods were demonstrated in a simulation study to estimate the true exposure-outcome relationship well when its functional form corresponded to the form of the estimate from each method (i.e., when the exposure-outcome relationship was a fractional polynomial for the fractional polynomial method, and when the relationship was piecewise linear for the piecewise linear method), with causal estimates being close to unbiased and coverage rates generally maintaining nominal levels (in particular, coverage rates were always correct under the null). Additionally, tests of nonlinearity were provided and their performance was assessed. The quadratic and fractional polynomial tests had the best performance in terms of Type I error rate and power.

Comparison of methods
The recommendation as to which method to use depends on the aim of the investigation. The fractional polynomial method will always provide a smooth estimate of the exposure-outcome relationship, and as such has more consistent performance when a large number of strata are chosen (i.e., when the shape of the relationship is considered over a wider and more detailed range of the exposure distribution). Fractional polynomials of degree 1 had better performance than those of degree 2 in terms of bias and coverage of effect estimates. However, fractional polynomials of degree 1 are less flexible and would not be able to model complex exposure-outcome relationships. Additionally, they tend to smooth over discrepancies in the data. For example, if the LACE estimate for individuals in the lowest quantile group for BMI was substantially different to the other LACE estimates, then both this difference and any uncertainty in the LACE estimate would be smoothed over somewhat in the fractional polynomial estimate. Preference between the methods therefore comes down to a question of prior belief: if one truly believes the true exposure-outcome relationship to be smooth, and that estimates in the surrounding quantiles should be used to model the LACE in the target quantile, then the fractional polynomial method should be preferred. However, if one does not want to smooth over estimates, then the piecewise linear method should be preferred; however, the estimate of the exposure-outcome relationship will be more jagged and variable.
The number of strata and divisions between strata should ideally be specified before the analysis according to practical considerations (e.g., previously determined categories of BMI) and the sample size available. If the number of strata is too large, then each stratum will have a small sample size. The LACE estimate in a small stratum will be imprecise and may be susceptible to weak instrument bias (Burgess & Thompson, 2011), particularly if the genetic variant is rare.

Interpretation of the exposure-outcome relationship
If the function relating the exposure to the average value of the outcome is homogeneous across the population, then the methods provided in this paper estimate this function (the exposure-outcome relationship) even if there is unmeasured confounding. If the function is heterogeneous, then the situation is more complicated (Small, 2014). For example, taking BMI as the exposure, if the subject-specific effect curve (as defined by Small, 2014) is linear for all individuals in the population, but the magnitude of effect is greater for overweight individuals, then the exposure-outcome relationship will be quadratic (or at least convex and positive) rather than linear. The exposure-outcome curve at low values of the exposure is only estimated using underweight individuals, and at high values of the exposure only using overweight individuals. However, this is perhaps the most relevant way to express the exposure-outcome relationship, as the causal effect of reducing one's BMI from 20 to 18 kg/m 2 is not so relevant for someone with a BMI of 40 kg/m 2 . Hence, we do not claim any global interpretation of the exposure-outcome relationship as estimated in this paper apart from in the unlikely case that the functional relationship is homogeneous in the population. It is better interpreted as a series of local estimates, which are graphically connected in order to compare and contrast trends in these local estimates at different values of the exposure, and to compare the relative benefit of intervening on the exposure for individuals with different values of the exposure, but which does not necessarily reflect the effect of intervening on the exposure to take any value in its distribution for any single individual.

Measurement error in the exposure
As has been noted in other contexts, estimates of nonlinear relationships are sensitive to measurement error in the exposure (Keogh, Strawbridge, & White, 2012). The standard "triple whammy" of measurement error is likely to apply here: measurement error biases parameter estimates, reduces power, and obscures important features in the shape of relationships (Carroll, Ruppert, Stefanski, & Crainiceanu, 2006). For example, with a threshold relationship, measurement error in the exposure would mean that the point of inflexion in the exposure-outcome relationship would be less sharply evident. In the case of BMI, measurement error is not such an issue, as height and weight can be measured precisely, and neither variable experiences substantial diurnal or seasonal variation. However, for other exposures, measurement error may affect results more severely. As noted in the additional simulation analyses, bias due to measurement error can also occur if the model for the genetic association with the exposure is misspecified.

Requirement of concomitant and individual-level data
Many recent advances in Mendelian randomization have enabled investigations to be performed using summarized data on the genetic associations with the exposure and with the outcome only, and/or in a two-sample setting in which genetic associations with the exposure and with the outcome are estimated on separate groups of individuals (Burgess, Butterworth, & Thompson, 2013;. However, estimation of the exposure-outcome relationship requires both individual-level data and a one-sample setting (otherwise neither stratification of the population nor the estimation of genetic associations with the outcome in the strata is possible). Although large cohorts with concomitant data on genetic variants, exposures, and outcomes are becoming more widely available, particularly in the form of biobanks such as UK Biobank.
In conclusion, these two novel methods are useful in investigating nonlinear exposure-outcome relationships. The methods allow for easy graphical assessment of the shape of the relationship, and allied with tests of nonlinearity, provide an effective tool for assessing nonlinear exposure-outcome relationships using IV analysis for Mendelian randomization.