Simulation study of instrumental variable approaches with an application to a study of the antidiabetic effect of bezafibrate

Authors


ABSTRACT

Purpose

We studied the application of the generalized structural mean model (GSMM) of instrumental variable (IV) methods in estimating treatment odds ratios (ORs) for binary outcomes in pharmacoepidemiologic studies and evaluated the bias of GSMM compared to other IV methods.

Methods

Because of the bias of standard IV methods, including two-stage predictor substitution (2SPS) and two-stage residual inclusion (2SRI) with binary outcomes, we implemented another IV approach based on the GSMM of Vansteelandt and Goetghebeur. We performed simulations under the principal stratification setting and evaluated whether GSMM provides approximately unbiased estimates of the causal OR and compared its bias and mean squared error to that of 2SPS and 2SRI. We then applied different IV methods to a study comparing bezafibrate versus other fibrates on the risk of diabetes.

Results

Our simulations showed that unlike the standard logistic, 2SPS, and 2SRI procedures, our implementation of GSMM provides an approximately unbiased estimate of the causal OR even under unmeasured confounding. However, for the effect of bezafibrate versus other fibrates on the risk of diabetes, the GSMM and two-stage approaches yielded similarly attenuated and statistically non-significant OR estimates. The attenuation of the OR by the two-stage and GSMM IV approaches suggests unmeasured confounding, although violations of the IV assumptions or differences in the parameters estimated could be playing a role.

Conclusion

The GSMM IV approach provides approximately unbiased adjustment for unmeasured confounding on binary outcomes when a valid IV is available. Copyright © 2012 John Wiley & Sons, Ltd.

INTRODUCTION

Instrumental variable (IV) methods are increasingly used to adjust for unmeasured confounding in pharmacoepidemiology.[1-8] An IV is a variable that is associated with the treatment but is independent of unmeasured confounders and has no direct effect on the outcome.[9] In addition, an IV is often assumed to satisfy a monotonicity condition, i.e., that subjects' treatment level increases monotonically with the level of the IV.[9] IVs satisfying the monotonicity condition enable identification of the local average treatment effect (LATE), i.e., the treatment effect among compliers. Under linear models, two two-stage IV approaches, i.e., two-stage predictor substitution (2SPS) and two-stage residual inclusion (2SRI), have been shown to yield unbiased estimators of the LATE.[9] In 2SPS, a first-stage model is fit to predict the treatment based on the instrument and covariates, and a second-stage model is fit for the mean of the outcome as a function of the predicted value of the treatment from the first stage and the covariates; the estimated LATE is the coefficient on the predicted treatment in the second-stage model.[10] In 2SRI, the first stage is the same as for 2SPS, but the second-stage model is the mean of the outcome as a function of the residual from the first-stage regression, actual value of the treatment, and the covariates; the estimated LATE is the coefficient on the actual treatment.[3, 11] For continuous outcomes, the 2SPS and 2SRI IV estimators are equivalent. 
For binary outcomes, the 2SPS and 2SRI IV odds ratio (OR) estimators are not equivalent, and both have been shown to be biased for the LATE when there is strong confounding.[12] An alternative to the biased two-stage IV estimators of the LATE OR is the generalized structural mean model (GSMM), which Vansteelandt and Goetghebeur have shown yields a consistent estimator under certain conditions.[1, 2] Here, we implemented the GSMM approach so that patients at both levels of a dichotomous IV can have access to the treatment, as is the case in most observational studies, and we evaluated the bias of the method in a simulation study.
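The equivalence of 2SPS and 2SRI in the linear case can be checked numerically. The following is a minimal sketch on simulated data, not the authors' code: the helper `ols`, the data-generating values, and the seed are all illustrative assumptions, and both stages are fit by ordinary least squares.

```python
import random

def ols(columns, y):
    """Least-squares coefficients of y on the given columns (intercept first)."""
    n = len(y)
    X = [[1.0] + [col[i] for col in columns] for i in range(n)]
    k = len(X[0])
    # Normal equations (X'X) b = X'y stored as an augmented matrix.
    A = [[sum(X[i][a] * X[i][b] for i in range(n)) for b in range(k)]
         + [sum(X[i][a] * y[i] for i in range(n))] for a in range(k)]
    # Gauss-Jordan elimination with partial pivoting.
    for c in range(k):
        p = max(range(c, k), key=lambda r: abs(A[r][c]))
        A[c], A[p] = A[p], A[c]
        for r in range(k):
            if r != c:
                f = A[r][c] / A[c][c]
                A[r] = [A[r][j] - f * A[c][j] for j in range(k + 1)]
    return [A[r][k] / A[r][r] for r in range(k)]

# Hypothetical data: binary instrument R, unmeasured confounder U,
# binary treatment Z, continuous outcome Y (all values are assumptions).
rng = random.Random(0)
n = 2000
R = [rng.randint(0, 1) for _ in range(n)]
U = [rng.gauss(0, 1) for _ in range(n)]
Z = [1.0 if 0.8 * r + u + rng.gauss(0, 1) > 0.5 else 0.0 for r, u in zip(R, U)]
Y = [1.0 + 2.0 * z + 1.5 * u + rng.gauss(0, 1) for z, u in zip(Z, U)]

# Stage 1: linear regression of treatment on the instrument.
a0, a1 = ols([R], Z)
Z_hat = [a0 + a1 * r for r in R]
resid = [z - zh for z, zh in zip(Z, Z_hat)]

# 2SPS: second stage uses the predicted treatment.
b_2sps = ols([Z_hat], Y)[1]
# 2SRI: second stage uses the actual treatment plus the first-stage residual.
b_2sri = ols([Z, resid], Y)[1]
print(b_2sps, b_2sri)
```

The two coefficients agree up to floating-point error because the first-stage residual is orthogonal to the predicted treatment in a linear fit. That orthogonality argument does not carry over to a logistic second stage, which is why the two estimators diverge for binary outcomes.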

In addition, we compare different estimators in a study of the effect of bezafibrate on the risk of diabetes. Bezafibrate is a lipid-lowering medication that is widely prescribed in the United Kingdom. Recent published results suggest that the protective effect of bezafibrate against the onset of diabetes is superior to that of other fibrates.[13-17] Flory et al. published a retrospective cohort study using the General Practice Research Database (GPRD) to compare the risk of diabetes between users of bezafibrate versus other fibrates and showed lower incidence of diabetes in bezafibrate users.[18] This analysis controlled for measured confounders but not unmeasured confounders. Inspection of baseline data in the publication by Flory et al. showed statistically significant differences in the prescribing patterns for different fibrates, most notably higher rates of baseline diabetes in patients receiving fenofibrate compared to patients receiving bezafibrate. Although the clinical properties and indications for the fibrates are very similar to one another, this observation suggested that providers might still have subtle prescribing preferences that could introduce unmeasured confounding. To control for this potential unmeasured confounding, we present IV-based results, using the previous prescription from the same practice as the IV; this preference-based IV has been used by other pharmacoepidemiologic studies.[6, 19-21]

METHODS

Notation and assumptions

The treatment (bezafibrate) variable is Z : Z = 1 denotes that a patient received the treatment of interest (bezafibrate); and Z = 0 means a patient received an alternative treatment (other fibrate). The IV variable is R : R = 1 corresponds to a patient exhibiting level 1 of the IV that predisposes to bezafibrate treatment (practice's previous fibrate prescription was for bezafibrate) and R = 0 to a patient exhibiting level 0 of the IV that predisposes to a non-bezafibrate fibrate (practice's previous fibrate prescription was for a different fibrate). The vector of observed baseline confounders will be denoted by X.

For the outcome variables, we define Y as the observed outcome: Y = 1 if the patient exhibits the outcome (incident diabetes), and Y = 0 if the patient does not exhibit the outcome (no diabetes). The potential outcome Y(1) is the outcome a patient would exhibit if she/he were to receive bezafibrate; Y(0) is the outcome if the patient were to receive a different fibrate. Under the assumptions below, Y(1) = Y if the patient actually receives bezafibrate; and Y(0) = Y if the patient receives a different fibrate.

The potential treatment received, Z(1) (respectively Z(0)), is the treatment level a patient would receive if predisposed (respectively not predisposed) to take bezafibrate by the IV. Subjects can be classified into four compliance classes: (i) ‘always-takers’ of bezafibrate, Z(1) = Z(0) = 1, who would receive bezafibrate regardless of the practice's previous prescription; (ii) ‘compliers’ with the suggestion of the IV, Z(1) = 1, Z(0) = 0, who would receive the type of fibrate previously prescribed by that practice; (iii) ‘defiers’, Z(1) = 0, Z(0) = 1, who would receive the opposite of what the previous prescription was; and (iv) ‘never-takers’ of bezafibrate, Z(1) = Z(0) = 0.
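The four classes are a lookup on the pair of potential treatments. The sketch below only illustrates the definitions; the function names are hypothetical.

```python
def compliance_class(z1, z0):
    """Map potential treatments (Z(1), Z(0)) to a compliance class."""
    classes = {
        (1, 1): "always-taker",
        (1, 0): "complier",
        (0, 1): "defier",
        (0, 0): "never-taker",
    }
    return classes[(z1, z0)]

def observed_treatment(z1, z0, r):
    """Treatment actually received at IV level r: Z = Z(1) if r == 1, else Z(0)."""
    return z1 if r == 1 else z0

# A complier takes whatever the IV suggests.
print(compliance_class(1, 0), observed_treatment(1, 0, r=0), observed_treatment(1, 0, r=1))
```

Only one of Z(1) and Z(0) is ever observed for a given patient, so class membership cannot be read off the data directly.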

The assumptions of the IV approaches pertain to the relationships among the IV, treatment, and outcome. We make the same five assumptions about an IV as Angrist, Imbens, and Rubin[9]: (i) stable unit treatment value assumption (SUTVA); (ii) random assignment; (iii) exclusion restriction, which means the IV affects the outcome only through the treatment and has no direct effect; (iv) non-zero average causal effect of the IV on treatment; and (v) monotonicity, which means that there are no defiers, as defined above. These assumptions enable identification of the LATE OR, i.e., the odds of the outcome under treatment relative to the odds under control among the compliers.

Models

Details of the estimation procedure for the GSMM and the two-stage IV procedures are in Appendix I and II. We focus here on the GSMM. The GSMM seeks to estimate

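$$\operatorname{logit} P\{Y(1) = 1 \mid Z = 1, R = r, X\} - \operatorname{logit} P\{Y(0) = 1 \mid Z = 1, R = r, X\} = \psi \qquad (1)$$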

where ψ is the causal log OR for the effect of treatment on the risk of the outcome (bezafibrate effect on diabetes risk) among those who received treatment and have IV level r. The causal log OR ψ in Equation (1) is assumed not to be modified by the IV R; consequently, ψ is the treatment on treated (TOT) causal log OR, measuring the effect of treatment for those who received treatment. Under the assumptions described in the Notation and assumptions section and the assumption that the causal log OR is not modified by the IV, ψ is equal to the causal log OR for the effect of treatment for the compliers.[22]

Because the nonlinearity of the logistic function distorts integration of the probability model under (1) with respect to treatment Z, which is necessary for estimation, we need to specify an ‘association’ model for each level of the IV. Such a model is not causal, because no parameter corresponds to a causal treatment effect. The association model is specified separately for each level of the IV as a standard logistic model for the log OR of treatment on outcome, adjusting for observed covariates and the IV:

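$$\operatorname{logit} P(Y = 1 \mid Z = z, R = 1, X = x) = \beta_{10} + \beta_{11} z + \beta_{12}^{\top} x \qquad (2)$$

$$\operatorname{logit} P(Y = 1 \mid Z = z, R = 0, X = x) = \beta_{00} + \beta_{01} z + \beta_{02}^{\top} x \qquad (3)$$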

where the first subscript of the β (log OR) parameters indicates the level of the IV within which the logistic models in (2) and (3) are applied. For a clinical trial in which controls do not have access to treatment, Equation (2) is the only association model: in the control arm, Z = 0 for all patients, so the association model in Equation (3) cannot be applied. In observational studies, however, Z can be either 0 or 1 at either level of the IV, so we specify the second association model in (3) for the second level of the IV. Appendix I of the supplementary materials explains how the structural model and the association model are combined to estimate the causal OR ψ.
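For intuition about how the structural and association models combine, consider the simplest case with no covariates. The association models are then saturated, so their fitted values are the observed outcome proportions within each treatment-by-IV cell. The structural model turns each treated cell's proportion into an implied risk under no treatment, and ψ is chosen so that the implied mean of Y(0) is the same at both IV levels. The sketch below is an assumption-laden illustration of that idea, not the authors' SAS macro: the function names, the bisection solver, and the search interval are hypothetical, and it requires treated-cell proportions strictly between 0 and 1.

```python
import math

def expit(x):
    return 1.0 / (1.0 + math.exp(-x))

def logit(p):
    return math.log(p / (1.0 - p))

def mean_y0(psi, p_treated, p_y_treated, p_y_untreated):
    """E[Y(0) | R = r] implied by the structural model for a trial value of psi."""
    # Among the treated, remove the treatment effect on the logit scale.
    y0_treated = expit(logit(p_y_treated) - psi) if p_treated > 0 else 0.0
    return p_treated * y0_treated + (1.0 - p_treated) * p_y_untreated

def gsmm_log_or(arm1, arm0, lo=-10.0, hi=10.0, iters=200):
    """Solve E[Y(0) | R = 1] = E[Y(0) | R = 0] for psi by bisection.

    Each arm is (P(Z=1 | R=r), P(Y=1 | Z=1, R=r), P(Y=1 | Z=0, R=r)).
    """
    def gap(psi):
        return mean_y0(psi, *arm1) - mean_y0(psi, *arm0)
    if gap(lo) * gap(hi) > 0:
        raise ValueError("no sign change on the search interval")
    for _ in range(iters):
        mid = 0.5 * (lo + hi)
        if gap(lo) * gap(mid) <= 0:
            hi = mid
        else:
            lo = mid
    return 0.5 * (lo + hi)

# Hypothetical population without always-takers: half compliers, half never-takers.
true_psi = math.log(2.0)
p11 = expit(logit(0.3) + true_psi)   # treated compliers at R = 1
arm1 = (0.5, p11, 0.5)               # never-takers have outcome risk 0.5
arm0 = (0.0, 0.5, 0.4)               # nobody treated at R = 0; the 0.5 is an unused placeholder
psi_hat = gsmm_log_or(arm1, arm0)
print(psi_hat, true_psi)
```

With covariates, the cell proportions would be replaced by fitted values from the two logistic association models, and the equation would be averaged over the observed covariate distribution. The failure to find a sign change is one simple way a non-convergent fit can show up in this toy version.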

SAS macros for implementing the GSMM approach are provided in the Appendix.

SIMULATIONS

The simulations for evaluating the GSMM, two-stage, and standard logistic approaches require a fully specified (parametric) model to generate the potential outcomes for Y and Z. We simulated data from the compliance class model[9] to serve as the true model, as it is fully parametric and also accommodates the different assumptions that lead to different causal effects (e.g., the LATE). Appendix III describes the parameters of the compliance class model used to simulate the data. The parameters that we varied in the simulation study were (i) the amount of unmeasured confounding, which is measured by δ, the difference on the logit scale in the probability that the outcome is 1 under no treatment between compliers and never-takers; (ii) how frequent the outcome is; and (iii) the strength of the instrument, which is measured by the proportion of compliers. We then analyzed each simulated dataset with the GSMM, two-stage, and standard logistic approaches. Our goal was to compare the bias, mean squared error, and confidence interval coverage of the GSMM estimator with those of the two-stage estimators.
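A data-generating step of this kind might look like the sketch below, which covers only the setting without always-takers. It is an assumption-based illustration rather than the model of Appendix III: the function name, the parameter values, the 50/50 instrument, and the sign convention for δ (never-takers shifted by δ relative to compliers) are all hypothetical.

```python
import math
import random

def expit(x):
    return 1.0 / (1.0 + math.exp(-x))

def simulate(n, p_complier, alpha_c, delta, psi, seed=0):
    """Draw (R, Z, Y) from a compliance class model with compliers and never-takers.

    alpha_c : logit of P(Y(0) = 1) among compliers
    delta   : logit-scale shift of P(Y(0) = 1) for never-takers relative to
              compliers (this sign convention is an assumption)
    psi     : causal log OR among compliers
    """
    rng = random.Random(seed)
    rows = []
    for _ in range(n):
        complier = rng.random() < p_complier
        r = rng.randint(0, 1)                    # randomized binary instrument
        z = 1 if (complier and r == 1) else 0    # never-takers always have Z = 0
        if complier:
            p = expit(alpha_c + psi * z)
        else:
            p = expit(alpha_c + delta)
        y = 1 if rng.random() < p else 0
        rows.append((r, z, y))
    return rows

data = simulate(n=10000, p_complier=0.5, alpha_c=-1.0, delta=1.0, psi=1.2528)
n_r1 = sum(1 for r, _, _ in data if r == 1)
treated_share = sum(z for r, z, _ in data if r == 1) / n_r1
print(round(treated_share, 3))
```

Because never-takers differ from compliers in their untreated risk whenever δ is non-zero, a standard logistic regression of Y on Z in such data mixes the treatment effect with the class difference, which is the unmeasured confounding the simulations vary.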

RESULTS

Simulation results

Table 1 presents simulation results when the sample size was 10,000 and there were no always-takers. For a frequent outcome (true risk of outcome of 0.30), the percent absolute bias for the GSMM was very small (less than 0.35%), with no consistent increase with the magnitude of unmeasured confounding. For a rare outcome (true risk of outcome of 0.03), the percent absolute bias was larger than in the frequent outcome setting, ranging between 1.50% and 11.00%, and the positive bias increased with the magnitude of confounding. For the rare outcome with a large amount of unmeasured confounding that biased the estimator upwards (δ = 2, i.e., a difference of 2 on the logit scale in the probability that the outcome is 1 under no treatment between compliers and never-takers), the estimation routine did not converge for 36% of the simulations, which were discarded from the results.

Table 2 presents the results of the simulations of sample size 10,000 that include always-takers. As in the setting without always-takers in Table 1, the bias of the treatment effect estimator was very small (less than 0.26% in absolute value) for the frequent outcome. For the rare outcome, the bias was again larger than for the frequent outcome, but it was still small (less than 4%). Unlike the setting without always-takers, the magnitude of bias did not increase with the magnitude of confounding. In both Table 1 and Table 2, the 95% CI coverage was close to 95%.

Table 1. Simulation results of GSMM estimator without always-takers under different outcome risks and true unmeasured confounding relationships.

| Outcome Risk | δ | Estimated Log OR^a | Bias | Bias (%) | Width of 95% CI | % Coverage |
|---|---|---|---|---|---|---|
| 0.3000 | −2.0 | 1.2544 | 0.0017 | 0.1339 | 0.2782 | 95.00 |
| | −1.5 | 1.2540 | 0.0012 | 0.0982 | 0.2866 | 95.10 |
| | −1.0 | 1.2545 | 0.0017 | 0.1375 | 0.2981 | 95.25 |
| | −0.5 | 1.2558 | 0.0030 | 0.2401 | 0.3125 | 94.70 |
| | 0.0 | 1.2566 | 0.0038 | 0.3043 | 0.3280 | 95.05 |
| | 0.5 | 1.2566 | 0.0038 | 0.3055 | 0.3421 | 94.20 |
| | 1.0 | 1.2549 | 0.0021 | 0.1675 | 0.3518 | 94.65 |
| | 1.5 | 1.2555 | 0.0027 | 0.2156 | 0.3567 | 94.80 |
| | 2.0 | 1.2549 | 0.0021 | 0.1710 | 0.3570 | 94.95 |
| 0.0300 | −2.0 | 0.7355 | 0.0110 | 1.4904 | 0.6325 | 95.35 |
| | −1.5 | 0.7377 | 0.0131 | 1.7777 | 0.6601 | 95.80 |
| | −1.0 | 0.7414 | 0.0168 | 2.2706 | 0.7032 | 95.85 |
| | −0.5 | 0.7449 | 0.0203 | 2.7279 | 0.7685 | 95.60 |
| | 0.0 | 0.7484 | 0.0239 | 3.1876 | 0.8644 | 96.05 |
| | 0.5 | 0.7584 | 0.0338 | 4.4572 | 1.0094 | 96.50 |
| | 1.0 | 0.7659 | 0.0413 | 5.3949 | 1.2087 | 95.80 |
| | 1.5 | 0.7828 | 0.0582 | 7.4356 | 1.4969 | 95.20 |
| | 2.0§ | 0.8147 | 0.0901 | 11.060 | 2.2107 | 95.21 |

  • Note: The sample size is 10,000; for outcome risks of 0.30, the true log OR equals 1.2528; and for outcome risk of 0.03, the true log OR equals 0.7246.
  • a OR: odds ratio.
  • § Based on 1275 simulations instead of 2000 simulations because the R program stopped when the model didn't converge or the system was computationally singular.

Table 2. Simulation results of GSMM estimator with always-takers under different outcome risks and true unmeasured confounding relationships.

| Outcome Risk | δ | Estimated Log OR^a | Bias | Bias (%) | Width of 95% CI | % Coverage |
|---|---|---|---|---|---|---|
| 0.3 | −2.0 | 1.2556 | 0.0029 | 0.2290 | 0.3171 | 95.90 |
| | −1.5 | 1.2555 | 0.0028 | 0.2210 | 0.3199 | 95.75 |
| | −1.0 | 1.2552 | 0.0024 | 0.1924 | 0.3240 | 95.80 |
| | −0.5 | 1.2545 | 0.0017 | 0.1362 | 0.3297 | 95.80 |
| | 0.0 | 1.2543 | 0.0015 | 0.1200 | 0.3372 | 95.75 |
| | 0.5 | 1.2530 | 0.0002 | 0.0184 | 0.3459 | 96.00 |
| | 1.0 | 1.2528 | 0.0000 | 0.0003 | 0.3551 | 96.75 |
| | 1.5 | 1.2513 | −0.0014 | −0.1129 | 0.3636 | 96.50 |
| | 2.0 | 1.2496 | −0.0032 | −0.2521 | 0.3706 | 96.85 |
| 0.03 | −2.0 | 0.7500 | 0.0254 | 3.3876 | 0.7794 | 97.40 |
| | −1.5 | 0.7506 | 0.0260 | 3.4642 | 0.7928 | 97.45 |
| | −1.0 | 0.7512 | 0.0266 | 3.5409 | 0.8136 | 97.05 |
| | −0.5 | 0.7509 | 0.0263 | 3.5038 | 0.8453 | 97.60 |
| | 0.0 | 0.7538 | 0.0292 | 3.8731 | 0.8959 | 97.35 |
| | 0.5 | 0.7541 | 0.0295 | 3.9149 | 0.9693 | 97.45 |
| | 1.0 | 0.7538 | 0.0293 | 3.8807 | 1.0746 | 97.10 |
| | 1.5 | 0.7521 | 0.0276 | 3.6666 | 1.2168 | 97.25 |
| | 2.0 | 0.7511 | 0.0265 | 3.5298 | 1.4122 | 96.30 |

  • Note: The sample size is 10,000; for outcome risks of 0.30, the true log OR equals 1.2528; and for outcome risk of 0.03, the true log OR equals 0.7246.
  • a OR: odds ratio.

Table 3 compares the bias, variance, and mean squared error (MSE) of the 2SPS and 2SRI methods with those of the GSMM method. In these simulations the sample size was N = 3000, and the proportion of compliers (strength of the instrument) was 0.3, 0.5, or 0.7. For the GSMM approach, the estimated percent bias did not exceed 2%. In contrast, the 2SPS and 2SRI estimators were more biased, with percent bias exceeding 50% in absolute value in the worst case for both. For these two approaches, bias increased with increased confounding and with a weaker instrument (fewer compliers).

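Table 3. Comparing bias, variance, and MSE of the 2SRI, 2SPS, and GSMM approaches without always-takers.

| Compliance | δ | 2SPS Bias % | 2SPS Variance | 2SPS MSE | 2SRI Bias % | 2SRI Variance | 2SRI MSE | GSMM Bias % | GSMM Variance | GSMM MSE |
|---|---|---|---|---|---|---|---|---|---|---|
| 0.3 | −2 | 53.9183 | 0.0751 | 0.5313 | −68.6214 | 0.1389 | 0.8779 | 1.2139 | 0.0348 | 0.0350 |
| | −1.5 | 42.7860 | 0.0703 | 0.3576 | −36.3221 | 0.1074 | 0.3144 | 1.3544 | 0.0393 | 0.0396 |
| | −1 | 29.8485 | 0.0632 | 0.2030 | −14.0762 | 0.0833 | 0.1144 | 1.4558 | 0.0444 | 0.0447 |
| | −0.5 | 16.6929 | 0.0593 | 0.1030 | −2.4911 | 0.0702 | 0.0711 | 1.4846 | 0.0537 | 0.0540 |
| | 0 | 5.6660 | 0.0548 | 0.0598 | 0.4433 | 0.0594 | 0.0594 | 1.4981 | 0.0619 | 0.0623 |
| | 0.5 | −1.1777 | 0.0514 | 0.0516 | −1.1879 | 0.0530 | 0.0532 | 1.8688 | 0.0688 | 0.0694 |
| | 1 | −4.1440 | 0.0542 | 0.0568 | −3.8740 | 0.0544 | 0.0568 | 1.0822 | 0.0758 | 0.0759 |
| | 1.5 | −1.4274 | 0.0559 | 0.0562 | −0.9504 | 0.0561 | 0.0563 | 1.1560 | 0.0738 | 0.0740 |
| | 2 | 5.3301 | 0.0632 | 0.0676 | 10.7841 | 0.0660 | 0.0842 | 1.3621 | 0.0731 | 0.0733 |
| 0.5 | −2 | 25.1660 | 0.0246 | 0.1240 | −40.8972 | 0.0443 | 0.3068 | 0.4856 | 0.0168 | 0.0168 |
| | −1.5 | 20.6969 | 0.0238 | 0.0910 | −22.9951 | 0.0365 | 0.1195 | 0.5133 | 0.0179 | 0.0179 |
| | −1 | 15.0236 | 0.0228 | 0.0583 | −9.7452 | 0.0301 | 0.0450 | 0.4827 | 0.0192 | 0.0193 |
| | −0.5 | 8.8435 | 0.0224 | 0.0347 | −1.9902 | 0.0266 | 0.0272 | 0.6257 | 0.0218 | 0.0218 |
| | 0 | 3.0573 | 0.0221 | 0.0235 | 0.2886 | 0.0240 | 0.0240 | 0.7406 | 0.0243 | 0.0244 |
| | 0.5 | −1.3746 | 0.0217 | 0.0220 | −1.1937 | 0.0224 | 0.0226 | 0.5654 | 0.0265 | 0.0265 |
| | 1 | −3.1885 | 0.0205 | 0.0221 | −2.9974 | 0.0206 | 0.0220 | 0.5185 | 0.0262 | 0.0262 |
| | 1.5 | −2.4763 | 0.0216 | 0.0226 | −2.1383 | 0.0218 | 0.0225 | 0.3426 | 0.0271 | 0.0271 |
| | 2 | 0.4039 | 0.0233 | 0.0233 | 4.0417 | 0.0245 | 0.0271 | 0.4064 | 0.0276 | 0.0276 |
| 0.7 | −2 | 10.8417 | 0.0127 | 0.0311 | −22.0355 | 0.0210 | 0.0972 | 0.4427 | 0.0109 | 0.0109 |
| | −1.5 | 9.1438 | 0.0124 | 0.0255 | −12.6871 | 0.0175 | 0.0427 | 0.4473 | 0.0111 | 0.0111 |
| | −1 | 6.8768 | 0.0123 | 0.0197 | −5.4469 | 0.0153 | 0.0199 | 0.4520 | 0.0116 | 0.0116 |
| | −0.5 | 4.1589 | 0.0121 | 0.0148 | −1.0837 | 0.0138 | 0.0139 | 0.4795 | 0.0122 | 0.0122 |
| | 0 | 1.4723 | 0.0118 | 0.0121 | 0.2984 | 0.0126 | 0.0126 | 0.5729 | 0.0127 | 0.0128 |
| | 0.5 | −0.7934 | 0.0119 | 0.0120 | −0.5530 | 0.0122 | 0.0123 | 0.6071 | 0.0137 | 0.0137 |
| | 1 | −2.0983 | 0.0122 | 0.0129 | −1.9705 | 0.0123 | 0.0129 | 0.5887 | 0.0146 | 0.0146 |
| | 1.5 | −2.1766 | 0.0124 | 0.0131 | −1.9599 | 0.0124 | 0.0130 | 0.6478 | 0.0148 | 0.0149 |
| | 2 | −1.3957 | 0.0128 | 0.0131 | 0.7437 | 0.0133 | 0.0134 | 0.7022 | 0.0151 | 0.0152 |

  • Note: The sample size is 3000. The outcome risk is 0.30, with the true log OR equaling 1.2528.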

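Comparing the variances in Table 3, the GSMM had the smallest variance in nearly all scenarios with negative δ, where the two-stage estimators were most biased, but the largest variance when δ ≥ 0; the variance of 2SRI was at least as large as that of 2SPS throughout. Accordingly, the GSMM had the smallest MSE for negative δ, whereas for δ ≥ 0 its MSE was similar to or somewhat larger than that of the two-stage estimators. The results are mixed when comparing the MSE between the 2SRI and 2SPS procedures.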
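The three summary measures are linked by MSE = variance + bias². The sketch below shows how they could be computed from replicate estimates and how a reported row can be cross-checked. The function names are hypothetical, and percent bias is taken relative to the true log OR, an assumption that reproduces the Table 3 entries to rounding.

```python
import statistics

def summarize(estimates, true_value):
    """Percent bias, variance, and MSE of replicate estimates of true_value."""
    bias = statistics.mean(estimates) - true_value
    variance = statistics.pvariance(estimates)
    return {"bias_pct": 100.0 * bias / true_value,
            "variance": variance,
            "mse": variance + bias ** 2}

def mse_from_table(bias_pct, variance, true_value):
    """Rebuild the MSE from a reported percent bias and variance."""
    bias = bias_pct / 100.0 * true_value
    return variance + bias ** 2

TRUE_LOG_OR = 1.2528  # true log OR stated in the note to Table 3

# First row of Table 3 (compliance 0.3, delta = -2).
mse_2sps = mse_from_table(53.9183, 0.0751, TRUE_LOG_OR)   # reported MSE: 0.5313
mse_2sri = mse_from_table(-68.6214, 0.1389, TRUE_LOG_OR)  # reported MSE: 0.8779
mse_gsmm = mse_from_table(1.2139, 0.0348, TRUE_LOG_OR)    # reported MSE: 0.0350
print(mse_2sps, mse_2sri, mse_gsmm)
```

In this row the squared bias dominates the MSE of both two-stage estimators, whereas the GSMM's MSE is almost entirely variance.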

Bezafibrate data analysis

Following the methods of Flory et al.,[18] we defined treatment as the initial fibrate treatment. We defined the IV as the prior prescription from the same practice as the patient. If a patient was the first one in the practice to be prescribed a fibrate, there was no IV defined for this patient, and so that patient was excluded. Using this subset of the data, we followed Flory et al. by performing analyses with and without adjusting for the following covariates: calendar year, age, sex, smoking status, body mass index, hypertension, history of myocardial infarction (MI), history of stroke, use of potentially protective drugs (angiotensin-converting enzyme inhibitors), and common potentially diabetogenic drugs (beta blockers, thiazide diuretics, corticosteroids).[15] The analyses were performed with different IV methods.

The assessment of the association between the IV and treatment is presented in Table 4. When the prior fibrate prescription from the same practice was bezafibrate, 79.4% of the patients actually had a bezafibrate prescription. In comparison, when the prior prescription from the same practice was a different fibrate, only 60.8% of patients had a bezafibrate prescription. The OR is 2.49 with 95% confidence interval (2.31–2.69), indicating a statistically significant association between the IV and treatment. On the other hand, the association between the IV and the outcome was very weak, with an OR of 0.97 (95% CI 0.76–1.26) and a p value of 0.8417.

Table 4. Association of the IV with the treatment and outcome.

| | Patients receiving bezafibrate, n (%) | OR (95% CI) | P-value |
|---|---|---|---|
| IV = Bezafibrate | 9127 (79.40) | 2.49 (2.31, 2.68) | <0.0001 |
| IV = Other fibrates | 2648 (60.76) | | |

| | Patients with diabetes, n (%) | OR (95% CI) | P-value |
|---|---|---|---|
| IV = Bezafibrate | 216 (1.88) | 0.97 (0.76, 1.26) | 0.84 |
| IV = Other fibrates | 84 (1.93) | | |
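The ORs in Table 4 can be approximately reproduced from the counts with a standard Wald interval on the log OR scale. The sketch below is a consistency check, not the authors' analysis. The group sizes 11,495 and 4,358 are not reported in the table; they are assumptions back-calculated from the counts and percentages, and the function name is hypothetical.

```python
import math

def odds_ratio_ci(a, n1, c, n0, z=1.96):
    """OR and Wald 95% CI comparing a events out of n1 with c events out of n0."""
    b, d = n1 - a, n0 - c
    or_ = (a * d) / (b * c)
    se = math.sqrt(1 / a + 1 / b + 1 / c + 1 / d)
    lo, hi = (math.exp(math.log(or_) + s * z * se) for s in (-1, 1))
    return or_, lo, hi

# Assumed denominators, inferred from 9127 (79.40%) and 2648 (60.76%).
N_IV_BEZA, N_IV_OTHER = 11495, 4358

iv_treatment = odds_ratio_ci(9127, N_IV_BEZA, 2648, N_IV_OTHER)
iv_outcome = odds_ratio_ci(216, N_IV_BEZA, 84, N_IV_OTHER)
print([round(v, 3) for v in iv_treatment])
print([round(v, 3) for v in iv_outcome])
```

Under these assumed denominators the point estimates match the reported 2.49 and 0.97. Small differences in the interval limits can arise from rounding in the back-calculated group sizes.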

We now consider the association between bezafibrate and diabetes. First, the unadjusted OR of bezafibrate versus other fibrates for the outcome of diabetes, in the subset in which an IV could be defined, was 0.67 (95% CI 0.53–0.85), indicating a strong inverse association. In Table 5, we compare the different estimates of the effect of bezafibrate on diabetes, with and without adjustment for covariates. Under standard logistic regression (corresponding to the above OR based on actual counts), the OR (95% confidence interval) was 0.67 (0.53, 0.85) with or without covariates (the two estimates differ only in the third decimal place). In contrast, the IV-based approaches yielded ORs that were not statistically different from one, because of both attenuated ORs and wider confidence intervals. Specifically, the 2SPS, 2SRI, and GSMM approaches yielded very similar estimates without covariates: 0.87 (0.22, 3.41), 0.90 (0.23, 3.53), and 0.82 (0.50, 1.35), respectively. After adjusting for covariates, the GSMM OR of 1.50 (0.72, 3.10) differed substantially from the 2SPS and 2SRI ORs of 0.77 (0.19, 3.16) and 0.78 (0.19, 3.21), respectively. In all cases, however, the IV ORs were not statistically different from one. We note that a crucial assumption for the GSMM approach is that each of the separate association models relating outcome to treatment at each level of the IV is specified correctly. None of the covariates that were originally selected by Flory et al. and used in the analysis were associated with the outcome conditional on the level of the IV. Consequently, it is reasonable to use the GSMM estimate that does not adjust for these covariates. Finally, as shown in Table 5, the GSMM standard error was smaller than the standard errors of the 2SPS and 2SRI approaches, as reflected in the narrower confidence intervals of the GSMM ORs. This was consistent with our simulation results.

Table 5. Comparison of results of analysis of bezafibrate versus other fibrates with respect to risk of first diabetes outcome by different estimation approaches with and without covariates^a.

| Model | Covariate(s) | Treatment Effect Log OR | Standard Error | P-value for the treatment effect |
|---|---|---|---|---|
| Standard Logistic Regression | No Covariates | −0.39 | 0.12 | 0.0010 |
| | Covariates^a | −0.40 | 0.12 | 0.0009 |
| IV 2SPS | No Covariates | −0.14 | 0.70 | 0.84 |
| | Covariates^a | −0.27 | 0.72 | 0.71 |
| IV 2SRI | No Covariates | −0.11 | 0.70 | 0.89 |
| | Covariates^a | −0.25 | 0.72 | 0.73 |
| IV GSMM | No Covariates | −0.20 | 0.25 | 0.44 |
| | Covariates^a | 0.40 | 0.37 | 0.28 |

  • a Covariates include: calendar year, age, sex, smoking status, body mass index, hypertension, history of myocardial infarction, history of stroke, use of potentially protective drugs (angiotensin converting enzyme inhibitors), and common potentially diabetogenic drugs (beta blockers, thiazide diuretics, corticosteroids).
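The ORs and confidence intervals quoted in the text follow from the log ORs and standard errors in Table 5 by exponentiating the estimate and its Wald limits. The sketch below assumes a normal-approximation interval with multiplier 1.96, and the function name is hypothetical. Because the table entries are rounded to two decimals, the results agree with the quoted intervals only approximately.

```python
import math

def or_with_ci(log_or, se, z=1.96):
    """Return (OR, lower, upper) from a log OR and its standard error."""
    return tuple(math.exp(log_or + k * z * se) for k in (0, -1, 1))

# Rows of Table 5 without covariates: (log OR, standard error).
table5 = {
    "Standard logistic": (-0.39, 0.12),
    "2SPS": (-0.14, 0.70),
    "2SRI": (-0.11, 0.70),
    "GSMM": (-0.20, 0.25),
}
results = {name: or_with_ci(*row) for name, row in table5.items()}
for name, (or_, lo, hi) in results.items():
    print(f"{name}: OR {or_:.2f} ({lo:.2f}, {hi:.2f})")
```

The much smaller standard error of the GSMM estimate is what produces its narrower interval on the OR scale.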

DISCUSSION

Under the IV assumptions, the IV approach estimates a causal OR of treatment received (e.g., receiving a prescription for that treatment) among compliers (those who would take the treatment only under encouragement to do so by the IV). A standard logistic regression adjusting for observed potential confounders shows a statistically significant inverse association between bezafibrate use and risk of diabetes. However, the different IV approaches yielded statistically non-significant associations due both to attenuated ORs and wider confidence intervals.

In spite of the consistency among the results of the IV approaches without adjusting for covariates, our simulations suggest that the GSMM approach is less biased and has narrower confidence intervals than the more standard two-stage IV approaches in the logistic regression context. Our extension of the GSMM approach from its original randomized context (where the control group does not have access to treatment) to the current observational context (where patients with both levels of the IV have access to treatment) performed well in the simulations. We have more confidence in the results of the GSMM approach than in those of the two-stage 2SPS and 2SRI approaches, which have been shown to be biased.[3, 12]

One does have to be careful with the GSMM approach, given its reliance on the assumption of correct association models for the treatment-outcome association within each level of the IV. Another difficulty with the GSMM approach is that in simulations in which a large amount of unmeasured confounding biased the standard logistic estimator upwards, the estimation routine sometimes did not converge.

The IV estimates of the effect of bezafibrate prescriptions on the risk of diabetes, not adjusting for observed covariates, indicate that the statistically significant association identified by the standard approach may be due to unmeasured confounders, given that the different IV estimates were consistently attenuated. The differences between the IV estimates and the standard approach could also be due to the fact that they are estimating effects for different target populations: the standard approach is targeted at the whole population, while the IV estimate is targeted at the compliers. Subpopulations for which the instrument is stronger contain a larger proportion of compliers than the general population.[23] Table 6 shows the strength of instrument in various subpopulations, which indicates that the compliers are more likely to be high-risk patients with a history of MI or stroke.

Table 6. Strength of the instrument in subgroups defined by observed risk factors. The strength of the instrument is E[Z|R = 1, S = 1] − E[Z|R = 0, S = 1] where S = 1 denotes membership in the subgroup.

| Subgroup | E[Z \| R = 1, S = 1] | E[Z \| R = 0, S = 1] | Instrument Strength |
|---|---|---|---|
| Full Population | .8072 | .6204 | .1868 |
| Male Gender | .8034 | .6528 | .1506 |
| History of Myocardial Infarction | .8495 | .5938 | .2557 |
| History of Stroke | .8136 | .5600 | .2536 |
| History of Calcium Channel Blocker Use | .7959 | .6413 | .1546 |
| History of Thiazide Diuretics Use | .8297 | .6004 | .2293 |
| History of Corticosteroids Use | .7902 | .6526 | .1376 |
| History of Beta Blockers Use | .7900 | .6106 | .1794 |
| History of ACE Inhibitors Use | .7546 | .5566 | .1980 |
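The instrument strength column is the difference between the two treatment proportions; under monotonicity, this difference also equals the proportion of compliers in the subgroup. The short sketch below recomputes the column for a few rows of Table 6. The dictionary layout and variable names are illustrative only.

```python
# (E[Z | R = 1, S = 1], E[Z | R = 0, S = 1]) for selected rows of Table 6.
subgroups = {
    "Full Population": (0.8072, 0.6204),
    "Male Gender": (0.8034, 0.6528),
    "History of Myocardial Infarction": (0.8495, 0.5938),
    "History of Stroke": (0.8136, 0.5600),
    "History of Corticosteroids Use": (0.7902, 0.6526),
}

# Strength = share treated when the IV favors bezafibrate minus share treated otherwise.
strength = {name: p1 - p0 for name, (p1, p0) in subgroups.items()}

# Rank subgroups from strongest to weakest instrument.
for name in sorted(strength, key=strength.get, reverse=True):
    print(f"{name}: {strength[name]:.4f}")
```

Ranking the subgroups this way makes it easy to see which observed risk factors are over-represented among compliers relative to the full population.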

Caution should also be exercised in using the IV analyses for inference in light of the assumptions that are needed for validity. Additionally, while IV analysis can control for unmeasured confounding, it often has poor statistical precision, which leads to wide confidence intervals.

Future research should focus on assessing departures from the above IV assumptions. For continuous outcomes, assessments of the exclusion restriction assumption have been based on incorporating weights into the estimation of the linear model analog to the GSMM, where the weights incorporate baseline covariate-IV interactions that affect the probability of treatment given the IV.[24] Extensions of this approach to the GSMM are being investigated, assuming the IV is unrelated to unmeasured confounders of the treatment-outcome relationship. To assess the assumption that the IV is not related to unmeasured confounders of the treatment-outcome relationship, one can extend the sensitivity analysis strategy of Small for linear models to the logistic case.[25]

While our analysis was based on the logistic model, the original paper by Flory et al. used the Cox proportional hazards model. Unfortunately, the two-stage IV approaches are also biased for estimation of the LATE and TOT hazard ratios under such a model. Methods have been proposed to resolve such bias in the context of randomized trials that are subject to design constraints, such as when one group is not measured for the treatment-received variable. These methods are challenging to implement and have limitations with the corresponding estimation procedures.[26]

CONFLICT OF INTEREST

The authors declare no conflict of interest.

KEY POINTS

  • Instrumental variable methods provide a way of estimating the causal effect of treatment when there is unmeasured confounding but a valid instrumental variable can be found.
  • A simulation study shows that the generalized structural mean model estimator provides an approximately unbiased estimate of the causal odds ratio when there is a valid instrumental variable, but the two-stage logistic regression and two-stage residual inclusion estimators are biased.
  • A SAS macro is provided for the generalized structural mean model estimator.
