Analyzing partially missing confounder information in comparative effectiveness and safety research of therapeutics
Article first published online: 3 MAY 2012
Copyright © 2012 John Wiley & Sons, Ltd.
Pharmacoepidemiology and Drug Safety
Supplement: Methods for Developing and Analyzing Clinically Rich Data for Patient-Centered Outcomes Research
Volume 21, Issue Supplement S2, pages 13–20, May 2012
How to Cite
Toh, S., García Rodríguez, L. A. and Hernán, M. A. (2012), Analyzing partially missing confounder information in comparative effectiveness and safety research of therapeutics. Pharmacoepidem. Drug Safe., 21: 13–20. doi: 10.1002/pds.3248
- Issue published online: 3 MAY 2012
- Manuscript Revised: 6 FEB 2012
- Manuscript Accepted: 6 FEB 2012
- Manuscript Received: 8 AUG 2011
- comparative effectiveness research
- missing data
Electronic healthcare databases are commonly used in comparative effectiveness and safety research of therapeutics. Many databases now include additional confounder information in a subset of the study population through data linkage or data collection. We described and compared existing methods for analyzing such datasets.
Using data from The Health Improvement Network and the relation between non-steroidal anti-inflammatory drugs and upper gastrointestinal bleeding as an example, we employed several methods to handle partially missing confounder information.
The crude odds ratio (OR) of upper gastrointestinal bleeding was 1.50 (95% confidence interval: 0.98, 2.28) among selective cyclo-oxygenase-2 inhibitor initiators (n = 43 569) compared with traditional non-steroidal anti-inflammatory drug initiators (n = 411 616). The OR dropped to 0.81 (0.52, 1.27) upon adjustment for confounders recorded for all patients. When further considering three additional variables missing in 22% of the study population (smoking, alcohol consumption, body mass index), the OR was between 0.80 and 0.83 for the missing-category approach, the missing-indicator approach, single imputation by the most common category, multiple imputation by chained equations, and propensity score calibration. The OR was 0.65 (0.39, 1.09) and 0.67 (0.38, 1.16) for the unweighted and the inverse probability weighted complete-case analysis, respectively.
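As a reading aid for the intervals reported above, the following sketch back-calculates the standard error of the log odds ratio implied by each 95% confidence interval. It assumes the intervals are Wald-type and symmetric on the log scale, which the abstract does not state; this may not hold exactly, particularly for the weighted analysis. The function name `log_scale_summary` is illustrative and not taken from the article.

```python
import math

# Standard normal quantile for a two-sided 95% interval.
Z_95 = 1.959964


def log_scale_summary(lower, upper):
    """Return (implied OR, SE of log OR) for a 95% CI.

    Assumption: the interval is a Wald interval, symmetric on the log scale.
    """
    log_lower, log_upper = math.log(lower), math.log(upper)
    se_log_or = (log_upper - log_lower) / (2 * Z_95)
    implied_or = math.exp((log_lower + log_upper) / 2)
    return implied_or, se_log_or


# (reported OR, lower limit, upper limit) as given in the abstract.
reported = {
    "crude": (1.50, 0.98, 2.28),
    "adjusted, fully recorded confounders": (0.81, 0.52, 1.27),
    "complete-case, unweighted": (0.65, 0.39, 1.09),
    "complete-case, inverse probability weighted": (0.67, 0.38, 1.16),
}

for label, (or_hat, lower, upper) in reported.items():
    implied_or, se = log_scale_summary(lower, upper)
    print(f"{label}: reported OR {or_hat:.2f}, "
          f"implied OR {implied_or:.2f}, SE(log OR) {se:.3f}")
```

Under this assumption, the implied point estimates agree with the reported odds ratios to within rounding, and the wider intervals for the two complete-case analyses correspond to larger standard errors, as would be expected when patients with missing values are excluded.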
Existing methods for handling partially missing confounder data require different assumptions and may produce different results. The unweighted complete-case analysis, the missing-category/indicator approach, and single imputation often require unrealistic assumptions and should be avoided. In this study, differences across methods were not substantial, likely because of the relatively low proportion of missingness and the weak confounding effect of the three additional variables after adjustment for other variables.
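To make the simpler approaches concrete, the sketch below shows how three of them would prepare a partially missing covariate before model fitting: complete-case analysis, the missing-category approach, and single imputation by the most common category. The records, the `smoking` categories, and the function names are hypothetical and are not taken from the study data; the sketch covers data preparation only, not the outcome model, multiple imputation, or propensity score calibration.

```python
from collections import Counter

# Hypothetical records; None marks a missing smoking status.
records = [
    {"id": 1, "smoking": "never"},
    {"id": 2, "smoking": "current"},
    {"id": 3, "smoking": None},
    {"id": 4, "smoking": "never"},
    {"id": 5, "smoking": None},
    {"id": 6, "smoking": "former"},
]


def complete_case(rows):
    """Keep only patients with an observed value (unweighted)."""
    return [r for r in rows if r["smoking"] is not None]


def missing_category(rows):
    """Recode missing values as their own category."""
    return [
        {**r, "smoking": "missing" if r["smoking"] is None else r["smoking"]}
        for r in rows
    ]


def most_common_imputation(rows):
    """Replace missing values with the most frequent observed category."""
    observed = [r["smoking"] for r in rows if r["smoking"] is not None]
    mode = Counter(observed).most_common(1)[0][0]
    return [
        {**r, "smoking": mode if r["smoking"] is None else r["smoking"]}
        for r in rows
    ]


print(len(complete_case(records)), "of", len(records), "records retained")
print(Counter(r["smoking"] for r in missing_category(records)))
print(Counter(r["smoking"] for r in most_common_imputation(records)))
```

Each function makes its assumption visible: the complete-case version discards patients, the missing-category version treats "missing" as if it were a real exposure level, and the single-imputation version assigns every patient with a missing value to the same category.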