At the time of this work, Dr. Hartzema was on sabbatical at the U.S. Food and Drug Administration.
Special Issue Paper
Empirical assessment of methods for risk identification in healthcare data: results from the experiments of the Observational Medical Outcomes Partnership†
Article first published online: 27 SEP 2012
Copyright © 2012 John Wiley & Sons, Ltd.
Statistics in Medicine
Special Issue: Papers from the 32nd Annual Conference of the International Society for Clinical Biostatistics
Volume 31, Issue 30, pages 4401–4415, 30 December 2012
How to Cite
Ryan, P. B., Madigan, D., Stang, P. E., Overhage, J. M., Racoosin, J. A. and Hartzema, A. G. (2012), Empirical assessment of methods for risk identification in healthcare data: results from the experiments of the Observational Medical Outcomes Partnership. Statist. Med., 31: 4401–4415. doi: 10.1002/sim.5620
This article expresses the views of the authors and does not necessarily represent those of their affiliated organizations.
- Issue published online: 10 DEC 2012
- Manuscript Accepted: 28 AUG 2012
- Manuscript Received: 4 NOV 2011
Keywords:
- product surveillance, postmarketing;
- epidemiologic methods;
- electronic health records;
- adverse drug reactions
Background: Expanded availability of observational healthcare data (both administrative claims and electronic health records) has prompted the development of statistical methods for identifying adverse events associated with medical products, but the operating characteristics of these methods when applied to real-world data are unknown.
Methods: We studied the performance of eight analytic methods for estimating the strength of association (relative risk, RR) and its standard error for 53 drug–adverse event outcome pairs, comprising both positive and negative controls. The methods were applied to a network of ten observational healthcare databases covering over 130 million lives. Performance measures included the sensitivity, specificity, and positive predictive value of each method at thresholds defined by statistical significance (p < 0.05 or p < 0.001) and by an absolute threshold of RR > 1.5, as well as threshold-free measures such as the area under the receiver operating characteristic curve (AUC).
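The threshold-based and threshold-free measures above can be illustrated with a minimal Python sketch. It assumes each method returns an RR together with a standard error on the log scale, computes a two-sided Wald p-value, and flags a pair when RR exceeds a minimum and p falls below alpha; how the RR > 1.5 threshold was combined with significance in the study is an assumption here. AUC is computed with the pairwise (Mann–Whitney) formulation, using RR as the ranking score, which is also an assumption. The function names and the four-pair example are hypothetical and not taken from the study.

```python
import math
from typing import Dict, List, Tuple

# Hypothetical record for one drug-outcome pair:
# (estimated RR, standard error of log(RR), True if positive control else False).
# The log-scale standard error is an assumption of this sketch.
Estimate = Tuple[float, float, bool]


def two_sided_p(rr: float, se_log_rr: float) -> float:
    """Wald test of H0: RR = 1 on the log scale (normal approximation)."""
    z = math.log(rr) / se_log_rr
    # 2 * (1 - Phi(|z|)) equals erfc(|z| / sqrt(2)).
    return math.erfc(abs(z) / math.sqrt(2))


def confusion_metrics(estimates: List[Estimate], alpha: float,
                      rr_min: float = 1.0) -> Dict[str, float]:
    """Flag a pair when RR > rr_min and p < alpha, then score the flags
    against the known control status."""
    tp = fp = tn = fn = 0
    for rr, se, is_positive in estimates:
        flagged = rr > rr_min and two_sided_p(rr, se) < alpha
        if is_positive and flagged:
            tp += 1
        elif is_positive:
            fn += 1
        elif flagged:
            fp += 1
        else:
            tn += 1
    nan = float("nan")
    sens = tp / (tp + fn) if tp + fn else nan
    spec = tn / (tn + fp) if tn + fp else nan
    ppv = tp / (tp + fp) if tp + fp else nan
    return {"sensitivity": sens, "specificity": spec,
            "false_positive_rate": 1 - spec, "ppv": ppv}


def auc(estimates: List[Estimate]) -> float:
    """Threshold-free AUC: probability that a random positive control has a
    higher RR than a random negative control (ties count one half)."""
    pos = [rr for rr, _, is_positive in estimates if is_positive]
    neg = [rr for rr, _, is_positive in estimates if not is_positive]
    wins = sum(1.0 if a > b else 0.5 if a == b else 0.0
               for a in pos for b in neg)
    return wins / (len(pos) * len(neg))


# Hypothetical example: two positive and two negative controls.
example = [(2.0, 0.2, True), (1.3, 0.3, True),
           (1.1, 0.04, False), (1.5, 0.5, False)]
print(confusion_metrics(example, alpha=0.05))
print(confusion_metrics(example, alpha=0.05, rr_min=1.5))
print(auc(example))  # -> 0.75
```

In this toy example the negative control with RR = 1.1 and a small standard error is a false positive at p < 0.05 but is no longer flagged once RR > 1.5 is also required, which mirrors the trade-off between significance-based and magnitude-based thresholds.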
Results: Although no single method demonstrated superior performance, the aggregate results provide a benchmark and baseline expectation for the performance of risk identification methods. At traditional levels of statistical significance (RR > 1, p < 0.05), all methods had a false positive rate above 18% and a positive predictive value below 38%. The best-performing method, high-dimensional propensity score, achieved an AUC of 0.77. At 50% sensitivity, the false positive rate ranged from 16% to 30%; at a 10% false positive rate, sensitivity ranged from 9% to 33%.
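Fixed operating points such as "false positive rate at 50% sensitivity" and "sensitivity at 10% false positive rate" can be read off an empirical ROC curve. The following is a minimal sketch, assuming each method yields one score per control pair (for example, its RR estimate) and that operating points are taken at observed thresholds without interpolation; the function names and data are hypothetical, and the study's exact procedure may differ.

```python
from typing import List, Tuple

Point = Tuple[float, float]  # (false positive rate, sensitivity)


def roc_points(pos_scores: List[float], neg_scores: List[float]) -> List[Point]:
    """Empirical ROC curve for the rule 'flag if score >= t' at every observed
    score t, plus (0, 0) for a threshold above all scores."""
    points: List[Point] = [(0.0, 0.0)]
    for t in sorted(set(pos_scores) | set(neg_scores), reverse=True):
        tpr = sum(s >= t for s in pos_scores) / len(pos_scores)
        fpr = sum(s >= t for s in neg_scores) / len(neg_scores)
        points.append((fpr, tpr))
    return points


def fpr_at_sensitivity(points: List[Point], target: float) -> float:
    """Lowest false positive rate among thresholds reaching the target sensitivity."""
    return min(fpr for fpr, tpr in points if tpr >= target)


def sensitivity_at_fpr(points: List[Point], target: float) -> float:
    """Highest sensitivity among thresholds whose false positive rate is within target."""
    return max(tpr for fpr, tpr in points if fpr <= target)


# Hypothetical scores (e.g. RR estimates) for 4 positive and 10 negative controls.
pos = [3.0, 2.5, 1.2, 0.8]
neg = [2.0, 1.0, 0.9, 0.7, 0.5, 0.4, 0.3, 0.2, 0.1, 0.05]
curve = roc_points(pos, neg)
print(fpr_at_sensitivity(curve, 0.5))   # -> 0.0
print(sensitivity_at_fpr(curve, 0.10))  # -> 0.75
```

Both helpers always return a value for targets in [0, 1]: the lowest observed threshold reaches a sensitivity of 1, and the (0, 0) point satisfies any nonnegative false positive rate target.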
Conclusions: Systematic processes for risk identification can provide useful information to supplement an overall safety assessment, but the assessment of method performance suggests a substantial chance of identifying false positive associations.