Background: Expanded availability of observational healthcare data (both administrative claims and electronic health records) has prompted the development of statistical methods for identifying adverse events associated with medical products, but the operating characteristics of these methods when applied to real-world data are unknown.
Methods: We studied the performance of eight analytic methods for estimating the strength of association (relative risk, RR) and its associated standard error for 53 drug–adverse event outcome pairs, comprising both positive and negative controls. The methods were applied to a network of ten observational healthcare databases, comprising over 130 million lives. Performance measures included sensitivity, specificity, and positive predictive value of the methods at RR thresholds achieving statistical significance of p < 0.05 or p < 0.001 and with absolute threshold RR > 1.5, as well as threshold-free measures such as area under the receiver operating characteristic curve (AUC).
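As an illustration only, the performance measures named above can be sketched in a few lines of Python. This is a minimal sketch and not the study's implementation: the function names, the toy estimates, and the decision rule (flag a pair when RR exceeds the threshold and a one-sided z-test on log(RR) exceeds 1.96, with the standard error assumed to be on the log scale) are all assumptions made for this example.

```python
import math

def flag_signal(rr, se_log_rr, rr_threshold=1.0, z_crit=1.96):
    """Hypothetical decision rule: RR above threshold and log(RR)/SE above z_crit.

    Assumes se_log_rr is the standard error of log(RR); the source does not
    specify how significance was computed.
    """
    return rr > rr_threshold and math.log(rr) / se_log_rr > z_crit

def confusion_metrics(flags, truth):
    """Sensitivity, specificity, and positive predictive value from boolean lists."""
    tp = sum(f and t for f, t in zip(flags, truth))
    fp = sum(f and not t for f, t in zip(flags, truth))
    fn = sum(not f and t for f, t in zip(flags, truth))
    tn = sum(not f and not t for f, t in zip(flags, truth))
    return {
        "sensitivity": tp / (tp + fn),
        "specificity": tn / (tn + fp),
        "ppv": tp / (tp + fp) if (tp + fp) else float("nan"),
    }

def auc(scores, truth):
    """Threshold-free AUC as the probability that a positive control outranks
    a negative control (ties count one half)."""
    pos = [s for s, t in zip(scores, truth) if t]
    neg = [s for s, t in zip(scores, truth) if not t]
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0 for p in pos for n in neg)
    return wins / (len(pos) * len(neg))

# Toy data (made up for illustration): (RR, SE of log RR, is_positive_control)
estimates = [(2.0, 0.20, True), (1.2, 0.30, True), (1.6, 0.20, False), (0.9, 0.25, False)]
truth = [is_pos for _, _, is_pos in estimates]
flags = [flag_signal(rr, se) for rr, se, _ in estimates]

metrics = confusion_metrics(flags, truth)
auc_value = auc([rr for rr, _, _ in estimates], truth)
print(metrics, auc_value)
```

The false positive rate reported alongside these measures is simply 1 minus specificity, and raising `rr_threshold` to 1.5 or tightening `z_crit` corresponds to the stricter thresholds described above.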
Results: Although no specific method demonstrated superior performance, the aggregate results provide a benchmark and baseline expectation for risk identification method performance. At traditional levels of statistical significance (RR > 1, p < 0.05), all methods had a false positive rate >18%, with positive predictive value <38%. The best predictive model, high-dimensional propensity score, achieved an AUC of 0.77. At 50% sensitivity, the false positive rate ranged from 16% to 30%. At a 10% false positive rate, the sensitivity of the methods ranged from 9% to 33%.
Conclusions: Systematic processes for risk identification can provide useful information to supplement an overall safety assessment, but assessment of method performance suggests a substantial chance of identifying false positive associations. Copyright © 2012 John Wiley & Sons, Ltd.