Data validity issues in using claims data


  • Brian L. Strom MD, MPH

    Corresponding author
    1. Center for Clinical Epidemiology and Biostatistics, Department of Biostatistics and Epidemiology, and Division of General Internal Medicine of the Department of Medicine, University of Pennsylvania School of Medicine, Philadelphia, Pennsylvania, USA
    • 824 Blockley Hall, Center for Clinical Epidemiology and Biostatistics, University of Pennsylvania School of Medicine, Philadelphia, Pennsylvania 19104-6021, USA.


This paper gives an overview of the use of claims data in pharmacoepidemiology, examines problems related to their use, and focuses on the uncertain validity of diagnosis data. Two contrasting studies are presented, one of drug-induced neutropenia and one of Stevens-Johnson Syndrome; both were launched at the same time with similar designs. Neutropenia is a laboratory-driven diagnosis, easy to make and confirm. The neutropenia study yielded many useful results, ranging from incidence rates to results for specific drug classes and individual drugs. However, the medical records revealed major unexpected issues arising from chronic and cyclic neutropenia. In contrast, Stevens-Johnson Syndrome is harder to diagnose and is poorly represented in the ICD-9-CM coding system. The result was a study that yielded much less clinical information. These studies show the important implications of variable data validity for study interpretation. Claims data are uniquely problematic in several situations: when the illness does not reliably come to medical attention; when drug exposures occur in the inpatient setting; when the outcome is poorly defined by the diagnostic coding system; when the study is descriptive; when drug effects are delayed and patients lose eligibility; and when there are important unknown confounders, such as cigarette smoking, occupation, menarche, and menopause, about which information cannot be obtained without accessing the patient. Copyright © 2001 John Wiley & Sons, Ltd.