Identification of metastatic cancer in claims data
Article first published online: 3 MAY 2012
Copyright © 2012 John Wiley & Sons, Ltd.
Pharmacoepidemiology and Drug Safety
Supplement: Methods for Developing and Analyzing Clinically Rich Data for Patient-Centered Outcomes Research
Volume 21, Issue Supplement S2, pages 21–28, May 2012
How to Cite
Nordstrom, B. L., Whyte, J. L., Stolar, M., Mercaldi, C. and Kallich, J. D. (2012), Identification of metastatic cancer in claims data. Pharmacoepidem. Drug Safe., 21: 21–28. doi: 10.1002/pds.3247
- Issue published online: 3 MAY 2012
- Manuscript Accepted: 3 FEB 2012
- Manuscript Revised: 2 FEB 2012
- Manuscript Received: 1 AUG 2011
Keywords
- claims data
- electronic medical record data
- classification and regression trees
The aim of this study was to develop algorithms that identify metastatic cancer in claims data, using tumor stage recorded in an oncology electronic medical record (EMR) data warehouse as the gold standard.
Data from an outpatient oncology EMR database were linked to medical and pharmacy claims data. Patients diagnosed with breast, lung, colorectal, or prostate cancer with a stage recorded in the EMR between 2004 and 2010 and with medical claims available were eligible for the study. Separate algorithms were developed for each tumor type using variables from the claims, including diagnoses, procedures, drugs, and oncologist visits. Candidate variables were reviewed by two oncologists. For each tumor type, the selected variables were entered into a classification and regression tree model to determine the algorithm with the best combination of positive predictive value (PPV), sensitivity, and specificity.
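The three performance measures named above can be made concrete with a short calculation. The following is a minimal sketch, not the authors' code: it assumes each patient has a gold-standard metastatic status (from the EMR stage) and a claims-based flag, both as booleans. The names `Metrics` and `evaluate` and the example data are hypothetical.

```python
from dataclasses import dataclass
from typing import Sequence


@dataclass
class Metrics:
    ppv: float
    sensitivity: float
    specificity: float


def evaluate(gold: Sequence[bool], predicted: Sequence[bool]) -> Metrics:
    """Compare claims-based flags against gold-standard metastatic status.

    gold[i] is True if the EMR stage says patient i is metastatic;
    predicted[i] is True if the claims algorithm flags patient i.
    (Both inputs are assumptions made for this illustration.)
    """
    if len(gold) != len(predicted):
        raise ValueError("gold and predicted must have the same length")

    pairs = [(bool(g), bool(p)) for g, p in zip(gold, predicted)]
    tp = sum(1 for g, p in pairs if g and p)          # true positives
    fp = sum(1 for g, p in pairs if not g and p)      # false positives
    fn = sum(1 for g, p in pairs if g and not p)      # false negatives
    tn = sum(1 for g, p in pairs if not g and not p)  # true negatives

    def ratio(num: int, den: int) -> float:
        # Undefined ratios (empty denominator) are reported as NaN.
        return num / den if den else float("nan")

    return Metrics(
        ppv=ratio(tp, tp + fp),          # flagged patients who are truly metastatic
        sensitivity=ratio(tp, tp + fn),  # metastatic patients who were flagged
        specificity=ratio(tn, tn + fp),  # non-metastatic patients left unflagged
    )


# Made-up example: 4 metastatic and 6 non-metastatic patients.
gold = [True, True, True, True, False, False, False, False, False, False]
flags = [True, True, True, False, True, False, False, False, False, False]
result = evaluate(gold, flags)
```

In this made-up example there are 3 true positives, 1 false positive, 1 false negative, and 5 true negatives, so PPV and sensitivity are both 0.75 and specificity is 5/6.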
A total of 1385 breast, 1036 lung, 727 colorectal, and 267 prostate cancer patients qualified for the analysis. The algorithms varied by tumor type but typically included International Classification of Diseases, Ninth Revision (ICD-9) codes for secondary neoplasms and use of chemotherapy and other agents typically given for metastatic disease. Across the final models, PPV ranged from 0.75 to 0.86, specificity from 0.75 to 0.97, and sensitivity from 0.60 to 0.81.
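To illustrate the kind of claims variables described here, the sketch below flags a patient from diagnosis codes and a chemotherapy indicator. It is a hypothetical rule, not any of the published algorithms: the abstract does not give the exact splits, so the requirement that both conditions hold is an assumption, as are the function names. The only fixed fact used is that ICD-9-CM categories 196 to 198 denote secondary malignant neoplasms.

```python
from typing import Iterable

# ICD-9-CM three-digit categories for secondary malignant neoplasms.
SECONDARY_NEOPLASM_CATEGORIES = {"196", "197", "198"}


def has_secondary_neoplasm_code(icd9_codes: Iterable[str]) -> bool:
    """Return True if any diagnosis code falls in categories 196-198."""
    return any(
        code.strip()[:3] in SECONDARY_NEOPLASM_CATEGORIES for code in icd9_codes
    )


def flag_metastatic(icd9_codes: Iterable[str], received_chemo: bool) -> bool:
    """Hypothetical rule: secondary-neoplasm code AND chemotherapy claim.

    The AND combination is an assumption made for illustration only;
    the study's tree models chose their own variables and splits for
    each tumor type.
    """
    return has_secondary_neoplasm_code(icd9_codes) and received_chemo


# Made-up patients: breast cancer (174.9) with and without a
# secondary-neoplasm code (197.0, secondary neoplasm of lung).
patient_a = flag_metastatic(["174.9", "197.0"], received_chemo=True)   # True
patient_b = flag_metastatic(["174.9"], received_chemo=True)            # False
```

Requiring both conditions is the kind of choice that tends to raise PPV and specificity while lowering sensitivity, since metastatic patients with no secondary-neoplasm code in their claims are missed.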
While most of these algorithms for metastatic cancer had good specificity and acceptable PPV, a tradeoff with sensitivity prevented any model from performing well on all three measures. The results suggest that accurate ascertainment of metastatic status may require access to medical records or other confirmatory data sources.