An Algorithm for the Use of Medicare Claims Data to Identify Women with Incident Breast Cancer


  • Ann B. Nattinger,

    • Address correspondence to Ann B. Nattinger, M.D., M.P.H., Division of General Internal Medicine, Medical College of Wisconsin, FEC Medical Office Building, Suite 4200, 9200 W. Wisconsin Avenue, Milwaukee, WI 53226. Purushottam W. Laud, Ph.D., is with the Department of Biostatistics, Medical College of Wisconsin, Milwaukee; Ruta Bajorunaite, Ph.D., is with the Department of Mathematics, Statistics, and Computer Science, Marquette University, Milwaukee; Rodney A. Sparapani, M.S., is with the Center for Patient Care and Outcomes Research, Medical College of Wisconsin; and Jean L. Freeman, Ph.D., is with the Departments of Medicine and Preventive Medicine and Community Health, University of Texas Medical Branch, Galveston.

  • Purushottam W. Laud,

  • Ruta Bajorunaite,

  • Rodney A. Sparapani,

  • Jean L. Freeman

  • Grant support from the Department of the Army (DAMD17-96-6262).

  • This study used the linked SEER-Medicare database. The interpretation and reporting of these data are the sole responsibility of the authors. The authors acknowledge the efforts of the Applied Research Program, NCI; the Office of Research, Development and Information, CMS; Information Management Services (IMS), Inc.; and the Surveillance, Epidemiology, and End Results (SEER) Program tumor registries in the creation of the SEER-Medicare database.


Objective. To develop and validate a clinically informed algorithm that uses solely Medicare claims to identify, with a high positive predictive value, incident breast cancer cases.

Data Source. Population-based Surveillance, Epidemiology, and End Results (SEER) Tumor Registry data linked to Medicare claims, and Medicare claims from a 5 percent random sample of beneficiaries in SEER areas.

Study Design. An algorithm was developed using claims from breast cancer patients in the SEER-Medicare database who were diagnosed in 1995, together with 1995 claims from Medicare control subjects. The algorithm was then validated on claims from breast cancer subjects and controls from 1994. The development process combined clinical insight with logistic regression methods.

Data Extraction. Training set: Claims from 7,700 SEER-Medicare breast cancer subjects diagnosed in 1995, and 124,884 controls. Validation set: Claims from 7,607 SEER-Medicare breast cancer subjects diagnosed in 1994, and 120,317 controls.

Principal Findings. A four-step prediction algorithm was developed and validated. It has a positive predictive value of 89–93 percent and a sensitivity of 80 percent for identifying incident breast cancer. The sensitivity is 82–87 percent for stage I or II disease and lower for other stages. The sensitivity is 82–83 percent for women who underwent either breast-conserving surgery or mastectomy, and is similar across geographic sites. A cohort identified with this algorithm will consist of 89–93 percent incident breast cancer cases, 1.5–6 percent cancer-free cases, and 4–5 percent prevalent breast cancer cases.
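
For readers less familiar with the two performance measures reported above, the following minimal Python sketch shows how positive predictive value and sensitivity are computed from classification counts. The function names and all counts are hypothetical, chosen only so that the results land near the reported figures; they are not taken from the study data.

```python
def positive_predictive_value(true_pos, false_pos):
    """Share of algorithm-flagged subjects who truly have incident breast cancer."""
    return true_pos / (true_pos + false_pos)


def sensitivity(true_pos, false_neg):
    """Share of true incident cases that the algorithm flags."""
    return true_pos / (true_pos + false_neg)


# Hypothetical counts for illustration only (not study data).
true_pos = 800    # incident cases flagged by the algorithm
false_neg = 200   # incident cases the algorithm missed
false_pos = 80    # flagged subjects who are cancer-free or prevalent cases

print(f"sensitivity = {sensitivity(true_pos, false_neg):.2f}")
print(f"PPV = {positive_predictive_value(true_pos, false_pos):.2f}")
```

In this illustration the false positives pool together the two groups the abstract distinguishes, cancer-free subjects and prevalent breast cancer cases, since both count against the positive predictive value for incident disease.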

Conclusions. This algorithm has better performance characteristics than previously proposed algorithms. The ability to examine national patterns of breast cancer care using Medicare claims data would open new avenues for the assessment of quality of care.