Fine-grained record integration and linkage tool

Authors

  • Pawel Jurczyk,

    Corresponding author
    1. National Center on Birth Defects and Developmental Disabilities, Centers for Disease Control and Prevention, Atlanta, Georgia
    2. Oak Ridge Institute for Science and Education, Oak Ridge, Tennessee
    3. Emory University, Mathematics and Computer Science, Atlanta, Georgia
    • Mathematics & Computer Science, Mail Stop: 1131-002-1AC, Emory University, Atlanta, GA 30322
  • James J. Lu,

    1. Emory University, Mathematics and Computer Science, Atlanta, Georgia
  • Li Xiong,

    1. Emory University, Mathematics and Computer Science, Atlanta, Georgia
  • Janet D. Cragan,

    1. National Center on Birth Defects and Developmental Disabilities, Centers for Disease Control and Prevention, Atlanta, Georgia
  • Adolfo Correa

    1. National Center on Birth Defects and Developmental Disabilities, Centers for Disease Control and Prevention, Atlanta, Georgia

  • Presented in part at the American Medical Informatics Association 2008 Annual Symposium, November 8–12, 2008, Washington, DC.

  • The findings and conclusions in this report are those of the authors and do not necessarily represent the official position of the Centers for Disease Control and Prevention.

Abstract

BACKGROUND: As part of the surveillance program to monitor the occurrence of birth defects in the metropolitan Atlanta area, we developed a record linkage software tool that provides latitude in the choice of linkage parameters, allows for efficient and accurate linkages, and enables objective assessments of the quality of the linked data.

METHODS: We developed and implemented a Java-based fine-grained probabilistic record integration and linkage tool (FRIL) that incorporates a rich collection of record distance metrics, search methods, and analysis tools. Along its workflow, FRIL provides a rich set of user-tunable parameters augmented with graphic visualization tools to assist users in understanding the effects of parameter choices. We used this software tool to link data from vital records (n = 1.25 million) with birth defects surveillance records (n = 12,700) from the Metropolitan Atlanta Congenital Defects Program (MACDP) for the birth years 1967–2006.

RESULTS: Compared with the data linkage performed by conventional algorithms, the data linkage of birth certificates with birth defect records in MACDP using FRIL was more efficient. The linkage based on FRIL was also accurate, showing 99% precision and 95% recall. Based on positive user feedback, new features continue to be developed, and the tool is being adopted in several other data linkage projects in MACDP.

CONCLUSIONS: A software tool that allows significant user interaction and control, such as FRIL, can provide accurate data linkages for birth defect surveillance programs and allows an objective assessment of the quality of linked data.

Birth Defects Research (Part A), 2008. © 2008 Wiley-Liss, Inc.
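
The ideas named in the abstract (record distance metrics, user-tunable parameters, and precision and recall as quality measures) can be illustrated with a minimal Python sketch. This is not FRIL's implementation, which is Java-based and not described here in detail. The field names, weights, acceptance threshold, sample records, and all function names below are hypothetical, and plain edit distance stands in for whichever distance metrics a user would actually select.

```python
def edit_distance(a: str, b: str) -> int:
    """Levenshtein distance, computed row by row with dynamic programming."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        curr = [i]
        for j, cb in enumerate(b, start=1):
            cost = 0 if ca == cb else 1
            curr.append(min(prev[j] + 1,          # deletion
                            curr[j - 1] + 1,      # insertion
                            prev[j - 1] + cost))  # substitution or match
        prev = curr
    return prev[-1]


def similarity(a: str, b: str) -> float:
    """Scale edit distance to [0, 1], where 1.0 means identical strings."""
    longest = max(len(a), len(b))
    if longest == 0:
        return 1.0
    return 1.0 - edit_distance(a, b) / longest


def record_score(left: dict, right: dict, weights: dict) -> float:
    """Weighted sum of per-field similarities (weights are assumed to sum to 1)."""
    return sum(w * similarity(left[f], right[f]) for f, w in weights.items())


def link(left_records: dict, right_records: dict, weights: dict, threshold: float) -> set:
    """Compare every pair (a nested loop) and keep pairs scoring at or above the threshold."""
    links = set()
    for lid, left in left_records.items():
        for rid, right in right_records.items():
            if record_score(left, right, weights) >= threshold:
                links.add((lid, rid))
    return links


def precision_recall(predicted: set, truth: set) -> tuple:
    """Precision = correct links / all links made; recall = correct links / all true links."""
    true_positives = len(predicted & truth)
    precision = true_positives / len(predicted) if predicted else 0.0
    recall = true_positives / len(truth) if truth else 0.0
    return precision, recall


# Hypothetical toy data; real linkages involve far more fields and records.
births = {
    "b1": {"name": "jonathan smith", "dob": "1999-03-14"},
    "b2": {"name": "katherine o'neil", "dob": "2003-01-09"},
    "b3": {"name": "maria garcia", "dob": "2001-07-02"},
}
defects = {
    "d1": {"name": "jonathon smith", "dob": "1999-03-14"},
    "d2": {"name": "kate oneil", "dob": "2003-01-09"},
}
weights = {"name": 0.6, "dob": 0.4}  # assumed, user-tunable
threshold = 0.85                     # assumed, user-tunable

found = link(births, defects, weights, threshold)
known_matches = {("b1", "d1"), ("b2", "d2")}  # manually reviewed truth set (hypothetical)
print(found)                                   # {('b1', 'd1')}
print(precision_recall(found, known_matches))  # (1.0, 0.5)
```

In this toy run the strict threshold admits only the near-identical pair, so precision is 1.0 while recall drops to 0.5 because the abbreviated name in `d2` scores below the cutoff. Lowering the threshold or reweighting the fields trades one measure against the other, which is the kind of parameter effect the abstract says FRIL helps users explore.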
