A scaling approach to record linkage
Abstract
With the growing availability of large datasets derived from administrative and other sources, there is increasing demand for linking them successfully to provide rich sources of data for further analysis. Because the quality of the identifiers used for linkage varies, existing approaches are often based on ‘probabilistic’ models, which rest on a number of assumptions and can make heavy computational demands. In this paper, we suggest a new approach to classifying record pairs in linkage, based on weights (scores) derived using a scaling algorithm. The proposed method does not rely on training data, is computationally fast, requires only moderate amounts of storage and has intuitive appeal. Copyright © 2017 John Wiley & Sons, Ltd.




