Volume 36, Issue 16
Research Article

A scaling approach to record linkage

Harvey Goldstein

Corresponding Author

E-mail address: h.goldstein@bristol.ac.uk

University of Bristol, Bristol, U.K.

University College London, London, U.K.

Correspondence to: Professor Harvey Goldstein, University of Bristol Graduate School of Education, Bristol BS8 1JA, U.K.

Katie Harron

London School of Hygiene and Tropical Medicine, London, U.K.

Mario Cortina‐Borja

University College London, London, U.K.

First published: 16 March 2017
Citations: 6

Abstract

As large datasets derived from administrative and other sources become more widely available, there is growing demand for linking them successfully to provide rich sources of data for further analysis. Because the quality of the identifiers used to carry out linkage varies, existing approaches often rely on ‘probabilistic’ models, which rest on a number of assumptions and can be computationally demanding. In this paper, we suggest a new approach to classifying record pairs in linkage, based upon weights (scores) derived using a scaling algorithm. The proposed method does not rely on training data, is computationally fast, requires only moderate amounts of storage and has intuitive appeal. Copyright © 2017 John Wiley & Sons, Ltd.

