Special Issue Paper
Methods for analyzing data from probabilistic linkage strategies based on partially identifying variables

Article first published online: 16 JUL 2012
DOI: 10.1002/sim.5498
Copyright © 2012 John Wiley & Sons, Ltd.

Statistics in Medicine
Special Issue: Papers from the 32nd Annual Conference of the International Society for Clinical Biostatistics
Volume 31, Issue 30, pages 4231–4242, 30 December 2012
How to Cite
Hof, M. H. P. and Zwinderman, A. H. (2012), Methods for analyzing data from probabilistic linkage strategies based on partially identifying variables. Statist. Med., 31: 4231–4242. doi: 10.1002/sim.5498
Publication History
- Issue published online: 10 DEC 2012
- Article first published online: 16 JUL 2012
- Manuscript Accepted: 30 MAY 2012
- Manuscript Revised: 22 MAY 2012
- Manuscript Received: 3 NOV 2011
Keywords:
- record linkage;
- matching error;
- regression analysis
In record linkage studies, unique identifiers are often not available, and therefore, the linkage procedure depends on combinations of partially identifying variables with low discriminating power. As a consequence, wrongly linked covariate and outcome pairs are created, and these bias further analysis of the linked data. In this article, we investigated two estimators that correct for linkage error in regression analysis. We extended the estimators developed by Lahiri and Larsen and also suggested a weighted least squares approach to deal with linkage error. We considered both linear and logistic regression problems and evaluated the performance of both methods with simulations. Our results show that, in both approaches, all wrong covariate and outcome pairs need to be removed from the analysis in order to calculate unbiased regression coefficients. This removal requires strong assumptions about the structure of the data. In addition, the bias increases significantly when the assumptions do not hold and wrongly linked records influence the coefficient estimation. Our simulations showed that both methods had similar performance in linear regression problems. In logistic regression problems, the weighted least squares method showed less bias. Because the specific structure of the data in record linkage problems often leads to different assumptions, the analyst needs prior knowledge of the nature of the data. These assumptions are more easily introduced in the weighted least squares approach than in the Lahiri and Larsen estimator.
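
To make the bias described in the abstract concrete, the following is a minimal simulation sketch in Python. It is not the authors' extended estimator or their weighted least squares method; it only illustrates the basic Lahiri and Larsen idea for linear regression, in which the observed outcome is regressed on Q X instead of X, where Q holds the linkage probabilities. The exchangeable error model (every record is linked correctly with the same known probability, otherwise to a uniformly chosen other record), the sample size, the coefficients, and all function names are assumptions made purely for illustration.

```python
import numpy as np


def simulate_linked_data(n, beta, p_correct, rng):
    """Simulate covariates x and linked outcomes z with linkage error.

    Assumed (hypothetical) error model: each record is linked to its own
    outcome with probability p_correct, otherwise to the outcome of a
    different record chosen uniformly at random.
    """
    x = rng.normal(size=n)
    y = beta[0] + beta[1] * x + rng.normal(size=n)
    link = np.arange(n)
    wrong = rng.random(n) >= p_correct
    # An offset in 1..n-1 guarantees that a wrong link never points to itself.
    offset = rng.integers(1, n, size=n)
    link[wrong] = (link[wrong] + offset[wrong]) % n
    return x, y[link]


def linkage_matrix(n, p_correct):
    """Matrix Q with Q[i, j] = P(record i is linked to outcome j)."""
    q = np.full((n, n), (1.0 - p_correct) / (n - 1))
    np.fill_diagonal(q, p_correct)
    return q


def ols(design, response):
    """Ordinary least squares coefficients."""
    coef, *_ = np.linalg.lstsq(design, response, rcond=None)
    return coef


rng = np.random.default_rng(0)
n, beta, p_correct = 2000, (1.0, 2.0), 0.7

x, z = simulate_linked_data(n, beta, p_correct, rng)
X = np.column_stack([np.ones(n), x])

# Naive analysis: treats every linked pair as correct.
naive = ols(X, z)

# Lahiri-Larsen-type correction: since E[z] = Q X beta, regress z on Q X.
# Q is assumed known here; in practice it must be estimated.
lahiri_larsen = ols(linkage_matrix(n, p_correct) @ X, z)

print("naive slope:    ", naive[1])
print("corrected slope:", lahiri_larsen[1])
```

Under these assumptions the naive slope is attenuated toward roughly p_correct times the true slope, whereas the corrected slope should lie close to the true value. The sketch takes Q as known, which is exactly the kind of strong assumption about the structure of the data that the abstract warns about.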