Keywords:

  • string comparators
  • unsupervised learning
  • error rates

Abstract

This article describes methods for matching duplicate records within or across files using non-unique identifiers such as first name, last name, date of birth, address, and other characteristics.

Copyright © 2010 John Wiley & Sons, Inc.

For further resources related to this article, please visit the WIREs website.

This report is released to inform interested parties of research and to encourage discussion. The views expressed are those of the authors and not necessarily those of either the U.S. Department of Housing and Urban Development or the U.S. Census Bureau.