Author name disambiguation: What difference does it make in author-based citation analysis?



In this article, we explore how strongly author name disambiguation (AND) affects the results of an author-based citation analysis study, and identify conditions under which the traditional simplified approach of using surnames and first initials may suffice in practice. We compare author citation ranking and cocitation mapping results in the stem cell research field from 2004 to 2009 using two AND approaches: the traditional simplified approach of using author surname and first initial and a sophisticated algorithmic approach. We find that the traditional approach leads to extremely distorted rankings and substantially distorted mappings of authors in this field when based on first- or all-author citation counting, whereas last-author-based citation ranking and cocitation mapping both appear relatively immune to the author name ambiguity problem. This is largely because Romanized names of Chinese and Korean authors, who are very active in this field, are extremely ambiguous, but few of these researchers consistently publish as last authors in bylines. We conclude that a more earnest effort is required to deal with the author name ambiguity problem in both citation analysis and information retrieval, especially given the current trend toward globalization. In the stem cell research field, in which laboratory heads are traditionally listed as last authors in bylines, last-author-based citation ranking and cocitation mapping using the traditional approach to author name disambiguation may serve as a simple workaround, but likely at the price of largely filtering out Chinese and Korean contributions to the field as well as important contributions by young researchers.
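To illustrate why the traditional simplified approach can inflate rankings, the following minimal sketch collapses cited author names to surname plus first initial and sums citation counts per key. The names, counts, and function names (`simple_key`, `merge_counts`) are hypothetical illustrations, not data or methods from this study, and the sketch assumes names are given as "Surname, Given".

```python
from collections import defaultdict


def simple_key(full_name: str) -> str:
    """Reduce 'Surname, Given' to 'surname f' (surname plus first initial)."""
    surname, _, given = full_name.partition(",")
    given = given.strip()
    initial = given[0].lower() if given else ""
    return f"{surname.strip().lower()} {initial}".strip()


def merge_counts(citations):
    """Sum citation counts per simplified key and record which names were merged."""
    totals = defaultdict(int)
    merged = defaultdict(set)
    for name, count in citations:
        key = simple_key(name)
        totals[key] += count
        merged[key].add(name)
    return dict(totals), {k: sorted(v) for k, v in merged.items()}


# Hypothetical cited authors with made-up citation counts (assumption, not study data).
CITATIONS = [
    ("Wang, Jing", 40),
    ("Wang, Jun", 35),
    ("Wang, Jian", 30),
    ("Miller, Anna", 60),
]

totals, merged = merge_counts(CITATIONS)
for key, total in sorted(totals.items(), key=lambda kv: -kv[1]):
    print(key, total, merged[key])
```

In this toy input, three distinct authors share the key `wang j`, so their combined count outranks the single author behind `miller a`, even though no individual among them is cited more often. This is the kind of distortion the abstract attributes to highly ambiguous Romanized names.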