Requirements for a cocitation similarity measure, with special reference to Pearson's correlation coefficient
Article first published online: 25 FEB 2003
Copyright © 2003 Wiley Periodicals, Inc.
Journal of the American Society for Information Science and Technology
Volume 54, Issue 6, pages 550–560, April 2003
How to Cite
Ahlgren, P., Jarneving, B. and Rousseau, R. (2003), Requirements for a cocitation similarity measure, with special reference to Pearson's correlation coefficient. J. Am. Soc. Inf. Sci., 54: 550–560. doi: 10.1002/asi.10242
- Issue published online: 19 MAR 2003
- Manuscript Accepted: 8 NOV 2002
- Manuscript Revised: 1 OCT 2002
- Manuscript Received: 7 FEB 2002
Author cocitation analysis (ACA), a special type of cocitation analysis, was introduced by White and Griffith in 1981. The technique is used to analyze the intellectual structure of a given scientific field. In 1990, McCain published a technical overview that has been largely adopted as a standard. In that overview, McCain notes that Pearson's correlation coefficient (Pearson's r) is often used as a similarity measure in ACA and presents some advantages of its use. The present article criticizes the use of Pearson's r in ACA and sets forth two natural requirements that a similarity measure applied in ACA should satisfy. It is shown that Pearson's r does not satisfy these requirements: real and hypothetical data are used to construct counterexamples to both. It is concluded that Pearson's r is probably not an optimal choice of similarity measure in ACA. Still, further empirical research is needed to show whether, and if so to what extent, the use of similarity measures that fulfill these requirements would lead to objectively better results in full-scale ACA studies. Further, problems related to incomplete cocitation matrices are discussed.
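To make concrete what using Pearson's r as a similarity measure involves, the following minimal Python sketch computes r between the cocitation profiles of two authors and then recomputes it after adding authors who are cocited with neither of them. The counts, the names `profile_a` and `profile_b`, and the helper `pearson_r` are hypothetical and are not taken from the article. The abstract does not spell out the two requirements, so this should be read only as an illustration of the kind of sensitivity at issue, not as a reproduction of the authors' counterexamples.

```python
from math import sqrt


def pearson_r(x, y):
    """Pearson's correlation coefficient between two equal-length profiles."""
    n = len(x)
    if n != len(y) or n < 2:
        raise ValueError("profiles must have equal length of at least 2")
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    # Sum of cross-products and sums of squares of the deviations from the means.
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    if var_x == 0 or var_y == 0:
        raise ValueError("r is undefined for a constant profile")
    return cov / sqrt(var_x * var_y)


# Hypothetical cocitation counts of authors A and B with four other authors.
profile_a = [4, 1, 3, 2]
profile_b = [1, 3, 2, 4]
r_before = pearson_r(profile_a, profile_b)

# Extend the author set with three authors cocited with neither A nor B.
zeros = [0, 0, 0]
r_after = pearson_r(profile_a + zeros, profile_b + zeros)

print(round(r_before, 3), round(r_after, 3))  # -0.8 0.427
```

In this toy case the coefficient changes sign, from -0.8 to about 0.43, even though nothing about the relationship between A and B on the original four authors has changed. The shared zeros pull both means down, so the original counts all lie above the new means and contribute positively to the covariance.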