Research Article
Co-occurrence matrices and their applications in information science: Extending ACA to the Web environment
Article first published online: 17 AUG 2006
DOI: 10.1002/asi.20335
Copyright © 2006 Wiley Periodicals, Inc., A Wiley Company
Issue

Journal of the American Society for Information Science and Technology
Volume 57, Issue 12, pages 1616–1628, October 2006
Additional Information
How to Cite
Leydesdorff, L. and Vaughan, L. (2006), Co-occurrence matrices and their applications in information science: Extending ACA to the Web environment. J. Am. Soc. Inf. Sci., 57: 1616–1628. doi: 10.1002/asi.20335
Publication History
- Issue published online: 14 SEP 2006
- Article first published online: 17 AUG 2006
- Manuscript Revised: 8 OCT 2005
- Manuscript Accepted: 8 OCT 2005
- Manuscript Received: 11 JUL 2005
Abstract
Co-occurrence matrices, such as cocitation, coword, and colink matrices, have been used widely in the information sciences. However, confusion and controversy have hindered the proper statistical analysis of these data. The underlying problem, in our opinion, involved understanding the nature of various types of matrices. This article discusses the difference between a symmetrical cocitation matrix and an asymmetrical citation matrix as well as the appropriate statistical techniques that can be applied to each of these matrices, respectively. Similarity measures (such as the Pearson correlation coefficient or the cosine) should not be applied to the symmetrical cocitation matrix but can be applied to the asymmetrical citation matrix to derive the proximity matrix. The argument is illustrated with examples. The study then extends the application of co-occurrence matrices to the Web environment, in which the nature of the available data and thus data collection methods are different from those of traditional databases such as the Science Citation Index. A set of data collected with the Google Scholar search engine is analyzed by using both the traditional methods of multivariate analysis and the new visualization software Pajek, which is based on social network analysis and graph theory.
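The distinction drawn in the abstract can be illustrated with a minimal sketch. The toy data, variable names, and helper function below are hypothetical and are not taken from the article; the sketch only assumes that an asymmetrical occurrence matrix (citing documents by cited authors) is available, derives the symmetrical co-occurrence matrix from it, and applies the cosine and Pearson measures to the columns of the asymmetrical matrix, as the abstract recommends.

```python
import numpy as np

# Hypothetical toy data (not from the article): rows are citing documents,
# columns are cited authors; A[i, j] = 1 if document i cites author j.
A = np.array([
    [1, 1, 0],
    [1, 1, 0],
    [1, 0, 1],
    [0, 0, 1],
])

# Symmetrical co-occurrence (cocitation) matrix:
# C[j, k] counts the documents that cite both author j and author k.
C = A.T @ A


def cosine_between_columns(M):
    """Cosine similarity between every pair of columns of M."""
    M = M.astype(float)
    norms = np.linalg.norm(M, axis=0)
    return (M.T @ M) / np.outer(norms, norms)


# Proximity matrices computed from the asymmetrical matrix A,
# not from the already symmetrical matrix C.
S_cosine = cosine_between_columns(A)
S_pearson = np.corrcoef(A, rowvar=False)

print(C)
print(np.round(S_cosine, 3))
print(np.round(S_pearson, 3))
```

In this sketch, C already records how often each pair of authors occurs together, so it is itself a proximity matrix; computing a similarity measure on C again would measure something different from the similarity of the citation profiles in A.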
