Co-citation analysis, bibliographic coupling, and direct citation: Which citation approach represents the research front most accurately?



Studies comparing the accuracy of science mapping approaches have begun to appear in recent years. These studies primarily compare the cluster solutions produced by different similarity approaches, and their results vary. In this study we compare the accuracy of cluster solutions for a large corpus of 2,153,769 recent articles from the biomedical literature (2004–2008) using four similarity approaches: co-citation analysis, bibliographic coupling, direct citation, and a bibliographic coupling-based citation-text hybrid approach. Each of the four approaches can be considered a way to represent the research front in biomedicine, and each successfully clusters over 92% of the corpus. Accuracy is compared using two metrics: within-cluster textual coherence, as defined by the Jensen-Shannon divergence, and a concentration measure based on the grant-to-article linkages indexed in MEDLINE. Of the three pure citation-based approaches, bibliographic coupling slightly outperforms co-citation analysis on both accuracy measures; direct citation is by far the least accurate mapping approach. The hybrid approach improves upon the bibliographic coupling results in all respects. We consider the results of this study to be robust, given the very large size of the corpus and the specificity of the accuracy measures used.
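
For readers unfamiliar with the three pure citation-based approaches and the divergence measure named above, the following is a minimal illustrative sketch, not the procedure used in the study. The toy `citations` data, the function names, and the use of raw counts (with no normalization or clustering step) are all assumptions made for illustration. The abstract does not specify how the Jensen-Shannon divergence is aggregated into a within-cluster coherence score, so only the divergence between two word distributions is shown.

```python
from collections import Counter
from itertools import combinations
from math import log2

# Hypothetical toy data: each key is a citing paper in the corpus,
# each value is the set of papers it cites.
citations = {
    "A": {"X", "Y"},
    "B": {"X", "Y", "Z"},
    "C": {"A", "Z"},
}


def direct_citation(cites):
    """Return (citing, cited) pairs where both papers belong to the corpus."""
    return {(p, q) for p, refs in cites.items() for q in refs if q in cites}


def bibliographic_coupling(cites):
    """Count shared references for each pair of citing papers."""
    links = {}
    for p, q in combinations(sorted(cites), 2):
        shared = len(cites[p] & cites[q])
        if shared:
            links[(p, q)] = shared
    return links


def co_citation(cites):
    """Count how many papers cite both members of each pair of cited papers."""
    counts = Counter()
    for refs in cites.values():
        for pair in combinations(sorted(refs), 2):
            counts[pair] += 1
    return dict(counts)


def jensen_shannon_divergence(p, q):
    """Base-2 JSD between two word-probability dicts (0 = identical, 1 = disjoint)."""
    words = set(p) | set(q)
    # Mixture distribution M = (P + Q) / 2 over the union vocabulary.
    m = {w: 0.5 * (p.get(w, 0.0) + q.get(w, 0.0)) for w in words}

    def kl_to_mixture(a):
        # Kullback-Leibler divergence from distribution `a` to the mixture.
        return sum(a[w] * log2(a[w] / m[w]) for w in a if a[w] > 0)

    return 0.5 * kl_to_mixture(p) + 0.5 * kl_to_mixture(q)
```

In this toy corpus, papers A and B are linked by bibliographic coupling because they share the references X and Y, while X and Y are themselves linked by co-citation because A and B both cite them. The only direct citation link inside the corpus is C citing A, which illustrates why direct citation tends to yield far fewer links within a short publication window.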