Get access

Improving the accuracy of co-citation clustering using full text



Historically, co-citation models have been based only on bibliographic information. Full-text analysis offers the opportunity to significantly improve the quality of the signals upon which these co-citation models are based. In this work we study the effect of reference proximity on the accuracy of co-citation clusters. Using a corpus of 270,521 full text documents from 2007, we compare the results of traditional co-citation clustering using only the bibliographic information to results from co-citation clustering where proximity between reference pairs is factored into the pairwise relationships. We find that accounting for reference proximity from full text can increase the textual coherence (a measure of accuracy) of a co-citation cluster solution by up to 30% over the traditional approach based on bibliographic information.

Get access to the full text of this article