Abstract

As more information becomes available in digital form, semantic web technologies have been used to build semantic digital libraries that make this information easier to comprehend. The semantic web lets users search and visualize resources semantically. A key step in constructing a semantic digital library is semantic web generation, which converts the metadata of digital resources into semantic web data. Many text mining techniques, such as keyword extraction and clustering, have been proposed for generating semantic web data. However, one important type of publication metadata, the affiliation, is hard to convert precisely, because authors who share an affiliation often express it in different ways. To address this issue, this paper proposes a clustering method based on normalized compression distance for affiliation disambiguation. The experimental results show that our method can identify differently written affiliations that denote the same institute. Its clustering results outperform those of the well-known k-means clustering method in terms of average precision, F-measure, entropy, and purity.
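
For readers unfamiliar with normalized compression distance (NCD), the following is a minimal sketch of the standard definition, NCD(x, y) = (C(xy) - min(C(x), C(y))) / max(C(x), C(y)), where C gives the compressed size in bytes. It is not the paper's implementation: the choice of zlib as the compressor, the lowercasing step, the function names, and the sample affiliation strings are all assumptions made for illustration.

```python
import zlib


def compressed_size(text: str) -> int:
    """Size in bytes of the zlib-compressed UTF-8 encoding of text."""
    # Assumption: zlib at maximum compression level; the abstract does not
    # say which compressor the authors used.
    return len(zlib.compress(text.encode("utf-8"), 9))


def ncd(a: str, b: str) -> float:
    """Normalized compression distance between two strings."""
    # Assumption: lowercase both strings so case differences do not count.
    a, b = a.lower(), b.lower()
    size_a = compressed_size(a)
    size_b = compressed_size(b)
    size_ab = compressed_size(a + b)
    return (size_ab - min(size_a, size_b)) / max(size_a, size_b)


# Hypothetical affiliation strings, invented for this example only.
variant_1 = "Dept. of Computer Science, National Taiwan University, Taipei, Taiwan"
variant_2 = "Department of Computer Science and Information Engineering, National Taiwan University"
unrelated = "Massachusetts Institute of Technology, Cambridge, MA, USA"

print("same institute:     ", round(ncd(variant_1, variant_2), 3))
print("different institute:", round(ncd(variant_1, unrelated), 3))
```

Two spellings of the same institute share long substrings, so compressing them together costs little more than compressing one alone and the distance is small. A clustering step would then group affiliations whose pairwise distances are low. On strings this short, the compressor's fixed overhead makes the values noisy, so any threshold would need tuning on real data.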