Finding subject terms for classificatory metadata from user-generated social tags



With the increasing popularity of social tagging systems, the potential for using social tags as a source of metadata is being explored. Social tagging systems make it easy to involve a large number of users and can thereby improve the metadata-generation process. Current research is exploring social tagging systems as a mechanism that allows nonprofessional catalogers to participate in metadata generation. Because social tags are not drawn from controlled vocabularies, several issues must be addressed in finding quality terms that represent the content of a resource. This research explores ways to obtain, from the tags provided by users, a set of tags that represents the resource. Two metrics are introduced. Annotation Dominance (AD) measures the extent to which users agree on a tag term. Cross Resources Annotation Discrimination (CRAD) measures a tag's potential to classify a collection; it is designed to remove tags that are used too broadly or too narrowly. Using the proposed measurements, the research selects important tags (meta-terms) and removes meaningless ones (tag noise) from the tags provided by users. To evaluate the proposed approach to finding classificatory metadata candidates, we rely on expert users' relevance judgments, which compare the suggested tag terms with expert metadata terms. The results suggest that processing user tags with the two measurements successfully identifies the terms that represent the topic categories of web resource content. The suggested tag terms can be examined further for various uses as semantic metadata for the resources.
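The abstract describes AD and CRAD only qualitatively, so the following Python sketch is an illustration under stated assumptions and not the formulas used in this research. It assumes that AD can be read as the share of a resource's taggers who applied a given tag, and it substitutes a simple band-pass filter on resource coverage (the fraction of resources carrying a tag) for CRAD. The function names, the thresholds `min_ad`, `min_cov`, and `max_cov`, and the sample data are all hypothetical.

```python
from collections import defaultdict


def annotation_dominance(annotations):
    """Share of a resource's taggers who applied each tag (assumed reading of AD)."""
    taggers = defaultdict(set)    # resource -> users who tagged it
    tag_users = defaultdict(set)  # (resource, tag) -> users who used that tag
    for user, resource, tag in annotations:
        taggers[resource].add(user)
        tag_users[(resource, tag)].add(user)
    return {
        (resource, tag): len(users) / len(taggers[resource])
        for (resource, tag), users in tag_users.items()
    }


def resource_coverage(annotations):
    """Fraction of all resources carrying each tag (a stand-in for CRAD, not its formula)."""
    resources = set()
    tagged = defaultdict(set)     # tag -> resources on which it appears
    for _user, resource, tag in annotations:
        resources.add(resource)
        tagged[tag].add(resource)
    return {tag: len(rs) / len(resources) for tag, rs in tagged.items()}


def select_meta_terms(annotations, min_ad=0.5, min_cov=0.3, max_cov=0.8):
    """Keep tags with enough user agreement that are neither too broad nor too narrow."""
    annotations = list(annotations)
    ad = annotation_dominance(annotations)
    cov = resource_coverage(annotations)
    selected = defaultdict(list)
    for (resource, tag), score in ad.items():
        if score >= min_ad and min_cov <= cov[tag] <= max_cov:
            selected[resource].append(tag)
    return {resource: sorted(tags) for resource, tags in selected.items()}


# Hypothetical (user, resource, tag) annotations, for illustration only.
sample = [
    ("u1", "r1", "python"), ("u1", "r1", "web"),
    ("u2", "r1", "python"), ("u2", "r1", "web"),
    ("u3", "r1", "python"), ("u3", "r1", "toread"),
    ("u1", "r2", "python"), ("u1", "r2", "web"),
    ("u2", "r2", "python"),
    ("u1", "r3", "cooking"), ("u1", "r3", "web"), ("u1", "r3", "myproject"),
    ("u2", "r3", "cooking"), ("u2", "r3", "myproject"),
    ("u1", "r4", "cooking"), ("u1", "r4", "web"),
    ("u3", "r4", "cooking"),
]

meta_terms = select_meta_terms(sample)
```

In this toy data, "web" appears on every resource and is dropped as too broad, "myproject" appears on a single resource and is dropped as too narrow, and "toread" is dropped because only one of three taggers used it. The tags "python" and "cooking" remain as meta-term candidates. Real collections would need thresholds tuned to their size and tag distribution.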