Semantic video search using tagsonomies



Taxonomy can be defined as a controlled vocabulary that establishes hierarchical or associative relationships between terms. One problem of taxonomy is that it depends too much on the human element; a library can acquire titles on any subject, but it is unreasonable to expect indexers to be experts in every field of knowledge (Steele, 2009). Another problem is a timeliness problem inherent in taxonomy; taxonomy needs to be agreed upon and codified into a classification prior to use by the indexers (Peters, 2009). This can lead to a situation where classification structures are not compatible with current knowledge.

Recently, users have started to use social software to store documents and tag them with their own tags to make them retrievable. The collection of tags used within one platform is called a folksonomy. Transforming the creation of explicit metadata for resources from a professional activity into a shared, communicative activity by users is an important development that should be explored and considered for future systems development (Mathes, 2004).

Folksonomy tags offer some advantages: they are easy to handle and take into account the users' own vocabulary. Yi (2009) attempted to assess the indexing value of social tags in the context of an information-retrieval model using the Latent Semantic Indexing method. The results of that study showed the potential of using social tags as indexing terms for the DDC-based classification of tagged resources. Further, Geisler and Burns (2007) stated that YouTube tags provide real added value, especially for users who are searching, because 66% of the tags do not appear in the other metadata. Morrison (2008) compared the search performance of folksonomies in information retrieval from social bookmarking sites with that of search engines and subject directories. The results of this study showed that search engines had the highest precision and recall rates; however, folksonomies fared surprisingly well.

Folksonomy tags suffer from some problems. The folksonomy-based approach lists tags without indicating relationships in flat name spaces, unlike the taxonomy-based approach, which displays words indicating relationships between them. Thus, folksonomies do not include any vocabulary control; synonyms are not bound together, and homonyms are not distinguished, which leads to a decrease in their retrieval effectiveness. The hype about folksonomies being a better method of information retrieval has by now given way to the realization that without a structure, they are not so powerful as previously assumed (Peters, 2009).

Many ideas are emerging on how to structure folksonomies with semantic information obtained from taxonomies, without sacrificing their features. Kolbitsch (2007) proposed WordFlickr, based on the use of WordNet for expanding Flickr queries. An informal experiment compared search results from the prototype implementation of WordFlickr with results from Flickr. However, this study did not formally verify that WordFlickr was superior to Flickr in terms of retrieval efficiency.

This study investigates the value of folksonomy tags for indexing videos and the feasibility of tag structure. It also explores how effective tag control through query expansion (tag gardening) is in searching videos. To do so, we designed a structured folksonomy-based system (hereafter, a tagsonomy-based system) in which queries can be expanded through tag control: equivalent, synonymous, or related tags are bound together in order to improve the retrieval effectiveness (recall and precision) of videos. We then evaluated the proposed system by comparing it to a tag-based system without tag control in terms of recall and precision rates.


Design of tagsonomy- and tag-based systems

We designed two systems: a tagsonomy-based system and a tag-based system. As sample data, we selected 300 videos with three or more tags from the YouTube site, because tags might be very useful for improving access to videos with limited textual metadata. The tagsonomy-based system enables users to expand their queries with synonymous or related tags. For this system, we created three word files (word-form, synonym, and related word files) for query expansion. The word-form file was created to consolidate different forms of compounding, singular and plural forms, abbreviations and full names, and multiple languages, based on Wikipedia or dictionaries. The synonym file was then constructed to link tags via synonymous relationships based on WordNet. Lastly, the related word file was formed to link tags via syntagmatic relations based on Flickr's related tags, which are generated through co-occurrence analysis. The tag-based system, by contrast, was designed without tag control.

We developed two HTML interfaces, one for each system. In the tagsonomy-based system, a user can submit a query either by typing it directly or by clicking the “alphabetical tag file” or “category tag file” and selecting tags from the list. The query is then automatically expanded using the word-form file. Next, the user can choose which types of relations (synonymous and/or related tags) should be used for expanding the query (see Figure 1).

Figure 1. Tagsonomy-based Search Interface
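The expansion flow described above can be illustrated with a minimal sketch. It assumes each of the three word files can be modeled as a mapping from a tag to a set of linked tags; the names `WORD_FORMS`, `SYNONYMS`, `RELATED`, `expand_query`, and `search`, as well as the sample entries and videos, are hypothetical and are not taken from the study's implementation. Word-form expansion is applied automatically, while synonymous and related tags are added only when the user selects them.

```python
from typing import Dict, Set

# Hypothetical sample entries; the study's actual word files are not shown in the text.
WORD_FORMS: Dict[str, Set[str]] = {"cat": {"cats"}}          # singular/plural, abbreviations, etc.
SYNONYMS: Dict[str, Set[str]] = {"cat": {"feline"}}          # WordNet-style synonyms
RELATED: Dict[str, Set[str]] = {"cat": {"kitten", "pet"}}    # co-occurrence-style related tags


def expand_query(tag: str, use_synonyms: bool = False, use_related: bool = False) -> Set[str]:
    """Return the query tag together with its selected expansions."""
    key = tag.strip().lower()
    # Word-form expansion is always applied.
    terms = {key} | WORD_FORMS.get(key, set())
    # Synonymous and related tags are optional, chosen by the user.
    if use_synonyms:
        terms |= SYNONYMS.get(key, set())
    if use_related:
        terms |= RELATED.get(key, set())
    return terms


def search(videos: Dict[str, Set[str]], tag: str,
           use_synonyms: bool = False, use_related: bool = False) -> Set[str]:
    """Return ids of videos that carry at least one of the expanded tags."""
    terms = expand_query(tag, use_synonyms, use_related)
    return {video_id for video_id, tags in videos.items() if terms & tags}


# Hypothetical tagged videos.
videos = {
    "v1": {"cats", "funny"},
    "v2": {"feline"},
    "v3": {"kitten"},
    "v4": {"dog"},
}
print(sorted(search(videos, "cat")))
print(sorted(search(videos, "cat", use_synonyms=True, use_related=True)))
```

In this sketch the tag-based system without tag control corresponds to matching the query tag alone, with no lookup in any of the three mappings.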

Participants and Questionnaire

We recruited 58 participants from Myongji University. All of them were undergraduate students majoring in library and information science. We divided the participants into two groups (A and B) in a way that balanced gender and major (or grade) across the groups. We constructed nine search queries.

Experiment Procedures

After explaining the search function and how to search the two systems, we asked the 29 Group A participants and the 29 Group B participants to search the tagsonomy- and tag-based systems, respectively, for answers to the nine queries. The experiment was conducted in a university computer lab. The 58 participants were given 50 minutes to answer the nine queries. After the test, we calculated the recall and precision rates for 522 cases (58 participants across the two groups times 9 queries).
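As a minimal sketch of the per-case calculation, the standard definitions of recall (relevant videos retrieved divided by all relevant videos) and precision (relevant videos retrieved divided by all retrieved videos) can be written as follows. The function name `recall_precision`, the sample video ids, and the convention of returning 0.0 for an empty set are assumptions made for illustration; the study does not describe how such cases were handled.

```python
from typing import Iterable, Tuple


def recall_precision(retrieved: Iterable[str], relevant: Iterable[str]) -> Tuple[float, float]:
    """Compute (recall, precision) for one participant-query case."""
    retrieved_set, relevant_set = set(retrieved), set(relevant)
    hits = len(retrieved_set & relevant_set)
    # Assumed convention: an empty denominator yields 0.0.
    recall = hits / len(relevant_set) if relevant_set else 0.0
    precision = hits / len(retrieved_set) if retrieved_set else 0.0
    return recall, precision


# Hypothetical case: 3 videos retrieved, 4 relevant videos exist, 2 overlap.
recall, precision = recall_precision({"v1", "v2", "v3"}, {"v1", "v2", "v4", "v5"})
print(recall, precision)

# Number of cases in the experiment: 58 participants times 9 queries.
total_cases = 58 * 9
print(total_cases)
```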

Test Results

We used a t-test to compare the retrieval performance of the tagsonomy-based system with that of the tag-based system. The mean recall of the tagsonomy-based system (0.63) was significantly higher than that of the tag-based system (0.53). However, the mean precision of the tagsonomy-based system (0.75) was not significantly higher than that of the tag-based system (0.74).

Table 1. Results of t-test

Measure      Tagsonomy System   Tag System      Tagsonomy vs. Tag
             Mean (S.D.)        Mean (S.D.)     t-value (p-value)
Recall       0.63 (0.39)        0.53 (0.36)     2.71 (0.00)
Precision    0.75 (0.33)        0.74 (0.33)     0.23 (0.60)
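For readers who want to see how a t-value relates to the summary statistics in Table 1, the following sketch computes an independent-samples t statistic from means, standard deviations, and sample sizes. It assumes 261 cases per system (29 participants times 9 queries) and independent samples; neither the exact test variant nor the per-group case count is stated in the text, and `t_from_summary` is a hypothetical helper. Because the inputs are the rounded values from the table, the output should not be expected to reproduce the published t-values exactly.

```python
import math


def t_from_summary(mean1: float, sd1: float, n1: int,
                   mean2: float, sd2: float, n2: int) -> float:
    """Welch t statistic from summary statistics.

    With equal group sizes this equals the pooled-variance (Student) t statistic.
    """
    standard_error = math.sqrt(sd1 ** 2 / n1 + sd2 ** 2 / n2)
    return (mean1 - mean2) / standard_error


# Assumed group size: 29 participants x 9 queries = 261 cases per system.
n_cases = 29 * 9

t_recall = t_from_summary(0.63, 0.39, n_cases, 0.53, 0.36, n_cases)
t_precision = t_from_summary(0.75, 0.33, n_cases, 0.74, 0.33, n_cases)
print(round(t_recall, 2), round(t_precision, 2))
```

The sketch reproduces the qualitative pattern in the table: the recall difference yields a much larger t statistic than the precision difference.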


Our findings on the research questions are as follows. First, video tags are similar to indexer-assigned terms and useful for describing the content of a video. Second, structuring tags via query expansion is hampered by the relatively low matching proportions between YouTube tags and WordNet terms. This study also investigated, through experiments, the effectiveness of tag structure via query expansion when searching and browsing videos. The experimental results showed that the tagsonomy-based system can improve recall rates but not precision rates. There are several ways to improve precision rates. We suggest that providing multimedia surrogates of videos along with textual surrogates might be one solution, helping users to quickly derive the gist of videos and thus improving precision rates.