Tagging has been a buzzword in information science in recent years, and most research on tagging has emphasized its advantages and potential. However, few studies have pointed out that tags overlap significantly with other existing metadata fields such as title and description, which undermines the effectiveness of tagging. In this study, significant overlaps are identified in a sample from YouTube.com to demonstrate this ineffectiveness.
Tagging has been a buzzword in information science in recent years, and most research on tagging has emphasized its advantages and potential. These include providing an alternative to indexers' inconsistency, giving users flexibility in their choice of terms, particularly newly created ones (Matusiak, 2006), and facilitating the social aspects of online communities through so-called “social tagging” (Furnas et al., 2006). A small number of studies have questioned the effectiveness of tagging, the motivation for tagging, and the dominance of the “personal” rather than the “social” aspect of tagging (Sen et al., 2007), and have reaffirmed the value of traditional controlled vocabulary (Macgregor and McCulloch, 2006). With respect to the sloppiness of tagging, Guy and Tonkin found that 40 percent of Flickr tags and 28 percent of del.icio.us tags were erroneous. They also found 8 percent of Flickr tags and 11 percent of del.icio.us tags to be plural forms (Guy and Tonkin, 2006).
However, few studies have pointed out that tags overlap significantly with other existing metadata fields such as title and description, which undermines the effectiveness of tagging. In this study, significant overlaps are identified in a sample from YouTube.com to demonstrate this ineffectiveness.
Videos from YouTube.com, probably the most popular personal video sharing site at the time of the study, were used. A convenience sample of 337 of the site's “most popular” videos was selected during December 2007. Although each video has various “metadata,” only its title, description, and “tags” were considered. For each video, the numbers of overlapping words between title and tags, between title and description, and between tags and description were counted manually. The counting was strictly word-by-word, with no allowance for variations such as articles (a, an, and the), apostrophes, plurals (-s or -es, for example), and tenses (-ed, for example). From these counts, several ratios were calculated: the percentage of words from the title used in the tags, the percentage of words from the tags used in the title, the percentage of words from the title used in the description, and the percentage of words from the tags used in the description.
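The counting in this study was done manually, but the procedure described above can be sketched in code. The following is a minimal illustration, not the procedure actually used: the function names are hypothetical, and two choices are assumptions the text does not specify, namely that comparison ignores letter case and that a word repeated in the source field is counted each time it appears.

```python
def words(text):
    """Split a field into words on whitespace.

    Assumption: case is ignored. No stemming or punctuation
    handling is applied, mirroring strict word-by-word counting.
    """
    return text.lower().split()


def shared_count(source, target):
    """Count the words of `source` that literally appear in `target`."""
    target_words = set(words(target))
    return sum(1 for w in words(source) if w in target_words)


def ratio(source, target):
    """Share of the words in `source` that are also found in `target`."""
    total = len(words(source))
    return shared_count(source, target) / total if total else 0.0


def overlap_ratios(title, tags, description):
    """Return the four per-video ratios described in the text."""
    return {
        "title_in_tags": ratio(title, tags),
        "tags_in_title": ratio(tags, title),
        "title_in_description": ratio(title, description),
        "tags_in_description": ratio(tags, description),
    }


example = overlap_ratios(
    title="funny cat video",
    tags="cat funny pets animals",
    description="my cat doing a funny thing",
)
```

In this invented example, two of the three title words appear in the tags, and two of the four tags appear in the title.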
Preliminary Data Analysis
The initial sample size was 337 videos. Table 1 shows the total number of words in their titles and tags, the total number of shared words between the titles and the tags, the total number of words from tags found in their descriptions, and the total number of words from titles found in their descriptions. On average, fewer than 5 words are used for a title and fewer than 9 words are used for a tag set.
| # of words in title | # of words in tags | # of shared words between title and tags | # of words from tags in description | # of words from title in description |
Table 2 shows the ratios based on the raw numbers in Table 1. About 46% of the words from the titles appear verbatim in the tags, and about 25% of the words from the tags appear verbatim in the titles. Similarly, about 52% of the words from the titles are found in the descriptions, and about 27% of the words from the tags are found in the descriptions.
| Words from title used in tags | Words from tags used in title | Words from tags used in description | Words from title used in description |
In addition, quite a few videos have titles and tags that overlap completely. In 77 of the 337 videos, every word in the title also appears in the tags (the number of words shared with tags / the number of words in title = 1), while in 33 of the 337 videos, every word in the tags also appears in the title (the number of words shared with title / the number of words in tags = 1).
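The complete-overlap check above amounts to testing whether one field's words are all contained in the other's. A minimal sketch follows, assuming each video is a dictionary with hypothetical `title` and `tags` keys and that case is ignored; the three sample records are invented for illustration and are not taken from the study's data.

```python
def fully_contained(source, target):
    """True if every word in `source` also appears in `target`.

    Assumption: case is ignored and an empty `source` does not count.
    """
    source_words = source.lower().split()
    target_words = set(target.lower().split())
    return bool(source_words) and all(w in target_words for w in source_words)


def count_full_overlap(videos):
    """Count videos whose title is fully covered by the tags, and vice versa."""
    title_in_tags = sum(fully_contained(v["title"], v["tags"]) for v in videos)
    tags_in_title = sum(fully_contained(v["tags"], v["title"]) for v in videos)
    return title_in_tags, tags_in_title


# Invented sample records, for illustration only.
sample = [
    {"title": "cat video", "tags": "cat video funny"},
    {"title": "dog runs fast", "tags": "dog fast"},
    {"title": "a b", "tags": "b a"},
]
title_covered, tags_covered = count_full_overlap(sample)
```

The two counts are computed separately because containment is not symmetric: a short title can be fully covered by a long tag set without the reverse being true.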
Significance of Study
Using an empirical data set, this study reveals that tagging is significantly redundant with already established access points such as title and description. Unlike the majority of current research on tagging, which supports tagging's potential, this study questions its effectiveness for theoretical and practical reasons.
Significant overlap between the words used in tags and titles, and between those used in tags and descriptions, was confirmed by the initial data analysis, which was based on strict word-by-word comparison. It is believed that with more aggressive word counting, one that accounts for variations such as plurals and tenses, the overlap percentage would be much higher. In further data analysis, more refined counting will be conducted on a much larger data set to confirm this prediction.
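One way such relaxed counting could work is sketched below. This is an assumption about how the refinement might be done, not the method planned for the follow-up analysis: the suffix stripping is deliberately crude (for instance, it would not match "dances" with "dance"), all names are hypothetical, and a real analysis would more likely rely on an established stemmer.

```python
import string

ARTICLES = {"a", "an", "the"}


def normalize(word):
    """Lowercase, drop apostrophes and edge punctuation, strip -es/-ed/-s.

    Assumption: a suffix is removed only if at least three characters remain.
    """
    w = word.lower().replace("'", "").replace("\u2019", "")
    w = w.strip(string.punctuation)
    for suffix in ("es", "ed", "s"):
        if w.endswith(suffix) and len(w) - len(suffix) >= 3:
            return w[: -len(suffix)]
    return w


def normalized_words(text):
    """Normalize every word in a field and drop articles."""
    result = []
    for raw in text.split():
        w = normalize(raw)
        if w and w not in ARTICLES:
            result.append(w)
    return result


def relaxed_shared_count(source, target):
    """Count words of `source` found in `target` after normalization."""
    target_words = set(normalized_words(target))
    return sum(1 for w in normalized_words(source) if w in target_words)


relaxed = relaxed_shared_count("The Dogs played", "dog play funny")
```

Under strict word-by-word comparison the title "The Dogs played" shares no words with the tags "dog play funny"; after normalization, "Dogs" and "played" both match, which illustrates why the relaxed overlap is expected to be higher.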
In fact, tagging is not a new concept. For years, many journals, conference proceedings, and even dissertations have required authors to supply keywords in order to improve retrieval performance in databases. Unfortunately, these keyword efforts do not seem to have been successful so far, and the “new” concept of creators tagging their own work does not seem to show much improvement either.