Categorizing music mood in social context



Music mood is an emerging type of music metadata, but there are no well-accepted mood categories. This research proposes a new method to categorize music mood in the social context of music listening. The method combines the strengths of social tags, linguistic resources and human experts. Preliminary results show that the proposed method is promising in identifying mood categories that better reflect users' actual music information behaviors.

The Challenge

Research on music information behavior has identified music mood, or the affective aspects of music, as an important criterion in music information seeking and organization (e.g., Cunningham et al., 2004). However, music mood, due to its subjectivity, has been far from well studied in information science. First and foremost, there are no standard music mood categories. Experiments on classifying music according to mood have used different categories, making it difficult to compare results across studies (Hu and Downie, 2007). Among the few existing online music services that provide mood labels, some of the terms used to describe the mood categories are rarely used in daily life and their application can be highly idiosyncratic. Over the years, music psychologists have proposed a number of music mood models, but these models have been criticized as being developed in pure laboratory settings and thus lacking the social context of music listening (Juslin and Laukka, 2004). Therefore, it remains a challenge to identify mood categories that reflect the context of real-life music listening and users' perspectives. Such categories would enhance the research on music mood classification techniques and the development of music repositories that support organizing and accessing music by mood.

Can Social Tags Help?

With the birth of Web 2.0, the general public can now post text tags on music pieces and share these tags with others. The accumulated user tags can yield so-called “collective wisdom” that can augment the value of the music itself and create the social context of music seeking and listening. Specifically, social tags have two major advantages. First, they are assigned by real music users in real-life music listening environments, and thus represent the context of real-life music information behavior better than labels assigned by human assessors in laboratory settings. Second, social tags are available online in quantities far beyond the data collected in any human evaluation experiment, providing a much richer resource for discovering users' perspectives.

These advantages have attracted researchers to exploit social tags in categorizing music mood, yet a previous study yielded an oversimplified set of only 3 mood categories (Hu et al., 2007). This can be attributed to the following shortcomings of tags, as described by Guy and Tonkin (2006). First, social tags are uncontrolled and thus contain much noise or junk tags. Second, many tags have ambiguous meanings. For example, “love” can be the theme of a song or a user's attitude towards a song. Third, a majority of tags are applied to only a few songs, and thus are not representative (the so-called “long-tail” problem; see footnote 1). Fourth, some tags are essentially synonyms (e.g., “cheerful” and “joyful”), and thus do not represent separate and distinguishable categories. To address these problems, this study builds on the initial research in this area by Hu et al. (2007) and proposes a new method for deriving more realistic mood categories from social tags.

Combining Social Tags, Linguistic Resources and Human Expertise

The proposed method combines the strengths of social tags, linguistic resources and human expertise. Starting from a large set of social tags on music pieces, the method employs linguistic resources (e.g., domain-oriented lexicons) to filter out junk tags and tags with little or no affective meaning. In the next step, the ambiguity problem is untangled by human experts in the music domain. With their music knowledge, human experts assess whether a tag takes an unambiguous affective meaning in the music domain. To solve the synonym problem, the method follows the recommendations from previous research on music mood metadata by Hu and Downie (2007). Specifically, synonyms are identified with linguistic resources (e.g., a thesaurus) and are grouped into the same categories. Each new category is then defined collectively by all terms in it instead of picking one term as a category label. Finally, the unrepresentative “long tail” is cut off by removing tags that were assigned to few (e.g., fewer than 20) music pieces.

The dataset used in this study comprised about 8,800 songs accessible to the author. A total of 61,849 unique social tags on these songs were collected from the most popular tagging site for Western music (see footnote 2).

The linguistic resource used to solve the uncontrolled vocabulary problem was WordNet-Affect, an affective extension of WordNet (Strapparava and Valitutti, 2004). In WordNet-Affect, affective labels are assigned to concepts (synsets) representing emotions, moods, situations eliciting emotions, or emotional responses. As a general resource for text sentiment analysis, it has good coverage of mood-related words. Of the 1,586 mood-related words extracted from WordNet-Affect, 348 exactly matched tags in the collected set.
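The lexicon-matching step can be illustrated with a minimal sketch. It assumes the affect lexicon is available as a plain list of words; the function name, the toy data, and the lower-casing of tags before matching are illustrative assumptions, not details reported in the study.

```python
def filter_affective_tags(tags, affect_lexicon):
    """Keep only tags that exactly match a word in the affect lexicon.

    Tags and lexicon words are lower-cased and stripped before matching
    (an assumption made for this sketch).
    """
    lexicon = {word.strip().lower() for word in affect_lexicon}
    kept, seen = [], set()
    for tag in tags:
        norm = tag.strip().lower()
        # Keep the first occurrence of each matching tag, in input order.
        if norm in lexicon and norm not in seen:
            seen.add(norm)
            kept.append(norm)
    return kept


# Toy, made-up inputs for illustration only.
toy_lexicon = ["cheerful", "sad", "calm", "angry"]
toy_tags = ["Cheerful", "seen live", "sad", "female vocalists", "calm", "sad"]
affective_tags = filter_affective_tags(toy_tags, toy_lexicon)
```

Tags outside the lexicon, such as “seen live” in the toy data, are dropped at this stage, which leaves a much smaller set for the human experts to review.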

Not all of the 348 tags are mood-related in the music domain. For cleaning up tags, human expertise is the most reliable resource, and a set of 348 tags is easily manageable for a small number of human experts. Both human experts consulted in this project are music information retrieval (MIR) researchers with a music background and are native English speakers. They first removed tags with special music meanings that did not involve an affective aspect, such as “trance” and “beat”. Second, since only descriptive terms could be used as category labels, judgmental tags such as “bad”, “good” and “great” were removed. Third, ambiguous tags that could not be clarified using available information were removed, so as to ensure the quality of the resultant mood categories. After this step, 186 tags remained, and 4,197 songs were tagged with at least one of them.

WordNet is a natural resource for the synonym problem, because it organizes words into synsets. Words in the same synset are synonyms from a linguistic point of view. Moreover, WordNet-Affect also links each non-noun synset (verb, adjective and adverb) with the noun synset from which it is derived. For instance, the synset of “sorrowful” is marked as derived from the synset of “sorrow”. Both synsets represent the same kind of mood and should be in the same category. Hence, mood-related tags appearing in, or derived from, the same synset in WordNet-Affect were merged into one group. At the end of this step, the 186 mood-related tags were merged into 49 groups.
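The synset-based merging can be sketched as follows. The sketch assumes each tag has already been mapped to a single synset identifier and that the derived-from links are available as a dictionary; the identifiers, the function name, and the one-synset-per-tag simplification are all hypothetical and do not reflect the actual WordNet-Affect file format.

```python
from collections import defaultdict


def group_tags_by_synset(tag_to_synset, derived_from):
    """Group tags whose synsets are identical or linked by derived-from.

    tag_to_synset: dict mapping each tag to one synset id (simplification).
    derived_from:  dict mapping a non-noun synset id to its source noun synset id.
    """
    def root(synset):
        # Follow derived-from links up to the source synset; the 'seen'
        # set guards against accidental cycles in the input.
        seen = set()
        while synset in derived_from and synset not in seen:
            seen.add(synset)
            synset = derived_from[synset]
        return synset

    groups = defaultdict(list)
    for tag, synset in tag_to_synset.items():
        groups[root(synset)].append(tag)
    return [sorted(members) for members in groups.values()]


# Toy, made-up synset ids for illustration only.
toy_tag_to_synset = {
    "sorrowful": "a#sorrowful",
    "sorrow": "n#sorrow",
    "sad": "a#sad",
    "sadness": "n#sadness",
    "calm": "a#calm",
}
toy_derived_from = {"a#sorrowful": "n#sorrow", "a#sad": "n#sadness"}
tag_groups = group_tags_by_synset(toy_tag_to_synset, toy_derived_from)
```

In the toy data, “sorrowful” and “sorrow” end up in one group because the adjective synset is marked as derived from the noun synset, mirroring the example in the text.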

Several tag groups were further merged if they were deemed musically similar by the experts. For instance, the group of (“cheer up”, “cheerful”) was merged with (“jolly”, “rejoice”); (“melancholic”, “melancholy”) was merged with (“sad”, “sadness”). This resulted in 34 groups each representing a mood category for this dataset. Using the linguistic resources allowed this process to proceed quickly and minimized the workload of the human experts.

Finally, after removing the groups of tags applied to fewer than 20 songs, 18 tag groups with 135 tags remained. They represent 18 mood categories, each of which is collectively defined by all the tags in the group. Table 1 shows the categories, a subset of their member tags, and the number of songs in each category tagged with at least one tag in the category.
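A minimal sketch of the long-tail pruning step follows. It assumes a mapping from each tag to the set of songs carrying that tag; the function name, data layout, and toy values are illustrative assumptions. The threshold of 20 songs comes from the text, while the toy call uses a smaller threshold so the example stays short.

```python
def prune_long_tail(groups, tag_to_songs, min_songs=20):
    """Keep tag groups applied to at least `min_songs` distinct songs.

    A song counts towards a group if it carries at least one tag in it.
    Returns a list of (group, song_count) pairs.
    """
    kept = []
    for group in groups:
        songs = set()
        for tag in group:
            songs.update(tag_to_songs.get(tag, ()))
        if len(songs) >= min_songs:
            kept.append((group, len(songs)))
    return kept


# Toy, made-up data for illustration only.
toy_groups = [["sad", "sadness"], ["jolly"]]
toy_tag_to_songs = {
    "sad": {"s1", "s2"},
    "sadness": {"s2", "s3"},
    "jolly": {"s4"},
}
kept_groups = prune_long_tail(toy_groups, toy_tag_to_songs, min_songs=2)
```

Counting distinct songs per group, rather than summing per-tag counts, avoids counting a song twice when it carries several tags from the same group.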

Table 1. Mood categories derived by social tag analysis

Analysis on the Categories

To further understand the value of the method, the distances between the resultant 18 mood categories were calculated based on song co-occurrences among the categories. Figure 1 shows the distances plotted in a 2-dimensional space using Multidimensional Scaling (Borg and Groenen, 2005). As shown in the figure, categories that are intuitively close (e.g., those denoted by “glad”, “cheerful”, “gleeful”) are positioned together, while categories that are far apart in the figure are indeed very different by common-sense judgment (e.g., the ones denoted as “aggressive” and “calm”; “cheerful” and “sad”). This is evidence that the mood categories derived from the proposed method using linguistic resources and human expertise are able to reflect the distribution of empirical music data and thus to categorize music in a realistic manner.
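The text does not state which co-occurrence-based distance was used. As one possible illustration, the sketch below uses the Jaccard distance between the song sets of two categories; this choice, the function names, and the toy data are assumptions. The resulting matrix is the kind of input a Multidimensional Scaling routine would take (the scaling step itself is not shown).

```python
def jaccard_distance(songs_a, songs_b):
    """Return 1 - |A & B| / |A | B|; two empty sets are treated as distance 0."""
    union = songs_a | songs_b
    if not union:
        return 0.0
    return 1.0 - len(songs_a & songs_b) / len(union)


def distance_matrix(category_songs):
    """Build a symmetric distance matrix over categories, sorted by name."""
    names = sorted(category_songs)
    matrix = [
        [jaccard_distance(category_songs[a], category_songs[b]) for b in names]
        for a in names
    ]
    return names, matrix


# Toy, made-up data for illustration only.
toy_category_songs = {
    "cheerful": {"s1", "s2", "s3"},
    "glad": {"s2", "s3", "s4"},
    "sad": {"s5", "s6"},
}
names, matrix = distance_matrix(toy_category_songs)
```

In the toy data, “cheerful” and “glad” share two of four songs and are therefore closer to each other than either is to “sad”, which shares no songs with them.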

Figure 1. Relative distances of 18 mood categories based on song co-occurrences

Conclusion and Future Research

Mood categories have been a much debated topic in MIR. This paper proposes a method that combines the strengths of linguistic resources and expert knowledge to identify mood categories from social tags in online music communities. The resultant categories can provide realistic and user-centric guidance in organizing music and facilitating music access by mood. Future work will compare the derived categories to emotion models proposed in music psychology, which will disclose, among other things, whether the categories derived from empirical music listening data reflect theoretical models developed in laboratory settings. In general, such a comparison can help researchers refine or adapt theoretical models to better fit the reality of users' information behaviors.


  1. “Long-tail” means the tag distribution follows a power law: many tags are used by a few users while only a few tags are used by many users (Guy and Tonkin, 2006).

  2. Last updated on July 22, 2008.