Answering the unanswered question? Contextualizing a holistic theoretical framework for cross-genre music information retrieval



This proposed exploratory study seeks to contextualize music information retrieval (MIR) in relation to similarities across genres. Drawing upon relevant MIR studies, we propose a holistic theoretical framework that future researchers could utilize to develop systems that facilitate cross-genre music information retrieval (CGMIR).

Introduction and context

Online music vendors and music social networking sites use genre as a convenient way to make recommendations to customers and users. This practice perpetuates generalized assumptions about the types of music individual listeners prefer, depriving them of the ability to find music from other genres that they might like better than music from a “preferred” genre.

A holistic theoretical framework would build upon previous MIR studies that have retrieved similar results from different genres, providing a broad foundation for systems that facilitate cross-genre music information retrieval (CGMIR). Whether applied in pre-existing or yet-to-be-developed systems, such principles would allow users to find unforeseen and surprising connections among works from different genres.

Conductor and composer Leonard Bernstein found value in works from diverse genres. Some of his own works, including West Side Story (1957) and Mass (1971), reflect this outlook. His interest in different types of music reached its apotheosis in his 1973 lecture series The Unanswered Question: Six Talks at Harvard (1992), which draws upon Chomskyan linguistics to demonstrate the universality of musical grammar. Horowitz (1993) criticizes these efforts to find analogous structures between music and language, but Bernstein's preexisting interest in different genres indicates that universality of musical grammar requires little or no reference to Chomsky. In fact, a CGMIR system could prove practical in exploring such a possibility.

Bernstein is not alone in exploring similarities among genres. The Beatles received cultural validation from Bernstein, as well as from a number of other conductors and composers (Kozinn, 2004). In addition, the band cited Karlheinz Stockhausen as an influence. A variety of other non-classical musicians also reference Stockhausen, including Frank Zappa, Jerry Garcia, Miles Davis, Brian Eno, and Björk (Didcock, 2005).

A study by Collingwood (2008) finds further connections among genres, positing that classical and metal fans share a number of personality traits that draw them to their preferred genres of music. The two genres also share musical devices: the tritone (or “Devil's Interval”), usually employed to signify scariness, sexuality, or both, appears in a number of metal and classical works (Rohrer, 2006).

The aforementioned examples demonstrate just a few ways in which musicians associated with one genre borrow from other genres. They also underscore potential uses for CGMIR systems.

Previous research with implications for CGMIR

Downie (2003) discusses a variety of music facets that can be utilized in MIR. They include pitch, tempo, and harmony, as well as timbral, editorial, textual, and bibliographic facets. This range of facets alone demonstrates the challenges of integrating them effectively into MIR systems, as well as the potential for yielding cross-genre results.

Roos and Manaris (2007) describe a power-law metrics-derived search they utilized to find pieces that share aesthetic similarities with Miles Davis' “Blue in Green.” “Chanson de Matin” by Edward Elgar appeared as one of the results. As Roos and Manaris point out, experts in musicology might miss such connections, demonstrating that the development of holistic CGMIR systems could prove valuable to their work.
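As a rough illustration of how a power-law comparison of this kind might work, the Python sketch below fits a straight line to the log-log rank-frequency distribution of the events in a piece and ranks candidate pieces by how close their fitted slope and R² are to those of a query. It is not the implementation used by Roos and Manaris. The function names, the use of pitch as the only measured attribute, and the toy pitch sequences are all assumptions made here for illustration.

```python
import math
from collections import Counter


def zipf_slope(events):
    """Return (slope, r_squared) of the log-log rank-frequency line for a sequence."""
    counts = sorted(Counter(events).values(), reverse=True)
    if len(counts) < 2:
        raise ValueError("need at least two distinct events")
    xs = [math.log10(rank) for rank in range(1, len(counts) + 1)]
    ys = [math.log10(count) for count in counts]
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    syy = sum((y - mean_y) ** 2 for y in ys)
    slope = sxy / sxx
    # A perfectly flat distribution has no variance to explain; treat it as a perfect fit.
    r_squared = 1.0 if syy == 0 else (sxy * sxy) / (sxx * syy)
    return slope, r_squared


def rank_by_similarity(query, candidates):
    """Order candidate titles by the distance between their (slope, R^2) and the query's."""
    target = zipf_slope(query)
    scored = [
        (math.dist(target, zipf_slope(events)), title)
        for title, events in candidates.items()
    ]
    return [title for _, title in sorted(scored)]


# Toy pitch sequences (MIDI note numbers); hypothetical data, not taken from any real piece.
query_piece = [60] * 8 + [62] * 4 + [64] * 2 + [65]
candidates = {
    "candidate_flat": [48, 50, 52, 53],
    "candidate_skewed": [70] * 8 + [72] * 4 + [74] * 2 + [75],
}
print(rank_by_similarity(query_piece, candidates))
```

Because the comparison uses only the shape of the distribution and not the pitches themselves, pieces with no notes in common can still rank as similar, which is the property that makes such metrics interesting for cross-genre retrieval. A fuller system would presumably compute such metrics over many attributes rather than pitch alone.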

Baumann has led studies with implications for CGMIR (Baumann, Klueter, & Norlien, 2002; Baumann, 2003; Baumann & de Rosnay, 2004; Baumann & Halloran, 2004). Drawing upon some of Downie's facets such as tempo and timbre, an “automatic audio analysis” picked out low-level features to determine music similarity. These results were compared with those yielded in a lab to find disparities. With temporal clustering, a Nearest Neighbor (NN) classifier found similarities among pieces from different genres. Due to time constraints, however, only a small number of professional musicians and laypeople have participated in these studies.
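The idea of finding a piece's nearest neighbor outside its own genre can be sketched in a few lines of Python. The sketch below is a plain nearest-neighbor search over hand-entered feature vectors; it omits the temporal clustering and the audio analysis described in the studies above, and it should not be read as their method. The track names, the two features (tempo and a single timbre-related value), and all numbers are hypothetical.

```python
import math


def normalize(tracks):
    """Rescale each feature to [0, 1] so that no single feature dominates the distance."""
    columns = list(zip(*(features for _, _, features in tracks)))
    lows = [min(column) for column in columns]
    highs = [max(column) for column in columns]
    scaled = []
    for title, genre, features in tracks:
        vector = tuple(
            0.0 if high == low else (value - low) / (high - low)
            for value, low, high in zip(features, lows, highs)
        )
        scaled.append((title, genre, vector))
    return scaled


def nearest_cross_genre(query_title, tracks):
    """Return the title of the closest track whose genre label differs from the query's."""
    scaled = normalize(tracks)
    query = next(track for track in scaled if track[0] == query_title)
    others = [track for track in scaled if track[1] != query[1]]
    if not others:
        raise ValueError("no tracks from another genre to compare against")
    best = min(others, key=lambda track: math.dist(query[2], track[2]))
    return best[0]


# (title, genre label, (tempo in BPM, timbre value)); hypothetical toy data.
tracks = [
    ("jazz_ballad", "jazz", (60.0, 1200.0)),
    ("jazz_bop", "jazz", (220.0, 2600.0)),
    ("classical_miniature", "classical", (66.0, 1300.0)),
    ("metal_anthem", "metal", (180.0, 3400.0)),
]
print(nearest_cross_genre("jazz_ballad", tracks))
```

Excluding same-genre tracks from the candidate set is a design choice made here to force a cross-genre result; a system could instead search all tracks and simply report when the nearest neighbor happens to carry a different genre label.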

The “Search inside the Music” project (Lamere, 2008) analyzes and organizes music in an attempt to expand the scope of MIR beyond genre. This method draws upon several of Downie's facets, as well as subjective measures like emotion. Recommendations are based upon similarity of tastes, as well as autotagging of unpopular or new music. Related to Lamere's research, a small-scale study led by Neal (2009) focused on the efficacy of using emotion for retrieving music within

Although the aforementioned examples seem promising for developing CGMIR systems, the sample size for each study was relatively small.

Components of holistic CGMIR systems

Construction of holistic CGMIR systems would require a combination of the measures mentioned above. Somewhat related to the research described by Lamere (2008), other objective and subjective measures could further enhance the capabilities of such systems:

  • Influence (Objective): Usually cited within written texts, this refers to the aesthetic or creative impact one musician has had on another.

  • Allusions to other pieces (Objective): This refers to similarities between individual works.

  • Textual (Objective or Subjective): Laypeople and experts detect similarities in the “aboutness” of works in different genres.

  • Similarity of excerpts and/or complete works (Subjective): Laypeople and experts detect similarities between the musical aspects of different works, even without objective evidence. Some laypeople may lack the ability to describe such similarities in technical terms.

Conclusions and future study

As mentioned earlier, research related to the potential for CGMIR has been relatively limited. A critical mass of information provided by diverse techniques could increase the likelihood of users finding similarities among pieces in different genres.

Development of CGMIR systems would require multidisciplinary efforts by specialists in information science, computer science, and music. Experts from other fields could also make their own contributions; historians, for example, could aid in contextualizing works by time period. To make such a system even more dynamic, laypeople with access to a CGMIR system could make their own contributions as well.

CGMIR systems should provide users with a variety of tools and techniques to describe how they find connections among pieces of music in different genres, including traditional text searching and entry points of information based on the facets outlined by Downie (2003). Tagging facilities would offer a useful way of increasing the ability of users to contribute, follow, and vote on the validity of hunches about cross-genre connections.
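One way to picture such a tagging facility is as a small record for each proposed connection, to which users attach votes. The Python sketch below is only a minimal data-structure illustration under assumptions made here: the class name, its fields, the one-vote-per-user rule, and the simple sum used as a score are all hypothetical, as are the work and user names.

```python
from dataclasses import dataclass, field


@dataclass
class Connection:
    """A user-proposed link between two works from different genres."""

    work_a: str
    work_b: str
    tag: str
    proposed_by: str
    votes: dict = field(default_factory=dict)  # user name -> +1 or -1

    def vote(self, user, value):
        """Record one vote per user; a repeat vote replaces the earlier one."""
        if value not in (1, -1):
            raise ValueError("vote must be +1 or -1")
        self.votes[user] = value

    @property
    def score(self):
        """Net support: up-votes minus down-votes."""
        return sum(self.votes.values())


def ranked(connections):
    """Sort connections from most to least supported."""
    return sorted(connections, key=lambda connection: connection.score, reverse=True)


# Hypothetical works, tags, and users.
first = Connection("work_x", "work_y", "similar mood", proposed_by="user_1")
second = Connection("work_x", "work_z", "shared interval", proposed_by="user_2")
first.vote("user_2", 1)
first.vote("user_3", -1)
second.vote("user_1", 1)
second.vote("user_3", 1)
for connection in ranked([first, second]):
    print(connection.work_a, connection.work_b, connection.tag, connection.score)
```

A raw vote sum favors connections involving widely known works, so a real system would likely need some correction for popularity.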

Despite its promise, CGMIR carries potential risks. The subjectivity facilitated by a holistic CGMIR system may lead to new insights related to the potential for universality in music, but it may also lead to confusion. In addition, relatively popular genres might receive greater attention from users. This is already a problem with, where the top works for specific emotions are all popular songs (Neal et al., 2009).

Ross (2007) discusses the interplay of music genres in the twentieth century to demonstrate that the dichotomy of classical and popular music “no longer makes intellectual or emotional sense” (p. 541). Within this context, music vendors and social networking sites need to consider how they can aid users in finding music that shares similarities while transcending genre. In whatever manner a holistic CGMIR system evolves, it could aid anyone interested in pursuing Bernstein's “Unanswered Question.” Even if a definitive answer remains elusive, the prospect of teasing out unforeseen connections might pique the interest of anyone who cares about music.