Structural and functional neural correlates of music perception


  • Charles J. Limb

    Corresponding author
    1. National Institute on Deafness and Other Communication Disorders, National Institutes of Health, Bethesda, Maryland
    2. Peabody Conservatory of Music, Johns Hopkins University, Baltimore, Maryland
    3. Department of Otolaryngology-Head and Neck Surgery, Johns Hopkins Hospital, Baltimore, Maryland
    • National Institute on Deafness and Other Communication Disorders, National Institutes of Health, 10 Center Drive, 3C-716, Bethesda, MD 20892
    • Fax: 301-480-7410.

  • This article is a US Government work and, as such, is in the public domain in the United States of America.


This review article highlights state-of-the-art functional neuroimaging studies and demonstrates the novel use of music as a tool for the study of human auditory brain structure and function. Music is a unique auditory stimulus with properties that make it a compelling tool with which to study both human behavior and, more specifically, the neural elements involved in the processing of sound. Functional neuroimaging techniques represent a modern and powerful method of investigation into neural structure and functional correlates in the living organism. These methods have demonstrated a close relationship between the neural processing of music and language, both syntactically and semantically. Greater neural activity and increased volume of gray matter in Heschl's gyrus have been associated with musical aptitude. Activation of Broca's area, a region traditionally considered to subserve language, is important in interpreting whether a note is on or off key. The planum temporale shows asymmetries that are associated with the phenomenon of perfect pitch. Functional imaging studies have also demonstrated activation of primitive emotional centers such as ventral striatum, midbrain, amygdala, orbitofrontal cortex, and ventral medial prefrontal cortex in listeners of moving musical passages. In addition, studies of melody and rhythm perception have elucidated mechanisms of hemispheric specialization. These studies show the power of music and functional neuroimaging to provide singularly useful tools for the study of brain structure and function. Anat Rec Part A, 2006. Published 2006 Wiley-Liss, Inc.

“Music … is the freest, the most abstract, the least fettered of all the arts: no story content, no pictorial representation, no regularity of meter, no strict limitation of frame need hamper the intuitive functioning of the imaginative mind” (Copland,1952).

Recent advances in neuroimaging technology, particularly functional imaging techniques, have provided state-of-the-art methods with which to study the brain. Such techniques allow the investigation of neural regions and their responses to stimuli in the living organism. As a unique stimulus, music provides a framework with internal elements that allow us to study auditory and nonauditory centers of the brain. The neurological processes responsible for perception or production of music are common to many other, nonmusical endeavors such as interpretation of language. Additionally, insights into processes such as encoding of multiple data streams and temporal discrimination of auditory signals, both integral to music, are of broad significance to our understanding of the relationship between the brain and the environment. Neural plasticity can also be studied with music by comparing trained musicians with the musically naive. Hence, the singular properties of music provide us with an ideal tool with which to study the brain.

This review highlights recent studies that have utilized music as a tool for understanding brain structure and function. Areas traditionally thought to be exclusively for processing language also show specializations and asymmetries associated with musical perception. These studies have fundamentally changed our understanding of the roles of the primary and secondary auditory cortex in sound interpretation.


Two fundamental neuroimaging techniques are positron emission tomography (PET) and functional magnetic resonance imaging (fMRI) (Jezzard et al.,2001; Toga and Mazziotta,2002). In PET scanning, an intravenous bolus injection of a positron-emitting substance (e.g., H2-[15]O) is administered to a subject during an experimental paradigm. Based on the assumption of increased regional cerebral blood flow (rCBF) in areas of increased brain activity, areas of relatively high positron emission in the brain can be detected and their position localized. By comparing rCBF for a given task with baseline images, physiologically relevant areas may be identified.

In the more recently developed technique of functional MRI (fMRI or FMRI), detection of brain activity is also based on increases in rCBF, although in the case of fMRI, no exogenous tracer is administered. Rather, the technology of echo-planar imaging allows rapid acquisition of whole brain volumes with detection of local changes in deoxyhemoglobin/hemoglobin levels (Ogawa et al.,1990,1993). This technique, known as blood oxygen level-dependent (BOLD) contrast imaging, allows for significantly greater anatomic resolution and temporal specificity than PET scanning and is more widely available than PET. However, fMRI has the disadvantage (particularly for auditory studies) of significant added background noise (as high as 130 dB SPL).


The exact definition of music has been the subject of a lengthy debate (Sessions,1950; Cage,1961; Cooke,1990). We will use a colloquial definition here and consider music to be sound that is organized according to principles of pitch, rhythm, and harmony (Sessions,1950) and that employs musical timbres that allow us to differentiate between musical sound sources and identify musical instruments, such as violin, piano, and flute. Throughout all genres and historical epochs of musical composition, the organization of sound according to pitch, rhythm, and harmony has provided the acoustic framework by which we perceive and produce music. Music generally constitutes a unified whole that cannot be naturally subdivided (e.g., it is difficult to listen to the notes of a melody while ignoring its rhythmic underpinning and vice versa) (Peretz and Zatorre,2005). As such, it is plausible that dividing music into smaller musical elements may not be the best way to approach the subject of music at large. Yet, in order to establish basic concepts, this approach has been commonly taken and has even been used to outline a modular organization for music processing (Peretz and Coltheart,2003). The scientific study of music has utilized both stimuli based on discrete musical elements and intact, musically rich stimuli.


All sound processing begins with the peripheral auditory apparatus, in which sound vibrations are transmitted to the cochlear inner hair cells (via the ear canal, tympanic membrane, and ossicular chain). The early process of acoustic deconstruction takes place within the cochlea, which responds to acoustic vibrations in a tonotopic (i.e., frequency dependent) fashion and triggers afferent potentials that travel down the cochlear nerve to the brainstem. Through a chain of subcortical processing structures (cochlear nuclei, olivary pathways, lateral lemnisci, inferior colliculi, and medial geniculate nuclei of the thalamus), neural impulses representing sound information eventually reach the auditory cortical structures (Fig. 1). The primary auditory cortex (which contains Heschl's gyrus bilaterally, Brodmann area 41) receives incoming sound input and then relays sound to the appropriate processing centers, including secondary auditory association centers along the planum temporale (referring here to the true anterior planum temporale, or the region lateral to Heschl's gyrus) (Zetzsche et al.,2001; Meisenzahl et al.,2002), perisylvian language centers, motor areas, frontal lobe, and so forth; the net result is that of an auditory percept. The notion of a primary auditory core, surrounded by a belt region that is itself bordered by a parabelt region, has been used to describe the organization of the auditory cortices in macaque monkeys and has recently been confirmed to apply to humans (Hackett et al.,1998a,1998b,1999a,1999b,2001; Kaas and Hackett,1998,1999,2000; Kaas et al.,1999; Sweet et al.,2005). All auditory processing, whether environmental, linguistic, or musical in nature, relies on the integrity of this ascending auditory pathway. The various patterns of activation of cortical networks appear to be inherently tied both to the nature of the presented stimuli (e.g., language, melody) and to different neural modes of pitch perceptual processing (e.g., fundamental pitch vs. spectral pitch) (Schneider et al.,2005), and these patterns are just beginning to be identified.

Figure 1.

Primary and secondary auditory cortical regions. The left image shows an MRI film (y = −22) with colored regions depicting the locations of the Sylvian fissure (green), superior temporal gyrus and sulcus (STG and STS; blue), and Heschl's gyrus (orange). Heschl's gyrus (transverse temporal gyrus) contains the primary auditory core and is located medially; it can be seen by reflecting the temporal lobe laterally at the Sylvian fissure. The secondary auditory cortical regions are located within the belt and parabelt regions, along the STG. The right image shows a three-dimensional view of the brain with surface renderings of the primary and secondary auditory regions and Sylvian fissure.

One of the most striking findings in recent years is that of major differences between musicians and nonmusicians in both the physiology and the morphology of the auditory system, even at the primary level. While one might expect higher levels of neural processing to be most intrinsically related to musical processing, Heschl's gyrus itself has received a good deal of attention as a possible marker of musicality. Schneider et al. (2002) conducted a magnetoencephalographic (MEG) study of nonmusicians, amateur musicians, and professional musicians. MEG recordings taken during an auditory processing task were correlated with three-dimensional volumetric MR imaging of gray matter within anteromedial Heschl's gyrus. Notably, professional musicians showed a significant (greater than 100%) increase in MEG activity within primary auditory cortex compared to nonmusicians, which was found to correlate with increased (130%) volumetric measurements of gray matter within Heschl's gyrus in musicians compared to nonmusicians. Furthermore, psychometric testing revealed a positive correlation between the size of Heschl's gyrus and musical aptitude. While the question of causality is not explicitly addressed in this study, it suggests a fundamental link between musical exposure, musical aptitude, and the physiologic and anatomic development of Heschl's gyrus.


Music has been referred to as the “universal language” (Swain,1997) due to its transcendent properties of expression that appear universally accessible regardless of spoken language differences. Implicit in this expression is the suggestion that music may express ideas in a more compelling, albeit less specific, manner than can traditional languages. As an abstract form of communication, music enjoys a freedom not shared by traditional languages: the intentional violation of its own syntactic rules (Patel,2003) for artistic or aesthetic purposes. Language, with its primary goal of clear, precise communication, does not generally support such violations of internal syntax. The notion of a distinct musical syntax has been proposed (Swain,1997; Koelsch et al.,2004), although the rules of this syntax are difficult to define concretely. Music clearly shares several features with language, most notably those of a hierarchical structure (syntax/harmony), a vocabulary (words/chords and intervals), tonal properties (inflection/timbre), and a temporal clock (prosody/rhythm), which raises the question of whether or not music and language utilize the same neural structures.

A large body of literature exists to support the idea that cerebral specialization for language is left hemisphere dominant, taking place along a network of centers distributed around the Sylvian fissure (the perisylvian language cortices). More specifically, these areas include superior temporal, middle temporal, frontal opercular, and inferior parietal regions along the left side (Fig. 2) (Papathanassiou et al.,2000). The perisylvian language regions incorporate the traditional areas known as Broca's area (Brodmann area 44) and Wernicke's area (Brodmann areas 21 and 42), which are now understood to be part of a broader network for language processing (during both comprehension and production). Several studies of music have supported the notion of right hemispheric specialization for tonal music, a finding that was based largely on lesion studies (Zatorre,1985; Samson and Zatorre,1988; Liegeois-Chauvel et al.,1998; Samson et al.,2002). It is becoming clearer, however, that a simple division between language/left hemisphere and music/right hemisphere does not fully account for the processing mechanisms of these complex systems, and it is quite likely that interhemispheric communication plays an important role (Schlaug et al.,1995a; Ridding et al.,2000; Nordstrom and Butler,2002; Lee et al.,2003).

Figure 2.

The perisylvian language areas. A three-dimensional view of the brain with colored regions corresponding to the approximate locations of the perisylvian language areas. The frontal operculum (purple) includes Broca's area, while the inferior parietal lobule (red) includes Wernicke's area; the STG/STS and middle temporal gyrus (green) include secondary auditory association cortex.


The organization of sound in music is intrinsically relational, even though it consists of absolute elements. That is, the listener's interpretation (and, presumably, the composer's intention) of a musical note is based largely on the note's relation to those that preceded it temporally or are presented simultaneously; in turn, this note provides a portion of the framework within which each subsequent note is interpreted. Taken together, these characteristics of music lead to the notion of a musical key. The relational characteristic of musical pitches is what allows for transposition of a melody from one key to another, a process that alters the absolute frequencies of the presented notes while preserving the essential contour of the melody. Indeed, most people cannot identify absolute pitches and so do not register a change in absolute frequency as such; they are instead able to differentiate between relative pitches. This relational organization holds true for rhythmic and harmonic principles as well, in that rhythmic patterns are meaningful only in the context of the notes that fall before and after a given note. For harmonic relationships, a given chord quality (e.g., major or minor) can be defined as a series of relative intervals that holds true for all 12 musical keys. For example, a major chord can be described as a root note (tonic), major third interval (four semitones away), and a perfect fifth interval (three semitones or a minor third interval away from the major third), regardless of what key the chord is in (Fig. 3). Moreover, all chord qualities within a common group suggest equivalent musical connotations irrespective of key.

Figure 3.

Piano keyboard showing relational nature of major triads. A C major triad and a D major triad are shown. Despite the differences in key, the interval spacing between all notes (root, third, and fifth) of the triad remains constant.
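The relational definition of a major triad can be illustrated with a short sketch. It assumes a simplified pitch-class model with sharp-only note names (the same 12 names used later in this review) and ignores enharmonic spelling and octave; the names NOTE_NAMES, major_triad, and semitone_gaps are illustrative and do not come from any of the cited studies.

```python
# Simplified pitch-class model: 12 note names, one semitone apart (an assumption
# made for illustration; enharmonic spellings such as D flat are not modeled).
NOTE_NAMES = ["C", "C#", "D", "D#", "E", "F", "F#", "G", "G#", "A", "A#", "B"]

# A major triad: root, major third (4 semitones up), perfect fifth (7 semitones up).
MAJOR_TRIAD_STEPS = (0, 4, 7)


def major_triad(root):
    """Return the three note names of the major triad built on `root`."""
    start = NOTE_NAMES.index(root)
    return [NOTE_NAMES[(start + step) % 12] for step in MAJOR_TRIAD_STEPS]


def semitone_gaps(chord):
    """Return the semitone distance between each pair of adjacent chord notes."""
    positions = [NOTE_NAMES.index(note) for note in chord]
    return [(upper - lower) % 12 for lower, upper in zip(positions, positions[1:])]


if __name__ == "__main__":
    for root in ("C", "D"):
        chord = major_triad(root)
        print(root, chord, semitone_gaps(chord))
```

For both roots the gaps are four semitones followed by three, which is the constancy of interval spacing that the figure depicts.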

The relational nature of music is foundational to musical expression and endows music with much of its flexibility and universality. One of the primary consequences of music's relational system is the creation of expectation in the listener based on a priori internalization of certain relational variables. Most listeners of music are accustomed to hearing musical notes that fit properly within the contextual musical reference, whether melodic, rhythmic, or harmonic. Corollary to the notion of musical expectancies is that of violations of musical expectancies, which are tantamount to violations of musical syntax. For example, if a simple melody is played entirely within one key (e.g., G major), but the last note of the melody is out of key (e.g., G# instead of G natural), the listener detects a syntactic aberration within the presented melody. The ability to detect musical aberrations supports the idea of a musical syntax that is simultaneously both vague and robust and is likely to be dependent on cultural musical upbringing, degree of innate musicality, presence of tone deafness, and degree of musical training.
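The G major example can be made concrete with a toy check of key membership. This is only a sketch under a strong simplifying assumption: a note counts as a "violation" if it lies outside the major scale of the stated key, whereas a listener's expectancies also depend on harmony, rhythm, and context. The melodies and all function names below are hypothetical.

```python
# Sharp-only pitch-class names (illustrative simplification).
NOTE_NAMES = ["C", "C#", "D", "D#", "E", "F", "F#", "G", "G#", "A", "A#", "B"]

# Semitone offsets of the seven major-scale degrees from the tonic.
MAJOR_SCALE_STEPS = (0, 2, 4, 5, 7, 9, 11)


def major_scale(key):
    """Return the set of note names belonging to the major scale of `key`."""
    start = NOTE_NAMES.index(key)
    return {NOTE_NAMES[(start + step) % 12] for step in MAJOR_SCALE_STEPS}


def out_of_key_positions(melody, key):
    """Return the indices of melody notes that fall outside the key's major scale."""
    scale = major_scale(key)
    return [i for i, note in enumerate(melody) if note not in scale]


if __name__ == "__main__":
    in_key_melody = ["G", "A", "B", "D", "G"]     # hypothetical melody ending on G natural
    altered_melody = ["G", "A", "B", "D", "G#"]   # same melody ending on G sharp
    print(out_of_key_positions(in_key_melody, "G"))
    print(out_of_key_positions(altered_melody, "G"))
```

Only the final note of the altered melody is flagged, mirroring the G# versus G natural case described in the text.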

Maess et al. (2001) exploited the relational properties of Western tonal music in an MEG study of musical syntax. In this study, a series of chords was presented to listeners without musical training. The sequences consisted of in-key musical chord sequences that occasionally contained a so-called Neapolitan sixth chord, which contains two out-of-key notes while being itself both major and consonant in character (Fig. 4). Hence, the Neapolitan chord allowed for examination of responses to musical chords that did not vary in chord quality (such as major to minor, or degree of dissonance), but only in the manner in which they satisfied the musical expectancies created by the preceding chords. During the MEG paradigm, subjects listened passively to the chords and were instructed to ignore harmonic changes (a timbral alteration was included to provide a noncompeting object of focus, although no explicit task was required). The authors found an early effect of Neapolitan chord presentation, which was termed the magnetic early right anterior negativity (mERAN; in reference to electrophysiologic studies showing an ERAN in response to music-syntactic violations). Through source localization, the mERAN was found to be generated from the traditional left Broca's area and its right hemisphere homolog (inferior Brodmann area 44), regions known to be important for syntactic processing of language (Caplan et al.,1998,1999,2000; Dapretto and Bookheimer,1999). This study strongly supports the notion of musical syntax and implies that areas traditionally thought to be involved in single-domain (e.g., language) processing have far greater flexibility than previously understood.

Figure 4.

Musical stimuli used to assess neural perception of musical syntactic anomalies. A: A five-chord sequence is shown using traditional musical notation, with all five chords being in-key consonant chords in the key of C. The fifth chord is highlighted in green and represented pictorially using a piano keyboard layout. B: A second five-chord sequence is shown with the fifth chord, highlighted in red, being a Neapolitan sixth chord, which contains two in-key notes (F and E) and two out-of-key notes (A flat and D flat). The four chords preceding the fifth chord set up a harmonic (syntactic) expectancy in the listener; the Neapolitan chord, but not the in-key chord, is detected as a violation of this expectancy. Modified from Maess et al. (2001) and printed with permission from Nature Publishing Group.
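As a rough illustration of why the Neapolitan chord is major in quality yet foreign to the key, the sketch below builds the basic three-note chord as a major triad on the lowered second scale degree and sorts its notes by key membership. It assumes a flat-spelled pitch-class model and does not attempt to reproduce the four-voice voicing or inversion shown in the figure; all names are illustrative.

```python
# Flat-spelled pitch-class names (an illustrative choice so that the output
# reads "Db" and "Ab" rather than "C#" and "G#").
FLAT_NAMES = ["C", "Db", "D", "Eb", "E", "F", "Gb", "G", "Ab", "A", "Bb", "B"]

MAJOR_SCALE_STEPS = (0, 2, 4, 5, 7, 9, 11)
MAJOR_TRIAD_STEPS = (0, 4, 7)


def neapolitan_triad(key):
    """Return the major triad built one semitone above the tonic of `key`."""
    tonic = FLAT_NAMES.index(key)
    root = (tonic + 1) % 12  # lowered second scale degree
    return [FLAT_NAMES[(root + step) % 12] for step in MAJOR_TRIAD_STEPS]


def split_by_key(chord, key):
    """Split chord notes into (in-key, out-of-key) lists for the major key `key`."""
    tonic = FLAT_NAMES.index(key)
    scale = {FLAT_NAMES[(tonic + step) % 12] for step in MAJOR_SCALE_STEPS}
    in_key = [note for note in chord if note in scale]
    out_of_key = [note for note in chord if note not in scale]
    return in_key, out_of_key


if __name__ == "__main__":
    chord = neapolitan_triad("C")
    print(chord, split_by_key(chord, "C"))
```

In the key of C the two out-of-key notes are D flat and A flat, as in the caption, while the interval structure of the chord is that of any other major triad.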


In language, words used in a sentence are selected for their syntactic correctness (to enable proper interpretation of relationships between the relevant subjects and objects) and, ultimately, for their semantic implications (to enable interpretation of meaning). In music, which is inherently abstract and has little explicit reference to the external world (Copland,1952), the notion of specific meaning is certainly troubling. Can a musical phrase be assumed to mean anything, and if so, can this be proven with any degree of robustness? In an electrophysiological study of 122 normal subjects, Koelsch et al. (2004) examined whether or not the well-described priming effect caused by presenting semantically related words in sequence (the N400 potential) could also apply to music. Through a paradigm of presenting musical excerpts followed by words that had descriptive, qualitative, or structural similarities to the excerpts, it was found that presentation of a musical excerpt that shared “semantic” similarities to the target word led to a smaller N400 response (due to the priming effect), consistent with findings using language only. Source analysis of the N400 showed no statistically significant differences in location between language and musical stimuli, and the posterior portion of the middle temporal gyrus (Brodmann's area 21/37) was found to be the primary generator of the electric response. These findings are similar to findings in language studies that localize semantic processing to the region of the superior temporal sulcus (Friederici et al.,2000; Friederici,2001,2002; Hahne et al.,2002).

The implications of these findings are provocative: musical passages containing no explicit linguistic content can cause a priming effect for certain words, if those words have a possible semantic relationship to the musical passage. While the selection of the target words and pairing with musical passages implies a certain preexisting bias in what words might be semantically related (e.g., matching an ascending tone pattern with the word “staircase”), this does not undermine the results of the study from the perspective of neural activity. The idea that some musical passages might have a greater or lesser relationship to particular words implies that, within a musical context, there are some descriptive words that more accurately reflect the “meaning” of a musical passage. Therefore, while it may be impossible to define the true semantic intention of a composed musical phrase precisely, this does not contradict the notion that musical semantics exist. Moreover, the localization of source to the middle temporal gyrus near the superior temporal sulcus suggests that the secondary auditory cortex is utilized in sound processing for a wide variety of purposes, including the possible semantic interpretation of music and linkage of this interpretation to nonmusical constructs.


While the concept of musical semantics is difficult to grasp, the idea that music conveys emotions seems an intuitive one, in light of the central role played by music in social functions ranging from celebration (e.g., weddings) to grieving (e.g., funerals). One could easily posit that the pervasiveness of music in the world is largely due to its ability to convey emotion. Popular song lyrics are littered with emotional content, and classical music has numerous examples of composers using musical composition (e.g., Beethoven's Fifth Symphony, Mahler's Tenth Symphony) to express extreme emotion. Jazz music has been described as the “sound of surprise” (Balliett,1978). Music is played to infants in order to soothe them.

The neural basis of emotional coupling to musical sounds, however, is less intuitive. From a pragmatic, utilitarian, or survival perspective, there are no clear reasons why the act of perceiving musical sounds should be capable of inducing emotion, especially the deep emotion often associated with one's response to music. Investigation of the neural basis for this coupling has yielded several interesting findings. In a PET study, Blood and Zatorre (2001) selected 10 musicians, all of whom had extensive musical training. These individuals were notable in that they reported the presence of reproducible “chills” when listening to certain pieces of music; musical choices were limited to classical music without lyrics or singing. The production of chills was considered an indicator of an intense emotional response to a musical stimulus, an assumption that was supported by changes in heart rate and respiratory rate. During the scanning procedure, the individuals listened to pieces of music known to induce chills, and also to music that was selected by the other participants as evoking chills. Therefore, responses were measured to identical stimuli, some of which produced chills in some subjects but not in others (the sensation of chills was reported in more than 75% of scans that were selected for this purpose). After analysis of cerebral blood flow patterns, it was found that the ventral striatum, midbrain, amygdala, orbitofrontal cortex, and ventral medial prefrontal cortex were activated during scans that evoked chills. These regions are known to be involved in modulation of emotion, and particularly in reward/motivation systems (Fig. 5). This study confirmed not only the notion that musically induced emotion can be studied, but more fundamentally, that the activation patterns elicited by music could be tied to primitive systems of emotion and reward, with their attendant ties to survival behavior.

Figure 5.

Neuroanatomical regions demonstrating significant rCBF correlations with chills intensity ratings. Regression analyses were used to correlate rCBF from averaged PET data for combined subject-selected and control music scans with ratings of chills intensity (0 to 10). Correlations are shown as t-statistic images superimposed on corresponding average MRI scans. The t-statistic ranges for each set of images are coded by color scales below each column, corresponding to a-c (positive correlations with increasing chills intensity) and d-f (negative correlations). a (sagittal section, x = 4 mm) shows positive rCBF correlations in left dorsomedial midbrain (Mb), right thalamus (Th), AC, SMA, and bilateral cerebellum (Cb). b (coronal section, y = 13 mm) shows left ventral striatum (VStr) and bilateral insula (In; also AC). c (coronal section, y = 32 mm) shows right orbitofrontal cortex (Of). d (sagittal section, x = 4 mm) shows negative rCBF correlations in VMPF and visual cortex (VC). e (sagittal section, x = 21 mm) shows right amygdala (Am). f (sagittal section, x = 19 mm) shows left hippocampus/amygdala (H Am). Reprinted with permission from Blood and Zatorre (2001). Copyright 2001 National Academy of Sciences, USA.

A more recent study by Menon and Levitin (2005) used functional connectivity analysis to show that listening to music invokes activity in the nucleus accumbens, ventral tegmental area, hypothalamus, and insula, regions that are thought to be closely related to physiologic mechanisms of reward behavior. Hence, these results suggest that music, far from being a casual, pleasant by-product of the auditory system designed for language, may instead be tied to mechanisms of survival, which may explain in part why music has persisted throughout history despite the fact that it confers no obvious survival advantage in humans.

In a related study, Blood et al. (1999) studied affective responses to musical consonance and dissonance. In cultures whose music is based on Western scale systems, the juxtaposition of two different musical pitches against one another can sound either harmonious (consonant) or incongruous (dissonant; Fig. 6). Both consonance and dissonance are employed as compositional elements in most musical pieces, usually (but not always) to convey a sense of resolution or tension; when utilized in a proper musical context, the net effect of musical dissonance can be striking and enjoyable. In isolation from other musical elements, however, the presentation of a dissonant interval can evoke a sense of unpleasantness for the listener.

Figure 6.

Piano keyboard diagram of consonance and dissonance. The blue interval (F to A) depicts a major third interval, which is consonant and part of a major triad. The red interval (F to F sharp) depicts a minor second interval, which is dissonant. It should be noted that the separation of this interval by transposing the F up one octave produces the interval F sharp to F, which implies a less dissonant major seventh interval with F sharp as the root and F as the major seventh note.
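The interval sizes named in this caption can be checked with a small sketch. It assumes a sharp-only pitch-class model and measures each interval upward from the lower note to the nearest higher occurrence of the upper note; it labels interval size only and makes no claim about how consonance is perceived. The names used are illustrative.

```python
# Sharp-only pitch-class names (illustrative simplification).
NOTE_NAMES = ["C", "C#", "D", "D#", "E", "F", "F#", "G", "G#", "A", "A#", "B"]

# Conventional names for interval sizes within one octave, in semitones.
INTERVAL_NAMES = {
    0: "unison", 1: "minor second", 2: "major second", 3: "minor third",
    4: "major third", 5: "perfect fourth", 6: "tritone", 7: "perfect fifth",
    8: "minor sixth", 9: "major sixth", 10: "minor seventh", 11: "major seventh",
}


def ascending_interval(lower, upper):
    """Return (semitones, name) for the interval rising from `lower` to `upper`."""
    semitones = (NOTE_NAMES.index(upper) - NOTE_NAMES.index(lower)) % 12
    return semitones, INTERVAL_NAMES[semitones]


if __name__ == "__main__":
    print(ascending_interval("F", "A"))    # blue interval in the figure
    print(ascending_interval("F", "F#"))   # red interval in the figure
    print(ascending_interval("F#", "F"))   # F moved up one octave, above F sharp
```

Moving the F above the F sharp turns the one-semitone minor second into an eleven-semitone major seventh, which is the inversion the caption describes.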

In a PET paradigm, Blood et al. (1999) presented a simple melody to the listeners, but modified the chords that accompanied the melody, such that they were increasingly dissonant (Fig. 2). Subjects rated each musical example in terms of both degree of pleasantness/unpleasantness and whether the melody sounded happy or sad. After regression analysis and contrast analysis of cerebral blood flow maps, it was found that increased levels of dissonance correlated with activity in the right parahippocampal gyrus and right precuneus, while increased musical consonance was associated with activity in the right orbitofrontal cortex and medial subcallosal cingulate gyrus. The right parahippocampal gyrus, which was strongly activated by dissonant conditions, has diverse neural roles, with connections to the amygdala, implicating a role for this region in emotional processing of auditory stimuli with unpleasant characteristics. These results suggest that paralimbic and neocortical brain regions have specific responses to conditions of musical consonance and dissonance, supporting the claim that musical processing can invoke primitive neural substrates responsible for affective responses, such as fear and arousal.


A melody is a sequence of musical pitches that form a musical phrase. Melody is one of the quintessential elements of music. Although melodies (like all other elements of music) are inherently rooted in time and have their own temporal structure and phrasing, it is the pitch relationship of one note to the next that is the signature of a particular melody. Both the intervals between individual notes and the overall contour of the sequence are incorporated into melodic processing. Many studies of music have focused on melody or pitch perception and discrimination (Zatorre et al.,1994; Rao et al.,1997; Griffiths et al.,1999; Halpern and Zatorre,1999; Hugdahl et al.,1999; Perry et al.,1999). The earliest studies of musical pitch perception were based primarily on lesion studies, with the goal of identifying potential neural regions involved in musical pitch perception. Initial attempts to study music examined lateralization to whole hemispheres in a binary fashion.

On the basis of several early studies, it was suggested that musical stimuli are processed by the right hemisphere (reviewed by Peretz,1985; Zatorre,1985). In a study of brain-damaged patients with either right- or left-sided brain damage, Peretz (1990) examined the question of whether or not hemispheric specialization existed for processing of melodic contour and pitch interval analysis. She found that patients with right-sided brain damage were unable to distinguish between melodies with intact contours vs. melodies with violations of contour, or between transposed melodies. In comparison, both right and left hemisphere-damaged patients had difficulty on tasks that looked at pitch interval discrimination. Although the use of subjects with brain damage is intrinsically limited as a method of elucidating normal neurologic processing, this study suggested that the right hemisphere was predominantly involved in processing of musical contour over the left and also contributed to processing of pitch interval information. Subsequent studies revealed more specifically that tonal pitch perception could be attributed to the right hemisphere, and to the auditory cortices in particular (Rao et al.,1997; Halpern and Zatorre,1999; Perry et al.,1999).

As stated before, the interaction between musical pitches in a melody (and the cognitive interpretation of the pitches) is relational, in that pitches derive meaning from the context of notes before and after. Musically, a key provides a scale according to which a melody is presented (in Western music). For example, the majority of notes of a melody presented in the key of C major are drawn from the C major scale; the same holds true for other scales, e.g., G minor. There are 12 different notes in the Western scale (C, C#, D, D#, E, F, F#, G, G#, A, A#, and B), each of which represents a key. Within each key, different scale modes (e.g., major vs. minor) can be extrapolated. The circle of fifths describes the relationship between one key and the next in terms of musical distances (Fig. 7).

Figure 7.

The circle of fifths. This diagram shows the relationships between musical keys. On the outer part of the circle, the major keys are shown, with relative minor keys shown in the inside. As one progresses along the circle of fifths, there is a systematic change in the number of sharps or flats associated with each key, and this number is identical for each pairing of major keys and relative minor keys.
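The key relationships described above can be sketched in a few lines of code. The following is a minimal illustration rather than anything taken from the studies reviewed here: it assumes the 12 pitch classes are numbered 0 to 11 starting from C, spells every key with sharps only (conventional notation would use flats for half of the circle), and uses hypothetical function names.

```python
# Pitch classes numbered 0-11 from C; sharp spellings only (a simplification).
NOTE_NAMES = ["C", "C#", "D", "D#", "E", "F", "F#", "G", "G#", "A", "A#", "B"]

def circle_of_fifths(start=0):
    """Return the 12 major keys in circle-of-fifths order.

    Each step around the circle raises the tonic by a perfect fifth,
    which is 7 semitones, wrapping around modulo 12.
    """
    return [NOTE_NAMES[(start + 7 * step) % 12] for step in range(12)]

def relative_minor(major_key):
    """The relative minor tonic lies 3 semitones below the major tonic."""
    return NOTE_NAMES[(NOTE_NAMES.index(major_key) - 3) % 12]

def key_distance(key_a, key_b):
    """Number of steps separating two major keys around the circle (0 to 6)."""
    order = circle_of_fifths()
    gap = abs(order.index(key_a) - order.index(key_b))
    return min(gap, 12 - gap)

major_keys = circle_of_fifths()
print(major_keys)
print([relative_minor(key) + " minor" for key in major_keys])
print(key_distance("C", "G"), key_distance("C", "F#"))
```

Under these assumptions, neighboring keys on the circle (e.g., C and G) are one step apart and differ by a single sharp or flat, while keys on opposite sides of the circle (e.g., C and F#) are maximally distant.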

In a functional MRI study of musically trained subjects, the neural basis for the so-called geometric properties of the Western musical key system and its implication for melodic processing were investigated (Janata et al.,2002). By studying the ability of subjects to track tonality of a melodic contour (the explicit task was to listen to melodies presented in major and minor versions of all 12 keys, and to detect violations of key tonality), the authors identified several consistent areas of activation. Most importantly, the rostromedial prefrontal cortex was consistently activated and was interpreted as a region that tracks tonal space. The authors argue that this region, which plays a multimodality integrative role, allows a listener to maintain a topographic map of musical input regardless of musical transposition, such that modulations of key do not alter the listener's ability to retain a melodic contour or to retain a fixed perception of pitch intervals (regardless of which reference pitch is chosen).
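The claim that contour and pitch intervals survive transposition can be made concrete with a small sketch. The representation below is an assumption made for illustration only: melodies are encoded as MIDI note numbers (one unit per semitone), the example melody is hypothetical, and none of this reflects the stimuli or analysis used by Janata et al.

```python
def intervals(melody):
    """Signed semitone steps between successive notes (MIDI note numbers)."""
    return [b - a for a, b in zip(melody, melody[1:])]

def contour(melody):
    """Direction of each step: +1 for up, -1 for down, 0 for a repeated note."""
    return [(step > 0) - (step < 0) for step in intervals(melody)]

def transpose(melody, semitones):
    """Shift every note by the same number of semitones."""
    return [note + semitones for note in melody]

# Hypothetical melody (C4 D4 E4 C4 G4) and its transposition up a fifth.
melody_c = [60, 62, 64, 60, 67]
melody_g = transpose(melody_c, 7)

# Same contour as melody_c, but two of the interval sizes have been altered.
altered = [60, 62, 65, 60, 67]

print(intervals(melody_c) == intervals(melody_g))  # transposition keeps intervals
print(contour(melody_c) == contour(altered))       # contour can match...
print(intervals(melody_c) == intervals(altered))   # ...while intervals differ
```

Because transposition adds the same constant to every note, the differences between successive notes are unchanged. The altered melody shows the converse case, in which contour is preserved while the interval sizes are not.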

The proper perception of melodies (and chords of notes presented simultaneously) rests on the accurate processing and cognitive perception of individual pitches. As one might expect, the neural processing of basic musical pitch information, regardless of musical contour or melodic transposition, takes place within auditory processing areas, particularly the right superior temporal gyrus, rather than within areas involved in multimodal processing (reviewed by Peretz and Zatorre,2005). In a PET imaging study, Zatorre and Belin (2001) used an auditory stimulus in which alternating pure tones separated by an octave were presented while either their spectral or their temporal characteristics were varied.

A passive paradigm was used without any behavioral index. Analysis of the PET data revealed two main findings (Fig. 8). First, bilateral activation of anterior superior temporal areas was found for spectral variation. Second, secondary auditory areas on the right were more active during spectral stimulus variation, while temporal variation led to increased activity in left superior temporal regions. Hence, even at the basic level of passive pitch processing in a nonmusical paradigm, relative right hemispheric specialization is found to exist within secondary auditory cortex.

Figure 8.

Three-dimensional rendering of the cerebral blood flow (CBF) data from the covariation analyses onto a magnetic resonance image of a representative individual subject's brain, viewed from the right. The level of the section (z = 3 mm) is indicated in the inset. The green areas correspond to the regions showing significant covariation of CBF with increasing rate of temporal change, while the red areas correspond to regions whose CBF increased as a function of change in the spectral parameter. H indicates the stem of Heschl's gyrus in each hemisphere; STS indicates the superior temporal sulcus. Note that the temporal covariation sites are located within Heschl's gyri in the two hemispheres, while the spectral covariations are located anterior to the sites covarying with the temporal stimulus parameter. An additional posterior site of spectral covariation is located within the STS in the right hemisphere only. The bottom panel shows merged PET and MRI volumes corresponding to the direct comparison of temporal and spectral conditions to one another. The image on the left shows a horizontal section taken through the region of Heschl's gyri (z = 9), which showed significantly greater activity in the combined temporal conditions than in the combined spectral conditions. The section on the right is taken through the maxima corresponding to the anterior superior temporal region (z = −6), which showed a greater response to the spectral conditions than to the temporal conditions. The bar graphs illustrate the percent difference between conditions in regions of interest taken from corresponding locations. Reprinted from Zatorre and Belin (2001). Copyright 2001 Oxford University Press.


In contrast to the notion of musical pitches and melodies being based on relative distances and intervals between successive notes is the phenomenon known as absolute, or perfect, pitch. Absolute pitch is most commonly considered the ability to identify, on hearing a random musical note, its exact pitch. It has been noted that even individuals with absolute pitch do not always identify pitches correctly to the exact semitone, and that they sometimes have difficulty identifying which octave a given pitch falls in (Levitin and Rogers,2005). Some authors have used a visual analogy with color identification to describe perfect pitch, in that most individuals can easily name a color on seeing it (Levitin and Rogers,2005). Others have pointed out that a visual analogy is not entirely compelling, in that while individuals can often identify basic color groups, they have great difficulty in discriminating between various shades of a color group, much in the same way that most individuals can tell whether a pitch is very high or very low, but have increasing difficulty as more specific categorization of notes is required (Zatorre,2003).
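To make the notions of semitone and octave errors concrete, the sketch below maps a frequency to its nearest note name and octave. It rests on assumptions that are not part of the studies cited above: 12-tone equal temperament, a reference of A4 = 440 Hz, sharp-only note spellings, and MIDI-style octave numbering (middle C is C4). The function name is hypothetical.

```python
import math

NOTE_NAMES = ["C", "C#", "D", "D#", "E", "F", "F#", "G", "G#", "A", "A#", "B"]

def nearest_note(freq_hz, a4_hz=440.0):
    """Return (note name, octave, deviation in cents) for a frequency.

    Assumes 12-tone equal temperament, in which each semitone multiplies
    the frequency by 2 ** (1 / 12) and A4 corresponds to MIDI note 69.
    """
    midi_exact = 69 + 12 * math.log2(freq_hz / a4_hz)
    midi = round(midi_exact)
    name = NOTE_NAMES[midi % 12]       # pitch class: which of the 12 notes
    octave = midi // 12 - 1            # octave: which register it falls in
    cents = 100 * (midi_exact - midi)  # mistuning relative to that note
    return name, octave, cents

for freq in (261.63, 440.0, 880.0):
    name, octave, cents = nearest_note(freq)
    print(f"{freq} Hz is closest to {name}{octave}")
```

In these terms, a semitone error corresponds to choosing an adjacent pitch class, while an octave error corresponds to the correct pitch class with the wrong octave number, as when 440 Hz and 880 Hz are both heard simply as "A".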

The estimated incidence of absolute pitch in the general population is 1 in 10,000 (Deutsch,1999). Although this ability has been linked to musical talent and early musical exposure (e.g., Mozart), emphasizing the importance of environmental variables, extensive musical training is not necessary for the ability to process pitch in an absolute fashion. Furthermore, extensive musical training does not appear to endow individuals with absolute pitch if they are not exposed to music in early life (Deutsch,1999).

It appears that absolute pitch (as well as tone deafness) is an ability that arises from a combination of both genetic and environmental influences, and as such, absolute pitch provides a unique example with which the relative roles of these influences can be studied. A high prevalence of absolute pitch in Asians (even in nontonal language-based cultures) has been noted, supporting a genetic link (Levitin and Rogers,2005). The anatomic basis for absolute pitch is beginning to be understood. In a single-case study, Zatorre (1989) reported the case of a musician with absolute pitch who suffered from intractable epilepsy, eventually requiring left anterior temporal lobectomy. One year following the procedure, musical pitch testing revealed that the subject's absolute pitch perceptual ability was completely intact (in fact, it was better than in the immediate preoperative period). Subsequent studies clarified that the left planum temporale (rather than the left anterior temporal lobe), a region known to contain auditory association cortex, was a region of special interest for absolute pitch perception. In an anatomic study using high-resolution magnetic resonance morphometry, Schlaug et al. (1995b) found a greater leftward asymmetry of the planum temporale in musicians who possessed absolute pitch than in musicians without absolute pitch or nonmusicians. A follow-up study examined the asymmetry between left and right planum temporale more closely (Keenan et al.,2001). The authors performed high-resolution magnetic resonance imaging to examine a large cohort of musicians (n = 27) with absolute pitch, in comparison to 27 nonmusicians and 22 musicians without absolute pitch. The purpose of this study was to clarify whether the leftward asymmetry seen in absolute pitch was due to increased size of the left planum temporale or diminished size of the right planum temporale. They found that absolute pitch was in fact correlated with a decrease in the size of the right planum temporale, interpreted as a pruning effect in nonmusicians or musicians without absolute pitch (Fig. 9).

Figure 9.

Surface reconstructions of the right and left PT of one nonmusician and one musician with absolute pitch. The approximate location of the transverse gyrus of Heschl (HG) is indicated for each PT surface reconstruction. Note the large difference in right hemisphere PT size for these two subjects. Reprinted from Keenan et al. (2001). Copyright 2001 Elsevier.
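The reason a follow-up study was needed can be illustrated numerically. The sketch below uses a normalized asymmetry index of the form (R − L) / [0.5 (R + L)], which is a common convention in morphometric work; it is an assumption here, since this review does not state which index the cited studies used. The surface areas are hypothetical values chosen only for illustration.

```python
def asymmetry_index(left, right):
    """Normalized asymmetry index: (R - L) / (0.5 * (R + L)).

    Negative values indicate leftward asymmetry (left larger than right).
    This formula is an assumed convention, not taken from the cited studies.
    """
    return (right - left) / (0.5 * (left + right))

# Hypothetical planum temporale surface areas (left, right), in mm^2.
baseline = (1000.0, 800.0)
larger_left = (1250.0, 800.0)     # asymmetry increased by a bigger left PT
smaller_right = (1000.0, 640.0)   # asymmetry increased by a smaller right PT

for label, (left, right) in [("baseline", baseline),
                             ("larger left", larger_left),
                             ("smaller right", smaller_right)]:
    print(f"{label}: {asymmetry_index(left, right):.3f}")
```

Because an index of this kind depends only on the ratio of the two sides, an enlarged left planum temporale and a reduced right planum temporale can produce the same value. Distinguishing the two possibilities therefore requires comparing the absolute size of each side across groups, which is the question the follow-up study addressed.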

The importance of the planum temporale in absolute pitch processing was further confirmed in an electrophysiologic study of musicians with absolute pitch, who showed an early left posterior negativity at 150 msec regardless of which task was being performed, implying a central role of the planum temporale for absolute pitch, in comparison to the later onset of ERP components elicited in relative pitch tasks (Itoh et al.,2005).

Further insights into absolute pitch revealed that working memory mechanisms for pitch interval tasks were differentially evoked in musicians with absolute pitch in comparison to those without absolute pitch. In a PET study, Zatorre (1998) studied musicians with and without absolute pitch as they listened to musical tones. Most interestingly, musicians without absolute pitch showed activity within the right inferior frontal cortex while performing interval-judgement tasks, a finding that was not seen in musicians with absolute pitch, implying that individuals with absolute pitch did not access working memory mechanisms during pitch interval naming, as did those with relative pitch alone.


Temporal processing can be examined both microscopically and macroscopically. Most studies have examined temporal properties of acoustic perception from a microscopic perspective, on the scale of milliseconds (Tallal and Newcombe,1978; Griffiths et al.,1998; Liegeois-Chauvel et al.,1998). Musical rhythms, in contrast, take place on the scale of seconds or longer, and while such fine-grained studies are important, they reveal less about how actual musical rhythms are perceived. The creation of rhythmic patterns is arguably the most basic of all musical impulses (Sessions,1950), common even to primitive societies and children. Rhythm is defined here in accordance with Peretz (1990), as the organization of relative durations of sound and silence (or notes and rests), and differs from meter, which is the division of rhythmic patterns according to equal periods (or measures) marked by an underlying metronome (or tempo). Of all components of music perception, rhythm is the most fundamentally linked to the movement of time, and therefore perception of rhythmic patterns necessarily implicates brain regions involved in temporal processing. In the model of temporal processing proposed by Lerdahl and Jackendoff (1983), rhythm and meter processing are separable into distinct components. Several authors have similarly supported such a notion (Povel,1981,1984; Essens and Povel,1985; Dowling and Harwood,1986).

In comparison to the robust data showing right-sided specialization for melody processing, earlier studies suggested less convincingly that the left hemisphere was dominant for rhythm perception, primarily on the basis of lesion data (Efron,1963; Swisher and Hirsh,1972; Lackner and Teuber,1973; Tallal and Newcombe,1978; Sherwin and Efron,1980; Robin et al.,1990). In a functional imaging study of normal hearing individuals, Platel et al. (1997) examined rhythm perception using PET and found that left inferior frontal gyrus (BA 44/6, Broca's area) was involved in rhythm perception in normal individuals. These findings are somewhat difficult to interpret in light of the fact that the paradigm mixed together rhythmic, pitch, and timbre irregularities within test conditions. To complicate the issue further, other studies have shown definite right-sided contributions to temporal pattern perception (Robinson and Solomon,1974; Michel et al.,1980; Peretz,1990; Kester et al.,1991; Penhune et al.,1999).

It has been clearly demonstrated that the leftward dominance for rhythm processing depends on the mathematical intervallic relationship of the rhythm (Sakai et al.,1999). In this fMRI study, metric rhythms (with interval ratios of 1:2:4 or 1:2:3) activated left premotor, left parietal, and right cerebellar areas, while nonmetric rhythms (1:2.5:3.5) led to right prefrontal, right premotor, right parietal, and bilateral cerebellar activity. In a neuroimaging study of the effects of long-term musical training on rhythm perception, fMRI was used to assess passive perception of rhythms that are either highly regular or highly irregular. This study suggested that musicians had a relative left lateralization of neural activation patterns for all rhythms in comparison to nonmusicians, but that this activation was particularly intense in perisylvian language areas during regular rhythm perception (Fig. 10) (Limb et al., this issue). Taken together, these findings suggest that effective rhythm processing of integer-based or quantized rhythms employs left hemispheric mechanisms, as opposed to the right hemispheric specializations found for pitch and melody, and that musical training emphasizes this leftward asymmetry.
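The distinction between integer-based and non-integer interval ratios can be expressed as a simple check. The sketch below is a simplification made for illustration: it treats a rhythm as a list of interval durations and labels it integer-based when every duration is a whole-number multiple of the shortest one. The function name is hypothetical, and this rule is not claimed to be the criterion used by Sakai et al.

```python
from fractions import Fraction

def is_integer_ratio_rhythm(durations):
    """True if every duration is a whole-number multiple of the shortest one.

    Durations are converted through their string form so that decimal
    values such as 2.5 are treated as exact fractions (5/2).
    """
    exact = [Fraction(str(d)) for d in durations]
    shortest = min(exact)
    return all((d / shortest).denominator == 1 for d in exact)

# Interval ratios quoted in the text above.
for ratios in ([1, 2, 4], [1, 2, 3], [1, 2.5, 3.5]):
    label = "integer-based" if is_integer_ratio_rhythm(ratios) else "non-integer"
    print(ratios, label)
```

The same check applies to durations given in physical units: intervals of 250, 500, and 1000 ms reduce to the ratio 1:2:4 and would be labeled integer-based.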

Figure 10.

Axial slice representation of brain activity during passive perception of regular rhythms in musicians and nonmusicians. The red activations correspond to areas that were more active in musicians than nonmusicians in a random-effects contrast analysis (P < 0.005) and reveal predominantly left-sided activation, especially within perisylvian language cortices. The blue activations correspond to areas that were more active in nonmusicians than musicians and show weaker, relative right lateralization of activity in nonmusicians. Number labels depict z-plane for each axial slice in Talairach space, shown graphically in the sagittal brain inset (lower right).


Enormous advances have been made in recent years toward an understanding of the neural structural and functional correlates that are involved in musical perception. Technical limitations due to confounding factors, such as background scanner noise during fMRI paradigms, are beginning to be overcome by the use of alternate scanning methods (e.g., sparse temporal acquisition paradigms). While music perception utilizes neural substrates common to all types of auditory processing, it is clear that the brain processes music in a strikingly broad fashion, with neural activation patterns that reflect the use of language mechanisms, long-term neural plasticity, and emotion and reward systems. The wide spectrum of musical ability, ranging from the musically gifted to the amusic individual, provides a quasiparametric variable with which to interpret such patterns of brain activity. As such, music promises to remain a singularly useful tool for the study of the brain. Future studies of music are likely to be directed beyond musical perception, to issues of musical performance, learning, and composition. Ultimately, we will perhaps gain a more concrete understanding of that most intriguing endeavor of humankind: artistic creativity.