Sounds of hammering or clapping can evoke simulation of the arm movements that have previously been associated with those sounds. This audio-motor transformation also occurs at the sequential level and plays a role in speech and music processing. The present study aimed to demonstrate how the activation pattern of the sensorimotor network was modulated by the sequential nature of the auditory input and by the effector. Fifteen skilled drum set players participated in our functional magnetic resonance imaging study. Prior to the scan, these drummers practiced six drumming grooves. During the scan, there were four rehearsal conditions: covertly playing the drum set under the guidance of randomly presented isolated stroke sounds, covertly playing the drum set along with the sounds of learned percussion music, covertly reciting the syllable representation of this music along with the music itself, and covertly reciting along with the syllable representation of this music. We found greater activity in the bilateral posterior middle temporal gyri for active listening to isolated drum strokes than for active listening to learned drum music. These regions might mediate the one-to-one mappings from sounds to limb movements. Compared with subvocal rehearsal along with learned drum music, covert rehearsal of limb movements along with the same music additionally activated a lateral subregion of the left posterior planum temporale. Our results illustrate a functional specialization of the posterior temporal lobes for audio-motor processing.