Despite its fundamental relevance for representing the emotional world surrounding us, the auditory system has been widely neglected in human affective neuroscience research, at least in comparison to the visual domain. Here, we investigated the spatiotemporal dynamics of human affective auditory processing using time-sensitive whole-head magnetoencephalography. A novel and highly challenging affective associative learning procedure, ‘MultiCS conditioning’, involving multiple conditioned stimuli (CS) per affective category, was adopted to test whether previous findings from intramodal conditioning of multiple click-tones with an equal number of auditory emotional scenes (Bröckelmann et al., 2011 J. Neurosci., 31, 7801) would generalise to crossmodal conditioning of multiple click-tones with an electric shock as a single aversive somatosensory unconditioned stimulus (UCS). Event-related magnetic fields were recorded in response to 40 click-tones before and after four contingent pairings of 20 CS with a shock, while the other 20 tones remained unpaired. In line with previous findings from intramodal MultiCS conditioning, we found an affect-specific modulation of the auditory N1m component 100–150 ms post-stimulus within a distributed frontal–temporal–parietal neural network. Increased activation for shock-associated tones was lateralised to right-hemispheric regions, whereas unpaired safety-signalling tones were preferentially processed in the left hemisphere. Participants did not show explicit awareness of the contingent CS–UCS relationship, yet an indirect measure of stimulus valence indicated behavioural conditioning effects. Our findings provide converging evidence for a rapid and highly differentiating affect-specific modulation of the auditory N1m after intramodal as well as crossmodal MultiCS conditioning, and for a correspondence of the modulating impact of emotional attention on early affective processing in vision and audition.