Apart from speech content, the human voice also carries paralinguistic information about speaker identity. Voice identification and its neural correlates have received little scientific attention up to now. Here we use event-related potentials (ERPs) in an adaptation paradigm to investigate the neural representation and the time course of vocal identity processing. Participants adapted to repeated vowel–consonant–vowel (VCV) utterances of one personally familiar speaker (either A or B), before classifying a subsequent test voice varying on an identity continuum between these two speakers. Following adaptation to speaker A, test voices were more likely to be perceived as speaker B and vice versa, and these contrastive voice identity aftereffects (VIAEs) were much more pronounced when the same syllable, rather than a different syllable, was used as adaptor. Adaptation induced amplitude reductions of the frontocentral N1–P2 complex and a prominent reduction of the parietal P3 component for test voices preceded by identity-corresponding adaptors. Importantly, only the P3 modulation remained clear for across-syllable combinations of adaptor and test stimuli. Our results suggest that voice identity is contrastively processed by specialized neurons in auditory cortex within ∼250 ms after stimulus onset, with identity processing becoming less dependent on speech content after ∼300 ms.