Integrating the multisensory features of talking faces is critical to learning and extracting coherent meaning from social signals. While we know much about the development of these capacities at the behavioral level, we know very little about the underlying neural processes. One prominent behavioral milestone of these capacities is the perceptual narrowing of face–voice matching, whereby young infants match faces and voices across species, but older infants do not. In the present study, we provide neurophysiological evidence for developmental decline in cross-species face–voice matching. We measured event-related brain potentials (ERPs) while 4- and 8-month-old infants watched and listened to congruent and incongruent audio-visual presentations of monkey vocalizations and humans mimicking monkey vocalizations. The ERP results indicated that younger infants distinguished between the congruent and the incongruent faces and voices regardless of species, whereas in older infants, the sensitivity to multisensory congruency was limited to the human face and voice. Furthermore, with development, visual and frontal brain processes and their functional connectivity became more sensitive to the congruence of human faces and voices relative to monkey faces and voices. Our data show the neural correlates of perceptual narrowing in face–voice matching and support the notion that postnatal experience with species identity is associated with neural changes in multisensory processing (Lewkowicz & Ghazanfar, 2009).