Until now, no evidence had been provided of newborns’ capacity to produce a matching response to 2D stimuli. We report evidence from 18 newborns who were presented with three types of stimuli on a 2D screen. The stimuli were video-recorded displays of tongue protrusion shown by: (a) a human face, (b) a human tongue from a disembodied mouth, and (c) an artificial tongue from a robotic mouth. Compared to a baseline condition, neonates significantly increased their tongue protrusion when seeing disembodied human and artificial tongue movements, but not when seeing a 2D full face protruding its tongue. We interpret this result as reflecting exploration of the top-heavy patterns of the 2D face, which drew infants’ attention away from the tongue. Results also showed progressively more accurate matching (full tongue protrusion) over repeated exposure to each kind of stimulus. These findings are not in line with the predictions of the innate releasing mechanism (IRM) model or of the oral exploration hypothesis. They support the active intermodal mapping (AIM) hypothesis, which not only emphasizes the importance of repeated experience, as does the associative sequence learning (ASL) hypothesis, but also predicts differential learning and progressive correction of the response adapted to each stimulus.