Audiovisual speech has a stereotypical rhythm that is between 2 and 7 Hz, and deviations from this frequency range in either modality reduce intelligibility. Understanding how audiovisual speech evolved requires investigating the origins of this rhythmic structure. One hypothesis is that the rhythm of speech evolved through the modification of some pre-existing cyclical jaw movements in a primate ancestor. We tested this hypothesis by investigating the temporal structure of lipsmacks and teeth-grinds of macaque monkeys and the neural responses to these facial gestures in the superior temporal sulcus (STS), a region implicated in the processing of audiovisual communication signals in both humans and monkeys. We found that both lipsmacks and teeth-grinds have consistent but distinct peak frequencies and that both fall well within the 2–7 Hz range of mouth movements associated with audiovisual speech. Single neurons and local field potentials of the STS of monkeys readily responded to such facial rhythms, but also responded just as robustly to yawns, a nonrhythmic but dynamic facial expression. All expressions elicited enhanced power in the delta (0–3Hz), theta (3–8Hz), alpha (8–14Hz) and gamma (> 60 Hz) frequency ranges, and suppressed power in the beta (20–40Hz) range. Thus, STS is sensitive to, but not selective for, rhythmic facial gestures. Taken together, these data provide support for the idea that that audiovisual speech evolved (at least in part) from the rhythmic facial gestures of an ancestral primate and that the STS was sensitive to and thus ‘prepared’ for the advent of rhythmic audiovisual communication.