Across all languages studied to date, audiovisual speech exhibits a consistent rhythmic structure. This rhythm is critical to speech perception. Some have suggested that the speech rhythm evolved de novo in humans. An alternative account – the one we explored here – is that the rhythm of speech evolved through the modification of rhythmic facial expressions. We tested this idea by investigating the structure and development of macaque monkey lipsmacks and found that their developmental trajectory is strikingly similar to the one that leads from human infant babbling to adult speech. Specifically, we show that: (1) younger monkeys produce slower, more variable mouth movements and as they get older, these movements become faster and less variable; and (2) this developmental pattern does not occur for another cyclical mouth movement – chewing. These patterns parallel human developmental patterns for speech and chewing. They suggest that, in both species, the two types of rhythmic mouth movements use different underlying neural circuits that develop in different ways. Ultimately, both lipsmacking and speech converge on a ∼5 Hz rhythm that represents the frequency that characterizes the speech rhythm of human adults. We conclude that monkey lipsmacking and human speech share a homologous developmental mechanism, lending strong empirical support to the idea that the human speech rhythm evolved from the rhythmic facial expressions of our primate ancestors.