Previous analyses of the influence of language on cognition have examined speech as the main channel. In studies conducted among the Yucatec Maya, efforts to determine the community's preferred frame of reference have failed to reach agreement (Bohnemeyer & Stolz, 2006; Levinson, 2003 vs. Le Guen, 2006, 2009). This paper argues for a multimodal analysis of language that encompasses gesture as well as speech, and shows that the preferred frame of reference in Yucatec Maya is detectable only through the analysis of co-speech gesture, not through speech alone. A series of experiments compares men and women on their knowledge of the semantics of spatial terms, their performance on nonlinguistic tasks, and the gestures they produce. The results show a striking gender difference in knowledge of the semantics of spatial terms, but an equal preference for a geocentric frame of reference in nonverbal tasks. In a localization task, participants used a variety of strategies in their speech, but all exhibited a systematic preference for a geocentric frame of reference in their gestures.