Scan Patterns Predict Sentence Production in the Cross-Modal Processing of Visual Scenes
Version of Record online: 9 APR 2012
Copyright © 2012 Cognitive Science Society, Inc.
Volume 36, Issue 7, pages 1204–1223, September/October 2012
How to Cite
Coco, M. I. and Keller, F. (2012), Scan Patterns Predict Sentence Production in the Cross-Modal Processing of Visual Scenes. Cognitive Science, 36: 1204–1223. doi: 10.1111/j.1551-6709.2012.01246.x
- Issue online: 5 SEP 2012
- Received 16 April 2011; received in revised form 6 December 2011; accepted 7 December 2011
- Scan patterns;
- Language production;
- Scene understanding;
- Cross-modal processing;
- Similarity measures
Most everyday tasks involve multiple modalities, which raises the question of how the processing of these modalities is coordinated by the cognitive system. In this paper, we focus on the coordination of visual attention and linguistic processing during speaking. Previous research has shown that objects in a visual scene are fixated before they are mentioned, leading us to hypothesize that the scan pattern of a participant can be used to predict what he or she will say. We test this hypothesis using a data set of cued scene descriptions of photo-realistic scenes. We demonstrate that similar scan patterns are correlated with similar sentences, within and between visual scenes; and that this correlation holds for three phases of the language production process (target identification, sentence planning, and speaking). We also present a simple algorithm that uses scan patterns to accurately predict associated sentences by utilizing similarity-based retrieval.
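The similarity-based retrieval idea mentioned at the end of the abstract can be illustrated with a minimal sketch. The abstract does not specify the similarity measure or data format, so the following are assumptions made purely for illustration: a scan pattern is represented as a sequence of fixated object labels, similarity is a normalized longest common subsequence, and the function names (`lcs_length`, `similarity`, `retrieve_sentence`) and the toy corpus are hypothetical. It is not the authors' implementation.

```python
from typing import List, Sequence, Tuple


def lcs_length(a: Sequence[str], b: Sequence[str]) -> int:
    """Length of the longest common subsequence of two label sequences."""
    prev = [0] * (len(b) + 1)
    for x in a:
        curr = [0]
        for j, y in enumerate(b, start=1):
            if x == y:
                curr.append(prev[j - 1] + 1)
            else:
                curr.append(max(prev[j], curr[j - 1]))
        prev = curr
    return prev[-1]


def similarity(a: Sequence[str], b: Sequence[str]) -> float:
    """LCS length normalized by the longer sequence (assumed measure)."""
    if not a or not b:
        return 0.0
    return lcs_length(a, b) / max(len(a), len(b))


def retrieve_sentence(
    query: Sequence[str],
    corpus: List[Tuple[Sequence[str], str]],
) -> Tuple[str, float]:
    """Return the sentence paired with the most similar stored scan pattern."""
    best_pattern, best_sentence = max(
        corpus, key=lambda pair: similarity(query, pair[0])
    )
    return best_sentence, similarity(query, best_pattern)


# Toy corpus of (scan pattern, sentence) pairs; invented for illustration.
corpus = [
    (["man", "dog", "leash", "man"], "The man is walking the dog."),
    (["woman", "cup", "table", "cup"], "The woman puts the cup on the table."),
    (["man", "phone", "desk"], "The man picks up the phone."),
]

query = ["man", "dog", "man", "leash"]
sentence, score = retrieve_sentence(query, corpus)
print(sentence, score)
```

In this sketch, prediction amounts to a nearest-neighbor lookup: the sentence attached to the most similar stored scan pattern is returned as the prediction for the new one. Any sequence similarity measure could be substituted for the normalized LCS without changing the retrieval step.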