Spatial and Linguistic Aspects of Visual Imagery in Sentence Comprehension


Department of Linguistics, University of Hawaii at Manoa, 569 Moore Hall, 1890 East-West Rd., Honolulu, HI, 96822. E-mail:


There is mounting evidence that language comprehension involves the activation of mental imagery of the content of utterances (Barsalou, 1999; Bergen, Chang, & Narayan, 2004; Bergen, Narayan, & Feldman, 2003; Narayan, Bergen, & Weinberg, 2004; Richardson, Spivey, McRae, & Barsalou, 2003; Stanfield & Zwaan, 2001; Zwaan, Stanfield, & Yaxley, 2002). This imagery can have motor or perceptual content. Three main questions about the process remain under-explored, however. First, are lexical associations with perception or motion sufficient to yield mental simulation, or is the integration of lexical semantics into larger structures, like sentences, necessary? Second, what linguistic elements (e.g., verbs, nouns, etc.) trigger mental simulations? Third, how detailed are the visual simulations that are performed? A series of behavioral experiments address these questions, using a visual object categorization task to investigate whether up- or down-related language selectively interferes with visual processing in the same part of the visual field (following Richardson et al., 2003). The results demonstrate that either subject nouns or main verbs can trigger visual imagery, but only when used in literal sentences about real space—metaphorical language does not yield significant effects—which implies that it is the comprehension of the sentence as a whole and not simply lexical associations that yields imagery effects. These studies also show that the evoked imagery contains detail as to the part of the visual field where the described scene would take place.

1. Introduction

“Thought is impossible without an image.” (Aristotle, On Memory and Recollection)

Until the late 1950s, mental imagery was believed to occupy a special place in human thought. Throughout most of the second half of the 20th century, however, imagery was backgrounded by approaches that favored objectivism and symbol manipulation. Over the past two decades, imagery has once again become increasingly interesting to cognitive scientists. A number of studies have shown that humans automatically and unconsciously engage perceptual and motor imagery when performing high-level cognitive tasks, such as recall (Nyberg et al., 2001) and categorization (Barsalou, 1999). The benefit of conscripting imagery for these tasks is clear: imagery provides a modality-specific, continuous representation well suited for comparison with perceptual input or for performing inference. Three scholarly traditions have converged on the notion that language understanding critically engages the cognitive capacity to internally construct modal representations. Cognitive linguistics, for one, has long emphasized the importance of embodied representations of the world (e.g., spatial topology) in the representation of language (e.g., Lakoff, 1987; Langacker, 1987). Embodied cognitive psychology has similarly highlighted the importance of low-level perceptual and motor processes in language and other high-level phenomena (Barsalou, 1999; Glenberg & Robertson, 2000). And research on mental models in narrative comprehension has emphasized the role of detailed perceptual and motor knowledge in the construction of mental representations of scenes from verbal input (Zwaan, 1999). This convergence of views has spawned several lines of empirical and theoretical work arguing that understanding language leads to the automatic and unconscious activation of mental imagery corresponding to the content of the utterance. Such imagery, which may be motor or perceptual in nature (among others), has the potential to interfere with (Kaschak et al., 2005; Richardson et al., 2003) or facilitate (Glenberg & Kaschak, 2002; Zwaan et al., 2002) the actual performance of actions or the perception of objects, depending on the task.

This article focuses on the role of visual imagery in language understanding, and provides evidence that language processing drives location-specific perceptual images of described entities and their attributes. It advances the study of language-induced mental simulation in three ways. First, previous work on mental imagery and language understanding has not explored which linguistic elements—nouns, verbs, or others—engage imagery in the course of understanding a sentence. The work reported here demonstrates that mental imagery can be evoked by either subject nouns or main verbs in sentence stimuli. Second, the work reported here shows that linguistic elements that drive perceptual simulation only do so in an utterance in which they have a literal, spatial meaning, suggesting that it is not just lexical associations but rather the construction of a model of the whole sentence's meaning that drives simulation. And third, the experiments reported here show that spatial imagery is specific to the direction of motion—up or down—and not just the axis of motion, as previously demonstrated (Richardson et al., 2003). On the basis of these results, we argue for a view of lexical and sentential meaning in which words pair phonological form with specifications for imagery to be performed, and larger utterances compose these imagery specifications to drive a mental simulation of the content of the utterance.

Before looking in detail at the method used to address these issues in section 1.2, we provide an overview of work on mental simulation in language understanding in section 1.1.

1.1. Mental simulation in language understanding

To demonstrate the influence of language on mental imagery (we will be using “mental simulation” synonymously), it can be useful to consider the subjective experience of processing language associated with perceptual content. Answering questions like the following, for instance, may require mental imagery: What shape are a poodle's ears? What color is the cover of Cognitive Science? Which is taller: a basketball hoop or a bus? (See also examples in Kosslyn, 1980.) Critically, most people report that in answering such questions, they mentally picture or “look at” named objects; that they mentally rotate or otherwise manipulate these objects; that they are able to zoom in or out; and that they combine imagined objects in a single visual picture (Kosslyn, Ganis, & Thompson, 2001). These subjective visual experiences are triggered proximally by verbal input.

Mental imagery, then, can be defined as experience resembling perceptual or motor experience occurring in the absence of the relevant external stimuli, in the case of perceptual experience; or without actual execution of motor actions, in the case of motor imagery. Imagery has played a critical role in most theories of mind, starting at least as early as Aristotle. Modern investigations of imagery have demonstrated that it is integral to conceptual knowledge (Barsalou, Simmons, Barbey, & Wilson, 2003) and recall (Nyberg et al., 2001), can work unconsciously (Barsalou, 1999), can be used productively to form new configurations (Barsalou & Prinz, 1997), and works by activating neural structures overlapping with (or a subset of) those used for perception and action (Ehrsson, Geyer, & Naito, 2003; Kosslyn et al., 2001).

Imagery has been argued in the literature on embodied cognition and especially cognitive linguistics to be critical to language. The shared central idea is that processing language activates internal representations of previously experienced events, or schematic abstractions over these (Lakoff, 1987; Langacker, 1987; Talmy, 2000). It is thus the (re)activation of modal (e.g., perceptual or motor) content associated with particular described scenes that serves as the “engine” of meaning. This mental simulation process has been argued to be useful in the production of detailed inferences on the basis of language input (Narayanan, 1997), to prepare the understander for situated action (Bailey, 1997; Barsalou, 1999; Glenberg & Kaschak, 2002), to build a situation model of a described scene (Zwaan, 1999), and to allow disambiguation (Bergen & Chang, 2005). In general, embodied approaches to language predict that understanding verbal input about events that can be perceived or performed will result in an individual's tacit and automatic mental enactment of corresponding motor or perceptual imagery.

And this is precisely what has been observed in a number of recent studies. When processing language, understanders appear to activate imagery pertaining to the direction of motion of a described object (Glenberg & Kaschak, 2002; Kaschak et al., 2005), the shape (Stanfield & Zwaan, 2001), and the orientation (Zwaan et al., 2002) of described objects; the rate and length of (fictive) motion (Matlock, 2004b); the effector used to perform an action (Bergen et al., 2004; Bergen et al., 2003); and the axis (horizontal vs. vertical) along which action takes place (Lindsay, 2003; Richardson et al., 2003).

In the remainder of this article, we concentrate on visual imagery evoked in response to natural language; in particular on the extent to which language triggers visual imagery of motion or location in the upper or lower part of the visual field. Visual imagery lends itself well to empirical study because, as will be made clear in the next section, it is relatively easy to assess. Moreover, it is well-suited to the study of how language drives imagery because language that describes upward or downward motion or location occurs pervasively within languages. Because different classes of words like nouns (1a) and verbs (1b) have spatial meanings, we can study how these different word types contribute to the construction of a mental simulation. Spatial language is also advantageous because it tends to be multifunctional—language that describes literal, physical motion like (1b) often also has figurative motion uses, where there is no literal motion of the described entity. Perhaps the most pervasive type of figurative motion is metaphorical motion (1c) in which an abstract event of some kind—in this case a change in quantity—is described with motion language. The multifunctionality of words denoting spatial motion allows us to investigate how the context of their use influences the manner in which words contribute to simulations.

(1) a. The ground/roof shook.
    b. The ant climbed/dropped.
    c. Stock prices climbed/dropped.

To develop a full account of how language drives mental imagery, we need to know what sorts of language (e.g., literal, figurative) result in what sorts of imagery, and what linguistic elements (e.g., nouns, verbs) trigger this imagery. The remainder of this section introduces the methodology used in this experiment and outlines previous work using this method.

1.2. Linguistic Perky effects

In a seminal study, Perky (1910) asked participants to imagine seeing an object (such as a banana or a leaf) while they were looking at a blank screen. At the same time, unbeknownst to them, an actual image of the same object was projected on the screen, starting below the threshold for conscious perception, but with progressively greater and greater illumination. Perky found that many participants continued to believe that they were still just imagining the stimulus and failed to recognize that there was actually a real, projected image even at levels where the projected image was perfectly perceptible to participants not simultaneously performing imagery.

Recent work on the Perky (1910) effect has shown that such interference of imagery on perception can arise not just from shared identity of a real and an imagined object, but also from shared location. Craver-Lemley and Arterberry (2001) presented participants with visual stimuli in the upper or lower half of their visual field while they were performing imagery either in the same region where the visual stimulus was or in a different region, or were performing no imagery at all. Participants were asked to say whether they saw the visual image, and were significantly less accurate at doing so when they were imagining an object (of whatever sort) in the same region than when they were performing no imagery or were performing imagery in a different part of the visual field.

A proposed explanation for these interference effects is that visual imagery makes use of the same neural resources recruited for actual vision (Kosslyn et al., 2001). In commonsense terms, if a particular part of the retinotopically arranged visual system is being used for one function (say, imagery), then it will be significantly less efficient at performing another incompatible function (say, visual perception) at the same time. Interference of visual imagery on visual processing can be naturally used to investigate whether language processing also drives imagery. Rather than asking participants to imagine visual objects, experimenters can ask participants to process language hypothesized to evoke visual imagery of a particular type—of particular objects with particular properties or of objects in particular locations. If language of this sort selectively activates visual imagery, then we should expect a Perky-type effect that results in interference of the visual properties implied by the language on processing of displayed visual images.

This is precisely the tack taken by Richardson et al. (2003). In their work, participants first heard sentences whose content had implied spatial characteristics and then very quickly thereafter performed a visual categorization task (deciding whether a presented image on the screen was a circle or a square), where the location of an object they were asked to categorize could overlap with the imagery the sentence would supposedly evoke or not. The researchers reasoned that if sentence understanding entailed visual imagery, then there should be Perky-like interference on the object categorization task—that is, people should take longer to categorize an object when it had visual properties similar to the image evoked by the sentence.

Specifically, Richardson et al. (2003) suggested that processing language about concrete or abstract motion along different axes in the visual field (vertical vs. horizontal) leads language understanders to conscript the parts of their visual system that are normally used to perceive trajectories with those same orientations. For example, a sentence like (2a) implies horizontal motion, whereas (2b) implies vertical motion. If understanders selectively perform vertical or horizontal visual imagery in processing these sentences, then when they are asked immediately after presentation of the sentence to visually perceive an object that appears in their actual visual field, they should take longer to do so when it appears on the same axis as the motion implied by the sentence. Thus, after (2a) (a horizontal-motion sentence), participants should take longer to categorize an object as a circle or a square when it appears to the right or left of the middle of the screen (on the horizontal axis) than it should take them to categorize an object when it appears above or below the middle of the screen (on the vertical axis).

(2) a. The miner pushes the cart. [Horizontal]
    b. The ship sinks in the ocean. [Vertical]

An additional point of interest here concerns the nature of the sentences used. The experimenters were interested in the spatial orientation not just of concrete verbs, like push and sink, but also abstract verbs, like respect and tempt. They wanted to determine whether abstract events, like concrete events, were selectively associated with particular spatial orientations. How abstract concepts are represented and understood is a critical question for all theories of meaning and understanding, but is particularly critical to simulation-based models, which rely on perceptual and motor knowledge. There are insightful discussions of how abstract concepts can be grounded in embodied systems elsewhere (Barsalou, 1999; Barsalou & Wiemer-Hastings, 2005; Glenberg & Robertson, 2000; Lakoff, 1987), and the topic is explored in more depth in section 5.

Richardson et al. (2003) took verbs, with associated horizontality–verticality and concreteness–abstractness ratings determined through a norming study (Richardson et al., 2001), and presented them to participants in the interest of ascertaining whether they would induce Perky-like effects on the categorization of visual objects (shapes). These objects were presented on the screen in locations that overlapped with the sentences' implied orientation. After seeing a fixation cross for 1 sec, participants heard a sentence; then, after a brief pause (randomly selected for each trial from among 50, 100, 150, or 200 msec), they saw a visual object that was either a circle or a square positioned in one of the four locations on the screen (right, left, top, or bottom). Their task was to press a button indicating the identity of the object (1 button each for “circle” and “square”) as quickly as possible.

(3) a. The miner pushes the cart. [Concrete Horizontal]
    b. The plane bombs the city. [Concrete Vertical]
    c. The husband argues with the wife. [Abstract Horizontal]
    d. The storeowner increases the price. [Abstract Vertical]

The results indicated a clear interference effect—participants took longer to categorize objects on the vertical axis after vertical sentences (as compared with horizontal sentences), and vice versa for objects on the horizontal axis. Intriguingly, post hoc tests (which Richardson et al. explicitly indicated were, strictly speaking, statistically unwarranted) showed that this interference effect was significant for abstract sentences but not for the concrete sentences (see section 6 for details).

It is important to underline at this point that the expected (and observed) effect was interference between language and visual perception using the same part of the visual field. This contrasts with other work (Glenberg & Kaschak, 2002; Zwaan et al., 2002), which has found facilitatory compatibility effects. Briefly, it appears that when the same cognitive resources are used for two tasks at the same time, as is believed to occur with the very short latency between sentence and object perception in the Richardson et al. (2003) task (50–200 msec), we observe interference. The explanation for this interference is that the same cognitive resources cannot be adequately used to perform two distinct tasks at the same time. It should be difficult then for a participant to use a particular part of their visual system to simultaneously imagine an object in a particular location in the imagined visual field and also perceive a distinct object in the same location of their real visual field if the two processes use the same parts of the visual system—the claim at the heart of the visual imagery hypothesis. By contrast, when there is enough time between the tasks for priming to take place, such as the 250 msec or more in studies like Glenberg and Kaschak (2002), Stanfield and Zwaan (2001), and Zwaan et al. (2002), facilitation is observed (Bergen, 2007; Kaschak et al., 2005).

Although the work reported by Richardson et al. (2003) provides key insights into the relationship between imagery and language, it also leaves several questions unanswered, which we explore in this article. First, why would abstract sentences but not literal sentences generate the expected Perky (1910) effect? No simulation-based account of language understanding, nor any other account of language understanding that we are aware of, would predict that abstract but not literal spatial language should yield perceptual imagery.

Second, Richardson et al.'s (2003) study was not designed to tell us what linguistic elements in the sentences were yielding the observed effects. The sentences used different argument structures, including both transitive and intransitive structures, and had subjects and objects whose own vertical or horizontal associations were not controlled for.

Third, when one takes a close look at the sentences appearing in the abstract condition, their verbs fall into varied semantic classes. The abstract category includes relatively abstract verbs like hope and increase as well as relatively concrete ones like argue and give. Moreover, with few exceptions, the nouns used in the sentences are almost entirely concrete, denoting people, physical objects, and places. It may be that even abstract verbs, when combined with concrete arguments, evoke imagery of concrete situations. For instance, the abstract horizontal sentence, “The husband argues with the wife,” might well yield imagery of a scene in which the two participants in the argument are arrayed horizontally, in the way that two people normally would when arguing. As a result, the question remains open what types of “abstract” verbs, combined with what types of arguments into abstract sentences, yield spatial imagery.

Fourth and finally, Richardson et al. (2003) intentionally conflated the up and down positions and the right and left positions. For example, both sentences in the following list (4) are in the Concrete Vertical condition, despite the fact that they describe movement in opposite directions. Although it could be that the entire imagined vertical axis is used to process both of these sentences, the absence of any significant effect for concrete sentences in Richardson et al.'s (2003) study suggests that there may be something more complicated going on. It could be instead that sentences describing downwards motion, like (4a), yield spatial processing in the lower part of the imagined visual field; whereas upward sentences, like (4b), do the same in the upper part of the imagined visual field. If so, then subsets of the stimuli in each of the concrete conditions would actually have imagery and objects in different parts of the visual field.

(4) a. The ship sinks in the ocean.
    b. The strongman lifts the barbell.

Thus, the current state of affairs still leaves open the three questions identified earlier. Namely, (a) what linguistic cues trigger mental simulation, (b) what sorts of language (literal, metaphorical, abstract) result in mental simulation, and (c) how detailed is the mental simulation?

2. Experiment 1: Upward and downward motion

Does language denoting literal motion in a particular direction drive visual imagery localized to the same part of the visual field? Our first experiment followed Richardson et al. (2003) but aimed to answer the outstanding questions of what linguistic elements drive simulation and how detailed it is. The design here controlled for the linguistic components of sentences and separated the vertical axis into distinct up and down regions. Based on prior work showing that the Perky (1910) effect is location specific (Craver-Lemley & Arterberry, 2001), we expected that people would take longer to identify objects in the upper or lower part of the visual field following sentences denoting scenes that canonically take place in the same locations.

To reduce the range of possible linguistic factors influencing imagery, we used bare intransitive sentences (sentences with only a subject noun phrase and a main verb). The verbs, as determined by a norming task, all denoted literal motion in a particular direction. This meant that only upward and downward motion could be used, as there are no verbs in English that denote rightward or leftward motion. All subject nouns in the critical sentences were determined through a norming study to be unassociated with upness or downness. Critical sentences thus fell into two directional conditions (up and down).

(5) a. The mule climbed. [Upward motion]
    b. The chair toppled. [Downward motion]

2.1. Method

Sixty-five native speakers of English participated in exchange for course credit in an introductory linguistics class at the University of Hawaii.

Participants wore headphones and sat in front of a computer screen. They heard sentences and looked at geometric shapes that were presented in one of four locations on the screen. They were instructed to quickly press one of two buttons to identify whether the shape was a square (by pressing “x”) or a circle (by pressing “z”). Each trial began with a fixation cross that appeared in the middle of the screen for 1,000 msec. Next, a sentence was presented auditorily, followed by an inter-stimulus interval of 200 msec (during which time the screen was blank). Then a circle or a square appeared in the top, bottom, left, or right part of the screen for 200 msec. All objects appeared the same distance from the fixation cross at the center of the screen, along a central axis (e.g., objects in the upper part appeared directly over the fixation cross).
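The trial sequence just described can be laid out as a simple event timeline. The following is only an illustrative sketch, not the software used in the experiment: the names `Trial` and `trial_events` are hypothetical, the sentence duration is an input that the text does not specify, and the sketch assumes that the sentence begins as soon as the fixation cross ends.

```python
from dataclasses import dataclass

# Durations and key mapping taken from the procedure described above.
FIXATION_MS = 1000   # fixation cross in the middle of the screen
ISI_MS = 200         # blank screen between sentence offset and shape onset
SHAPE_MS = 200       # how long the circle or square stays on screen
KEY_FOR_SHAPE = {"square": "x", "circle": "z"}
LOCATIONS = ("top", "bottom", "left", "right")


@dataclass(frozen=True)
class Trial:
    """One trial: a spoken sentence followed by a shape at a screen location."""
    sentence: str
    shape: str
    location: str


def trial_events(trial, sentence_ms):
    """Return (onset_ms, event) pairs for one trial.

    sentence_ms is the duration of the audio recording (a hypothetical input).
    Assumption: the sentence starts immediately after the fixation cross.
    """
    if trial.shape not in KEY_FOR_SHAPE:
        raise ValueError(f"unknown shape: {trial.shape}")
    if trial.location not in LOCATIONS:
        raise ValueError(f"unknown location: {trial.location}")
    sentence_onset = FIXATION_MS
    blank_onset = sentence_onset + sentence_ms
    shape_onset = blank_onset + ISI_MS
    shape_offset = shape_onset + SHAPE_MS
    return [
        (0, "fixation cross"),
        (sentence_onset, f"play sentence: {trial.sentence}"),
        (blank_onset, "blank screen"),
        (shape_onset, f"{trial.shape} at {trial.location}"),
        (shape_offset, "shape offset"),
    ]
```

For a hypothetical 1,500-msec recording of “The mule climbed,” the shape would appear 2,700 msec after trial onset and disappear 200 msec later.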

In critical trials, sentences denoted either upward motion or downward motion (5), and the object appeared in the upper or lower region. Filler trials were randomly interspersed. Some filler trials included a short yes–no comprehension question to ensure that participants attended to the meaning of the sentences. For instance, the filler sentence, “The branch split,” was followed by the question, “Did the branch break?” Filler trials included as many up- and down-related sentences as appeared in the critical trials, but all of these were followed by an object on the left or right. These up- and down-related filler sentences were selected from among the sentences discarded through the norming study.

The constraints imposed by this design, that only intransitive verbs denoting upward or downward motion could be used, translated into a relatively small number of candidate verbs. In English, there are only 5 to 10 verbs denoting either upward or downward motion. Because of the small number of possible verbs of each type, the entire list of sentences was presented twice to each participant—once followed by a shape in the upper region and once followed by a shape in the lower region of the screen. To ensure that there was distance between the two instantiations of each critical sentence, the experiment was broken into two halves, each of which contained all critical sentences in a random order. The order of the two halves was manipulated to create two lists. Participants were randomly assigned to one of these lists.
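The two-list construction can be sketched as follows. This is an illustration under stated assumptions rather than the procedure actually used: the function names are hypothetical, fillers are omitted, and the text does not say how it was decided which half pairs a given sentence with the upper shape, so that assignment is random here.

```python
import random


def build_halves(critical_sentences, rng):
    """Build two halves that each contain every critical sentence once.

    A sentence followed by an upper shape in one half is followed by a lower
    shape in the other. Which half gets which location is an assumption
    (random assignment); the text leaves this unspecified.
    """
    first, second = [], []
    for sentence in critical_sentences:
        location = rng.choice(["upper", "lower"])
        other = "lower" if location == "upper" else "upper"
        first.append((sentence, location))
        second.append((sentence, other))
    rng.shuffle(first)   # random order within each half
    rng.shuffle(second)
    return first, second


def build_lists(critical_sentences, seed=0):
    """Create the two presentation lists by swapping the order of the halves."""
    rng = random.Random(seed)
    half_a, half_b = build_halves(critical_sentences, rng)
    return {"list_1": half_a + half_b, "list_2": half_b + half_a}
```

Keeping both presentations of a sentence in different halves guarantees that the two are never adjacent within a half, which is the separation the design aims for.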

2.2. Norming

In constructing stimuli, we conducted a norming study to ensure that the critical sentences had several properties. First, for each type of sentence, we aimed to include sentences in the up condition that were no more or less meaningful than sentences in the down condition, and to have as little difference as possible in processing time between the two groups of sentences. Second, and more critically, we wanted to ensure that the sentences, which had only a subject and a verb, differed in terms of their upness or downness only because of one manipulated word. Therefore, the sentential subjects used in the critical sentences in this experiment were constrained to be equally neutral for their up–down associations (e.g., chair and donkey), whereas the verbs denoted significantly different up/down meanings (e.g., climb and descend).

A total of 57 native speakers of English from the University of Hawaii community participated in the norming study in exchange for credit in an introductory linguistics class. They performed three tasks. First, they completed a sentence reading task in which sentences were presented and participants were instructed to press a button as soon as they understood the meaning of the sentence. They were then asked to rate the meaningfulness of the sentence on a scale ranging from 1 (least meaningful) to 7 (most meaningful). Next they were given a list of words, either nouns or verbs, and were asked to rate them as to how strongly their meanings were associated with up or down—1 (the least up- or down-associated) to 7 (the most up- or down-associated). One group of participants rated only upness, the other only downness.

The critical stimuli in the upness or downness rating task included verbs that the experimenters hypothesized to denote motion events canonically moving upward or downward and nouns denoting objects canonically located above or below an observer's head, and the sentences in the reading and meaningfulness part of the norming study were constructed from these words. In addition, each group of participants saw one half of the proposed filler sentences, which were expected to be meaningful; and the other half with the verbs and participant nouns randomized across sentences, which were thus unlikely to be meaningful. Finally, each participant saw 15 sentences with transitive verbs used intransitively, which were also unlikely to be judged meaningful.

One participant was removed from the norming study analysis for having a mean reaction time (RT) more than 2 SDs greater than the grand mean. We also removed all trials with RTs less than 350 msec, as these sentences were unlikely to have been thoroughly understood.
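The two exclusion rules can be expressed compactly. The sketch below is illustrative only: `clean_norming_data` is a hypothetical name, and it assumes that the grand mean and SD are computed over participant means and that the participant criterion is applied before the fast trials are dropped, neither of which is specified in the text.

```python
from statistics import mean, stdev

MIN_RT_MS = 350   # trials faster than this are discarded
SD_CUTOFF = 2     # participants slower than grand mean + 2 SDs are removed


def clean_norming_data(rts_by_participant):
    """Apply the participant-level and trial-level exclusions.

    Assumptions: the grand mean and SD are taken over participant means
    (sample SD), and participants are screened on their untrimmed data.
    """
    participant_means = {p: mean(rts) for p, rts in rts_by_participant.items()}
    all_means = list(participant_means.values())
    ceiling = mean(all_means) + SD_CUTOFF * stdev(all_means)

    cleaned = {}
    for participant, rts in rts_by_participant.items():
        if participant_means[participant] > ceiling:
            continue  # mean RT more than 2 SDs above the grand mean
        cleaned[participant] = [rt for rt in rts if rt >= MIN_RT_MS]
    return cleaned
```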

In selecting sentences for the main experiment, we eliminated all sentences with extremely fast or slow RTs, low meaningfulness ratings, nouns with strong up or down associations, or verbs without strong up or down associations. This left five sentences in each critical condition.¹ The mean upness and downness ratings for the nouns selected for the main study are shown in Table 1. The nouns in the upward motion sentences were not significantly more up-related than those in downward motion sentences: F (1, 28) = 0.55, p = .47; nor were they significantly more down-related (although the effect here approached significance), F (1, 27) = 3.56, p = .07. Turning to the verbs, it was crucial that the verbs used in the two conditions differed from each other in terms of their upness and downness. Overall, verbs were classified as expected: The verbs in the two literal conditions differed significantly in their upness ratings, F (1, 28) = 117.65, p < .001; and their downness ratings, F (1, 27) = 134.54, p < .001.

Table 1. Results of norming studies in which participants rated nouns and verbs on upness and downness

                                   Nouns                               Verbs
Experiments            Up Avg       SD Down Avg       SD   Up Avg       SD Down Avg       SD
Experiment 1
    Down (Verb)          2.04     1.65     2.31     1.82     1.85     1.09     5.39     1.16
    Up (Verb)            2.12     1.76     2.00     1.48     5.18     1.43     2.35     1.40
Experiment 2
    Down (Noun)          1.99     1.72     4.61     2.18     2.14     1.31     2.06     1.35
    Up (Noun)            5.37     1.91     2.09     1.62     2.19     1.41     2.04     1.16
Experiment 3
    Down (Metaphor)      4.64     2.00     4.33     2.14     1.85     1.09     5.39     1.16
    Up (Metaphor)        4.45     2.01     4.34     2.09     5.18     1.43     2.35     1.40
Experiment 4
    Down (Abstract)      4.35     2.30     4.05     2.19     1.63     0.82     4.40     1.32
    Up (Abstract)        4.37       2.

Note. n = 28. Avg = average.

Also of interest are the mean reading times and meaningfulness ratings, shown in Table 2. Repeated-measures analyses of variance (ANOVAs) revealed a reliable difference in reading times, F (1, 28) = 12.39, p < .01; and a marginally significant difference in meaningfulness, F (1, 28) = 4.10, p = .05. Although it is certainly not ideal to have such differences between conditions, it was a necessary artifact of the design, as very few verbs exist in English that can denote intransitive upward motion. This will be discussed in more detail in the next section.

Table 2. Results of norming studies in which participants read sentences and rated them on a 7-point scale of meaningfulness

                      Reaction Time (msec)      Meaningfulness
                          Mean        SD        Mean        SD
Experiment 1
    Down (Verb)          1,515       631        6.16      0.81
    Up (Verb)            1,844       813        5.81      0.96
Experiment 2
    Down (Noun)          1,691       828        6.31      0.88
    Up (Noun)            1,554       624        6.48      0.88
Experiment 3
    Down (Metaphor)      1,970       832        5.41      1.04
    Up (Metaphor)        2,011     1,036        5.59      0.92
Experiment 4
    Down (Abstract)      1,932       875        6.13      0.80
    Up (Abstract)        1,811       806        6.12      0.75

Note. n = 28.

2.3. Results

Only participants who answered the sentence comprehension questions with at least 85% accuracy were included in the analysis; this criterion eliminated 1 participant. Another participant was excluded for answering the object categorization questions with only 79% accuracy. None of the remaining participants performed at less than 90% accuracy on the critical trials. Responses more than 3 SDs above or below a participant's mean were replaced with the value 3 SDs above or below that participant's mean.² This resulted in changes to less than 1% of the data.
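The outlier replacement step amounts to clamping each participant's RTs at 3 SDs from that participant's mean. The sketch below is one possible reading, not the analysis script itself: `clamp_outliers` is a hypothetical name, and it assumes a single pass in which the mean and sample SD are computed over all of the participant's responses, outliers included.

```python
from statistics import mean, stdev


def clamp_outliers(rts, n_sd=3.0):
    """Replace RTs beyond n_sd SDs of the participant's mean with the boundary.

    Returns the adjusted RTs and the number of values that were changed.
    Assumption: mean and sample SD are computed once, on the raw data.
    """
    center = mean(rts)
    spread = stdev(rts)
    low, high = center - n_sd * spread, center + n_sd * spread
    clamped = [min(max(rt, low), high) for rt in rts]
    n_changed = sum(1 for raw, new in zip(rts, clamped) if raw != new)
    return clamped, n_changed
```

Clamping rather than deleting keeps every trial in the design while limiting the pull of extreme responses on the cell means.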

The mean RTs for the literal sentences displayed in the first two data rows of Table 3 show a clear interaction effect of the predicted kind. Objects in the upper part of the visual field are categorized faster following literal down sentences than they are following literal up sentences, and the reverse is true for visual objects in the lower part of the visual field (although this latter effect does not appear to be as strong). A repeated-measures ANOVA by participants showed the predicted interference effect through a significant interaction between sentence direction (up or down) and object location (up or down), F (1, 63) = 5.03, p < .05; partial η² = 0.07. There were no significant main effects of sentence direction or object location. With only five items in each condition, it would be unrealistic to expect an ANOVA using items as a random factor to show significance. Moreover, because the set of stimuli in each condition effectively constitutes the population of relevant items, and is not a random sample from that population, it would not make sense to perform such an analysis in any case. As shown in Table 4, however, all up sentences had longer RTs in the Up Object condition than in the Down Object condition (by at least 30 msec), suggesting that the interference effect holds for all the Literal Up sentences. Similarly indicative of interference, three out of five of the Literal Down sentences had longer RTs in the Down than in the Up condition. Looking at the items individually, it seems that the interference effect is stronger with Literal Up sentences, which yielded much slower response times to objects in the upper position than those in the lower position.
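
For a 2 × 2 design in which both factors vary within participants, the interaction F with (1, n − 1) degrees of freedom equals the square of a one-sample t statistic computed on each participant's difference of differences. The sketch below illustrates that equivalence. It is not the authors' analysis code, and the four participants and their cell means are invented purely for illustration.

```python
from math import sqrt
from statistics import mean, stdev


def interaction_f(participants):
    """Return (F, df, mean contrast) for the 2 x 2 within-participant interaction.

    Each participant is a dict of mean RTs (ms) for the four cells:
    sentence direction (up/down) x object location (upper/lower).
    """
    # Positive contrast = slower responses when sentence direction and
    # object location match, i.e. the predicted interference pattern.
    contrasts = [
        (p["up_upper"] - p["down_upper"]) - (p["up_lower"] - p["down_lower"])
        for p in participants
    ]
    n = len(contrasts)
    t = mean(contrasts) / (stdev(contrasts) / sqrt(n))
    return t * t, (1, n - 1), mean(contrasts)


# Hypothetical data for four participants (not from the study).
participants = [
    {"up_upper": 620, "down_upper": 540, "up_lower": 530, "down_lower": 550},
    {"up_upper": 600, "down_upper": 560, "up_lower": 520, "down_lower": 540},
    {"up_upper": 580, "down_upper": 550, "up_lower": 540, "down_lower": 530},
    {"up_upper": 610, "down_upper": 530, "up_lower": 510, "down_lower": 550},
]

f_value, df, mean_contrast = interaction_f(participants)
print(f"F{df} = {f_value:.2f}, mean contrast = {mean_contrast} ms")
```

Reducing each participant to a single contrast score makes explicit that the test asks whether the sentence-direction effect differs between the upper and lower object locations.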

Table 3. Mean reaction time (RT) in milliseconds for object categorization in upper and lower quadrants of the screen

                          Object in Lower Quadrant      Object in Upper Quadrant
Experiments               Mean RT    SD     SE          Mean RT    SD     SE
Experiment 1
    Down (Verb)           551        255    32          542        240    30
    Up (Verb)             526        205    26          603        270    34
    Difference (msec)     +25                           −61
Experiment 2
    Down (Noun)           550        221    28          506        245    20
    Up (Noun)             508        218    30          526        247    22
    Difference (msec)     +42                           −20
Experiment 3
    Down (Metaphor)       516        283    23          532        228    25
    Up (Metaphor)         535        235    24          531        240    24
    Difference (msec)     −19                           +1
Experiment 4
    Down (Abstract)       589        230    29          575        222    28
    Up (Abstract)         593        268    33          600        317    40
    Difference (msec)     −4                            −25
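
The Difference rows in Table 3 are Down minus Up within each object location. As a rough check, the short Python sketch below recomputes them from the cell means and also subtracts the upper-quadrant difference from the lower-quadrant one to give a single interaction contrast per experiment. The combined contrast is not reported in the source; it is added here only as an illustrative summary, and the names TABLE3 and differences are hypothetical.

```python
# Cell means (ms) copied from Table 3, as
# (object in lower quadrant, object in upper quadrant).
TABLE3 = {
    "Experiment 1 (Verb)": {"down": (551, 542), "up": (526, 603)},
    "Experiment 2 (Noun)": {"down": (550, 506), "up": (508, 526)},
    "Experiment 3 (Metaphor)": {"down": (516, 532), "up": (535, 531)},
    "Experiment 4 (Abstract)": {"down": (589, 575), "up": (593, 600)},
}


def differences(cells):
    """Return (lower difference, upper difference, interaction contrast).

    Differences are Down minus Up sentences. A positive lower difference
    and a negative upper difference both indicate interference, so the
    contrast (lower minus upper) is positive in the predicted direction.
    """
    lower = cells["down"][0] - cells["up"][0]
    upper = cells["down"][1] - cells["up"][1]
    return lower, upper, lower - upper


for name, cells in TABLE3.items():
    lower, upper, contrast = differences(cells)
    print(f"{name}: lower {lower:+d}, upper {upper:+d}, contrast {contrast:+d}")
```

For Experiment 1 this reproduces the tabled differences of +25 and −61 msec, which combine to a contrast of 86 msec. The sign of the contrast alone says nothing about reliability, which depends on the participant-level variability tested in the ANOVAs.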

Table 4. Mean reaction time in milliseconds for object categorization in upper and lower quadrants of the screen for Up and Down sentences in Experiment 1, by sentence

Sentences                     Object Up    Object Down
    The Cork Rocketed.        645          458
    The Mule Climbed.         529          493
    The Patient Rose.         635          591
    The Lizard Ascended.      644          541
    The Dolphin Soared.       570          539
    The Glass Fell.           514          611
    The Chair Toppled.        605          625
    The Cat Descended.        399          578
    The Pipe Dropped.         588          492
    The Stone Sank.           614          456
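
The item-level claims made about Table 4 can be tallied directly from its values. The Python sketch below does so under one assumption that is not stated in the table itself: the first five sentences are treated as Up items and the last five as Down items, based on the meanings of their verbs. The function and variable names are hypothetical.

```python
# (sentence, RT with object up, RT with object down) in ms, from Table 4.
UP_SENTENCES = [
    ("The cork rocketed", 645, 458),
    ("The mule climbed", 529, 493),
    ("The patient rose", 635, 591),
    ("The lizard ascended", 644, 541),
    ("The dolphin soared", 570, 539),
]
DOWN_SENTENCES = [
    ("The glass fell", 514, 611),
    ("The chair toppled", 605, 625),
    ("The cat descended", 399, 578),
    ("The pipe dropped", 588, 492),
    ("The stone sank", 614, 456),
]


def interference(items, direction):
    """RT in the matching location minus RT in the mismatching location."""
    if direction == "up":
        return [up - down for _, up, down in items]
    return [down - up for _, up, down in items]


up_effects = interference(UP_SENTENCES, "up")
down_effects = interference(DOWN_SENTENCES, "down")

print("Up items showing interference:", sum(e > 0 for e in up_effects))
print("Smallest Up-item effect (ms):", min(up_effects))
print("Down items showing interference:", sum(e > 0 for e in down_effects))
```

Positive values mean slower categorization when the object appears in the location matching the sentence, which is the direction the interference account predicts.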

To deal with the problem of a small set of potential verbs, the design of this study presented each critical sentence once with the visual stimulus in the upper region and once with the visual stimulus in the lower region. Because the repetition of stimuli runs the risk of inducing carryover effects (e.g., participants develop different strategies for responding to stimuli they have seen already), we performed a post hoc analysis to determine whether such effects accounted for the results reported here. To do this, we analyzed the data from the first half of the experiment only, which included just the first presentation of each sentence. The results, seen in Table 5, are not statistically significant, F (1, 63) < 1, as might be expected given the low number of stimuli per condition per participant (2.5). However, the trend is in the same direction as the full results, suggesting that carryover effects were not responsible for the critical Perky-like interference effect we observed.

Table 5. Mean reaction time (RT) in milliseconds for object categorization in upper and lower quadrants of the screen, for the first half of Experiment 1 only

                          Object in Lower Quadrant      Object in Upper Quadrant
Category                  Mean RT    SE                 Mean RT    SE
First Half Only
    Down (Verb)           604        34                 561        28
    Up (Verb)             593        29                 626        39
    Difference (RT)       +11                           −65

2.4. Discussion

The significant interaction effect observed here with sentences denoting upward or downward motion leads to two important conclusions. The first involves the specificity of the imagery associated with these sentences. It has previously been argued (Richardson et al., 2003) that the axis of motion of a sentence is accessed during language processing. This study provides evidence that the spatial grain of visual imagery is in fact even more detailed than this. Because sentences denoting upward and downward motion selectively interfered with categorizing objects in the same part of the visual field, we can see that motion imagery in response to these sentences is specific to the location in which the content of the utterance would take place, not just the axis.

Second, unlike the post hoc report on Richardson et al.'s (2003) results, we observed a reliable interaction with concrete sentences denoting physical motion. This finding is more squarely in line with what is predicted by theories of perceptual simulation in language understanding: that literal language about space should be processed using those neurocognitive systems responsible for perceiving the same aspects of space. As proposed in the introduction, these results suggest that the lack of an effect for concrete sentences in Richardson et al. may have resulted from the conflation of the up and down directions into a single level. As we have seen here, sentences denoting upward motion result in interference in the upper part of the visual field. Thus, it would not be surprising if, when upward- and downward-oriented sentences are combined in a single condition, their effects cancelled each other out.

The effect we observed here was especially strong for sentences denoting upward motion. Why might upward motion sentences show a stronger effect than downward motion sentences? One plausible explanation is that the difference results from the slightly (although not significantly) greater time it took participants to process the upward motion sentences. Perhaps they had not completed the comprehension process at the point in time when the visual object was presented; in that case, continued sentence imagery would yield a greater interference effect.

Another possible explanation points to differences in the likelihood of the two types of events described. In everyday life, we often observe objects moving downward, even when there is no force acting on them. By contrast, we more rarely observe objects moving upward, especially without force overtly exerted on them. Because upward motion events without an external agent are less common in the world than equivalent downward events, individuals might have a need for greater simulation (more time, more effort) in the case of upward motion. This would result in greater interference with visually categorizing objects in the upper part of the visual field.

Regardless of the details of this effect, the crucial manipulation that yielded it was the use of verbs that were strongly associated with upward or downward motion. From the simulation-based perspective, the effects are perfectly predictable because verbs of motion are supposed to indicate processes and relations holding of entities. What would happen, though, if nouns were manipulated while verbs were held constant? Do nouns denoting objects that are canonically associated with the upper or lower part of the visual field yield the same sort of interference? This is the topic of the next study.

3. Experiment 2: Up- or down-associated nouns

In Experiment 1, we found a significant interference effect when a motion verb in a sentence denoted movement in a particular direction and a visual object that was subsequently categorized appeared in the same part of the visual field. In this study, we investigate whether the same effect can be produced by manipulating the subject noun alone.

Recent work on visual imagery during language understanding has demonstrated that mentioned objects are represented with a good deal of visual detail. In work in a paradigm different from the current one, Stanfield and Zwaan (2001) and Zwaan et al. (2002) had participants read sentences, then name or make a judgment about an image of an object that had been mentioned in the sentence. They found that implied orientation of objects in sentences like the following (6) affected how long it took participants to perform the object judgment task. Participants took longer to respond to an image that was incompatible with the implied orientation or shape of a mentioned object. For example, reading a sentence about a nail hammered into a wall primed the horizontal nail image, as contrasted with a sentence about a nail hammered into the floor. Similar results were found for shape of objects, such as a whole egg versus a cracked egg in a pan. These results imply that shape and orientation of objects are represented in mental imagery during language understanding.

  • (6) a. The man hammered the nail into the floor.

  • b. The man hammered the nail into the wall.

People also seem to mentally represent the locations of objects in space. Eye-tracking evidence from narrative comprehension shows that listeners looking at a blank screen tend to look at those locations in space where mentioned objects and events would appear both during comprehension (Spivey & Geng, 2001) and recall (Johansson, Holsanova, & Holmqvist, 2005). These studies, along with earlier work on mental models (e.g., Bower & Morrow, 1990), show that when objects are described as appearing in particular locations, this spatial location is represented in an analogue fashion. However, it is not yet known whether the location where an object is canonically found (e.g., above or below an observer) is automatically engaged as part of the mental simulation evoked by an utterance.

The question of whether nouns that denote objects which happen to be canonically located in up or down locations can yield perceptual interference effects is crucial to understanding what factors make an utterance likely to produce visual simulations with particular properties. If nouns themselves can trigger imagery in the upper or lower part of the visual field, then this could potentially help to explain some of the effects reported by Richardson et al. (2003).

3.1. Method

A total of 63 students from the same population described in Experiment 1 (who had not participated in Experiment 1) participated in this study. The method was globally identical to that in Experiment 1, with the exception of the critical sentences. In this experiment, participants listened to critical sentences whose subject nouns were canonically associated with upness or downness and whose verbs were vertically neutral (no upness or downness)—for example, “The cellar flooded,” and “The ceiling cracked.” The sentences were constructed from items selected from the norming study described in Experiment 1. In the norming study, the Up and Down sentences showed no significant difference in RT: F (1, 27) = 0.89, p = .35; or in meaningfulness: F (1, 27) = 2.60, p = .12 (see Table 2).

Moreover, the verbs in the two noun conditions did not differ significantly in either their upness ratings, F (1, 28) = 0.13, p = .72; or their downness ratings, F (1, 27) = 0.01, p = .93 (see Table 1). By contrast, the nouns in the up versus down sentences were highly differentiated in terms of upness: F (1, 28) = 215.16, p < .001; and downness: F (1, 27) = 132.31, p < .001. These norming results serve to ensure that any interference effects observed on the object categorization task would result from the differences in the up or down associations of nouns alone, not from differences between the verbs.

3.2. Results

Response times from two participants whose mean response times fell more than 2 SDs above the mean for all participants were removed. In addition, response times for two other participants were removed for answering the comprehension questions with less than 80% accuracy. In the remaining data set, responses more than 3 SDs from each participant's mean RT were replaced with values 3 SDs from their mean. This resulted in the modification of less than 1% of the data.

Considering only correct responses, the means were as shown in Table 3. As with the verb manipulation in Experiment 1, there was interference in the predicted direction between sentence direction and object location. Indeed, a repeated-measures ANOVA by participants showed a significant interaction between object location and sentence direction, F (1, 58) = 5.76, p < .05; partial η² = 0.09. There were no significant main effects of object location or sentence direction. Again, there were too few items to expect an item analysis using ANOVA to yield significant results, but looking at them individually (Table 6), we see that almost all of the sentences with down-associated subject nouns yielded faster categorization when the subsequent object appeared in the upper part of the visual field. It is interesting to note that the one exceptional sentence in this group, “The submarine fired,” might be construed as encoding upward movement; that is, when submarines fire ballistic missiles rather than torpedoes, they typically fire upward. The sentences with up-related subject nouns showed the opposite tendency, as predicted. Namely, the majority yielded faster response times to the categorization task when the object appeared in the lower part of the screen.

Table 6. Mean reaction time in milliseconds for object categorization in upper and lower quadrants of the screen for Up and Down sentences in Experiment 2, by sentence

Sentences                     Object Up    Object Down
Noun Down
    The Cellar Flooded.       478          511
    The Grass Glistened.      515          568
    The Ground Shook.         533          708
    The Shoe Smelled.         457          484
    The Submarine Fired.      547          474
Noun Up
    The Ceiling Cracked.      515          486
    The Rainbow Faded.        592          412
    The Roof Creaked.         538          609
    The Sky Darkened.         506          472
    The Tree Swayed.          479          561

3.3. Discussion

The striking finding from this study is that sentences with subject nouns that are canonically associated with upness or downness selectively interfere with the visual processing of objects in the same parts of the visual field. This is in line with other work on visual imagery associated with objects in sentence understanding, which shows that both the orientation (Stanfield & Zwaan, 2001) and shape (Zwaan et al., 2002) of objects are primed by sentences that imply those particular shapes or orientations for objects.

Note that unlike the sentences with verbs denoting upward or downward motion described in Experiment 1, the sentences with up- or down-associated nouns did not display an asymmetry between a strong effect in up sentences and a small effect in down sentences. This is consistent with either of the explanations given there: that the asymmetry in Experiment 1 was due to a difference in processing times between the sentences (which was not seen in the norming data for the sentences in Experiment 2), or that it arose from the unusualness of intransitive upward motion (the sentences in Experiment 2 did not encode upward or downward motion so much as up or down location). Either of these accounts would predict the asymmetry to disappear in this second study. In agreement with this prediction, we can see that the effect is not stronger for up sentences than down ones; in fact, the tendency seems to be weakly in the opposite direction.

Further, it is worth noting that the interference effect was observed in both Experiments 1 and 2, despite substantial differences between them. Sentences in Experiment 1 (e.g., The mule climbed) denoted dynamic motion events, whereas in Experiment 2 sentences (e.g., The grass glistened) described a static object canonically found in a particular location. We might expect to find a greater interference effect for the first experiment if a sentence denoting motion was paired with motion of an incompatible object observed on the screen, and work in such a vein has shown compatibility effects of apparent motion toward or away from the participant (Zwaan, Madden, Yaxley, & Aveyard, 2004). An additional difference between the experiments involved whether the upness or downness of the sentence was carried by the noun or verb, grammatical classes that have been noted (Kersten, 1998) to be differently associated with motion. And yet, the two studies showed the same global interference effect, suggesting that it is a matter of the interpretation of the scene described by the sentences as a whole, rather than the contributions of individual words in the sentence, that drives the interference.

Despite the reliability of the interference effect shown in these first two studies, we have not yet conclusively shown that the mental imagery is driven by the processing of an entire sentence. The effects we have observed so far could instead result from some sort of strictly lexical process. Perhaps the lexical representations for words like ceiling and rise share a common feature [+UP], and it is this feature, rather than a dynamic simulation of the utterance's content, that is causing the interference effects. Granted, one might be more likely to anticipate facilitatory priming on this lexical semantic feature account, but inhibitory lexical effects are also observed in certain cases. To eliminate the possibility that the effect is simply lexical, a third experiment used the same set of verbs described in the first study but with subject nouns that could not literally move up or down. Finding no interaction effect here would suggest that the interference was a result of sentence interpretation and not simply lexical semantics.

4. Experiment 3: Metaphorical motion

Language about motion in a direction, or about objects located in a given location, yielded significant interference on a visual perception task in the first two studies. To investigate whether this effect was the result of lexical or sentential interpretation, we performed a third experiment testing whether sentences that included motion verbs but did not denote literal motion would also interfere with object categorization.

Verbs of motion can be used cross-linguistically to describe events that do not involve literal motion, such as fictive motion (7a and 7b; Matlock, 2004a; Talmy, 2000) and metaphorical motion (7c and 7d; Lakoff & Johnson, 1980):

  • (7) a. The drainpipe climbs up the back wall of the house.

  • b. Starting at the house, the fence drops down quickly to the ocean.

  • c. Oil prices climbed above $51 per barrel.

  • d. Mortgage rates dropped further below 6 percent this week.

The interpretation processes involved in understanding figurative language have been a matter of significant research and debate. Some work has demonstrated that language users access internal representations of space and motion when performing reasoning tasks about abstract concepts understood metaphorically in terms of these concrete notions (Boroditsky, 2000; Boroditsky & Ramscar, 2002; Gibbs, Bogdonovich, Sykes, & Barr, 1997). Moreover, there is limited evidence that processing connected discourse using metaphor proceeds most quickly when conventional metaphorical expressions are used (Langston, 2002). However, we do not yet know whether simply processing metaphorical motion language makes use of spatial representations. Critically, if the effect observed above in the first two experiments is simply lexical or if figurative language yields the same visual imagery that literal language does, then we should expect to see no difference when the same experiment described above is conducted with figurative upward or downward motion sentences rather than literal ones. However, if the effect observed in the previous experiments is due to the interpretation of the sentence—where a participant mentally simulates the described scene—and does not simply result from the lexical semantics of constituent words (and if figurative language differs in some ways from literal language interpretation), then we expect to see a significant decrease in the interference effect with metaphorical sentences. In the most convincing scenario, we would observe the significant interference effect triggered by literal sentences to disappear entirely with figurative ones.

4.1. Method

All the motion verbs used in the first study on literal sentences (section 2) can also be used to describe changes in quantity or value of entities that do not have physical height, such as oil prices or mortgage rates (7c and 7d). Thus, to create metaphorical sentences, we combined subjects such as rates and prices with the same motion verbs used in the first experiment. The sentences were normed as described in section 2.2. The up and down metaphorical sentences showed no significant difference in RT, F (1, 27) = 0.07, p = .79; or in meaningfulness, F (1, 27) = 0.97, p = .33 (Table 2). The nouns in metaphorical up versus down sentences were not rated differently in upness: F (1, 28) = 1.21, p = .28; or in downness: F (1, 27) = 0.003, p = .95; whereas the verbs were, as seen in Table 1.

In all respects other than the critical stimuli, the experiment was exactly as described earlier, and was in fact run together with Experiment 2.

4.2. Results

As can be seen from Table 3, by contrast with the literal verb and noun sentences, there was no significant interaction between sentence direction and object location with the metaphorical sentences, F (1, 58) = 0.43, p = .52; partial η² = 0.01; nor were there significant main effects of object location or sentence direction. The analysis of items (Table 7) reveals the same absence of interference: More sentences in the down condition yielded faster response times when the object was in the lower half of the visual field, and the reverse was true for metaphorical up sentences. Both of these tendencies were the reverse of the predicted direction of the Perky (1910) effect.

Table 7. Mean reaction time in milliseconds for object categorization in upper and lower quadrants of the screen for Up and Down sentences in Experiment 3, by sentence

Sentences                     Object Up    Object Down
Metaphorical Down
    The Market Sank.          576          478
    The Percentage Dropped.   570          518
    The Quantity Fell.        491          490
    The Rates Toppled.        473          493
    The Ratio Descended.      548          600
Metaphorical Up
    The Amount Rose.          494          601
    The Cost Climbed.         581          482
    The Fees Ascended.        568          476
    The Numbers Rocketed.     517          593
    The Rating Soared.        492          523

4.3. Discussion

The absence of an interference effect in the metaphorical sentences confirms that the effects observed in Experiments 1 and 2 were the result of sentence interpretation and not just of the activation of lexical semantics. The verbs in Experiments 1 (literal motion sentences) and 3 (metaphorical sentences) were the same, and the subject nouns in the two sentence conditions in each experiment did not differ reliably in their up–down ratings. Consequently, the presence of interference effects in the literal sentences must result from understanding processes applied to the sentences as a whole.

A second notable finding here is that metaphorical sentences are not processed the same way as their literal counterparts with respect to visual imagery. This is initially surprising because many studies have shown that a literal source domain is in fact activated during the processing of metaphorical language (Boroditsky, 2000; Boroditsky & Ramscar, 2002; Gibbs et al., 1997). However, these results are not inconsistent because all that the current study indicates is that metaphorical and literal motion language differ in terms of their use of visual imagery at a particular point in time during sentence comprehension. It is possible that the sentences used would in fact trigger visual imagery, just with a different time course; or, for that matter, different intensity or variability than the literal language. One obvious avenue of research would be to apply eye-tracking techniques used for the closely related case of fictive motion (e.g., The road runs through the woods; Matlock & Richardson, 2004; Richardson & Matlock, 2007) to metaphorical language like the sentences used in this experiment. However, we must leave this question open for further investigation.

The results from the first two experiments suggest that literal sentences of different types give rise to visual imagery. Therefore, we turn to the question of abstract motion sentences. Richardson et al. (2003) reported a significant interference effect for abstract sentences but none for concrete sentences. By contrast, as we have seen, the current study (which differed in terms of the composition of the sentences and the manipulation of the spatial dimension) did yield interference with literal sentences. What is the relation between the visual imagery performed for literal and abstract motion language?

5. Experiment 4: Abstract verbs

This experiment tested whether abstract sentences produce location-specific interference on a visual categorization task. Our abstract sentences, like the metaphorical sentences in Experiment 3, denoted changes in quantity but did so using verbs that did not also have a concrete meaning denoting change in height (verbs such as increase and wane). Embodied accounts of conceptual representation and language understanding (Barsalou, 1999; Glenberg & Robertson, 2000; Lakoff, 1987) argue that all concepts, whether concrete or abstract, are ultimately grounded in terms of embodied individual human experience in the world. The grounding of concrete concepts can be straightforwardly accounted for in terms of the perceptual, motor, and perhaps even affective content of experiences an agent has when dealing with instances of them. Indeed, the evidence from the first two experiments in the current work indicates that understanding language about motion in a particular direction or about an object canonically located in a particular place involves accessing the perceptual correlates of perceiving the described scene. It might similarly be argued that abstract concepts like changes in quantity or value can be grounded in terms of changes in physical location. This is precisely what is suggested by Richardson et al.'s (2003) finding that abstract sentences yield interference on object categorization.

An embodied account of abstract language might further argue that our understanding of abstract concepts like change in quantity is based on our experience with concrete, tangible domains like change in physical height, because the two are systematically correlated in experience (Grady, 1997; Lakoff & Johnson, 1980). Indeed, much of the time when we experience a change in quantity or compare or evaluate quantity of physical entities, physical height correlates with quantity. For example, when water is poured into a glass, the increase in the amount of water goes along with the increase in height of the waterline, and the same is true of masses and piles of things. Thus, our understanding of abstract notions like quantity could be inextricably linked to their perceptual or motor correlates. Perhaps, when we deal with abstract concepts like quantity, even when applied to non-physical entities, we still engage our perceptual systems in reflection of their tight coupling with abstract notions in experience. More specifically, perhaps change of quantity verbs activate visual up–down imagery in the same way literal change of height verbs do.

5.1. Method

Abstract verbs were selected from a single semantic field. All verbs expressed a change in quantity—either an increase, such as increase and double; or a decrease, such as decrease and lessen. They only encoded change in quantity (and could not independently denote change in height), using language primarily associated with quantity (i.e., non-metaphorical abstract motion). Sentences were constructed using these abstract verbs along with sentential subjects that denoted abstract quantifiable entities, drawn from the same group as those used with the metaphorical sentences in Experiment 3. This yielded sentences like those in the following:

  • (8) a. The figures doubled. [Abstract Up]

  • b. The percentage decreased. [Abstract Down]

Because the abstract verbs used here do not denote any literal upward or downward motion, it is critical to determine that they are nevertheless strongly associated with the vertical axis. In the norming study, where participants were asked to rate verbs for upness or downness, they systematically assigned verbs denoting increases, like increase and double, high Up ratings and verbs denoting decreases high Down ratings. Indeed, the verbs in the two abstract conditions were significantly different from each other in upness rating, F (1, 28) = 86.49, p < .001; and downness rating, F (1, 27) = 149.78, p < .001. By contrast, the nouns in abstract up versus down sentences were not rated differently in upness: F (1, 28) = 0.03, p = .87; or in downness: F (1, 27) = 0.07, p = .79 (Table 1). Abstract sentences in the two conditions showed no significant difference in RTs: F (1, 28) = 1.54, p = .23; or in meaningfulness ratings: F (1, 28) = 0.01, p = .94.

The experiment was conducted using the same method as those described previously, and was run together with Experiment 1.

5.2. Results

By contrast with the literal up and down sentences, the means for the abstract sentences show no interference effect (Table 3). Indeed, a participant analysis of RTs following abstract sentences showed no significant interaction of sentence direction with object location, F (1, 63) = 0.13, p = .72; partial η² = 0.002. There were no significant main effects of sentence direction or object location either. The individual items in the abstract condition (Table 8) did not display the polarization seen in the responses to individual items in the literal sentences in Experiments 1 and 2: the same number of abstract down sentences and abstract up sentences (3 out of 5 in each case) yielded longer response times when the object was displayed in the upper part of the visual field.
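
The partial η² values reported for the four interaction tests can be recovered from the F ratios and their degrees of freedom, because partial η² = (F × df_effect) / (F × df_effect + df_error). The Python sketch below applies this standard identity to the statistics quoted in the Results sections of Experiments 1 through 4. The function name and the layout of the REPORTED list are assumptions made for illustration; the numbers themselves come from the text.

```python
def partial_eta_squared(f_value, df_effect, df_error):
    """Partial eta squared recovered from an F ratio and its degrees of freedom."""
    return (f_value * df_effect) / (f_value * df_effect + df_error)


# (label, F, df_effect, df_error, reported partial eta squared),
# taken from the interaction tests reported in the text.
REPORTED = [
    ("Experiment 1 (literal verbs)", 5.03, 1, 63, 0.07),
    ("Experiment 2 (nouns)", 5.76, 1, 58, 0.09),
    ("Experiment 3 (metaphor)", 0.43, 1, 58, 0.01),
    ("Experiment 4 (abstract)", 0.13, 1, 63, 0.002),
]

for label, f_value, df1, df2, reported in REPORTED:
    computed = partial_eta_squared(f_value, df1, df2)
    print(f"{label}: computed {computed:.4f}, reported {reported}")
```

All four computed values round to the reported ones, so the effect sizes are internally consistent with the F statistics and participant counts given for each experiment.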

Table 8. Mean reaction time in milliseconds for object categorization in upper and lower quadrants of the screen for Up and Down sentences in Experiment 4, by sentence

Sentences                       Object Up    Object Down
Abstract Down
    The Ratio Lessened.         593          507
    The Quantity Dwindled.      549          505
    The Indicators Weakened.    647          578
    The Percentage Decreased.   583          700
    The Value Diminished.       504          630
Abstract Up
    The Fees Expanded.          670          592
    The Rating Improved.        642          595
    The Price Redoubled.        637          589
    The Figures Doubled.        540          556
    The Numbers Increased.      515          640

5.3. Discussion

Despite being systematically associated with upness or downness, the abstract verbs used in this experiment did not yield selective interference on the object categorization task. This provides further evidence that the outcomes of Experiments 1 and 2 did not result simply from lexical attributes of the constituent words in the sentences (something like a [+UP] or [+DOWN] feature). The abstract up verbs were strongly up-associated, and the abstract down verbs were strongly down-associated, at least as measured by the norming data; yet these aspects of their semantics were not sufficient for them to interfere with visual object categorization. There is a straightforward explanation for the presence of an interference effect in the first two studies and its absence in the last two. Namely, the scenes described by the first two involved actual events occurring in one location or the other, whereas those described by the last two did not. It would thus seem to be the construction of a mental representation of the described scene, rather than purely lexical semantics, that drives the measured interference effect.

Given the finding in this fourth study, that abstract language about change in quantity does not trigger visual imagery as measured by interference on visual perception, we are left without an answer to the question of how abstract language is understood and, more generally, how abstract concepts are represented. Indeed, there is a great deal of variability in experimental results pertaining to the processing of abstract and metaphorical language. Although there are reliable spatial effects during abstract language processing in orientation judgment (Richardson, Spivey, & Cheung, 2001) and Perky-type tasks by axis (Richardson et al., 2003, Experiment 1), spatial effects are not observed in a Perky-type task by location (our Experiment 4) or in a picture recall task (Richardson et al., 2003, Experiment 2).

Despite this variability in experimental results, it has been widely suggested that we base abstract thought and language on concrete thought and language (Barsalou, 1999; Barsalou & Wiemer-Hastings, 2005; Lakoff, 1987). For instance, change in quantity is understood in terms of change in height. This study shows that it is not straightforwardly the case that a particular abstract domain is processed in exactly the same way as the concrete domain it is supposedly related to. Of course, this should not be particularly surprising. If individuals understanding abstract language enacted mental imagery that was not qualitatively different from imagery performed during literal language processing, this would be a confusing state of affairs for comprehenders indeed. Because we know that in understanding language, people are not prone to confusing changes in quantity of abstract numbers with change in height of physical objects, the processing of these different domains must differ in some ways.

It remains to be seen exactly what processes underlie abstract language understanding, but the absence of an interference effect observed here does not imply that the embodied account of abstract language understanding and abstract concept grounding is incorrect. There may be other factors that obscure a measurable interference effect with abstract sentences; these are considered in Section 6. A key finding of this experiment, however, is that where Richardson et al.'s (2003) earlier work showed that abstract sentences yield interference effects on categorizing objects in the same axis, we found no effect of abstract sentences on categorizing objects in the same location. In addition, the results of Experiment 1 showed significant effects for literal concrete sentences, but Richardson et al.'s concrete sentences appeared not to produce significant effects, albeit in statistically unlicensed post hoc tests. In the last study, we consider possible explanations for these divergences and test the idea that the differences lie in the detail of the mental imagery driven by concrete versus abstract language.

6. Experiment 5: Abstract verbs and nouns

Although the present work and Richardson et al.'s (2003) differed along several dimensions, the most obvious one is the assignment of sentences to different conditions. The original study took upward- and downward-directed sentences as belonging to the same condition (contrasted with horizontal sentences) and categorized all responses to objects appearing either in the upper or the lower part of the screen as belonging to the same condition (contrasted with right- or left-appearing objects). In other words, the sentence and image stimuli were specific to the axis of concrete or abstract motion. By contrast, the current study pulled apart the up and down conditions in sentences and object responses. This offers a straightforward explanation for the difference in responses to literal sentences in the two experiments.

Given that we have seen in this work that literal up sentences interfere with visual processing in the upper part of the screen, and down sentences interfere with the lower part of the visual field (Experiments 1 and 2), it is not at all surprising that grouping all these responses together (as was done in Richardson et al., 2003) would eliminate any effects. After all, up sentences (possibly about one half of the sentences in the vertical condition) would result in slower responses to objects in the upper part of the screen (one half of the objects in that condition), whereas down sentences (the remaining sentences in that same condition) would interfere with the other half of the object stimuli—those in the lower position. The two effects could cancel each other out, resulting in no significant effect. By comparison, this study, which investigated not just axes but more particularly locations along those axes, did not see such effects obscured, and the results were thus clearly significant for concrete sentences.
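
The arithmetic behind this argument can be made concrete with a small worked example. The sketch below is only an illustration under assumptions of our own: the baseline RT, the size of the interference, and the optional speed-up at the mismatching location are hypothetical numbers, not estimates from either study. It computes the difference between the collapsed vertical-axis mean and a horizontal baseline when matching locations are slowed.

```python
def collapsed_axis_effect(baseline, interference, facilitation=0.0):
    """Vertical-axis mean minus horizontal baseline after collapsing up and down.

    All arguments are hypothetical values in msec (assumptions, not data):
    baseline      RT when sentence and object location are unrelated
    interference  slow-down when sentence direction matches object location
    facilitation  speed-up, if any, when direction and location mismatch
    """
    match = baseline + interference      # up/up and down/down trials
    mismatch = baseline - facilitation   # up/down and down/up trials
    vertical_mean = (match + mismatch) / 2
    return vertical_mean - baseline


# Location-specific slow-down of 30 msec, mismatching trials unaffected.
print(collapsed_axis_effect(550, 30))
# Same slow-down, with mismatching trials sped up by the same amount.
print(collapsed_axis_effect(550, 30, facilitation=30))
```

Under the first assumption, the collapsed difference is half the location-specific effect (15 msec here); under the second, it vanishes entirely. Either way, an analysis by axis would be expected to be less sensitive to a location-specific effect than an analysis by location.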

By contrast, there are several candidate explanations for why abstract sentences showed a significant interference effect by axis in the original study (Richardson et al., 2003) but no location-specific interference in our Experiment 4. The most prominent one is based on this same structural difference between the experiments: placing up and down in different conditions versus collapsing them into a single vertical axis condition. Perhaps, as Richardson et al. showed, abstract sentences do trigger mental imagery, but imagery that is not specific to particular locations so much as to axes; that is, abstract language imagery may be less spatially precise, while still retaining an imagistic component. This would explain why abstract language yields measurable interference effects when up and down are collapsed together and the entire vertical axis is treated as a condition. It would also explain why a study like Experiment 4, in which objects located in the upper and lower regions are placed in separate conditions, would show no such interference: the abstract motion sentences are not incompatible with any of the presented objects, all of which appear in the vertical axis.

Some support for this account comes from evidence that axes and specific locations are represented distinctly in the human cognitive system (Logan & Sadler, 1996). Carlson-Radvansky and Jiang (1998) have shown that individual words like above may activate an entire axis, presumably as contrasted with location-specific words like up. McCloskey and Rapp (2000) have similarly shown that axis and direction can dissociate in particular neurological disorders. A participant they studied had lost the ability to ballistically reach for targets (thus, had lost location specificity) but preserved the ability to interact with the correct axis along which the object was located. Similarly, Landau and Hoffman (2005) have shown that children with Williams Syndrome have difficulties with direction but not axis of orientation. Thus, it is reasonable to conclude that object location may be represented separately from axis of orientation, and as such the two different systems might be available to be recruited separately by concrete versus abstract language processing.

We tested this explanation using the same methodology as in Experiment 4, except that the critical abstract sentences were now followed by objects appearing not only in the upper and lower parts of the screen, but also on the right and left. This required us to double the number of abstract up and down sentences using the same template as in Experiment 4. If we found an effect of axis but not quadrant—that is, if abstract sentences yielded slower response times to object categorization in the upper and lower parts of the screen than in the left and right parts—this would replicate Richardson et al.'s (2003) findings and support the hypothesis that abstract sentences are simulated with less detail than concrete ones.

6.1. Method

Although our main focus was on abstract sentence processing, we also included metaphorical and noun-based sentences as controls, along with filler items. Each participant saw each of the three types of sentences. The concrete verb-manipulated sentences were not included, as this would have led to excessive repetition of verbs in the verb-manipulated and metaphorical conditions.

The original sets of sentences used in the first four experiments included only five verbs for each condition, with each sentence that used these verbs repeated twice for each participant. In order to present targets in each of the four quadrants of the screen, we needed to increase our stimulus set. We increased the number of verbs in each condition from five to eight, selecting an additional three verbs (or nouns) from those having the highest ratings in upness or downness from the previous norming study described in Experiment 1. We then doubled the number of stimuli for each condition by using each verb twice but with a different noun for the metaphorical and abstract conditions, and each noun twice for the noun sentences, with a different verb. An example abstract sentence pair is shown in (9). The verb failed was rated as strongly downward associated. Unlike in the previous studies, participants saw each sentence (e.g., 9a or 9b) only once.

  (9) a. The argument failed.
      b. The policy failed.

Unbiased nouns for the metaphorical and abstract sentences, and unbiased verbs for the noun sentences, were chosen from the norms to have low ratings for upness or downness. We also included a few words that were not in the original norms, in order to construct new intransitive sentences that made sense. When this was done, care was taken not to include words that had an intuitively obvious association with the vertical or horizontal axes. The list of the abstract sentences used in this experiment is included in the appendix.

The presentation of stimuli was globally the same as in Experiments 1 through 4. However, in those experiments, only filler sentences preceded visual targets appearing in the left or right regions of the screen, whereas in this experiment horizontal object presentation followed critical experimental sentences. This experiment used one list, with the pairing of sentence type to item target randomly assigned for each participant, but with each of the four possible target locations (up, down, left, or right) appearing with equal frequency for each sentence type within participants.
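
The balancing constraint just described (random pairing, but each target location equally frequent within a sentence type) can be sketched as follows. This is a minimal illustration in Python rather than the E-Prime procedure actually used; the function name, the seeded random generator, and the placeholder sentence labels are all assumptions of ours.

```python
import random
from collections import Counter

LOCATIONS = ("up", "down", "left", "right")


def assign_locations(sentences, rng):
    """Randomly pair sentences of one type with target locations,
    using each location equally often (hypothetical helper)."""
    if len(sentences) % len(LOCATIONS) != 0:
        raise ValueError("sentence count must be a multiple of 4")
    repetitions = len(sentences) // len(LOCATIONS)
    slots = list(LOCATIONS) * repetitions
    rng.shuffle(slots)  # a fresh random order for each participant
    return dict(zip(sentences, slots))


# Placeholder labels standing in for the 16 sentences of one type.
sentences = [f"sentence_{i}" for i in range(16)]
pairing = assign_locations(sentences, random.Random(0))
print(Counter(pairing.values()))
```

Shuffling a pre-balanced list of locations, rather than drawing a location independently for each sentence, is what guarantees equal frequencies within every participant.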

Responses were collected using an E-Prime button box instead of the keyboard used in Experiments 1 through 4. Sentences were recorded by a native speaker of British English.

6.2. Results

Fifty native speakers of English from the University of Sussex community took part, in exchange for course credit in a research methods class. All participants had above 85% accuracy in target discrimination and 88% accuracy in the questions testing comprehension. Outlier removal was the same as in Experiment 1.

The RTs for the left and right target locations were collapsed together in the analysis as the horizontal axis, and the up and down targets formed the vertical axis. If abstract sentences yield mental imagery along the entire vertical axis, we should see longer RTs to categorize objects when they appear after such sentences in the vertical axis than the horizontal axis. However, analysis of just the abstract sentences with a repeated-measures ANOVA by participants showed no significant difference in responses to the horizontal and vertical targets, F (1, 48) = 0.61, p = .44; partial η² = 0.013. There was also no effect of target object location when the metaphorical and noun-manipulated sentences were included; a 2 (horizontal or vertical dimension) × 3 (abstract, metaphorical, or noun sentences) repeated-measures ANOVA showed no main effect for horizontal or vertical object locations, F (1, 48) = 1.11, p = .30; partial η² = 0.023; and no significant interaction between sentence type and object axis, F (2, 48) = 0.05, p = .94; partial η² = 0.001. As a confirmation of the results of Experiment 4, there was no significant interaction between sentence direction and up or down object location for the new set of abstract sentences: F (1, 48) = 0.23, p = .88; partial η² = 0.0.
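
Partial eta squared is related to the F ratio by the standard identity partial η² = (F × df_effect) / (F × df_effect + df_error). As an illustration, the minimal Python sketch below (the function name is our own) recomputes the effect sizes for the two axis comparisons from the F ratios and degrees of freedom as printed above.

```python
def partial_eta_squared(f_value, df_effect, df_error):
    """Partial eta squared recovered from an F ratio and its degrees of freedom."""
    return (f_value * df_effect) / (f_value * df_effect + df_error)


# Abstract sentences only, vertical versus horizontal targets: F(1, 48) = 0.61.
print(round(partial_eta_squared(0.61, 1, 48), 3))  # 0.013

# Main effect of axis in the 2 x 3 analysis: F(1, 48) = 1.11.
print(round(partial_eta_squared(1.11, 1, 48), 3))  # 0.023
```

Both values indicate that target axis accounted for only about 1% to 2% of the relevant variance in RTs.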

One discrepancy with the previous set of studies is that the RTs were globally quicker than in Experiments 1 through 4, with a mean response of 289 msec in Experiment 5, compared with 546 msec in Experiments 1 through 4. The reasons for this difference remain unclear to us. The experiment was run on a different computer from the other studies, using a button box instead of a keyboard, and with a different population (British vs. Hawaiian university students). We assume that a combination of these factors led to the shorter RTs, as the only main difference in design between the studies was the inclusion of more sentence types. Although no significant effect of axis was found, for all three types of sentences the RTs were slower for the vertical targets than for the horizontal targets (see Table 9). This difference was very small, however (between 3 and 5 msec), and given the level of unsystematic variability, differences of this size were not statistically significant.

Table 9. Mean RT in milliseconds for object categorization in the vertical and horizontal axes of the screen for noun, metaphorical, and abstract sentences in Experiment 5
                 Object in Vertical Axis        Object in Horizontal Axis
Sentence         Mean RT    SD    SE            Mean RT    SD    SE            Difference (msec)
Note. N = 50. RT = reaction time.


6.3. Discussion

The results of Experiment 5 showed there was no interference effect for abstract sentences by axis. They also replicated the finding of Experiment 4, showing that abstract sentences yield no interference effect by up versus down location. The hypothesis that the differences between the results of Experiment 4 in this work and Richardson et al. (2003) were due to differences in the detail of the imagery prompted by concrete and abstract language is not supported. Thus, it remains to be determined what caused the discrepancy between Richardson et al.'s work and our Experiments 4 and 5.

One possible explanation for the absence of an effect with abstract sentences in our Experiments 4 and 5, but the presence of such an effect in Richardson et al.'s (2003) work, relies on differences in the abstractness of the stimuli in the two studies. In Richardson et al.'s work, abstract sentences included verbs rated as abstract in the MRC Psycholinguistic database. This selection method may have inadvertently resulted in a small number of relatively concrete verbs; perusing the verbs in their study yields several candidates like argue, rush, give, and rest. These verbs were combined with arguments that were very concrete—sentential subjects always denoted people like the storeowner, the husband, or the jogger. The combination of even relatively abstract verbs—like want—with concrete arguments—like the child and the cake—results in sentences that could easily yield mental imagery of concrete situations. In this example, an imagined scenario in which a child wants cake might involve a child looking covetously at some cake in a spatial arrangement that is probably horizontal. Because abstract sentences in the original study contained linguistic elements that might have made the scenes they described concretely imageable, those images might have been responsible for the interference effect observed with these abstract sentences.

By contrast, abstract sentences in the current study (Experiments 4 and 5) were more abstract. All verbs (Table 8 and the Appendix) denoted change in quantity (some, such as expand, are inevitably somewhat concrete as in Richardson et al.'s, 2003, study). However, the nouns in the sentences are all abstract and describe quantitative measures like quantity, ratio, and measures. As a result, it is subjectively more difficult to imagine a concrete scene in which the scenes these sentences describe would be grounded than it is for the abstract sentences in the original study. This could be responsible for the difference in findings in the two studies—perhaps abstract language only yields measurable imagery effects when it is straightforwardly interpreted as referring to spatially concrete scenes. We leave this possibility open for investigation in future work.

7. General discussion

Processing sentences denoting events that would tend to take place in a particular part of a perceiver's visual field yields interference on actually using the same part of the real visual field, as measured by decreased performance in an object categorization task. This is true whether the location of the event is denoted by a verb of motion (Experiment 1) or supplied by connotational semantics of a sentential subject (Experiment 2). However, having an up- or down-associated lexical item in a sentence does not suffice to produce interference. The sentence must encode a scene literally involving the relevant location in the visual field, as metaphorical uses of motion verbs (Experiment 3) and abstract verbs that are nonetheless associated with upness or downness (Experiments 4 and 5) yield no significant interference effect, either at a specific level of detail (up or down; Experiment 4) or at a more general level of detail (vertical or horizontal axis; Experiment 5). We can conclude from this that it is not lexical priming that yields the interference but rather the performance of mental imagery corresponding to the meaning of an utterance.

One specific point about these experiments and the comparisons with previous work is worth taking up before we move on to a more general discussion of the place of imagery in language use. This is the question of why sentences in the first experiment, which denoted motion in a direction, interfered with static images of objects in particular locations. We used static visual stimuli for two reasons. The first was to enable comparisons with the work by Richardson et al. (2003), more of which follows below. The second was that we were concerned that moving objects would make it easier for participants to discern the relationship between the sentences and the visual perception task. The fact that we found significant effects despite this difference between the motion described by the sentences and the lack of motion in the visual stimuli suggests that the mere use of a particular location in the visual field can produce interference.

The findings reported in the foregoing studies provide new evidence suggesting that understanding spatial language leads individuals to activate internal simulations of the described scenes. Although the selective interference of language processing on visual perception does not imply that such mental simulation is required for language understanding, it does imply that it is unconscious and automatic. Various authors have suggested different roles for the construction of a mental simulation on the basis of language, using detailed modal knowledge. One critical role of imagery is to produce detailed inferences (Narayanan, 1997), which can both allow an individual to gain a rich notion of the utterance's content, such as a situation model of the described scene (Zwaan, 1999), and prepare the individual to understand future utterances or to respond relevantly. The construction of a mental simulation might also prepare the individual for situated action (Bailey, 1997; Barsalou, 1999; Glenberg & Kaschak, 2002). Finally, some language may be disambiguated only through the performance of imagery (Bergen & Chang, 2005).

Various theories of language rely heavily on perceptually and motorically grounded representations as the backbone for the language understanding process. Of particular note, Kaschak and Glenberg (2000) argued that language understanding proceeds through the meshing of simulation constraints from language, and the subsequent mental simulation of afforded actions, to prepare for situated responses. Zwaan (1999, 2004) argued similarly that language comprehension proceeds through the construction of modal mental models, and Barsalou (1999) suggested that language hooks into simulators—systematic patterns of reactivation of representations of perceptual and motor experiences. What all these approaches share is a recognition of the importance of mental simulation in the process of language understanding. However, none of them are actual theories of how the individual linguistic items that make up an utterance directly produce a mental simulation, especially given the complexities of linguistic structure, although Glenberg and Kaschak made some progress with regard to how grammatical constructions contribute to mental simulation.

Up to the present, one of the main gaps in theories of language understanding based on mental simulation is explaining the precise ways in which language triggers simulation and what aspects of simulation it triggers. Glenberg and Kaschak (2002), for example, view the construction of an embodied simulation as arising from the meshing of simulation constraints imposed by pieces of language, but very little is known about how exactly this might take place or what aspects of simulation can be triggered by what sorts of language. Cognitive linguists have documented a broad range of possible functions of grammatical and lexical items. For example, it appears that various sorts of language, from modal verbs like make and let to prepositions like despite and from, are intuitively associated with simple notions of the application or non-application of force (Talmy, 2000). A function of various grammatical structures, like subjects and topic markers, appears to be to raise certain elements to prominence as the foreground by contrast with others that remain in the background (Lakoff, 1987; Langacker, 1987; Talmy, 2000). Although cognitive linguistic work is based largely on introspection and text analysis, it provides many useful insights into language use and representation and serves as an extremely rich source for empirically testable potential functions of linguistic items.

Work like the experiments described here can begin to tell us a little bit more about exactly how language drives simulation. One thread of work attempting to wed the observation that simulation is a central element in language understanding with the details of how specific linguistic elements drive simulation, as inspired by the work in cognitive linguistics described above, is “embodied construction grammar” (Bergen & Chang, 2005; Bergen et al., 2004; Feldman, 2006). The basic idea of embodied construction grammar, a computational model of language understanding, is that linguistic elements (from lexical items to grammatical markers to phrasal patterns) are pairings of some linguistic form with specifications for mental simulations to be performed when they are used. In the simplest cases, words that denote actions or perceivable entities drive the simulation to enact imagery of those actions or entities. Similarly, grammatical constructions place constraints on the simulation, indicating what type of event should be simulated, from what perspective, or with what in the foreground. As in Glenberg and Kaschak's (2002) model, the simulation constraints of the various linguistic elements must be meshed or bound together to produce a coherent simulation for an utterance. We anticipate that future work will further elucidate the contributions that individual words, as well as grammatical structures, make to the construction of mental imagery during language understanding.

Visual interference effects produced by linguistic input are reliable and replicable in a number of methodological permutations. These findings as a whole provide evidence that perceptual systems, in particular the visual system, are unconsciously and automatically engaged in the process of natural language understanding. Given that spatial imagery is automatically engaged during language use, it seems that a complete account of how words and utterances are understood requires knowing how they drive imagery. The same may hold of grammatical markers and sentence patterns (Bergen & Chang, 2005; Glenberg & Kaschak, 2002). More broadly, the observation of language driving imagery suggests yet another way that embodied human experience shapes language processing. Our similar bodies and experiences yield shared imagery, a common currency that facilitates effective communication.


  1. The relatively small number of sentences of each type could, in principle, be remedied by using the words up and down in sentences. We chose to avoid these words for several reasons. First was the possibility that participants would recognize these recurring words in the experiment and guess its purpose. We were also concerned with potential direct effects of the words up and down on participants' responses. For example, seeing those words might result in participants orienting overt attention to that part of the visual field, which would counteract the expected effect. Moreover, if included, up or down could themselves be argued to be responsible for any observed effects rather than the interpretation of the sentence as a whole (which we tested by contrasting Experiments 1 and 3).

  2. Replacing outliers with values at a set distance from the subject's mean is also known as “winsorizing” (Barnett & Lewis, 1978) and is commonly used in sentence processing research. Although it may increase power in a small set of restricted cases, it globally does not affect results of statistical analyses (Ratcliff, 1993). We chose to winsorize, rather than eliminate outliers, due to the small number of items in each condition.
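
The replacement procedure described in this note can be sketched in a few lines of Python. This is an illustration only: the cutoff, expressed as a number of standard deviations k from the participant's mean, is a parameter here because the criterion actually used is stated with Experiment 1 and not in this note, and the function name and example RTs are hypothetical.

```python
from statistics import mean, stdev


def winsorize_by_sd(rts, k):
    """Replace RTs more than k SDs from one participant's mean with the
    boundary value (mean - k*SD or mean + k*SD).

    k is an assumed parameter; the study's own cutoff is not restated here.
    """
    m = mean(rts)
    s = stdev(rts)  # sample SD; requires at least two RTs
    lower, upper = m - k * s, m + k * s
    return [min(max(rt, lower), upper) for rt in rts]


# Hypothetical RTs (msec) for one participant, with one extreme value.
rts = [500, 510, 490, 505, 495, 2000]
print(winsorize_by_sd(rts, k=2))
```

With these made-up values and k = 2, the five ordinary RTs are returned unchanged and only the 2000 msec response is pulled in to the upper boundary, so no trial is lost from the already small set of items per condition.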


Our thanks to Larry Barsalou, Susanne Gahl, Art Glenberg, Terry Regier, Daniel Richardson, Amy Schafer, Nathaniel Smith, and Michael Spivey for useful discussion and comments, as well as to Steven Sloman, Arthur Markman, and three anonymous reviewers for careful and useful commentary. Any errors or omissions are our own.


Table of abstract sentences used in Experiment 5

Abstract Down Sentences
    The Indicators Weakened.
    The Prospects Weakened.
    The Value Diminished.
    The Faith Diminished.
    The Quantity Dwindled.
    The Interest Dwindled.
    The Ratio Lessened.
    The Indicators Lessened.
    The Enthusiasm Decreased.
    The Demand Decreased.
    The Argument Failed.
    The Policy Failed.
    The Crowd Saddened.
    The Nation Saddened.
    The Agreement Broke.
    The Pact Broke.
Abstract Up Sentences
    The Ratings Improved.
    The Market Improved.
    The Fees Doubled.
    The Inflation Doubled.
    The Price Redoubled.
    The Payments Redoubled.
    The Amount Multiplied.
    The Price Multiplied.
    The Figures Expanded.
    The Program Expanded.
    The Numbers Increased.
    The Ranking Increased.
    The Coalition Conquered.
    The Army Conquered.
    The Prosecution Won.
    The Law Won.