An Eye-Tracking Study of Exploitations of Spatial Constraints in Diagrammatic Reasoning


Correspondence should be sent to Atsushi Shimojima, Faculty of Culture and Information Science, Doshisha University, 1-3 Tatara-Miyakodani, Kyotanabe, Kyoto, Japan. E-mail:


Semantic studies on diagrammatic notations (Barwise & Etchemendy, 1990; Shimojima, 1995; Stenning & Lemon, 2001) have revealed that the “non-deductive,” “emergent,” or “perceptual” effects of diagrams (Chandrasekaran, Kurup, Banerjee, Josephson, & Winkler, 2004; Kulpa, 2003; Larkin & Simon, 1987; Lindsay, 1988) are all rooted in the exploitation of spatial constraints on graphical structures. Thus, theoretically, this process is a key factor in inference with diagrams, explaining the frequently observed reduction of inferential load. The purpose of this study was to examine the empirical basis for this theoretical suggestion, focusing on the reality of the constraint-exploitation strategy in actual practices of diagrammatic reasoning. Eye movements were recorded while participants used simple position diagrams to solve three- or four-term transitive inference problems. Our experiments revealed that the participants could exploit spatial constraints on graphical structures even when (a) they were not in the position of actually manipulating diagrams, (b) the semantic rule for the provided diagrams did not match their preferences, and (c) the constraint-exploitation strategy invited a partly adverse effect. These findings indicate that the hypothesized process is in fact robust, with the potential to broadly account for the inferential advantage of diagrams.

1. Introduction

Many, perhaps all, systems of diagrams have the function of letting users exploit spatial constraints on graphical structures and thus lightening their load of inferences. Consider a system of simple position diagrams, where letter symbols are arranged vertically to express a certain transitive relation that holds between the symbolized objects. Fig. 1A has a sample diagram of this system, which expresses that object A is lighter than object B. Now, modify this diagram to express another piece of information, that object C is lighter than object A. We obtain the new position diagram in Fig. 1B.

Figure 1.

(A) Position diagram expressing that A is lighter than B. (B) Result of adding information that C is lighter than A.

This diagram results from expressing two pieces of information in the current system of position diagrams. Yet it expresses a third piece of information, namely, that C is lighter than B. Furthermore, given the transitivity of the relation lighter, this additional piece of information is a logical consequence of the original two pieces. Thus, just by expressing the two premises in this system of position diagrams, the user obtains a diagram that expresses their particular logical consequence automatically. As aptly put by Barwise and Etchemendy (1990), the user “never need infer” this consequence from the premises but “can simply read [it] off from the diagram as needed.”

Note that this “automaticity” of expression is largely due to a spatial constraint on the arrangement of letter symbols in position diagrams: If letter symbol x is placed above another letter symbol y, which is placed above still another symbol z, then x is necessarily above z. The system of position diagrams is designed to exploit this spatial constraint for “automatically” expressing certain logical consequences. The user can rely on this function of the system and significantly reduce his or her inferential task, replacing it with a reading-off task. The user could even be unaware of the existence of such a spatial constraint involved in individual cases. Essential to this process is that the constraint holds on external space as an objective fact, ready to be exploited when the relevant diagram is drawn.
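The "free ride" described above can be illustrated with a minimal sketch. It assumes, purely for illustration, that a position diagram is modelled as a top-to-bottom list of symbols and that the semantic rule is "above means lighter"; the helper names `is_above` and `place_above` are hypothetical and not part of the original study.

```python
# A position diagram modelled as a top-to-bottom list of symbols (assumption).
# Assumed semantic rule: "x is above y" means "x is lighter than y".

def is_above(diagram, x, y):
    """Read off whether symbol x sits above symbol y in the diagram."""
    return diagram.index(x) < diagram.index(y)

def place_above(diagram, new, anchor):
    """Return a new diagram with `new` inserted directly above `anchor`."""
    i = diagram.index(anchor)
    return diagram[:i] + [new] + diagram[i:]

fig_1a = ["A", "B"]                      # expresses: A is lighter than B
fig_1b = place_above(fig_1a, "C", "A")   # add: C is lighter than A

# No inference step is coded anywhere; the consequence "C is lighter than B"
# is fixed by the ordering of positions and can simply be read off.
c_lighter_than_b = is_above(fig_1b, "C", "B")
```

The point of the sketch is that `place_above` only encodes the second premise, yet the ordering of the list already settles the relation between C and B.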

This characteristic inferential process using diagrams has been noted by many researchers, conceptualized variously as “perceptual inference” (Larkin & Simon, 1987), “non-deductive representation” (Lindsay, 1988), “inference by recognition” or “inspection” (Novak, 1988; Olivier, 2001), and the “emergent” effect (Chandrasekaran, Kurup, Banerjee, Josephson, & Winkler, 2004; Kulpa, 2003). Semantic studies of diagrams (Barwise & Etchemendy, 1990; Shimojima, 1995; Stenning & Lemon, 2001) have clarified that the inferential process is a form of exploiting spatial constraints on graphical structures in relevant diagrams.

Characterized this way, the advantage can easily be identified in a variety of diagrammatic representation systems. For example, expressing the information that all As are Bs in a Venn diagram (Fig. 2A) and adding the information that no Bs are Cs (Fig. 2B) results in the expression of the information that no Cs are As due to constraints on the shading of subregions. Expressing the information that A ⊂ B in an Euler diagram (Fig. 2C) and then expressing the information that C ∩ B = ∅ (Fig. 2D) results in the expression of the information that C ∩ A = ∅ due to constraints on the inclusion–exclusion relation between regions. Geographical maps have the same function and to a much larger extent. Adding an icon of a house to a particular position in a map results in the expression of various new pieces of information concerning the spatial relations of the house to many other objects already mapped. This is due to the spatial constraints governing map symbols, which are isomorphic to spatial constraints governing mapped objects.

Figure 2.

(A) Venn diagram expressing that all As are Bs. (B) Result of adding information that no Bs are Cs. (C) Euler diagram expressing that A ⊂ B. (D) Result of adding information that C ∩ B = ∅.
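The Euler-diagram case can be sketched in the same spirit. The following is only an illustration under the assumption that regions are drawn as circles; the `Circle` class, the coordinates, and the two geometric tests are hypothetical choices, not taken from the original figures.

```python
import math
from dataclasses import dataclass

@dataclass(frozen=True)
class Circle:
    x: float
    y: float
    r: float

def inside(a, b):
    """True if circle a lies entirely within circle b."""
    return math.hypot(a.x - b.x, a.y - b.y) + a.r <= b.r

def disjoint(a, b):
    """True if circles a and b share no interior points."""
    return math.hypot(a.x - b.x, a.y - b.y) >= a.r + b.r

# Draw A inside B, then draw C apart from B (illustrative coordinates).
A = Circle(0.0, 0.0, 1.0)
B = Circle(0.0, 0.0, 3.0)
C = Circle(10.0, 0.0, 2.0)

# Only two facts were drawn, yet the third holds by geometry alone.
drawn_facts = inside(A, B) and disjoint(C, B)
free_consequence = disjoint(C, A)
```

Any circle drawn outside B is necessarily outside everything B contains, which is the inclusion-exclusion constraint the text refers to.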

1.1. Main question

Thus, from the viewpoint of semantic theory, the exploitation of spatial constraints on graphical structures seems to be a good explanation for why diagrams are so useful for inference in certain contexts. From an empirical point of view, the question remains whether this account applies to real inferential practices with diagrams, and if so, how generally it does.

In this study, we were particularly interested in inferences with diagrams that took place when the user was just looking at them. To illustrate the problem, consider the following situation. You are given the position diagram in Fig. 1A, which expresses the first premise that A is lighter than B. Then, the second premise that C is lighter than A is given, but you are not in the position of changing the first diagram to add this premise to it. You are then asked, “Is C lighter than B?”

How would you answer this question? Because you cannot manipulate the diagram in the first place, you might entirely ignore it and just think about the given premises solely in your head. Alternatively, you might use it as a mere record of the first premise, a memory aid with no further role in the inferential process. Or you might directly “draw” on the diagram to express the second premise, where the “drawing” is not physical but shares an inferential function with a physical drawing. In this case, it would amount to placing a non-physical symbol [C] in the uppermost part of Fig. 1A and reading off the logical consequence of the two premises that gets expressed due to a spatial constraint on the symbol arrangement in the diagram.

This article reports experiments that tested this third possibility. Testing this possibility in a focused and systematic manner is important for at least two reasons. First, as already suggested, it amounts to testing the general applicability of one promising explanation of the inferential advantage of diagrams. If a non-physical drawing in the above sense is possible and can actually be implemented in practice, then it would mean that exploitation of spatial constraints on diagrams can occur even when people are not in the position to actually manipulate the diagrams. It would then be natural to assume that such exploitation also occurs where it takes less effort, that is, where people can actually manipulate diagrams. This way, we could obtain a stronger result supporting the constraint-based account.

Second, as we will see later, various studies on diagrammatic reasoning have presupposed or suggested the existence of a non-physical drawing sharing an inferential function with a physical drawing. Concepts such as “envisaging” (Johnson-Laird, 2006; Sloman, 1971), “perceptual operation” (Larkin & Simon, 1987), “visualization process” (Narayanan, Suwa, & Motoda, 1995), and “spatial transformation” (Trafton, Trickett, & Mintz, 2005; Trickett & Trafton, 2007) all seem to point to, or include, such non-physical operations on the external diagrams. Although some empirical evidence has been obtained for its existence (Bauer & Johnson-Laird, 1993; Kozhevnikov, Motes, & Hegarty, 2007; Shimojima & Fukaya, 2003; Trafton & Trickett, 2001; Trafton, Marshall, Mintz, & Trickett, 2002; Yoon & Narayanan, 2004), the phenomenon has never undergone sufficiently focused and systematic testing.

The experiments reported here were aimed at providing such a test by formulating an explicit experimental hypothesis with the help of recent research on deictic indexing. In Experiment 1, we systematically varied the semantic rules of diagrams used as stimuli, detected the participants' preferences for semantic rules, and evaluated the experimental results in relation to the participants' semantic preferences thus detected. Experiment 2 tested the existence and persistence of the hypothesized process in yet another way by setting up a condition that would incur difficulty in drawing on the given diagram. We analyzed the response-latency data to examine whether the participants still attempted a non-physical drawing under this adverse condition.

1.2. Theory of deictic indexing

A non-physical “drawing” may sound too mysterious or vague to be subjected to an experimental test. However, given the recent research on the function of deictic indexes in visual scene perception (Ballard, Hayhoe, Pook, & Rao, 1997; Pylyshyn, 1989; Pylyshyn, 2003; Ullman, 1984), it is not so far-fetched to hypothesize that people use such operations to take advantage of spatial constraints on external diagrams.

Since visual processing shifts across the visual scene from one location to another, it is desirable to keep at least a partial track of the locations already processed. According to Ullman (1984), such a capability is necessary even to perform the simple task of visual counting, where records must be kept of which objects have already been counted and which have not. Ullman characterized this capability as “marking” and listed it as one of the basic operations necessary to analyze the bottom-up representation obtained from our visual environment.

According to Pylyshyn (1989), the spatial relation among objects (such as collinearity and insideness) cannot be evaluated without our having direct and simultaneous access to multiple objects in the visual scene. For this and other theoretical reasons, Pylyshyn (1989, 2003) extended Ullman's idea of “marking” into the theory of a “visual index,” according to which we can keep track of up to five objects simultaneously and can do so purely on the basis of their historical continuity without relying on any distinguishing visual or spatial features. Pylyshyn also emphasized that to fulfill its theoretical purpose, the indexing operation should be conceptualized as extending to an object in the external world rather than stopping at an object in our internal representation such as Ullman's bottom-up representation (e.g., Pylyshyn, 2003, pp. 207–208). Pylyshyn and his colleagues conducted a series of experiments supporting this capability, where people succeeded in tracking a set of moving objects discriminately against other moving objects, although they were not visually distinguishable (e.g., Pylyshyn & Storm, 1988).

The capacity of direct and simultaneous access to multiple objects implies that focal attention can easily revisit any object in the indexed pool without explicit searches based on their visual or spatial properties (Pylyshyn, 2003). Thus, especially when indexed objects are stable in their location and visual properties, deictic indexes can help off-load our internal memory since they let us identify an object's visual properties through frequent returns to it, without having to retrieve these properties from our internal memory.¹ Ballard, Hayhoe, Pook, and Rao (1997) observed interesting patterns of eye movements in a computer-based block-building task, where people apparently chose to revisit an indexed object to identify its color and shape incrementally rather than remembering its color and shape once and for all from a single visit.

Once indexed, an object can be subjected to detailed processing on the perceptual and cognitive level, and its index serves as the kernel with which the results of such processing are associated (Pylyshyn, 2003; Ullman, 1984). Thus, deictic indexes serve as “object files” in the sense of Kahneman, Treisman, and Gibbs (1992), where various properties of a historically continuous object are stored and updated as the object continues to be processed. Information associated with an index not only includes visual information obtained from perceptual processing but also so-called semantic information summarized or otherwise interpreted as a result of higher cognitive processing.

To summarize, deictic indexes let us (a) keep track of multiple objects or locations in the visual scene, (b) return focal attention to any indexed object or location without fresh searches, and (c) associate perceptual and semantic information with individual objects or locations for subsequent integration of information. Spivey, Richardson, and Fitneva (2004) give a survey of possible roles of deictic indexes in cognition and communication, including but not limited to these core functions.

1.3. General hypothesis

From these studies on deictic indexes, we propose the following general hypothesis. Whenever one wants to add a piece of information to a diagram, one could actually place an object with an appropriate visual property in an appropriate location of the diagram, or instead, he or she could place a deictic index at the appropriate location, tagging it with the appropriate visual property stored in internal memory.

In the first case, one could observe what spatial relation or pattern is produced as a result of the new object being added and use this observation to draw a conclusion of the inference. This is the usual process of constraint exploitation based on an actual drawing. In the second case, one could also observe the resulting spatial relation or pattern, but this time, the one that was produced by adding the new index. For example, the newly indexed location may have a certain spatial relation (say, to_the_left) with another object in the diagram, and it may produce a particular shape (say, a triangular region) in combination with multiple objects in the diagram. This observation could be used to draw a conclusion for the inferential task, and when this happens, it is a case of exploiting a spatial constraint on the graphical structure of the diagram. The only difference from the usual process is that the observed spatial relation or pattern is partly formed by an indexed location, whose visual feature is stored not in situ but in the internal memory. We will refer to this process as inference by hypothetical drawing to distinguish it from the usual inferential process based on actual drawing.

To be more specific, we assumed the following set of base operations involving deictic indexes.

  • place-object: A procedure for introducing a new object by placing it at a new location that satisfies the prescribed relation with objects already present and in focal attention. In our model, this amounts to the following computational procedure. Given a spatial RELATION and a set of INDICES, assign an INDEX to the new LOCATION that satisfies the RELATION together with the INDICES, tag the INDEX with a PROPERTY, add the INDEX to the index pool, and return the INDEX.
  • identify-object: A procedure for searching and identifying an object by its label and other properties within the objects, which are already present and attended to. Given a PROPERTY, locate an index tagged with the PROPERTY in the index pool and return the INDEX.
  • check-relation: A procedure for testing if a spatial relation holds between objects that are in focal attention. Given a spatial RELATION and a set of INDICES, see if the RELATION holds for the INDICES and return TRUE/FALSE.
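The three base operations above can be sketched as follows. This is a minimal illustration under several assumptions that are not in the original model: indexes are points in screen coordinates with y growing downward, a new location is found by stepping away from the first reference index until the relation holds, and check-relation compares a pair of indexes rather than an arbitrary set. The names `Index`, `IndexPool`, `add_visible`, `RELATIONS`, `OFFSETS`, and the pixel values are all hypothetical.

```python
from dataclasses import dataclass, field

@dataclass(frozen=True)
class Index:
    x: float
    y: float      # screen coordinates; y grows downward (assumption)
    prop: str     # property tag held in internal memory, e.g. a symbol label

# Spatial relations as predicates over a pair of indexes.
RELATIONS = {
    "above": lambda a, b: a.y < b.y,
    "below": lambda a, b: a.y > b.y,
    "to_the_left": lambda a, b: a.x < b.x,
    "to_the_right": lambda a, b: a.x > b.x,
}
# Unit direction in which to search for a location satisfying each relation.
OFFSETS = {"above": (0, -1), "below": (0, 1),
           "to_the_left": (-1, 0), "to_the_right": (1, 0)}

@dataclass
class IndexPool:
    indices: list = field(default_factory=list)
    step: float = 100.0   # illustrative spacing in pixels

    def add_visible(self, x, y, prop):
        """Index an object that is actually drawn in the diagram."""
        idx = Index(x, y, prop)
        self.indices.append(idx)
        return idx

    def place_object(self, relation, refs, prop):
        """Assign an index to a new location that satisfies `relation`
        with every index in `refs`, tag it with `prop`, and pool it."""
        dx, dy = OFFSETS[relation]
        x, y = refs[0].x, refs[0].y
        while True:
            x, y = x + dx * self.step, y + dy * self.step
            candidate = Index(x, y, prop)
            if all(RELATIONS[relation](candidate, r) for r in refs):
                break
        self.indices.append(candidate)
        return candidate

    def identify_object(self, prop):
        """Locate the pooled index tagged with `prop`."""
        for idx in self.indices:
            if idx.prop == prop:
                return idx
        raise KeyError(prop)

    def check_relation(self, relation, a, b):
        """Test whether `relation` holds between two indexes."""
        return RELATIONS[relation](a, b)

# Sample problem under the rule "above means lighter".
pool = IndexPool()
a = pool.add_visible(512, 334, "A")          # premise 1: A is lighter than B
b = pool.add_visible(512, 434, "B")
c = pool.place_object("above", [a], "C")     # premise 2: C is lighter than A
answer = pool.check_relation(                # question: is C lighter than B?
    "above", pool.identify_object("C"), pool.identify_object("B"))
```

In this sketch the hypothetical object C exists only as a pooled index with a memory tag, yet the answer to the question is obtained by checking a spatial relation, not by chaining the premises.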

The stipulation of these specific procedures provides us with a solid basis on which to conduct experimental investigations and try out different hypotheses on the phenomena of a hypothetical drawing. In this model, a hypothetical object is characterized as a deictic index assigned to an empty location in the external environment and tagged with a visual property stored in the internal memory. We rely on this characterization throughout this article, using the expression “hypothetical object” in this technical sense.

This key idea of projecting mental contents to external space through deictic indexing operations was first developed by Pylyshyn (Pylyshyn, 2007, sections 5.4–5.6; see also Pylyshyn, 2003, sections 7.2–7.3). In Pylyshyn's theory, however, the assignment of an index is primarily a stimulus-driven process, directed to an object defined by some visual stimuli. In contrast, our hypothesis involves (a) a top-down, voluntary assignment of an index to an empty location. It also assumes (b) assignment of indexes to empty locations with no apparent visual stimuli. In these respects, our hypothesis involves non-trivial extensions to Pylyshyn's theory, but both extensions already have a substantial empirical basis. Annan and Pylyshyn (2002) conducted an experiment indicating that indexes could be assigned voluntarily to objects even when the relevant objects had no particular visual features causing indexes to be automatically assigned. Annan and Pylyshyn's data indicate such top-down assignment of indexes must be done serially, requiring focal attention directed to each object before the assignment can take place. The place-object operation in our hypothesis is also a serial process requiring focal attention and is consistent with the findings by Annan and Pylyshyn.

As for the possibility of indexing to empty locations, Spivey and Geng (2001) reported on the eye-movement pattern on a large blank screen that spatially corresponded to the content of the story that people were listening to. Spivey, Richardson, and Fitneva (2004) interpreted this to indicate that people left deictic indexes in specific locations on the blank screen to give spatial coordinates to the mental content constructed from listening to the story. Spivey and Geng (2001), Richardson and Spivey (2000), and Hoover and Richardson (2008) also reported on experimental results suggesting deictic indexing to blank locations, although in these cases, indexes were initially assigned to visible objects. The eye-tracking data indicated that people returned focal attention to those locations even after the objects had disappeared. These data are interesting in their own right, but for the present purpose, they are significant in suggesting that indexes can remain in blank locations for possible subsequent reference. Pylyshyn (2003, 2007) also explored possible explanations of the indexing to apparently empty locations, including the existence of subtle texture elements that are handled in early visual processing and the effect of neighboring objects that help determine locations to be indexed.

2. Experiment 1: Eye movement induced by hypothetical symbols

2.1. General design

Experiment 1 consisted of a main session with transitive inference tasks and a follow-up session with a semantic preference test.

2.1.1. Transitive inference task

The first question we asked in the course of testing our general hypothesis was if we could find any evidence in people's eye movements that they were engaged in inference through hypothetical drawings. We conducted an experiment with a number of three-term transitive inference problems, to be solved with simple position diagrams, and we recorded the eye movements of the people engaged in the inference task.

A typical trial in a session consisted of the following three steps.

Step 1: An audio recording, “A is lighter than B,” is played, while the diagram in Fig. 3A is simultaneously presented on a computer display.

Step 2: Another audio recording, “C is lighter than A,” is played, while the diagram on the display remains unchanged.

Step 3: An audio recording, “Is C lighter than B?” is played, while the diagram remains unchanged.

We hypothesized that the participants would be engaged in the following processes in each step of this sample problem.

Figure 3.

(A) Sample diagram used in Experiment 1. (B) Rough location of hypothetical indexing in sample problem. (C) Another diagram used in Experiment 1, where positions of square symbols are inverted. (D) Horizontal diagram used in Experiment 1.

Step 1 (Semantic mapping)

When participants are instructed to interpret the diagram in Step 1 as expressing the same information expressed by the audio recording played simultaneously, they should interpret it to mean that A is lighter than B, assuming the semantic rule that a symbol's being above another symbol means that the referent for the first symbol is lighter than the referent for the second. (For brevity, we will use the symbol [above ⇒ lighter] to indicate this semantic rule; similar symbols will be used for other semantic rules.)

Step 2 (Hypothetical object introduction)

The second premise given by audio instruction in Step 2 remains unexpressed in the diagram, as the diagram, which only expresses the first premise presented in Step 1, remains unchanged throughout the problem. As the participants exploit spatial constraints on position diagrams in solving this problem, they would “draw” a symbol in the blank area above the square symbol [A]. According to our characterization, this amounts to placing a deictic index. Thus, the area above the square symbol [A] is the hypothetical indexing position for this problem. The gray symbol [C] in Fig. 3B indicates this fact. Since the operation of place-object requires focal attention (Annan & Pylyshyn, 2002), our hypotheses predict that participants' eyes should move to this area, reaching relatively high on the display.

Step 3 (Inference)

When asked whether C is lighter than B, the participants would check the spatial relation between this index and the index for symbol [B]. As we hypothesized that check-relation requires focal attention on the indexes to be compared, we predicted that participants' eyes would move to the hypothetical indexing position introduced in Step 2.

In summary, our prediction for this transitive inference problem was that a participant's eyes would move to the hypothetical indexing position located above symbol [A] in Step 2 and would then move again to the same position in Step 3. Analyzing the eye movements in each step and verifying this prediction would thus serve as a test of whether the participant was engaged in the hypothesized process, namely inference by exploiting spatial constraints on external diagrams.

However, verifying eye movements in one particular problem is not sufficient to establish whether constraint exploitation occurs, for participants' eyes may move to this area for reasons unrelated to the presence of an external diagram. Perhaps the participant simply had the habit of looking at the upper area of the display to process the information verbally given in Steps 2 and 3. Alternatively, he or she may have been “looking” at an independently constructed mental image, with upward eye movements merely epiphenomenal to this internal operation.

To filter out these possibilities, we observed the eye movements of participants by systematically varying (a) the predicted location of hypothetical symbols in the diagram and (b) the semantic correspondence between representing spatial relations in diagrams and represented target relations in the task domain. The manipulation was achieved by changing both the layout and the semantic rules of the diagrams shown to the participants.

For example, we presented the diagram in either Fig. 3A or C, while verbally stating the same premises (“A is lighter than B,” “C is lighter than A”) and the same question (“Is C lighter than B?”). This would lead the participants to assume semantic rules that were opposite, that is, [above ⇒ lighter] and [below ⇒ lighter]. This would also result in different hypothetical indexing positions (the areas above or below the square symbol [A]), and these are exactly where we predicted participants' eyes would move to. In another type of problem, we presented a “horizontal” diagram such as that in Fig. 3D, while verbally providing the same set of premises and the same question. This would lead the participants to assume the semantic rule [to_the_left ⇒ lighter] and the hypothetical indexing position would be the area to the left of symbol [A]. In yet another type of problem, the semantic rule was [to_the_right ⇒ lighter] and the indexing point would change accordingly.

Overall, four different problem categories were stipulated on the basis of the predicted eye movement locations: higher/lower/left/right-predictive. Fig. 4 shows these different categories. The gray squares in each category of the problem indicate the hypothetical indexing positions, with the number “2” inside the squares indicating that the hypothetical drawing in that position is expected to occur in Step 2 of the problem. The brackets labeled “3” point to the pairs of (hypothetical or actual) objects whose spatial relations are to be checked in Step 3 of the problem.

Figure 4.

Four different categories of problems used in Experiment 1. Hypothetical indexing positions for problem categories v1, v2, h1, and h2 are, respectively, uppermost, lowermost, leftmost, and rightmost areas of diagram.

The idea is that by adjusting the layout and semantic rules of the diagram given in the problem, we can systematically vary the indexing position for the problem. If we could observe the participants “tracking” this variance, moving their eyes to the particular indexing positions differing from problem to problem, we could take this to be good evidence that they did operate on the externally given diagram, following the particular semantic rule associated with it.
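How the predicted indexing position follows from the layout and the premises can be sketched as below. The sketch assumes a simplified case in which both premises use the same predicate and the subject of the second premise is the new term; the grid coordinates and the function names are hypothetical.

```python
def spatial_relation(layout, x, y):
    """Spatial relation of symbol x to symbol y in a two-symbol diagram.
    `layout` maps symbols to (column, row); rows grow downward.
    Assumes the two symbols occupy distinct positions on one axis."""
    (x1, y1), (x2, y2) = layout[x], layout[y]
    if x1 == x2:
        return "above" if y1 < y2 else "below"
    return "to_the_left" if x1 < x2 else "to_the_right"

def hypothetical_index_position(layout, premise1, premise2):
    """Return (relation, anchor): where the new term of premise2 should be
    'drawn' relative to an existing symbol.
    Each premise is (subject, object), e.g. ("A", "B") for
    'A is lighter than B'; both premises share one predicate (assumption)."""
    rule = spatial_relation(layout, *premise1)   # rule induced in Step 1
    new_term, anchor = premise2
    return rule, anchor

premises = (("A", "B"), ("C", "A"))   # A lighter than B; C lighter than A

fig_3a = {"A": (0, 0), "B": (0, 1)}   # A drawn above B
fig_3c = {"A": (0, 1), "B": (0, 0)}   # positions inverted
fig_3d = {"A": (0, 0), "B": (1, 0)}   # horizontal layout, A on the left

pos_3a = hypothetical_index_position(fig_3a, *premises)
pos_3c = hypothetical_index_position(fig_3c, *premises)
pos_3d = hypothetical_index_position(fig_3d, *premises)
```

With identical verbal premises, the three layouts yield three different predicted gaze targets, which is the variance the design relies on.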

2.1.2. Semantic preference test

For cultural, social, and personal reasons, people may have developed certain preferences over the way a given transitive relation is represented by a spatial relation. For example, most readers, accustomed to the standard convention of geographical mapping, would prefer to_the_east to be represented by the to_the_right relation rather than by the to_the_left relation. The same people on the other hand may prefer in_front to be represented by to_the_left rather than by to_the_right. With an analogy to gravity, some people may prefer heavier to be represented by below rather than by above.

The transitive inference problems given in the experiment involve a variety of semantic rules for each transitive relation. For example, to_the_east can alternatively be represented by the vertical spatial relation below and its inverse above and by the horizontal spatial relation to_the_right and its inverse to_the_left. It is likely that these different semantic rules are preferred in different degrees by individual participants, making the diagrams used in the relevant problems familiar or unfamiliar to them.

One may then suspect that this difference in familiarity with the presented diagrams may affect the participant's inferential process, particularly the occurrence of a hypothetical drawing and constraint exploitation. On the one hand, people may conduct a hypothetical drawing only when the semantic rule associated with the given diagram matches their semantic preferences. On the other hand, people may be rather adaptive to unfamiliar diagrams, engaging in the constraint-exploitation process even when it requires the use of externally given, non-preferred semantic rules. Either way, detecting the participants' semantic preferences is important in evaluating how persistent the constraint-exploitation process is, and hence, how central it can be to human inference processes with diagrams. This is the second question we addressed in Experiment 1.

After the main session of the experiment (measuring the participants' eye movements during transitive inference tasks), we administered an additional test to all participants to detect their semantic preferences.

The semantic preference test involved a number of binary-choice questions presented in two steps:

Step 1: An audio recording, “A is cleaner than B,” is played while two alternative diagrams such as those in Fig. 5 are simultaneously presented on a computer display.

Figure 5.

Sample stimulus used in semantic-preference test.

Step 2: The participant is to choose which diagram is intuitively more appropriate to express the information presented in the audio recording.

If the participant answers that the left diagram is more appropriate, this counts as partial evidence that he or she prefers the cleaner relation to be represented by the above relation rather than by the below relation. Yet, as one answer is not sufficient to determine the participant's preference about the way the cleaner relation is represented, we created four different versions, modifying the positions of the two diagrams and the order of A and B mentioned in the audio recordings. Since 12 transitive relations were used in the main experiment, a total of 4 × 12 questions were created for the test on vertical diagrams. Participants were asked these questions in random order. After answering these questions on vertical diagrams, participants proceeded to questions on horizontal diagrams, which were similarly designed and administered.

If, for example, a participant was very consistent in his or her answers to the four questions about the cleaner relation so that all four answers indicated his/her preference for the use of the above relation rather than the below relation, the preference for the semantic rule [above ⇒ cleaner] was assessed as 4 for this participant, whereas that for the opposite semantic rule [below ⇒ cleaner] was assessed as 0. Generally, the preference score for a semantic rule for a participant was the number of times he/she responded with an answer indicating his/her preference for the rule in the four relevant questions. This procedure let us score a total of 48 semantic rules (12 transitive relations × 4 spatial relations) on the basis of each participant's semantic preference.

On the basis of the scores thus obtained, the problems used in the main session of the experiment were classified into three categories: (a) a problem was classified as “Match” if the semantic rule in the diagram used in the problem had a preference score of 4, (b) “OK” if the semantic rule had a preference score ranging from 1 to 3, and (c) “Mismatch” if the semantic rule had a score of 0. These categories of semantic preferences were then used to evaluate how much a participant's semantic preference affected his/her process of constraint exploitation on the relevant diagrams.
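The scoring and classification just described can be sketched as follows. The representation of an answer as a (spatial relation, target relation) pair and the function names are assumptions made for illustration; only the thresholds (4, 1 to 3, 0) come from the text.

```python
def preference_score(answers, rule):
    """Number of the four binary-choice answers that favoured `rule`.
    Each answer is recorded as the rule it indicates (assumed encoding)."""
    if len(answers) != 4:
        raise ValueError("expected four answers per transitive relation")
    return sum(1 for a in answers if a == rule)

def classify(score):
    """Map a preference score (0-4) to the category used in the analysis."""
    if not 0 <= score <= 4:
        raise ValueError("score must be between 0 and 4")
    if score == 4:
        return "Match"
    if score == 0:
        return "Mismatch"
    return "OK"

above = ("above", "cleaner")
below = ("below", "cleaner")

consistent = [above, above, above, above]   # fully consistent participant
mixed = [above, below, above, above]        # one deviating answer

category_consistent = classify(preference_score(consistent, above))
category_opposite = classify(preference_score(consistent, below))
category_mixed = classify(preference_score(mixed, above))
```

Because each question is a binary choice between a rule and its opposite, the two scores for a relation on one axis always sum to 4 in this encoding.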

2.2. Methods

2.2.1. Participants

A total of 31 naïve volunteers (undergraduate students: 17 females and 14 males) participated, with monetary compensation of 1,000 yen per hour for their participation.

2.2.2. Apparatus

The diagrams were presented on a 17-inch Dell LCD display (1024 × 768 pixel resolution). The viewing angles for the display were approximately 34 degrees horizontally and 28 degrees vertically, and the viewing angles for the stimulus diagrams were approximately 8 degrees (longer dimension) and 2 degrees (shorter dimension). The presentation of the problem materials, including instructions, was controlled by SuperLab 4.0 (Windows version). A Cedrus RB-530 response pad was used for the response input.

The eye position was recorded with a NAC 60 Hz VOXER eye tracker. Although the eye-tracking system was fairly tolerant of head movements, each participant's chin and forehead were placed in a fixed support to enable steady recording. The eye tracker recorded momentary eye positions as coordinate values on the display in pixel units. Nine-point calibration was performed at the beginning of each session, and subsequent calibrations were conducted as needed.

Table 1. Visualizability and spatializability ratings awarded by 10 subjects on a 7-point Likert scale (Shimojima & Fukaya, 2003)
Relational pair               Visualizability    Spatializability
Bigger / smaller              5.63               4.50
In front / behind             5.44               6.38
To the east / to the west     4.69               5.06
Cleaner / dirtier             5.19               3.44
Brighter / darker             2.75               2.38
Heavier / lighter             4.75               3.94

2.2.3. Problem categories

The four categories of problems in Fig. 4 were prepared, prescribing different hypothetical indexing positions. In preparing problems, we adopted a total of six pairs of a transitive relation and its inverse: bigger and smaller, in_front and behind, to_the_east and to_the_west, cleaner and dirtier, brighter and darker, and heavier and lighter. Shimojima and Fukaya (2003) tested the visualizability and spatializability of each pair of predicates with a method adopted from Knauff and Johnson-Laird (2000). Their results (Table 1) indicate that the pairs of predicates used in our experiment were fairly diverse in terms of their visualizability and spatializability. Half the problems used the same relational predicate in the two premises, while the other half used a relational predicate and its antonym (e.g., “heavier” in the first premise and “lighter” in the second). We also adjusted the content of the questions given in Step 3 so that half the problems would have the correct answer “yes” and the other half “no.”

2.2.4. Procedure

The experiment consisted of a main session where the participants were engaged in transitive inference problems and a follow-up test of their semantic preferences.

In the main session of the experiment, each participant first solved six exercise problems and then proceeded to solve a total of 96 transitive inference problems. The set of problems was derived from four variations of problem categories, six variations in relational predicates, two variations in predicate pairing, and two variations in correct answers. These problems were presented in random order, with a break in between every 24 problems. The participants were instructed to answer as quickly and accurately as possible by using the response pad.
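The composition of the 96-problem set follows directly from crossing the four factors (4 × 6 × 2 × 2). The sketch below enumerates that crossing and splits a shuffled order into blocks of 24; the labels, the fixed random seed, and the blocking code are illustrative assumptions and do not reproduce the SuperLab configuration actually used.

```python
import random
from itertools import product

# Factor levels as named in the text; the string labels are our own.
CATEGORIES = ["higher-predictive", "lower-predictive",
              "left-predictive", "right-predictive"]
PREDICATE_PAIRS = [("bigger", "smaller"), ("in_front", "behind"),
                   ("to_the_east", "to_the_west"), ("cleaner", "dirtier"),
                   ("brighter", "darker"), ("heavier", "lighter")]
PAIRINGS = ["same", "antonym"]   # same predicate twice vs. predicate plus antonym
ANSWERS = ["yes", "no"]          # correct answer to the Step 3 question

problems = list(product(CATEGORIES, PREDICATE_PAIRS, PAIRINGS, ANSWERS))

rng = random.Random(0)           # fixed seed only to make the sketch reproducible
rng.shuffle(problems)

# A break was inserted after every 24 problems, giving four blocks.
blocks = [problems[i:i + 24] for i in range(0, len(problems), 24)]
print(len(problems), len(blocks))
```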

The follow-up test on the participants' semantic preferences was administered immediately after the main session. After three practice questions, each participant answered 48 questions (4 questions × 12 transitive relations) involving vertical diagrams and then 48 more questions involving horizontal diagrams. A maximum of 7 s was allowed for them to answer each question. Questions that remained unanswered during this period were not counted toward the preference score of either of the alternative semantic rules. Eye positions were not recorded during the follow-up test.

2.2.5. Design

To examine how different hypothetical positions influence eye movements, problem categories were used as an independent variable in the subsequent analysis. There were two levels, “higher-predictive” and “lower-predictive,” in examining eye movements for vertical problems, and another two levels, “left-predictive” and “right-predictive,” in examining those for horizontal problems.

To examine the influence of familiarity with the semantic rules prescribed by the diagram, for example, between [abovelighter] and [aboveheavier], three levels of semantic rule preferences, “Match,” “OK,” and “Mismatch,” were used as the second independent variable in the analysis.

2.3. Results

Overall, semantic rules involving vertical relations (above and below) were awarded preference scores of 4 or 0 about 52.7% of the time, and those involving horizontal relations (to_the_left and to_the_right) were awarded scores of 4 or 0 about 48.0% of the time. This indicates that the participants were consistent in choosing the preferred semantic rule about half the time. The two most preferred semantic rules involving a vertical relation were [abovein_front] and [abovecleaner], while [to_the_leftbehind], [to_the_leftdirtier], [to_the_leftto_the_west], and [to_the_rightto_the_east] were the most preferred rules involving a horizontal relation.

In analyzing the eye-tracking data, we focused on the “outer reaches” of eye movements in crucial steps of the problem. That is, for the problems involving vertical diagrams, we analyzed the highest and the lowest positions on the display to which participants' eyes moved in individual steps, and for the problems involving horizontal diagrams, we analyzed their leftmost and rightmost positions. Our analysis was mainly based on eye-sample data that consisted of momentary positions of eyes sampled at a fixed rate. The data on eye fixations were also examined in a follow-up analysis.

The data from six participants were excluded from the analysis due to serious misunderstanding of instructions (one participant), errors in problem randomization (two participants), failures in calibration (two participants), and excessive blinking (one participant). Eye movements were analyzed only for those trials in which participants returned a correct answer, which accounted for 87.1% (2,090 trials) of the total trials in our data.

On the semantic preference test, one participant had only one “mismatching” semantic rule on a vertical relation, while he returned an incorrect answer in the particular problem based on this semantic rule. This participant's data on vertical problems therefore had a missing value in the “mismatch” condition, and our analysis of eye movements on vertical diagrams ended up with N = 24. The data for our analysis of eye movements on horizontal diagrams also had a reduced amount of data, that is, N = 21, for similar reasons: One participant had only one “matching” semantic rule on horizontal relations while giving an incorrect answer to the relevant problem, two participants had no “matching” or “mismatching” semantic rules, and one participant had no “OK” semantic rules.

2.3.1. Overview

Figs. 6 and 7 show the upward and the downward reaches of participants' eyes on vertical diagrams, grouped by problem categories and semantic-preference categories. The values are vertical coordinate values in pixels. The origin of the coordinates was set to the bottom-left corner of the display, so the greater the values of upward reaches, the higher participants' eyes reached on the display, and the smaller the values of downward reaches, the lower their eyes reached. The value for each group was obtained by taking the medians of individual participants' values in that group and then averaging these medians over all participants.
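The aggregation described above (an outer reach per trial, a median per participant, then a mean over participants) can be sketched as follows. The data layout and function names are hypothetical, and the toy coordinates are invented solely to show the computation.

```python
from statistics import mean, median

def upward_reach(samples):
    """Highest vertical position reached in one step (origin at bottom-left)."""
    return max(y for _, y in samples)

def downward_reach(samples):
    """Lowest vertical position reached in one step."""
    return min(y for _, y in samples)

def group_value(trials_by_participant, reach):
    """Median reach per participant, then the mean of those medians."""
    medians = [median(reach(trial) for trial in trials)
               for trials in trials_by_participant.values()]
    return mean(medians)

# Toy data: each trial is a list of (x, y) eye samples in pixels.
toy = {
    "p1": [[(500, 400), (510, 620)], [(500, 380), (505, 600)], [(500, 390), (505, 640)]],
    "p2": [[(0, 500), (0, 700)], [(0, 480), (0, 660)]],
}
print(group_value(toy, upward_reach))    # 650
print(group_value(toy, downward_reach))  # 440
```

Taking medians within participants limits the influence of occasional stray samples on each participant's value before averaging.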

Figure 6.

Upward reaches of eye movements on vertical diagrams in Experiment 1 (N = 24). Each square indicates the rough location of the upper square symbol in the stimulus diagram.

Similarly, Figs. 8 and 9 indicate the leftward and the rightward reaches of participants' eyes on the horizontal diagrams. The values are horizontal coordinate values in pixels. The smaller the values of leftward reaches, the further left participants' eyes reached on the display, and the greater the values of rightward reaches, the further right their eyes reached.

A glance at each figure reveals that the participants' eyes did not reach far beyond the boundaries of the real square symbols in Step 1. This was expected because, in this step, the objects involved in the inference are all represented by the square symbols, and there is no need to place or otherwise manipulate deictic indices outside the square symbols. Such a need only arises in Steps 2 and 3, according to our general hypothesis. In fact, a preliminary analysis with a one-factor repeated-measure ANOVA revealed that the effect of differences in the steps was significant for all directions: upward reaches (math formula), downward reaches (math formula), leftward reaches (math formula), and rightward reaches (math formula). Furthermore, post hoc multiple comparisons revealed a significant difference between the outer reaches in Step 1 and those in Steps 2 and 3. This roughly demonstrates that eye movements in Steps 2 and 3 were different from those in Step 1, as we had predicted. The next question is whether our predictions are supported in more detailed analyses of eye movements in Steps 2 and 3.

2.3.2. Eye movements in Step 2

As the sample problem in Section 'General design' illustrates, Step 2 in our transitive inference problem was when the participant was predicted to place an index in a particular position on the diagram. We applied a 2 × 3 repeated-measure ANOVA to the outward reaches of the participants' eyes with the factors of problem categories and semantic preferences. The purpose of the analysis was to see if eye movements really depended on the indexing positions prescribed by problem categories (main effects) and if the participants' semantic preferences affected this dependence (interactions).

We found a highly significant main effect in problem categories on upward reaches (F(1,23) = 26.39, p < .001, math formula), but the interaction with semantic preferences was marginal (F(2,46) = 3.03, p = .06, math formula). Problem categories also had a highly significant main effect on downward reaches (F(1,23) = 36.86, p < .001, math formula), but the interaction with semantic preferences was not significant (F(2,46) = 1.55, n.s., math formula).

The results for the cases involving horizontal diagrams were similar. The 2 × 3 repeated-measure ANOVA indicated a highly significant main effect in problem categories on leftward reaches (F(1,20) = 38.39, p < .001, math formula) with no significant interaction with semantic preferences (F(2,40) = 2.44, n.s., math formula). Problem categories also had a significant main effect on rightward reaches (F(1,20) = 28.18, p < .001, math formula), but the interaction with semantic preferences was not significant (F(2,40) = 0.60, n.s., math formula).
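For readers who wish to see how a main effect of the two-level factor is obtained in such a 2 × 3 fully within-participant design, the following sketch computes F from sums of squares and checks it against the squared paired t statistic on each participant's means over the three preference levels, to which it is algebraically equivalent. It is a didactic sketch on invented numbers, not the analysis code of the study; the nested-list layout `cells[s][a][b]` is our assumption.

```python
import math
from statistics import mean, stdev

def main_effect_two_level(cells):
    """F test for the 2-level factor A in an A x B repeated-measures design.

    cells[s][a][b] holds one value per subject s, level a (0 or 1) of A,
    and level b of B. The error term is the A x subject interaction.
    """
    n = len(cells)
    b = len(cells[0][0])
    # Subject-by-A means, averaging over the levels of B.
    m = [[mean(subject[a]) for a in (0, 1)] for subject in cells]
    grand = mean(v for row in m for v in row)
    a_means = [mean(row[a] for row in m) for a in (0, 1)]
    s_means = [mean(row) for row in m]
    ss_a = n * b * sum((am - grand) ** 2 for am in a_means)
    ss_as = b * sum((m[s][a] - a_means[a] - s_means[s] + grand) ** 2
                    for s in range(n) for a in (0, 1))
    f = ss_a / (ss_as / (n - 1))
    return f, (1, n - 1)

def paired_t_squared(cells):
    """Squared paired t on each subject's difference between the two A levels."""
    d = [mean(s[0]) - mean(s[1]) for s in cells]
    t = mean(d) / (stdev(d) / math.sqrt(len(d)))
    return t * t

# Invented upward reaches (pixels) for 4 subjects, 2 categories x 3 preferences.
cells = [
    [[620, 640, 630], [560, 570, 565]],
    [[600, 610, 590], [580, 575, 570]],
    [[650, 660, 640], [590, 600, 610]],
    [[610, 600, 620], [570, 590, 580]],
]
f, df = main_effect_two_level(cells)
print(df, math.isclose(f, paired_t_squared(cells)))
```

With 24 participants the same computation yields degrees of freedom (1, 23), and with 21 participants (1, 20), matching the values reported above.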

2.3.3. Eye movements in Step 3

Step 3 in our transitive inference problem was when the participant was predicted to return to the index placed in Step 2 to check the spatial relation between this index and another index created in Step 1. We wanted to check whether participants' eyes really returned to that empty location and whether their semantic preferences affected this behavior.

On the 2 × 3 repeated-measure ANOVA with the factors of problem categories and semantic preferences, we found a significant main effect in problem categories on upward reaches (math formula) with a marginal interaction with semantic preferences (math formula). Problem categories also had a highly significant main effect on downward reaches (math formula), but the interaction with semantic preferences was not significant (math formula).

The results for the cases of horizontal diagrams were similar again. On the 2 × 3 repeated-measure ANOVA, the main effect in problem categories was highly significant on the leftward reaches (F(1,20) = 43.67, p < .001, math formula) and significant on the rightward reaches (F(1,20) = 12.26, p < .01, math formula). The interactions with semantic preferences were marginally significant for the leftward reaches (F(2,40) = 2.93, p = .06, math formula) but non-significant for the rightward reaches (F(2,40) = 0.30, n.s., math formula).

2.3.4. Follow-up: Eye fixations in Steps 2 and 3

The above results were concerned with eye-sample positions, namely momentary positions of participants' eyes sampled at certain intervals (1/60 s in our case). These are distinguished from eye-fixation positions, namely central positions around which eyes nearly stopped moving. As visual or cognitive processing on a position requires an eye fixation on that position, we conducted a follow-up analysis to see whether the results obtained from eye sample data also held for eye-fixation positions.

Eye fixations were identified with the velocity-threshold method, setting the minimum velocity of saccades at 30 visual degrees per second (following Brockmole & Irwin, 2005) and the minimum duration of fixation at 100 ms (Manor & Gordon, 2003). Ideally, these thresholds should have been set at appropriate values according to the nature of the visuo-cognitive process we were detecting. Since a great deal is yet to be known of our target processes (placing an index on an empty position and checking the spatial relation of that position to other positions), we simply used the threshold values generally recommended in the literature. The eye-fixation data thus obtained were analyzed with the same 2 × 3 repeated-measure ANOVA that we used for the eye-sample data.
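A velocity-threshold identification of this kind can be sketched as follows, using the two thresholds stated above and the 60 Hz sampling rate. The pixel-to-degree factor (about 30 pixels per degree, a rough linear estimate from 1024 pixels spanning approximately 34 degrees), the counting of duration as number of samples times the sampling interval, and the use of the centroid as the fixation position are all our assumptions; the actual implementation used in the study may differ.

```python
import math

SAMPLE_RATE_HZ = 60.0
SACCADE_MIN_VELOCITY = 30.0   # visual degrees per second
MIN_FIXATION_MS = 100.0
PX_PER_DEG = 30.0             # hypothetical conversion factor (see lead-in)

def detect_fixations(samples, px_per_deg=PX_PER_DEG):
    """Return (x, y, duration_ms) for each fixation in a list of (x, y) samples."""
    runs = []
    current = [samples[0]] if samples else []
    for prev, cur in zip(samples, samples[1:]):
        # Point-to-point velocity in degrees per second.
        velocity = math.dist(prev, cur) / px_per_deg * SAMPLE_RATE_HZ
        if velocity < SACCADE_MIN_VELOCITY:
            current.append(cur)       # still within the same slow-moving run
        else:
            runs.append(current)      # a saccade ends the current run
            current = [cur]
    runs.append(current)

    fixations = []
    for run in runs:
        duration_ms = len(run) * 1000.0 / SAMPLE_RATE_HZ
        if duration_ms >= MIN_FIXATION_MS:
            xs, ys = zip(*run)
            fixations.append((sum(xs) / len(xs), sum(ys) / len(ys), duration_ms))
    return fixations

# Toy trace: a long dwell, a dwell too short to count, then another long dwell.
trace = [(100, 100)] * 8 + [(400, 100)] * 3 + [(700, 300)] * 7
for fixation in detect_fixations(trace):
    print(fixation)
```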

The results were analogous to those obtained from the eye-sample data. For Step 2, we found strong main effects in problem categories on upward reaches (math formula), downward reaches (math formula), leftward reaches (math formula), and rightward reaches (math formula). Interactions with semantic preference were marginal for upward reaches (math formula) and leftward reaches (math formula), while they were non-significant for downward reaches (math formula) and rightward reaches (math formula).

We found strong main effects in problem categories for Step 3 again: upward reaches (math formula), downward reaches (math formula), leftward reaches (math formula), and rightward reaches (math formula). Interactions with semantic preference were significant for downward reaches (math formula), marginal for upward reaches (math formula) and leftward reaches (math formula), and non-significant for rightward reaches (math formula).

2.4. Discussion

2.4.1. Existence of inference by hypothetical drawing

Thus, overall, problem categories had a very strong effect on the outward reach of eye movements everywhere. Moreover, the way eye movements were affected by problem categories was exactly as we had predicted.

This tendency is fairly clear from Figs. 6–9. The bar graphs for Step 2 (central bar graphs) in Figs. 6 and 7 indicate that the reach of eye movements tended to be higher in higher-predictive problems than in lower-predictive problems and tended to be lower in lower-predictive problems than in higher-predictive problems. This indicates that participants' eyes tended to move toward the hypothetical indexing positions determined by the categories of problems, suggesting that the participants actually moved their focal attention to those areas in order to place deictic indexes in Step 2.

As the bar graphs for Step 3 (right-hand bar graphs) in these figures show, the average reach of eye movements in Step 3 had the same tendency, going toward the hypothetical indexing positions again. This suggests that, in Step 3, the participants moved their focal attention to the locations indexed in Step 2 to check the spatial relation between these locations and other locations in the diagrams.

Figs. 8 and 9 reveal similar tendencies for horizontal diagrams. In both Steps 2 and 3, participants' eyes tracked the hypothetical indexing positions prescribed by the problem categories. Eyes tended to reach further left in left-predictive problems and to reach further right in right-predictive problems.

The statistical support for these tendencies remains clear, both in eye-sample and eye-fixation analyses. Thus, our eye-tracking data strongly indicate that the participants were actually engaged in the process we hypothesized, that is, inference by hypothetical drawing. It is likely that the participants attempted to exploit the spatial constraints of graphical structures even though they were not in the position of directly manipulating them.

2.4.2. Persistence of inference by hypothetical drawing

Moreover, these tendencies did not depend much on the participants' semantic preferences. In both Steps 2 and 3 in Fig. 6, the bars for the higher-predictive problems rise higher to a similar degree across the preference categories, and the bars for the lower-predictive problems in Fig. 7 fall lower to a similar degree. There is an analogous pattern of eye movements in Figs. 8 and 9. This reflects our results obtained from interaction analysis on eye-sample positions, where the main effects of problem categories had no or only marginal interactions with semantic preferences.

Figure 7.

Downward reaches of eye movements on vertical diagrams in Experiment 1 (N = 24). Each square indicates the rough location of the lower square symbol in the stimulus diagram.

Figure 8.

Leftward reaches of eye movements on horizontal diagrams in Experiment 1 (N = 21). Each square indicates the rough location of the left-hand square symbol in the stimulus diagram.

Figure 9.

Rightward reaches of eye movements on horizontal diagrams in Experiment 1 (N = 21). Each square indicates the rough location of the right-hand square symbol in the stimulus diagram.

This, however, should not be taken to mean that semantic preference was entirely without effect. A significant interaction with semantic preference was found for the downward reaches of eye movements in Step 3, when eye fixation data were considered in the follow-up analysis. On a closer look at the data, this reflects the fact that problem categories had a greater effect on the downward reach of eye movements when the semantic rules matched the participants' preferences than when they were OK or mismatching. Although only marginally significant, we found a similar effect of the matching condition elsewhere, in the upward reach in Step 2 (eye-sample data) and the downward reach in Step 2 (fixation data). A related, marginally significant tendency was found in the upward reach in Step 3 (both eye-sample and fixation data) and the leftward reach in Step 3 (eye-sample data), where the effects of problem categories on eye movements were smaller in the mismatching conditions than in the matching and OK conditions. Thus, semantic preference seems to have had a limited effect on the occurrence of inference by hypothetical drawing: the evidence for the participants' being engaged in the process was clearer when the semantic rule associated with the given diagram was preferable or at least OK to them.

Yet this effect of semantic preference was not consistent, missing from the combinations of problem categories and steps other than those previously mentioned. Also, it was rather weak in the face of stronger and more consistent main effects in problem categories on eye movements. Thus, the participants were fairly adaptive to vertical diagrams drawn with unfamiliar semantic rules; their eyes seem to have tracked the hypothetical indexing positions, complying with externally given semantic rules whether or not these rules were preferable to them. Our data suggest that the exploitation of spatial constraints is a robust process that can be applied to externally given diagrams of unfamiliar kinds. Moreover, the fact that the same tendency was found both for vertical and horizontal diagrams indicates that the constraint-exploitation process is persistent over the change of the diagram's spatial layout.2

3. Experiment 2: Indefiniteness in hypothetical symbols

3.1. General design

It has been known that, in contrast to linguistic representations, diagrammatic representations are not well suited to represent indefiniteness. Experiment 2 was designed to test whether inference with hypothetical symbols exhibits similar difficulties in handling indefiniteness. If we can find positive evidence on this question, it will provide additional support for our hypothesis that people exploit spatial constraints on external diagrammatic structures through deictic indexing.

To incorporate the indefiniteness problem in a natural way, Experiment 2 involves more complex, four-term transitive inference problems consisting of the following four steps.

Step 1: An audio recording, “A is cleaner than B,” is played while the diagram in Fig. 10-(Step 1) is simultaneously presented on a computer display.

Step 2: Another audio recording, “C is cleaner than A,” is played while the diagram on the display remains unchanged.

Step 3: Another audio recording, “A is cleaner than O,” is played while the diagram remains unchanged.

Figure 10.

Sample problem in Experiment 2, with rough locations of hypothetically drawn square symbols in Steps 2–4.

Step 4: An audio recording, “Is C cleaner than O?” is played while the diagram remains unchanged.

We hypothesized that the participants would be engaged in the following processes in each step of this sample problem.

Step 1 (Semantic mapping)

As with Experiment 1, a participant was instructed to interpret the diagram in Step 1 as expressing the same information expressed by the audio recording played at the same time. Thus, in our sample problem, the participant should have interpreted the diagram in Fig. 10A to mean that A was cleaner than B, assuming the semantic rule [abovecleaner].

Step 2 (Hypothetical object introduction)

Throughout the task, the diagram only expressed the first premise presented in Step 1. The second and the third premises supplied in Steps 2 and 3 remained unexpressed in the diagram. Given the semantic rule [abovecleaner] and the content of the second premise, the hypothetical indexing position was uniquely determined for Step 2 to be the blank area indicated by the gray symbol [C] in Fig. 10B. Thus, we could verify the occurrence of hypothetical drawing by analyzing participants' eye movements in and out of this area.

Step 3 (Indefinite symbol introduction)

This is the crucial step in this sample problem. Given the semantic rule [abovecleaner] and the content of the third premise, expressing this information would require that symbol [A] be above symbol [O], that is, that symbol [O] be below symbol [A]. To place [O] below [A], however, one needs to make a difficult choice on where to place [O] relative to the existing symbol [B]. It would be a mistake to place [O] above [B] since it would carry the information that O is cleaner than B, which is not implied by the given premises. It would also be a mistake to place [O] below [B] since it would carry another piece of unwarranted information that B is cleaner than O. Since the position diagrams only allow the symbols to be ordered linearly without overlap, there is no definite way of placing [O] below [A] without thereby expressing an unwarranted piece of information. The drawing position of symbol [O] is indefinite in this sense. Fig. 10C indicates this indefiniteness in having gray [O] beside [B].

This is an instance of the very common property of diagrams, known as “specificity” (Stenning & Oberlander, 1995), “over-specificity” (Shimojima, 1996), or “particularity” (Kulpa, 2003). Varieties of diagrams, including maps, geometry diagrams, and Euler diagrams, prohibit the exclusive expression of information in this manner, enforcing the choice of additional, unwarranted information for expression. This property is a characteristic weakness of diagrams, and as such, it intervenes in the cognitive process only when the cognitive process involves a drawing on a diagram. Thus, if we could find evidence that the subjects' performance was affected within contexts that required the “drawing” of indefinite symbols, then we could count this as a new kind of evidence for the existence of hypothetical drawing. The data on response latency were used to verify this effect.

Step 4 (Inference)

From the premise that C is cleaner than A (Step 2) and the premise that A is cleaner than O (Step 3), it follows that C is cleaner than O. Thus, the correct answer to the question in Step 4 is “yes” in this particular problem. The gray line connecting the symbols [C] and [O] in Fig. 10D indicates that the spatial relationship between these symbols is to be checked in this step. The premise given in Step 1 is not relevant to the solution.
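The logical situation in this sample problem can be checked mechanically by enumerating every linear arrangement of the four symbols that is consistent with the three premises. The sketch below is our illustration only (the function names are hypothetical); it assumes the rule [abovecleaner], so that an earlier position in a tuple stands for a higher position in the diagram.

```python
from itertools import permutations

# Each pair (x, y) reads "x is cleaner than y", i.e. [x] is above [y].
PREMISES = [("A", "B"), ("C", "A"), ("A", "O")]

def consistent_orders(symbols, premises):
    """All top-to-bottom orderings of the symbols that satisfy every premise."""
    orders = []
    for order in permutations(symbols):
        rank = {s: i for i, s in enumerate(order)}   # index 0 = topmost
        if all(rank[x] < rank[y] for x, y in premises):
            orders.append(order)
    return orders

def holds_in_all(orders, x, y):
    """True if [x] is above [y] in every consistent ordering."""
    return all(o.index(x) < o.index(y) for o in orders)

orders = consistent_orders("ABCO", PREMISES)
print(orders)                          # [('C', 'A', 'B', 'O'), ('C', 'A', 'O', 'B')]
print(holds_in_all(orders, "C", "O"))  # True: the answer in Step 4 is "yes"
print(holds_in_all(orders, "O", "B"))  # False
print(holds_in_all(orders, "B", "O"))  # False: the position of [O] is indefinite
```

Exactly two arrangements survive, differing only in the order of [B] and [O]; any single position diagram must commit to one of them, which is the over-specificity at issue in Step 3.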

Besides allowing us to examine the occurrence of inference by hypothetical drawing, Experiment 2 lets us check the non-arbitrariness of underlying spatial constraints. Note that spatial constraints on structures of position diagrams play an essential role in the indefinite-symbol problem. Given any two symbols and an axis on a Euclidean plane, the two symbols must stand in some precedence relation relative to the direction of the axis. In particular, the symbol [O] must be either above, horizontal to, or below the symbol [B] if they are to be placed in an area below the symbol [A]. If it were not for such constraints, you could place [O] and [B] below [A] without determining the vertical relation between [O] and [B] so that your diagram could say that both O and B are dirtier than A without thereby saying which of O and B is cleaner. Thus, the indefinite-symbol problem is the case where spatial constraints on diagrammatic structures intervene in the expression of information and hence in the overall inferential process. This is in contrast to the case mainly studied in Experiment 1, where spatial constraints facilitate the inferential process through automatic expression of logical consequences.

So, if we find evidence that the participants' performance is lowered in introducing indefinite symbols, it will be evidence that spatial constraints do intervene in their inferential processes. Such evidence is particularly important since it will be a manifestation of the non-arbitrariness of the spatial constraints involved. When one is engaged in a diagrammatic inferential process exploiting spatial constraints, the process must be subject to their adverse effect, not only their favorable effects. Non-arbitrary application is a hallmark of spatial constraints as constraints, and finding evidence for such non-arbitrariness is a strong indication of their presence in the inferential process.

Still another advantage of Experiment 2 is that it is more immune to the so-called experimenter's demand problem. According to Pylyshyn (1981) and Intons-Peterson (1983), with every experiment involving mental imaging of one sort or another, there is the potential danger that the participant will somehow become aware of the experimenters' hypothesis and “cooperate” excessively by consciously exercising his/her mental imaging of the hypothesized form. Various measures have been taken to preclude this possibility (Demarais & Cohen, 1998; Johansson, Holsanova, & Holmqvist, 2006; Laeng & Teodorescu, 2002; Spivey & Geng, 2001; Spivey, Richardson, Tyler, & Young, 2000), many of which have consisted of hiding the experimenters' expectations from the participants. Although our target process is not necessarily mental imaging, it is something that the participants could also exercise consciously, and the danger of excessive cooperation holds for our experiments at least potentially. However, the new complication introduced in Experiment 2 significantly reduces its likelihood. Exploiting the over-specific nature of the system of position diagrams, many problems presented in Experiment 2 deliberately made the process of hypothetical drawing more difficult. Yet this very process is what we expected to occur, and there was little chance that the participants were aware of our expectations. If anything, the participants should have assumed that the opposite had been expected. In fact, so far as the interviews following Experiment 2 were concerned, only one of the 26 participants suggested that she was aware of our actual hypothesis. Our measures taken against the experimenter's demand problem seemed fairly successful in this respect.

3.2. Method

3.2.1. Problem categories

All the problems we assessed in our experiment used a four-step procedure, as illustrated in the previous section. As with Experiment 1, half the problems were vertical problems, using a diagram with vertically arranged square symbols. The other half were horizontal problems, using a diagram with horizontally arranged symbols.

The content of the premises and the semantic rules used in the problems varied significantly so that different problems may have prescribed different positions and operations for hypothetically drawn objects on the given diagrams. Fig. 11 shows 14 categories (v1(a) through v6(c)) of vertical problems classified on this basis. Here, the numbers in gray squares indicate the timing (the step number) in which the symbol in question is supposed to be hypothetically drawn on the diagram. The gray line labeled “4” indicates two (actual or hypothetical) symbols whose spatial relations are supposed to be checked in Step 4 for each category of the problem. Note that the problem categories belonging to the same group (say, problem categories v1(a) and v1(b) belonging to group v1) have the same pattern from Steps 1 to 3. They only differ in Step 4, when different pairs of symbols are expected to be compared for spatial relations (in v1(a), for example, the symbol introduced in Step 3 is compared with the lower physical symbol whereas in v1(b), the same symbol is compared to the indefinite symbol introduced in Step 2).

Figure 11.

Types of vertical problems used in Experiment 2.

The sample problem described in the previous section is in category v4(b), for example. It prescribes the hypothetical drawing of symbol [C] in the uppermost area in Step 2, and that is indicated by a gray square numbered 2 in the uppermost area of the figure of category v4(b). The figures of categories v1, v2, v4, and v5 have gray squares placed beside other symbols, meaning that problems in these categories introduce indefinite symbol squares with indicated timing.

In a similar vein, the horizontal problems in Experiment 2 can be classified into six broad categories divided more precisely into 14 categories due to the ramifications in Step 4. Their definitions are analogous to their vertical counterparts.

As with Experiment 1, we adopted opposite semantic rules for each target relation (e.g., [abovecleaner] and [belowcleaner]) in presenting different diagrams. This time, however, the purpose of doing so was not to test the influence of the participants' preferences for different semantic rules but merely to prevent them from being adapted to a particular semantic rule (e.g., [abovecleaner]) so much that they started to simply think about the representing relation (e.g., above) instead of the target relation (e.g., cleaner).

With 28 categories of problems (14 vertical and 14 horizontal) and two variations of relational predicates for each category, we had a total of 56 problems to be presented in the main session of the experiment.

3.2.2. Predictions on response latency

The position diagrams we used in our experiments have a weakness in terms of expressive flexibility. Consequently, a problem with positioning indefinite symbols occurs in certain contexts. If we refer back to Fig. 11 again, we can see the gray squares beside other squares are indefinite symbols, and the numbers inside the squares indicate the timing (step numbers) in which they must be positioned in the relevant diagrams. For example, Step 2 in problem categories v1 and v2 would thus be more difficult than Step 2 in the other problem categories. Also, Step 3 in problem categories v4 and v5 would be more difficult than Step 3 in the other problem categories. We predicted that this difference would be reflected in the difference in latency.

Assuming that the participants remain engaged in the hypothetical drawing process in the face of the problem of positioning indefinite symbols, we predicted that no significant difference in latency would occur in Step 4 between problems involving indefinite symbols and those that do not. There are certainly many different drawing strategies that one may use to deal with the problem: One may place a hypothetical symbol in a definite position in the linear diagram at the risk of expressing unwarranted information, one may express indefiniteness by putting a hypothetical symbol beside the existing symbol (in the manner we used for Fig. 11), or one may express alternative possibilities disjunctively by placing hypothetical symbols in multiple locations. All these complications, however, are confined to the initial introduction of indefiniteness in Step 2 or 3. The operation involved in Step 4 is check-relation rather than place-object, and if we strictly apply our general hypothesis, the locations of all the relevant indexes must have been fixed by the time the check-relation operation is applied, and participants can simply check the spatial relation among these fixed locations. Thus, although Step 4 in problem categories v1(b), v2(b), v4(b), and v5(b) seems to involve an operation on indefinite symbols, no greater response latency should be predicted for these steps.

Analogous considerations apply to horizontal problems, and the predictions of response latency for them were determined accordingly.

3.2.3. Predictions on eye movements

Applying the model described in Section 'General hypothesis' again, we obtained systematic predictions on eye movements accompanying the operations of place-object and check-relation.

We illustrate our predictions with the case of upward reaches of eyes. Consulting Fig. 11, we see place-object applied to the uppermost areas of diagrams in Step 2 of problem categories v3(a)–(c) and v4(a)(b) and in Step 3 of problem categories v1(a)(b) and v6(a)–(c). Furthermore, check-relation would involve hypothetical symbols in the uppermost areas in Step 4 of problem categories v1(a)(b), v3(a)(b), v4(a)(b), and v6(a)(b). Thus, we predicted higher upward reaches of eyes during these steps. In contrast, the upward reaches of eyes would be generally lower in Step 2 of problem categories v1(a)(b), v2(a)(b), v5(a)(b), and v6(a)–(c) since there would be no definite symbols in the uppermost areas at the time of Step 2. For the same reason, the upward reaches of eyes would be generally lower in Steps 3 and 4 of v2(a)(b) and v5(a)(b). We had no predictions for the upward reaches for Step 3 of v3(a)–(c) and v4(a)(b) and Step 4 of v3(c) and v6(c). These were ambiguous cases in which definite hypothetical symbols occupied the uppermost areas but would not be operated on by either place-object or check-relation in the relevant steps. A detailed analysis of these cases is beyond the main concern of this article.

We made predictions on downward, leftward, and rightward reaches of eyes in an analogous manner. Table 2 is the classification of vertical problems based on the predicted upward and downward reaches of eyes.

Table 2. Classification of vertical problems in Experiment 2 based on predicted eye movements in individual steps
| Problem Categories | Step 2 Upward Reach | Step 2 Downward Reach | Step 3 Upward Reach | Step 3 Downward Reach | Step 4 Upward Reach | Step 4 Downward Reach |
| --- | --- | --- | --- | --- | --- | --- |
| v3(a) | Higher-pred | Higher-pred | | Lower-pred | Higher-pred | Lower-pred |
| v3(b) | Higher-pred | Higher-pred | | Lower-pred | Higher-pred | |
| v3(c) | Higher-pred | Higher-pred | | Lower-pred | | Lower-pred |
| v4(a)(b) | Higher-pred | Higher-pred | | Higher-pred | Higher-pred | Higher-pred |
| v5(a)(b) | Lower-pred | Lower-pred | Lower-pred | | Lower-pred | Lower-pred |
| v6(a) | Lower-pred | Lower-pred | Higher-pred | | Higher-pred | Lower-pred |
| v6(b) | Lower-pred | Lower-pred | Higher-pred | | Higher-pred | |
| v6(c) | Lower-pred | Lower-pred | Higher-pred | | | Lower-pred |

Note. The expression “-pred” is an abbreviation for “-predictive.” A blank cell means no prediction.

3.2.4. Participants

A total of 26 naïve volunteers (undergraduate students: 17 females and 9 males) participated, with monetary compensation of 1,000 yen per hour.

3.2.5. Apparatus

The diagrams were presented on a 19-inch Eizo Flexscan LCD display (1,280 × 1,024 pixel resolution). The viewing angles of the display were approximately 38 degrees (horizontal) and 31 degrees (vertical), and the viewing angles of the stimulus diagrams were the same as those in Experiment 1, being approximately 8 degrees (longer dimension) and 2 degrees (shorter dimension). The apparatus for problem presentation, participant input, and eye-position measurement was the same as that in Experiment 1. Again, each participant's chin and forehead were placed in a fixed support. A nine-point calibration was performed at the beginning of each session, and subsequent calibrations were conducted as needed.

3.2.6. Design

Some steps in our problems required operations involving indefinite symbols, whereas other steps required operations involving only definite symbols. This indefiniteness in operations (two levels: with and without indefiniteness) was used as an independent variable for the first test. Response latency in each step was measured and used as the dependent variable.

Problems were also categorized in terms of predicted eye movements (higher/lower/left/right-predictive) in each of Steps 2, 3, and 4. This set of problem categories was used as an independent variable. Outward reaches of eye movement on problem displays were then measured and used as the dependent variable.

3.2.7. Procedure

Participants were tested individually, beginning with eight practice problems followed by 56 main problems with a break between every 14 problems. They were instructed to solve the problems as quickly and accurately as possible. The response latency in each step and the eye movements during the session were recorded.

3.3. Results

3.3.1. Response latency

Each of the 26 subjects solved 56 problems, totaling 1,456 trials. A total of 1,243 trials (85.4%) were answered correctly. Table 3 lists the average response latency in Steps 2–4, comparing the contexts operating on indefinite symbols and the contexts operating on definite symbols. Paired t-tests found that response latency was significantly longer in contexts operating on indefinite symbols for both Step 2 (t(25) = 2.43, p = .02) and Step 3 (t(25) = 3.91, p < .01). There was no significant difference between the two contexts in Step 4 (t(25) = −0.51, n.s.), as we expected.

Table 3. Response latency in Steps 2–4, comparing contexts operating on indefinite and definite symbols
| Step | Contexts | Mean (msec) | SD |
| --- | --- | --- | --- |
| Step 2 | Operating on indefinite symbols | 4,074 | 925.7 |
| | Operating on definite symbols | 3,756 | 744.2 |
| Step 3 | Operating on indefinite symbols | 4,448 | 1,387.4 |
| | Operating on definite symbols | 3,873 | 830.7 |
| Step 4 | Operating on originally indefinite symbols | 3,786 | 595.1 |
| | Operating on originally definite symbols | 3,727 | 459.5 |
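
For readers who wish to reproduce within-participant comparisons of this kind, the following is a minimal sketch of a paired t statistic computed from per-participant mean latencies. The function name `paired_t` and the five latency values are hypothetical illustrations; they are not the data or the analysis code of the present study, and only the Python standard library is assumed.

```python
import math
from statistics import mean, stdev


def paired_t(xs, ys):
    """Return (t, df) for a paired t-test on two equal-length samples."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two equal-length samples with at least 2 pairs")
    # One difference score per participant (condition x minus condition y).
    diffs = [x - y for x, y in zip(xs, ys)]
    n = len(diffs)
    sd = stdev(diffs)  # sample SD (n - 1 denominator) of the differences
    if sd == 0:
        raise ValueError("differences have zero variance")
    t = mean(diffs) / (sd / math.sqrt(n))
    return t, n - 1


# Hypothetical per-participant mean latencies in msec (NOT the study's data).
indefinite = [4100.0, 3900.0, 4500.0, 4200.0, 3800.0]
definite = [3800.0, 3700.0, 4000.0, 4100.0, 3600.0]

t, df = paired_t(indefinite, definite)
print(f"t({df}) = {t:.2f}")
```

With 26 participants, the same computation would have 25 degrees of freedom, which matches the t(25) values reported for response latency.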

3.3.2. Eye movements

To test our predictions on eye movements, we analyzed the 1,243 trials (85.4%) that had been answered correctly. We again analyzed the outer reaches of eye movements, captured by the vertical and horizontal coordinates of eye samples. Due to calibration failure, the data for one subject had to be excluded from analysis.
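
As an illustration of the measure, the sketch below computes the four outer reaches for the gaze samples recorded during one step of one trial. It rests on assumptions that are ours for the sake of the example: gaze samples are (x, y) pairs in screen pixels with the origin at the top-left corner (so a smaller y is higher on the display), and an outer reach is taken to be the most extreme coordinate reached in each direction. The names `Reaches` and `outer_reaches` are hypothetical.

```python
from typing import Iterable, NamedTuple, Tuple


class Reaches(NamedTuple):
    upward: float     # smallest y (highest point on the display)
    downward: float   # largest y (lowest point on the display)
    leftward: float   # smallest x
    rightward: float  # largest x


def outer_reaches(samples: Iterable[Tuple[float, float]]) -> Reaches:
    """Return the extreme gaze coordinates over the samples of one step.

    Assumes screen coordinates with the origin at the top-left corner,
    so y grows downward.
    """
    points = list(samples)
    if not points:
        raise ValueError("no gaze samples for this step")
    xs = [x for x, _ in points]
    ys = [y for _, y in points]
    return Reaches(upward=min(ys), downward=max(ys),
                   leftward=min(xs), rightward=max(xs))


# Hypothetical gaze samples for a single step (NOT recorded data).
step_samples = [(640.0, 512.0), (600.0, 300.0), (700.0, 650.0)]
print(outer_reaches(step_samples))
```

Averaging such per-step values over the trials of each problem category would yield per-participant scores of the kind compared between higher-predictive and lower-predictive problems.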

Figs. 12 and 13 plot the average vertical coordinate values of the upward reaches and the downward reaches of eye movements during the individual steps of trials. As predicted, participants' eyes reached higher in the higher-predictive problems than in the lower-predictive problems. The trend was highly significant for Step 2 (t(24) = 5.38, p < .01), Step 3 (t(24) = 5.19, p < .01), and Step 4 (t(24) = 4.42, p < .01). Also, participants' eyes reached lower in the lower-predictive problems than in the higher-predictive problems. The trend was highly significant for Step 2 (t(24) = 3.28, p < .01) and marginally significant for Step 3 (t(24) = 1.83, p < .1). The trend did not reach any degree of significance for Step 4, however (t(24) = 0.89, n.s.).

Figure 12.

Upward reaches of eye movements on vertical diagrams in Experiment 2 (N = 25). Each square shows the rough location of the upper square symbol in the stimulus diagram.

Figure 13.

Downward reaches of eye movements on vertical diagrams in Experiment 2 (N = 25). Each square shows the rough location of the lower square symbol in the stimulus diagram.

Figs. 14 and 15 show the average horizontal coordinate values of the leftward reaches and the rightward reaches of eye movements. Again, participants' eyes moved mostly as predicted. Eyes reached further left in the left-predictive problems than in the right-predictive problems in Step 2 (t(24) = 3.62, p < .01) and Step 3 (t(24) = 4.92, p < .01), although the effect did not reach any degree of significance for Step 4 (t(24) = 1.70, n.s.). Also, eyes reached further right in the right-predictive problems than in the left-predictive problems in Step 2 (t(24) = 3.14, p < .01) and Step 3 (t(24) = 5.91, p < .01), yet the effect did not reach any degree of significance for Step 4 (t(24) = 1.56, n.s.).

Figure 14.

Leftward reaches of eye movements on horizontal diagrams in Experiment 2 (N = 25). Each square shows the rough location of the left square symbol in the stimulus diagram.

Figure 15.

Rightward reaches of eye movements on horizontal diagrams in Experiment 2 (N = 25). Each square shows the rough location of the right square symbol in the stimulus diagram.

3.4. Discussion

3.4.1. Treatment of indefinite symbols

Over-specificity is a property of many graphical systems, where spatial constraints on graphical structures intervene in the expression of information in diagrams (Shimojima, 1995; Stenning & Oberlander, 1995). In the position diagrams used in our experiment, this property was expected to cause difficulty in choosing the location for drawing a square symbol within certain contexts.

Our data clearly indicate that the participants' performance was adversely affected by this difficulty: The response latency was significantly greater when the location for drawing was indeterminate than when it was determinate (Steps 2 and 3 in Table 3). This implies that the participants were engaged in an essentially drawing-like process, in the sense that it was subject to constraints inherent in space just as actual drawing is. If it were not drawing, it would not have been affected in a way drawing would be affected, and indeterminacy of the drawing location should have made no difference to performance. Thus, our data provide strong evidence for the reality of inference by hypothetical drawing.

Interestingly, this adverse effect was observed only at the time of drawing problematic symbols, not when revisiting these symbols for the check-relation operation (Step 4 in Table 3). This is consistent with the view that the participants were engaged in a drawing-like process. No matter how difficult it may have been to determine the location of a square symbol, once drawing was completed, the symbol should have been placed in a fixed location. Thus, subsequent operations on this symbol should not have been affected by the fact that the symbol's location was indefinite when it was initially introduced. This is exactly what was indicated by our response latency data.

3.4.2. Evidence from eye movements

The eye-tracking results from Experiment 2 largely replicate those from Experiment 1: Participants' eyes tracked the expected locations of place-object that varied with problem categories. This was clearly indicated by the data on Step 2, where a significant difference in outer reach was found for all directions. This was also confirmed by the data on Step 3, where significant differences at the .01 level were found for all directions except the downward reach. Thus, as with those from Experiment 1, eye-tracking data from Experiment 2 support the occurrence of hypothetical drawing operations.

However, the data on check-relation collected from Step 4 are less clear. As expected, the upward reach in “higher-predictive” problems was significantly higher than in “lower-predictive” problems. Also, the average reaches in the downward, leftward, and rightward directions were modulated as we expected. Yet the differences did not reach any degree of statistical significance for these directions. This indicates that eyes tracked the expected locations of check-relation less precisely in Experiment 2 than in Experiment 1. This might be related to the fact that the deictic indexes that had to be tracked in Step 4 were larger in number and more varied in location. We will come back to this issue in the general discussion, where different eye-movement strategies for tracking multiple objects are discussed.

4. General discussion

4.1. Reality of constraint-exploitation processes

Our eye-tracking data, combined with the response latency data, strongly support the hypothesis that people exploit spatial constraints on the given graphical structures, even when they are not in the position of actually manipulating them. According to our model based on the indexing theories (Ballard, Hayhoe, Pook, & Rao, 1997; Pylyshyn, 1989; Ullman, 1984), this process consists of placing an index in an empty location in the graphical structure, associating with the index a certain visual property stored in internal memory, and checking the emerging spatial relation of this index to other indexes in empty or non-empty locations. Within the context of inferential problem solving, this spatial relation may carry a piece of information that logically follows from the premises at hand, and thus, the task of computing a logical consequence can be replaced by visual-indexing procedures applied to an external display. Spatial constraints on the graphical structure can thus be exploited.
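
The way in which these indexing operations yield a consequence without an explicit inference rule can be illustrated with a toy sketch. It is not an implementation of our process model: the class `IndexedDisplay`, its method names, and the use of a single numeric vertical coordinate are simplifying assumptions introduced here for illustration only.

```python
class IndexedDisplay:
    """Toy vertical position diagram whose symbols are deictic indexes."""

    def __init__(self):
        # label -> vertical coordinate (a larger value means higher up)
        self._y = {}

    def place_object(self, label, y):
        """Bind a label to a location, which may be blank on the display."""
        self._y[label] = y

    def place_above(self, label, anchor, gap=1.0):
        """Place a hypothetical symbol somewhere above an indexed one."""
        self.place_object(label, self._y[anchor] + gap)

    def place_below(self, label, anchor, gap=1.0):
        """Place a hypothetical symbol somewhere below an indexed one."""
        self.place_object(label, self._y[anchor] - gap)

    def check_relation(self, upper, lower):
        """Read off whether `upper` currently sits above `lower`."""
        return self._y[upper] > self._y[lower]


display = IndexedDisplay()
display.place_object("A", 0.0)   # symbol given in the partial diagram
display.place_below("B", "A")    # premise: A is above B
display.place_above("C", "A")    # premise: C is above A
# No transitivity rule is coded anywhere; the answer is simply read off
# from the locations, whose ordering is itself transitive.
print(display.check_relation("C", "B"))
```

The code contains no rule of the form "if C is above A and A is above B, then C is above B." The conclusion holds because the ordering of locations is transitive, which is the sense in which the constraint is exploited rather than computed.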

As noted in the introduction, semantic studies of diagrammatic reasoning (Barwise & Etchemendy, 1990; Pylyshyn, 2003; Shimojima, 1995; Stenning & Lemon, 2001) suggested that the exploitation of spatial constraints is one of the core mechanisms through which diagrams facilitate our inferential processes. Our experimental data support this conjecture, giving systematic evidence of the existence of the constraint-exploitation process. With an experimental paradigm where only a partial diagram is given, we could observe a systematic pattern of eye movements toward particular blank places on the display (Experiment 1). Without any visual targets in their destinations, these eye movements could be explained only as placing a hypothetical symbol or checking a spatial property of a hypothetical symbol, where these operations exactly matched what would be required when one used the transitivity constraint of the relevant spatial relation to facilitate one's inference. Moreover, the observed eye movements were systematic in the sense that under wide semantic-syntactic variances of the given diagram, they were consistently in the directions that would maintain this matching with the required operations. Thus, Experiment 1 provided initial strong evidence for the constraint-exploitation process.

Experiment 2 reinforced the evidence by showing the non-arbitrary character of the spatial constraints involved. The experiment had the crucial condition where a spatial constraint on the given diagram would work adversely only if the participant were engaged in the constraint-exploitation process. A significant increase of response time was observed under this condition, strongly suggesting that an adverse spatial constraint did intervene in the reasoning process. Clearly, this adverse constraint came as a side effect of the overall constraint-exploitation process, just as is the case when we physically manipulate diagrams (Shimojima, 1995; Stenning & Oberlander, 1995). As a constraint, a spatial constraint involved in the process must be non-arbitrary, in the sense that it affects the reasoning process consistently, no matter whether it is of the kind facilitating reasoning (Experiment 1) or the kind impeding it (the present cases).

Experiment 2 was particularly important as a test of this non-arbitrariness, and its results clearly showed the involvement of a real constraint in this respect.

4.2. Reality of hypothetical drawing

Another direct implication of our study is the reality of hypothetical drawing operations. A number of studies on diagrammatic reasoning have suggested the existence of non-physical operations on diagrams of this kind. Back in 1971, Sloman (1971) cited our ability to “imagine or envisage rotations, stretches, and translations of parts” of a diagram as an explanation of the efficacy of inference with diagrams. Apart from their well-known analysis of the information-indexing functions of diagrams, Larkin and Simon (1987) also discussed “simple, direct perceptual operations” on the supply-demand chart that helped the viewer “read off” the effects of an economic policy. In characterizing “reasoning about a picture as the referent” in contrast to “reasoning about the picture's referent,” Schwartz (1995) assumed a mental operation that made the line representing the upper leg of a hinge swing down to the line representing the lower leg of the hinge. Narayanan, Suwa, and Motoda (1995) postulated the “visualization” process on a schematic diagram of a mechanical device that moved, rotated, copied, and deleted elements of the diagram. Trafton, Trickett, and Mintz (2005) and Trickett and Trafton (2007) used protocol data to investigate how scientists apply “spatial transformation” to graphical data displays. These studies seemed to point to a rich field of diagrammatic inferences to which hypothetical drawing applied.

Several researchers have recently started to investigate non-physical drawing with an eye-tracking method. Hegarty's study of mental animations (Hegarty, 1992) was an early but highly successful attempt. The scientific use of spatial transformations was also investigated with an eye-tracking method (Trafton & Trickett, 2001; Trafton, Marshall, Mintz, & Trickett, 2002). The research by Shimojima and Fukaya (2003) is a direct predecessor of this study, where they investigated “hypothetical drawing” on position diagrams used in transitive inference tasks. Yoon and Narayanan (2004) investigated imaginative drawing on a near-blank screen within the context of problem solving. In their study of kinematic problem solving, Kozhevnikov, Motes, and Hegarty (2007) identified the ability of spatial transformation on external diagrams as a characteristic of students with higher spatial-visualization ability.

Our study contributes to this research trend by providing strong evidence for the existence of such a process in a focused and systematic manner. In our model, a hypothetical symbol on an external display is really a deictic index placed in an empty location, with an associated visual property (such as being a square symbol labeled “A”) stored in the agent's internal memory. On the basis of this model, we found systematic movements of eyes to the expected locations of index placement. The fact that eyes closely followed the change of the diagram's syntax and semantics established that the observed eye movements reflected certain operations on the given diagram, effectively excluding the possibility that eyes moved for a reason unrelated to the given diagram. Our data on response latency also displayed a pattern reflecting the expressive weakness of the given diagrammatic notation, which could be accounted for only by assuming that the participants were engaged in a drawing-like activity on the given diagram.

Admittedly, our evidence is only about a simple case of hypothetical drawing, placing a single iconic symbol in a single place. In contrast, some of the cases studied in the previous research involve the placement or movement of more complex objects, such as the translocation of a data curve in a time-series graph (Larkin & Simon, 1987). To fully account for such cases, our process model needs to be significantly extended, say, to specify how multiple indexes are associated with a single label to form a complex “object” and how they are collectively moved while maintaining their mutual spatial relations. The recent evidence indicates that our image-based inference relies more on the spatial configuration of the relevant representation than on its visual details, where the former is determined by the locations of a relatively small number of objects (see Knauff, 2006 for a review). This is encouraging to the idea of extending the indexing model to cover more complex cases of hypothetical drawing. Such extensive research, however, would not be feasible without initial evidence on more basic processes. The evidence provided by our study is of this character.

4.3. Implications under the real-space hypothesis

The contribution and intervention of spatial constraints identified in our study have wider implications for current research on inference involving spatial representations. Research in this field, however, has been divided by two well-known general hypotheses concerning the nature of “spatial representations.” The implications of our results therefore diverge, depending on the hypothesis under which they are interpreted. Since each of these hypotheses is driving promising research programs, we will clarify both sets of implications of our results in the following two sections.

One hypothesis, which may be called “the real-space hypothesis,” has been most explicitly formulated by Pylyshyn (2003, 2007). The general position goes as follows. When we apparently operate on spatial representations, the operations are extended to real, physical space surrounding us, possibly going through (but not terminating at) internal representations. This extension is made possible by the indexing mechanism (outlined in section 1.'Theory of deictic indexing'), which maintains reliable connections between mental labels and particular objects in real space so that whatever processor handles the mental labels can issue direct queries concerning the spatial properties or configurations of those objects. This mechanism lets us use locations, objects, and their relations in real space as a spatial representation of the problem domain, where “spatial” is taken literally. The reason why our mental operations sometimes bear spatial characters, as exemplified by mental scanning (Kosslyn, 1973; Kosslyn, Ball, & Reiser, 1978) and image imposition (Farah, 1989; Podgorny & Shepard, 1978), is that they operate on real physical space, not that they operate on some internally realized space.

Our model of hypothetical drawing presented in Section 1.'General hypothesis' is a form of the real-space hypothesis applied to the case of augmenting partially drawn diagrams. All the indexing operations postulated in our model, namely, place-object, identify-object, and check-relation, are conceived as operations reaching out to real space, and it is constraints on real space that are exploited through these operations. Thus, under the real-space hypothesis, our results would imply a direct use of spatial constraint on real space. The indexing operations let us project premises of our inference onto an external diagram, exploit spatial constraints holding in there, and gain a “free ride” to a logical consequence. This is a paradigmatic case where cognitive processes are distributed over the external world (Clark, 2011; Clark, 1997; Clark & Chalmers, 1998; Hutchins, 1995).

There is an important point, however, in which the form of distributed cognition indicated by our model is different from what has been pointed out in the previous research on vision-based distributed cognition. According to O'Regan (1992), we do not have to construct an exact mental copy of our visual scene since the external world is generally quite stable and thus we can always return our attention to the relevant part of the visual scene and obtain the necessary information. Ballard, Hayhoe, Pook, and Rao (1997) gave an empirical demonstration of this view, using the notion of deictic indexes to account for our frequent attention-returning behavior. In their second experiment, Spivey and Geng (2001) found that people could be so accustomed to the stability of an external display as to return attention to where the relevant visual information was, even when it was no longer available there. Thus, the previous experimental findings on vision-based distributed cognition have been mainly concerned with how memory gains from interaction with the external environment. In contrast, our concern is how inference gains from interaction with the external environment. Deictic indexing is a key operation again, but in the case of inference, we take advantage of spatial constraints holding in real space to skip some computational steps rather than taking advantage of the stability of the external world to reduce the memory load.

The general idea of spatial constraints on real space contributing to inference is not entirely new. We have indicated that semantic studies of diagrammatic reasoning (Barwise & Etchemendy, 1990; Shimojima, 1995; Stenning & Lemon, 2001) have often pointed out that physically manipulating diagrams lets one gain such an advantage. In fact, Pylyshyn (2003, 2007) and Spivey, Richardson, and Fitneva (2004) anticipated a part of our experimental results, specifically suggesting the possibility of exploiting spatial constraints by placing indexes in real space. Thus, since indexed objects are objects in real space, any spatial configurations they make are “guaranteed” to be “consistent with the axioms of geometry” (Pylyshyn, 2003, p. 377). This enables one to “read off a correct answer” to an inference problem, “without using the axiom of transitivity or any similar rule of syllogistic logic” (Pylyshyn, 2007, p. 170). The virtue of our study was then to conduct experiments that could isolate the suggested contribution of spatial constraints in a focused manner. The reality of such constraints was further highlighted by the result indicating their intervention in the reasoning process.

4.4. Implications under the internal-space hypothesis

The other major position that one may take about the nature of spatial representations can be called “the internal-space hypothesis.” According to this position, representations with certain spatial characters can be realized in the brain. Mental operations sometimes bear spatial characters because they operate on this internally realized space (as opposed to real space).

The well-known theory of visual imaging developed by Kosslyn (1994) and Kosslyn, Thompson, and Ganis (2006) provides a detailed process model based on this position, with an emphasis on its neurophysiological realizability. According to this model, spatial representations are realized as “object maps” in the posterior parietal lobe when only spatial properties need to be represented (Kosslyn, Thompson, & Ganis, 2006) or as “depictive representations” in the occipital lobe (the “visual buffer”) when visual properties also need to be represented (Kosslyn, 1994). There are massive neural connections between these areas and other portions of the brain, realizing such operations as attention-window shifting, spatial-property-encoding, and spatial-property-transformation to handle information represented in these areas. The topographical organization of the areas then has a systematic effect on these operations, accounting for the spatial characters that some mental operations bear.

Assuming this position, all the indexing operations postulated in our computational model should be considered operations on the perceptual image of the diagram in this internally realized space, and this fact should account for their spatial characters. Thus, the observed eye movements to blank places in the external display are interpreted to reflect the creation of mental images in the corresponding locations in the internal space and hence to reflect the exploitation of spatial constraints holding in there; the increased response latency found in Experiment 2 then indicates the intervention of spatial constraints also holding in the internal space. Thus, under the internal-space hypothesis, it is exploitation of spatial constraints on this internal space, not of the ones on real space, that our experiments have isolated.

Our own computational model is conceived under the real-space hypothesis and as such is not compatible with this reading based on the internal-space hypothesis. However, our experimental data themselves are perfectly compatible with it, especially given recent empirical studies, conducted largely under the internal-space hypothesis, that show that directions of eye movements spatially correspond to the underlying mental operations on internal space (Brandt & Stark, 1997; Demarais & Cohen, 1998; Johansson, Holsanova, & Holmqvist, 2006; Laeng & Teodorescu, 2002). On the basis of this result, we can interpret an eye movement to a hypothetical indexical position in a given diagram as reflecting the application of place-object or check-relation to the mental diagram exploiting its spatial constraints.3

In fact, this alternative interpretation of our experimental data has an important implication for the nature of spatial representations conceived under the internal-space hypothesis. The strength and relevance of spatial constraints in our inferential process indicate, under the internal-space hypothesis, that the internal space is a space of that kind, namely one that can sustain non-arbitrary spatial constraints as real space does. This point is important since it is not clearly implied by the “depictive” character (Kosslyn, Thompson, & Ganis, 2006) that has been suggested as an essential property of spatial representations in internal space. Kosslyn, Thompson, and Ganis (2006) define a depictive representation as something that represents its target by resemblance, namely (a) each portion of the representation must correspond to a visible portion of the actual object, and (b) the represented distances among the portions of the representation must correspond to the distances among the corresponding portions of the actual object (as they appear from a particular point of view). Thus, the definition of the depictive character is primarily a semantical one, in the sense that it is concerned with the informational relation between the structure of a representation and the structure of its target. In contrast, the holding of spatial constraints is exclusively concerned with the structure of the representation side, and especially, with laws or law-like regularities governing the structure. As such, it is concerned with necessities, asserting that something must be the case if something else is the case, such as that the symbol [C] must be above the symbol [B] if [C] is above the symbol [A] and [A] is above [B]. Our results are important in showing that the sustainment of such a law-like regularity on spatial structures is an essential property of internal space, necessary to account for the function of depictive representations it carries.

Although this point is relevant to research assuming internal space in general, our results are most directly related to emerging studies of mental diagrams, which attempt to explain the functions of diagrams by positing inferential operations on internal-space representations of externally given diagrams. Bauer and Johnson-Laird (1993) obtained experimental evidence that mental representations of diagrams help our inferential process by providing static bases that we dynamically modify to envisage alternative possibilities relevant to the inference. Each modified state of the internal diagram was considered a mental model in the sense of Johnson-Laird (e.g., Goodwin & Johnson-Laird, 2005; Johnson-Laird, 1983; Johnson-Laird & Byrne, 1991), and this result was considered evidence for mental model theory too. Johnson-Laird (2006) later suggested that each modified state of the internal diagram is represented in a mental array, where the spatial structure of the diagram is represented by the values of its components on a set of axes. Chandrasekaran, Banerjee, Kurup, and Lele (2011) recently laid out a comprehensive computational theory of the functions of mental diagrams, which specifies how mental diagrams are constructed (sometimes in a hierarchical order) and how operations on them are related to the performance of real-world tasks. The diagrammatic representation component (the DR component) in their architecture has spatiality as a key feature and has been implemented in sophisticated manners, by specifying the intensity values of its symbols in a 2D array (Banerjee & Chandrasekaran, 2010a) or by using algebraic expressions or equations that describe point, curve, or region symbols (Banerjee & Chandrasekaran, 2010b).

Under the internal-space hypothesis, the hypothetical drawing we found is an operation on the mental representation of a diagram and is clearly an instance of what Bauer and Johnson-Laird (1993) and Johnson-Laird (2006) characterized as envisaging operations.4 It also corresponds to the mental operations that Chandrasekaran, Banerjee, Kurup, and Lele (2011) and Banerjee and Chandrasekaran (2010a, b) implemented on the perception/action algorithms. The main finding of our study is then that the sustainment of spatial constraints is an essential property of mental diagrams.

In fact, spatial constraints seem to play an essential role in the particular cases of diagrammatic reasoning studied in the internal-diagram literature. If it were not for spatial constraints, for example, envisaging symbols filling in particular slots in mental circuit-diagrams would not result in a sequential connection of symbols that signifies critical information for deduction (Fig. 16A) (Bauer & Johnson-Laird, 1993), and drawing a particular path in an internal map would not result in its crossing a particular area (Fig. 16B) (Chandrasekaran, Banerjee, Kurup, & Lele, 2011).

Perhaps, once internal diagrams are implemented in terms of coordinate values on a set of axes or algebraic expressions, the effects of the relevant spatial constraints are also realized, owing to the power of the mathematical modeling based on these variables. However, this is different from theoretically specifying the roles that specific spatial constraints play in facilitating or impeding the inferential processes. Our experimental results demonstrate the need for such a theory to complement the ongoing research on mental diagrams. Spatial constraints appear to be an essential property that accounts for the power of mental diagrams as an explanatory construct.
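To make the point about coordinate-based implementations concrete, the following minimal sketch stores a position diagram as coordinate values and shows a transitive consequence becoming readable without being entered. It is an illustration only, not the implementation of any of the cited architectures. The function names `place_above` and `is_above`, the integer slots, and the convention that a smaller y value means "higher on the page" and "lighter" are all assumptions introduced here.

```python
# Hypothetical illustration: a position diagram stored as coordinate values.
# Assumed convention (not from the source): smaller y = higher on the page = "lighter".

def place_above(diagram, new_symbol, anchor):
    """Return a copy of the diagram with new_symbol one slot above anchor."""
    target_y = diagram[anchor] - 1
    if target_y in diagram.values():
        raise ValueError("slot above the anchor is already occupied")
    updated = dict(diagram)
    updated[new_symbol] = target_y
    return updated


def is_above(diagram, upper, lower):
    """Read a relation off the diagram by comparing coordinates."""
    return diagram[upper] < diagram[lower]


# Analogue of Fig. 1A: A is lighter than B.
diagram = {"A": 0, "B": 1}
# Analogue of Fig. 1B: add the premise that C is lighter than A.
diagram = place_above(diagram, "C", "A")
# "C is lighter than B" was never entered explicitly, yet it can be read off.
c_lighter_than_b = is_above(diagram, "C", "B")
```

In this sketch the ordering of the integers does the inferential work. The code realizes the effect of the constraint, but it does not by itself say which constraint is responsible or when it helps or hinders, which is the theoretical gap noted above.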

Figure 16.

Examples of mental diagrams. (A) Quasi-circuit diagrams (Johnson-Laird, 2006, redrawn). Filling certain slots with symbols results in a sequential connection of symbols. (B) Topographical map (Chandrasekaran, 2011, redrawn with a modification). Drawing a straight path from the position labeled “A” to the position labeled “B” results in the path's crossing the shaded area.

4.5. Remaining problems

Although we found a strong tendency for people to look toward hypothetical indexing positions in various phases of our inferential tasks, there are two important questions yet to be addressed.

4.5.1. Problem of exceptions

The first problem is concerned with exceptions to this overall tendency that we found. A post hoc analysis of the eye-tracking data from Experiment 1 showed a small but significant ratio of cases where eyes did not move to hypothetical drawing positions (see Appendix A for the details of the post hoc analysis). Moreover, a participant-wise correlation analysis indicated that tendencies to move toward hypothetical indexing positions are strongly correlated between Steps 2 and 3: Participants tending to move their eyes toward hypothetical indexing positions in Step 2 also tended to do so in Step 3, while participants tending to keep their eyes away from hypothetical indexing positions in Step 2 also tended to do so in Step 3. Although small in number, the latter participants just did not seem to directly look at hypothetical indexing positions.

One possibility is that these participants solved problems with a totally different inferential strategy than we had hypothesized, placing no indexes on the given diagram. Another possibility, however, is that they did produce deictic indexes on the given diagrams and revisit them but without moving their eyes directly to the individual indexes. Zelinsky and Neider (2008) identified two different eye-fixation behaviors when people track multiple moving objects. The first was to fixate on the collective center of gravity of the objects being tracked, and the second was to fixate on individual objects in turn. The first behavior was more frequent when there were few tracked objects (around 2), and the second behavior was more frequent when there were more (around 4). Fehd and Seiffert (2008) also identified the center-of-gravity strategy in the task of tracking three objects, and Doran, Hoffman, and Scholl (2009) also suggested that the two strategies were adopted in a task-dependent manner.

Given these findings, it is possible that some of our participants adopted the center-of-gravity strategy in certain trials. That is, to retain the indexes placed on the blank regions and the two square symbols, they might have fixated on their collective center of gravity. Since the collective center of gravity was generally detached from hypothetical indexing positions, this would explain the exceptional eye movements in our experiments. In particular, this explanation fits the weakened tendency of the eyes to move toward hypothetical indexing positions during Step 4 of Experiment 2. For as steps accumulate, indexes to be tracked also accumulate, and their center of gravity moves away from the outermost position where the hypothetical indexing position is. Although our experimental tasks involved indexing to non-moving objects, and hence were considerably different from the multiple-object-tracking tasks adopted by the authors mentioned above, the idea of two different fixation strategies seems to be one promising approach to explain the divergent patterns of eye movements in our data.5
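The geometric claim in this explanation, that the center of gravity drifts away from the outermost index as indexes accumulate, can be checked with a small sketch. The coordinates below are invented for illustration and do not reproduce the actual stimulus layout: two drawn symbols are assumed to sit in a vertical column 100 pixels apart, and each step is assumed to add one new index 100 pixels beyond the current outermost one. The function names are likewise hypothetical.

```python
import math


def centroid(points):
    """Collective center of gravity of equally weighted (x, y) points."""
    n = len(points)
    return (sum(x for x, _ in points) / n, sum(y for _, y in points) / n)


def gaps_by_step(spacing=100.0, n_steps=3):
    """Distance from the centroid to the newest, outermost index after each step.

    Assumed layout (not from the source): two drawn symbols in a column,
    with one new index added further up the column at every step.
    """
    points = [(0.0, 0.0), (0.0, spacing)]  # the two actually drawn symbols
    gaps = []
    for step in range(1, n_steps + 1):
        newest = (0.0, -step * spacing)    # hypothetical indexing position
        points.append(newest)
        gaps.append(math.dist(centroid(points), newest))
    return gaps


gaps = gaps_by_step()
```

Under these assumptions the gap grows with every added index, so a participant fixating the center of gravity would appear to look progressively less at the hypothetical indexing position in later steps.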

4.5.2. Problem of functional significance

Despite this divergence, people generally look at hypothetical drawing positions as the constraint-based account predicts. However, moving eyes to hypothetical drawing positions is one thing, and whether such movement actually helps the process of hypothetical drawing is another. It might be that eye movement to a particular position in a diagram plays a central role in facilitating indexing operations (that is, place-object, identify-object, and check-relation) on the diagram. Alternatively, eye movement may only be epiphenomenal, occurring merely as the result of some other mechanism that also generates these operations. For example, the deictic indexes in our case may be retained in the form of motor commands for saccadic movements to the indexed positions, and thus, every time the deictic indexes are operated on, these motor commands become activated and cause idle eye movements to the indexed positions. In such cases, eye movements themselves have no functional significance to central processes, neither helping nor hindering them.

Addressing this problem of functionality is important since, if eye movements to hypothetical indexing positions turn out to be central to indexing operations, it will amount to the discovery of one clear function of eye movements other than foveation. They would be instrumental to a projection of information onto space (either internal or external), as opposed to an extraction of information. Unfortunately, an examination of this intriguing possibility has to be left for a future experiment, where the occurrence of hypothetical drawing is systematically controlled so that the effects of eye movements on its performance are accurately evaluated.

5. Conclusion

Overall, our experimental data provided strong support for the theoretical predictions gained from semantic studies of graphical representation systems. Due to spatial constraints governing graphical structures, many graphical systems have the capacity to express logical consequences spontaneously when the required premises are expressed. From the semantic point of view, people are expected to take full advantage of this capacity of graphical systems for their inferential needs. This expectation was empirically borne out for transitive inferential tasks using position diagrams. We found evidence that people try to exploit spatial constraints on position diagrams in solving problems and do so quite persistently, maintaining deictic indexes to empty locations, adjusting themselves to unfamiliar diagrammatic notations, and tolerating indeterminacy in drawing positions within certain contexts. This supports the view that constraint exploitation is indeed a general process, making up one central mechanism by which human inference is facilitated by the presence of graphical representations. The data collected in this study are therefore of fundamental value to the study of diagrammatic reasoning, even though the scope was limited to particular inferential tasks involving relatively simple graphical representations.

A problem was left open, concerning whether we should consider spatial constraints that are exploited as ones on physical space or ones on internally realized space. Under the first interpretation, our results point to a new way cognition is distributed over the external world, whereas under the second, they help clarify an essential component of the spatiality of internal representations. Beyond this ramification, however, the following seems to be a common, definite implication of our study. People have a native ability to incorporate spatial constraints on external space into their reasoning processes. The ability might be based on a long-term internalization of those constraints into internal space (the internal-space view) or based on the acquisition of internal skills to project internal thoughts onto external space (the real-space view). Whichever turns out to be true, diagrammatic notations are a culture capitalizing on this ability of humans. They even amplify its scope, by helping it represent the world with ever increasing syntactic-semantic variations.


This work was supported by JSPS KAKENHI Grant numbers 20500244, 21650059, and 23300101. We thank Mr. Takugo Fukaya, whose early collaboration with the first author laid the groundwork for the experiments reported here.


  1. Throughout this article, we use the term “visual property” in a broad sense to mean whatever property of an object we access through vision. Thus, it refers to the color and texture of an object as well as those properties that might be called “spatial” in other contexts, such as its shape, its size, and the spatial configurations of its components (when the object is complex). What counts as an object then depends on how viewers segment the visual scene on individual occasions.

  2. However, the difference in relational predicates used in the problems made a difference that is worth mentioning here. Knauff and Johnson-Laird (2002) reported that performance in transitive inference problems was impeded when more visualizable (but less spatializable) relational predicates were used. Our data corroborate their findings. As shown in section 2.2.3, the predicates “clearer” and “dirtier” used in our experiment fall into this category, and the response time in Step 2 was indeed longest when these predicates were used (mean value of 9.47 s). Response time was shorter when less visualizable and less spatializable predicates were used (8.90 s for “brighter” and “darker”) and shortest when more visualizable and more spatializable predicates were used (8.50 s for “front of” and “behind”). This trend was significant (Page's L = 326 in three treatments on 26 subjects, p < .05). The response time in Step 3 had a similar trend (mean value of 4.04 s for the first category, 3.82 s for the second category, and 3.73 s for the third category). This trend did not reach significance, however. See DeLeeuw and Hegarty (2008) for a more systematic observation of how persistent the impedance effect is, especially over the use of external spatial representations (diagrams).

  3. As long as we can assume internal space shares all the relevant spatial properties with real space, behavioral data do not easily distinguish operations on one space from those on the other. We thank Balakrishnan Chandrasekaran for helping us to see this added possibility of interpreting our experimental data.

  4. In fact, the increased response latency we found for the case of introducing an indefinite symbol fits well with the prediction of mental model theory, as it exactly corresponds to the case requiring multiple mental models to be envisaged. The conjecture that the introduction of indefinite symbols in our experiment amounts to the envisaging of alternative models could be empirically tested with eye-tracking data since such envisaging would require the placement of multiple deictic indexes in a sequential manner and eyes would follow the corresponding course of movement. We acknowledge Greg Trafton for suggesting these ideas for additional experiments.

  5. We thank Greg Trafton for referring us to the relevant literature that suggested this approach.

Appendix A: Post-hoc analysis of the eye-tracking data from Experiment 1

Fig. 17A shows the upward reaches of the individual participants' eye movements in higher-predictive problems in Experiment 1, where each circle indicates the median values of an individual participant's data in Steps 2 and 3. There was a rather strong correlation of upward reaches in Steps 2 and 3 (t = 5.84, r = .77, p < .001). Of all the participants, 6/25 = 24% had medians lower than the 503-pixel line both in Steps 2 and 3 (indicated by the dotted line), meaning that their eyes tended not to go beyond the upper boundary of the actually drawn square symbols.

Similar tendencies were found for the other categories of problems (Fig. 17B–D). A rather strong correlation in Steps 2 and 3 was found for the downward reaches in lower-predictive problems (t = 7.63, r = .85, p < .001, Fig. 17B), the leftward reaches of left-predictive problems (t = 6.96, r = .82, p < .001, Fig. 17C), and the rightward reaches of right-predictive problems (t = 6.44, r = .80, p < .001, Fig. 17D). The ratio of participants who did not seem to directly look at hypothetical indexing positions was 2/25 = 8% for lower-predictive problems, 8/25 = 32% for left-predictive problems, and 1/25 = 4% for right-predictive problems.
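As a consistency check on the statistics reported above, the sketch below recomputes t from each reported r using the standard relation t = r·sqrt(n − 2)/sqrt(1 − r²) for a correlation coefficient tested against zero. Two assumptions are made here that the text does not state explicitly: that the coefficient is Pearson's r, and that n = 25 participants, as the denominators of the reported ratios suggest. The function name `t_from_r` is introduced for illustration.

```python
import math


def t_from_r(r, n):
    """t statistic for testing a correlation r against zero with n paired values."""
    return r * math.sqrt(n - 2) / math.sqrt(1 - r * r)


# (reported r, reported t) for each category of problems in Appendix A.
reported = {
    "upward": (0.77, 5.84),
    "downward": (0.85, 7.63),
    "leftward": (0.82, 6.96),
    "rightward": (0.80, 6.44),
}

N_PARTICIPANTS = 25  # assumption: inferred from ratios such as 6/25

# Absolute difference between the recomputed and the reported t values.
differences = {
    label: abs(t_from_r(r, N_PARTICIPANTS) - t_reported)
    for label, (r, t_reported) in reported.items()
}
```

Under these assumptions the recomputed values differ from the reported ones by small amounts, which is what one would expect when r has been rounded to two decimal places before the recomputation.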

Figure 17.

Comparison of individual participants' average reaches of eyes in Steps 2 and 3 of Experiment 1: (A) upward reaches in higher-predictive problems, (B) downward reaches in lower-predictive problems, (C) leftward reaches in left-predictive problems, and (D) rightward reaches in right-predictive problems. Dotted lines indicate upper, lower, left, and right boundaries of the actually drawn square symbols in the respective categories of problems.