2.1. Top level: Performance on a complex task
Technological innovations, like remote sensing and simulated virtual environments, have created new application opportunities and new avenues for research on spatial cognition. Activities like training in simulated virtual environments and piloting unmanned aerial vehicles (UAVs) are increasingly prevalent. These technologies go well beyond more traditional spatial tasks like you-are-here maps, navigation, and perspective taking, which historically have received more research attention (e.g., Huttenlocher & Presson, 1979; Levine, Marchon, & Hanley, 1984; Malinowski & Gillespie, 2001). Moreover, they are having a substantial impact on modern society, yet the complex interplay of spatial information processing, perception, action, and higher-level reasoning involved in their use is still poorly understood (see Keehner, current issue). More sophisticated theories of spatial cognition are needed that can explain performance in these new domains of human activity.
Dimperio, Gunzelmann, and Harris (2008) describe a computational process model that flies a UAV in a synthetic task environment to perform reconnaissance missions. It uses a virtual stick and throttle to maneuver a simulated Predator UAV so that high-resolution surveillance footage of a ground target is obtained through a hole in a layer of clouds. An illustration of the task environment is shown in Fig. 1. The goal is to maximize “time on target” while minimizing violations of various restrictions (e.g., “no-fly” zones, altitude limits). This is a complex task requiring spatial reasoning and planning but also involving significant non-spatial processing. For instance, substantial knowledge is required to simply control the aircraft appropriately to avoid situations that cost points in the simulation but that might put the aircraft at risk in an operational setting (e.g., violations and engine stalls).
Figure 1. Illustration of the unmanned aerial vehicle (UAV) synthetic task environment (STE) reconnaissance mission task. On the left is a nose camera view from the plane of the layer of clouds, with a cloud hole visible in the distance. Superimposed on this view is a heads-up display (HUD), providing critical information related to the UAV’s performance. On the right is a map of the area, which also shows the location and orientation of the UAV and the target location. The location of the cloud hole is shown here for illustration, but it was not visible to participants in the experiment.
As the task requires that various cognitive activities and functions be fully integrated, Dimperio et al. (2008) used a cognitive architecture—specifically the Adaptive Control of Thought-Rational, or ACT-R, architecture (Anderson, 2007)—as the theoretical foundation for the model. The model interacts directly with the task software, and it does a relatively good job of matching human performance on measures like time on target and penalty time (see Dimperio et al., 2008). It also produces routes that are qualitatively similar to the routes flown by some expert pilots (Fig. 2). Importantly, the spatial reasoning in the model is accomplished largely through mechanisms that are not grounded in psychological theories of spatial cognition. Consequently, as currently implemented, there are important questions about the psychological validity of some aspects of the model. However, valid mechanisms for these processes are critical for developing a detailed scientific understanding of human performance in complex tasks, as well as in applied settings where psychological models may be utilized by decision makers on issues ranging from training, to workload, to interface design.
Figure 2. Sample flight paths for two expert human pilots (A and B) and the model (C). The image presents a top-down view of the flight path during a 10-min trial. The lines transition gradually from black to red, with green portions illustrating periods where the camera was obtaining surveillance footage of the target. The cloud hole is illustrated for information purposes in blue. The model produces paths that are qualitatively similar to many human flight paths, although the two human examples (A and B) illustrate how variable human flight paths were.
A full description of the model is beyond the scope of this paper. More important than the details, however, is a primary motivation for its development, which was to expose gaps in theoretical coverage within ACT-R, particularly relating to spatial information processing. Table 1 lists several spatial processes that are needed by our UAV model to accomplish the reconnaissance task but are not explained well using mechanisms that compose the current ACT-R architecture.
Table 1. Spatial abilities required for the UAV model but not represented by validated mechanisms in ACT-R
| Spatial Ability | Relevance in UAV Model |
| --- | --- |
| Encoding spatial location | Representing relative locations of visible objects, including the plane, the target, and the cloud hole. |
| Mental rotation | Reasoning about aircraft maneuvering based upon map-based representation of plane location. |
| Frame of reference transformations | Reasoning between the egocentric perspective from the camera and the map-based representation. |
| Spatial reasoning for route planning | Finding and subsequently returning to the area where the target is visible through the cloud hole. |
| Mental simulation of spatial action | Anticipating consequences of maneuvers; reasoning about strategies for maximizing time on target. |
| Visualization and visuospatial memory | Constructing and maintaining mental images of key spatial information, such as the map region from which the target is viewable, the predicted track of the aircraft, and the “cone” relating altitude to target-viewing area. |
Consider one specific example from Table 1—route planning. In the task, the model must plan an initial route to fly over the cloud hole. Although the cloud hole may be visible through the nose camera on the plane, the field of view is limited (about 30°) and distance information is lacking (see Fig. 1). Sometimes, a no-fly zone blocks a direct path, necessitating some planning to maneuver around the no-fly zone and get back on the intended trajectory. Such planning is straightforward for humans, but in Dimperio et al. (2008), it was necessary to use trigonometric functions and waypoints implemented directly in code, due to a lack of validated computational mechanisms for performing such reasoning in a humanlike manner. While effective in the task, these processes are unlikely to reflect cognitively valid mechanisms for this kind of planning.
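To make the gap concrete, the following is a minimal sketch of the kind of scripted geometric computation involved: a single perpendicular-offset waypoint around a circular no-fly zone. The function, the circular-zone abstraction, and the margin parameter are illustrative assumptions on our part, not the code used by Dimperio et al. (2008).

```python
import math

def detour_waypoints(start, goal, zone_center, zone_radius, margin=1.2):
    """Plan a route from start to goal around a circular no-fly zone.

    Illustrative sketch: if the direct path crosses the zone, insert one
    waypoint offset perpendicular to the path, just outside the boundary.
    Points are (x, y) tuples in map coordinates; assumes start != goal.
    """
    sx, sy = start
    gx, gy = goal
    cx, cy = zone_center
    dx, dy = gx - sx, gy - sy
    seg_len = math.hypot(dx, dy)
    # Parameter t of the point on the segment closest to the zone center.
    t = max(0.0, min(1.0, ((cx - sx) * dx + (cy - sy) * dy) / seg_len ** 2))
    px, py = sx + t * dx, sy + t * dy             # closest approach point
    dist = math.hypot(px - cx, py - cy)
    if dist >= zone_radius:
        return [goal]                             # direct path is clear
    if dist == 0:                                 # path through the center:
        ux, uy = -dy / seg_len, dx / seg_len      # any perpendicular works
    else:
        ux, uy = (px - cx) / dist, (py - cy) / dist
    # Push the detour waypoint out past the zone boundary.
    wx, wy = cx + ux * zone_radius * margin, cy + uy * zone_radius * margin
    return [(wx, wy), goal]
```

Precisely because this computation is exact and effortless, it says little about how a human pilot plans the same detour, which is the point of the critique above.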
Another example involves coordinating spatial information between the camera-based representation on the left display and the map-based perspective on the right in Fig. 1 (Frame of reference transformations in Table 1). The model uses the first-person view for little more than initially locating the cloud hole and orienting the plane appropriately. Once that is accomplished, the spatial planning and reasoning is based almost entirely on map-based information and tends to be mostly heuristic in nature (e.g., fly higher and more slowly while over the cloud hole). Although this is sufficient in the context of this task, there are many other contexts where more explicit coordination of in situ and map-based information is required. In fact, the situation noted in the previous paragraph—planning a route to avoid a no-fly zone—represents an instance of this, as the cloud hole is visible in the first-person (egocentric) view, and the no-fly zone is indicated on the map (exocentric view). This kind of task, which exposes important gaps in the representational and processing capacities of ACT-R, has been the focus of the research discussed next, which utilized a simpler laboratory-based task to home in on this process.
2.2. Component level example 1: Reasoning about frames of reference
In the UAV reconnaissance task, and in a variety of real-world settings, information is distributed across multiple internal and/or external representations. The particular situation in the reconnaissance task is common—in many navigation contexts, perceptual information about the world from a first-person perspective must be coordinated with map-based representations (e.g., Levine, Jankovic, & Palij, 1982; Malinowski & Gillespie, 2001; Péruch & Lapin, 1993). We have conducted research on the coordination and integration required in this kind of task to address how spatial information processing abilities are leveraged to make various sorts of judgments (Gunzelmann, 2008; Gunzelmann & Anderson, 2006; Gunzelmann & Lyon, 2006). In one version of the task, participants were shown a visual scene depicting a circular space containing a number of objects, viewed from somewhere along its edge. Along with this egocentric view, participants saw an exocentric map displaying the same object locations. An example is illustrated in Fig. 3. Participants were asked to click on the edge of the map to indicate the viewpoint for the visual scene, and the response was scored as correct when it fell within ±15° of the actual viewpoint.
Figure 3. Sample trial for a task where participants must identify their location on the map, given the first-person view of the environment on the right. Responses are made by clicking within the darker shaded ring around the edge of the map, and they were scored correct if they fell within ±15° of the actual location. In this trial, the viewer is positioned northeast of the center, looking southwest.
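The scoring rule is straightforward to state computationally. A minimal sketch (function names are ours) that handles the wraparound between 359° and 0°:

```python
def angular_deviation(response_deg, actual_deg):
    """Smallest absolute angular difference between two bearings, in degrees."""
    diff = abs(response_deg - actual_deg) % 360
    return min(diff, 360 - diff)

def is_correct(response_deg, actual_deg, tolerance=15.0):
    """Score a map-edge click as correct if within +/- tolerance degrees."""
    return angular_deviation(response_deg, actual_deg) <= tolerance

assert is_correct(350, 5)       # 15 degrees apart, across the wraparound
assert not is_correct(330, 5)   # 35 degrees apart
```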
The task illustrated in Fig. 3 was challenging for participants to perform, and error rates were in the range of 25%–30%, even after significant practice (i.e., hundreds of trials). What is most interesting about the errors, however, is the distribution of responses relative to the correct answer. Fig. 4 shows the response proportion as a function of the angular deviation from the actual viewpoint (the first three points represent responses scored as correct). The results clearly show that errors in the task were not random. Rather, they tended to be quite close to the correct answer. The pattern suggests that participants were able to understand and identify the correspondence between the map and the visual scene generally but often failed to estimate the location of the viewpoint precisely enough.
Figure 4. Distribution of responses for the model and human participants as a function of angular deviation from the actual viewpoint. Responses were scored as correct when they fell within ±15° of the actual location.
In modeling human performance on this task, we have drawn upon other behavioral and neuropsychological research that has argued for a distinction between qualitative and quantitative spatial processes (Baciu et al., 1999; Kosslyn, Sokolov, & Chen, 1989). The model makes a response by executing a series of productions that implement a two-step strategy for the task. The first is a qualitative step, where the model identifies corresponding groups of objects in the visual scene and on the map. This information is used to narrow the potential response area to a region along the edge of the map (i.e., an arc where the qualitative spatial relations are preserved). Then, in a second step, the model uses quantitative estimates of the egocentric bearing to particular groups in the visual scene to refine its estimate of the viewpoint location within that potential response area on the map. Specifically, the model responds when it finds a location on the edge of the map where the bearing estimates are close enough to what was encoded from the visual scene, based upon prior experience with the task. Finding the correct response location is challenging because of the differing reference frames but also because encoding of quantitative information is imprecise. As a result, the model makes errors, and it produces a pattern of responses that closely approximates the performance of human participants (Fig. 4).
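As a rough illustration of that two-step logic (not the ACT-R production system itself), the sketch below scans candidate viewpoints on the rim of a unit-radius map and accepts the first whose predicted bearings are close enough to the noisy estimates encoded from the scene. The 5° candidate grid, the tolerance parameter, and the data layout are our assumptions; the actual model also narrows the candidate arc qualitatively before estimating.

```python
import math

def estimate_viewpoint(scene_bearings, map_positions, tolerance=10.0):
    """Find a rim location whose predicted bearings match the scene.

    scene_bearings: {group_name: estimated egocentric bearing, degrees}
    map_positions:  {group_name: (x, y)}, normalized to a unit-radius map
    Returns the first acceptable candidate angle (degrees), or None.
    """
    for cand in range(0, 360, 5):                  # candidate rim locations
        vx = math.cos(math.radians(cand))
        vy = math.sin(math.radians(cand))
        heading = math.atan2(-vy, -vx)             # facing the map center
        if all(_bearing_error(scene_bearings[g], map_positions[g],
                              vx, vy, heading) <= tolerance
               for g in scene_bearings):
            return cand
    return None

def _bearing_error(scene_deg, pos, vx, vy, heading):
    """Wrapped difference between predicted and encoded bearing, degrees."""
    gx, gy = pos
    predicted = math.degrees(math.atan2(gy - vy, gx - vx) - heading)
    return abs((predicted - scene_deg + 180) % 360 - 180)
```

In this framing, it is the imprecision of the encoded bearings, not the search itself, that produces the near-miss error distribution in Fig. 4.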
This model begins to illustrate how qualitative and quantitative spatial reasoning may be integrated in human performance to accomplish operations in Table 1 like spatial encoding and reference frame transformations. Variations of this task have also been used to show how a single general strategy may be adapted for use in different particular task contexts (Gunzelmann, 2008). However, even in this task, the spatial mechanisms are largely abstracted and captured in parameters like rotation speed or spatial update time. In other words, this task is more about using spatial knowledge than isolating the representations and processes that are involved at the architectural level. To focus even more precisely on those details, we have developed a task that taxes spatial processing specifically, and which allows us to address some of the foundational mechanisms involved in visualizing spatial information to support decision making.
2.3. Component level example 2: Spatial visualization
Visualizing spatial material is a key aspect of solving many kinds of spatial problems, including the route planning, mental simulation of spatial action, and visuospatial memory processes mentioned in Table 1 in the context of our UAV reconnaissance task. To further sharpen our focus on understanding the elementary representations and processing mechanisms that underlie these abilities, we studied visualization using the path visualization task (Lyon et al., 2008). The goal was to tap the process of visualizing complex spatial material while minimizing extraneous aspects of task complexity. The simplest versions of the path visualization task are similar to the experience of hearing or reading a verbal description of how to walk or drive from one point to another, except that paths can be in three dimensions. The participant hears or reads a sequence of segments for a path wandering randomly through an undifferentiated, 5 × 5 × 5 grid space (for example, “Right 1,” “Up 1,” “Forward 1,” with segment length always 1 step). Each time a segment is presented, a key must be pressed to indicate whether or not the new path segment has intersected with any previous part of the path. Both accuracy and response time are recorded, the latter primarily to ensure that any effects on accuracy are not due to speed-accuracy tradeoffs.
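The intersection judgment that the participant's keypress is scored against is easy to specify. A sketch of that ground truth follows (segment labels and start position are illustrative; this is not the experiment software):

```python
# Exocentric segment directions in the 5 x 5 x 5 grid.
DIRECTIONS = {
    "Right":   (1, 0, 0), "Left":  (-1, 0, 0),
    "Up":      (0, 1, 0), "Down":  (0, -1, 0),
    "Forward": (0, 0, 1), "Back":  (0, 0, -1),
}

def walk_path(segments, start=(2, 2, 2)):
    """For each segment, yield whether the new node revisits the path."""
    visited = {start}
    x, y, z = start
    for label in segments:
        dx, dy, dz = DIRECTIONS[label]
        x, y, z = x + dx, y + dy, z + dz   # segment length is always 1 step
        intersects = (x, y, z) in visited
        visited.add((x, y, z))
        yield label, intersects

# "Right", "Up", "Left", "Down" closes a loop back onto the start node,
# so only the final segment is an intersection.
assert [hit for _, hit in walk_path(["Right", "Up", "Left", "Down"])] == \
    [False, False, False, True]
```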
Path visualization is similar in some ways to tasks used by Brooks (1968), Attneave and Curlee (1983), Kerr (1987, 1993), Diwadkar, Carpenter, and Just (2000), and others. However, we have argued (Lyon et al., 2008) that this task is particularly good at forcing participants to actually perform complex, very effortful visualization, rather than using alternative strategies such as verbal rehearsal or numeric recoding. We believe that this provides us with an objective measure of accuracy in spatial visualization, thereby revealing some of the details regarding the underlying mechanisms in this component of human cognitive functioning.
We have built and tested a detailed computational model (implemented in ACT-R) of visualization accuracy for this task (Lyon et al., 2008). The model proposes several processes that operate on chunks representing segments of the path, localized within a cognitive representation of space. According to this model, the following three processes are the major sources of error in this kind of spatial visualization: (a) decay of chunk activation; (b) associative interference; and (c) spatial interference. Decay and associative interference are part of virtually all ACT-R models of declarative memory, but spatial interference is a new process that helps to explain the strong effects of spatial proximity found in our data. This mechanism represents greater confusability between elements being visualized when they are closer together. In path visualization, this leads to incorrectly identifying an intersection when a nearby node has been visited. These proximity effects in visualization could prove to be related to crowding effects in vision itself (Lyon, 2009), although evidence for this idea is currently inconclusive.
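To give a sense of the quantities involved: the first function below is the standard ACT-R base-level activation equation, which produces the decay component; the second is a deliberately simplified stand-in for the proposed spatial-interference term, whose exponential form and scale parameter are our assumptions rather than the published model's equation.

```python
import math

def base_level_activation(rehearsal_times, now, d=0.5):
    """Standard ACT-R base-level learning: B_i = ln(sum_j (now - t_j)^-d).

    rehearsal_times: times (all before `now`) at which the chunk was
    created or rehearsed; d is ACT-R's decay parameter (0.5 by default).
    Activation falls as rehearsals recede into the past, making retrieval
    errors more likely for older path segments.
    """
    return math.log(sum((now - t) ** -d for t in rehearsal_times))

def spatial_confusability(distance, scale=1.0):
    """Assumed spatial-interference term (illustrative only): visualized
    elements become more confusable as the distance between them shrinks."""
    return math.exp(-distance / scale)
```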
The model was based on experiments in which verbal descriptions of path segments were exocentric, so for example, “Up” was always toward the top of the space and did not vary with the facing direction of a hypothetical traveler along the path. There is, however, another important class of situations in which spatial visualization plays a role. People frequently attempt to visualize paths described egocentrically, with reference to their own virtual facing direction (“Turn right, go 2 blocks, then turn left”). Although the description is egocentric, often they need to interpret the result in terms of an exocentric, map-like representation.
It is easy to study both exocentric and egocentric path descriptions using the path visualization task, because any path can be described either way. For instance, a square can be described in egocentric terms as a sequence of four “Right 1” segments. The corresponding description in an exocentric reference frame would be “Right 1,” “Back 1,” “Left 1,” “Forward 1.” When we compared performance using these two alternatives for presenting path information, we found that egocentrically described paths produced substantially more errors (and longer response latencies) than exocentrically described paths (Lyon, Gunzelmann, & Gluck, 2007). A possible explanation for this result is that the additional errors in the egocentric condition are due to an egocentric-to-exocentric translation process that is used to “construct” a representation of the path in an exocentric reference frame. Therefore, we extended our computational model to incorporate this process. Our data show that the time required to perform egocentric-to-exocentric translation increases substantially as the orientation of one’s (virtual) body differs from upright and facing forward (Fig. 5). Once this factor was incorporated into the model, we could account for the difference in accuracy between egocentric and exocentric conditions. Interestingly, the model explained this result based solely on the increased decay associated with the longer latencies produced by the translation process (Lyon & Gunzelmann, 2009).
Figure 5. Speed of mentally constructing each segment of a 3D allocentric “map” of (virtual) movements through a grid, shown as response times (ms) for correct responses in the task as a function of the misalignment of the imagined orientation. The time to visualize a map segment increases with increasing misalignment of one’s virtual body from an upright, forward-facing orientation. For example, a top misalignment of 180° would be standing on one’s head; a facing misalignment of 90° would be facing either right or left. Note that the faster response time in the 180°–180° misalignment condition appears to reflect a somewhat easier situation, where the rotations produce a mirror-image rotation that can be computed more quickly (e.g., Gunzelmann, 2008).
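A minimal sketch of such a translation process (our construction, not the model's code): carry the virtual traveler's orientation as facing/up vectors and resolve each egocentric label against them, with facing following the direction of travel so that four “Right 1” segments trace the square described above. The turn-handling convention is an illustrative assumption.

```python
def cross(a, b):
    """Cross product of two 3-vectors represented as tuples."""
    ax, ay, az = a
    bx, by, bz = b
    return (ay * bz - az * by, az * bx - ax * bz, ax * by - ay * bx)

def neg(v):
    return (-v[0], -v[1], -v[2])

def ego_to_exo(segments, facing=(0, 0, 1), up=(0, 1, 0)):
    """Translate egocentric segment labels into exocentric unit moves."""
    moves = []
    for label in segments:
        right = cross(up, facing)  # +x when facing +z with +y up
        step = {
            "Forward": facing, "Back": neg(facing),
            "Right": right,    "Left": neg(right),
            "Up": up,          "Down": neg(up),
        }[label]
        moves.append(step)
        # Keep the body frame orthogonal when pitching up or down.
        if label == "Up":
            up = neg(facing)
        elif label == "Down":
            up = facing
        facing = step              # facing follows the direction of travel
    return moves

# With initial facing = +z (exocentric "Forward") and +x = "Right",
# four egocentric "Right 1" segments come out as the exocentric square
# "Right 1," "Back 1," "Left 1," "Forward 1" described in the text.
assert ego_to_exo(["Right"] * 4) == [(1, 0, 0), (0, 0, -1), (-1, 0, 0), (0, 0, 1)]
```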
Another important aspect of the path visualization task is that the same set of paths can be presented using a variety of different methods. The results discussed above came from experiments in which people visualize paths that are described verbally. However, UAV operators experience both verbal descriptions of paths and the virtual experience (from the nose camera feed) of moving along a path, turning, climbing, etc. To examine the influence of this perceptual information, we compared error rates for egocentrically described paths to paths experienced through virtual motion (Lyon, Gunzelmann, & Gluck, 2006). The result was clear—virtual motion did not improve visualization accuracy. Of course, this result applies only to the process of visualizing the abstract path itself, including the ego-to-exo translation process. Visualizing landmarks is a very different matter, and it would be expected to benefit from visual depiction over verbal description.
The results using the path visualization task expose some details regarding the mechanisms of spatial visualization identified in Table 1. They suggest that the mechanisms that cause errors in visualizing a mental map operate in a similar way for very different perceptual inputs (Klatzky, Wu, & Stetten, 2008). In addition, the model illustrates that maintaining visualized information may be subject to many of the same dynamic influences as other forms of declarative knowledge. These findings speak to the nature of the representations and processes that are needed in developing a quantitative account of human spatial competence.
The tasks and models described in this section represent an example of an architecture-guided, multilevel research strategy that addresses both complex, naturalistic tasks (UAV reconnaissance) and controlled, laboratory examinations of spatial processing in two different tasks. By using a cognitive architecture that is capable of modeling both complex tasks and key component processes, we can hope to eventually show exactly how spatial competence is integrated with non-spatial cognitive processes, and how elementary spatial processes support complex cognitive functions like those mentioned in Table 1. In the next section, we look at the other side of the coin, namely, the gaps and shortcomings that we have encountered in trying to apply the architecture-guided strategy using a specific cognitive architecture—ACT-R.