The design of visual displays is often based on intuitions, on approaches that analyze the basic dimensions or “visual variables” that make up visual-spatial displays (e.g., Bertin, 1983), or on principles derived from examples of good and poor graphics (e.g., Tufte, 2001). Display design is both an art and a science, and these approaches are important. However, there are many cases in which intuitions and expert opinions about displays do not conform to their actual effectiveness when performance with displays is objectively measured. Empirical methods used in cognitive science and related fields (e.g., human factors) are central to testing and revising design principles based on objective data. Moreover, knowledge of human information processing and empirical measures of performance with visual displays can inform the development of cognitive models that make a priori predictions about display effectiveness.
5.1. The importance of objective measures
Intuitions about the effectiveness of displays do not always conform to their actual effectiveness. Animations of physical processes, such as the workings of machines, and biological mechanisms, provide a prominent example. Intuitively it would seem that animation should be a good means of communicating how these processes work. For example, in animations of these processes, the shapes, locations, and movements of parts of the representation correspond directly to the shapes, locations, and movements of their referents. However, in a review of several papers comparing animated to static displays, Tversky et al. (2002) indicated that there was no advantage to animations over static displays, making the point that animations are often ineffective because they are too fast or too complex. Realistic animations of a mechanical or biological system often show several different components moving at once, and critical phases often happen very quickly, but visual attention is limited, so it is not possible to encode and relate the movements of the components in the time available. A possible solution is to give users interactive control over the animation (allowing them to control the speed, pause, rewind, etc.), but even with such interactive controls, a recent study found that students constructed the wrong mental model of how a mechanical system works, a model that was actually inconsistent with the information displayed (Kriz & Hegarty, 2007). At least in educational situations, a series of “small multiples” showing key frames in the process can be as effective or more effective (Mayer, Hegarty, Mayer, & Campbell, 2005).
Intuitions about animations are an example of a more general intuition that iconic displays should resemble their referents as much as possible, when in fact the power of visual-spatial displays often comes from their ability to simplify and abstract from reality. Smallman and St. John (2005) have provided extensive evidence that people have a strong preference for displays that emphasize high-fidelity spatio-temporal realism, even when these displays result in poor performance. They term this misplaced faith in realistic displays “Naïve Realism” and theorize that it is rooted in metacognitive errors (folk fallacies) about the nature of perception. Specifically, they argue that folk psychology holds that perception is simple, accurate, and complete, which accounts for the intuition that a realistic information display will be internalized easily, when in fact perception is difficult, flawed, and sparse, which accounts for the poor performance observed with realistic information displays.
In summary, the preference for animation, and naïve realism more generally, provide a strong argument for empirically testing the effectiveness of displays rather than relying on users’ or designers’ intuitions. One might argue that expert intuitions are more likely to be accurate than those of novices. But even expert intuitions have been found to be erroneous. For example, during the 20th century, statisticians developed a strong bias against the pie chart (see Fig. 5A), preferring divided bars (see Fig. 5B) or bar graphs as a means of displaying proportions. However, careful experiments indicated that for some tasks, pie charts are as effective as divided bar charts, and for other tasks they are actually more effective (Hollands & Spence, 1998; Simkin & Hastie, 1986; Spence & Lewandowsky, 1991). Simple judgments (e.g., comparing the population of Europe and Africa in Fig. 5) were performed slightly better with bar graphs, but complex comparisons (e.g., comparing combinations of components) were performed more efficiently with pie charts (Spence & Lewandowsky, 1991).
Figure 5. Examples of (A) a pie chart, (B) a divided bar chart, (C) a bar chart showing world population by continent, and (D) an aligned pie graph showing a subset of the data.
5.2. How objective measures inform display design
Researchers collect several types of empirical data to compare the effectiveness of different types of visual displays. The most common method is to record accuracy and response times of individuals as they answer specific questions with different displays of the same information, with the assumption that more effective displays are those that produce more accurate and efficient question answering. This approach has been used extensively in examining graph comprehension in particular (see reviews by Lewandowsky & Behrens, 1999; Shah, Freedman, & Vekiri, 2005). Objective performance measures of accuracy and response time are also increasingly being used to examine the effectiveness of geospatial displays (e.g., Fabrikant, Rebich-Hespanha, & Hegarty, 2010; Smallman & Cook, this volume; Yeh & Wickens, 2001), reflecting an increasing emphasis on the use of cognitive methods in cartography (e.g., Fabrikant & Lobben, 2009; MacEachren, 1995).
Another approach is to show people different displays and ask them to describe what the graph shows. Spontaneous descriptions can reveal what information is salient in a graphic, how much of the displayed information different individuals actually encode, and their schemas for what types of information a particular type of graphic communicates. For example, Shah and Carpenter (1995) asked people to describe line graphs showing the effects of two variables (e.g., stress and hours of study) on a measured variable (e.g., scores on an achievement test). They varied which of the independent variables was displayed on the x axis and which was shown by lines with different colors and markers (referred to as the z-variable, see Fig. 6). Participants described the same data differently, depending on how it was displayed, and their descriptions emphasized the x-y trends. When shown two graphs of the same data, as in Fig. 6, participants were unable to tell that they were informationally equivalent.
Figure 6. Examples of different line graphs of the same fictitious data showing the relationship between stress, hours of study, and score on a test. The graphs differ in which variable is on the x axis and which is indicated by different lines. Shah and Carpenter (1995) found that when shown graphs like this, students were unable to tell that they showed the same data.
A related approach is to examine what visual-spatial representations people spontaneously produce when asked to communicate different forms of information. For example, Tversky, Kugelmass, and Winter (1991) asked children to place stickers on a page to represent spatial, temporal, quantitative, and preference dimensions. For a temporal judgment they might have to place stickers for breakfast, lunch, or dinner, and for a preference dimension they might have to place stickers for their least favorite food, a food they like, and their favorite food. Most children placed the stickers in a line that preserved the relationships, indicating that they naturally mapped more abstract relations to space. They mapped spatial and temporal dimensions to space at an earlier age than they mapped quantitative and preference dimensions, and their mappings were affected by the writing order in their cultures (see examples in Fig. 7). Spontaneous depictions reveal natural mappings between meaning and space that can be capitalized on in the design of visual displays.
Figure 7. Examples of configurations spontaneously produced by students when they were asked to place stickers representing temporal and preference dimension by Tversky et al. (1991). The top panel shows that children naturally mapped time to the horizontal dimension, with the dominant direction influenced by the order of writing in their cultures. The bottom panel shows that they naturally mapped preference dimensions to space, using both horizontal and vertical dimensions; when the vertical dimension was used, preference was from top to bottom.
With the development of user-friendly eye trackers, eye fixations are increasingly being used to inform the design of visual displays by cognitive scientists (e.g., Carpenter & Shah, 1998; Peebles & Cheng, 2003; Ratwani, Trafton, & Boehm-Davis, 2008) as well as in related domains such as education (van Gog & Scheiter, 2010) and cartography (Fabrikant et al., 2010). Eye fixations can be interpreted as a measure of overt visual attention (cf. Henderson & Ferreira, 2004). While reaction times provide information about the general efficiency of task performance with a display, eye fixations can provide more diagnostic information, for example, identifying areas of a display that attract attention even though they are not task relevant. Observing the eye fixations of more expert or more successful users of a display may also lead to the design of displays that direct less successful users’ attention to the task-relevant information. For example, Grant and Spivey (2003) examined eye fixations on a diagram of Duncker’s (1945) classic tumor problem while people solved this problem and found that successful problem solvers made more eye fixations on the outline of the body (the skin). They then redesigned the display to make the skin more visually salient by animating it, and thus improved problem solving with the display.
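A common way to quantify where attention goes in such studies is to aggregate fixation durations within areas of interest (AOIs) defined over the stimulus. The sketch below is a minimal, hypothetical illustration of that analysis; the AOI names, coordinates, and fixation records are invented, and real studies define AOIs over the actual display image.

```python
# Hypothetical sketch of an AOI dwell-time analysis for eye-fixation data.
# AOI names, coordinates, and fixations below are invented for illustration.

def dwell_proportions(fixations, aois):
    """Return the proportion of total fixation time spent in each AOI.

    fixations: list of (x, y, duration_ms) tuples.
    aois: dict mapping AOI name -> (x_min, y_min, x_max, y_max),
          assumed non-overlapping.
    """
    totals = {name: 0.0 for name in aois}
    grand_total = 0.0
    for x, y, dur in fixations:
        grand_total += dur
        for name, (x0, y0, x1, y1) in aois.items():
            if x0 <= x <= x1 and y0 <= y <= y1:
                totals[name] += dur
                break  # non-overlapping AOIs: a fixation counts once
    return {name: (t / grand_total if grand_total else 0.0)
            for name, t in totals.items()}

# Made-up example: a "skin" outline AOI versus a "tumor" AOI.
aois = {"skin": (0, 0, 100, 20), "tumor": (40, 40, 60, 60)}
fixations = [(50, 10, 300), (50, 50, 200), (55, 12, 500)]
props = dwell_proportions(fixations, aois)
```

Comparing such dwell proportions between successful and unsuccessful problem solvers is one way to identify which display regions (e.g., the skin outline in Grant and Spivey's study) are associated with success.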
Finally, with the increased availability of interactive displays, methods that log users’ interactions with these displays are increasingly important (Robertson et al., 2009). Interaction logs indicate the extent to which people use the different functions afforded by interactive displays and can be related to measures of performance to reveal which interactions are most effective. For example, Keehner, Hegarty, Cohen, Khooshabeh, and Montello (2008) examined use of an interactive visualization to perform a task that involved imagining the cross section of a three-dimensional object. Participants were provided with a computer model of the object that could be rotated in any direction using an intuitive 3-degrees-of-freedom interface. Interaction logs indicated that the most common user interaction was to rotate the model to view the object from a perspective perpendicular to the cross section to be imagined. Those who used the interactive models in this way had better task performance, but many participants did not use the models in this way. This study makes it clear that just providing people with an interactive visual display does not ensure that they will use it effectively.
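The kind of log analysis described above can be sketched as follows. This is a hypothetical illustration, not the analysis pipeline of Keehner et al. (2008): the field names and scores are invented, and the point is simply relating a logged behavior (here, whether a user rotated to the perpendicular view) to task performance.

```python
# Hypothetical sketch: relating a logged interaction strategy to performance.
# Field names ('used_perpendicular_view', 'score') and values are invented.

def mean_score_by_strategy(logs):
    """logs: list of dicts with a boolean 'used_perpendicular_view' flag
    and a numeric 'score'. Returns (mean_with, mean_without)."""
    with_view = [r["score"] for r in logs if r["used_perpendicular_view"]]
    without = [r["score"] for r in logs if not r["used_perpendicular_view"]]

    def avg(xs):
        return sum(xs) / len(xs) if xs else float("nan")

    return avg(with_view), avg(without)

logs = [
    {"used_perpendicular_view": True, "score": 0.9},
    {"used_perpendicular_view": True, "score": 0.8},
    {"used_perpendicular_view": False, "score": 0.5},
]
m_with, m_without = mean_score_by_strategy(logs)
```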
5.3. How cognitive models inform display design
Some cognitive science and human factors researchers make prescriptions about the design of displays on the basis of task analyses and knowledge of perception and cognition. In a classic task analysis, Cleveland first analyzed the basic perceptual tasks that have to be carried out to encode the information in different common kinds of statistical graphs, such as pie charts, bar charts, and scatter plots (Cleveland, 1985; Cleveland & McGill, 1984). For example, perception of angles is necessary to understand pie charts, perception of position along a common axis is necessary to understand bar charts, and perception of position along non-aligned scales is necessary to compare corresponding elements in stacked bar charts (see Fig. 5). On the basis of psychophysics research and their own empirical studies, Cleveland and colleagues ordered the basic perceptual tasks in terms of accuracy. Perceiving position along a common scale was judged the most accurate, followed by position along non-aligned scales, comparisons of line lengths, angles, areas, and volumes, in that order. The ordering of the necessary perceptual tasks was used to predict the effectiveness of different types of graphs, for example, that bar charts would be more effective than pie charts for presenting relative magnitudes because position along a common scale is a more accurate perceptual judgment than angle. A meta-analysis by Carswell (1992), in addition to Cleveland’s own research, provided good support for the model when the graph comprehension tasks involved extracting specific data points and making local comparisons (e.g., comparing the proportions for Europe and Africa in Fig. 5), although Carswell suggested that the model was less effective in explaining performance for tasks that involved making global comparisons and synthesis judgments (e.g., comparing combinations of data points or judging the general variability of the data points in the graph).
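Cleveland's ordering lends itself to a simple predictive rule: a graph is predicted to be more effective the higher-ranked the perceptual task it requires. The sketch below encodes the published accuracy ordering; the mapping of each graph type to a single perceptual task is a simplification for illustration, since real graph comprehension tasks can involve several perceptual judgments.

```python
# Cleveland & McGill's (1984) accuracy ordering of elementary perceptual
# tasks (rank 1 = most accurate). The one-task-per-graph mapping below is
# a simplification made for this illustration.

PERCEPTUAL_RANK = {
    "position_common_scale": 1,
    "position_nonaligned_scales": 2,
    "length": 3,
    "angle": 4,
    "area": 5,
    "volume": 6,
}

GRAPH_TASK = {
    "bar_chart": "position_common_scale",
    "stacked_bar_chart": "position_nonaligned_scales",
    "pie_chart": "angle",
}

def predict_more_accurate(graph_a, graph_b):
    """Predict which of two graph types supports more accurate magnitude
    judgments, based on the rank of the perceptual task each requires."""
    rank_a = PERCEPTUAL_RANK[GRAPH_TASK[graph_a]]
    rank_b = PERCEPTUAL_RANK[GRAPH_TASK[graph_b]]
    return graph_a if rank_a <= rank_b else graph_b
```

For extracting specific values, the rule reproduces the classic prediction that bar charts beat pie charts; as Carswell's meta-analysis showed, it is less reliable for global comparison and synthesis tasks.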
Other task analyses provide models of the elementary perceptual and cognitive processes necessary to carry out various data interpretation tasks with different types of displays (e.g., Gillan & Callahan, 2000; Gillan & Lewis, 1994; Hollands & Spence, 1998; Lohse, 1993; Simkin & Hastie, 1986; Spence & Lewandowsky, 1991). Elementary processes might include visual search to find an element in a display, scanning to estimate the distance between two components, and mental superimposition to compare the size of two components. The number of basic processes (and estimates of their duration) is then used to predict the efficiency of carrying out different graph comprehension tasks with various types of displays, with the assumption that displays that minimize the number of basic processes will be more efficient. These models have been quite successful in predicting the efficiency of specific graph comprehension tasks. For example, Gillan and Lewis (1994) found that a simple componential model accounted for up to 85% of individuals’ response times to answer different questions (identifying single values, comparing values, and calculating values) from common graph types (line graphs, scatter plots, and stacked bar graphs). Models can also guide the design of new displays. For example, on the basis of a task analysis, Gillan and Callahan (2000) redesigned the pie graph to create a new format, the aligned pie graph (see Fig. 5D), which proved to be more efficient for the specific task of comparing proportions. They argued that this comparison involves both mental rotation and superimposition of the two elements with the standard pie chart (Fig. 5A) but only superimposition for the aligned pie graph in Fig. 5D.
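The componential logic can be made concrete with a few lines of code: predicted response time is the sum of the durations of the elementary processes a task requires. The durations below are invented placeholders, not the empirical parameter estimates of Gillan and Lewis (1994); the point is only the additive structure of the model.

```python
# Hypothetical componential model in the spirit of Gillan & Lewis (1994):
# predicted RT = sum of the durations of the elementary processes required.
# Durations (ms) are invented placeholders, not empirical estimates.

PROCESS_MS = {
    "visual_search": 400,
    "scan": 250,
    "mental_rotation": 600,
    "superimposition": 350,
}

def predicted_rt(processes):
    """Sum the durations (ms) of a sequence of elementary processes."""
    return sum(PROCESS_MS[p] for p in processes)

# Comparing proportions: a standard pie requires mental rotation plus
# superimposition; Gillan & Callahan's (2000) aligned pie requires only
# superimposition, so the model predicts faster responses with it.
standard_pie_rt = predicted_rt(["visual_search", "mental_rotation",
                                "superimposition"])
aligned_pie_rt = predicted_rt(["visual_search", "superimposition"])
```

Under these (made-up) durations the model predicts the aligned pie graph to be faster by exactly the cost of the eliminated mental-rotation step, mirroring the qualitative argument for the redesign.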
Task analytic models have also been used to develop computational models that predict the sequence of eye fixations that a person will make while answering a question from a visual display as well as response time (Lohse, 1993; Peebles & Cheng, 2003). For example, Peebles and Cheng (2003) developed production system models of optimal scan paths for reading values from different types of graphs and evaluated these models using both reaction time and eye fixation data. The model accounted for 87% and 66% of the variance in reaction times for two different graph formats and demonstrated that a less familiar graph (parametric graph) that is better tuned to the task requirements can be more effective than a more familiar type (function graph). Similarly, Trafton and colleagues (Breslow et al., 2009; Ratwani et al., 2008; Trafton et al., 2000) have used a combination of task analysis, cognitive modeling, eye-tracking, and verbal protocols to study how people extract information from geospatial displays, integrate information across different displays and variables, and to explain interactions between tasks and display format.
A relatively new modeling approach is to use general models of visual salience (e.g., Itti & Koch, 2000) or visual clutter (e.g., Lohrenz, Trafton, Beck, & Gendron, 2009; Rosenholtz, Li, & Nakano, 2007) to guide the design of displays. For example, the Itti and Koch model uses information about how visual features (color, intensity, and orientation) are processed by the visual system to derive a “salience map” for any image, assuming that salient areas are those that are most different from their surrounding regions on these visual features. Fabrikant et al. (2010) used the Itti and Koch (2000) model in conjunction with an informal task analysis to redesign weather maps to make task-relevant information salient. An original weather map (downloaded from the Web) is shown in Fig. 8A. The task studied was to infer wind direction, which is based on pressure, so that pressure is task relevant and temperature is irrelevant. To redesign the map, Fabrikant et al. used cartographic principles (Bertin, 1983) to make the task-irrelevant temperature information less salient (by muting the colors showing temperature) and to make the task-relevant pressure systems more salient. The resulting maps were tested by applying the salience model, and the redesign-and-test cycle was repeated until the arrow and pressure systems were identified as the most salient display regions by the model (Fig. 8B). Empirical testing indicated that people performed the inference task more efficiently (Fabrikant et al., 2010) and more accurately (Hegarty et al., 2010) with the redesigned maps.
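The core center-surround intuition behind such salience models can be illustrated in a drastically simplified, single-channel form: a pixel is salient to the extent that its intensity differs from the mean of its local surround. This sketch is not the Itti and Koch (2000) model, which operates on multiscale color, intensity, and orientation channels with normalization across feature maps; it only demonstrates the center-surround difference that the model builds on.

```python
# Drastically simplified, single-channel illustration of the center-surround
# idea behind salience models. Not the full Itti & Koch (2000) model.

def salience_map(image, radius=1):
    """image: 2-D list of intensities. Returns a same-sized map where each
    cell holds |center - mean(surround)| over a (2*radius+1) neighborhood."""
    h, w = len(image), len(image[0])
    out = [[0.0] * w for _ in range(h)]
    for i in range(h):
        for j in range(w):
            surround = []
            for di in range(-radius, radius + 1):
                for dj in range(-radius, radius + 1):
                    ni, nj = i + di, j + dj
                    # skip the center pixel; clip at the image border
                    if (di or dj) and 0 <= ni < h and 0 <= nj < w:
                        surround.append(image[ni][nj])
            out[i][j] = abs(image[i][j] - sum(surround) / len(surround))
    return out

# A uniform background with one bright pixel: the bright pixel stands out.
img = [[0, 0, 0], [0, 9, 0], [0, 0, 0]]
sal = salience_map(img)
```

A designer following Fabrikant et al.'s iterative procedure would compute such a map over a candidate display, check whether the task-relevant regions are the most salient, and revise the design until they are.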
In summary, empirical studies have made it clear that one should not rely on intuitions alone to judge the effectiveness of visual displays, as people’s intuitions about displays are not necessarily a good indication of their effectiveness. Cognitive scientists have had good success in characterizing the cognitive processes involved in performing tasks with visual displays and in developing cognitive models that can predict the relative effectiveness of different displays. However, to date most of this research has focused on relatively simple displays of quantitative data and on well-defined tasks, such as extracting specific values, comparing values, or detecting expected trends. These simple tasks contrast sharply with the types of tasks of interest to the visual analytics community where the goals are much broader, and ill defined, including data exploration, sense making, and reasoning with visualizations of complex data sets with thousands of data points (Thomas & Cook, 2005). Research to date therefore points to both the promise of cognitive science approaches and the challenges that lie ahead in scaling up cognitive approaches to the design of displays for more complex tasks.