Communication in the wild is a sound and light show combining words, prosody, facial expressions, gestures, and actions. Although it is often presumed—think of the “letter of the law” and transcripts of trials—that meanings are neatly packaged into words joined by rules into utterances, in fact, other channels of communication carry significant aspects of meaning, despite or perhaps because of the fact that they cannot be neatly packaged into units strung together by rules (e.g., Clark, 1996; Goldin-Meadow, 2003; Kendon, 2004; McNeill, 1992, 2005). Prosody, as in irony or sarcasm, can overrule and even reverse meanings of words, as can facial expressions. Pointing can replace words, for things, for directions, and more, so that natural descriptions, narratives, or explanations cannot be fully understood from the words alone (e.g., Emmorey, Tversky, & Taylor, 2000). Gestures go beyond pointing; they can show size, shape, pattern, manner, position, direction, order, quantity, both literally and metaphorically. They can express abstract meanings, mood, affect, evaluation, attitude, and more. Gestures and actions convey this rich set of meanings by using position, form, and movement in space. Communication can happen wordlessly, as in avoiding collisions on busy sidewalks or placing items on the counter next to the cash register to indicate an intention to buy. In fact, the shelf next to the cash register is designed to play a communicative role. Standing next to a circle of chatting acquaintances can be a request to join the conversation. Opening the circle is the group’s wordless response. Rolling one’s eyes can signify, well, rolling one’s eyes. Communication in the wild combines and integrates these modes, usually seamlessly, with each contributing to the overall meaning (e.g., Clark, 1996; Engle, 1998; Goldin-Meadow, 2004; Kendon, 2004; McNeill, 1996; Tomasello, 2008).
Gestures and actions are especially convenient because their tools, like the tools for speech, are free, and they are always with us. But gestures, like speech, are fleeting; they quickly disappear. They are limited by what can be produced and comprehended in real time. These limitations render gestures abstract and schematic. Visualizations, on paper, silk, parchment, wood, stone, or screen, are more permanent; they can be inspected and reinspected. Because they persist, they can be subjected to myriad perceptual processes: Compare, contrast, assess similarity, distance, direction, shape, and size, reverse figure and ground, rotate, group and regroup; that is, they can be mentally assessed and rearranged in multiple ways that contribute to understanding, inference, and insight. Visualizations can be viewed as the permanent traces of gestures; both embody and are embodied. Like gesture, visualizations use position, form, and actions in space to convey meanings (e.g., Tversky, Heiser, Lee, & Daniel, 2009). For visualizations, fleeting positions become places and fleeting actions become marks and forms. Here, we analyze the ways that place and form constrain and convey meaning, meanings that are based in part in actions.
Traces of visual communication go far back into prehistory. Indeed, they are one of the earliest signs of culture. They not only precede written language but also served as the basis for it (e.g., Gelb, 1963; Schmandt-Besserat, 1996). Visual communications come in myriad forms: animals in cave paintings, maps in petroglyphs, tallies on bones, histories on columns, battles in tapestries, messages on birch bark, journeys in scrolls, stories in stained glass windows, dramas in comics, diagrams in manuals, charts in magazines, and graphs in journals. All forms of communication entail design, as the intent of communication is to be understood by others or by one’s self at another time. Communication design, then, is inherently social, because to be understood by another or by self at another time entails fashioning communications to fit the presumed mental states of others or of one’s self at another time.
Diagrams, along with pictures, film, paintings in caves, notches in wood, incisions in stone, cuttings in bone, impressions in clay, illustrations in books, paintings on walls, and of course words and gestures, externalize thought. They do this for many reasons, often several simultaneously. Some are aesthetic: to arouse emotions or evoke pleasure. Some are behavioral: to affect action or promote collaboration. Some are cognitive: to serve as reminders, to focus thoughts, to reorganize thoughts, and to explore thoughts. Many are communicative: to inform both self and others.
Because depictions, like other cultural artifacts (e.g., Norman, 1993; Donald, 1991), have evolved over time, they have undergone an informal but powerful kind of natural user testing, produced by some, comprehended by others, and refined and revised to improve communication by a community of users. Similar processes have served and continue to serve to design and redesign language (cf. Clark, 1996). Features and forms that have been invented and reinvented across cultures and time are likely to be effective. Analyzing these depictive communications, then, can provide valuable clues to designing new ones. It can save and inspire laboratory work, as well as the tasks of designers. What is more, the natural evolution of communication design can be brought into the laboratory and accelerated for specific ends (see Tversky et al., 2007).
Oddly, this rich set of visual forms has traditionally been discussed in the domain of art, along with painting, drawing, and photography. Increasingly, that discussion has expanded to include diagrams, charts, film, graphs, notational systems, visual instructions, computer interfaces, comics, movies, and more, to take into account the mind that perceives, conceives, and understands them, and to ripple across domains (e.g., Arnheim, 1974; Bertin, 1981; Card, Mackinlay, & Shneiderman, 1999; Elkin, 1999; Gombrich, 1961; Goodman, 1978; Kulvicki, 2006; McCloud, 1994; Murch, 2001; Small, 1997; Stafford, 2007; Wainer, 1992; Ware, 2008; Winn, 1987; Tufte, 1983, 1990, 1997). Similarly, discussions of human communication have historically focused on language, typically narrowly conceived as words and sentences, and have only recently broadened to include prosody, gesture, and action (e.g., Argyle, 1988; Clark, 1996; Goldin-Meadow, 2003; Kendon, 2004; McNeill, 1996).
Unlike symbolic words, forms of visual communication, notably diagrams and gestures, often work by a kind of resemblance, that is, sharing features or associations, typically visuo-spatial features, with the meanings they are intended to convey (a claim of some philosophic controversy, e.g., Goodman, 1978; Hochberg & Brooks, 1962; Walton, 1990). The proverbial “big fish” is indicated in gesture by expanding the fingers or hands horizontally, thus capturing the approximate relative horizontal extent of the fish, but ignoring its other properties. How the fish swam to try to get away is abstracted and conveyed differently, perhaps by embodying the fish and its movements. Similarly, the shape, dimensions, and even actions of the fish can be abstracted in a variety of ways to the page. The fish example illustrates another property of visual communication. In capturing features of the world, visual communications are highly selective; they omit information, normally information that is regarded as less essential for the purposes at hand. They abstract and schematize not only by omission but also by exaggeration and even by additions. Maps, for example, are not simply shrunken aerial photographs. Maps selectively omit most information, houses, trees, fields, mountains, and the like, but also many of the twists and turns of roads or coastlines; they disproportionately enlarge roads and rivers to make them visible; they turn entire metropolises into dots. Maps may also add features like government boundaries and topological levels that are not visible.
In other words, maps, like many other kinds of visualizations, distort the “truth” to tell a larger truth. The processes that abstract, schematize, supplement, and distort the world outside onto the world of a page, filtering, leveling, sharpening, categorizing, and otherwise transforming, are the same processes the nervous system and the brain apply to make sense of the barrage of stimuli the world provides. Attention is selective, ignoring much incoming information. The perceptual systems level and sharpen the information that does come in; for example, the visual system searches for the boundaries that define figures by sharpening edges and corners, by filling in gaps, by normalizing shapes. Cognition filters, abstracts, and categorizes, continuing this process, and symbol systems carry these processes further. Long things do not necessarily get long names, though children often expect them to (e.g., Tolchinsky Landsmann & Levin, 1987). Tallies eliminate the identity of objects, recording them just as instances, though tallies preserve a one-to-one correspondence that Arabic numerals, more convenient for calculations, do not.
The virtues of visual communications have been extolled by many (e.g., Kirsh, 1995; Larkin & Simon, 1987; Norman, 1993; Scaife & Rogers, 1996; Tversky, 1995; Tversky, 2001). As noted, they are cultural artifacts created in a community (Donald, 1991; Norman, 1993), fine-tuned by their users (e.g., Tversky et al., 2007). They can provide a permanent, public record that can be pointed at or referred to. They externalize and clarify common ground. They can be understood, revised, and manipulated by a community. They relieve limited capacity short-term memory, they facilitate information processing, they expand long-term memory, they organize thought, they promote inference and discovery. Because they are visual and spatial, they allow human agility in visual-spatial processing and inference to be applied to visual-spatial information and to metaphorically spatially abstract information.
In contrast to purely symbolic words, visual communications can convey some content and structure directly. They do this in part by using elements, marks on a page, virtual or actual, and spatial relations, proximity and place on a page, to convey literal and metaphoric elements and relations. These ways of communicating meanings may not provide definitions with the rigor of words, but rather provide suggestions for meanings and constraints on them, giving them greater flexibility than words. That flexibility means that many of the meanings thus conveyed need context and experience to fully grasp. A line in a route map has a different meaning from a line in a network and from a line in a graph, though, significantly, all connect. Nor is the expressive power of visual communication as great as that of language (e.g., Stenning & Oberlander, 1995); abstract or invisible concepts like forces, traits, counterfactuals, and negations are not easily conveyed unambiguously in depictions. Even so, conventions for conveying these kinds of concepts have evolved as needed, in road signs, mathematics, science, architecture, engineering, and other domains, a gradual process of symbolization akin to language.
What are the tools of depictions, especially diagrams? How do they communicate? The components of visual communication are simple: Typically, a flat surface, prototypically, a page (or something analogous to a page like a computer screen) and marks or forms placed on it (e.g., Ittelson, 1996; Tversky, 1995, 2001; Tversky, Zacks, Lee, & Heiser, 2000). Each of these, place and form, will be analyzed to show how they can represent meanings that are literal and metaphoric, concrete and abstract. The interpretations will be shown to depend on content and context, on Gestalt or mathematical properties of the marks in space, on the place of the marks on the page, as well as the information processing capacities and proclivities of the mind. The foundations and processes of assigning meaning can be revealed, then, by recurring inventions and by errors and biases in interpretation, that is, by uses and misuses, by successes and failures. The analysis of inventions of visual communication can provide directions for the design of visual communications.
Because assigning meaning, whether from description or depiction, is in part a reductive process—the space of possible meanings is greater than the space of ways to express meanings—misuses, misinterpretations, and misunderstandings are as inevitable as successes, and both are instructive. Expressing meanings, then, entails categorization. Categories create boundaries where none exist; some instances are included and others, even close ones, are not. The consequence of categorization is to increase the perceived similarity of members included in the category and to exaggerate the perceived distance between members and nonmembers. Although the focus here is on meanings conveyed through place and forms, the meanings are deeper, they are conceptually spatial, some more literal, some more metaphorical, so that they have parallels in other ways of using space as well, in words, in actions, and in gesture, in the virtual space created by gesture and the mental space created by words (e.g., Gattis, 2004; Lakoff & Johnson, 1980; Tversky et al., 2009). First, we will discuss place in space, and then forms in space.
3. Forms in space
Now we turn from the space of the page to marks on the page, to examine how marks convey a range of meanings, like space, by using natural correspondences. Although the simplest marks are dots or lines, the most common now and throughout history are undoubtedly what have been referred to as pictograms, icons, depictions, or likenesses, from animals on the ceilings of caves to deer on road signs. Peirce termed marks on a page signs, which refer to objects for minds that interpret them, and distinguished three kinds (e.g., Hartshorne & Weiss, 1960). An icon denotes an object by resemblance; an index, such as a clock or thermometer, denotes an object by directly presenting a quality of an object; and a symbol, a category that includes certificates as well as words, denotes an object by convention.
Here, we first discuss some properties and uses of likenesses or icons, and then turn at greater length to a specific kind of symbol, which we have called a glyph (e.g., Tversky, 2004; Tversky et al., 2002). Glyphs are simple figures like points, lines, blobs, and arrows, which derive their meanings from their geometric or gestalt properties in context. Glyphs are especially important in diagrams because they allow visual means of expressing common concepts that are not easily conveyed by likenesses. Glyphs have parallels to certain kinds of gestures, for example, points that suggest things that can be conceived of as points or linear gestures that suggest relationships between things. They also bear similarities to words like point and relationship whose meanings vary with context.
Marks, whether likenesses or glyphs, like lines and circles, have visual characteristics other than shape that increase their effectiveness in conveying meaning. An important feature is size. The greater the size, the greater the chance of attracting attention. The toddler knows not only that centrality captures attention, but size as well. The toddler wanting attention puts her face close, blocking other things in the visual field. Size, like centrality, can also indicate importance. Greek vases use both centrality and size; the major figure is larger and in the center, with the others arrayed to either side in decreasing order of importance. Larger bar graphs represent greater quantity or higher ratings. Additional salient visual features, like color, boldness of line, highlighting, and animation, also serve to attract attention and convey importance.
Even sketchy likenesses can be readily recognized by the uninitiated. A toddler who had never seen pictures but could label real objects recognized simple line drawings of common objects (Hochberg & Brooks, 1962). Depictions have other impressive advantages over words in addition to being readily recognized: They access meaning faster (Smith & Magee, 1980) and enjoy greater distinctiveness and memorability (e.g., Paivio, 1986). Perhaps because of their advantages for establishing meaning and memory, likenesses are so compelling that they are produced even when not needed and even when drawing them increases time and effort: in diagrams of linear and cyclical processes produced by undergraduates (Kessell & Tversky, 2009), in diagrams of information systems by graduate students in design (Nickerson et al., 2008).
Likenesses have been creatively integrated into more abstract representations of quantitative data by Neurath and his Vienna Circle and later colleagues in the form of isotypes (Neurath, 1936). Isotypes turn bars into depictions, for example, the number of airplanes in an army or yearly production of corn by a country is represented by a proportional column (or row) of schematic airplanes or corn plants.
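The isotype principle, a quantity shown as a proportional row of repeated schematic figures, can be illustrated with a minimal sketch. The function names, the text symbol standing in for a pictogram, and the figures below are all invented for illustration; they are not drawn from Neurath's work.

```python
def isotype_row(label, quantity, unit, symbol="#"):
    """Render one non-negative quantity as a row of repeated symbols.

    Each symbol stands for `unit` items; the count is rounded half up
    to the nearest whole symbol.
    """
    if unit <= 0:
        raise ValueError("unit must be positive")
    count = int(quantity / unit + 0.5)
    return f"{label:<12}{symbol * count}"


def isotype_chart(data, unit, symbol="#"):
    """Stack one row per entry so that row lengths can be compared directly."""
    return "\n".join(
        isotype_row(label, quantity, unit, symbol)
        for label, quantity in data.items()
    )


# Hypothetical yearly corn production, in arbitrary units (invented numbers).
harvest = {"Country A": 30, "Country B": 50}
print(isotype_chart(harvest, unit=10))
```

As in a bar graph, row length carries the quantity; the repeated symbol adds only the likeness.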
Just as likenesses can facilitate comprehension and memory, they can also interfere. Because depictions are specific and concrete, including them when they are not essential to the meaning of a diagram can inhibit generalization to sets of cases not depicted. By contrast, glyphs, because they are abstractions, can encourage generalization. Capturing the objects in the world and their spatial arrays in diagrams is compelling and has some communicative value, but it can interfere or even conflict with the generalizations or abstractions diagrams are meant to convey. An intriguing example comes from diagrams of the water cycle in junior high science textbooks collected from around the world (Chou, Vikaros, & Tversky, 2009, unpublished data). The typical water cycle diagram includes mountains, snow, lakes, sky, and clouds. On the one hand, these diagrams intend to teach the cycle of evaporation of surface water, formation of clouds, and precipitation. They use arrows to indicate the directions of evaporation and precipitation. On the other hand, they also want to show the water cycle on the geography of the world. As a consequence, the arrows ascend and descend everywhere, so that the cyclicity is obscured. In studies investigating interpretations of slope in diagrams of the atmosphere, students’ inferences were more influenced by the conceptual mapping of rate to slope than by the geographic mapping (Gattis & Holyoak, 1996). In producing diagrams, for example, of a pond ecology, when people work in pairs, the compelling iconicity evident in individual productions often disappears (Schwartz, 1995). Diagrams produced by dyads become more abstract, most likely because the irrelevant or distracting iconicity is idiosyncratic and the abstractions shared.
The conflict between visualizing the world and visualizing the general phenomena that occur in the world is especially evident when diagrams are used to convey the invisible such as evaporation and gravity. With all the challenges of conveying the visible, conveying the invisible, time, forces, values, and the like presents even more challenges. Glyphs are ideal for visually conveying the invisible. They are not iconic, they do not depict the visible world, so they do not confuse or distract, yet they share many of the advantages of visual communication over purely symbolic communication, notably rapid access to meaning. We turn now to many examples of using glyphs to visually convey invisible and abstract concepts.
3.2. Meaningful glyphs
We shift now from the complex and representative to the simple and abstract. Probably the simplest mark that can be made on the page is a dot, a mark of zero dimensions. Slightly more complicated is a line, a mark of a single dimension, followed by various two-dimensional or three-dimensional forms. These simple marks and others like them that we have termed glyphs have context-dependent meanings suggested by their Gestalt or mathematical properties (Tversky, 2001, 2004). On a map of the United States, New York City can be represented as a point, or the route from New York to Chicago as a line, or the entire city can be represented as a region, containing points and lines indicating, for example, roads, subway stops, and subway lines. Continuing, New York City can also be diagrammed as a three-dimensional space in which people move. Like many other spatial distinctions, this set of distinctions has parallels in language and gesture, parallels that suggest the distinctions are conceptual and widely applicable. Regarding an entity in zero, one, two, or three dimensions has implications for thought. In a paper titled “How language structures space,” Talmy (2000) pointed out that we can conceptualize objects in space, events in time, mental states, and more as zero-, one-, or two-dimensional entities. In English, prepositions are clues to zero-, one-, two- (and three-) dimensional thinking, notably at, on, and in. She waited at the station, rode on the train, rose in the elevator. She arrived at 2, on time, and was in the meeting until dinner. She was at ease, on best behavior, in a receptive mood. Visual expressions of dimensionality are common in diagrams, as they abstract and express key conceptual components.
3.2.1. A visual toolkit for routes: Dots and lines
Dots, lines, and regions abound in diagrams. Dots and lines, nodes and links or edges are the building blocks of route maps. They also form a toolkit for a related set of abstractions, networks of all kinds. To uncover the basic visual and verbal vocabularies of route maps, students outside a dormitory were asked if they knew how to get to a nearby fast food restaurant. If they did, they were asked either to draw a map or to write directions to get there. A pair of studies confirmed that dots and lines, nodes and links, are the basic visual vocabulary of route maps, and that each element in the visual vocabulary for route directions corresponds to an element in the basic verbal vocabulary for route directions (Tversky & Lee, 1998, 1999). Notably, although the sketch maps could have been analog, they were not; turns were simplified to right angles and roads were either straight or curved. Landmarks were represented as dot-like intersections identified by street names or as nonspecific shapes. Short distances with many turns were lengthened to show the turns, and long distances with no actions were shortened. Thus, the route maps not only categorized continuous aspects of the world, they also distorted them. Interestingly, the verbal directions were similarly schematized. Distances were specified only by the bounding landmarks; turns were specified only by the direction of the turn, not the degree. The consensus visual vocabulary consisted of lines or curves, L, T, or + intersections, and dots or blobs as landmarks. The corresponding verbal vocabulary consisted of terms like “go straight” or “follow around” for straight and curved paths, “take a,” “make a,” or “turn” for the intersections, and named or implicit landmarks at turning points. The vocabulary of gestures used to describe routes paralleled the visual and verbal vocabularies (Tversky et al., 2009).
These close parallels between disparate modes of communication suggest that the same conceptual structure for routes underlies all of them.
A second study provided students with either the visual or the verbal toolkit, and asked them to use the toolkit to create instructions for several dozen destinations, near and far (Lee & Tversky, 2005). They were asked to supplement the toolkits if needed. In spite of that suggestion, very few students added elements; they succeeded in using the toolkits to create a variety of new directions. Although the semantics (vocabularies) and syntax (rules of combining semantic elements) of route maps and route directions were similar, their pragmatics differ. Route maps cannot omit connections; they must be complete. Route directions can elide; for example, in a string of turns, one end-point is the next start-point, so it is not necessary to mention both.
Why do directions that are so simplified and distorted work so well? Because they are used in a context, and the context disambiguates (Tversky, 2003). This is another general characteristic of diagrams; they are designed to be used by a specific set of users in a specific context. Indeed, part of the success of route maps and route directions is that they have been developed in communities of users who collaborate, collectively and interactively producing and comprehending, thereby fine-tuning the maps and directions, a natural kind of user-testing that can be brought into the laboratory and accelerated (Tversky et al., 2007).
The success of the visual and verbal toolkits for creating route maps and route directions has a number of implications. It has already provided cognitive design principles (paths and turns are important; exact angles and distances are not) for creating a highly successful algorithm for on-line on-demand route directions (Agrawala & Stolte, 2001). It suggests that maps and verbal directions could be automatically translated from one to the other. It is encouraging for finding similar visual and verbal intertranslatable vocabularies for other domains, such as circuit diagrams or musical notation, or even domains that are not as well structured, such as assembly instructions, chemistry, and design. It suggests empirical methods for uncovering domain-specific visual and verbal semantics, syntax, and pragmatics. Finally, it shows that certain simple visual elements have meanings that are spontaneously produced and interpreted in a context. Some of these visual elements have greater generality. Lines are naturally produced and interpreted as paths connecting entities or landmarks that are represented as dots. Hence their widespread use, from social networks, connections among people, to computer networks, connections among computers or components of computers, and more.
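The suggestion that route maps and verbal directions are intertranslatable can be made concrete with a minimal sketch, assuming a route is stored as a list of path segments, each ending at a landmark and, optionally, a turn. The `Segment` class, the `to_directions` function, and the sample route are hypothetical illustrations; they are not the algorithm of Agrawala and Stolte (2001) or the materials of the studies described above.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class Segment:
    """One element of a schematic route map (hypothetical representation)."""
    shape: str                      # "straight" or "curved": the line glyph
    landmark: Optional[str] = None  # the dot or blob at the end of the line
    turn: Optional[str] = None      # "left" or "right"; None at the destination


# Verbal counterparts of the two path glyphs, taken from the vocabulary
# reported in the text.
PATH_PHRASES = {"straight": "go straight", "curved": "follow around"}


def to_directions(route: List[Segment]) -> List[str]:
    """Translate each visual element into its verbal counterpart.

    Distances and turn angles are deliberately absent: only the bounding
    landmark and the direction of the turn are expressed.
    """
    steps = []
    for segment in route:
        step = PATH_PHRASES[segment.shape]
        if segment.landmark:
            step += f" to {segment.landmark}"
        if segment.turn:
            step += f", then turn {segment.turn}"
        steps.append(step)
    return steps


# An invented route, for illustration only.
route = [
    Segment("straight", landmark="Main St", turn="right"),
    Segment("curved", landmark="the gas station", turn="left"),
    Segment("straight", landmark="the restaurant"),
]
for line in to_directions(route):
    print(line)
```

The sketch mirrors the shared semantics only. It does not capture the pragmatic difference noted earlier, that verbal directions can elide a start-point that a map must draw.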
3.2.2. Lines connect, bars contain
As Klee put it, “A line is a dot that went for a walk.” Lines are also common in graphs, again, as paths, connections, or relations. So are dots and bars. Graph lines connect dots representing entities with particular values on dimensions represented by the lines. The line indicates that the entities are related, that they share a common dimension, but have different values on that dimension. Bars, in contrast to lines, are two-dimensional; they are containers that separate their contents from those of others. In graphs, bars indicate that all the instances inside are the same and different from instances contained in other bars. To ascertain whether people attribute those meanings to bars and lines, in a series of experiments, students were shown a single graph, either a line graph or a bar graph, and asked to interpret it (Zacks & Tversky, 1999). Some of the graphs had no content, just A’s and B’s. Other graphs displayed either a discrete variable, height of men and women, or a continuous variable, height of 10- and 12-year-olds. Because lines connect and bars contain and separate, students were expected to favor trend descriptions for data presented as lines and favor discrete comparisons for data presented as bars, especially for the graphs without content. For the content-free graphs, the visual forms, bars or lines, had major effects on interpretations, with far more trends for lines and discrete comparisons for bars. More surprisingly, the visual forms had large effects on interpretations of graphs with content, in spite of contrary content. For example, using a line to connect the height of women and men biased trend interpretations, even, “as you get more male, you get taller.” These were comprehension tasks. Mirror results were obtained in production tasks, where students were provided with a description, trend, or discrete comparison, and asked to produce an appropriate graph. 
More students produced line graphs when given trend descriptions and bar graphs when given discrete comparisons, as before, in spite of contrary content. The meanings of the visual vocabulary, lines or bars, then, had a stronger effect on interpretations and productions than the conceptual character of the data. When the glyph, line or bar, matched the content, there were more appropriate interpretations and when the glyph did not match the content, there were more inappropriate interpretations (for other issues with bars and lines, see Shah & Freedman, 2010).
3.2.3. Lines can mislead
Because glyphs such as lines, dots, boxes, and arrows induce their own meanings, they are likely to enhance diagrammatic communication when their natural meanings are consistent with the intended meaning and to interfere with diagrammatic communication when the natural meanings conflict with the intended meanings. This interaction was evident in the case of bar and line graphs for discrete and continuous variables, where the interpretations of the visual glyphs trumped the underlying structure of the data when they conflicted. Mismatches between the natural interpretations of lines as paths or connections and the intended interpretations in diagrams turn out to underlie difficulties understanding and producing certain information systems designs. A central component of information system design is a LAN or local area network, common in computer systems in every institution. All of the components in a LAN are interconnected so that each can directly transmit information to and receive information from every other. A natural way to represent that interconnectivity would be lines between all pairs of components. For large systems, this would quickly lead to a cluttered, indecipherable diagram. To ensure legibility, a LAN is diagrammed as if it were a clothesline, a horizontal line, with all the interconnected components hanging from it. However, when students in information design are asked to generate all the shortest paths between components from diagrams containing a LAN, many make errors. A common error demonstrates a strong bias from the line glyph. The shortest paths many students generate show that they think that to get from one component on a LAN to another, they must pass through all the spatially intermediate components, much like traveling a route: to go from 10th St to 30th St one must pass 11th, 12th, 13th, and so on (Corter, Rho, Zahner, Nickerson, & Tversky, 2009; Nickerson et al., 2008). Here, again, the visual trumps the conceptual and misleads.
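The difference between the intended reading of the LAN diagram and the erroneous route-like reading can be sketched as two graphs over the same components, with a shortest path computed on each. This is an illustrative sketch only: the function names and the four components are hypothetical, and breadth-first search is used simply as a standard way to find a shortest path in an unweighted graph, not as a model of how the students reasoned.

```python
from collections import deque
from itertools import combinations


def shortest_path(graph, start, goal):
    """Breadth-first search; returns one shortest path as a list of nodes."""
    queue = deque([[start]])
    seen = {start}
    while queue:
        path = queue.popleft()
        node = path[-1]
        if node == goal:
            return path
        for neighbor in sorted(graph[node]):
            if neighbor not in seen:
                seen.add(neighbor)
                queue.append(path + [neighbor])
    return None


def lan_graph(components):
    """Intended meaning: every pair of components is directly connected."""
    graph = {c: set() for c in components}
    for a, b in combinations(components, 2):
        graph[a].add(b)
        graph[b].add(a)
    return graph


def route_graph(components):
    """Erroneous reading: the line is a route, so only spatially adjacent
    components are connected."""
    graph = {c: set() for c in components}
    for a, b in zip(components, components[1:]):
        graph[a].add(b)
        graph[b].add(a)
    return graph


# Components in their left-to-right order along the drawn line (invented).
components = ["A", "B", "C", "D"]
print(shortest_path(lan_graph(components), "A", "D"))    # ['A', 'D']
print(shortest_path(route_graph(components), "A", "D"))  # ['A', 'B', 'C', 'D']
```

Under the intended reading every shortest path is a single hop; under the route reading its length grows with the spatial distance along the line, which is the pattern of errors described above.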
Lines have mixed benefits in other cases, for example, in interpreting evolutionary diagrams where they can lead to false inferences (Novick & Catley, 2007). Yet another example comes from visualizations of space, time, and agents, diagrams that are useful for keeping track of schedules, suspects, pollen, disease, migrations, and more (Kessell & Tversky, 2008). In one experiment, information about the locations of people over time was presented either as tables with place and time as columns or rows and dots representing people as entries or as tables with lines connecting individuals from place to place over time. Because lines connect, one might expect that the lines would help to keep track of movements of each individual. In one task, participants were asked to draw as many inferences as they could from the diagrams; in another they were asked to verify whether a wide range of inferences was true of the diagrams. At the end of the experiment, they were asked which interface they preferred for particular inferences. Overall, participants performed better with dots than with lines both in quantity of inferences drawn and in speed and accuracy of verification. However, and consonant with expectations, there was one exception, one kind of inference where dots lost their advantage, inferences about the sequence of locations of individuals. For temporal sequence, lines were as effective and as preferred as dots. Nevertheless, the lines interfered with generating and verifying other inferences. In another experiment, participants were asked to generate diagrams that would represent the locations of individuals over time. Most spontaneously produced table-like visualizations, notably without lines. As for preferences, participants preferred the visualizations with dots over those with lines except for temporal sequences. 
These findings suggest that popular visualizations that rely heavily on lines, such as parallel coordinates (e.g., Inselberg & Dimsdale, 1990) and especially parallel sets (e.g., Bendix, Kosara, & Hauser, 2006), should be used with caution, and only when the lines are meaningful as connectors.
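The LAN misreading described earlier can be made concrete with a small sketch. It is only an illustration, not a model used in the cited studies: the five component names and the helper `shortest_path` are hypothetical. The same breadth-first search is run twice, once on the intended reading of the diagram, in which every component is linked directly to every other, and once on the route-like misreading, in which only spatially adjacent components are linked.

```python
from collections import deque

def shortest_path(adjacency, start, goal):
    """Breadth-first search; returns the nodes on one shortest path, or None."""
    queue = deque([[start]])
    seen = {start}
    while queue:
        path = queue.popleft()
        node = path[-1]
        if node == goal:
            return path
        for neighbor in adjacency[node]:
            if neighbor not in seen:
                seen.add(neighbor)
                queue.append(path + [neighbor])
    return None

# Hypothetical components, listed in their left-to-right order on the drawn line.
components = ["A", "B", "C", "D", "E"]

# Intended reading: every component on the LAN reaches every other directly.
lan = {c: [d for d in components if d != c] for c in components}

# Misreading: the line is a route, so only spatial neighbors are linked.
route = {
    c: [components[j] for j in (i - 1, i + 1) if 0 <= j < len(components)]
    for i, c in enumerate(components)
}

lan_path = shortest_path(lan, "A", "E")      # one direct hop
route_path = shortest_path(route, "A", "E")  # passes every intermediate component
```

Under the intended reading the path from A to E has a single hop; under the misreading it visits B, C, and D on the way, which is the error pattern the students showed.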
Arrows are asymmetric lines. As a consequence, arrows suggest asymmetric relationships. Arrows enjoy several natural correspondences that provide a basis for extracting meaning. Arrows in the world fly in the direction of the arrowhead. The residue of water erosion is a network of arrow-like lines pointing in the direction of erosion. The diagonals at the head of an arrow converge to a point. Studies of both comprehension and production of arrows show that arrows are naturally interpreted as asymmetric relationships. In a study of comprehension, students were asked to interpret a diagram of one of three mechanical systems, a car brake, a pulley system, or a bicycle pump (Heiser & Tversky, 2006). Half of the diagrams of each kind included arrows, half did not. For the diagrams without arrows, students gave structural descriptions, that is, they provided the spatial relations of the parts of the systems. For the diagrams with arrows, students gave functional descriptions that provided the step-by-step causal operations of the systems. The second study provided a description, either structural or functional, of one of the systems and asked students to produce a diagram. Students produced diagrams with labeled parts from the structural descriptions but produced diagrams with arrows from the functional descriptions. Both interpretation and production, then, showed that arrows suggest asymmetric temporal or causal relations.
One of the benefits of arrows can also cause difficulties; they have many possible meanings. Arrows suggest many possible asymmetric relations (Heiser & Tversky, 2006). Their ambiguity can cause misconceptions and confusion. Arrows are used to label or focus attention; to convey sequence; to indicate temporal or causal relations; to show motion or forces; and more. How many meanings? Some have proposed around seven (e.g., van der Waarde & Westendorp, 2000, unpublished data), others, dozens (e.g., Horn, 1998). A survey of diagrams in introductory science and engineering texts revealed that many diagrams had different meanings of arrows in the same diagram, with no visual way to disambiguate them (Tversky, Heiser, Lozano, MacKenzie, & Morrison, 2007).
Circles, with or without arrows, can be viewed as another variant on a line, one that repeats with no beginning and no end. As such, circles have been used to visualize cycles, processes that repeat with no beginning and no end. The common etymology of the two words, circle and cycle, is one sign of the close relationship between the visual and the conceptual. However, the analogies, like many analogies, are only partial. Circles are the same at every point, with no natural divisions and no natural direction. Yet when we talk about cycles, we talk about them as discrete sequences of steps, sometimes with a natural beginning. Hence, cycles are often visualized as circles with boxes, text, or pictograms conveying each stage of the process.
A series of studies on production and comprehension of visualizations of cyclical and linear processes asked participants to produce or interpret appropriate marks on paper (Kessell & Tversky, 2009). In a set of studies, participants were asked to fill in circular diagrams with four boxes at 12 o’clock, 3 o’clock, 6 o’clock, and 9 o’clock with the four steps of various cyclical processes, everyday (e.g., washing clothes, seasons) and scientific (e.g., the rock cycle, the water cycle). They did this easily. Although circles have no beginning, many cycles have a conceptual beginning, and students tended to place that at 12 o’clock, and then proceed clockwise. Conversely, when asked to interpret labeled circular diagrams, they began at 12 o’clock and proceeded clockwise, except when the “natural” starting point of a cycle, for example, the one-cell stage of mitosis, was at another position. In a second set of studies, students were given blank pages and asked to produce diagrams to portray cyclical processes, like the seasons or the seed-to-plant-to-seed cycle, as well as linear processes, like making scrambled eggs or the formation of fossil fuel. Both cycles and linear processes had four stages. Unsurprisingly, most students portrayed the linear processes in lines, but, more surprisingly, most portrayed the cyclical processes as lines as well, without any return to the beginning. Heavy-handed procedures, presenting only cyclical processes, calling them such, and listing the stages vertically, brought the frequency of circular diagrams to 40%. Changing the list of stages so that the first stage was also the last, as in “the seed germinates, the flower grows, the flower is pollinated, a seed is formed, the seed germinates,” induced slightly more than 50% of participants to draw the stages in a circle, but still, more than 40% drew lines. There is strong resistance to producing circular diagrams for cycles, even among college students.
In the final study, participants were provided with a linear or circular diagram of four stages of a cycle, and asked which they thought was better. Over 80% of participants chose the circular display. This is the first case we have found where production and preference do not match, though production lags comprehension in other domains, notably, language acquisition.
Why do people prefer circular diagrams of cycles but produce linear ones? We speculate that linear thinking is easier than circular; that is, it is easier to think of events as having a beginning, a middle, and an end, a forward progression in time, than it is to think of events as returning to where they started and beginning all over again, without end. Events occur in time; time marches relentlessly forward and does not bend back on itself. Each day is a new day, each seed a new seed; it is not that a specific flower emanates from a seed and then transforms back into one. Thinking in circles requires abstraction: It is not thinking about the individual case, but rather thinking about the processes underlying all the cases. What is more, the sense in which things return to where they started is different in different cases. Every day has a morning, noon, and night, but each morning, noon, and night is unique. A cell divides into two, and then each of those cells undergoes cell division. For clothing and dishes, however, the very same articles of clothing and the very same dishes undergo washing, drying, putting away each time. Viewing a circular diagram enables that abstraction, and once people “see” it (the diagram and the underlying ideas), they prefer the abstract depiction of the general processes to the more concrete depiction of the individual case.
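The placement convention that participants followed for circular diagrams, first stage at 12 o’clock and the rest clockwise, can be written as a small layout rule. The sketch below is an assumption-laden illustration, not a procedure from the cited studies: the function name `clock_positions`, the unit radius, and the rounding are all hypothetical choices, and coordinates follow the mathematical convention in which y increases upward.

```python
import math

def clock_positions(stages, radius=1.0):
    """Place stages evenly on a circle, first at 12 o'clock, the rest clockwise."""
    n = len(stages)
    placed = []
    for i, stage in enumerate(stages):
        # 12 o'clock is 90 degrees; moving clockwise decreases the angle.
        angle = math.pi / 2 - 2 * math.pi * i / n
        x = round(radius * math.cos(angle), 6)
        y = round(radius * math.sin(angle), 6)
        placed.append((stage, x, y))
    return placed

# Four stages land at 12, 3, 6, and 9 o'clock, as in the fill-in diagrams.
seasons = clock_positions(["spring", "summer", "autumn", "winter"])
```

With four stages the rule reproduces the four box positions used in the fill-in task; on a screen whose y axis points downward, the sign of y would need to be flipped.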
3.2.4. Boxes and frames
Earlier, we saw that people interpret bars as containers, separating their contents from everything else. Boxes are an ancient noniconic depictive device, evident explicitly in stained glass windows, but even prior to that, in Roman wall frescoes. Frames accentuate a more elementary way of visually indicating conceptual relatedness, grouping by proximity, for example, the spaces between words. Framing a picture is a way of saying that what is inside the picture has a different status from what is outside the picture. Comics, of course, use frames liberally, to divide events in time or views in space. Comics artists sometimes violate that for effect, deliberately making their characters pop out of the frame or break the fourth wall, sometimes talking directly to the reader. The visual trope of popping out of the frame makes the dual levels clear, probably even to children: The story is in the frames, the commentary outside (e.g., Wiesner, 2001; Tversky & Bresman, unpublished data). Speech balloons and thought bubbles are a special kind of frame, reserved for speech or thought; as for other frames, they serve to separate what is inside from what is outside. Frames, like parentheses, can embed other frames, hierarchically, indicating levels of conceptual spaces, allowing meta-levels and commentaries. Boxes and frames serving these ends abound in diagrams, in flow charts, decision trees, networks, and more.
3.2.5. Complex combinations of glyphs
As was evident from the visual toolkit for routes, glyphs can be combined to create complex diagrams that express complex thoughts and systems. Like combining words into sentences, combining glyphs into systems follows domain-specific syntactic rules (e.g., Tversky & Lee, 1999). Networks of lines and nodes, more abstractly, concepts and connections between concepts, are so complete and frequent that they constitute a major type of diagram. Other types of diagrams include the following: hierarchies, a kind of network with a unique beginning and layers of asymmetric relations, such as taxonomies and organization charts; flow charts consisting of nodes and links representing temporal organizations of processes and outcomes; decision trees, also composed of nodes and links, where each node is a choice. A slightly different type of diagram is a matrix, a set of boxes organized to represent the cross-categorization of sets of dimensions or attributes. These organized sets of glyphs and space constituting diagrammatic types appear to match, to naturally map, conceptual organizations of concepts and relations. That is, for networks, hierarchies, and matrices, students were able to correctly match a variety of conceptual patterns onto the proper visualization (Novick, Hurley, & Francis, 1999; Novick & Hurley, 2001).
Note that many of these complex visual combinations of glyphs, for example, bar and line graphs, social and computer networks, decision and evolutionary trees, have no pictorial information whatsoever, yet they inherit all the advantages of being visual. They enable human application of visuospatial memory and reasoning skills to abstract domains.
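The structural differences among networks, hierarchies, and matrices can be stated compactly as properties of the underlying data. The following sketch is one possible formalization, offered only as an illustration: the function `is_hierarchy` and the example data are hypothetical. It treats a hierarchy as a directed network with a unique beginning in which every other node has exactly one parent, and a matrix as the cross-categorization of two attribute sets.

```python
from itertools import product

def is_hierarchy(edges):
    """True if directed (parent, child) edges form a tree with a unique root."""
    nodes = {n for edge in edges for n in edge}
    parents = {}
    for parent, child in edges:
        if child in parents:          # a second parent breaks the strict layering
            return False
        parents[child] = parent
    roots = nodes - set(parents)
    if len(roots) != 1:               # a hierarchy has a unique beginning
        return False
    root = next(iter(roots))
    for node in nodes:                # every node must lead back to the root
        steps = 0
        while node != root:
            node = parents[node]
            steps += 1
            if steps > len(nodes):    # looping means a cycle detached from the root
                return False
    return True

# A taxonomy is a hierarchy; a circle of friends is a network but not a hierarchy.
taxonomy = [("animal", "bird"), ("animal", "fish"), ("bird", "robin")]
friendships = [("ann", "bob"), ("bob", "cy"), ("cy", "ann")]

# A matrix crosses every value of one attribute with every value of another.
matrix_cells = list(product(["red", "blue"], ["small", "large"]))
```

Here `is_hierarchy(taxonomy)` holds and `is_hierarchy(friendships)` does not, while `matrix_cells` contains one cell for each of the four color-by-size combinations.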
3.2.6. Sketches
The aim of most of the diagrams discussed thus far is to convey certain information clearly in ways that are easily apprehended, from route directions to data presentations to scientific explanations. Another important role for visualizations of thought is to clarify and develop thought. This kind of visualization is called a sketch because it is usually more tentative and vague than a diagram. Sketches in early phases of design, even of physical objects like products and buildings, are frequently just glyphs, lines and blobs, with no specific shapes, sizes, or distances (e.g., Goel, 1995; Schon, 1983). Designers use their sketches in a kind of conversation: They sketch, reexamine the sketch, and revise (Schon, 1983). The sketches are intentionally ambiguous. Ambiguity in sketches, just like ambiguity in poetry, encourages a multitude of interpretations and reinterpretations. Experienced designers may get new insights, see new relationships, make new inferences from reexamining their sketches, a positive cycle that leads to new design ideas, followed by new sketches and new ideas (Suwa & Tversky, 2001, 2003). Ambiguity can help designers innovate and escape fixation by allowing perceptual reorganization and consequent new insights, a pair of processes, one perceptual, finding new figures and relations, and one conceptual, finding new interpretations, termed “constructive perception” (Suwa & Tversky, 2001, 2003).
3.2.7. Glyphs: Simple geometric forms with related meanings
Diagrams and other forms of visual narratives are enhanced by the inclusion of a rich assortment of schematic visual forms such as dots, lines, arrows, circles, and boxes, whose meanings derive from and are constrained by their Gestalt or mathematical properties within the confines of a context. The meanings they support, entities, relations, asymmetric relations, processes, and collections, are abstract, so apply to many domains. They encourage the kind of abstractions needed for inference, analogy, generalization, transfer, and insight. They have analogs in other means of recording and communicating ideas, in language and in gesture, suggesting that they are elements of thought.
There are other abstract visual devices, infrequent in diagrams, but common in graphic novels and comics, lines suggesting motion, sound, fear, sweat, emotions, and more (e.g., McCloud, 1994). Some of these, like the lines, boxes, and arrows discussed above, have meanings suggested by their forms. Motion lines, for example, seem to have developed as a short-hand or schematization of the perceptual blurring of viewed fast motion. Others, like hearts for love, are more symbolic. The concepts conveyed by the diagrammatic schematic forms are not as readily depictable as objects or even actions.
Those glyphs, such as dots, lines, arrows, frames, and circles, that enjoy a consensus of context-dependent meanings evident in production and comprehension seem to derive their meanings in ways similar to the ways pictograms establish meanings, overlapping features. Among the properties of lines is that they connect, just as relationships, abstract or concrete, connect. Among the properties of boxes is that they contain one set of things and separate those from other things. What is in the box creates a category, leaving open the basis for categorization to the creator or interpreter. The box implies that the things in the box are more related or similar to each other than to things out of the box. The box might contain a spatial region, a temporal slice, a set of objects. These mappings of meaning, the transfer of a few of the possible features from the object represented to the representing glyph, are partial and variable. The consequence is variability of meaning, allowing ambiguity and misconception. A case in point is uses of arrows, which map asymmetric relations. But there are a multitude of asymmetric relations, temporal order, causal order, movement path, and more. In well-designed diagrams, context can clarify, but there are all too many diagrams that are not well designed.
The concepts suggested by glyphs have parallels in language and gesture with the same tradeoffs between abstraction and ambiguity. Think of words, notably spatial ones that parallel glyphs, like relationship or region or point. A romantic relationship? A mathematical relationship? Here, context will likely disambiguate, but not on all occasions. There is good reason why spatial concepts, whether diagrammatic or linguistic or gestural, have multiple meanings; they allow expression of kinds of meanings that apply to many domains.
Much has been said on what depictions do well: make elements, relations, and transformations of thought visible, apply human skills in visuospatial reasoning to abstract domains, encourage abstraction, enable inference, transfer, and insight, promote collaboration. But many concepts essential to thought and innovation are not visible. A key significance of glyphs is that they can visualize the invisible, entities, relations, forces, networks, trees, and more.
4. Processing and designing diagrams
4.1. Processing diagrams
Good design must take into account the information-processing habits and limitations of human users (e.g., Carpenter & Shah, 1998; Kosslyn, 1989, 2006; Pinker, 1990; Shah, Freedman, & Vekiri, 2005; Tversky, Morrison, & Betrancourt, 2002). The page is flat, as is the visual information captured by the retina. Reasoning from 3D diagrams is far more difficult than reasoning from 2D diagrams whether depictive (e.g., Gobert, 1999) or conceptual (e.g., Shah & Carpenter, 1995). Language, visual search, and reasoning are sequential and limited, so that continuous animations of explanatory information can cause difficulties (e.g., Ainsworth, 2008a,b; Hegarty, 1992; Hegarty, Kriz, & Cate, 2003; Schnotz & Lowe, 2007; Tversky, 2001).
Ability matters. Spatial ability is not a unitary factor, and some aspects of spatial thinking, especially performing mental transformations and integrating figures, matter for some situations and others for others (e.g., Hegarty, in press; Hegarty & Waller, 2006; Kozhevnikov, Kosslyn, & Shephard, 2005; Suwa & Tversky, 2003). Different spatial, and undoubtedly conceptual, abilities are needed for different kinds of tasks and inferences that involve diagrams.
Expertise matters. It can trade off with ability. As noted, diagrams, like language, are incomplete and can be abstract, requiring filling in, bridging inferences. Domains include implicit or explicit knowledge that allows bridging, encouraging correct interpretations and discouraging incorrect ones. The significance of domain knowledge was illustrated in route maps and holds a fortiori in more technical domains (e.g., Committee on Support for Thinking Spatially, 2006).
Working memory matters. Although, as advertised, external representations relieve working memory, they do not eliminate it. Typically, diagrams are used for comprehension, inference, and insight. All involve integrating or transforming the information in diagrams, processes that take place in the mind, in working memory. Imagine multiplying two three-digit numbers, even when the numbers are before your eyes, without being able to write down the product of each step (see Shah & Miyake, 1996).
Structure matters. When diagrams are cluttered with information, finding and integrating the relevant information takes working memory capacity. Schematization, that is, removing irrelevant details, exaggerating, perhaps distorting, relevant ones, even adding relevant but invisible information, can facilitate information processing in a variety of ways. Aerial photographs make poor driving maps. Schematization can reduce irrelevancies that can clutter, thereby allowing attention to focus on important features, increasing both speed and accuracy of information processing (e.g., Dwyer, 1978; Smallman, St. John, Onck, & Cowen, 2001; Tversky, 2001).
Sequencing matters. Conveying sequential information, important in history, science, engineering, and everyday life, poses special challenges. Sometimes a sequence of steps can be shown in a single diagram; Minard’s famous diagram of Napoleon’s unsuccessful campaign in Russia is a stellar example. Time lines of historical events are another common successful example. Depicting each step separately and connecting them, often using frames and arrows, is another popular solution, from Egyptian tomb paintings showing the making of bread to Lego instructions. Both separating and connecting require careful design. People segment continuous organized action sequences into meaningful units that connect perception and action, by changes in scene, actor, action, and object (e.g., Barker, 1963; Barker & Wright, 1954; Tversky, Zacks, & Hard, 2008; Tversky et al., 2007; Zacks & Tversky, 2001; Zacks, Tversky, & Iyer, 2001). A well-loved solution to showing processes that occur over time is to use animations. Animations are attractive because they appear to conform to the Congruity Principle: They use change in time to show change in time, a mentally congruent relation (Tversky, 2001). However, as we have just seen, the mind often segments continuous processes into steps (e.g., Tversky et al., 2007; Zacks et al., 2001), suggesting that step-by-step presentation is more congruent to the way the mind understands and represents continuous organized action than continuous presentation. The segmentation of routes by turns and object assembly by actions provide illustrative examples. Animations can suffer two other shortcomings: They are often too fast and too complex to take in, violating the Apprehension Principle, and they show, but do not explain (Tversky et al., 2001). Even more than in static diagrams, visualizing the invisible, causes, forces, and the like, is difficult in animations.
And, indeed, a broad range of kinds of animations for a broad range of content have not proved to be superior to static graphics (e.g., Mayer, Hegarty, Mayer, & Campbell, 2005; Stasko & Lawrence, 1998; Tversky, 2001; Tversky, Heiser, et al., 2007).
Multi-media matters. Depictions and language differ in many ways, some discussed earlier, among them, expressiveness, abstraction, constraints, accessibility to meaning (e.g., Stenning & Oberlander, 1995). As we have seen, many meanings may be easier to convey through diagrams, but diagrams can also mislead. Diagrams usually contain words or other symbolic information; the visuals, even augmented with glyphs, may not be sufficient. Maps need names of countries, towns, or streets. Network diagrams need names of the nodes and sometimes the edges. Economic graphs need labels and numeric scales to denote years or countries or financial indices. Anatomical diagrams need names of muscles and bones. But diagrams often need more than labels and scales. Although arrows can indicate causes and forces, the specific forces and causes may need language. In addition, redundancy often helps (e.g., Ainsworth, 2008a,b; Mayer, 2001). Just as diagrams need to be carefully designed to be effective, so does language.
4.2. Designing diagrams
The previous analyses of place and form in diagrams were based on historical and contemporary examples that have been invented and reinvented across time and space. They have been refined by the generations through informal user testing in the wild. The analyses provide a general guideline for designing effective diagrams: Use place in space and forms of marks to convey the kinds of meanings that they more naturally convey. For example, use the vertical for evaluative dimensions, mapping increases upwards. Use the horizontal for neutral dimensions, especially time, mapping increases in reading order. Use dots for entities, lines for relations, arrows for asymmetric relations, boxes for collections. Disambiguate when context is not sufficient. Although helpful, these are general guidelines often not sufficient for specific cases.
The previous analyses of the evolution and refinement of diagrams also suggest methods to systematically develop more specific guidelines when needed, to formalize the natural user testing cycle—produce, use, refine—and bring it into the laboratory by turning users into designers. One project used this procedure for developing cognitive design principles for assembly instructions (Tversky et al., 2007). Students first assembled a TV cart using the photograph on the box. They then designed instructions to help others assemble the cart. Other groups of students used and rated the previous instructions. Analysis of the highly rated and effective instructions revealed the following cognitive principles: Use one diagram per step, segment one step per part, show action, show perspective of action, and use arrows and guidelines to show attachment and action. A computer algorithm was created to construct assembly diagrams using these guidelines, and the resulting visual instructions led to better performance than those that came with the TV cart. These cognitive principles apply not just to assembly diagrams but more broadly to visual explanations of how things behave or work. Moreover, the cycle of producing, using, and refining diagrams is productive in improving diagrams even with a single person (Karmiloff-Smith, 1979, 1990; Lee & Karmiloff-Smith, 1996; Tversky & Suwa, 2009).
5. Diagrams as a microcosm of cognition
Diagrams and other depictions are expressions and communications of thought, a class that includes gesture, action, and language. In common with gesture and action, diagrams use place and form in space to convey meanings, concrete and abstract, quite directly. This paper has presented an analysis and examples of the ways that place and form create meanings, an analysis that included the horizontal, vertical, center–periphery, and pictorial organization of the page as well as the dots, lines, arrows, circles, boxes, and likenesses depicted on a page. In combination, they enable creating the vast variety of visual expressions of meaning, pictures, maps, mandalas, assembly instructions, highway signs, architectural plans, science and engineering diagrams, charts, graphs, and more. Gestures also use many of these features of meaning, but they are more schematic and fleeting; diagrams can be regarded as the visible traces of gestures just as gesturing can be regarded as drawing pictures in the air.
The foundations of diagrams lie in actions in space. People have always organized things and spaces to serve their ends: securing, storing, and preparing food, making and using artifacts, designing shelter, navigating space. The consequences of these actions are the creation of simple geometric patterns in space, patterns that are good gestalts, and that are readily recognized. The patterns invite abstract interpretations: Groups signal similar features or related themes, orders signal dimensions or continua, distributions signal one-to-one or one-to-many correspondences. The creation and interpretation of these patterns form the rudiments of abstract thought: categories, relationships, orderings, hierarchies, dimensions, and counting (e.g., Dehaene, 1997; Gelman & Gallistel, 1986; Frank, Everett, Fedorenko, & Gibson, 2008; Gordon, 2004; Hughes, 1986; Lakoff & Nunez, 2000). The spatial patterns can be manipulated by the hands or by the mind (e.g., Shepard & Podgorny, 1978; Tversky, 2005) to create further abstractions; they form spatial-action representations for the abstractions that underlie the feats of the human mind, a three-way interaction that can be termed spraction. Spractions, then, are actions in space, whether on objects or as gestures, that create abstractions in the mind and patterns in the world, intertwined so that one primes the others. Like language, spractions support and augment cognition and action; unlike language, they do so silently and directly. The arrangements and organizations used to design the world create diagrams in the world: The designed world is a diagram.