Visualizing Thought


Correspondence should be sent to Barbara Tversky, Department of Human Development, Columbia Teachers College, 525 W. 120th Street, New York, NY 10027; Department of Psychology, Stanford University, 450 Serra Mall, Stanford, CA 94305-2130. E-mail:


Depictive expressions of thought predate written language by thousands of years. They have evolved in communities through a kind of informal user testing that has refined them. Analyzing common visual communications reveals consistencies that illuminate how people think as well as guide design; the process can be brought into the laboratory and accelerated. Like language, visual communications abstract and schematize; unlike language, they use properties of the page (e.g., proximity and place: center, horizontal/left–right, vertical/up–down) and the marks on it (e.g., dots, lines, arrows, boxes, blobs, likenesses, symbols) to convey meanings. The visual expressions of these meanings (e.g., individual, category, order, relation, correspondence, continuum, hierarchy) have analogs in language, gesture, and especially in the patterns that are created when people design the world around them, arranging things into piles and rows and hierarchies and arrays, spatial-abstraction-action interconnections termed spractions. The designed world is a diagram.

1. Introduction

Communication in the wild is a sound and light show combining words, prosody, facial expressions, gestures, and actions. Although it is often presumed (think of the “letter of the law” and transcripts of trials) that meanings are neatly packaged into words joined by rules into utterances, in fact, other channels of communication carry significant aspects of meaning, despite or perhaps because of the fact that they cannot be neatly packaged into units strung together by rules (e.g., Clark, 1996; Goldin-Meadow, 2003; Kendon, 2004; McNeill, 1992, 2005). Prosody, as in irony or sarcasm, can overrule and even reverse meanings of words, as can facial expressions. Pointing can replace words, for things, for directions, and more, so that natural descriptions, narratives, or explanations cannot be fully understood from the words alone (e.g., Emmorey, Tversky, & Taylor, 2000). Gestures go beyond pointing; they can show size, shape, pattern, manner, position, direction, order, quantity, both literally and metaphorically. They can express abstract meanings, mood, affect, evaluation, attitude, and more. Gestures and actions convey this rich set of meanings by using position, form, and movement in space. Communication can happen wordlessly, as in avoiding collisions on busy sidewalks or placing items on the counter next to the cash register to indicate an intention to buy. In fact, the shelf next to the cash register is designed to play a communicative role. Standing next to a circle of chatting acquaintances can be a request to join the conversation. Opening the circle is the group’s wordless response. Rolling one’s eyes can signify, well, rolling one’s eyes. Communication in the wild combines and integrates these modes, usually seamlessly, with each contributing to the overall meaning (e.g., Clark, 1996; Engle, 1998; Goldin-Meadow, 2003; Kendon, 2004; McNeill, 1996; Tomasello, 2008).

Gestures and actions are especially convenient because their tools, like the tools for speech, are free, and they are always with us. But gestures, like speech, are fleeting; they quickly disappear. They are limited by what can be produced and comprehended in real time. These limitations render gestures abstract and schematic. Visualizations, on paper, silk, parchment, wood, stone, or screen, are more permanent; they can be inspected and reinspected. Because they persist, they can be subjected to myriad perceptual processes: Compare, contrast, assess similarity, distance, direction, shape, and size, reverse figure and ground, rotate, group and regroup; that is, they can be mentally assessed and rearranged in multiple ways that contribute to understanding, inference, and insight. Visualizations can be viewed as the permanent traces of gestures; both embody and are embodied. Like gesture, visualizations use position, form, and actions in space to convey meanings (e.g., Tversky, Heiser, Lee, & Daniel, 2009). For visualizations, fleeting positions become places and fleeting actions become marks and forms. Here, we analyze the ways that place and form constrain and convey meaning, meanings that are based in part in actions.

Traces of visual communication go far back into prehistory. Indeed, they are one of the earliest signs of culture. They not only preceded written language but also served as the basis for it (e.g., Gelb, 1963; Schmandt-Besserat, 1996). Visual communications come in myriad forms: animals in cave paintings, maps in petroglyphs, tallies on bones, histories on columns, battles in tapestries, messages on birch bark, journeys in scrolls, stories in stained glass windows, dramas in comics, diagrams in manuals, charts in magazines, and graphs in journals. All forms of communication entail design, as the intent of communication is to be understood by others or by one’s self at another time. Communication design, then, is inherently social, because to be understood by another or by self at another time entails fashioning communications to fit the presumed mental states of others or of one’s self at another time.

Diagrams, along with pictures, film, paintings in caves, notches in wood, incisions in stone, cuttings in bone, impressions in clay, illustrations in books, paintings on walls, and of course words and gestures, externalize thought. They do this for many reasons, often several simultaneously. Some are aesthetic: to arouse emotions or evoke pleasure. Some are behavioral: to affect action or promote collaboration. Some are cognitive: to serve as reminders, to focus thoughts, to reorganize thoughts, and to explore thoughts. Many are communicative: to inform both self and others.

Because depictions, like other cultural artifacts (e.g., Norman, 1993; Donald, 1991), have evolved over time, they have undergone an informal but powerful kind of natural user testing, produced by some, comprehended by others, and refined and revised to improve communication by a community of users. Similar processes have served and continue to serve to design and redesign language (cf. Clark, 1996). Features and forms that have been invented and reinvented across cultures and time are likely to be effective. Analyzing these depictive communications, then, can provide valuable clues to designing new ones. It can save and inspire laboratory work, as well as the tasks of designers. What is more, the natural evolution of communication design can be brought into the laboratory and accelerated for specific ends (see Tversky et al., 2007).

Oddly, this rich set of visual forms has traditionally been discussed in the domain of art, along with painting, drawing, and photography. Increasingly, that discussion has expanded to include diagrams, charts, film, graphs, notational systems, visual instructions, computer interfaces, comics, movies, and more, to take into account the mind that perceives, conceives, and understands them, and to ripple across domains (e.g., Arnheim, 1974; Bertin, 1981; Card, Mackinlay, & Shneiderman, 1999; Elkins, 1999; Gombrich, 1961; Goodman, 1978; Kulvicki, 2006; McCloud, 1994; Murch, 2001; Small, 1997; Stafford, 2007; Wainer, 1992; Ware, 2008; Winn, 1987; Tufte, 1983, 1990, 1997). Similarly, discussions of human communication have historically focused on language, typically narrowly conceived as words and sentences, and have only recently broadened to include prosody, gesture, and action (e.g., Argyle, 1988; Clark, 1996; Goldin-Meadow, 2003; Kendon, 2004; McNeill, 1996).

Unlike symbolic words, forms of visual communication, notably diagrams and gestures, often work by a kind of resemblance, that is, sharing features or associations, typically visuo-spatial features, with the meanings they are intended to convey (a claim of some philosophic controversy, e.g., Goodman, 1978; Hochberg & Brooks, 1962; Walton, 1990). The proverbial “big fish” is indicated in gesture by expanding the fingers or hands horizontally, thus capturing the approximate relative horizontal extent of the fish, but ignoring its other properties. How the fish swam to try to get away is abstracted and conveyed differently, perhaps by embodying the fish and its movements. Similarly, the shape, dimensions, and even actions of the fish can be abstracted in a variety of ways to the page. The fish example illustrates another property of visual communication. In capturing features of the world, visual communications are highly selective; they omit information, normally information that is regarded as less essential for the purposes at hand. They abstract and schematize not only by omission but also by exaggeration and even by additions. Maps, for example, are not simply shrunken aerial photographs. Maps selectively omit most information (houses, trees, fields, mountains, and the like) but also many of the twists and turns of roads or coastlines; they disproportionately enlarge roads and rivers to make them visible; they turn entire metropolises into dots. Maps may also add features like government boundaries and topographic levels that are not visible.

In other words, maps, like many other kinds of visualizations, distort the “truth” to tell a larger truth. The processes that abstract, schematize, supplement, and distort the world outside onto the world of a page, filtering, leveling, sharpening, categorizing, and otherwise transforming, are the same processes the nervous system and the brain apply to make sense of the barrage of stimuli the world provides. Attention is selective, ignoring much incoming information. The perceptual systems level and sharpen the information that does come in; for example, the visual system searches for the boundaries that define figures by sharpening edges and corners, by filling in gaps, by normalizing shapes. Cognition filters, abstracts, and categorizes, continuing this process, and symbol systems carry these processes further. Long things do not necessarily get long names, though children often expect them to (e.g., Tolchinsky Landsmann & Levin, 1987). Tallies eliminate the identity of objects, recording them just as instances, though tallies preserve a one-to-one correspondence that Arabic numerals, more convenient for calculations, do not.

The virtues of visual communications have been extolled by many (e.g., Kirsh, 1995; Larkin & Simon, 1987; Norman, 1993; Scaife & Rogers, 1996; Tversky, 1995; Tversky, 2001). As noted, they are cultural artifacts created in a community (Donald, 1991; Norman, 1993), fine-tuned by their users (e.g., Tversky et al., 2007). They can provide a permanent, public record that can be pointed at or referred to. They externalize and clarify common ground. They can be understood, revised, and manipulated by a community. They relieve limited-capacity short-term memory, facilitate information processing, expand long-term memory, organize thought, and promote inference and discovery. Because they are visual and spatial, they allow human agility in visual-spatial processing and inference to be applied to visual-spatial information and to metaphorically spatial abstract information.

In contrast to purely symbolic words, visual communications can convey some content and structure directly. They do this in part by using elements, marks on a page, virtual or actual, and spatial relations, proximity and place on a page, to convey literal and metaphoric elements and relations. These ways of communicating meanings may not provide definitions with the rigor of words, but rather provide suggestions for meanings and constraints on them, giving them greater flexibility than words. That flexibility means that many of the meanings thus conveyed need context and experience to fully grasp. A line in a route map has a different meaning from a line in a network and from a line in a graph, though, significantly, all connect. Nor is the expressive power of visual communication as great as that of language (e.g., Stenning & Oberlander, 1995); abstract or invisible concepts like forces, traits, counterfactuals, and negations are not easily conveyed unambiguously in depictions. Even so, conventions for conveying these kinds of concepts have evolved as needed, in road signs, mathematics, science, architecture, engineering, and other domains, a gradual process of symbolization akin to language.

What are the tools of depictions, especially diagrams? How do they communicate? The components of visual communication are simple: Typically, a flat surface, prototypically, a page (or something analogous to a page like a computer screen) and marks or forms placed on it (e.g., Ittelson, 1996; Tversky, 1995, 2001; Tversky, Zacks, Lee, & Heiser, 2000). Each of these, place and form, will be analyzed to show how they can represent meanings that are literal and metaphoric, concrete and abstract. The interpretations will be shown to depend on content and context, on Gestalt or mathematical properties of the marks in space, on the place of the marks on the page, as well as the information processing capacities and proclivities of the mind. The foundations and processes of assigning meaning can be revealed, then, by recurring inventions and by errors and biases in interpretation, that is, by uses and misuses, by successes and failures. The analysis of inventions of visual communication can provide directions for the design of visual communications.

Because assigning meaning, whether from description or depiction, is in part a reductive process (the space of possible meanings is greater than the space of ways to express meanings), misuses, misinterpretations, and misunderstandings are as inevitable as successes, and both are instructive. Expressing meanings, then, entails categorization. Categories create boundaries where none exist; some instances are included and others, even close ones, are not. The consequence of categorization is to increase the perceived similarity of members included in the category and to exaggerate the perceived distance between members and nonmembers. Although the focus here is on meanings conveyed through place and forms, the meanings are deeper; they are conceptually spatial, some more literal, some more metaphorical, so that they have parallels in other ways of using space as well, in words, in actions, and in gesture, in the virtual space created by gesture and the mental space created by words (e.g., Gattis, 2004; Lakoff & Johnson, 1980; Tversky et al., 2009). First, we will discuss place in space, and then forms in space.

2. Place in space

2.1. Organizing space in the world

2.1.1. Spatial actions create meaningful patterns

Three quarters of a million years ago, a group of hominins living in the northern Jordan River valley separated the activities of communal life into different spatial areas, cooking activities in one area and tool-building activities in another (Alperson-Afil et al., 2009). Each of these spaces was subdivided, again by function. This primate society created unintended visual communications about their lives to archeologists living generations afterwards. Before the page, there was space itself. Perhaps the simplest way to use space to communicate is to arrange or rearrange things in it. An early process is grouping things in space using proximity, putting similar things in close proximity and farther from dissimilar things, actions that reflect the Gestalt laws of perception. These separated spatial groupings signal separate associated things. “Close” family members and friends sit nearer to one another than strangers. The flatware tray in a drawer of most kitchens allows arranging the knives together in one pile and separating them from the pile of forks and the pile of spoons. Drawers in the bedroom allow arranging the socks together and separating them from other articles of clothing that are also grouped and piled by kind. Shelves and drawers allow hierarchical organization, one shelf for canned goods, another for baking supplies, further organized inside by kind and recency of purchase, in two dimensions. Table settings distribute various items in one-to-one correspondences, each setting gets a plate, a glass, a knife, a fork, a spoon, and a napkin. Themes as well as categories are spatially organized, things for cooking spatially separated from things for sleeping. Larger spaces, homes, and villages are arranged in two and three dimensions, turning inhabited spaces into diagrams, vertical patterns of windows on buildings and horizontal patterns of streets on the ground. 

We rearrange things in space to capture attention and to affect behavior in the present as well as the future, for example, putting the letters to be mailed by the door or the bills to be paid on the top of the desk (or desktop) or lining up the ingredients for a recipe in order of use (e.g., Kirsh, 1995), ordinal mappings of time and actions in time onto space. Written text is spatially arranged to reflect the organization of thought, spaces between words and sentences, larger spaces between paragraphs. Greek text describing mathematics was written formulaically, fixed orders of semantic forms, often in rows, that formed tables for reasoning (Netz, 1999). Even babies do it; many discover “in your face” early on. When they want attention, they center their faces in the face of the person whose attention they seek, directly in the line of vision. A fundamental service of space, hence meaning of space, is proximity to me. I can perceive and act on the things and beings that are close to me, in reach of the body, primarily eyes, hands, feet, and, for beings, voice. For my actions (and my perception), the best position is centered in front of me. These many deliberate organizations of space serve to direct attention, to augment memory, to facilitate and organize actions, and to communicate to ourselves or to others.

One implication of this analysis is that action underlies perception. The actions of organizing space for many ends into groups, hierarchies, orders, correspondences, continua, and the like create spatial patterns that are far more regular than those created by nature, and are thus a signal that they are created by sentient minds. These regular spatial patterns conform to the Gestalt laws of perception, augmenting their perceptibility. Things that entail similar actions, whether socks or knives, are grouped together, creating stacks or piles, and things that entail ordered actions are lined up in that order, along a line, creating a temporal continuum on a horizontal surface. Perception of stacks and lines is enhanced by the Gestalt laws of good continuation or common fate.

Not much farther afield, architecture can be viewed as an advanced form of arranging things in space (in three dimensions) for a number of reasons, among them, to inform and to facilitate or constrain action. Department stores put like things close to each other, separating them from different things. The grouping is hierarchical: men’s clothes together, women’s clothes together, and within each, shirts in one place and outerwear in another. Architectural spaces are also designed to affect behavior. Elevators are placed within sight of entrances; desired corridors are broad and well lit. Departments in department stores were once geometrically organized along parallel and perpendicular paths, presumably because such an organization facilitates way-finding (e.g., Tversky, 1981). Increasingly, they seem to be organized like Chinese gardens, in zigzagging meandering paths. In Chinese gardens, a meandering organization of space creates surprises and the impression of a larger space to be contemplated and enjoyed. In department stores, a meandering organization undoubtedly interferes with way-finding and provides more temptations to purchase. In architectural designs, the plan, a horizontal slice, serves action and the elevation, a vertical slice, serves aesthetics (e.g., Arnheim, 1977).

Spaces are also arranged and designed for symbolic and aesthetic reasons. The square patterns that cultures as distant as China and Rome used to build their cities, with roads aligned north-south and east-west, seem to serve several ends at once, cognitive, aesthetic, and symbolic. Other patterns that are consequences of organizing space appear and reappear across cultures in ceramics, weaving, basket-making, and architecture (as well as poetry and music), especially patterns that have geometric repetitions and symmetries (e.g., Arnheim, 1988; Gombrich, 1979).

Spaces are also created on the fly, to serve behavior (the reminders on the desk) or communication. Arrernte speakers in Australia routinely draw the locations and movements of their conversation topics in the sand (Wilkins, 1997). When people describe locations of places or events involving actions, like football plays or accidents, they often use whatever small objects are at hand—coins, salt shakers, Lego blocks, or fingers—to represent the locations and movements of whatever they are describing, creating a map on a surface. If pencil and paper are handy, they often sketch instead (e.g., Kessell & Tversky, 2006).

2.1.2. Conception, action, perception, communication, and meaning

People, then, design and redesign the spaces they inhabit, arranging them and rearranging them to serve a variety of ends. The spaces they create are a visible embodiment of the abstract concepts underlying the organizations. These spaces form regular patterns that resonate with principles of perceptual organization. The close couplings of action, conception, and perception support meanings and afford communication. The examples above are a few of many, but they illustrate some of the core phenomena. People put like things together, often into piles, rows, or bins, and separate them from different things. They cluster by kind, often hierarchically. They order things in rows or piles in a variety of ways, depending on their purpose: ingredients in order of use, photograph albums in order of time, bills to be paid in order of importance. These acts select single features and create single dimensions or continua out of disparate things. People also arrange things by themes, and distribute sets of items in one-to-one correspondences. They choose distances and sizes in three dimensions. Whether informally in conversation or more formally on maps and architectural plans, people also map locations in the world onto a representing world, models, or diagrams. These same kinds of organization in space, clusters, orders, maps, and more, are used to locate things on a page to represent and communicate ways things are organized and related in the mind as well as in the world.

2.2. Organizing the space of a page, more literally

As we have seen, people arrange and rearrange the things in the spaces around them into clusters, orders, and more complex organizations for cognitive, social, aesthetic, and symbolic ends. People do the same with the space of a page, for things that are literally spatial as well as things that are metaphorically spatial. In contrast to the space of the world, the space of a page is two-dimensional, though it allows conveying three and more dimensions. Conceptually, the two dimensions of a page are defined with respect to a viewer’s frame of reference and a page oriented horizontally, left–right and top–bottom (or up–down) (cf. Arnheim, 1974, 1988). Conceptually, there is also a page-centric frame of reference: center, periphery.

We begin with an early, basic organization of the space of a page or virtual page, what can be called pictorial space, used to map and represent the visible world. Think first of ancient paintings of animals on the rugged ceilings of caves or the tadpole figures of people drawn by children all over the world (e.g., Kellogg, 1969). Several aspects of place on the page will be analyzed through prevalent examples: up/down, left/right, center, and proximity among them. Some of those uses benefit thought, some conflict with one another, and some even hinder thought, but all are a testament to the cognitive power of place and marks on the page.

2.2.1. Pictorial space

Perhaps the earliest and simplest and still the most common way to use space in depictions is to map the space of the viewed world to a surface, what is traditionally called a picture. This mapping takes the three-dimensional world into a two-dimensional one, the page, a transformation that is undoubtedly facilitated by the fact that the world captured by the retina and the rest of the visual system is a two-dimensional mapping of the three-dimensional world from a particular perspective. Mapping pictorial space to the page puts things on the ground at the bottom of the page and things in the sky at the top, just as at an easel that holds the page in the plane of the world. Put horizontally on a table, the space of the page is mapped so the ground is close to the viewer, “at the bottom,” and the sky is far from the viewer, “at the top” (cf. Shepard & Hurwitz, 1984). This correspondence applies the notion of “upright” to the page. It is such a compelling organization of space that upside-down pictures are harder to recognize and remember, especially faces of individuals, stimuli of special significance in our lives (e.g., Hochberg & Galper, 1967; Carey, Diamond, & Woods, 1980; Rock, 1973).

When placed horizontally, as on a table or desk, the actual space of the page conflicts with the actual surrounding space as the ground-to-sky bottom-to-top dimension of the page is no longer literally vertical as it is in the world. Nevertheless, the mapping of vertical to horizontal where ground is close to the viewer’s perspective is conceptually powerful, so that the opposite mapping is regarded as upside-down. The pull of the picture plane is so strong that students in a course in information design use it implicitly in diagramming information systems. In diagrams of information systems, what must be shown are the topological relations among the system components (Nickerson, Corter, Tversky, Zahner, & Rho, 2008). The actual locations of components are irrelevant; all that matters is the connections among them, indicated by lines. Nevertheless, designers’ sketches frequently map physical locations, for example, placing a truck that transmits information at the bottom of the page, as if on the ground, and a satellite at the top, as if in the sky. Although organizing a sketch using pictorial space may aid comprehension of the components of the system, it could prevent designers from “seeing” and using other organizations of components that might make better sense for the design.

2.2.2. Maps

Like the making of pictures, the making of maps entails shrinking a viewed environment as well as selecting and perhaps distorting important features and omitting others (Tversky, 2000). However, the making of maps requires more, beginning with taking a perspective not often seen in real life, a perspective from above, looking down. Maps, even ancient ones, typically include far more than can be seen from a single viewpoint, so that the making of maps also entails integrating many different views to convey a more comprehensive one. Despite these challenges, evidence of maps, typically petroglyphs as they survive the ravages of time, goes back at least 6,000 years (e.g., Brown, 1979) and of architectural plans nearly that far. Although maps often represent a horizontally extended world on a horizontal surface, they are frequently placed vertically (“upright”), requiring the same transformation that pictorial representations do (but without gravity and a conceptual up and down). Even though arbitrary, the conventional north-up orientation of maps has both cognitive and practical consequences; north-up maps are easier for many judgments (e.g., Sholl, 1987).

Maps are one of the most ancient, modern, and widespread means of visual communication, and they serve as an illustrative paradigm for many aspects of visual communication. Ancient as they are, maps represent remarkable feats of the human mind, the products of powerful mental transformations. Although human experience is primarily from within environments, a perspective that has been called egocentric, route, or embedded, maps take a viewpoint from outside environments, above them, a perspective that has been called extrinsic, allocentric, or survey. Thus, the making of maps and the understanding of maps entail a dramatic switch of perspective, one that takes remarkably little effort for well-learned environments (e.g., Taylor & Tversky, 1992b; Lee & Tversky, 2005). What is more, just as spontaneous descriptions of space mix perspectives, using route and survey expressions in the same clause (Taylor & Tversky, 1992a), maps (as well as pictorial and other external representations) often show mixed perspectives; for example, many ancient and modern maps of towns and cities show the network of roads from an overhead view and key buildings from a frontal view (e.g., Tversky, 2000). Like Cubist and post-Cubist art, maps can show different views simultaneously in ways that violate the rules of perspective, but that may promote understanding of what is portrayed.

More commonly, maps show a single perspective, a two-dimensional overview of a three-dimensional world. Designers of spaces, architects, seem to work and think in two dimensions at a time, plans or elevations (Arnheim, 1977). Architectural plans map an overview of a design; they show the relations among entrances, walls, furniture, and the like, and are used for designing behavior, for the functional aspects of buildings and complexes. Elevations show how structures will be viewed from the outside, and they are important for designing aesthetic aspects of buildings (Arnheim, 1977).

Producing and comprehending maps require other major mental transformations, integrating and shrinking a large environment, one that typically cannot be seen at a glance, to a small one that can fit onto a piece of paper. Even preschoolers are able to perform some of these mental feats, for example, using a schematic map to find a hidden toy (e.g., DeLoache, 2004). The creation of maps requires yet another mental feat, abstracting the features that are important, that need to be included in the external representation, and eliminating those that do not. The uses of maps range widely: road maps, weather maps, maps of spread of populations of people, of plants, of diseases, maps for hiking, for surveillance of water, of earthquakes, of soil quality, and more. The features that are essential to include vary with the use; for some uses, mountains can be omitted but roads must be included, and for others, mountains need to be preserved but roads can be eliminated. Similarly, some kinds of maps add information not directly visible in the environment, such as contour lines for topography of the ground or for weather fronts. Many of the mental processes used in creating and using external representations parallel those used in creating and using mental ones (e.g., Shepard & Podgorny, 1978), though there are naturally differences as well. And, like mental representations, external representations constrain as well as enable understandings and interpretations. The very same processes that facilitate comprehension and communication, of inclusion and elimination, of leveling and sharpening, of addition and subtraction, also focus and constrain the meanings, with inevitable consequences of misunderstandings, misinterpretations, and error.

2.3. Organizing space of a page, more metaphorically

Traditional pictures, architectural plans, and maps are literally spatial in the sense that they represent things that are visible in the world, typically preserving shapes and spatial relations among and within the forms. Such mappings are derived from the spatial world through the mind, by schematizing or abstracting information from the spatial world. At another extreme are mappings that are regarded as abstract or metaphorically spatial. Such mappings are constructions, derived from mental representations in the mind through similar schematizing processes to forms and places on the page. For concepts that are not literally spatial, form and place are freed of any need to resemble “reality.” Nevertheless, the uses of form and place in conveying meanings are constrained by certain psychological correspondences, perceptual, cognitive, and social. Many of these metaphorically spatial concepts are evident in spatial language: Someone is at the top of the class, another has fallen into a depression, friends grow close or apart; a field is wide open, a topic is central to a debate (e.g., Lakoff & Johnson, 1980). Those constraints and some of their consequences will be discussed in the subsections below on organization of space as well as in the subsequent section on Forms in Space. We continue now with a discussion of certain properties of the page, and how they are used to convey meanings.

2.3.1. Proximity: Category and continuum

Perhaps the most fundamental way that space is used to create abstract meaning is proximity; things that are closer conceptually are placed closer on a page. As in organizing real space, proximity can be used hierarchically to organize metaphoric spaces, first to create clusters, groups, or categories of similar things (like the stack of shirts on a shelf), and then to create clusters, groups, or categories of categories (like the men’s department). Grouping by proximity is commonly used on the space of the page. The letters of one word are separated from the letters of another word by a space, making reading easier. Ideas are further separated on the page by paragraph indentation. Similar uses of space occur in writing and comprehending math equations, where spacing affects the order of carrying out mathematical operations (Landy & Goldstone, 2007a,b).
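The hierarchical use of proximity described above can be illustrated with a minimal sketch. It assumes marks laid out along a single line and treats any gap larger than a chosen threshold as a group boundary; the function name `group_by_gap` and the threshold values are hypothetical, chosen only for illustration, and are not drawn from the studies cited.

```python
def group_by_gap(positions, max_gap):
    """Group 1-D positions: neighbors closer than max_gap share a group.

    Illustrative assumption: a gap larger than max_gap marks a boundary,
    much as a space separates the letters of one word from the next.
    """
    groups = []
    for p in sorted(positions):
        if groups and p - groups[-1][-1] <= max_gap:
            groups[-1].append(p)   # close to the previous mark: same group
        else:
            groups.append([p])     # large gap: start a new group
    return groups


marks = [0, 1, 2, 6, 7, 15]
print(group_by_gap(marks, 1.5))  # [[0, 1, 2], [6, 7], [15]]
print(group_by_gap(marks, 5))    # [[0, 1, 2, 6, 7], [15]]
```

Raising the threshold merges small groups into larger ones, which loosely mirrors the step from categories to categories of categories.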

Often the things to be represented are ordered, thus represented on a continuum: countries by size, events by dates. When things are ordered conceptually, they can be arranged in an order on a page, forming a continuum. If some pairs of the ordered things are conceptually closer and others conceptually farther, proximity can be used to represent the closeness of the pairs on the conceptual relationship. This spatial progression forms the conceptual basis for simple mathematics as well as for graphing, conveying mathematical concepts on a page (e.g., Dehaene, 1997; Fefferman, 2008).

How should orders be arrayed? The very shape of a page suggests three kinds of arrays: horizontal, vertical, and central–peripheral. The salient dimensions of the world reinforce the horizontal and vertical, and certain properties of vision reinforce center–periphery. Representing orders entails selection of spatial dimension as well as selection of a direction within a dimension, issues to be discussed in the following sections.

2.3.2. Central–peripheral

A center-outward organization reflects the organization of the retina, with the fovea, the point of greatest acuity, at the center. Acuity, hence attention, is at the center of the visual field, with acuity and attention declining in all directions from the center. That people organize space center-outwards seems inevitable. Just like the toddler placing her face smack in the middle of someone’s field of view, putting something in the center of a page puts it literally and figuratively in the center of focus of the eye and of attention. Symbolic centers are ubiquitous, from the angels around God to the etiquette of seating arrangements at a formal dinner for a visiting dignitary (Arnheim, 1988). Early in the 20th century, an African king wished to prove the modernity of his country by having it surveyed to make a map. On learning that the capital of the country was not in the country’s geographic center, he ordered that its location on the map be moved to be more central (Woodward & Lewis, 1998). Mandalas, common in Hindu and Buddhist traditions, represent the cosmos or the spiritual world, with spiritual symbols at the center. They not only symbolize the cosmos but also serve as meditation aids, centering meditation on the center of the mandala (Fontana, 2005). Greek and Roman vases place important figures in the center and less important to the sides (Small, 1997), as do advertisements and paintings from all over the globe. Language does this too, of course; we have been talking about the center and the periphery, both literally and figuratively. These spatial features of vision become conceptual features of thought, central or peripheral, a kind of embodiment.

A central–peripheral organization may coordinate well with a single focus of attention and the organization of the eye, but it is not well suited for ordering, either of attention or of things. The periphery extends in all directions from the center without an explicit direction or ordering. At the extreme, a center–periphery organization is dichotomous: central and important versus less central and important. Some mandalas have concentric rings that are ordered outwards, but there is no clear ordering within each ring. Vases, advertisements, and the like are organized by pictorial space as well as by center–periphery, so that the periphery extends leftwards and rightwards (and/or downwards and upwards) from the center rather than in all directions as in a mandala. A horizontal (or vertical) organization simplifies, but since the start point is the center, there is no explicit way to integrate the orderings of things to either side. In addition, the human visual system is especially sensitive to horizontal and vertical, less so to oblique lines (e.g., Howard, 1982). Perhaps for these reasons, complete orderings tend to use a straight line, horizontal or vertical, one of the edges of the page as a guide, and to begin at one end or the other. It is worth noting that written languages, which typically require serial order, use vertical columns or horizontal rows.

2.3.3. Page parallels

The central–peripheral/more important–less important arrangement of space has the advantage of centering the most important, the highest on some attribute, but the disadvantage of making it difficult to compare the orders of those in the periphery, as the order descends in more than one direction. Using one of the dimensions of the page for ordering makes the start point and direction explicit and easy to follow, but it raises the dual questions of which dimension, vertical or horizontal, and where to start. Those decisions are influenced by a number of factors. Some seem to be general across cultures, for example, primacy to up, the location of gods in most cultures. Others seem to be more influenced by culture, for example, horizontal direction, right to left or left to right.

To investigate the spontaneous use of spatial dimensions to convey abstract ones, children from 4 years old to college age from three language cultures, English-speaking Americans, Hebrew-speaking Israelis, and Arabic-speaking Israelis, were asked to place stickers on a square page to indicate the relations of three instances on each of four dimensions: spatial, temporal, quantitative, and preference (Tversky, Kugelmass, & Winter, 1991; for similar work on generating mathematics, see Hughes, 1986). Because English is written from left to right but Hebrew and Arabic are written from right to left, the study also examined the effects of writing order on inventions of graphs. For the spatial task, the experimenter first positioned three small dolls in a row in front of the child and asked the child to place stickers on the page to represent the locations of each of the dolls. All the children performed the spatial mapping task with no difficulty. Then the children were asked to represent the more abstract concepts spatially. For representing time, the experimenter sat next to the child and asked the child to think about the times of the day for breakfast, for lunch, and for dinner. For representing quantity, the experimenter asked the child to think about the amount of candy in a handful, in a bagful, or on the shelf in the supermarket. For representing preference, the experimenter asked the child to think about a television show he or she really liked, did not like at all, or sort of liked. Then the experimenter put a sticker in the middle of the page for the middle value, lunch or the amount of candy in a bagful, or the so-so TV show and asked the child to put a sticker on the page for the other two extreme values, one at a time, in counter-balanced order.

A few of the youngest children did not put the stickers representing three examples on a line; instead they scattered the stickers over the page or put one on top of the other, indicating that they did not see the instances as ordered on a continuum. Scattering the stickers across the page suggests that the children saw the instances as three different categories and piling them on top of each other suggests that the children saw the instances as a single category, say meals or candy or TV shows. Either arrangement indicates that the children used space categorically but not ordinally. Most of the preschool children and all of the older children and adults did place the stickers (or dots, for the adults) on a virtual line, thereby using one of the dimensions of the space of the page to represent the underlying dimension. Children represented the more concrete dimension, time, as a line earlier than the more abstract dimensions, quantity and preference, in that order.

A second experiment assessed whether children could map interval as well as order (Tversky et al., 1991). They were first asked to place stickers to indicate the locations of the three small dolls, when two were placed quite close to each other, but relatively far from the third. Even the youngest children used spatial proximity to represent interval in the placement of stickers. Then the children were asked to represent instances of temporal, quantitative, and preference concepts that were unequally spaced. For example, they were asked to place stickers to represent the time for breakfast, morning snack, and dinner. Despite heavy-handed prompting, only at 11–12 years of age did children reliably place stickers closer for instances closer on the dimension and place stickers farther for instances farther on the dimension.

Together, the results indicated that children spontaneously use spatial proximity and linear arrays to represent categorical, ordinal, and interval properties of abstract dimensions. With increasing age, children’s representations progress from categorical to ordinal to interval. Their graphic productions are true inventions; that is, they do not correspond to the graphing conventions that older children are exposed to in school. For example, the directions of increases in their graphic inventions, to which we turn now, were not consistent across dimensions within or across children nor did they universally proceed from left to right.
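The progression from categorical to ordinal to interval mappings can be made concrete with a rough sketch of how three sticker placements might be scored. This is not the coding scheme of Tversky et al. (1991); the function `classify_mapping`, its tolerances, and the example coordinates are all assumptions introduced for illustration. Stickers are given in conceptual order (e.g., breakfast, snack, dinner) alongside strictly increasing conceptual values.

```python
import math


def classify_mapping(points, values, line_tol=0.1, ratio_tol=0.1):
    """Classify three sticker positions as categorical, ordinal, or interval.

    points: three (x, y) positions, listed in conceptual order.
    values: three strictly increasing numbers on the underlying dimension.
    The tolerances are arbitrary illustrative choices.
    """
    first, middle, last = points
    span = math.dist(first, last)
    if span == 0:
        return "categorical"  # extremes piled on one spot: no line at all

    dx, dy = last[0] - first[0], last[1] - first[1]
    mx, my = middle[0] - first[0], middle[1] - first[1]
    # Perpendicular distance of the middle sticker from the line of extremes.
    offset = abs(dx * my - dy * mx) / span
    # Position of the middle sticker along that line (0 = first, 1 = last).
    t = (dx * mx + dy * my) / span ** 2

    # Simplifying assumption: anything off the line, or with the middle
    # instance outside the extremes, counts as not ordered.
    if offset > line_tol * span or not 0 < t < 1:
        return "categorical"

    expected = (values[1] - values[0]) / (values[2] - values[0])
    return "interval" if abs(t - expected) <= ratio_tol else "ordinal"


hours = [8, 10, 18]  # hypothetical times for breakfast, snack, dinner
print(classify_mapping([(0, 0), (5, 6), (10, 0)], hours))  # categorical
print(classify_mapping([(0, 0), (5, 0), (10, 0)], hours))  # ordinal
print(classify_mapping([(0, 0), (2, 0), (10, 0)], hours))  # interval
```

The sketch scores direction-free: it does not care whether the line runs left to right, right to left, or vertically, which matches the observation that the children's inventions did not follow a single graphing convention.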

2.3.4. Direction in space: Horizontal

Center–periphery uses direction, from the center outwards to the periphery, to indicate importance or closeness to God. Center–periphery mappings work well for vague cases, where the center is the highlight and the exact ordering of the cases in the periphery is not of concern. But if it is, a spatial order that is easy to discern is preferable. We have seen that children and adults mapped orders of spatial, temporal, quantitative, and preference concepts onto lines. For the case of time, the preferred orientation was horizontal across cultures and ages. Mapping time to horizontal, evident even in Chinese, a language written in columns (e.g., Boroditsky, 2001), is likely to have a basis in motion, which for humans and most creatures and natural phenomena, is primarily horizontal. Motion is in space, on the plane, and takes time. In many senses, space, time, and motion are intertwined and sometimes interchangeable. Knowledge of space frequently comes from motion in time, from exploring environments and piecing together the parts. Spatial distance is often expressed as time, a 20-min walk or an hour’s drive. That said, concepts of space appear to be primary, and concepts of time derived from concepts of space (Boroditsky, 2000), perhaps because space can be viewed and time cannot. Time is a neutral dimension, and, as shall be seen, the vertical dimension appears to be preferred for evaluative concepts and the horizontal dimension for neutral concepts. Nevertheless, although time is primarily represented horizontally, as shall be seen, there are cases where time is represented vertically; for each dimension, there is a preferred directionality.

In the studies of Tversky et al. (1991), children and adults from all three language cultures preferred to map time horizontally. However, the direction of temporal increases reflected cultural habits, specifically, the order of reading and writing. English speakers typically arrayed temporal events from left to right and Arabic speakers from right to left, corresponding to the direction of writing in those languages. Hebrew speakers were split. Although writing proceeds right to left in Hebrew, numbers proceed left to right, as in Western languages. For the Arabic populations in this study, arithmetic is taught right to left until 5th grade, when it is reversed to conform to Western conventions. In addition, Hebrew characters are formed left to right, whereas Arabic characters are formed right to left, and Hebrew-speaking Israelis are more likely to be exposed to Western left to right languages.

The influence of reading order appears for a wide range of concepts, especially those related in some way to time. Counting, like writing, is serial, and takes place in time. The mental number line has an implicit spatial ordering evident in speed of calculations, left-to-right in readers of languages that go from left to right, and the opposite for languages that go from right to left and absent in illiterates (e.g., Dehaene, 1997; Zebian, 2005). Temporal order of events is gestured left to right in native Spanish speakers but right to left in native Arabic speakers, even when speaking Spanish (e.g., Santiago, Lupiáñez, Pérez, & Funes, 2007). Writing order affects perception of motion (e.g., Maass, Pagani, & Berta, 2007; Morikawa & McBeath, 1992), perceptual exploration and drawing (e.g., Chokron & De Agostini, 2000; Nachshon, 1985; Vaid, Singh, Sakhuja, & Gupta, 2002), aesthetic judgments (e.g., Chokron & De Agostini, 2000; Nachshon, Argaman, & Luria, 1999), emotion judgments (Sakhuja, Gupta, Singh, & Vaid, 1996), judgments of agency, power, and speed (Chatterjee, 2001, 2002; Hegarty, Lemieux, & McQueen, in press; Maass & Russo, 2003; Suitner & Maass, 2007), and art (Chatterjee, 2001; McManus & Humphrey, 1973). A variety of factors correlated with reading order seem to underlie these effects. The effects of reading order on perception of apparent motion and of speed and on perceptual organization seem to derive from long-term reading habits. The effects of reading order on judgments of agency, where figures on the left are seen as more powerful, seem to derive from language syntax, where the actor is typically earlier in the sentence than the recipient of action.

The respondents in the study by Tversky et al. (1991), children and adults, did not use a graphic template to map abstract relations to the page. Mappings of quantity and preference, in contrast to mappings of time, did not reflect reading order. Speakers of all three languages were equally likely to map quantity and preference from right to left, left to right, and down to up. That is, their horizontal mappings corresponded to writing order only half the time, for both language orders. And vertical mappings were also used frequently, with large quantities and preferred alternatives at the top. Mapping increases in quantity or preference from up to down was avoided by all cultures, for reasons elaborated below.

Writing order is one mapping of order to the page, a weaker one that depends on culture. In the large cross-cultural study of spontaneous mappings, it appeared only for temporal concepts (Tversky et al., 1991). Even there, although English speakers tended to map order left to right and Arabic speakers right to left in correspondence with writing and with numerals, Hebrew speakers did not show a strong preference, most likely because they were familiar with cultural artifacts ordered both ways. In contrast to the vertical dimension with its strong asymmetry defined by gravity and corresponding to people’s upright posture, the horizontal dimension has only weak asymmetries. Although the horizontal surface of the world is very salient, it has no privileged direction, unlike the vertical direction defined by gravity. The front–back axis of the body has strong asymmetries, but the left–right axis is more or less symmetric. Handedness is a notable exception; however, it is primarily behavioral rather than visible, and it varies across people with biases that depend on handedness (e.g., Casasanto, 2009). The plasticity across cultures of left–right horizontal mappings supports the claim that for the page, directional bias along the horizontal axis is weaker than directional bias along the vertical axis, hence influenced by cultural factors such as writing/reading direction.

The plasticity of the horizontal left–right (or right–left) dimension, suggested by its influence from cultural factors, is no doubt partly due to the absence of salient left–right asymmetries in the body or the world (e.g., Clark, 1973; Franklin & Tversky, 1990). It seems to be reinforced by a salient fact about human communication, either with other humans or with graphics. Communication normally happens face to face, where my left and right are the reverse of yours or the reverse of that depicted. So although a godly figure is depicted or described with angels on his right and the devil on his left, his right is the viewer’s left. Some languages do not even distinguish right from left, leading to different organizations of space (e.g., Levinson, 2003). A number of factors, then, converge to render mappings to the horizontal dimension more flexible than those to the vertical dimension.

2.3.5. Direction in space: Vertical

By contrast, the use of the vertical to express asymmetric evaluative concepts like power, strength, and quality is evident in a broad range of gestures and linguistic expressions across cultures and has a basis in the nature of the world and the things in it, including ourselves (e.g., Clark, 1973; Cooper & Ross, 1975; Franklin & Tversky, 1990; Lakoff & Johnson, 1980; Talmy, 1983, 2000; Tversky, 2001). Gravity makes it more difficult to go up than to go down, so that it takes power, strength, health, and energy to go upwards. People, along with many other animals and plants, grow taller and stronger as they reach adulthood, and taller people tend to be stronger. People who are healthy and happy have the energy to stand tall and people who are weak or ill or depressed slump. Piles of money or other things grow higher as their numbers increase. Remember that children and adults alike used the vertical to represent increases in quantity and preference, with large quantities and preferences at the top, never the reverse. In a more complex graphing task, children and adults preferred steeper lines, those that incline more upwards, to represent greater rates (Gattis, 2002; Gattis & Holyoak, 1996). On the whole, more power, better health, greater strength, and more money are good, and less of all that is bad. This maps lower numbers to lower values and to lower spatial positions and higher numbers to higher values and higher spatial positions. The starting point is the ground. Low numbers are bad and high numbers are good. Gestures such as high five and thumbs up reflect the correspondence of upwards with positive value.

But mappings to vertical can conflict, with consequent confusions. The world and our experiences in it provide reasons for beginning at the top. People’s major perceptual and conceptual machinery, our eyes, our ears, and our brains, are at the top of our bodies. Reading order enters here as well; most written languages begin at the top, whether they go left to right or right to left, whether they are written in rows or columns. Numbering, then, can begin at the top or begin at the bottom. So familiar are the two mappings to numbers that we hardly notice the contradiction: the number one player, the one at the top, is the one with the highest number of points. Rises in unemployment or inflation are bad, but are mapped upwards because of rising numbers. These alternative mappings to vertical were seen in a survey of common diagrams in college textbooks for biology, earth sciences, and linguistics (Tversky, 2001). Almost all the diagrams of evolution had man (yes, man) at the top and almost all of the geological eras had the present era at the top; that is, each kind of diagram began at the bottom with prehistory, and depicted the culmination of “progress” at the top. Earlier time was at the bottom, later time at the top. Although evolutionary trees have man at the top, we speak of the “descent of man” not the ascent of man. In contrast, linguistic trees, like family trees, typically had the progenitor language at the top and the language derived from it descending downwards. For linguistic and family trees, time begins at the top. In memory, the concept “depth of processing,” which suggests that lower is more abstract and meaningful, is synonymous with the concept of “levels of processing,” which specifies that higher levels of processing are deeper, more abstract, and meaningful. Deep thought occurs at high levels of thinking.

Although there are multiple mappings of abstract dimensions and relations to direction in space, there are also some consistencies. Notably, horizontal and vertical are chosen for ordering, not diagonal or circular or some other path through space, undoubtedly related to the privileged status of horizontal and vertical in vision (e.g., Howard, 1982). The horizontal direction, the primary plane for motion, human and other, is readily mapped to time and more frequently used for other neutral concepts. By contrast, the vertical dimension formed by gravity is readily mapped to quantity and force, and more frequently used for evaluative concepts like quantity and preference. The vertical direction has salient and far-reaching asymmetries in the world and in human perception and behavior, with multiple correspondences from evaluative concepts like strength, power, health, and wealth to the upwards direction. The horizontal dimension has fewer asymmetries in the world and in human perception and behavior so weaker, cultural variables affect direction, notably, the direction of reading, writing, and arithmetic, and to some extent, handedness. These spatial meanings are reflected in language and in gesture as well. While both those on the politically left and those on the politically right will agree that it is better to be on top, they will disagree on whether left or right is better.

2.4. Mapping meaning to space

A variety of examples have shown that people readily map meaning to space, and to the space of a page. They use spatial properties of the page to relay a range of ideas, abstract and concrete: proximity, place, linear arrays, horizontal, vertical, and direction to group categories, show relationships, illustrate orders, convey conceptual distance, express value, and more. We have already accumulated a small catalog of meaningful mappings to space: depictive or geographic, clumps for categories, center to catch attention or convey importance, lines for orders, distance/proximity in space to reflect distance/proximity on an abstract dimension, horizontal for time and concepts related to time, vertical for strength, quantity, force, power, and concepts related to them. Direction matters, too: Concordant with the vertical asymmetry of the world created by gravity and the human experience of living in the world, up is readily associated with increases in amount, strength, goodness, and power. The horizontal dimension of the world is more neutral, so less strongly tied to abstract concepts and more susceptible to cultural influences such as reading order and handedness. But there are caveats on these mappings. For one thing, they are incomplete and variable; different features may be mapped on different occasions. Hence, these mappings can conflict, especially when associated with number; a high score can determine who is first. These correspondences are natural in the sense that they have been invented and reinvented across cultures and contexts, they have origins in the body and the world, and they are expressed in spatial arrangements, spatial language, and spatial gestures.

3. Forms in space

Now we turn from the space of the page to marks on the page, to examine how marks convey a range of meanings, like space, by using natural correspondences. Although the simplest marks are dots or lines, the most common now and throughout history are undoubtedly what have been referred to as pictograms, icons, depictions, or likenesses, from animals on the ceilings of caves to deer on road signs. Marks on a page have been termed signs, which refer to objects for minds that interpret them, by Peirce, who distinguished three kinds of them (e.g., Hartshorne & Weiss, 1960). An icon denotes an object by resemblance; an index, such as a clock or thermometer, denotes an object by directly presenting a quality of an object; and a symbol, a category that includes certificates as well as words, denotes an object by convention.

Here, we first discuss some properties and uses of likenesses or icons, and then turn at greater length to a specific kind of symbol, which we have called a glyph (e.g., Tversky, 2004; Tversky et al., 2002). Glyphs are simple figures like points, lines, blobs, and arrows, which derive their meanings from their geometric or gestalt properties in context. Glyphs are especially important in diagrams because they allow visual means of expressing common concepts that are not easily conveyed by likenesses. Glyphs have parallels to certain kinds of gestures, for example, points that suggest things that can be conceived of as points or linear gestures that suggest relationships between things. They also bear similarities to words like point and relationship whose meanings vary with context.

Marks, whether likenesses or glyphs, like lines and circles, have visual characteristics other than shape that increase their effectiveness in conveying meaning. An important feature is size. The greater the size, the greater the chance of attracting attention. The toddler knows not only that centrality captures attention, but size as well. The toddler wanting attention puts her face close, blocking other things in the visual field. Size, like centrality, can also indicate importance. Greek vases use both centrality and size; the major figure is larger and in the center, with the others arrayed to either side in decreasing order of importance. Larger bar graphs represent greater quantity or higher ratings. Additional salient visual features, like color, boldness of line, highlighting, and animation, also serve to attract attention and convey importance.

3.1. Likenesses

Even sketchy likenesses can be readily recognized by the uninitiated. A toddler who had never seen pictures but could label real objects recognized simple line drawings of common objects (Hochberg & Brooks, 1962). Depictions have other impressive advantages over words in addition to being readily recognized: They access meaning faster (Smith & Magee, 1980) and enjoy greater distinctiveness and memorability (e.g., Paivio, 1986). Perhaps because of their advantages for establishing meaning and memory, likenesses are so compelling that they are produced even when not needed and even when drawing them increases time and effort: in diagrams of linear and cyclical processes produced by undergraduates (Kessell & Tversky, 2009), in diagrams of information systems by graduate students in design (Nickerson et al., 2008).

Likenesses have been creatively integrated into more abstract representations of quantitative data by Neurath and his Vienna Circle and later colleagues in the form of isotypes (Neurath, 1936). Isotypes turn bars into depictions, for example, the number of airplanes in an army or yearly production of corn by a country is represented by a proportional column (or row) of schematic airplanes or corn plants.

Just as likenesses can facilitate comprehension and memory, they can also interfere. Because depictions are specific and concrete, including them when they are not essential to the meaning of a diagram can inhibit generalization to sets of cases not depicted. By contrast, glyphs, because they are abstractions, can encourage generalization. Capturing the objects in the world and their spatial arrays in diagrams is compelling and has some communicative value, but it can interfere or even conflict with the generalizations or abstractions diagrams are meant to convey. An intriguing example comes from diagrams of the water cycle in junior high science textbooks collected from around the world (Chou, Vikaros, & Tversky, 2009, unpublished data). The typical water cycle diagram includes mountains, snow, lakes, sky, and clouds. On the one hand, these diagrams intend to teach the cycle of evaporation of surface water, formation of clouds, and precipitation. They use arrows to indicate the directions of evaporation and precipitation. On the other hand, they also want to show the water cycle on the geography of the world. As a consequence, the arrows ascend and descend everywhere, so that the cyclicity is obscured. In studies investigating interpretations of slope in diagrams of the atmosphere, students’ inferences were more influenced by the conceptual mapping of rate to slope than by the geographic mapping (Gattis & Holyoak, 1996). In producing diagrams, for example, of a pond ecology, when people work in pairs, the compelling iconicity evident in individual productions often disappears (Schwartz, 1995). Diagrams produced by dyads become more abstract, most likely because the irrelevant or distracting iconicity is idiosyncratic and the abstractions shared.

The conflict between visualizing the world and visualizing the general phenomena that occur in the world is especially evident when diagrams are used to convey the invisible such as evaporation and gravity. With all the challenges of conveying the visible, conveying the invisible (time, forces, values, and the like) presents even more challenges. Glyphs are ideal for visually conveying the invisible. They are not iconic, they do not depict the visible world, so they do not confuse or distract, yet they share many of the advantages of visual communication over purely symbolic communication, notably rapid access to meaning. We turn now to many examples of using glyphs to visually convey invisible and abstract concepts.

3.2. Meaningful glyphs

We shift now from the complex and representative to the simple and abstract. Probably the simplest mark that can be made on the page is a dot, a mark of zero dimensions. Slightly more complicated is a line, a mark of a single dimension, followed by various two-dimensional or three-dimensional forms. These simple marks and others like them that we have termed glyphs have context-dependent meanings suggested by their Gestalt or mathematical properties (Tversky, 2001, 2004). On a map of the United States, New York City can be represented as a point, or the route from New York to Chicago as a line, or the entire city can be represented as a region, containing points and lines indicating, for example, roads, subway stops, and subway lines. Continuing, New York City can also be diagrammed as a three-dimensional space in which people move. Like many other spatial distinctions, this set of distinctions has parallels in language and gesture, parallels that suggest the distinctions are conceptual and widely applicable. Regarding an entity in zero, one, two, or three dimensions has implications for thought. In a paper titled, “How language structures space,” Talmy (2000) pointed out that we can conceptualize objects in space, events in time, mental states, and more as zero-, one-, or two-dimensional entities. In English, prepositions are clues to zero-, one-, two- (and three-) dimensional thinking, notably at, on, and in. She waited at the station, rode on the train, rose in the elevator. She arrived at 2, on time, and was in the meeting until dinner. She was at ease, on best behavior, in a receptive mood. Visual expressions of dimensionality are common in diagrams, as they abstract and express key conceptual components.

3.2.1. A visual toolkit for routes: Dots and lines

Dots, lines, and regions abound in diagrams. Dots and lines, nodes and links or edges are the building blocks of route maps. They also form a toolkit for a related set of abstractions, networks of all kinds. To uncover the basic visual and verbal vocabularies of route maps, students outside a dormitory were asked if they knew how to get to a nearby fast food restaurant. If they did, they were asked either to draw a map or to write directions to get there. A pair of studies confirmed that dots and lines, nodes and links, are the basic visual vocabulary of route maps, and that each element in the visual vocabulary for route directions corresponds to an element in the basic verbal vocabulary for route directions (Tversky & Lee, 1998, 1999). Notably, although the sketch maps could have been analog, they were not; turns were simplified to right angles and roads were either straight or curved. Landmarks were represented as dot-like intersections identified by street names or as nonspecific shapes. Short distances with many turns were lengthened to show the turns, and long distances with no actions were shortened. Thus, the route maps not only categorized continuous aspects of the world, they also distorted them. Interestingly, the verbal directions were similarly schematized. Distances were specified only by the bounding landmarks; turns were specified only by the direction of the turn, not the degree. The consensus visual vocabulary consisted of lines or curves, L, T, or + intersections, and dots or blobs as landmarks. The corresponding verbal vocabulary consisted of terms like “go straight” or “follow around” for straight and curved paths, “take a,” “make a,” or “turn” for the intersections, and named or implicit landmarks at turning points. The vocabulary of gestures used to describe routes paralleled the visual and verbal vocabularies (Tversky et al., 2009).
These close parallels between disparate modes of communication suggest that the same conceptual structure for routes underlies all of them.
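
The correspondence between the two vocabularies can be made concrete with a minimal sketch. The element names, the phrase and glyph tables, and the sample route below are illustrative assumptions, not the coding scheme used in the studies; the sketch only shows how a single schematic description of a route could drive both a verbal and a visual rendering.

```python
from dataclasses import dataclass

# Hypothetical shared vocabulary: every route element has a verbal phrase
# and a glyph. The keys are labels invented for this sketch.
VERBAL = {
    "straight": "go straight",
    "curve": "follow around",
    "turn_left": "turn left",
    "turn_right": "turn right",
}
VISUAL = {
    "straight": "line",
    "curve": "curve",
    "turn_left": "L intersection",
    "turn_right": "L intersection",
}


@dataclass
class Step:
    kind: str           # a key of VERBAL and VISUAL
    landmark: str = ""  # optional landmark bounding the step


def to_directions(route):
    """Render the route with the verbal vocabulary."""
    phrases = []
    for step in route:
        phrase = VERBAL[step.kind]
        if step.landmark:
            # Turns happen "at" a landmark; paths run "to" one.
            preposition = "at" if step.kind.startswith("turn") else "to"
            phrase = f"{phrase} {preposition} {step.landmark}"
        phrases.append(phrase)
    return phrases


def to_glyphs(route):
    """Render the same route with the visual vocabulary."""
    glyphs = []
    for step in route:
        glyphs.append(VISUAL[step.kind])
        if step.landmark:
            glyphs.append("dot")  # landmarks are drawn as dots or blobs
    return glyphs


# Hypothetical route; note that no distances or turn angles are recorded.
route = [
    Step("straight", "Main St"),
    Step("turn_right", "Main St"),
    Step("curve"),
    Step("turn_left", "the restaurant"),
]
print(to_directions(route))
print(to_glyphs(route))
```

Neither rendering stores exact distances or angles, which mirrors the schematization found in both the sketch maps and the written directions.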

A second study provided students with either the visual or the verbal toolkit, and asked them to use the toolkit to create instructions for several dozen destinations, near and far (Lee & Tversky, 2005). They were asked to supplement the toolkits if needed. In spite of that suggestion, very few students added elements; they succeeded in using the toolkits to create a variety of new directions. Although the semantics (vocabularies) and syntax (rules of combining semantic elements) of route maps and route directions were similar, their pragmatics differed. Route maps cannot omit connections; they must be complete. Route directions can elide; for example, in a string of turns, one end-point is the next start-point, so it is not necessary to mention both.

Why do directions that are so simplified and distorted work so well? Because they are used in a context, and the context disambiguates (Tversky, 2003). This is another general characteristic of diagrams; they are designed to be used by a specific set of users in a specific context. Indeed, part of the success of route maps and route directions is that they have been developed in communities of users who collaborate, collectively and interactively producing and comprehending, thereby fine-tuning the maps and directions, a natural kind of user-testing that can be brought into the laboratory and accelerated (Tversky et al., 2007).

The success of the visual and verbal toolkits for creating route maps and route directions has a number of implications. It has already provided cognitive design principles (paths and turns are important; exact angles and distances are not) for creating a highly successful algorithm for on-line on-demand route directions (Agrawala & Stolte, 2001). It suggests that maps and verbal directions could be automatically translated from one to the other. It is encouraging for finding similar visual and verbal intertranslatable vocabularies for other domains, such as circuit diagrams or musical notation, or even domains that are not as well structured, such as assembly instructions, chemistry, and design. It suggests empirical methods for uncovering domain-specific visual and verbal semantics, syntax, and pragmatics. Finally, it shows that certain simple visual elements have meanings that are spontaneously produced and interpreted in a context. Some of these visual elements have greater generality. Lines are naturally produced and interpreted as paths connecting entities or landmarks that are represented as dots. Hence their widespread use, from social networks, connections among people, to computer networks, connections among computers or components of computers, and more.

3.2.2. Lines connect, bars contain

As Klee put it, “A line is a dot that went for a walk.” Lines are also common in graphs, again, as paths, connections, or relations. So are dots and bars. Graph lines connect dots representing entities with particular values on dimensions represented by the lines. The line indicates that the entities are related, that they share a common dimension, but have different values on that dimension. Bars, in contrast to lines, are two-dimensional; they are containers that separate their contents from those of others. In graphs, bars indicate that all the instances inside are the same and different from instances contained in other bars. To ascertain whether people attribute those meanings to bars and lines, in a series of experiments, students were shown a single graph, either a line graph or a bar graph, and asked to interpret it (Zacks & Tversky, 1999). Some of the graphs had no content, just A’s and B’s. Other graphs displayed either a discrete variable, height of men and women, or a continuous variable, height of 10- and 12-year-olds. Because lines connect and bars contain and separate, students were expected to favor trend descriptions for data presented as lines and favor discrete comparisons for data presented as bars, especially for the graphs without content. For the content-free graphs, the visual forms, bars or lines, had major effects on interpretations, with far more trends for lines and discrete comparisons for bars. More surprisingly, the visual forms had large effects on interpretations of graphs with content, in spite of contrary content. For example, using a line to connect the height of women and men biased trend interpretations, even, “as you get more male, you get taller.” These were comprehension tasks. Mirror results were obtained in production tasks, where students were provided with a description, trend, or discrete comparison, and asked to produce an appropriate graph. 
More students produced line graphs when given trend descriptions and bar graphs when given discrete comparisons, as before, in spite of contrary content. The meanings of the visual vocabulary, lines or bars, then, had a stronger effect on interpretations and productions than the conceptual character of the data. When the glyph, line or bar, matched the content, there were more appropriate interpretations and when the glyph did not match the content, there were more inappropriate interpretations (for other issues with bars and lines, see Shah & Freedman, 2010).

3.2.3. Lines can mislead

Because glyphs such as lines, dots, boxes, and arrows, induce their own meanings, they are likely to enhance diagrammatic communication when their natural meanings are consistent with the intended meaning and to interfere with diagrammatic communication when the natural meanings conflict with the intended meanings. This interaction was evident in the case of bar and line graphs for discrete and continuous variables, where the interpretations of the visual glyphs trumped the underlying structure of the data when they conflicted. Mismatches between the natural interpretations of lines as paths or connections and the intended interpretations in diagrams turn out to underlie difficulties understanding and producing certain information systems designs. A central component of information system design is a LAN or local area network, common in computer systems in every institution. All of the components in a LAN are interconnected so that each can directly transmit and receive information from each other. A natural way to represent that interconnectivity would be lines between all pairs of components. For large systems, this would quickly lead to a cluttered, indecipherable diagram. To ensure legibility, a LAN is diagrammed as a kind of clothesline, a horizontal line, with all the interconnected components hanging from it. However, when students in information design are asked to generate all the shortest paths between components from diagrams containing a LAN, many make errors. A common error demonstrates a strong bias from the line glyph. The shortest paths many students generate show that they think that to get from one component on a LAN to another, they must pass through all the spatially intermediate components, much like traveling a route, to go from 10th St to 30th St one must pass 11th, 12th, 13th, and so on (Corter, Rho, Zahner, Nickerson, & Tversky, 2009; Nickerson et al., 2008). Here, again, the visual trumps the conceptual and misleads.
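
The gap between the intended meaning of the LAN line and the route-like misreading can be illustrated with a small sketch. The component names and their left-to-right order are hypothetical, and the two graphs are only one plausible way to model the two readings; breadth-first search then counts the links on a shortest path under each reading.

```python
from collections import deque


def hops(adjacency, start, goal):
    """Breadth-first search: number of links on a shortest path, or None."""
    seen = {start}
    queue = deque([(start, 0)])
    while queue:
        node, distance = queue.popleft()
        if node == goal:
            return distance
        for neighbor in adjacency[node]:
            if neighbor not in seen:
                seen.add(neighbor)
                queue.append((neighbor, distance + 1))
    return None


# Hypothetical components, listed in their left-to-right order on the drawn line.
components = ["A", "B", "C", "D", "E"]

# Intended reading: every component reaches every other one directly.
lan = {c: [other for other in components if other != c] for c in components}

# Route-like misreading: each component links only to its drawn neighbors.
clothesline_as_route = {c: [] for c in components}
for left, right in zip(components, components[1:]):
    clothesline_as_route[left].append(right)
    clothesline_as_route[right].append(left)

print(hops(lan, "A", "E"))                   # 1
print(hops(clothesline_as_route, "A", "E"))  # 4
```

Under the intended reading every pair is one link apart, whereas under the misreading the path length grows with spatial separation on the page, as when passing each numbered street in turn.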

Lines have mixed benefits in other cases, for example, in interpreting evolutionary diagrams where they can lead to false inferences (Novick & Catley, 2007). Yet another example comes from visualizations of space, time, and agents, diagrams that are useful for keeping track of schedules, suspects, pollen, disease, migrations, and more (Kessell & Tversky, 2008). In one experiment, information about the locations of people over time was presented either as tables with place and time as columns or rows and dots representing people as entries or as tables with lines connecting individuals from place to place over time. Because lines connect, one might expect that the lines would help to keep track of movements of each individual. In one task, participants were asked to draw as many inferences as they could from the diagrams; in another they were asked to verify whether a wide range of inferences was true of the diagrams. At the end of the experiment, they were asked which interface they preferred for particular inferences. Overall, participants performed better with dots than with lines both in quantity of inferences drawn and in speed and accuracy of verification. However, and consonant with expectations, there was one exception, one kind of inference where dots lost their advantage, inferences about the sequence of locations of individuals. For temporal sequence, lines were as effective and as preferred as dots. Nevertheless, the lines interfered with generating and verifying other inferences. In another experiment, participants were asked to generate diagrams that would represent the locations of individuals over time. Most spontaneously produced table-like visualizations, notably without lines. As for preferences, participants preferred the visualizations with dots over those with lines except for temporal sequences. 
These findings suggest that popular visualizations that rely heavily on lines, such as parallel coordinates (e.g., Inselberg & Dimsdale, 1990) and especially parallel sets (e.g., Bendix, Kosara, & Hauser, 2006), should be used with caution, and only when the lines are meaningful as connectors.
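
The two kinds of inference contrasted in these experiments can be sketched in code. The sightings, names, and helper functions below are hypothetical and are not the experimental materials; the sketch shows only that a time-by-place table with people as entries directly supports questions about who was where, while the sequence of one person's locations requires linking entries across time, which is what the lines make explicit.

```python
from collections import defaultdict

# Hypothetical sightings: (person, time slot, place).
sightings = [
    ("Ann", 1, "cafe"), ("Bob", 1, "gym"),
    ("Ann", 2, "gym"), ("Bob", 2, "gym"),
    ("Ann", 3, "lab"), ("Bob", 3, "cafe"),
]


def build_table(records):
    """Time-by-place table whose cells hold the people present (the 'dots')."""
    table = defaultdict(set)
    for person, time, place in records:
        table[(time, place)].add(person)
    return table


def who_met(table):
    """Cells with more than one person: read directly off the table."""
    return {cell: sorted(people) for cell, people in table.items() if len(people) > 1}


def trajectory(records, person):
    """One person's places in temporal order: the inference lines support."""
    ordered = sorted(records, key=lambda record: record[1])
    return [place for name, _, place in ordered if name == person]


table = build_table(sightings)
print(who_met(table))
print(trajectory(sightings, "Ann"))
```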

Arrows are asymmetric lines. As a consequence, arrows suggest asymmetric relationships. Arrows enjoy several natural correspondences that provide a basis for extracting meaning. Arrows in the world fly in the direction of the arrowhead. The residue of water erosion is a network of arrow-like lines pointing in the direction of erosion. The diagonals at the head of an arrow converge to a point. Studies of both comprehension and production of arrows show that arrows are naturally interpreted as asymmetric relationships. In a study of comprehension, students were asked to interpret a diagram of one of three mechanical systems, a car brake, a pulley system, or a bicycle pump (Heiser & Tversky, 2006). Half of each kind of the diagram included arrows, half did not. For the diagrams without arrows, students gave structural descriptions, that is, they provided the spatial relations of the parts of the systems. For the diagrams with arrows, students gave functional descriptions that provided the step-by-step causal operations of the systems. The second study provided a description, either structural or functional, of one of the systems and asked students to produce a diagram. Students produced diagrams with labeled parts from the structural descriptions but produced diagrams with arrows from the functional descriptions. Both interpretation and production, then, showed that arrows suggest asymmetric temporal or causal relations.

One of the benefits of arrows can also cause difficulties; they have many possible meanings. Arrows suggest many possible asymmetric relations (Heiser & Tversky, 2006). Their ambiguity can cause misconceptions and confusion. Arrows are used to label or focus attention; to convey sequence; to indicate temporal or causal relations; to show motion or forces; and more. How many meanings? Some have proposed around seven (e.g., van der Waarde & Westendorp, 2000, unpublished data), others, dozens (e.g., Horn, 1998). A survey of diagrams in introductory science and engineering texts revealed that many diagrams had different meanings of arrows in the same diagram, with no visual way to disambiguate them (Tversky, Heiser, Lozano, MacKenzie, & Morrison, 2007).

Circles, with or without arrows, can be viewed as another variant on a line, one that repeats with no beginning and no end. As such, circles have been used to visualize cycles, processes that repeat with no beginning and no end. The common etymology of the two words, circle and cycle, is one sign of the close relationship between the visual and the conceptual. However, the analogies, like many analogies, are only partial. Circles are the same at every point, with no natural divisions and no natural direction. Yet when we talk about cycles, we talk about them as discrete sequences of steps, sometimes with a natural beginning. Hence, cycles are often visualized as circles with boxes, text, or pictograms conveying each stage of the process.

A series of studies on production and comprehension of visualizations of cyclical and linear processes asked participants to produce or interpret appropriate marks on paper (Kessell & Tversky, 2009). In a set of studies, participants were asked to fill in circular diagrams with four boxes at 12 o’clock, 3 o’clock, 6 o’clock, and 9 o’clock with the four steps of various cyclical processes, everyday (e.g., washing clothes, seasons) and scientific (e.g., the rock cycle, the water cycle). They did this easily. Although circles have no beginning, many cycles have a conceptual beginning, and students tended to place that at 12 o’clock, and then proceed clockwise. Conversely, when asked to interpret labeled circular diagrams, they began at 12 o’clock and proceeded clockwise, except when the “natural” starting point of a cycle, for example, the one-cell stage of mitosis, was at another position. In a second set of studies, students were given blank pages and asked to produce diagrams to portray cyclical processes, like the seasons or the seed-to-plant-to-seed cycle, as well as linear processes, like making scrambled eggs or the formation of fossil fuel. Both cycles and linear processes had four stages. Unsurprisingly, most students portrayed the linear processes in lines, but, more surprisingly, most portrayed the cyclical processes as lines as well, without any return to the beginning. Heavy-handed procedures, presenting only cyclical processes, calling them such, and listing the stages vertically, brought the frequency of circular diagrams to 40%. Changing the list of stages so that the first stage was also the last, as in “the seed germinates, the flower grows, the flower is pollinated, a seed is formed, the seed germinates,” induced slightly more than 50% of participants to draw the stages in a circle, but still, more than 40% drew lines. There is strong resistance to producing circular diagrams for cycles, even among college students.
In the final study, participants were provided with a linear or circular diagram of four stages of a cycle, and asked which they thought was better. Over 80% of participants chose the circular display. This is the first case we have found where production and preference do not match, though production lags comprehension in other domains, notably, language acquisition.
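
The layout convention that participants followed, with the conceptual beginning at 12 o'clock and later stages proceeding clockwise, can be expressed as a small calculation. This is a sketch under stated assumptions: the function name is hypothetical, the y axis is taken to point up, and the choice of spring as the first season is arbitrary.

```python
import math


def clock_positions(stages, radius=1.0):
    """Place stages evenly on a circle, first stage on top, then clockwise.

    Assumes a coordinate system whose y axis points up.
    """
    count = len(stages)
    placed = []
    for index, stage in enumerate(stages):
        # 90 degrees is 12 o'clock; subtracting the step moves clockwise.
        angle = math.pi / 2 - 2 * math.pi * index / count
        # Adding 0.0 turns a rounded -0.0 into 0.0 for cleaner output.
        x = round(radius * math.cos(angle), 6) + 0.0
        y = round(radius * math.sin(angle), 6) + 0.0
        placed.append((stage, x, y))
    return placed


for stage, x, y in clock_positions(["spring", "summer", "autumn", "winter"]):
    print(f"{stage:>6}: x={x:+.1f}, y={y:+.1f}")
```

With four stages the positions fall at 12, 3, 6, and 9 o'clock, matching the boxes in the fill-in diagrams described above.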

Why do people prefer circular diagrams of cycles but produce linear ones? We speculate that linear thinking is easier than circular; that is, it is easier to think of events as having a beginning, a middle, and an end, a forward progression in time, than it is to think of events as returning to where they started and beginning all over again, without end. Events occur in time, time marches relentlessly forward, and does not bend back on itself. Each day is a new day, each seed a new seed; it is not that a specific flower emanates from a seed and then transforms back into one. Thinking in circles requires abstraction, it is not thinking about the individual case, but rather thinking about the processes underlying all the cases. What is more, the sense in which things return to where they started is different in different cases. Every day has a morning, noon, and night, but each morning, noon, and night is unique. A cell divides into two, and then each of those cells undergoes cell division. For clothing and dishes, however, the very same articles of clothing and the very same dishes undergo washing, drying, putting away each time. Viewing a circular diagram enables that abstraction, and once people “see” it (the diagram and the underlying ideas), they prefer the abstract depiction of the general processes to the more concrete depiction of the individual case.

3.2.4. Boxes and frames

Earlier, we saw that people interpret bars as containers, separating their contents from everything else. Boxes are an ancient noniconic depictive device, evident explicitly in stained glass windows, but even prior to that, in Roman wall frescoes. Frames accentuate a more elementary way of visually indicating conceptual relatedness, grouping by proximity, for example, the spaces between words. Framing a picture is a way of saying that what is inside the picture has a different status from what is outside the picture. Comics, of course, use frames liberally, to divide events in time or views in space. Comics artists sometimes violate that for effect, deliberately making their characters pop out of the frame or break the fourth wall, sometimes talking directly to the reader. The visual trope of popping out of the frame makes the dual levels clear, probably even to children: The story is in the frames, the commentary outside (e.g., Wiesner, 2001; Tversky & Bresman, unpublished data). Speech balloons and thought bubbles are a special kind of frame, reserved for speech or thought; as for other frames, they serve to separate what is inside from what is outside. Frames, like parentheses, can embed other frames, hierarchically, indicating levels of conceptual spaces, allowing meta-levels and commentaries. Boxes and frames serving these ends abound in diagrams, in flow charts, decision trees, networks, and more.

3.2.5. Complex combinations of glyphs

As was evident from the visual toolkit for routes, glyphs can be combined to create complex diagrams that express complex thoughts and systems. Like combining words into sentences, combining glyphs into systems follows domain-specific syntactic rules (e.g., Tversky & Lee, 1999). Networks of lines and nodes, more abstractly, concepts and connections between concepts, are so complete and frequent that they constitute a major type of diagram. Other types of diagrams include the following: hierarchies, a kind of network with a unique beginning and layers of asymmetric relations, such as taxonomies and organization charts; flow charts consisting of nodes and links representing temporal organizations of processes and outcomes; decision trees, also composed of nodes and links, where each node is a choice. A slightly different type of diagram is a matrix, a set of boxes organized to represent the cross-categorization of sets of dimensions or attributes. These organized sets of glyphs and space constituting diagrammatic types appear to match, to naturally map, conceptual organizations of concepts and relations. That is, for networks, hierarchies, and matrices, students were able to correctly match a variety of conceptual patterns onto the proper visualization (Novick, Hurley, & Francis, 1999; Novick & Hurley, 2001).

Note that many of these complex visual combinations of glyphs, for example, bar and line graphs, social and computer networks, decision and evolutionary trees, have no pictorial information whatsoever, yet they inherit all the advantages of being visual. They enable human application of visuospatial memory and reasoning skills to abstract domains.

3.2.6. Sketches

The aim of most of the diagrams discussed thus far is to convey certain information clearly in ways that are easily apprehended, from route directions to data presentations to scientific explanations. Another important role for visualizations of thought is to clarify and develop thought. This kind of visualization is called a sketch because it is usually more tentative and vague than a diagram. Sketches in early phases of design even of physical objects, like products and buildings, are frequently just glyphs, lines and blobs, with no specific shapes, sizes, or distances (e.g., Goel, 1995; Schon, 1983). Designers use their sketches in a kind of conversation: They sketch, reexamine the sketch, and revise (Schon, 1983). They are intentionally ambiguous. Ambiguity in sketches, just like ambiguity in poetry, encourages a multitude of interpretations and reinterpretations. Experienced designers may get new insights, see new relationships, make new inferences from reexamining their sketches, a positive cycle that leads to new design ideas, followed by new sketches and new ideas (Suwa & Tversky, 2001, 2003). Ambiguity can help designers innovate and escape fixation by allowing perceptual reorganization and consequent new insights, a pair of processes, one perceptual, finding new figures and relations, and one conceptual, finding new interpretations, termed “constructive perception” (Suwa & Tversky, 2001, 2003).

3.2.7. Glyphs: Simple geometric forms with related meanings

Diagrams and other forms of visual narratives are enhanced by the inclusion of a rich assortment of schematic visual forms such as dots, lines, arrows, circles, and boxes, whose meanings derive from and are constrained by their Gestalt or mathematical properties within the confines of a context. The meanings they support, entities, relations, asymmetric relations, processes, and collections, are abstract, so apply to many domains. They encourage the kind of abstractions needed for inference, analogy, generalization, transfer, and insight. They have analogs in other means of recording and communicating ideas, in language and in gesture, suggesting that they are elements of thought.

There are other abstract visual devices, infrequent in diagrams, but common in graphic novels and comics, lines suggesting motion, sound, fear, sweat, emotions, and more (e.g., McCloud, 1994). Some of these, like the lines, boxes, and arrows discussed above, have meanings suggested by their forms. Motion lines, for example, seem to have developed as a short-hand or schematization of the perceptual blurring of viewed fast motion. Others, like hearts for love, are more symbolic. The concepts conveyed by the diagrammatic schematic forms are not as readily depictable as objects or even actions.

Those glyphs, such as dots, lines, arrows, frames, and circles, that enjoy a consensus of context-dependent meanings evident in production and comprehension seem to derive their meanings in ways similar to the ways pictograms establish meanings, overlapping features. Among the properties of lines is that they connect, just as relationships, abstract or concrete, connect. Among the properties of boxes is that they contain one set of things and separate those from other things. What is in the box creates a category, leaving open the basis for categorization to the creator or interpreter. The box implies that the things in the box are more related or similar to each other than to things out of the box. The box might contain a spatial region, a temporal slice, a set of objects. These mappings of meaning, the transfer of a few of the possible features from the object represented to the representing glyph, are partial and variable. The consequence is variability of meaning, allowing ambiguity and misconception. A case in point is uses of arrows, which map asymmetric relations. But there are a multitude of asymmetric relations, temporal order, causal order, movement path, and more. In well-designed diagrams, context can clarify, but there are all too many diagrams that are not well designed.

The concepts suggested by glyphs have parallels in language and gesture with the same tradeoffs between abstraction and ambiguity. Think of words, notably spatial ones that parallel glyphs, like relationship or region or point. A romantic relationship? A mathematical relationship? Here, context will likely disambiguate, but not on all occasions. There is good reason why spatial concepts, whether diagrammatic or linguistic or gestural, have multiple meanings; they allow expression of kinds of meanings that apply to many domains.

Much has been said on what depictions do well: make elements, relations, and transformations of thought visible, apply human skills in visuospatial reasoning to abstract domains, encourage abstraction, enable inference, transfer, and insight, promote collaboration. But many concepts essential to thought and innovation are not visible. A key significance of glyphs is that they can visualize the invisible, entities, relations, forces, networks, trees, and more.

4. Processing and designing diagrams

4.1. Processing diagrams

Good design must take into account the information-processing habits and limitations of human users (e.g., Carpenter & Shah, 1998; Kosslyn, 1989, 2006; Pinker, 1990; Shah, Freedman, & Vekiri, 2005; Tversky, Morrison, & Betrancourt, 2002). The page is flat, as is the visual information captured by the retina. Reasoning from 3D diagrams is far more difficult than reasoning from 2D diagrams whether depictive (e.g., Gobert, 1999) or conceptual (e.g., Shah & Carpenter, 1995). Language, visual search, and reasoning are sequential and limited, so that continuous animations of explanatory information can cause difficulties (e.g., Ainsworth, 2008a,b; Hegarty, 1992; Hegarty, Kriz, & Cate, 2003; Schnotz & Lowe, 2007; Tversky, 2001).

Ability matters. Spatial ability is not a unitary factor, and some aspects of spatial thinking, especially performing mental transformations and integrating figures, matter for some situations and others for others (e.g., Hegarty, in press; Hegarty & Waller, 2006; Kozhevnikov, Kosslyn, & Shephard, 2005; Suwa & Tversky, 2003). Different spatial, and undoubtedly conceptual, abilities are needed for different kinds of tasks and inferences that involve diagrams.

Expertise matters. It can trade off with ability. As noted, diagrams, like language, are incomplete and can be abstract, requiring filling in, bridging inferences. Domains include implicit or explicit knowledge that allows bridging, encouraging correct interpretations and discouraging incorrect ones. The significance of domain knowledge was illustrated in route maps and holds a fortiori in more technical domains (e.g., Committee on Support for Thinking Spatially, 2006).

Working memory matters. Although, as advertised, external representations relieve working memory, they do not eliminate the need for it. Typically, diagrams are used for comprehension, inference, and insight. All involve integrating or transforming the information in diagrams, processes that take place in the mind, in working memory. Imagine multiplying two three-digit numbers, even when the numbers are before your eyes, without being able to write down the product of each step (see Shah & Miyake, 1996).

Structure matters. When diagrams are cluttered with information, finding and integrating the relevant information takes working memory capacity. Schematization, that is, removing irrelevant details, exaggerating, perhaps distorting, relevant ones, even adding relevant but invisible information, can facilitate information processing in a variety of ways. Aerial photographs make poor driving maps. Schematization can reduce irrelevancies that can clutter, thereby allowing attention to focus on important features, increasing both speed and accuracy of information processing (e.g., Dwyer, 1978; Smallman, St. John, Onck, & Cowen, 2001; Tversky, 2001).

Sequencing matters. Conveying sequential information, important in history, science, engineering, and everyday life, poses special challenges. Sometimes a sequence of steps can be shown in a single diagram; Minard’s famous diagram of Napoleon’s unsuccessful campaign in Russia is a stellar example. Time lines of historical events are another common successful example. Depicting each step separately and connecting them, often using frames and arrows, is another popular solution, from Egyptian tomb paintings showing the making of bread to Lego instructions. Both separating and connecting require careful design. People segment continuous organized action sequences into meaningful units that connect perception and action, by changes in scene, actor, action, and object (e.g., Barker, 1963; Barker & Wright, 1954; Tversky, Zacks, & Hard, 2008; Tversky et al., 2007; Zacks & Tversky, 2001; Zacks, Tversky, & Iyer, 2001). A well-loved solution to showing processes that occur over time is to use animations. Animations are attractive because they appear to conform to the Congruity Principle: They use change in time to show change in time, a mentally congruent relation (Tversky, 2001). However, as we have just seen, the mind often segments continuous processes into steps (e.g., Tversky et al., 2007; Zacks et al., 2001), suggesting that step-by-step presentation is more congruent to the way the mind understands and represents continuous organized action than continuous presentation. The segmentation of routes by turns and object assembly by actions provide illustrative examples. Animations can suffer two other shortcomings: They are often too fast and too complex to take in, violating the Apprehension Principle, and they show, but do not explain (Tversky et al., 2001). Even more than in static diagrams, visualizing the invisible, causes, forces, and the like, is difficult in animations.
And, indeed, a broad range of kinds of animations for a broad range of content have not proved to be superior to static graphics (e.g., Mayer, Hegarty, Mayer, & Campbell, 2005; Stasko & Lawrence, 1998; Tversky, 2001; Tversky, Heiser, et al., 2007).

Multi-media matters. Depictions and language differ in many ways, some discussed earlier, among them, expressiveness, abstraction, constraints, accessibility to meaning (e.g., Stenning & Oberlander, 1995). As we have seen, many meanings may be easier to convey through diagrams, but diagrams can also mislead. Diagrams usually contain words or other symbolic information; the visuals, even augmented with glyphs, may not be sufficient. Maps need names of countries, towns, or streets. Network diagrams need names of the nodes and sometimes the edges. Economic graphs need labels and numeric scales to denote years or countries or financial indices. Anatomical diagrams need names of muscles and bones. But diagrams often need more than labels and scales. Although arrows can indicate causes and forces, the specific forces and causes may need language. In addition, redundancy often helps (e.g., Ainsworth, 2008a,b; Mayer, 2001). Just as diagrams need to be carefully designed to be effective, so does language.

4.2. Designing diagrams

The previous analyses of place and form in diagrams were based on historical and contemporary examples that have been invented and reinvented across time and space. They have been refined over generations through informal user testing in the wild. The analyses provide a general guideline for designing effective diagrams: Use place in space and forms of marks to convey the kinds of meanings that they more naturally convey. For example, use the vertical for evaluative dimensions, mapping increases upwards. Use the horizontal for neutral dimensions, especially time, mapping increases in reading order. Use dots for entities, lines for relations, arrows for asymmetric relations, boxes for collections. Disambiguate when context is not sufficient. Although helpful, these general guidelines are often not sufficient for specific cases.
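
Read as a lookup table, the general guideline might be sketched as follows. This is only one possible encoding, offered for illustration; the names `MARK_FOR` and `axis_for` and the string labels are assumptions, not part of any existing tool.

```python
# Hypothetical encoding of the guideline: which mark suits which meaning.
MARK_FOR = {
    "entity": "dot",
    "relation": "line",
    "asymmetric relation": "arrow",
    "collection": "box",
}


def axis_for(dimension_kind):
    """Return (axis, direction of increase) for a kind of dimension.

    Evaluative dimensions go on the vertical, increasing upwards;
    neutral dimensions such as time go on the horizontal, increasing
    in reading order.
    """
    if dimension_kind == "evaluative":
        return ("vertical", "upwards")
    if dimension_kind == "neutral":
        return ("horizontal", "reading order")
    raise ValueError(f"unknown kind of dimension: {dimension_kind!r}")


time_axis = axis_for("neutral")        # time is a neutral dimension
cause_mark = MARK_FOR["asymmetric relation"]
```

A table of this kind captures only the defaults; it says nothing about disambiguating when context is insufficient, which is why more specific guidelines are often needed.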

The previous analyses of the evolution and refinement of diagrams also suggest methods to systematically develop more specific guidelines when needed, to formalize the natural user testing cycle—produce, use, refine—and bring it into the laboratory by turning users into designers. One project used this procedure for developing cognitive design principles for assembly instructions (Tversky et al., 2007). Students first assembled a TV cart using the photograph on the box. They then designed instructions to help others assemble the cart. Other groups of students used and rated the previous instructions. Analysis of the highly rated and effective instructions revealed the following cognitive principles: Use one diagram per step, segment one step per part, show action, show perspective of action, and use arrows and guidelines to show attachment and action. A computer algorithm was created to construct assembly diagrams using these guidelines, and the resulting visual instructions led to better performance than those that came with the TV cart. These cognitive principles apply not just to assembly diagrams but more broadly to visual explanations of how things behave or work. Moreover, the cycle of producing, using, and refining diagrams is productive in improving diagrams even with a single person (Karmiloff-Smith, 1979, 1990; Lee & Karmiloff-Smith, 1996; Tversky & Suwa, 2009).
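
The first two principles, one diagram per step and one step per part, can be sketched in a few lines. This is not the algorithm used in the project, whose details are not given here; it is a minimal illustration under stated assumptions, and the names `Step` and `plan_steps`, as well as the example parts and actions, are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class Step:
    number: int    # position in the assembly sequence
    part: str      # the single part attached in this step
    action: str    # the action to be shown
    marks: tuple   # marks used to show attachment and action


def plan_steps(attachments):
    """Turn an ordered list of (part, action) pairs into steps.

    Each part gets its own step, and each step is meant to be drawn
    as its own diagram, with arrows and guidelines showing the action.
    """
    return [
        Step(number=i, part=part, action=action, marks=("arrow", "guideline"))
        for i, (part, action) in enumerate(attachments, start=1)
    ]


# Hypothetical parts and actions for a small cart.
steps = plan_steps([
    ("shelf", "slide between the side panels"),
    ("wheels", "push into the base"),
])
```

Choosing the perspective from which each action is shown, the remaining principle, is left out of the sketch because it depends on the geometry of the object.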

5. Diagrams as a microcosm of cognition

Diagrams and other depictions are expressions and communications of thought, a class that includes gesture, action, and language. In common with gesture and action, diagrams use place and form in space to convey meanings, concrete and abstract, quite directly. This paper has presented an analysis and examples of the ways that place and form create meanings, an analysis that included the horizontal, vertical, center–periphery, and pictorial organization of the page as well as the dots, lines, arrows, circles, boxes, and likenesses depicted on a page. In combination, they enable creating the vast variety of visual expressions of meaning, pictures, maps, mandalas, assembly instructions, highway signs, architectural plans, science and engineering diagrams, charts, graphs, and more. Gestures also use many of these features of meaning, but they are more schematic and fleeting; diagrams can be regarded as the visible traces of gestures just as gesturing can be regarded as drawing pictures in the air.

The foundations of diagrams lie in actions in space. People have always organized things and spaces to serve their ends: securing, storing, and preparing food, making and using artifacts, designing shelter, navigating space. The consequences of these actions are the creation of simple geometric patterns in space, patterns that are good gestalts, and that are readily recognized. The patterns invite abstract interpretations: Groups signal similar features or related themes, orders signal dimensions or continua, distributions signal one-to-one or one-to-many correspondences. The creation and interpretation of these patterns form the rudiments of abstract thought: categories, relationships, orderings, hierarchies, dimensions, and counting (e.g., Dehaene, 1997; Gelman & Gallistel, 1986; Frank, Everett, Fedorenko, & Gibson, 2008; Gordon, 2004; Hughes, 1986; Lakoff & Núñez, 2000). The spatial patterns can be manipulated by the hands or by the mind (e.g., Shepard & Podgorny, 1978; Tversky, 2005) to create further abstractions; they form spatial-action representations for the abstractions that underlie the feats of the human mind, a three-way interaction that can be termed spraction. Spractions, then, are actions in space, whether on objects or as gestures, that create abstractions in the mind and patterns in the world, intertwined so that one primes the others. Like language, spractions support and augment cognition and action; unlike language, they do so silently and directly. The arrangements and organizations used to design the world create diagrams in the world: The designed world is a diagram.


The author is indebted to many colleagues, collaborators, and commentators, including Maneesh Agrawala, Jon Bresman, Herb Clark, Danny Cohen, Jim Corter, Stu Card, Felice Frankel, Nancy Franklin, Pat Hanrahan, Mary Hegarty, Julie Heiser, Angela Kessell, Paul Lee, Julie Morrison, Jeff Nickerson, Jane Nisselson, Laura Novick, Ben Shneiderman, Penny Small, Masaki Suwa, Holly Taylor, Jeff Zacks, and Doris Zahner. The author is also indebted to the following grants for facilitating the research and/or preparing the manuscript: National Science Foundation HHC 0905417, IIS-0725223, IIS-0855995, and REC 0440103, the Stanford Regional Visualization and Analysis Center, and Office of Naval Research N00014-PP-1-0649, N000140110717, and N000140210534.