How to Bootstrap a Human Communication System


Correspondence should be sent to Nicolas Fay, School of Psychology, University of Western Australia, 35 Stirling Highway, Crawley, WA 6009, Australia. E-mail:


How might a human communication system be bootstrapped in the absence of conventional language? We argue that motivated signs play an important role (i.e., signs that are linked to meaning by structural resemblance or by natural association). An experimental study is then reported in which participants try to communicate a range of pre-specified items to a partner using repeated non-linguistic vocalization, repeated gesture, or repeated non-linguistic vocalization plus gesture (but without using their existing language system). Gesture proved more effective (measured by communication success) and more efficient (measured by the time taken to communicate) than non-linguistic vocalization across a range of item categories (emotion, object, and action). Combining gesture and vocalization did not improve performance beyond gesture alone. We thus demonstrate experimentally that gesture is a more effective means of bootstrapping a human communication system than non-linguistic vocalization. We argue that gesture outperforms non-linguistic vocalization because it lends itself more naturally to the production of motivated signs.

1. Introduction

Communication systems are composed of signs. Human communication systems use three basic types of signs: icon, index, and symbol (Peirce, 1931–1958). These sign types differ in their relation to the referent they signify. Icons and indices are motivated signs (i.e., there is a non-arbitrary correspondence between form and meaning). Icons bring a referent to mind via perceptual resemblance (e.g., a portrait of Barack Obama brings the US president to mind). Indices bring a referent to mind via the natural association between sign and referent (e.g., the smell of burning is an index of fire). Unlike icons and indices, the relation between symbol and referent is arbitrary (e.g., in the case of traffic signals, red means stop, and green means go). Because symbols lack “clues” to their meaning, they must be learned via association (i.e., the red traffic light must be conventionally associated with the command to stop; Clark, 2003).

In spoken language, words tend to be arbitrarily associated with their specific meaning (e.g., there is nothing about the word “house” that suggests a building in which people live; although see Farmer, Christiansen, & Monaghan, 2006 for evidence of iconicity across broad word categories). For Saussure the arbitrary link between linguistic form and meaning is a hallmark of human language. This article examines the types of sign that are needed to bootstrap effective human communication systems. This question is presented by Harnad (1990) as the symbol-grounding problem (how meaning can arise from meaningless signs in the absence of a conventional communication system). We start by reviewing the literature on the evolution of sign systems, arguing that motivated signs play an important role in the creation of ad hoc human communication systems. We then present a laboratory study showing that repeated gesture (a motivated communication medium) is superior to repeated non-linguistic vocalization (a symbolic communication medium) when participants play a communication game that prohibits the use of conventional language (i.e., when they are forced to create a new communication system from scratch).

2. Why motivated signs are needed to bootstrap a human communication system

Computer simulations indicate that conventional communication systems can arise out of meaningless symbols (Barr, 2004; Kirby & Hurford, 2002; Puglisi, Baronchelli, & Loreto, 2008; Steels, 2003). In the “talking heads experiment” of Steels and colleagues, computer agents try to communicate a range of meanings to a partner using arbitrary symbols (e.g., wakabu to communicate blue square). If the hearer agent correctly guesses the speaker agent's intended meaning, a success signal is sent. If the guess is incorrect, a fail signal is sent. On the basis of this feedback, agents' lexicons are updated using a reinforcement-learning algorithm. Over many thousands of pair-wise interactions, arbitrary signal-meaning pairings become conventionalized in a population of agents. It is important to point out that, unlike artificial agents, human agents have a limited memory and tend not to receive explicit performance feedback. This casts some doubt on the generalizability of computer simulations to human populations.
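The feedback loop described above can be illustrated with a toy simulation. The sketch below is not the model used by Steels and colleagues; it is a minimal naming game written under our own assumptions (the number of agents, the meanings other than "blue square", the syllable inventory, the initial strength of 0.5, and the fixed update step are all hypothetical choices made for illustration).

```python
import random


def simulate_naming_game(n_agents=10, n_rounds=5000, step=0.1, seed=0):
    """Toy naming game; returns (early, late) communication success rates."""
    rng = random.Random(seed)
    # "blue square" comes from the text; the other meanings are hypothetical.
    meanings = ("blue square", "red circle", "green triangle")
    syllables = ("wa", "ka", "bu", "mi", "lo", "te")
    # Each agent's lexicon: meaning -> {word: association strength}.
    lexicons = [{m: {} for m in meanings} for _ in range(n_agents)]
    outcomes = []

    for _ in range(n_rounds):
        speaker, hearer = rng.sample(range(n_agents), 2)
        meaning = rng.choice(meanings)
        s_words = lexicons[speaker][meaning]
        if not s_words:
            # No word yet for this meaning: invent an arbitrary one.
            new_word = "".join(rng.choice(syllables) for _ in range(3))
            s_words[new_word] = 0.5
        word = max(s_words, key=s_words.get)

        # The hearer picks the meaning it most strongly links to the word,
        # or guesses at random if the word is unknown to it.
        strengths = {m: lexicons[hearer][m].get(word, 0.0) for m in meanings}
        if max(strengths.values()) > 0.0:
            guess = max(strengths, key=strengths.get)
        else:
            guess = rng.choice(meanings)

        success = guess == meaning
        outcomes.append(success)
        if success:
            # Success signal: both agents strengthen the word-meaning pairing.
            for agent in (speaker, hearer):
                words = lexicons[agent][meaning]
                words[word] = min(1.0, words.get(word, 0.0) + step)
        else:
            # Fail signal: the speaker weakens the word it used and the
            # hearer weakens the pairing it guessed (if it had one).
            for agent, m in ((speaker, meaning), (hearer, guess)):
                words = lexicons[agent][m]
                if word in words:
                    words[word] -= step
                    if words[word] <= 1e-9:
                        del words[word]

    window = max(1, n_rounds // 10)
    early = sum(outcomes[:window]) / window
    late = sum(outcomes[-window:]) / window
    return early, late
```

Calling `early, late = simulate_naming_game()` returns the proportion of successful interactions in the first and last tenth of the run, which can be compared to see whether shared pairings have emerged. Note that the agents here have unlimited memory and receive explicit success/fail feedback on every trial, the two properties that the text identifies as unrealistic for human populations.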

Naturalistic research and experimental studies suggest that motivated signs may play an important role in the grounding of shared sign-meaning mappings among human agents. Early writing systems (e.g., Sumerian, early Egyptian, and ancient Chinese) made extensive use of motivated signs that have since changed in the direction of arbitrariness. For example, in Chinese hanzi the present-day character for mountain began as a picture of three mountains, evolving via an intermediate stage to its current form, 山 (Vaccari & Vaccari, 1961). Many signs in American Sign Language have a similar history, starting as iconic representations of objects or actions before losing much of their original motivation (Frishberg, 1975). Consistent with the importance of motivated forms when communicating novel meanings, signers who lack a name for something tend to create an iconic sign for it (Klima & Bellugi, 1979). The importance of motivated forms for bootstrapping communication is seen in home-sign systems, gesture-based systems developed by deaf children raised by non-signing parents. Unable to acquire the spoken language of their linguistic community, and without exposure to a conventional sign language, the deaf children invent highly motivated “pantomimic” gestural communication systems (Goldin-Meadow, 2003; Morford, 1996).

These naturalistic observations are supported by experimental studies. The experimental semiotic approach requires that humans communicate without using their conventional language, often in a different modality (for reviews see Galantucci & Garrod, 2010; Scott-Phillips & Kirby, 2010). In one approach, participants communicate a set of recurring concepts (e.g., Art Gallery, Drama, Television) to a partner by drawing on a shared whiteboard (Fay, Garrod, & Roberts, 2008; Fay, Garrod, Roberts, & Swoboda, 2010; Garrod, Fay, Lee, Oberlander, & MacLeod, 2007; Garrod, Fay, Rogers, Walker, & Swoboda, 2010). As in the game Pictionary, participants are not allowed to use letters or numbers. Otherwise, the forms participants can produce are unconstrained. Initially participants' drawings are motivated structure-mappings of the concepts (e.g., at Game 1 Parliament is an iconic depiction of the debating chambers). As participants repeatedly play the game, communicating the same concepts again and again, their identification accuracy improves (to ceiling) and their drawings become increasingly aligned (with those produced by their partner) and arbitrary (by Game 6 Parliament is communicated by two lines plus a circle; see Fig. 1). Thus, motivated signs help ground shared meanings, and interaction then drives sign refinement until much of the signs' initial complexity is lost, making them simpler, more symbolic, and more efficient to produce.

Figure 1.

Sign refinement and alignment for the concept Parliament across six games between a pair playing the Pictionary-like task. Participant numbers are given in bold on the top right of each drawing (from Fay et al., 2010).

In other communication tasks, the opportunity to use motivated signs is reduced. The effect this has on participants' ability to create a mutually comprehensible communication system is striking. In one study, pairs of participants try to coordinate their behavior by moving to the same location on a grid (Galantucci, 2005). A geometric shape marks each grid location (e.g., triangle, flower, circle, hexagon), but participants cannot see the location of their partner. To solve the task, players must develop ways of communicating the different grid locations. They do this by communicating across an unusual graphical medium: a shared digitizing pad that moves vertically as players draw on it (e.g., a horizontal line is transformed into a diagonal line). This device makes it difficult to copy the unique geometric shapes that mark each grid location. Task performance varies widely (with some players unable to solve the task). Crucially, players who developed a highly motivated communication system performed best at this task (i.e., an iconic system that structure-maps the different geometric shapes, such that parts of the sign can be mapped onto parts of the referent).

Other tasks eliminate the opportunity for motivated sign use. In one such study, pairs of participants try to move their avatar to the same colored quadrant on a multicolored two by two grid (Scott-Phillips, Kirby, & Ritchie, 2009). Each participant occupies a different grid, and across grids at least one identically colored quadrant is shared. Because players cannot see the colors on their partner's grid (only the colors on their own grid), they must find a way to identify the shared color. Interestingly, the players' only medium for communication is movement: the movement of their avatars (visible to each other) around the grid. Thus, there is no opportunity for motivated sign use (the relationship between movement and color is arbitrary, related neither by resemblance nor by natural association). Unable to create motivated signs, few participants successfully completed the task (only 17% of pairs in one of the experiments reported).

In summary, computer simulations show that symbols can bootstrap a conventional communication system via the interactions of artificial agents. In contrast, historical studies of writing systems and sign languages, in addition to experimental semiotic studies, suggest that motivated signs play a key role in bootstrapping communication systems among interacting human agents. The experiment reported next compares human participants' ability to bootstrap a communication system using communication mediums that differ in the opportunity they afford for motivated sign use.

Gesture can bootstrap a human communication system: This is seen in established sign languages like American Sign Language, in recently evolved sign languages (Sandler, Meir, Padden, & Aronoff, 2005; Senghas, Kita, & Ozyurek, 2004), and in the idiosyncratic home-sign systems developed by deaf children and their non-signing parents (Goldin-Meadow, 2003; Morford, 1996). We argue that the opportunity gesture affords for the production of motivated signs is core to its success. Can non-linguistic vocalization, a mostly symbolic medium of communication, bootstrap a human communication system? If so, this would argue against the importance of motivated signs to bootstrapping human communication systems. This study compares human participants' ability to communicate a range of items to a partner using repeated non-linguistic vocalization, gesture, or non-linguistic vocalization plus gesture.

3. Method

3.1. Participants

Forty-eight undergraduate psychology students (19 males) participated in exchange for partial course credit or payment. All were free of any visual, speech, or hearing impairment.

3.2. Task and procedure

Participants worked in pairs, one as the director and the other as the matcher. Participants were randomly assigned to the director or matcher role and remained in their role for the duration of the experiment. This is a basic referential communication task design (Krauss & Weinheimer, 1966). On any trial the director's task was to communicate a specific concept from an ordered list of 24 items (18 target items and 6 distractors) that were known to both participants. Concepts were grouped into three categories (emotion, action, and object) and included easily confusable items such as “tired” and “sleeping” (see Table 1 for a complete listing of the experimental items). The director's task was to communicate the first 18 items from his or her list in the order specified. The matcher's task was to indicate on his or her list (presented in a different random order) the item communicated by the director. Participants played the game six times with the same partner, using the same item set on each game.

Table 1. The experimental items directors tried to communicate to matchers (distractor items are given in italic). Target and distractor items were fixed across conditions and throughout the experiment.

Emotion    Action     Object
Happy      Eating     Mud
Ill        Hitting    Rain

Each pair was randomly allocated to one of three experimental conditions: vocalization-only, gesture-only, or gesture plus vocalization (combined; n = 16 per condition). In each condition, participants were seated at opposite sides of a round table 1 m in diameter. Those in the vocalization condition were told they could make any sounds they wished, but they were not permitted to use words. In this condition participants sat back to back, ruling out the use of visual signals. The matcher indicated to the director that he or she had made a selection by saying “ok,” and then privately inserted the trial number (1–18) next to the selected item. Once the director had communicated each of the 18 target items, the next game began. The director then communicated the same 18 target items, but in a different random order (this was repeated six times). Those in the gesture-only condition faced one another from opposite sides of the table. All communication was limited to gesture (hand, body, and face) and vocalizing was prohibited. Matchers indicated to the director that they had made their selection by giving the “thumbs up” sign and then privately inserted the trial number next to the selected item. Participants in the combined condition followed the same procedure as those in the gesture-only condition but were permitted to vocalize in addition to gesturing.
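The trial structure described in this section (18 targets communicated in a fresh random order on each of six games, with the matcher choosing from all 24 items) can be sketched as follows. This is an illustrative reconstruction, not the authors' materials: the item names are placeholders, and whether the matcher's list was reshuffled on every game is an assumption on our part.

```python
import random


def build_schedule(targets, distractors, n_games=6, seed=0):
    """Return one dict per game with the director's order and the matcher's list."""
    rng = random.Random(seed)
    schedule = []
    previous_order = None
    for game in range(1, n_games + 1):
        director_order = list(targets)
        rng.shuffle(director_order)
        # Reshuffle in the (unlikely) event that the order repeats the last game.
        while director_order == previous_order:
            rng.shuffle(director_order)
        # Assumption: the matcher sees all 24 items, reshuffled for each game.
        matcher_list = list(targets) + list(distractors)
        rng.shuffle(matcher_list)
        schedule.append({
            "game": game,
            "director_order": director_order,
            "matcher_list": matcher_list,
        })
        previous_order = director_order
    return schedule


# Placeholder item names (hypothetical; see Table 1 for the real items).
targets = [f"target_{i:02d}" for i in range(1, 19)]
distractors = [f"distractor_{i:02d}" for i in range(1, 7)]
schedule = build_schedule(targets, distractors)
```

Each entry of `schedule` then holds the 18-item director order and the 24-item matcher list for one game.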

Irrespective of role, both participants could interact within a trial (e.g., a matcher might seek clarification by frowning or by grunting). As in most human communication studies, participants were not given explicit feedback with regard to their communication success (e.g., Anderson et al., 1991; Clark & Wilkes-Gibbs, 1986; Garrod & Anderson, 1987; Garrod et al., 2007). All communication was recorded using a closed-circuit security system. Recorded examples from each condition are available at

4. Results

We compared the effectiveness and efficiency of the communication systems that arose under the different conditions by computing the percentage of items successfully communicated and the time taken (in seconds) to communicate each item. An example of communication in the gesture-only condition is given in Fig. 2.
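The two dependent measures can be computed from trial-level records along the following lines. The record layout (`Trial`) and the function name are hypothetical; the sketch only illustrates the percentage-correct and mean-time calculations named above.

```python
from dataclasses import dataclass
from statistics import mean


@dataclass
class Trial:
    game: int        # 1-6
    category: str    # "emotion", "action", or "object"
    target: str      # item the director tried to communicate
    selected: str    # item the matcher marked
    seconds: float   # time from first signal to matcher selection


def summarize(trials):
    """Return {(game, category): (percent_correct, mean_seconds)}."""
    groups = {}
    for t in trials:
        groups.setdefault((t.game, t.category), []).append(t)
    return {
        key: (
            100.0 * sum(t.target == t.selected for t in ts) / len(ts),
            mean(t.seconds for t in ts),
        )
        for key, ts in groups.items()
    }
```

In the analyses that follow, scores of this kind are averaged per pair before being entered into the ANOVA.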

Figure 2.

At Game 1 (panel A) and Game 6 (panel B), the director (female on the right) uses iconic gestures to communicate the concept “fleeing.” At Game 1, she initially adopts a character viewpoint, rapidly moving her arms up and down to simulate running. She then adds a second gesture, from an observer viewpoint (following McNeill's, 1992, definition): she extends her arms away from her body while moving her index and middle fingers backward and forward to communicate fleeing. At Game 6, she communicates fleeing using a briefer version of the observer viewpoint gesture given at Game 1.

5. Communication effectiveness

Identification accuracy measures how effective the signs were at picking out their referent. As Fig. 3 shows, participants' identification accuracy improved across Games 1 to 6 in all conditions and for all item types (emotion, action, and object). However, matchers performed better in the gesture and combined conditions than in the vocal condition, with no difference between the gesture and combined conditions. In the vocal condition emotion items were more effectively communicated than action items, and action items were more effectively communicated than object items.

Figure 3.

Mean identification accuracy across item categories (emotion, action, and object) and Games (1–6), expressed as percentage scores, by participants in the vocal, gesture, and combined communication conditions. Error bars indicate the standard errors of the means.

Participants' mean percent accuracy scores were entered into a mixed design ANOVA that treated Condition (vocal, gesture, and combined) as a between-participant factor and Item (emotion, action, and object) and Game (1–6) as within-participant factors. This returned a main effect of Game (linear) [F(1, 21) = 24.54, p < .01, ηp² = .54], indicating that identification accuracy improved across games for all conditions. It also returned main effects of Condition [F(2, 21) = 56.54, p < .01, ηp² = .84] and Item [F(2, 42) = 29.87, p < .01, ηp² = .59] and a reliable Condition × Item interaction [F(4, 42) = 34.78, p < .01, ηp² = .77].
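As a consistency check on the effect sizes, partial eta squared can be recovered from each F ratio and its degrees of freedom using the standard identity ηp² = (F × df_effect) / (F × df_effect + df_error). The short sketch below applies this identity to the values reported above; the function name is ours.

```python
def partial_eta_squared(f_value, df_effect, df_error):
    """Partial eta squared recovered from an F ratio and its degrees of freedom."""
    return (f_value * df_effect) / (f_value * df_effect + df_error)


# (label, F, df_effect, df_error) as reported for identification accuracy.
reported = [
    ("Game (linear)", 24.54, 1, 21),
    ("Condition", 56.54, 2, 21),
    ("Item", 29.87, 2, 42),
    ("Condition x Item", 34.78, 4, 42),
]
for label, f_value, df1, df2 in reported:
    print(label, round(partial_eta_squared(f_value, df1, df2), 2))
```

The recomputed values round to .54, .84, .59, and .77, matching the reported effect sizes.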

There were simple main effects of Condition on accuracy for object, action, and emotion items [Fs(2, 63) > 10.29, ps < .01, ηp²s > .39]. For each item type, accuracy was higher in the gesture and combined conditions when compared to the vocal condition [ts(14) > 2.24, ps < .05, ds > 1.15]. Identification accuracy did not differ between the gesture and combined conditions (ts < 0.69, ps > .10). Only in the vocal condition did communication success vary by item category [F(2, 42) = 95.48, p < .01, ηp² = .91]: Emotion items were better identified than action items [t(7) = 3.33, p < .01, d = .73] and action items were better identified than object items [t(7) = 9.95, p < .01, d = 2.66]. The different item categories were communicated equally well in the gesture and combined conditions (Fs < 2.23, ps > .10).

6. Communication efficiency

To measure communication efficiency, we calculated the time taken to communicate each item, from first communication (vocal or gesture) to matcher selection. Time offers a convenient measure of sign efficiency that is comparable across communication modalities. Fig. 4 shows that the time taken to communicate the items was reduced from Game 1 to 6 in all conditions. However, communication was more efficient in the gesture and combined conditions when compared to the vocal condition. Emotions, actions, and objects were communicated equally efficiently in the gesture and combined conditions, but for the vocal condition emotion items were communicated more efficiently than action items and action items more efficiently than object items.

Figure 4.

Mean time (in seconds) to communicate across item categories (emotion, action, and object) and Games (1–6) by participants in the vocal, gesture, and combined communication conditions. Error bars indicate the standard errors of the means.

Participants' mean times to communicate the items were entered into the same mixed design ANOVA described earlier. Before mean times were calculated, times more than 2.5 SD from the condition median were treated as extreme scores and replaced by the value corresponding to the median plus or minus 2.5 SD (4.01% of trials). The ANOVA returned a main effect of Game (linear) [F(1, 21) = 163.15, p < .01, ηp² = .89], indicating that the time taken to communicate each item decreased across games for all conditions. Main effects of Condition [F(2, 21) = 9.77, p < .01, ηp² = .48] and Item [F(2, 42) = 6.99, p < .01, ηp² = .25] were mediated by a Condition × Item interaction [F(4, 42) = 4.55, p < .01, ηp² = .30].
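The treatment of extreme times can be sketched as a clipping step. The text does not say whether the sample or population SD was used, or whether the SD was computed before or after replacement, so the choices below (sample SD of the raw times) are assumptions made for illustration.

```python
from statistics import median, stdev


def clip_extreme_times(times, k=2.5):
    """Replace times beyond median +/- k SD with the boundary value.

    Returns the clipped times and the proportion of values replaced.
    Assumes the sample SD of the raw times (not stated in the text).
    """
    centre = median(times)
    spread = stdev(times)
    lower, upper = centre - k * spread, centre + k * spread
    clipped = [min(max(t, lower), upper) for t in times]
    n_replaced = sum(c != t for c, t in zip(clipped, times))
    return clipped, n_replaced / len(times)
```

Applied separately to the times from each condition, the second return value corresponds to the proportion of trials affected (reported above as 4.01%).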

There were simple main effects of Condition on time for object [F(2, 63) = 14.84, p < .01, ηp² = .50] and action items [F(2, 63) = 8.71, p < .01, ηp² = .48], but not for emotion items (F < 2.17, p > .10). Object and action items were communicated equally efficiently in the gesture and combined conditions (ts < 0.80, ps > .10). Compared to the vocal condition, object and action items were communicated more quickly in the gesture and combined conditions [ts(14) > 2.66, ps < .05, ds > 1.33]. For the vocal condition, there was a significant difference in the time taken to communicate the different categories of items [F(2, 42) = 15.16, p < .01, ηp² = .46]: Emotion items were communicated more quickly than action items [t(7) = 2.72, p < .01, d = .64] and action items were communicated more quickly than object items [t(7) = 2.78, p < .01, d = .50]. The different item categories were communicated equally efficiently in the gesture and combined conditions (Fs < 0.87, ps > .10). The same pattern of results is observed when only successful communication episodes are included in the analysis.

7. Discussion

The experiment reported compared human participants' ability to bootstrap a communication system using communication mediums that differ in the opportunity they afford for motivated sign use. Communication by gesture was more accurate and more efficient than communication by non-linguistic vocalization (see Note 1). Indeed, performance in the gesture condition is comparable to that of speakers who use their existing language system to communicate abstract shapes (Schober & Clark, 1989). When participants were allowed to combine non-linguistic vocalization with gesture, their performance (accuracy and time) was no better than with gesture alone, indicating that gesture was responsible for participants' communication success. We believe that participants performed better in the gesture condition (compared to the non-linguistic vocalization condition) because gesture better lends itself to the production of motivated signs. For example, to communicate the item “fruit,” a director pantomimed the peeling and eating of a banana (an iconic gesture). This type of structure mapping between a sign and its referent is not so readily available to non-linguistic vocalization (where iconic and motivated signs are limited to onomatopoeia and sound symbolism).

While non-linguistic vocalization was less effective than gesture, it did support above-chance performance for each category of items (where chance is 4.17%, i.e., 1 divided by 24 experimental items). In addition, in this condition particular categories of items were more successfully communicated than others: Emotion items were better communicated than action items and action items were better communicated than object items. This pattern of results is informative. Emotions and actions lend themselves to indexical forms of non-linguistic communication, that is, communication based on the natural associations between an item and a sound (e.g., imitating a yawn to communicate “tired” or simulating a snore to communicate “sleeping”). Similar findings are reported in studies that look at people's use of non-linguistic vocalization to communicate emotions (Sauter, Eisner, Ekman, & Scott, 2010) and human movements (Kantartzis, Imai, & Kita, 2011). Such indexical relationships are less apparent between non-linguistic vocalization and objects (e.g., it is difficult to think of a sound that is naturally associated with “rock”). This explains participants' poorer performance when communicating items from the object category.
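The chance level quoted above follows directly from the size of the item set. The sketch below reproduces that arithmetic and adds a simple binomial tail function that could be used to gauge how unlikely a given number of correct selections would be under blind guessing. Treating trials as independent guesses is a simplifying assumption of ours (matchers assigned trial numbers to items, so selections within a game may not be independent), and the function is not part of the reported analysis.

```python
from math import comb

N_ITEMS = 24            # 18 targets + 6 distractors (from the Method section)
chance = 1 / N_ITEMS    # probability of a correct blind guess


def binomial_tail(n_trials, k_correct, p=chance):
    """P(X >= k_correct) if every trial were an independent blind guess."""
    return sum(
        comb(n_trials, k) * p**k * (1 - p) ** (n_trials - k)
        for k in range(k_correct, n_trials + 1)
    )


print(f"chance level: {100 * chance:.2f}%")   # prints "chance level: 4.17%"
```

For example, `binomial_tail(18, 3)` gives the probability of identifying at least 3 of the 18 targets in a game by guessing alone, under the independence assumption stated above.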

Although our study was conducted among modern-day humans (with modern brains and mastery of at least one spoken language), our results may “speak” to vocal and gestural theories of the origin of language (Arbib, 2005; Cheney & Seyfarth, 2005; Corballis, 2003; MacNeilage, 1998). If one accepts that any feature that helped establish language (such as the use of motivated signs) would not have been discarded during the later evolution of the species (Deacon, 1997, p. 384), then our results suggest an important role for gesture. That (modern) people of all cultures gesture while they speak is testament to the naturalness and continued use of gesture (Feyereisen & de Lannoy, 1991). However, we do not rule out a role for non-linguistic vocalization. Interestingly, participants in the combined condition often supplemented their gestures with vocalizations (vocalizations were combined with gestures on 25% of trials on Games 1–6), even though vocalization never replaced gesture (i.e., participants never used vocalization without also using gesture). Thus, rather than commit to a gestural origin, we prefer an origin in which humans initially communicated using motivated signs (icons and indices), whether gestural or vocal (i.e., a multimodal origin; see Pollick & de Waal, 2007 for evidence of multimodality in ape communication). Grounding a basic set of shared meanings in this way, during the very earliest stages of language, could then pave the way for the further expansion of the lexicon.

8. Conclusion

This study examines the type of signs needed to bootstrap a human communication system. Using a task that prohibits the use of conventional language, we found that pairs of participants restricted to gesture (a medium that lends itself to the production of motivated signs) as their sole means of communication established more effective and efficient communication systems when compared with those restricted to non-linguistic vocalization (a more symbolic communication medium). These findings, which parallel naturalistic studies of writing systems and sign languages, in addition to other experimental studies, indicate the superiority of motivated signs over arbitrary signs in bootstrapping human communication systems.


This study was supported by an ARC Discovery grant (DP120104237) awarded to N.F.


Note 1. An objection to this conclusion is that participants in the gesture condition may have received more fine-grained partner feedback than those in the non-linguistic vocalization condition (in the form of continuous visual feedback). However, this is unlikely because it cannot explain why participants in the non-linguistic vocalization condition were more successful with some item categories than others (i.e., emotion > action > object).