Understanding the fundamental principles underpinning behavior is a key goal in biology and psychology (Gintis, 2007; Grafen, 2007). From an evolutionary perspective, similar behavioral patterns seen across different animals (including humans) may have evolved from a common ancestral trait or may reflect convergent evolution; looking for shared quantitative properties of behavior across diverse animal taxa may thus allow identification of the elementary processes constraining or shaping behavioral evolution (e.g., Hailman, Ficken, & Ficken, 1985; McCowan, Doyle, & Hanser, 2002; Sumpter, 2006). Recent evidence of consistent patterns linking human language with vocal communication and other behavior in a range of animal species has pointed to the existence of at least one such general principle (Ferrer-i-Cancho & Lusseau, 2009; Semple, Hsu, & Agoramoorthy, 2010). Here, we detail the nature of this evidence before describing and mathematically exploring the principle in question—compression—that we propose underlies the consistent patterns discovered.
Words follow Zipf's law of brevity, that is, the tendency of more frequently used words to be shorter (Strauss, Grzybek, & Altmann, 2007; Zipf, 1935), which can be generalized as the tendency of more frequent elements to be shorter or smaller (Ferrer-i-Cancho & Hernández-Fernández, 2013). Statistical laws of language have been studied outside human language (see Ferrer-i-Cancho & Hernández-Fernández, 2013 for an overview), and to our knowledge, the first report of conformity to this generalized brevity law outside human language is found in the pioneering quantitative research by Hailman et al. (1985) on chick-a-dee calls. More recently, accordance with the generalized law of brevity has been found in dolphin surface behavioral patterns (Ferrer-i-Cancho & Lusseau, 2009), the vocalizations of Formosan macaques (Semple et al., 2010), and a subset of the vocal repertoire of common marmosets (Ferrer-i-Cancho & Hernández-Fernández, 2013). A lack of conformity to the law has been found in the vocalizations of golden-backed uakaris (Bezerra, Souto, Radford, & Jones, 2011) and ravens (Ferrer-i-Cancho & Hernández-Fernández, 2013).
Statistical patterns of language, and the law of brevity in particular, offer a unique chance to reframe research on language universals (Evans & Levinson, 2009) in three ground-breaking directions, according to the current state of the art. First, in extending typology of linguistic universals beyond human language and considering the possibility of universals of behavior across species. Second, in considering that some language universals may, in fact, not be specific to language or communication but an instance of universals of animal behavior. Third, in changing the stress of the quest from universal properties of language to universal principles of language (or behavior) along the lines of modern quantitative linguistics (Köhler, 1987, 2005; Zipf, 1949). More important than expanding the collection of universal properties or delimiting what is universal or not is (a) a deep understanding of the principles explaining the recurrence of these regularities regardless of the context, and (b) the role of those principles, perhaps hidden, in exceptions to widespread regularities. If these regularities are not inevitable (Ferrer-i-Cancho, Forns, Hernández-Fernández, Bel-Enguix, & Baixeries, 2013), a parsimonious hypothesis would be a minimal set of principles that are independent from the context and thus universal. An important research goal is defining such a set, if it exists. Furthermore, this third point could contribute to reconciling the wide diversity of languages with the need for unifying approaches, outside the realm of a “language faculty” or innate specializations for language (Evans & Levinson, 2009).
The discovery of conformity to statistical laws of language outside human language raises a very important research question: Are the findings simply a coincidence, or are there general principles responsible for the appearance of the same statistical pattern across species or across many levels of life? As the arguments against the importance or utility of language laws within and outside human language are falling (e.g., Ferrer-i-Cancho & Elvevåg, 2010; Ferrer-i-Cancho & McCowan, 2012; Ferrer-i-Cancho et al., 2013; Hernández-Fernández, Baixeries, Forns, & Ferrer-i-Cancho, 2011), here we propose the principle of compression—the information theoretic principle of minimizing the expected length of a code—as an explanation for the conformity to the law of brevity across species. Hereafter, we consider the term “brevity,” the short length of an element, as an effect of “compression.”
In his pioneering research, G. K. Zipf argued that the law of brevity was a consequence of a general principle of economy (Zipf, 1949). He used the metaphor of an “artisan who will be obliged to survive by performing jobs as economically as possible with his tools” (p. 57). The artisan stands for a speaker (or a community of speakers), while the tools stand for words. The artisan should decrease the size or the mass of the tools that he uses more frequently to reduce the amount of work (Zipf, 1949, pp. 60–61). Along similar lines, it is well known that S. Morse and A. Vail optimized the Morse code simply by choosing the length of each character to be approximately inversely proportional to its frequency of occurrence in English (Gleick, 2011). Although it has been claimed that conformity to the law of brevity is a sign of efficient coding in both human language (Zipf, 1949) and animal behavior (Ferrer-i-Cancho & Lusseau, 2009; Semple et al., 2010), no strong direct connection with standard information theory, and with coding theory in particular, has been shown.
Suppose that:
- n is defined as the number of elements of the repertoire or vocabulary.
- pi is defined as the probability of producing the i-th most likely element.
- ei is defined as the energetic cost of that element; for example, ei could be the number of letters or syllables of the i-th most likely word, or the mean duration of the i-th most likely vocalization or behavior of a non-human animal.
Then, the mean energetic cost of a repertoire or vocabulary can be defined as

$$E = \sum_{i=1}^{n} p_i e_i. \qquad (1)$$

Indeed, a particular case of the definition of E in Eq. (1) is ECL, the mean code length, which is the cost function considered in standard information theory for the problem of compression, with ei being the length of the code used to encode the i-th element of the set of symbols (Cover & Thomas, 2006, p. 110). Data compression consists of finding the code lengths that minimize ECL given the probabilities p1,…,pi,…,pn. ECL provides an objective measure of coding efficiency. In his pioneering research, G. K. Zipf called his own version of Eq. (1) the “minimum equation” (in the sense that it is the function to minimize in order to reduce effort), with pi being the frequency of use of a tool and ei being work, that is, the product of the “mass” of the tool and the “distance” of the tool to the artisan (Zipf, 1949, p. 59). Therefore, G. K. Zipf's view constitutes a precursor of coding theory. However, he never tested, using a rigorous statistical approach, whether ECL is significantly small in human languages. Hereafter, the definition of ECL from information theory is relaxed so that ei can be not only the number of bits used to code the i-th element but also its size, length, or duration. We assume ei > 0 here, as we focus on “elements” that can be observed in a real system, although some models of behavior produce elements of length zero, and then ei = 0 for such empty elements (see Ferrer-i-Cancho & Gavaldà, 2009 for a review of these models). Pressure for minimizing ECL in animal behavior may arise from a number of different sources.
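As a minimal sketch of the cost function in Eq. (1), the following Python code computes the mean code length of a toy repertoire. The function name `mean_code_length` and all probabilities and lengths are hypothetical illustrations, not data from any of the studies cited.

```python
from typing import Sequence


def mean_code_length(probs: Sequence[float], lengths: Sequence[float]) -> float:
    """Return sum_i p_i * e_i, the mean code length of a repertoire."""
    if len(probs) != len(lengths):
        raise ValueError("probs and lengths must have the same size")
    if abs(sum(probs) - 1.0) > 1e-9:
        raise ValueError("probabilities must sum to 1")
    if any(e <= 0 for e in lengths):
        raise ValueError("lengths must be positive (e_i > 0 is assumed)")
    return sum(p * e for p, e in zip(probs, lengths))


# Hypothetical repertoire; probabilities sorted from most to least likely.
probs = [0.5, 0.25, 0.125, 0.125]
short_first = [1, 2, 3, 3]  # frequent elements get the short lengths
long_first = [3, 3, 2, 1]   # frequent elements get the long lengths

ecl_short = mean_code_length(probs, short_first)
ecl_long = mean_code_length(probs, long_first)
```

In this toy case the same four lengths give a mean of 1.75 when the short ones are assigned to the frequent elements and 2.625 when they are assigned to the rare ones.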
First, there may be a need to minimize the direct energetic costs of producing a behavior; evidence from a range of vertebrate and invertebrate species indicates, for example, that energy availability may limit the duration of calling behavior (Bennet-Clark, 1998; Fletcher, 1997; Gillooly & Ophir, 2010; Klump, 2005; Thomas, 2002). Furthermore, Waters and Jones (1995) showed theoretically, by integrating the energy flux density, that energy is directly proportional to duration in acoustic communication. As is well known in fluid mechanics, the sound energy flux (a) is given by the time integral of the squared sound pressure (Landau & Lifshitz, 1987), and (b) can be roughly approximated by A · t, where t is the pulse duration and A is its amplitude, for relatively short stimuli, as Baszczyk (2003) explains. According to general acoustics, the energy of a sound wave is ξ = P · t, where P is its sound power and t is its duration, and P = I · S, where I is the sound intensity and S is the area of the propagation surface (Kinsler, Frey, Coppens, & Sanders, 2000). For a sound wave of amplitude A, the sound energy becomes (Fahy, 2001; Kinsler et al., 2000):

$$\xi = I \cdot S \cdot t \propto A^{2} \cdot S \cdot t,$$

namely, the energy of a signal depends linearly on time and on the squared amplitude. This suggests a priori that it is not only duration that matters but also amplitude. However, amplitude determines the energy of a sound wave at a given time, and therefore reducing the amplitude reduces the reach of the signal, owing to the natural degradation of the signal with distance and to interference caused by other sounds (Bennet-Clark, 1998). Amplitude fluctuations during the propagation of sound have different effects on receivers, depending on wave frequency, and interfere with their ability to detect directionality (Wiley & Richards, 1978). Minimizing energy by lowering amplitude thus puts the success of communication at risk. Furthermore, amplitude is largely determined by body size (Bennet-Clark, 1998; Fletcher, 2004), as Gillooly and Ophir (2010) demonstrate, because sound power and amplitude are interdependent. Therefore, our energy function ECL captures the contribution to sound energy that can be more easily controlled, namely, that of duration.
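The linear dependence on duration and the quadratic dependence on amplitude can be made concrete with a short worked example. It assumes that the propagation surface and the medium are held fixed, so that their contribution can be gathered into a single constant k; the symbol k is introduced here for illustration only.

```latex
\begin{align*}
\xi(A, t)   &= k\,A^{2}\,t, \\
\xi(A, t/2) &= k\,A^{2}\,\frac{t}{2} = \frac{1}{2}\,\xi(A, t), \\
\xi(A/2, t) &= k\left(\frac{A}{2}\right)^{2} t = \frac{1}{4}\,\xi(A, t).
\end{align*}
```

Under this assumption, halving the duration halves the energy, whereas halving the amplitude cuts it to a quarter, at the cost of the reduced reach discussed above.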
Second, shorter signals may have advantages independent of energetic production costs. In predation contexts, it may be beneficial for calling prey to decrease conspicuousness to predators (Endler, 1993; Hauser, 1996; Ryan, Tuttle, & Rand, 1982) or for calling predators to decrease conspicuousness to prey (Deecke, Ford, & Slater, 2005). In addition, shorter signals suffer less from problems linked to reverberation, an important phenomenon that degrades signals in environments containing solid structures (Waser & Brown, 1986). For the vocalizations of rainforest primates, for example, there appears to be an upper call duration limit of 200–300 ms, to minimize such interference (Brown & Sinnot, 2006, p. 191). Interestingly in relation to this point, the law of brevity has been found in one subset of the repertoire of common marmosets where all but one call type (the exception being the submissive squeal) are below the 300 ms limit; the other subset, where the law is not found, (a) consists of calls that are above 300 ms in duration, and (b) contains all long-distance communication calls (Ferrer-i-Cancho & Hernández-Fernández, 2013). If sound pressure falls in inverse proportion to the distance from the sound source (according to the so-called distance law of sound attenuation), and sound intensity (and hence energy; recall ξ = I · S · t) falls in inverse proportion to the square of the distance (Kinsler et al., 2000; Landau & Lifshitz, 1987), then long-distance calls must generally show an increase in sound intensity, in duration, or in both. Therefore, ECL does not measure all the costs of a repertoire, as it does not include intensity, and sometimes it may be advantageous to increase duration rather than reduce it. By focusing on ECL rather than on E, we hope to shed light on the importance of compression in animal behavior.
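The trade-off for long-distance calls can be sketched numerically. The snippet below assumes idealized spreading in which intensity falls with the square of the distance, with no absorption or reverberation; the function name `required_gain` and the distances and duration are hypothetical.

```python
def required_gain(near_distance: float, far_distance: float) -> float:
    """Factor by which the product intensity * duration must grow so that a
    receiver at far_distance gets the same energy as one at near_distance,
    assuming intensity falls with the square of the distance."""
    if near_distance <= 0 or far_distance <= 0:
        raise ValueError("distances must be positive")
    return (far_distance / near_distance) ** 2


# Hypothetical example: the receiver moves from 10 m to 40 m away.
gain = required_gain(10.0, 40.0)

# If intensity is held fixed, a 0.1 s call must be lengthened by that factor.
duration_at_fixed_intensity = 0.1 * gain
```

With these illustrative numbers the product of intensity and duration must grow 16-fold, so a caller that cannot raise intensity would need to lengthen a 0.1 s call to about 1.6 s, which works against compression.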
Here, we aim to provide support for the principle of compression (minimization of ECL) as a general principle of animal behavior. We do not propose that compression is the only principle of animal behavior, or the only principle by which behavior is optimized. The design of language is a multiple constraint engineering problem (Evans & Levinson, 2009; Köhler, 2005) and the same applies to the communication systems of other species (Endler, 1993). The appearance of design in communication systems can exist without a designer or intentional engineering (Cornish, 2010; Kirby, Cornish, & Smith, 2008).
Our notion of principle should not be confused with explanation. Principles are the ingredients of explanations. Using the physical force of gravity as an example helps to illustrate our notion of principle. The universal force of gravity explains why objects fall, but when a rocket flies upward its movement does not constitute an exception to the force. The force is still acting and is involved in explaining, for instance, the amount of fuel that is needed to fly in the opposite direction of the force. As the falling of an object is a manifestation of the force of gravity, we propose that the law of brevity in animal behavior is a manifestation of a principle of compression. Just as the force of gravity is still acting on a rocket moving away from the Earth, so we should not conclude prematurely that compression is not a relevant principle of behavior when the law of brevity is not detected.
Although the Earth and Venus are very different planets, the force of gravity is valid in both (indeed universal) and physicists only care about the difference in its magnitude. Similarly, we do not assume that human language and animal behavior cannot have compression in common, no matter how large the biological, social, cognitive, or other differences. Science is founded on parsimony (Occam's razor), and from an evolutionary perspective hypothesizing that the principle of compression is common to humans and other animal species is a priori simpler than hypothesizing that compression is not shared. We are adopting the perspective of standard statistical hypothesis testing, where the null hypothesis is that there is no difference between two populations, for example, two different species (Sokal & Rohlf, 1995). The research presented here strongly suggests that there is no need to adopt, a priori, the more demanding alternative hypothesis of an intrinsic difference between humans and other species’ linguistic and non-linguistic behavior for the particular case of the dependency between the size or length of the units and their frequency. However, according to modern model selection theory, a good model of reality has to comply with a trade-off between its parsimony and the quality of its fit to reality (Burnham & Anderson, 2002). A key and perhaps surprising result that will be presented in this article is that ECL is never found to be significantly high, in spite of apparently clear advantages in certain situations of increasing signal/behavior length (e.g., for long-distance communication). It could be interpreted that even when the direct effect of compression is not observed, compression still has a role, just as a rocket heading to space is still being attracted by the Earth's gravitational field. 
It is premature to conclude that compression has no role even when there is no evidence that ECL is being minimized; it is important to note that there are serious statistical limits for detecting efficient coding, especially in small repertoires (Ferrer-i-Cancho & Hernández-Fernández, 2013).
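One way such a statistical limit can arise is sketched below, under the assumption that efficient coding is assessed with a randomization test that reassigns the observed lengths to elements at random. This is an illustrative procedure with hypothetical names and data, not necessarily the test used in the studies cited.

```python
import math
import random
from typing import Sequence


def ecl(probs: Sequence[float], lengths: Sequence[float]) -> float:
    """Mean code length: sum of p_i * e_i."""
    return sum(p * e for p, e in zip(probs, lengths))


def left_p_value(probs, lengths, trials=10_000, seed=0) -> float:
    """Estimate the proportion of random reassignments of lengths to
    elements whose mean code length is at most the observed one."""
    rng = random.Random(seed)
    observed = ecl(probs, lengths)
    shuffled = list(lengths)
    hits = 0
    for _ in range(trials):
        rng.shuffle(shuffled)
        if ecl(probs, shuffled) <= observed + 1e-12:
            hits += 1
    return hits / trials


def smallest_possible_p(n: int) -> float:
    """Lower bound on the exact p-value for a repertoire of n elements:
    the observed ordering is always one of the n! equally likely ones."""
    return 1 / math.factorial(n)


# Hypothetical, perfectly compressed repertoire of only three elements.
p_small = left_p_value([0.6, 0.3, 0.1], [1, 2, 3])
```

Even though this toy repertoire assigns the shortest length to the most frequent element, the estimated p-value stays close to 1/6, because only 3! = 6 orderings exist; under this assumed test, a repertoire of three elements can never reach the conventional 0.05 threshold, whereas one of four elements can (1/24).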
The remainder of the article is organized as follows. Section 'The mathematical relationship between compression and the law of brevity' shows that the energetic cost of an element (e.g., the length of a word) cannot increase as frequency increases in a system that minimizes ECL. Thus, the law of brevity is an epiphenomenon of compression. Section 'Direct evidence of compression' shows direct evidence of compression in animal behavior, in particular, by demonstrating that ECL is significantly small in all the cases where conformity to the law of brevity has previously been found. Section 'Methods' discusses these findings.
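The claim that cost cannot increase with frequency in a system that minimizes ECL can be checked by brute force on a toy case. The sketch below is an illustration with hypothetical names and values, not the mathematical argument of the section mentioned above; it uses exact fractions to avoid rounding ties.

```python
from fractions import Fraction
from itertools import permutations


def minimal_assignments(probs, lengths):
    """Return every assignment of the given lengths to the elements
    (in the order of probs) that minimizes sum_i p_i * e_i."""
    costs = {
        perm: sum(p * e for p, e in zip(probs, perm))
        for perm in set(permutations(lengths))
    }
    lowest = min(costs.values())
    return [perm for perm, cost in costs.items() if cost == lowest]


# Hypothetical probabilities, sorted from most to least likely,
# and an arbitrary pool of lengths to distribute among the elements.
probs = [Fraction(4, 10), Fraction(3, 10), Fraction(2, 10), Fraction(1, 10)]
lengths = [5, 2, 7, 3]

optimal = minimal_assignments(probs, lengths)
```

For these values the only minimizing assignment is (2, 3, 5, 7): the lengths never decrease as probability decreases, which is the pattern described by the law of brevity.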