THE EVOLUTION OF TEACHING

Authors

  • L. Fogarty,

    1. Centre for Social Learning and Cognitive Evolution, School of Biology, University of St Andrews, Queen's Terrace, St. Andrews, Fife, KY16 9TS, United Kingdom
  • P. Strimling,

    1. Centre for Social Learning and Cognitive Evolution, School of Biology, University of St Andrews, Queen's Terrace, St. Andrews, Fife, KY16 9TS, United Kingdom
    2. Centre for the Study of Cultural Evolution, Stockholm University, Sweden
  • K. N. Laland

    1. Centre for Social Learning and Cognitive Evolution, School of Biology, University of St Andrews, Queen's Terrace, St. Andrews, Fife, KY16 9TS, United Kingdom
    2. E-mail: knl1@st-andrews.ac.uk

Abstract

Teaching, alongside imitation, is widely thought to underlie the success of humanity by allowing high-fidelity transmission of information, skills, and technology between individuals, facilitating both cumulative knowledge gain and normative culture. Yet, it remains a mystery why teaching should be widespread in human societies but extremely rare in other animals. We explore the evolution of teaching using simple genetic models in which a single tutor transmits adaptive information to a related pupil at a cost. Teaching is expected to evolve where its costs are outweighed by the inclusive fitness benefits that result from the tutor's relatives being more likely to acquire the valuable information. We find that teaching is not favored where the pupil can easily acquire the information on its own, or through copying others, or for difficult to learn traits, where teachers typically do not possess the information to pass on to relatives. This leads to a narrow range of traits for which teaching would be efficacious, which helps to explain the rarity of teaching in nature, its unusual distribution, and its highly specific nature. Further models that allow for cumulative cultural knowledge gain suggest that teaching evolved in humans because cumulative culture renders otherwise difficult-to-acquire valuable information available to teach.

The demographic and ecological success of humanity is widely attributed to our capacity for the high-fidelity information transmission necessary for cumulative culture (Boyd and Richerson 1985; Tomasello 1994). Alongside imitation, teaching is the primary mechanism through which humans pass acquired knowledge, skills, and technology between individuals and between generations. Teaching is widespread in human societies, and is a key human psychological adaptation, vital for normative forms of human cooperation to be tenable (Boyd and Richerson 1985; Tomasello 1994; Fehr and Fischbacher 2003; Csibra and Gergely 2006; Csibra 2007), and central to technological development (Boyd and Richerson 1985). Yet if teaching is so effective, it remains a mystery why it should be either absent or exceedingly rare in virtually all other animals.

Countless animals acquire skills and information from others (Heyes and Galef 1996; Laland and Galef 2009); however, experienced individuals are not generally thought to actively facilitate learning in others (Danchin et al. 2004). Caro and Hauser (1992) proposed a functional definition of “teaching” applicable to animals, in which a tutor is said to teach if it modifies its behavior in the presence of a pupil, at some cost, thereby promoting the pupil's learning. Refinements of this definition impose additional criteria, such as feedback from pupil to tutor, or restrict teaching to the transfer of skills, concepts, and rules (Franks and Richardson 2006; Leadbeater et al. 2006; Hoppitt et al. 2008). Deploying such definitions, researchers report putative cases of costly information donation (henceforth “animal teaching”) in diverse species, including ants, bees, pied babblers, meerkats, and cats (Franks and Richardson 2006; Leadbeater et al. 2006; Thornton and McAuliffe 2006; Raihani and Ridley 2008; Rapaport and Brown 2008; see Hoppitt et al. 2008 and Rapaport and Brown 2008 for reviews). Although cases of animal teaching remain controversial, and their consanguinity with human teaching is contested (Leadbeater et al. 2006; Csibra 2007; Premack 2007), these observations nonetheless raise a challenging question: why should the costly donation of information exhibit such a curious taxonomic distribution?

Any functional similarities should not obscure the fact that mechanistically, cases of animal teaching are entirely different from human teaching, and are not reliant on homologous characters (Csibra and Gergely 2006; Hoppitt et al. 2008). Indeed, current thinking suggests that instances of animal teaching function to enhance the fidelity of information transmission through adaptive refinements of forms of learning present in the animal, leading to distinct teaching mechanisms in different species (Hoppitt et al. 2008). Nonetheless, it is germane to ask why convergent selection should favor investment in costly mechanisms of information transfer in some species and not others. The fact that functionally similar behavior is reported in diverse animals leads us to ask “What do these species have in common that led to the evolution of teaching?”, “Why is teaching not more widespread in animals?”, and “Why is it that intelligent animals such as chimpanzees seemingly do not teach if ants and bees are capable of doing so?” As animal teaching appears restricted to isolated traits, this raises the further question of “How did a very general capability for teaching evolve in the human lineage?”

These questions are of widespread interest to multiple academic disciplines, including evolutionary and behavioral biology, psychology, anthropology, archaeology, economics and education. Yet although there has been extensive research into related topics, such as the evolution of social learning (Cavalli-Sforza and Feldman 1981; Boyd and Richerson 1985; Feldman and Zhivotovsky 1992; Rendell et al. 2010), learned communication (Kirby, Dowman and Griffiths 2007; Kirby, Cornish and Smith 2008; Boyd, Gintis and Bowles 2010), and learned cooperation (Boyd and Richerson 1985; Peck and Feldman 1986; Boyd et al. 2003; Fehr and Fischbacher 2003; Gintis 2003), currently there is no formal theory of the evolution of teaching.

Although most teaching meets current definitions of cooperation (West et al. 2007), it being favored in the tutor because it promotes the acquisition of fitness-enhancing information in the pupil, certain unique features specific to teaching render a specialized treatment necessary. These include the fact that taught information can be acquired through means other than teaching (e.g., through trial-and-error learning or inadvertent social learning), and that the dynamics of information transfer differ considerably from the dynamics of the spread and accumulation of physical resources. These differences mean that the evolution of teaching is not explained by contemporary theory of the evolution of cooperation (Sachs et al. 2004; Lehmann and Keller 2006; West et al. 2007). We elaborate on these points, and more generally on the relationship between teaching and cooperation, in the discussion.

Here we develop simple and accessible mathematical models that explore under what circumstances teaching might be expected to evolve. Our analyses help to explain the observed distribution of teaching behavior, the absence of teaching in other taxa and the widespread use of teaching in humans.

The Model

Here we describe a model with haploid genetics, but in the electronic supplementary material (ESM) we extend the analysis to diploid and haplodiploid cases, which give qualitatively similar results. We assume a single, infinite-sized, well-mixed population with two genotypes, teacher and nonteacher. This assumption corresponds to our adopting the phenotypic gambit (Grafen 1984), and in reality we envisage that teaching behavior is likely to result from many genes as well as experiential factors. Individual fitness is a function of baseline fitness (w0), costs associated with genotype (ct, cc), and a viability benefit (wi, where wi>1) associated with possessing valuable learned information i. Teaching is costly, with cc representing a fixed cognitive cost paid by teachers (e.g., representing the neural hardware necessary to teach) and ct representing a time (or energy) cost paid when teaching. As the cognitive cost of teaching is a cost paid by all teachers, independent of their individual fitness, the most appropriate formulation is one in which this cost cc is additive. Conversely, the time cost, ct, is dependent on possession of the information, so it is more realistic to view this as multiplicative. Thus, teachers with and without the information have fitness wt = w0wict – cc and wt = w0 – cc, respectively.
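As a minimal sketch of the fitness scheme just described, the four phenotype fitnesses can be written in Python. The function name and the default parameter values are hypothetical, and the nonteacher fitnesses (w0wi with the information, w0 without) are an assumption made by analogy with the teacher expressions given above. Note that ct enters as a multiplier, so values closer to 1 correspond to cheaper teaching.

```python
def phenotype_fitness(is_teacher, has_info, w0=1.0, wi=2.0, ct=0.95, cc=0.001):
    """Fitness of one individual, given its genotype and information state.

    Hypothetical helper: teacher values follow the text (w0*wi*ct - cc and
    w0 - cc); nonteacher values (w0*wi and w0) are assumed by analogy.
    """
    fitness = w0
    if has_info:
        fitness *= wi          # viability benefit of the information
        if is_teacher:
            fitness *= ct      # multiplicative time cost, paid only by informed teachers
    if is_teacher:
        fitness -= cc          # additive cognitive cost, paid by every teacher
    return fitness


if __name__ == "__main__":
    for is_teacher in (True, False):
        for has_info in (True, False):
            label = ("teacher" if is_teacher else "nonteacher",
                     "informed" if has_info else "naive")
            print(label, phenotype_fitness(is_teacher, has_info))
```

Writing the two costs separately keeps their different roles visible: cc lowers the fitness of every teacher, whereas ct only matters once a teacher holds information to pass on.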

Individuals can acquire the information asocially, with probability A, for instance, through trial-and-error learning. However, each individual also has a cultural role model, or tutor, from whom they can learn the information, and to whom they are related with relatedness r. Consistent with Grafen's statistical definition (Hamilton 1964; Grafen 1985), we treat relatedness as a “measure of the genetical similarity between social partners, relative to the rest of the population” (West et al. 2011). From this role model, the individual can acquire the information through inadvertent social learning (e.g., imitation, emulation, enhancement effects), with probability S, or teaching (provided their cultural role model is a teacher), with probability T, where T>S and T>A. We assume that pupil and cultural role model combinations form with a probability proportional to their genetic similarity (r). This assumption is consistent with any one of three biologically plausible processes: (1) “active choice by pupils,” who adopt cultural role models with a likelihood that increases with relatedness, (2) “active choice by cultural role models,” who adopt pupils with a likelihood that increases with relatedness, or (3) “spatial structure or population viscosity,” leading to related individuals being more likely to assort, and thereby generating pupil–cultural role model pairings with high rather than low relatedness. Thus, although this assortment could result from kin discrimination or a green-beard mechanism (West et al. 2007), it need not do so. We also assume that each genotype assorts with relatives to a similar degree. This assumption is made for mathematical convenience and, although it is a defensible first approximation, it may not be met in some natural populations.
However, as we envisage that there are likely to be greater fitness benefits for assortment among teachers than nonteachers, this is likely to translate into our generating conservative estimates of the likelihood of the evolution of teaching. In the terminology of Gardner and West (2006), our treatment of relatedness is more akin to the “open” models of social evolution developed by Frank (1998), than to the explicit “closed” models derived by Taylor (1992), which incorporate levels of dispersal and other demographic assumptions that impact on assortment according to kinship. We designate the frequency of teachers and nonteachers to be t and nt (t + nt = 1), the proportion of teachers and nonteachers possessing the information as it and int, and the proportion of teachers and nonteachers that do not possess the information as nit and nint (int+nint = 1, it+nit = 1), respectively.

We begin by developing expressions for the probability that teachers and nonteachers acquire the fitness-enhancing information. As pupil–cultural role model pairings form nonrandomly, a focal individual with the teaching genotype shares its genotype through common descent with its cultural role model with probability kr, where k is a scaling constant, thereby acquiring the information through teaching with probability itT. Alternatively, with probability 1 – kr, the role model's genotype is not a related teacher and is randomly drawn from the rest of the (infinite) population, giving probabilities that it is an unrelated teacher and nonteacher of t and 1 – t, respectively. If the cultural role model is a teacher, the focal individual learns through teaching with probability itT, whereas if it is not a teacher the probability that the focal individual acquires the information from the role model is intS. Note that a pupil's ability to learn from a cultural role model depends critically on the probability that their role model has acquired the information themselves, that is it and int for teachers and nonteachers, respectively. This means that individuals may fail to acquire the information with a certain probability regardless of the genotype of their role model. In addition, the focal individual may learn asocially with probability A no matter the genotype of its cultural role model. It follows that an individual with the teaching genotype acquires the information with probability

image(1a)

and, by similar reasoning, a nonteacher acquires the information with probability

image(1b)
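The displayed forms of expressions (1a) and (1b) are not reproduced in this text, so the following Python sketch is a reconstruction from the verbal description and should not be read as the authors' equations. It assumes an additive form, in which asocial learning (A) is added to the social pathway; this choice reproduces the difference r(Tit – Sint) quoted later in the Results when k = 1. The function and argument names are hypothetical, and no clamping is applied, so the sketch is only meaningful for parameter values that keep the result at or below 1.

```python
def p_learn(focal_is_teacher, t, i_t, i_nt, A, S, T, r, k=1.0):
    """Probability that a focal individual acquires the information (assumed form)."""
    # Role model drawn at random from the population: a teacher with
    # probability t (informs with i_t * T), otherwise a nonteacher (i_nt * S).
    from_random = t * i_t * T + (1.0 - t) * i_nt * S
    # Role model sharing the focal genotype through common descent.
    from_kin = i_t * T if focal_is_teacher else i_nt * S
    # Asocial learning is added on top of the social pathway (assumption).
    return A + k * r * from_kin + (1.0 - k * r) * from_random


if __name__ == "__main__":
    params = dict(t=0.2, i_t=0.6, i_nt=0.5, A=0.1, S=0.3, T=0.9, r=0.5)
    print("teacher:   ", p_learn(True, **params))
    print("nonteacher:", p_learn(False, **params))
```

Under this form the two genotypes differ only in the kin term, which is why relatedness is the sole route by which the teaching genotype gains an advantage in acquiring information.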

The average fitness of teachers and nonteachers is now given by

image(2a)
image(2b)

We define Wd = Wt – Wnt, so that Wd > 0 is the condition for the proportion of teachers to increase in the population. Using expressions (1) and (2), we can specify a dynamic haploid system in terms of three recursive equations, representing the generational change in frequency of teachers (3a), the proportion of teachers with the information (3b), and the proportion of nonteachers with the information (3c), in the population:

image(3a)
image(3b)
image(3c)

INVASION ANALYSIS

We now consider the invasion of teachers into a population of nonteachers. This requires us to compute the equilibrium frequency of the information among nonteachers. Setting t = 0 and k = 1, simplifying equation (1b), and inserting into equation (3c), generates a simplified recursion in int, which can be solved to give

image(4)

where Q = 4AwiS(wi – 1) + (1 – A – Swi + Awi)^2.
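The equilibrium can be checked numerically with a short Python sketch. Because the recursion and its solution are not reproduced in this text, the sketch rests on an assumed form: with t = 0, k = 1, and w0 = 1, a nonteacher learns with probability P = A + S·int, and the next-generation frequency of informed nonteachers is the fitness-weighted share Pwi / (Pwi + 1 – P). Under that assumption the positive root of the resulting quadratic has exactly the discriminant Q given above. All function names are hypothetical, and the closed form requires S > 0 and wi > 1.

```python
import math


def next_i_nt(i_nt, A, S, wi):
    """One generation of the assumed recursion for informed nonteachers (t = 0)."""
    p = A + S * i_nt                     # probability of acquiring the information
    return p * wi / (p * wi + (1.0 - p))  # fitness-weighted share, w0 = 1


def equilibrium_closed_form(A, S, wi):
    """Positive root of the quadratic implied by the assumed recursion."""
    b = 1.0 - A - S * wi + A * wi
    Q = 4.0 * A * wi * S * (wi - 1.0) + b ** 2
    return (-b + math.sqrt(Q)) / (2.0 * S * (wi - 1.0))


def equilibrium_by_iteration(A, S, wi, steps=1000):
    """Iterate the recursion from an uninformed population."""
    i_nt = 0.0
    for _ in range(steps):
        i_nt = next_i_nt(i_nt, A, S, wi)
    return i_nt


if __name__ == "__main__":
    A, S, wi = 0.1, 0.5, 2.0   # values taken from the Figure 1A caption
    print(equilibrium_closed_form(A, S, wi), equilibrium_by_iteration(A, S, wi))
```

With the Figure 1A values (A = 0.1, S = 0.5, wi = 2), both routes give an equilibrium frequency of about 0.54, which offers some reassurance that the assumed recursion matches the expression for Q.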

We explore the invasion of teachers at this equilibrium, assuming that the proportion of teachers with the information is roughly equal to the proportion of nonteachers with the information when teachers are very rare (i.e., it = int). As Wd = Wt – Wnt, we require Wd > 0 for teaching to invade. Thus, the condition for the invasion of teaching is

image(5)

For all following analyses, we set the baseline fitness, w0 = 1.

To investigate the dynamics of the system after invasion by a teaching genotype we solve the equation

image(6)

for t̂, the equilibrium value of the proportion of teachers in the population. By inspecting equation (6), we find that the only two biologically possible solutions are t̂ = 1 or t̂ = 0. This means that teachers will either fail to invade or, once they have invaded, will spread to fixation.

Results

THE RELATEDNESS OF TUTOR AND PUPIL

By inspection of equation (5) we find that teaching will evolve where its costs are outweighed by the inclusive fitness benefits that result from the tutor's relatives being more likely to acquire the valuable information. This effect echoes both recent and traditional formulations of inclusive fitness models first proposed by Hamilton (1964). The impact of relatedness in this model follows more closely the inclusive fitness formulation described by Hamilton (1964) and examined by Taylor et al. (2007), concentrating on the fitness effects of an individual's behavior on kin and the population as a whole. The benefit to possessing the teacher genotype comes from an increased chance that the individual's cultural role model will also be a teacher, which leads to a greater probability of acquiring the information. As might be expected, the benefits of teaching are sensitive to the relatedness of tutor and pupil. Any fitness advantage to teaching depends critically on teachers having a higher than average probability of learning from their cultural role model. The change in fitness associated with r when teachers invade is rint(T – S)(wict – 1) and when nonteachers invade is – rit(T – S)(wi – 1). The more related an average individual is to its role model the greater the benefit of teaching (Fig. 1A), a finding consistent with both existing kin selection theory and recent verbal arguments concerning the evolution of teaching (Hoppitt et al. 2008; Thornton and Raihani 2008).
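The dependence of the invasion condition on relatedness can be illustrated with a Python sketch. It is a reconstruction under stated assumptions, not the authors' equation (5): acquisition probabilities are taken to be additive (A plus the social term), teachers are rare with it = int at the nonteacher equilibrium, k = 1, w0 = 1, and mean fitness is the probability-weighted average of the informed and uninformed fitnesses. Function names are hypothetical, and ct = 0.95 is an illustrative value; the remaining parameters follow the Figure 1A caption.

```python
import math


def equilibrium_info(A, S, wi):
    """Equilibrium frequency of the information among nonteachers (assumed form)."""
    b = 1.0 - A - S * wi + A * wi
    Q = 4.0 * A * wi * S * (wi - 1.0) + b ** 2
    return (-b + math.sqrt(Q)) / (2.0 * S * (wi - 1.0))


def invasion_fitness_difference(r, A, S, T, wi, ct, cc):
    """Wd = Wt - Wnt for a rare teacher in a nonteacher population."""
    i = equilibrium_info(A, S, wi)           # i_t = i_nt = i while teachers are rare
    p_nt = A + S * i                         # nonteacher learns asocially or by copying
    p_t = p_nt + r * i * (T - S)             # teacher's extra chance via a related teacher
    W_t = 1.0 + p_t * (wi * ct - 1.0) - cc   # informed teachers pay ct, all teachers pay cc
    W_nt = 1.0 + p_nt * (wi - 1.0)
    return W_t - W_nt


if __name__ == "__main__":
    for r in (0.0, 0.25, 0.5):
        wd = invasion_fitness_difference(r, A=0.1, S=0.5, T=0.9, wi=2.0, ct=0.95, cc=0.001)
        print(f"r = {r:.2f}: Wd = {wd:+.4f}")
```

In this sketch the only term that depends on r is r·int(T – S)(wict – 1), matching the expression quoted above, so Wd rises linearly with relatedness and changes sign at a threshold value of r.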

Figure 1.

(A) Fitness difference between teachers and nonteachers (Wd), plotted against the average degree of relatedness between an individual and its cultural role model, for different values of ct. Above the dashed horizontal line teachers are more fit than nonteachers and can invade. T = 0.9, cc = 0.001, wi = 2, S = 0.5, A = 0.1, w0 = 1. (B) Wd plotted against the probability of learning through means other than teaching, for three values of wi. T = 0.9, r = 0.5, ct = 0.915, cc = 0.01, w0 = 1. (C) Wd plotted against teaching efficacy, T, for a cumulative and a noncumulative model. A1 = A2 = S1 = S2 = 0.05, cc = 0.0001, w1 = 2, w2 = 6, r = 0.5, w0 = 1, ct = 0.99. Note the difference in scale of effects on the y-axes of the three plots.

THE EFFECTS OF ASOCIAL AND INADVERTENT SOCIAL LEARNING

The effects of asocial and inadvertent social learning (A and S) on equation (5) are similar and we will deal with them together. Of particular significance is our finding of n-shaped functions representing how the utility of teaching is affected by other means of acquiring the information (Fig. 1B). Effective asocial or social learning, represented by high values of A or S relative to T, reduces the benefits of teaching, rendering it uneconomical. As asocial learning and inadvertent social learning are less costly than teaching, nonteachers have a fitness advantage over teachers when the information is easy to acquire. These observations explain why the incidence of teaching does not appear to covary with brain size or intelligence in animals. Teaching will not be favored where the pupil can easily acquire the information on its own, or through copying others (Hoppitt et al. 2008; Thornton and Raihani 2008). This is apparent in primates such as chimpanzees, which are very efficient learners and thought to be particularly adept at social learning and imitation (Whiten et al. 2003; Whiten 2005). Paradoxically, our models establish that teaching is not generally favored for difficult to learn traits either (low A and S), as teachers typically do not possess the information to pass on to their relatives. Accordingly, teaching is most likely to be favored by mid-range A and S values, with the breadth of the window within which teaching invades dependent on the fitness benefits of the information (wi). However, unless wi is very high, there will typically be an extremely narrow range of traits for which teaching would be efficacious, which helps to explain both the rarity of teaching in nature and the highly specific nature of animal teaching. For example, meerkat helpers teach pups to process scorpions and other food items, but not what to eat, or any nonforaging behavior (Thornton and McAuliffe 2006). 
Similarly ants, social bees, and babblers also teach a single highly specific piece of information (Franks and Richardson 2006; Leadbeater et al. 2006; Hoppitt et al. 2008; Raihani and Ridley 2008).

When considering the invasion of teachers into a population of nonteachers, A and S have two conflicting effects. First, increasing both A and S relative to T reduces the value of teaching by increasing the probability of learning by other means. Because asocial learning and inadvertent social learning are less costly than teaching, nonteachers have a fitness advantage over teachers when the information is easy to acquire. Intuitively, as the gap between S and T closes the benefits to teaching diminish. High values of A and S, corresponding to cases in which the information is easy to acquire through other means (i.e., without teaching), thus do not favor teaching.

Second, the indirect effect of increasing A and S is to increase the amount of information in the population, which increases the benefit of being a teacher over that of being a nonteacher (see the next section). Conversely, very low values of A and S do not favor teaching, because teachers typically do not possess the information to pass on to their relatives. Teaching is most likely to be favored by mid-range A and S values, whereas very easy and very difficult to learn traits do not promote teaching (see Fig. 1B).
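The two conflicting effects can be seen in a small numerical sweep. The Python sketch below is illustrative only: it assumes the additive acquisition probabilities and the fitness-weighted equilibrium implied by the expression for Q, sets A = S for simplicity, and uses hypothetical parameter values (T = 0.9, r = 0.5, wi = 2, ct = 0.85, cc = 0.01) chosen so that every probability stays within [0, 1]. It is not a reproduction of Figure 1B.

```python
import math


def equilibrium_info(A, S, wi):
    """Equilibrium frequency of the information among nonteachers (assumed form)."""
    b = 1.0 - A - S * wi + A * wi
    Q = 4.0 * A * wi * S * (wi - 1.0) + b ** 2
    return (-b + math.sqrt(Q)) / (2.0 * S * (wi - 1.0))


def invasion_fitness_difference(x, r=0.5, T=0.9, wi=2.0, ct=0.85, cc=0.01):
    """Wd for a rare teacher when A = S = x (illustrative simplification)."""
    A = S = x
    i = equilibrium_info(A, S, wi)
    p_nt = A + S * i
    p_t = p_nt + r * i * (T - S)
    # The additive form is only valid while both values are probabilities.
    assert 0.0 <= p_nt <= p_t <= 1.0, "parameters outside the valid range of the sketch"
    return (1.0 + p_t * (wi * ct - 1.0) - cc) - (1.0 + p_nt * (wi - 1.0))


if __name__ == "__main__":
    for x in (0.01, 0.2, 0.4):
        print(f"A = S = {x:.2f}: Wd = {invasion_fitness_difference(x):+.4f}")
```

With these values Wd is negative when the trait is very hard to acquire (x = 0.01), positive at an intermediate value (x = 0.2), and negative again when the trait is easy to acquire (x = 0.4), which is the n-shaped pattern described in the text.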

The effect of asocial learning (A) also depends on the relative values of social learning (S) and teaching efficacy (T). As the gap between S and T closes, an increase in A favors nonteachers. In other words, under these circumstances the advantage for a teacher diminishes. This occurs because nonteachers can pass the information on at no cost. One ramification of these observations is that there is little incentive for otherwise effective learners to teach. This is discussed by Whiten (1999) in relation to the great apes, which are extremely effective learners but appear to display little or no teaching behavior.

THE VALUE OF INFORMATION

Any benefit to being a teacher comes from an increased chance that the individual's cultural role model will also be a teacher, which leads to a greater probability of acquiring the information. It follows that this benefit increases with the proportion of individuals in the population possessing the information. To be explicit, P(l|t) – P(l|nt) = r(Tit – Sint); thus, if both it and int are increased by ɛ, the difference in probability of learning as a teacher and nonteacher, P(l|t) – P(l|nt), is increased by ɛr(T – S), which is positive. However, as the information in the population increases, the fitness of both genotypes increases. Because T > S, teachers bring more information into the population than nonteachers. This leads to the counter-intuitive result that the fitness of nonteachers, Wnt, can increase when the cost of teaching is decreased.
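The step from the difference in learning probabilities to the increment ɛr(T – S) can be written out as a short worked derivation. It uses only the expression stated above, and additionally assumes r > 0 and ɛ > 0 so that the sign of the result is unambiguous.

```latex
\begin{align*}
P(l \mid t) - P(l \mid nt) &= r\,(T i_t - S i_{nt}), \\
r\,\bigl(T (i_t + \varepsilon) - S (i_{nt} + \varepsilon)\bigr)
  - r\,(T i_t - S i_{nt})
  &= \varepsilon\, r\,(T - S) \;>\; 0
  \quad \text{since } T > S .
\end{align*}
```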

CUMULATIVE CULTURE MODEL

Compared to animal teaching, human teaching is a very general capability, reliant on high-fidelity mechanisms such as speech, instruction, and direct shaping, and it allows the transmission of complex information that could not be devised by a single individual (Hoppitt et al. 2008). Humans have undergone cumulative cultural evolution, which has in essence reduced our reliance on information acquired asocially, and allowed the transfer of knowledge and skills that would be difficult to learn without direct guidance and instruction. Valuable information, concerning for instance industrial practices or technological manufacture, has become available to teach through the accumulated efforts of many individuals.

We set out to explore whether teaching and cumulative culture might have coevolved, extending our model to allow for cumulative knowledge gain by allowing individuals that have acquired the information to gather further knowledge that improves it, simultaneously making the information more fitness enhancing but more difficult to learn. We assume two pieces of information (1 and 2), the latter refining the former, where individuals with both pieces of information gain fitness increment w1w2 with w2 > 1. We can consider the evolution of teaching in humans by exploring how cumulative learning affects the sensitivity of the invasion conditions to the fidelity of teaching, T, and the value of the information wi. We include a second round of learning where the pupils who learnt in the first round get a chance to learn the second trait from a new cultural parent. We then rebuild the model using equations (1a) and (1b).

First, we generate an expression describing the likelihood that any teacher will teach twice; this is the likelihood that their new pupil has already acquired information 1 and so is capable of learning information 2, given that they themselves have both pieces of information available to teach. An expression for this is given by the sum of the probability that the teacher's second student is a teacher itself, multiplied by the probability that a teacher picked up information 1, and the probability that the student is a nonteacher multiplied by the probability that a nonteacher could have picked up information 1:

image(7)

Using this, we can again calculate the equilibrium frequency of information 2 in the population when teachers are very rare:

image(8)

We then calculate the fitness functions for teachers and nonteachers in a population where teaching is rare; these are:

image(9a)
image(9b)

Using these, we can analyze the difference in fitness between teachers and nonteachers in a population of mostly nonteachers in three different conditions: a cumulative culture where the second trait is a refinement of the first (i.e., 1 <w2), a noncumulative culture (1 = w2), and the original model with no second trait. The presence of a second trait that does not refine the first is similar to having two independent traits that give the same fitness benefit, which can be picked up in sequence.

Using numerical analysis of these three conditions we find that the fitness advantage of teachers over nonteachers is greater in a cumulative culture context than a noncumulative context, and that this fitness difference increases with the fidelity of teaching (as T increases, see Fig. 1C). Writing Wdc for the fitness difference between teachers and nonteachers in the cumulative setting, and retaining Wd for the noncumulative setting, Figure 2 shows that Wdc – Wd is positive across most biologically plausible conditions. This means that the relative fitness of the teaching compared to the nonteaching genotype is always the same or higher in a cumulative setting compared to a noncumulative setting, and this conclusion holds across an extremely wide range of parameter values. Furthermore, numerical analysis has established that, for biologically plausible values of the relevant parameters (T, A, S, wi, ct), the equilibrium frequency of information in the population prior to the invasion of teaching is always higher in a cumulative setting than in a noncumulative one (see Fig. 2D). Naturally, additional cumulative learning episodes would further increase Wdc – Wd. Conversely, a second learning opportunity when w1 ≥ w1w2 does not increase a teacher's fitness. Thus the difference between the cumulative and noncumulative model is not explained by the fact that there are two learning opportunities in the cumulative setting.

Figure 2.

The fitness advantage of teaching over nonteaching is greater in a cumulative compared to a noncumulative setting (i.e., Wdc – Wd > 0), across a broad range of parameter space. Contour plots show Wdc – Wd, with teaching efficacy, T (0.5 < T < 1), plotted against (A) the relative ease of learning information 2 compared to information 1 (henceforth β, where A1 = βA2 and S1 = βS2; as β approaches 1, information 2 becomes as easy to learn as information 1 through asocial and inadvertent social learning), (B) the time cost of teaching, ct, and (C) the fitness value of information 2, w2. (D) The equilibrium frequency of information 1 is higher in a cumulative than a noncumulative setting. In all plots A1 = A2 = S1 = S2 = 0.05, cc = 0.0001, w1 = 1.5, w2 = 6, r = 1/2, w0 = 1, ct = 0.99, except where otherwise stated.

Figure 1C illustrates how cumulative knowledge gain increases the difference in fitness between teachers and nonteachers, broadening the range of conditions under which teaching evolves, and Figure 2 reveals that this pattern is robust across all biologically plausible parameter space. The fitness advantage of teaching over nonteaching increases with the fidelity of teaching (T), and does so more sharply in a cumulative compared with a noncumulative setting.

DATA COLLECTION FOR MODEL TESTING

We envisage that it may be possible for researchers to collect data with which to parameterize and test the predictions of our models. The generation of values for the efficacy of asocial and inadvertent social learning (A and S) parameters is relatively straightforward and useful values have already been generated by Thornton and McAuliffe (2006) regarding meerkats (although not in sufficiently high numbers to allow their use here). Patterns of relatedness (r) for tutors and pupils could feasibly be extracted from patterns of social interaction, using Grafen's (1985) statistical approach. To generate a relative likelihood of the invasion of teaching between two species, a reasonable approximation would be to assume that the parameters describing the absolute worth of the information (wi) and the baseline fitness value (w0) were equal across species. Because in most cases it is possible to set the cognitive cost of teaching to be zero (cc = 0), the only remaining parameters relate to the value of the time cost to each individual teacher, ct. In cases in which teaching has been observed, for example meerkats, we envisage these data may readily be estimated. In cases in which teaching has not been previously seen, different values of ct within a sensible range could be used to compare the likelihood of teaching under different circumstances. Accordingly, the above analyses provide opportunities for the emergence of a theory-driven empirical science of animal teaching.
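One way to use such estimates is to compute the value of ct at which teaching becomes neutral and to compare it with plausible time costs. The Python sketch below does this under assumptions that go beyond the text: additive acquisition probabilities, the nonteacher equilibrium implied by the expression for Q, k = 1, and w0 = 1. Function names are hypothetical, and the parameter values are those of the Figure 1A caption with r = 0.5 rather than empirical estimates. Because ct enters as a multiplier, teaching invades when ct lies above the critical value.

```python
import math


def equilibrium_info(A, S, wi):
    """Equilibrium frequency of the information among nonteachers (assumed form)."""
    b = 1.0 - A - S * wi + A * wi
    Q = 4.0 * A * wi * S * (wi - 1.0) + b ** 2
    return (-b + math.sqrt(Q)) / (2.0 * S * (wi - 1.0))


def fitness_difference(ct, r, A, S, T, wi, cc):
    """Wd = Wt - Wnt for a rare teacher (assumed form, w0 = 1)."""
    i = equilibrium_info(A, S, wi)
    p_nt = A + S * i
    p_t = p_nt + r * i * (T - S)
    return (1.0 + p_t * (wi * ct - 1.0) - cc) - (1.0 + p_nt * (wi - 1.0))


def critical_time_cost(r, A, S, T, wi, cc):
    """Value of ct at which Wd = 0, obtained by solving the linear equation in ct."""
    i = equilibrium_info(A, S, wi)
    p_nt = A + S * i
    gain = r * i * (T - S)               # extra learning probability due to relatedness
    return (gain + cc + p_nt * wi) / ((p_nt + gain) * wi)


if __name__ == "__main__":
    kw = dict(r=0.5, A=0.1, S=0.5, T=0.9, wi=2.0, cc=0.001)
    ct_star = critical_time_cost(**kw)
    print(f"teaching invades for ct above {ct_star:.3f}")
```

For these illustrative values the critical ct is about 0.89, so the comparison between two species would reduce to asking which of them has a realistic time cost on the favorable side of its own threshold.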

Discussion

We have explored the evolution of teaching using simple mathematical models in which a single tutor transmits adaptive information to a related pupil at a cost. Teaching is expected to evolve where its costs are outweighed by the inclusive fitness benefits that result from the tutor's relatives being more likely to acquire the valuable information. We find that teaching is not favored where the pupil can easily acquire the information on its own, or through copying others, or for difficult to learn traits, where teachers typically do not possess the information to pass on to their relatives. This leads to a narrow range of traits for which teaching would be efficacious, which helps to explain the rarity of teaching in nature, its unusual distribution, and its highly specific nature. Further models that allow for cumulative cultural knowledge gain suggest that teaching evolved in humans despite, rather than because of, our strong imitative capabilities, and primarily because cumulative culture renders otherwise difficult-to-acquire valuable information available to teach.

Although the extent of animal teaching may, as yet, have been underestimated (Hoppitt et al. 2008; Thornton and Raihani 2008; Laland and Hoppitt 2003), the generality and pervasiveness of human teaching nonetheless offer a striking contrast to teaching in other animals. Our analysis suggests this follows from two factors unique to our species. First, by virtue of our capacity for language, pedagogical cueing, teaching through imitation, manual shaping, and mental state attribution, which allows tutors to adjust their teaching to the state of knowledge of the pupil (Tomasello and Call 1997; Premack 2007), the fidelity of human teaching (T) is likely to be unusually high relative to teaching in other animals (Tomasello 1994; Csibra and Gergely 2006; Csibra 2007). Second, cumulative cultural evolution allows complex, high-fitness traits (Boyd and Richerson 1985; Henrich and McElreath 2003) that no individual could acquire on his or her own or through inadvertent social learning, ranging from ancestral lithic technology, tools, and weaponry through to contemporary technology, to be present and available to teach in human populations.

Our analysis implies that teaching and cumulative culture reinforce each other, and may have coevolved, because teaching is more advantageous in a cumulative culture setting, whereas cumulative knowledge gain is frequently reliant on teaching. This may be why teaching is observed in humans but is rare or absent in old world primates (Rapaport and Brown 2008), which by most accounts do not possess cumulative culture (Tomasello 1994). In populations that lack cumulative culture, difficult to acquire information would not reach sufficiently high frequency to promote teaching. Although there are reports that teaching is not common in some hunter–gatherer societies (e.g., Whiten et al. 2003), such reports refer to an absence of direct instruction, and neglect the prevalence of more subtle forms of teaching, such as pedagogical cueing (Csibra and Gergely 2006).

A small subset of animals do appear to satisfy the stringent conditions for teaching to evolve, and our findings may also shed light on the taxonomic distribution of teaching. For instance, the possibility of higher relatedness among female workers in the social insects than among diploid relatives may help to explain why teaching is observed in some tandem running ants and some social bees (Franks and Richardson 2006; Leadbeater et al. 2006; Thornton and McAuliffe 2006; Hoppitt et al. 2008), but rarely in vertebrates (although strong conclusions over the frequency of teaching are difficult in the absence of experimental confirmation of teaching in a number of cases [e.g., cats] where circumstantial evidence for teaching exists). Here, and in the ESM, we present a formal analysis that establishes the precise conditions under which teaching will evolve in each genetic system. This suggests that there are likely to be circumstances in which teaching evolves more readily among haplodiploid workers than in diploids, other factors being equal, because of the possibility of higher levels of relatedness between workers in relatively monogamous haplodiploid colonies (Cornwallis et al. 2010; see Craig 1979 and Foster et al. 2006 for discussion of relatedness in haplodiploids).
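The relatedness argument above can be illustrated with a short sketch. This is not the formal analysis presented in the ESM; it only computes standard pedigree relatedness between female siblings, under the assumption (ours, for illustration) that the mother's mates share paternity equally. The function names are hypothetical.

```python
from fractions import Fraction


def sister_relatedness_haplodiploid(n_mates: int) -> Fraction:
    """Pedigree relatedness between sisters in a haplodiploid species.

    Assumes the queen's n_mates fathers contribute equally to offspring.
    Sisters share 1/4 through the diploid mother, plus 1/2 through the
    haploid father whenever they have the same father (probability 1/n).
    """
    return Fraction(1, 4) + Fraction(1, 2 * n_mates)


def sibling_relatedness_diploid(n_mates: int) -> Fraction:
    """Pedigree relatedness between siblings in a diploid species.

    Same equal-paternity assumption: 1/4 through the mother, plus 1/4
    through the father whenever the father is shared (probability 1/n).
    """
    return Fraction(1, 4) + Fraction(1, 4 * n_mates)


if __name__ == "__main__":
    for n in (1, 2, 4):
        print(n, sister_relatedness_haplodiploid(n), sibling_relatedness_diploid(n))
```

Under these assumptions, haplodiploid sisters reach a relatedness of 3/4 only when the queen mates once, and the value falls to 1/2, the diploid full-sibling value, when she mates twice. This is consistent with the qualification that the advantage applies to relatively monogamous colonies.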

We also note that the most compelling cases of animal teaching occur in cooperatively breeding species (ants, bees, meerkats, pied babblers), while humans too have been characterized as cooperative breeders (Hrdy 1999). Relative to noncooperative breeders, cooperative-breeding helpers engage in costly (Thornton and McAuliffe 2006) and prolonged (Langen 2000) provisioning of the young, providing a selective environment for adaptations that would speed up the transition to independent feeding. It is possible that in cooperative breeders, the alleviation of heavy provisioning costs, or the sharing of costs among multiple tutors, corresponds to a significantly lower per capita cost (ct) to an individual teacher, which, as Figure 1A shows, helps render teaching economical. Teaching may be favored only where the tutor's operational costs are low (Thornton and Raihani 2008). Moreover, cooperative breeders often exhibit high levels of relatedness (Cornwallis et al. 2010), further enhancing the likelihood of teaching evolving. The obvious counter-example is the felids, where teaching of young by mothers may perhaps be favored because hunting skills, or the opportunities to gain them, are difficult to acquire through asocial or inadvertent social learning (corresponding to low A and S but high T in our model).

In cases where teaching does evolve, we show that the average fitness of all individuals in the population increases, implying that teaching may have group beneficial properties. Our analysis could usefully be extended to multiple competing populations, where we might expect to see growth in populations capable of teaching, with group augmentation, which favors cooperative breeding (Kokko et al. 2001), potentially further promoting teaching. We note that in meerkats, nonrelated individuals (immigrant males) sometimes feed and teach pups (Thornton and McAuliffe 2006). Our model could usefully be extended to include direct benefits, and it is plausible that if such direct benefits are high and the costs are low, the likelihood of teaching among nonrelatives may increase.

Cooperation is defined as behavior that provides a benefit to another individual (recipient), and that is selected for because of its beneficial effect on the recipient (West et al. 2007). Although in humans it is possible for an individual to be taught nonbeneficial traits (e.g., to take Class A drugs, or become a suicide bomber), the overwhelming majority of cases of teaching in humans, and all known cases of teaching in other animals, meet this definition of cooperation. We might ask, if most instances of teaching are also acts of cooperation, given the extensive literature on the evolution of cooperation (Sachs et al. 2004; Lehmann and Keller 2006; West et al. 2007), is a specialized treatment of the evolution of teaching necessary? This question must be answered in the affirmative because teaching is a special case of cooperation, with unique properties that need to be incorporated into formal models to generate accurate predictions.

One property of teaching is that an individual's fitness depends not only on whether they possess the teaching genotype, but also on whether they possess the acquired information or skill. In this respect, teachers resemble the “phenotypic defectors” incorporated into some cooperation models (Lotem et al. 1999; Sherratt and Roberts 2001), who are unable to cooperate through lack of physical resources (i.e., if sick or young). Central to our teaching model is the assumption that individuals can acquire information by means other than being taught it (for instance, through individual learning or inadvertent social learning). For instance, individuals can be taught to capture scorpions, to fashion a hand-axe, or to solve differential equations, but with varying probabilities, they can also acquire this knowledge on their own through trial-and-error learning, or through imitation and other forms of copying. This would be analogous to individuals receiving the fitness benefits of cooperation without cooperation taking place. However, at least in principle, individuals could acquire physical resources, either through their own efforts (analogous to asocial learning), through scrounging (analogous to inadvertent social learning), or as recipients of cooperative acts (analogous to being taught). An important consequence of these alternative means of knowledge gain is that they potentially disconnect the frequency of teachers from the frequency of individuals with the relevant information or skill. Moreover, the frequency of this information among individuals possessing the teaching genotype is, in part, a function of the frequency of the information in the population at large (see eq. 1a). Accordingly, to predict when a teaching event will occur, which is essential to investigate its evolution, it is necessary additionally to track the frequency of the information in the subpopulations, and not just allele, genotype, or phenotype frequencies.

We are unaware of any model of cooperation that, in addition to the donation of physical resources by cooperators, also allows physical resources to be acquired both through individuals' own efforts and through scrounging, and moreover that also tracks resource frequencies as dynamic variables among cooperators and noncooperators. Furthermore, were such a model to be formulated, it would still exhibit different dynamics from our teaching model. That is because physical resources and informational resources have quite different dynamical properties. For instance, an informed individual can exploit the information without depleting it, and can pass it on to others without hindering their ability to use it themselves. In addition, the fitness of the teaching genotype increases the more information there is in the population, because the teachers need to know the information to pass it on to their relatives. At the same time, the amount of information increases with the number of teachers (because teachers are better at spreading information than nonteachers). This is in contrast to most models of cooperation, where the benefit to defection tends to grow as cooperation becomes more common.

In this respect, our treatment also differs from previous models of the cultural evolution of cooperative behavior (e.g., Henrich and Boyd 2001; Boyd et al. 2003; Bowles and Gintis 2004), which have not distinguished between asocial learning, inadvertent social learning, and teaching, as a result of which the frequency of cooperative behavior is directly related to the frequency of socially transmitted information. Recent theory suggests that asocial learning could plausibly play an important role in cooperative behavior, and needs to be incorporated into cultural evolution models (Lehmann et al. 2008). The findings of our analysis support this argument. Our most interesting results, such as that teaching is favored neither for very difficult nor for very easy-to-learn traits (Fig. 1B), and is promoted by cumulative culture (Figs. 1C and 2), stem directly from this disconnect between the frequency of the teaching genotype and the frequency of the information to be taught, and the idiosyncratic properties of information flow.
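The intuition behind the result for very easy and very difficult traits can be conveyed with a toy calculation. The sketch below is not the genetic model analyzed in this paper and does not reproduce eq. 1a; the recursion, the function names, and the way A, S, and T are combined are simplifying assumptions made purely for illustration. It asks how much a single act of teaching raises a pupil's probability of acquiring the information, given that the tutor must first possess it.

```python
def info_frequency(A: float, S: float, generations: int = 500) -> float:
    """Frequency of informed individuals in a population without teaching.

    ASSUMED recursion (not the paper's eq. 1a): each individual learns
    asocially with probability A; failing that, it copies an informed
    member of the previous generation with probability S * q.
    """
    q = 0.0
    for _ in range(generations):
        q = A + (1.0 - A) * S * q
    return q


def teaching_gain(A: float, S: float, T: float) -> float:
    """Increase in a pupil's chance of being informed due to one tutor.

    ASSUMPTION: the tutor is informed with probability q (the population
    frequency), and teaching by an informed tutor succeeds with
    probability T, independently of the pupil's other learning routes.
    """
    q = info_frequency(A, S)
    p_without = A + (1.0 - A) * S * q          # pupil learns with no teaching
    p_with = 1.0 - (1.0 - p_without) * (1.0 - T * q)
    return p_with - p_without


if __name__ == "__main__":
    for A in (0.0, 0.05, 0.3, 0.7, 1.0):
        print(f"A={A:.2f}  gain={teaching_gain(A, S=0.5, T=0.9):.3f}")
```

In this toy setting the gain is zero when the trait cannot be learned asocially at all (A = 0, so no tutor has the information to pass on) and zero when the trait is trivially easy (A = 1, so the pupil acquires it anyway), and it is positive in between. The sketch ignores costs, relatedness, and the feedback of teaching on the frequency of information, all of which the full model tracks.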

These differences mean that the problem of the evolution of teaching does not reduce to the problem of the evolution of cooperation in any general form, at least not in a straightforward manner such that established theory fully explains the evolution of teaching. While some of our more intuitive findings (e.g., that the likelihood of teaching increases with the degree of relatedness between tutor and pupil and decreases with the costs of teaching, Fig. 1A) are unsurprising in the light of current understanding of the evolution of cooperation (Sachs et al. 2004; West et al. 2007), the more significant and novel findings emerge solely from our treatment. Recent thinking within anthropology supports the argument that teaching is widespread in humans (Tehrani and Riede 2008; Hewlett et al. 2011), in spite of earlier claims to the contrary (e.g., Lewis 2007; McDonald 2007), but that it takes on a variety of forms from direct verbal instruction to more subtle pedagogical cueing. Our models imply that teaching may be widespread in humans because cumulative cultural knowledge gain renders otherwise difficult-to-learn, high-fitness information available for tutors to impart. While it is frequently claimed that human cooperation is unique (Boyd and Richerson 1988; Henrich 2004), it is not immediately clear in what respects (West et al. 2011). Our analysis provides one possible answer to this conundrum. West et al. (2011) point out that “complex and unique mechanisms to enforce cooperation have arisen in humans, such as contracts, laws, justice, trade and social norms…” It is likely that all of these mechanisms require teaching to spread. Human cooperation may, therefore, be unusually extensive as a result of cumulative culture and may be uniquely reliant on an important mechanism that is less frequently observed in other species: teaching.


Associate Editor: S. West

ACKNOWLEDGMENTS

Research supported in part by BBSRC (BB/D015812/1), EU (FP6–2004-NEST-PATH, CULTAPTATION, Ref: 043434) and ERC (Advanced Investigator Grant, EVOCULTURE, Ref: 232823) grants to KNL and a BBSRC studentship to LF. We are grateful to G. Brown, W. Hoppitt, J. Kendal, H. Lewis, L. Rendell, A. Thornton, M. Webster, and A. Whiten for helpful comments on earlier drafts of this manuscript, and to K. Eriksson, M. Feldman, R. Johnstone and G. Roberts for valuable discussion.
