### Abstract

- Top of page
- Abstract
- 2. Mathematical analysis of convergence rates
- 3. Testing the predictions with a function-learning task
- 4. Results and discussion
- 5. Conclusions
- Acknowledgments
- References
- Appendix: Mathematical details

Information changes as it is passed from person to person, with this process of cultural transmission allowing the minds of individuals to shape the information that they transmit. We present mathematical models of cultural transmission which predict that the amount of information passed from person to person should affect the rate at which that information changes. We tested this prediction using a function-learning task, in which people learn a functional relationship between two variables by observing the values of those variables. We varied the total number of observations and the number of those observations that take unique values. We found an effect of the number of observations, with functions transmitted using fewer observations changing form more quickly. We did not find an effect of the number of unique observations, suggesting that noise in perception or memory may have affected learning.

An apocryphal story from World War I tells of a commander who conveyed an urgent message to his general by having each man speak it to his neighbor in the trench: “Send reinforcements. We are going to advance.” The general was confused to receive the request that finally reached his ears: “Send three and sixpence. We are going to a dance.” Information changes as it is passed from person to person, whether it is transmitted as a spoken message or by one person learning by observing the behavior of another. This process of cultural transmission provides the foundation for much of human knowledge: Most of the things we know we learn from other people, rather than by direct interaction with our physical environment. As a consequence, understanding the factors that affect cultural transmission is important not just for preventing errors in the chain of command, but for understanding how the knowledge maintained by human societies changes over time.

Two basic questions about cultural transmission concern its effects and its speed: how it changes the information being transmitted and how quickly this process takes place. The first question is relevant to understanding how cultural objects such as languages, religious concepts, and social conventions are formed. Anthropologists have argued that since cultural transmission depends on cognitive processes such as learning and memory, we should expect these cultural objects to come to reflect the structure of the human minds that are involved in transmitting them (Atran, 2001; Boyer, 1998; Sperber, 1996). Support for this hypothesis comes from recent theoretical analyses showing that transmission of information along a sequence of Bayesian agents changes the information into a form that is consistent with the inductive biases of those agents (Griffiths & Kalish, 2007; Kirby, Dowman, & Griffiths, 2007). Empirical results have borne out the predictions of this account, showing that as languages and concepts are transmitted along a sequence of human learners they take forms that are easier to learn (Griffiths, Christian, & Kalish, 2008; Kalish, Griffiths, & Lewandowsky, 2007; Kirby, Cornish, & Smith, 2008; Reali & Griffiths, 2009).

The second question, how quickly cultural transmission changes the information being transmitted, has been explored less extensively. This question has both practical and theoretical implications. On the practical side, identifying the factors that determine how quickly a message changes when it is passed from person to person has the potential to decrease misunderstandings of the kind experienced by the World War I general. On the theoretical side, knowing how quickly we expect languages and concepts to change over time would provide us with tools for answering questions such as whether enough time has passed for languages to have lost the influence of a common ancestor (Rafferty, Griffiths, & Klein, 2009) or how long ago two cultures diverged (Gray & Atkinson, 2003; Reali & Griffiths, 2010; Swadesh, 1952).

In this article, we analyze the impact of one factor that influences the rate at which cultural transmission has an effect: the amount of information transmitted between agents. We begin with a mathematical analysis of the simple case of transmission of a category defined on a single dimension. We then use a simulation to extend this analysis to the more complex case of transmission of a function, and we present an experiment exploring the predictions produced by this analysis.

### 2. Mathematical analysis of convergence rates

These results, together with the properties of Gaussian distributions, can be used to evaluate the distribution of the mean of the observations generated by agent *n*, conditioned on the mean of the sample used to initialize the sequence. In the Appendix, we show that this distribution is Gaussian, with a mean and variance that depend on *n* through a constant *c*. The mean and variance of this distribution thus converge geometrically to the mean and variance of the stationary distribution as *n* increases. The rate of convergence is set by *c*, being faster for smaller values of *c*. The value of *c* is determined by a ratio of two quantities in the model, being small when this ratio is large. Increasing the sample size, *m*, increases *c* and thus slows the rate of convergence. In other words, as the amount of information transmitted between agents increases, the rate at which cultural transmission changes that information decreases.
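As a rough illustration of this geometric convergence, the sketch below assumes one common version of the model: a Gaussian category with known variance `noise_var`, a Gaussian prior on its mean (`prior_mean`, `prior_var`), and agents who sample a hypothesis from their posterior before generating *m* observations. These names, and the closed form used for *c*, are assumptions made for this example; they are not taken from the text above and may differ from the notation in the Appendix.

```python
import random
import statistics


def convergence_constant(m, prior_var, noise_var):
    """Assumed form of c: m*prior_var / (m*prior_var + noise_var)."""
    return m * prior_var / (m * prior_var + noise_var)


def expected_mean(n, start_mean, prior_mean, c):
    """Expected sample mean after n agents: geometric decay toward prior_mean."""
    return prior_mean + (c ** n) * (start_mean - prior_mean)


def simulate_chain(n_agents, m, start_mean, prior_mean, prior_var, noise_var, rng):
    """Pass a sample mean along a chain of posterior-sampling agents (assumed model)."""
    c = convergence_constant(m, prior_var, noise_var)
    post_var = prior_var * noise_var / (m * prior_var + noise_var)
    mean = start_mean
    for _ in range(n_agents):
        # Posterior over the category mean, given the previous agent's sample mean.
        post_mean = c * mean + (1 - c) * prior_mean
        mu = rng.gauss(post_mean, post_var ** 0.5)
        # The mean of m fresh observations is Gaussian around the sampled mu.
        mean = rng.gauss(mu, (noise_var / m) ** 0.5)
    return mean
```

Under these assumptions, `convergence_constant(40, 1.0, 1.0)` is larger than `convergence_constant(4, 1.0, 1.0)`, so a chain that passes 40 observations per generation retains the initial sample mean for longer than one that passes 4.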

### 3. Testing the predictions with a function-learning task

The analysis presented in the previous section provides clear mathematical results, but it assumes a situation that is simpler than the tasks that have previously been used to test predictions about cultural transmission. We chose to conduct an empirical test of the prediction that the amount of information passed between people should determine the rate at which cultural transmission converges to an equilibrium using a function-learning task, based on previous research that has established that this is a case in which people have strong inductive biases that influence iterated learning (Kalish et al., 2007).

In function learning, each discrete trial involves presentation of a single magnitude of the stimulus variable (*x*), and the learner attempts to infer the underlying function relating *y* to *x* and produces an estimated magnitude in response. Each response is followed by presentation of the correct value of *y*. Values of all variables are typically presented in graphical form. Tests of interpolation and extrapolation with novel *x* values after numerous (*x*, *y*) training trials confirm that people can infer continuous functions from these discrete trials. Previous experiments in function learning suggest that people have an inductive bias favoring linear functions with positive slope: Initial responses are consistent with such functions (Busemeyer, Byun, DeLosh, & McDaniel, 1997), and they require the least training to learn (Brehmer, 1971, 1974; Busemeyer et al., 1997). Accordingly, Kalish, Lewandowsky, and Kruschke (2004) showed that a model that included such a bias could account for a variety of phenomena in human function learning. Finally, Kalish et al. (2007) showed that simulating cultural transmission of functions in the laboratory resulted in responses that converged on a linear function (with positive slope in 28 of 32 cases) irrespective of the information that was presented to the first generation.

The predictions for this case are similar to those seen in the mathematical analysis presented in the previous section. In general, as more information is passed from one person to another the rate of convergence decreases. Fig. 1 provides an example, produced from a simulation of a more complex Bayesian model described in detail in the Appendix. Increasing the amount of information each Bayesian agent provides to the next (again, expressed in terms of the variance of the posterior distribution) slows down convergence to the solution favored by the prior, in this case a linear function with a positive slope. However, learning functions introduces a factor that was not present in the simple one-dimensional case presented in the previous section: The amount of information a sample provides now depends both on the number of observations and the range of those observations.<sup>2</sup> As the range increases, the sample provides more information about the slope of the function. Intuitively, this is why it is a good idea to try to sample a wide range of values for the independent variable when conducting regression analyses. Since increasing the number of observations of *x* is likely to increase the range that those observations cover, the number of unique observations (that is, types rather than tokens) is also related to the rate of convergence.
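The link between range and information about the slope can be made concrete with ordinary least squares, where the sampling variance of the slope is the noise variance divided by the sum of squared deviations of *x* from its mean. The sketch below uses that textbook result rather than the Bayesian model from the Appendix; the function name, the example stimulus values, and the unit noise variance are all hypothetical.

```python
def slope_variance(xs, noise_var=1.0):
    """Sampling variance of the OLS slope: noise_var / sum((x - mean_x)^2)."""
    mean_x = sum(xs) / len(xs)
    sxx = sum((x - mean_x) ** 2 for x in xs)
    return noise_var / sxx


# Hypothetical stimulus sets: same number of points, different ranges.
narrow = [45, 48, 52, 55]
wide = [10, 40, 60, 90]

# Repeating each stimulus 10 times adds tokens but no new types.
repeated = narrow * 10
```

Here `slope_variance(wide)` is much smaller than `slope_variance(narrow)`, while `slope_variance(repeated)` shrinks only by the factor of 10 that comes from having 10 times as many observations.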

Our experiment had three conditions, corresponding to the three situations illustrated in Fig. 1. In all conditions, participants in the first generation were trained on a negative linear function, and subsequent generations of participants were trained on the responses of their immediate predecessors. The only difference between conditions was the training regime, which consisted of different combinations of the number of unique stimuli (types) and replications of each stimulus (tokens). In the 4 × 1 condition, training consisted of a single presentation of each of 4 unique stimuli. In the 4 × 10 condition, there were also 4 unique stimuli but each was presented 10 times. Finally, in the 40 × 1 condition, each of 40 unique stimuli was presented once. Training within each condition continued across generations until participants' responses had either flipped to a positive linear function (whereupon further changes are unlikely; Kalish et al., 2007) or a maximum of 11 generations had been trained within a condition.

As shown in the simulation results presented in Fig. 1, we expected the number of observations provided to learners to affect the rate of convergence of cultural transmission, with participants in the 4 × 1 condition converging faster than either the 4 × 10 condition or the 40 × 1 condition. Fig. 1 (b) also shows that it is possible for the number of unique observations to affect the rate of convergence, with a difference between the 4 × 10 and 40 × 1 conditions. However, this effect is weaker than the effect of the number of observations in two ways. First, the effect size is smaller, with the confidence intervals on the slopes overlapping. Second, the effect disappears if the observations are perceived with noise, since this effectively increases the range of the observations. Fig. 1 (c) shows that when *x* and *y* have noise associated with them (perhaps as a result of errors in perception or memory), there is little difference in the rate of convergence across conditions (see the Appendix for details). Consequently, whether we see an effect of the number of unique observations may depend on whether people can identify those observations as actually being unique.

#### 3.1. Method

##### 3.1.1. Participants

The participants were members of the campus community at the University of Western Australia (*N* = 56) and University of Louisiana at Lafayette (*N* = 79). Participants received remuneration ($10/h) or course credit for participation in the single experimental session.

Participants were randomly assigned to one of three experimental conditions and to a “family” within a condition (subject to the termination constraints below), with cultural transmission taking place across the generations of the family. There were five families in each condition, and participants were no longer added to a family after the 11th generation or after the responses of the latest participant clearly conformed to a positive linear function (assessed by the slope of the test responses), whichever came first.

##### 3.1.2. Stimuli and apparatus

A Windows computer running a Matlab program designed using the Psychophysics Toolbox (Brainard, 1997; Pelli, 1997) was used to present stimuli and to record responses.

On each trial, a gray filled horizontal bar (approximately 2 cm high) was presented at the top-left of the screen. The upper left corner of the bar was approximately 4 cm from the top and 4 cm from the left of the edge of the screen, and the horizontal extent of the bar indicated the magnitude (*x*) of the stimulus. No tick marks or scales were present.

The participant entered a response magnitude by clicking on a vertically oriented slider, which consisted of a thin (0.8 cm) unfilled rectangle that abutted the right side of the screen and was labeled “0” and “max,” respectively, at the bottom and top ends. No other scale marks were present. The mouse was originally positioned to the left of the center of the slider, and people indicated their response by clicking within the slider. Upon clicking, a black horizontal bar appeared at that location within the slider and a confirmation button (labeled “OK”) appeared to its left. Participants could adjust their response repeatedly and clicked the “OK” button to proceed.

During the training phase, a response was immediately followed by feedback, which consisted of the word “correct” printed within a frame connected to the slider with a line at the vertical location that corresponded to the correct target value *y*. The feedback remained visible for a minimum of 1.6 s, with the duration being extended in linear proportion to the response error (i.e., the difference between the response magnitude and the true magnitude) to encourage accurate responding.

##### 3.1.3. Procedure

The experiment involved a training phase followed by a test phase. The stimuli used in the training phase varied by condition: There were 4 unique stimuli in the 4 × 1 and 4 × 10 conditions (repeated 10 times in the latter) and 40 unique stimuli in the 40 × 1 condition. For learners who formed the first generation of any family, all unique stimuli were randomly sampled from the set of stimulus magnitudes (*x*) in [1,100]. Target magnitudes (*y*) were assigned according to the negative linear function *y* = 100 − *x*, allowing us to monitor the rate at which responses moved away from this function and toward a positive linear function.

The test phase always involved 40 test trials, irrespective of condition. In the 4 × 1 and 4 × 10 conditions, the test phase involved all four unique training stimuli plus 36 new stimuli with *x* values sampled uniformly from [1,100]. In the 40 × 1 condition, 20 of the training stimuli were used as test stimuli, together with 20 new stimuli. Test stimuli were presented in random order.

For generations following the first, training stimuli were sampled from the test phase responses of the participant in the previous generation of that family. For the 40 × 1 condition, the training set for a given generation was simply the test set of the previous generation. For the other two conditions, the training set consisted of two of the test stimuli that had *x* values drawn from the training set of the previous participant and two of the test stimuli that had new *x* values. In all cases, the magnitude estimates provided during the test phase by the previous participant became the new target values *y*.

The sequence of training trials consisted of a random permutation of all unique stimuli and their replications (i.e., 4 × 1 = 4, 4 × 10 = 40, and 40 × 1 = 40 training trials). Training trials were separated by a 1 s blank screen. Test trials were identical to training trials, except that no feedback was presented after the response was entered. Participants were informed about this change at the outset.
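The construction of training sets described in this procedure can be sketched as follows. This is an illustration only: the function names are hypothetical, stimulus magnitudes are assumed to be integers, and the old and new items passed to the next generation in the four-stimulus conditions are assumed to be picked at random, which the text does not specify.

```python
import random


def first_generation(n_unique, rng):
    """Sample unique x magnitudes from 1..100 and assign y = 100 - x."""
    xs = rng.sample(range(1, 101), n_unique)  # integer magnitudes are an assumption
    return [(x, 100 - x) for x in xs]


def training_trials(pairs, replications, rng):
    """Random permutation of all unique stimuli and their replications."""
    trials = [pair for pair in pairs for _ in range(replications)]
    rng.shuffle(trials)
    return trials


def next_training_set(test_responses, previous_xs, rng):
    """Four-stimulus conditions: two test items with old x values, two with new ones.

    test_responses holds (x, response) pairs from the previous participant's
    test phase; the responses become the new target values y.
    """
    old = [(x, y) for x, y in test_responses if x in previous_xs]
    new = [(x, y) for x, y in test_responses if x not in previous_xs]
    return rng.sample(old, 2) + rng.sample(new, 2)  # random choice is an assumption
```

For example, `training_trials(first_generation(4, rng), 10, rng)` yields the 40 trials of the 4 × 10 condition for a first-generation learner, given some `rng = random.Random()`.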

The experiment was preceded by four practice trials during which feedback was presented. All practice trials involved the pairing (*x* = 50, *y* = 50) for all conditions. The constant values prevented possible biasing toward any particular function relating *x* and *y* during training.

### 4. Results and discussion

Owing to the brevity of training in the 4 × 1 condition, analysis focused on responses from the test phase. Fig. 2 shows the responses for each participant in all three conditions. For each condition, one row of panels corresponds to a family, whereas columns correspond to generations. Thus, the participants in the left-most column all received stimuli that were sampled—using the regime determined by the condition—from the same negative linear function. All remaining participants in each condition were trained on stimuli that were contingent upon the responses of the preceding generation.

Of greatest interest in Fig. 2 is the evolution of responses across intergenerational transmission, from left to right across columns in each row. It is immediately apparent that in the 4 × 10 condition (panel (b)) and in the 40 × 1 condition (panel (c)), there were three families who failed to converge across 11 generations; that is, the last descendant in each family continued to respond according to the negative linear function that was used for the first generation. The remaining two families in each condition converged after eight and four generations (40 × 1), and after seven and seven generations (4 × 10). In striking contrast, *all* families converged in the 4 × 1 condition, namely after ten, five, one, six, and three generations (panel (a)). The rapid switches in slope across successive generations are consistent with the results of previous iterated learning experiments using a function-learning task (Kalish et al., 2007) and with a multimodal prior distribution on functions rather than the smooth prior assumed in the Bayesian linear regression model we used to motivate our experiment (Griffiths, Lucas, Williams, & Kalish, 2009).

A summary of the data is provided in Fig. 3, which shows the cross-generational evolution of average slopes (i.e., best-fitting slope estimates for each subject averaged across generational peers in all families) in the three conditions.<sup>3</sup> The figure makes two important points. First, it clarifies that in all conditions there was movement away from the initial function to the positive linear alternative, consistent with the results of Kalish et al. (2007). Second, the figure highlights that convergence was faster in the 4 × 1 condition than in the other two, which in turn did not differ much from each other.

For statistical confirmation, we first fit a regression model to the data in Fig. 3 that had separate slopes and intercepts for each condition. This model fit very well, and the loss of fit was negligible when the slopes for the 4 × 10 and 40 × 1 conditions were constrained to be equal, *F*(1,27) = .012, *p* > .10. When the slope for the 4 × 1 condition was also constrained to be identical, the further loss of fit was considerable, *F*(1,28) = 10.43, *p* < .005, confirming the obvious pattern in the figure and the faster convergence of the 4 × 1 condition (slope 0.145) than the other two (joint slope 0.089).
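These comparisons have the form of the standard *F*-test for nested regression models. As a sketch in notation introduced here for illustration (RSS for the residual sum of squares, *p* for the number of fitted parameters, *N* for the number of data points, with subscripts *f* and *r* for the fuller and the more restricted model; none of these symbols come from the original analysis):

```latex
F\bigl(p_f - p_r,\; N - p_f\bigr)
  = \frac{(\mathrm{RSS}_r - \mathrm{RSS}_f) / (p_f - p_r)}
         {\mathrm{RSS}_f / (N - p_f)}
```

The reported degrees of freedom are consistent with *N* = 33 data points: 33 − 6 = 27 for the model with three slopes and three intercepts, and 33 − 5 = 28 once the 4 × 10 and 40 × 1 conditions share a slope. Each comparison removes one parameter, which gives the numerator degrees of freedom of 1.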

We also performed a Bayesian analysis in which the point at which the slope switched to a positive value was treated as a Poisson random variable with a different rate parameter for each condition.<sup>4</sup> We used a generic conjugate prior, an exponential distribution with unit mean, to obtain posterior distributions on the rate of the Poisson that were Gamma(25,6), Gamma(48,6), and Gamma(45,6) for the 4 × 1, 4 × 10, and 40 × 1 conditions, respectively. This results in posterior probabilities of .995 and .992 that the rate was higher in the 4 × 1 condition than the 4 × 10 and 40 × 1 conditions, and .582 that the rate was higher in 4 × 10 than 40 × 1. The Bayesian analysis thus supports the same conclusions as the regression model.
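A minimal sketch of this kind of Gamma-Poisson analysis is given below. It assumes the Gamma distributions are parameterized by shape and rate, treats the unit-mean exponential prior as Gamma(1, 1), and estimates comparison probabilities by Monte Carlo sampling. The function names are hypothetical, and the sampling estimates need not reproduce the reported probabilities exactly. On the reading assumed here, the Poisson parameter is the expected switch generation, so a lower value corresponds to an earlier switch.

```python
import random


def conjugate_update(counts, prior_shape=1.0, prior_rate=1.0):
    """Gamma(shape, rate) prior plus Poisson counts gives a Gamma posterior."""
    return prior_shape + sum(counts), prior_rate + len(counts)


def prob_lower(post_a, post_b, n_samples=100_000, seed=0):
    """Monte Carlo estimate of P(rate_a < rate_b) for Gamma(shape, rate) posteriors."""
    rng = random.Random(seed)
    hits = 0
    for _ in range(n_samples):
        # random.gammavariate takes (shape, scale); scale is 1 / rate.
        a = rng.gammavariate(post_a[0], 1.0 / post_a[1])
        b = rng.gammavariate(post_b[0], 1.0 / post_b[1])
        hits += a < b
    return hits / n_samples
```

With five families per condition, `conjugate_update` gives a posterior rate parameter of 6, matching the second argument of each reported Gamma distribution. A call such as `prob_lower((25, 6), (48, 6))` then estimates the probability that the 4 × 1 switch point is earlier than the 4 × 10 switch point.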

### 5. Conclusions

Mathematical analyses of cultural transmission by Bayesian agents predict that the rate at which information is changed by cultural transmission is inversely related to the amount of information that is transmitted. Our results partially bore out these predictions. In confirmation of predictions, convergence to a function that reflected people's inductive biases was faster when the function was transmitted using fewer observations (the 4 × 1 condition). The number of unique observations within the sample (40 × 1 vs. 4 × 10) did not have a statistically significant effect. These results are consistent with the conclusion that the amount of information transmitted between learners affects the rate of convergence of cultural transmission, but they reinforce the fact that the information provided by a sample depends on how people perceive it.

The lack of the predicted effect of the number of unique observations is an interesting finding that warrants further investigation. One possibility is that the sample size used in our study was simply not large enough to find this effect. If so, our results suggest that the effect of the number of unique observations must be weaker than the effect of sample size, which is consistent with the predictions produced by the simple model considered in the introduction (see Fig. 1 (b)). A second possibility is that human learners inserted sufficient noise into the observations to mask the fact that the number of unique observations was small. If perceptual or memory error was sufficient to “jitter” these observations into a sample that more closely resembled that seen in the 40 × 1 condition, we should not expect to see a difference between the conditions. As shown in Fig. 1 (c), such noise can remove the difference between the 4 × 10 and 40 × 1 conditions.

Our results have implications for both practical and theoretical questions related to cultural transmission. On the practical side, they illustrate how the amount of information passed between agents plays a crucial role in the ultimate fidelity of transmission. Our advice to the apocryphal World War I commander would be to tell his troops to repeat the message several times (i.e., instantiate a 4 × 10 rather than 4 × 1 condition), thereby increasing the probability that it would be successfully transmitted. On the theoretical side, the relationship between sample size and rate of convergence has the potential to deepen our understanding of which aspects of languages we might expect to change more rapidly as they are passed from generation to generation, providing a link to analyses that examine the relationship between the frequencies of linguistic constructions and the rate at which those constructions change over time (e.g., Reali & Griffiths, 2010).