The Cultural Evolution of Structured Languages in an Open‐Ended, Continuous World

Abstract
Language maps signals onto meanings through the use of two distinct types of structure. First, the space of meanings is discretized into categories that are shared by all users of the language. Second, the signals employed by the language are compositional: The meaning of the whole is a function of its parts and the way in which those parts are combined. In three iterated learning experiments using a vast, continuous, open‐ended meaning space, we explore the conditions under which both structured categories and structured signals emerge ex nihilo. While previous experiments have been limited to either categorical structure in meanings or compositional structure in signals, these experiments demonstrate that when the meaning space lacks clear preexisting boundaries, more subtle morphological structure that lacks straightforward compositionality, as found in natural languages, may evolve as a solution to joint pressures from learning and communication.


Geometric dissimilarity measure
We selected four features that participants could potentially use to conceptualize and communicate about the triangles (see Table 1). For each feature, the distance was computed between every pair of triangles in the static set, yielding four distance matrices. The matrices were converted to ranks to remove the distributional effects peculiar to each metric, summed together, and then normalized to the interval [0, 1]. A pair of triangles that are similar in terms of all four features will have a score close to 0 (with 0 representing identity), while a pair of triangles that are dissimilar in terms of all four features will have a score close to 1. Fig. 1 shows the most similar and most dissimilar pairs of triangle stimuli based on this geometric measure (top) and, for comparison, based on the ratings from the naïve raters (bottom).
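The rank-sum-normalize procedure described above can be sketched as follows. This is a minimal illustration, not the authors' implementation: the function names are hypothetical, ties are assumed to receive average ranks, and the normalization is assumed to divide by the largest summed rank so that 0 stays reserved for identity. The text does not specify either choice.

```python
def average_ranks(values):
    """Return 1-based ranks; tied values share the mean of their positions.

    ASSUMPTION: the source does not say how ties were ranked.
    """
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    start = 0
    while start < len(order):
        end = start
        # Extend the run while the next sorted value ties with the current one.
        while end + 1 < len(order) and values[order[end + 1]] == values[order[start]]:
            end += 1
        mean_rank = (start + end) / 2 + 1
        for k in range(start, end + 1):
            ranks[order[k]] = mean_rank
        start = end + 1
    return ranks


def combine_feature_distances(feature_vectors):
    """Rank each feature's pairwise distances, sum the ranks, scale to (0, 1].

    Each element of feature_vectors lists one feature's distances for the
    same ordered set of triangle pairs (the upper triangle of its matrix).
    ASSUMPTION: normalization divides by the largest summed rank.
    """
    ranked = [average_ranks(v) for v in feature_vectors]
    summed = [sum(column) for column in zip(*ranked)]
    top = max(summed)
    return [s / top for s in summed]
```

For example, `combine_feature_distances([[0.1, 0.5, 0.9], [100, 300, 200]])` puts two features measured on very different scales onto a common footing before they are summed. The n = 1128 reported for the Mantel tests corresponds to the number of unordered pairs among 48 stimuli (48 × 47 / 2).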
There was a strong correlation between the scores produced from this geometric approach and the mean normalized dissimilarity ratings provided by the naïve raters (r = .49, n = 1128, p < .001; Mantel test). The results for structure using this measure are given in Fig. 2. The general trends are congruent with the equivalent results produced using the ratings from naïve raters. However, the structure scores tend to be lower under the geometric approach, suggesting that it does not fully capture the way in which the triangles are perceived. For this reason, we consider the structure results based on human ratings to be canonical and present this alternative method in support of our conclusions.

Table 1
Location: Euclidean distance between centroids; euclidDist(cent(X),cent(Y))
Orientation: Shortest angular distance by orienting spot (a); min(angCW(X,Y),angCCW(X,Y))
Shape: Absolute difference in equilateralness ratio (b); abs(equilat(X)-equilat(Y))
Size: Absolute difference in centroid size (c); abs(centSize(X)-centSize(Y))
a Orientation is defined as the angular coordinate of the orienting spot when the triangle is centered on the origin. The shortest angular distance between two triangles is the shorter of the clockwise or counterclockwise angular distances.
b See Equation 2 in the main paper.
c Square root of the sum of squared distances from the centroid of the triangle to its vertices.
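Three of the Table 1 metrics can be sketched directly from their definitions. The sketch below rests on assumptions not stated in the text: a triangle is represented as a list of three (x, y) vertices, the orienting spot is taken to sit at the first vertex, and all function names are hypothetical. The shape metric is omitted because it depends on Equation 2 of the main paper, which is not reproduced here.

```python
import math


def cent(tri):
    """Centroid of a triangle given as three (x, y) vertices."""
    xs, ys = zip(*tri)
    return (sum(xs) / 3, sum(ys) / 3)


def location_dist(x, y):
    """Euclidean distance between the centroids of two triangles."""
    return math.dist(cent(x), cent(y))


def orientation(tri):
    """Angular coordinate of the orienting spot relative to the centroid.

    ASSUMPTION: the orienting spot is located at the first vertex.
    """
    cx, cy = cent(tri)
    sx, sy = tri[0]
    return math.atan2(sy - cy, sx - cx)


def orientation_dist(x, y):
    """Shorter of the clockwise and counterclockwise angular distances."""
    d = abs(orientation(x) - orientation(y)) % (2 * math.pi)
    return min(d, 2 * math.pi - d)


def cent_size(tri):
    """Square root of the summed squared centroid-to-vertex distances."""
    c = cent(tri)
    return math.sqrt(sum(math.dist(v, c) ** 2 for v in tri))


def size_dist(x, y):
    """Absolute difference in centroid size."""
    return abs(cent_size(x) - cent_size(y))
```

Taking the minimum of `d` and `2 * math.pi - d` keeps the orientation distance within [0, π], so two triangles whose spots point at 170° and −170° are treated as 20° apart, not 340°.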

Encoded meaning dimensions
To determine the features encoded by a particular language, we correlated the pairwise string dissimilarity scores with all combinations of the four geometric features described in Table 1 to see which combination would yield the strongest correlation. For four features, there are $2^4 - 1 = 15$ nonempty combinations to consider, giving us a typology of 15 types of language that could potentially arise. These language types are listed in Table 2 along with reference numbers; for example, a Type 13 language encodes location, shape, and size.
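The typology can be enumerated mechanically. The sketch below is illustrative only: Table 2 is not reproduced here, so the numbering scheme (smaller combinations first, features in the order location, orientation, shape, size) is an assumption, though it agrees with the types named in the text (Type 3, Type 10, and Type 13). The function name is hypothetical.

```python
from itertools import combinations

# ASSUMPTION: this feature order and the size-then-lexicographic numbering
# are inferred from the type numbers mentioned in the text, not from Table 2.
FEATURES = ["location", "orientation", "shape", "size"]


def language_types(features=FEATURES):
    """Map type numbers 1..(2^n - 1) to every nonempty feature combination."""
    types = {}
    number = 1
    for k in range(1, len(features) + 1):
        for combo in combinations(features, k):
            types[number] = combo
            number += 1
    return types
```

Under this numbering, `language_types()[13]` is `("location", "shape", "size")`, matching the example in the paragraph above.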
The results of this analysis are given in Table 3; for all generations, the table gives the type number for the combination of features that resulted in the strongest correlation, along with the Pearson correlation in parentheses. The most common types of language to emerge across all experiments were Type 3 (shape; 52% of languages) and Type 10 (shape and size; 17% of languages). The language types with the highest average correlation (across all emergent languages) were Type 3 (shape; mean r = .19) and Type 10 (shape and size; mean r = .16). These results reveal a clear bias toward encoding the shape and size features of the triangles. This analysis was also performed with the dissimilarity ratings from the naïve raters; the strongest correlations were also with Type 3 (shape; r = .71, n = 1128, p < .001; Mantel test) and Type 10 (shape and size; r = .69, n = 1128, p < .001; Mantel test), suggesting that the naïve raters were also rating the dissimilarity between triangles based primarily on the shape and size features. This is supported by the fact that the dimensions of the MDS solution corresponded approximately to shape and size.

Note. Each cell gives the type number (see Table 2) for the combination of features that resulted in the strongest correlation; the correlation coefficient is given in parentheses.
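The Mantel tests reported in this section compare two distance matrices over the same items. A generic permutation version can be sketched as below. The number of permutations, the one-tailed p-value, and the function names are all assumptions; the text does not state how the authors configured the test.

```python
import math
import random


def pearson(a, b):
    """Pearson correlation between two equal-length sequences."""
    n = len(a)
    mean_a, mean_b = sum(a) / n, sum(b) / n
    cov = sum((x - mean_a) * (y - mean_b) for x, y in zip(a, b))
    var_a = sum((x - mean_a) ** 2 for x in a)
    var_b = sum((y - mean_b) ** 2 for y in b)
    return cov / math.sqrt(var_a * var_b)


def upper_triangle(matrix, order):
    """Pairwise entries above the diagonal, with items relabeled by `order`."""
    n = len(order)
    return [matrix[order[i]][order[j]] for i in range(n) for j in range(i + 1, n)]


def mantel(m1, m2, permutations=999, seed=0):
    """Return (r, p) for two square distance matrices over the same items.

    ASSUMPTIONS: 999 permutations and a one-tailed (greater-than) p-value.
    Rows and columns of m2 are permuted together, which preserves the
    dependence structure among pairwise distances.
    """
    rng = random.Random(seed)
    ids = list(range(len(m1)))
    v1 = upper_triangle(m1, ids)
    observed = pearson(v1, upper_triangle(m2, ids))
    hits = 0
    for _ in range(permutations):
        shuffled = ids[:]
        rng.shuffle(shuffled)
        if pearson(v1, upper_triangle(m2, shuffled)) >= observed:
            hits += 1
    # Count the observed statistic itself so that p is never exactly 0.
    return observed, (hits + 1) / (permutations + 1)
```

Permuting item labels, instead of shuffling the pairwise values independently, is what distinguishes a Mantel test from an ordinary correlation test: the 1128 pairwise distances among 48 items are not independent observations.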