Social complexity and the fractal structure of group size in primate social evolution

Compared to most other mammals and birds, anthropoid primates have unusually complex societies characterised by bonded social groups. Among primates, this effect is encapsulated in the social brain hypothesis: the robust correlation between various indices of social complexity (social group size, grooming clique size, tactical behaviour, coalition formation) and brain size. Hitherto, this has always been interpreted as a simple, unitary relationship. Using data for five different indices of brain volume from four independent brain databases, we show that the distribution of group size plotted against brain size is best described as a set of four distinct, very narrowly defined grades which are unrelated to phylogeny. The allocation of genera to these grades is highly consistent across the different data sets and brain indices. We show that these grades correspond to the progressive evolution of bonded social groups. In addition, we show, for those species that live in multilevel social systems, that the typical sizes of the different grouping levels in each case coincide with different grades. This suggests that the grades correspond to demographic attractors that are especially stable. Using five different cognitive indices, we show that the grades correlate with increasing social cognitive skills, suggesting that the cognitive demands of managing group cohesion increase progressively across grades. We argue that the grades themselves represent glass ceilings on animals' capacity to maintain social and spatial coherence during foraging and that, in order to evolve more highly bonded groups, species have to be able to invest in costly forms of cognition.


I. INTRODUCTION
Anthropoid primates differ from other mammals and birds in the extent to which they live in stable (bonded) social groups (Shultz & Dunbar, 2010a). With the exception of a few (mainly species-poor) orders (elephantids, tylopods, equids, delphinids), in most birds and mammals these kinds of stable groups are limited to monogamous pair-bonds (Shultz & Dunbar, 2010b; Pérez-Barbería, Shultz & Dunbar, 2007b). Bonded social groups are characterised by dyadic relationships that are emotionally intense and involve a great deal of social grooming and constant visual monitoring of grooming partners during foraging (Dunbar & Shultz, 2010; Massen, Sterck & de Vos, 2010). These dyadic relationships are also embedded within complex networks of relationships in which third- and fourth-party relationships become important and influence the outcome of dyadic interactions (e.g. Datta, 1983). Managing multi-party relationships is a complex process requiring sophisticated cognitive and social skills, such as being able to predict intentions and knowing when and how to de-escalate situations before they get out of hand and risk triggering group fragmentation.
In primates [but not in other mammals or birds (Shultz & Dunbar, 2010a; Pérez-Barbería et al., 2007b)], brain size, and specifically neocortex volume, is a robust predictor of both social group size and many other aspects of social behaviour (including grooming clique size, coalitionary behaviour, mating strategies, tactical deception, and network complexity). Moreover, there is now considerable neuroimaging evidence from humans (Powell et al., 2010; Lewis et al., 2011; Kanai et al., 2012; Kwak et al., 2018; Noonan et al., 2018; Kiesow et al., 2020; Spreng et al., 2020) indicating that individual differences in personal social network size correlate with the volume of the brain's default-mode neural network. This network makes up a substantial component of the neocortex, linking the prefrontal cortex with units in the parietal and temporal lobes, with additional connections to the limbic system. Analogous findings have been reported for Old World monkeys (Sallet et al., 2011; Meguerditchian et al., 2020). Together, these results suggest that the Social Brain Hypothesis (Dunbar, 1998) holds at the individual within-species level as well as at the species level.
Although the species-level relationship between group size and brain size has always been interpreted as a simple linear regression, there is reason to believe that, in primates at least, the relationship is more complex. A number of studies have suggested that the social brain relationship might consist of several distinct grades (sub-graphs whose slopes are the same but whose intercepts differ) (Dunbar, 1993; Lehmann, Korstjens & Dunbar, 2007). In addition, species' mean group sizes in primates are not normally distributed, as should be the case if the relationship were a simple linear one. Instead, the distribution appears to consist of a set of overlapping but distinct Poisson distributions, each of which forms a natural attractor making up a fractal series with means at approximately 2, 5, 15, 30 and 50 (Dunbar, Mac Carron & Shultz, 2018b). More importantly, the distribution of data points at the species level is bivariate uniform rather than bivariate normal, as should be the case for a simple causal relationship. Such a distribution is usually a signal that a data set contains grades. Grades that differ in brain size imply that some species require more cognitive processing power to maintain group cohesion than other species do, which in turn suggests that their sociality depends on different (perhaps more sophisticated) behavioural and cognitive mechanisms.
Complex sociality depends, in particular, on two forms of high-level cognition. The first, generally known as executive function, includes causal reasoning, analogical reasoning, one-trial learning and the capacity to inhibit prepotent actions (in effect, self-control) (Dunbar, 2012). These executive functions are all associated with the brain's frontal pole (Brodmann Area 10), a prefrontal brain region found only in anthropoid primates (Passingham & Wise, 2012). Brodmann Area 10 is not present in prosimians or, among the anthropoids, in the marmosets and tamarins (Passingham & Wise, 2012). In humans, the capacity to inhibit prepotent actions is strongly predictive of stable dyadic relationships (Moffitt et al., 2001; Pearce et al., 2019). Although inhibition has sometimes been used as an index of foraging skills (MacLean et al., 2014; Stevens, 2014), it is in fact essential for maintaining stable, cohesive social groups. In herding mammals, a combination of within-group competition (Clutton-Brock & Janson, 2012) and unsynchronised activity schedules (Ruckstuhl & Kokko, 2002; Ruckstuhl & Neuhaus, 2002; Calhim, Shi & Dunbar, 2006; Pérez-Barbería, Robertson & Gordon, 2005; Pérez-Barbería et al., 2007a; Dunbar & Shi, 2008; King & Cowlishaw, 2009) inevitably results in groups fragmenting over time, giving rise to a casual fission-fusion form of sociality. For groups to remain stable, members have to be willing to accommodate the needs of other group members by deferring foraging if others want to rest (or vice versa) (Conradt & Roper, 2000; Pérez-Barbería et al., 2007a). In short, animals need sophisticated sociocognitive skills and strong social bonds in order to prevent groups drifting apart (Dunbar, 2018), with the capacity to inhibit prepotent actions being crucial to this.
The second, more explicitly social, form of cognition is mentalising (the ability to understand others' intentions and perspectives). In humans, mentalising competencies are correlated with both social skills (Launay et al., 2015) and personal social network size (Stiller & Dunbar, 2007; Lewis et al., 2011; Powell et al., 2012). Although high-level mentalising (formal theory of mind and above) is probably unique to humans, experimental evidence suggests that both great apes (Krupenye et al., 2016; Kano et al., 2019) and some monkeys (Devaine et al., 2017; Hayashi et al., 2020) exhibit significant competencies in this respect. Mentalising is specifically associated with the default-mode neural network, but especially with the volume of the prefrontal cortex (Carrington & Bailey, 2009; van Overwalle, 2009; Powell et al., 2010, 2014; Mars et al., 2012). The default-mode network is common to both humans and Old World monkeys, and in both cases is specialised for managing social relationships (Mars et al., 2012, 2016; Sallet et al., 2013; Sliwa & Freiwald, 2017; Amiez et al., 2019). Indeed, it is significant that, in between-species analyses, the relationship between group size and brain volume is strongest when subcortical brain regions and those cortical regions associated with routine perceptual input processing (e.g. the primary visual cortex in the occipital lobe) are excluded (Dunbar, 1992a; Joffe & Dunbar, 1997).
One aspect of social complexity that has attracted attention is the fact that some species live in multilevel societies: these social systems have a layered structure with a small basal social unit that is stable through time, with higher-level groupings built up out of increasingly unstable clusterings of these basal units. Human hunter-gatherer social communities, for example, typically consist of bands of 35-50 individuals, which cluster into communities of around 150 individuals, with these in turn gathered into successively higher-level groupings at 500 (mega-bands) and 1500 (tribes) individuals (Zhou et al., 2005;Hamilton et al., 2007;Dunbar, 2020). Similarly, the basal social unit of hamadryas (Papio hamadryas) and gelada baboons (Theropithecus gelada) is a single male reproductive unit of 5-10 individuals, with these being clustered successively into clans and bands (Hill, Bentley & Dunbar, 2008;Mac Carron & Dunbar, 2016). Other species are known to form large, unstable groupings that may consist of more stable sub-groups: examples include Papio papio (Guinea baboon: Dunbar & Nathan, 1972;Patzelt et al., 2014), Mandrillus (drills and mandrills) (Gartlan, 1970;Hoshino et al., 1984;Abernethy, White & Wickings, 2002;Brockmeyer et al., 2015), Nasalis (Yeager, 1992) and Rhinopithecus (Kirkpatrick et al., 1998). This fractal pattern is likely to be cognitively challenging if the higher-level groupings are to have any degree of temporal cohesion and stability (Amici, Aureli & Call, 2008).
Taken together, these observations invite a reconsideration of the group/brain size relationship in primates in three separate respects. One is whether the relationship in fact consists of a single, unitary grade or a set of grades (and, if so, how many). Assuming that the data support the latter option, then two further questions arise. One is whether this pattern is also reflected in the separate groupings of those species that have multilevel social systems: in other words, are these within-taxon grouping levels constrained by the same grades (suggesting that they reflect the same bonding processes as underpin the between-taxon differences). And, second, do the grades correspond to phase shifts in cognitive abilities that reflect increasing neurological processing demands?

II. METHODS
(1) Data
The brain data used herein derive from four independent data sets that have employed different techniques for estimating brain region volumes. From these, we extract five different brain volume indices. Being mindful of Rilling's (2006) caution against mixing data obtained by different methods, each index is based on a single brain data set.
(1) Stephan, Frahm & Baron's (1981) sample of 39 species (35 genera) provides estimates of whole brain and brain region volumes (in particular neocortex and, for 31 species, the volume of the primary visual cortex, area V1, in the occipital lobe) based on histological analysis. Estimates include only neural matter, and exclude the spaces occupied by the meninges and ventricles. The brain size estimates of individual specimens were corrected for body size relative to the species' mean body mass. To their original data set we add data for orang utans (Pongo pygmaeus) provided by Zilles & Rehkämper (1988) from the same brain collection using the same techniques. We use these data to calculate two separate indices: neocortex ratio (the volume of the whole neocortex divided by the volume of the rest of the brain, i.e. midbrain plus brain stem) and the absolute volume of the neocortex excluding the primary visual system, V1 (non-V1 neocortex: neocortex volume less the volume of area V1). Neocortex ratio correlates better than most other brain volumes with a number of social and cognitive indices (Shultz & Dunbar, 2010c), probably because it measures relative investment in high-level cognition. Non-V1 neocortex volume correlates better with group size than other brain indices, including neocortex ratio, because it excludes a major brain region involved only in sensory processing (Joffe & Dunbar, 1997) that plays no direct computational role in social behaviour (Nummenmaa et al., 2015, 2016).
(2) Rilling & Insel's (1999) sample of 21 species (21 genera) provides data on total brain volume and frontal lobe grey matter volume using magnetic resonance imaging (MRI). We use only the data for frontal lobe grey matter (FRG) volume.
(3) Isler et al.'s (2008) sample of endocranial volume (ECV) for 126 species (54 genera). ECV includes not only the neocortex and other cortical regions such as the limbic system, but also the midbrain and brain stem (both of which are mainly associated with the management of somatic tissue rather than high-level cognition) as well as the volumes of the meninges, the ventricular spaces within the brain and the space between the outer surface of the brain and the inner surface of the brain case. It thus includes significant volumetric components that have little or nothing to do with cognition of any kind.
(4) Navarette et al.'s (2018) recent neuroimaging data set provides data on total brain volumes and the volume of the neocortex for 35 species (20 genera). We use only the data on neocortex volume (henceforth 'Navarette neocortex volume').
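As a minimal sketch of how the two indices derived from the Stephan et al. data might be computed, the following Python fragment takes the "rest of the brain" to be whole-brain volume minus neocortex volume, which is an assumption of this illustration. The function names and the volumes are hypothetical and are not taken from any of the data sets.

```python
def neocortex_ratio(neocortex_vol, whole_brain_vol):
    """Neocortex volume divided by the volume of the rest of the brain.

    Assumption of this sketch: rest of brain = whole brain - neocortex.
    """
    rest_of_brain = whole_brain_vol - neocortex_vol
    if rest_of_brain <= 0:
        raise ValueError("whole-brain volume must exceed neocortex volume")
    return neocortex_vol / rest_of_brain


def non_v1_neocortex(neocortex_vol, v1_vol):
    """Neocortex volume less the volume of primary visual cortex (area V1)."""
    return neocortex_vol - v1_vol


# Hypothetical volumes (arbitrary units), for illustration only.
ratio = neocortex_ratio(neocortex_vol=60.0, whole_brain_vol=100.0)
non_v1 = non_v1_neocortex(neocortex_vol=60.0, v1_vol=5.0)
print(ratio, non_v1)
```

Both indices are built from a single data set, in keeping with the caution against mixing volumes obtained by different methods.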
Many comparative analyses control for body mass, either by residualising brain volume against body mass or by including body mass as a covariate in the regression analysis. Doing so results in a completely different question being asked, namely why do some species have larger brains than would be expected for their body size, given the overall average relationship across primates? We are not concerned with this question (which is largely explained by energetic constraints and nutritional throughput: Dunbar, 1998), but rather with a question about the consequences of having an absolutely larger brain. Those who have taken the trouble to read Jerison (1977) will be aware that, when he introduced the idea of controlling for body mass, he did so not to control for body mass as such but in order to remove that component of the brain that was dedicated to managing somatic tissue, so as to estimate that part of the brain mainly involved in cognition (in effect, the neocortex). Four of our five indices already do that because they only consider the neocortex. More importantly, in primates at least, cognitive abilities correlate with absolute brain volume and not relative brain volume (Deaner et al., 2007;Shultz & Dunbar, 2010b). We therefore use absolute volumes in all analyses. In ignoring body mass, we follow conventional practice in the neurosciences, where absolute volumes are viewed as appropriate proxies for cognitive processing capacity, and body mass is not considered a relevant variable. We use only one relative measure of brain size, neocortex ratio (the ratio of the neocortex to the rest of the brain): (i) because it has been used in a number of published analyses, where (ii) it has often been found to be a better predictor of behavioural and cognitive functions than absolute volumes and (iii) because it probably indexes relative investment in executive cognition. 
Kronmal (1993) cautioned against the use of ratios in regression analyses, and his argument is well taken. In the online Supporting Information, Appendix S1, we follow his advice and show that the reason neocortex ratio produces better results is because of the strong relationship with neocortex volume and not the influence of the volume of the rest of the brain. We retain the index in the analyses, however, because it has been widely used and we wish to show that our results are robust to even radical differences in the brain indices used.
Many previous analyses (e.g. Dunbar, 1992a; Stevens, 2014; DeCasien, Williams & Higham, 2017; Powell, Isler & Barton, 2017) have introduced significant error variance by confounding foraging groups with social groups. This has been a particular problem for species like the nocturnal prosimians and those monkeys and great apes (notably the orangutan, Pongo) that have fission-fusion social systems (i.e. stable social groups that fragment to allow individuals to forage alone or in small independent parties). Most of the nocturnal prosimians sleep together in shared nests, even though they often forage alone. Similarly, all field studies agree that, even though they often (but not always) live alone, orangutans have distinct, stable local communities that consist of individuals who are more tolerant of each other; indeed, orangutans are commonly housed socially in captivity. We use nest-group sizes for all nocturnal prosimians and local community size for the orangutan. The converse problem, in which small stable groups fuse into larger temporary herds during foraging, is common among the cercopithecoid monkeys, and raises an analogous question: which grouping level is the correct one to use when comparing with taxa that are characterised by a single grouping level? Examples include the mixed-species foraging groups formed by the Cercopithecus guenons (Korstjens, Lehmann & Dunbar, 2018) and the multilevel herds regularly formed by Nasalis spp. (Yeager, 1992), Rhinopithecus spp. (Kirkpatrick et al., 1998), Theropithecus (Mac Carron & Dunbar, 2016) and Mandrillus (Hoshino et al., 1984; Abernethy et al., 2002). Although less common among the colobines, herd formation by temporary fusion of stable groups has been reported, uniquely, in the Nyungwe population of Colobus angolensis (Fimbel et al., 2001; Fashing et al., 2007), and, indeed, is predicted by this population's particular ecology.
We used the most recent compilation of primate group sizes by Dunbar, Mac Carron & Shultz (2018b) which tried to ensure that all group counts were for social groups (the local community of individuals who have broadly affiliative relationships in stable groups that share a common home range or territory).
We add one correction to this list. Most compilations give a group size for Miopithecus of 65, based on the maximum group size observed for one group in one field study. The social organisation of this genus has been notoriously difficult to discern, mainly because of the somewhat confusing fission-fusion grouping pattern it exhibits in the wild. A closer reading of papers from the few field studies of this genus (Gautier-Hion, 1970; Rowell, 1973; Breitwisch, 1983) suggests a social system in which small, stable groups form large temporary foraging bands (or herds), especially when foraging in cultivated fields. Since sleeping groups at night are more likely to reflect the stable core grouping (e.g. sleeping groups in many nocturnal strepsirrhines), we use the mean value of 37.4 for sleeping groups censused away from villages, rather than the more commonly cited figure of 65 which was based on daytime censuses of foraging groups near villages (rich food sources that, as in many primate species, attract groups to forage together). Although brain data are available for Daubentonia and Mandrillus, we exclude both genera because their social systems are insufficiently well described to discern social group (as opposed to foraging group) size. Brain data are available for the dwarf lemur Microcebus murinus, but numerical data on social groups are available only for M. rufus; since these were formerly considered to be a single species, we combine these data to provide a datapoint for the genus since this is one of the few dwarf strepsirrhines for which brain data are available. Group size for Homo sapiens is taken to be the community or personal network (mean size 154.4), based on the extensive empirical evidence summarised by Dunbar (2020).
The data are provided as supporting information in online Appendix S4.
For species with multilevel social systems, we use data for the different grouping levels given by Hill et al. (2008) for Papio hamadryas and Theropithecus, and Dunbar (2020)  (at Mbeli Bai) are from Morrison et al. (2019). Chimpanzee (Pan) neighbourhood size (sub-clusters that groom and travel with each other) is based on Thompson et al. (2007), Kahlenberg et al. (2008), Langergraber, Mitani & Vigilant (2009) and Wakefield (2013): these indicate an average of 6.5 (range 2-9) females per neighbourhood, to which we add an average of one dependent offspring for each female. Because the distribution of chimpanzee community sizes is bimodal (Dunbar 2019), we include separate datapoints for their mean values: very large chimpanzee communities ('megacommunities', mean size 108) and conventional communities (mean size for the two species 42). The data are provided in online Appendix S5.
To test for cognitive differences across grades, we use performance on a suite of six executive function (reasoning) tasks (reversal learning, short-term memory, concept transference, an oddity task, a patterned string task, and invisible displacement: from Shultz & Dunbar, 2010c), three independent estimates of the ability to inhibit prepotent actions [reward reversal inhibition score (Shultz & Dunbar, 2010c), a go/no-go indifference score (Stevens, 2014) and an A-not-B self-control score (MacLean et al., 2014)], and a mentalising competence task (Devaine et al., 2017). We distinguish species that have stable (or bonded) social groups (pair-bonded monogamy or stable social groups built around grooming-based dyadic relationships, usually between females) from those that do not (nocturnal species that forage solitarily or diurnal species that have unstable groups or some form of fission-fusion sociality). The data are provided in online Appendix S4.
(2) Statistical analysis
(a) Testing for grades
We are interested in testing whether or not all species lie on a common regression line. The conventional way to do this is to determine whether residuals from the common regression are unimodal or multimodal (i.e. form distinct clusters). For this, we apply k-means cluster analysis to the residuals from the common regression line. There are no formal methods for determining the optimal number of clusters. Goodness-of-fit will always increase monotonically with the number of clusters, reaching a limit at 1 when the number of clusters equals the sample size (i.e. every data point is a separate cluster) (Coulson, 1987). In general, the optimal number of clusters is that which (i) maximises goodness of fit while minimising the number of clusters; (ii) does not include clusters with very small numbers of data points; and (iii) has a good silhouette index (indicating limited overlap between adjacent clusters). The silhouette statistic (Rousseeuw, 1987) varies between −1 (complete overlap) and +1 (no overlap). Negative values imply unacceptable levels of overlap between adjacent clusters; ideally, the silhouette statistic should be >0.4.
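The clustering step described above can be illustrated with a short, dependency-free Python sketch. It is not the implementation used for the analyses: the quantile-based initialisation, the function names and the residual values are all assumptions made for illustration. It clusters one-dimensional residuals with k-means (Lloyd's algorithm) and reports the mean silhouette width.

```python
def kmeans_1d(values, k, max_iter=100):
    """Cluster 1-D residuals into k groups (Lloyd's algorithm).

    Centres start at evenly spaced quantiles of the sorted data, a
    deterministic simplification chosen for this sketch.
    """
    ordered = sorted(values)
    n = len(ordered)
    centres = [ordered[int((i + 0.5) * n / k)] for i in range(k)]
    labels = []
    for _ in range(max_iter):
        new_labels = [min(range(k), key=lambda c: abs(v - centres[c]))
                      for v in values]
        if new_labels == labels:
            break  # assignments stable: converged
        labels = new_labels
        for c in range(k):
            members = [v for v, lab in zip(values, labels) if lab == c]
            if members:
                centres[c] = sum(members) / len(members)
    return labels, centres


def mean_silhouette(values, labels):
    """Mean silhouette width (requires at least two clusters)."""
    clusters = set(labels)
    scores = []
    for i, (v, lab) in enumerate(zip(values, labels)):
        own = [abs(v - w) for j, (w, m) in enumerate(zip(values, labels))
               if m == lab and j != i]
        if not own:
            scores.append(0.0)  # singleton clusters score 0 by convention
            continue
        a = sum(own) / len(own)  # mean distance within own cluster
        b = min(                 # mean distance to the nearest other cluster
            sum(abs(v - w) for w, m in zip(values, labels) if m == other)
            / labels.count(other)
            for other in clusters if other != lab
        )
        scores.append((b - a) / max(a, b) if max(a, b) > 0 else 0.0)
    return sum(scores) / len(scores)


# Hypothetical residuals forming three tight bands (illustration only).
residuals = [-0.52, -0.50, -0.48, -0.02, 0.00, 0.02, 0.48, 0.50, 0.52]
labels, centres = kmeans_1d(residuals, 3)
silhouette = mean_silhouette(residuals, labels)
print(labels, round(silhouette, 2))
```

In practice one would repeat the call for each candidate k and compare the silhouette values alongside goodness of fit and the frequency of very small clusters.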
Most comparative analyses use phylogenetic methods as a covariate in bivariate analyses. However, in this particular case, these methods are inappropriate for two reasons. First, they are designed to control for phylogenetic effects that inflate degrees of freedom rather than to search for grades in data. Second, they all use least-squares regression (LSR), which inevitably underestimates the slope if there are grades in the data. Fortunately, the earlier CAIC (Comparative Analysis by Independent Contrasts) methods allow grades to be detected directly by examining residuals (Harvey & Pagel, 1991). We therefore use standard contrasts methods (CAIC) to determine a common regression line for each brain index, and calculate residuals from this line. To map contrasts, we used the consensus phylogenetic tree for primates downloaded from https://10ktrees.nunn-lab.org/ (without branch lengths). In fact, the phylogenetic signal for group size and most ecological variables is close to zero in primates (Kamilar & Cooper, 2013). The same is also true of at least four of the cognitive indices we use: these yield identical results whether or not the data are phylogenetically corrected (Shultz & Dunbar, 2010c;MacLean et al., 2014;Stevens, 2014). In addition, with the exception of ECV and the Navarette data sets, the data derive mainly from a single representative species for each genus, thereby further reducing the likelihood of significant distortion due to phylogenetic effects. We ran the analyses both with and without controlling for phylogeny. The results were the same, but to avoid unnecessary controversy we present results from only the analyses using the phylogenetic method.
A more serious problem is the fact that most modern quantitative phylogenetic packages use LSR methods. It is well known that, when there are grades in the data, LSR methods underestimate the true slope (Mace, Harvey & Clutton-Brock, 1981; Harvey & Mace, 1982; Aiello, 1992). This is because LSR (which was designed for experimental contexts where x-axis values are pre-specified and therefore known exactly) assumes that the values on the x-axis are known without error. In addition, it assumes that the data are bivariate normal. As a result, the greater the variance in the data, the more the LSR equation underestimates the slope. This is a particular problem when grades are present in the data, since this will usually yield data that have a bivariate uniform rather than a bivariate normal distribution. Although our log10-transformed group-size data are normally distributed, four of the five log-transformed brain indices cannot be distinguished from a uniform distribution while the fifth (Navarette neocortex volume) is significantly different from both normal and uniform distributions. This suggests that standard LSR methods are inappropriate. Kendall & Stuart (1979) have shown that, if the errors are unknown, then reduced major axis (RMA) regression (which minimises the deviation on both axes simultaneously, rather than just that on the y-axis as in LSR) gives the maximum-likelihood estimate of the true functional relationship. Rayner (1985) also recommended RMA in preference to other regression techniques when the error variances on the two axes are either equal or unknown, because it is the only one of the three common regression techniques that is independent of the correlated error variance. In simple terms, the RMA regression sets its line up the main axis of the data, rather than across it as an LSR regression will inevitably do (Fig. 1).
The only disadvantage of RMA regression is that it is not possible to assign significance values to the regression coefficients (although RMA and LSR estimates will converge when r² > 0.95). Conventionally, the equivalent LSR values are used as a conservative proxy: the fit cannot be worse than this.
We therefore used an RMA regression for the distribution of contrasts on the x and y variables for each of the five brain indices, and estimated residuals on the y-dimension from these lines. To avoid biasing the regression slope by the very large size of its groups, we excluded data for Homo from this stage of the analysis. In any case, two of our five brain indices do not include Homo; including Homo for some indices and not others could distort the results. Online Appendix S2 provides the individual RMA regressions for the five brain indices.
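To make the contrast between the two regression models concrete, the following Python sketch fits both lines to the same data and returns residuals on the y-dimension. It relies on the standard identities that the LSR slope equals r·(s_y/s_x) whereas the RMA slope equals ±(s_y/s_x), so the LSR slope is always the shallower of the two unless |r| = 1. The sketch fits raw values with an intercept rather than phylogenetic contrasts, and the data values and function names are hypothetical.

```python
import math


def fit_lsr_and_rma(x, y):
    """Return r plus (slope, intercept) for the LSR and RMA lines."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    r = sxy / math.sqrt(sxx * syy)
    lsr_slope = sxy / sxx                                 # = r * (s_y / s_x)
    rma_slope = math.copysign(math.sqrt(syy / sxx), r)    # = sign(r) * (s_y / s_x)
    # Both lines pass through the point of means.
    return {
        "r": r,
        "lsr": (lsr_slope, mean_y - lsr_slope * mean_x),
        "rma": (rma_slope, mean_y - rma_slope * mean_x),
    }


def y_residuals(x, y, slope, intercept):
    """Residuals measured on the y-dimension from a fitted line."""
    return [b - (intercept + slope * a) for a, b in zip(x, y)]


# Hypothetical log10 brain index (x) and log10 group size (y) values.
log_brain = [0.5, 1.0, 1.5, 2.0, 2.5]
log_group = [0.4, 0.9, 0.7, 1.5, 1.3]

fit = fit_lsr_and_rma(log_brain, log_group)
rma_slope, rma_intercept = fit["rma"]
residuals = y_residuals(log_brain, log_group, rma_slope, rma_intercept)
print(fit["lsr"][0], rma_slope)
```

The residuals returned here are the quantities that would then be passed to the cluster analysis when testing for grades.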
(b) Fractal patterns in species with multilevel social systems
The cluster analysis described in Section II.2a identified four separate grades in the data. In the follow-up analysis, we asked whether the sizes of the grouping levels in those species that have multilevel social systems also fall on these grade lines (and hence have a fractal structure). For these purposes, we plotted the observed mean size of the different groupings for each taxon of interest against the grade regression lines determined by the analysis in Section II.2a, and asked whether these values fit the grade lines or are randomly distributed in the state space. In the light of the first analysis, we limited this analysis to the neocortex ratio data set.
To identify which grade an observed group fits best, we calculate the Bayesian posterior likelihood of the fit to each of the values predicted by the equations for the four grades, calculated as:

$$p(B_k \mid X) = \frac{p(X \mid B_k)\, p_{\mathrm{prior}}}{\sum_{j=1}^{4} p(X \mid B_j)\, p_{\mathrm{prior}}}$$

where $B_k$ is the group size predicted by the equation for grade $k$, and $p(X \mid B_k)$ is the likelihood that the observed value, $X$, is the same as the predicted value in any given case (estimated as the probability corresponding to the conventional t-test between observed and predicted), with $p_{\mathrm{prior}} = 0.25$ on the default assumption that, as priors, all four options are equally likely. Because we are interested in the deviation on the y-axis, not the fit to the regression line, we use the standard error of the intercept for each grade as the likelihoods. A fit is significant when $p_{\mathrm{posterior}} \geq 0.95$ (two-tailed). We consider two sets of genera for this analysis: those known to have multilevel social systems (Theropithecus, Papio hamadryas, Gorilla and Pan) and those where one taxon has unusually large groups (Piliocolobus, Semnopithecus, Miopithecus, Erythrocebus and Chlorocebus) compared to its allied taxa (Colobus, Trachypithecus and Presbytis, and Cercopithecus, respectively). Because Theropithecus is of particular interest as one of the few Old World monkeys that has a multilevel social system, we estimated its neocortex ratio from its cranial volume (ECV), using the regression equations given by Aiello & Dunbar (1993). There are no Asian colobines included in any brain databases, and we estimated their neocortex indices in the same way (see online Appendix S5).
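The posterior calculation described above can be sketched in a few lines of Python. The likelihood values below are hypothetical placeholders standing in for the t-test probabilities, and the function name is an assumption of this illustration; with equal priors the prior term cancels, so the posterior is simply each likelihood divided by their sum.

```python
def grade_posteriors(likelihoods, priors=None):
    """Posterior probability of each grade given per-grade likelihoods.

    likelihoods: p(X | B_k) for each grade k.
    priors: prior probability of each grade; defaults to equal priors
            (0.25 each when there are four grades).
    """
    k = len(likelihoods)
    if priors is None:
        priors = [1.0 / k] * k
    joint = [lik * pri for lik, pri in zip(likelihoods, priors)]
    total = sum(joint)
    if total == 0:
        raise ValueError("all likelihoods are zero; posterior is undefined")
    return [j / total for j in joint]


# Hypothetical likelihoods for grades I-IV (illustration only).
likelihoods = [0.001, 0.80, 0.02, 0.0005]
posteriors = grade_posteriors(likelihoods)
best_grade = posteriors.index(max(posteriors)) + 1  # 1-based grade number
is_significant = max(posteriors) >= 0.95
print(best_grade, is_significant)
```

Under the criterion stated in the text, an observed grouping would be assigned to a grade only when the largest posterior reaches 0.95.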

(c) Testing for cognitive correlates
We test the hypothesis that the grades differ progressively on various cognitive and behavioural indices. For this, we use Kendall's τ as this has been specifically recommended for use with categorical variables when, as in the present case, these have an underlying continuum (Maxwell, 1961).
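As an illustration of the statistic, the sketch below computes Kendall's τ between an ordinal grade and a continuous score by counting concordant and discordant pairs. It uses the tie-corrected τ-b variant, which is an assumption of this sketch (the text does not specify the variant), and the grade allocations and scores are hypothetical.

```python
import math


def kendall_tau_b(x, y):
    """Kendall's tau-b: rank correlation with a correction for ties."""
    concordant = discordant = ties_x_only = ties_y_only = 0
    n = len(x)
    for i in range(n):
        for j in range(i + 1, n):
            dx = x[i] - x[j]
            dy = y[i] - y[j]
            if dx == 0 and dy == 0:
                continue  # tied on both variables: excluded from all counts
            if dx == 0:
                ties_x_only += 1
            elif dy == 0:
                ties_y_only += 1
            elif dx * dy > 0:
                concordant += 1
            else:
                discordant += 1
    untied = concordant + discordant
    denom = math.sqrt((untied + ties_x_only) * (untied + ties_y_only))
    return (concordant - discordant) / denom


# Hypothetical grade allocations (1-4) and cognitive scores.
grades = [1, 1, 2, 2, 3, 3, 4, 4]
scores = [0.20, 0.30, 0.35, 0.50, 0.55, 0.70, 0.80, 0.90]
tau = kendall_tau_b(grades, scores)
print(round(tau, 3))
```

Because many species share a grade, ties on the grade variable are unavoidable, which is why a tie-corrected variant is used in this sketch.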

(d) Hypothesis testing
Since most of our hypotheses are explicitly directional (i.e. a significant result in the opposite direction is as much evidence against the hypothesis being tested as a non-significant result), all statistical tests are one-tailed unless otherwise stated.

III. RESULTS
(1) Evidence for grades
To determine the optimal number of grades for each brain index, we ran k-means cluster analyses with 2 ≤ k ≤ 8 on each data set. To identify the optimal number of clusters, we use the three conventional tests listed in Section II.2a. Figure 2A plots the goodness of fit (indexed as the r² goodness-of-fit to the LSR regression, averaged across all the grades, excluding any grades where n_k = 1) against number of clusters. For k = 1, the values are the r² values for the standard LSR regression. Figure 2A exhibits a clear asymptotic pattern, with an asymptote at 0.967 (the horizontal line). Fit is maximised at k = 5 clusters, but in fact there is little improvement in fit after k = 4 clusters. The classic 'broken stick' algorithm suggests that there is a critical change of slope at k = 4, identifying this as the optimal number of grades. Figure 2B plots the mean silhouette value, averaged across the five brain indices, against number of clusters. This reaches an obvious asymptote of 0.65 at k = 4 clusters, with no further gain beyond that. A silhouette value >0.4 is considered acceptable as it indicates good separation between data points assigned to neighbouring clusters. Figure 2C plots the percentage of small clusters (those with fewer than 10% of the data points for the sample) against number of clusters for the five brain indices. Small clusters are considered undesirable. For k < 4, the number of small clusters is effectively zero, but their frequency rises rapidly and linearly once k > 4. This suggests that having more than four clusters is not ideal. In addition, we also examined the impact of increasing the number of clusters from k to k + 1 on the way species are allocated to grades. Adding an extra cluster to the four-grade solution results in one existing cluster being partitioned into two without changing the species allocated to the other clusters.
We show the five-grade solution for two of the brain indices in online Appendix S3. In sum, all four tests identify k = 4 as the optimal solution for each of the five data sets. Table 1 provides the RMA regression equations for each of the five brain indices. Figure 3 plots individual species' mean group size from Appendix S4 against each of the five brain indices based on a k = 4 cluster solution, with the regression lines for the four grades indicated in each case. To determine whether there is a consensus in the way the brain indices allocate species to different grades, we ran pairwise correlations between results for the five indices, using Kendall's τ (see Section II.2c). All pairwise correlations were significant (Table 2: 0.014 ≤ P < 0.001), indicating that the five brain indices are in broad agreement.
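The pairwise agreement check can be sketched as follows. Grade allocations contain many ties, so this illustration uses the tau-b variant of Kendall's τ; whether the analysis in Section II.2c used tau-b is an assumption, and the function name and toy allocations are hypothetical.

```python
from itertools import combinations
from math import sqrt


def kendall_tau_b(x, y):
    """Kendall's tau-b rank correlation between two equal-length sequences."""
    if len(x) != len(y):
        raise ValueError("x and y must have the same length")
    concordant = discordant = tied_x_only = tied_y_only = 0
    for (x1, y1), (x2, y2) in combinations(zip(x, y), 2):
        dx, dy = x1 - x2, y1 - y2
        if dx == 0 and dy == 0:
            continue  # tied on both variables: contributes to neither term
        if dx == 0:
            tied_x_only += 1
        elif dy == 0:
            tied_y_only += 1
        elif dx * dy > 0:
            concordant += 1
        else:
            discordant += 1
    untied = concordant + discordant
    denom = sqrt((untied + tied_x_only) * (untied + tied_y_only))
    if denom == 0:
        return float("nan")  # one variable is constant
    return (concordant - discordant) / denom


# Hypothetical grade allocations (1-4) of six species under two brain indices.
grades_index_a = [1, 1, 2, 3, 3, 4]
grades_index_b = [1, 2, 2, 3, 4, 4]
print(kendall_tau_b(grades_index_a, grades_index_b))
```

Applying the function to every pair of the five brain indices would yield a correlation matrix of the kind reported in Table 2.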
Fig. 2. (A) Mean (±1 SE) goodness-of-fit (indexed as mean r²) for log10 group size regressed on log10 brain index, averaged over the k clusters, for each brain index. The value for k = 1 is that for the least-squares regression (LSR) equation. The horizontal dashed line is the asymptotic value at 0.967. (B) Mean (±1 SE) silhouette as a function of number of clusters, averaged across the five brain indices. (C) Percentage of small clusters at different cluster numbers for the five brain indices. A small cluster is defined as one with <10% of the data points for the brain sample. Brain indices: endocranial volume (ECV); Navarette neocortex volume; non-V1 neocortex; frontal lobe grey matter (FRG); neocortex ratio.
In order to arrive at a consensus grade allocation, we averaged the grade allocations for the five brain indices, first within species and then within genera, rounding means down within the range 0-0.50 and rounding up from 0.51. Table 3 lists the consensus allocations by genera. We split Papio hamadryas and P. papio from the other Papio species because, having significantly larger groups (Dunbar, Mac Carron & Robertson, 2018a), these two species are consistently assigned to grade II whereas the other three Papio species are consistently assigned to grade III. P. hamadryas has a very different social system to the other Papio baboons, and this may also be true of the related but less well-studied P. papio (Dunbar & Mac Carron, 2019). All grades except grade IV include a mix of strepsirrhines and New and Old World monkeys. Although grade IV consists only of apes, not all the apes are assigned to grade IV. Figure 4 plots, separately for each grade, the distribution of species mean group sizes as a function of the species' social style.
We distinguish nocturnal prosimians (all of whom forage solitarily but sleep socially), species with monogamous pair-bonds, species with stable (bonded) social groups that maintain spatial coherence, and species with unstable social groups (where groups regularly fragment during foraging or form multi-group herds, such that foraging groups have high membership turnover). Excluding Homo because of its unusually large group size, group sizes differ significantly across grades and type of social system (overall ANOVA model: F(6,128) = 8.28, P < 0.0001 two-tailed; grade, F(3,128) = 2.60, P = 0.055; social system, F(3,128) = 11.95, P < 0.0001). Notice that the unstable groups of grades I and II are significantly larger than those of grades III and IV, suggesting that these groups may be under more socio-demographic stress (sensu Dunbar, 1992b). Table 4 suggests that the proportion of species with bonded social groups (pair-bonds and stable social groups) increases with grade (χ² = 30.0, df = 3, P < 0.0001).
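The two-stage consensus rule used for Table 3 (average the allocations, round down when the fractional part is at most 0.50 and up otherwise, first within species and then within genera) can be sketched as below. Coding grades I-IV as the integers 1-4, reading "the range 0-0.50" as the fractional part of the mean, and the function and species names are all assumptions made for illustration.

```python
from fractions import Fraction
from math import floor


def consensus_grade(allocations):
    """Mean of integer grade allocations, rounded down if the fractional part
    is <= 0.50 and up otherwise (exact arithmetic avoids float edge cases)."""
    mean = Fraction(sum(allocations), len(allocations))
    base = floor(mean)
    return base + 1 if mean - base > Fraction(1, 2) else base


def genus_consensus(species_allocations):
    """Consensus within each species first, then across species of a genus.

    species_allocations maps a species name to its list of grade allocations,
    one per brain index.
    """
    species_grades = [consensus_grade(a) for a in species_allocations.values()]
    return consensus_grade(species_grades)


# Hypothetical genus with three species, each graded on five brain indices.
example_genus = {
    "species_a": [2, 2, 3, 3, 3],  # mean 2.6, rounds up to 3
    "species_b": [3, 3, 3, 3, 3],  # mean 3.0
    "species_c": [2, 2, 2, 2, 3],  # mean 2.2, rounds down to 2
}
print(genus_consensus(example_genus))
```

Note that this rule sends a mean of exactly x.5 downwards, which differs from Python's built-in `round`.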
(2) Fractal structure of multilevel social systems
Figure 5 provides, for the neocortex ratio data set, a plot of the group sizes for taxa known to have multilevel social systems (humans, gelada and hamadryas baboons, chimpanzees and the gorilla), overlaid with the LSR lines for the grades shown in Fig. 3D (see Table 5). Also included in Fig. 5 are the mean group sizes for those genera which, according to Table 3, are allocated to a lower grade (larger groups than expected for brain size) than their closely allied genera. These include langurs (Semnopithecus), red colobus (Piliocolobus) and several African guenon genera (Miopithecus, Erythrocebus and Chlorocebus), with corresponding mean group sizes for their respective allied genera (Trachypithecus plus Presbytis, Colobus and Cercopithecus). Also plotted is the mean size of multispecies foraging parties in Cercopithecus guenons (see Korstjens et al., 2018). Figure 6 plots the means and variances for the species group sizes in this second set of genera.
To determine how closely the group sizes shown in Fig. 5 fit specific grade lines, we calculated the Bayesian posterior likelihood that each observed group size equates to that predicted by the neocortex ratio regression equation for each grade. (Note that we use the LSR equations here because we require variance estimates for this analysis and these cannot be calculated for RMA regressions; however, since r² values are very high, the differences are likely to be marginal.) For each species with a multilevel social system (Homo, Pan, Gorilla, Theropithecus and Papio hamadryas), the individual values for the separate grouping levels clearly align with the successive grades rather than being distributed at random across the vertical state space, with a clear preference for one grade in each case (Table 6). Similarly, for those Old World monkey clades where one genus lives in small compact groups and another lives in larger, more diffuse groups (Cercopithecus versus Chlorocebus/Erythrocebus/Miopithecus; Colobus versus Piliocolobus; and Trachypithecus/Presbytis versus Semnopithecus), the respective values plot on adjacent grades. In some habitats, Cercopithecus species form unstable multi-species foraging groups, mainly in response to predation risk (Gautier-Hion, 1988; Cords, 1990; Korstjens et al., 2018). We plot the mean size of these groupings across a large sample of study populations from Korstjens et al. (2018). Note that this data point lies on the same grade as (i.e. allowing for brain size differences, is numerically equivalent to) the typical group sizes of the sister genera (Chlorocebus, Erythrocebus and Miopithecus) that habitually live in large groups.
Excluding the four data points from the lowermost grade (dashed line, V) in Fig. 5, the observed group sizes are a significant fit to a specific grade (and one grade only) in 14/21 cases, with non-significant but clear tendencies (p_posterior ≥ 0.8) in five other cases (Table 6). In sum, for 19/21 cases the observed groups are unequivocally (if not always significantly) associated with a specific grade, with only two cases (Homo communities and Piliocolobus) where grade assignment is ambiguous between adjacent grades (although even in these cases, there is a clear preference for one grade). The four data points for the smallest groupings in hamadryas baboons (harems), gelada (grooming cliques), chimpanzees (grooming-based neighbourhoods) and humans (bands, mean 50 individuals) appear to fit on a further grade (dashed line in Fig. 5; equation [V] in Table 5; F(1,2) = 97.4, P = 0.032) that has a similar slope (t = 1.71, df = 3, P = 0.093) to grade IV, but a significantly lower intercept (t = −3.63, df = 3, P = 0.018). We calculated posterior likelihoods for these four data points across the five grades using the regression equations given in Table 5. In all four cases, the fits to grades I-IV are significantly different (Table 6: p_posterior ≤ 0.018); in contrast, the fits to grade V (p_posterior ≥ 0.982) are all individually significant (as might be expected if the linear regression is a very close statistical fit).
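One way to picture the posterior calculation is sketched below. The exact procedure is not spelled out in this section, so the sketch makes explicit assumptions: each grade's LSR line has Gaussian residuals on the log10 scale, all grades have equal prior probability, and the posterior is the normalised likelihood. The function name and the regression parameters are hypothetical and are not the values in Table 5.

```python
from math import exp, log, log10


def grade_posteriors(log_brain_index, group_size, grades):
    """Posterior probability of each grade for one observed group size.

    grades maps a grade name to (intercept, slope, residual_sd) of its
    regression of log10 group size on log10 brain index.
    Assumes Gaussian residuals and equal priors across grades.
    """
    observed = log10(group_size)
    log_like = {}
    for name, (intercept, slope, resid_sd) in grades.items():
        predicted = intercept + slope * log_brain_index
        z = (observed - predicted) / resid_sd
        # Gaussian log-likelihood, dropping the constant shared by all grades
        log_like[name] = -0.5 * z * z - log(resid_sd)
    # Subtract the maximum before exponentiating to avoid underflow
    peak = max(log_like.values())
    weights = {name: exp(value - peak) for name, value in log_like.items()}
    total = sum(weights.values())
    return {name: weight / total for name, weight in weights.items()}


# Hypothetical parallel grade lines (NOT the Table 5 equations).
example_grades = {"I": (1.8, 0.5, 0.1), "II": (1.3, 0.5, 0.1)}
posterior = grade_posteriors(0.4, 100, example_grades)
print(posterior)
```

Under these assumptions, an observation lying on one grade line and several residual standard deviations from its neighbours receives a posterior close to 1 for that grade, which is the pattern of clear preference for a single grade summarised in Table 6.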
Taken together, these results suggest that, for species that evolve larger groups in response to ecological pressures, only certain group sizes are stable, irrespective of whether these groups are bonded or unbonded.
Table note: Since we test only whether there is a positive correlation between two variables, all P-values are one-tailed.
Table notes: Grade allocations made using each of the brain indices are provided in Appendix S4. Taxa in bold are highlighted because they lie on a lower grade than their close taxonomic allies (in most cases, they were formerly considered to belong to the same genus as these allies). See text for details. * Genera with stable monogamous pair-bonds; (*) circumstantial evidence for monogamy. † There are no brain data for these callitrichid genera, but on behavioural grounds we assume they are likely to be similar to the other callitrichids (they were formerly assigned to the genus Callithrix). § Genera with unstable social groups (i.e. groups that fragment during foraging or are unstable over time). ¶ Nocturnal prosimians in which individuals sleep together but forage alone. ‡ Allocations based only on endocranial volume (ECV).

(3) Grade differences in cognition
Comparative analyses have suggested that the evolution of large brains is associated with parallel increases in cognitive abilities. We are interested in whether this evolutionary trajectory in fact reflects the grade structure in the social brain relationship. To test this, we plot experimental data on five cognitive indices as a function of grade in Fig. 7. Performance on a suite of executive function cognitive tasks increases significantly across grades (  (Table 7, P ≤ 0.015), with both predictors individually significant or close to significant. In addition, performance on the mentalising task also increases across grades, although, due to the small sample size, the correlation does not quite reach statistical significance (Fig. 7E: τ = 0.476, N = 7, P = 0.077).

IV. DISCUSSION
We have shown that, for primates at least, the Social Brain relationship consists of a series of distinct socio-cognitive grades rather than the single linear relationship usually assumed, and that both bondedness and cognitive abilities (especially those that play a central role in the temporal coherence of social groups) increase stepwise across these grades. The latter finding implies that the grades differ in the neurological and behavioural mechanisms of social bonding that underpin group stability. It seems that, even for species that live in multilevel societies, primates cannot choose their group sizes according to the momentary demands of circumstance. This contrasts strikingly with most flock- and herd-forming birds and mammals where individuals can join and leave more or less at will (Krause & Ruxton, 2002). Even though the cognitive demands of the higher grouping levels (i.e. those on a lower grade) may be much lower than those of bonded social groups, nonetheless significant cognitive and social skills are still required to support even these grouping levels compared to those required for casual herds where social relationships are of the moment and have no sustained temporal continuity.
In addition, for those species that live in multilevel societies, the sizes of the constituent levels align closely with the stacked grades. The mean scaling ratio between the intercepts for successive grades in Fig. 5 is 2.8, very close to the value of 3 obtained by Hill et al. (2008). These constraints may also apply to other mammalian taxa that live in multilevel social systems. Both orcas and elephants, for example, have multilevel societies whose layers have a scaling ratio of 3 (Hill et al., 2008). Since these species also have female coalitions, it may be that the need to evolve coalitions creates this structural regularity. As with primates, most of these genera are notable for having large brains by comparison with other mammals (Shultz & Dunbar, 2010a).
[...] Table 3. Symbols: nocturnal solitary foragers; unstable social groups; monogamous pairs; stable social groups. M = Miopithecus; S = Saimiri. Homo is not included due to its very large group size.
[...] Fig. 3D, with datapoints added for taxa that have multilevel societies (see Appendix S5). Listed from high to low: Homo (tribes, mega-bands, communities, bands), Pan (mega-communities, typical communities, neighbourhoods), Papio hamadryas (bands, clans, harems), Gorilla (community, group), Theropithecus (bands, clans, harems, cliques). Also shown are mean species group sizes for Piliocolobus versus Colobus, Chlorocebus and Erythrocebus (L to R) and Miopithecus versus Cercopithecus (top to bottom: multispecies community, average group size) and Semnopithecus versus Trachypithecus and Presbytis combined. The equations for the LSR lines shown for each of the four grades (labelled I-IV) are given in Table 5. The lowest (dashed) regression line (V) is the LSR set through the four data points (F(1,2) = 119.7, P = 0.008), with the equation also given in Table 5.
The most likely explanation for the fractal pattern is that larger groups are created by deferring group fission and holding daughter subgroups together through weak ties. This certainly applies to chimpanzees because the unusually large groups found at some sites become increasingly substructured into distinct sub-networks for some time prior to community fission (Feldblum et al., 2018; Watts, 2019). An analogous phenomenon has been noted in baboons (Papio spp.), where populations facultatively switch between fractally related small and large group sizes as a function of local predation risk (Dunbar et al., 2018a; Dunbar & Mac Carron, 2019). Very large groups outside the normal size range for a genus start to lose coherence, with increasing risk of fragmentation, as has been documented in some Papio ursinus (Anderson, 1981) and many Papio papio populations (Dunbar & Nathan, 1972; Sharman, 1982; Patzelt et al., 2014), and very likely the two Mandrillus species (Gartlan, 1970; Jouventin, 1975; Abernethy et al., 2002; Brockmeyer et al., 2015). In extreme cases, this can result in a phase transition that leads to the creation of a more formal structure that involves the coalescence of reproductive females around specific males to create stable harems or single-male groups such as those found in Papio hamadryas (Kummer, 1968) and Theropithecus gelada (Dunbar, 1984), and possibly Gorilla (Harcourt & Greenberg, 2001; Harcourt & Stewart, 2008).
It seems that only a small number of large-brained genera have evolved the cognitive and behavioural flexibility to form stable multilevel societies (Fig. 5). They achieve this by allowing groups that have undergone fission to retain links between the new subgroups, giving rise to a fractal structuring. The sizes of these supergroups are defined by harmonics of the size of the basal social unit, with the species having no flexibility over this. Each higher-level grouping is, however, progressively less well bonded, so that subgroups do not intermingle and fragmentation occurs more easily. It seems likely that these larger groupings are based on low-key bonding processes similar to those that characterise species from the lower grades, most of which are less intensely social. One reason why these species may be larger brained than the average primate is that they need to be able to handle simultaneously relationships of very different type (in effect, strong and weak ties). In effect, the Social Brain Hypothesis is really about group structure rather than size per se.
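The idea that successive grouping levels are harmonics of a basal unit can be made concrete with a short sketch that computes the ratios between successive layer sizes and their mean. The geometric mean is used here as an assumption (the text does not say how the mean ratio of 2.8 was averaged), and the function names and layer sizes are hypothetical.

```python
from math import exp, log


def scaling_ratios(layer_sizes):
    """Ratios between successive grouping levels, smallest level first."""
    sizes = sorted(layer_sizes)
    return [larger / smaller for smaller, larger in zip(sizes, sizes[1:])]


def mean_scaling_ratio(layer_sizes):
    """Geometric mean of the successive ratios (an assumed averaging choice)."""
    ratios = scaling_ratios(layer_sizes)
    return exp(sum(log(r) for r in ratios) / len(ratios))


def predicted_layers(basal_size, ratio, n_levels):
    """Layer sizes expected if each level is `ratio` times the one below."""
    return [basal_size * ratio ** level for level in range(n_levels)]


# Hypothetical grouping levels for one multilevel society.
example_layers = [5, 15, 45, 135]
print(scaling_ratios(example_layers))
print(mean_scaling_ratio(example_layers))
print(predicted_layers(5, 3, 4))
```

With a fixed ratio near 3, the only free parameter is the size of the basal unit, which is one way of expressing the point that species have no flexibility over the sizes of the higher levels.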
The fact that large unstable groups occur mainly in grades I and II perhaps suggests that, during their evolution, primates initially counteracted external threats (such as increased predation risk or conflict with neighbouring groups) by opting for larger groups at the expense of coherence, much as herding mammals do. The progressive convergence on bonded groups in the higher grades suggests that bonded stability was ultimately a more effective way of mitigating these threats in very high-risk habitats (Dunbar & Mac Carron, 2019). This second strategy is, however, only possible if sufficient resources are put into a significantly larger brain to enable the evolution of novel cognitive mechanisms (and behavioural competencies) that allow groups to be held together in the face of intense pressures to fragment during foraging. Predation risk is the principal factor promoting the evolution of large groups in mammals in general, and primates in particular (van Schaik, 1983; Shultz et al., 2004; McGraw & Zuberbühler, 2008; Shultz & Finlayson, 2010; Dunbar et al., 2018a; Dunbar & Mac Carron, 2019). Multilevel societies do seem to characterise genera that are more terrestrial and occupy habitats that are less forested, both of which are associated with increasing predation risk. Smaller brained species can increase group size up to a point, either directly (as in the cases of Chlorocebus, Miopithecus, Erythrocebus, Piliocolobus and Semnopithecus) or by forming temporary multispecies herds [the 'polyspecific associations' of Cercopithecus (Gautier-Hion, 1988; Bshary & Noë, 1997)], but they seem to do so at significant cost to social coherence. Miopithecus groups, for example, seemingly form loose herds of up to 65 during daytime foraging as a result of the temporary fusion of smaller groupings (Gautier-Hion, 1973), making their structure difficult for fieldworkers to discern.
Saimiri is also often described as having a chaotic social system, and so it may be no accident that, like Miopithecus, it is assigned to grade I. One prediction arising from this is that genera like Chlorocebus, Erythrocebus, Miopithecus and Piliocolobus that live in larger groups than their close taxonomic relatives will have groups that are more highly substructured, as well as being less stable and more prone to fragmentation.
It is notable that, with the exception of Cebuella (assigned to grade III), all callitrichids are allocated to grade I along with the semi-solitary nocturnal prosimians. Although the callitrichids are often described as pair-bonded, their groups are unusually fluid by anthropoid standards, especially as their size increases (Soini, 1982; Dunbar, 1995a,b; Goldizen et al., 1996). In this respect, the callitrichids contrast with the strepsirrhines and cebids that have stable pair-bonds and are consistently allocated to grades II or III, respectively. This suggests that, even within primates, the formation of stable, lifelong pair-bonds may be cognitively demanding, as seems to be the case in birds (Shultz & Dunbar, 2010b; Fedorova, Evans & Byrne, 2017) and, among mammals, in the ungulates and carnivores (Shultz & Dunbar, 2010a; Pérez-Barbería et al., 2007b). The grade I allocation for callitrichids may reflect the fact that they are unique among anthropoid primates in lacking Brodmann area 10 (Passingham & Wise, 2012), implying that they lack crucial cognitive skills needed to hold even small groups together. It is significant that, although Cebuella has not been well studied in the field, reports suggest that it differs behaviourally from other callitrichids in a number of important ways, such as having a much stronger, mutually defended pair-bond (Soini, 1982; Ferrari & Ferrari, 1989; Garber, 1994). The contrast between the callitrichids and the other anthropoid primates provides direct support for the difference between organisational complexity and relationship complexity noted for mammals more generally by Lukas & Clutton-Brock (2018).
These authors suggested that kin-based groups with reproductive suppression (a unique characteristic of the callitrichids among the primates) effectively enforce behavioural rules so that there is little need for negotiation in the management of relationships (a defining feature of species whose sociality is based on relational complexity, or social bonding). Since relationship complexity seems to be especially cognitively demanding (Lewis et al., 2017), it is possible only in species characterised by disproportionately large brains for their group size. This contrast may, thus, be a defining distinction between small- and large-brained mammal genera. The study of mammalian behaviour may benefit from rather closer attention to these structural and cognitive aspects of sociality.

V. CONCLUSIONS
(1) We show that the conventional relationship between group size and brain size in primates (the social brain hypothesis) consists of a series of distinct sociocognitive grades rather than the single linear relationship usually assumed. This pattern is consistent across different data sets and brain indices.
(2) The grades cut across conventional taxonomic divisions: most contain taxa from all the major primate families.
(3) The grades form a fractal series with a scaling ratio of 3.
(4) For species/genera that live in multilevel social systems, group sizes at successive social levels tend to fall on the grade lines rather than between them, suggesting that the grades represent group sizes that are especially stable.
(5) Grades vary systematically in both the proportion of bonded social groups and social cognitive abilities (especially those with a central role in the temporal coherence of social groups), suggesting that the grades reflect phase transitions in the cognitive and behavioural mechanisms required to maintain a group's temporal stability.
(6) The conventional assumption that the social brain relationship is a single unitary phenomenon may radically underestimate the slope of the true relationship, and so yield misleading results.

VI. ACKNOWLEDGMENTS
S. S. is funded by a Royal Society University Research Fellowship (UF160725).