Chimpanzee pant‐hoots encode individual information more reliably than group differences

Abstract Vocal learning, the ability to modify the acoustic structure of vocalizations based on social experience, is a fundamental feature of speech in humans (Homo sapiens). While vocal learning is common in taxa such as songbirds and whales, the vocal learning capacities of nonhuman primates appear more limited. Intriguingly, evidence for vocal learning has been reported in chimpanzees (Pan troglodytes), for example, in the form of regional variation (“dialects”) in the “pant‐hoot” calls. This suggests that some capacity for vocal learning may be an ancient feature of the Pan‐Homo clade. Nonetheless, reported differences have been subtle, with intercommunity variation representing only a small portion of the total acoustic variation. To gain further insights into the extent of regional variation in chimpanzee vocalizations, we performed an analysis of pant‐hoots from chimpanzees in the neighboring Kasekela and Mitumba communities at Gombe National Park, Tanzania, and the geographically distant Kanyawara community at Kibale National Park, Uganda. We did not find any statistically significant differences between the neighboring communities at Gombe or among geographically distant communities. Furthermore, we found differences among individuals in all communities. Hence, the variation in chimpanzee pant‐hoots reflected individual differences, rather than group differences. Thus, we did not find evidence of dialects in this population, suggesting that extensive vocal learning emerged only after the lineages of Homo and Pan diverged.

aspects, such as the ability to modify and learn new vocalizations through imitation (Fitch, 2010). Regardless of the particular definition used, it is clear that vocal learning has evolved independently multiple times in animals (Vernes et al., 2021). For example, songbirds (Passeriformes) (Cunningham & Baker, 1983) and humpback whales (Megaptera novaeangliae) (Garland et al., 2011) learn elaborate songs.
In comparison to birds and whales, the vocal learning capacities of nonhuman primates appear much more limited. Evidence for active learning of new vocalizations by nonhuman primates remains modest (Tyack, 2020). Recent studies indicate that orangutans (Pongo spp.) can acquire a voiceless vocalization (whistle) in captivity (Wich et al., 2009), produce novel voiced vocalizations in controlled settings (e.g., using a membranophone [Lameira & Shumaker, 2019]), and exhibit differences in alarm call variants at different population densities, with population density serving as a measure of sociality (Lameira et al., 2022). Some nonhuman primates have been reported to engage in vocal learning through modifying the acoustic structure of vocalizations based on auditory feedback and imitation. Takahashi et al. (2015) found that in common marmosets (Callithrix jacchus), parental feedback influences the rate of vocal development. Marmosets (Callithrix spp.) exhibit dialects in the form of geographical variation in their vocalizations in the wild (de la Torre & Snowdon, 2009) as well as population-specific acoustic structure across call types in captivity (Zürcher & Burkart, 2017). Sugiura (1998) reported that Japanese macaques (Macaca fuscata) match some of the acoustic features of recorded "coo" calls during a playback experiment. Vocal convergence has also been reported in the grunts of male Guinea baboons (Papio papio): males that interacted more frequently with one another produced grunts that resembled each other more closely than the grunts of males that interacted less frequently.
Much of the literature on vocal learning in animals focuses on dialects, defined as regional variation in vocal production (Janik & Slater, 1997; Nowicki & Searcy, 2014). Such regional variation in vocal production could arise due to genetic differences among geographically distant communities, but among geographically adjacent communities, learning would seem to be a more likely mechanism (Filatova et al., 2012). When such variation is learned, it may signal membership in the local population (as in songbirds [Cunningham & Baker, 1983]), or membership in a particular social group, as in orcas (Orcinus orca) (Filatova et al., 2012). Studies of social birds and mammals have found that learned signals of group membership can benefit individual signalers in two main ways: (i) by eliciting affiliative interactions from group members and mates and/or (ii) by advertising group membership to rivals during agonistic interactions, such as during territory defense. For example, in birds, group-specific calls appear to (i) help maintain social bonds among group members, as in budgerigars (Melopsittacus undulatus) (Farabaugh et al., 1994; Hile & Striedter, 2000), and (ii) facilitate territory defense by helping individuals identify flock members and focus aggression on foreign callers, as in black-capped chickadees (Parus atricapillus) (Nowicki, 1983). Researchers have inferred similar functions in social mammals. For example, several species of toothed whales (Odontocetes) appear to use vocal dialects to facilitate spatial group cohesion and maintain social relationships (Janik, 2014; Tyack & Sayigh, 1997). Spatial cohesion in group-living species facilitates maintaining social bonds, finding mates, and defending territories (Janik & Slater, 1998).
Although vocal data from all great ape species can provide useful information for understanding the evolution of language (Lameira & Call, 2020), historically, researchers interested in the origins of human language have particularly focused on the vocal behavior of chimpanzees (Pan troglodytes), as they are one of the two living species most closely related to humans (Fedurek & Slocombe, 2011). The other closest living relative of humans, the bonobo (Pan paniscus), remains relatively understudied (Gruber & Clay, 2016; de Waal & Lanting, 1998). Several studies from the field (Arcadi, 1996; Crockford et al., 2004; Mitani et al., 1992) and captivity (Marshall et al., 1999) have found evidence for regional variation (dialects) in chimpanzee "pant-hoot" calls, which has been proposed to result from vocal learning (Crockford et al., 2004; Marshall et al., 1999).
Pant-hoots of males that spend more time together are more similar, and the acoustic features of their calls converge when chorusing together (Mitani & Brandt, 1994), suggesting a possible mechanism for the convergence of acoustic properties within groups (Mitani & Gros-Louis, 1998). Call convergence has also been reported for chimpanzee rough-grunt calls in captivity (Watson et al., 2015a) (but see [Fischer et al., 2015] and [Watson et al., 2015b]). Chimpanzees live in groups with fission-fusion dynamics, in which individuals travel in subgroups (known as "parties") of varying size, and they communicate over long distances using vocalizations, often in noisy environments (Aureli et al., 2008; Eckhardt et al., 2015; Goodall, 1986; Marler & Tenaza, 1977). Thus, vocal dialects could potentially facilitate spatial group cohesion and territorial defense during intergroup encounters.
Chimpanzee pant-hoots are structurally complex loud calls with a relatively consistent temporal patterning (Fedurek et al., 2016; Marler & Hobbett, 1975). The typical pattern consists of a sequence of four kinds of sound elements over a duration range of 2−20 s. Each sequence of similar elements is called a phase, so pant-hoots typically have four phases (see Section 2 for details). Of the four phases, the climax phase is the loudest and can be heard most clearly over long distances. However, pant-hoots exhibit considerable acoustic variation within and among individuals (Fedurek et al., 2016; Kojima et al., 2003; Marler & Hobbett, 1975; Mitani et al., 1996). The variation is not limited to frequency properties of elements, such as fundamental frequency and peak frequency, but also involves variation in the number and presence/absence of different elements and phases (Supporting Information: Figures S3 (a−m)) (ibid.).
Chimpanzees use pant-hoots in a variety of intracommunity and intercommunity contexts. In intracommunity contexts, chimpanzees use pant-hoot calls to communicate with members of their own community over long distances (Goodall, 1986). Pant-hoots may function to communicate the caller's location to allies and associates within their own community (Goodall, 1986; Mitani & Brandt, 1994; Mitani & Nishida, 1993). Further, pant-hoots play a role in facilitating social bonds, as affiliative partners chorus more together (Fedurek, Machanda, et al., 2013), and in regulating grouping dynamics by attracting allies and potential mates to the caller's location (Fedurek et al., 2014; Mitani & Nishida, 1993; Wrangham, 1977). In intercommunity contexts, interactions often involve hearing, and sometimes responding to, pant-hoots from callers that are hundreds of meters away, far out of view (Wilson et al., 2012). The long-distance nature of pant-hoots allows chimpanzees to use them to advertise territory ownership (Wilson et al., 2007), and to signal numerical strength to members of neighboring communities during agonistic intergroup encounters (Herbinger et al., 2009; Wilson et al., 2001, 2012). Individual callers might thus benefit from encoding community-specific cues. Playback experiments have demonstrated that chimpanzees can distinguish stranger pant-hoots from those of familiar individuals (Herbinger et al., 2009) and that they are sensitive to numerical strength during intergroup encounters, being more likely to respond to simulated intruders when they are in parties with more males (Wilson et al., 2001). Hence, community-specific dialects could play a role in cooperative defense by signaling community membership. While genetic similarity could lead to community-specific vocalizations, socially learned signals of group membership might be useful in cases where not all group members are close genetic kin.
Despite these reasons for thinking that vocal dialects would benefit chimpanzees, current evidence raises several questions about the extent to which chimpanzees have socially learned signals of group membership. In the first study of chimpanzee dialects, Mitani et al. (1992) reported differences between Gombe and Mahale pant-hoots and suggested that they may be an outcome of vocal learning.
However, the differences among the communities were subtle compared to differences observed in songbirds (Cunningham & Baker, 1983) or whales (Garland et al., 2011). Mitani et al. (1992) found geographical differences in the composition of the build-up phase, and in frequency properties of the climax phase. Mitani and Brandt (1994) later found that in a principal components analysis (PCA) of acoustic structure, community membership accounted for only 0%−11% of the variance on the principal components, compared to within-individual factors (48%−79% of the variance) and between individual factors (17%−52% of the variance). Mitani further reassessed his findings, pointing out that since Gombe and Mahale are far from one another (~160 km) and likely genetically isolated, the acoustic differences may not necessarily represent vocal learning, but instead could represent genetic differences and/or body size (Mitani et al., 1999). Additionally, other environmental factors like habitat acoustics and/or sound environment might be more important in explaining the variation in such geographically distant communities.
Some studies reported an association between some properties of the let-down phase of pant-hoots and the context of production (Clark & Wrangham, 1993; Fedurek et al., 2016; Notman & Rendall, 2005). Notman and Rendall (2005) and Uhlenbroek (1996) reported an association between the tonal structure of the climax scream element of pant-hoots and the context of production. While this variation may provide information about context to receivers, Notman and Rendall (2005) argued that these differences are unlikely to be an outcome of vocal learning and are more likely to reflect arousal states of chimpanzees when calling. In any case, the context of call production is a covariate that may need to be controlled for when testing for group differences (refer to the methods and the directed acyclic graph in Figure 2). Finally, as several studies have noted previously, pant-hoots are individually distinctive (Fedurek et al., 2016; Kojima et al., 2003; Marler & Hobbett, 1975; Mitani et al., 1996). Signaling individual identity, rather than group membership, might therefore be the primary function of these calls.
To test the extent to which the acoustic structure of pant-hoots specifically signals community membership and arises out of vocal learning via auditory feedback, three questions need to be answered: (i) Do the calls contain features that reliably indicate community membership, allowing chimpanzees to distinguish extra-community pant-hoots based on those features alone, rather than through familiarity with the calls of particular individuals? (ii) Do chimpanzees from neighboring communities have more distinct pant-hoots than those from geographically distant communities? Greater differences among neighboring communities compared to geographically distant communities would indicate that chimpanzees are actively modifying the acoustic structure of pant-hoots to differentiate their calls from those of neighbors. (iii) Does community membership explain vocal similarity better than genetic relatedness? Crockford et al. (2004) addressed all three of these questions by comparing genotyped individuals in three neighboring communities and one more distant community in Taï National Park, Côte d'Ivoire. They found that neighboring communities differed from one another more than they differed from the distant community, despite neighboring communities inhabiting adjacent areas of similar continuous forest environment, which supports the view that chimpanzees learned to produce an acoustic structure distinct to their own community. These findings thus support the view that vocal learning accounts for the acoustic differences among communities. However, due to the small number of available males in the communities, this study could only include three individuals per group, resulting in a small sample size. This raises the possibility that the findings are a statistical artifact resulting from small sample size.
While it is well known that small sample sizes may lead to false negatives, it is also the case that a small sample size with noisy data can artificially exaggerate effect sizes and lead to false positives (Loken & Gelman, 2017). Hence, more studies are needed to replicate these findings to have more confidence in the results.
As a step toward re-evaluating the role of vocal learning in chimpanzee calls, we recorded pant-hoot calls from two neighboring chimpanzee communities in Gombe National Park, Tanzania and the geographically distant Kanyawara community of chimpanzees in Kibale National Park, Uganda. The objective of this study is to assess the extent to which variation in the acoustic structure of pant-hoots can be explained by community membership. To that end, we test two hypotheses. Our first hypothesis is that the acoustic structure of pant-hoots contains features that reliably indicate community membership. In line with Crockford et al. (2004), if vocal learning shapes the acoustic structure of pant-hoots into community-specific dialects, we would expect to find greater differences in the structure of calls between the two neighboring Gombe communities than between either of them and the geographically distant Kanyawara community. Our second hypothesis is that the acoustic structure of pant-hoots contains cues of individual identity more than community identity. While these are not mutually exclusive hypotheses (i.e., one or both or neither could be supported), they provide a framework for our research questions.

| Subjects and study sites
We studied chimpanzees at two study sites: Gombe National Park, Tanzania and Kibale National Park, Uganda. In Gombe, we studied two neighboring communities: Kasekela and Mitumba. In Kibale, we studied the chimpanzees of the Kanyawara community. Gombe is located in western Tanzania, along the shore of Lake Tanganyika (4°40′S, 29°38′E). At the time of the study, Gombe had three contiguous communities of chimpanzees, two of which (Kasekela and Mitumba) were well habituated and were followed nearly every day, throughout the day, as part of the long-term research at Gombe.
For this study, we included male chimpanzees aged ≥14 years.
By age 14, chimpanzees in our study communities are socially and sexually mature, and critically, previous research has shown that relevant milestones for mature pant-hoot production have been reached by this age. By age 14, male chimpanzees at Gombe exhibit a marked increase in their rate of pant-hoot production (Fig. 14 in Pusey, 1990), and their body weight approximates that of older adult males (Fig. 8 in Pusey et al., 2005). Field assistants conducted focal follows of individual males with the goal of recording as many calls as possible from the focal male, throughout the day. In addition to recording calls from the focal target, they also opportunistically recorded as many other calls as possible from known individuals to obtain the maximum number of calls. For each recording, they noted additional information including caller behavior, context, location, and party composition. Here, the recordings were obtained in four contexts: when the caller was traveling, feeding (or arriving at a feeding site), displaying, or resting (not traveling, feeding, or displaying). If pant-hoots provide any information about food, an individual could produce them when they see food and also when consuming food. Hence, a pant-hoot given when arriving at a patch with visible food was assigned to the feeding context. Furthermore, in situations where multiple contexts overlapped, we assigned the highest-priority context based on the following hierarchy: travel > feed > display > rest. To ensure sufficient sample sizes and consistency with recordings from Kanyawara, we limited the analysis of context differences, and analyses where context was relevant, to calls recorded in traveling and feeding contexts, and only included individuals with at least three calls recorded in both contexts.
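The context hierarchy and the inclusion rule described above can be sketched in a few lines. This is only an illustration under stated assumptions: the function names, the record layout, and the example calls are hypothetical, and "at least three calls recorded in both contexts" is read here as at least three calls in each of the traveling and feeding contexts.

```python
from collections import Counter

# Priority order stated in the text: travel > feed > display > rest.
CONTEXT_PRIORITY = ("travel", "feed", "display", "rest")

def resolve_context(observed_contexts):
    """Return the highest-priority context among those noted for one call."""
    for context in CONTEXT_PRIORITY:
        if context in observed_contexts:
            return context
    raise ValueError("no recognized context")

def eligible_individuals(calls, contexts=("travel", "feed"), min_calls=3):
    """Individuals with at least `min_calls` calls in every listed context.

    `calls` is a list of (individual, set_of_observed_contexts) pairs
    (a hypothetical layout chosen for this sketch).
    """
    counts = Counter((who, resolve_context(seen)) for who, seen in calls)
    individuals = {who for who, _ in calls}
    return sorted(
        who for who in individuals
        if all(counts[(who, c)] >= min_calls for c in contexts)
    )

# Hypothetical records: (caller ID, contexts noted at the time of the call).
example_calls = (
    [("A", {"travel"})] * 3
    + [("A", {"feed", "rest"})] * 3     # overlap resolves to "feed"
    + [("B", {"travel", "feed"})] * 4   # overlap resolves to "travel"
    + [("B", {"feed"})] * 2             # only two feeding calls
)
print(eligible_individuals(example_calls))
```

In this toy input, caller B is dropped because the overlapping travel/feed calls are assigned to travel, which leaves only two feeding calls.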
While the field assistants recorded all call-types from both males and females, here we focus on pant-hoots from males, because (1) pant-hoots have been the focus of previous dialect studies; (2) they can be heard from far away, making them plausible signals of community membership, and (3) males produce pant-hoots more often than females (Wilson et al., 2007). We reviewed these recordings and found that N = 723 (N = 481 from Kasekela and N = 242 from Mitumba) were of sufficiently high quality for acoustic analyses. These recordings consisted of a variety of calls including pant-hoots, pant-grunts, rough-grunts, waa-barks, and screams. Of the pant-hoots in these recordings, some were choruses (where multiple individuals pant-hoot together), and not all were from identified individuals. Choruses that had overlapping elements from multiple callers were excluded, as such overlap makes it harder to extract meaningful acoustic features from known individual callers.
Further, to optimize both the number of recordings per individual and the total number of individuals included in the analyses, we excluded individuals that had fewer than 8 pant-hoot call recordings. Based on this criterion, we excluded two individuals from the Kasekela community: Ferdinand (FE) and Gimli (GIM). While high-ranking males usually call most frequently (Wilson et al., 2007), the highest-ranking male at the start of our study, FE, was overthrown in October 2016, after which we were unable to record any more pant-hoots from him. In Mitumba, in July 2017, the alpha male Edgar (EDG) killed one of the adult males, Fansi (FAN) (Massaro et al., 2021). Before this, we were able to record enough calls from FAN for some analyses.
These selection criteria yielded a total of 214 pant-hoots (N = 128 from Kasekela and N = 86 from Mitumba) from 11 individuals (N = 6 males from Kasekela and N = 5 males from Mitumba) for acoustic analysis (Table 1).
At Kanyawara, P. F. recorded chimpanzee calls using a Sennheiser in two contexts, in which the caller was either traveling or feeding (or arriving at a feeding site with visible food). Using the same selection criteria as at Gombe, we obtained 111 calls from 7 Kanyawara males for acoustic analysis (Table 1).

| Potential sampling biases
We evaluate the sources of bias using the STRANGE framework (Webster & Rutz, 2020). STRANGE stands for Social background; Trappability and self-selection; Rearing history; Acclimation and habituation; Natural changes in responsiveness; Genetic make-up; and Experience. In terms of social background and self-selection, previous studies indicate that high-ranking males call more frequently (Wilson et al., 2007), so they are more likely to be sampled (Table 1).
We attempted to avoid overcontribution from any particular individual in our statistical analyses by performing multiple permutations on balanced and randomized subsets of the data (see Section 2.6), but some bias toward individuals that call more frequently might have been introduced due to needing a minimum number of recordings from each individual (see Section 2.2).
Furthermore, the chimpanzee community sizes included in this study (Kasekela: ~50 individuals; Mitumba: ~30 individuals [Wilson et al., 2020]; and Kanyawara: ~54 individuals) are close to the median community size of 39.2 individuals observed in long-term studies of wild chimpanzees (Wilson et al., 2014). In terms of rearing history, acclimation, and habituation, the chimpanzees at both Gombe and Kanyawara are wild, but were well habituated to observers at the time of recording. Additionally, since our studies were strictly observational, we did not subject chimpanzees to any invasive testing, thus mitigating any potential biases from acclimation, habituation, and experience. Natural changes in responsiveness due to seasons or timing could be sources of bias as chimpanzees produce pant-hoots more frequently in the mornings (Wilson et al., 2007) and pant-hoot production may vary with season depending on fruit availability (personal observation). While we followed the chimpanzees throughout the day and in all seasons, the sample is likely to contain more recordings from the mornings and from the wet season.
Lastly, 3 out of 6 individuals at Kasekela were close kin (two brothers: F. U. and F. N. D. and their father: S. L.; Table 1) and none of the other individuals at any of the communities included were known to be close kin. If calls of genetically related chimpanzees are more similar, then calls of Kasekela individuals might appear different from other communities due to genetic similarity (Walker et al., in revision).

| The pant-hoot call
The pant-hoot is a complex call composed of multiple elements.
Researchers typically divide pant-hoots into four phases, each of which consists of one or more acoustically similar elements: (i) the introduction: inhaled and exhaled tonal elements (fundamental frequency F0: 300−600 Hz); (ii) the build-up: shorter but more frequent exhaled tonal elements and noisy inhaled elements (F0: 200−500 Hz); (iii) the climax: loud tonal screams (F0: 800−2000 Hz), but often including other elements such as hoos and barks; and (iv) the let-down: short, build-up-like exhaled elements, decreasing in F0 (Figure 1, Sound S1) (Crockford et al., 2004; Mitani et al., 1992, 1999). Chimpanzees do not always produce all four of these phases when giving pant-hoot calls. Sometimes during the call, chimpanzees hit tree buttresses with their feet (and rarely with their hands), producing drum-like sounds (Arcadi & Wallauer, 2013).
Distinguishing these pant-hoot phases can be difficult as the elements vary substantially in their acoustic structure within each phase. To address this ambiguity and to distinguish systematically among these phases, we proceeded as follows. We identified the exhaled elements in all phases as the elements that reached relatively higher maximum frequencies compared to elements preceding and succeeding them. To distinguish between the introduction and the build-up phase, we defined the start of the build-up as the first exhaled element of markedly shorter duration compared to the previous elements. The build-up consisted of a series of elements with a similarly short duration. Next, to distinguish between the build-up and the climax, we defined the start of the climax as the first exhaled element with a fundamental frequency greater than 500 Hz (see the "500 Hz" rule; Mitani et al., 1999). Next, to distinguish between the climax and the let-down, we defined the end of the climax as the last tonal scream element. In cases where the climax phase did not include screams, we marked the end of the climax as the first element of a reduced fundamental frequency. The let-down phase consisted of a series of these elements of a lower fundamental frequency.
Some studies have identified several different kinds of elements within the climax phase (Crockford et al., 2004). Here we categorized climax elements as either scream or non-scream elements that we could reliably differentiate.
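The phase boundaries described above were identified from spectrograms, but the decision rules themselves can be written down as a small procedure. The sketch below is a loose illustration, not the procedure used in the study: the function name, the input layout, and the `short_ratio` cutoff for "markedly shorter" are hypothetical, and a "reduced fundamental frequency" is interpreted here as F0 falling back to 500 Hz or below.

```python
def label_phases(elements, climax_f0=500.0, short_ratio=0.6):
    """Assign a phase to each exhaled element of one pant-hoot.

    `elements` lists exhaled elements in call order as dicts with
    'duration' (s), 'f0' (Hz), and 'scream' (bool). `short_ratio` is an
    assumed cutoff: an element counts as markedly shorter when its duration
    is below this fraction of the mean duration of the preceding elements.
    """
    n = len(elements)
    # Climax starts at the first element whose F0 exceeds 500 Hz.
    climax_start = next(
        (i for i, e in enumerate(elements) if e["f0"] > climax_f0), n
    )
    # Build-up starts at the first pre-climax element that is markedly shorter.
    buildup_start = climax_start
    for i in range(1, climax_start):
        mean_previous = sum(e["duration"] for e in elements[:i]) / i
        if elements[i]["duration"] < short_ratio * mean_previous:
            buildup_start = i
            break
    # Climax ends after the last scream; without screams, it ends at the
    # first later element whose F0 is back at or below the threshold.
    screams = [i for i in range(climax_start, n) if elements[i]["scream"]]
    if screams:
        letdown_start = screams[-1] + 1
    else:
        letdown_start = next(
            (i for i in range(climax_start, n) if elements[i]["f0"] <= climax_f0),
            n,
        )
    labels = []
    for i in range(n):
        if i < buildup_start:
            labels.append("introduction")
        elif i < climax_start:
            labels.append("build-up")
        elif i < letdown_start:
            labels.append("climax")
        else:
            labels.append("let-down")
    return labels

# Hypothetical call: two long introductory hoots, three short build-up
# elements, two climax screams, and one let-down element.
example_call = [
    {"duration": 0.8, "f0": 400.0, "scream": False},
    {"duration": 0.9, "f0": 450.0, "scream": False},
    {"duration": 0.3, "f0": 300.0, "scream": False},
    {"duration": 0.3, "f0": 350.0, "scream": False},
    {"duration": 0.25, "f0": 400.0, "scream": False},
    {"duration": 0.7, "f0": 900.0, "scream": True},
    {"duration": 0.8, "f0": 1200.0, "scream": True},
    {"duration": 0.3, "f0": 300.0, "scream": False},
]
example_labels = label_phases(example_call)
print(example_labels)
```

Calls that lack a phase are handled by the defaults: if no element exceeds 500 Hz, no climax or let-down is labeled, and if no markedly shorter element is found, all pre-climax elements are treated as introduction.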

| Acoustic feature extraction
Given the structure of the pant-hoots described above, acoustic (Crockford et al., 2004; Mitani et al., 1992, 1999) (Table 2). Spectral features quantify the frequency and tonal structure of individual elements from the power spectrum. We measured these from selected specific elements: one build-up element (24 features), and one climax element (25 features).
We extracted the acoustic features as follows. First, we measured structural features from the pant-hoot phases by visually inspecting spectrograms of entire pant-hoots using Praat version 6.1.15. We considered each phase separately and measured a set of acoustic features from each phase (Table 2). We present the visual summaries of these structural features of the pant-hoots from the three communities in the Supporting Information: (Figures S3 [a-m]).
Next, for the semiautomatic extraction of acoustic features, we chose one element from the build-up phase and one element from the climax phase. From the build-up phase, we chose the middle element in case of an odd number of build-up elements, and the element immediately preceding the middle of the build-up in case of an even number of elements (Mitani et al., 1999). From the climax phase, we chose the scream that reached the highest fundamental frequency in the spectrogram. To obtain appropriate frequency and time (Mitani et al., 1999). Furthermore, genetics affects individual identity, and individual identity may affect both acoustic structure and community identity, because communities are defined as a group of individuals that live within the same territory. Lastly, context may affect acoustic structure (Fedurek et al., 2016; Notman & Rendall, 2005; Uhlenbroek, 1996). We used this DAG to identify minimally sufficient adjustment sets for assessing the relationship of interest, that is, the relationship between community identity and acoustic structure. A minimally sufficient adjustment set is a set of variables that, when controlled for, is sufficient to estimate the statistical association between two variables in a DAG without bias, and from which no variable can be dropped. The adjustmentSets function in the R-package dagitty prints a list of all minimally sufficient adjustment sets (Textor et al., 2016). This package identified the sets (individual identity, geographical location) and (environment, genetics, individual identity) as the minimally sufficient adjustment sets to assess the relationship between community identity and acoustic structure. Hence, we need to either control for individual identity and geographical location, or environment, genetics, and individual identity. Since we did not measure environmental variables or genetics, we could not control for those.
However, we could obtain an unbiased association between community identity and acoustic structure by controlling for individual identity and geographical location. We therefore controlled for geographical location by testing for differences between calls from neighboring communities and compared them with the geographically distant Kanyawara community. Next, we controlled for individual identity using the permuted discriminant functions analysis (pDFA) procedure. The pDFA procedure is used to test for differences in a factor of interest (a.k.a. test factor) while controlling for a confounding factor (a.k.a. control factor) (Mundry & Sommer, 2007). We needed to control for individual identity not only to close confounding "backdoor" pathways, but also to account for the nonindependence of data points due to there being multiple recordings from the same individual. We describe the pDFA procedure in more detail in the next section. Lastly, while context is not a confound opening any "backdoor" paths based on our DAG, context is a precision covariate that may affect the relationship of interest (community ID to acoustic structure) (Laubach et al., 2021).
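To make the adjustment-set logic concrete, the sketch below enumerates minimal sets that block all backdoor paths between community identity and acoustic structure. It is a pure-Python illustration of the backdoor criterion, not the dagitty implementation, and the edge list is an assumption: it includes the relationships stated in the text plus hypothetical edges from geographical location and environment that were chosen to be consistent with the two adjustment sets reported above.

```python
from itertools import combinations

# Assumed edges (parent, child). Only some are stated in the text; the rest
# are hypothetical and chosen to match the reported adjustment sets.
EDGES = [
    ("geography", "environment"), ("geography", "genetics"),
    ("geography", "community"), ("genetics", "individual"),
    ("individual", "community"), ("environment", "acoustics"),
    ("genetics", "acoustics"), ("individual", "acoustics"),
    ("context", "acoustics"), ("community", "acoustics"),
]

def ancestors(edges, nodes):
    """Return the given nodes together with all of their ancestors."""
    found, frontier = set(nodes), list(nodes)
    while frontier:
        child = frontier.pop()
        for parent, c in edges:
            if c == child and parent not in found:
                found.add(parent)
                frontier.append(parent)
    return found

def descendants(edges, node):
    """Return all descendants of one node."""
    found, frontier = set(), [node]
    while frontier:
        parent = frontier.pop()
        for p, child in edges:
            if p == parent and child not in found:
                found.add(child)
                frontier.append(child)
    return found

def blocks_backdoor(edges, x, y, z):
    """True if z d-separates x and y after removing the edges leaving x."""
    backdoor = [(p, c) for p, c in edges if p != x]
    keep = ancestors(backdoor, {x, y} | set(z))
    sub = [(p, c) for p, c in backdoor if p in keep and c in keep]
    # Moralize the ancestral graph: link parents to children and co-parents.
    links = {node: set() for node in keep}
    for p, c in sub:
        links[p].add(c)
        links[c].add(p)
    for node in keep:
        parents = [p for p, c in sub if c == node]
        for a, b in combinations(parents, 2):
            links[a].add(b)
            links[b].add(a)
    # Search for a path from x to y that avoids the conditioned nodes.
    seen, frontier = {x}, [x]
    while frontier:
        node = frontier.pop()
        for neighbor in links[node]:
            if neighbor == y:
                return False
            if neighbor not in seen and neighbor not in z:
                seen.add(neighbor)
                frontier.append(neighbor)
    return True

def minimal_adjustment_sets(edges, x, y):
    """Enumerate minimal covariate sets that satisfy the backdoor criterion."""
    nodes = {n for edge in edges for n in edge}
    candidates = sorted(nodes - {x, y} - descendants(edges, x))
    found = []
    for size in range(len(candidates) + 1):
        for combo in combinations(candidates, size):
            z = set(combo)
            if any(previous <= z for previous in found):
                continue  # a smaller sufficient set already covers this one
            if blocks_backdoor(edges, x, y, z):
                found.append(z)
    return found

adjustment_sets = minimal_adjustment_sets(EDGES, "community", "acoustics")
print(adjustment_sets)
```

Under these assumed edges, the enumeration returns the same two sets named in the text. Context never appears in a minimal set because it opens no backdoor path, which matches its treatment as a precision covariate.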

| Analysis steps
We performed the pDFAs with 1000 permutations (i.e., 1000 randomized data sets including the original data set) in each of our analyses and used an alpha level of α = 0.05 on the cross-validated classification accuracy to infer a significant difference.  (Tables 2 and 3). We tested for differences in context, community identity, and individual identity in each of these four types of acoustic features. For each kind of acoustic feature set, we tested for context before performing other analyses to determine whether context was a precision covariate (Laubach et al., 2021) that needed to be statistically controlled for in the subsequent analyses (refer to the DAG logic in Section 2). If we found statistically significant differences between contexts in any acoustic feature set, we controlled for context by stratifying the data and considering calls from only one context at a time in separate analyses (Crockford et al., 2004). If we did not find any significant effect of context on an acoustic feature set, we did not need to control for context. To control for geographical differences, we performed two separate analyses for each kind of acoustic feature set. Following Crockford et al. (2004), we first investigated the acoustic structure of pant-hoots from the two neighboring communities of Gombe, where maximal differences were expected. To then compare the two Gombe communities to a geographically distant community, we ran pDFAs including all three communities. To control for individual differences, we used individual identity as the control factor in each of the pDFAs when testing for context and community identity. When testing for individual differences using individual identity as the test factor, we used community identity as the restriction factor. When a restriction factor is added in the pDFA, the randomization is carried out while accounting for the fact that the test factor is nested within the restriction factor.
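The logic of a permutation test on cross-validated classification accuracy, with individuals nested within communities, can be sketched as follows. This is a simplified stand-in rather than the pDFA of Mundry and Sommer (2007): it swaps the discriminant function for a nearest-centroid classifier, uses leave-one-out cross-validation on all calls instead of balanced random subsets, and runs on fabricated toy data. All function and variable names are hypothetical.

```python
import random

def centroid(rows):
    """Mean feature vector of a list of equal-length rows."""
    return [sum(column) / len(rows) for column in zip(*rows)]

def loo_accuracy(features, labels):
    """Leave-one-out accuracy of a nearest-centroid classifier."""
    hits = 0
    for i, x in enumerate(features):
        groups = {}
        for j, (row, label) in enumerate(zip(features, labels)):
            if j != i:
                groups.setdefault(label, []).append(row)
        centers = {label: centroid(rows) for label, rows in groups.items()}
        nearest = min(
            centers,
            key=lambda label: sum((a - b) ** 2 for a, b in zip(x, centers[label])),
        )
        hits += nearest == labels[i]
    return hits / len(features)

def nested_permutation_test(features, callers, community_of, n_perm=1000, seed=0):
    """Permute community labels across whole individuals, never across calls."""
    rng = random.Random(seed)
    ids = sorted(community_of)
    observed = loo_accuracy(features, [community_of[c] for c in callers])
    at_least_as_good = 1  # the original data set counts as one of the n_perm sets
    for _ in range(n_perm - 1):
        shuffled = [community_of[c] for c in ids]
        rng.shuffle(shuffled)
        relabeled = dict(zip(ids, shuffled))
        accuracy = loo_accuracy(features, [relabeled[c] for c in callers])
        at_least_as_good += accuracy >= observed
    return observed, at_least_as_good / n_perm

# Toy data (not from the study): two well-separated communities, three
# callers each, four calls per caller, two features per call.
community_of = {"a1": "A", "a2": "A", "a3": "A", "b1": "B", "b2": "B", "b3": "B"}
features, callers = [], []
for k, caller in enumerate(sorted(community_of)):
    base = 0.0 if community_of[caller] == "A" else 10.0
    for call in range(4):
        features.append([base + 0.1 * k + 0.05 * call, base - 0.1 * k + 0.05 * call])
        callers.append(caller)

observed_accuracy, p_value = nested_permutation_test(
    features, callers, community_of, n_perm=200
)
print(observed_accuracy, p_value)
```

Because labels are shuffled at the level of individuals, the smallest attainable p value depends on the number of individuals rather than the number of calls. With three callers per community, even perfectly separated communities cannot reach p < 0.05 in this toy design, which illustrates why small numbers of individuals limit what such tests can show.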
To avoid overfitting, we ensured that the number of acoustic features used to perform the DFAs did not exceed the number of observations in the category of the test factor with the fewest observations. To avoid multicollinearity and to reduce the number of acoustic features while accounting for most of the variation contained in the different acoustic features, we used PCA on the acoustic features.
We used the scores of each observation on the principal components as the features to be used in the DFAs. To choose the number of principal components to include, we used two heuristics. First, we used as many principal components as there were observations in the category of the test factor with the fewest observations, or as many principal components as were needed to explain 90% of the variation, whichever was smaller. Limiting the number of principal components to those that explained 90% of the variation allowed us to avoid including too many components of little explanatory power when including many more components was possible in the pDFA design. Second, since no heuristic is perfect in all circumstances (Jolliffe, 2002), we used an additional heuristic to ensure the stability of results. We verified the consistency of the results of the pDFAs over different numbers of principal components selected using Cattell's scree test (Cattell, 1966).
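The first heuristic amounts to taking the smaller of two counts, which the following sketch makes explicit. The function name and the example variance ratios are hypothetical; the 90% target and the cap at the size of the smallest test-factor category come from the text.

```python
def components_to_keep(explained_ratio, smallest_group_size, target=0.90):
    """Number of principal components to pass to the DFA.

    `explained_ratio` holds the proportion of variance explained by each
    component, in decreasing order. The result is the smaller of (a) the
    number of components needed to reach `target` cumulative variance and
    (b) the number of observations in the smallest test-factor category.
    """
    cumulative = 0.0
    needed = len(explained_ratio)
    for k, ratio in enumerate(explained_ratio, start=1):
        cumulative += ratio
        if cumulative >= target:
            needed = k
            break
    return min(needed, smallest_group_size)

# Hypothetical variance ratios; four components are needed to pass 90%.
example_ratios = [0.5, 0.25, 0.125, 0.0625, 0.0625]
print(components_to_keep(example_ratios, smallest_group_size=8))
print(components_to_keep(example_ratios, smallest_group_size=3))
```

When the smallest category has only three observations, the cap takes over and the 90% target is not reached, which is one reason a second, scree-based check on the stability of results is useful.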

| Ethical note
The research reported in this paper is based on data collected noninvasively from free-ranging chimpanzees.

3 | RESULTS

| Differences in pant-hoots between contexts
To ascertain whether we needed to control for context in our main analyses of acoustic differences in pant-hoot structure as a function of community, we started by examining whether context affected any of our acoustic feature sets. We found a statistically significant difference between contexts in the structural acoustic features (Table 2; Table 5).
Hence, we conclude that the number of elements in the build-up, the climax, and the let-down phases potentially encoded contextual information.
Further, we found no differences between contexts in the other types of acoustic features (Table 4). Figure 3b−d shows the overlap between contexts in these acoustic features in a multidimensional space. Given that context is confounding only when it has a significant effect (refer to the DAG logic in Section 2), we controlled for context when testing for differences in structural features alone, and not when testing for differences in other types of acoustic features, in the subsequent analyses.

TABLE 4 Summary of the results from the pDFAs with context as the test factor and individual identity as the control factor for different types of acoustic features
Entire call (Tables 2 and 3); control factor: individual; individuals: 6; calls per individual: Feed: 6 (3−9), Travel: 6.5 (3−12); total calls: 77; classification accuracy: 50.8 (47.7); p = 0.37
Note: We indicate the number of individuals included in the test factor, that is, in both feeding and traveling contexts, the range of number of calls per individual and the total number of calls considered for each of the analyses.

| Differences in pant-hoots among communities of chimpanzees
The acoustic features of the climax screams did not differ significantly between the two neighboring communities at Gombe, but the relatively low p value indicates these features may warrant further investigation (observed classification accuracy: 70.7% vs. expected: 58%; p = 0.089, Table 6). Furthermore, these features did not differ between the pair Kasekela-Kanyawara (observed classification accuracy: 88% vs. expected: 77.1%; p = 0.18) or between the pair Mitumba-Kanyawara (observed classification accuracy: 68.1% vs. expected: 57.2%; p = 0.13). Hence, we did not use the repDFA_nested function to test which acoustic features were important in the 3-community analysis. Considering that the DFA could be sensitive to outliers (Mundry & Sommer, 2007), we checked the consistency of the results after removing outliers. The patterns remained similar after the removal of outliers (see Supporting Information: Figures S2 [a-b]).
Additionally, we observed no differences between the contiguous communities, or among all three communities, in the structural features (controlled for individual identity and context), acoustic features of the build-ups, or all acoustic features considered together (Table 6).

| Differences in pant-hoots among individuals
We observed statistically significant differences among individuals in the structural features, the acoustic features of the climax screams, and all acoustic features taken together. This was true when all communities were taken together as well as when the geographically adjacent communities of Gombe were assessed separately (Table 7). However, the individuals could not be separated based on acoustic features of the selected build-up elements in any [...] with more certainty than others. We can see this reflected in the low classification accuracies in the pDFAs (Table 7). [...] individual differences are more prominent than group differences in the acoustic structure of chimpanzee pant-hoots.

Note: We used community ID as the restriction factor except when using context as a control factor. We indicate the number of individuals included, the range of number of calls per individual and the total number of calls considered for each of the analyses. Abbreviation: pDFA, permuted discriminant function analysis. *Statistically significant at p < 0.05.

4 | DISCUSSION
We found that the context of the vocalization could be identified from some structural acoustic features but not from any other kind of acoustic feature. Within the structural features, the number of climax elements was higher in feeding contexts, and the numbers of let-down and build-up elements were higher in traveling contexts. Our results support the findings of Clark and Wrangham (1993), Fedurek et al. (2016), and Notman and Rendall (2005) in finding an association of the let-down phase with the context of the pant-hoot. All of them observed a greater number of pant-hoots with let-down components in traveling contexts, which is consistent with our observation of a greater number of let-down elements in traveling contexts. However, we did not have sufficiently detailed behavioral data to distinguish food-arrival pant-hoots separately, and hence we could not confirm the finding of Clark and Wrangham (1993) that a higher proportion of pant-hoots with let-downs occurred in the context of arrival at a food source. We further observed two differences that have not been reported previously. First, pant-hoots given in feeding contexts had more climax elements. Second, pant-hoots given in traveling contexts had more build-up elements. Furthermore, we found no differences between the contexts in the other acoustic features, which describe the tonal properties of the build-up and climax elements. Uhlenbroek (1996) described different types of pant-hoots based on their tonal and spectral properties: a "wail-like" pant-hoot has a clear harmonic structure and a power spectrum with clear peaks, whereas a "roar-like" pant-hoot is noisy, lacking a clear harmonic structure and having a more evenly distributed power spectrum. Notman and Rendall (2005) found that pant-hoots given in traveling contexts were more "roar-like" and those given in feeding contexts were more "wail-like."
Since we found no context differences in the acoustic features related to the tonal properties, fundamental frequency, noise, or peak frequency, we could not confirm these findings from either Uhlenbroek (1996) or Notman and Rendall (2005).
[...] (Mitani & Brandt, 1994; Mitani et al., 1999). Mitani and Brandt (1994) found that the principal component that explained the most variance among individuals loaded most highly on features of the fundamental frequency (F0), including start, minimum, maximum, and mean F0. Similarly, Mitani et al. (1999) found significant individual differences in the minimum, maximum, and mean F0, and in the frequency range of F0. Lastly, we observed no differences among individuals in the spectral features of the build-up elements. This is contrary to findings from previous studies (Fedurek et al., 2016; Mitani et al., 1999).
Our findings contrast with those of previous studies looking at community-specific acoustic differences in pant-hooting (Crockford et al., 2004; Marshall et al., 1999; Mitani et al., 1992). In the first study reporting vocal dialects in chimpanzees, Mitani et al. (1992) found differences between the geographically distant communities of Gombe and Mahale National Parks. Mitani et al. (1999) [...] (Guerra et al., 2008), and doves (Streptopelia sp.) (De Kort et al., 2002) also lack geographic variation in many calls. Such instances of a lack of learned signals could be explained by genetic similarities and hybridization. For instance, loud calls of gibbon (Hylobates sp.) hybrids are not learned from parents and instead exhibit strong genetic inheritance (Brockelman & Schilling, 1984). Our failure to find evidence for community-specific signatures in chimpanzees could reflect features peculiar to Gombe chimpanzees. Alternatively, it may be the case that previous findings of differences among chimpanzee communities resulted from statistical artifacts.
In chimpanzees, several community-specific peculiarities can lead to differential selection pressures for community-specific vocalizations. For example, (i) a recent history of intergroup violence could lead to a greater selection pressure for community-specific vocalizations, which would help individuals distinguish members of their own community from neighbors. There is a history of lethal intergroup violence in Gombe (Wilson et al., 2004), Kibale (Watts et al., 2006), as well as in Taï chimpanzees studied by Crockford et al. (2004). However, Gombe chimpanzees have experienced a higher rate of intercommunity killings (Wilson et al., 2004), suggesting that the selection for community-specific vocalizations should be at least as strong as that for Taï chimpanzees, if not stronger.
(ii) Stability of hierarchy and strength of affiliative bonds in the community promote vocal convergence (Fedurek, Machanda, et al., 2013; Mitani & Brandt, 1994; Mitani & Gros-Louis, 1998) and thus could create positive selection pressure for community-specific vocalizations. In Gombe, within-community bonds are likely stronger in the Kasekela community, which has more maternal brothers (Bray & Gilby, 2020) and closer overall genetic relatedness among males (Walker et al., in revision) compared to Mitumba, which has fewer brothers and higher within-community violence (Massaro et al., 2021). More data are needed to accurately test whether social bonds affect vocal convergence across field sites. (iii) A larger community size may lead to a greater selection pressure for community-specific signatures as it becomes more difficult to keep track of individuals. All communities in this study and in Crockford et al. (2004) were moderate in size (median community size ± 1 SD: 39.2 ± 29.9; Wilson et al., 2014), so the difference between our results and those of Crockford et al. (2004) is unlikely to result from differences in community size.
Another possibility is that previous findings of differences among chimpanzee communities may have resulted from statistical artifacts.
While Crockford et al. (2004) attempted to control for confounding factors, their sample size of only three individuals per community increases the possibility that apparent differences could emerge by chance. As shown by a simulation study (Loken & Gelman, 2017), noisy data with small sample sizes can lead to false positives.
For the analyses that were most comparable across this study and that of Crockford et al. (2004) (those focused on climax scream and entire call), our study included slightly more individuals per community (5−7 individuals per community compared to 3 individuals per community in Crockford et al. [2004]). Hence, we should have detected any differences among communities that were similar in effect size to those reported by Crockford et al. (2004). However, because our sample size remains modest, we could have failed to detect differences if the effect size at Gombe is lower than that at Taï, and hence we cannot rule out the potential for false negatives either.
Further, neither our study nor that of Crockford et al. (2004) controlled for individual-level factors such as age, body size, health condition, and rank that could influence the acoustic structure. In addition, no studies have been able to quantitatively control for other factors, such as the influence of habitat differences and sound environments, that Mitani et al. (1999) suggested could be important. Hence, we argue that firm conclusions regarding chimpanzee vocal learning ability require further study, ideally with a larger number of sampled individuals per community. Furthermore, reanalyses of existing data with different methods, such as Bayesian inference, are another potential avenue for future research.
Our results reinforce the importance of replicating findings in animal behavior research. A key feature of scientific discovery is seeking results that are consistently reproducible (Burman et al., 2010; Johnson, 2002; Lamal, 1990; Popper, 1959). In recent decades, analyses of studies in several scientific disciplines, including fields as diverse as psychology and medicine, have found that most scientific findings fail to be reproduced by subsequent studies, leading to what has been called the replication crisis (Ioannidis, 2005; Wiggins & Christopherson, 2019). One factor contributing to this crisis is that studies replicating existing findings are rarely conducted and are implicitly discouraged through reviewer bias against them (Neuliep & Crandall, 1993). Given that field studies in animal behavior typically have smaller sample sizes than studies in psychology or medicine, it is likely that the field of animal behavior is in even greater need of replication to test the validity of previous results with sufficient sample sizes (Johnson, 2002). Within animal behavior, the need for replication may be particularly acute for species such as chimpanzees, for which field conditions make it challenging to obtain sample sizes sufficient to be confident in results. Long-term data from multiple field sites have proven essential for providing sufficient sample sizes for a range of topics (e.g., culture: Whiten et al., 1999