The neurobiology of vocal communication in marmosets

An increasingly popular animal model for studying the neural basis of social behavior, cognition, and communication is the common marmoset (Callithrix jacchus). Interest in this New World primate across neuroscience is now being driven by their proclivity for prosociality across their repertoire, high volubility, and rapid development, as well as their amenability to naturalistic testing paradigms and freely moving neural recording and imaging technologies. The complement of these characteristics sets marmosets up to be a powerful model of the primate social brain in the years to come. Here, we focus on vocal communication because it is the area that has both made the most progress and illustrates the prodigious potential of this species. We review the current state of the field with a focus on the various brain areas and networks involved in vocal perception and production, comparing the findings from marmosets to other animals, including humans.


INTRODUCTION
Communication is integral to social living. Despite considerable diversity in gregariousness and social organization, all animals interact with conspecifics. And to do so, they must communicate with each other.
As such, selection for effective communication systems, and the supporting neural circuits, has been a dominant force in the evolution of all species, including human and non-human primates (NHPs). The common marmoset (Callithrix jacchus), an increasingly popular primate model in neuroscience, [1][2][3] is emerging as a keystone species for elucidating the neurobiology of natural primate vocal communication.
Though a key initial factor in driving the recent surge in marmoset interest was their apparent amenability for developing gene editing technologies in a primate, [4][5][6][7] unique facets of the species' social, cognitive, and communication systems are increasingly motivating research because of their many shared properties with humans. 2 These New World primates (Figure 1A) exhibit a unique repertoire of social behaviors, 8,9 including cooperative breeding [10][11][12][13][14] and complex vocal behaviors, similar to humans (for review, see Miller et al. 2 ). As in all primates, the functional organization of the marmoset brain is relatively conserved compared to the human brain, 15,16 further illustrating the species' suitability to model aspects of the human social cognitive faculty that, though shared with other primates, are absent in other mammalian models. 2 As such, the marmoset has been proposed as a translational model to study speech disorders as well as speech-related aspects of neurodegenerative disorders (see Okano 17 for a review), [17][18][19] which is further aided by an increasing availability of genetic tools. 4,5,18,20 Explicating the neural basis of vocal communication in marmosets will be integral to maximizing the translational significance of this powerful model organism.
This is an open access article under the terms of the Creative Commons Attribution License, which permits use, distribution and reproduction in any medium, provided the original work is properly cited. © 2023 The Authors. Annals of the New York Academy of Sciences published by Wiley Periodicals LLC on behalf of New York Academy of Sciences.
Here, we detail current knowledge about the neural basis of vocal communication in marmosets to illustrate the increasingly powerful role that this species is now playing in elucidating primate social brain functions. Though focused on marmosets, this review draws upon evidence from several model species [21][22][23] to contextualize the findings. While most comparisons will be to the rhesus macaque (Macaca
Ann NY Acad Sci. 2023;1528:13-28.

VOCAL SIGNAL PROCESSING IN MARMOSETS
5][26][27][28][29][30] The most common types of vocalizations are phee, trill, twitter, and trillphee calls (Figure 1B), 3,24,27 which are each perceptually distinctive to marmosets. 31 Phee calls are loud, high-frequency calls that function to maintain social contact when visually occluded from conspecifics, and to mediate intergroup interactions. 25 The acoustic structure of phees can vary based on the social context. Loud, long whistles are used for long-range communication, while shorter whistles are used for closer-ranged contact. 3,253][34] In these paradigms, two marmosets, which can hear but not see each other, engage in "conversations," taking turns producing phee calls. 6][37] Twitters are a second type of long-distance call, which is used both between conspecifics in the same group and during intergroup interactions. 25,27 Trills are used for short-range contact between conspecifics, 24,38 with bonded partners trilling at each other most frequently. 38 The trillphee call is a combination of a trill and a phee, with the calls usually starting as a trill and ending as a phee. 24 Like trills, they are used for close-range interactions. In addition to these call types, marmosets also produce a variety of other calls, such as chirp, ek, tsik, and chatter, with each related to a specific social or environmental context, such as the presence of a predator or an unfamiliar human or conspecific. 24,289][40][41] For example, simply broadcasting phee calls from an unfamiliar individual to a wild population of marmosets from within their territory elicits robust behavioral responses. 40 Interactive antiphonal playback paradigms, in which marmosets engage in vocal exchanges with a virtual marmoset (VM), are a particularly powerful tool to study vocal behavior in laboratories as they afford experimental control over different facets of a naturally occurring vocal interaction.
33,39,41 Studies using this paradigm have shown that marmosets recognize the identity of conspecific callers in these contexts 35,39 and modulate their response rate based on the partner's identity and sex. 32 Although trills are less often studied, as they are only produced in close range to a conspecific, recent advances in wearable technology have allowed recordings in contexts where marmosets produce these vocalizations. 38 This work shows that marmosets are more likely to produce trills in response to calls by their pair-bonded partner, suggesting again that they are able to recognize a caller's identity. Together, these behavioral results show that social information encoded in the acoustic structure of vocal signals is perceptually salient to marmosets and behaviorally relevant in a range of social contexts.
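The response rate measured in antiphonal playback paradigms can be made concrete with a short sketch. The Python below is only an illustration under stated assumptions, not the analysis used in the cited studies: the function name `antiphonal_response_rate`, its inputs (playback offset times and subject call onset times, in seconds), and the 10-second response window `max_latency_s` are all hypothetical choices introduced here.

```python
from bisect import bisect_right


def antiphonal_response_rate(stimulus_offsets, subject_onsets, max_latency_s=10.0):
    """Estimate how often a subject answers playback calls.

    stimulus_offsets: times (s) at which each playback call ended.
    subject_onsets:   times (s) at which the subject started each of its calls.
    max_latency_s:    assumed response window; NOT a value taken from the review.

    Returns (response_rate, latencies) where latencies lists the delay of
    each counted response.
    """
    stimulus_offsets = sorted(stimulus_offsets)
    subject_onsets = sorted(subject_onsets)
    latencies = []
    for k, offset in enumerate(stimulus_offsets):
        # A response must start before the window closes and before the
        # next playback call ends, so one call is never counted twice.
        window_end = offset + max_latency_s
        if k + 1 < len(stimulus_offsets):
            window_end = min(window_end, stimulus_offsets[k + 1])
        # Index of the first subject call that starts after this playback ended.
        i = bisect_right(subject_onsets, offset)
        if i < len(subject_onsets) and subject_onsets[i] <= window_end:
            latencies.append(subject_onsets[i] - offset)
    rate = len(latencies) / len(stimulus_offsets) if stimulus_offsets else 0.0
    return rate, latencies
```

Comparing this rate across playback conditions (for example, familiar versus unfamiliar callers) is one simple way to quantify the modulation of responding by partner identity described above; the appropriate window would need to be chosen from the behavioral data.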

BRAIN NETWORKS UNDERLYING VOCAL PERCEPTION
Vocal signal processing in primates, including marmosets, follows the ascending auditory pathway found in all mammals. Like any sound, vocalizations are first encoded by the cochlea, which sends signals through the cochlear nuclei to the superior olivary complex. 423][44][45] Although these subcortical and brainstem regions involved in audition already perform varying levels of signal processing [46][47][48][49] and have received increasing interest in the study of vocal communication, 50 here we will focus primarily on the upstream cortical regions involved in vocal signal perception. The MGB projections to the auditory cortex (ACx) relay the signal to this main cortical site for the processing of auditory stimuli.
While the initial flow of spiking activity from the thalamus predominantly targets A1 before spreading to the belt and parabelt, all these regions share dense bidirectional connectivity, some of which even have thalamorecipient layers of their own. 44,45,51 The ACx then sends output streams to multiple cortical and subcortical regions, principally to the prefrontal cortex (PFC) and other frontal lobe regions. 52,53 In fact, auditory projections from downstream tertiary ACx fields comprise a plurality of inputs to the PFC. 54 Across these connections, a hierarchical pattern of organization exists. [60][61][62][63][64][65][66][67][68] The fronto-temporal network in the primate brain is crucial for vocal communication 69 and is organized in two pathways: the postero-dorsal and antero-ventral pathways, which parallel the visual dorsal and ventral pathways (Figure 2). 64 In primates, the postero-dorsal pathway, which involves connections from the caudal belt of the ACx to the dorsolateral PFC (dlPFC), is thought to be involved in the spatial processing of auditory signals. Experiments testing the antero-ventral pathway, which involves connections from the anterior belt of the ACx to the ventrolateral PFC (vlPFC), suggest it is integral to encoding the individual call types.
63,64,70 Although convincing evidence for these two pathways has been found in rhesus macaques, relatively little functional work has been done in marmosets. However, given the presence of dual visual streams similar to macaques in marmosets, 71,72 and the differential connectivity of the rostral and caudal belt areas of the auditory cortex to the frontal lobe, 44 it is highly likely they would possess a similar dual pathway system for auditory processing. Those neurophysiological studies that have been performed in marmosets have tended to focus on the antero-ventral stream and investigated responses to various call types in their repertoire in different neural substrates.
Cortical networks for vocal perception. A graphical representation of sensory networks involved in processing vocalizations. The MGB in the thalamus projects to the auditory cortex. From the anterior part of the auditory cortex, projections to the vlPFC are made, which represents the antero-ventral stream, while the posterior part projects onto the dlPFC, which represents the postero-dorsal stream. Abbreviations: dlPFC, dorsolateral prefrontal cortex; IC, inferior colliculus; MGB, medial geniculate body; vlPFC, ventrolateral prefrontal cortex.
Studies on how marmosets perform sound localization have been less prevalent despite the facts that marmosets possess sufficient acuity to perform sound localization experiments, 73 localization is an important function of long-range calls in marmosets, 25 and initial studies show promising results. 74,75 Increased focus on the postero-dorsal pathway may elucidate how the fronto-temporal network contributes to vocal communication in marmosets.
As evident from these various neural networks involved in auditory processing, several brain regions work in concert to process conspecific vocalizations. Although these substrates are highly interconnected, so far, most neurophysiology studies have focused on single brain regions.
Below, we unpack in more detail how the more well-studied substrates in the marmoset forebrain represent calls to better inform their respective, complementary roles in the broader vocal signal processing network.

Auditory cortex
The ACx is the neocortical substrate located in the temporal cortex that first processes sound in the forebrain, and it plays a crucial role in extracting perceptually salient features of a vocal signal. In all primates, the ACx is organized along the dorsal, medial, and lateral axes in the temporal lobe, most notably along the rostral-caudal axis (Figure 1C). 76 Its fundamental pattern of organization consists of an elongated core region, where the primary sensory cortex (A1) is located, which receives the majority of its ascending inputs from the thalamus. The core is encircled by two concentric belt-like areas, the belt and parabelt, each divided into several subareas (Figure 1C). 52,77 Each region and subregion exhibits functional specialization, with A1 encoding more basic sound properties, while belt and parabelt areas represent increasingly abstract, integrated patterns of auditory activity. 63,78,79 Earlier neurophysiological studies of the marmoset auditory cortex reported that both individual neurons and populations were responsive to, or selective for, conspecific vocalizations, including in both primary (A1) and secondary (caudomedial belt) ACx regions. 80,81 Intriguingly, Zeng et al. 82 recorded calcium activity of cells in marmoset A1, finding that a subpopulation of A1 neurons exhibited selectivity for certain conspecific call types but not others, and became unresponsive when the call structure was acoustically manipulated. This degree of representational selectivity is somewhat greater than what is typically posited, 83 though consistent with early data on marmoset A1 showing response extinction when the vocal stimulus was played in reverse. 81 At present, the dearth of studies examining how vocalizations are represented in the auditory cortex of behaving marmosets is a significant bottleneck in determining whether A1 is capable of more stimulus selectivity and dimensionality than was previously considered.
Recent studies with functional magnetic resonance imaging (fMRI) in marmosets have complemented neurophysiological experiments to investigate how multiple regions of the auditory cortex support vocal signal processing. [87] Rather than finding a similar gradient, Jafari et al. 88 showed "voice patches" in awake marmosets, similar to the human voice patch system, 89 consisting of three distinct voice areas in the temporal cortex. In addition, they reported a wider functional network of areas, with particular sensitivity to voices or vocalizations, distributed across the marmoset anterior temporal lobe, PFC, anterior cingulate cortex (ACC), and subcortical structures. This network included rostral areas of the ACC, which had not been reported in macaque studies and is generally not included in the human vocal processing network. 891][92] Given the integral social salience of both voices and faces to all primates, these parallel specializations of marmoset cortical pathways for processing both of these salient social signals provide a compelling framework for the future study of multimodal social communication in the primate temporal cortex.

Frontal cortex
The auditory cortex sends projections to a variety of brain regions, including the frontal cortex and other temporal regions. 93 Although it is clear that auditory regions already encode higher-level acoustic representations, such as species-specific vocalizations, the upstream regions are thought to further process incoming stimuli, for example, determining the identity or location of a caller, and integrating various modalities of information.
Several regions in the frontal cortex share reciprocal connections with the auditory cortex, most notably the dlPFC through the postero-dorsal stream and vlPFC through the antero-ventral stream. 64 It is, therefore, no surprise that the frontal cortex exhibits responses to vocalizations in all primate species that have been tested. 37,94,95 Neurons in the vlPFC of head-fixed macaques, for example, respond to different types of vocalizations, with most cells responding to a subset of possible call types and only a few responding to a single or all calls. 95 Although less is known about the representation of vocalizations in the dlPFC, studies using natural sounds do show that this area responds to acoustic stimuli and specifically encodes the spatial location from which they have been emitted. 96,97 It is, therefore, likely that this area would respond to vocalizations, but the expected degree of vocalization selectivity is not clear. 100] In marmosets, several studies have shown that frontal cortex neurons are responsive to vocalizations, though in contrast to macaque experiments, these studies have predominantly been performed in freely moving animals. A cFos study showed significantly increased expression in the ventrolateral area of the PFC during perception of vocalizations compared to production or antiphonal conditions as well as compared to cFos expression in the dorsal premotor cortex during vocal production. 101,102 Further studies showed stronger evidence of frontal activity in response to vocalizations, both in individual cells and across populations, with vocalization-responsive neurons found in multiple areas of the prefrontal and premotor cortices.
37,103 Importantly, neural activity was highly dependent on context: individual PFC neurons and populations responded differently depending on whether the vocalization was heard in a passive-listening context, freely moving and/or in the context of a natural conversation, suggesting that vocalization representations in this substrate are highly sensitive to the behavioral context in which they are heard. 37,103 Overall, these studies point toward a representation of conspecific vocalizations across frontal regions, but more detailed work in particular subregions is still needed. In addition, studies on more complex representations of vocal stimuli, such as the identity or location of the speaker, are critical to determining the complementary roles of the ventral and dorsal pathways for vocal signal processing in the marmoset frontal cortex.

Subcortical substrates
In addition to the frontal cortex, the ACx also projects to various subcortical structures, suggesting these areas may also be involved in the processing of vocalizations. Indeed, Jafari et al. 88 report the activation of several subcortical areas in response to hearing vocalizations and suggest these are part of a larger vocal processing network. Other subcortical areas have been implicated in the integration of auditory information across modalities, such as the hippocampus integrating individual identity from faces and voices 104 and the amygdala extracting affective information from the auditory input. 105,106 However, only a limited number of studies have specifically targeted subcortical substrates in the context of vocalizations in marmosets. Although responses to vocalizations can thus be found in these areas, crucial experiments in the broader context of natural communication behaviors are needed, as we will discuss in more detail.

MECHANISMS FOR VOCAL PRODUCTION IN MARMOSETS
Vocal communication systems are not only optimized for vocal signal processing and perception, but also for the production and control of the vocalizations themselves. [109][110][111] Humans exhibit perhaps the greatest degree of flexibility and control over vocal emissions and rapidly change speech production in real time in response to auditory feedback. 112,113 Determining the evolutionary origins of this degree of vocal plasticity in humans, however, has been challenging because of the seemingly dramatic evolutionary gap between humans and NHPs. 114,115 Historically, vocal production in monkeys has been assumed to be largely innate and stereotyped, 116 a view largely based on studies in squirrel monkeys, which show very little vocal learning and exert little control over the timing and structure of their vocalizations. 117,1180][121][122][123][124] In particular, marmosets show plasticity and flexibility to control when, 125 where, 126 and what to vocalize. 127 This suggests that the purported gap between humans and NHPs in capacities for vocal control may not be as dramatic as previously thought, and it opens the door for neurobiological investigations of this system in marmosets to serve as a model for at least some aspects of the parallel system in humans.
Vocal learning and plasticity in marmosets are evident in several naturally occurring social contexts. This is especially true during ontogenetic vocal development, which shows parallels with humans.
Marmosets begin producing baby cries, which gradually develop into "babbling" and finally fully mature calls. 128,1290][131][132] When subadults are separated from their parents, they will continue to produce baby-like cries and babbling vocal behavior into adulthood, which is suppressed in normally raised monkeys. 132 Even after the initial infant stage, the call structure of individuals continues to mature, with each individual developing unique call sequences. 128 Not just the content of the calls changes over ontogeny; young marmosets also learn turn-taking during conversations with their parents. 130,131 Interestingly, vocal plasticity does not appear to be limited to the infant and juvenile periods of development as marmoset vocal behaviors remain highly plastic into adulthood. 36,128,133 Adult marmosets, for example, change specific features of the calls depending on the social context, 14,35,38,[133][134][135][136][137][138][139] such as taking on a new "dialect" of a social group after translocation. 139 The prevalence of vocal plasticity in marmosets is notable as it contrasts with other primates, 140,141 such as squirrel monkeys 142 and macaques, 143,144 which do not show the same effects of ontogeny.
This suggests a suite of potentially complementary mechanisms which may underlie vocal control in this species at different stages of life and make the marmoset a compelling candidate to study aspects of speech development in humans. 14524]146 Phee calls consist of a variable number of pulses, most often two, and thus can be used to elucidate whether marmosets produce a motor plan for the full multipulsed call prior to the vocal onset or in concurrence with the generation of each pulse. Miller et al. 122 found that acoustic features early in the call were highly correlated with the call structure: both how many phee pulses the marmoset made and the spectral features of the call. Because the features of the first pulse were predictive of the number of pulses a call would consist of, they suggested that a single motor plan for the whole call is created before or at the call onset. However, later studies using acoustic perturbation at various times after phee call onset have shown that marmosets are able to change the call length 123 and the number of pulses 147 after initiation of a call. This suggests each call consists of a number of smaller motor units, rather than a single motor plan, which they are able to flexibly control throughout the call and not just at the onset.
Zhao et al. 124 further leveraged this acoustic-perturbation paradigm to show that rapid changes to the spectra of phee calls occurred during perturbation, such as shifts in the fundamental frequency and longer call phrases, while other experiments show that marmosets shift the fundamental frequency of their phee calls in the presence of noise that interferes with the particular frequency. 36 Similarly, Eliades and Tsunada 146 used a paradigm where marmosets heard a pitch-shifted version of their own call during vocalizations.
They found that the marmosets would change the pitch of their own vocalizations, suggesting that marmosets could sense the error between their motor goal and the received sensory input and were able to rapidly adjust their calls. Pairs of marmosets also changed the relative timing of their vocalizations to enable vocal exchanges when tested with temporally periodic and aperiodic white noise broadcasts. 148 Together, these studies show that marmosets are able to rapidly adjust their calls in response to sensory feedback and environmental noise, suggesting they possess a high level of vocal control throughout adulthood.

BRAIN NETWORKS UNDERLYING VOCAL PRODUCTION
The neural basis of primate vocal production has been hotly debated for many years. One recent model 114 proposes a division into two separate networks (Figure 3), distinguishing between the primary vocal motor network (PVMN), which exists across species, 149 and the volitional articulatory motor network (VAMN). The hypothesis suggests that the VAMN only exists in primate species. 114 Another model 150 described a four-level system, where the PVMN is further split into a central pattern generator (CPG) and a slower innate driving network, while the laryngeal structure forms the fourth facet. 150 describe a top layer of the hierarchy that is shared between species that engage in coordinated vocal exchanges, including monkeys, but also meerkats. Additionally, Zhang and Ghazanfar 150 do not assign specific brain regions to these layers of the hierarchy, but rather a functional system that would roughly map onto Hage and Nieder's brain regions. Despite these differences, both agree that the innate production of vocalizations relies on a CPG in the brainstem, specifically in the reticular formation, 151 which is highly conserved across vertebrate species. 152,153 The CPG connects directly to phonatory motoneurons and receives input from the periaqueductal gray (PAG), which is located in the midbrain. 1546][157] The PAG in turn receives input from a wide variety of areas, including the amygdala and ACC. 114,150,158 The role of these connections is thought to be the stimulation of PAG to elicit calls in the appropriate context, such as a certain social or emotional context, [159][160][161] though the details on how various inputs control PAG and vocalizations in general are still debated.
In addition to this conserved mammalian vocal system, NHPs are thought to possess an additional network of structures, mainly located in the frontal cortex, that control more complex vocal sequences similar to speech in humans. 114,150 This network includes areas of the premotor and motor cortices as well as the inferior frontal gyrus. Central to this network is Broca's area, which plays a large role in the production of speech in humans, 162,163 and the NHP homologs in the vlPFC. 164,165 Both Hage and Nieder 114 and Zhang and Ghazanfar 150 suggest these areas connect to the ACC and control vocalizations through the PAG. Although general connectivity studies seem to support such a network, 166 several areas in the PFC also connect directly to the PAG or reticular formation, 167,168 and as such might be able to control vocalizations directly. In addition, connections between the ACC and vlPFC are reciprocal, 169 suggesting that information flows in both directions. Lastly, neither of these frameworks accounts for the apparent differences in vocal control between marmosets and other primates. Although other primates, such as macaques, are able to exert some level of control over their vocalizations, 170,171 marmosets are able to flexibly suppress vocalizations 123 and change a vocalization's frequency 36 and length. 147 These vocalizations involve fine motor skills that have so far not been shown in other NHPs despite the similarity in neural circuits of vocal control.
The PVMN, 114 or CPG and drive levels 150 (solid arrows), involve connections from the ACC to PAG to RF, which projects to the motoneurons. The VAMN, 114 or environment level 150 (dotted arrows), shows the connection from area 45 (Broca's area) and the vlPFC on the lateral surface to the ACC on the medial surface. The connectivity described by Cerkevich et al. 168 (dashed arrows) involves direct connections from M1, SMA, and the ventral premotor cortex to the RF, with the two premotor connections being stronger (represented by thicker lines). Abbreviations: ACC, anterior cingulate cortex; CPG, central pattern generator; M1, primary motor cortex; PAG, periaqueductal gray; PVMN, primary vocal motor network; RF, reticular formation; SMA, supplementary motor area; VAMN, volitional articulatory motor network; vlPFC, ventrolateral prefrontal cortex.
A recent study sought to address this apparent disparity in vocal control between primates by performing a comparative study of the vocal motor pathways in marmosets and macaques. 168 Cerkevich et al. 168 noted that enhanced vocal control in marmosets is unlikely to be attributed to laryngeal biomechanics and other peripheral mechanisms alone, 115 and thus they hypothesized that the brain areas driving the laryngeal muscles play a more significant role in this process. To test this conjecture, they used retrograde viral tracing from the cricothyroid muscle, the laryngeal muscle that is most specifically related to vocal motor control, to identify the cortical areas involved in vocal motor production in both macaques and marmosets.
Analyses revealed that the neurons in both premotor and primary motor areas formed disynaptic connections, via the reticular formation and nucleus ambiguus, to motoneurons. These connections are unlike those in humans, which have direct corticomotoneuronal connections. In marmosets, compared to macaques, a much larger number of these disynaptic output neurons was located in the supplementary motor area and ventral premotor cortex than in the primary motor cortex (Figure 3), suggesting that the increased vocal plasticity and control in marmosets relative to macaques is not likely due to alterations in primary motor cortex, but to expansions in the disynaptic connections to laryngeal motor neurons in these two pre-

Frontal cortex
Several studies in marmosets have explored the possibility of a role for the frontal cortex in the production of vocalizations. Early studies using immediate early gene (IEG) expression as a proxy for neural activity showed seemingly contradictory results regarding the effect of producing vocalizations on IEG in the frontal cortex. 102,173 Where one found increased expression in the vlPFC in animals that produced calls, 173 another study found only a significant increase in the premotor cortex. 102 However, the task was slightly different between the two studies. Simões et al. 173 compared a condition where the animals listened and responded to conspecific vocalizations (antiphonal conversation) to a condition where the animals only listened but did not produce any calls themselves. Miller et al. 102 included these same two conditions, in addition to a condition where an animal did not listen to any calls, but vocalized spontaneously. When the marmosets produced calls, regardless of whether they also listened to calls, the premotor cortex was activated. However, when the marmosets also listened to calls, which was the case in the study by Simões et al., the vlPFC was activated as well.
Recent neurophysiology experiments in freely moving marmosets examined the responses of single neurons in the frontal cortex during vocal production. 101,125 Overall, these studies report evidence that neurons exhibited robust vocal-motor-related activity during call production in both dorsal and ventral premotor regions, as well as from a subset of neurons in areas 45 and 8av in the vlPFC. The authors distinguished between an antiphonal and spontaneous state during vocal production, but found no difference between these two states. 101 Importantly, Roy et al. 125 confirmed these neurophysiology results and controlled for orofacial movement, indicating that the premotor activity is specifically related to the production of vocalizations. While each of these studies tested only a single call type, phee calls, a subsequent experiment recorded neural activity in the premotor cortex, while marmosets produced a broader corpus of vocalizations. 174 The majority of data on the neural basis of vocal production in marmosets are in the premotor cortex, despite convincing evidence from macaques showing the involvement of other regions, in particular the vlPFC and ACC. 161,165 The vocal production networks described earlier assign the ACC the role of relay center, taking inputs from the vlPFC and premotor areas and sending signals to the PAG. This is based on results showing cue-related activity in the vlPFC and ramping activity in the ACC prior to a cued call but not for spontaneously produced calls. 161 By contrast, neurons in the marmoset vlPFC and premotor cortex exhibited robust vocal-motor changes in activity for calls produced spontaneously and in conversations.
101 However, each of these results is based on a limited number of neurons and animals; thus, further evidence is needed to determine the role of the vlPFC and ACC in the production of vocalizations. Given the pattern of activity in the premotor cortex during vocal production, and the robust connectivity patterns, future neurophysiology experiments are critically needed in these other substrates to understand the neural basis of vocal production more fully in primates.

Auditory cortex
6][177] Studies of auditory cortical processing in species such as rodents and songbirds have found that the ACx exhibits widespread inhibition of neural activity corresponding to background noise and self-generated locomotor activity. 62,178,179 Indeed, Eliades and Wang 177 found that approximately 75% of recorded A1 neurons were significantly inhibited during the production of a marmoset's own vocalizations in a frequency-dependent manner.

VOCAL COMMUNICATION IN THE NATURAL WORLD
Our understanding of the neural mechanisms underlying vocal communication in primates is largely based on studies in which vocal perception and production are studied independently. In fact, much of the literature on the neural representations of vocalizations has used traditional head-restrained monkey paradigms and presented isolated exemplars of vocalizations that are largely divorced from their natural contexts. However, in natural contexts, communication is an inherently interactive social behavior involving the active exchange of signals between individuals. In this sense, vocal perception and production are not separable, but reflect complementary processes that seamlessly integrate as the foundational architecture of the communication system. Evidence suggests that facets of natural communication cannot be captured using more reductionist paradigms and the consideration of each component of the system in isolation. 103,180 As such, it is crucial that future work takes a more holistic approach to the neurobiological study of communication and examines its processes in naturalistic contexts comprising the very challenges the system evolved to overcome. 181 Marmosets are highly voluble and amenable to freely moving, naturalistic paradigms and neural recording systems, which makes the species particularly well suited to elucidating the neural basis of natural communication. Indeed, as highlighted throughout this manuscript, work to this end is accelerating, further illustrating the significance of this model organism in the next chapter of the field. Here, we highlight some of the recent technological advances that allow the monitoring of vocal communication in the natural world and briefly discuss two understudied lines of inquiry that are becoming increasingly feasible because of current methodological innovations.

Technical advances
In recent studies, marmosets were trained to wear small microphones, which were used to record their calls within the colony. 26,38 Likewise, several methods of automatically detecting calls from these recordings have been developed, 26,38,[182][183][184] which allow for rapid analysis of vast amounts of data. In addition to detecting calls, deep convolutional networks can also be used to identify which individual animal made a certain call. 26 Lastly, marmosets can be filmed and tracked using automated pose estimation techniques [185][186][187] to determine the behaviors that covary with call emissions. These technological developments make it possible to quantify the detailed nuances of natural marmoset behaviors in increasingly complex natural contexts.
Despite considerable challenges, performing neuronal recordings in natural conditions is becoming increasingly feasible. Many systems for neuronal recording, such as two-photon imaging 188 and various methods of neurophysiology, 103,189 that traditionally required head-fixed animals are being adapted for freely moving paradigms. Tethered miniaturized microscopes, for example, can be used to record calcium transients during natural behavior, 190 while wireless neural recording technologies can be used to record spiking activity with silicon probes and other electrodes. 191,192 These methods have significantly broadened the range of behaviors whose underlying neurobiology can be studied in naturalistic contexts, 193,194 but their use in marmosets comes with important considerations. Because of the marmoset's relatively small body size, wireless equipment must be lightweight so as not to impede the animals' movements, 193 while at the same time, it must be designed to withstand the rigors of a 3D arboreal lifestyle. Recent studies using wireless recordings in marmosets have shown promising results, 193,195 recording from up to 96 channels simultaneously while the marmoset is freely moving in the home cage. Employing such wireless recording technologies and combining them with wearable microphones and behavior tracking opens the door to a range of new questions related to marmoset vocal communication that could not be addressed even just a few years ago.

Cocktail party problem
Noise is ever present in natural environments. Given the significance of conveying accurate information for natural communication, it is not surprising that the auditory system evolved mechanisms to overcome these challenges. In fact, animals are remarkably adept at recognizing conspecific vocalizations and extracting meaningful information from these signals in the presence of competing background sounds, including other vocalizations, [196][197][198][199] classically illustrated by the cocktail party problem (CPP) and auditory scene analysis (ASA). 1][202][203][204][205] For example, in macaques, the auditory cortex shows increased separation for tones that are mistuned, which would allow for these mistuned tones to be more easily separated from the background. 204 In addition, the activation in macaque auditory neurons follows human behavioral results, showing increased separation of tones that were out of phase. 202[208] While studies in NHPs and humans employ differing behavioral paradigms and stimuli, with vocalizations and speech notably absent as stimuli in primate studies, findings in both areas support the idea that primary auditory regions respond to both the competing sounds 87 and to the attended-to target sound, 79 while downstream regions in the parabelt and rostral STG more selectively encode only the target, be it an artificial tone or a vocalization. 62 However, how the auditory system represents the attended-to vocalization, while effectively ignoring the other sounds, remains poorly understood. To the best of our knowledge, the field lacks any study that has recorded neural activity in the ACx, or any other area of the brain, while monkeys listened to multiple co-occurring vocalizations. Given the converging lines of evidence pointing to the role of both the ACx and its various downstream targets in stream segregation and vocalization processing, it is crucial that these experiments be conducted. Partially in service of this goal, recent behavioral studies by Jovanovic and Miller

Multimodality
1][212] Indeed, both macaques 213,214 and marmosets 30,215 exhibit a range of expressions, and marmosets have been shown to vary their expression depending on the affective value of a stimulus. 30 In addition, macaques are sensitive to the congruence of faces articulating vocalizations (i.e., the McGurk effect), 210 strongly suggesting these primates integrate audio-visual information.
7][218][219][220] Primates match the identity of an individual across their face and voice, 104,221 though data on how these social signals are integrated for social recognition at the neural level are only beginning to emerge. 222 A recent study in marmosets suggests that the hippocampus may be integral to the cross-modal integration of faces and voices for cohesive representations of individual identity. 104 Tyree et al. 104 showed that, like humans, 223 individual neurons in the marmoset hippocampus were highly selective to specific individuals when seeing their faces or hearing their voices.
In contrast to human experiments, however, a parallel population was discovered that represented the cross-modal identity of multiple individuals within individual neurons.Analyses of the population activity revealed not only that the cross-modal identity of all animals tested was encoded, but that other social categories (i.e., family vs. nonfamily) were also represented in the same population.This suggests the hippocampus may be involved in encoding critical information about social context (e.g., partner identity and social relationship) during communicative interactions.In addition to the hippocampus, the frontal cortex, 224 amygdala, 225 other areas of the temporal lobe, 226 and auditory cortex 227 all show multimodal responses to auditory and visual stimuli, suggesting that these representations recruit a broad network of substrates.Especially in marmosets, which use eye contact and facial expressions during their communication, we would expect to see an interplay of visual and auditory signals to drive communication behaviors. 30,212This multimodal aspect of communication has so far received little attention, with studies mainly focusing on phee calls, which are made in visual isolation.The recent interest in other types of calls, such as trills, 38 which are made while in visual contact with a partner, will enable future cross-modal experiments exploring how social signals across modalities are integrated and functionally complement each other during communication in natural contexts.

CONCLUSION
In this review, we discussed the current state of research on the neural basis of vocal communication in marmosets and complemented this summary with suggestions for both exciting new avenues and topics with important remaining questions that need more work. A recurring theme throughout has been an emphasis on vocal communication in natural behavioral contexts, as it is in these settings that the neural processes that evolution selected to overcome the idiosyncratic challenges of communication will be most apparent. 181 Answers to these challenges are becoming increasingly attainable with the recent developments in gene editing, allowing for genetically expressed calcium indicators, 6 and high-density silicon probe recordings 192,193 to record large populations of neurons from the brain. While these cutting-edge neural recording approaches are critical, elucidating the neural basis of natural behavior hinges on techniques being developed for use in freely moving experiments [192][193][194] combined with novel behavioral paradigms, 209,228 which exploit the wide range of natural behaviors in marmosets. 37,101,194,229 This is particularly important as various behaviors can cause widespread activation across the brain and strongly predict activity across various regions, 230,231 suggesting that vocal communication should be studied in concert with other natural behaviors. After all, communication is a social behavior, not merely the process of hearing and producing signals. The continuing popularity of marmosets as a model animal across neuroscience disciplines 1,17 illustrates their many advantages to explore facets of the natural brain and behaviors, not only because of their practical benefits for gene editing and freely moving neural paradigms but also as a result of their dynamic social, cognitive, and communicative behaviors that are only beginning to be explored. The marmoset repertoire can serve as a powerful engine of discovery in the coming years to elucidate the complex nuances of the neural mechanisms underlying primate social brain functions.
which is the most widely used NHP model animal, other species will also be highlighted when data are available. First, we will briefly review behavioral studies of marmoset vocal communication that are related to vocal perception and production. Next, we will discuss how various brain regions mediate vocal communication, describing separately literature on vocal signal processing and vocal-motor production. The final section focuses on communication in natural contexts and highlights areas of research needed to better understand the neural basis of communication more holistically.
Specifically, this model posits that the various layers contribute to vocal production at different timescales, thus addressing the coordination of vocal production across single syllables, words, or conversations. Although both hypotheses agree about the general structure and hierarchical control involved in vocal production, Hage and Nieder 114 argue that the VAMN is a primate-specific network, whereas Zhang and Ghazanfar 150

FIGURE 3 Cortical networks for vocal production. A graphical representation of the three vocal production networks discussed in this review.
motor areas. Given these neuroanatomical results, future research into the contribution of the output neurons in these areas to flexible vocal production might give an insight into the neurobiology underlying complex vocal skills. Considered together with the results discussed above and the species' amenability to naturalistic, freely moving neurophysiology paradigms, these findings highlight that marmosets are a uniquely powerful primate model of vocal production and control. Data in support of each of the aforementioned models are somewhat mixed, at least in part due to gaps in our knowledge about vocal production, control, and learning discussed above, as well as the neural mechanisms that underlie each of these facets of the vocal-motor system in primates. As with the research on the neural basis of vocal signal processing discussed above, studies on vocal production have largely focused on individual brain regions that are thought to be involved. Here, we focus on the role of forebrain regions for primate vocal production (for a review on mid- and hindbrain regions, see Jürgens). 172 Results indicated that while overall neurons exhibited vocal-motor-related changes in activity just prior to or during call production, different subpopulations of premotor neurons are active depending on the call type produced. Their results show that the marmoset premotor cortex and vlPFC are involved in the spontaneous/volitional production of vocalizations during communication, though more detailed experiments testing their respective contributions to the control and production of calls are needed.
Eliades and Tsunada 146 used an online vocal pitch-shifting paradigm to further study the role of the ACx in the control of marmoset vocal behavior. They found that the population of A1 neurons, which would ordinarily be suppressed by one's own call, exhibited a significant reduction in suppression upon the manipulation of the vocal pitch away from the fundamental frequency of the initial vocalization. They also demonstrated a causal role for the ACx by directly stimulating this same population of neurons, which resulted in correlated shifts in both vocal pitch and firing rate of A1 neurons. These findings provide critical evidence that the auditory cortex is integral to vocal communication, not only for the encoding of conspecific vocalizations but for accurate and dynamic vocal control as well.
As outlined above, most of what is known about natural marmoset vocal communication has come from studies of the species' phee call during antiphonal call interactions. While clearly powerful, this type of long-distance dyadic vocal behavior reflects only a subset of the species' communicative repertoire. The study of dynamic, multicaller environments that approximate more natural social landscapes, such as the home colony, comes with technical challenges. The vocalizations of the experimental monkey, as well as the other monkeys in the colony, need to be recorded and isolated. Animals may move around and produce calls intended for different animals in various social contexts that change fluidly, making it difficult to determine where and under what context a call is made. Despite these challenges, such experiments are becoming increasingly feasible.
209 engaged marmoset subjects in an interactive, species-specific, simulated "cocktail party" scenario in order to better understand marmosets' behavioral strategies for successfully navigating this core challenge. Coupled with wireless electrophysiology systems, such naturalistic behavioral paradigms offer powerful opportunities to investigate neural mechanisms that underlie the CPP and ASA in the primate brain that have been difficult to study with more conventional approaches.