Common principles in the lateralization of auditory cortex structure and function for vocal communication in primates and rodents

This review summarizes recent findings on the lateralization of communicative sound processing in the auditory cortex (AC) of humans, non‐human primates and rodents. Functional imaging in humans has demonstrated a left hemispheric preference for some acoustic features of speech, but it is unclear to what degree this is caused by bottom‐up acoustic feature selectivity or top‐down modulation from language areas. Although non‐human primates show a less pronounced functional lateralization in AC, the properties of AC fields and behavioural asymmetries are qualitatively similar. Rodent studies demonstrate microstructural circuits that might underlie bottom‐up acoustic feature selectivity in both hemispheres. Functionally, the left AC in the mouse appears to be specifically tuned to communication calls, whereas the right AC may have a more ‘generalist’ role. Rodents also show anatomical AC lateralization, such as differences in size and connectivity. Several of these functional and anatomical characteristics are also lateralized in human AC. Thus, complex vocal communication processing shares common features among rodents and primates. We argue that a synthesis of results from humans, non‐human primates and rodents is necessary to identify the neural circuitry of vocal communication processing. However, data from different species and methods are often difficult to compare. Recent advances may enable better integration of methods across species. Efforts to standardize data formats and analysis tools would benefit comparative research and enable synergies between psychological and biological research in the area of vocal communication processing.


| INTRODUCTION
Lateralization is a fundamental organizing principle in mammalian, non-mammalian and even some invertebrate species (Halpern et al., 2005). It allows corresponding brain areas in the left and right hemisphere to specialize in different aspects of a brain process, presumably saving computational resources. This hemispheric specialization also minimizes information transfer across hemispheres, saving processing time. In some cases, a lateralized brain can also perform two independent tasks in parallel, such as finding food and watching out for predators (Rogers et al., 2004; Vallortigara & Rogers, 2005). In mammals, the neocortex shows functional left-right asymmetries in the visual (Chaurasia & Mathur, 1976; Manns & Ströckens, 2014), motor (Knecht et al., 2000; Sun & Walsh, 2006; Uomini, 2009; Volkmann et al., 1998), prefrontal (Sandrini et al., 2008; Van Horn et al., 1996) and auditory cortex (AC).
The details of the developmental formation of functional lateralization in mammals are not fully understood. Even in the zebrafish, a relatively simple model organism, several genetic, environmental and epigenetic factors contribute to lateral asymmetries in neural development. Zebrafish show a genetically determined projection asymmetry in their midbrain, which is likely a basis for certain social and aggressive behaviours (Aizawa et al., 2005; Chou et al., 2016; Duboué et al., 2017; Facchin et al., 2015; Güntürkün et al., 2020; Güntürkün & Ocklenburg, 2017). Similarly, in mammals an initial genetically predetermined 'tipping of the scales' might propagate through differential use during development. This would explain the strong dependence of lateralization on individual experience (Brown et al., 2007; Hauser & Andersson, 1994; Nava et al., 2013; Skeide & Friederici, 2016; Špajdel et al., 2007). Even handedness, a classical example of lateralization, shows only weak heritability, and the early life factors contributing to hand preference each have only minimal predictive value (de Kovel et al., 2019), underscoring the complexity of how lateralization forms in the human brain.
A crucial ability distinguishing humans from other mammals is language. The comprehension of spoken language requires precise analysis of acoustically complex stimuli, as well as lexico-semantic and grammatical knowledge. Although language has classically been regarded as an exclusively left-hemispheric process, recent research clearly shows bilateral activity. Lateralization in the AC is special, as it is far more stimulus dependent than in other sensory cortical areas. Even in primary areas, response characteristics of left and right AC can vary considerably. How these dynamic responses are generated is largely unknown. This research spans several disciplines, from psychological and linguistic studies in humans to single-neuron recordings in animal models. In this review, we give an interdisciplinary overview of auditory lateralization in humans, non-human primates and rodents.

| HUMAN LANGUAGE PROCESSING IN AC IS LATERALIZED DUE TO BOTTOM-UP AND TOP-DOWN MECHANISMS
Communication sounds have a complex harmonic structure and vary rapidly over time. The auditory sensory system has evolved mechanisms to extract sound features relevant for communication, such as pitch (Bendor & Wang, 2006), amplitude modulations (Brugge et al., 2009) and voice onset time (Steinschneider et al., 1999). In humans, these mechanisms show specific tuning to the acoustic features of speech (Holdgraf et al., 2016; Moerel et al., 2012) and actively suppress background noise (Khalighinejad et al., 2019). Classically, human language processing is described as largely left-lateralized. This view originated from Broca's (1861) and Wernicke's (1874) studies and was expanded upon until the end of the 20th century (Bethmann et al., 2007; Knecht et al., 2000; Liberman & Mattingly, 1989; Markus & Boland, 1992). Zatorre et al. (2002) and others (Cha et al., 2016; Schonwiesner et al., 2005) postulated that left-lateralized speech processing arises from slight differences in feature selectivity between left and right AC, with the right AC having higher spectral resolution and the left AC having higher temporal resolution. The work by Poeppel and colleagues (Hickok & Poeppel, 2007; Poeppel, 2003) provided evidence that this difference in resolution might be due to different time windows of integration. Left and right AC appeared to integrate incoming information for 20-50 and 200-300 ms, respectively, making the left AC more responsive to phonemes and short syllables, and the right AC more responsive to slower features of speech, such as intonation and prosody. This hypothesis has received significant empirical support (Hurschler et al., 2015; Kennedy-Higgins et al., 2020; Liem et al., 2014; Obleser et al., 2008; Overath et al., 2015; Rimol et al., 2005). However, there is also evidence against it.
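The resolution trade-off behind this hypothesis can be illustrated with a simple signal-processing sketch (an analogy only, not a model of cortical processing): in a short-time Fourier transform, a short analysis window resolves fast temporal changes but yields coarse frequency bins, while a long window does the opposite. The window lengths below are chosen to echo the roughly 25 ms and 250 ms integration windows discussed above.

```python
import numpy as np

def stft_mag(signal, fs, win_ms, hop_ms):
    """Magnitude STFT with a Hann window; win_ms sets the resolution trade-off."""
    win = int(fs * win_ms / 1000)
    hop = int(fs * hop_ms / 1000)
    w = np.hanning(win)
    frames = [signal[i:i + win] * w
              for i in range(0, len(signal) - win, hop)]
    return np.abs(np.fft.rfft(frames, axis=1))

fs = 16000
t = np.arange(0, 0.5, 1 / fs)
x = np.sin(2 * np.pi * 1000 * t)  # synthetic 1 kHz test tone

# 'Left-AC-like' short window: fine in time, coarse in frequency
short = stft_mag(x, fs, win_ms=25, hop_ms=10)
# 'Right-AC-like' long window: coarse in time, fine in frequency
long_ = stft_mag(x, fs, win_ms=250, hop_ms=10)

# Frequency bin width (Hz) is inversely proportional to window length
df_short = fs / int(fs * 0.025)
df_long = fs / int(fs * 0.250)
print(df_short, df_long)  # → 40.0 4.0
```

A 25 ms window thus yields 40 Hz bins (enough to track phoneme-rate changes, too coarse for fine pitch), whereas a 250 ms window yields 4 Hz bins at a tenfold cost in temporal precision.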
Responses in elderly subjects do not fit the expected lateralization, as they do not shift to either hemisphere upon suprasegmental manipulation of speech (Bellis et al., 2000; Keller et al., 2019). Some authors also demonstrated a stronger left-lateralized representation of intelligible speech than predicted by the hemispheric difference in integration time (Evans et al., 2014). Furthermore, responses to the same sounds can be lateralized differently according to context, for example, depending on syntactic relevance of intonation (van der Burght et al., 2019), individual learning experience (Plante et al., 2015), added background noise for consonant-vowel combinations (Todd et al., 2011), or depending on individual focus on speech information or speaker identity (Kriegstein & Giraud, 2004). These different views can be unified by combining the notion of (bottom-up) specialization for spectrotemporal features of speech in the left and right AC with (top-down) signalling from non-auditory regions (Albouy et al., 2020; Assaneo et al., 2019; Giroud et al., 2020; Zatorre & Gandour, 2008). There are also conceptual discussions of whether the acoustic description of speech as a temporally complex stimulus sufficiently captures its lexical and grammatical complexities (Peelle, 2012) and whether the large language processing network beyond AC can be ignored (reviews of language processing beyond AC: Fedorenko, 2014; Friederici & Gierhan, 2013; Rimmele et al., 2018). It is clear that there is top-down influence on the lateralization of human language via a dorsal and a ventral stream (Friederici & Gierhan, 2013), although neural correlates are only defined on the network level (Elmer et al., 2017; Skeide & Friederici, 2016) and information about specific neural circuits within auditory or higher level areas in this network is lacking. Currently, there is no accepted model of how top-down and bottom-up information converges in the AC.
The exact role of neural plasticity, learning, neurodegeneration and context-dependent modulation of responses in shaping lateralized processing in human AC is still largely undefined (Friederici & Gierhan, 2013;Kreitewolf et al., 2014;Liem et al., 2014;Price, 2010).

| NEW INVASIVE METHODS IN ANIMAL MODELS ALLOW UNPRECEDENTED ASSESSMENT OF AC COMMUNICATION CIRCUITRY
Despite noticeable hemispheric differences in gross anatomy of the temporal lobe (Boni et al., 2007; Moerel et al., 2014; Penhune et al., 1996; Rademacher et al., 2001), functional differences in the human AC are surprisingly small and hard to measure. Given the small difference in activation between the left and right AC when processing speech-like stimuli (Overath et al., 2015; Silbert et al., 2014; Upadhyay et al., 2008), and the limitations of non-invasive neuroimaging, further progress requires very intricate experimental designs and analyses (Bradshaw et al., 2017). Researchers have thus called for more anatomical and functional studies with higher spatial and temporal resolution (Albouy et al., 2020; Friederici & Gierhan, 2013; Hickok & Poeppel, 2007; Price, 2010; Zatorre et al., 2002). Animal studies can reach several orders of magnitude higher resolution (e.g., 64-μm isotropic marmoset diffusion-weighted magnetic resonance imaging [dMRI]: Liu et al., 2020) and can achieve cellular resolution with microscopic methods (e.g., marmoset dataset of 143 tracer injections: Majka et al., 2020; mouse brain 10-μm reference atlas: Wang et al., 2020; tracer study of gerbil primary sensory fields: Henschke et al., 2018). In addition, modern wet lab techniques and accompanying computational analysis techniques add options for manipulation and interrogation of neural circuits that have no equivalent in human studies. These include clearing coupled with light sheet microscopy (whole-mouse clearing: Cai et al., 2019; simple and rapid mouse organ clearing: Renier et al., 2014; whole human brain clearing: Zhao et al., 2020; review on clearing techniques: Ueda et al., 2020), optogenetics (Joshi et al., 2020), two-photon microscopy (Benninger & Piston, 2013), multimodal image registration (Goubran et al., 2019) and deep learning for cellular analysis (Moen et al., 2019).
Rodent models feature a wide variety of genetic (e.g., optogenetics, fluorescent biomarkers and disease models) and experimental tools (e.g., calcium imaging, in vivo and in vitro electrophysiology, tracer studies, and cell-type-specific labelling). Of non-human primates, macaques (Macaca mulatta) and marmoset monkeys (Callithrix jacchus) have been used most prominently for communication research. Both show highly vocal social behaviour (Agamaite et al., 2015;Cheney & Seyfarth, 2018;Miller et al., 2016;Okano et al., 2016) and may be suitable models for human vocal communication due to potentially conserved or analogous mechanisms in the cortical auditory system.

| Side preferences in vocal behaviours reflect brain lateralization across several species
Non-human primates show some of the behavioural effects of lateralized communication call processing known from humans, such as a right ear advantage for more speech-like stimuli (Shankweiler & Studdert-Kennedy, 1967). In primates, equivalent head turn tests (Hauser et al., 1998; Hauser & Andersson, 1994; Petersen et al., 1978) and lesion studies (Heffner & Heffner, 1984) also indicate that primates favour the right ear (i.e., the left hemisphere) for conspecific vocalizations over heterospecific vocalizations or other calls. However, the inference from ear preference to cortical lateralization is somewhat ambiguous, because the auditory nerve does not exclusively project to the contralateral hemisphere, although the contralateral connections outnumber the ipsilateral connections 5 to 1 (Musiek & Baran, 2020). This also manifests in functional imaging studies, where lateralization is less clear and shows a complex pattern depending on the calls presented, similar to results in humans (Poremba et al., 2004; Taglialatela et al., 2009).
The house mouse (Mus musculus) also shows a right ear advantage for communication calls, as evidenced by pup retrieval studies: Mothers more frequently respond to the calls of their offspring when hearing them with their right ear (Ehret, 1987). These calls also evoke a higher Fos expression in the left AC (Ehret & Geissler, 2006; Levy et al., 2019). Oxytocin-dependent activity in the AC involved in pup retrieval (a frequently studied social communication context in mice) also shows leftward lateralization (Marlin et al., 2015; Mitre et al., 2016; Tasaka et al., 2020).
Lateralization is likely not a recent evolutionary feature. Even songbirds, which have a very different forebrain organization (no neocortex), show a remarkable amount of lateralized brain functions. Their visual system (Güntürkün et al., 2000), motor system (Casey & Martino, 2000; Randler et al., 2011) and communication call processing are lateralized. When conspecific vocalizations are filtered spectrally or temporally, the representation of these calls shifts towards the left or right hemisphere, respectively, in birds (Ruijssevelt et al., 2017) and humans (Albouy et al., 2020). For a review of comparative approaches including birds, see Güntürkün et al. (2020). This analogy supports the notion that lateralization has an evolutionary advantage in some contexts by providing more efficient processing of stimuli (Corballis, 2017; Vallortigara & Rogers, 2005): Duplicated neural structures can have the same input but generate different outputs, especially when working in a predictive manner, thereby increasing the amount of information extracted from a stimulus (Seoane, 2020). In the case of AC, left and right AC have complementary preferences for temporally complex and spectrally complex stimuli (see above). This division of labour between left and right AC is a way of measuring high-resolution spectral and temporal fluctuations. Both require different temporal integration windows (Poeppel, 2003) and are therefore to some degree mutually exclusive ('acoustic uncertainty principle', Zatorre et al., 2002). Another example of a lateralized trait is handedness, or limb preference, which is present in birds (Casey & Martino, 2000; Randler et al., 2011), fish (Bisazza et al., 2001) and even some snakes (coiling direction preference, Roth, 2003), and can be understood as a way to save resources (Güntürkün et al., 2020). There is currently only sparse evidence for a common mammalian ancestor with lateralized conspecific communication processing.
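The 'acoustic uncertainty principle' invoked here corresponds to the Gabor limit from time-frequency analysis: for any signal, the product of its effective (root-mean-square) duration and bandwidth is bounded from below, so temporal and spectral precision cannot both be maximized. As a rough engineering rule, an analysis window of length $T$ yields frequency resolution of about $1/T$:

```latex
\sigma_t \, \sigma_f \;\ge\; \frac{1}{4\pi},
\qquad
\Delta f \approx \frac{1}{T}.
```

Under this reading, a putative left-AC integration window of tens of milliseconds limits spectral resolution to tens of hertz, whereas a right-AC window of a few hundred milliseconds allows spectral resolution an order of magnitude finer at the cost of temporal detail.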
Thus, both limb preference and asymmetric representation of conspecific vocalizations are likely products of parallel evolution in a variety of species, implying an evolutionary advantage in both cases.

| BASIC ACOUSTIC FEATURES AND THEIR CORTICAL PROCESSING IN MAMMALIAN VOCAL COMMUNICATION
Compared with human speech, non-human primate and rodent communication calls lack sophisticated grammatical structure and combinatorial power. Furthermore, the underlying biomechanics in the vocal tract are different: human speech relies heavily on spectral modulation through the upper vocal tract, unlike non-human primate vocalization. Non-human primate and rodent calls are typically composed of short repeating syllable-like units of varying acoustic complexity, from relatively simple up or down sweeps to more complex phonetic units with multiple formants (Figure 1) (rodents: Arriaga et al., 2012; Holy & Guo, 2005; primates: Agamaite et al., 2015; Bezerra & Souto, 2008; Landman et al., 2020; Takahashi et al., 2015; Zhu et al., 2019a, 2019b). Calls can consist of single or multiple concatenated phonetic units of the same or different syllable types. Their structure depends on social contexts, such as intersexual presence and estrous cycle states (Chabout et al., 2015; Cheney & Seyfarth, 2018; Gaub et al., 2016; Hanson & Hurley, 2012). These calls convey information such as gender, talker identity and emotional state (Seyfarth & Cheney, 2003). Rodents use communication calls to infer the affective state of infants (Boulanger-Bertolus et al., 2017), to form long-lasting relationships (Laham et al., 2021), and to negotiate non-aggressive and territorial encounters (Portfors & Perkel, 2014).
Other aspects of vocal communication are also present in primates and rodents: Vocal learning is very prominent in human communication, and some vocal learning traits are detectable in mice (Arriaga & Jarvis, 2013) and marmosets (Takahashi et al., 2017). Vocalizations of non-human primates and rodents are dependent on FOXP2 and other genes known to be involved in speech (mice: Gaub et al., 2016; Shu et al., 2005). These genes are expressed in homologous brain areas in non-human primates and humans (Kato et al., 2014).

| Animal models provide insight into characteristics of AC fields
Non-human primate models for communication are primarily macaques (Macaca mulatta) and marmosets (Callithrix jacchus). The latter recently gained popularity as a model organism for communication (Grimm, 2018; Miller et al., 2016; Okano et al., 2016). Both species feature a human-analogous gross organization of AC (Figure 2). Primate AC is organized in a hierarchy of parallel fields, with primary core fields surrounded by secondary belt and parabelt fields; secondary and tertiary areas integrate the signal over longer time periods and are more sensitive to certain complex features and feature combinations (Cammoun et al., 2015; de la Mothe et al., 2006; Hackett et al., 1998, 2007; Moerel et al., 2014; Morel et al., 1993; Upadhyay et al., 2007). The core consists of two or three tonotopically organized fields (Bendor & Wang, 2008; Besle et al., 2019; Formisano et al., 2003; Rauschecker, 1998; Schönwiesner et al., 2015; Tani et al., 2018). For a comparison of neuronal cell density in primate and mouse AC, see the Supporting Information.
The layout of human core and belt auditory fields and their correspondence to non-human primate fields is debated, with some functional MRI (fMRI) studies claiming that disagreements between species originate from interindividual variability (Baumann et al., 2013; Schönwiesner et al., 2015), while other studies suggest fundamental differences in organization principles (Besle et al., 2019). Despite these differences in tonotopic gradient orientations, basic functional organization (e.g., broader frequency tuning and longer temporal integration in the belt, representation of feature combinations in belt and parabelt) appears similar between non-human primates and humans (Wessinger et al., 2001; Zhu et al., 2019b). There are also structural similarities between monkey and human AC (Fullerton & Pandya, 2007; Smiley et al., 2013; Sweet et al., 2005). In contrast to Besle et al. (2019), Eichert et al. (2020) showed that comparative myelin-based alignment of auditory cortical regions is possible and indicates that differences between primates most likely arose from cortical expansion and formation of new tracts, rather than from fundamentally different principles of cortical organization.
Older studies in non-human primates found a tendency towards stronger responses to vocalizations in the left AC (Hauser & Andersson, 1994; Petersen et al., 1978), but newer work seems to show largely bilateral responses (Petkov et al., 2008; Sadagopan et al., 2015). In addition, there appears to be some variability across species. Chimpanzees show rightward lateralization for at least some communication calls (Taglialatela et al., 2009), and macaques have dedicated voice selective cells and regions primarily in the right anterior temporal lobe ('voice patches', Perrodin et al., 2011; Petkov et al., 2008). Although these higher order regions may be in different locations with respect to lower level AC in different primate species, the same principle of abstraction of incoming stimuli along the auditory processing hierarchy holds in non-human primates (Belin et al., 2018; Romanski & Averbeck, 2009; Sadagopan & Wang, 2009; Toarmino et al., 2017; Zhu et al., 2019b) and humans (Humphries et al., 2014; Latinus et al., 2013; Pernet et al., 2015; Tremblay et al., 2013).
Comparative studies provided the key insight that not only gross structure but also the encoding of vocalizations in the primary AC is very similar in non-human primates and humans. Both show a majority of AC cells coding tonotopically (Steinschneider et al., 2013; Wang, 2007). These tonotopic regions capture information that can be used to predict vowel identity in humans (Fisher et al., 2018; Obleser et al., 2006) and macaques (Fishman et al., 2016). However, a subset of primary AC neurons also codes for communication features, such as pitch (Zhu et al., 2019a, 2019b), frequency sweeps, and amplitude modulations (Sadagopan & Wang, 2009). In marmosets, the anterior temporal pole is the only known region that codes for vocal communication stimuli (Sadagopan et al., 2015). Macaques, however, show a multitude of distributed vocalization-coding regions in the temporal lobe ('voice patches', Petkov et al., 2008), similar in principle to voice-selective areas in humans (Petkov et al., 2009), although in a different location (Belin et al., 2000; Belin & Zatorre, 2003). It is unknown if these voice patches are also present in marmosets. Sadagopan et al. (2015) argued that voice-selective neurons form an anterior-posterior gradient in marmosets and are therefore not detectable as patches with fMRI. However, they acknowledge the possibility that these patches might be present but, due to the smaller brain size, simply too small to detect with fMRI. Whether voice patches in macaques are homologous to the human voice areas in the superior temporal sulcus of both hemispheres is unknown (reviewed in Bodin & Belin, 2020). Non-human primates appear to lack a representation of conspecific vocalizations in the superior temporal sulcus, whereas in humans it is regarded as a central language processing hub (Friederici et al., 2017), indicating it might be part of a human-exclusive language pathway (Joly et al., 2012). The arcuate fasciculus is another part of this pathway (Friederici & Gierhan, 2013), which is present in chimpanzees (Rilling et al., 2008), but only a prototype is detectable in macaques (Balezeau et al., 2020).

FIGURE 2 Schematic view of early auditory fields and neighbouring voice-selective areas in different species. Auditory core (dark grey) and the surrounding belt fields (light grey) show comparable arrangement and tonotopic gradients in these mammals, although the exact homologies have not been demonstrated. (a) Mouse auditory cortex (AC) after Stiebler et al. (1997); (b) marmoset AC after Tani et al. (2018), with voice-selective areas according to Sadagopan et al. (2015); (c) macaque AC after Hackett and colleagues (2001), voice-selective regions after Petkov et al. (2008, 2009); (d) human auditory fields (Glasser et al., 2016), with voice-selective area from Belin et al. (2000). It is not clear if A1 in humans is equivalent to A1 in the other species. It can be cytoarchitectonically subdivided, and cytoarchitectonic studies found similarities to monkeys (Fullerton & Pandya, 2007; Smiley et al., 2013; Sweet et al., 2005). Still, it is unclear if these areas are homologous to non-human primate fields A1, R and RT (Besle et al., 2019; Brewer & Barton, 2016). The specific tonotopic map of human AC and its correspondence to non-human primate AC tonotopy is still in active discussion (Baumann et al., 2013; Besle et al., 2019; Schönwiesner et al., 2015). A1, primary auditory cortex (in humans: auditory field 1); A2, secondary auditory cortex; AAF, anterior auditory field; AL, anterolateral belt; CL, caudolateral belt; CM, caudomedial belt; CPB, caudal parabelt; DP, dorsoposterior field; LB, lateral belt; MB, medial belt; ML, mediolateral belt; PB, parabelt; R, rostral field; RM, rostromedial belt; RPB, rostral parabelt; RT, rostrotemporal field; RTL, rostrotemporal lateral belt; RTM, rostrotemporomedial belt; V, voice-selective areas.
An inherent weakness of non-human primate studies is the typically low number of individuals (frequently only 2 or 3) and the limited genetic control. These issues can be addressed with rodent models. In particular, studying mice allows for large sample sizes of genetically controlled strains. Mice are highly vocal (Chabout et al., 2015; Holy & Guo, 2005) and share many aspects of auditory processing with primates, including lateralization of vocal communication. Rodent core AC shows tonotopy comparable with that of primates, with at least two tonotopic gradients distributed in core AC (Guo et al., 2012; Joachimsthaler et al., 2014; Polley et al., 2007; Romero et al., 2020; Stiebler et al., 1997). The representation of amplitude modulations, an important basic sound feature, is similar in rodent and primate primary AC (Hoglen et al., 2018; Schulze & Langner, 1999). AC function in mice is highly lateralized. Responses to generic sounds show very low correlation between left and right AC (Shimaoka et al., 2019). The left AC shows a preference for communication sounds over the right (Ehret, 1987; Ehret & Geissler, 2006; Geissler et al., 2016; Geissler & Ehret, 2004; Levy et al., 2019; Stiebler et al., 1997). Like in primates, secondary AC in mice shows a significantly increased response latency, altered tuning bandwidth and less clear tonotopic representation than primary AC (Joachimsthaler et al., 2014; Polley et al., 2007; Romero et al., 2020). Some secondary fields also respond more robustly to vocal communication, even when distorted (Carruthers et al., 2015), or show a particular preference for ultrasonic vocalizations (Chong et al., 2020; Tasaka et al., 2020). However, vocalizations do not seem to be represented as patches like in macaque AC (Petkov et al., 2008, 2009), but rather as gradients like in the marmoset. Thus, the mouse is a suitable model for investigating functional microcircuits in AC.

| A POSSIBLE MICROANATOMICAL BASIS OF LATERALIZED PROCESSING
Neuronal responses in rodent primary and secondary AC are highly dynamic, contrary to what classical tonotopic maps might suggest. Single neurons in iso-frequency bands show diverse best frequencies (Issa et al., 2014, left hemisphere of mice; Maor et al., 2016, left hemisphere of mice) and respond very differently to different sounds, indicating that sound representation is far more complex than mere tonotopy (Luczak et al., 2009, left hemisphere of rats). Auditory stimuli are sparsely encoded (<5% of neurons represent a stimulus), and only highly dynamic subsets of neurons are active at any time (Hromádka et al., 2008, left hemisphere of rats), likely forming spontaneously upon stimulus presentation (Shiramatsu et al., 2016, right hemisphere of rats). These representations were found in Layer 2/3 of primary AC and are likely a product of a finely regulated local excitatory/inhibitory balance (Liang et al., 2019, right hemisphere of mice). Even the input-output functions of single neurons in Layer 4 of primary AC are already non-linear and stimulus dependent (Kim et al., 2020, right hemisphere of mice).
Primary AC shows a strong hierarchy not only with its neighbouring fields but also within its layers. The incoming signal is transformed from L3b/4 to L2/3: Although L4 shows strong frequency selectivity and small intertrial differences, L2/3 has wider response properties (Guo et al., 2012, right hemisphere of mice; Winkowski & Kanold, 2013, left hemisphere of mice). Transformations from temporal to rate coding can happen within a single cortical neuron (Gao & Wehr, 2015, left AC of rats). Gaucher et al. (2013) argued that the sparse temporal code is the first step in building a representation of communication sounds independent of the acoustic features of sounds.
Many existing rodent studies focused on a single hemisphere, making comparisons of hemispheric specializations across studies difficult. Although gross anatomical and functional differences between AC hemispheres are evident (Ehret & Geissler, 2006; Stiebler et al., 1997), few studies have examined microanatomical differences bilaterally.
Columnar microcircuits are integral to processing in AC: columns of <200-μm diameter function in a highly synchronized manner and might be the principal functional units in primary AC (See et al., 2018, right hemisphere of rats). When presented with vocalizations, clusters of activation are present in all cortical layers of left and right AC but are most prominent in Layer 2/3 (Geissler et al., 2016), where responses, even to pure tones, are very heterogeneous (Rothschild et al., 2010, left hemisphere of mice). Inputs to the neurons in these columns differ relative to the direction of the tonotopic gradient: Within an isofrequency band, neurons receive mainly input from within their cortical column, but along the tonotopic axis, inputs predominantly arise from neighbouring columns (Oviedo et al., 2010, left hemisphere of mice). Later studies found that in left AC, these out-of-column inputs along the tonotopic axis preferentially run from neurons in Layer 6 to neurons in Layer 2/3 that are tuned to lower frequencies. These might function as hard-wired microcircuits detecting down sweeps in auditory stimuli (Levy et al., 2019, left and right hemisphere of mice; Levy & Reyes, 2012, left hemisphere of mice). Down sweeps are typical components of mouse communication calls (Chabout et al., 2015; Liu et al., 2003). The right primary AC shows this potential down sweep detection circuit in the posterior part, but an inverse pattern corresponding to a potential up sweep detector in the anterior part. The authors propose a specialist role for communication call detection (i.e., down sweep detection) for the left AC and a generalist role for the right AC (Levy et al., 2019). The presence of these microcircuits has, so far, not been studied in primates or humans, nor have the experiments been independently replicated in rodents.
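The proposed sweep-detecting circuit can be caricatured as a delayed-coincidence computation (a toy sketch for intuition only, not the biophysical model of Levy et al., 2019): a column responds strongly when energy in a neighbouring frequency channel precedes energy in its own channel by a short delay, so mirror-image wiring yields down- versus up-sweep preference.

```python
import numpy as np

def sweep_detector(spectrogram, direction="down", delay=1):
    """Toy delayed-coincidence detector for frequency sweeps.

    spectrogram: array (n_channels, n_frames), low index = low frequency.
    A 'down' detector multiplies each channel's response with a copy of
    the next-higher channel delayed by `delay` frames, mimicking delayed
    out-of-column excitation along the tonotopic axis; 'up' mirrors this.
    Returns summed coincidence energy (a scalar preference score).
    """
    s = np.asarray(spectrogram, dtype=float)
    shift = -1 if direction == "down" else 1   # neighbour above vs. below
    neighbour = np.roll(s, shift, axis=0)      # note: wraps at the edges (toy)
    delayed = np.roll(neighbour, delay, axis=1)
    delayed[:, :delay] = 0                     # no input before stimulus onset
    return float(np.sum(s * delayed))

# A down sweep: energy steps from channel 3 down to channel 0 over time
down = np.zeros((4, 4))
for frame, channel in enumerate([3, 2, 1, 0]):
    down[channel, frame] = 1.0
up = down[::-1].copy()  # mirror image: an up sweep

print(sweep_detector(down, "down"), sweep_detector(up, "down"))  # → 3.0 0.0
```

Mirroring the wiring (`direction="up"`) makes the same architecture prefer up sweeps, which is the sense in which the anterior right AC could host an up-sweep variant of the same circuit.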
However, a similar asymmetry in the responses to frequency up and down sweeps has been observed in the right AC of rats (Zhang et al., 2003). Neophytou and Oviedo (2020) speculate that, in principle, the model they describe in mouse AC (Levy et al., 2019) is applicable to human speech. This notion, however, has not been tested empirically. A promising model species to investigate this question is the marmoset, which features frequent upward sweeps in its vocalizations (Agamaite et al., 2015). If the left AC preference for vocal communication features is analogous across rodents and primates, the marmoset should show a preference for upward sweeps in the left primary AC. Such results would ultimately need to be connected to behaviour. Very few studies have tried to demonstrate a differential involvement of left and right AC in communication or frequency sweep discrimination. Right AC appears to be critical for sweep discrimination in gerbils (Wetzel et al., 1998, 2008) and in rats (Rybalko et al., 2006). Apart from frequency cues, temporal cues are important in vocal communication. Left AC appears to play an important role in the processing of temporally complex stimuli in rats (Rybalko et al., 2010), but no hypothesis for the relevant microcircuits has been advanced so far. It is possible that other acoustic features are also detected by dedicated microcircuits in mammals. In fact, current understanding of the emergence of feature detectors in sensory brain areas through selective connectivity and shaping of receptive fields by the local microcircuitry makes this likely (Aponte et al., 2021; Atencio & Schreiner, 2010; Liu et al., 2019; Montes-Lourido et al., 2021).
In humans, it seems likely that left and right AC perform tasks in a lateralized manner, as the human AC shows several anatomical hemispheric asymmetries: In the left primary AC, relative cell volume density is higher than in the right; cell density, however, seems to be symmetric (Smiley et al., 2013). This might not be specific to auditory areas, as a comparable asymmetry has been documented in the fusiform gyrus (Chance et al., 2013). Additionally, in non-primary left AC, microcolumns are more widely spaced than in the right (Buxhoeveden et al., 2001; Chance et al., 2008; Galuske et al., 2000). The left AC also shows higher dendritic density and spread (Seldon, 1981, 1982). Differences in the microstructure within left and right AC have been demonstrated with diffusion MRI (Schmitz et al., 2019), and higher density of dendrites and axons in the left hemisphere is associated with faster processing of speech (Ocklenburg et al., 2018). Across fields, the left AC shows more connections in ventrodorsal directions, whereas the right AC shows more antero-posterior connections (Cammoun et al., 2015). Primary AC also shows remarkably higher tangential than radial diffusion (McNab et al., 2013), reminiscent of what Levy et al. (2019) described in mice.
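The tangential-versus-radial comparison from diffusion MRI amounts to projecting a voxel's diffusion tensor onto directions parallel and perpendicular to the cortical surface. The following minimal sketch shows that projection; the tensor eigenvalues and the choice of surface-aligned axes are made up for illustration:

```python
import numpy as np

# Illustrative diffusion tensor in a surface-aligned coordinate frame:
# the first two axes lie tangential to the cortical surface, the third
# is the cortical normal (radial). Eigenvalues (mm^2/s) are invented.
D = np.diag([1.5e-3, 1.2e-3, 0.4e-3])

def apparent_diffusivity(D, direction):
    """Diffusivity along a given direction: g^T D g with g normalized."""
    g = np.asarray(direction, dtype=float)
    g = g / np.linalg.norm(g)
    return g @ D @ g

tangential = apparent_diffusivity(D, [1, 0, 0])  # along the cortical surface
radial = apparent_diffusivity(D, [0, 0, 1])      # along the cortical normal
print(tangential > radial)  # prints: True
```

A tensor like this one, with larger diffusivity along the surface-tangential axes than along the cortical normal, would reproduce the 'higher tangential than radial diffusion' pattern reported by McNab et al. (2013).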

| CURRENT TECHNOLOGY COULD ENABLE SYNTHESIS OF ANIMAL AND HUMAN DATA
Until recently, MRI and invasive biological methods have generated data on vastly different spatial scales: MRI voxels are on the millimetre scale but can be acquired from the whole brain, whereas microscopy and electrophysiology acquire functional and structural data on the micrometre scale, but typically only from a few cells or from small brain areas. Imaging of cellular-level structural data in large tissue samples is now possible through clearing techniques such as CLARITY (Chung et al., 2013; Tomer et al., 2014), CUBIC (Susaki et al., 2015) and the different DISCO variants (Cai et al., 2019; Renier et al., 2014, 2016). These methods are being continually improved (for review, see Molbay et al., 2021 and Ueda et al., 2020). Clearing allows imaging of both brain hemispheres of a mouse in a single sample, simplifies registration of datasets to a reference atlas or to functional data, and allows easier analyses in three dimensions as opposed to serial 2D histology. For example, using iDISCO+, Renier et al. (2016) managed to map mouse whole-brain neuronal activity proxies in different contexts: a sensory task, a social task, an exploratory environment and chemical treatment. The same technique was later employed by Levy et al. (2019) to quantify hemispheric (and cortical layer) preferences for frequency sweeps and vocalizations, showing a left-hemisphere asymmetry for vocalizations. With novel clearing agents, Zhao et al. (2020) showed that even a complete human brain can be rendered transparent. Human tissue is, however, more difficult to process with clearing methods: donors are typically elderly, and their brain tissue has higher concentrations of collagen and fluorescent plaques, making microscopic imaging more difficult than in standardized animal models (Morawski et al., 2018).
Other emerging approaches for primate studies include optogenetic (Rajalingham et al., 2020;Tremblay et al., 2020) and transgenic methods (Miller et al., 2016;Okano et al., 2016), which recently became available in non-human primate models. Using these methods, one could subsequently acquire functional and microstructural data in the same animal, for instance in a transgenic marmoset with labelled neurons. This 'longitudinal' approach would allow strong conclusions on structure-function relationships in lateralized structures.
Tools to analyse imaging data from humans and animal models are largely segregated, even though the technical problems that arise in both cases overlap. Key problems include standardized data formats (NIFTI, Cox et al., 2004; BIDS, Gorgolewski et al., 2016), registration of data to an atlas (Allen Mouse Brain Atlas, Wang et al., 2020; ICBM 152 human atlas, Fonov et al., 2009), automated annotation and segmentation of structures (INSECT, ANIMAL, Collins et al., 1999) and the reproducible application of statistical learning methods (Abraham et al., 2014; Hanke et al., 2009; Hebart et al., 2015). Human imaging data are reasonably standardized, largely because the ICBM standards are well established in the field. In contrast, there are few universal standards for microscopy data in the animal literature. The Allen Mouse Brain Atlas (Lein et al., 2007; Wang et al., 2020) offers an anatomical and genomic reference, but (meta-)data formats vary across labs (HDF5, Koranne, 2011; TIFF and variants), making the adaptation of analysis pipelines to different data formats time-consuming and error-prone. Across different species, there is currently no standardization. Several public datasets of marmosets and humans, both histological (Amunts et al., 2013, human; Majka et al., 2020, marmoset) and MRI (e.g., Liu et al., 2020, marmoset; Poldrack & Gorgolewski, 2017, human), can serve as a starting point for such standardization. Additionally, the PRIMatE Data Exchange (PRIME-DE) serves as a central collection point for such open datasets (Milham et al., 2018).
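Part of what makes NIFTI a useful interchange format is that every file carries a 4 × 4 affine matrix mapping voxel indices to world coordinates, which is the basic machinery behind registration to atlases such as ICBM 152. The sketch below shows that mapping in isolation; the affine values are invented for illustration and are not taken from any real atlas file:

```python
import numpy as np

# A NIFTI-style 4x4 affine maps voxel indices (i, j, k) to world
# coordinates in millimetres. These values are illustrative only:
# 2 mm isotropic voxels, with the origin placed so that voxel
# (45, 54, 45) lands at world coordinate (0, 0, 0).
affine = np.array([
    [2.0, 0.0, 0.0,  -90.0],
    [0.0, 2.0, 0.0, -108.0],
    [0.0, 0.0, 2.0,  -90.0],
    [0.0, 0.0, 0.0,    1.0],
])

def voxel_to_world(affine, ijk):
    """Apply the affine to a voxel index (homogeneous coordinates)."""
    i, j, k = ijk
    return (affine @ np.array([i, j, k, 1.0]))[:3]

def world_to_voxel(affine, xyz):
    """Invert the affine to go from millimetres back to voxel indices."""
    x, y, z = xyz
    return (np.linalg.inv(affine) @ np.array([x, y, z, 1.0]))[:3]

print(voxel_to_world(affine, (45, 54, 45)))  # prints: [0. 0. 0.]
```

Because the affine travels with the data, two datasets registered to the same atlas space can be compared voxel-wise without bespoke per-lab conventions, which is precisely the kind of standardization the animal microscopy literature currently lacks.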
To the best of our knowledge, the only tool that integrates microscopy and MRI data (beyond atlases as a frame of reference) is MIRACL (Goubran et al., 2019). It has been used to integrate CLARITY viral tracer data with MRI data to enhance diffusion prediction accuracy and to remedy shortcomings of diffusion MRI, such as the inability to resolve some fibre crossings. Mapping tract-level connectivity and performing cell-level analyses in the same individual allows for novel approaches in many neuroscientific fields.
According to Poeppel (2012), understanding a complex brain mechanism such as vocal communication requires "theoretically well-motivated, computationally explicit, and biologically realistic characterizations of function". The fusion of human and animal data is ultimately necessary for describing the underlying cortical microstructure and connecting it to psychological 'box-and-arrow' models of language processing.

| CONCLUSION
Although the classical concept of 'clear left-lateralized communication' does not capture the complexity of AC in humans or other mammals, it is evident that AC function is lateralized in rodents, non-human primates and humans. The left AC is critical for communication processing in all of these species. Counterintuitively, this specialization seems to be more prominent in humans and rodents than in non-human primates, where it is difficult to measure reliably. This is perhaps because non-human primates have a vastly more complex AC than rodents but lack the complexity of human speech and hence the necessary analysis mechanisms. AC fields overlap in their characteristics across species: Tonotopy is pronounced in core fields, though local best-frequency tuning is diverse; primates use a distributed network of higher-order fields to code for vocalization stimuli; and anatomical hemispheric differences are present in all discussed species. Studies in rodent models can provide critical insights into cortical processes with methods such as tissue clearing, cell-level functional imaging and genetic toolkits. However, data from different species and methods are often difficult to compare. Efforts to standardize data formats and analysis tools would benefit comparative research and enable synergies between psychological and biological work in the area of vocal communication processing.