Differential effects of the temporal and spatial distribution of audiovisual stimuli on cross‐modal spatial recalibration

Visual input constantly recalibrates auditory spatial representations. Exposure to isochronous audiovisual stimuli with a fixed spatial disparity typically results in a subsequent auditory localization bias (ventriloquism aftereffect, VAE), whereas exposure to spatially congruent audiovisual stimuli improves subsequent auditory localization (multisensory enhancement, ME). Here, we tested whether cross‐modal recalibration is affected by the stimulation rate and/or the distribution of audiovisual spatial disparities during training. Auditory localization was tested before and after participants were exposed either to audiovisual stimuli with a constant spatial disparity of 13.5° (VAE) or to spatially congruent audiovisual stimulation (ME). In a between‐subjects design, audiovisual stimuli were presented either at a low frequency of 2 Hz, as used in previous studies of VAE and ME, or intermittently at a high frequency of 10 Hz, which mimics long‐term potentiation (LTP) protocols and which was found superior in eliciting unisensory perceptual learning. Compared to low‐frequency stimulation, VAE was reduced after high‐frequency stimulation, whereas ME occurred regardless of the stimulation protocol. In two additional groups, we manipulated the spatial distribution of audiovisual stimuli in the low‐frequency condition. Stimuli were presented with varying audiovisual disparities centered around 13.5° (VAE) or 0° (ME). Both VAE and ME were equally strong compared to a fixed spatial relationship of 13.5° or 0°, respectively. Taken together, our results suggest (a) that VAE and ME represent partly dissociable forms of learning and (b) that auditory representations adjust to the overall stimulus statistics rather than to a specific audiovisual spatial relationship.

Cross-modal recalibration emerges in parallel at two distinct time scales (Bosen, Fleming, Allen, O'Neill, & Paige, 2018; Bruns & Röder, 2015; Mendonça, Escher, van de Par, & Colonius, 2015; Van der Burg, Alais, & Cass, 2015; Watson, Akeroyd, Roach, & Webb, 2019). On the one hand, even a single exposure to a spatially discrepant audiovisual stimulus results in an immediate but transient adjustment of auditory localization (Wozny & Shams, 2011). On the other hand, repeated exposure to a consistent audiovisual disparity results in stronger and more stable changes in auditory processing that seem to integrate sensory evidence over time (Bruns & Röder, 2019a; Frissen, Vroomen, & de Gelder, 2012; Zierul, Röder, Tempelmann, Bruns, & Noesselt, 2017). To induce such cumulative cross-modal recalibration effects, previous studies have typically presented audiovisual stimuli with a fixed spatial relationship at a steady rate of one or two stimuli per second for several minutes (e.g., Bruns & Röder, 2019b; Frissen et al., 2012; Lewald, 2002; Passamonti et al., 2009; Radeau & Bertelson, 1974; Recanzone, 1998; Sarlat et al., 2006). This stimulation protocol reliably induced changes in sound localization behavior, but it remained unclear from these studies whether and how different temporal or spatial patterns of audiovisual stimulation would affect the strength of cross-modal recalibration. This knowledge would be informative for understanding the underlying neural mechanisms of cross-modal recalibration. Recently, we observed that exposure to incrementally increasing audiovisual discrepancies resulted in lower cross-modal recalibration than exposure to a constant audiovisual discrepancy, suggesting a crucial influence of the precise stimulation protocol on learning outcomes (Bruns & Röder, 2019a).
To the best of our knowledge, the effects of multisensory high-frequency stimulation have not been tested so far. However, cross-modal transfer of unisensory high-frequency stimulation effects has been demonstrated in a study in hemianopic patients with unilateral visual cortical lesions (Lewald, Tegenthoff, Peters, & Hausmann, 2012). In this study, high-frequency auditory stimulation in the blind hemifield of the patients induced a subsequent improvement of visual detection performance. At least in some cases, residual visual functions in hemianopic patients, known as blindsight, seem to depend on the functional integrity of the superior colliculus (SC) in the midbrain, which may convey visual information directly to extrastriate cortical areas (Fox, Goodale, & Bourne, 2020; Leopold, 2012). The SC is a major multisensory integration site that contains superimposed maps of auditory and visual space (Stein & Stanford, 2008), and might, thus, have mediated enhanced visual performance in the hemianopic patients in response to auditory stimulation (Lewald et al., 2012).
Moreover, in other studies hemianopic patients showed an enhancement of auditory localization performance in response to (low-frequency) spatially congruent audiovisual stimulation in their blind hemifield (Leo, Bolognini, Passamonti, Stein, & Làdavas, 2008; Passamonti et al., 2009). In contrast, exposure to spatially discrepant audiovisual stimuli in the blind hemifield did not affect auditory localization performance. This dissociation between congruent and discrepant audiovisual stimulation suggests that flexible recalibration to changing cross-modal spatial correspondences requires visual cortical processing, whereas the collicular-extrastriate circuit allows for the integration of congruent audiovisual stimuli. Concordantly, studies in owls have suggested that spatial representations in the optic tectum, the avian homologue of the SC, become relatively stable in adulthood and are typically modulated by discrepant cross-modal input only within certain limits (Brainard & Knudsen, 1998).
Cross-modal recalibration studies have typically manipulated the spatial correspondence between auditory and visual stimuli without manipulating the variance of audiovisual discrepancies. One possibility is that presenting audiovisual stimuli with varying spatial discrepancies impedes recalibration because auditory and visual locations would be perceived as uncorrelated, and hence would be attributed to separate events (Parise, Spence, & Ernst, 2012). Another possibility is that auditory localization adjusts to the mean audiovisual discrepancy encountered in the exposure phase and hence would be unaffected by changes in variance. In the latter case, however, one might not expect an increase in auditory localization precision after exposure to varying audiovisual discrepancies with a mean of zero, because only spatially aligned audiovisual stimuli would activate the collicular-extrastriate pathway that appears to be crucial for reducing localization error (Leo et al., 2008; Passamonti et al., 2009). Concordantly, in studies with unimodal auditory stimulation it has been found that the perceived laterality of auditory stimuli was biased by the mean of recently encountered sound locations, whereas the sensitivity of lateralization judgments depended on the variance of the spatial distribution (Dahmen, Keating, Nodal, Schulz, & King, 2010).
In the present study, we systematically tested the effects of temporal (i.e., high-vs. low-frequency) and spatial (i.e., fixed vs. varying disparity) stimulation patterns on cross-modal recalibration to spatially discrepant (VAE) and spatially aligned (multisensory enhancement [ME]) audiovisual stimuli. In a between-subjects design, we compared recalibration effects in two baseline groups, which received standard exposure at a low frequency and with a fixed audiovisual disparity of either 13.5° or 0°, with two groups receiving exposure at a high frequency instead (Experiment 1) and two groups in which the audiovisual disparity varied around a mean of either 13.5° or 0° (Experiment 2). Two additional groups were exposed to auditory-only stimulation to control for unspecific test repetition effects (Experiment 3). We hypothesized (a) that high-frequency compared to low-frequency audiovisual stimulation would facilitate and (b) that exposure to varying audiovisual spatial discrepancies compared to a constant audiovisual correspondence would impede recalibration and audiovisual localization gains.

| Participants
A total of 120 healthy adult volunteers (28 male and 92 female) from the University of Hamburg community participated in the study. They had a mean age of 24.9 years (range 18-46 years), and all reported normal hearing and normal or corrected-to-normal vision. Participants were divided into eight groups of n = 15 each. All of them provided written informed consent prior to the study and received course credit or were compensated €7 for their participation. The experimental procedure was approved by the ethics commission of the German Psychological Society (DGPs), and the study was performed in accordance with the ethical standards laid down in the Declaration of Helsinki.

| Apparatus and stimuli
Participants were tested in a dark sound-attenuated room and faced the center of a semi-circular frame on which six loudspeakers (ConceptC Satellit, Teufel GmbH, Berlin, Germany) were mounted at a distance of 90 cm with eccentricities of ±4.5°, ±13.5°, and ±22.5° from the participants' straight-ahead position (0°). The loudspeakers were hidden from view behind an acoustically transparent curtain that extended to ±90° from midline. For visual stimulation, a red laser pointer was projected onto the curtain at the level of the loudspeakers for 30 ms. The laser pointer was attached to a step motor which allowed visual stimulation at different azimuthal locations. Auditory stimuli were 1,000 Hz tones with a duration of 30 ms (including 5 ms linear rise/fall envelopes) that were presented at 65 dB(A). A sine tone, which is typically more difficult to localize than a noise burst, was used as in previous studies (e.g., Bruns & Röder, 2019a; Frissen et al., 2012; Lewald, 2002; Recanzone, 1998) to avoid potential ceiling effects in auditory localization performance. Sound intensity was randomly varied over a 4 dB range for every stimulus presentation to reduce any detectable differences in the loudspeaker transfer functions. During unimodal auditory localization tests, participants indicated perceived sound locations with a rotatable hand pointer which was mounted in front of them.

| Pre- and posttest
All participants performed a unimodal sound localization task before (pretest) and immediately after (posttest) exposure to passive audiovisual (Experiments 1 and 2) or auditory-only (Experiment 3) stimulation. Pre- and posttests each consisted of 90 trials (15 trials per loudspeaker location) that were presented in a randomized order. At the beginning of each trial, the laser point was presented at the central azimuthal location (0°). The laser point was turned off as soon as the participants had aligned the hand pointer within ±10° from center, and after a random delay (between 500 and 1,500 ms), an auditory target stimulus was presented from one of the six loudspeakers. Participants adjusted the hand pointer as accurately as possible toward the perceived azimuthal location of the sound source and confirmed their response by a button press. The next trial started 350 ms after the response.

| Exposure phase
After the auditory localization pretest, participants received one block of 600 exposure trials that had a duration of 5 min. The duration and the number of stimuli were identical for all experimental conditions. Auditory stimuli were always presented from the same six locations as in the pre-and posttest (100 stimuli per location). In Experiments 1 and 2, they were presented synchronously with a visual stimulus that was either spatially congruent or incongruent, depending on condition (see below). Stimuli were arranged in sets of 10 successive stimuli with identical locations. To ensure that participants attended to the stimulation, the laser point was occasionally illuminated in the time interval between the tenth stimulus of a set and the first stimulus of the next set. Participants had to press a button whenever they detected such a deviant stimulus, which occurred six times during the exposure phase. The pattern of sensory stimulation differed as follows between experiments and experimental conditions (see also Figure 1). Each participant completed one experiment and experimental condition only.
FIGURE 1 Schematic illustration of the audiovisual exposure phase. Stimulus locations changed every 5 s (i.e., after 10 audiovisual stimuli). Depending on condition, audiovisual stimuli were either presented at a steady rate of 2 Hz (LTD) or intermittently at 10 Hz (LTP). The audiovisual spatial relationship was either constant throughout the exposure phase (fixed) or changed every 5 s (varying). Visual stimuli were either (on average) displaced 13.5° to the right of the sound source (incongruent) or were (on average) presented at the same location as the sounds (congruent)

| Experiment 1
This experiment compared the effects of temporal high- versus low-frequency stimulation patterns on cross-modal recalibration, separately for spatially discrepant and spatially congruent audiovisual stimuli (see Figure 1). In the "LTD fixed incongruent" group, audiovisual stimuli were presented continuously at a low frequency of 2 Hz (i.e., two stimuli per second). Visual stimuli were always 13.5° to the right of the sound source. By contrast, in the "LTD fixed congruent" group, visual stimuli were always presented at the same location as the auditory stimuli (0° discrepancy). Aftereffects in the two LTD groups were compared with aftereffects in two groups which received high-frequency (LTP-like) stimulation instead. In the LTP groups, audiovisual stimuli were presented intermittently at a high frequency of 10 Hz. Sets of 10 stimuli (1 s stimulation) were separated by gaps of four seconds, amounting to 120 stimuli per minute as in the LTD conditions. In the "LTP fixed incongruent" group, visual stimuli were always 13.5° to the right of the sound source as in the "LTD fixed incongruent" group. In the "LTP fixed congruent" group, visual stimuli were always presented at the same location as the auditory stimuli (0° discrepancy) as in the "LTD fixed congruent" group.
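The timing arithmetic of the two protocols (equal stimulus counts despite different rates) can be sketched as onset-time generators; the function and parameter names below are ours for illustration and are not taken from the study's materials:

```python
import numpy as np

def ltd_onsets(duration_s=300, rate_hz=2.0):
    """Continuous low-frequency ('LTD') schedule: one stimulus every 1/rate_hz s."""
    return np.arange(0.0, duration_s, 1.0 / rate_hz)

def ltp_onsets(duration_s=300, burst_rate_hz=10.0, burst_len=10, cycle_s=5.0):
    """Intermittent high-frequency ('LTP') schedule: bursts of 10 stimuli at
    10 Hz (1 s of stimulation) separated by 4 s gaps, repeating every 5 s."""
    starts = np.arange(0.0, duration_s, cycle_s)
    return (starts[:, None] + np.arange(burst_len) / burst_rate_hz).ravel()

# Both protocols deliver 120 stimuli per minute, i.e., 600 stimuli in 5 min.
assert ltd_onsets().size == ltp_onsets().size == 600
```

Equating the total number of stimuli and the overall duration across protocols ensures that any group difference reflects the temporal pattern itself rather than the amount of stimulation.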

| Experiment 2
This experiment tested the effects of a varying audiovisual spatial relationship on cross-modal recalibration in two new groups, as compared to the fixed relationship used in Experiment 1 (see Figure 1). In the "LTD varying incongruent" group, audiovisual stimuli were presented continuously at a low frequency of 2 Hz as in the LTD fixed conditions. However, the audiovisual spatial discrepancy varied between 5.4° and 21.6° (mean 13.5°) in steps of 1.8° with a uniform distribution. In the "LTD varying congruent" group, the audiovisual spatial discrepancies varied between −8.1° and 8.1° in steps of 1.8° with a mean of 0°; thus, although auditory and visual stimuli were never presented from congruent locations, their average locations were the same. Aftereffects in these two groups were compared to the two LTD groups from Experiment 1, in which the audiovisual spatial relationship was fixed at 13.5° ("LTD fixed incongruent" group) or 0° ("LTD fixed congruent" group).
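As a check on the arithmetic of the varying-disparity conditions, the two uniform disparity sets can be reconstructed as follows (a sketch; variable names are ours):

```python
import numpy as np

# "LTD varying incongruent": disparities from 5.4° to 21.6° in 1.8° steps.
incongruent = np.arange(5.4, 21.6 + 0.9, 1.8)
# "LTD varying congruent": disparities from -8.1° to 8.1° in 1.8° steps.
congruent = np.arange(-8.1, 8.1 + 0.9, 1.8)

assert np.isclose(incongruent.mean(), 13.5)    # mean matches the fixed 13.5° condition
assert np.isclose(congruent.mean(), 0.0)       # mean matches the fixed 0° condition
assert not np.any(np.isclose(congruent, 0.0))  # stimuli are never exactly congruent
```

Note that the "varying congruent" set excludes 0° itself, so the mean audiovisual relationship matches the fixed congruent condition even though no single trial is spatially aligned.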

| Experiment 3
To control for test repetition effects and unspecific effects of the stimulation protocols, two control groups were tested that received auditory-only stimulation during the exposure phase. In the first control group, auditory stimuli were presented continuously at a low frequency of 2 Hz ("LTD auditory control"). In the second control group, the auditory stimuli were presented intermittently at a high frequency of 10 Hz ("LTP auditory control").

| Data analysis
To verify that participants were able to perform the sound localization task at pretest, we calculated a simple linear regression between the mean pointing responses and the true locations of the six loudspeakers. Mean pointing responses were well approximated by regression lines in all participants (R² ≥ .66, p ≤ .049), except in one participant from the "LTP auditory control" group who was not able to discriminate the sound locations (R² = .21, p = .367). Data from this participant were, thus, excluded from further analyses.
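This inclusion check amounts to a per-participant regression of mean pointing responses onto the true loudspeaker azimuths; a minimal sketch (our names, using `scipy.stats.linregress`, not the authors' code):

```python
import numpy as np
from scipy import stats

# Loudspeaker azimuths in degrees, as described in the apparatus section.
TRUE_LOCS = np.array([-22.5, -13.5, -4.5, 4.5, 13.5, 22.5])

def pretest_fit(mean_responses):
    """Regress one participant's mean pointing responses (one value per
    loudspeaker) onto the true locations; return (R^2, p) of the fit."""
    res = stats.linregress(TRUE_LOCS, mean_responses)
    return res.rvalue ** 2, res.pvalue
```

A participant whose responses do not scale with the true locations (here, the excluded case with R² = .21) fails this check.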
Separate analyses were performed to assess VAEs in incongruent conditions and ME effects in congruent conditions (cf. Passamonti et al., 2009). VAEs in incongruent conditions were defined as rightward shifts in sound localization (i.e., in the direction of the visual stimuli) from pre- to posttest (Bertelson et al., 2006; Bruns & Röder, 2019b; Frissen, Vroomen, de Gelder, & Bertelson, 2003; Lewald, 2002; Recanzone, 1998). Thus, separately for the pre- and posttest, mean constant (i.e., signed) errors in sound localization were calculated by averaging responses across the 15 trials per loudspeaker location. Ventriloquism aftereffects were then calculated by subtracting constant errors in the pretest from those in the posttest and averaged across loudspeaker locations.
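The VAE measure described above reduces to a few lines; the array layout (locations × trials of signed errors) is our assumption for illustration:

```python
import numpy as np

def vae(pre_errors, post_errors):
    """Ventriloquism aftereffect: mean rightward shift of sound localization
    from pre- to posttest.

    pre_errors, post_errors: arrays of shape (6, 15) holding signed pointing
    errors (response minus true location, in degrees; positive = rightward),
    one row per loudspeaker location.
    """
    pre_const = pre_errors.mean(axis=1)     # constant (signed) error per location
    post_const = post_errors.mean(axis=1)
    return (post_const - pre_const).mean()  # positive = shift toward the visual side
```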
Multisensory enhancement effects in congruent conditions were defined as reductions in absolute localization errors (Passamonti et al., 2009; Strelnikov et al., 2011). For this purpose, mean absolute (i.e., unsigned) errors in sound localization were calculated by averaging absolute errors separately for the pre- and posttest. Posttest values were then subtracted from pretest values to yield positive values for error reductions after the exposure phase. The resulting values reflect both changes in localization accuracy and changes in localization precision. Potential ME effects could theoretically arise from the fact that the visual stimuli might have informed participants about the range of possible loudspeaker locations (−22.5° to 22.5°). To exclude that such higher-order knowledge accounted for performance improvements, we additionally tested for pre- to posttest changes in the slopes of the individually fitted regression lines between the mean pointing responses and the true locations of the six loudspeakers by means of paired t tests. A significant change in slope would indicate that the typically observed overshoot in localization responses with pointing tasks (Bruns, Maiworm, & Röder, 2014; Bruns & Röder, 2019a; Lewald, 2002; Lewald & Getzmann, 2006; Zierul, Tong, Bruns, & Röder, 2019) was reduced by congruent audiovisual stimulation. Such a result would suggest that cognitive or decisional factors caused ME.
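Both the ME measure and the slope check can be sketched in the same vein (our names and layout, not the authors' code):

```python
import numpy as np
from scipy import stats

def me_effect(pre_errors, post_errors):
    """Multisensory enhancement: reduction in mean absolute (unsigned)
    localization error from pre- to posttest (positive = improvement)."""
    return np.abs(pre_errors).mean() - np.abs(post_errors).mean()

def response_slope(mean_responses, true_locs):
    """Slope of mean pointing responses regressed onto the true locations;
    values > 1 indicate the typical overshoot seen in pointing tasks."""
    return stats.linregress(true_locs, mean_responses).slope
```

Comparing `response_slope` at pre- and posttest across participants with a paired t test asks whether an apparent ME effect merely reflects a rescaling of the response range rather than genuinely improved localization.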
For comparisons of interest, independent two-sample t tests were used to test for differences in the size of the aftereffects between experimental conditions. Additionally, one-sample t tests were used to test whether aftereffects were significantly larger than zero. Where appropriate, p values were adjusted to control the family-wise error rate using Hochberg's (1988) step-up modification of the Bonferroni correction. All t tests were additionally performed as Bayesian hypothesis tests in JASP version 0.12 (Wagenmakers et al., 2018), and Bayes Factors (BF10 for two-tailed tests and BF+0 for one-tailed tests) are reported. In addition, (Bayesian) one-way repeated-measures ANOVAs were used to test whether aftereffects significantly differed between the six loudspeaker locations. The Greenhouse-Geisser correction was used to correct for violations of the sphericity assumption and, where appropriate, the corrected p values are reported. Bayes Factors (BFincl) for inclusion of the factor Location (as compared to the null model) are reported.
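Hochberg's (1988) step-up procedure can be expressed as adjusted p values; the following is a sketch of the standard algorithm, not the authors' analysis code:

```python
import numpy as np

def hochberg_adjust(pvals):
    """Hochberg (1988) step-up adjusted p-values.

    The i-th smallest p-value is multiplied by (m - i + 1); a running
    minimum taken from the largest p-value downward enforces the
    step-up (monotone) structure, and values are capped at 1.
    """
    p = np.asarray(pvals, dtype=float)
    m = p.size
    order = np.argsort(p)                             # ascending p-values
    scaled = (m - np.arange(m)) * p[order]            # factors m, m-1, ..., 1
    adj = np.minimum.accumulate(scaled[::-1])[::-1]   # step-up running minimum
    adj = np.minimum(adj, 1.0)
    out = np.empty(m)
    out[order] = adj                                  # restore input order
    return out
```

For example, `hochberg_adjust([0.01, 0.04, 0.03, 0.05])` leaves all four tests significant at α = .05, whereas a plain Bonferroni correction would not.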

FIGURE 3 Group-averaged sound localization responses per loudspeaker location at pretest and at posttest in Experiments 1 and 2. Dots indicate mean pointing responses (in degrees) across the 15 trials per loudspeaker location, averaged across participants in each group. Dotted lines indicate the actual locations of the loudspeakers. Differences between pretest and posttest in incongruent audiovisual conditions correspond to ventriloquism aftereffects. Differences between pretest and posttest in congruent audiovisual conditions reflect changes in accuracy only

d = 0.74; BF+0 = 9.48). The localization error reduction did not significantly differ between locations (F(5, 70) = 1.03, p = .390, η²G = 0.04; BFincl = 0.16), and there was no significant change in the slope of the individually fitted regression lines (t(14) = 0.76, p = .459, d = 0.20; BF10 = 0.34). The overall size of the error reduction did not differ significantly from the "LTD fixed congruent" group in Experiment 1, in which audiovisual stimuli were always spatially congruent (t(28) = 0.05, p = .959, d = 0.02; BF10 = 0.35; see Figure 2, right panel).

| DISCUSSION
In order to shed light on the neural mechanisms of cross-modal learning, we examined how the temporal pattern of audiovisual stimulation (Experiment 1) and the distribution of audiovisual spatial discrepancies (Experiment 2) affect the strength of cross-modal spatial recalibration. These manipulations additionally allowed us to compare different types of cross-modal learning (VAE and ME) with unisensory perceptual learning, for which different temporal stimulation patterns resulted in opposite effects (i.e., improvements vs. impairments) on performance. As in previous studies, continuous audiovisual stimulation at a low frequency of 2 Hz reliably induced a subsequent shift in sound localization toward the visual attractor (i.e., a VAE) in the incongruent condition (13.5° audiovisual disparity) and reduced sound localization errors in the congruent condition (0° audiovisual disparity). However, changing the temporal stimulation pattern to an intermittent presentation of the audiovisual stimuli at a high frequency of 10 Hz significantly reduced the VAE, although it did not affect ME after spatially congruent audiovisual stimulation (Experiment 1). By contrast, the distribution of audiovisual spatial discrepancies (either always the same or varying between trials) had no effect on cross-modal recalibration, as neither VAEs nor ME effects differed between the two conditions (Experiment 2). A control experiment confirmed that both aftereffects were due to cross-modal stimulation during the exposure phase and did not occur after auditory-only stimulation (Experiment 3).
Why did the temporal pattern of audiovisual stimulation (low- vs. high-frequency) selectively modulate the VAE but not ME? Previous research has indicated that these two types of cross-modal recalibration are mediated by dissociable mechanisms. Recalibration to an altered audiovisual spatial correspondence in the VAE required intact visual cortical processing, whereas ME after congruent audiovisual stimulation did not and rather seemed to involve a collicular-extrastriate circuit (Bertini, Leo, Avenanti, & Làdavas, 2010; Leo et al., 2008; Passamonti et al., 2009). Our results provide corroborating evidence for this dissociation and might suggest that high-frequency stimulation made a difference only for cortical processing underlying the VAE.
An upper limit of around 4 Hz for successful task performance had previously been reported for tasks in which participants had to judge audiovisual temporal synchrony or had to match individual visual and auditory stimuli; at stimulation rates above 4 Hz, participants were no longer able to perform these tasks above chance (Benjamins, van der Smagt, & Verstraten, 2008; Fujisaki & Nishida, 2005; for review, see Chen & Vroomen, 2013). However, these findings do not necessarily imply that the individual stimuli were not perceived as separate events. In fact, for synchronous audiovisual stimulation (as used in the present study), it has been demonstrated that participants are able to discriminate stimulation rates around 10 Hz and that they benefit from audiovisual as compared to unimodal stimulation, which is consistent with statistically optimal multisensory integration (Brooks, Anderson, Roach, McGraw, & McKendrick, 2015; Locke & Landy, 2017). Thus, high-frequency stimulation does not seem to interfere with genuine multisensory integration. Rather, specific multisensory functions, such as judging temporal (or spatial) alignment, might be impaired by high-frequency stimulation, in line with the present results, which showed an effect of stimulation rate only for the VAE but not for the ME effect.
One possibility is that subcortical processes, which are assumed to mediate the ME effect, are not responsive to stimulation frequency. If ME in response to congruent audiovisual stimulation is predominantly due to changes in the receptive fields of audiovisual neurons in the superficial layers of the SC, such a calibration would not require a time-consuming looping through distant neural circuits and might just depend on the co-activation statistics, rendering ME independent of the stimulation frequency. Alternatively, the audiovisual exposure duration of only 5 min in the present study might have been too short to show a differential effectiveness of high-versus low-frequency stimulation for the ME effect.
Earlier studies showed that longer unimodal high-frequency stimulation, applied for one hour, altered visual collicular-extrastriate processing in hemianopic patients (Lewald et al., 2012). In any case, the presence of an ME effect with high-frequency stimulation suggests that the selective reduction of the VAE was due to a specific temporal limitation of the neural circuitry required for cross-modal recalibration and not due to a general impairment of multisensory integration or other unspecific effects of high-frequency stimulation.
Modeling approaches have shown that spatial discrimination and localization are oppositely affected by lateral inhibition in a spatiotopically organized population of neurons, as typically found in somatosensory and visual cortex (Dinse & Jancke, 2001; Dinse et al., 2008; Jancke et al., 1999). On the one hand, strong lateral inhibition impairs tactile two-point discrimination performance because it reduces coexisting activation at other locations in the cortical population of neurons, which is necessary for discriminating two points from one. On the other hand, lateral inhibition reduces noisy fluctuations of activation, thereby improving localization of the peak position in the cortical map. The model assumes that high-frequency tactile stimulation induces a decrease of lateral inhibition, which would result in enhanced two-point discrimination performance due to stronger bimodal activations when two stimuli are present, but would impair localization performance due to increased noise in determining the location of a single activation peak.
In contrast to the previous findings in vision and touch (Dinse et al., 2011; Jancke et al., 1999), high-frequency auditory-only stimulation in the present study did not significantly impair auditory localization performance. Auditory spatial perception might be less influenced by a potential decrease in lateral inhibition after high-frequency stimulation, because neurons in auditory cortex do not form a spatiotopic map as found in the visual and somatosensory systems (McAlpine & Grothe, 2003; Stecker, Harrington, & Middlebrooks, 2005). We did not directly test the effect of high-frequency visual stimulation on visual localization performance. However, if high-frequency stimulation indeed impaired visual spatial reliability to a larger extent than auditory spatial reliability, this could explain why the VAE decreased after audiovisual high-frequency stimulation. Reducing the relative reliability of the visual stimuli has been shown to result in a reduced visual capture of the perceived auditory location (Alais & Burr, 2004), which in turn could have resulted in a smaller cross-modal recalibration effect. However, some studies have reported that the strength of cross-modal recalibration does not depend on the relative reliability of the cross-modal stimuli (Zaidel, Turner, & Angelaki, 2011).
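The reliability argument follows from the standard maximum-likelihood integration rule, in which each cue is weighted by its inverse variance (a textbook formula, not the authors' model; the numbers below are illustrative):

```python
def fused_estimate(s_v, sigma_v, s_a, sigma_a):
    """Maximum-likelihood audiovisual location estimate: cues are weighted
    by their reliabilities (inverse variances), so the less variable cue
    dominates the percept."""
    w_v = sigma_a**2 / (sigma_a**2 + sigma_v**2)  # visual weight
    return w_v * s_v + (1 - w_v) * s_a

# A highly reliable visual cue captures the percept almost completely...
fused_estimate(13.5, 1.0, 0.0, 10.0)   # close to the visual location (13.5)
# ...whereas degrading visual reliability halves the capture.
fused_estimate(13.5, 10.0, 0.0, 10.0)  # midway between the cues (6.75)
```

On this account, if high-frequency stimulation increased visual spatial variance, the visual weight, and with it the attractor driving the VAE, would shrink.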
In addition, or as an alternative to a modality-specific effect on visual spatial reliability, high-frequency stimulation might have reduced the reliability of the audiovisual spatial disparity itself. If processing of the size of the disparity between the visual and the auditory stimulus became more variable under high-frequency stimulation, cross-modal recalibration might have been inhibited because the stimulation would be less informative about the true audiovisual spatial correspondence. However, the results of Experiment 2 largely rule out this possibility. In Experiment 2, audiovisual stimuli were presented with varying disparities, a manipulation which should have resembled the effect of a reduced reliability of audiovisual spatial processing. Nevertheless, the size of the aftereffects was virtually identical to those seen after audiovisual stimulation with a fixed disparity.
The finding that the distribution of audiovisual spatial discrepancies (fixed vs. varying) did not affect the strength of cross-modal recalibration might suggest that sensory evidence was integrated over time and that auditory spatial representations were adjusted according to the mean audiovisual disparity rather than on a trial-by-trial basis. This could explain why sound localization improved in the varying congruent condition (mean disparity of 0°) although audiovisual stimuli were never actually presented from the same location in this condition. Previous studies have shown that tones of two different sound frequencies can be recalibrated independently even if they were concurrently paired with visual stimuli that had a mean disparity (across sound frequencies) of zero (Bruns & Röder, 2015, 2019b). Since this effect was dissociable from a trial-by-trial influence on unisensory sound localization (Bruns & Röder, 2015), it might similarly indicate an integration of sensory evidence over time. In the present study, the presentation of audiovisual exposure trials and unimodal auditory test trials in separate blocks might have particularly facilitated integration of sensory information over trials, because the task did not require any localization responses during the audiovisual exposure phase.
In summary, our results substantiate previous evidence (Passamonti et al., 2009) indicating that spatially congruent and spatially discrepant audiovisual stimulation activate dissociable learning mechanisms and show that they are differently sensitive to the temporal pattern of audiovisual stimulation. By contrast, changes in the variability of audiovisual discrepancies did not affect learning, suggesting that sensory representations adjust to the overall stimulus statistics rather than to a specific cross-modal relationship. These findings have implications for the design of multisensory training protocols, which should carefully consider potential effects of the chosen temporal and spatial pattern of cross-modal stimulation on learning outcomes.