Cross-modal contextual memory guides selective attention in visual-search tasks

Visual search is speeded when a target item is positioned consistently within an invariant (repeatedly encountered) configuration of distractor items (“contextual cueing”). Contextual cueing is also observed in cross-modal search, when the location of the (visual) target is predicted by distractors from another (tactile) sensory modality. Previous studies examining lateralized waveforms of the event-related potential (ERP) with millisecond precision have shown that learned visual contexts improve a whole cascade of search-processing stages. Drawing on ERPs, the present study tested alternative accounts of contextual

cross-modal search task: search for a visual feature singleton, with repeated (and nonrepeated) distractor configurations presented either within the same (visual) or a different (tactile) modality. We found reaction times (RTs) to be faster for repeated versus nonrepeated configurations, with comparable facilitation effects between visual (unimodal) and tactile (crossmodal) context cues. Further, repeated configurations gave rise to enhanced amplitudes of ERP components indexing attentional allocation (PCN, which also showed reduced latencies) and postselective analysis of the target (CDA); both components correlated positively with the RT facilitation. These effects were again comparable between uni- and crossmodal cueing conditions. In contrast, motor-related processes indexed by the response-locked LRP contributed little to the RT effects. These results indicate that both uni- and crossmodal context cues benefit the same visual processing stages related to the selection and subsequent analysis of the search target.

KEYWORDS
CDA, contextual cueing, event-related potentials, multisensory processing, PCN, selective attention

| INTRODUCTION
Visual attention can be controlled top-down, guided by observers' "online" knowledge of search-critical object properties. For example, if a searched-for target is repeatedly encountered in an invariant arrangement of distractor elements, observers can learn these configurations and use them to expedite their search, an effect termed "contextual cueing" (Chun, 2000; Chun & Jiang, 1998). In the contextual-cueing paradigm (Chun & Jiang, 1998), participants search for a target item embedded in an arrangement of distractor items whose spatial locations are either repeated or newly generated throughout the course of the search experiment (repeated and nonrepeated conditions, respectively). The important finding is that reaction times (RTs) are faster to repeated (or "old") as compared to nonrepeated (or "new") displays, because learnt distractor-target spatial associations (stored in long-term memory) come to guide the search, cueing attention to (or predicting) the target location (Chun & Jiang, 1998). Furthermore, guidance of selective attention by learnt contexts is assumed to be implicit and automatic, as participants are typically unable to reliably discriminate repeated from nonrepeated displays in post-experimental (yes/no) recognition tasks (e.g., Goujon et al., 2015), and they persist in deploying attention to the learnt location even after the target has been consistently repositioned to some other location (e.g., Zinchenko, Conci, Hauser, et al., 2020). Chun and Jiang's (1998) attention account of contextual cueing receives support from many subsequent studies using a variety of behavioral and (electro-)physiological measures (e.g., Brockmole & Henderson, 2006; Chen et al., 2021a; Geyer et al., 2010; Giesbrecht et al., 2013; Johnson et al., 2007; Peterson & Kramer, 2001; Schlagbauer et al., 2017; Tseng & Li, 2004; Zinchenko, Conci, Töllner, et al., 2020). However, other findings suggest that contextual cueing might also facilitate later, response-selection and/or motor-execution stages of processing, when participants decide which motor (hand) effector is required for a correct manual response (i.e., response-selection account of contextual cueing; see, e.g., Chen et al., 2021a; Hout & Goldinger, 2012; Kunar et al., 2007; Schankin & Schubö, 2010).
Contextual learning of spatial distractor-target relations has been demonstrated to occur not only when the stimuli are visual but also within the tactile modality (Assumpção et al., 2015, 2018). More recently, Chen and colleagues (Chen et al., 2020, 2021b) have examined contextual cueing across sensory modalities, that is, under conditions in which an invariant distractor context is defined in one modality and the search target in another. They found that repeatedly encountered (invariant) tactile distractor patterns facilitated search for a visual target embedded in an array of nonrepeated (randomly arranged) visual distractors (Chen et al., 2020). A similar context-based facilitation of search has been shown when the target is defined in the tactile modality and the predictive distractors are visual (Chen et al., 2021b). Additional tests using hand-posture manipulations (flipped/crossed hands) revealed that cross-modal contextual cueing is mediated by an environmental reference frame and that additional time is required for the tactile distractors to be remapped from their initial, somatotopically sensed format (Assumpção et al., 2018). These observations are consistent with context memory being supported by a supramodal, most likely visuospatial, representation that maintains modality-independent relational information. However, an alternative conception is also feasible, namely, that supramodal memory, albeit supported by a visual reference frame, is generated only when the target and distractors are defined in different (visual and tactile) modalities (e.g., Shams & Seitz, 2008). That is, supramodal context memory would be only a specific instance of visual memory, which would not necessarily be identical to the memory that supports unimodal visual contextual cueing.
The main goal of the present study was to determine any similarities (as well as dissimilarities) in contextual cueing of visual search when distractor-target relational memories were acquired with unisensory versus multisensory training. Because reaction times in a visual search experiment can be affected by any of the processing stages between the sensors (eyes) and the response output (feet), we examined lateralized event-related potentials (ERPs) recorded over visual and motor areas that index the operation of attention- and response-related processes. This permitted us to track which processes work more efficiently with learnt (vs. non-learnt) distractor-target arrays and to compare these between uni- and multisensory training conditions. This question is of theoretical importance, because current theories of memory-guided search and contextual cueing draw almost entirely on findings from visual search tasks (for reviews, see, e.g., Wolfe, 2019; Wolfe & Horowitz, 2017). Thus, tracking attentional and post-attentional processes over time in both unisensory and multisensory search environments would inform psychological theory building, that is, advance our understanding of any commonalities versus differences in the processes that become more efficient in repeated search arrays when long-term contextual memories are established in a unisensory versus multisensory fashion.
One component of prime interest is the posterior contralateral negativity (PCN), which is taken to reflect enhanced focal-attentional selection of the search target within 200-300 ms of stimulus presentation (Luck & Hillyard, 1994; Töllner et al., 2008; Wascher & Wauschkuhn, 1996; Woodman & Luck, 1999, 2003). In contextual-cueing paradigms, the PCN has been shown to index more efficient shifts of attention toward learnt target locations, with repeated contexts eliciting larger PCN amplitudes (Johnson et al., 2007; Zinchenko, Conci, Töllner, et al., 2020). Of note, however, these previous studies concentrated only on the amplitude of the PCN component, not on its latency. The PCN latency (which is a focus of the present study) might provide additional insight into whether pre-attentive perceptual processing stages (Töllner et al., 2012) operate more efficiently with repeated arrays. Further, individual contextual-cueing effects in PCN amplitude have been found to correlate positively with both the behavioral contextual-cueing effects (Schankin & Schubö, 2009) and blood-oxygen-level-dependent (BOLD) signals in the medial temporal lobes (MTL; Kasper et al., 2015), the latter being considered the neural generator site of the contextual-cueing effect (Geyer et al., 2012; Preston & Gabrieli, 2008). Thus, the PCN would be indicative of an effect of context memory on the (more efficient) attentional selection of the target item in repeated arrays.
An additional component of interest was the contralateral delay activity (CDA), a sustained negativity following stimulus presentation that is thought to reflect postselective (focal-attentional) processing of items held in working memory (WM; Mazza et al., 2007; Töllner et al., 2013; Vogel & Machizawa, 2004; Woodman & Vogel, 2008). Zinchenko, Conci, Töllner, et al. (2020) recently showed that repeated, relative to nonrepeated, configurations of items give rise to enhanced CDAs. They took this to reflect facilitated decision-making, at a postselective processing stage, about the orientation of the target, that is, the response-critical property of the target item (typically, the character T rotated 90° to the left or the right). For instance, matching the target item against perceptual templates (held in WM) to decide between left- and right-orientation may be facilitated in repeated sensory arrays.
Another focus of the EEG analysis was the lateralized readiness potential (LRP; e.g., Coles, 1989; Eimer & Coles, 2003) recorded over the motor area contralateral to the side of a manual response. Computed relative to response onset (i.e., response-locked LRP), it is an established online marker of motor-related processes in search tasks (Schankin & Schubö, 2009; Töllner et al., 2012). The onset of the response-locked LRP indexes the time required to execute the motor response (Töllner et al., 2011).
Thus, in summary, the present study was designed to examine the neural correlates of uni- and cross-modal contextual learning. Participants' electrophysiological brain activity was recorded while they performed a visual search task (with eyes fixed at the center); at the same time, their eight fingers (thumbs excluded) were placed on a lower plane at locations spatially corresponding to the visual items and received vibro-tactile stimulation (see Figure 1). In the unimodal (visual-visual) condition, search arrays consisted of one visual target Gabor item, whose orientation had to be discriminated, presented amongst three non-target (distractor) Gabor items. For example, if the visual target was presented in the left display half, there was a second visual distractor item in this half plus two additional distractor items in the right display half (and vice versa for right-hemifield targets). Importantly, even in the unimodal condition, visual items were presented concurrently with a set of eight spatially homogeneous (i.e., non-discriminative) tactile distractors delivered to all four fingers (thumb excluded) of each hand (see Figure 1). In the crossmodal (tactile-visual) condition, a given search array consisted of eight visual stimuli (one target and seven homogeneous distractors, with four stimuli in each hemifield) arranged along the horizontal display axis; this visual array was presented together with four spatially variable tactile stimuli delivered to two fingers of each hand on the lower plane. Thus, the uni- and cross-modal conditions used the same total number of display items (12: 4 visual + 8 tactile items in the unimodal condition and 8 visual + 4 tactile items in the cross-modal condition). Of note, the tactile stimulation always started 350 ms before the visual stimuli in order to compensate for processing differences across modalities and to allow tactile-to-visual remapping to take place (see Chen et al., 2020, 2021a, 2021b; Colonius & Diederich, 2004). Further, 50% of the target items were presented in the left and 50% in the right half of the display in order to avoid physical stimulus confounds in the measurement of lateralized ERPs (cf. Woodman, 2010). However, the uni- and crossmodal conditions differed with respect to the sensory modality that provided the repeated contexts: while the visual target was presented together with predictive visual distractors in the unimodal condition, it appeared amongst predictive tactile distractors in the crossmodal condition. Note that visual and tactile items were co-located with one-to-one correspondence, though they were presented on different planes (see Figure 1). This way, we were able to examine how search for a visual target element is aided by repeated distractor-context information when distractor-target associations are formed within or across the sensory modalities of vision and touch.

[…] normal or corrected-to-normal visual acuity). The sample size was determined based on previous visual and tactile contextual-cueing studies (e.g., Assumpção et al., 2015; Chun & Jiang, 1998, 1999; Geyer et al., 2010) and previous studies that examined contextual cueing in combination with EEG (e.g., Schankin & Schubö, 2009, 2010; Zinchenko, Conci, Töllner, et al., 2020), aiming for 85% power to detect a relatively large effect size (f(U) = 0.8, corresponding to ηp² = 0.4) in a repeated-measures analysis of variance (ANOVA) with an alpha level of .05. Power estimates were computed using G*Power (Erdfelder et al., 1996). The study was approved by the Ethics Committee of the Department of Psychology, Ludwig-Maximilians-Universität München. All participants provided written informed consent and received €9/h for taking part in the study.
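The two effect-size values given for the power analysis are linked by a standard conversion between partial eta squared and Cohen's f, f = sqrt(ηp² / (1 − ηp²)). The sketch below, which assumes this conversion is the one behind the reported f(U) (the text does not spell it out), shows that ηp² = 0.4 gives f ≈ 0.82, in line with the reported value of about 0.8. The function names are illustrative only.

```python
import math


def cohen_f_from_partial_eta_sq(eta_p2: float) -> float:
    """Cohen's f from partial eta squared: f = sqrt(eta_p2 / (1 - eta_p2))."""
    if not 0.0 <= eta_p2 < 1.0:
        raise ValueError("partial eta squared must lie in [0, 1)")
    return math.sqrt(eta_p2 / (1.0 - eta_p2))


def partial_eta_sq_from_f(f: float) -> float:
    """Inverse conversion: eta_p2 = f^2 / (1 + f^2)."""
    return f * f / (1.0 + f * f)


if __name__ == "__main__":
    f = cohen_f_from_partial_eta_sq(0.4)
    print(f"f for partial eta squared = 0.4: {f:.3f}")
    print(f"back-converted partial eta squared: {partial_eta_sq_from_f(f):.3f}")
```

The actual sample-size computation additionally depends on the number of measurements and the assumed correlation among repeated measures, which are set in G*Power and not reproduced here.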

| Apparatus and stimuli
The experiment, conducted in a sound-attenuated testing chamber dimly lit by indirect incandescent lighting, was run on a Windows computer using Matlab routines and Psychophysics Toolbox extensions (Brainard, 1997; Pelli, 1997). Visual stimuli and task instructions/feedback were projected onto a semitransparent Plexiglas table (size, 70 × 60 cm; height, 84 cm; Figure 1a) by a projector (Sharp XR-32X-L). The table (screen) was tilted about 60° toward the observer. The viewing distance was fixed at about 55 cm. The tactile and visual items were presented at eight spatially corresponding locations positioned along two virtual "curves" (one to the left and one to the right) on a lower (tactile) and an upper (visual) horizontal axis of the respective presentation plane (Figure 1b). The visual stimuli consisted of Gabor patches (Michelson contrast 0.96, spatial frequency 2 cpd), each subtending about 1.8° of visual angle, presented on a gray background (mean luminance 36.4 cd/m²). All distractor Gabor patches were vertical (0° tilt). The singleton target was defined by an orientation of ±9.2° (left- or right-tilted) from the vertical. Tactile stimuli were delivered to the fingers (thumbs excluded) via vibro-tactile stimulators (solenoid actuators with a diameter of 1.8 cm, Dancer Design). The vibration frequency was held constant at 150 Hz. Upon magnetization of the solenoid coils, the actuators drove lodged metal tips that vibrated a pin by 2-3 mm; the actuators were controlled by a 10-Channel Tactor Amplifier (Dancer Design) connected to a National Instrument computer with a MOTU analog output card. The spacing between adjacent items (on each side) was about 1.9° of visual angle, while the separation between co-located visual stimuli and solenoid actuators was about 1.5°. During the experiment, participants wore headphones (Philips SHL4000, 30-mm speaker drive), through which white noise (65 dBA) was delivered to mask the tactile vibrations that would otherwise have been audible in the sound-insulated testing cabin.
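Stimulus sizes above are given in degrees of visual angle at a viewing distance of about 55 cm. As a rough, illustrative check, the sketch below converts between visual angle and physical extent using the standard relation size = 2 · d · tan(θ / 2). It assumes head-on viewing of a flat surface and therefore ignores the 60° tilt of the table, so the centimeter values are approximations, not the actual projected sizes. The function names are hypothetical.

```python
import math

VIEWING_DISTANCE_CM = 55.0  # approximate viewing distance stated in the text


def deg_to_cm(angle_deg: float, distance_cm: float = VIEWING_DISTANCE_CM) -> float:
    """Extent (cm) of an object subtending angle_deg when viewed head-on at distance_cm."""
    return 2.0 * distance_cm * math.tan(math.radians(angle_deg) / 2.0)


def cm_to_deg(size_cm: float, distance_cm: float = VIEWING_DISTANCE_CM) -> float:
    """Visual angle (deg) subtended by an object of size_cm viewed head-on at distance_cm."""
    return math.degrees(2.0 * math.atan(size_cm / (2.0 * distance_cm)))


if __name__ == "__main__":
    for label, deg in [("Gabor patch", 1.8), ("adjacent-item spacing", 1.9)]:
        print(f"{label}: {deg} deg is about {deg_to_cm(deg):.2f} cm (flat-screen approximation)")
    # A 2-cpd grating within a 1.8-deg patch spans 2 * 1.8 = 3.6 cycles.
    print(f"cycles per Gabor patch: {2.0 * 1.8:.1f}")
```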

| Design and procedure
Each participant performed two sessions in counterbalanced order. In the "predictive-visual" session (Figure 1b, upper panel), the search display consisted of one target with three distractor Gabor patches and four empty circles, accompanied by tactile stimulation of all four fingers of the left and right hands (thumbs excluded). In the "predictive-tactile" session (Figure 1b, lower panel), one visual target was embedded among seven homogeneous visual distractors, with four vibrotactile stimuli delivered to two (pre-defined) fingers of each hand. The locations of the Gabor patches (in the visual-predictive session) and of the vibrotactile stimuli (in the tactile-predictive session) varied depending on whether the configurations were repeated or not (see Figure 2). In the repeated condition, the positions of both the target and the distractors were fixed throughout the entire session. For nonrepeated configurations, the positions of the search distractors were randomly generated anew on each trial; these positions therefore carried no predictive information regarding the target location, making it impossible for participants to form consistent spatial distractor-target associations. Note, though, that target positions were repeated equally often in repeated and nonrepeated configurations. That is, in each block of four repeated and four nonrepeated trials, four positions (two on each side) were used for targets in the repeated condition, and the remaining four positions (again two on each side) for nonrepeated configurations (Figure 2). This was intended to ensure that any performance gains in the repeated condition could be attributed only to repeated spatial arrangements, rather than to repeated target positions (see, e.g., Chun & Jiang, 1998, for a similar approach), and also to balance stimulus presentations between the left and right hemifields (hands). The configurations used for the visual-predictive session were exactly the same as those used for the tactile-predictive session. Repeated and nonrepeated configurations were each presented on 50% of trials and randomly intermixed within each session for each participant.
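The counterbalancing logic of the repeated/nonrepeated design can be illustrated with a small generator for one session's trial blocks. This is a minimal sketch under several assumptions not stated in the text: locations are indexed 0-3 (left) and 4-7 (right); the two repeated target locations per side are drawn at random for each participant; and every configuration follows the distractor layout described for the unimodal condition (one distractor in the target's hemifield, two in the opposite one). How configurations were mapped onto the tactile stimulation is not modeled. All names are hypothetical.

```python
import random
from dataclasses import dataclass

LEFT = (0, 1, 2, 3)   # hypothetical indices of the four left-side locations
RIGHT = (4, 5, 6, 7)  # hypothetical indices of the four right-side locations


@dataclass(frozen=True)
class Config:
    target: int
    distractors: tuple  # three distractor locations


def make_config(target: int, rng: random.Random) -> Config:
    """One distractor in the target's hemifield, two in the opposite hemifield (assumed layout)."""
    same, other = (LEFT, RIGHT) if target in LEFT else (RIGHT, LEFT)
    d_same = rng.sample([loc for loc in same if loc != target], 1)
    d_other = rng.sample(list(other), 2)
    return Config(target, tuple(sorted(d_same + d_other)))


def make_session(n_blocks: int = 60, seed: int = 0):
    rng = random.Random(seed)
    # Two target locations per side go to the repeated condition, the other four to nonrepeated.
    rep_targets = rng.sample(LEFT, 2) + rng.sample(RIGHT, 2)
    new_targets = [loc for loc in LEFT + RIGHT if loc not in rep_targets]
    repeated = [make_config(t, rng) for t in rep_targets]  # fixed for the whole session
    blocks = []
    for _ in range(n_blocks):
        trials = [("repeated", c) for c in repeated]
        # Nonrepeated: same target locations in every block, distractors redrawn on each trial.
        trials += [("nonrepeated", make_config(t, rng)) for t in new_targets]
        rng.shuffle(trials)  # randomly intermix the eight trials within the block
        blocks.append(trials)
    return blocks
```

Because every block uses each of the eight target locations exactly once, target-location frequency is identical across conditions, so only the distractor arrangement distinguishes repeated from nonrepeated trials.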
A session consisted of 60 blocks of 8 trials each, with four repeated and four nonrepeated configurations per block. Each of the eight possible target locations was used, and associated with the predictive or nonpredictive distractor configurations, equally often in every block and throughout the experiment. Each trial began with a 300-ms beep (600 Hz) signaling the start of the trial. After a 300-ms fixation interval, the actuators began to vibrate 350 ms before the onset of the visual search array, which was presented for 700 ms. The visual target was randomly tilted to the left or the right relative to the (vertical) distractor orientation. Participants were asked to respond to the orientation of the target Gabor patch as quickly and accurately as possible. Responses were recorded using foot pedals (Heijo Research Electronics, UK). For example, when the target was tilted to the left (right), the participant had to press the left (right) foot pedal. Target-pedal assignment was counterbalanced across participants. A blank screen following the visual and tactile stimuli was presented until a response was executed or until a maximum of 800 ms had elapsed (i.e., participants could respond within 1500 ms of search-array onset). Next, a feedback screen (indicating a "correct" or "wrong" response) was presented for 500 ms. After an inter-trial interval of 1000 to 1500 ms, the next trial began. Participants took a short break every 6 blocks.
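The trial timing above can be laid out as a schedule of event onsets. The sketch below assumes that the 1500-ms response window is counted from visual-array onset (700-ms array plus 800-ms maximum blank), that feedback starts immediately at the response (or at the deadline if no response is given), and that the inter-trial interval is drawn uniformly in whole milliseconds; none of these implementation details are stated in the text. It also maps block indices onto the five 12-block epochs used in the analyses. Function names are hypothetical.

```python
import random

BEEP_MS, FIXATION_MS = 300, 300
TACTILE_LEAD_MS = 350          # tactile onset precedes the visual array
VISUAL_MS, BLANK_MAX_MS = 700, 800
FEEDBACK_MS = 500
ITI_RANGE_MS = (1000, 1500)
BLOCKS_PER_EPOCH = 12


def trial_schedule(rt_ms=None, rng=random):
    """Event onsets (ms from trial start) for one trial; rt_ms is measured from visual onset."""
    tactile_on = BEEP_MS + FIXATION_MS
    visual_on = tactile_on + TACTILE_LEAD_MS
    deadline = visual_on + VISUAL_MS + BLANK_MAX_MS  # assumed 1500-ms window from visual onset
    timed_out = rt_ms is None or rt_ms > VISUAL_MS + BLANK_MAX_MS
    response = None if timed_out else visual_on + rt_ms
    feedback_on = deadline if response is None else response  # assumption: no gap before feedback
    return {
        "beep_on": 0,
        "fixation_on": BEEP_MS,
        "tactile_on": tactile_on,
        "visual_on": visual_on,
        "visual_off": visual_on + VISUAL_MS,
        "response_deadline": deadline,
        "response": response,
        "feedback_on": feedback_on,
        "next_trial_after": feedback_on + FEEDBACK_MS + rng.randint(*ITI_RANGE_MS),
    }


def epoch_of(block_index: int) -> int:
    """Map a 0-based block index (0-59) onto epochs 1-5 of 12 blocks each."""
    return block_index // BLOCKS_PER_EPOCH + 1
```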
Participants were not informed in any way about the aims of the experiment. Following written and verbal instructions, each observer was familiarized with the experimental setup. Before each session, they performed 24 practice trials. Participants then went on to perform the main experimental task of that session only if they achieved an accuracy level > 85%; otherwise, they were required to repeat the practice trials. After the experiment, participants were first asked to report anything they had noticed about the experimental task; they were then administered an explicit (yes/no) recognition test consisting of 32 trials, in which they had to indicate whether they had already encountered a given display layout (consisting of the visual target, the tactile distractors, and the visual distractors) during the prior search experiment. In this recognition test, half of the displays were predictive visual/tactile configurations from the previous search task, and the other half were newly generated configurations not presented before.

| EEG recording
The EEG was continuously sampled at 1 kHz using Ag/AgCl active electrodes (actiCAP system; Brain Products, Munich, Germany) from 64 scalp sites in accordance with the international 10-10 system. To monitor for blinks and eye movements, we additionally recorded the electrooculogram by means of electrodes placed at the outer canthi of the eyes and at the superior and inferior orbits. All electrophysiological signals were amplified using BrainAmp amplifiers (Brain Products) with a 0.1-Hz to 250-Hz band-pass filter. During data acquisition, all electrodes were referenced to FCz; they were re-referenced offline to the averaged mastoids. All electrode impedances were kept below 5 kΩ.
Prior to segmentation, the raw data were visually inspected in order to manually remove nonstereotypical noise; subsequently, the data were band-pass filtered using a 0.1-Hz to 40-Hz Butterworth infinite-impulse-response filter (24 dB/octave). Next, an infomax independent-component analysis (ICA) was run to identify components representing blinks and horizontal eye movements, and these artifacts were removed before back-projection of the residual components (1% of all trials were removed because of eye-movement artifacts). To test directly whether horizontal eye movements might have been at play and could have affected performance even after the ICA correction, we also computed, before the ICA procedure, the proportion of horizontal eye movements based on the Gratton-and-Coles algorithm (Gratton et al., 1983). This proportion was minimal and comparable across conditions: 0.71%, 0.71%, 0.72%, and 0.74% for the visual-repeated, visual-nonrepeated, tactile-repeated, and tactile-nonrepeated conditions, respectively. A repeated-measures ANOVA with the factors Predictive Context (visual, tactile) and Display (repeated, nonrepeated) revealed no significant effects, Fs < 0.01, ps > .9, ηp²s < .001, BF10s < 0.3. Stimulus-locked ERPs were computed time-locked to the onset of the visual stimuli, with segments extending from 200 ms before to 1000 ms after visual stimulus onset; response-locked ERPs were computed with segments extending from 600 ms before to 100 ms after the response. In both stimulus- and response-locked analyses, baseline correction was performed using the 200-ms interval preceding visual target onset. Only trials with correct responses were analyzed, and, on an individual-channel basis before averaging the ERP waves, segments were accepted only if they were free of artifacts (any signal exceeding ±60 μV), bursts of electromyographic activity (as defined by voltage steps between sampling points larger than 50 μV), and activity lower than 0.5 μV within intervals of 500 ms (indicating dead channels).
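The three channel-wise rejection criteria can be expressed as simple threshold checks. The sketch below is not the authors' pipeline; it assumes that (a) values are in microvolts sampled at 1 kHz, (b) a "voltage step" is the absolute difference between consecutive samples, and (c) "activity lower than 0.5 μV within 500 ms" means a max-minus-min range below 0.5 μV in any 500-ms window, checked in 1-sample steps. The function name and parameters are hypothetical.

```python
def channel_ok(samples, srate_hz=1000, abs_limit_uv=60.0, step_limit_uv=50.0,
               flat_limit_uv=0.5, flat_window_ms=500):
    """Return True if one channel's segment (in microvolts) passes all three criteria."""
    # 1) Amplitude criterion: no sample beyond +/- abs_limit_uv.
    if any(abs(v) > abs_limit_uv for v in samples):
        return False
    # 2) EMG-burst criterion (assumed): step between consecutive samples above step_limit_uv.
    if any(abs(b - a) > step_limit_uv for a, b in zip(samples, samples[1:])):
        return False
    # 3) Low-activity criterion (assumed): range below flat_limit_uv in any 500-ms window.
    win = int(flat_window_ms * srate_hz / 1000)
    for start in range(0, len(samples) - win + 1):
        seg = samples[start:start + win]
        if max(seg) - min(seg) < flat_limit_uv:
            return False
    return True
```

For example, a 1200-sample stimulus-locked segment containing a single 70-μV excursion, or a stretch that stays within a 0.5-μV band for half a second, would be rejected for that channel only.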
To extract the PCN and CDA components independently of whether the target appeared in the left or right hemifield, we subtracted ERPs at parieto-occipital electrodes (PO7 and PO8) ipsilateral to the target's location from the corresponding contralateral ERPs. PCN latencies were determined individually as the latency of the maximum negative deflection within the 180-350 ms time window after visual stimulus onset. PCN amplitudes were computed by averaging across five sample points before and after this maximum deflection. CDA amplitudes were computed as the mean activity over the 400-700 ms time window after visual stimulus onset. This window was chosen because the mean RT was around 700 ms in both the unimodal and cross-modal conditions, so that it should capture the processes associated with the perceptual analysis of the selected item for extracting the response-critical feature (indexed by the CDA) up to the time of the response.
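The PCN and CDA measures described above can be sketched on a contralateral-minus-ipsilateral difference wave. This is an illustrative sketch, not the authors' code. It assumes the input waves have already been collapsed across target sides (contralateral = PO8 for left targets and PO7 for right targets, and vice versa for ipsilateral), that one sample corresponds to 1 ms (the data were recorded at 1 kHz; any later resampling is not described), and that the PCN amplitude window includes the peak sample plus five samples on either side. Function names are hypothetical.

```python
def lateralized_wave(contra, ipsi):
    """Contralateral-minus-ipsilateral difference wave."""
    return [c - i for c, i in zip(contra, ipsi)]


def pcn_peak(diff, times_ms, window=(180, 350), half_width=5):
    """Latency of the most negative point in `window` and the mean of +/- half_width samples around it."""
    idx = [k for k, t in enumerate(times_ms) if window[0] <= t <= window[1]]
    peak = min(idx, key=lambda k: diff[k])
    seg = diff[max(peak - half_width, 0):peak + half_width + 1]
    return times_ms[peak], sum(seg) / len(seg)


def mean_amplitude(diff, times_ms, window=(400, 700)):
    """Mean amplitude of the difference wave within `window` (CDA measure)."""
    vals = [d for d, t in zip(diff, times_ms) if window[0] <= t <= window[1]]
    return sum(vals) / len(vals)
```

Because the peak-based PCN measure picks the single most negative point, it is sensitive to noise in individual averages; averaging around the peak, as described above, partly mitigates this.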
The LRP was calculated relative to the onset of the response (response-locked LRP). It was computed by subtracting ERPs measured at central electrodes (C3/C4) contralateral to the unipodal response side from the corresponding ipsilateral ERPs, given that foot responses generate a more negative readiness potential over the motor cortex ipsilateral, rather than contralateral, to the responding foot (Böcker et al., 1994). This paradoxical lateralization arises from the somatotopic representation of the lower extremities within the longitudinal fissure: the current dipole, located in the hemisphere contralateral to the responding foot, points toward the ipsilateral hemisphere (cf. Brunia & Van den Bosch, 1984). Peak latency and peak amplitude were measured at the global voltage maximum at electrodes over the motor cortex (C3/C4). The onset latencies of the LRPs were determined by the jackknife-based scoring method (Miller et al., 1998), according to which LRP onset is defined as the time point at which the LRP amplitude reaches a specific criterion. As recommended by Miller et al. (1998), we used 90% of the maximum LRP activation as the criterion for defining response-locked LRP onset latencies (see also Töllner et al., 2008, 2012). LRP amplitudes were calculated by averaging five sample points before and after the maximum deflection of the response-locked LRP.
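The jackknife-based onset scoring can be sketched as follows: the onset is measured not on individual LRPs but on n leave-one-out grand averages, each scored with a relative criterion (here 90% of the peak). This is a minimal sketch, assuming the LRP has been sign-aligned so that motor preparation is negative-going and that onset is the first time point reaching the criterion. The standard-error formula is the jackknife estimator; note that ordinary t or F values computed on jackknifed scores are inflated and must be corrected before testing. Function names are hypothetical.

```python
import math


def relative_onset(wave, times_ms, criterion=0.9):
    """First time point at which `wave` reaches `criterion` times its most negative value.

    Returns None if the criterion is never reached (e.g., no negative-going deflection).
    """
    thresh = criterion * min(wave)
    for value, t in zip(wave, times_ms):
        if value <= thresh:
            return t
    return None


def jackknife_onsets(subject_waves, times_ms, criterion=0.9):
    """Onset of each leave-one-out grand average (one value per omitted participant)."""
    n = len(subject_waves)
    onsets = []
    for left_out in range(n):
        kept = [w for k, w in enumerate(subject_waves) if k != left_out]
        grand = [sum(samples) / len(kept) for samples in zip(*kept)]
        onsets.append(relative_onset(grand, times_ms, criterion))
    return onsets


def jackknife_se(values):
    """Jackknife standard error: sqrt((n - 1) / n * sum((v - mean)^2))."""
    n = len(values)
    m = sum(values) / n
    return math.sqrt((n - 1) / n * sum((v - m) ** 2 for v in values))
```

For a condition difference (e.g., repeated vs. nonrepeated), one would compute the onset difference for each leave-one-out subsample and pass those differences to `jackknife_se`.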

| Behavioral data
For the RT analysis, trials with errors were excluded, as were trials with RTs below 200 ms or more than 2.5 standard deviations from the mean; the latter (RT-outlier) criteria led to the removal of 3.3% of all trials (2.89% for the visual-predictive session; 3.7% for the tactile-predictive session). Mean error rates and RTs were submitted to a repeated-measures ANOVA with the factors Predictive Context (visual, tactile), Display (repeated, nonrepeated), and Epoch (1-5, with each epoch combining data across 12 consecutive trial blocks). Greenhouse-Geisser-corrected values are reported when Mauchly's test of sphericity was significant (p < .05). When interactions were significant, Bonferroni-corrected post-hoc tests were conducted for further comparisons. For nonsignificant results, we additionally report Bayes factors (BF10) to quantify the evidence for the null hypothesis (see Jeffreys, 1961; Kass & Raftery, 1995).
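The RT preprocessing and the contextual-cueing score can be sketched as follows. This is a minimal illustration under assumptions the text does not specify: trimming is applied per participant (and session), the mean and SD are computed over all correct trials before trimming, and the 2.5-SD cutoff is two-sided. The cueing effect is defined, as in the correlational analyses, as nonrepeated minus repeated, so positive values indicate facilitation. Function names are hypothetical.

```python
from statistics import mean, stdev


def trim_rts(trials, min_rt_ms=200.0, sd_cutoff=2.5):
    """Return RTs of correct trials that survive both outlier criteria.

    trials: iterable of (rt_ms, correct) pairs for ONE participant. Assumption: mean and SD
    are computed over all correct trials before trimming, and the cutoff is two-sided.
    """
    correct = [rt for rt, is_correct in trials if is_correct]
    m, s = mean(correct), stdev(correct)
    return [rt for rt in correct if rt >= min_rt_ms and abs(rt - m) <= sd_cutoff * s]


def cueing_effect(rts_nonrepeated, rts_repeated):
    """Contextual-cueing effect in ms: nonrepeated minus repeated (positive = facilitation)."""
    return mean(rts_nonrepeated) - mean(rts_repeated)
```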
Figure 3 depicts the mean RTs for repeated and nonrepeated displays as a function of epoch, separately for the visual- and tactile-predictive contexts. The ANOVA revealed a significant main effect of Display, F(1, 22) = 14.33, p < .001, ηp² = .39, indicative of an RT contextual-cueing effect. The main effect of Epoch, F(4, 88) = 14.45, p < .001, ηp² = .40, was also significant, indicative of non-configural, that is, general procedural learning of (how to perform) the task at hand. Importantly, the main effect of (visual vs. tactile predictive) Context and all (two- or three-way) interactions involving this factor were nonsignificant (all ps > .39, ηp²s < .05, BF10s < 0.11). This suggests that repeated visual and repeated tactile contexts were equally successful in facilitating visual search. Note that there was no context-based facilitation of RTs in Block 1 in either the uni- or the crossmodal condition, Fs < 1, ps > .5, ηp²s < .02, BF10s < 0.36, though the contextual-cueing effect was statistically significant in Epoch 1 (i.e., after >2 repetitions of each repeated display) for both visual and tactile contexts: visual, F(1, 22) = 5.5, p = .028, ηp² = .2; tactile, […].

The overall rate of response errors was 17% for visual contexts and 14.6% for tactile contexts; this is relatively high by the standards of search RT experiments but modest when the limited display exposure time and the prevention of eye movements are taken into account (see also Zinchenko, Conci, Hauser, et al., 2020; Zinchenko, Conci, Töllner, et al., 2020). A repeated-measures ANOVA on the mean error rates revealed a significant main effect of Epoch, F(4, 88) = 6.82, p < .001, ηp² = .24, with error rates decreasing across epochs, and a main effect of Display, F(1, 22) = 12.16, p = .002, ηp² = .36, with lower error rates for repeated versus nonrepeated displays. Although error rates were numerically higher for visual contexts, the effect of Context was not significant, F(1, 22) = 2.69, p = .12, ηp² = .11, BF10 = 1.58 (the Bayes factor thus providing only anecdotal, inconclusive evidence). No interaction effects were significant (all ps > .12, ηp²s < .11, BF10s < 0.3). The results of the error analysis effectively rule out confounding of the RT effects by speed/accuracy trade-offs.
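The partial eta squared values reported with each F test follow directly from the F value and its degrees of freedom, ηp² = (F · df_effect) / (F · df_effect + df_error). The sketch below recomputes the effect sizes of the RT and error-rate ANOVAs above from the reported F values as a consistency check; the dictionary simply restates numbers from the text, and the function name is hypothetical.

```python
def partial_eta_sq(f_value: float, df_effect: int, df_error: int) -> float:
    """Partial eta squared from an F statistic: (F * df1) / (F * df1 + df2)."""
    return f_value * df_effect / (f_value * df_effect + df_error)


# (F, df_effect, df_error, reported partial eta squared), restated from the text above.
REPORTED = {
    "RT: Display": (14.33, 1, 22, 0.39),
    "RT: Epoch": (14.45, 4, 88, 0.40),
    "RT: Display in Epoch 1 (visual)": (5.5, 1, 22, 0.20),
    "Errors: Epoch": (6.82, 4, 88, 0.24),
    "Errors: Display": (12.16, 1, 22, 0.36),
    "Errors: Context": (2.69, 1, 22, 0.11),
}

if __name__ == "__main__":
    for name, (f, d1, d2, reported) in REPORTED.items():
        print(f"{name}: computed {partial_eta_sq(f, d1, d2):.3f}, reported {reported:.2f}")
```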
3.2 | Electrophysiological data

[…] were significant (all ps > .60, ηp²s < .01, BF10s < 0.29). An additional correlational analysis revealed a significant positive relationship between the contextual-cueing effect (nonrepeated minus repeated displays) in PCN amplitude and the corresponding RT cueing effect. Of note, this relationship was seen for both tactile (r = .55, p = .005) and visual predictive contexts (r = .49, p = .009; see Figure 6). The statistical significance of each correlation coefficient was determined by comparing the observed correlation with the distribution of correlations obtained from 20,000 random permutations of the pairing between the two variables, a nonparametric procedure intended to limit the influence of any outliers in the data (the same procedure was used for the correlations reported below).
Analysis of the CDA amplitude in the 400-700 ms window after search-array onset also revealed a significant main effect of Display, F(1, 22) = 7.74, p = .01, ηp² = .26. Again, this effect was independent of whether the predictive context was visual or tactile (nonsignificant Display × Context interaction, p = .36, ηp² = .04, BF10 = 0.4; see Figure 5), and the main effect of Context was not significant, p = .39, ηp² = .03, BF10 = 0.28. Contextual facilitation of CDA amplitudes also correlated significantly with the RT facilitation, for both visual, r = .41, p = .025, and tactile predictive contexts, r = .68, p < .001 (see Figure 6). In sum, both the PCN and the CDA analyses revealed a facilitatory effect for repeated displays that did not depend on whether the predictive arrays were visual or tactile.

Importantly, though, neither the onset latency nor the peak amplitude effects (i.e., the difference in onset latency or peak amplitude between repeated and nonrepeated displays) of the response-locked LRP correlated significantly with the RT contextual-cueing effect, onset latency: visual […]

| Recognition test
None of the participants spontaneously reported having noticed the display repetition during the search task.

| DISCUSSION
In the present study, we examined a series of lateralized visual and motor ERP components to elucidate the mechanisms involved in the acquisition of uni- and crossmodal spatial distractor-target associations. We found that implicit and automatic context-based guidance of search was expressed by enhanced PCN and CDA waves, indicating greater early allocation of attention to, and postselective processing of, the target in repeated displays. Importantly, these effects were present, and statistically indistinguishable, in both predictive-context conditions. These findings imply that spatial contextual associations can be formed successfully across sensory modalities, that is, when a visual target is predicted by tactile distractor configurations, and that these associations subsequently improve the efficiency of visual search. This was also supported by correlation analyses, which revealed a significant positive relationship between RT contextual cueing and each of the two visual ERP waveforms. Taken together, we found a striking relationship between the RT contextual-cueing and ERP effects, whether distractor-target contextual associations were acquired under uni- or cross-sensory learning conditions; by contrast, contextual facilitation of response execution (reflected in the response-locked LRP) was evident in neither the uni- nor the crossmodal condition.
Concerning the visual ERPs, we also found contextual cueing to lead to an earlier onset of the PCN component. Assuming that the timing of the PCN reflects the transition from pre-attentive sensory coding to focal-attentional selection (Töllner et al., 2011), the finding of reduced PCN latencies may be taken to suggest that statistical learning of both unimodal and crossmodal distractor-target contexts increases the speed with which attention can be shifted toward the visual target item. This is consistent with the notion that contextual memory allows attention to be reliably and quickly shifted to the hemifield containing the target (e.g., Chun & Jiang, 1998, 1999; Chun & Nakayama, 2000). Interestingly, there have been no previous reports of contextual-cueing effects on PCN latencies (e.g., Johnson et al., 2007). One reason might be that, in comparison with previous studies, we used fewer repeated displays (4 instead of 12, as in Chun & Jiang, 1998, and Johnson et al., 2007). For a given number of repeated-display trials, this increases the number of repetitions of each individual display and thus the statistical power to reveal contextual learning, assuming that, normally, not all repeated displays are learnt in contextual-cueing experiments (see, e.g., Geyer et al., 2013; Smyth & Shanks, 2008). Overall, the results of the PCN analysis support the attentional-guidance account of contextual cueing (Chun & Jiang, 1998, 1999; Chun & Nakayama, 2000). That is, repeating the search display leads to more efficient attentional deployment toward the visual target item, in terms of both faster (as indicated by shorter PCN latencies) and stronger (as indicated by enhanced PCN amplitudes; see Zivony et al., 2018) engagement of attention at the target location, independently of whether distractor-target contextual associations were formed within the visual modality or across the tactile and visual modalities. This interpretation fits well with a recent study by Zinchenko, Conci, Töllner, et al. (2020), which showed that the effects of display repetition on attention-related components already start with an early posterior negativity (N1pc, 80–180 ms) preceding the PCN (180–350 ms).
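The PCN amplitude and latency effects discussed here are read off contralateral-minus-ipsilateral difference waves (electrodes PO7/PO8, 180–350 ms after display onset). As a rough illustration only, the Python sketch below computes such a difference wave, its peak (most negative) amplitude inside the window, and an onset latency. The 50%-of-peak onset criterion and the function names are assumptions for illustration; the exact scoring procedure used in this study is not given in this section.

```python
from typing import List, Optional, Sequence, Tuple


def difference_wave(contra: Sequence[float], ipsi: Sequence[float]) -> List[float]:
    """Contralateral-minus-ipsilateral difference wave (e.g., PO7/PO8)."""
    if len(contra) != len(ipsi):
        raise ValueError("contra and ipsi must have the same length")
    return [c - i for c, i in zip(contra, ipsi)]


def pcn_peak_and_onset(
    diff: Sequence[float],
    times_ms: Sequence[float],
    window: Tuple[float, float] = (180.0, 350.0),
    onset_fraction: float = 0.5,  # hypothetical fractional-peak onset criterion
) -> Tuple[float, Optional[float]]:
    """Peak amplitude and onset latency of a negative-going lateralized component.

    The peak is the most negative sample inside `window`; the onset is the first
    time point inside the window at which the wave reaches `onset_fraction` of
    that peak. Returns (peak, None) if there is no negativity in the window.
    """
    if len(diff) != len(times_ms):
        raise ValueError("diff and times_ms must have the same length")
    in_win = [k for k, t in enumerate(times_ms) if window[0] <= t <= window[1]]
    if not in_win:
        raise ValueError("no samples inside the analysis window")
    peak_idx = min(in_win, key=lambda k: diff[k])
    peak = diff[peak_idx]
    if peak >= 0:
        return peak, None  # no negative deflection to time
    criterion = onset_fraction * peak
    for k in in_win:
        if diff[k] <= criterion:  # at least as negative as the criterion
            return peak, times_ms[k]
    # Not reached for onset_fraction <= 1: the peak sample meets the criterion.
    return peak, None
```

For example, a synthetic triangular negativity that starts at 200 ms and peaks at −2 µV at 250 ms, sampled on a 1-ms grid, yields a peak of −2 and a 50%-peak onset at 225 ms. Condition differences (repeated vs. nonrepeated) would then be computed on these per-participant values.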
In addition to the PCN, uni- and crossmodal contexts also gave rise to a similar pattern in the CDA component: the CDA was enhanced over parietal-occipital areas when the target was presented in a repeated versus a nonrepeated array. This can be taken to reflect more efficient 'focal-attentional' processing of selected items (i.e., items represented in visual working memory, VWM) in repeated search displays (e.g., Zinchenko, Conci, Töllner, et al., 2020), perhaps as a result of the enhanced engagement of attention reflected in the PCN. Following selection of the target into VWM, postselective processes would include establishing that the selected item is actually a searched-for target (here: an off-vertical Gabor) and, if so, extracting the relevant target feature (the left/right orientation of the target Gabor) to make a response decision (e.g., Mazza et al., 2007; Töllner et al., 2013; Wolfe, 2021; Woodman & Vogel, 2008). The enhanced CDA for repeated displays might thus reflect faster accumulation of evidence toward the decision threshold (e.g., parallel matching against both templates rather than serial comparisons, or simply expedited matching) and/or a reduction in the amount of evidence required for a decision: because the target location in repeated displays is reliably predicted by the distractor context, the certainty that the selected item is the target would be increased (see Chen et al., 2021a; Sewell et al., 2018). More detailed (mathematical) modeling of decision making would be required to understand the dynamics at this postselective stage. Importantly, however, with regard to the question at issue in the present study, the CDA effects were comparable between the two learning conditions; that is, postselective target analysis is equally efficient whether search is guided by unimodal or crossmodal distractor-target contextual memories.
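The two postselective accounts outlined above (faster evidence accumulation vs. a smaller amount of evidence required for a decision) map onto a higher drift rate vs. a lower decision bound in a simple drift-diffusion framework. Purely as an illustration, and not as a model fitted to the present data, the Python sketch below uses the standard closed-form mean decision time of an unbiased diffusion process with symmetric bounds ±z, drift A, and noise c: DT = (z/A)·tanh(Az/c²). The drift-diffusion framing, the function name, and all parameter values are assumptions introduced here.

```python
import math


def mean_decision_time(drift: float, bound: float, noise: float = 1.0) -> float:
    """Mean decision time of an unbiased drift-diffusion process.

    Evidence starts at 0 and the process stops at +bound or -bound; `drift` is
    the mean accumulation rate and `noise` the diffusion standard deviation.
    Closed form: DT = (bound / drift) * tanh(drift * bound / noise**2).
    """
    if bound <= 0 or noise <= 0:
        raise ValueError("bound and noise must be positive")
    if drift == 0:
        # Limit for drift -> 0: pure diffusion between symmetric bounds.
        return bound ** 2 / noise ** 2
    return (bound / drift) * math.tanh(drift * bound / noise ** 2)


# Hypothetical values in arbitrary units, chosen only to show the two routes:
baseline = mean_decision_time(drift=1.0, bound=1.0)      # e.g., nonrepeated displays
faster_drift = mean_decision_time(drift=1.5, bound=1.0)  # faster accumulation
lower_bound = mean_decision_time(drift=1.0, bound=0.8)   # less evidence required
```

Both manipulations shorten the mean decision time relative to the baseline, which is why the behavioral and CDA data alone cannot separate them; fitting such a model to full RT distributions would be one way to do so.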
In contrast to the PCN and CDA effects, which both correlated significantly with the RT contextual-cueing effect, the motor-related ERPs appeared to contribute little, if anything: the response-locked LRP did not differ significantly between repeated and nonrepeated displays, which is consistent with Schankin and Schubö (2009, 2010), and it was not systematically correlated with the RT cueing effect in either of the two contextual-learning conditions. This suggests that, at least in the paradigms implemented in the present study (which differ in several respects from the "standard" T-versus-L search paradigm), there is no strong evidence (in Bayesian terms, the evidence is inconclusive) for a role of motor-related processes in accounting for the behavioral contextual-cueing effects.
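The response-locked LRP referred to above is conventionally derived by a double subtraction of C3/C4 activity across left- and right-side responses, which cancels lateralized activity unrelated to the response side. The Python sketch below shows that conventional computation under the assumption that left and right responses map contralaterally onto C4 and C3, respectively; the function and argument names are hypothetical, and the exact LRP pipeline used in this study (including for its foot-pedal responses) is not described in this section.

```python
from typing import List, Sequence


def lrp_double_subtraction(
    c3_left: Sequence[float],
    c4_left: Sequence[float],
    c3_right: Sequence[float],
    c4_right: Sequence[float],
) -> List[float]:
    """Lateralized readiness potential via the double-subtraction method.

    Inputs are condition-average (e.g., response-locked) waveforms at C3 and C4
    for trials requiring a left or a right response. For each response side,
    the electrode contralateral to the response minus the ipsilateral one is
    taken, and the two differences are averaged. Negative values then indicate
    preparation of the correct response under the usual sign convention.
    """
    n = len(c3_left)
    if not all(len(w) == n for w in (c4_left, c3_right, c4_right)):
        raise ValueError("all waveforms must have the same length")
    return [
        0.5 * ((c4l - c3l) + (c3r - c4r))  # left: C4 - C3; right: C3 - C4
        for c3l, c4l, c3r, c4r in zip(c3_left, c4_left, c3_right, c4_right)
    ]
```

A fixed hemispheric asymmetry that is present regardless of response side drops out of the average, whereas a negativity over the hemisphere contralateral to the responding side survives; repeated- and nonrepeated-display LRPs would each be computed this way before being compared.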
Concerning possible limitations of our search tasks, we note that the visual displays differed between the crossmodal and unimodal conditions, which may have affected the ease with which participants searched the displays in the two conditions. Specifically, in order to induce context-based predictions, we had to keep the visual context constant by presenting visual items at the full set of 8 locations (crossmodal condition), whereas there were only 4 predictive visual items in the unimodal condition. Thus, the number of visual items differed between the crossmodal and unimodal conditions (8 vs. 4). Visual search may therefore have been more efficient in the crossmodal condition, given that search can become more efficient with larger numbers of homogeneous distractors (e.g., Geyer et al., 2007; Wang & Theeuwes, 2020). Related to this is the possibility that the presence of more distractor-filler items increased the amplitude of the PCN component in the crossmodal condition (Drisdelle et al., 2020). Thus, more efficient search due to larger numbers of visual items may have confounded practice-related improvements of search from repeated contexts. However, at odds with this proposal, we did not find reliable effects involving the factor Context in the relevant behavioral and electrophysiological measures. There were no significant differences between the unimodal and crossmodal conditions in either RTs or error rates when comparing these measures in separate analyses of only repeated displays or only nonrepeated displays (all ps > .11, dzs < .35, BF10s < 0.7). Likewise, separate comparisons of PCN amplitudes between the unimodal and crossmodal conditions, for repeated and for nonrepeated displays, did not reveal a reliable effect (both ps > .72, dzs < .07, BF10s < 0.23). Moreover, the studies cited above (Drisdelle et al., 2020; Wang & Theeuwes, 2020) used larger numbers (≥16) of filler items to induce search-related changes, whereas in the present study the difference in visual items between the unimodal and crossmodal conditions was small (only 4 items). In addition, the unimodal condition also used circular elements at non-Gabor stimulus locations (i.e., these elements did not contain Gabor gratings; cf. Figures 1 and 2), which at least equated the number of visual circular stimuli between the unimodal and crossmodal conditions. It is also worth mentioning that our study focused on relative ERP (and behavioral) effects, always analyzing sensory and motor ERP waveforms in the critical repeated conditions with reference to nonrepeated baseline displays. In doing so, unsystematic variability relating, e.g., to display design would be effectively canceled out between repeated and nonrepeated displays. Given these considerations, we believe it is unlikely that the differences in the number of visual elements contributed to the pattern of highly comparable contextual-cueing effects found in the present unimodal and crossmodal conditions. However, future work will be necessary to make a stronger case for this particular claim.
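The control comparisons above are reported as paired effect sizes (dz) and Bayes factors (BF10). As a hedged illustration of how such values are commonly computed, the Python sketch below derives Cohen's dz and the paired t statistic from per-participant values, and then a default JZS Bayes factor with a Cauchy prior on effect size. The prior scale of √2/2 (a commonly used default), the numerical-integration settings, and the function names are assumptions; the software and settings actually used in this study are not stated in this section.

```python
import math
from typing import Sequence, Tuple


def paired_dz_and_t(a: Sequence[float], b: Sequence[float]) -> Tuple[float, float]:
    """Cohen's d_z and the paired-samples t statistic (one value per participant)."""
    if len(a) != len(b) or len(a) < 2:
        raise ValueError("need two equally long samples with n >= 2")
    diffs = [x - y for x, y in zip(a, b)]
    n = len(diffs)
    mean = sum(diffs) / n
    sd = math.sqrt(sum((d - mean) ** 2 for d in diffs) / (n - 1))
    dz = mean / sd  # standardized mean of the paired differences
    return dz, dz * math.sqrt(n)  # t = d_z * sqrt(n)


def jzs_bf10(t: float, n: int, r: float = math.sqrt(2) / 2, steps: int = 4000) -> float:
    """Default (JZS) Bayes factor BF10 for a one-sample or paired t test.

    The Cauchy(0, r) prior on effect size is written as a normal mixture over g
    with an inverse-gamma(1/2, r^2/2) mixing density; the integral over g is
    evaluated with the trapezoid rule on u = ln(g), over u in [-20, 20].
    """
    if n < 2:
        raise ValueError("need n >= 2")
    nu = n - 1
    null_lik = (1 + t * t / nu) ** (-(nu + 1) / 2)

    def integrand(u: float) -> float:
        g = math.exp(u)
        prior = r / math.sqrt(2 * math.pi) * g ** -1.5 * math.exp(-r * r / (2 * g))
        lik = (1 + n * g) ** -0.5 * (1 + t * t / ((1 + n * g) * nu)) ** (-(nu + 1) / 2)
        return lik * prior * g  # factor g comes from dg = g du

    lo, hi = -20.0, 20.0
    h = (hi - lo) / steps
    total = 0.5 * (integrand(lo) + integrand(hi))
    total += sum(integrand(lo + k * h) for k in range(1, steps))
    return total * h / null_lik
```

In this sketch, per-participant condition means (e.g., unimodal vs. crossmodal RTs for repeated displays only) would be passed to `paired_dz_and_t`, and the resulting t, together with the number of participants, to `jzs_bf10`; values of BF10 below 1 favor the null hypothesis of no difference.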

| CONCLUSION
By tracking a series of ERP components (in addition to RTs) that reflect attentional and response-related processes in visual-search tasks, the present study provides new insights into the cognitive processes that are facilitated by unimodal and crossmodal contextual memory. We found that both uni- and crossmodal contexts afford more efficient guidance of attention, in terms of both the strength of attentional engagement and the speed with which selective attention is shifted toward the target location (or, at least, the target side) with repeated as compared to nonrepeated distractor-target contexts. Focal-attentional processing of the selected target was also facilitated by both uni- and crossmodal contexts, whereas response-related processes contributed little (if anything) to the behavioral contextual-cueing effect. Overall, our results provide little evidence against the (parsimonious) notion of a single, supramodal mechanism underlying contextual cueing of visual search, whether the cues are themselves visual or tactile.

ACKNOWLEDGMENT
This work was supported by German Research Foundation (DFG) grants GE 1889/5-1, awarded to TG, and SH166/7-1, awarded to ZS. We thank Gizem Vural for her help with data collection and Shaoyang Tsai for technical support. Open access funding enabled and organized by Projekt DEAL.

FIGURE 1 Illustration of the experimental setup and stimuli. (a) Photograph of the experimental setup with one participant. Participants placed their fingers on eight solenoid actuators (Dancer Design) delivering tactile stimulation (the two actuators under the thumbs were disabled). The vibration frequency was held constant at 150 Hz. Visual stimuli were presented on a semitransparent Plexiglas table tilted about 60° toward the observer. Participants wore headphones (Philips SHL4000, 30-mm speaker drivers), through which white noise (65 dBA) was delivered to mask the tactile vibrations. (b) Stimuli of the two sessions. Visual stimuli were Gabor patches, with the target defined by an orientation difference relative to the distractor patches. These stimuli were presented at eight locations positioned along two virtual "curves" (one to the left and one to the right) over the horizontal axis, corresponding to the locations of the eight actuators below. Observers' task was to respond to the (left/right-tilt) orientation of the visual target via corresponding foot pedals. In the visual session, in which the predictive context was visual (upper panel), the search display consisted of one target, three distractor Gabor patches, and four empty circles, accompanied by tactile stimulation of all eight fingers. In the tactile session, in which the predictive context was tactile (lower panel), one visual target was embedded among seven homogeneous distractors, with four vibrotactile stimuli delivered to two (selected) fingers of each hand. Gray circles represent stimulated fingers. The locations of the Gabor patches (visual session) and of the vibrotactile stimuli (tactile session) varied depending on whether the configurations were repeated or not.

14698986, 2022, 7, Downloaded from https://onlinelibrary.wiley.com/doi/10.1111/psyp.14025 by Cochrane Germany, Wiley Online Library on [29/09/2023]. See the Terms and Conditions (https://onlinelibrary.wiley.com/terms-and-conditions) on Wiley Online Library for rules of use; OA articles are governed by the applicable Creative Commons License

| 7 of 15 CHEN et al.

FIGURE Mean reaction times (RTs) and error rates for repeated and nonrepeated contexts as a function of epoch, separately for visual and tactile contexts. Error bars indicate 95% confidence intervals.

FIGURE 4 Grand-average ERPs at electrodes contra- and ipsilateral to the target (PO7 and PO8), shown separately for nonrepeated (black) and repeated (red) displays and separately for the visual and tactile contexts.

Figure 4 presents the visual ERP waves contralateral and ipsilateral to the target for nonrepeated and repeated displays when the predictive contexts were visual and tactile, respectively. Of particular relevance are the latency and amplitude of the PCN, obtained at electrodes PO7/PO8 from 180–350 ms after display onset, both of which indicate more efficient allocation of attention to the search target in repeated versus nonrepeated displays (compare the contralateral-minus-ipsilateral difference waves in Figure 5). Interestingly, this attentional-guidance effect was effectively unaffected by whether the repeated context was visual or tactile. Our statistical analyses supported these observations. Entering the ERP waveforms in separate 2 (Display: repeated, nonrepeated) × 2 (Predictive Context: visual, tactile) repeated-measures ANOVAs, we only found significant

FIGURE … for repeated and nonrepeated displays, separately for the visual and tactile contexts. The shaded gray areas illustrate the timing of the posterior contralateral negativity (PCN) and the contralateral delay activity (CDA). Each component is depicted with its corresponding scalp distribution. (b) Mean peak amplitudes and onset latencies of the PCN and mean amplitudes of the CDA for repeated and nonrepeated displays in the visual and tactile contexts. Error bars indicate 95% confidence intervals.

3.2.2 | Motor ERPs

FIGURE Correlations. The scatterplots in (a) and (b) show the relations between the individual PCN/CDA amplitude effects (difference between nonrepeated and repeated displays) and the individual RT contextual-cueing effects (CC effect) for the visual and tactile contexts, respectively. Solid lines indicate best-fitting regressions; shaded regions indicate 95% confidence intervals.

FIGURE 7 Grand-average lateralized readiness potentials (LRPs) synchronized to response onset (response-locked LRP), measured at central electrodes (C3/C4), separately for visual and tactile contexts. LRP onsets are marked by vertical dashed lines.

… context r = .28, p = .1, BF10 = 1; tactile context r = .35, p = .052, BF10 = 1.68; peak amplitude: visual context r = −.15, p = .76, BF10 = 0.16; tactile context r = −.27, p = .90, BF10 = 0.12. Thus, any subtle effects attributable to response-related components contribute very little, if anything, to the behavioral contextual-cueing effect.
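The correlational analyses reported here relate each participant's ERP effect (nonrepeated minus repeated) to that participant's RT contextual-cueing effect. The Python sketch below shows one straightforward way to form those per-participant difference scores and compute a Pearson correlation between them; the function names and the example numbers are hypothetical, and the study's own significance tests and Bayes factors for r are not reproduced here.

```python
import math
from typing import List, Sequence


def per_participant_effect(nonrepeated: Sequence[float], repeated: Sequence[float]) -> List[float]:
    """Nonrepeated minus repeated, one value per participant (RT or ERP measure)."""
    if len(nonrepeated) != len(repeated):
        raise ValueError("conditions must cover the same participants")
    return [n - r for n, r in zip(nonrepeated, repeated)]


def pearson_r(x: Sequence[float], y: Sequence[float]) -> float:
    """Pearson product-moment correlation between two equally long samples."""
    # Require n >= 3 so that r is not trivially +1 or -1.
    if len(x) != len(y) or len(x) < 3:
        raise ValueError("need two equally long samples with n >= 3")
    mx = sum(x) / len(x)
    my = sum(y) / len(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


# Hypothetical data for three participants (RTs in ms, ERP effects in µV).
cc_effect = per_participant_effect([800.0, 820.0, 850.0], [780.0, 790.0, 800.0])
erp_effect = [0.2, 0.3, 0.5]
r = pearson_r(cc_effect, erp_effect)
```

With real data, the same difference scores would be computed separately for the visual and tactile contexts and for each component (PCN, CDA, LRP) before correlating them with the RT cueing effect.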