Emotional content overrides spatial attention

Spatial attention is our capacity to attend to or ignore particular regions of our spatial environment. However, some classes of stimuli may be able to override our efforts to ignore them. Here we assessed the relationship between involuntary attentional capture with emotional images and spatial attention at early stages of perceptual


| INTRODUCTION
Selective attention is our capacity to filter or focus on particular features of our environment. We typically think of the top-down aspects of such a process. For example, we can selectively attend to particular regions or objects in our visual environment. However, not all of our attentional selection is driven voluntarily. Some stimuli have the capacity to at least partially override our top-down control mechanisms. For example, visual emotional stimuli signaling threat or reward have often been observed to capture our attention and receive prioritized processing in a bottom-up, stimulus-driven fashion, presumably due to their significant role in producing fast and adequate behavior in order to survive (Carretie, 2014; Todd et al., 2020; Vuilleumier, 2005, 2015). A central debate in the field of attention and emotion concerns the extent to which visual emotional cues can reflexively capture and override spatial attention when they are task-irrelevant and presented at spatially unattended locations (Pessoa, 2005; Pessoa et al., 2002, 2013; Vuilleumier et al., 2001; Vuilleumier & Driver, 2007). Some research findings show that emotional stimuli can draw attention involuntarily and lead to heightened activation of cortical and subcortical networks in spatial orienting tasks when they are displayed at ignored locations (Brosch et al., 2011; Keil et al., 2005; Pourtois & Vuilleumier, 2006; Vuilleumier et al., 2001). In a classic fMRI study, Vuilleumier and colleagues (2001) manipulated stimulus emotionality and the allocation of spatial (task-instructed) attention in a visual task. They observed that the neural response to face stimuli in visual cortex was reliably modulated by each factor independently, and that enhancement of amygdala activity as a function of emotional expression was unaffected by the allocation of attention (Sander et al., 2005; Vuilleumier et al., 2001).
In contrast, other reports demonstrate that the processing of emotional distracter images does not always occur independently of attention and requires the availability of attentional processing resources (see Pessoa et al., 2002; Silvert et al., 2007). For example, directing attention away from emotional distracters attenuates the amygdala response in experimental tasks with high attentional load, which may deplete attentional capacity (Pessoa, 2005; Pessoa et al., 2005; Silvert et al., 2007) and thereby possibly reduce the influence of distracting emotional information on perceptual processing. Thus, the evidence regarding the extent to which attentional distraction elicited by emotional images is independent of spatial attention mechanisms is somewhat mixed.
To further examine the neural dynamics of the interplay between emotion and spatial selective attention, here we showed visual scenes of emotional and neutral content as rapid serial visual presentation (RSVP) stimuli during a covert spatial cueing task. Participants were cued on a trial-by-trial basis to attend to either the left or the right visual hemifield. In each trial, RSVP streams consisting of neutral images were initially shown in both visual hemifields. Part way through each trial, images in either the to-be-attended or to-be-ignored visual hemifield changed from neutral to unpleasant content. The swift and unpredictable switch in emotional content within a rapid image stream has recently been observed to involuntarily capture attentional resources and bias attentional processing in favor of emotional pictures in similar RSVP protocols (Bekhtereva et al., 2019, 2020). Crucially, we displayed the two RSVP streams periodically, allowing us to elicit and record steady-state visual evoked potentials (SSVEPs). The SSVEP is a continuous electrocortical marker of selective attention in response to a periodic stimulus presentation that directly indexes neural activity related to visual stimulus processing, thereby providing a sensitive neural signature of ongoing attentional resource allocation (Forschack et al., 2017; Gundlach et al., 2020). The neural sources of the SSVEP signal are found largely in early visual areas (Norcia et al., 2015; Vialatte et al., 2010). Importantly, attentional resource allocation can be tracked in response to multiple stimuli simultaneously presented at different spatial locations by simply "tagging" them with distinct stimulation frequencies and quantifying their unique steady-state response.
In an earlier study, Keil and colleagues (2005) also employed a frequency-tagging paradigm to examine the relationship between spatial attention and reflexive attentional orienting toward emotional images. They reported an increase of the SSVEP amplitude as an additive function of spatial attention and affective content, reflecting early sensory gain mechanisms and enhanced processing in favor of emotionally significant sensory input (Bekhtereva et al., 2018; Brosch et al., 2011; Hillyard et al., 1998; Keil et al., 2003; Pourtois et al., 2013). The results were thus consistent with the independent effects of affective content on selective visual attention (i.e., effects unaffected by the allocation of attention) discussed above. However, they presented the pictures in each visual hemifield flickering at identical frequencies, which makes it difficult to disentangle the neural responses elicited by the two sets of pictures. In contrast, we chose to display our bilateral RSVP streams at 4 and 6 Hz (that is, 250 and ∼167 ms per image, respectively), capitalizing on our recent experimental findings showing robust SSVEP modulations when RSVP streams changed from neutral to emotionally arousing content at these stimulation rates (Bekhtereva & Müller, 2015; Bekhtereva et al., 2018). By using different periodic rates in each hemifield, the present experimental design has the methodological advantage of tracking attentional resources allocated to each image stream independently. Thus, we can more accurately assess whether bottom-up attention driven by the presentation of emotional images acts independently of top-down spatial attention or interacts with it at early perceptual processing stages.
Notably, our recent findings from similar RSVP study protocols have demonstrated that the amplitude pattern of the SSVEP response modulation with emotional as compared to neutral RSVP streams can differ across presentation rates. For example, we observed an increase in SSVEP amplitude for emotionally arousing relative to neutral images when RSVPs were displayed at a 4 Hz rate as opposed to a robust attenuation in SSVEP magnitude for emotional relative to neutral RSVP streams when shown at 6 Hz (Bekhtereva et al., 2019). However, such opposite emotion-dependent amplitude modulation patterns were unlikely to be driven by different neural mechanisms or by physical image properties (Bekhtereva et al., 2018). Instead, as our simulations with linear modelling have previously revealed (Bekhtereva et al., 2018), these opposite SSVEP amplitude effects may be a consequence of a linear superposition of ERPs elicited in response to each individual image in an RSVP (Bekhtereva et al., 2018; Capilla et al., 2011). Moreover, the direction of the SSVEP emotional modulation (increase or decrease) with RSVP streams has been shown to be irrelevant for biasing of attentional processing resources in favor of emotional image content (Bekhtereva et al., 2019). Based on these findings, here we quantified absolute differences in SSVEP amplitudes in response to neutral and unpleasant RSVP streams presented under different attentional conditions. We hypothesized that if facilitated cortical processing of affective image streams was dependent on the allocation of spatial attention, then the SSVEP modulation as a function of change in emotional image content should only be observed for RSVP streams presented in the attended hemifield.
Alternatively, if sensory gain at early processing stages occurs involuntarily in favor of affective content change, then sensory amplification by emotionally aversive scenes should occur independently of spatial attention allocation, resulting in comparable affective SSVEP modulation in either hemifield.

| Participants
Sixty volunteers (45 female and 15 male) ranging from 18 to 39 years old with a mean age of 25 years (standard deviation [SD] = 5.36) and normal or corrected-to-normal visual acuity participated in the study and received either credit points or financial reimbursement (8 € per hour). The number of participants (30 per experimental group) was based on the smallest effect size of interest (η²G = 0.15) from our earlier work employing a similar RSVP paradigm (Bekhtereva et al., 2018) and should be sufficient to achieve a power of 0.8. G*Power software was used to perform the power analysis (Faul et al., 2007). All participants were informed about the study's goals and provided their written informed consent before experimental recording. The study was approved by the ethics committee of the University of Leipzig and conducted in compliance with the Code of Ethics of the World Medical Association.

| Stimuli
In total, 80 neutral¹ and 80 unpleasant² color images were taken from the Emotional Picture Set database (EmoPicS; Wessa et al., 2010) as well as from the International Affective Picture System (IAPS; Lang et al., 2008). Experimental images were 320 × 240 pixels in size and had been used as experimental material in our recent study (see Bekhtereva et al., 2020 for details), with similar image luminance, contrast, and complexity across the neutral and unpleasant categories.

| Experimental procedure
Experimental images were presented in two rapid serial visual presentation (RSVP) streams on a 19-in. computer screen with a resolution of 1024 × 768 pixels against a black background, with 16 bits per pixel color mode. The refresh rate of the monitor was 60 Hz and the viewing distance was 80 cm. A yellow cross subtending 0.29°×0.29° of visual angle was displayed centrally throughout the experiment to help maintain fixation.
Each trial began with the presentation of a centrally displayed visual cue comprising the words "Attend left" or "Attend right" for a random time interval of 1,000-1,500 ms. The cue text (3.1° × 0.4° of visual angle) was shown in yellow and instructed participants to covertly direct their attention to either the left or the right while ignoring the opposite side of the visual field throughout the trial. At the offset of the cue, an RSVP image stream began in each visual field. These streams comprised multiple images of neutral valence, centered 6.72° of visual angle to the right and left of the fixation cross. Participants were told to report detection of a yellow dot (target) that would occasionally flash briefly for ~17 ms at a random area within the RSVP stream shown in the to-be-attended visual field (see Figure 1) as accurately and quickly as possible, by pressing "Space" on the keyboard. In addition, they were told to ignore any such yellow dots appearing in the to-be-ignored visual field (distracters). Targets and distracters subtended 0.5° × 0.5° of visual angle. Each picture within an RSVP stream was displayed for either ~167 or 250 ms (10 or 15 frames at the 60 Hz refresh rate), resulting in 6 and 4 Hz RSVP presentation rates, respectively. The periodic image presentation aimed to elicit a steady-state visual evoked potential (SSVEP). Our previous study (Bekhtereva et al., 2020) has shown that a 2 Hz separation between two RSVP streams in the left and right visual hemifields is sufficient to clearly isolate the respective SSVEP responses. Other studies that used non-emotional stimuli also used a 2 Hz separation between stimuli with up to 5 stimuli (cf. Andersen et al., 2008; Walter et al., 2014) or even a 1 Hz separation with 6 different stimuli (cf. Walter et al., 2016), and were able to separate the respective SSVEP responses by means of an FFT with sufficient frequency resolution.
Images in RSVP picture streams subtended 5.87° × 8.1° of visual angle, with image luminance ranging between 17 and 48 cd/m². At an unpredictable time during each trial, either the to-be-attended or the to-be-ignored RSVP stream would switch to unpleasant images, while the other RSVP stream remained neutral (Figure 1). Images within each stream were in random order, with the restriction that the same picture was never displayed twice in a row within an RSVP stream. To exclude any expectation effects, there were three possible time points at which the change in emotional valence could occur: 2,500, 3,500, or 4,500 ms following the onset of the trial. In each trial, 4 and 6 Hz RSVP streams were shown simultaneously, and the combination of frequencies shown to the left and right visual hemifield was counterbalanced across participants, resulting in 30 participants viewing 6 Hz RSVPs in the right and 4 Hz RSVPs in the left visual field and 30 other participants viewing the opposite arrangement. Overall, RSVP presentation lasted for 6,500 ms, resulting in 39 presentation cycles for the 6 Hz RSVP and 26 presentation cycles for the 4 Hz RSVP.
¹ IAPS numbers of neutral pictures: 1122, 1350, 1645, 1945, 2036, 2037, …
² IAPS numbers of unpleasant pictures: 1111, 1113, 1200, 1202, 1220, 1300, 2661, 2683, 2691, 2703, 2710, 2730, 2981, 3001, 3019, 3064, 3103, 3110, 3150, 3190, 3195, 3212, 3213, 3230, 3250, 3261, 3350, 3500, 3530, 6021, 6210, 6313, 6550, 6560, 8230, 9002, 9008, 9031, 9040, 9042, 9075, 9140, 9163, 9181, 9250, 9254, 9300, 9342, 9410, 9420, 9433, 9471, 9495, 9570, 9571, 9590, 9594, 9596, 9600, 9623, 9635, 9810, 9902, 9920, 9930, 9940. EmoPicS numbers of unpleasant pictures: 216, 232, 233, 234, 235, 236, 240, 241, 243, 248, 321, 325, 326, 327.
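The relationship between frames per image, presentation rate, and cycles per trial can be checked with a few lines of arithmetic. The following is a minimal sketch that uses only the values reported in the text (60 Hz refresh rate, 10 or 15 frames per image, 6,500 ms of RSVP presentation); the function name `stream_timing` is purely illustrative and not part of the original experiment code.

```python
from fractions import Fraction

REFRESH_HZ = 60   # monitor refresh rate reported in the text
TRIAL_MS = 6500   # RSVP presentation duration per trial reported in the text


def stream_timing(frames_per_image):
    """Return (image duration in ms, presentation rate in Hz, cycles per trial).

    Exact fractions are used so that ~166.67 ms does not accumulate rounding error.
    """
    duration_ms = Fraction(1000 * frames_per_image, REFRESH_HZ)
    rate_hz = Fraction(REFRESH_HZ, frames_per_image)
    cycles = Fraction(TRIAL_MS) / duration_ms
    return duration_ms, rate_hz, cycles


if __name__ == "__main__":
    for frames in (10, 15):
        duration_ms, rate_hz, cycles = stream_timing(frames)
        print(f"{frames} frames per image: {float(duration_ms):.1f} ms, "
              f"{float(rate_hz):g} Hz, {float(cycles):g} cycles per trial")
```

Under these values, 10 frames per image correspond to the 6 Hz stream with 39 cycles per trial, and 15 frames per image to the 4 Hz stream with 26 cycles per trial.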
A new picture was displayed on every cycle of the RSVP presentation. In total, in the 4 Hz RSVP stream each neutral picture was shown 96 times and each unpleasant image ~29 times throughout the experiment. In the 6 Hz RSVP stream, each neutral image was displayed 144 times and each aversive image ~44 times during the recording. Because an RSVP stream of only neutral image content was shown to one visual hemifield throughout the experiment, neutral images had to be repeated more often than unpleasant ones. At the end of each trial, the black background with a yellow fixation cross was displayed for an additional 1,000 ms. There were 96 trials for each of the four experimental conditions. In total, the experiment consisted of 384 trials, with 32 trials per experimental block.
To allow development of a reliable SSVEP signal, the first 600 ms following RSVP presentation onset did not contain any targets or distractor events and thus was not included in the EEG analysis. In each trial, up to four events (targets or distracters) could appear. Targets and distractors were evenly distributed across the time interval before and after the variable time point of change in emotional valence (at either 2,500, 3,500, or 4,500 ms) for each condition as follows: (1) between 600-~2,417 ms and 2,600-~6,417 ms; (2) between 600-~3,417 ms and 3,600-~6,417 ms; and (3) between 600-~4,417 ms and 4,600-~6,417 ms. As a result, for each condition, targets and distracters were uniformly distributed over the time window of ~2.8 s before and ~2.8 s after the change in emotional content, on average. One half of experimental trials (i.e., 192 trials) did not contain any events. During the experiment, 240 targets and 240 distractors were shown in total. Targets/distractors were visible only for ~17 ms (1 frame of 60 Hz refresh rate) and were separated by a minimum of 800 ms. As in Bekhtereva et al. (2020), here we chose a crude distribution of targets and distracters because the main purpose of the events was to ensure that participants paid attention as instructed throughout the experiment.
Experimental conditions were presented in randomized order, and after each experimental block, which lasted ~5 min, participants were encouraged to take a short break. Participants switched their response hand in the second half of the experiment, and their starting response hand was counterbalanced across participants. Before the EEG recording started, participants completed a short training trial to familiarize themselves with the task. For the training session, we used a different, separate set of image material. The timing and flow of the experimental stimulation were controlled with the Cogent toolbox running under MATLAB (Cogent, www.vislab.ucl.ac.uk/Cogent/; The Mathworks, Inc, Natick, Massachusetts).
FIGURE 1 Example experimental trial in which a 6 Hz RSVP stream in the left visual hemifield changed from neutral to unpleasant content, whereas a 4 Hz RSVP stream consisting of only neutral images was presented to the right visual hemifield. Participants were asked to detect the yellow square (target) occasionally briefly flashing in the image stream in the to-be-attended hemifield (depicted here in the left RSVP). All RSVP images were shown for either 250 or ~167 ms, resulting in a 4 or 6 Hz presentation rate. Each trial began with a cue instructing participants to covertly direct their attention to the left or right, following which the presentation of two RSVP streams started in the left and right visual hemifield. First, both RSVP streams contained visual scenes of neutral content. After a variable time interval (2,500, 3,500, or 4,500 ms), the RSVP stream in one of the visual hemifields changed from neutral to unpleasant content, while the other RSVP stream remained neutral.
At the end of the EEG experiment, all participants were asked to judge the experimental images on affective arousal and valence using the Self-Assessment-Manikin (SAM) scale (Bradley & Lang, 1994), ranging from 1 (very low arousal and unpleasant valence) to 9 (very high arousal and pleasant valence). Pictures were displayed in randomized order for either 250 ms (one cycle of the 4 Hz rate) or ~167 ms (one cycle of the 6 Hz rate) and were immediately masked by their respective phase-randomized (content-distorted) image version presented for the same duration (Bekhtereva et al., 2018). The SAM scale then appeared on the screen, prompting participants to provide the arousal and valence ratings for the respective image using a keyboard. Overall, the entire picture set was displayed twice, once for 250 ms and once for ~167 ms, and the order of the exposure durations was counterbalanced across participants.

| EEG-recording and analysis
EEG was recorded from 64 Ag/AgCl scalp electrodes mounted on an elastic cap according to the international 10-20 system (Jasper, 1958) using an ActiveTwo BioSemi system (BioSemi, The Netherlands) at a sampling rate of 512 Hz. Two electrodes were used as reference and ground electrodes during the recording (CMS, "Common Mode Sense", and DRL, "Driven Right Leg"; for details see http://www.biosemi.com/faq/cms&drl.htm). Furthermore, vertical and lateral eye movements were measured by means of four bipolar electrodes placed vertically above and below the right eye (vertical EOG) as well as horizontally on the outer canthi of each eye (horizontal EOG). For EEG data preprocessing and analyses, the EEGLAB toolbox (Delorme & Makeig, 2004) as well as custom MATLAB scripts (The Mathworks, Natick, MA) were used.

| SSVEP analysis
We timed the two simultaneously presented RSVP streams at 4 and 6 Hz to become phase-synchronized at the time point when the emotional content in either the left or right visual field changed from neutral to unpleasant. Thus, the onset of the first unpleasant image in the RSVP stream occurred simultaneously with the onset of an image in the RSVP stream that remained neutral. The continuous data were epoched from −1,500 to 1,500 ms relative to the onset of the change in emotional content. Only experimental trials without any targets or distracters were included in the analysis, to prevent any potential interference from task events and subsequent motor processes. In the first step, for each participant, linear drifts were identified and removed from the data, and an automatic algorithm, "Statistical Control of Artifacts in Dense Array EEG/MEG studies" (Junghöfer et al., 2000), was applied to detect epochs with artifacts. Following that, all epochs were visually inspected for artifacts, and, in particular, for non-stereotypical artifacts (i.e., voltage jumps or electrode movements). Contaminated epochs were removed. Subsequently, data was re-referenced to the average reference. Next, independent component analysis (ICA; Delorme et al., 2012) was performed on the epochs, to correct for any ocular and muscle artifacts. The acquired ICA components were manually inspected for components showing artifacts (i.e., displaying typical topographies of eye artifacts, muscle, or line noise), and the SASICA plugin for EEGLAB was used to provide additional judgment on artifactual components (Chaumon et al., 2015). Components classified as artifactual were removed from the data.
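The phase synchronization of the two streams at the moment of the content change follows from the frame counts per image. The sketch below illustrates this under the assumption (not stated explicitly in the text) that both streams start on the same frame at RSVP onset; the helper names are hypothetical.

```python
from math import lcm

REFRESH_HZ = 60                                # monitor refresh rate from the text
FRAMES_PER_IMAGE = {"4 Hz": 15, "6 Hz": 10}    # 250 ms and ~167 ms per image
CHANGE_TIMES_MS = (2500, 3500, 4500)           # possible change latencies from the text


def ms_to_frames(ms):
    """Convert a latency in ms to refresh frames; fail if it is not a whole frame count."""
    frames, remainder = divmod(ms * REFRESH_HZ, 1000)
    if remainder:
        raise ValueError(f"{ms} ms is not a whole number of frames at {REFRESH_HZ} Hz")
    return frames


# Assumption: both streams start on the same frame, so their image onsets
# coincide every lcm(15, 10) frames.
coincidence_frames = lcm(*FRAMES_PER_IMAGE.values())
coincidence_ms = coincidence_frames * 1000 // REFRESH_HZ


def onsets_coincide(latency_ms):
    """True if an image onset occurs in both streams at the given latency."""
    frame = ms_to_frames(latency_ms)
    return all(frame % n == 0 for n in FRAMES_PER_IMAGE.values())


if __name__ == "__main__":
    print(f"Image onsets coincide every {coincidence_ms} ms")
    for latency in CHANGE_TIMES_MS:
        print(latency, onsets_coincide(latency))
```

Because all three change latencies are multiples of the coincidence interval, the first unpleasant image can always share its onset with an image in the stream that remains neutral.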

| Rhythmic entrainment source separation (RESS)
In contrast to the conventional approach for SSVEP analysis based on the selection of single or several electrodes with the greatest SSVEP response, here we used the Rhythmic Entrainment Source Separation method (RESS; Cohen & Gulbinaite, 2017). RESS constructs an optimal spatial filter for signals at specific frequencies based on a generalized eigenvalue decomposition of signal and reference covariance matrices, maximizing the frequency-specific SSVEP response over non-SSVEP signals (i.e., noise). As reference data, signals at frequencies neighboring the SSVEP frequencies were used, which are not driven and not phase locked to our visual stimulation (for a more detailed description of the method see below and in Cohen & Gulbinaite, 2017). Thus, we analyzed a linearly weighted combination of all electrodes from all experimental conditions, determined for each participant, thereby avoiding a number of potentially confounding biases associated with post-hoc selection of electrodes for analysis.
RESS filtering was performed on the artifact-corrected, epoched, and average-referenced data. For each participant, two spatial filters were constructed, one for each stimulation frequency, given that different SSVEP frequencies have different topographical projections (Lithari et al., 2016). First, all trials from all experimental conditions were concatenated and temporally filtered using three narrow-band Gaussian filters: (1) a filter centered at the stimulation frequency f with full-width at half-maximum (FWHM) = 0.5 Hz, (2) a filter centered at the neighboring f − 0.5 Hz with FWHM = 0.5 Hz, and (3) a filter centered at the neighboring f + 0.5 Hz with FWHM = 0.5 Hz. Data filtered at the stimulation frequency are termed "signal" (S), while data filtered at the neighboring frequencies are termed "reference" (R). Second, temporally filtered data of the entire epoch (from 1,500 ms before to 1,500 ms after the change in emotional content within an RSVP image stream) were used to compute channel covariance matrices, namely, two R matrices and one S matrix. Third, generalized eigendecomposition (function eig in MATLAB) was used to derive spatial RESS filters that maximize the variance of S over the average of both R matrices. The electrode weights of these spatial filters were multiplied with the unfiltered single-trial time-series to obtain the RESS component time-series used in the further analyses. Figure 2 depicts the frequency spectra of RESS component time-series in SNR units as well as the topographical distributions of RESS components. Overall, 120 RESS components were obtained (for 60 participants, one 4 Hz and one 6 Hz RESS component were calculated per participant). Finally, for each participant, the RESS component time-series were averaged separately for each stimulation frequency and experimental condition. From these averages, SSVEPs at 4 and 6 Hz were subsequently quantified by means of a Fourier transform as specified below.
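The three steps above (narrow-band Gaussian filtering, covariance estimation, generalized eigendecomposition) can be sketched in Python as follows. This is an illustrative sketch under stated assumptions and not the authors' MATLAB implementation: the function names are hypothetical, the Gaussian is applied in the frequency domain with a standard FWHM-to-sigma conversion, and a tiny diagonal loading of R is added purely for numerical stability (it is not described in the text). `scipy.linalg.eigh(S, R)` is used here in place of MATLAB's eig.

```python
import numpy as np
from scipy.linalg import eigh


def gaussian_bandpass(data, srate, center_hz, fwhm_hz):
    """Filter (channels, samples) data with a frequency-domain Gaussian."""
    n = data.shape[-1]
    freqs = np.fft.rfftfreq(n, d=1.0 / srate)
    sigma = fwhm_hz / (2.0 * np.sqrt(2.0 * np.log(2.0)))  # FWHM -> standard deviation
    gain = np.exp(-0.5 * ((freqs - center_hz) / sigma) ** 2)
    return np.fft.irfft(np.fft.rfft(data, axis=-1) * gain, n=n, axis=-1)


def channel_covariance(data):
    """Channel-by-channel covariance of (channels, samples) data."""
    centered = data - data.mean(axis=-1, keepdims=True)
    return centered @ centered.T / (data.shape[-1] - 1)


def ress_weights(data, srate, f, fwhm_hz=0.5, neighbor_hz=0.5, loading=1e-6):
    """Return (spatial filter, S/R variance ratio) for stimulation frequency f.

    `data` is the concatenation of all trials, shaped (channels, samples).
    """
    S = channel_covariance(gaussian_bandpass(data, srate, f, fwhm_hz))
    R = 0.5 * (
        channel_covariance(gaussian_bandpass(data, srate, f - neighbor_hz, fwhm_hz))
        + channel_covariance(gaussian_bandpass(data, srate, f + neighbor_hz, fwhm_hz))
    )
    # Assumption (not from the text): small diagonal loading keeps R positive
    # definite, e.g. for average-referenced data whose covariance is rank deficient.
    n_chan = R.shape[0]
    R = R + loading * np.trace(R) / n_chan * np.eye(n_chan)
    # Generalized eigendecomposition S w = lambda R w; eigenvalues come back ascending,
    # so the last eigenvector maximizes signal variance relative to reference variance.
    eigenvalues, eigenvectors = eigh(S, R)
    return eigenvectors[:, -1], eigenvalues[-1]


def ress_component(data, weights):
    """Apply the spatial filter to unfiltered (channels, samples) data."""
    return weights @ data
```

In use, `ress_weights` would be called once per participant and stimulation frequency (4 and 6 Hz) on the concatenated trials, and `ress_component` would then be applied to each unfiltered single trial.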
FIGURE 2 (a) Channel weights of RESS components for 4 and 6 Hz, averaged across all experimental conditions separately for the two participant groups that viewed either the 6 Hz RSVP stream on the left and the 4 Hz RSVP stream on the right (upper panel) or vice versa (lower panel). Note that the channel weights are in arbitrary units. (b) Frequency spectra of RESS component time-series averaged across all conditions and participants, expressed in signal-to-noise ratio (SNR) units. Note that the largest peaks represent the peak frequency of the RESS component; smaller peaks at other frequencies are those that have been suppressed through the use of RESS. Peaks at 8 and 12 Hz represent the harmonics of 4 and 6 Hz. The 10 Hz peak most certainly reflects alpha activity.
Fourier analyses were calculated across the time intervals from −1,500 to −500 ms before and from 500 to 1,500 ms after the change in emotional content. As in our most recent study (Bekhtereva et al., 2020), the respective time windows for analyses were selected based on our earlier experimental findings, which consistently demonstrated SSVEP amplitude modulation by emotional picture content at ~500 ms after the onset of an emotional image (Bekhtereva & Müller, 2017; Hindi Attar et al., 2010). Furthermore, by analogy with our earlier work (Bekhtereva et al., 2018, 2020; Bekhtereva & Müller, 2015), we quantified a difference score for each experimental condition: the amplitude in the time window prior to the change in emotional valence minus the amplitude in the time window after it. Moreover, since our earlier work demonstrated opposite modulation patterns of SSVEP amplitudes for 4 and 6 Hz as a function of emotional content (i.e., emotional > neutral or emotional < neutral; see Introduction section), here we took the absolute values of the difference scores in order to test the amplitude differences between neutral and emotional conditions per se, and not the direction of the emotional effect (for a similar approach with respect to alpha oscillations, see Antonov et al., 2020; an additional statistical analysis based on raw scores is provided in the online Appendix).
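The windowed Fourier quantification and the absolute difference score can be sketched as follows. This is a minimal illustration, assuming a plain discrete Fourier transform over each 1 s window without tapering or zero-padding (the text does not specify these details); the function names are hypothetical. At the reported 512 Hz sampling rate, a 1 s window gives a 1 Hz frequency resolution, so 4 and 6 Hz fall exactly on Fourier bins.

```python
import numpy as np

SRATE = 512  # sampling rate reported in the text (Hz)


def amplitude_at(signal, srate, freq_hz):
    """Single-sided Fourier amplitude of a 1-D signal at the bin closest to freq_hz."""
    n = signal.size
    spectrum = np.fft.rfft(signal) / n
    freqs = np.fft.rfftfreq(n, d=1.0 / srate)
    idx = np.argmin(np.abs(freqs - freq_hz))
    return 2.0 * np.abs(spectrum[idx])


def window(epoch, times_ms, start_ms, stop_ms):
    """Samples of `epoch` with start_ms <= time < stop_ms."""
    mask = (times_ms >= start_ms) & (times_ms < stop_ms)
    return epoch[mask]


def abs_difference_score(epoch, times_ms, freq_hz, srate=SRATE):
    """|amplitude before the change - amplitude after the change| at freq_hz.

    Windows follow the text: -1,500 to -500 ms and 500 to 1,500 ms
    relative to the change in emotional content.
    """
    pre = amplitude_at(window(epoch, times_ms, -1500, -500), srate, freq_hz)
    post = amplitude_at(window(epoch, times_ms, 500, 1500), srate, freq_hz)
    return abs(pre - post)


if __name__ == "__main__":
    # Synthetic 6 Hz component whose amplitude changes from 1.0 to 1.5 at time zero.
    times_ms = np.arange(-768, 768) / SRATE * 1000.0
    amp = np.where(times_ms < 0, 1.0, 1.5)
    epoch = amp * np.sin(2 * np.pi * 6 * times_ms / 1000.0)
    print(abs_difference_score(epoch, times_ms, 6.0))
```

Taking the absolute value means that an increase at one presentation rate and a decrease at the other both register as a positive modulation, which is the property the analysis relies on.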
Because we did not expect any group differences as a function of viewing 4 and 6 Hz RSVP streams in the left or right visual hemifield, we collapsed across the two groups before statistical analysis. Thus, for the statistical testing, a repeated-measures 2 × 2 × 2 ANOVA with the factors Change in Content (yes/no), Attention (attended/unattended), and Presentation Rate (4 Hz/6 Hz) was performed on the RESS difference scores as described above. Furthermore, we used a Bayesian approach for the statistical testing. We calculated Bayes factors (Morey et al., 2015; Rouder et al., 2016) using the BayesFactor R package (version 0.9.12; Morey et al., 2015) to quantify the evidence in favor of the null as well as the alternative hypothesis that SSVEP amplitudes are modulated by change in emotional content either irrespective of attentional condition, or as a function of attention. The Bayes factor (BF10) for each model of interest was calculated using the function call anovaBF(amplitude ~ frequency * attention * change + participant, data, whichModels = "withmain", whichRandom = "participant", iterations = 100000). Starting from a model including all main effects and interactions, and a random effect of participant, this tests all possible subsets of the full model against a null model consisting of the grand mean and the additive effect of the participant factor. An additional constraint is that where interactions were included, all main effects involving factors in those interactions were also included. Thus, no "interaction only" models were tested. We used Jeffreys-Zellner-Siow (JZS) priors with the default prior scaling factor (r = 0.5).
In addition, to quantify the effect of spatial attention on the SSVEP response, we analyzed RESS scores for the time windows in which only neutral RSVPs were shown (i.e., in the time window prior to the change in emotional content) for the attended and unattended visual hemifields using a repeated-measures 2 × 2 ANOVA with the factors Attention (attended/unattended) and Presentation Rate (4 Hz/6 Hz).

| Behavioral data and SAM rating analyses
Correct button presses within 250 ms to 1,000 ms of the onset of a target were considered hits. Button presses in response to distracters presented within that time interval were considered false alarms. Similar to the SSVEP analysis above, we did not anticipate group differences in performance as a function of viewing 4 or 6 Hz RSVP streams presented in the left or right visual field, and therefore, we combined the two groups. Target detection rate (% of hits) and reaction times to correctly identified targets were analyzed using a 2 × 2 × 2 repeated-measures ANOVA with the within-subjects factors Switch Time (before vs. after), Attended Frequency (4 vs. 6 Hz), and Change in Content (yes/no). In addition, false alarms (% of reactions toward distracters) were analyzed with a 2 × 2 × 2 repeated-measures ANOVA with the within-subjects factors Switch Time (before vs. after), Unattended Frequency (4 vs. 6 Hz), and Change in Content (yes/no).
Mean ratings for picture valence and arousal were analyzed using a 2 × 2 repeated-measures ANOVA with the factors Emotion (unpleasant vs. neutral) and Picture presentation time (250 vs. 167 ms). Significant interactions were followed up using planned comparisons with Bonferroni-Holm correction for multiple comparisons.
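The response-window rule for hits and false alarms can be illustrated with a short sketch. It assumes that an event counts as responded to when at least one button press falls between 250 and 1,000 ms after its onset; how a press that falls into the windows of both a target and a distracter was resolved is not described in the text, so this sketch simply counts it for both. The function name and the example latencies are hypothetical.

```python
from bisect import bisect_left

RESPONSE_WINDOW_MS = (250, 1000)  # response window reported in the text


def count_responded(event_onsets_ms, press_times_ms, window=RESPONSE_WINDOW_MS):
    """Count events followed by at least one button press inside the response window."""
    presses = sorted(press_times_ms)
    lo, hi = window
    count = 0
    for onset in event_onsets_ms:
        # First press at or after the start of this event's response window.
        i = bisect_left(presses, onset + lo)
        if i < len(presses) and presses[i] <= onset + hi:
            count += 1
    return count


if __name__ == "__main__":
    # Hypothetical latencies (ms) within one block, for illustration only.
    targets = [1000, 3000]
    distracters = [5000]
    presses = [1400, 3100, 5600]
    hits = count_responded(targets, presses)
    false_alarms = count_responded(distracters, presses)
    print(f"hit rate: {100 * hits / len(targets):.0f}%")
    print(f"false alarm rate: {100 * false_alarms / len(distracters):.0f}%")
```

In this example, the press at 3,100 ms comes only 100 ms after the second target and is therefore too early to count as a hit.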

| SAM ratings
For valence ratings, the 2 (Emotion) × 2 (Picture presentation time) repeated-measures ANOVA indicated a significant main effect of Emotion (F(1, 59) = 269.26, p < .0001, η²G = 0.74), with unpleasant images being perceived as more aversive than neutral ones. A main effect of Picture presentation time was also statistically significant (F(1, 59) = 6.87, p = .01, η²G = 0.002), with slightly higher valence ratings for images displayed for ~167 ms relative to 250 ms. Both main effects were further qualified by a significant Emotion × Picture presentation time interaction (F(1, 59) = 6.72, p = .01, η²G = 0.002). Follow-up contrasts revealed that the judgments on valence for neutral scenes were comparable across their presentation times (mean difference = 0.009, p = .78). Conversely, emotionally unpleasant images were rated as slightly more negative (see Figure 3a) when their exposure time was 250 ms relative to ~167 ms (mean difference = −0.12, p < .001).

| SSVEP amplitudes
The 2 × 2 ANOVA on the time window prior to the change in emotional content revealed a significant main effect of Frequency (F(1, 59) = 9.55, p = .003, η²G = 0.03), with overall higher RESS values for the 6 Hz presentation rate relative to the 4 Hz rate. Importantly, there was a significant main effect of Attention (F(1, 59) = 16.19, p = .0002, η²G = 0.02), indicating increased response magnitude for attended relative to unattended RSVPs and demonstrating that attention was shifted to the cued visual hemifield, rather than split evenly across both RSVP streams.
Further analysis (based on the absolute values of the RESS difference scores, as described above) showed a significant main effect of Change in Content (F(1, 59) = 15.70, p < .001, η²G = 0.02), demonstrating that the SSVEP amplitudes were consistently modulated when RSVP content switched from neutral to unpleasant, regardless of whether the image streams were shown at the attended or unattended spatial location. All other main effects and interactions, including the Change in Content × Attention interaction, were not significant (Fs < 2.82, ps > .1, η²G < 0.007). Figure 4 shows the amplitude difference scores (in absolute values) and illustrates that amplitude modulation with emotional content was similar across both 4 and 6 Hz presentation rates between attended and unattended conditions.
FIGURE 3 Post-experimental valence (a) and arousal (b) ratings for emotionally unpleasant and neutral scenes presented for as little as 250 ms (one presentation cycle of 4 Hz) and ~167 ms (one presentation cycle of 6 Hz). As in our recent studies with identical rating protocols (Bekhtereva et al., 2018, 2020), neutral images were judged similarly on valence (a) independent of picture presentation time, whereas emotionally aversive contents were perceived as slightly more negative when they were shown for 250 ms than for ~167 ms (the mean difference was extremely small, however: −0.12). For subjective image arousal (b), unpleasant scenes were perceived as significantly more arousing than their neutral counterparts, regardless of their presentation time. The violin width is defined by the kernel density of the distribution of individual ratings (i.e., a wider violin area indicates that more participants rated within a given score). In addition, boxplots are superimposed, and individual ratings are shown as colored circles.
For the sake of brevity, we report only the most relevant Bayesian models below (see also Table 1 for more details). The model for the Change in Content × Attention interaction yielded a Bayes factor of 0.34, which is considered good evidence in support of the null hypothesis (Dienes, 2014) and indicates that the SSVEP responses were unlikely to have been modulated by an interaction between attention and emotional content in the present study. Furthermore, by directly comparing the model for Change in Content to the interaction model of Attention × Change in Content (model BF[2] divided by BF[6]), we found that the model for the main effect was 28.6 times more probable than the interaction model. We therefore consider the model containing only the main effect of change in emotional content, with the largest Bayes factor of 9.91, to be the most compelling model of the data. In sum, both NHST and the Bayesian analysis demonstrate that amplitudes differed reliably as a function of the change in emotional content of the RSVP image streams, irrespective of whether the image streams were attended or not.
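The direct model comparison in the Bayesian analysis amounts to dividing one Bayes factor by another, which is valid when both were computed against the same reference model. The following is a minimal sketch of that arithmetic, not the authors' analysis code. It assumes that the two rounded values quoted in the text (9.91 for the Change in Content model and 0.34 for the interaction model) correspond to BF[2] and BF[6]; the function name `relative_evidence` is hypothetical.

```python
# Bayes factors as quoted (rounded) in the text.
# Assumption: 9.91 is BF[2] (Change in Content) and 0.34 is BF[6] (interaction).
bf_change_in_content = 9.91
bf_interaction = 0.34


def relative_evidence(bf_a, bf_b):
    """Ratio of two Bayes factors that share the same reference (null) model."""
    if bf_a <= 0 or bf_b <= 0:
        raise ValueError("Bayes factors must be positive")
    return bf_a / bf_b


ratio = relative_evidence(bf_change_in_content, bf_interaction)
print(f"main-effect model vs. interaction model: {ratio:.1f}")
```

With these rounded inputs the ratio comes out slightly above the reported 28.6; the small gap presumably reflects rounding of the underlying Bayes factors.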

| Behavioral data
Analysis of the target detection rate revealed a significant main effect of Switch Time (F(1, 59) = 5.57, p = .02, η²g = 0.006; mean difference = 1.8%). Participants' target performance was marginally higher after the switch compared to before the switch, regardless of the location of the switch. Furthermore, the main effect of Attended Frequency was significant (F(1, 59) = 38.67, p < .0001, η²g = 0.07; mean difference = 6.3%), showing that participants detected slightly more targets during RSVPs shown at the 4 Hz as compared to the 6 Hz rate. All other effects were not significant (Fs < 3.24, ps > .08, η²g < 0.002). A similar pattern of differences was observed for the reaction times: we found a significant main effect of Switch Time (F(1, 59) = 35.29, p < .0001, η²g = 0.02; mean difference = −16.7 ms), with generally slightly faster reaction times to the targets in the trial interval after the switch, and a significant main effect of Attended Frequency (F(1, 59) = 21.76, p < .0001, η²g = 0.02; mean difference = −15.4 ms), with slightly shorter response times to the targets when the RSVPs were presented at 4 Hz; all other main effects or interactions were not significant (Fs < 3.95, ps > .052, η²g < 0.0008). Together, these results suggest a slight bias in performance in the post-switch part of the trial overall as well as when visual streams were displayed at the slower presentation rate of 4 Hz, regardless of whether the attended RSVP streams were of emotional or neutral content.
No statistically significant effects were found for false alarms (Fs < 1.95, ps > .17, η²g < 0.003), with an average false alarm rate of only 1.04% across all conditions.
FIGURE 4 RESS components after Fourier transform are presented for each experimental condition as boxplots of absolute mean difference values (amplitudes in the time window before minus the time window after the change in emotional content), given in arbitrary units. Boxes represent the interquartile range, while whiskers extend 1.5 times above and below the interquartile-range limits. Horizontal lines inside the boxes reflect the median. The absolute mean difference scores of individual participants are overlaid as circles. Change in Content signifies whether the RSVP stream changed from neutral to unpleasant content or remained neutral throughout the trial. A change in SSVEP amplitudes was reliably observed when a neutral RSVP stream switched to the presentation of an unpleasant one. This amplitude modulation did not interact with spatial attention, thus occurring automatically or independently of whether RSVP streams were attended or not (see text for more details).

| DISCUSSION
In this study, we investigated the extent to which SSVEP amplitude modulation by emotional image content occurs independently of the allocation of top-down spatial attention. We employed frequency-tagging to enable direct analysis of neural cortical processing in response to a change in the emotional content of attended and unattended RSVP streams of images shown in the left or right visual hemifield during a covert spatial cueing task. As expected, we found that emotional content modulated the SSVEP response. These results are in accordance with earlier accounts of SSVEP amplification during affective image presentation (Keil et al., 2003, 2008; Schettino et al., 2019; Wieser et al., 2011). Importantly, we found that facilitation of early visual cortical processing of emotionally unpleasant images occurred whether the switch to emotional content occurred in the attended or unattended visual hemifield, in line with previous evidence that preferential processing of emotional information is unaffected by attentional allocation (Carretie, 2014; Pourtois & Vuilleumier, 2006). The SSVEP responses during the first phase of the RSVP streams, that is, when both streams consisted of neutral images, clearly showed that participants complied with the cue instruction: they shifted attention to the cued visual hemifield to perform the task. This excludes the possibility that subjects attended to both streams simultaneously, which was also confirmed by the behavioral data (see below). Together, the present SSVEP findings accord with previous electrophysiological and neuroimaging studies proposing that emotionally salient information leads to an involuntary or "automatic" spatial attentional orienting toward emotional cues, resulting in facilitated stimulus processing without explicit instruction to attend (Anderson et al., 2003; Armony & Dolan, 2002; Öhman et al., 2001; Pourtois et al., 2004; Pourtois & Vuilleumier, 2006).
In the framework of "emotional" or "motivated" attention (Lang & Bradley, 2010; Lang et al., 1997; Vuilleumier, 2005), emotionally laden visual scenes activate multiple neural networks which heighten perceptual processing to facilitate adaptive behavior. Our results suggest that the networks mediating the early visual cortex response to the change in affective salience and the networks subserving top-down voluntary attention may be at least partially independent. Affective cues may facilitate processing efficiency of emotionally relevant input through gain control mechanisms similar to those employed by top-down voluntary attention. Previous analyses of brain connectivity during sustained presentation of flickering images suggested that sensory gain may increase the neural representation of emotionally salient features in early visual areas, conveyed via re-entrant feedback projections from higher-order cortices and subcortical brain structures mediating the extraction of emotional content (cf. Keil et al., 2009; Wieser et al., 2016). This produces strong modulation of the SSVEP response with emotionally arousing as opposed to neutral visual stimuli, as observed in the present data. Top-down attention may operate in parallel, through neural circuits involving the amygdala, pulvinar, and fronto-parietal areas (cf. Pourtois et al., 2013).
TABLE 1 Bayes factors (BF) and percentage of proportional errors (% pe) for most relevant models obtained by using JZS priors
Importantly, our results replicate and extend our recent studies using similar frequency-tagging paradigms. We presented RSVPs of neutral and emotional images at 4 and 6 Hz rates as task-irrelevant distracters in various spatial layouts (Bekhtereva et al., 2019, 2020). Sensory amplification of early sensory areas was reliably observed for aversive relative to neutral streams presented in the left and right visual hemifields as task-irrelevant distracters (Bekhtereva et al., 2020). Moreover, attentional capture by the emotional RSVP stream occurred at no cost to perceptual processing of other simultaneously presented stimuli across the visual field. In addition, when a visual foreground task was presented simultaneously with distracting emotional and neutral RSVP streams in the background, emotionally arousing images enhanced early visual perceptual processing more strongly than neutral images (Bekhtereva et al., 2019). Notably, we found an increase in SSVEP amplitude in response to unpleasant RSVPs shown at 4 Hz and a decrease for emotional relative to neutral visual streams shown at the 6 Hz rate. As we argued earlier, the reversed emotional effect could not be attributed to enhanced processing of neutral as compared with emotional scenes (Bekhtereva et al., 2019), and was not driven by low-level image properties (i.e., color or spatial frequencies; Bekhtereva & Müller, 2015) or image valence (Bekhtereva et al., 2018). Instead, linear modelling suggested that the reverse emotional SSVEP effects across presentation rates were a consequence of linear superposition of ERP waveforms in response to the individual images constituting the RSVP stream (Bekhtereva et al., 2018; Capilla et al., 2011).
Specifically, given systematic differences in ERP amplitudes between affective and neutral images (Peyk et al., 2009; Schupp et al., 2004), linear superposition of the consecutive ERPs produces attenuation or enhancement of power at the fundamental frequency of the SSVEP through destructive or constructive interference, respectively (cf. Bekhtereva et al., 2018, 2020). Thus, the direction of the effect may depend on presentation rate (e.g., 4 vs. 6 Hz). We therefore examined the absolute magnitude of the emotion effect, which was comparable across 4 and 6 Hz RSVP streams for both the attended and unattended spatial locations.
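The linear superposition argument can be illustrated with a small simulation. The sketch below is not the linear modelling used in the cited studies. It assumes two purely hypothetical waveforms (`neutral_erp`, and an extra slow deflection `emotional_extra` evoked only by unpleasant images), sums time-shifted copies at 4 and 6 Hz, and extracts the complex Fourier coefficient at the presentation rate. Because the superposition is linear, the emotional coefficient is the sum of the neutral and the extra coefficients, so whether the emotional amplitude ends up larger or smaller than the neutral one depends on their phase difference at that rate.

```python
import cmath
import math

FS = 600  # samples per second; divisible by 4 and 6 so cycles align with samples


def neutral_erp(t):
    """Hypothetical ERP to a neutral image: damped 5 Hz oscillation (t in seconds)."""
    if t < 0.0:
        return 0.0
    return math.exp(-t / 0.15) * math.sin(2 * math.pi * 5.0 * t)


def emotional_extra(t):
    """Hypothetical slow deflection evoked only by unpleasant images, from 200 ms on."""
    if t < 0.2:
        return 0.0
    u = t - 0.2
    return 0.5 * (u / 0.1) * math.exp(-u / 0.1)


def emotional_erp(t):
    """Emotional ERP modeled as the neutral ERP plus the extra deflection."""
    return neutral_erp(t) + emotional_extra(t)


def fundamental(rate_hz, erp, seconds=1.0, warmup=3.0):
    """Complex Fourier coefficient at rate_hz of the steady-state sum of shifted ERPs."""
    first = -int(warmup * rate_hz)  # onsets before t = 0 let the response settle
    last = int(seconds * rate_hz)
    onsets = [k / rate_hz for k in range(first, last)]
    n_samples = int(seconds * FS)
    total = 0j
    for n in range(n_samples):
        t = n / FS
        signal = sum(erp(t - onset) for onset in onsets)  # linear superposition
        total += signal * cmath.exp(-2j * math.pi * rate_hz * t)
    return 2 * total / n_samples


for rate in (4, 6):
    neu = fundamental(rate, neutral_erp)
    extra = fundamental(rate, emotional_extra)
    emo = fundamental(rate, emotional_erp)
    gap = math.degrees(cmath.phase(extra / neu))
    print(f"{rate} Hz: |neutral|={abs(neu):.4f} |emotional|={abs(emo):.4f} "
          f"phase gap={gap:.1f} deg")
```

A phase gap near 0° corresponds to constructive interference (emotional amplitude above neutral) and a gap near ±180° to destructive interference (emotional amplitude below neutral). The kernel shapes and time constants here are arbitrary, so the printed values illustrate the mechanism only and are not estimates for the present data.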
The automatic attentional bias toward unattended emotional image streams in the present data is at odds with earlier reports showing that emotional content processing depends on spatial attention and is largely eliminated when attention is explicitly directed away from visual emotional cues (Brassen et al., 2010; Eimer et al., 2003; Mitchell et al., 2007; Pessoa et al., 2002; Silvert et al., 2007), particularly under conditions when attentional demands are so high as to fully consume attentional resources. In the current experiment we did not parametrically manipulate the difficulty of the visual task and, therefore, we cannot entirely exclude the possibility that under a more challenging task the observed SSVEP modulations might be attenuated or even eliminated at unattended spatial locations. Nevertheless, we previously found that perceptual task load did not impact SSVEP modulation by emotional material (Hindi Attar & Müller, 2012). In that study, participants performed either a detection (low load) or discrimination (high load) task while task-irrelevant neutral or emotional pictures drove a background SSVEP. Notably, unpleasant images attracted more attentional resources from the visual foreground task than neutral images, regardless of the level of task difficulty. Thus, task load may be a "weak modulator" of attentional biases at early visual stages of perceptual processing as reflected in the SSVEP response.
In the behavioral data, the present visual task did not show interference of affective content in the attended RSVP stream on task performance in terms of target detection accuracy or response times. There was a small modulation of reporting accuracy and reaction times in the time period after the change in image valence, with slightly faster reaction times (mean difference of 16.7 ms) and marginally more hits (mean difference of 1.8%), regardless of whether the attended RSVP stream was neutral or unpleasant. Because of the lack of interaction with emotional image valence, this effect may be due to a generic effect of a change in the image streams across the visual field, rather than a change in emotional valence per se. In addition, slightly higher hit rates (6.3% difference on average) with shorter response times (15.4 ms difference on average) were observed when RSVPs were shown at 4 Hz relative to 6 Hz. A similarly small bias in hit rates in a visual target detection task was observed in our recent experiment where the task was overlaid on the task-irrelevant RSVP streams of neutral and emotional pictures presented in the background at a 4 Hz relative to a 6 Hz rate (Bekhtereva et al., 2019). Thus, it may be slightly easier to detect a target displayed within a slower 4 Hz stream than in a 6 Hz RSVP stream. However, the present differences, although statistically significant, were minimal. The lack of behavioral distraction effects with emotional image presentation in similar RSVP protocols might be due to perceptual interference from visual masking (Bekhtereva et al., 2019; Keysers & Perrett, 2002). Fast periodic presentation of multiple pictures in a visual stream may have impaired analysis and extraction of image content, thus reducing the capacity of emotional cues to create distraction (see Bekhtereva et al., 2019 for more discussion on this).
Furthermore, the number of button presses in response to events occurring in the hemifield with the unattended RSVP stream was minimal, constituting around 1% of responses, with no statistically discernible differences in false alarms between experimental conditions. Together with the spatial attention effect as indexed by the SSVEP response modulation elicited by the to-be-attended versus the to-be-ignored neutral picture stream, this finding strongly suggests that subjects were compliant with the task and that the change to an unpleasant RSVP stream indeed pulled attention toward the unattended visual hemifield.
In addition, valence and arousal image ratings confirmed that emotionally negative pictures were perceived as more unpleasant and arousing than neutral images. For valence, we observed that unpleasant scenes were judged as slightly more aversive when a visual scene was briefly shown for 250 ms relative to a ~167 ms exposure time, with absolute average differences of 0.12 on a scale from 1 (very unpleasant) to 9 (very pleasant). These results closely mirrored our earlier findings (Bekhtereva et al., 2019, 2020). While it has previously been documented that picture exposure time can influence the subjective perception of emotional image valence and arousal (Codispoti et al., 2009), the observed differences in our earlier work using identical rating protocols and in the present study are extremely small. Nevertheless, even with very short image presentation durations, our findings highlight that emotional valence and arousal can be rapidly extracted from visual images.
In conclusion, the current experiment provides direct electrocortical measurement of early visual cortical activity in response to neutral or emotional distracter images presented in rapid visual streams at spatially attended and unattended locations in the left and right visual hemifields during a covert spatial cueing task. The valence-dependent SSVEP amplitude modulation clearly demonstrates that emotionally laden visual scenes reflexively draw visual processing resources, effectively overriding spatial attention. This finding is consistent with the notion of automatic attentional capture by emotionally significant stimuli and is in line with the account that SSVEP amplitude modulation by emotional valence may be partly triggered by re-entrant feedback projections onto lower-tier visual cortices from higher-order processing areas subserving the extraction of affective image content (Norcia et al., 2015; Wieser et al., 2016).