Unexpected sounds inhibit the movement of the eyes during reading and letter scanning

Novel sounds that unexpectedly deviate from a repetitive sound sequence are well known to cause distraction. Such unexpected sounds have also been shown to cause global motor inhibition, suggesting that they trigger a neurophysiological response aimed at stopping ongoing actions. Recently, evidence from eye movements has suggested that unexpected sounds also temporarily pause the movements of the eyes during reading, though it is unclear if this effect is due to inhibition of oculomotor planning or inhibition of language processes. Here, we sought to distinguish between these two possibilities by comparing a natural reading task to a letter scanning task that involves similar oculomotor demands to reading, but no higher-level lexical processing. Participants either read sentences for comprehension or scanned letter strings of these sentences for the letter ‘o’ in three auditory conditions: silence, standard, and novel sounds. The results showed that novel sounds were equally distracting in both tasks, suggesting that they generally inhibit ongoing oculomotor processes independent of lexical processing. These results suggest that novel sounds may have a global suppressive effect on eye-movement control.


| Motor inhibition following unexpected sounds
While the orienting response appears to be an important determinant of deviance distraction, recent evidence has indicated that unexpected events may also trigger an automatic inhibition of motor actions. For example, unexpected sounds temporarily suppress corticospinal excitability (Wessel & Aron, 2013) and induce a reflexive diminution of the force exerted on a transducer held between two fingers (Novembre et al., 2018). These and several other studies (Dutra et al., 2018; Iacullo et al., 2020; Wessel, 2017; Wessel & Aron, 2017) suggest that unexpected events trigger an automatic and global inhibition of ongoing motor actions.
The motor inhibition and attention orienting responses are thought to form an automatic cascade of events that follows all unexpected events (Wessel, 2018). According to Wessel (2018), the violation of expectations triggers an automatic inhibition of motor and cognitive processes, which in turn facilitates the subsequent attention orienting response. The inhibitory response is thought to be implemented via a fronto-basal ganglia network, including the right inferior frontal cortex (rIFC), the pre-supplementary motor area (pre-SMA), and the subthalamic nucleus (STN). However, because both the inhibitory and attention orienting responses activate the rIFC, the two may form an inseparable part of the automatic cascade that follows unexpected sounds (Diesburg & Wessel, 2021; Wessel, 2018).
We recently tested whether a similar motor inhibition response may also occur in oculomotor behavior (Vasilev et al., 2019; Vasilev et al., 2021) and showed that the movement of the eyes is inhibited by unexpected sounds. More specifically, we showed that unexpected sounds lead to an immediate increase in fixation durations during reading. This finding is consistent with the notion that unexpected sounds trigger motor inhibition (Wessel & Aron, 2017), though it is not possible to rule out alternative explanations, such as the possibility that the effect may be due to an inhibition of language processes. More broadly, it is also not known whether unexpected sounds have a "global" inhibitory effect on eye movements, as such inhibition has been observed only within the confines of a reading task.
In the present study, we sought to determine whether oculomotor inhibition following unexpected sounds is completely independent from the lexical processing of words that occurs during reading. If this is the case, it would suggest that unexpected sounds may trigger an inhibitory neurophysiological response aimed at stopping ongoing eye-movement actions. We first briefly review some basic findings from eye movements during reading and other similar tasks and then consider how these tasks can be used to infer the nature of eye-movement inhibition by unexpected sounds.

| Eye-movement control during reading and letter scanning
When reading a text, the eyes alternate between short periods of relative stability (fixations) and quick ballistic movements (saccades). Fixations typically last some 225-250 ms (Rayner, 2009) and allow readers to take in high-resolution visual information from the text. Saccades, on the other hand, move the eyes to new points of interest, typically by some 7-8 characters (Rayner, 1978; Yang & Vitu, 2007). Most reading saccades are progressive and bring the eyes forward in the reading direction. However, about 5%-20% of all saccades are regressions that bring the eyes back to previously read words (Inhoff et al., 2019). Regressions can occur for different reasons, such as comprehension difficulties, incomplete word processing, or low-level visual-motor processes (Vitu, 2005). While most words in the text receive at least one fixation, some are skipped and are never directly fixated. Content words are usually skipped about 15% of the time, but function words are skipped around 65% of the time (Rayner, 2009). Skipping is more common when words are shorter and more predictable from the previous text context (Balota et al., 1985; Kliegl et al., 2004; Rayner, 1979, 1986; Rayner et al., 2011; Rayner & Well, 1996). When words are fixated, the initial fixation typically lands close to the word's center but shifts further to the left with increasing word length and more distant launch site of the previous saccade (McConkie et al., 1988; Radach & McConkie, 1998; Rayner, 1979; Vitu et al., 1990).
There is now abundant evidence showing that fixation durations are sensitive to the ongoing cognitive processing of the text (Rayner, 1998, 2009). For instance, fixations are shorter when readers are fixating on high-frequency words compared to low-frequency words (Inhoff & Rayner, 1986; Rayner & Duffy, 1986; Schilling et al., 1998; White, 2008). In addition, fixations are shorter when readers fixate on words that are highly predictable from the previous context, compared to words that are not predictable (Kliegl et al., 2006; Luke & Christianson, 2016; Rayner et al., 2005; Rayner & Well, 1996). These and other findings have suggested that the cognitive processing of the text can exert a direct influence on eye movements (Rayner & Reingold, 2015). Consequently, most modern models of reading (e.g., Engbert et al., 2005; Reichle et al., 2009; Snell et al., 2018) assume that eye movements reflect the complex interplay between the cognitive processes involved in extracting information from the text and the lower-level oculomotor processes involved in planning saccades from one location to the next.
Interestingly, oculomotor control during reading appears to share a lot of similarities with other tasks that resemble reading but do not involve any lexical processing. This has led some to suggest that there may be a general oculomotor program that is used for the spatial planning of saccades during reading-like tasks (Al-Zanoon et al., 2017). For instance, in the z-reading paradigm (Vitu et al., 1995) real sentences are converted into meaningless strings by replacing all characters with the letter "z" (e.g., The cat hunted the mouse -> Zzz zzz zzzzzz zzz zzzzz). This effectively preserves the spatial layout of the text but removes all higher-level linguistic information. Such studies have shown that eye movements during reading and z-string scanning are fairly similar in some aspects; for instance, saccade lengths and initial landing positions are generally the same (Nuthmann et al., 2007; Rayner & Fischer, 1996; Vitu et al., 1995). However, some important differences have been noted as well: z-string scanning leads to longer fixation durations (Al-Zanoon et al., 2017; Gagl et al., 2022; Rayner & Fischer, 1996; Vitu et al., 1995), increased skipping of longer words (Rayner & Fischer, 1996; Vitu et al., 1995), and fewer regressions (Nuthmann et al., 2007) compared to reading. Comparable results have also been obtained with studies using false fonts (Henderson & Luke, 2012, 2014; though see Luke & Henderson, 2016), which also remove all text meaning (including orthographic information, which is at least somewhat present in the z-reading paradigm).
While the theoretical interpretation of these results continues to be debated (Al-Zanoon et al., 2017; Nuthmann et al., 2007; Rayner & Fischer, 1996; Reichle et al., 2010; Vitu et al., 1995), it is hard to deny that both reading and letter scanning impose similar demands on the oculomotor system, particularly with regard to the spatial planning of saccades. For this reason, letter scanning can be used as an oculomotor "control" condition to reading, which preserves the spatial aspects of saccade planning but removes the higher-level linguistic processing of the text. In the present study, we used this oculomotor control condition to test whether the effect of unexpected sounds on reading is limited to the low-level oculomotor planning of saccades or whether it also influences the higher-level processing of lexical information in the text.

| Evidence from eye-movements
While there is evidence suggesting that unexpected sounds can affect pupil dilation (Liao et al., 2016; Marois et al., 2018; Marois et al., 2020; Marois & Vachon, 2018; Ríos-López et al., 2022; Wetzel et al., 2016; Zhao et al., 2019), surprisingly little is known about how they influence eye-movement responses. In one study, Graupner et al. (2007) presented visual and auditory distractors 100 ms into every fifth fixation, while participants were engaged in a picture viewing task. The auditory distractors consisted of either 17 "standard" sounds (i.e., the same sinewave tone repeated 17 times), 16 standard sounds and 1 pitch-deviant sound, or no distractors. They found that the presentation of a single auditory deviant led to an increase in fixation durations. The auditory deviant led to saccadic inhibition (i.e., reduction in the proportion of terminated fixations), which was observed first at around 90 ms and a second time around 150 ms after the sound's onset. Similarly, Widmann et al. (2014) reported that sound intensity and pitch deviants led to an inhibition of microsaccades (i.e., miniature saccades occurring within a fixation) in a sound categorization task some 142-148 ms after their presentation.
More recently, Vasilev et al. (2019) presented either five standard sounds or four standard sounds and one deviant sound as participants read short sentences. The sounds were played as soon as participants fixated specific words in the sentence. The results showed that the deviant sound (a burst of white noise) led to an increase in fixation durations immediately after its presentation. The effect appeared to originate some 180 ms after the sound onset. In addition, Vasilev et al. (2021) found that novel sounds are more distracting when presented approximately in the middle of fixation (i.e., with 120 ms delay), as opposed to the beginning of fixation (i.e., with 0 ms delay). This may occur because distraction occurs in the critical stages of saccade planning at the end of the fixation, though this evidence is not sufficient to rule out alternative explanations (e.g., interference with lexical word processing that occurs during fixations).

| Present study
To summarize, unexpected sounds have been shown to inhibit motor responses some 150 ms after their presentation (Iacullo et al., 2020; Wessel & Aron, 2013). Recent eye-tracking evidence has suggested that they may also inhibit the movement of the eyes, particularly during tasks such as reading. However, the exact reason why unexpected sounds lead to an increase in fixation durations during reading remains poorly understood.
One hypothesis derived from Wessel and Aron (2013, 2017) is that unexpected sounds may cause general inhibition of oculomotor control, thus inhibiting (i.e., freezing) the planning of eye movements for a short period of time. This would correspond to the initial motor inhibition process in the automatic cascade, where all unexpected sounds freeze ongoing motor actions to facilitate the processing of the unexpected event (Wessel, 2017). We will refer to this as the saccadic inhibition hypothesis (SIH).
An alternative explanation is that the prolonged fixation durations are due to interference in the lexical processing of words, which takes longer to complete either due to the shift of attention to the sound or the inhibition of cognitive processes as part of the automatic cascade. We will refer to this explanation as the lexical inhibition hypothesis (LIH). The LIH is a plausible explanation as lexical processing affects fixation durations (see above), and this processing is thought to start prior to the saccade planning stages (Engbert et al., 2005; Reichle et al., 2009). As the effect of unexpected sounds on eye movements may occur as early as 90-150 ms post sound onset (Graupner et al., 2007; Vasilev et al., 2019), it would still be within the time window of when lexical processing usually occurs. However, because both hypotheses predict that unexpected sounds will lead to an increase in fixation duration, it has been difficult to differentiate between the two.
Currently, there is no conclusive evidence for or against either hypothesis, but initial evidence seems to favor the SIH over the LIH. First, observational (covariate) analyses suggest that the distraction effect is not modulated by the lexical frequency of the fixated words (Vasilev et al., 2019), which suggests that it could be independent from ongoing word processing. Second, the effect is stronger when presented in the middle of fixations (Vasilev et al., 2021), pointing toward disruption in the later saccade planning stages. Nevertheless, there is no conclusive evidence that rules out the role of higher-level language processes in novelty distraction. In fact, more complex distractors such as speech and music are well known to interfere with language processing during reading (Hyönä & Ekholm, 2016; Meng et al., 2020; Yan et al., 2018; Zhang et al., 2018). Therefore, the present study sought to determine whether distraction by unexpected sounds is purely oculomotor in nature (i.e., related to the planning of eye-movements), whether it is mostly related to the lexical processing that occurs during reading, or whether it is some combination of the two. The last option is still an important one to consider because the two hypotheses are not mutually exclusive and lexical inhibition may occur first, followed by saccadic inhibition later.
We presented participants with two tasks: a reading task and a letter scanning task. In the reading task, participants read short sentences for comprehension. In the letter scanning task, the same sentences were transformed into z-strings. To ensure that participants had an explicit task, the letter o was randomly inserted into the string 1-4 times, and participants were instructed to scan the string and count how many times the letter o appeared in it. It is important to note that, while both tasks contained letter stimuli, only the reading task required the lexical processing of words. Therefore, even though the scanning task may also elicit some (likely limited) orthographic processing of the letters "z" and "o", it did not involve any lexical processing due to the lack of real words in the strings. Participants performed the task in three sound conditions: (1) silence (baseline); (2) five standard sounds; (3) four standard and one novel sound.
We expected that there would be a main effect of novelty distraction, with novel sounds leading to longer fixation durations compared to standard sounds (hypothesis (H) 1). If novelty distraction is purely oculomotor in nature (SIH), the effect should be present in both tasks (i.e., there should be no interaction between them), as they both involve similar oculomotor demands (H2). However, if the novelty distraction effect is purely related to interference in the lexical processing of words (LIH), there should be a sound by task interaction, with the effect present only in the reading task, but not in the scanning task (H3). This is because the reading task contained real words that could be lexically processed, but the scanning task did not. If the novelty distraction effect is present in both tasks, but is stronger in the reading task, this would suggest that it reflects a mixture of saccadic and lexical inhibition (H4).

| Participants
A total of 72 Bournemouth University students (55 females) participated for course credit. Their mean age was 20.2 years (SD = 4.2 years; range = 18-53 years). Participants were native English speakers, who reported normal hearing, normal (or corrected-to-normal) vision, and no reading disorders. Ethical approval was obtained from Bournemouth University (ID: 28216). Each participant provided informed written consent.
Sample size was determined a priori with power simulation using the simr package (Green & MacLeod, 2016) based on previous data (Vasilev et al., 2021). The simulations assumed an α level of 0.05, a 25% random data loss, a novelty distraction effect size of d = 0.164, and a potential interaction between sound and task type that is half that effect size. The results (see Figure 1b) showed that 50 participants are required to reach >80% power

[Figure 1. Illustration of the design. Recoverable content: the panel labels "Silence trials", "Reading task", "Scanning task", and "(120 ms delay)"; the example sentence "The new comet was discovered after analysing the latest data collected by the space probe."; and the corresponding letter string "Zzz zzz zzzzz zzz zzzzzzzzzz zzzzz zzzzzzzzz zoz zzzzzz zzzz zzzzzzzzz zz zzz zzzzz zzzzzz".]

(this was rounded off to the nearest counter-balanced number).

| Design and materials
The design is illustrated in Figure 1. The experiment had a 2 (task: reading, scanning) × 3 (sound: silence, standard, novel) within-subject design. There were 180 experimental items (30 per condition), and the assignment of items to conditions was counterbalanced with a full Latin square design (see the Data S1 for the full list of items). The two tasks were blocked and the order of the two blocks was counterbalanced across subjects. Within each task block, the sound conditions were also split into 2 sub-blocks: one block consisting only of the silence (control) trials, and another block consisting of the standard and novel (experimental) trials. The order of the two sub-blocks was also counterbalanced across subjects. In each standard and novel trial, five sounds were played. In the standard trials, five standard sounds were played. In novel trials, four standard sounds and one novel sound were played. The first sound was always a standard one to reactivate the representation of the standard in memory (Cowan et al., 1993). The novel sound was then played in one of the remaining positions (2-5) with equal probability across the experiment. The position of the novel sound was counter-balanced across subjects such that it occurred equally often at each position for each item. Thus, overall, each participant completed 30 trials per sound condition in each of the two tasks. Because the standard and silence trials resulted in more observations compared to the novel trials (where only 1 novel sound was presented per trial; see Figure 1a), we sampled one target word per trial from the standard and silence trials based on the design matrix used to counterbalance the presentation of novel sounds. This ensured that an equal number of observations per sound condition were included in the analysis (prior to trial exclusions) and that each sound condition occurred equally often for each item and for each target word position, resulting in a fully counter-balanced analysis.
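The Latin square assignment of items to conditions can be illustrated with a minimal sketch. The rotation rule, the function name `assign_conditions`, and the `list_id` parameter are assumptions made for illustration; the blocking of tasks and sub-blocks and the counterbalancing of novel sound positions are not modelled here.

```python
from collections import Counter
from itertools import product

TASKS = ["reading", "scanning"]
SOUNDS = ["silence", "standard", "novel"]
CONDITIONS = list(product(TASKS, SOUNDS))  # the 2 x 3 = 6 design cells


def assign_conditions(n_items, list_id):
    """Hypothetical rotation: item i is shown in condition (i + list_id) mod 6."""
    n_cond = len(CONDITIONS)
    return [CONDITIONS[(item + list_id) % n_cond] for item in range(n_items)]


# One counterbalancing list: 180 items spread evenly over the 6 cells.
assignment = assign_conditions(180, list_id=0)
counts = Counter(assignment)
for condition, n in sorted(counts.items()):
    print(condition, n)
```

With six such lists (`list_id` 0 to 5), every item appears once in every condition across lists, while each list still contains 30 items per condition.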

| Reading task
Participants were presented with single-line sentences to read for comprehension. There were 180 sentences in total (120 taken from Vasilev et al., 2019 and 60 new ones written for this experiment). The new and old sentences had comparable performance metrics and difficulty/naturalness ratings by an independent set of participants (see the Data S1). The sentences were 15.1 words long on average (SD = 1.83 words; range: 13-21 words). In each sentence, there were 5 "target" words on which the sounds were played (these were always the 3rd, 5th, 7th, 9th, and 11th word in the sentence). The targets were all arbitrary words in the sentence, but they were chosen to be longer in order to increase their first-pass fixation probability (see Kliegl et al., 2004; Rayner, 1979). The average target word length was 7.16 characters (SD = 1.96 characters; range = 3-14 characters). The targets were separated by one non-target word to increase the sound interstimulus interval. After 40% of sentences, a multiple-choice comprehension question with 4 options was presented. For example, in the sentence from Figure 1c

| Letter scanning task
The scanning task was loosely based on the z-string reading paradigm (Vitu et al., 1995). We preferred this paradigm over using false fonts (e.g., Henderson & Luke, 2012; Luke & Henderson, 2016), as it kept the stimuli in the two tasks visually more similar. The characters in the sentences from the reading task were replaced with the letter z, preserving capitalization and empty spaces (but not punctuation, to avoid distractors in the search task). The letter o was then randomly inserted 1-4 times in the string of zs (counter-balanced across conditions).¹ Participants were then instructed to scan the string of letters and count how many times the letter o appears in it. Each "sentence" was divided into four equal parts, and each letter o was inserted in one of these parts on a random "word", with the constraint that there was a maximum of one letter o per "word". In cases where there were 4 os (maximum), the last one was inserted after the final target to ensure that participants did not stop scanning the string after they had found all 4 of them. After 40% of trials, participants were asked to indicate how many os were present in the string (a multiple-choice question with 4 options corresponding to 1-4). This ensured that performance was assessed in the same format in both tasks.
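The construction of the scanning stimuli can be sketched as follows. This is an illustrative reconstruction, not the original stimulus script: the function names are hypothetical, punctuation is assumed to be replaced by z (so that string length is preserved), the position of the o within a "word" is chosen arbitrarily, and the additional constraint on the placement of the fourth o is not implemented.

```python
import random


def to_z_string(sentence):
    """Replace every non-space character with z, keeping capitalization and spaces.

    Assumption: punctuation is also turned into z rather than dropped.
    """
    return "".join(
        " " if ch == " " else ("Z" if ch.isupper() else "z")
        for ch in sentence
    )


def insert_targets(z_string, n_targets, rng):
    """Place n_targets (1-4) letters o, at most one per "word".

    The string is split into four contiguous parts and each o goes on a
    random "word" in a different part (a simplification of the procedure).
    """
    words = z_string.split(" ")
    n = len(words)
    parts = [list(range(k * n // 4, (k + 1) * n // 4)) for k in range(4)]
    parts = [p for p in parts if p]  # drop empty parts for very short strings
    for part in rng.sample(parts, n_targets):
        w = rng.choice(part)
        word = words[w]
        # Assumed: avoid the first character so the capital letter survives.
        pos = rng.randrange(1, len(word)) if len(word) > 1 else 0
        words[w] = word[:pos] + "o" + word[pos + 1:]
    return " ".join(words)


rng = random.Random(1)  # fixed seed only to make the sketch reproducible
z = to_z_string("The cat hunted the mouse")
print(z)
print(insert_targets(z, 2, rng))
```

Because each o replaces an existing character, the spatial layout of the string (word lengths and spaces) is unchanged, which is the property the scanning task relies on.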
An examination of the eye scanpaths (von der Malsburg, 2018; von der Malsburg & Vasishth, 2011) during the reading and scanning tasks suggested that participants scanned and read the sentences in a similar way, moving mostly left-to-right and making occasional regressions (see the Data S1 for more details).

¹ The exact parameters of the search task were piloted, and this configuration was found to result in oculomotor behavior that was most similar to the reading task.

All sounds were 120 ms long and were sampled at 44.1 kHz (16-bit, stereo). The standard sound was a 400 Hz sine wave with 10-ms fade-in/fade-out ramps. Sixty different environmental sounds (e.g., drill, phone ringing, engine, door) were used as novel sounds. These were adapted from Andrés et al. (2006) (originally taken from Escera et al., 1998). The novel sounds were randomly assigned to sentences for each participant.

| Apparatus
Eye movements were recorded with an Eyelink 1000 tracker at 1000 Hz. Viewing was binocular, but only the right eye was tracked (the left eye was used for four participants due to tracking problems). Each participant's head was stabilized with a chin-and-forehead rest. Visual stimuli were presented on a 24.5" Alienware 25 LCD monitor (resolution: 1920 × 1080; refresh rate: 60 Hz). Sounds were presented using Bose QuietComfort 25 noise-canceling headphones at 65 ± 1.5 dB(A). The experiment was performed on a Windows 7 PC, which had a Creative Labs Sound Blaster SB0770 sound card with a 12 ms output latency.² The experiment was programmed in Matlab 2014a (MathWorks, 2014) using the Psychophysics Toolbox v.3.11 (Brainard, 1997; Cornelissen et al., 2002; Pelli, 1997). Sentence stimuli were displayed in 18 pt Courier New monospaced font and appeared as black text over a white background. The text appeared on a single line in the middle of the screen vertically and with a 50-pixel offset horizontally. The width of each letter was 14 pixels. The monitor width was 54 cm and the eye-to-screen distance was 62 cm. Each letter subtended 0.343° horizontally.

| Procedure
The experiment started with a 3-point horizontal calibration. Calibration accuracy was kept at <0.3° across the experiment. Drift checks were presented before every trial and participants were recalibrated whenever necessary, but at least every 45 trials. Beeps during calibration/drift check were turned off. In the reading task, participants were instructed to read the sentences for comprehension. In the scanning task, participants were instructed to scan the letter strings and count how many times the letter "o" appears in the string. Participants were not given specific instructions on how to scan the strings (or asked about their strategy at the end), as the pilot study suggested that most people adopt the same left-to-right scanning strategy that is typical for reading. In both tasks, participants were instructed to ignore any sounds they may hear and focus on the task at hand. Each task started with six practice trials (done in silence), followed by the experimental trials. Participants were offered 3 breaks at equal intervals.
Each trial started with a black gaze box (40 × 40 pixels) centered at the first letter in the text. Once a stable fixation inside the box was detected, it disappeared, and the sentence/letter string was presented. The sounds were played using the auditory boundary-change technique (Inhoff et al., 2002; Rayner, 1975). An invisible boundary was placed at the start of the empty space prior to the target. Once the eye crossed the boundary to the right, the sound was played after a 120 ms delay (Vasilev et al., 2021). The delay was inserted so that the sound was played, on average, approximately in the middle of the next fixation, which is when novelty distraction is stronger (Vasilev et al., 2021). Empirically, the sound occurred on average 118.3 ms after fixation onset, close to the desired value of 120 ms. Participants pressed the left button of the mouse to terminate the trial and to answer the task questions.
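The logic of the gaze-contingent sound trigger can be sketched in a few lines. This is only a schematic illustration under simplifying assumptions: the function name, the `(time_ms, gaze_x)` sample format, and the example values are hypothetical, and the actual experiment was run in Matlab with the Psychophysics Toolbox rather than with this code.

```python
def sound_onset_time(samples, boundary_x, delay_ms=120):
    """Return the scheduled sound onset (in ms), or None if never triggered.

    samples: chronologically ordered (time_ms, gaze_x) pairs (hypothetical format).
    The trigger fires on the first sample at which gaze moves from the left
    of the invisible boundary to its right; the sound follows after delay_ms.
    """
    prev_x = None
    for time_ms, gaze_x in samples:
        if prev_x is not None and prev_x < boundary_x <= gaze_x:
            return time_ms + delay_ms
        prev_x = gaze_x
    return None


# A forward saccade that crosses a boundary placed at x = 150 pixels.
gaze = [(0, 100), (1, 105), (2, 140), (3, 180), (4, 182)]
print(sound_onset_time(gaze, boundary_x=150))
```

Because the boundary is crossed during the saccade, the delay places the sound roughly 120 ms into the fixation that follows, which matches the timing described above.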

| Data analysis
Two types of analyses were conducted: (1) a "global" analysis of eye-movements across the whole trial to investigate possible differences in oculomotor control between the two tasks (duration of all fixations, number of fixations per word, saccade length, initial landing position, first-pass skipping probability, and regression probability); and (2) a "local" analysis of the first fixation duration during which the sound is played, to investigate the novelty distraction effect.³ This is because the novelty distraction effect has been found to be immediate and constrained only to the initial fixation (Vasilev et al., 2019, 2021). In the analyses, a word was considered fixated if the average gaze point during a fixation fell within the pixel coordinates of the word on the screen (including the empty space immediately before it). Conversely, a word was considered skipped if it did not receive a fixation during first-pass reading (i.e., the initial left-to-right reading of the sentence) and the eyes landed on another word to its right.

² Due to hardware failure, the last nine participants were tested on a Windows 10 PC with a Creative Sound Blaster Z sound card (with all other equipment being equal). The audio timing was tested with the Black Box Toolkit v.2 and adjusted to be identical to that of the original set-up.

³ To limit data loss and because the targets were just arbitrary words in the sentence, the first fixation data included cases where the target was skipped and another word was fixated. However, excluding these target word skips from the data did not change the results or conclusions (see the Supplemental Materials).
The data were analyzed with (Generalized) Linear Mixed Models in R v.4.2.1 (R Core Team, 2022).⁴ Treatment contrast coding was used for the Sound condition (baseline: standard) and sum contrast coding was used for the Task condition (reading = 1; scanning = −1). Fixation durations were log-transformed in the models. Participants and items were added as random intercepts in the models (Baayen et al., 2008), and we attempted to add the sound and task factors as random slopes (Barr et al., 2013). If the models failed to converge, the slopes were removed one by one until convergence was achieved (the exact structure for each model is reported in the Results). The results were considered as statistically significant if the |t| or |z| values were ≥1.96. Bayes factors were also calculated for the first fixation duration model testing the present hypotheses (see the Data S1 for more details). In addition, empirical effect sizes are reported in Cohen's d (Cohen, 1988).
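The contrast coding described above can be made concrete with a small sketch of the fixed-effects design row for each cell. The analyses themselves were run in R; the Python dictionaries and the function `design_row` below are illustrative assumptions, and random effects are omitted.

```python
import math

# Treatment coding for Sound with "standard" as the baseline:
# (silence vs. standard, novel vs. standard)
SOUND_CODES = {"standard": (0, 0), "silence": (1, 0), "novel": (0, 1)}

# Sum coding for Task, as stated in the text.
TASK_CODES = {"reading": 1, "scanning": -1}


def design_row(sound, task):
    """Fixed-effects row: intercept, silence, novel, task, silence:task, novel:task."""
    silence, novel = SOUND_CODES[sound]
    t = TASK_CODES[task]
    return [1, silence, novel, t, silence * t, novel * t]


def log_duration(duration_ms):
    """Fixation durations enter the models on the natural-log scale."""
    return math.log(duration_ms)


for task in TASK_CODES:
    for sound in SOUND_CODES:
        print(task, sound, design_row(sound, task))
```

Under this coding, the novel coefficient estimates the novel vs. standard difference averaged over the two tasks, and the novel-by-task coefficient equals half of the difference in that effect between reading and scanning, which is the term that distinguishes H2 from H3 and H4.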
During the preprocessing of global reading measures, 4.12% of fixations were removed due to blinks and 2.12% of fixations were removed as outliers (<80 or >1000 ms). This left 93.76% of the data for analysis. In the preprocessing of the first fixation duration data, 18.1% was removed because the boundary was not crossed in a forward saccade or the trigger to play the sound occurred more than 10 ms after fixation onset, 4.93% was excluded because of blinks, 4.69% was excluded due to boundary "hooks" (i.e., a drift of the eye to the right of the boundary during fixation improperly triggers the boundary), and 0.48% was excluded as outliers (<80 or >1000 ms). This left 71.8% of the data for analysis.
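As a bookkeeping check, the exclusion percentages reported above add up to the stated amounts of retained data. The short sketch below also shows the outlier criterion as a simple filter; the function and dictionary names are hypothetical, and treating durations of exactly 80 or 1000 ms as retained is an assumption.

```python
def keep_fixation(duration_ms, lower=80, upper=1000):
    """Keep fixations that are not outliers (<80 or >1000 ms are removed)."""
    return lower <= duration_ms <= upper


# Percentages of data excluded, as reported in the text.
global_exclusions = {"blinks": 4.12, "outliers": 2.12}
first_fixation_exclusions = {
    "boundary not crossed forward / late trigger": 18.1,
    "blinks": 4.93,
    "boundary hooks": 4.69,
    "outliers": 0.48,
}

global_retained = 100 - sum(global_exclusions.values())
first_fixation_retained = 100 - sum(first_fixation_exclusions.values())

print(round(global_retained, 2))          # 93.76
print(round(first_fixation_retained, 1))  # 71.8
```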
The distribution of the first fixation data post exclusions was as follows. In the reading task, there were, on average, 23.8 fixations per subject in the silence condition (SD = 3.60; range = 13-30), 20.6 fixations per subject in the standard condition (SD = 3.91; range = 9-29), and 19.9 fixations per subject in the novel condition (SD = 4.44; range = 9-28). In the scanning task, there were, on average, 24.1 fixations per subject in the silence condition (SD = 3.89; range = 12-30), 20.1 fixations per subject in the standard condition (SD = 4.62; range = 8-27), and 20.4 fixations per subject in the novel condition (SD = 4.81; range = 8-29). Collapsed across subjects, there were no significant differences between the number of trials included per condition, χ²(2) = 0.95, p = 0.6225.

| Global analysis of eye-movement behavior at the task level
Descriptive statistics of global measures are shown in Table 1 and are visualized in Figure 2. (G)LMM results are reported in Table 2. The main effect of task was statistically significant in all models: the scanning task led to longer durations of all fixations (d = −0.08), fewer fixations per word (d = 0.09), longer saccades (d = −0.02), initial landing positions further to the right of the word start (d = −0.06), more first-pass word skipping (d = −0.10), and fewer regressive saccades (d = 0.08). Despite this, the effect sizes were small, indicating only mild differences between the two tasks. The difference in fixation durations between the two tasks may appear surprising as no such difference was observed in total trial durations. However, this is likely because total trial duration is an aggregate measure that is noisier and less specific than individual fixation durations.
The difference between the silence and standard sound condition also reached significance in half of the models: standard trials had 0.01 fewer fixations per word (d = −0.015), saccades in standard trials were 0.14 characters longer (d = 0.014), and words in standard trials had a 0.006 lower first-pass skipping probability (d = 0.011) compared to the silence trials (i.e., ≈0.09 fewer words skipped per trial, on average). There were no differences between the novel and standard sounds in any global reading measures.
In addition, there were significant interactions between task and silence vs. standard sound for saccade length, skipping, and regression probability.In the saccade length model, the interaction was due to the saccades being 0.12 characters shorter in standard sounds compared to silence for the reading task (d = −0.014),but 0.42 characters longer in standard sounds compared to silence for the scanning task (d = 0.044).In the skipping probability model, standard trials led to 0.002 less skipping compared to the silence baseline in the reading task (d = −0.006;≈0.03 fewer words skipped per trial), but 0.014 more skipping in the scanning task (d = 0.028; ≈0.21 more words skipped per trial).Finally, while the standard sound led to a reduction of 0.003 in regression probability compared to silence in the reading task, it led to a 0.008 increase in regression probability compared to silence in the scanning task (i.e., −0.3% and +0.8% change in regressions, respectively).Clearly, while the interactions were statistically significant due to the large number of observations, their effect sizes were very small and likely of limited practical significance.
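The per-trial skipping figures quoted alongside the probability differences can be reproduced with simple arithmetic. The sketch below is only an illustration: the sentence length of 15 words is an assumption inferred from the ratio of the reported values (e.g., 0.09 / 0.006), not a number stated in this section, and the function name is hypothetical.

```python
# Assumption: about 15 words per sentence, inferred from the reported
# ratios (0.09 / 0.006, 0.03 / 0.002, 0.21 / 0.014); not stated in the text.
ASSUMED_WORDS_PER_TRIAL = 15


def words_skipped_per_trial(prob_diff, words_per_trial=ASSUMED_WORDS_PER_TRIAL):
    """Convert a difference in per-word skipping probability into the
    expected difference in the number of words skipped per trial."""
    return prob_diff * words_per_trial


# Magnitudes of the skipping-probability differences (standard vs. silence)
reported_differences = {
    "both tasks combined": 0.006,
    "reading task": 0.002,
    "scanning task": 0.014,
}

for label, diff in reported_differences.items():
    print(f"{label}: {words_skipped_per_trial(diff):.2f} words per trial")
```

Under this assumption, the three differences map onto roughly 0.09, 0.03, and 0.21 words per trial, which matches the approximate values given in the text.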
To summarize, the scanning task led to fewer, but longer fixations, longer saccades, more distant initial landing positions, greater first-pass skipping, and fewer regressions compared to reading.The methodological control of silence versus standard sounds showed significant differences or interactions with task in four measures, though the estimated effect sizes from the models were only marginally different from zero.Standard sounds led to slightly fewer fixations per word compared to silence.For   the other three measures, standard sounds appeared to have a different effect based on the task.While they led to slightly shorter saccades, less skipping, and fewer regressions during reading, the had the opposite effect during scanning (i.e., longer saccades, more skipping, and more regressions).This shows very mild (though statistically significant) differences in global reading behavior when the standard sounds were playing.

| Local analysis of the novelty distraction effect on first fixation duration
The first fixation duration during which the sound is played was analyzed to test the hypotheses of the present study. The descriptive statistics are shown in Table 3, and the LMM results are shown in Table 4. Consistent with H1, there was a main effect of novelty distraction (Novel vs. Standard), indicating that fixation durations were longer following novel compared to standard sounds, d = 0.205. There was no difference between Silence and Standard sounds. Critically, there was no main effect of task and no interactions with Sound. The Bayes factors also indicated substantial evidence (Jeffreys, 1961; Wetzels et al., 2011) in support of the lack of main effect of task and interaction between task and sound. This supports H2 and suggests that novel sounds were equally distracting in both tasks (see Figure 3). Therefore, H3 and H4 are rejected by default.
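For readers unfamiliar with the verbal labels attached to Bayes factors, the following sketch maps a BF10 value onto the evidence categories commonly attributed to Jeffreys (1961) as summarized by Wetzels et al. (2011). The function name and the example values are hypothetical; the Bayes factors actually obtained in the study are those reported in Table 4.

```python
def jeffreys_label(bf10):
    """Return a verbal evidence label for a Bayes factor BF10
    (alternative vs. null), using the commonly cited thresholds
    of 3, 10, 30, and 100."""
    if bf10 <= 0:
        raise ValueError("A Bayes factor must be positive.")
    if bf10 == 1:
        return "no evidence"
    favoured = "alternative" if bf10 > 1 else "null"
    # Express the strength on the same scale for both directions.
    strength = bf10 if bf10 > 1 else 1 / bf10
    thresholds = [
        (3, "anecdotal"),
        (10, "substantial"),
        (30, "strong"),
        (100, "very strong"),
    ]
    for upper, label in thresholds:
        if strength < upper:
            return f"{label} evidence for the {favoured}"
    return f"decisive evidence for the {favoured}"


# Hypothetical values, for illustration only
for example in (0.2, 0.5, 50):
    print(example, "->", jeffreys_label(example))
```

On this scheme, a BF10 between 1/10 and 1/3 corresponds to "substantial" evidence for the null hypothesis, which is the sense in which the term is used above.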

| DISCUSSION
Previous studies have suggested that unexpected sounds during reading lead to an immediate increase in fixation durations, but the exact mechanism causing this has not been conclusively determined. We contrasted two viable hypotheses: (1) a saccadic inhibition hypothesis stating that novel sounds cause oculomotor inhibition (i.e., interruption of the planning of saccades) and (2) a lexical inhibition hypothesis stating that they inhibit the lexical processing of words in the text. Because both processes are involved in reading and an interruption of either process would result in longer fixation durations, it has been difficult to isolate the origin of the effect. The present study sought to address this issue by comparing natural reading to an oculomotor control condition of letter scanning, which involves similar oculomotor demands to reading but requires no lexical processing of words. If the distraction is equivalent in both tasks, it would favor the saccadic inhibition hypothesis as both tasks involve similar oculomotor control and any disruption should arise from processes that are shared between the tasks. However, if the effect is present mostly in the reading task, this would favor the lexical inhibition hypothesis as this is the only task that involves lexical processing of words.
The results from the study were quite clear: novel sounds led to an increase in fixation durations compared to standard sounds, thus replicating previous work (Vasilev et al., 2019, 2021). Crucially, however, there was no interaction between novelty distraction and task, indicating that novel sounds resulted in the same amount of distraction in both tasks. Therefore, this supports H2 that novel sounds lead to equivalent distraction in both tasks. On the other hand, H3 (stating that the effect is present only in the reading task) and H4 (stating that the effect is present in both tasks, but is stronger in the reading one) are both rejected. This result is consistent with the saccadic inhibition hypothesis and the idea that novel sounds cause rapid general inhibition of motor actions (Wessel & Aron, 2013, 2017). Clearly, these results indicate that novelty distraction can occur in the absence of any higher-level language processing (beyond the low-level orthographical processing of the two letters in the search task). This also agrees with previous results from scene viewing (Graupner et al., 2007) and categorization tasks (Widmann et al., 2014).
Nevertheless, it is still possible that language-related disruption may occur later in time (after the initial fixation when the sound is played). To test if this is the case, we did a post-hoc test of whether regression probability was affected on the subsequent fixations after playing the sound. The results (presented in Data S1) showed that regression probability remained the same between the standard and novel sounds and there was no evidence of disruption.
Interestingly, there was no main effect of task during the critical fixation when the sounds were played, which suggests that the experiment was successful in creating very similar conditions for comparing the effect of novel sounds. In fact, as Figure 3a shows, the two tasks resulted in virtually identical fixation durations for the key fixation used to test the present hypotheses. Nevertheless, at the level of the whole trial, modest (but statistically significant) differences emerged in all global reading measures. More specifically, the scanning task led to longer fixation durations, fewer fixations per word, longer saccades, more distant initial landing sites, more first-pass skipping, and fewer regressions. We will briefly consider the meaning of these results before returning to the key findings.
The increase in fixation durations across the whole trial is not surprising as letter scanning of z-strings is well-known to result in longer fixations compared to reading (Al-Zanoon et al., 2017; Gagl et al., 2022; Nuthmann et al., 2007; Rayner & Fischer, 1996; Vitu et al., 1995). Perhaps the more surprising finding was that the task effect was only a modest 6 ms difference, whereas previous studies have reported much larger differences of 30 to 50 ms (Al-Zanoon et al., 2017; Gagl et al., 2022; Nuthmann et al., 2007; Rayner & Fischer, 1996). We speculate that this may be due to the characteristics of our scanning task, which emphasized the need to continuously scan the strings, look for the target letters and hold the number of letters in working memory, which may have created more similar conditions to reading than in previous studies. Indeed, most other studies have typically not included a specific task and simply instructed participants to scan the strings and "pretend as if they are reading" (though some studies did include a search task condition for a single letter, e.g., Vitu et al., 1995). At any rate, the increase in fixation durations was mild and only statistically significant when analyzing all fixations in the trial. A closer examination of the fixation data (see Figure S7) revealed that the difference between the two tasks was most pronounced for shorter words. As the target words were generally longer (7.16 characters on average), this may help explain why there were no task differences in the critical fixation during which the sound was played.
The differences in word skipping (Rayner & Fischer, 1996; Vitu et al., 1995) and regression probability (Nuthmann et al., 2007) also replicate previous findings and likely reflect the lack of higher-level language processing in the scanning task, which arguably reduced the need for regressions and first-pass fixations on the words. Interestingly, the present research also found significant differences in saccade length and initial landing positions between the two tasks, whereas previous research has generally not (Nuthmann et al., 2007; Rayner & Fischer, 1996; Vitu et al., 1995). We speculate that such differences may simply reflect the higher statistical power of the present experiment, as both effect sizes were quite small (d = −0.02 for saccade length and d = −0.06 for landing positions). In summary, while clear differences between the two tasks could be observed in global eye-movement measures, the two tasks were nevertheless broadly comparable in terms of their oculomotor demands and no reliable differences were observed in the critical fixation used to test the present hypotheses.

| Eye-movement distraction by unexpected sounds
The main contribution of the present study was to show that novelty distraction occurs in the absence of lexical processing and that the effect is not limited to reading, but also occurs in other saccadic tasks that rely on similar scanning of visual information. This suggests that the effect is likely oculomotor in nature and not related to the ongoing lexical processing of the text. This result is broadly consistent with Wessel and Aron's (2013, 2017) theory that unexpected events cause global motor inhibition of responses and other evidence suggesting that surprising or threatening stimuli cause behavioral freezing through changes in physiological responses such as reduced heart rate (e.g., Noordewier et al., 2021; Roelofs, 2017).
As discussed previously, this saccadic inhibition response may reflect general motor inhibition during the automatic cascade following the violation of expectations by novel sounds. This inhibitory response may stop ongoing actions to facilitate the processing of the unexpected sound (Wessel, 2017; Wessel & Aron, 2017). In the context of eye-movements, this may be advantageous as it stops the eyes from sampling new information, thereby reducing cognitive load and allowing for more time and attentional resources to process the unexpected sound.
The motor inhibition response has been argued to recruit a fronto-basal action-stopping network, including the rIFC, the pre-SMA, and the STN (Wessel & Aron, 2013, 2017). The automatic cascade following the detection of unexpected events (which includes both motor inhibition and attention orienting) may be triggered by the rIFC and implemented via a hyperdirect pathway to the STN and basal ganglia (Diesburg & Wessel, 2021). The STN may play a modulating role in the network and have a downstream suppressive effect on thalamo-cortical structures (Wessel & Aron, 2017). We speculate that the same hyperdirect network could also be recruited during eye-movement control, with the STN relaying downstream inhibitory signals. The STN contains visual-motor neurons related to saccadic activity (Fawcett et al., 2005; Matsumura et al., 1992) and influences fixation control via an indirect pathway to the superior colliculus (SC) (Hikosaka et al., 2000). Therefore, the activation of the fronto-basal network may lead to downstream inhibition of the SC, temporarily inhibiting eye movements. However, whether this is the case remains to be tested.
Nevertheless, it is interesting to note that sudden visual changes (e.g., visual flickers of the screen lasting for 33 ms) inhibit saccades some 60-70 ms after their presentation, whereas short auditory stimuli (e.g., beeps) generally do not (Reingold & Stampe, 1999, 2002, 2004). The visual saccadic inhibition effect has been argued to be reflexive in nature and to originate in the superior colliculus due to its quick onset (Reingold & Stampe, 2004). On the other hand, the absence of the same effect in the auditory domain has been taken as evidence that sudden sounds do not yield the same rapid, reflexive inhibition of eye-movements. The present study corroborated this, as standard sounds did not lead to a significant increase in fixation durations compared to silence. This is also in line with other previous studies (Graupner et al., 2007; Vasilev et al., 2019; but see Pannasch et al., 2001). Therefore, this clearly suggests that the saccadic inhibition effect in the present study occurred due to the violation of sensory predictions by the novel sounds and not due to a general response to sudden auditory distractors. Because the novelty inhibition effect has been found to occur later in time (90-180 ms; Graupner et al., 2007; Vasilev et al., 2019, 2021) compared to the visual inhibition effect reported by Reingold and Stampe (2004), it is possible that it recruits higher-level basal and cortical structures. However, further research is needed to find the exact neural origin of the effect.
If the novelty distraction observed in the present study reflects only motor inhibition and the orienting response occurs afterwards, one may wonder why no additional evidence of distraction has been found beyond the immediate increase in fixation durations. One possibility is that our reading task may not be sensitive enough to detect such a transient shift of attention. A more sensitive measure may be a visual discrimination task that participants perform immediately after the sound is played. Incidentally, the scanning task in the present study presented a similar opportunity, as the strings on which the sound was played either contained the target (letter "o") or they did not. A post-hoc analysis indicated that there was a trend toward novel sounds being more distracting when the string contained a target, but the result was not statistically significant (see Table S3). Therefore, there was no reliable evidence for distraction beyond the immediate increase in fixation durations, but more research is needed to explore this.

| Limitations
One limitation of the present study is that the two tasks may have differed in working memory load. While the scanning task required participants to hold the number of targets (letter 'o') in working memory, the reading task did not have such an explicit requirement. Because higher working memory load may attenuate the amplitude of the P3a response (Berti & Schröger, 2003; Lv et al., 2010; SanMiguel et al., 2008), the observed distraction in the scanning task could potentially be smaller. Future studies could equalize memory load by requiring participants to hold a number in memory while reading the sentence. Nevertheless, because reading also requires working memory to parse the sentence (Daneman & Carpenter, 1980; Lewis & Vasishth, 2006), we would expect memory processes to be engaged at least to some extent. Therefore, both tasks likely involved working memory, but only the scanning task required participants to maintain a single item in memory for the duration of the trial.
In addition, because the scanning task contained letters, we cannot rule out that unexpected sounds disrupted the lower-level orthographic processing of letters. These processes are thought to occur early on in fixation durations and to precede the later lexical word recognition (e.g., Reichle et al., 2009). However, because the scanning task contained only two letters ("z" and "o"), with the former being highly repetitive, we speculate that such orthographic processes may be limited. In fact, participants may have mostly relied on the shape of letters for scanning, effectively looking for the letter that "pops out" among the uniform distractor of zs (e.g., Treisman & Gormican, 1988; Wang et al., 1994). At any rate, while we cannot rule out that orthographic processes were affected, the present data clearly suggest that novelty distraction occurs independently of the lexical processing of words, as no such information was present in the scanning task.
Finally, because the sounds were played contingent on fixating specific words in the sentence, this may have resulted in the learning of an association between sound processing and eye movements. In the present data, only 16% of participants reported being aware of a link between sounds and eye movements, and there was no evidence to suggest that this affected the results (see Data S1). However, such an association may exist outside the participants' awareness. The forming of an association between sounds and eye fixations is perhaps unavoidable in eye-movement research. However, future studies may weaken this association by playing sounds more randomly, perhaps on every nth fixation where n is a random integer (e.g., see Graupner et al., 2007 for a similar approach).
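The suggestion of playing sounds on every nth fixation, with n drawn at random, could be sketched as follows. This is a minimal illustration under stated assumptions, not the procedure used here or by Graupner et al. (2007): the function name, the bounds on n (3 to 7), and the seed are all hypothetical choices.

```python
import random


def sound_fixation_indices(n_fixations, n_min=3, n_max=7, seed=None):
    """Return the 1-based fixation indices on which a sound would be played.

    After each sound, a new gap n is drawn uniformly from [n_min, n_max]
    (hypothetical bounds), so the sound is decoupled from specific words.
    """
    rng = random.Random(seed)
    indices = []
    position = 0
    while True:
        position += rng.randint(n_min, n_max)  # draw the next random gap
        if position > n_fixations:
            break
        indices.append(position)
    return indices


# Example: a hypothetical trial with 30 fixations
schedule = sound_fixation_indices(30, seed=1)
print(schedule)
```

Because the gap is redrawn after every sound, participants cannot predict which fixation will trigger the next sound, while the minimum gap keeps consecutive sounds from overlapping.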

| CONCLUSION
Unexpected sounds have been shown to lead to an increase in fixation durations during reading immediately after presentation, but the exact mechanism behind this has remained elusive. The present study showed that this effect is not related to the lexical processing that occurs during reading, but that it also occurs more broadly in other tasks that require similar spatial scanning of information. These results are consistent with the notion that unexpected sounds induce a general and automatic inhibition of motor processes, and raise the possibility that such inhibition may occur across a range of oculomotor tasks.

FIGURE 1 An illustration of the method. (a) In standard trials, five standard sounds were played (one on each "target" word). In novel trials, four standard sounds and one novel sound were played (the novel sound appeared equally often on target word positions 2-5). In silence trials, no audio was played. (b) Statistical power simulation results. (c) An example item in the reading and scanning tasks. The scanning task contained between 1 and 4 letters "o". The five target words on which the sounds were played are highlighted in green. (d) An illustration of the gaze-contingent auditory presentation on target word 3. Once the position of the eye crossed an invisible boundary (vertical dotted line) to the left of each target word, the sound was played with a 120 ms delay.
, the question was: "What was discovered?(a) a new comet; (b) a new galaxy; (c) a new mountain; (d) a new lake".
Downloaded from https://onlinelibrary.wiley.com/doi/10.1111/psyp.14389 by Test, Wiley Online Library on [25/09/2023]. See the Terms and Conditions (https://onlinelibrary.wiley.com/terms-and-conditions) on Wiley Online Library for rules of use; OA articles are governed by the applicable Creative Commons License.
TABLE 2 (G)LMM Results for all global eye-movement measures. Statistically significant values are formatted in bold. Abbreviations: Corr., correlation of random effects; Var., variance of random effects.

FIGURE 3 (a) Boxplots of the first fixation duration during which the sound is played, broken down by the experimental conditions. (b) Boxplot of the effect sizes in the first fixation duration during which the sound is played. In both plots, the dots represent individual subject means. The overall mean (across all subjects) is indicated by the larger black dot.

Task Sound Duration of all fixations in trial (in ms) Number of fixations per word Saccade length (in letters)
FIGURE 2 Box plots and distribution of global reading measures by task. Dots represent individual subject means. The task mean (M) is plotted for all measures.
TABLE 3 Descriptive statistics for the first fixation duration during which the sound is played.
TABLE 4 Linear mixed model results for the first fixation duration during which the sound is played. Note: Statistically significant t-values are formatted in bold. While BF10 > 1 indicates evidence in support of the alternative hypothesis, BF10 < 1 indicates evidence in support of the null hypothesis. Abbreviations: BF10, Bayes factor comparing the alternative hypothesis to the null hypothesis; Corr., correlation of random effects; Var., variance of random effects.