Global–local consistency benefits memory‐guided tracking of a moving target

Abstract

Introduction: Previous findings have demonstrated that several Gestalt principles facilitate visual short-term memory (VSTM) performance in change detection tasks. However, few studies have investigated the role and time-course of global–local consistency in motion perception.

Methods: Participants were required to track a moving target surrounded by one of three backgrounds: blank, inconsistent, or consistent. Global–local objects were bound to move together (covariation). During the prediction motion task (PMT), participants had to follow the moving target with their eyes and react as fast as possible when the target had just vanished behind the obstruction or would arrive at a predetermined point of interception. Variable error (VE) and constant error (CE) of estimated time-to-contact (TTC) and the gain of smooth pursuit eye movements were calculated in the various conditions and analyzed qualitatively.

Results: Experiment 1 established the basic finding that VSTM performance could benefit from global–local consistency. Experiment 2 extended this finding using an eye-tracking device. In both the visible phase and the occluded phase, CEs were smaller for the target in a consistent background than for the target in an inconsistent background and for the target in a blank background, with both differences significant (ps < .05). However, the difference in VE among the three conditions was not significant. At the early stage (100–250 ms), later stage (2750–3000 ms), and termination stage (5750–6000 ms) of smooth pursuit, velocity gains were higher in trials with consistent backgrounds than in trials with inconsistent backgrounds and blank backgrounds (ps < .001). With the exception of the 100–250 ms phase, the means did not differ between the inconsistent background and the blank background trials (ps > .1).

Conclusions: Global–local consistency could be activated within the first few hundred milliseconds to prioritize the deployment of attention and eye movement to the component target. Meanwhile, it also removes ambiguity from motion tracking and TTC estimation under some unpredictable conditions, leading to the consistency advantage during the smooth-pursuit termination phase. Global–local consistency may act as an important information source for TTC estimation and oculomotor response in PMT.


KEYWORDS
eye velocity trace, global-local consistency, prediction motion tasks, velocity gain, VSTM

INTRODUCTION
When people observe visual scenes, a fundamental question is how the visual system organizes the incoming stream of visual information. Early Gestalt theorists formulated a number of principles that aim to capture the regularities according to which perceptual input is organized or grouped into meaningful units or Gestalts (Koffka, 1922; Wagemans et al., 2012). For example, the principle of proximity states that parts of the visual field that are close to each other tend to be grouped into one whole (which could be a pattern, a texture, or an object), whereas the principle of similarity states that elements tend to be grouped together if their attributes are perceived as related (e.g., in color or shape). The Gestalt principle of grouping by good continuation states that we tend to group lines or curves that follow an established direction.
Since the early works on the Gestalt theory of scene perception, a considerable amount of research has been conducted on the global-local interaction. A seminal work by Navon (1977) has demonstrated that the global precedence effect is a prevailing property of object-background processing. Navon presented compound letters representing larger figures (global configurations), which were spatially constructed from a suitable arrangement of smaller figures (local elements), and observed an advantage in the processing of global configurations over local elements (i.e., faster judgments of local shape when local and global shapes are consistent, but not vice versa).
Critically, when global configurations and local elements were inconsistent, responses to the local elements were subject to interference from the global configurations, but local features did not interfere with global perception, which was termed the "global interference effect." Further investigation has shown that, for both compound letters and compound figures, inconsistent stimuli were responded to more slowly than both consistent and neutral stimuli, which did not differ from each other (Poirel et al., 2008). Evidence from event-related brain potential (ERP) studies has shown that consistent stimuli elicited a larger N1 amplitude (150-220 ms), which occurs at the early steps of visual processing (Beaucousin et al., 2013).
The global-local consistency effects examined in form perception seem to generalize to motion perception. In early studies, the Gestalt principle of grouping by common fate indicated that an invisible form composed of randomly arranged dots against a dotted background becomes immediately visible as soon as it moves, by virtue of the common fate of its dots, which all move together with a common speed and direction. The spatial integration of target and background motion signals has been studied by having observers track a pursuit target in the presence of a second moving object, or in front of a stationary or moving textured background (Spering & Gegenfurtner, 2008). Generally, the pursuit of a moving object on a stationary textured background was hampered, and initial acceleration and steady-state velocity were lowered (Masson et al., 1995; Spering & Gegenfurtner, 2008). Nevertheless, the results are more complicated when tracking a moving target on a moving background. A background moving in the same direction as the pursuit target raised pursuit velocity, while a background moving in the reverse direction lowered eye velocity (Masson et al., 1995). Pursuit was not affected by alterations in background velocity when the background moved in the opposite direction to the pursuit target (Spering et al., 2006). Therefore, motion signals (e.g., direction consistency, velocity consistency) from the local target and global background have to be integrated hierarchically in order to extract a precise velocity signal for initiating and maintaining an accurate eye movement (Eggert et al., 2009; Ladda et al., 2007; Ogawa et al., 2009; Spering & Gegenfurtner, 2008).
During visual tracking, a moving target is often momentarily obstructed by other objects and vanishes from view (Albright & Stoner, 2002). In this case, visual short-term memory (VSTM) allows us to temporarily store and process relevant information from the visual world across saccades and other visual interruptions. VSTM is defined as short-term memory for nonverbal, visual information, a buffer that temporarily stores visual information before it can be further processed. Previous findings have demonstrated that several Gestalt principles (e.g., connectedness, common region, and spatial proximity) facilitate VSTM performance in change detection tasks (Peterson & Berryhill, 2013; Woodman et al., 2003; Xu, 2006; Xu & Chun, 2006). However, relatively little is known about the role that global-local consistency plays in visual short-term memory storage. In other words, can global-local consistency enhance VSTM function by optimizing the processing of information? In the present study, a prediction motion task (PMT) was designed to investigate observers' ability to estimate the precise position of a moving object in the absence of visual input. In a typical PMT, an independent target moves at constant velocity along the frontoparallel plane and then vanishes behind an obstruction; the participants are asked to press a button when they think the object would arrive at a predetermined point of interception (Bennett & Benguigui, 2016; Bennett et al., 2010; DeLucia et al., 1998; Flavell et al., 2018; Makin & Bertamini, 2014; Makin & Chauhan, 2014; Makin & Poliakoff, 2011; Makin et al., 2008; Vicovaro et al., 2019). In such situations, how is global-local consistency used by observers to estimate the precise location of an occluded target?
To investigate the time-course of global-local consistency in motion perception, we asked participants to track a moving target surrounded by one of three backgrounds: consistent, inconsistent, or blank. Global-local objects were bound to move together (covariation). Each condition was repeated several times in random order to construct a representative estimate. During the PMT, participants had to follow the moving target with their eyes and react as fast as possible when the target had just vanished behind the obstruction or would arrive at a predetermined point of interception. Eye movements were recorded in the various conditions and analyzed qualitatively to ensure that the participants acted as directed. We hypothesized that the consistent background eases perceptual encoding at the initial stages of visual processing of an object, and also modulates information processing in VSTM. Thus, global-local consistency effects were expected to be significant during visually guided tracking as well as during memory-guided tracking.

Participants
G*Power is free software that helps researchers calculate the sample size needed for an experiment. We set the power to 1 − β = 0.80 and the effect size to f² = 0.25, which is a medium effect size (Cohen, 1992), and obtained an estimated total sample size of 19. A group of 25 right-handed undergraduate or graduate students at the Capital University of Physical Education and Sports took part in the experiment for cash compensation. They were between 18 and 26 years of age and reported having normal or corrected-to-normal visual acuity and normal color vision.

Design and stimuli
A 2 × 3 × 2 within-subjects factorial design was used for Experiment 1, with the first factor referring to target visibility (visible or occluded), the second to global-local consistency (blank, inconsistent, or consistent), and the third to the local target (triangle or circle). The visual display and response system were controlled from a computer running E-prime scripts (Psychology Software Tools, Pittsburgh, PA).
Before implementing the experiment, we recruited 70 undergraduate or graduate students to rank the candidate stimuli according to which stimulus is more likely to occur in the natural world. A chi-square test and pairwise comparisons supported the screening of the global background conditions (χ² = 9.886, p = .02). After this initial evaluation, there were three types of backgrounds: (i) the blank condition only displayed a small target circle or a small target triangle, without any background elements; (ii) the consistent condition presented a big triangular shape, which consisted of a target triangle and eight small circles arranged in a triangular pattern, or a big circular shape, which consisted of a target circle and eight small triangles arranged in a circular pattern; (iii) the inconsistent condition presented a big triangular shape, which consisted of a target circle and eight small triangles arranged in a triangular pattern, or a big circular shape, which consisted of a target triangle and eight small circles arranged in a circular pattern. The triangular background elements all had their apex pointing in the same direction (approximately 2.43° × 3.81°), while the circular background elements all had their apex pointing toward the center of the circle (3.82° diameter) (see Figure 1).
As depicted in Figure 1, the target was a white filled triangle or circle (0.38° in diameter), which was randomly presented at one of the four selected orientations on the left side of the screen (see [...] with their eyes, tracking its motion for 6000 ms. After 3000 ms, the target moved behind the dark gray occluding bar (see Figure 2). We told the participants that the target continued to move at the same velocity behind the occluding bar. The participants had to push the left mouse button to mark when the target went into the left side of the dark gray occlusion bar, and they had to push the button a second time when they thought that the target would arrive at the right side of the occlusion bar. However, following the occlusion period, the target did not actually appear again. Our preliminary results showed that a target reappearing after the occlusion produced a ceiling effect in observers' predictions, as did a target presented at a fixed orientation. Therefore, the target in our experiment randomly appeared at one of the four selected orientations and never reappeared after occlusion, to avoid this ceiling effect. There was a 1000-ms waiting period before the initiation of a new trial. We asked the participants to push the button as fast and accurately as they could. Before the experiment, all participants finished 12 practice trials to acquaint them with the task and the stimuli.

FIGURE 1 Global-local conditions employed in Experiment 1. The white filled circle or triangle to be tracked was embedded in one of the three backgrounds: blank, inconsistent, or consistent. The target circle was randomly presented at the four orientations on the screen

FIGURE 2 Sequence of events in a trial
There were ten blocks of 48 trials (480 trials in total). Each block included all six conditions, and trial order was randomized within each block. The target was presented at four possible starting locations (see Figure 1d) with equal probability. There was a 1-min break between the blocks.
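The block structure described above can be sketched in a few lines of Python. This is only an illustration, not the E-prime script used in the study: the names `build_block` and `build_session`, the location labels, and the choice to cross the six conditions with the four starting locations exactly twice per block (one way of obtaining 48 trials with equal location probability) are all assumptions.

```python
import itertools
import random

TARGETS = ["triangle", "circle"]
BACKGROUNDS = ["blank", "inconsistent", "consistent"]
START_LOCATIONS = [0, 1, 2, 3]  # hypothetical labels for the four starting locations
N_BLOCKS = 10
TRIALS_PER_BLOCK = 48


def build_block(rng):
    """Return one shuffled block of (target, background, location) trials."""
    # 2 targets x 3 backgrounds x 4 locations = 24 distinct cells.
    cells = list(itertools.product(TARGETS, BACKGROUNDS, START_LOCATIONS))
    # Assumption: each cell is repeated equally often to fill the block (here twice).
    repeats = TRIALS_PER_BLOCK // len(cells)
    trials = cells * repeats
    rng.shuffle(trials)  # randomize trial order within the block
    return trials


def build_session(seed=0):
    """Return a list of N_BLOCKS independently shuffled blocks."""
    rng = random.Random(seed)
    return [build_block(rng) for _ in range(N_BLOCKS)]


session = build_session()
print(len(session), "blocks,", sum(len(b) for b in session), "trials")
```

Seeding the generator makes a given trial order reproducible; whether the original scripts balanced locations within a block or sampled them independently is not stated in the text.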
Instructions were given to the subjects at the start of the experiment.

Results and discussion
As a perceptual measure, we recorded the actual times when the target [...] Trials with values more than three standard deviations above or below the individual mean were excluded before computing the overall mean [...] Table 2). A possible explanation is that a target circle moving in a uniform circular motion is commonplace. Meanwhile, people can also acquire the ability to identify and track a target triangle moving in a uniform circular motion through learning and training. Although no difference was found between the two target types, tracking a moving circle seems to be easier and more stable than tracking a moving triangle. Thus, we further investigated the visual characteristics and time-course of object-background consistency while tracking a target circle in PMT.

Participants
All of the experimental procedures were approved by and conducted [...] We set the power value 1 − β = 0.80 and the effect size f² = 0.25, which is a medium effect size value (Cohen, 1992) in the [...]

TABLE 4 Results of repeated-measures ANOVA of 2 (visible or occluded) × 3 (blank, inconsistent, or consistent) × 2 (triangle or circle) for VE

Oculomotor recording and data analysis
Experimental protocols followed Experiment 1. There were five blocks of 60 trials (300 trials in total). Participants were required to follow the white filled target circle with their eyes in three different conditions.
We discarded any data more than three standard deviations away from the mean during each experimental condition (0.7% of responses in total) and then we computed CE and VE.
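A minimal sketch of this step is given below. The text does not spell out the formulas, so the code assumes the conventional definitions: CE as the mean signed error (estimated minus actual TTC) and VE as the standard deviation of those errors, computed per condition after the three-standard-deviation exclusion. The function names are hypothetical.

```python
import statistics


def exclude_outliers(values, n_sd=3.0):
    """Drop values more than n_sd sample standard deviations from the mean."""
    if len(values) < 2:
        return list(values)
    mean = statistics.mean(values)
    sd = statistics.stdev(values)
    if sd == 0:
        return list(values)
    return [v for v in values if abs(v - mean) <= n_sd * sd]


def ce_and_ve(estimated_ttc_ms, actual_ttc_ms):
    """Return (CE, VE) in ms for one participant in one condition.

    Assumed definitions: CE = mean signed error, VE = SD of signed errors.
    """
    errors = [est - actual_ttc_ms for est in estimated_ttc_ms]
    kept = exclude_outliers(errors)
    ce = statistics.mean(kept)
    ve = statistics.stdev(kept) if len(kept) > 1 else 0.0
    return ce, ve


# Illustrative values only, not data from the study.
ce, ve = ce_and_ve([2900, 3100, 3000, 3200, 2800], actual_ttc_ms=3000)
print(ce, round(ve, 1))
```

Under these definitions, a negative CE indicates underestimation of TTC (responding too early), and VE reflects trial-to-trial consistency independent of that bias.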
The participant's eye position was recorded with the SMI iView X RED Remote Eye-tracking Device, which is a remote tracking system that computes the gaze utilizing the reflection of a near infrared light from the cornea and pupil of one eye with a sampling rate of 250 Hz.
Nine-point calibrations were conducted at the start of every block.
The eye-tracking computer was synchronized to the E-prime computer via a parallel port cable. The eye movement data were scored offline.
Blinks, drifts, and other artifacts were identified and removed from the oculomotor data [...] were applied (Bennett & Barnes, 2003). The detected saccades were removed from the smooth eye velocity trace. The smooth pursuit gains were

Behavioral results
For CE (see Figure 3a

Eye movements
Eye velocity signals were derived from position signals using a central difference algorithm on a ±10 ms interval. Previous research on the smooth pursuit system has shown that pursuit is more efficient in the horizontal than in the vertical dimension across development, from newborns, infants, children, and adolescents to adults (Engel et al., 1999; González et al., 2019; Grönqvist et al., 2006; Robert et al., 2014; Rottach et al., 1996; Vinuela-Navarro et al., 2019). The horizontal-vertical tracking asymmetry is especially evident when subjects pursue a target moving on a circular trajectory (Collewijn & Tamminga, 1984; Grönqvist et al., 2006; Robert et al., 2014; Rottach et al., 1996). Based on these studies, we analyzed only the horizontal component of eye velocity.
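The central difference step can be illustrated as follows. This is a sketch under stated assumptions rather than the authors' processing code: the function name is hypothetical, the half-window is expressed in samples, and the default of two samples (±8 ms at 250 Hz) is illustrative only, since the text does not say how the ±10 ms interval was mapped onto the sampling grid.

```python
def central_difference_velocity(position_deg, sample_rate_hz, half_window=2):
    """Estimate velocity (deg/s) from horizontal eye position (deg).

    velocity[i] = (x[i + h] - x[i - h]) / (2 * h * dt), with h = half_window.
    Samples too close to either edge to be computed are returned as None.
    """
    dt = 1.0 / sample_rate_hz
    n = len(position_deg)
    velocity = [None] * n
    for i in range(half_window, n - half_window):
        delta = position_deg[i + half_window] - position_deg[i - half_window]
        velocity[i] = delta / (2 * half_window * dt)
    return velocity


# Illustrative input: a 10 deg/s ramp sampled at 250 Hz.
ramp = [10.0 * i / 250 for i in range(10)]
print(central_difference_velocity(ramp, sample_rate_hz=250))
```

A symmetric difference of this kind introduces no phase lag, and a wider half-window smooths sample-to-sample noise more strongly at the cost of blurring rapid velocity changes.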
During visual tracking of a moving target, the smooth pursuit response is usually separated into an open-loop phase (the first 100 ms after initiation) and a closed-loop or steady-state phase (Lisberger et al., 1987; Tychsen & Lisberger, 1986). The open-loop phase is primarily driven by the target's retinal image velocity, because an internal signal about the eye velocity is not yet available to the system. As eye velocity is gradually adjusted to target velocity, pursuit tends toward a steady state and is mainly maintained by extraretinal inputs, such as efference copy ("eye velocity memory"), remembered target motion ("target velocity memory"), and object-background consistency (Bennett & Barnes, 2004). In order to explore how extraretinal signals work to maintain a stable response with high gain, the eye velocity of horizontal smooth pursuit was plotted as a function of time.
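One way to summarize such a velocity trace over time is to average it within fixed windows. The sketch below assumes a 6000 ms trial sampled at 250 Hz and divided into 24 windows of 250 ms, matching the interval analysis reported for this experiment; the function name is hypothetical and missing samples (for example, removed saccades) are assumed to be marked as None.

```python
import statistics


def mean_velocity_per_interval(velocity, sample_rate_hz=250,
                               interval_ms=250, n_intervals=24):
    """Average a velocity trace within consecutive time windows.

    Each sample is assigned to a window by its timestamp, so the windows
    need not contain a whole number of samples (62.5 at 250 Hz).
    Windows without valid samples yield None.
    """
    bins = [[] for _ in range(n_intervals)]
    for i, v in enumerate(velocity):
        if v is None:  # skip samples removed as artifacts
            continue
        t_ms = i * 1000.0 / sample_rate_hz
        k = int(t_ms // interval_ms)
        if k < n_intervals:
            bins[k].append(v)
    return [statistics.mean(b) if b else None for b in bins]


# Illustrative trace: 0 deg/s for the first 3000 ms, then 10 deg/s.
trace = [0.0] * 750 + [10.0] * 750
means = mean_velocity_per_interval(trace)
print(means[11], means[12])
```

Binning by timestamp rather than by sample count keeps the window boundaries aligned with the nominal times even when the sampling rate does not divide the window length evenly.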
Following the analysis approach used in studies of event-related brain potentials (ERPs), the horizontal smooth pursuit trace was segmented into 24 time intervals, each lasting 250 ms (Blair & Karniski, 1993). Twenty-four mean velocity values were calculated for 22 subjects under each of the three conditions. For each time interval, the statistical differences among the three consistency conditions were investigated by means of univariate analysis. Results of the univariate analysis are shown in Table 5 [...]

Navon (1977) argued that global processing is a necessary stage of perception prior to more fine-grained analysis. The "global precedence effect" refers to these findings: (i) responses were faster to the global than the local level and (ii) when the levels were inconsistent, information at the global level interfered with (slowed down) responses to the local level, but not the other way around (Gerlach & Poirel, 2020).

Global-local consistency effects could be generalized to visual search and recognition tasks (Aivar et al., 2014;Beanland et al., 2016;Castelhano & Pereira, 2018;Truman & Mudrik, 2018). Indeed, perception is temporally ordered so that global information is abstracted first and more local analysis is carried out some time later (May et al., 1995).
Consequently, at the early stage (100-250 ms) of pursuit tracking, the constant error between the actual and estimated TTC decreased when tracking a target circle embedded in a circular background, compared with a target circle in isolation. Moreover, significant differences occurred in horizontal velocity gain between the circular and blank background trials.
A smooth pursuit eye movement is induced when we look at a moving [...] internal signal about the eye velocity is not yet available to the system (Newsome et al., 1985; Dürsteler & Wurtz, 1988). As eye velocity is gradually adjusted to target velocity, however, pursuit tends toward a steady state and is mainly maintained by extraretinal inputs (Bennett & Barnes, 2004). Generally, during steady-state pursuit of an uninterrupted visible target, retinal and extraretinal inputs cooperate to maintain a steady response with a high gain (Bennett & Barnes, 2003). Retinal input is obtained from the immediate feedback of visual motion signals, including image velocity and acceleration, while extraretinal input is driven by visual short-term memory (VSTM), such as efference copy ("eye velocity memory"), remembered target motion ("target velocity memory"), volition, attention, and expectation (Bennett & Barnes, 2003; Barnes & Collins, 2008 [...] Lee, 1976). In particular, participants may calculate TTC from the ratio of exposed distance to hidden distance and the duration of the object's visible motion; the clock would count the latter duration and the obstructed time (Rosenbaum, 1975).
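The timing-based estimate described above reduces to a simple proportion when the target moves at constant velocity: the time needed to cross the hidden distance equals the visible duration scaled by the ratio of hidden to exposed distance. The function below is a hypothetical illustration of that arithmetic, not a model fitted in this study.

```python
def timing_based_ttc_ms(visible_duration_ms, visible_distance, hidden_distance):
    """Estimate TTC (ms) across the occluder from the visible segment.

    Assumes constant target velocity, so that
    TTC = visible_duration * hidden_distance / visible_distance.
    Distances may be in any unit, as long as both use the same one.
    """
    if visible_distance <= 0:
        raise ValueError("visible_distance must be positive")
    return visible_duration_ms * hidden_distance / visible_distance


# Illustrative values: equal visible and hidden path lengths, 3000 ms visible.
print(timing_based_ttc_ms(3000, visible_distance=10.0, hidden_distance=10.0))
```

With equal visible and hidden path lengths, the estimate simply equals the visible duration; any misjudgment of either distance or of the visible duration propagates proportionally into the TTC estimate.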
Conversely, the motion extrapolation strategy states that individuals create an inner cognitive representation of the object's visible movement and use this to determine the object's movement after it vanishes and to estimate TTC. Participants watch the target with spatial attention or pursuit eye movements, and then react when the gaze or spatial attention gets to the end of the obstruction.
Results support cognitive motion extrapolation that depends on the pursuit system (DeLucia et al., 1998; Makin & Bertamini, 2014; Makin & Chauhan, 2014; Makin & Poliakoff, 2011; Makin et al., 2008). When a moving target is temporarily obstructed from view by other objects and there are no visual feedback signals, smooth pursuit eye velocity first diminishes substantially, but it is maintained at a lowered gain because of extraretinal input. Extraretinal input consists of cognitive factors, including an expectation that the target will appear again later along its trajectory, and an inner cognitive representation of global-local consistency (Bennett & Barnes, 2003; Barnes & Collins, 2008; Spering & Gegenfurtner, 2008).
These factors are used to extrapolate the object's motion and estimate TTC after it disappears. Results show that all participants' CE, VE, and predictive smooth pursuit were worse in the occluded phase than in the visible phase, but the tracking accuracy for a target circle in circular [...]

Another new finding of this study is that at the early stage (100-250 ms) of visual tracking, horizontal velocity gains were higher for a moving target in the circular background than in blank background trials, as well as for a moving target in the triangular background than in blank background trials. No differences were found between the circular and triangular background conditions. The facilitation of a moving object by the inconsistent background was not found during memory-guided tracking or at the later stage of visual tracking, but only at the early stage of visually guided tracking, whereas consistency advantages occurred during both visually guided tracking and memory-guided tracking. In the present experiment, both the triangular background and the circular background moved at the same velocity and along the same rotating trajectory. Meanwhile, the spatial layout between the target and the distractors was invariant over time. This invariant structure can capture attention. However, compared with the triangular background, the circular background allows observers to perceive wheel-like motion and then predict that the target to be tracked is likely to be a point on the rim of a rolling wheel (Steinbach, 1976).

CONCLUSIONS
Global-local consistency may act as an important information source for TTC estimation and oculomotor response in PMT. During the closed-loop phase of visual pursuit, global-local consistency could be activated within the first few hundred milliseconds to prioritize the deployment of attention and eye movement to the component target. Meanwhile, it also removes ambiguity from motion tracking and TTC estimation under some unpredictable conditions, leading to the consistency advantage during the smooth-pursuit termination phase. In summary, the current study reveals that a coherent, consistent background can facilitate smooth-pursuit initiation, steady-state pursuit, and smooth-pursuit termination for a component object.