Neural bases of self‐ and object‐motion in a naturalistic vision

Abstract To plan movements toward objects, our brain must recognize whether retinal displacement is due to self-motion and/or to object-motion. Here, we aimed to test whether motion areas are able to segregate these types of motion. We combined an event-related functional magnetic resonance imaging experiment, brain mapping techniques, and wide-field stimulation to study the responsivity of motion-sensitive areas to pure and combined self- and object-motion conditions during virtual movies of a train running within a realistic landscape. We observed a selective response in MT to the pure object-motion condition, and in medial (PEc, pCi, CSv, and CMA) and lateral (PIC and LOR) areas to the pure self-motion condition. Some other regions (like V6) responded more to complex visual stimulation where both object- and self-motion were present. Among all regions, we found that some motion areas (V3A, LOR, MT, V6, and IPSmot) could extract object-motion information from the overall motion, recognizing the real movement of the train even when its image remained still on the screen, or moved because of self-movements. We propose that these motion areas might be good candidates for the “flow parsing mechanism,” that is, the capability to extract object-motion information from retinal motion signals by subtracting out the optic flow components.


| INTRODUCTION
Detection of moving objects or living entities in our surroundings is among the most fundamental abilities of the visual system. Cells specialized in the detection of movement within their receptive fields exist in multiple brain regions at all levels of the visual processing hierarchy. However, a shift of the retinal image of an object may occur not only when the object moves, but also when we move within an otherwise static environment, or when we move our eyes or head. On the other hand, in some cases the retinal image of a moving object may not move at all, for example, when we follow a moving object with our gaze, maintaining its image still on the fovea (smooth pursuit).
Thus, our motion perception consists of much more than the detection of retinal shifts and involves recognizing whether retinal shifts are due to true object displacements, generated by our own movements, or by some combination of the two. This recognition mechanism has been called flow parsing (Warren & Rushton, 2009a, 2009b) and would consist of the segregation of retinal motion information into an object-motion component and an egomotion component. The self-motion (egomotion) component would be estimated based on an analysis of the optic flow (as well as of vestibular, proprioceptive, and other nonvisual sources of information), that is, the visual stimulation generated on the retina by the observer's movement (Koenderink, 1986); then, self-motion would be "subtracted" from retinal motion to compute an estimate of "real" object-motion.
Regarding object-motion, several cortical areas of the dorsal visual stream in the monkey contain many "real motion cells" (for a review, see Galletti & Fattori, 2003), that is, cells activated by the actual movement of an object in the visual field, but not, or not so strongly, by an identical shift of the object's retinal image produced by an eye movement. These cells were found in V1 (Bridgeman, 1973; Galletti, Squatrito, Battaglini, & Grazia Maioli, 1984), V2 (Galletti, Battaglini, & Aicardi, 1988), V3A (Galletti, Battaglini, & Fattori, 1990), V6 (Galletti & Fattori, 2003), MT/V5 (Erickson & Thier, 1991), MST (Erickson & Thier, 1991), and 7a (Sakata, Shibutani, Kawano, & Harrington, 1985). These cells are likely involved in the detection of object-motion in the visual field (Galletti & Fattori, 2003), even in such critical situations as when the retinal images are continuously in motion because of self-motion (Grossberg, Mingolla, & Pack, 1999). Galletti and Fattori (2003) suggested that the activity of these cells is responsible for the flow-parsing mechanism and hypothesized that these cells, besides recognizing real motion in the visual field, are the elements of a cortical network that represents an internal map of a stable visual world.
In humans, little is known about the neural basis of the mechanisms that allow disentangling self- and object-motion from retinal information. Recent functional magnetic resonance imaging (fMRI) data suggest that some dorsal motion areas (like V3A and V6) likely contribute to perceptual stability during pursuit eye movements (Fischer, Bülthoff, Logothetis, & Bartels, 2012). In particular, V6 could be involved in "subtracting out" self-motion signals across the whole visual field and in providing information about moving objects, as originally suggested by our group (Galletti & Fattori, 2003; Pitzalis et al., 2010, 2015) and later by others (Arnoldussen, Goossens, & van den Berg, 2013; Cardin, Sherrington, et al., 2012; Fischer et al., 2012). However, these suggestions remain speculative: there is not yet a clear and complete picture of the brain regions involved in flow parsing, and it is still unclear whether distinct cortical areas are engaged in the detection of object- and self-motion, as previously suggested (Previc, 1998; Previc, Liotti, Blakemore, Beer, & Fox, 2000; Rosa & Tweedale, 2001).
The aim of the present fMRI work is to identify human cortical motion-sensitive areas responding to object- and/or to self-motion, and to test whether they are capable of extracting object-motion signals from the overall motion. To this aim, we used an event-related fMRI experiment, brain mapping methods, and wide-field stimulation to compare visually induced self- and object-motion in a realistic virtual environment, in conditions of natural vision (free scanning). We also investigated the flow-parsing phenomenon using a complex motion stimulation combining self- and object-motion, where the object-motion must be inferred (i.e., extracted) from the overall motion by subtracting out optic flow information. To reproduce a realistic motion stimulation, we used virtual reality software simulating a train moving in a natural landscape, and movies as motion conditions. To minimize the effect of possible physical differences between trials, we estimated the quantity and direction of motion embedded in each single frame through a block-matching algorithm (BMA; see Bartels, Zeki, & Logothetis, 2008 for a similar approach). The use of wide-field stimuli (as described in Pitzalis et al., 2010) increased the realism of the virtual environment and made the illusory impression of self-motion (vection) particularly compelling.
Importantly, we performed preliminary psychophysical experiments to verify and quantify the vection evoked by the different types of visual motion. In addition, we used dedicated functional localizers in each individual subject to map the position of area V6 (Pitzalis et al., 2010; Serra et al., 2019) and to distinguish specific partitions within the MT+ complex, that is, MT/V5 and MST+ (Dukelow et al., 2001; Huk, Dougherty, & Heeger, 2002). This was necessary because, while the MT+ complex is universally recognized as motion sensitive, the different roles of MT/V5 and MST+ in self-motion perception are still debated.

| Participants

Participants underwent two fMRI acquisition sessions on separate days, the first to perform the main experiment and the second to perform two localizer scans to map areas V6+ and PPA/RSC. A subgroup of subjects (n = 8) underwent a third fMRI acquisition session on a separate day to perform a localizer scan to map areas MT and MST+. All participants had normal or corrected-to-normal visual acuity and no history of psychiatric or neurological disease. All participants had extensive experience in psychophysical and fMRI experiments and were paid for their participation. They gave written informed consent, and all procedures were approved by the Ethics Committee of Fondazione Santa Lucia, Rome, Italy.

| Train experiment
In the main event-related fMRI experiment, hereafter called train experiment, we presented movies and snapshots from a realistic virtual environment representing different combinations of visually induced self-and object-motion. To reproduce a realistic motion stimulation, we used virtual reality gaming software (NoLimits Roller Coaster Simulation, Mad-Data GmbH & Co. KG, Erkrath, Germany).
The environment included a train moving on a railway (speed: 60 km/hr), alternating straight and curvilinear plain stretches, within a realistic natural landscape rich in details (see Figure 1 and Supporting Information movie files). Each trial consisted of the passive observation of a 3 s movie belonging to one of five experimental conditions (each of which has been consistently associated, for clarity, with a specific color in all figures of the article):

1. Offboard condition (red): the subject is still and observes the train moving in front of him (see Movie S1). This visual stimulation is consistent with pure object-motion in the absence of self-motion.
2. Onboard condition (blue): the subject is virtually sitting on the moving train and looks forward ahead (see Movie S2). This visual stimulation is consistent with pure self-motion within a static environment.
3. Joint condition (yellow): the subject virtually runs next to the moving train at a fixed distance (see Movie S3). In this condition, both the subject and the train move together along the same trajectory and at the same speed. Interestingly, although this results in a combination of self- and object-motion, the train has a fixed position on the screen throughout the movie. In this situation, the subject perceives the train, which is fixed on the retina, as moving, and the external environment, which is shifting on the retina, as static.

FIGURE 1 Schematic representation of the stimuli we used in the motion fMRI experiment (train). (a) Summary of the trial sequence used in the experiment. (b) Example of the 3 s movie types for the four motion conditions (Offboard, red; Onboard, blue; Joint, yellow; Disjoint, green) and static frames (black). fMRI, functional magnetic resonance imaging
4. Disjoint condition (green): as above, but the train is not moving along the same trajectory and at the same speed as the subject, thus its image on the screen is not fixed (see Movie S4).

5. Static condition (black): these pseudo-movies were a sequence of four frames lasting 750 ms each, randomly extracted from four different movies belonging to the four previously described categories, in order not to evoke any motion sensation.
In each condition (Onboard, Joint, Disjoint, and Offboard), we used three different types of movies showing a train moving along (a) a rightward curvilinear trajectory, (b) a leftward curvilinear trajectory, and (c) a straight trajectory. We ensured that each condition had equally many target-movie instances, to prevent attentional modulation from biasing responses toward particular conditions. Note that we used four different categories of sample movie clips with different perspectives. As shown in Figure 1, the Onboard and Offboard conditions are earthbound with a similar horizon, while the more complex motion conditions (Joint and Disjoint) use aerial views (simulating the view of an observer flying over the train). Thus, the low-level stimulus features in the complex motion conditions cannot be considered a direct summation of those from the pure self-motion (Onboard) and object-motion (Offboard) conditions. We restricted the duration of each movie to 3 s, both to take into account the strong BOLD response adaptation to flow and to minimize the amount of unwanted spontaneous eye movements during the visualization of the visual scene (an important methodological point, since we placed no constraints on eye movements).
Each participant underwent six consecutive fMRI acquisition scans lasting approximately 5'30'' each. During each scan, we presented a different pseudorandomized sequence (i.e., trial order was different for each scan but fixed across subjects), which included 15 different trials for each of the five conditions, plus 5 target trials (see below), for a total of 80 trials. The trial sequence was balanced such that each trial type was preceded with equal frequency by all trial types. Trials were presented every 3 s and the intertrial interval was set to zero, so that the movies were displayed one after the other (without any gap between them), to minimize the brisk on-off switches of the movies that could bias the neural response. Every set of 10 trials was interleaved with a 10 s fixation period. The fixation period (white central cross on a black background) constituted the low-level baseline for the study. This procedure yielded 80 trials and 8 fixation periods per subject for each scan.
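As an illustrative sketch of the counterbalancing constraint above (each trial type preceded with roughly equal frequency by all trial types), a greedy generator can build such a sequence. The condition names come from the experiment, but the function itself is a hypothetical reconstruction, not the authors' randomization code (which additionally inserted target repeats and fixation periods).

```python
import random
from collections import Counter

def balanced_sequence(conditions, n_per_cond, seed=0):
    """Greedy first-order counterbalancing: at each step, pick the
    condition (among those with remaining quota) whose transition from
    the previous trial has been used least so far."""
    rng = random.Random(seed)
    remaining = {c: n_per_cond for c in conditions}
    transitions = Counter()            # (prev, next) -> count
    seq = [rng.choice(conditions)]
    remaining[seq[0]] -= 1
    while any(remaining.values()):
        prev = seq[-1]
        candidates = [c for c in conditions if remaining[c] > 0]
        least = min(transitions[(prev, c)] for c in candidates)
        choice = rng.choice([c for c in candidates
                             if transitions[(prev, c)] == least])
        transitions[(prev, choice)] += 1
        remaining[choice] -= 1
        seq.append(choice)
    return seq

conds = ["Offboard", "Onboard", "Joint", "Disjoint", "Static"]
seq = balanced_sequence(conds, n_per_cond=15)
print(len(seq), Counter(seq))   # 75 trials, 15 per condition
```

With 15 trials per condition this yields the 75 nontarget trials of one scan; duplicating 5 random trials would then produce the 5 one-back targets.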
In order to obtain a visual experience as natural as possible, we did not constrain subjects to maintain fixation while viewing movies.
To keep their attention high during the experiment, subjects were engaged in a one-back task. Subjects pressed a button with the right index finger each time the very same movie (not an instance of the same category) was repeated twice in a row (target trials). Target trials were generated by duplicating five randomly selected trials in each scan. Thus, they occurred randomly and quite rarely, approximately once per minute.
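The one-back target rule (respond only when the identical clip, not merely another clip of the same category, repeats back-to-back) can be expressed in a few lines; the clip identifiers below are hypothetical.

```python
def one_back_targets(movie_ids):
    """Indices of target trials: the very same movie shown twice in a
    row. Category repeats (e.g., two different Onboard clips) do not
    count as targets."""
    return [i for i in range(1, len(movie_ids))
            if movie_ids[i] == movie_ids[i - 1]]

# hypothetical trial list: clip "onb_03" is duplicated back-to-back
trials = ["offb_01", "onb_03", "onb_03", "joint_02", "onb_01"]
print(one_back_targets(trials))  # -> [2]
```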

| Objective motion measure (BMA)
Movies constitute an excellent experimental approximation to our real-life visual input and have great ecological validity. However, they are relatively uncontrolled visual stimuli and require further analysis to account for possible physical differences between stimuli. To summarize the motion properties of each movie, we estimated a set of parameters derived from the amplitude and the direction of motion vectors estimated from the movie frames.
To this aim, we developed a BMA. In the literature, BMAs are a well-established class of algorithms for object tracking, video encoding, and compression (Bhaskaran & Konstantinides, 1995; Chen, 2009; Chen, Hung, & Fuh, 2001; Gallant, Cote, & Kossentini, 1999; Hui & Siu, 2007; Jain & Jain, 1981; Moshe & Hel-Or, 2009; Nie & Ma, 2002). The basic assumption is that the spatial patterns corresponding to objects and background in a frame of a video sequence move to form corresponding objects in the subsequent frame. To estimate this movement, the BMA tessellates every movie frame into a set of macroblocks (see Figure 2a). Then, the movement of each macroblock in the current frame is estimated by comparing the macroblock with its neighbors in the previous frame. This is achieved by minimizing a cost function. The movement is represented by a motion vector whose 2D components are stored, so that the amplitude and the direction of the motion can be recovered for each macroblock (see Figure 2a). This step is repeated for each macroblock and for every pair of frames comprising the movie under investigation, providing a sequence of vectors that allows estimating the evolution of motion through time. It is thus possible to obtain a spatial and temporal distribution of the quantity of motion (QoM) present in each 3 s movie (see Figure 2b). Important parameters must be set to obtain reliable estimates of movement, such as the macroblock size, the search area, the strategy adopted to explore the neighborhood, and the cost function used to match macroblocks across frames. These parameters were set by starting from typical values reported in the literature (Nie & Ma, 2002), which were then manually optimized on a set of test frames. Since each frame was composed of 640 × 480 pixels, squares with a side of 16 pixels were used as macroblocks, so that every image was tessellated into 1,350 tiles. The search area was constrained up to p pixels on all four sides of the corresponding macroblock.
Here, we set p = 7, since this corresponds to a maximum shift of 10 cm, which is the maximum movement expected between two successive frames based on the simulated speed of the train.
The block-matching step in which the "new" location of a given block is estimated was performed by minimizing a cost function depending on the values of the pixels within the considered block.
Here, we adopted the mean absolute difference (MAD), given by:

\[
\mathrm{MAD} = \frac{1}{N^{2}} \sum_{i=1}^{N} \sum_{j=1}^{N} \left| C_{ij} - R_{ij} \right|
\]

in which N is the side of the macroblock, and C_{ij} and R_{ij} are the pixels being compared in the current and the reference macroblock, respectively. The adoption of MAD is a typical choice in the literature (see, e.g., Chen et al., 2001). Noise in the data may lead to an estimate of movement between still macroblocks in adjacent frames. To minimize this effect, we did not include in the MAD the contribution of those pixels whose variation did not exceed a threshold, called the zero motion threshold (Nie & Ma, 2002). This has the advantage of reducing the computational burden of the approach and of making the estimate more robust against noise.
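A minimal sketch of this cost function, including the zero-motion-threshold exclusion, is given below; normalizing by the full block size even when pixels are excluded is our assumption, made to keep values comparable to the plain MAD.

```python
import numpy as np

def mad(current, reference, zmt=0.0):
    """Mean absolute difference between two N x N macroblocks.
    Pixel pairs whose absolute difference does not exceed the zero
    motion threshold (zmt) are excluded from the sum, following
    Nie & Ma (2002); with zmt = 0 this reduces to the plain MAD."""
    c = np.asarray(current, dtype=float)
    r = np.asarray(reference, dtype=float)
    diff = np.abs(c - r)
    active = diff > zmt            # drop (near-)static pixel pairs
    if not active.any():           # block pair judged static
        return 0.0
    return diff[active].sum() / c.size

a = np.zeros((16, 16))
b = np.zeros((16, 16)); b[0, 0] = 256.0
print(mad(a, b))   # 256 / (16 * 16) = 1.0
```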
In addition, to reduce the computational time, we implemented a two-step search strategy: the adaptive rood pattern search (ARPS) followed by the diamond search pattern (DSP). ARPS exploits the assumption that the motion in a frame is typically coherent, that is, neighboring macroblocks very likely share similar motion. Thus, for each macroblock, the motion of its first-order neighbor on the left is used to predict its own motion. Once the search is directed to an area with a high probability of matching the block, the strategy switches to DSP, in which a diamond-shaped search pattern is adopted (for details, see Nie & Ma, 2002).
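Putting the pieces together, the per-block estimation can be sketched with a plain exhaustive search over the ±p window; the ARPS/DSP shortcuts only change how the window is explored, not the match they aim for. This is an illustrative reconstruction, not the authors' implementation.

```python
import numpy as np

def motion_field(prev, curr, block=16, p=7):
    """One motion vector per macroblock of `curr`: the (dy, dx) offset
    of the best-matching block in `prev` within +/- p pixels, scored by
    mean absolute difference. Exhaustive search stands in for the
    ARPS/DSP strategy used in the paper."""
    H, W = curr.shape
    vecs = np.zeros((H // block, W // block, 2), dtype=int)
    for bi, y in enumerate(range(0, H - block + 1, block)):
        for bj, x in enumerate(range(0, W - block + 1, block)):
            target = curr[y:y + block, x:x + block]
            best_cost, best_vec = np.inf, (0, 0)
            for dy in range(-p, p + 1):
                for dx in range(-p, p + 1):
                    yy, xx = y + dy, x + dx
                    if 0 <= yy <= H - block and 0 <= xx <= W - block:
                        cand = prev[yy:yy + block, xx:xx + block]
                        cost = np.abs(target - cand).mean()
                        if cost < best_cost:
                            best_cost, best_vec = cost, (dy, dx)
            vecs[bi, bj] = best_vec
    return vecs

# toy check: content shifted 3 px rightward is found 3 px to the left
# in the previous frame, i.e. offset (dy, dx) = (0, -3)
rng = np.random.default_rng(0)
prev = rng.random((32, 32))
curr = np.roll(prev, 3, axis=1)
print(motion_field(prev, curr)[0, 1])
```

Repeating this over every consecutive frame pair of a 3 s movie yields the spatiotemporal field of motion vectors summarized by the parameters below.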
We computed two parameters for each movie to characterize the motion embedded in its frames: the quantity of motion (QoM) and the spread of motion (SoM).
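As an illustration only: the exact formulas are not spelled out here, so the definitions below (QoM as mean vector magnitude, SoM as the circular spread of vector directions) are plausible assumptions, chosen to be consistent with the later interpretation that high SoM reflects incoherent motion.

```python
import numpy as np

def qom_som(vectors):
    """Summarize a set of 2D block-motion vectors.
    QoM: mean vector magnitude (how much motion).
    SoM: 1 - |mean unit vector| over moving blocks, so 0 means fully
    coherent directions and 1 fully dispersed directions.
    These definitions are assumptions made for illustration."""
    v = np.asarray(vectors, dtype=float)
    mag = np.linalg.norm(v, axis=1)
    qom = mag.mean()
    moving = v[mag > 0]
    if len(moving) == 0:
        return qom, 0.0
    units = moving / np.linalg.norm(moving, axis=1, keepdims=True)
    som = 1.0 - np.linalg.norm(units.mean(axis=0))
    return qom, som

coherent = [[3, 0]] * 10             # all blocks move rightward
incoherent = [[3, 0], [-3, 0]] * 5   # opposite directions, same speed
print(qom_som(coherent))             # same QoM, SoM = 0
print(qom_som(incoherent))           # same QoM, SoM = 1
```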

| Localizer scans
In a second set of fMRI experiments, several localizer scans were conducted to define V6+, MT/MST+, and PPA/RSC regions of interest (ROIs). The V6+ ROI was defined by the flow field stimulus (see Figure 3a), based on the method used in Pitzalis et al. (2010). MT and MST+ were defined using an ipsilateral stimulus (see Figure 3b), based on the method used in Huk et al. (2002). PPA and RSC were defined by contrasting pictures of places/scenes versus faces (see Figure 3c), as described in Epstein (2008). Further details are provided in the Supporting Information.

For the train experiment and the V6+ localizer scans, we used a wide-field setup similar to that originally described by our group (Pitzalis et al., 2006) and since routinely used in many fMRI experiments from our and other laboratories (Cardin & Smith, 2010; Huang et al., 2015; Huang & Sereno, 2013; Pitzalis et al., 2010; Pitzalis, Bozzacchi, et al., 2013; Pitzalis, Sdoia, et al., 2013; Pitzalis, Sereno, et al., 2013; Serra et al., 2019; Strappini et al., 2015, 2017). In the train experiment, we used free viewing, while in all the other sessions subjects were required to gaze at a central cross throughout the period of scan acquisition. The wide-field visual projection setup did not allow for eye tracking. However, to promote stable fixation (requested during the three functional localizer scans), the fixation point was continuously visible at a fixed position on the screen and only expert subjects with good fixation stability were recruited. In the train experiment, where subjects were asked to perform a task, responses were given through MR-compatible push buttons. In all experiments, fixation distance and head alignment were held constant by a chin rest mounted inside the head coil. Subjects' heads were stabilized with foam padding to minimize movement during the scans.

FIGURE 3 Schematic representation of the stimuli we used for the localizer scans. (a) Localizer for V6 (flow fields). In this visual motion paradigm, 16-s blocks of coherently moving fields (flow fields) were interleaved with 16-s blocks of randomly moving fields. The first two frames show the two different types of coherent motion (radial and rotation-spiral motion), which switched approximately every 500 ms during the coherent-motion blocks. For both radial and spiral motions, we tested both expansion and contraction components. Modified from Pitzalis et al. (2010). (b) Localizer for MT and MST+ (ipsilateral stimulation). Responses to ipsilateral stimulation were assessed by presenting a peripheral dot patch in either the left or right visual field. The 15° diameter field of dots alternated between moving (16 s) and stationary (16 s), while subjects maintained fixation on a small, high-contrast white cross 10° from the nearest edge of the dot patch. Modified from Huk et al. (2002). (c) Localizer for PPA/RSC (Face/Place). In this visual paradigm, 16-s blocks of pictures of human faces (male and female) were interleaved with 16-s blocks of pictures of places (indoor and outdoor) and with 15-s blocks of fixation. Modified from Sulpizio et al. (2013, 2014). PPA, parahippocampal place area; RSC, retrosplenial complex

| Image acquisition and preprocessing
The MR examinations were conducted at the Santa Lucia Foundation (Rome, Italy) on a 3T Siemens Allegra MR system (Siemens Medical Systems, Erlangen, Germany) using a standard head coil. Functional T2*-weighted images were collected using a gradient-echo EPI sequence based on blood-oxygenation level-dependent (BOLD) imaging (Kwong et al., 1992). Note that standard methods of GLM parameter estimation automatically remove the effects of shared variability, so that each effect is adjusted for all others. Specifically, the SPM software package automatically performs orthogonalization of parametrically modulated regressors, based on the order in which the regressors are specified in the model. In our case, the "unmodulated" regressors entered the model before the "modulated" regressors, so that the latter were automatically orthogonalized with respect to the former. The model also included a temporal high-pass filter to remove low-frequency confounds with a period above 128 s. Serial correlations in the fMRI time series were estimated with a restricted maximum likelihood (ReML) algorithm using an autoregressive AR(1) model during parameter estimation, assuming the same correlation structure for each voxel within each run. The ReML estimates were then used to whiten the data.
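The serial orthogonalization described above can be sketched with ordinary least squares: each later design-matrix column keeps only the variance not explained by earlier columns. This is a simplified illustration of the behavior, not SPM code.

```python
import numpy as np

def serial_orthogonalize(X):
    """Orthogonalize design-matrix columns in order: each column is
    replaced by its residual after regressing out all earlier columns,
    mirroring serial orthogonalization of parametric modulators
    (simplified; SPM also handles scaling and constant terms)."""
    X = np.asarray(X, dtype=float).copy()
    for j in range(1, X.shape[1]):
        earlier = X[:, :j]
        beta, *_ = np.linalg.lstsq(earlier, X[:, j], rcond=None)
        X[:, j] = X[:, j] - earlier @ beta
    return X

rng = np.random.default_rng(1)
unmod = rng.random(20)               # "unmodulated" regressor, entered first
mod = 0.5 * unmod + rng.random(20)   # modulated regressor shares variance
Xo = serial_orthogonalize(np.column_stack([unmod, mod]))
print(abs(Xo[:, 0] @ Xo[:, 1]) < 1e-8)   # shared variance stays with column 0
```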
These subject-specific models were used to compute a set of contrast images per subject, each representing the higher estimated amplitude of the hemodynamic response in one trial type as compared to the fixation baseline. Contrast images from all subjects were entered into a within-subjects ANOVA with nonsphericity correction, where subjects were treated as a random effect, thus allowing inferences about the whole population from which our participants were drawn. This model was used to search the whole brain for regions differentiating any of the four motion conditions (Offboard, Onboard, Joint, and Disjoint movies) from the Static condition (static frames). The resulting statistical parametric map of the F statistic was thresholded at the voxel level and by cluster size. Correction for multiple comparisons was performed using approximations from Gaussian field theory (p < .01 FWE; extent threshold = 20 voxels).
The resulting regions are listed in Table 1 and rendered in Figure 4, and include all voxels showing a reliable positive BOLD response during motion relative to static frames, irrespective of the kind and amount of motion. In-house software (BrainShow, written in Matlab) was used to visualize the resulting regions onto a population-average, landmark- and surface-based (PALS) atlas (Van Essen, 2005), and to assign anatomical labels to activated areas at the level of Brodmann areas and cortical gyri. Brodmann areas were derived from the Talairach Daemon public database (Lancaster et al., 2000), while cortical gyri were derived from a macroscopical anatomical parcellation of the MNI single-subject brain (Tzourio-Mazoyer et al., 2002).
After identifying the regions differentiating motion from static frames, we searched for modulations of BOLD responses in these regions as a function of movie type, after controlling for motion inequalities across conditions. These steps were performed on regionally averaged data as follows. Each cluster of significantly activated adjacent voxels in the motion-versus-static group statistical map described above constituted a region to be analyzed. For each subject and region, we computed a spatial average (across all voxels in the region) of the preprocessed time series and derived hemodynamic response estimates for all parameters in the models. The resulting regional hemodynamic responses are shown in the plots of the results figures.

TABLE 1 MNI coordinates (mm) and sizes (mm³) of cortical regions identified in the group whole-brain analysis comparing all motion conditions versus the static condition. Note: We report the peak coordinates of the corresponding cluster.
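The regional analysis just described (spatial averaging within a cluster, then percent signal change under grand-mean scaling to 100, as stated in the figure legends) can be sketched as follows; the array shapes and toy data are illustrative assumptions.

```python
import numpy as np

def roi_percent_signal_change(timeseries, roi_mask, baseline_idx, cond_idx):
    """Spatially average voxel time series within an ROI and express a
    condition's response as percent signal change relative to the
    fixation baseline, with the series scaled to a grand mean of 100
    (a sketch, not the authors' pipeline)."""
    ts = np.asarray(timeseries, dtype=float)   # shape: (time, voxels)
    roi_ts = ts[:, roi_mask].mean(axis=1)      # spatial average over ROI
    roi_ts = roi_ts / roi_ts.mean() * 100.0    # grand mean scaling = 100
    baseline = roi_ts[baseline_idx].mean()
    condition = roi_ts[cond_idx].mean()
    return condition - baseline                # in % of the grand mean

# toy data: 2 ROI voxels whose signal rises by 2 units during "motion"
ts = np.full((10, 3), 100.0)
ts[5:, :2] += 2.0
mask = np.array([True, True, False])
print(roi_percent_signal_change(ts, mask, baseline_idx=slice(0, 5),
                                cond_idx=slice(5, 10)))
```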

| Functional localizer
Statistical analyses of functional images from the functional localizers used to define V6+, MT/MST+, and PPA/RSC ROIs are described in the Supporting Information. On these ROIs, we performed the same regional analysis as for the regions derived from the train experiment.
The present data will be made available upon request, in compliance with the requirements of the funding institutes and with the institutional ethics approval.

| Behavioral results and motion parameters
Psychophysical data were collected during preliminary behavioral sessions, with stimuli identical to those of the functional MR session, but with subjects judging the degree of perceived movement after each stimulus through a self-report measure. Overall, we aimed to test whether the perception of motion could be affected by the quantity (QoM) or the spread (SoM) of motion, which differed across the various stimulus types. We conclude that the two classes of motion parameters are independent of each other and that subjects perceived motion regardless of the strength of the movement parameters.

| Impact of motion measures on neural activity
We further explored whether each motion-sensitive region was significantly modulated by the objective and subjective motion measures.
As to the objective motion measures, the QoM did not influence the activation in any of the observed regions (all p > .05, Bonferroni-corrected), while the SoM gave a positive contribution to activation in LOR (p < .001) and a negative contribution to activation in MT+ (p = .012) and V6+ (p = .036). Note that a positive and a negative relationship with SoM can be interpreted as a preference for incoherent and for coherent motion, respectively.
As to the subjective motion measures, the amount of object-motion sensation (OMS) experienced during the Offboard trials gave a positive contribution to activation in MT+ (p = .034) and V3A (p = .043). The only other significant relationship we found was a negative contribution of the amount of OMS experienced during the Disjoint trials (p = .015) to the activation of V6+ (as defined by the localizer). No motion-sensitive region was significantly modulated by the amount of self-motion sensation (SMS).

| Imaging results
To reveal general differences in the cortical areas specifically associated with the motion conditions, as a first step we selected regions showing greater fMRI responses in at least one of the motion conditions (Offboard, Onboard, Joint, and Disjoint) relative to the static condition. The rationale behind this approach is that we first wanted to isolate the areas responding to motion from other visual areas responding to the physical presence of the stimulus per se (i.e., sensory responses in early visual areas). Detailed statistics are reported in Tables S1 and S2.

We were particularly interested in verifying some specific contrasts. First, we compared the Onboard versus Offboard conditions (in the histograms, blue vs. red), to reveal specific preferences for pure self- and object-motion. Second, we compared the preferred condition of a specific region (i.e., Onboard, Offboard, or both in case of no preference) versus the Disjoint condition (in the histograms, blue/red vs. green), to reveal preferences for a more complex and ecological condition. Note that in the Disjoint condition self- and object-motion coexist but are independent of each other (i.e., the subject moves in one direction and the train moves as well, but in another direction).

| Regions preferring Onboard movies (inducing self-motion perception)

| Area PEc
This region is located in the anterior precuneus, just posterior to the dorsal tip of the cingulate sulcus (Figure 4). We found activation in this region only in the right hemisphere. This position, as well as the mean coordinates of this region (Table 1), corresponds well to the position of the newly defined human homolog of macaque area PEc. This position also corresponds to the PCu region found by Huang et al. (2015) and by Filimon, Rieth, Sereno, and Cottrell (2015). Plots in Figure 5 show that area PEc responds more to Onboard than to Offboard.

| Area pCi
This region is located within the caudal part of the cingulate sulcus, in the ascending arm of this sulcus (Figure 4). This region was originally labeled Pc (for precuneus) by Cardin and Smith (2010), but it was later referred to as the precuneus motion area (PcM), to distinguish it from other parts of the precuneus (Cardin & Smith, 2011; Uesaki & Ashida, 2015; Wada, Sakano, & Ando, 2016).
Although the region found here strictly corresponds to area Pc (Cardin & Smith, 2010; see also Table 1 for MNI comparisons), here we prefer to call it the posterior Cingulate (pCi) area, to highlight its correct location within the cingulate sulcus. Plots in Figure 5 show that area pCi responds more to Onboard than to Offboard.

FIGURE 5 Motion areas preferring Onboard movies that induce self-motion perception. The plots for each region represent the averaged BOLD percent signal change ± SE of the mean across subjects and hemispheres for each experimental condition: Static (black), Offboard (red), Onboard (blue), Joint (yellow), and Disjoint (green). Name abbreviations for some of the conditions are as follows: Sta (Static), Offb (Offboard), Onb (Onboard), Disj (Disjoint). Significant comparisons are also reported. *p < .05; **p < .01; ***p < .001. Note that we refer to the BOLD percent signal change as the percentage of signal change with respect to a fixed value (grand mean scaling = 100), not scaled with respect to the specific voxel, as implemented in SPM. LH, left hemisphere; RH, right hemisphere

| Area CSv
This region is located in the depth of the posterior part of the cingulate sulcus, anterior to the ascending portion of the cingulate sulcus (also called the marginal ramus of the cingulate sulcus; Figure 4). This location, as well as the mean coordinates of this region (Table 1), corresponds well to the original definition of human CSv. Plots in Figure 5 show that area CSv responds more to Onboard than to Offboard.

| Area CMA
We found significant activity in the middle portion of the cingulate gyrus (only in the left hemisphere; Figure 4). This region is still located in the cingulate cortex, but it is more anterior than CSv. Moreover, while CSv lies in the fundus of the cingulate sulcus, this region lies in the dorsal bank of the sulcus, in a slightly more superior position. This location corresponds to that of the cingulate motor area (CMA; Picard & Strick, 1996), which has recently been found to be motion sensitive (Field, Inman, & Li, 2015). Thus, here we refer to this region as CMA. Plots in Figure 5 show that area CMA responds more to Onboard than to Offboard, and more to Joint than to Onboard.

| Area PIC
We found a significant area of signal increase at the junction between the parietal and the insular cortex, extending from the lateral sulcus into the posterior part of the insula (PIC). We found activation in this region only in the right hemisphere (Figure 4). Originally, a region in this location of the brain was called PIVC, a multisensory region described in the macaque (Grüsser, Pause, & Schreiter, 1990; Guldin & Grüsser, 1998) and localized in humans with vestibular stimulation (Bottini et al., 1994; Brandt, Dieterich, & Danek, 1994; Bucher et al., 1998; Dieterich et al., 2003; Eickhoff et al.). However, as we used a pure visual stimulation, we refer to this region as the PIC area. Notice that the mean coordinates of this region (Table 1) and its anatomical position (Figure 4) strictly correspond to those provided in the original paper by Cardin and Smith (2010). Plots in Figure 5 show that area PIC responds more to Onboard than to Offboard.

| Area LOR
This area is located on the lateral occipital sulcus, between dorsal V3A and MT+. The LOR position (Figure 4; Larsson & Heeger, 2006) is also the typically reported position for the occipital place area (Sulpizio, Boccia, Guariglia, & Galati, 2018). Plots in Figure 5 show that LOR responds significantly more to Onboard than Offboard and to Joint than Disjoint. It also responds more to Joint than Onboard.

| Areas MT and MST+
It is now generally acknowledged that the large motion-sensitive region MT+ is a complex of several areas (Kolster, Peeters, & Orban, 2010; Pitzalis et al., 2010), which are also referred to as TO1 and TO2. The mean coordinates of the two regions defined here (Table 2) are similar to those provided in Kolster et al. (2010) for MT/V5 and pMSTv, respectively. Figure 7a,b shows the position of the two regions on the cortical surface reconstruction of the left hemisphere of some representative individual participants. Figure 7d shows the overlap of the individually defined MT and MST+ ROIs displayed on the template brain. The MT (red) and MST+ (dark blue) regions occupy an anatomical position between the inferior temporal sulcus (ITs) and the middle temporal sulcus (MTs), in line with the description of these two regions provided in previous fMRI studies where the two motion areas have been distinguished (Dukelow et al., 2001; Huk et al., 2002; Smith et al., 2006). Area MST+ (dark blue) was always anterior and often dorsal to MT, although there was some degree of variability across subjects. A consistent finding was that MST+ typically abutted MT, as previously reported by Huk et al. (2002). Plots in Figure 6 show that area MT+ responds more to Offboard than Onboard. It responds more to Disjoint than Offboard and more to Joint than Onboard. Furthermore, this region is so motion-sensitive that, like area V6, it responds more to any type of motion than to static stimuli. Plots in Figure 7c show that the localizer-defined area MT responds more to Offboard than Onboard and more to Joint than Onboard. Conversely, the localizer-defined area MST+ responds more to Disjoint than Onboard.

| Area V6
The mean coordinates of this region (Table 1) are well compatible with those of the originally defined V6 (Pitzalis et al., 2006, 2010). To check whether this region corresponds to area V6, we independently defined area V6+ according to the functional localizer (see Section 2 and Figure 3a) in all scanned subjects.
The map found with the localizer in 28/28 hemispheres (Figure 8a) has MNI coordinates (Table 2) and a position which closely resemble those obtained with the contrast M-S (Figure 4). The results (plot in Figure 6) show that area V6 (as defined by the motion vs. static contrast) responds more to Disjoint than to any other condition and shows a tendency to respond more to Joint than Onboard, which however does not reach statistical significance. Furthermore, this region is so motion-selective that it responds more to any type of motion than to static stimuli. Plots in Figure 8b show that area V6+ (as defined by the localizer) has a functional profile very similar to that observed in Figure 6. In fact, it responds quite strongly but indifferently to Onboard and Offboard, although there is a tendency to respond more to Onboard than Offboard which however does not reach statistical significance. It responds more to Disjoint than to any other condition. Furthermore, it responds more to any type of motion than to static stimuli. Interestingly, area V6+ as defined by the localizer responds more to Joint than Onboard, like the V6 region described above, but here the difference is statistically significant.
This difference in the functional profile could be due to the more refined V6 localization obtained by the localizer, which includes only voxels touching the POS.

| Areas V3A/V7
This region, located in the ventral portion of the posterior intraparietal sulcus (pIPS; Figure 4), likely corresponds to the dorsal visual area V3A (Table 1; Tootell et al., 1997). The activation is located posteriorly and laterally to that of the V6 region, bordering its posterior part. In the absence of a retinotopic mapping, V3A was defined here based on anatomical position, MNI coordinates, and neighboring relations with other areas. In addition, we show in Figure S1 the overlay between V3A and the Conte69 surface-based atlas (Van Essen et al., 2011).
This overlay suggests that this region mainly overlaps the retinotopic dorsal area V7, also encompassing area V3A, especially in the left hemisphere (Sereno et al., 1995), but not V3B (Smith, Greenlee, Singh, Kraemer, & Hennig, 1998). See Supporting Information for further details. While motion signals in V7 have rarely (if ever) been reported, V3A is a well-established dorsal motion area reported in several fMRI papers (Cardin & Smith, 2011; Fischer et al., 2012; Helfrich, Becker, & Haarmeier, 2013; Pitzalis et al., 2010, 2012; Serra et al., 2019; Wall, Lingnau, Ashida, & Smith, 2008). For these reasons, we prefer to be conservative and label the region V3A/V7. Plots in Figure 6 show that the V3A/V7 region responds more to Disjoint than Offboard and Onboard, and more to Joint than to Onboard.

| Area IPSmot
This parietal region is located along the horizontal segment of the IPS, in a position compatible with the putative human homologs of macaque area VIP described by Bremmer et al. (2001), Sereno and Huang (2006), Bartels et al. (2008), and Cardin and Smith (2010). Thus, given that in the absence of monkey fMRI data the homology question cannot be settled, we chose the neutral name of intraparietal sulcus motion region (IPSmot), as already done in a previous paper from our lab (Pitzalis, Sdoia, et al., 2013). The mean coordinates of this region (Table 1) and its anatomical position (Figure 4) are in line with those of the original IPSmot (Pitzalis, Sdoia, et al., 2013), which is however more lateral and slightly posterior than area VIP as described in other fMRI studies (Cardin & Smith, 2010; Huang et al., 2015). Plots in Figure 6 show that area IPSmot responds equally well to Onboard and Offboard. It responds more to Disjoint than Offboard and Onboard, and more to Joint than Onboard.

| Area LIP
This region is located in the dorsal portion of the intraparietal sulcus (IPS). Specifically, it is situated in the cortical region joining the horizontal segment of the IPS and the pIPS, anteriorly to V3A. The mean coordinates of this region (Table 1) and its anatomical position (Figure 4) correspond to the position of the retinotopic area LIP provided in the original paper by Sereno, Pitzalis, and Martinez (2001).
Other fMRI studies found a human homolog of macaque LIP in a similar position using a saccades-versus-reaching paradigm (Galati et al., 2011, LD area; Schluppeck, Curtis, Glimcher, & Heeger, 2006; Hagler Jr, Riecke, & Sereno, 2007; Swisher, Halko, Merabet, McMains, & Somers, 2007; Tosoni, Galati, Romani, & Corbetta, 2008, pIPS area). In the absence of a retinotopic mapping, the LIP region was defined here based on anatomical position, MNI coordinates, and neighboring relations with other areas. In addition, we show in Figure S1 the overlay between this region and the Conte69 surface-based atlas (Van Essen et al., 2011). This overlay suggests that this region mainly overlaps the retinotopic fields IPS3-4, thus surely including the human homolog of area LIP (Schluppeck, Glimcher, & Heeger, 2005; Sereno et al., 2001; Silver, Ress, & Heeger, 2005; Swisher et al., 2007). Plots in Figure 6 show that area LIP responds equally well to Onboard and Offboard, and it responds more to Disjoint than Offboard and Onboard.

| Area SFS
This region is located in the superior frontal sulcus (SFS). The mean coordinates of this region (Table 1) and its anatomical position (Figure 4) correspond to the SFS motion region recently described by Huang et al. (2015). This region partially overlaps with the superior part of the frontal eye fields (FEF) region, which is still located on the precentral sulcus but more inferiorly (see Huang et al., 2015 for a strict comparison between FEF and SFS). Plots in Figure 6 show that SFS responds more to Onboard than to Offboard. It responds more to Disjoint than Onboard. Finally, it does not discriminate between Joint and Onboard.

| Motion responses in ventral scene-selective areas
The reason why we decided to investigate the response of scene-selective areas in this study is the recently increased interest in finding motion responses in scene-selective regions using visual stimuli ecologically relevant for human navigation (Korkmaz Hacialihafiz & Bartels, 2015; Schindler & Bartels, 2016). The general assumption behind this interest is that motion information is a dominant cue for scene reconstruction and spatial updating (Britten, 2008; Frenz, Bremmer, & Lappe, 2003; Medendorp, Tweed, & Crawford, 2003).
Consequently, the neural activity in scene-selective regions should be modulated by visual motion cues, especially when ecological stimuli representing a realistic environment with a clear scene layout are used.
To check for the presence of motion responses in the scene-selective areas, we mapped PPA and RSC using an independent functional localizer following standard procedures as described in Section 2 (see Figure 3c) and in many previous papers (Epstein, 2008; Sulpizio et al., 2013, 2014). Figure 9a shows the position of the two regions in the medial and ventral folded representation of the right and left hemispheres of the template brain. The mean coordinates of the PPA (yellow) and RSC (orange) regions (Table 2) and their anatomical position (Figure 9) are very much in line with the description of these two regions provided in previous fMRI studies where the two scene-selective areas have been distinguished (Epstein, 2008; Sulpizio et al., 2013, 2014). Note that we use the term RSC to follow the mainstream of the current literature on scene selectivity, although it is well known that, from an anatomical standpoint, the human scene-selective RSC is mainly located within the ventral portion of the POS and only minimally extends anteriorly into the retrosplenial cortex proper (see Silson, Steel, & Baker, 2016).
As shown in the plots of Figure 9b, the PPA, but not the RSC, was modulated by our motion conditions, responding more when both object- and self-motion were present in the stimulus. We suggest that some of these areas may be involved in disentangling real object-motion from self-induced optical flow.
Psychophysical tests showed that subjects perceived motion in the stimuli used in this work regardless of the QoM present in them.
As in everyday life, the movement of an object can be perceived even when the image of the object is still on the retina (as when we pursue a moving object), and the immobility of objects can be the perceptual result of retinal image motion (as when we move around in a structured environment). Given that the Onboard and Offboard conditions are able to evoke a pure SMS and OMS, respectively, hereafter we will use the terms self-motion and object-motion interchangeably with the terms Onboard and Offboard, respectively.
In the following sections, we will first report a separate description of the main results achieved on human dorsal motion areas, then we will combine the evidence from macaque and human brain to suggest the possible functional role played by some of these regions in the flow-parsing phenomenon.

| Cortical areas preferring Onboard movies (inducing self-motion perception)
We found a remarkable preference for pure self-motion with respect to pure object-motion in several medial and lateral cortical areas (PEc, pCi, CSv, CMA, PIC, and LOR) involved in the analysis of visual motion. All these regions showed a stronger response to Onboard than Offboard. These areas exhibited such a selective response to pure self-motion that none of them responded more if object-motion was also present in the stimulus. In fact, in none of these regions was the BOLD signal obtained in Disjoint significantly higher than that found in Onboard. Finally, four of these regions (CSv, CMA, PIC, and LOR) not only preferred self-motion to object-motion, but their mean response to the Offboard condition, that is, in the presence of pure object-motion, was negative.
Previous works have shown that CSv is sensitive to wide-field egomotion-compatible stimuli (Antal, Baudewig, Paulus, & Dechent, 2008; Cardin & Smith, 2010; Field et al., 2015; Fischer et al., 2012; Pitzalis, Sdoia, et al., 2013; Wada et al., 2016) and receives vestibular as well as visual input (Smith, Wall, & Thilo, 2012). We found (Pitzalis, Sdoia, et al., 2013) that CSv does not discriminate between the various types of egomotion-compatible stimulations we used. Here, we found that CSv was not activated at all during the Offboard and the Static visual stimulations, that is, by any type of visual stimulation that has no vestibular counterpart in physiological conditions. In line with Pitzalis, Sdoia, et al. (2013), the present results show that CSv responds to all egomotion conditions independently of the presence of object-motion, confirming the suggested role of CSv in self-motion processing.
It is known that the cingulate cortex participates in the cortical network monitoring head and body movements in space, as well as in visuospatial attention (Corbetta, Miezin, Shulman, & Petersen, 1993; Gitelman et al., 1999; Kim et al., 1999; Mesulam, 1999). It is therefore not surprising to find a CMA activation in our experimental conditions. However, to date, motion-related responses in CMA had never been observed in fMRI studies on egomotion (Pitzalis et al., 2010; Pitzalis, Bozzacchi, et al., 2013; Pitzalis, Sdoia, et al., 2013). It could be that the extremely vivid SMS evoked by the naturalistic virtual reality movies and the wide-field screen vision used here provided a stronger input to CMA than standard visual stimuli such as random dots.
Area PIC in the Sylvian fissure is a motion region responding to visual and vestibular motion, presumably supporting the integration of motion information from the visual and vestibular senses for the perception of self-motion (Frank, Sun, et al., 2016; Frank, Wirth, & Greenlee, 2016). A recent study by Huang et al. (2015) found that the right PIC (there called PIVC) responds to active dodges, suggesting that it plays an active role in sensing and guiding translational egomotion. Similarly, we also found a significant BOLD response only in the right PIC. This asymmetry is consistent with a right-hemispheric dominance of the vestibular cortex in right-handed subjects, as suggested by previous neuroimaging studies using visual, optokinetic, and vestibular stimuli (Dieterich et al., 2003).
LOR is part of the kinetic occipital motion-sensitive region originally described by Orban's group (Van Oostende et al., 1997). In recent studies by our group we observed in LOR both a motion-selective response for radially moving stimuli (Pitzalis et al., 2010) and a speed-o-topic organization (Pitzalis et al., 2012). Here, we found a remarkable preference of this region for self-motion. In addition, it is worth noting that LOR is the only motion area sensitive to the difference between the two complex motion conditions (Joint and Disjoint), highly preferring the Joint condition (where the object-motion does not create any slip of the image on the screen).
Overall, we found that four regions prefer self-motion, and in all of them, the mean response to pure object-motion was negative. As already suggested (Field et al., 2015;Pitzalis et al., 2010), it could be that in physiological conditions an excitatory vestibular input activates the visuo-vestibular areas (like CSv, CMA, and PIC) during self-motion, whereas in motion conditions not implying egomotion and vestibular input (like in our Static and Offboard) the areas get inhibited. Present data are in support of this speculative hypothesis.
The PEc and pCi preference for self-motion confirms the involvement of the precuneus and cingulate cortex in egomotion.
These two areas are known to different degrees in the egomotion literature.
Since the discovery of the pCi (or pC/PcM) as a motion area by Cardin and Smith (2010), this region has frequently been associated with the motion network, in that it responds to egomotion-compatible stimuli (Cardin & Smith, 2010; Serra et al., 2019). Here, we confirm that pCi is a motion area responding to self-motion more than object-motion. The PEc, conversely, is a newly defined region that responds not only to leg and arm movements but also to flow field visual stimulation, such as that used here to map area V6+. Monkey PEc has visual neurons preferring optic flow with curved trajectories compatible with heading changes (Battaglia-Mayer et al., 2001; Raffi et al., 2002; Raffi, Carrozzini, Maioli, & Squatrito, 2010). In line with these previous studies, here, using wide-field naturalistic stimuli simulating continuous heading changes, we have found in the PEc motion responsiveness and a reliable preference for self-motion. This reinforces the hypothesis suggested in Pitzalis et al. (2019) that area PEc integrates visually derived self-motion signals with leg movements with the aim of guiding locomotion. We have also shown a preference for pure self-motion in the frontal SFS, although this region was not as selective as the others. Indeed, this area responded more if object-motion was also present in the stimulus (Disjoint > Onboard) and thus will be discussed in the next section.

| Selective preference for Offboard movies (inducing object-motion perception) in MT
Only the lateral motion area MT (as defined by the functional localizer) showed a preference for pure object-motion. This area responded maximally to the Offboard condition, and significantly more to the Offboard than to the Onboard condition. The high preference of this region for object-motion is also supported by a positive correlation between its BOLD activity and the OMS experienced during the Offboard condition. MT exhibited such a selective response to pure object-motion that the BOLD signal in Disjoint was not significantly higher than that found in Offboard. This means that self-motion (present in Disjoint together with object-motion) did not increase the BOLD signal in MT. Note that this selectivity for object-motion found in MT was not present in MT+ as defined in the group analysis or in MST+, both of which responded more to complex conditions. This strengthens the need to refer separately to the two subdivisions (MT and MST) to avoid masking effects or lack of significance due to averaging, which in some cases could cancel out possible differential effects.

| Cortical areas preferring Disjoint movies (inducing object-and self-motion)
We found several areas (V6, IPSmot, MST+, V3A, LIP, SFS, and PPA) responding more to complex visual stimulation (Disjoint), where both self- and object-motion are present, than to pure object-motion. However, these areas exhibited different behaviors in terms of preference for pure object- and self-motion, in that while SFS responded more to Onboard than Offboard, the other cortical areas (V6, IPSmot, MST+, V3A, LIP, and PPA) showed no preference, responding equally well to both Offboard and Onboard conditions. Human V6, like macaque V6, is a retinotopic motion area that responds to unidirectional motion (Fattori, Pitzalis, & Galletti, 2009; Pitzalis et al., 2006, 2010). It has a strong preference for coherent motion (Cardin & Smith, 2010; Helfrich et al., 2013; Pitzalis et al., 2010; von Pföstl et al., 2009), and a recent combined VEPs/fMRI work (Pitzalis, Bozzacchi, et al., 2013) has shown that V6 is one of the earliest stations (together with MT) coding motion coherence. Human V6 is highly sensitive to flow fields (Arnoldussen, Goossens, & van den Berg, 2011; Cardin & Smith, 2010; Pitzalis et al., 2010) and is able to distinguish between different 3D flow fields, being selective for translational egomotion (Arnoldussen, Goossens, & van Den Berg, 2015; Pitzalis, Sdoia, et al., 2013). The view that V6 is involved in the estimation of self-motion has also been confirmed in other recent fMRI studies (Cardin, Sherrington, et al., 2012; Cardin & Smith, 2010; Huang et al., 2015; Sherrill et al., 2015). Since macaque V6 contains many real-motion cells, that is, cells activated by real object movement but not by self-evoked retinal image movements (Galletti & Fattori, 2003), we suggested (Pitzalis et al., 2010) that human V6 is involved in object-motion recognition.
This area could also process visual egomotion signals to extract information about the relative distance of objects, in order to act on them or to avoid them, a hypothesis supported by its tight connectivity with areas involved in grasping (such as V6A and MIP; Galletti et al., 2001; Galletti & Fattori, 2003), its sensitivity to optic flow patterns combined with disparity cues, which are most informative for nearby objects (Cardin & Smith, 2011), and its reported preference for near-field stimuli in humans (Quinlan & Culham, 2007).
Considering human and macaque data, we previously suggested that V6 is involved in both object- and self-motion recognition (Pitzalis et al., 2010; Pitzalis, Bozzacchi, et al., 2013; Pitzalis, Sdoia, et al., 2013). The present results support this hypothesis, as V6 responds well to both self- and object-motion, and highly prefers complex conditions where the two types of motion are simultaneously present (Disjoint condition).
Among the several medial motion areas, V6 is the only one preferring complex visual motion.
The VIP (IPSmot) shares a similar functional profile with V6. Recent neuroimaging results from our and other laboratories have demonstrated that human area VIP is a motion area involved in the estimation of egomotion (Pitzalis et al., 2010; Cardin & Smith, 2010; Fischer et al., 2012). Additionally, like V6, VIP is able to distinguish between the different optic flow components, evidence that agrees with the functional properties of macaque area VIP, whose neurons respond selectively to optic flow stimuli (Colby, Duhamel, & Goldberg, 1993) and show a strong response to translational egomotion. Interestingly, area VIP was found to respond to looming objects (Huang, Chen, Tran, Holstein, & Sereno, 2012), which explains its strong response to the approaching train present in Offboard. Moreover, the greater response of this region to stimuli also containing self-motion confirms previous human (Cardin & Smith, 2010; Peuskens, Sunaert, Dupont, Van Hecke, & Orban, 2001; Pitzalis, Sdoia, et al., 2013) and macaque (Colby et al., 1993) findings suggesting that this area also processes optic flow and egomotion.
We found that area MST+, as defined by the functional localizer, is not differentially modulated by Onboard and Offboard, responding well to both conditions and, on average, responding more to complex motion stimulation. The trend observed in MST+ deserves some comment. In a recent study (Pitzalis, Sdoia, et al., 2013) we showed that MST+ is able to distinguish between different 3D egomotion signals and is more sensitive to translational and radial egomotion than to circular egomotion. This preference was already observed in the past in both macaques (Duffy, 1998; Duffy & Wurtz, 1991; Eifuku & Wurtz, 1998; Graziano, Andersen, & Snowden, 1994; Nelissen, Vanduffel, & Orban, 2006; Orban et al., 1992; Saito et al., 1986; Tanaka, Fukada, & Saito, 1989) and humans (Kovács, Raabe, & Greenlee, 2008; Morrone et al., 2000; Smith et al., 2006). These past studies led to a general agreement in reporting MST as an area sensitive to motion coherence and to egomotion signals (Arnoldussen et al., 2011; Helfrich et al., 2013; Kleinschmidt et al., 2002; Kovács et al., 2008; Morrone et al., 2000; Smith et al., 2006). However, recent studies failed to report positive evidence in favor of a role of MST+ in egomotion (Kleinschmidt et al., 2002), raising doubts about its effective importance in terms of egomotion perception. Although here we show that MST+ responds well to both self- and object-motion, we also found that MST+ does not discriminate between Joint and Onboard, and thus it does not seem to have a role in flow parsing. We expected to find a more selective response profile in this area. Instead, we found an overall weaker BOLD signal than that found in MT (see Figure 6c) and a moderate MST+ preference for self-motion, meaning that this area responds to egomotion but is not strongly selective for it.
Area V3A is a retinotopic area whose motion sensitivity has been observed in several fMRI studies (Pitzalis et al., 2010; Sereno et al., 2001; Tootell et al., 1997). Many previous works on humans reported greater BOLD signal change in V3A for coherent than for random motion (Braddick, O'Brien, Wattam-Bell, Atkinson, & Turner, 2000; Braddick et al., 2001; Moutoussis, Keliris, Kourtzi, & Logothetis, 2005; Vaina et al., 2003; but see Pitzalis, Bozzacchi, et al., 2013; Pitzalis, Sdoia, et al., 2013). Several other studies have shown that V3A is responsive to flow field stimulation (Arnoldussen et al., 2011; Helfrich et al., 2013; Pitzalis et al., 2010; Sereno et al., 2001) and to other types of global motion signals, such as those involved in the reconstruction of form from motion (Orban, Sunaert, Todd, Van Hecke, & Marchal, 1999). Here, we show that area V3A responds to both object- and self-motion, but even more to a complex visual stimulation where both types of motion coexist. The present results support the view of this area as a motion area processing egomotion signals.
The present data further show that LIP and SFS/FEF are differently modulated by the various motion conditions used here: LIP responds indifferently to object- and self-motion, SFS/FEF prefers pure self-motion, and both regions prefer complex motion.
We revealed the presence of some kind of motion sensitivity also in a ventral region, the PPA, which is typically not activated by visual motion in the absence of ecological, scene-like stimuli (Cardin & Smith, 2010; Greenlee et al., 2016; Pitzalis et al., 2010; Pitzalis, Sdoia, et al., 2013). The PPA, although not activated on average by the motion conditions relative to the Static condition, revealed a preference for the Disjoint condition relative to Onboard and Offboard. This motion sensitivity of PPA is indirectly supported by recent evidence indicating that this area receives inputs from typical brain motion areas (Schindler & Bartels, 2016; Sherrill et al., 2015). In Tosoni et al. (2015), for instance, we found that human motion area V6 is functionally connected with PPA (see also Boccia, Sulpizio, Nemmi, Guariglia, & Galati, 2016 for a related finding).
In line with Tosoni et al. (2015), unpublished data collected in Galletti's lab indicate that monkey V6 is directly connected with the ventral cortex within the occipitotemporal sulcus, a region that seems to be homologous to human PPA. The connections of V6 with area PPA support the view of a possible role of V6 in spatial navigation (Cardin & Smith, 2011;Pitzalis, Sdoia, et al., 2013).
Finally, we observed evidence of motion-related responses also in the ventral part of the calcarine fissure, where Mikellidou et al. (2017) have recently identified the human area prostriata, a retinotopic region located medially between the RSC and PPA regions and responsive to extremely fast motion over a wide visual field (see Supporting Information for details on the mapping procedures and Figure S2A for the anatomical position, MNI coordinates, and size of the region). The prostriata, although not activated on average by the motion conditions relative to the Static condition, like the PPA, revealed a preference for the Disjoint condition relative to Onboard (see Figure S2B). Like PPA, the prostriata is not involved in the flow-parsing phenomenon, being unable to distinguish Onboard from Joint.
Our results indicate that the prostriata, in line with its anatomical location, shows a functional profile intermediate between that of area PPA, which shows a preference for complex motion stimulation, and that of area RSC, which is completely insensitive to any motion stimulation.

| Neural basis of the flow-parsing phenomenon
The neural basis of flow parsing in humans is still a matter of debate (Arnoldussen et al., 2013; Billington & Smith, 2015; Fischer et al., 2012). Warren and Rushton (2009a, 2009b) have proposed that a flow-parsing mechanism identifies and subtracts the optic flow associated with the observer's movement from the pattern of retinal motion, to estimate the true object-motion. In monkeys, cells activated by real object movements but not by retinal image movements evoked by self-movements have been found (Galletti & Fattori, 2003). We postulated that the cortical areas containing these "real motion" cells represent the neural basis of the flow-parsing mechanism (see Galletti & Fattori, 2018). Here, we investigated this phenomenon using a very peculiar motion condition called "Joint." In this condition, the object-motion is clearly perceivable by the subject although the image of the object is not moving on the retina. Conversely, the subject clearly perceives as motionless the objects whose images are moving on the retina because of self-motion. Given that Joint has the same quantity of self-motion as the Onboard condition, but only in Joint does the subject perceive object-motion, we speculate that a difference between the BOLD signal in the Joint and Onboard conditions can be considered an index of "real motion" extraction. In this respect, note that a higher response in the Joint condition cannot be attributable to a higher level of self-motion perception, because the psychophysical results revealed that subjects perceived more self-motion in Onboard than in Joint.
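The computational core of the flow-parsing account described above, subtracting the optic flow component due to the observer's own movement from the measured retinal motion so that only scene-relative object motion remains, can be sketched as a toy vector-field computation. Everything below (grid, values, names) is illustrative, not part of the study's analysis:

```python
import numpy as np

# Toy 2D retinal motion field sampled on a coarse 5x5 grid of points.
ys, xs = np.mgrid[-2:3, -2:3].astype(float)

# Self-motion component: radial expansion around the center of flow,
# the pattern produced by forward translation of the observer.
ego_flow = np.stack([xs, ys], axis=-1) * 0.5        # shape (5, 5, 2)

# One location additionally carries scene-relative object motion.
obj_motion = np.array([1.0, 0.0])                   # object moves rightward
retinal = ego_flow.copy()
retinal[3, 4] += obj_motion                         # add object motion there

# Flow parsing: subtract the (here, perfectly known) ego-flow estimate
# from the retinal field; the residual is object motion in scene
# coordinates, nonzero only where a real object moved.
parsed = retinal - ego_flow
moving = np.linalg.norm(parsed, axis=-1) > 1e-6
```

In the brain the ego-flow estimate is of course not given but must itself be inferred (visually and, in some areas, vestibularly); the sketch only illustrates why a region computing such a residual would respond more in Joint than in Onboard, as the index proposed above assumes.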
We found a network of cortical areas more responsive to Joint than to Onboard and thus able to recognize the real movement in the visual field. Most of these are motion regions distributed across lateral and medial temporoparietal cortex, and their functional profile in the other examined contrasts is heterogeneous, including regions preferring self-motion (LOR), object-motion (MT), and combined self- and object-motion (V6+, V3A, and IPSmot). Macaque and human results suggest that the flow-parsing mechanism arises from a distributed and integrated network involving early and higher-order regions having different functional properties and likely playing different roles in visual motion processing. In humans, most of these flow parsing-related regions are well known in the motion literature and have properties that, according to Warren and Rushton (2009a, 2009b), constitute important prerequisites for processing egomotion signals in relation to object-motion. First, most of these regions (LOR, MT, VIP, V6, and V3A) are activated by optic flow (Smith et al., 2006; Morrone et al., 2000; Pitzalis et al., 2010, 2012), which constitutes a rich source of visual cues that can facilitate navigation through the external environment. Second, MT, V3A, and VIP respond to changing heading directions (Furlan, Wann, & Smith, 2014; Huang et al., 2012), another important visual cue that contributes to the perception of self-motion. Third, V6 and VIP are specialized in distinguishing among different types of self-movement, showing a strong response to translational egomotion (Pitzalis, Sdoia, et al., 2013) that allows the extraction of information about the relative distance of objects, useful to act on them or to avoid them. Finally, area VIP (but not MT or V6) shows vestibular responses and appears to integrate visual and vestibular cues to direct self-motion (Billington & Smith, 2015; Greenlee et al., 2016; Smith et al., 2012).
Overall, some of these flow parsing-related regions could be involved in the extraction of optic flow for the computation of heading direction (MT, V3A, and VIP), others for obstacle avoidance (V6 and VIP) and/or for visual/vestibular cue integration (VIP). Biagi, Crespi, Tosetti, and Morrone (2015) demonstrated that MT+ and V6 are operative even by 7 weeks of age, likely providing very young infants with a sense of vection. More generally, the flow parsing-related regions could act as "sensors" of real movement in a neural network that subserves an internal, objective map of the visual field. Such an internal representation of the FOV could allow one to correctly interpret real motion as well as the plethora of sensory changes resulting from exploratory eye and self-movements in a stable visual world (Galletti & Fattori, 2003). Of course, we are aware that with the analysis implemented here we can only identify which motion areas of the dorsal stream are likely involved in the flow-parsing phenomenon; without specific analyses on the timing of these activations (e.g., event-related peak latency) we cannot assess when the activation occurred or infer directionality.

| Ecological stimuli, eye movements, and motion perception
A peculiar aspect of this study is the choice of the visual stimuli. To reproduce a realistic self- and object-motion stimulation, we used a virtual reality software simulating a train moving in a natural landscape and, more importantly, movies as motion conditions. Recent neuroimaging studies have begun to use virtual reality simulation to investigate the neural substrates of egomotion (Billington, Field, Wilkie, & Wann, 2010; Field, Wilkie, & Wann, 2007; Huang et al., 2015). Movies constitute an excellent experimental approximation to reality and come much closer to everyday-like scenarios than the clouds of dots that are typically used (Billington & Smith, 2015; Pitzalis, Sdoia, et al., 2013; Pitzalis, Sereno, et al., 2013; Smith et al., 2006). The use of wide-field stimuli increased the realism of the virtual environment and made the subjective experience of self-motion (vection) particularly compelling (Palmisano et al., 2015).
To further increase the realism of our paradigm and the subjective self-motion experience, subjects were free to move their eyes. When an individual moves around the environment, a coherent pattern of image motion known as optic flow reaches the retina, and its center of flow (CoF) coincides with the gaze direction. In this condition, the CoF indicates the direction toward which the individual is heading. In movies like those used here, the CoF continuously changes its position on the screen, and if subjects were asked to stare at a central fixation cross, the CoF and gaze direction would dissociate throughout the run. Because of this, the visual percept would be unnatural, uninformative about heading, and unable to elicit a compelling self-motion experience (Wann, Swapp, & Rushton, 2000). Therefore, we opted for natural vision (free scanning). Of course, we are aware that, although this is appreciable because it is exactly what happens in everyday life, the eye movements might be considered a critical confound, because part of the observed brain activity might have been induced by these eye movements. But the presence and amount of eye movements are an ingrained feature of the types of movement we studied here. When, for instance, we move forward, we tend to keep our gaze on the CoF. In contrast, when we observe a moving train from a still position, we tend to keep the train still on the fovea through pursuit eye movements. Also, during optic flow consistent with self-motion, subjects make compensatory eye movements, and it is known that changes in such eye movements over time are correlated with reported increases in vection strength (Kim & Palmisano, 2010). These results are of course reported only when the observer freely views the self-motion display (Palmisano, Kim, & Freeman, 2012).
We feel that without considering these different eye patterns we would exclude important elements toward a real understanding of what happens in the brain in everyday life. Additionally, previous imaging studies allowed participants to view movies reproducing self-motion (or a combination of self/object-motion) with no fixation point, thus better emulating the various conditions that observers would encounter in the real world (Bartels et al., 2008; Field et al., 2007). Interestingly, Field et al. (2007) replicated their motion experiment under conditions of fixation in order to rule out eye movements as an explanation of the differences in brain activation. They found that preventing actual eye movements did not prevent activation even in those cortical regions most likely to be influenced by eye movements, such as the cortical eye fields and the parietal eye fields (PEFs), the homolog of the lateral intraparietal area (LIP) in the monkey. This is because, under conditions of fixation, movies like those used here (showing railways and a natural background) were likely to produce an increase in planned but unexecuted eye movements relative to static images. This is especially likely in the case of the parietal eye fields, which are also involved in shifting spatial attention independently of actual eye movements (Bisley & Goldberg, 2003; Gottlieb & Goldberg, 1999). Thus, it could be argued that, in motion conditions emulating self-motion in the real world, the impact of eye movements on brain activation cannot be completely ruled out, even when eye movements are constrained.
An additional aspect to be considered when using realistic movies reproducing different combinations of self/object-motion is the potential impact of the physical motion features embedded in each single frame on the neural activity of motion-selective regions. We found that SoM, which is a measure of the overall spread of motion orientation within each movie, significantly influenced the activity in some motion regions, giving a positive contribution to the activation in LOR and a negative contribution to the activation in both MT+ and V6+. Since this motion parameter reflects the SD of the motion vectors obtained in all pixels and frames within each movie, it represents an index of motion incoherence. Thus, the negative relationship between SoM and the activity in both MT+ and V6+ observed in the current study is well in line with the concept that these regions are modulated by motion coherence (Smith et al., 2006; Morrone et al., 2000; Pitzalis et al., 2010; Pitzalis, Bozzacchi, et al., 2013). On the other hand, we did not observe significant correlations between the overall QoM and the neural activity in any of the observed motion regions. At first glance this may seem to contradict results from Bartels et al. (2008), in which a significant positive correlation was observed between variations of global motion (reflecting self-motion) and the activity of the medial posterior parietal cortex (mPPC), which likely corresponds to area V6. However, their division of the total motion into a global (reflecting self-motion) and a local (reflecting object-motion) component did not correspond to our motion estimates, which instead reflected the amplitude (QoM) and the direction (SoM) of motion, with only the latter likely reflecting a measure of global/coherent motion. Future studies could address the impact of the quantity and direction of motion with respect to both local and global motion variations on the neural activity of motion-selective regions.
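For concreteness, one plausible formalization of the two motion estimates discussed above is sketched below. The exact definitions (mean vector amplitude for QoM, SD of vector directions for SoM) are our assumptions based on the description in the text, not the study's actual computation.

```python
import numpy as np

def qom(flow):
    """Quantity of Motion: mean amplitude of the motion vectors
    across all pixels and frames (assumed definition)."""
    return np.linalg.norm(flow, axis=-1).mean()

def som(flow):
    """Spread of Motion: SD of the motion-vector directions across
    all pixels and frames, an index of motion incoherence (assumed
    definition; a circular SD would be more rigorous for angles)."""
    angles = np.arctan2(flow[..., 1], flow[..., 0])
    return angles.std()

# Coherent flow (all vectors identical) yields SoM = 0;
# randomly oriented vectors yield a large SoM.
coherent = np.ones((5, 8, 8, 2))              # frames x H x W x (dx, dy)
rng = np.random.default_rng(0)
incoherent = rng.standard_normal((5, 8, 8, 2))
```

Under these definitions, a movie dominated by a single translational flow would score low on SoM regardless of its QoM, which is consistent with SoM (but not QoM) indexing global/coherent motion.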
A final note goes to the role of vection. According to some modern theorists and researchers, our conscious experiences of self-motion are simply intriguing epiphenomena, and vection is irrelevant in simulation-based self-motion experiments (see Palmisano et al., 2015, for a review).
According to other authors, it is possible that our conscious experiences of self-motion play important functional roles in the perception, control, navigation, or guidance of self-motion. For example, vection convincingly improves spatial orientation in virtual reality (Chance, Gaunet, Beall, & Loomis, 1998; Kearns, Warren, Duchon, & Tarr, 2002; Klatzky, Loomis, Beall, Chance, & Golledge, 1998; Riecke, 2008; Riecke, Feuereissen, Rieser, & McNamara, 2012). It is known that vection induces (or is based on) differential cortical activity, and many functional neuroimaging studies (including the present one) have attempted to identify the neural correlates of visual self-motion perception (Beer, Blakemore, Previc, & Liotti, 2002; Cardin & Smith, 2010; de Jong, Shipp, Skidmore, Frackowiak, & Zeki, 1994; Deutschländer et al., 2004; Kleinschmidt et al., 2002; Kovács et al., 2008; Previc et al., 2000; Tokumaru, Kaida, Ashida, Yoneda, & Tatsuno, 1999). However, whether the BOLD signal observed in the cortical areas responding to self-motion is correlated with the different amounts of subjective experience of self-motion is still an open question. Here, for instance, we found no sign of correlation between vection and BOLD signal even in those areas preferring the self-motion condition, suggesting that these cortical regions respond to self-motion irrespective of vection and its subjective intensity. The lack of correlation found here seems to support the hypothesis that vection has little or no behavioral relevance. However, an alternative explanation could be that our study reached a ceiling effect, since vection in the onboard condition was on average high and uniformly distributed across subjects. In conclusion, the functional significance of vection is still an open and largely unexplored question (Palmisano et al., 2015), and more evidence is needed to establish whether vection is an important parameter to be considered in future fMRI studies on self-motion.

| Conclusive remarks
Many years ago, it was suggested (Previc, 1998; Previc et al., 2000; Rosa & Tweedale, 2001) that lateral and medial motion areas are engaged in the detection of object- and self-motion, respectively.
In accordance with this hypothesis, we found a lateral area (MT) particularly sensitive to object-motion and several medial motion areas (PEc, pCi, CSv, and CMA) preferring self-motion. However, we failed to find a strict segregation of functions, since the medial motion area V6 responded equally well to both self- and object-motion, and several motion areas of the dorsolateral surface (PIC and LOR) preferred self-motion. In addition, some regions (V6, IPSmot, MST+, V3A/V7, LIP, SFS, and PPA) preferred not pure self- or object-motion, but complex visual stimulation in which both self- and object-motion are present, similarly to what happens in daily life when people move in a dynamic and complex environment.
In macaques, information on real movement is encoded in single cells of several areas of the dorsal stream (Galletti & Fattori, 2003).
Here we show that in humans the same areas, and others not yet studied in nonhuman primates, participate in the recognition of real movement. We suggest that a network of cortical areas able to recognize real motion ensures a stable and correct perception of the external visual world, necessary to orchestrate eye, arm, and body movements while navigating in a complex and dynamic environment.
We believe that this network of cortical areas is a good candidate for being the neural circuit of flow parsing, that is, the separation of object-motion from self-motion (Warren & Rushton, 2009a, 2009b).

ACKNOWLEDGMENTS
This research was supported by research grants from the University of Rome Foro Italico (number RIC142016) to S.P., from the Italian Ministry of Health (RC2010-11) to G.G., and by Fondazione del Monte di Bologna e Ravenna (Italy) and Italian Ministry of University and Research (PRIN) to C.G.

CONFLICT OF INTEREST
The authors declare no potential conflict of interest.

DATA AVAILABILITY STATEMENT
Present data will be made available on request in compliance with the requirements of the funding institutes, and with the institutional ethics approval.