A response‐locking protocol to boost sensitivity for fMRI‐based neurochronometry

Abstract The timeline of brain‐wide neural activity relative to a behavioral event is crucial when decoding the neural implementation of a cognitive process. Yet, fMRI assesses neural processes indirectly via delayed and regionally variable hemodynamics. This method‐inherent temporal distortion impacts the interpretation of behavior‐linked neural timing. Here we describe a novel behavioral protocol that aims at disentangling the BOLD dynamics of the pre‐ and post‐response periods in response time tasks. We tested this response‐locking protocol in a perceptual decision‐making (random dot) task. Increasing perceptual difficulty produced expected activity increases over a broad network involving the lateral/medial prefrontal cortex and the anterior insula. However, response‐locking revealed a previously unreported functional dissociation within this network. preSMA and anterior premotor cortex (prePMV) showed post‐response activity modulations while anterior insula and anterior cingulate cortex did not. Furthermore, post‐response BOLD activity at preSMA showed a modulation in timing but not amplitude while this pattern was reversed at prePMV. These timeline dissociations with response‐locking thus revealed three functionally distinct sub‐networks in what was seemingly one shared distributed network modulated by perceptual difficulty. These findings suggest that our novel response‐locked protocol could boost the timing‐related sensitivity of fMRI.

Since fMRI distorts the timescale, one strategy to increase chronometric sensitivity is to modify a paradigm's timescale to be more compatible with that of the Blood-Oxygen-Level-Dependent (BOLD) signal (E. Formisano et al., 2002; Gratton et al., 2017; Ploran et al., 2007; Ploran, Tremel, Nelson, & Wheeler, 2011; Richter et al., 2000). However, fMRI's timescale distortion poses a problem illustrated by the following scenario. Suppose A and B are neurocomputational stages of a cognitive task, where A always precedes B. Interpreting A's functional relationship to B depends on their timing relative to the behavioral events of the task. If both A and B always occur between the stimulus onset and the behavioral response, then A's relationship to B would be interpreted differently than in a scenario where A always precedes the behavioral response while B follows it. To avoid such ambiguities, one solution is to "time-lock" neural dynamics to behavioral events by directly integrating behavioral events into the neural timeline, as is done in electroencephalography (EEG) and magnetoencephalography (MEG).
This solution is, however, unavailable to fMRI due to the differing timescales of the behavioral events (milliseconds) and the BOLD signal (seconds). To address this issue, we investigated a novel time-locking strategy based on the paradigm's temporal structure. Rather than modifying a paradigm's overall timescale, the timing of behavioral events on each trial was used to modify that trial's timescale selectively.
The proposed strategy is directed toward typical experimental paradigms where the reference time-point is the stimulus onset when the brain receives an external information input, and the ensuing action (e.g., pressing a button) represents the output of the cognitive process.
On this stimulus-referenced timeline, the input-to-output transformation is attributable to the neural processes operating in the time interval between the stimulus and the response, that is, the response time (RT) (Donders, 1868; Luce, 1991; Sternberg, 1969). Identifying these time-bound neural processes, however, requires them to be accurately distinguished from processes operating outside the RT interval, namely, in the pre-stimulus and post-response periods. This within/outside distinction is the target of the proposed strategy. Unlike pre-stimulus neural dynamics (cf. Busch, Dubois, & VanRullen, 2009), the post-response neural dynamics are RT-irrelevant as they can no longer influence the RT. Furthermore, the pre-response and post-response periods are defined relative to the response event, which is independent of the stimulus. Based on this rationale, we devised a response protocol to modulate the duration of the post-response processes in a response-locked manner. We conjectured that this approach enhances the distinguishability of BOLD signal dynamics with a neural origin in the post-response (RT-irrelevant) versus pre-response (RT-relevant) periods.
To test this conjecture, the response protocol was embedded within a perceptual decision-making task (Donner, Siegel, Fries, & Engel, 2009; Gold & Shadlen, 2007; H. R. Heekeren, Marrett, Ruff, Bandettini, & Ungerleider, 2006; T. Liu & Pleskac, 2011; Tosoni, Galati, Romani, & Corbetta, 2008). On each trial (Figure 1a), participants viewed a random moving dot stimulus and reported its motion direction by pressing a corresponding button. Rather than a single button-press, the selected button had to be pressed repetitively and rapidly until a halt was signaled with a MoveOff stimulus. Importantly, the timing of this MoveOff stimulus was response-locked, that is, defined relative to the first button-press on that trial. Each trial's duration was the sum of two intervals: (a) from the stimulus to the first button press (i.e., the RT) and (b) from the first button-press until the MoveOff stimulus (i.e., the movement time [MvT]). The RT and MvT were independently varied in an RT × MvT design by modulating the stimulus noise levels and the response-locked timing of the MoveOff stimulus (Figure 1b). With this design, the mean BOLD differences between trial-types with the same mean RT but different mean MvTs should have a response-locked neural origin, namely, after the first button-press. We evaluated whether these predicted response-locked differences in the fMRI data increased sensitivity in the context of the perceptual decision-making task. Even though motion direction is a property of the visual stimulus, stimulus-related RT-modulations have surprisingly been found to modulate action-related neural processes in non-human primates, but this modulation in the human brain has been controversial (Donner et al., 2009; Filimon, Philiastides, Nelson, Kloosterman, & Heekeren, 2013; Gold & Shadlen, 2007; H. R. Heekeren et al., 2006; H. R. Heekeren, Marrett, & Ungerleider, 2008; Resulaj, Kiani, Wolpert, & Shadlen, 2009; Tosoni et al., 2014; Tosoni et al., 2008).
We hypothesized that activity modulation by RT and MvT would provide a strict functional (rather than anatomical) criterion to identify stimulus-modulated regions (pre-response) with a measurable role in action execution (post-response).

| Participants
Thirty-three healthy, young volunteers (mean age: 25.6 years [SD = 2.8 years], range: 19-32 years, female = 16) participated in the experiment and received financial compensation. Participants were right-handed (mean score = 81.2% [SD = 18.2%] [Edinburgh Handedness Inventory (Oldfield, 1971)]), had normal or corrected-to-normal vision, no history of psychiatric or neurological diseases, and no contraindications for MRI scanning. Participants were additionally prescreened for their ability to perform the task. The local ethics committee approved the study, which complied with the Declaration of Helsinki.
All volunteers provided their informed consent before the experiment.
Statistical analyses reported here are based on datasets from 30 (of the 33) participants due to the exclusion of 3 datasets based on quality considerations (see details below).

| Stimulus specification
Visual stimuli were generated and displayed using the Presentation® Software (Neurobehavioral Systems, Inc.) on an LCD screen (size: 68.6 cm [diagonal], resolution: 1,200 pixels × 800 pixels, frame rate: 60 Hz). The screen was located behind the scanner and was viewed via a mirror installed on the head coil.
The dots were distributed equally in each of the circle's quadrants.
Each dot had a finite life that ended either (a) after 0.5 s (30 frames), or (b) when the dot moved outside the invisible circle's periphery. A dot that met either criterion was replaced on the next frame by a new dot that appeared at a random location within the circle. To reduce the abruptness with which dots disappeared at the circle's periphery, the circle was windowed by a Gaussian luminance envelope so that dots appeared to be brighter at the circle's center and progressively dimmer toward the periphery.
On every frame, each dot was assigned a motion direction based on two trial-specific parameters: (a) coherence (Coh) and (b) global direction (D). The parameter Coh specified the proportion of dots to be assigned to the (coherent) motion direction D. These dots with a coherent motion direction were selected randomly on each frame while the remaining dots were each assigned a direction selected randomly over the uniform range [0°, 359°]. For example, in an RDK with D = 90° and Coh = 60%, 360 randomly selected dots (60% of 600) on each frame would be assigned a motion direction of 90° while each of the remaining 240 dots would be assigned a random motion direction.
Due to this randomization procedure, a dot with a direction D in one frame would likely have a different direction in the next frame and vice versa, thus making it difficult to infer the direction D by tracking the motion of any single dot.
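The per-frame direction assignment described above can be sketched as follows. This is a minimal illustration rather than the stimulus code used in the study: the function name `assign_directions`, the use of integer directions, and the fixed random seed are assumptions made for the example.

```python
import random

def assign_directions(n_dots, coherence, global_direction, rng):
    """Return one motion direction (in degrees) per dot for a single frame."""
    n_coherent = round(coherence * n_dots)
    # The coherent subset is re-drawn on every call, i.e., on every frame.
    coherent_idx = set(rng.sample(range(n_dots), n_coherent))
    directions = []
    for i in range(n_dots):
        if i in coherent_idx:
            directions.append(global_direction)
        else:
            # Assumption: noise directions are integers drawn uniformly from 0..359.
            directions.append(rng.randrange(360))
    return directions

rng = random.Random(0)  # fixed seed only to make the example reproducible
frame = assign_directions(n_dots=600, coherence=0.60, global_direction=90, rng=rng)
n_moving_in_d = sum(d == 90 for d in frame)  # at least 360; noise dots may also draw 90
```

Because the coherent subset is re-sampled on each frame, a given dot rarely keeps the direction D across consecutive frames, which is the property the paragraph above relies on.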
Finally, a small static disc (diameter: 0.2 v.a.) was centrally displayed over the whole experiment to serve as a fixation point and as a cue at different stages of the experiment based on its color (red, blue, or gray).

FIGURE 1 Experimental paradigm. (a) Trial schematic. Each trial started with the stimulus onset and ended with the stimulus offset. Participants lay supine in the scanner, and the response device was positioned vertically on their midline. The colors of the central fixation point and their relative durations are depicted using thick lines on the time-axis. The fixation point was colored red at stimulus onset with a change to blue following the first button press (i.e., MoveOn signal) and changed back to red after either k = 3 or k = 8 button-presses (i.e., MoveOff signal). (b) The hypothesized timing between the different stimuli is shown. The MoveOff signal was displayed after either three or eight button presses (filled squares). The timing of the first button press (open squares) varies with stimulus coherence (black line). The coherently moving dots are illustrated here with arrows and were not displayed in the experiment. The interval from the first button press to the third/eighth button press was assumed to be unaffected by stimulus coherence.

| Paradigm
A crucial context for the experimental paradigm was the response setup. Response events in the experiment were index finger button presses, which were recorded with an MRI compatible LUMItouch response pad (Photon Control Inc., Burnaby, BC, Canada). The response device was placed on the participant's chest to align the buttons vertically along the participant's midline (Figure 1a). Two buttons were designated as the upper (closer to the participants' head) and lower (closer to their feet) buttons. Participants positioned their hands to press the upper and lower button with their left and right index finger, respectively. The assignment of left/right index fingers to press the upper/lower buttons was counterbalanced across participants, that is, half the participants pressed the upper button with the right hand while the other half used the left hand.
The up/down orientation for both index finger positions and RDK motion directions ensured that the response rather than the stimulus primarily drove any lateralization of brain activity. If the stimuli/responses instead had a left/right orientation (as in Heekeren et al., 2006; de Lange, Rahnev, Donner, & Lau, 2013; Kelly & O'Connell, 2013) then lateralized neural activity linked to both the RDK's motion direction (for example, due to attention orienting) and the corresponding response (i.e., with the left index finger) would be lateralized to the same (i.e., right) hemisphere.
Each trial began with the display of the RDK stimulus with a red-colored fixation point at its center. Participants judged whether the moving dots of the RDK had an upward or downward direction and reported this perceptual decision by pressing the corresponding upper or lower button repeatedly and as rapidly as possible. The first button-press of the response triggered a change in the fixation point's color from red to blue. Following this color change, participants had to monitor the fixation point's color to detect a second color change while repetitively pressing the selected button. The second color change (from blue to red) was the signal to halt the response immediately and marked the end of the trial. For clarity, we henceforth refer to these two color changes as the MoveOn (red to blue) and MoveOff (blue to red) signals. The RDK was continuously displayed over the trial's entire duration.
Unknown to the participants, the interval between the first button-press and the halt-signal (i.e., the blue-to-red color change) was controlled by their behavior. The button presses were counted in real-time as they were produced, and the MoveOff signal was displayed when this real-time count reached a pre-defined target value. This target value was either 3 or 8 button presses corresponding to a "short" and "long" movement duration (denoted as Mov Short and Mov Long ).
Real-time counting was used rather than a pre-defined period so that the number of movements associated with Mov Short and Mov Long was consistent across individuals to equate for interindividual differences in maximum tapping rate (cf., the clinical Finger-Tapping Test: Shimoyama, Ninchoji, & Uemara, 1990). To ensure that the time to complete three button presses was shorter than to complete eight button presses, a missed trial occurred if neither button was pressed within 1,800 ms following stimulus onset or if the required number of button presses was not completed within 1 s (for Mov Short ) or 2.5 s (for Mov Long ). The MoveOff signal was presented while participants were pressing a button rapidly and repeatedly. Consequently, we assumed that instructing a halt would lead to additional button presses before all movements ceased. The target numbers of button presses (3 and 8) were selected so that the relative time differences would be maintained even with a few excess button presses.
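The response-locked logic of a single trial can be illustrated with a short sketch. It is not the experiment's Presentation® code: the function name `score_trial`, the dictionary layout, and the choice to measure the completion limit from the first button press are assumptions, since the text does not state the reference point for the 1 s and 2.5 s limits.

```python
def score_trial(press_times_ms, target_presses, rt_limit_ms=1800, mvt_limit_ms=1000):
    """Score one trial from button-press times given in ms after stimulus onset.

    target_presses is 3 (Mov Short) or 8 (Mov Long);
    mvt_limit_ms is 1000 (Mov Short) or 2500 (Mov Long).
    """
    # Missed trial: no button press within the RT limit.
    if not press_times_ms or press_times_ms[0] > rt_limit_ms:
        return {"status": "missed"}
    rt_ms = press_times_ms[0]  # first press defines the RT and triggers MoveOn
    # Missed trial: the target count was never reached.
    if len(press_times_ms) < target_presses:
        return {"status": "missed"}
    # MoveOff is shown as soon as the running press count reaches the target.
    move_off_ms = press_times_ms[target_presses - 1]
    mvt_ms = move_off_ms - rt_ms
    # Assumption: the completion limit is measured from the first press.
    if mvt_ms > mvt_limit_ms:
        return {"status": "missed"}
    return {"status": "ok", "rt_ms": rt_ms, "mvt_ms": mvt_ms, "move_off_ms": move_off_ms}

# The fourth press in the first example is an excess press made after MoveOff.
short_trial = score_trial([650, 820, 990, 1150], target_presses=3)
long_trial = score_trial([650 + 170 * i for i in range(9)], target_presses=8,
                         mvt_limit_ms=2500)
late_trial = score_trial([1900, 2050, 2200], target_presses=3)
```

With the same first-press time, the two valid example trials share an RT but differ in MvT, which is the contrast the RT × MvT design depends on.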
To prevent the perceptual decision from being prioritized over the response demands, the RDK stimulus was continuously displayed until the MoveOff signal to de-emphasize the distinction between the RT and MvT intervals. This continuous stimulus display also avoided a confound between (a) the neural processes associated with the response onset and (b) processes evoked by the stimulus offset (also see Kelly & O'Connell, 2013). Strictly speaking, the MoveOff signal was redundant information as it coincided with the RDK's disappearance from the screen but this redundant color cueing served to emphasize that the response requirements were a crucial part of the task.

| Procedure and training
Participants attended two sessions on separate days: instruction and training outside the scanner (Session 1), and the main experiment in the scanner (Session 2). Training began with detailed task instructions delivered verbally, following a script. Next, participants were familiarized with the RDK by performing the task on stimuli that steadily increased in difficulty, going from 100% coherence to 20% in stepwise reductions of 20% every 20 trials. This procedure was repeated with additional instruction as needed until an overall accuracy of 90% was reached. After familiarization, a calibration procedure followed to identify three coherence values {Coh High , Coh Med , Coh Low } that would produce mean RTs of 600 ms, 750 ms, and 900 ms, respectively. Briefly, calibration started with an initial coherence estimate for each target RT (<Coherence, RT>: <90%, 600 ms>; <80%, 750 ms>; <70%, 900 ms>). This initial estimate was then iteratively refined based on the participant's performance using the Parametric Estimation by Sequential Testing (PEST) algorithm (Lieberman and Pentland, 1982). The iterative coherence adjustments (step size: maximum = 8%, minimum = 0.5%; maximum trials: 50) continued until each target RT was reached with an accuracy of 90%. The minimum possible coherence value was set to 7% (based on our pilot studies). Calibration for all target RTs was performed concurrently by interleaving trials for each target. For robustness, this calibration procedure was repeated three times, and the mean coherence across these repetitions was used as the final calibrated value. Participants were unaware that these calibrated coherence values would determine the stimuli that they would be exposed to in the main experiment. Finally, participants performed a shortened version of the main experiment to confirm whether the calibrated coherence values produced the expected RT ordering in the actual experimental scenario. 
If the mean RT was lowest for Coh High , highest for Coh Low , and intermediate for Coh Med (with a mean accuracy ≥ 85% per coherence), then calibration was deemed successful, and participants were invited to the main experiment.
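The success criterion for calibration can be written as a small check. This is a sketch under assumptions: the function name `calibration_successful`, the level keys `"high"`, `"med"`, and `"low"`, and the use of strict inequalities for the RT ordering are illustrative choices, and the numbers in the example are made up.

```python
def calibration_successful(mean_rt_ms, accuracy, min_accuracy=0.85):
    """Check the RT ordering and the per-coherence accuracy criterion."""
    levels = ("high", "med", "low")
    # Mean RT should be lowest for Coh High, intermediate for Coh Med, highest for Coh Low.
    rt_ordered = mean_rt_ms["high"] < mean_rt_ms["med"] < mean_rt_ms["low"]
    # Every coherence level must reach the minimum accuracy.
    accurate = all(accuracy[level] >= min_accuracy for level in levels)
    return rt_ordered and accurate

# Hypothetical participant summaries (not data from the study).
passes = calibration_successful(
    {"high": 610, "med": 745, "low": 905},
    {"high": 0.97, "med": 0.93, "low": 0.88},
)
fails_on_order = calibration_successful(
    {"high": 610, "med": 905, "low": 745},
    {"high": 0.97, "med": 0.93, "low": 0.88},
)
```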
In the second session, participants were re-instructed and practiced the task before entering the scanner to perform the main experiment. The assignment of left/right index fingers to press the upper/lower buttons was the same for both sessions. Special precautions were taken to firmly secure participants' elbows in the scanner to minimize head movements while responding. Due to the long duration of the experiment (~96 min), participants received rest periods of several minutes between runs when scanning was halted.
Trials were organized into four runs (~24 min each) in a rapid-event design, and each run started and ended with an 11 s task-free period to minimize transient effects of the starting and ending of the session on task-related BOLD activity. A run was divided into four task-blocks (39 trials each) interleaved with 12 s task-free periods that were indicated by a gray-colored fixation point. Each task-block ended with a feedback screen (3 s) displaying the numerical accuracy (in percent) and the number of missed stimuli on that block. The intertrial interval varied from 4 to 8 s. To de-confound task and scanner timings, the trial onset was jittered relative to the start of a new EPI (drawn uniformly at random from the range [0 ms, 2,200 ms (= 1 TR)], discretized into 100 ms steps). To improve randomization (T. T. Liu, Frank, Wong, & Buxton, 2001; Thomas T. Liu & Frank, 2004), three null trials (duration: 1,800 ms) were included in each block. All trials were displayed in a pseudorandomized order specified by a Maximum Length Sequence (or m-sequence) (Aguirre, 2007; Aguirre, Mattar, & Magis-Weinberg, 2011; Buračas & Boynton, 2002) that ensured a counterbalanced presentation of the 13 trial types (12 conditions + 1 null event).
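The onset jitter and the trial bookkeeping implied by this design can be sketched as below. The constant and function names are hypothetical, the jitter range is assumed to include both endpoints, and the sketch assumes that the 39 trials per block include the three null trials (39 = 13 trial types × 3), which is consistent with but not stated in the text. The m-sequence ordering itself is not reproduced here.

```python
import random

TR_MS = 2200      # 1 TR, as given in the text
STEP_MS = 100     # jitter resolution
RUNS, BLOCKS_PER_RUN, TRIALS_PER_BLOCK = 4, 4, 39
TRIAL_TYPES = 13  # 12 conditions + 1 null event

def draw_onset_jitter_ms(rng):
    """Draw a trial-onset delay relative to the start of an EPI volume."""
    # randrange excludes the stop value, so add one step to include 2,200 ms.
    return rng.randrange(0, TR_MS + STEP_MS, STEP_MS)

rng = random.Random(1)  # fixed seed only to make the example reproducible
jitters = [draw_onset_jitter_ms(rng) for _ in range(TRIALS_PER_BLOCK)]

total_trials = RUNS * BLOCKS_PER_RUN * TRIALS_PER_BLOCK       # trials per participant
repeats_per_type_per_block = TRIALS_PER_BLOCK // TRIAL_TYPES  # presentations of each type
```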

| Trial design
Since the stimulus-response mapping differed between participants (see Paradigm), before statistical modeling, each trial was recategorized based on whether that trial required a response with the left or right index finger (rather than the RDK direction) due to our focus on action-related processes. With this re-categorization, the

| fMRI data acquisition and preprocessing
Functional and structural MR images were acquired on a 3T MR scanner (Siemens Tim Trio, Erlangen, Germany) using a circularly polarized (CP) head coil, as part of a simultaneous fMRI-EEG study.

For preprocessing and statistical analyses, the acquired images were converted from the Siemens DICOM format to the NIFTI format using the dcm2nii utility (version 2013) (Li, Morgan, Ashburner, Smith, & Rorden, 2016). Functional images (EPIs) were spatially realigned to the mean EPI image (using second degree B-spline interpolation) followed by slice-timing correction (relative to the middle slice). The mean EPI was then co-registered to the structural image.
Using SPM12's unified segmentation/normalization algorithm, the structural image was segmented to distinguish white from gray matter and then warped to match a standard Montreal Neurological Institute (MNI) template brain image. The deformation fields estimated from this segmentation/normalization procedure were applied to all EPIs to transform them into standard MNI space (normalization) followed by resampling to a voxel size of 2 mm × 2 mm × 2 mm (fourth degree B-spline interpolation). The normalized EPIs were smoothed with an isotropic 8 mm full-width-at-half-maximum (FWHM) Gaussian kernel.
For statistical analyses involving the explicit estimation of the timing of the hemodynamic response (see below), a concern was that specific preprocessing steps, such as the slice-timing correction and smoothing, could distort inter-voxel timing relationships (e.g., Kamitani & Sawahata, 2010). Therefore, for these time-sensitive analyses, we created a duplicate dataset that was preprocessed as described above but omitted the slice-time correction step and used a smaller smoothing kernel of 6 mm (rather than 8 mm). No unwarping (e.g., Andersson, Hutton, Ashburner, Turner, & Friston, 2001) was performed on either dataset during preprocessing to minimize potential distortions of timing information.
Following preprocessing, datasets from three individuals were excluded from further analysis: one due to technical errors in recording responses; another because of an incomplete scan caused by technical delays; and one for low overall accuracy (~50%). For all remaining participants, fewer than 5% of the ~2,600 images acquired (4 runs × ~650 images/run) were affected by large head movements, defined here as a framewise displacement greater than 0.5 mm between consecutive images (Power, Barnes, Snyder, Schlaggar, & Petersen, 2012).
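For readers unfamiliar with the movement criterion, the following sketch computes framewise displacement in the manner of Power et al. (2012): the sum of absolute frame-to-frame changes in the six realignment parameters, with rotations converted to millimeters of arc on a 50 mm sphere. The function name and the example values are hypothetical, and the sketch assumes rotations are stored in radians (the SPM convention); it is not the authors' script.

```python
def framewise_displacement(params, radius_mm=50.0):
    """Framewise displacement per image from six realignment parameters.

    params: one [tx, ty, tz, rx, ry, rz] list per image, with translations
    in mm and rotations in radians (assumed convention).
    """
    fd = [0.0]  # the first image has no predecessor
    for prev, curr in zip(params, params[1:]):
        translation = sum(abs(c - p) for c, p in zip(curr[:3], prev[:3]))
        # Arc length on a sphere: radius times the change in angle (radians).
        rotation = sum(abs(c - p) * radius_mm for c, p in zip(curr[3:], prev[3:]))
        fd.append(translation + rotation)
    return fd

# Made-up parameters for three consecutive images.
example_params = [
    [0.0, 0.0, 0.0, 0.0, 0.0, 0.0],
    [0.1, 0.0, 0.0, 0.0, 0.0, 0.0],
    [0.1, 0.0, 0.0, 0.02, 0.0, 0.0],
]
fd = framewise_displacement(example_params)
flagged = [value > 0.5 for value in fd]            # the 0.5 mm criterion from the text
fraction_flagged = sum(flagged) / len(flagged)
```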

| fMRI statistical analysis
All statistical analyses were conducted within a mass-univariate framework where the evoked hemodynamic response at each voxel was independently estimated using a general linear model (GLM). Two different basis functions were used for the analysis. As with prior studies, the BOLD effects of the coherence modulation were modeled using the canonical Hemodynamic Response Function (HRF) basis (Friston et al., 1998; Josephs, Turner, & Friston, 1997; Worsley & Friston, 1995) (on the slice-time corrected EPIs). However, to explicitly account for the role of timing, the BOLD effects of the MvT modulation were modeled using the Finite Impulse Response (FIR) basis function (Boynton, Engel, Glover, & Heeger, 1996; Glover, 1999; Goutte, Nielsen, & Hansen, 2000) (on EPIs without slice-time correction). For both basis functions, the trial onset was defined at the stimulus onset.
• Coherence modulation (canonical HRF basis): To assess the effect of coherence modulation on perceptual processes that are invariant to action selection/execution, correct trials with the left and right index-finger responses were each modeled using a single regressor (collapsing across the Mov factors) with a parametric regressor to assess modulation by coherence. For simplicity, parametric modulation was assumed to be linear, and the coherence levels {Coh Low , Coh Med , Coh High } were coded as [+1, 0, −1], respectively (Philiastides & Sajda, 2007). Since the coherence-calibrated RTs had a small range (300 ms), trial duration was set to 0 ms.

Independent of the basis function used, all estimated first-level models shared the following properties. The regressors of interest only modeled correct trials. Additional regressors of non-interest were included to account for incorrect trials and the feedback period. The six head-movement parameters estimated during spatial realignment (i.e., translation and rotation relative to the X, Y, Z axes) were included as covariates to account for head-movement effects. To increase statistical power, images from all runs were concatenated and analyzed as a continuous time-series in the GLM with additional regressors to account for inter-run differences. The time-series at each voxel were high-pass filtered (1/128 Hz) to remove slow trends. First-level model estimation was restricted to voxels contained within a full-brain mask estimated during normalization.
Second-level statistical analyses were restricted to voxels contained within a functionally defined mask of voxels that exhibited task-related activity changes. This task-mask consisted of all voxels where the mean activity on the left and right index finger response trials (estimated using the canonical HRF basis function) was significantly greater than zero (second-level t test, p < .05 FWE, corrected at the voxel level). The boundaries of the task-mask are displayed in the visualization of the second-level analyses in Section 3.
Unless noted otherwise, voxel-level activity across the Left/Right conditions was averaged together based on each voxel's lateralized location relative to the moving finger, that is, contralateral or ipsilateral.
For all statistical maps, contralateral activity was displayed over the left hemisphere.
Contrasts were performed at the individual level (i.e., first-level), and these contrast images (without any additional smoothing) were used for group (i.e., second-level) statistics. Second-level statistics were only assessed at voxels contained within the task mask. Due to our interest in obtaining spatial maps, all second-level statistics reported here were corrected for multiple comparisons at the threshold of p < .05 FWE cluster-corrected with a cluster-forming threshold determined at p < .0001 (uncorrected). For exploratory purposes, the conjunction between activity maps was assessed based on the minimum-T criterion (Nichols, Brett, Andersson, Wager, & Poline, 2005). For clarity, statistical tables only report clusters and peak-coordinates from the contralateral hemisphere.
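The minimum-T criterion used for the exploratory conjunctions can be illustrated with a toy example: a voxel belongs to the conjunction only if the smaller of its two t-values exceeds the threshold. The function name, the three-voxel maps, and the threshold of 4.0 are hypothetical, and the sketch ignores the cluster-level correction applied in the actual analysis.

```python
def min_t_conjunction(t_map_a, t_map_b, t_threshold):
    """Return a boolean conjunction map from two voxel-wise t-maps."""
    # Take the voxel-wise minimum of the two statistics...
    min_t = [min(a, b) for a, b in zip(t_map_a, t_map_b)]
    # ...and keep voxels where even the weaker effect passes the threshold.
    return [t > t_threshold for t in min_t]

# Made-up t-values at three voxels.
t_map_a = [5.1, 2.0, 4.4]
t_map_b = [4.8, 6.0, 1.2]
conjunction = min_t_conjunction(t_map_a, t_map_b, t_threshold=4.0)
```

Only the first voxel survives here, because the second and third voxels each show a strong effect in just one of the two maps.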
Statistical maps were rendered on the Conte69 cortical surface  the canonical HRF produces a predicted hemodynamic response with an amplitude that peaks at ~6.6 s following stimulus onset. By treating 6 s as the approximate mid-point of task-relevant hemodynamic changes on a given trial, we restricted the subsequent analysis to a 2 × 6 s = 12 s period following stimulus-onset.

| FIR time-series analysis
These voxel × time beta-value matrices from each condition were used to analyze the (a) hemodynamic states at each time-point (i.e., columns of the matrix), and (b) voxel-specific dynamics (i.e., rows of the matrix).

| Hemodynamic state and Euclidean distance
Each column of the voxel × time matrices described above was used to obtain a snapshot of the instantaneous activity across the brain (i.e., the hemodynamic state) at each time-point. Specifically, the instantaneous hemodynamic state at time t (denoted as $S_t$) for each condition was defined as a vector of beta-values

$$S_t = [\beta_{1,t},\ \beta_{2,t},\ \dots,\ \beta_{N,t}]$$

where $\beta_{v,t}$ denotes the beta-value at voxel v at time t and N is the number of voxels (Figure 2a). Each of the 12 conditions was associated with nine states corresponding to each of the T time points. Therefore, intercondition differences should manifest as a difference between the corresponding hemodynamic states at one or more time-points independent of how the task-specific networks are organized.
To quantify inter-state differences, we formulated the hemodynamic states and their relationships in geometric terms (Figure 2b).
The hemodynamic state is treated as a point in a multidimensional space where each voxel defines one dimension, similar to the typical formulation used in Multivariate/voxel Pattern Analysis (MVPA) (Haxby, Connolly, & Guntupalli, 2014; Norman, Polyn, Detre, & Haxby, 2006). The Euclidean distance between two states was used as a measure of the difference in activity (as indexed by the beta-values) between those states.
The overall activity level of a particular state $S_t$ in condition C was defined as its Euclidean distance to the starting state $S_0$ of that same condition, namely,

$$d(S_t^{C}) = \lVert S_t^{C} - S_0^{C} \rVert = \sqrt{\sum_{v} \left(\beta_{v,t}^{C} - \beta_{v,0}^{C}\right)^2}$$

The relative difference in activity (i.e., Euclidean distance) between a hemodynamic state in condition Cx ($S_t^{Cx}$) and the corresponding state in condition Cy at the same time t ($S_t^{Cy}$) was computed as

$$\Delta C_t = \lVert S_t^{Cx} - S_t^{Cy} \rVert = \sqrt{\sum_{v} \left(\beta_{v,t}^{Cx} - \beta_{v,t}^{Cy}\right)^2}$$

As illustrated in Figure 2b, the two states $S_t^{Cx}$ and $S_t^{Cy}$ are different from each other in relative activity (i.e., $\Delta C_t > 0$) as voxel j has a relatively higher activity than voxel i for condition Cx while the opposite is true for Cy. Nevertheless, both states can have the same Euclidean distance to the start state $S_0$ (i.e., $d(S_t^{Cx}) = d(S_t^{Cy})$).
The Euclidean distance between any two states, whether within or between conditions, was always a single scalar value. All measures were averaged across responding hand.
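The two distance measures can be made concrete with a two-voxel toy example that mirrors the situation described for Figure 2b. The function names and beta-values are hypothetical; each matrix is a list of voxel rows with one column per time-point, and `math.dist` requires Python 3.8 or later.

```python
import math

def state(beta, t):
    """Hemodynamic state at time index t: column t of a voxel x time matrix."""
    return [row[t] for row in beta]

def activity_level(beta, t):
    """Euclidean distance of the state at time t from the starting state (t = 0)."""
    return math.dist(state(beta, t), state(beta, 0))

def condition_difference(beta_x, beta_y, t):
    """Euclidean distance between two conditions' states at the same time t."""
    return math.dist(state(beta_x, t), state(beta_y, t))

# Rows are voxels i and j; columns are time-points 0 and 1 (made-up values).
beta_cx = [[0.0, 3.0],   # voxel i
           [0.0, 4.0]]   # voxel j is more active than voxel i in condition Cx
beta_cy = [[0.0, 4.0],   # the pattern is reversed in condition Cy
           [0.0, 3.0]]

level_cx = activity_level(beta_cx, 1)
level_cy = activity_level(beta_cy, 1)
delta = condition_difference(beta_cx, beta_cy, 1)
```

In this example the two conditions have identical overall activity levels relative to the starting state, yet the distance between their states is greater than zero, so the difference is visible only in the multivariate comparison.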

| Peak amplitude and peak time
The hemodynamic changes at each voxel were summarized using two parameters related to the activity peak, namely, the peak amplitude and the peak time as described below.
The voxel × time beta-value matrix for each condition (described above) provided access to the beta time-series at each voxel v for each condition C, that is, the values $\beta_{v,t}$ across all T time-points. We qualitatively verified that the beta time-series at most voxels showed a single-peaked shape across conditions and participants (as in Figure 2a). This time-series was summarized with two parameters: the peak amplitude (i.e., the maximum beta value of the time series), and the peak time (i.e., the earliest time-point at which the peak amplitude is reached).
For each condition, the peak amplitude (denoted as $\beta_v$) and peak time at each voxel (denoted as $t_v$) were written to the corresponding voxel location of an empty image having the same dimensions as the beta images and then smoothed (6 mm FWHM). These summary images were used to calculate intercondition contrasts in peak amplitudes and peak times per individual, and these contrast images were statistically evaluated at the group level.

| Normalized difference in peak amplitude and peak time
To assess the relative magnitude of intercondition differences in peak amplitude/time across different voxels, these differences were normalized similar to the coefficient of variation (i.e., [SD]/mean). The normalized difference of the peak amplitude and peak time differences between conditions Cx and Cy at voxel v (i.e., $\beta_v^{Cx} - \beta_v^{Cy}$ and $t_v^{Cx} - t_v^{Cy}$) was defined as follows and expressed as a percentage

These values were computed at the first level using the peak amplitude and peak time summary images described above.

These behavioral differences were a pre-condition to evaluate the readout of the corresponding neural dynamics in the BOLD signal.
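A minimal sketch of the two peak parameters for a single voxel is given below. The function name and the beta-values are hypothetical, and the 1.5 s spacing between time-points is an assumption inferred from nine time-points spanning the 12 s analysis window.

```python
def peak_summary(beta_series, bin_width_s=1.5):
    """Return (peak amplitude, peak time in s) for one voxel's beta time-series."""
    peak_amplitude = max(beta_series)
    # list.index returns the first match, i.e., the earliest time-point
    # at which the peak amplitude is reached.
    peak_index = beta_series.index(peak_amplitude)
    return peak_amplitude, peak_index * bin_width_s

# Made-up single-peaked series with a tie at the maximum.
series = [0.0, 0.2, 0.9, 0.9, 0.4, 0.1, 0.0, 0.0, 0.0]
amplitude, time_s = peak_summary(series)
```

The tie in the example shows why the definition specifies the earliest time-point: without that rule the peak time would be ambiguous whenever two time-points share the maximum.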

| Response time and movement time have dissociable hemodynamic effects
To read out response-locked neural activity in the BOLD activity, the independent modulation of RT and MvT at the neural level should have dissociable effects on the hemodynamic responses. In our paradigm, the RDK coherence modulated the RT (i.e., the time from the stimulus to the first button press) before the MoveOff stimulus could influence the MvT (i.e., the time between the first and last button press). These sequential effects predicted an asymmetry relative to the stimulus onset. The timing of the neural activity differences between the conditions <Coh Low , Mov x > and <Coh High , Mov x > should be unaffected by whether Mov x was equal to Mov Short or Mov Long .
However, the neural differences between the conditions <Coh x , Mov Long > and <Coh x , Mov Short > should occur later when Coh x was equal to Coh Low (i.e., high RT) rather than Coh High (i.e., low RT). We tested whether this predicted asymmetry was detectable in the hemodynamic states constructed from the beta-value time-series estimated with the FIR basis function (see Figure 2, Section 2).  relationships were only evident in multivariate inter-voxel comparisons and not by measures of overall activity relative to S 0 (see Figure 2). We next turned to map the spatial locations of these response-locked differences.

| Mapping response-locked activity reveals functional distinctions
The readout of the response-locked neural differences between <Coh x , Mov Short > and <Coh x , Mov Long > in the stimulus-locked BOLD signal could take different forms across the brain. For example, the mean BOLD activity for <Coh Low , Mov Long > was significantly greater than zero over a longer period (until 7.5 s) than for <Coh Low , Mov Short > (until 6 s) at several voxels ( Figure 4a). Nevertheless, the activity profiles at these voxels could differ from each other, depending on their functional origin (Ploran et al., 2007). In our paradigm, a basic functional difference between <Coh x , Mov Short > and <Coh x , Mov Long > was related to movement production (i.e., 3 versus 8 button-presses). Therefore, we used the activity profile at the primary motor cortex (M1) as a template to map the potential functional origin of the response-locked differences.
Across <Coh, Mov> conditions, the peak activity at M1 contralateral to the responding hand (Figure 4b) for Mov Long (8 button presses) was larger and occurred later (relative to stimulus onset) than for Mov Short (3 button presses). This peak-based pattern of differences was used as a functional template.

Figure 4 (partial caption): Across coherence values, the peak amplitude for Mov Long was higher than for Mov Short and was also reached at a later time. (c) Map of voxels over the contralateral hemisphere where peak activity for Mov Long was significantly greater than for Mov Short in peak amplitude (left column) and in peak time (right column) (p < .05, FWE-cluster corrected, cluster-forming threshold for peak amplitude = 52 voxels, for peak time = 41 voxels). Colors indicate normalized differences in peak amplitude/time at these voxels. (d) Relationship between maps of movement-related peak amplitude (green) and peak time (magenta) differences. The maps show considerable overlap (yellow) [minimum cluster size = 41 voxels] but also considerable nonoverlapping regions.
Activity at a voxel was categorized as being "like M1" (i.e., potentially having a movement-related origin) if both the peak amplitude and the peak time for <Coh x , Mov Long > were significantly greater than for <Coh x , Mov Short > (averaged across Coh levels). Voxels that satisfied only the peak amplitude criterion or only the peak time criterion were categorized as being "unlike M1" (i.e., having a nonmovement origin). Due to the low activity at ipsilateral M1 (Figure 4b), which was expected given the relative simplicity of the required unilateral responses, we restricted our subsequent analyses to the contralateral hemisphere.
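The categorization rule above amounts to a logical combination of two binary significance maps. The following minimal sketch illustrates that logic only. The function name, the label strings, and the toy masks are hypothetical, and the group-level statistics that produce the masks are assumed to have been computed beforehand.

```python
import numpy as np

def categorize_voxels(amp_sig, time_sig):
    """Label each voxel from two binary significance maps.

    amp_sig:  True where peak amplitude (Mov Long > Mov Short) is significant.
    time_sig: True where peak time (Mov Long > Mov Short) is significant.
    Returns an array of labels (names are illustrative):
      "like_M1"        both criteria satisfied
      "amplitude_only" unlike M1, only the amplitude criterion satisfied
      "time_only"      unlike M1, only the time criterion satisfied
      "none"           neither criterion satisfied
    """
    amp_sig = np.asarray(amp_sig, dtype=bool)
    time_sig = np.asarray(time_sig, dtype=bool)
    labels = np.full(amp_sig.shape, "none", dtype="U16")
    labels[amp_sig & time_sig] = "like_M1"
    labels[amp_sig & ~time_sig] = "amplitude_only"
    labels[~amp_sig & time_sig] = "time_only"
    return labels

# Toy example with four voxels covering every combination.
amp_mask = np.array([True, True, False, False])
time_mask = np.array([True, False, True, False])
labels = categorize_voxels(amp_mask, time_mask)
```

Keeping the two "unlike M1" cases as separate labels preserves the distinction between voxels that differ in amplitude only and voxels that differ in timing only.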
The peak amplitude criterion and the peak time criterion were each satisfied over an extensive set of regions (Figure 4c, Tables S1 and S2). More voxels satisfied the peak amplitude criterion (Figure 4c, left panel) than the peak time criterion (Figure 4c, right panel). Furthermore, the maximum normalized difference in peak amplitude (~15%) was more than twice the corresponding maximum for peak time (~6%). Large normalized differences in peak amplitude and in peak time were concentrated over M1, S1, and the occipital cortex.
The (binary) maps specific to the peak amplitude and the peak time criteria were overlaid to identify voxels based on the like/unlike-M1 categorization rule (Figure 4d, Table 1). The overlap of the peak amplitude and peak time maps (yellow areas) included M1, S1, and S2 with additional overlaps at the occipital cortex. Notably, several regions were unlike M1. Only the peak amplitude criterion (green areas) was satisfied at caudal SMA, anterior ventral premotor cortex (prePMV) in the vicinity of area 44/45, posterior insula, and cingulate gyrus. Despite the poor time resolution of the hemodynamic signal, regions that met the peak time criterion (magenta areas) were not strictly a subset of regions satisfying the peak amplitude criterion.
There were large clusters over preSMA where only the peak time criterion was satisfied.
In summary, irrespective of functional origin, the activity differences revealed by the map in Figure 4d meet the definition of being response-locked, namely, they followed the first button press.
Applying a "like/unlike-M1" functional template to these response-locked differences revealed that several regions had an activity profile like M1 (as expected), but several regions had an activity profile that was unlike M1 in different ways. To clarify these functional distinctions revealed by response-locking, we next compared it to a conventional stimulus-locked view of these data.

| Stimulus-locked activity: Coherence-modulated networks
Coherence-modulated activity relative to the stimulus onset was estimated using the canonical HRF basis function. When averaged across Mov levels and responding hand, changes in stimulus coherence modulated activity negatively in some regions and positively in others (Figure 5a upper, Tables 2 and 3). Since stimulus identification precedes action selection, the modulation is displayed without flipping the data relative to the responding hand. Parametric decreases in coherence (i.e., increases in RT) were associated with a reduction in mean activity (i.e., negative modulation) at the left dorsolateral prefrontal cortex (dLPFC), bilateral angular/supramarginal gyrus, precuneus, and the thalamus. However, coherence decreases were  (Table 4).
Surprisingly, the stimulus-locked map showed an overlap with both response-locked maps only at the occipital cortex. No other region with a response-locked activity profile that was "like-M1" (e.g., M1, S1, and S2) had an overlap with the stimulus-locked maps.
However, the overlaps of the stimulus-locked maps with the

| DISCUSSION
Our strategy here was to use the response protocol to read out response-locked neural effects from the BOLD signal. Multiple lines of evidence suggest that the response protocol achieved this goal. First, we found evidence consistent with an asymmetric sequential relationship between RT and MvT on the hemodynamic state (Figure 3a,b). Second, there was a robust difference in the peak amplitude and peak timing of BOLD activity at the primary motor cortex and somatosensory cortex, consistent with the response-locked differences between Mov Short and Mov Long in the duration and number of executed movements (Figure 4). There were also strong differences over the occipital cortex consistent with the differences in stimulus display duration between Mov Short and Mov Long .
The response protocol itself did not disrupt the stimulus-related properties of the perceptual decision task. First, the distribution of positive/negative activity modulation with coherence was consistent with prior studies (H R Heekeren et al., 2006;Ho, Brown, & Serences, 2009;Krueger et al., 2017;T. Liu & Pleskac, 2011;Nee & D'Esposito, 2016;Wheeler et al., 2015), especially the study of Heekeren, Marrett, Bandettini, and Ungerleider (2004) that emphasized the role of the dLPFC (Figure 5a). Second, a fundamental property worth emphasizing is that the response protocol enabled the measurement of a behavioral RT in the same manner as a traditional response protocol involving a single button press. For instance, in prior studies, a common strategy to isolate pre-movement information-processing stages from action execution was to use separate stimuli to provide information about stimulus identity and to command action execution (Filimon et al., 2013;Gold & Shadlen, 2007;Hebart, Donner, & Haynes, 2012;Schouten & Bekker, 1967). Although this "forced" RT strategy (Schouten & Bekker, 1967) provides greater experimental control over the timing of an action, this approach also disrupts the dynamic transformation of the stimulus into an action that response-locking seeks to access. The availability of behavioral RT with our response protocol provides a common ground truth to integrate findings from fMRI with findings from matched paradigms using high time-resolution techniques such as EEG and MEG.
We next turn to the question of whether this response protocol revealed any new information about the underlying cognitive computations that might otherwise not be available from a stimulus-locked perspective.

| Functional dissociations in perceptual decision-making
In our paradigm, increasing perceptual difficulty (with decreasing coherence) led to increases in stimulus-locked activity over the lateral/medial prefrontal cortex (including the anterior cingulate cortex, preSMA, and prePMV) and the anterior insula (Figure 5a). Although several studies have reported this modulation, the underlying computational function of these regions remains elusive (Filimon et al., 2013;Gold & Shadlen, 2007;Hebart et al., 2012;Ploran et al., 2007;Thura & Cisek, 2014). For instance, it is unclear whether they are associated with distinct functions. The response-locked perspective reveals a dissociation between these regions (Figure 5b).

It might be that the peak time modulation was detected at fewer voxels than the peak amplitude modulation merely due to the low sampling frequency of fMRI (here, TR = 2,200 ms). If this were the case, then the peak time map would be a strict subset of the peak amplitude map. However, this prediction was violated most notably at the preSMA, where there was a peak time difference but no measurable difference in peak amplitude.
The properties of the paradigm provide certain constraints for interpreting this putative functional dissociation. When the response is treated as the reference event, it is not solely an output (relative to

| The response-locked view of perceptual decisions
If a region has a function that operates continuously from the first button press until the end of the trial for both Mov Short and Mov Long , then the integrative form of the hemodynamic response function (i.e., a convolution of the HRF with the activation duration) would predict that the BOLD signal has a higher peak amplitude and peak time  (Gratton et al., 2017;Ploran et al., 2007).
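The prediction from the integrative form of the hemodynamic response can be illustrated with a short simulation. This is a sketch under stated assumptions rather than the model used in the analysis: the single-gamma HRF, its parameters, the 1 s and 3 s activation durations, and the function names are all hypothetical choices made for illustration.

```python
import numpy as np

def gamma_hrf(t, shape=6.0, scale=1.0):
    """Illustrative single-gamma HRF (assumed form), normalized to unit sum."""
    t = np.asarray(t, dtype=float)
    h = np.zeros_like(t)
    pos = t > 0
    h[pos] = t[pos] ** (shape - 1) * np.exp(-t[pos] / scale)
    return h / h.sum()

def predicted_bold(duration, dt=0.1, total=30.0):
    """Convolve a boxcar of sustained activation with the HRF.

    duration: length in seconds of the sustained neural activation.
    Returns the time axis and the predicted BOLD time course.
    """
    t = np.arange(0.0, total, dt)
    neural = (t < duration).astype(float)            # boxcar activation
    bold = np.convolve(neural, gamma_hrf(t))[: t.size]
    return t, bold

# Hypothetical activation durations standing in for Mov Short and Mov Long.
t, bold_short = predicted_bold(1.0)
_, bold_long = predicted_bold(3.0)

peak_short = (bold_short.max(), t[bold_short.argmax()])  # (amplitude, time)
peak_long = (bold_long.max(), t[bold_long.argmax()])
```

In this simulation, the longer activation yields both a larger peak amplitude and a later peak time, which is the joint pattern that a continuously operating function would be expected to produce.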
However, the planning of action can occur at many levels of abstraction.
One interpretation of movement preparation is that it involves subthreshold activation of motor representations that are then activated during movement execution, that is, "raise-to-threshold" preparation (Erlhagen & Schöner, 2002;Thura, Beauregard-Racine, Fradet, & Cisek, 2012;Thura & Cisek, 2014). Such representations might be expected to exhibit activity consistent with the "like M1" template. Contrary to this latter possibility, we found no evidence that motor representations of the kind involved in executing an action were active and modulated by RT or stimulus coherence during perceptual decision-making.
Even though none of the coherence-modulated regions seemed to have a movement-related origin, the activity profile at preSMA was consistent with a process triggered by the MoveOff stimulus, namely, a marker of the end of the trial (see above). This result is consistent with the preSMA's previously reported role in context monitoring, sequencing, and task control (Hikosaka & Isoda, 2010;Isoda & Hikosaka, 2007;Nachev, Kennard, & Husain, 2008;Shima & Tanji, 2011) and suggests that the preSMA might have a similar function in the transformation of a stimulus to a response. The preSMA might monitor the start of the response or, possibly, the switch from a perceptually oriented sub-task (namely, decoding the stimulus) to the demands of action execution.

| The stimulus-locked view of perceptual decisions
In general, to calculate the RT, a "response" marks the earliest time following the stimulus at which the execution of a task-relevant action can be detected (e.g., a button press). However, this instantaneous binary event is an impoverished description of the actual movement, which takes place over an extended time involving changes in multiple effector-specific kinematic variables (e.g., velocity, acceleration, forces) (Shenoy, Sahani, & Churchland, 2013;Viswanathan et al., 2019).  (Ratcliff & Rouder, 1998).
In the current paradigm, there are two buttons, and each button is associated with a different effector, that is, the left and the right index finger. Therefore, the first press of a button fully specifies the choice, and further presses of that same button neither provide added information about the choice nor change it. Many of the processes that are functionally involved in reaching the decision (i.e., those that cause the RT to vary with stimulus coherence) or in the immediate consequences of choosing might no longer have a function following the onset of the response.
The anterior insula and the dorsal anterior cingulate cortex have been found to be co-activated in numerous paradigms and are referred to as being part of a salience network (Medford & Critchley, 2010;V. Menon & Uddin, 2010). The salience network has been hypothesized to be engaged in cognitive control among a variety of functions. Even though the repeated and rapid movements required by the protocol require cognitive control and appropriate resources, the absence of a post-response modulation of the salience network may suggest that this network has a role in monitoring the choice, with a limited role after the choice leads to the initiation of the corresponding action.
Along similar lines, the prePMV may indicate the delayed offset of neural activity from the perceptual decision. We hypothesize that prePMV might indicate the continued integration of the decision-evidence, even though an action has been selected (Rabbitt & Vyas, 1981;Resulaj et al., 2009;Selen, Shadlen, & Wolpert, 2012).
The integration continues longer than Mov Short but less than Mov Long , so time differences are less evident even though small amplitude differences are present.

| An integrated view
When studying the transformation of a stimulus into action, a traditional representation of the information processing stages is with a series of boxes and arrows, where the arrows represent the ordinal flow of information while the boxes represent hypothetical processors of this information. For instance, a classic model for how perceptual decisions inform action selection is via a processing sequence represented with boxes/arrows: sensory processing, perceptual identification, response selection, and response execution (Figure 6a). These processors perform a transformation, for example, of a stimulus' sensory representation into its identity, or the stimulus identity into an action choice. This is especially crucial for arbitrary stimulus-response mappings where there are two instructed decisions: a decision about the identity of the stimulus (based on an instruction specifying the stimulus features to be evaluated) and a decision about the action to select based on an instructed mapping of stimulus identity to a corresponding response. However, this box-arrow representation is deceptive about the relative timing of the onset/offset of this processing activity.
When we consider time as a cardinal variable, the well-ordered transition between consecutive stages is difficult to identify due to the unclear onset/offset of activity in brain regions and their roles within these networks. Here we propose an adjustment that allows this box-arrow representation to be interpreted relative to cardinal time (for a similar formulation obtained by integrating fMRI-EEG acquired simultaneously see [Muraskin et al., 2018]). In Figure 6b, the sequence of processing stages remains. However, an essential modification is to the onset and offset of the processing-related stages.
Information might pass from stage X to the next stage Y at a specific time (indicated by the arrows), but the hypothetical "processor" that implements stage X might nevertheless continue to be active (i.e., the length of a box along the time dimension). As a consequence, the neural implementations of certain processing stages might seemingly appear over extended periods. Two regions A and B that are co-active at a particular time might belong to different functional networks that operate at different times, but the delayed offset of region A and the early onset of region B might lead to a misattribution that they are involved in performing the same computation. Therefore, a perspective that integrates the stimulus-locked and response-locked perspectives is crucial to disentangle not only the onset of a cognitive subprocess but also its offset.

| CONCLUSION
Despite the comparatively low time-resolution of fMRI, our findings demonstrate that the choice of reference point has a measurable influence on the sensitivity of measuring timing information with fMRI, much as it does in high time-resolution modalities. The novel response protocol introduced here to implement response-locking involves a minimal change to the typical experimental paradigms that present a stimulus and collect a button press. As demonstrated with a well-studied perceptual decision-making task, the increased sensitivity afforded by this small change in response protocol suggests a promising avenue for future research and application to other tasks.

This work was supported by the University of Cologne Emerging
Groups Initiative (CONNECT group) implemented into the Institu-

CONFLICT OF INTEREST
The authors declare no conflicts of interest.

DATA AVAILABILITY STATEMENT
The data that support the findings of this study are available on request from the corresponding author. The data are not publicly available due to privacy or ethical restrictions.