Dynamic Simulation and Static Matching for Action Prediction: Evidence From Body Part Priming

Authors


Correspondence should be sent to Anne Springer, University of Potsdam, Department of Sport and Exercise Psychology, Am Neuen Palais 10, D-14469 Potsdam, Germany. E-mail: anne.springer@uni-potsdam.de

Abstract

Accurately predicting other people's actions may involve two processes: internal real-time simulation (dynamic updating) and matching of recently perceived action images (static matching). Using body part priming, this study aimed to differentiate the two processes. Specifically, participants played a motion-controlled video game with either their arms or their legs. They then observed arm movements of a point-light actor, which were briefly occluded from view and followed by a static test pose. Participants judged whether this test pose depicted a coherent continuation of the previously seen action (i.e., an “action prediction task”). Evidence of dynamic updating was obtained after compatible effector priming (i.e., arms), whereas incompatible effector priming (i.e., legs) yielded evidence of static matching. Together, the results suggest that action prediction engages two distinct processes, dynamic simulation and static matching, and that their relative contributions depend on contextual factors such as the compatibility of the body parts involved in the performed and the observed action.

1. Introduction

Humans seem to run internal simulation processes when they predict the future course of other people's actions (Blakemore & Frith, 2005; Kilner, Marchant, & Frith, 2009; Wilson & Knoblich, 2005). A growing body of research suggests that these internal simulations run in real time (Graf et al., 2007; Parkinson, Springer, & Prinz, 2012; Sparenberg, Springer, & Prinz, 2012) and depend on motor processes (Flanagan & Johansson, 2003; Rotman, Troje, Johansson, & Flanagan, 2006; Springer et al., 2011). For instance, motor regions of the brain are involved when observers must predict the future course of observed actions (Kilner, Vargas, Duval, Blakemore, & Sirigu, 2004; Stadler et al., 2011), and eye movements during action observation are predictive rather than reactive (Flanagan & Johansson, 2003). This corresponds to the notion that sensorimotor simulations enable observers to extrapolate future actions (Grush, 2004; Thornton & Knoblich, 2006). Specifically, efficient visuomotor control requires an estimate of one's own body state prior to movement execution, which is based on internal forward models. These forward models allow the sensory consequences of one's own movements to be anticipated in real time on the basis of motor commands (i.e., efference copies; Wolpert & Flanagan, 2001). They may also be applied to predict actions observed in others (Blakemore & Frith, 2005; Kilner, Friston, & Frith, 2007; Prinz, 2006; Wolpert & Flanagan, 2001).

In accordance with this view, recent neuroscientific evidence revealed increased premotor activation when observers predicted the course of actions that were briefly occluded from view (Stadler et al., 2011), supporting the broader notion that the motor system is used to simulate and predict others' actions (Jeannerod, 2001; see Schubotz, 2007, for a review). Interestingly, when the same observers were instructed to memorize the last action pose seen before an occlusion, right pre-SMA activation was found as a correlate of maintaining internal action representations (Stadler et al., 2011). Together, these findings suggest that action prediction may involve both dynamic simulations and evaluations of internally stored action images (“static matching”). Furthermore, behavioral data suggest that dynamic simulation and matching of statically maintained action representations may run continuously and in parallel during action prediction (Springer & Prinz, 2010). However, although both dynamic and static processes may be crucial for action prediction, they have not yet been investigated directly.

This study aimed to test whether predicting others' actions involves dynamic processes (i.e., dynamic simulation) and static processes (i.e., static matching) as two alternative processes, which may be called into play depending on the context in which action prediction occurs. To this end, we used body part priming: participants controlled a virtual ball with either their arms or their legs, yielding compatible and incompatible body part priming, respectively, relative to the arm movements observed in a subsequent action prediction task.

Observing actions performed with different effectors (e.g., mouth, hand, and foot) has been shown to activate the premotor cortex in a somatotopic manner, that is, in a way that reflects the body parts being observed (Buccino et al., 2001; Sakreida, Schubotz, Wolfensteller, & von Cramon, 2005). Moreover, action priming effects occur in an effector-specific way (Heyes & Leighton, 2007). Action recognition improves when observers move the same effector as the one involved in an observed action (Reed & Farah, 1995), and hand and foot movements are selectively primed by observing the corresponding effector in motion (Gillmeister, Catmur, Liepelt, Brass, & Heyes, 2008; cf. Wiggett, Hudson, Tipper, & Downing, 2011). This corresponds to the fundamental notion that observed and performed actions share common representational codes (Hommel, Müsseler, Aschersleben, & Prinz, 2001; Prinz, 1997), which may provide a basis for anticipating the actions of others by mapping those actions onto one's own motor repertoire (Gallese, 2005; Jeannerod, 2001).

On the basis of the notion of internal sensorimotor activation (simulations) during the prediction of actions observed in others (Blakemore & Frith, 2005; Graf et al., 2007; Kilner et al., 2009; Wilson & Knoblich, 2005) and based on the notion of effector-specific mappings between action observation and execution (Buccino et al., 2001; Gillmeister et al., 2008; Heyes & Leighton, 2007; Reed & Farah, 1995; Sakreida et al., 2005), we predicted that priming of body parts would affect the degree to which dynamic simulation and static matching become evident during action prediction, allowing dissociation between the two proposed processes (as explained in more detail below).

To measure dynamic simulation and static matching, we adopted an action prediction task developed by Graf et al. (2007). Our participants watched sequences of point-light (PL) arm movements that were briefly occluded and then continued by the presentation of a static test pose. This test pose was either in the same orientation as the previously seen action or slightly rotated in depth. Participants judged whether it showed a spatially coherent continuation of the previous action (i.e., same or different depth angle; Graf et al., 2007). In advance of completing this task, participants played a motion-controlled computer game using either their arms or their legs, allowing us to realize compatible versus incompatible body part priming.

The hypothesis of real-time simulation (Graf et al., 2007) holds that the last action segment seen before occlusion is internally updated in real time during the occlusion period. To measure such a real-time simulation process, two factors were manipulated independently (Fig. 1). First, the “occluder time” was varied, that is, the time interval for which an action sequence was occluded from view (100, 400, or 700 ms). The occluder time was thus the time gap between the presentation of the last visible action pose (prior to occlusion) and the presentation of the test pose (shown after occlusion), yielding an interstimulus interval of 100, 400, or 700 ms. Second, and independently of the occluder time, the “pose time” was varied, that is, the time by which the test pose was advanced relative to the last visible action pose (again 100, 400, or 700 ms). Thus, the pose time corresponded to the progression of the action from the last pose seen prior to occlusion to the test pose presented after occlusion. This progression was defined by the number of frames that would have passed in real time between these two poses, irrespective of the duration of the actual occlusion period. Therefore, an increase in the pose time implies a decrease in the similarity between the last visible frame and the test frame.
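The crossing of the two factors can be made concrete with a short Python sketch. It enumerates the nine occluder-time/pose-time cells and converts each pose time into the number of frames by which the test pose is advanced, assuming the 30 Hz rendering rate reported in Section 2.2; the function names are hypothetical and not taken from the original stimulus software.

```python
from itertools import product

OCCLUDER_TIMES_MS = (100, 400, 700)  # duration of the blank (occluded) interval
POSE_TIMES_MS = (100, 400, 700)      # real-time action progress shown by the test pose
RENDER_RATE_HZ = 30                  # rendering rate reported in Section 2.2 (assumed here)


def frames_advanced(pose_time_ms: int, rate_hz: int = RENDER_RATE_HZ) -> int:
    """Number of rendered frames the action progresses during pose_time_ms."""
    frames = pose_time_ms * rate_hz / 1000
    if not frames.is_integer():
        raise ValueError("pose time is not a whole number of frames")
    return int(frames)


def design_cells():
    """All OT/PT combinations; occluder time and pose time are crossed independently."""
    return [
        {"ot": ot, "pt": pt, "frames_skipped": frames_advanced(pt)}
        for ot, pt in product(OCCLUDER_TIMES_MS, POSE_TIMES_MS)
    ]


for cell in design_cells():
    print(cell)
```

Under this assumption, pose times of 100, 400, and 700 ms correspond to 3, 12, and 21 rendered frames, while the occluder time only determines how long the blank interval lasts.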

Figure 1.

Each trial started with a fixation point (not depicted), followed by a film displaying an arm movement of a point-light figure, an occluder, and a static test pose. The test pose showed a continuation of the movement in the same or a different depth angle. The duration of occlusion (occluder time) and the temporal advance of the test pose relative to occlusion onset (pose time) varied independently (the figure is taken from Springer et al., 2011, p. 28).

According to the real-time hypothesis, performance will be best when the occluder time (OT) and the pose time (PT) correspond to each other, because the internal action representation (updated in real time) should then match the actual test pose. In addition, performance should decrease as the temporal distance between the two factors increases: the larger this difference, the more difficult it should be to assess whether the test pose shows a coherent continuation of the action. Thus, real-time simulation predicts an interaction of OT and PT that emerges as a monotonic distance function (Graf et al., 2007): Performance should be best when the temporal distance between OT and PT is zero, intermediate at a distance of ±300 ms (OT/PT = 100/400, 400/700, 400/100, and 700/400), and lowest at a distance of ±600 ms (100/700 and 700/100; cf. Fig. 1). Hence, the real-time hypothesis implies a monotonic but not necessarily a linear increase in errors (i.e., an increase in the time distance between OT and PT is not predicted to yield a proportional increase in errors). Note that we use the term “real-time simulation” (which specifies the timing of an assumed internal simulation process) synonymously with “dynamic simulation” and “dynamic updating.”
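The grouping of cells into distance levels can be checked mechanically. The Python sketch below (function names hypothetical) enumerates all nine OT/PT combinations, computes the signed distance as PT − OT (the sign convention used in Section 3), and groups the cells by absolute distance; real-time simulation predicts comparable performance within a group and worse performance for larger distances.

```python
from collections import defaultdict
from itertools import product

TIMES_MS = (100, 400, 700)


def signed_distance(ot: int, pt: int) -> int:
    """Pose time minus occluder time, e.g. OT 400 / PT 700 -> +300 ms."""
    return pt - ot


def cells_by_distance(times=TIMES_MS):
    """Group (OT, PT) cells by the absolute distance between the two factors."""
    groups = defaultdict(list)
    for ot, pt in product(times, times):
        groups[abs(signed_distance(ot, pt))].append((ot, pt))
    return dict(groups)


for distance, cells in sorted(cells_by_distance().items()):
    print(distance, cells)
```

The 0 ms level contains three cells, the 300 ms level four, and the 600 ms level two, which is why error rates are averaged within levels rather than summed.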

In addition to real-time simulation, action prediction may involve a matching process: the test pose shown after occlusion may be matched against a statically maintained representation of the last action pose perceived prior to occlusion (static matching; cf. Stadler et al., 2011). If static matching takes place, performance should decrease with increasing pose time (100, 400, or 700 ms) irrespective of the actual duration of the occlusion period, because an increase in pose time implies, by definition, a decrease in the similarity between the last visible action pose (shown before occlusion) and the test pose (shown after occlusion). Thus, static matching in its pure form predicts a main effect of pose time but no interaction of occluder time and pose time (and therefore no monotonic distance function), whereas real-time simulation in its pure form predicts a strong interaction (emerging as a monotonic distance function) but no main effect of pose time.
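The difference between the two pure accounts can be illustrated with a toy model. The Python sketch below assumes, purely for illustration, that error rates are a linear function either of pose time alone (static matching) or of |PT − OT| (real-time simulation), with arbitrary parameter values, and computes the OT × PT interaction component of each predicted 3 × 3 table. The linear form is only a convenience; the real-time hypothesis requires a monotonic, not a linear, increase.

```python
from itertools import product

TIMES_MS = (100, 400, 700)


# Toy error models with arbitrary, hypothetical parameters (not fitted to any data).
def static_matching_error(ot, pt, base=0.05, slope=0.0001):
    """Error depends only on pose time, i.e., on dissimilarity to the last seen pose."""
    return base + slope * pt


def real_time_error(ot, pt, base=0.05, slope=0.0001):
    """Error depends only on |PT - OT|, i.e., on mismatch with a real-time updated pose."""
    return base + slope * abs(pt - ot)


def interaction_residuals(model, times=TIMES_MS):
    """OT x PT interaction component: cell - row mean - column mean + grand mean."""
    table = {(ot, pt): model(ot, pt) for ot, pt in product(times, times)}
    k = len(times)
    row = {ot: sum(table[ot, pt] for pt in times) / k for ot in times}
    col = {pt: sum(table[ot, pt] for ot in times) / k for pt in times}
    grand = sum(table.values()) / len(table)
    return {(ot, pt): v - row[ot] - col[pt] + grand for (ot, pt), v in table.items()}


def max_abs_interaction(model):
    return max(abs(v) for v in interaction_residuals(model).values())


print("static matching:", max_abs_interaction(static_matching_error))  # zero up to rounding
print("real-time simulation:", max_abs_interaction(real_time_error))
```

Because the static-matching table is purely additive in OT and PT, all of its interaction residuals vanish, whereas the distance-based table leaves a non-zero interaction component.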

To test whether dynamic simulation (updating) and static matching can be differentiated as two alternative processes, we used body part priming, which allowed us to manipulate the degree to which the task context fosters one or the other process. In particular, after being primed with the same effectors (i.e., arms) as those observed in the action sequences, participants should engage more strongly in dynamic updating than in static matching. This is because compatible (but not incompatible) effector priming implies that recently activated internal real-time models (used for controlling one's own actions, as required in our priming task) can be mapped onto subsequently perceived arm actions, thus favoring real-time simulation. If so, priming compatible effectors should reveal evidence of real-time simulation (i.e., a monotonic distance function).

On the other hand, after being primed with a different effector (i.e., legs) than the one observed, internal real-time models of one's own actions may not (or only to a lesser degree) be applicable to internal real-time models of the perceived arm actions. Incompatible priming may even hinder dynamic simulation through interference arising from the activation of incompatible body part representations (Hommel et al., 2001; Wilson & Knoblich, 2005). Hence, we expected that after priming with incompatible body parts, similarity-based evaluations of statically maintained action representations would provide an efficient alternative for solving the prediction task. Put differently, priming incompatible effectors should increase the relative contribution of static matching to action prediction performance and should therefore reveal evidence of static matching (i.e., a pose time effect).

2. Method

2.1. Participants

Thirty-six participants (18 females; age range 18–35 years, M = 24.36) were tested. All were right-handed and had normal or corrected-to-normal vision. The study was conducted in accordance with the Declaration of Helsinki. Participants gave written informed consent and were paid for their participation.

2.2. Stimuli and design

A motion-controlled video game (Camgoo Header King, Rebel Games, 2004) was used as the effector priming task. During this game, participants looked at a display showing a soccer ball up in the sky, which moved like a real ball. They were instructed to attend to this ball and to keep it in the air for as long as possible by moving either their arms or their legs (depending on the priming condition). Their movements were recorded by a small webcam placed on the monitor in front of them and were thus visible as a pale transparent reflection in the background of the screen, which facilitated coordinating their movements with the visual target ball. If the ball was missed three times in a row, the trial ended, and the next trial started after 2 s. The game was played for a constant period of 5 min, yielding compatible or incompatible body part priming relative to the subsequently perceived arm movements (Fig. 1).

To capture the hypothesized action prediction processes, the “real-time simulation” paradigm of Graf et al. (2007) was used (as described in the Introduction). Each trial of the prediction task started with a central fixation dot (1,200 ms), followed by a point-light display depicting a person performing an arm movement (reaching out the right arm to the right side, lifting the right arm upward over the head, reaching out the left arm to the left side, or lifting the left arm upward over the head). After the action sequence (duration 1,254–1,782 ms), an occluder was presented, immediately followed by a static test pose, which was displayed until the participant responded or for a maximum of 2,500 ms. This test pose showed a continuation of the arm movement either in the same direction as in the preceding action sequence or in a different direction (i.e., the movement was continued at the same visual angle or slightly rotated in depth; Fig. 1). The task was to judge whether the test pose showed a spatially coherent continuation of the previous action. Participants were instructed to respond as quickly and as accurately as possible by pressing a pedal with the right (“yes”) or left (“no”) foot. Feedback on response accuracy (300 ms) was provided. The next trial started 500 ms after the response (or after an aborted trial). Three independent variables were varied within participants: (a) body part priming (arms vs. legs); (b) occluder time (100, 400, or 700 ms); and (c) pose time (i.e., the time gap between the last visible action pose and the test pose, again 100, 400, or 700 ms; cf. Fig. 1).

We used point-light stimuli because they emphasize motion information while minimizing the possibility of response strategies based on other cues (Johansson, 1973). They were recorded from a right-handed female actor in front view using seven cameras (Vicon Motion Systems Ltd., Oxford, UK) at a temporal sampling rate of 120 Hz. Trajectory data were processed using commercial VICON software and MATLAB. The point-light display comprised 13 black dots on a gray background, located at the major joints of the body (center of the head, shoulders, elbows, wrists, pelvis, knees, and ankles). The dots were approximately 5 mm in diameter. The actions were rendered at 30 Hz. The point-light actor was about 9 cm tall and moved within an area of 340 × 340 pixels (about 12 cm × 12 cm) at the screen center. An occluder of the same size was rendered in white with a light green frame.
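For readers who prefer stimulus sizes in degrees of visual angle, the reported physical sizes can be converted using the approximate viewing distance of 82 cm given in Section 2.3. The sketch below uses the standard formula 2·atan(size / (2·distance)); the resulting values (about 6.3° for the 9 cm actor and about 8.4° for the 12 cm display area) are derived here rather than reported in the original, and are only as precise as the approximate viewing distance.

```python
import math

VIEWING_DISTANCE_CM = 82.0  # approximate viewing distance (Section 2.3)


def visual_angle_deg(size_cm: float, distance_cm: float = VIEWING_DISTANCE_CM) -> float:
    """Full visual angle subtended by an object of size_cm centered on the line of sight."""
    return math.degrees(2 * math.atan(size_cm / (2 * distance_cm)))


print(f"actor height (9 cm): {visual_angle_deg(9):.1f} deg")
print(f"display area (12 cm): {visual_angle_deg(12):.1f} deg")
```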

The stimuli included one distractor pose for every correct test pose, created by rotating the correct arm posture in depth by 40° toward the observer (thus requiring a “different” response to the test pose). The rotation was based on fitting a three-dimensional kinematic model to the arm and manipulating the shoulder angle. “Different” trials therefore entailed two sources of error: a temporal error (when occluder time and pose time diverge) and an additional spatial error (due to the depth rotation of the test pose). Since we had no definite hypothesis about the interaction of these errors, we could not derive clear-cut predictions for the “different” trials. Thus, following Graf et al. (2007), we focused on “same” trials (requiring a “same” response because the depth orientation of the action was continuous in the test pose), which allowed us to investigate the temporal factor of interest in its pure form, as “same” trials do not involve an additional spatial error.

2.3. Procedure

The experiment, lasting approximately 1.5 h, was run as a single session in a dimly lit, quiet room. First, participants were told that they would work on two different tasks, one involving playing with a virtual ball and the other involving recognizing action continuations. Then, each point-light action was shown once without occlusion. To avoid front-view/back-view ambiguity, participants were told that they would see the PL actor from the front. Participants then practiced the interactive video game for 30 s. This was followed by 30 practice trials of the action prediction task (randomly drawn from all occluder time/pose time combinations and including all actions).

Afterwards, the participants played the video game for 5 min, using either their arms or legs (body part priming) according to the instructions given. This was followed by the first test phase of the prediction task. After a short break, participants played the video game again (again for 5 min), now controlling it by the effectors that had not been used previously. The order of effectors used (arms, legs) was counterbalanced across participants. Following this, participants performed the prediction task again. After completing the second test phase, they filled out a short post-experiment questionnaire concerning, for instance, possible problems during the experiment. They were then thanked, debriefed, and dismissed.

Hence, participants completed two test phases of 144 trials each (288 test trials in total), each subdivided into three blocks of 48 trials. Following Graf et al. (2007, Experiment 4), each block had a constant occluder time (100, 400, or 700 ms) to avoid potential additional effects of temporal uncertainty about when the test pose would appear. The order of blocks (with occluder times of 100, 400, or 700 ms) was counterbalanced across participants. Within this arrangement, trials were randomly drawn from all possible stimulus combinations (3 pose times × 3 occluder times × 2 responses × 4 action sequences). All pose time/occluder time combinations appeared equally often, and each of the two test phases included every trial type twice. There was a short self-timed break after every block.
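These counts follow from the factorial design: 3 pose times × 3 occluder times × 2 responses × 4 actions give 72 trial types; presenting each type twice gives 144 trials per phase, and blocking by occluder time leaves 24 types × 2 = 48 trials per block. A minimal Python sketch of one test phase, assuming simple within-block shuffling as the randomization (the original randomization routine is not described in more detail); the action labels and function name are hypothetical.

```python
import random
from itertools import product

OCCLUDER_TIMES_MS = (100, 400, 700)
POSE_TIMES_MS = (100, 400, 700)
RESPONSES = ("same", "different")
# Hypothetical labels for the four arm movements described in Section 2.2.
ACTIONS = ("right_side", "right_up", "left_side", "left_up")
REPEATS_PER_PHASE = 2  # every trial type appears twice per test phase


def build_test_phase(block_order, rng):
    """Return one test phase as three blocks, each with a single occluder time.

    Assumption: trials are shuffled within each block.
    """
    blocks = []
    for ot in block_order:
        trials = [
            {"ot": ot, "pt": pt, "response": resp, "action": act}
            for _ in range(REPEATS_PER_PHASE)
            for pt, resp, act in product(POSE_TIMES_MS, RESPONSES, ACTIONS)
        ]
        rng.shuffle(trials)
        blocks.append(trials)
    return blocks


# Block order is counterbalanced across participants; (400, 100, 700) is one example.
phase = build_test_phase((400, 100, 700), random.Random(0))
print([len(block) for block in phase], sum(len(block) for block in phase))
```

Across both phases this gives 288 trials, half of which (144) are “same” trials.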

Participants sat approximately 82 cm from a Samsung SyncMaster 997 MB 19-inch color monitor (resolution 1,024 × 768 pixels; refresh rate 90 Hz; Samsung SyncMaster, Samsung Electronics, Seoul, South Korea). Stimulation was controlled by “Presentation” run on a 3.0 GHz PC running Windows XP (Microsoft Corporation, Redmond, Washington, USA). Since the distractors (i.e., the depth-rotated arm poses) were highly similar to the correct test poses, the overall task difficulty was high. Accordingly, data analysis focused on the error rates (cf. Graf et al., 2007). Reaction times were analyzed for correct responses only. Analyses of variance (anovas) with repeated measures were used, with degrees of freedom corrected according to the Huynh-Feldt formula (Huynh & Feldt, 1970). Post hoc paired comparisons (t-test, two-tailed) were Bonferroni corrected.

3. Results

The error rates of “same” trials (i.e., trials requiring a “same” response) were entered into a repeated measures anova with the within-subjects factors occluder time (100 vs. 400 vs. 700 ms), pose time (100 vs. 400 vs. 700 ms), and effector priming (arms vs. legs). Consistent with the real-time simulation hypothesis, a highly significant occluder time × pose time interaction emerged, F(4, 140) = 4.69, MSE = .026, p = .005; ηp² = .118. In addition, a highly significant main effect of pose time was obtained, F(2, 70) = 90.88, MSE = .171, p < .001; ηp² = .722, indicating higher error rates for the shortest pose time of 100 ms (45.6%, SE = 3.3) than for the longer pose times of 400 ms (8.0%, SE = 1.3) and 700 ms (11.2%, SE = 1.5). This increase in errors can be attributed to the fact that with a 100 ms pose time, the PL arm extension shown in the test frame had advanced only slightly (i.e., by 100 ms), so that the arm was still very close to the PL actor's body. As a consequence, the spatial angle of the PL arm was clearly more difficult to judge in these test frames. Indeed, response accuracy for test poses with a pose time of 100 ms did not differ from chance level, t(35) = −1.340, p = .189 (n.s.), indicating that participants had difficulty responding to these test frames and were essentially guessing whether the PL arm movement shown after occlusion had the same spatial angle as before occlusion. Therefore, because these test frames were ambiguous for stimulus-based reasons, trials with 100 ms pose times were excluded from all further analyses to rule out the possibility that mere stimulus-driven effects mask the critical occluder time × pose time interaction (i.e., the monotonic distance function indicating real-time simulation). The full pattern of results is provided in the Appendix (Table 1).
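The reported t(35) is consistent with a one-sample t-test of the 36 participants' per-participant proportions against chance (.5); testing error rates instead of accuracies only flips the sign of t. The Python sketch below assumes this test and uses hypothetical accuracy values, not the study's data. A two-tailed p-value would then be read from the t distribution with the returned degrees of freedom (e.g., with SciPy's `scipy.stats.t.sf`, or `scipy.stats.ttest_1samp` in a single call).

```python
import math
import statistics


def one_sample_t(values, mu0):
    """Student's one-sample t statistic and degrees of freedom for H0: mean(values) == mu0."""
    n = len(values)
    if n < 2:
        raise ValueError("need at least two observations")
    standard_error = statistics.stdev(values) / math.sqrt(n)  # sample SD uses n - 1
    return (statistics.mean(values) - mu0) / standard_error, n - 1


# Hypothetical per-participant accuracies on 100 ms pose-time "same" trials (not study data).
accuracies = [0.45, 0.55, 0.50, 0.40, 0.60, 0.48]
t, df = one_sample_t(accuracies, mu0=0.5)
print(f"t({df}) = {t:.3f}")
```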

A three-way repeated measures anova was conducted on the error rates of “same” trials without the 100 ms pose times. The results again revealed a significant occluder time × pose time interaction, F(2, 70) = 4.92, MSE = .008, p < .011; ηp² = .123, in accordance with the real-time hypothesis. Moreover, a significant main effect of pose time emerged, F(1, 35) = 5.52, MSE = .019, p = .025; ηp² = .136, reflecting higher error rates for long (700 ms: 11.2%, SE = 1.3) than for short pose times (400 ms: 8.0%, SE = 1.3). This finding supports static matching. As expected, pose time interacted significantly with effector priming, F(1, 35) = 6.68, MSE = .010, p = .014; ηp² = .160, and the three-way interaction of occluder time, pose time, and effector priming was marginally significant, F(2, 70) = 2.94, MSE = .009, p = .059; ηp² = .078. Furthermore, effector priming yielded a main effect, F(1, 35) = 5.45, MSE = .015, p = .025; ηp² = .135, indicating more errors after incompatible than after compatible body part priming (11.0%, SE = 1.5, and 8.2%, SE = 1.3, respectively).

In a further step, we applied a procedure by Graf et al. (2007) that tests the occluder time × pose time interaction more strictly against the predictions of the real-time hypothesis. As explained previously, the real-time hypothesis predicts that performance will be best when occluder time and pose time correspond and will deteriorate monotonically with increasing time distance between the two factors (due to the increasing deviation of the test pose from an internal reference updated in real time). If so, the occluder time × pose time interaction should take the form of a “monotonic distance function.” To test this function, we averaged the error rates of “same” trials within each absolute distance level between OT and PT, yielding distances of 0, 300, and 600 ms. For example, an OT of 400 ms combined with a PT of 700 ms resulted in a distance of +300 ms, whereas an OT of 700 ms combined with a PT of 400 ms resulted in a distance of −300 ms; both were assigned to the 300 ms distance level. Notably, error rates for positive distances (10.5%, SE = 1.3) and negative distances (10.0%, SE = 1.6) did not differ, t(35) = .35, MSE = .014, p = .73 (n.s.).
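The averaging step can be sketched as follows for a single participant. With the 100 ms pose times excluded, the 0 ms level comprises two cells (400/400 and 700/700), the 300 ms level three cells (two positive, 100/400 and 400/700, and one negative, 700/400), and the 600 ms level a single cell (100/700). The Python function name and the input error rates below are hypothetical and for illustration only.

```python
from collections import defaultdict
from statistics import mean


def distance_level_means(cell_errors):
    """Average one participant's cell error rates within each absolute OT-PT distance level.

    cell_errors maps (occluder_time_ms, pose_time_ms) -> error rate. The signed distance
    is PT - OT; positive and negative distances of the same size are pooled.
    """
    pooled = defaultdict(list)
    for (ot, pt), err in cell_errors.items():
        pooled[abs(pt - ot)].append(err)
    return {dist: mean(errs) for dist, errs in sorted(pooled.items())}


# Hypothetical error rates for one participant, 100 ms pose times already excluded.
cells = {
    (100, 400): 0.10, (400, 400): 0.05, (700, 400): 0.10,
    (100, 700): 0.15, (400, 700): 0.10, (700, 700): 0.07,
}
print(distance_level_means(cells))
```

Because every cell contains the same number of trials, averaging cell means gives the same result as pooling the underlying trials.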

As predicted, errors increased monotonically with increasing time distance between OT and PT (cf. Table 1). A two-way repeated measures anova with the within-subject factors distance (0 vs. 300 vs. 600 ms) and effector priming (arms vs. legs) confirmed a significant distance effect, F(2, 70) = 4.96, MSE = .013, p = .023; ηp² = .124, with a significant linear trend in the predicted direction, F(1, 35) = 7.59, MSE = .011, p = .009; ηp² = .178, consistent with a monotonic distance function. Furthermore, effector priming yielded a main effect, F(1, 35) = 7.81, MSE = .010, p = .008; ηp² = .182, due to higher error rates after incompatible (12.0%, SE = 1.5) than after compatible effector priming (8.3%, SE = 1.3). The distance × effector priming interaction was marginally significant, F(2, 70) = 2.96, MSE = .007, p = .065; ηp² = .078.
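The linear-trend test can be understood as a planned within-subject contrast over the three distance levels. The Python sketch below assumes a contrast-specific error term and scores that have already been averaged over effector priming; under these assumptions, the trend F(1, n − 1) equals the squared one-sample t of the per-participant contrast scores against zero. Because a single-df contrast does not rely on sphericity, no Huynh-Feldt correction applies to it. The function name and the example data are hypothetical.

```python
import math
import statistics

# Orthogonal linear contrast for three equally spaced levels (distances 0, 300, 600 ms).
LINEAR_WEIGHTS = (-1, 0, 1)


def linear_trend_F(per_participant_levels):
    """Linear trend test for a three-level within-subject factor.

    per_participant_levels: one (e_0, e_300, e_600) tuple of error rates per participant.
    Returns (F, df1, df2), where F is the squared one-sample t of the contrast scores.
    """
    scores = [sum(w * e for w, e in zip(LINEAR_WEIGHTS, levels))
              for levels in per_participant_levels]
    n = len(scores)
    if n < 2:
        raise ValueError("need at least two participants")
    t = statistics.mean(scores) / (statistics.stdev(scores) / math.sqrt(n))
    return t * t, 1, n - 1


# Hypothetical error rates for three participants (illustrative, not study data).
example = [(0.05, 0.10, 0.12), (0.08, 0.09, 0.13), (0.06, 0.08, 0.10)]
F, df1, df2 = linear_trend_F(example)
print(f"F({df1}, {df2}) = {F:.2f}")
```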

Additional two-way repeated measures anovas with the factors occluder time (100 vs. 400 vs. 700 ms) and pose time (400 vs. 700 ms) were conducted separately for each priming condition. The compatible condition (“arms”) yielded a highly significant occluder time × pose time interaction, F(2, 70) = 7.71, MSE = .007, p = .001; ηp² = .180, consistent with the real-time hypothesis (Fig. 2A, left panel). No main effect of pose time emerged, F(1, 35) = .232, MSE = .011, p = .63; ηp² = .007 (n.s.). To further specify the interaction, we again averaged the error rates of “same” trials within each absolute distance level between occluder time and pose time (0, 300, and 600 ms; Graf et al., 2007). A one-way anova with the within-subject factor distance confirmed a significant distance effect, F(2, 70) = 4.07, MSE = .007, p = .036; ηp² = .104, with a significant linear trend in the predicted direction, F(1, 35) = 5.70, MSE = .006, p = .022; ηp² = .140. This finding corresponds to a monotonic distance function, in accordance with the real-time simulation hypothesis.

Figure 2.

Error rates (“same” trials) plotted for the different combinations of occluder time (OT) and pose time (PT) (left column) and as a function of pose time (right column). Error bars represent the standard error of the mean. A. Upper panels: Results after compatible effector priming (i.e., arms). Performance was best when OT and PT corresponded and deteriorated with increasing time difference between OT and PT, consistent with the real-time simulation hypothesis (left graph). No effect of pose time was found (right graph). B. Lower panels: Results after incompatible effector priming (i.e., legs). No interaction of OT and PT was found (i.e., no indication of real-time simulation) (left graph). As hypothesized, errors increased with increasing pose time (right graph), indicating that static matching of recently perceived motion was involved in action prediction.

A reversed pattern occurred after incompatible effector priming (“legs”). In this condition, errors did not increase monotonically with increasing time distance between occluder time and pose time, and no occluder time × pose time interaction emerged, F(2, 70) = .93, MSE = .009, p = .40; ηp² = .026 (n.s.). Thus, whereas the compatible condition revealed evidence of internal real-time simulation, the incompatible condition did not. However, in the incompatible condition, a significant main effect of pose time emerged, F(1, 35) = 9.47, MSE = .018, p = .004; ηp² = .213, reflecting higher error rates for long (700 ms: 13.8%, SE = 1.9) than for short pose times (400 ms: 8.2%, SE = 1.5) (Fig. 2B, right panel). This finding corresponds to static matching, implying that the last visible action pose prior to occlusion is maintained and then used as an internal reference for matching against the upcoming test pose (as explained in the Introduction).

The reaction times of “same” trials (Appendix, Table 2) were analyzed analogously to the error rates. A three-way repeated measures anova with the within-subjects factors occluder time (100 vs. 400 vs. 700 ms), pose time (400 vs. 700 ms), and effector priming (arms vs. legs) revealed no occluder time × pose time interaction (F(2, 70) = 1.95, MSE = 5093.13, p = .15; ηp² = .053; n.s.), no main effect of pose time (F(1, 35) = 3.69, MSE = 12310.65, p = .06; ηp² = .095; n.s.), and no interactions involving effector priming (pose time × effector priming: F(1, 35) = 1.47, MSE = 5136.81, p = .23; ηp² = .040; three-way interaction: F(2, 70) = .91, MSE = 7339.79, p = .40; ηp² = .025). Furthermore, there was no main effect of effector priming (F(1, 35) = 2.56, MSE = 36327.55, p = .12; ηp² = .068; n.s.). Hence, there was no indication of speed-accuracy trade-offs (i.e., the RT pattern was not the inverse of the error pattern).

4. Discussion

We hypothesized that two different processes, dynamic updating and static matching, can be used for action prediction depending on the context within which action prediction takes place (cf. Stadler et al., 2011). Therefore, we expected to find both a monotonic distance function (indicating dynamic updating in real time) and a pose time effect (indicating static matching). We used body part priming as a means of manipulating the degree to which the task context invites either one or the other process. Hence, we expected body part priming to interact with the two critical effects.

The results supported our hypotheses. In particular, after trials with ambiguous test poses were excluded, our findings revealed both a monotonic distance function and a pose time effect when participants indicated whether observed actions continued at a coherent spatial angle after visual occlusion. These results emerged even though the task required no explicit judgments either about the timing of the observed actions or about the similarity between the action poses shown before and after occlusion, ruling out the possibility that the real-time effects and similarity-based matching effects were merely due to task instructions.

Moreover, and crucially, both effects were modulated by body part priming. Whereas priming compatible effectors yielded a monotonically increasing distance function, corresponding to real-time simulation, priming incompatible effectors clearly did not. More precisely, in the compatible condition, accuracy in the action prediction task was best when the duration of occlusion matched the pose time of the test pose shown after occlusion, indicating that an internal model of the observed action was updated in real time and thus matched the test pose (Graf et al., 2007). In addition, accuracy decreased monotonically with increasing time difference between the duration of occlusion and the pose time (i.e., a monotonic distance function), that is, with increasing discrepancy between the internal real-time model and the action state shown in the test pose. Hence, the findings of the compatible condition support real-time simulation (dynamic updating).

In the incompatible priming condition, by contrast, evidence of real-time simulation was lacking (i.e., the duration of occlusion did not interact with the actual action progress shown in the test pose; cf. Fig. 2B, left panel). Instead, response accuracy decreased as pose time increased, that is, as the similarity between the last action pose visible before occlusion and the test pose shown after occlusion decreased, irrespective of the actual duration of the occlusion period. Hence, after being primed with incompatible effectors, participants were more accurate in the action prediction task when the test pose was more similar to the most recently perceived action pose (seen prior to occlusion), and their accuracy decreased as the similarity between these two action poses decreased (pose time effect). This effect cannot be explained by internal real-time updating of the last perceived action image (as explained in the Introduction). Rather, it supports our assumption that static matching is involved. Specifically, instead of being matched against dynamically (real-time) updated representations, test poses may be matched against a representation derived from the most recently perceived action pose, which is held in memory and used as a static reference for comparison with the upcoming action event.
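
The pose time effect can be summarized descriptively from the Table 1 error rates. The following Python sketch is a minimal illustration under the assumption that the pose time effect can be expressed as the difference in mean error rate between the 700 ms and 400 ms test poses, collapsing over occluder time and leaving aside the ambiguous 100 ms poses. The names TABLE1 and pose_time_means are illustrative; this is not the statistical analysis reported in the study. Recomputing from the rounded cells gives 7.8% rather than the tabled 7.9% for the arms condition at 400 ms.

```python
# Error rates (%) from Table 1: rows = occluder time 100/400/700 ms,
# columns = pose time 100/400/700 ms.
POSE_TIMES = (100, 400, 700)
TABLE1 = {
    "arms": [[37.8, 7.0, 9.8], [47.6, 6.4, 11.3], [51.2, 10.1, 4.6]],
    "legs": [[42.4, 7.7, 15.8], [46.9, 7.0, 11.1], [48.0, 9.8, 14.4]],
}


def pose_time_means(rows):
    """Mean error per pose time, collapsing over occluder time (column means)."""
    return {
        pose: sum(row[j] for row in rows) / len(rows)
        for j, pose in enumerate(POSE_TIMES)
    }


for effector, rows in TABLE1.items():
    means = pose_time_means(rows)
    # Assumed summary of the pose time effect: change in error rate from
    # the 400 ms to the 700 ms test pose (100 ms poses left aside).
    effect = means[700] - means[400]
    print(f"{effector}: 400 ms {means[400]:.1f}%, 700 ms {means[700]:.1f}%, "
          f"difference {effect:+.1f} points")
```

The sketch reports an increase of about 0.7 percentage points from the 400 ms to the 700 ms test pose after arms priming and about 5.6 points after legs priming, in line with a pose time effect that is more pronounced after incompatible effector priming.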

The present findings are consistent with previous evidence showing that action simulation and imitation can be tuned by motor priming (Liepelt, Prinz, & Brass, 2010) and by effector-specific mappings (Gillmeister et al., 2008). According to the fundamental principle of “common codes” (Hommel et al., 2001; Prinz, 1997, 2002), the same underlying representations are involved in performing and perceiving an action. These shared representations may support internal simulations of other people's actions by mapping those actions onto one's own motor vocabulary (Gallese, 2005; Jeannerod, 2001). With respect to the present data, our participants may have recruited the (sensorimotor) representations used for action execution to solve the prediction task (Jeannerod, 2001; Knoblich, Seigerschmidt, Flach, & Prinz, 2002). If so, using a compatible (but not an incompatible) effector should aid action prediction (Reed & McGoldrick, 2007) and foster internal real-time simulation (Springer et al., 2011). This is what our data show: action prediction was more accurate after compatible than after incompatible body part priming, and, crucially, a monotonic distance function, taken to reflect internal real-time simulation (Graf et al., 2007), emerged only after compatible effector priming.

Conversely, after being primed with incompatible effectors, representations of one's own actions may not have been applicable (or only less efficiently applicable) to an internal, real-time updated model of the observed actions. Effector incompatibility may even prevent internal simulation because of interference arising from the activation of incompatible body parts in performed and observed actions (Hommel et al., 2001; Prinz, 1997; Wilson & Knoblich, 2005). Observers may therefore rely preferentially on static matching, that is, on matching internally stored action images without involving (possibly conflicting) internal real-time models. In line with this view, perceptual biases associated with internal (sensorimotor) simulations (e.g., judging objects as nearer to the body than they actually are) disappear when participants perform actions like reaching for an object (Witt & Proffitt, 2008).

Together, our results accord with previous evidence that action prediction involves internal real-time simulation (Graf et al., 2007; Parkinson et al., 2012; Sparenberg et al., 2012). Moreover, the present findings allow us to differentiate between two distinct processes that characterize action prediction: dynamic updating (corresponding to real-time simulation) and static matching. Our findings may thus provide insight into the cognitive processes underlying action prediction.

However, our results do not allow any claim about the representational format in which these processes operate (e.g., simulation or matching in the visual and/or motor domain). Further, the prediction of occluded actions may not rely on a single domain but may involve processes in different domains, either alternately or simultaneously (e.g., visually driven static matching and motor-driven dynamic simulation). Future studies should address this issue, for instance, by including both visual and motor priming. Likewise, using verbal responses in the action prediction task may be insightful, possibly leading to stronger priming effects than action responses. Furthermore, presenting actions performed with different effectors (such as arms, legs, and mouth) would make it possible to examine whether the action prediction processes observed here are specific to the prediction of arm movements (as used in this study).

Interestingly, when averaging across the compatible and incompatible priming conditions, we obtained both a monotonic distance function (indicating real-time simulation) and a pose time effect (indicating static matching). Because these two effects dissociated across the priming conditions, the relative contributions of dynamic and static processes may depend on the specific context of action perception. If recently accessed action representations can be mapped onto the actions perceived in another individual because of common representational grounds (i.e., effector compatibility), internal simulation may be favored. If, on the other hand, recently accessed action representations are not (or only to a lesser degree) applicable to internal forward models of perceived actions (because of effector incompatibility), real-time simulation may be constrained (Prinz, 1997, 2002; Wilson & Knoblich, 2005). As a result, incompatible effector priming may foster static matching as an alternative way of solving the prediction task (as explained previously).

Consistent with this view, our participants were generally more accurate in predicting the occluded actions after compatible than after incompatible body part priming. This finding may suggest that real-time simulation yielded, overall, more precise predictions than static matching. This view corresponds to the broader notion that internal (predictive) simulations involve the observer's own motor repertoire (Blakemore & Frith, 2005; Kilner et al., 2009; Wilson & Knoblich, 2005) and that action observation recruits the motor system in a somatotopic way (Buccino et al., 2001; Decety & Grèzes, 1999, 2006; Sakreida et al., 2005). Furthermore, observing the start and middle phases of action sequences is associated with greater motor facilitation than observing the final postures of the corresponding actions (Urgesi et al., 2010), supporting the idea that parts of the human motor system are preferentially activated by anticipatory simulation of actions observed in others (Blakemore & Frith, 2005; Kilner et al., 2007; Stadler et al., 2011).

Acknowledgments

We thank Martin Giese for providing us with the point-light videos used in this study.

Appendix

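Table 1. Error rates (%)

Effector priming: Arms

| Occluder time | Pose time 100 ms | Pose time 400 ms | Pose time 700 ms | Mean |
|---|---|---|---|---|
| 100 ms | 37.8 (4.3) | 7.0 (1.5) | 9.8 (2.1) | 18.2 (1.6) |
| 400 ms | 47.6 (4.1) | 6.4 (1.6) | 11.3 (2.5) | 21.8 (1.8) |
| 700 ms | 51.2 (4.4) | 10.1 (2.0) | 4.6 (1.2) | 22.0 (1.6) |
| Mean | 45.5 (3.6) | 7.9 (1.4) | 8.6 (1.5) | |

Effector priming: Legs

| Occluder time | Pose time 100 ms | Pose time 400 ms | Pose time 700 ms | Mean |
|---|---|---|---|---|
| 100 ms | 42.4 (4.4) | 7.7 (1.8) | 15.8 (2.5) | 22.0 (1.6) |
| 400 ms | 46.9 (4.7) | 7.0 (2.2) | 11.1 (2.5) | 21.7 (2.0) |
| 700 ms | 48.0 (3.5) | 9.8 (1.7) | 14.4 (2.4) | 24.1 (1.6) |
| Mean | 45.7 (3.3) | 8.2 (1.5) | 13.8 (1.9) | |

Overall

| Occluder time | Pose time 100 ms | Pose time 400 ms | Pose time 700 ms | Mean |
|---|---|---|---|---|
| 100 ms | 40.1 (3.9) | 7.4 (1.4) | 12.8 (1.9) | 20.1 (1.4) |
| 400 ms | 47.3 (3.9) | 6.7 (1.6) | 11.2 (2.1) | 21.7 (1.6) |
| 700 ms | 49.6 (3.3) | 10.0 (1.6) | 9.5 (1.5) | 23.0 (1.4) |
| Mean | 45.6 (3.3) | 8.0 (1.3) | 11.2 (1.5) | |

Distances (inclusive of pose time 100 ms)

| Effector priming | Distance 0 ms | Distance 300 ms | Distance 600 ms | Mean |
|---|---|---|---|---|
| Arms | 16.2 (1.5) | 19.0 (1.6) | 30.5 (2.5) | 21.9 (1.5) |
| Legs | 21.3 (2.1) | 18.9 (1.6) | 31.9 (1.7) | 24.0 (1.3) |
| Mean | 18.7 (1.7) | 18.9 (1.4) | 31.2 (1.7) | |

Distances (exclusive of pose time 100 ms)

| Effector priming | Distance 0 ms | Distance 300 ms | Distance 600 ms | Mean |
|---|---|---|---|---|
| Arms | 5.5 (1.0) | 9.5 (1.5) | 9.8 (2.1) | 8.3 (1.3) |
| Legs | 10.7 (2.0) | 9.5 (1.5) | 15.8 (2.5) | 12.0 (1.5) |
| Mean | 8.1 (1.3) | 9.5 (1.4) | 12.8 (1.9) | |

Note. Numbers in parentheses are standard errors of the means.
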
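Table 2. Reaction times (RTs, ms)

Effector priming: Arms

| Occluder time | Pose time 100 ms | Pose time 400 ms | Pose time 700 ms | Mean |
|---|---|---|---|---|
| 100 ms | 845.4 (40.5) | 734.3 (28.2) | 743.2 (34.5) | 774.3 (30.8) |
| 400 ms | 823.9 (49.6) | 729.0 (28.5) | 768.6 (29.9) | 773.9 (31.8) |
| 700 ms | 910.5 (45.6) | 764.2 (34.4) | 752.1 (31.5) | 808.9 (30.4) |
| Mean | 860.0 (36.4) | 742.5 (27.2) | 754.6 (29.4) | |

Effector priming: Legs

| Occluder time | Pose time 100 ms | Pose time 400 ms | Pose time 700 ms | Mean |
|---|---|---|---|---|
| 100 ms | 737.9 (59.5) | 766.7 (35.2) | 778.9 (37.5) | 761.2 (35.0) |
| 400 ms | 836.4 (51.9) | 754.6 (35.8) | 794.3 (35.8) | 795.1 (32.6) |
| 700 ms | 937.0 (44.3) | 769.3 (33.5) | 803.8 (38.3) | 836.7 (34.3) |
| Mean | 837.1 (34.7) | 763.5 (32.3) | 792.4 (34.3) | |

Overall

| Occluder time | Pose time 100 ms | Pose time 400 ms | Pose time 700 ms | Mean |
|---|---|---|---|---|
| 100 ms | 791.7 (38.5) | 750.5 (29.9) | 761.1 (33.8) | 767.7 (29.0) |
| 400 ms | 830.1 (47.7) | 741.8 (28.6) | 781.5 (29.9) | 784.5 (30.3) |
| 700 ms | 923.8 (36.4) | 766.7 (32.2) | 778.0 (31.6) | 822.8 (29.5) |
| Mean | 848.5 (31.8) | 753.0 (28.3) | 773.5 (30.3) | |

Distances (inclusive of pose time 100 ms)

| Effector priming | Distance 0 ms | Distance 300 ms | Distance 600 ms | Mean |
|---|---|---|---|---|
| Arms | 775.5 (28.7) | 772.7 (29.9) | 826.9 (31.7) | 791.7 (28.1) |
| Legs | 765.5 (35.3) | 791.7 (30.3) | 858.0 (36.4) | 805.0 (30.8) |
| Mean | 770.5 (28.2) | 782.2 (28.9) | 842.4 (30.1) | |

Distances (exclusive of pose time 100 ms)

| Effector priming | Distance 0 ms | Distance 300 ms | Distance 600 ms | Mean |
|---|---|---|---|---|
| Arms | 740.6 (27.2) | 755.7 (28.3) | 743.2 (34.5) | 746.5 (28.5) |
| Legs | 779.2 (34.7) | 776.8 (32.5) | 778.9 (37.5) | 778.3 (33.1) |
| Mean | 759.9 (28.7) | 766.2 (28.9) | 761.1 (33.8) | |

Note. Numbers in parentheses are standard errors of the means.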
