Correspondence should be sent to Richard P. Cooper, Department of Psychological Sciences, Birkbeck, University of London, Malet Street, London WC1E 7HX, UK. E-mail: firstname.lastname@example.org
Automatic imitation or “imitative compatibility” is thought to be mediated by the mirror neuron system and to be a laboratory model of the motor mimicry that occurs spontaneously in naturalistic social interaction. Imitative compatibility and spatial compatibility effects are known to depend on different stimulus dimensions—body movement topography and relative spatial position. However, it is not yet clear whether these two types of stimulus–response compatibility effect are mediated by the same or different cognitive processes. We present an interactive activation model of imitative and spatial compatibility, based on a dual-route architecture, which substantiates the view they are mediated by processes of the same kind. The model, which is in many ways a standard application of the interactive activation approach, simulates all key results of a recent study by Catmur and Heyes (2011). Specifically, it captures the difference in the relative size of imitative and spatial compatibility effects; the lack of interaction when the imperative and irrelevant stimuli are presented simultaneously; the relative speed of responses in a quintile analysis when the imperative and irrelevant stimuli are presented simultaneously; and the different time courses of the compatibility effects when the imperative and irrelevant stimuli are presented asynchronously.
Automatic imitation is of interest because it implies that humans tend to copy the actions of others without intending to do so. It also lies at the intersection between important recent developments in cognitive neuroscience and experimental social psychology. The former relate to the discovery of “mirror neurons” or the “mirror neuron system”: areas of the premotor and inferior parietal cortex that are active during the execution of specific actions, and during passive observation of the same actions (Gallese, Fadiga, Fogassi, & Rizzolatti, 1996; Gallese, Gernsbacher, Heyes, Hickok, & Iacoboni, 2011; Iacoboni, 2009). The latter developments provide evidence that, in everyday life, people engage in a great deal of spontaneous “motor mimicry,” and that this activity has a major impact on cooperative attitudes and behavior (see Chartrand & Van Baaren, 2009, for a recent review). Many researchers believe that automatic imitation is mediated by the mirror neuron system (Longo, Kosobud, & Bertenthal, 2008), and that motor mimicry is automatic imitation “in the wild”; that is, motor mimicry and automatic imitation are the same psychological phenomenon, with the former being the manifestation of that phenomenon when detected under naturalistic conditions and the latter being its manifestation in tightly controlled experimental conditions (Chartrand & Van Baaren, 2009). If this is correct, understanding the psychological processes mediating automatic imitation could cast light on the cognitive functions of the mirror neuron system, and on the mechanisms that allow imitating and being imitated to promote prosocial behavior (Heyes, 2011).
The present study attempts to further our understanding of the psychological processes mediating automatic imitation by asking whether those processes are the same as, or distinct from, the processes mediating spatial compatibility effects. The spatial compatibility effects that are most directly comparable with automatic imitation occur when a task-irrelevant stimulus facilitates the performance of a response in the same relative position and/or interferes with the performance of a response in a different relative position. For example, when participants are asked to press a right key or a left key in response to a color cue (red or blue; the “task-relevant” stimulus dimension) appearing on the right or left of a computer screen (the “task-irrelevant” stimulus dimension), correct responses are initiated faster when the location of the color cue is response-compatible (e.g., right stimulus and right response) than when the location of the color cue is response-incompatible (e.g., left stimulus and right response; Simon, 1969).
A number of studies have shown that automatic imitation effects are not reducible to spatial compatibility effects; that is, automatic imitation is not merely a type of spatially compatible responding in which the spatial stimulus happens to be presented in the form of a body movement (e.g., Bertenthal, Longo, & Kosobud, 2006; Brass et al., 2001; Catmur & Heyes, 2011; Chong, Cunnington, Williams, & Mattingley, 2009; Heyes, Bird, Johnson, & Haggard, 2005; Leighton & Heyes, 2010; Press, Bird, Walsh, & Heyes, 2008). For example, the hand opening/closing automatic imitation effect, first reported by Stuermer et al. (2000), persists when the stimulus movements are presented in a plane orthogonal to that of the response movements, thereby controlling for left-right and up-down spatial compatibility (Heyes et al., 2005). Similarly, a study varying the anatomical identity of the stimulus hand, and response hemispace, has confirmed that simple orthogonal spatial compatibility (e.g., a tendency to respond to up stimuli with right responses—Weeks & Proctor, 1990), and complex orthogonal spatial compatibility (e.g., a tendency to respond to up stimuli with right responses in right hemispace, and to down stimuli with right responses in left hemispace—Cho & Proctor, 2004) do not contribute to automatic imitation effects in the hand opening/closing procedure (Press et al., 2008). Results of this kind indicate that, whereas spatial compatibility depends on the position of the response relative to an external frame of reference, automatic imitation depends on the configural or topographical properties of body movements—the way in which parts of the body move relative to one another.
It has been suggested that automatic imitation and spatial compatibility effects not only depend on different stimulus variables but are also mediated by distinct mechanisms. This suggestion was prompted by the results of a study in which participants responded with a tapping movement of the index or middle finger of their right hand to stimuli depicting index and middle finger tapping movements of a model’s right or left hand (Bertenthal et al., 2006). In one experiment (3a), assessing spatial compatibility, participants were instructed to imitate the finger used by the model (i.e., to move their index finger when they saw an index finger movement, and to move their middle finger when they saw a middle finger movement). In alternating blocks, the finger movement stimulus appeared in the same relative position as the correct response (spatially compatible trials, e.g., left stimulus and left response) and in the opposite relative position (spatially incompatible trials, e.g., right stimulus and left response). In another experiment (3b), assessing automatic imitation, participants were instructed to respond with a spatially congruent finger (i.e., to move their left finger when they saw a finger movement on the left, and to move their right finger when they saw a finger movement on the right). In alternating blocks, the finger movement stimulus matched the correct response (imitatively compatible trials, e.g., index finger stimulus and index finger response) and did not match the correct response (imitatively incompatible trials, e.g., middle finger stimulus and index finger response). 
The results indicated that the spatial compatibility effect (i.e., the difference in RT between spatially incompatible and spatially compatible trials) observed in Experiment 3a (41 ms) was greater than the imitative compatibility effect (i.e., the difference in RT between imitatively incompatible and imitatively compatible trials) observed in Experiment 3b (9 ms), and that the spatial compatibility effect was more persistent. Specifically, the spatial compatibility effect was present throughout the 20-trial blocks, whereas the automatic imitation effect disappeared after the first five trials. These differences in the magnitude and persistence of the spatial compatibility and automatic imitation effects led Bertenthal et al. to suggest that automatic imitation is mediated by a process that can be more readily inhibited, and that is less dependent on learning (Brass, Derrfuss, Matthes-von Cramon, & von Cramon, 2003), than the process that mediates spatial compatibility.
In a similar study, Catmur and Heyes (2011) used a procedure in which the stimulus and response movements were abductions of the index and little fingers, and all four trial types (spatially compatible and imitatively compatible; spatially compatible and imitatively incompatible; spatially incompatible and imitatively compatible; spatially incompatible and imitatively incompatible) were presented in random order within each block. The advantage of trial mixing was that it prevented participants from using strategies of a kind that could mask automatic imitation; for example, blurring one’s vision to avoid processing of stimulus finger identity (e.g., index versus middle). Consistent with the hypothesis that strategic factors contributed to the transient nature of the automatic imitation effect reported by Bertenthal et al., Catmur and Heyes (Experiment 1) found that neither their automatic imitation effect nor their spatial compatibility effect declined in the course of the experiment.
However, consistent with the results of the previous study, Catmur and Heyes also found larger spatial compatibility than automatic imitation effects in both of their experiments (Experiment 1: 41 ms vs. 19 ms; Experiment 2: 34 ms vs. 14 ms), and a difference in the time courses of the two effects. Dividing trials into five RT bins, they found that the automatic imitation effect increased with RT, whereas the spatial compatibility effect remained constant (Experiment 1). Thus, and in contrast to the spatial compatibility effect, the automatic imitation effect was smaller in trials where participants responded quickly than in trials where participants responded more slowly. Furthermore, whereas the spatial compatibility effect was present when the task-relevant or “imperative” stimulus was presented before (−160 ms and −80 ms), simultaneously with (0 ms), and after (80 ms and 160 ms) the task-irrelevant action stimulus, the automatic imitation effect was evident only in the simultaneous and delayed conditions (Experiment 2). Both of these results indicate that, within a trial, spatial compatibility has a more immediate effect on motor selection than imitative compatibility.
Although their results were similar in certain respects to those of Bertenthal et al. (2006), Catmur and Heyes (2011) argued that automatic imitation and spatial compatibility are mediated by processes of the same kind—processes that are equally subject to attentional control and highly dependent on learning. According to this view, both automatic imitation and spatial compatibility are mediated in the manner described by “dual route” models of stimulus–response compatibility (SRC). These assume that responses can be activated via two distinct routes: an intentional (or “conditional” or “controlled”) route, and an automatic (or “unconditional”) route (Proctor & Vu, 2006). Once it has been identified through perceptual analysis, the task-relevant stimulus activates the correct response via the intentional route. This route is often modelled as a short-term stimulus–response (S–R) connection; an excitatory link between a stimulus (or “sensory”) representation (or “code”) and a response (or “motor”) representation (or “code”), which is established on the basis of task instructions, and held in short-term memory for the duration of the task (Barber & O’Leary, 1997; Zorzi & Umiltà, 1995). In addition, the task-irrelevant stimulus or stimulus dimension activates a similar or “corresponding” response via the automatic route. This route is typically modeled as a long-term S–R connection; an excitatory link between a stimulus representation and a response representation, and held in long-term procedural memory. If the intentional and automatic routes activate the same response representation (compatible trials), then the automatic route facilitates response selection and the correct response is executed rapidly. However, if the two routes activate different response representations (on incompatible trials), then facilitation is absent and responding is slower. 
Moreover, if the different response representations are mutually exclusive, then activation of the correct response via the intentional route must overcome competing excitation of the incorrect response via the automatic route. This takes extra time, and therefore responding is slower still. The alternative to this dual route framework implicates a distinct system which operates according to its own distinctive principles, in automatic imitation but not in spatial compatibility.
The purpose of the present study is to examine more closely, through computational modelling, whether the findings reported by Catmur and Heyes are consistent with the dual route framework, that is, to determine whether the results can be accounted for without appeal to a system specialized for automatic imitation. If we assume that, rather than being processed by distinctive mechanisms, movement topography (automatic imitation) and relative position (spatial compatibility) merely activate different sets of input nodes in a dual route architecture, how is it possible to explain the differential magnitude and time courses of automatic imitation and spatial compatibility effects?
1. A model of automatic imitation and stimulus–response compatibility
1.1. General architecture of the model
Previous computational studies of both stimulus–response compatibility (e.g., Boyer, Scheutz, & Bertenthal, 2009; Cohen, Servan-Schreiber, & McClelland, 1992; Zhang, Zhang, & Kornblum, 1999; Zorzi & Umiltà, 1995) and automatic processing (e.g., Cohen & Huston, 1994) have demonstrated that many empirical effects can be accurately modeled within an interactive activation framework. Within this framework (see McClelland, 1993), stimulus and response codes are modeled as activity over a set of sensory and motor nodes, where each node has an activation value that varies over time (e.g., between zero and one). When a stimulus is presented, it excites corresponding sensory nodes, including those that encode both task-relevant and task-irrelevant stimulus dimensions. The sensory nodes activate motor nodes via weighted connections. Typically, activation of one motor node will increase over time. When that motor node’s activation exceeds a threshold, the corresponding response is produced.
As discussed above, in the task used by Catmur and Heyes (2011) the stimuli consisted of images showing abductions of the index and little fingers. The orientation of the stimulus image meant that the location of either of these movements could be to the left or the right. Superimposed on this image was the imperative stimulus—a disk, the color of which indicated whether the response should be an abduction of the index finger or of the little finger. In order to model this task within the interactive activation framework, we therefore assume the architecture shown in Fig. 1.
The architecture consists of two pairs of stimulus nodes corresponding to the task-irrelevant dimensions (sensory nodes for location of movement and finger identity), one pair of stimulus nodes corresponding to the task-relevant stimulus (imperative nodes), and one pair of motor nodes corresponding to the two possible responses. In addition, excitatory links between task-irrelevant stimulus nodes and compatible motor response nodes are assumed. There are four such links: between the sensory node encoding movement of the index finger and the node encoding an index finger motor response; between the sensory node encoding movement of the little finger and the node encoding a little finger motor response; between the sensory node encoding movement on the left and the node encoding an index finger motor response (since an index finger response corresponds to movement on the left of the hand being used to produce the response); and between the sensory node encoding movement on the right and the node encoding a little finger motor response. These four connections correspond to the automatic route described above. The intentional route is similarly modeled by connections between stimulus nodes and motor response nodes, though in this case the relevant stimulus nodes are those encoding the imperative stimulus dimension.
Given that spatial compatibility effects are generally stronger than imitative compatibility effects, we assume that the connections from nodes encoding the spatial properties of stimuli to compatible responses are stronger than the equivalent links between imitative aspects of stimuli and responses. We assume these are “long-term” connections, strengthened through general learning principles supported by relatively frequent exposure to events in which a stimulus and the compatible response co-occur. Such connections within the automatic route contrast with short-term connections between nodes encoding the imperative stimulus dimension and task-relevant responses. Following Zorzi and Umiltà (1995) and Barber and O’Leary (1997), we assume that these links are established by the subject in response to task instructions and practice trials, and that they are stronger than those in the automatic route.
We make two additional assumptions. First, we assume that, following presentation of a stimulus, excitation of task-irrelevant stimulus nodes is transitory, whereas excitation of task-relevant stimulus nodes is sustained throughout a trial (Hommel, 1994). Second, we assume that the time course of excitation of sensory nodes varies as a function of the complexity of processing each stimulus dimension, with identification of movement location occurring earlier than identification of the finger being moved. Thus, automatic excitation of left or right spatial nodes in response to a stimulus is assumed to occur earlier following presentation of that stimulus than automatic excitation of the appropriate stimulus finger identity node.
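The connectivity described in this section can be written down as a small weight table. The sketch below is illustrative only: the node labels are hypothetical names chosen here, and the weight values are the defaults reported in Section 1.2 (8.0 for spatial links, 4.0 for finger identity links, 10.0 for imperative links).

```python
# Hypothetical node labels; weights are the default values reported in Section 1.2.
W_SPATIAL, W_IMITATIVE, W_IMPERATIVE = 8.0, 4.0, 10.0

# Each entry maps (stimulus node, motor node) -> excitatory connection strength.
WEIGHTS = {
    # Automatic route: task-irrelevant stimulus dimensions.
    ("sees_index_move", "respond_index"): W_IMITATIVE,
    ("sees_little_move", "respond_little"): W_IMITATIVE,
    ("sees_move_left", "respond_index"): W_SPATIAL,
    ("sees_move_right", "respond_little"): W_SPATIAL,
    # Intentional route: task-relevant (imperative) stimulus dimension.
    ("imperative_index", "respond_index"): W_IMPERATIVE,
    ("imperative_little", "respond_little"): W_IMPERATIVE,
}


def inputs_to(motor_node):
    """Return {stimulus node: weight} for every link into one motor node."""
    return {src: w for (src, dst), w in WEIGHTS.items() if dst == motor_node}


print(inputs_to("respond_index"))
```

Each motor node thus receives one spatial, one imitative, and one imperative connection, so the two routes differ only in which stimulus nodes feed them and how strongly.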
1.2. Mathematical details and parameters of the model
More formally, we assume that each node i has an activation value, ai, that varies as a function of time according to the activation accumulation function of Cohen and Huston (1994), namely:
where t is the time, Ii(t) is the net input to node i at time t, ρ is a parameter that controls the degree to which current activation persists (or equivalently, the rate at which current activation decays), and σ is the logistic or sigmoid function that maps all inputs to the range zero to one.
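For concreteness, one update rule that fits this description is written out below. It is an assumption rather than a transcription: the text fixes only the roles of ρ, σ, and Ii(t), and other persistence-based forms (for example, applying σ to a running average of the net input) would fit the same wording.

```latex
a_i(t+1) = \rho \, a_i(t) + (1 - \rho)\, \sigma\bigl(I_i(t)\bigr),
\qquad
\sigma(x) = \frac{1}{1 + e^{-x}}
```

Under this form, ρ close to one means that activation changes slowly from cycle to cycle, and the activation always stays within the range of σ.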
Each node is assumed to have a “resting” or baseline activation level to which it tends in the absence of any excitatory or inhibitory input. This resting level is determined by a parameter βi, the bias of node i. Consistent with previous work (e.g., Cohen & Huston, 1994), we assume that sensory and imperative nodes have a bias of −2 and that motor nodes have a bias of −6. This ensures that all nodes have resting activations between 0.20 and 0.40 (see Fig. 1).
The net input Ii(t) to motor node i at time t is the node’s bias, plus the sum of weighted excitation or inhibition to the node from other nodes, plus normally distributed noise. That is:
$$I_i(t) = \beta_i + \sum_j w_{ji}\, a_j(t) + \mathcal{N}(0, \eta^2) \qquad (2)$$
where βi is the bias as described above, wji is the weight or associative strength of the connection from node j to node i, and η is the standard deviation of noise added on each processing cycle. For stimulus nodes, net input is given by:
$$I_i(t) = \beta_i + E_i + \mathcal{N}(0, \eta^2) \qquad (3)$$
where Ei is a fixed positive value if the stimulus dimension is exciting the node and zero otherwise.
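The two net-input definitions translate directly into code. The sketch below is a minimal illustration, not the authors' implementation: the function names are hypothetical, and the `update` step assumes a leaky move toward σ(I), which is one reading of the activation accumulation function rather than a confirmed form.

```python
import math
import random


def sigmoid(x):
    """Logistic function mapping any input to the range (0, 1)."""
    return 1.0 / (1.0 + math.exp(-x))


def motor_net_input(bias, weights, activations, eta, rng):
    """Eq. (2): bias + weighted input from sending nodes + Gaussian noise."""
    weighted = sum(w * a for w, a in zip(weights, activations))
    return bias + weighted + rng.gauss(0.0, eta)


def stimulus_net_input(bias, excitation, eta, rng):
    """Eq. (3): bias + external excitation E_i (0 if absent) + Gaussian noise."""
    return bias + excitation + rng.gauss(0.0, eta)


def update(a, net, rho):
    """ASSUMED form of Eq. (1): persistence-weighted move toward sigmoid(net)."""
    return rho * a + (1.0 - rho) * sigmoid(net)


# Let an unexcited sensory node (bias -2, eta 2.0, rho 0.96) run for a while
# and average its activation once it has settled.
rng = random.Random(1)
a = sigmoid(-2.0)
history = []
for _ in range(2000):
    a = update(a, stimulus_net_input(-2.0, 0.0, 2.0, rng), rho=0.96)
    history.append(a)
resting = sum(history[500:]) / len(history[500:])
print(round(resting, 2))
```

Under the assumed update rule, the noise inside σ lifts the average resting activation of this node to a little above 0.2, higher than σ(−2) alone would give.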
With the exception of the simulation studies below where associative strengths are explicitly manipulated, the simulations reported here assume wji is set to 8.0 for links from spatial stimulus nodes to compatible response nodes, 4.0 for links from finger identity stimulus nodes to compatible response nodes, and 10.0 for links from imperative nodes to corresponding response nodes. η, the standard deviation of noise, is assumed to be 2.0 for all nodes. Table 1 details all model parameters.
Table 1. Parameters that govern model behavior and their default values in the simulations reported here
Input excitation to stimulus nodes when the relevant stimulus dimension is present (see Eq. 3): Ei = 5 (all input nodes)
Bias added to weighted input of each node (see Eqs. 2 and 3): βi = −2 (sensory and imperative nodes), βi = −6 (motor nodes)
Processing on any trial proceeds as follows. Activations of all nodes are initialized to values based on their bias. Thus,
$$a_i(0) = \sigma(\beta_i)$$
The activations of all nodes are then updated according to Eqs. (1)–(3) for 500 cycles so that all activations approach their resting level. The stimulus (e.g., an index finger abduction on the right together with a red imperative stimulus indicating a left/index finger response) is then presented and the corresponding task-irrelevant and task-relevant stimulus nodes receive strong excitation. Two parameters control the time-course of excitation of task-irrelevant nodes relative to excitation of the task-relevant node: δs is the number of processing cycles between presentation of excitatory input to the task-relevant (i.e., imperative) node and presentation to the appropriate task-irrelevant spatial node, while δi is the number of cycles between presentation of excitatory input to the task-relevant (i.e., imperative) node and presentation to the appropriate task-irrelevant finger identity (i.e., imitative) node.
The assumption concerning the transitory nature of excitation of task-irrelevant stimulus nodes is implemented by assuming that, when a task-irrelevant input dimension is present, the corresponding node receives strong input (Ei = 5.0 in Eq. (3)) until the node’s activation level reaches a threshold, τs (of 0.80 in the simulations reported here). Excitation of the node then ceases and the node’s activation decays back to its resting level following Eq. (1) with Ii(t) equal to the node’s bias value plus noise. In contrast, on presentation of an imperative stimulus the corresponding imperative node receives sustained excitation (again Ei = 5.0 in Eq. (3)). This level of excitation continues until a response is produced. We assume that the parameters which govern the transitory nature of task-irrelevant input are independent of input dimension (spatial or imitative), reflecting the hypothesis of Catmur and Heyes (2011) that the mechanisms underlying spatial and imitative compatibility effects do not differ in kind.
Following presentation of a stimulus, processing continues with the activations of all nodes updated on each time step until a motor response node’s activation exceeds a threshold τr of 0.80 or the trial times out (i.e., no response is produced within 2,000 cycles). RT is assumed to be a linear function of the number of cycles between stimulus presentation and response.
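The trial procedure above can be sketched end to end in a few dozen lines. This is a minimal sketch under stated assumptions, not the authors' code: the update rule (a leaky move toward σ(I)) is assumed, nodes are updated synchronously, no connections other than the six excitatory links are included, and all names are hypothetical. Parameter values are the defaults given in the text.

```python
import math
import random

RHO_IRRELEVANT, RHO_OTHER = 0.960, 0.990      # persistence (rho)
BIAS_STIM, BIAS_MOTOR = -2.0, -6.0            # biases (beta)
E_INPUT, TAU_S, TAU_R = 5.0, 0.80, 0.80       # excitation and thresholds
WEIGHT = {"spatial": 8.0, "imitative": 4.0, "imperative": 10.0}
SETTLE, TIMEOUT = 500, 2000                   # cycles


def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))


def run_trial(spatial_compatible, imitative_compatible,
              delta_s=-10, delta_i=80, eta=2.0, rng=None):
    """Simulate one trial; return RT in cycles, or None on timeout.

    Motor node 0 is the correct response, node 1 the alternative. The RT is
    that of whichever motor node crosses threshold first (errors not flagged).
    """
    rng = rng or random.Random()
    onset = {"spatial": SETTLE + delta_s, "imitative": SETTLE + delta_i,
             "imperative": SETTLE}
    # Motor node favoured by the stimulus value presented on each dimension.
    presented = {"spatial": 0 if spatial_compatible else 1,
                 "imitative": 0 if imitative_compatible else 1,
                 "imperative": 0}
    # One stimulus node per (dimension, favoured motor node); start at sigmoid(bias).
    stim = {(dim, m): sigmoid(BIAS_STIM) for dim in WEIGHT for m in (0, 1)}
    motor = [sigmoid(BIAS_MOTOR)] * 2
    finished = set()  # irrelevant nodes whose transient excitation has ended

    for t in range(SETTLE + TIMEOUT):
        # Motor nodes, Eq. (2): bias + weighted stimulus activations + noise.
        new_motor = []
        for m in (0, 1):
            net = (BIAS_MOTOR + sum(WEIGHT[d] * stim[(d, m)] for d in WEIGHT)
                   + rng.gauss(0.0, eta))
            new_motor.append(RHO_OTHER * motor[m] + (1 - RHO_OTHER) * sigmoid(net))
        # Stimulus nodes, Eq. (3): bias + E_i while excited + noise.
        for (dim, m) in stim:
            excited = (m == presented[dim] and t >= onset[dim]
                       and (dim, m) not in finished)
            net = BIAS_STIM + (E_INPUT if excited else 0.0) + rng.gauss(0.0, eta)
            rho = RHO_OTHER if dim == "imperative" else RHO_IRRELEVANT
            stim[(dim, m)] = rho * stim[(dim, m)] + (1 - rho) * sigmoid(net)
            # Irrelevant input is transient: stop exciting once tau_s is reached.
            if excited and dim != "imperative" and stim[(dim, m)] >= TAU_S:
                finished.add((dim, m))
        motor = new_motor
        if t >= SETTLE and max(motor) > TAU_R:
            return t - SETTLE
    return None


if __name__ == "__main__":
    rng = random.Random(0)
    for sc in (True, False):
        for ic in (True, False):
            rts = [run_trial(sc, ic, rng=rng) for _ in range(10)]
            rts = [rt for rt in rts if rt is not None]
            mean = sum(rts) / len(rts) if rts else float("nan")
            print(f"spatial_compatible={sc} imitative_compatible={ic} mean RT={mean:.1f}")
```

Setting `eta=0.0` gives a deterministic run that is convenient for checking the direction of the compatibility effects. The RTs it produces are specific to this sketch and should not be read as the values reported in the paper.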
The value of one parameter—persistence—remains to be discussed. The effect of manipulating this parameter is to change the rate of increase or decrease of activation per processing cycle of a node given a constant net input. In the simulations reported here the persistence parameters of task-irrelevant sensory nodes and other nodes were set independently with two goals in mind, namely to produce behavior where cycle time may be taken to be approximately 1 ms, and to produce SRC effects qualitatively similar to those of human participants. Specifically, ρ for task-irrelevant sensory nodes was set to 0.960 while ρ for imperative and motor nodes was set to 0.990. As discussed below (see Fig. 3), these parameter values produce a spatial compatibility effect of approximately 43 cycles (comparable to the 41 ms effect noted by Catmur & Heyes, 2011; Experiment 1) and an imitative compatibility effect of approximately 18 cycles (comparable to the 19 ms effect noted by Catmur & Heyes, 2011, Experiment 1).
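A short worked calculation shows what the two persistence values imply for node dynamics. It assumes an update of the form a(t+1) = ρ·a(t) + (1 − ρ)·σ(I), which is one reading of the accumulation function rather than a confirmed form, and it considers constant, noise-free input.

```latex
\begin{align*}
a_i(t) - a_i^{*} &= \rho^{\,t}\,\bigl(a_i(0) - a_i^{*}\bigr), \qquad a_i^{*} = \sigma(I_i) \\
t_{1/2} &= \frac{\ln(1/2)}{\ln \rho} \\
\rho = 0.960 &: \quad t_{1/2} \approx 17 \text{ cycles} \\
\rho = 0.990 &: \quad t_{1/2} \approx 69 \text{ cycles}
\end{align*}
```

On this reading, task-irrelevant sensory nodes close half of the gap to their asymptote in about 17 cycles, roughly four times faster than imperative and motor nodes, which fits the brief activation spikes described for the irrelevant dimensions.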
2. Simulation results
2.1. Operation of the model on a single trial
Fig. 2 shows the activation profiles over time of nodes during a typical spatially incompatible, imitatively compatible trial. In this example, excitation of the imperative node (dark line, upper panel) begins at cycle 500. Activation then gradually increases until the trial terminates when a response is produced on cycle 683. Excitation of the irrelevant spatial sensory node (light line, middle panel) begins 10 cycles before that of the imperative node (δs was −10). It rises to threshold within about 20 cycles and then falls back to rest. Also shown in the middle panel is excitation of the irrelevant imitative sensory node (dark line), which begins 80 cycles after presentation of the imperative node (δi was +80). As shown in the lower panel, activation of the two motor nodes settles to a base-line level of approximately 0.35 after 200 cycles. Activation of both motor response nodes stays around this level until the increasing activation of the spatial sensory node begins to percolate to the motor response nodes near cycle 500. Activation of the spatially compatible, but incorrect, motor response node (light line, lower panel) briefly rises, reflecting activity of the irrelevant spatial sensory node, before falling back and being over-taken by activation of the correct motor response node (dark line, lower panel), whose activation is boosted by excitation from the imperative node as well as excitation from the imitatively compatible sensory node.
2.2. Simulation of basic compatibility effects
The most basic effects of interest, shown initially by Bertenthal et al. (2006) and replicated by Catmur and Heyes (2011), are the separable imitative compatibility and spatial compatibility effects on response time. As shown in Fig. 3, the model captures these basic effects, with time to respond to the imperative stimulus (measured in cycles) being longer in spatially incompatible trials than spatially compatible trials, and longer in imitatively incompatible trials than imitatively compatible trials. Moreover, the compatibility effect is approximately twice as large in the spatial modality compared to the imitative modality, as found by Catmur and Heyes (2011), Experiment 1.
Catmur and Heyes (2011) found not only main effects of spatial and imitative compatibility but also no interaction between these factors. This lack of interaction is replicated by the model with the parameter values given above. However, given the underlying mathematical formulation, it is not implied by the model. Indeed, if the activation spikes of task-irrelevant spatial and imitative sensory nodes were to overlap (cf. Fig. 2, middle panel), then an interaction between spatial compatibility and imitative compatibility would be produced. We return to this point in the General Discussion, as some authors (e.g., Press, Gherri, Heyes, & Eimer, 2010) have reported such interactions.
A second critical finding of Catmur and Heyes (2011) was revealed by their “quintile analysis.” This analysis was conducted in order to investigate the time course of the compatibility effects, following Ratcliff (1979). The basic rationale is that, given natural variability in the speed of production of responses across trials, if either compatibility effect arises relatively late in processing, then it should be less evident in those responses that are produced quickly compared to those that are produced more slowly. Thus, for each trial type and each participant, RTs were ordered by speed and divided equally into five “bins” (1 = fastest to 5 = slowest). Compatibility effects were then calculated for each bin. Results, shown in Fig. 4 (left panel), were subjected to a two-way within-subjects ANOVA, which revealed a main effect of compatibility modality (with the spatial compatibility effect being greater than the imitative compatibility effect) and an interaction of response speed and modality. Analysis of simple effects showed that while the spatial compatibility effect appeared to decrease with increasing RT, this decrease was not significant. In contrast, the imitative compatibility effect increased significantly with increasing RT. Following the logic of the analysis, this suggests that imitative compatibility effects arise relatively late in processing compared with spatial compatibility effects.
The quintile analysis was performed with the results of the model (see Fig. 4, right panel). Similar effects were obtained, with the imitative compatibility effect increasing with increasing RT. The fit to participant performance was very good, with a root mean square error of 3.16 ms over the 10 data points (R² = 0.95).
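The binning and fit computation can be sketched as follows. This is a simplified illustration with hypothetical function names: it pools RTs within a condition rather than binning per participant, and the RT values in the usage lines are made-up toy numbers, not data from the study or the model.

```python
import math


def quintile_means(rts, n_bins=5):
    """Sort RTs and return the mean RT of each of n_bins equal-sized bins.

    Bin 1 holds the fastest responses. If len(rts) is not divisible by
    n_bins, the slowest leftover trials are dropped.
    """
    ordered = sorted(rts)
    size = len(ordered) // n_bins
    return [sum(ordered[b * size:(b + 1) * size]) / size for b in range(n_bins)]


def compatibility_effect_by_bin(compatible_rts, incompatible_rts, n_bins=5):
    """Incompatible minus compatible mean RT, computed bin by bin."""
    comp = quintile_means(compatible_rts, n_bins)
    incomp = quintile_means(incompatible_rts, n_bins)
    return [i - c for c, i in zip(comp, incomp)]


def rmse(model, data):
    """Root mean square error between two equal-length sequences."""
    return math.sqrt(sum((m - d) ** 2 for m, d in zip(model, data)) / len(model))


# Toy RTs (ms), invented for illustration only.
compatible = [310, 325, 340, 352, 365, 380, 395, 410, 430, 455]
incompatible = [318, 335, 352, 368, 385, 402, 420, 440, 465, 495]
print(compatibility_effect_by_bin(compatible, incompatible))
```

An effect that grows from the first to the fifth bin, as in the toy numbers, is the pattern the analysis treats as evidence that an effect arises late in processing.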
One criticism that might be leveled at this simulation is that the model has many parameters. It may be possible to simulate a range of behaviors with the model by judicious choice of parameter values, in which case the model would be of little theoretical interest (cf. Roberts & Pashler, 2000). In order to explore this possibility, two parameter variation studies were performed.
Parameter variation study 1a examined model performance across a wide range of values of the two delay parameters (δs and δi). All parameters were set to their default values (as specified in Table 1) with the exception of the delay parameters. δs (default value of −10 cycles) was set to values ranging from −100 cycles to +100 cycles at intervals of 10 cycles, while δi (default value of +80 cycles) was set to values ranging from 0 cycles to +200 cycles at intervals of 10 cycles, giving 441 points in parameter space. The model was then run 100 times for each parameter setting, using the Express software (Yule & Cooper, 2003) to manage the exploration of the parameter space and to collate results. On each model run the quintile analysis was performed on the resulting RTs and goodness-of-fit measures (root mean square error and R²) were calculated between the model’s behavior and that of the participants of Catmur and Heyes (2011). Mean values of the goodness-of-fit measures are shown in Fig. 5 (upper panel). As is clear from the figure, the fit of the model to subject performance is good across a large region of the space, with root mean square error of less than 6 ms for spatial delays of −100 to +10 cycles and imitative delays of +20 to +110 cycles and R² of greater than 0.9 for spatial delays of −100 to +10 cycles and imitative delays of +70 to +160 cycles.
Parameter variation study 1b examined model performance when the strength, w, of spatial-to-motor and imitative-to-motor associations was varied. Following the same procedure as used in parameter variation study 1a, the strength of spatial-to-motor associations (default value of 8.0) was varied from 6.0 to 10.0 at intervals of 0.2 while the strength of imitative-to-motor associations (default value 4.0) was varied from 2.0 to 6.0 at intervals of 0.2, giving another 441 points in parameter space. The model was again run 100 times for each parameter setting and the quintile analysis was again performed on the resulting RTs. Goodness-of-fit measures (root mean square error and R²) were again calculated between the model’s behavior and that of the participants of Catmur and Heyes (2011). Mean values of these measures are shown in Fig. 5 (lower panel). Once again, as is clear from the figure, the fit of the model to subject performance is good across a large region of the space, with root mean square error of less than 6 ms for spatial-to-motor associations of approximate strength 6.4–9.4 and imitative-to-motor associations of approximate strength 2.4–5.2 and R² of greater than 0.9 for spatial-to-motor associations of strength 6.0–8.4 and imitative-to-motor associations of strength 2.8–6.0.
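The two parameter grids are easy to enumerate, which also confirms the count of 441 points in each study. The helper below is a hypothetical illustration and is unrelated to the Express software used in the original work.

```python
from itertools import product


def grid(start, stop, step):
    """Evenly spaced values from start to stop inclusive, robust to float error."""
    n = round((stop - start) / step) + 1
    return [round(start + k * step, 10) for k in range(n)]


# Study 1a: spatial delay x imitative delay (cycles).
delay_grid = list(product(grid(-100, 100, 10), grid(0, 200, 10)))
# Study 1b: spatial-to-motor weight x imitative-to-motor weight.
weight_grid = list(product(grid(6.0, 10.0, 0.2), grid(2.0, 6.0, 0.2)))

print(len(delay_grid), len(weight_grid))  # 441 441
```

Each grid is 21 × 21, and the default settings, (−10, +80) for the delays and (8.0, 4.0) for the weights, are themselves grid points.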
These parameter variation studies thus demonstrate that the model produces a good fit to the behavioral data over a wide range of parameter values. Critically, the fits in the upper panel of Fig. 5, where delay is varied, depend upon spatial-to-motor associations being substantially greater than imitative-to-motor associations (a basic assumption of the model that is justified above in the General Architecture of the Model), and the fits in the lower panel of Fig. 5, where association strength is varied, depend upon the spatial delay being much less than the imitative delay (again a basic assumption that is justified in the General Architecture of the Model).
In a second investigation of the time course of compatibility effects, Catmur and Heyes (2011) varied the time of onset of the imperative stimulus and the irrelevant movement stimulus. Eight participants were tested with five levels of offset: with the imperative stimulus presented 160 ms before, 80 ms before, coincident with, 80 ms after, and 160 ms after the movement. Results, shown in Fig. 6 (left panel), indicate a significant spatial compatibility effect at all offsets, but the imitative compatibility effect was only found to be significant when the offset was non-negative.
The procedure of Catmur and Heyes (2011), Experiment 2, was replicated with the model, with stimulus offsets calculated assuming one msec per processing cycle. Recall that in the previous simulations the imperative stimulus was presented at cycle 500 (i.e., at t = 500). To simulate a trial in which the imperative stimulus occurred 160 ms before the onset of the irrelevant movement, excitation of the relevant imperative node was applied 160 cycles before this time (i.e., at t = 340). Excitation of the location node relevant to the trial type then occurred as usual at t = 490 (given δs = −10), while excitation of the finger identity node relevant to the trial type occurred at t = 580 (given δi = +80). As in the behavioral experiment, 28 trials of each of the four types and at each level of offset were performed. The mean compatibility effect is shown in Fig. 6 (right panel). As in the participant data of Catmur and Heyes (2011), the spatial compatibility effect is high at all levels of offset, peaking roughly when the imperative stimulus is presented at the same time as the irrelevant movement. In contrast there is no imitative compatibility effect when the imperative stimulus is presented substantially before the irrelevant movement, and the effect peaks when the imperative stimulus is presented shortly after the irrelevant movement.
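The onset arithmetic described above can be written out as a small sketch, assuming one cycle per millisecond as in the text. The function onset_schedule and the constant names are hypothetical labels introduced only for illustration.

```python
MOVEMENT_ONSET = 500  # cycle at which the irrelevant movement begins
DELTA_S = -10         # default spatial delay (cycles)
DELTA_I = 80          # default imitative delay (cycles)


def onset_schedule(offset_ms, delta_s=DELTA_S, delta_i=DELTA_I):
    """Return the onset cycle of each input for one trial.

    offset_ms is the onset of the imperative stimulus relative to the
    movement: negative values mean the imperative stimulus comes first.
    """
    return {
        "imperative": MOVEMENT_ONSET + offset_ms,
        "location": MOVEMENT_ONSET + delta_s,
        "finger": MOVEMENT_ONSET + delta_i,
    }
```

Calling onset_schedule(-160) gives the worked case in the text: imperative at cycle 340, location at 490, and finger identity at 580.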
As with the previous simulation, two parameter variation studies were conducted to test the sensitivity of the model’s behavior to the specific parameter settings chosen. Parameter study 2a followed the procedure of parameter study 1a. Delay parameters were varied over the same range as in parameter study 1a. The model was run on all conditions of Experiment 2 for each point in parameter space 100 times and the mean goodness-of-fit calculated for each point. Results are summarized in Fig. 7 (upper panel). While the fit on both measures is not as good as in parameter study 1a, there are still large areas of the parameter space where root mean square error is less than 10 ms (roughly when spatial delay is between −100 cycles and +10 cycles and imitative delay is between +70 and +140 cycles) and R² is greater than 0.7 (roughly when spatial delay is between −100 cycles and 0 cycles and imitative delay is between +100 and +130 cycles).
A final parameter variation study (study 2b) examined model performance at different stimulus onsets when the strength, w, of spatial-to-motor and imitative-to-motor associations was varied. As in parameter variation study 1b, the strength of spatial-to-motor associations (default value of 8.0) was varied from 6.0 to 10.0 at intervals of 0.2 while the strength of imitative-to-motor associations (default value 4.0) was varied from 2.0 to 6.0 at intervals of 0.2. As in parameter variation study 2a, 100 simulations were performed at each point in parameter space and mean goodness-of-fit statistics were calculated with respect to the behavioral data of Catmur and Heyes (2011) Experiment 2. Results are summarized in Fig. 7 (lower panel). Again, using either measure of goodness-of-fit, it is clear that the behavior of the model matches well that of the human participants over a substantial portion of the parameter space.
3. General discussion
We have presented an interactive activation model of spatial and imitative compatibility that simulates all key results of the study of Catmur and Heyes (2011). The model, which is in many ways a standard application of the interactive activation approach, captures the relative size of the imitative and spatial compatibility effects, the lack of interaction when the imperative and irrelevant stimuli are presented simultaneously, the relative speed of responses in the quintile analysis when the imperative and irrelevant stimuli are presented simultaneously, and the different time courses of the compatibility effects when the imperative and irrelevant stimuli are presented asynchronously. Therefore, the model substantiates the hypothesis that spatial and imitative compatibility effects depend on processes of the same kind; it demonstrates that, contra Bertenthal et al. (2006), it is not necessary to postulate separate mechanisms underlying the effects of spatial and imitative compatibility on behavior. The observed behavioral differences between the two may be accounted for by an automatic route (albeit with different strength and delay parameters for different forms of information) within a more general dual-route framework.
3.1. Relation to previous related models
Interactive activation models of this sort were first employed to provide a process account of a range of effects within the Stroop paradigm. Thus, Cohen and Huston (1994; see also Cohen, Dunbar, & McClelland, 1990) developed an interactive activation model in which two banks of input nodes corresponded to two stimulus dimensions (word identity and stimulus color) and one bank of output nodes corresponded to verbal responses. A final bank of nodes represented the current task: either to read the stimulus word or to name the color in which it was printed. With appropriate weighting of the connections between nodes, the Cohen and Huston model is able to reproduce the observed facilitatory and inhibitory effects when stimuli are compatible or incompatible with the required response, both in the easier/automatic task (word reading) and the more difficult/intentional task (color naming). The same principles have been used to model spatial SRC effects (Zorzi & Umiltà, 1995), flanker effects³ (Cohen et al., 1992), and the so-called SNARC effect⁴ (Stoianov, Umiltà, & Zorzi, 2005), and Zhang et al. (1999) generalize the basic architecture to develop models of the complete space of stimulus–stimulus and stimulus–response compatibility tasks. Zhang et al.’s principal innovation in generalizing the architecture was the addition of a layer of nodes between the sensory and motor nodes to represent abstract concepts (like colors, which have verbal and visual input representations) that, they hold, mediate stimulus–response associations.
The basic architecture of the model is therefore far from novel. However, none of the above models deals specifically with imitative compatibility. Two other sets of models of particular interest have been developed in this domain. One of these, that of Boyer et al. (2009), does indeed use standard interactive activation principles and addresses both spatial and imitative compatibility, though in a slightly different task. It is the most direct competitor to the model presented here. The Boyer et al. task differs from that of Catmur and Heyes (2011) in two ways. First, and less critically, the imitative stimulus is not a finger abduction but a finger tap (of the index or middle finger). Second, the imperative stimulus is not a separate stimulus (such as the colored dot of Catmur & Heyes, 2011) but the onset of movement of the finger.
To understand the Boyer et al. (2009) model it is necessary to understand their task. They tested four conditions in a between-subjects design. In the “direct spatial” condition, participants were required to respond with a spatially congruent finger tap. In the “direct imitative” condition, participants were required to respond with an imitatively congruent finger tap. In the “reverse spatial” condition, participants were required to respond with a spatially incongruent finger tap. In the “reverse imitative” condition, participants were required to respond with an imitatively incongruent finger tap. To illustrate, if in the reverse imitative condition the stimulus is a middle finger tap, the participant should respond with an index finger tap. On half of the trials—those where the stimulus was a right hand—this response was spatially compatible with the stimulus, but on the other half—where the stimulus was a left hand—this response was spatially incompatible. In one experiment, stimulus hand was blocked (i.e., the stimulus hand was fixed within block but alternated between blocks) while in a second experiment the stimulus hand was randomized within blocks.
The direct conditions revealed similar effects to those in other empirical studies of spatial and imitative compatibility, with compatible responses being faster than incompatible responses, but these effects did not hold in the reverse conditions. Here, at least when trial types were randomized rather than blocked, there was no significant compatibility effect in the imitative dimension while the spatial compatibility effect was reversed, with spatially incompatible responses being generated significantly more quickly than spatially compatible responses in the relevant condition.
Boyer et al. (2009) present separate but related models of the direct and reverse conditions. In the direct model, basic stimulus-response associations (as in the automatic route of the model presented here) are supplemented with nodes that effectively amplify the task-relevant stimulus–response mappings. The reverse model supplements the direct model with “reversal” nodes, which inhibit the direct mapping nodes and excite appropriate incompatible response nodes. The authors fit the models to the empirical data by adjusting associative weight parameters appropriately for each experimental condition. While the use of different associative weights for the different experimental conditions may appear unsatisfactory, it is justified by the experimental paradigm, where there is no explicit imperative stimulus and where the four experimental conditions differ only in the instructions given. Nevertheless it is unclear whether empirical differences in RT to blocked versus randomized stimuli, which in our view derive from different strategic approaches to the task (e.g., blurring of the eyes for blocked stimuli), should be accounted for by differences in associative weights.
There are, however, more substantial concerns with the Boyer et al. (2009) models. Empirically, while the models can capture the basic spatial and imitative compatibility effects, they fail to capture the reverse spatial compatibility effect. This is despite the degrees of freedom available in setting associative weights specifically for that condition. Theoretically, the conclusions of the work are also subject to debate. While the strengths of associations may vary between spatial and imitative routes, in our view the Boyer et al. work suggests (like our own) that spatial and imitative compatibility are mediated by processes that do not differ in kind. That is, while the models implement an ideomotor principle where there is “direct matching of observed and executed action” (Boyer et al., 2009, p. 2284), they do not imply that the process implementing this principle is any more specialized than the process implementing spatial compatibility.
A second set of models which addresses the relationship between spatial and imitative compatibility is described by Sauser and Billard (2006). These authors develop both “single-route” and “direct-matching” model variants of an SRC task first introduced by Brass et al. (2000). The critical feature of these models is that they attempt to be faithful to what is known of the neural substrate of perceptual and motor processing, both by their approach to neural coding and their decomposition of processing into a series of stages. Thus, the models adopt a dynamic neural field approach to representation (e.g., Amari, 1977), where the activity of units within a topographically organized network is described by a differential equation and where activity in different regions of the network represents different values of the dimension for which that network is specialized. With regard to stage-wise processing and connectivity, both models consist of sub-networks processing: (a) spatial and motion cues (held to be localized in medial superior temporal cortex); (b) movement observation (superior temporal sulcus); (c) ideomotor integration (ventral premotor cortex); (d) motion plans (dorsal premotor cortex); (e) cue integration (posterior parietal cortex); and (f) motor selection (ventral premotor cortex). The models differ in the connectivity of these sub-networks. Thus, in the single-route model ideomotor integration feeds to cue integration (which then outputs to motor selection), while in the direct-matching model ideomotor integration feeds directly to motor selection.
One key finding of Sauser and Billard (2006) is that with appropriate parameter settings both models can provide a good fit to the empirical data of Brass et al. (2000). Thus, at least with respect to the study of Brass and colleagues, Sauser and Billard’s model comparison does not provide strong evidence for or against either the single-route or the direct-matching approach.
The Sauser and Billard (2006) models have not been applied to asynchronous presentation of sensory and imperative stimuli as in Catmur and Heyes (2011), Experiment 2, but they would seem to be compatible with the paradigm.⁵ Curiously, despite the neurophysiological plausibility of the approach, Sauser and Billard (2006) caution that the precise time course of the neural processes mediating spatial and imitative compatibility is beyond the scope of their models. Nevertheless, different parameter values (which determine the amplitude of inputs to, and weights within, a module) are used to model the dynamics of the different input dimensions, and so the different dimensions will show different time courses. Whether the models would make differential predictions under asynchronous presentation, and hence whether the paradigm discriminates between the models (and hence the theoretical approaches implemented within them), is an open question. It is our suspicion that they will not. Indeed, in terms of connectivity, both models can at an abstract level be mapped onto the model of Zhang et al. (1999), with Sauser and Billard’s cue integration module corresponding to Zhang et al.’s intermediate units. The issue then becomes one of whether imitative inputs should feed directly to motor response units or via such intermediate units. The work reported here suggests that intermediate units are not required. This again reinforces our contention that the empirical effects of Catmur and Heyes are consistent with a model in which ideomotor associations are no different in kind from spatial stimulus-response associations.
Our view therefore is that the Sauser and Billard (2006) models demonstrate the neural plausibility of an approach to stimulus–response compatibility tasks based on successive layers of nodes operating according to principles of interactive activation, but they do not obviously add anything at the cognitive level, that is, at the level at which our model is pitched.
Thus, to summarize, the novel aspect of the model presented here is the use of the interactive activation framework to capture simultaneously the differing time courses of both spatial and imitative compatibility effects within a single simple model and when the sensory and imperative stimuli are presented asynchronously. The success of the model implies that these effects need not be mediated by qualitatively different mechanisms.
3.2. Specific aspects of the model
Several aspects of the model require further comment. One is the use of transient excitation of irrelevant sensory inputs. If sensory nodes were to remain active for as long as matching stimuli were present in the perceptual input, then not only would the model be unable to capture any of the empirical effects related to time course of SRC effects, but it would also be liable to produce responses in the absence of an imperative stimulus. That is, the mere presence of a sensory stimulus would elicit the compatible response. This kind of imitative behavior has been reported following neurological damage (Lhermitte, Pillon, & Serdaru, 1986). It may also occur (apparently without awareness) in some social situations (Chartrand & Bargh, 1999). However, in neurologically healthy individuals, such imitative behavior is both limited and subject to intentional control. The transient stimulation of task-irrelevant sensory nodes may be considered as a form of habituation, or, following Davelaar and Huber (2006), a form of “perceptual discounting.”
The mechanism by which transient excitation is implemented in the current model—allowing the excitation of sensory nodes to decay once they reach a threshold—is just one of many ways that transience of sensory input might be implemented. It follows a similar approach used in an unpublished interactive activation model of sequential effects in the flanker task by Davelaar and Huber (2006), but an alternative might be to allow sensory excitation to continue throughout the trial but to counteract this with explicit (controlled) inhibition of sensory nodes.
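The threshold-gated mechanism can be illustrated with a minimal sketch. The update rule below is a generic leaky integrator chosen for illustration and is not Eqs. (1)–(3) of the model; the function name transient_trace and all parameter values (threshold, persistence, drive) are hypothetical.

```python
def transient_trace(n_cycles, onset, threshold=0.8, persistence=0.9, drive=1.0):
    """Activation of one sensory node under threshold-gated external input.

    The node is driven from `onset` until its activation first reaches
    `threshold`; after that the drive is removed and activation decays.
    """
    activation = 0.0
    driven = True
    trace = []
    for t in range(n_cycles):
        external = drive if (driven and t >= onset) else 0.0
        activation = persistence * activation + (1.0 - persistence) * external
        if activation >= threshold:
            driven = False  # input stays off for the rest of the trial
        trace.append(activation)
    return trace
```

Under these assumed values the node produces a single brief burst: activation climbs to just above the threshold and then decays toward zero even though the stimulus nominally remains present.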
A further consequence of the transient nature of task-irrelevant sensory excitation is that in the model reported here, in comparison with some of the models described above, associative links between sensory nodes and motor nodes are almost as strong (at 4.0 for imitative and 8.0 for spatial associations) as those between imperative nodes and motor nodes (at 10.0). By contrast, in the Boyer et al. (2009) model the irrelevant sensory-to-motor associations are of the order of 0.001 while the equivalents of the imperative-to-motor associations are greater than 0.1. Given the basic compatibility effects, there are two clear options for the influence of task-irrelevant stimulation over the time-course of a typical trial: either continuous low-level sensory stimulation for the duration of the stimulus (as in the Boyer et al. model), or a short, sharp peak in stimulation. The model presented here adopts the latter, but this is not an arbitrary decision. As discussed below, the former would, given Eqs. (1)–(3), result in an interaction between spatial and imitative compatibility effects, an interaction which is not present in the empirical data.
A second unusual aspect of the model is that, contrary to many of the interactive activation models referred to above, we have not assumed lateral inhibition between nodes within each “layer” (i.e., within each ellipse shown in Fig. 1). Lateral inhibition is an inhibitory influence that affects nodes in proportion to the activation of other nodes within the layer. It implements a form of within-layer competition. Thus, if one node in a layer becomes more active than other nodes in the layer, it will tend to have a large suppressing effect on the activation of other nodes, while it will suffer a relatively small suppressing effect on its own activation from them. Over time, the “winning” node will increase in activation while the “losing” nodes will decrease in activation. We have not included lateral inhibition in the current simulations for two reasons. First, it was not necessary to account for the effects of interest. Second, the stimuli and responses used in the Catmur and Heyes (2011) task are not antagonistic and hence do not compete. For example, in principle a task-irrelevant movement could occur on both the left and right of the stimulus at the same time. Similarly, participants could in principle produce both index-finger and little-finger responses simultaneously. This contrasts with some experimental designs where stimuli or responses are antagonistic (e.g., Stroop stimuli, or responses that involve opening versus closing the hand, e.g., Stuermer et al., 2000).
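For concreteness, the definition of lateral inhibition given above can be sketched as follows. This mechanism is not part of the reported simulations; the function name and the single `strength` parameter are assumptions made for illustration.

```python
def lateral_inhibition(activations, strength):
    """Inhibitory input to each node in a layer.

    Each node is inhibited in proportion to the summed activation of
    the other nodes in the same layer.
    """
    total = sum(activations)
    return [-strength * (total - a) for a in activations]
```

With activations of 0.8 and 0.2 and unit strength, the more active node receives inhibition of about −0.2 while the less active node receives about −0.8, which is the asymmetry that lets a "winning" node pull ahead over time.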
More generally, we are agnostic about the inclusion of lateral inhibition. Including it has the effect of prolonging excitatory “bursts” such as those of the sensory nodes of Fig. 2. A key requirement for the absence of an interaction between imitative and spatial compatibility in the current model (as found in the study of Catmur & Heyes, 2011; but in contrast to some other studies, e.g., Press et al., 2010) is that such bursts do not overlap in time. If they do overlap, then it can be shown by manipulation of Eqs. (1)–(3) that the effect of spatial and imitative compatibility will be sub-additive (i.e., their combined effect will be less than the sum of their individual effects). However, this does not mean that the current data rule out lateral inhibition as a mechanism within the current model. The prolonging of excitatory bursts of sensory nodes resulting from lateral inhibition may be mitigated by decreasing further the persistence of sensory nodes, and additional simulations (not reported here) show that, for the quintile analysis at least, a model with lateral inhibition fits the data just as well as the model described here.⁶
Stimulus–response compatibility experiments typically yield low but non-zero error rates, with participants occasionally erring even when the stimulus and response are compatible. As in many studies, such errors were also observed by Catmur and Heyes (2011) and paralleled the response time effects (i.e., more errors with slower responses). Following most of the modeling work cited above, we have not attempted to simulate errors. Indeed, with the parameter settings used in the work reported here errors were exceedingly rare. The interactive activation framework is, however, capable of producing them. Increasing noise (η), particularly in the response nodes, for example, increases the probability of error, as does decreasing the response threshold (τ). Alternatively, errors might arise from a more sophisticated approach to response selection. For example, the response selected may be based not just on the activity of the winning response node but also on the difference between the activity of this node and competing response nodes, or participants may dynamically adjust their response threshold on a trial by trial basis, decreasing it when performance is good (and thereby decreasing RT) until an error is produced, and then increasing it (cf. Botvinick, Braver, Barch, Carter, & Cohen, 2001).
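The trial-by-trial threshold adjustment suggested above might look like the following sketch. It was not implemented in the reported simulations; the function name, step sizes, and bounds are hypothetical values chosen only to show the shape of the rule.

```python
def adjust_threshold(threshold, was_correct, step_down=0.01, step_up=0.05,
                     floor=0.1, ceiling=1.0):
    """Update the response threshold after one trial.

    A correct response lowers the threshold slightly (speeding later
    responses); an error raises it by a larger amount.
    """
    if was_correct:
        return max(floor, threshold - step_down)
    return min(ceiling, threshold + step_up)
```

Applied over a run of trials, a rule of this kind drifts the threshold downward until noise produces an error and then backs off, which would yield a low but non-zero error rate.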
We have also not attempted to simulate individual differences in compatibility effects, or indeed groups of multiple subjects. Thus, the simulations reported above have focused on capturing group means. Within the interactive activation framework individual differences might be captured by differences in key parameters. One particularly important parameter is likely to be the strength of the intentional route, which intuitively corresponds to motivation. Increasing the strength of this route decreases RT and both compatibility effects, while decreasing the strength has the opposite effect.
One potential criticism of the model is that its behavior is governed by a great many parameters (input strengths, connection strengths, delay parameters, persistence, and biases on different types of node). One might even argue that the model, with appropriate parameter settings, could produce any of a range of behaviors. Certainly different behaviors do result from different parameter settings—as illustrated by the above comments on modeling possible interactions between spatial and imitative compatibility. However, it is not the case that the model could be crafted to produce any pattern of effects. Critically, a single set of parameter values was used to simulate quantitatively all RT results of both of the experiments of Catmur and Heyes (2011). Moreover, some parameters are constrained by meta-theoretical considerations (e.g., the persistence parameters, which, as discussed, are set to ensure a processing cycle corresponding approximately to 1 msec). Others are constrained by theoretical commitments (e.g., that the delay on finger identity nodes should be greater than the delay on location nodes, given that the task of finger discrimination is more complex than the task of location discrimination), and for other parameters considerable variability is possible without impairing the fit of the model to the data, as shown by the two pairs of parameter variation studies.
3.3. Theoretical implications
As noted in the Introduction, automatic imitation lies at the intersection of at least two sets of recent developments in research on social cognition—relating to mirror neurons and to motor mimicry. A focus of debate in these fields is the extent to which social cognitive phenomena demand “specialist” rather than “generalist” explanation (Brass & Heyes, 2005). Specialist theories suggest that social cognitive phenomena are mediated by functionally and neurologically distinctive mechanisms, whereas generalist theories assume that mechanisms of the same kind mediate the processing of input from social and nonsocial sources. Our model and simulation data do not bear on the question of whether automatic imitation and spatial compatibility are mediated by different neurological mechanisms. However, they support the generalist view that automatic imitation, a social cognitive effect, is mediated by the same kind of functional, or cognitive, process as spatial compatibility. More specifically, they suggest that key features of this process can be modeled by interactive activation within a dual-route architecture.
Wolpert, Doya, and Kawato (2003) have argued that imitation is mediated by the same computational algorithms as motor control; algorithms that are captured within their MOSAIC model as multiple pairs of predictor–controller (or forward–inverse) internal models. Our findings are entirely consistent with this suggestion, but the MOSAIC model does not address the correspondence problem (Brass & Heyes, 2005) that characterizes many examples of imitation. It does not explain how visual input from action observation is translated into corresponding motor variables—how the neurocognitive system knows which motor commands are likely to produce a body movement that looks, from a third party perspective, like the one that the subject is currently observing. Two informal models have addressed this correspondence problem: the active intermodal matching model (AIM, Meltzoff & Moore, 1997), and the associative sequence learning model (ASL, Catmur, Walsh, & Heyes, 2009; Heyes, 2001, 2010). The former suggests that the correspondence problem is solved by a specialized, innate mechanism; that human infants are born with a mechanism that can infer corresponding motor commands from visual input, and that this mechanism evolved specifically to enable imitation. AIM does not specify the computations performed by the innate mechanism. In contrast, the ASL model suggests that knowledge of visuomotor correspondences is encoded in binary associations, each of which links a visual representation of an action with a motor representation of the same action. These links are acquired in the course of normal human development through associative learning—the same, evolutionarily ancient learning mechanisms that mediate Pavlovian and instrumental conditioning—in contexts where humans experience a contingency between observation and execution of the same action. 
These contexts include direct and mirror-mediated visual observation of one’s own movements; synchronous action of the kind received in sports and dance; and, crucially in early development, imitation by others. Given that associative learning is a domain-general process of behavioral adaptation, and a plausible developmental source of the automatic links in the dual-route architecture of our model, it is clear that our findings support the ASL, rather than the AIM, account of the evolutionary and developmental origins of imitation.
1. Response nodes require a larger negative bias than stimulus nodes because even when no stimulus is present response nodes receive some excitation from stimulus nodes (due to the stimulus nodes settling to a positive baseline value of approximately 0.2). The bias of −6 counteracts this excitatory influence in the absence of a stimulus.
2. The resting activation of motor nodes is greater than that given by Eq. (4) as they receive input from sensory nodes, which have non-zero resting activations. Allowing the model to settle for a few hundred cycles prior to stimulus presentation allows the activations of all nodes to stabilize.
3. Stimuli in flanker tasks consist of a central image that is “flanked” on the left and right by distractor images which may be associated with a compatible or incompatible response. Compatible flankers have a facilitatory influence on response production while incompatible flankers interfere with response production.
4. Stimuli in the SNARC (Spatial Numerical Association of Response Codes) effect task are digits, and participants are required to make some binary judgment (e.g., odd/even) on those digits. Small digits are responded to more quickly with the left hand than large digits, and vice versa.
5. Applying the Sauser and Billard (2006) model to the Catmur and Heyes (2011) task would require an extension to the model for processing color, but this would presumably simply involve addition of a “color cue” module within that part of the model implementing medial superior temporal cortex. Different parameter settings would presumably be required for this module to fit with differences in the speed of color processing compared to spatial and motor processing.
6. In fact, Brass et al. (2000), using a simpler experimental design but with a neutral condition in addition to compatible and incompatible conditions, found both facilitation in compatible trials and interference in incompatible trials. This pattern of results requires that incompatible stimuli interfere with response generation or selection. This would seem to require lateral inhibition at least between response nodes.
We are grateful to Bennett Bertenthal and several anonymous reviewers for constructive criticism of an earlier draft of this paper.