Intention and Motor Representation in Purposive Action
Are there distinct roles for intention and motor representation in explaining the purposiveness of action? Standard accounts of action assign a role to intention but are silent on motor representation. The temptation is to suppose that nothing need be said here because motor representation is either only an enabling condition for purposive action or else merely a variety of intention. This paper provides reasons for resisting that temptation. Some motor representations, like intentions, coordinate actions in virtue of representing outcomes; but, unlike intentions, motor representations cannot feature as premises or conclusions in practical reasoning. This implies that motor representation has a distinctive role in explaining the purposiveness of action. It also gives rise to a problem: were the roles of intention and motor representation entirely independent, this would impair effective action. It is therefore necessary to explain how intentions interlock with motor representations. The solution, we argue, is to recognise that the contents of intentions can be partially determined by the contents of motor representations. Understanding this content-determining relation enables a better understanding of how intentions relate to actions.
What is the relation between a purposive action and the outcome or outcomes to which it is directed? The standard way of answering this question appeals to intention, a propositional attitude which plays a characteristic role in planning and coordinating action, is linked to practical reasoning and is subject to characteristic norms (Bratman 1987). On the standard view, an action is directed to an outcome in virtue of the action's being appropriately related to an intention which represents this outcome or some related outcome. As this view is usually expounded, the relation between actions and outcomes to which they are directed is treated as largely independent of the motor processes and representations underpinning action execution. Motor representations are usually considered either as philosophically irrelevant enabling conditions, or else as merely filling in additional details of the basic schema provided by the standard story. Our aim is to show that this is a mistake. We shall argue that, once some basic features of motor representations are properly understood, the standard view must be refined and extended in ways that allow us to recognize distinct roles for intention and motor representation in explaining the purposiveness of action.
Twin temptations stand in our way. The first temptation is to suppose that motor representation, although significant for action execution, has no role in grounding the outcome-directedness of action. Surrendering to this temptation might be reasonable if all motor representations were about merely kinematic or dynamic features of actions. But some motor representations do represent action outcomes and some actions are directed to those outcomes in virtue of the guiding role of motor representations, or so we shall argue. This reason to resist the first temptation throws us straight into the arms of a second temptation, the temptation to suppose that where motor representations represent action outcomes, they are intentions. Surrendering to this second temptation would be reasonable if motor representations were like intentions in that they could feature as premises or conclusions in practical reasoning, or if there were some other planning processes in which both intention and motor representation could feature. However, as we shall argue, this is impossible due to differences between the propositional format of intentions and the distinctively motor, non-propositional format of motor representations. In short, we must resist both temptations because some motor representations are like intentions in representing action outcomes while also remaining sufficiently unlike intentions in that no single planning process can integrate both intention and motor representation.
While there are good reasons to resist both temptations (or so we shall argue), the recognition that motor representation can ground the outcome-directedness of purposive actions independently of intention gives rise to a problem. For a single action, which outcomes it is directed to may be multiply determined by an intention and, seemingly independently, by a motor representation. Unless such intentions and motor representations are to pull an agent in incompatible directions, which would typically impair action execution, there are requirements concerning how the outcomes they represent must be related to each other. The problem is to explain how any such requirements could be met non-accidentally; we call this ‘the interface problem’. The key to solving this problem, we shall suggest, is to recognise that intentions can have constituents which refer to outcomes by deferring to motor representations of those outcomes. Effective action sometimes requires that the contents of intentions and of motor representations interlock, and this interlocking occurs when the contents of intentions are determined in part by the contents of motor representations. So whereas discussions of intention frequently ignore motor aspects of action, it turns out that understanding how intentions interface with motor representations is required for fully understanding how intentions are related to actions.
2. Motor Representations Link Actions to Outcomes
Our overall aim is to show that intention and motor representation play distinct roles in explaining the purposiveness of action, that is, in relating actions to outcomes to which they are directed. The first temptation standing in our way is the temptation to suppose that motor representation, although perhaps important for action execution in some agents, has no bearing at all on our question about the relation between actions and the outcomes to which they are directed. Just as it would be an error to suppose that details of musculoskeletal structure are relevant to this question, so equally (so the temptation goes) it would be an error to suppose that facts about motor representation are relevant here.
Surrendering to this temptation might be reasonable if all motor representations represented only kinematic or dynamic features of actions, such as mere joint displacements or muscle contractions. However, we shall argue in this section that some motor representations represent action outcomes such as grasping, tearing or throwing. Furthermore, as we shall go on to argue (also in this section), such representations ground purposive actions. This in outline is why the first temptation should be resisted.
How does representing an outcome differ from representing merely kinematic or dynamic features of action? How, for instance, does a motor representation of grasping (assuming for now that such a thing exists) differ from a representation of a sequence of joint displacements? First, a motor representation of grasping captures something common to many different sequences of joint displacements and postures involving a variety of effectors. To illustrate, an agent might grasp an object with her hand, with her mouth, with normal pliers (where grasping requires closing the hand), or with reverse pliers (where grasping requires opening the hand). A motor representation of grasping identifies something common to all these cases. Second, and conversely, a motor representation of grasping potentially distinguishes between the same sequence of joint displacements in different contexts. For instance, how a grasping action is represented may depend on a relatively distal outcome: on whether (say) it results in the object grasped being eaten or placed. In addition, the joint displacements which realise grasping in one context might in another context realise a different action, such as scratching; or they might fail to realise any action at all (because there is no target object, say). Third, a motor representation of an action outcome locates the action within a hierarchical structure. To illustrate, a left-handed precision grip and a right-handed whole hand grip are different action outcomes and both instances of grasping.1
Why accept that there are motor representations of action outcomes? The first step is to consider evidence that motor processes carry information about action outcomes. For any given marker of motor processing (such as a pattern of neuronal discharge or motor-evoked potentials), how can we test whether that marker carries information about action outcomes? The basic principle is straightforward: vary kinematic and dynamic features while holding constant an action outcome; and, conversely, vary action outcomes while holding kinematic and dynamic features constant. In practice researchers have devised many ingenious ways to achieve this. In order to vary kinematic and dynamic features while holding action outcomes constant, in some studies a single action outcome is achieved using different effectors, hand, mouth or foot, say (Rizzolatti et al. 1988, 2001; Cattaneo et al. 2010). A variation on this approach is to contrast performing a grasping action with different tools, so that the same action outcome might require closing or opening the hand depending on the tool used (Umiltà et al. 2008; Cattaneo et al. 2009; Rochat et al. 2010). In order to vary action outcome while holding kinematic and dynamic features constant, researchers have contrasted grasping movements with different distal outcomes such as eating and placing (Fogassi et al. 2005; Bonini et al. 2010; Cattaneo et al. 2007). Another approach is to contrast the same grasping movements performed in the presence or manifest absence of a target object (Umiltà et al. 2001; Villiger et al. 2010). A related alternative is to contrast the same grasping movements in the presence of objects which could, or manifestly could not, be grasped by means of such movements (Koch et al. 2010). In each of these cases there is evidence that some markers of motor processing are correlated with action outcomes rather than narrowly kinematic or dynamic features of action.2
To say that motor processing involves information about action outcomes is not, of course, to say that there are motor representations of action outcomes. To make the step from information to representation we have to show that information about action outcomes guides processing (compare Dretske 1988). To this end we shall now consider how information about action outcomes is relevant in motor planning and monitoring, which are two of the functional roles of motor representation.
First take motor planning. This involves satisfying a variety of requirements. For example, in grasping a mug it is necessary for the hand to prefigure the shape of the mug, to move towards it avoiding potential obstacles and to reach it at a velocity that is both compatible with achieving the type of grip to be used and also suitable given features of the mug such as its fragility and weight (Jeannerod et al. 1995; Jeannerod 1998). The need to plan sequences of actions, which may overlap, imposes further requirements. How one should reach for and then grasp a heavy frying pan (say) depends on what one will then do with it. One way of grasping it might be ideal for safely transporting its contents, another for emptying it. The many requirements on motor planning cannot normally be met by explicit practical reasoning, especially given the rapid and fluid transitions involved in many action sequences. Rather they require motor processes and representations.
Second, motor representations function in monitoring action. They provide inputs to internal predictive models that estimate likely effects of actions. Sensory feedback provides information about the actual course of the action which can be compared against the predictions. Adjustments are made in order to minimise the discrepancy between these (Wolpert et al. 1995; Miall and Wolpert 1996).
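The comparator scheme just described can be made concrete with a toy simulation. What follows is a minimal sketch under our own simplifying assumptions, not a model drawn from the studies cited: a one-dimensional reach, a forward model that predicts the next hand position from a copy of the motor command, and a constant disturbance that the model does not initially anticipate. The names forward_model, plant and reach, and all parameter values, are hypothetical and chosen for illustration only.

```python
def forward_model(position, command, bias_estimate):
    """Predict the next hand position from a copy of the motor command."""
    return position + command + bias_estimate


def plant(position, command, disturbance):
    """Hypothetical limb: moves as commanded, plus an unmodelled disturbance."""
    return position + command + disturbance


def reach(target, steps=30, gain=0.5, rate=0.5, disturbance=-0.05):
    """Simulate a reach; return the final position and per-step discrepancies."""
    position = 0.0
    bias_estimate = 0.0  # the forward model's current estimate of the disturbance
    discrepancies = []
    for _ in range(steps):
        # Plan a command towards the target, compensating for the expected bias.
        command = gain * (target - position) - bias_estimate
        predicted = forward_model(position, command, bias_estimate)
        # The plant's output stands in for sensory feedback about the movement.
        actual = plant(position, command, disturbance)
        discrepancy = actual - predicted
        # Adjust the model so that later predictions sit closer to the feedback.
        bias_estimate += rate * discrepancy
        position = actual
        discrepancies.append(discrepancy)
    return position, discrepancies


if __name__ == "__main__":
    final_position, discrepancies = reach(target=1.0)
    print(final_position, discrepancies[0], discrepancies[-1])
```

In this sketch the discrepancy between prediction and feedback shrinks from step to step, and the simulated hand settles on the target. The sketch is neutral on what is being predicted; it illustrates only the loop of prediction, comparison and adjustment.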
These facts about the functional roles of motor representation in planning and monitoring action reveal that it would be advantageous for some such representations to represent action outcomes rather than merely patterns of joint displacement or muscle activation. In the case of planning, many of the requirements to be satisfied are partially dependent on action outcomes and not only on more narrowly kinematic or dynamic features of action. Representation of action outcomes could therefore play a role in simplifying planning processes. Equally, monitoring involves predicting relatively distal effects of bodily movements including facts about the locations of objects. To make such predictions exclusively in terms of individual joint displacements or muscle activations would be computationally demanding as the human hand alone has over 20 degrees of freedom. Efficient prediction plausibly demands several higher-level representations of action including representations of action outcomes (Arbib 1985; Mason et al. 2001; Santello et al. 2002). In short, it is clear that motor representations of action outcomes could be useful: in planning, because some requirements for efficient motor planning concern outcomes to which actions are directed; and in monitoring, because representations of outcomes could simplify prediction. Given that, as we saw, markers of motor processes carry information about action outcomes, it is reasonable to conclude from the relevance of action outcomes to the functional roles of motor representation that some motor representations are representations of action outcomes.
So far we have argued that motor processes involve representations of action outcomes. It remains to show that such representations ground purposive actions. But this is hardly a further step. How do intentions ground the purposiveness of actions? On any standard view, an intention represents an outcome, causes an action, and does so in a way that would normally facilitate the outcome's occurrence. Similarly, motor representations of outcomes represent action outcomes, play a role in generating actions, and do this in a way that normally facilitates the occurrence of the outcomes represented. To say that motor representations do all this is one way of making precise the metaphor involved in saying that purposive actions are directed to outcomes. Moreover, there is a clear resemblance between the natural way of understanding intentions as grounding outcome-directedness and the way in which motor representations ground outcome-directedness (as Pacherie 2008, pp. 189–90 has also argued). So given the functional roles and contents of motor representations, there is little room for doubt that motor representations can ground the outcome-directedness of purposive actions.
To sum up, the first temptation was to suppose that motor representation has no bearing at all on our question about the relation between actions and the outcomes to which they are directed. One reason to resist this temptation is the fact that, as we have just argued, motor representations not only represent action outcomes but also ground the directedness of actions to outcomes. This argument might be taken as grounds for giving in to a second temptation.
3. A Motor Format for Representation
The second temptation is to suppose that motor representation, or some species of it, is a variety of intention, where intention is understood in the standard way as a propositional attitude with a characteristic role in practical reasoning (Bratman 1987). In this section we explain why this temptation should also be resisted.
As background we first need a generic distinction between content and format. Imagine you are in an unfamiliar city and are trying to get to the central station. A stranger offers you two routes. Each route could be represented by a distinct line on a paper map. The difference between the two lines is a difference in content. Each of the routes could alternatively have been represented by a distinct series of instructions written on the same piece of paper; these cartographic and propositional representations differ in format.3 The format of a representation constrains its possible contents. For example, a representation with a cartographic format cannot represent what is represented by sentences such as ‘There could not be a mountain whose summit is inaccessible.’ The distinction between content and format is necessary because, as our illustration shows, each can be varied independently of the other.
Our aim in this section is to show that motor representations differ from intentions with respect to their format. This is why the second temptation should be resisted. That motor representations differ in format from intentions shows that they are genuinely distinct phenomena.4
How in general can we identify or distinguish representational formats? Because representational formats are typically associated with characteristic performance profiles, it is sometimes possible to infer similarities and differences in representational format from similarities and differences in the processes in which representations feature. This works both for artefactual and mental representations. To illustrate in terms of our earlier example, suppose that two people have representations of the same route but for one person the route is represented by a line on a map (so in a cartographic format) whereas the other person has a propositional representation of the route. Some transformations are likely to be easier for the person with the cartographic route representation (depending on the projection used, of course); examples include reversing the route, determining how many times a certain river is crossed and transforming the route into a sequence of compass bearings. Other transformations, such as turning the route into a list of salient landmarks, may be easier for the person with the propositional route description. So some patterns of difference in the two people's performances may be explained by the difference in the formats of their representations; and some similarities in performance profile may likewise be explained by sameness of format. If we did not already know that the two people's route representations involved different formats, we might infer this from the facility with which each performed various transformations of the route.
Cognitive neuroscience frequently depends on inferences of just this type. To illustrate, compare imagining seeing an object moving with actually seeing it move. For this comparison we need to distinguish two ways of imagining seeing. There is a way of imagining seeing which phenomenologically is something like seeing except that it does not necessarily involve being receptive to stimuli. This way of imagining seeing, sometimes called ‘sensory imagining’, is commonly distinguished from cognitive ways of imagining seeing which might for example involve thinking about seeing (Gendler 2011, §2.1).5 It is this way of imagining seeing an object move that we wish to compare with actually seeing an object move. These two have similarities in characteristic performance profile. For instance, whether an object can be seen all at once depends on its size and distance from the perceiver; strikingly, when subjects imagine seeing an object, whether they can imagine seeing it all at once depends in the same way on size and distance (Kosslyn 1978, 1996, p. 99ff). Also, how long it takes to imagine looking over an object depends on the object's subjective size in the same way that how long it would take to actually look over that object would depend on its subjective size (Kosslyn et al. 1978).6 Further, imagining seeing something (for example, imagining seeing a visual mask) can modulate and interfere with actually seeing in much the way that actually seeing the thing imagined would (Pearson et al. 2008; Ishai and Sagi 1995). The similarities in characteristic performance profile and the particular patterns of interference are good (if non-decisive) reasons to conjecture that imagining seeing and actually seeing involve representations with a common format. This conjecture is indirectly supported by evidence that imagining seeing and actually seeing not only have a common neural basis but also involve similar patterns of cortical activation (e.g., Page et al. 2011).
Let us turn to motor representation. Compare imagining moving a ball with actually moving a ball. To fully specify the comparison we intend, it is again necessary to distinguish two ways of imagining. One way of imagining action is phenomenologically something like acting except that such imaginings are not necessarily responsive to the features of actual objects and do not necessarily result in bodily movements. To illustrate, suppose you are about to dive into a pool and, standing at the edge, mentally pantomime launching yourself from the bank. In some respects the experiences involved in this imaginative exercise may be barely distinguishable from experiences that might be involved in actually launching yourself into the pool. This way of imagining action can be distinguished from cognitive ways of imagining action which might involve thinking about an action.7 The comparison we intend is between imagining moving a ball in the former, phenomenologically action-like way and actually moving a ball. There is evidence that the way imagining performing an action unfolds in time is similar in some respects to the way actually performing an action of the same type would unfold. For instance, how long it takes to imagine moving an object is closely related to how long it would take to actually move that object (Decety et al. 1989; Decety 1996; Jeannerod 1994). In addition, for actions such as grasping the handle of a cup, manipulating the target object in ways that make the action harder (such as orienting the cup's handle to make it less convenient for you to grasp) make a corresponding difference to the effort involved in imagining performing the action (Parsons 1994; Frak et al. 2001). Further, imagining performing an action can selectively interfere with performance of a related action. For example, suppose you are faced with an array of objects one of which (the target) you will shortly be required to grasp. Subjects who imagine grasping an object other than the target object tend to be slower in subsequently grasping the target object than subjects who do not imagine acting or subjects who imagine grasping the target object (Ramsey et al. 2010). Just as the similarities between imagining seeing and actually seeing are evidence for the hypothesis that the representations involved in imagining seeing and actually seeing have a common format, so also the similarities in characteristic performance profile between imagining acting and actually acting, together with the particular patterns of interference between the two, suggest that imagining acting and actually acting involve a common representational format. And much as in the case of seeing and imagining seeing, acting and imagining acting involve many of the same processes almost up to the actual muscle contractions (Jeannerod and Decety 1995; Jeannerod 2003).
We have been comparing actually seeing with imagining seeing and actually acting with imagining acting as a first step towards arguing that visual and motor representations differ in format (which in turn will be background for an argument that motor representation and intention differ in format). This claim could not easily be established by comparing actually seeing with actually acting because performance differences between these might be explained by bodily or environmental factors only distantly related to the representations involved. By contrast, comparing performance in imagining seeing and imagining acting does provide reasons to conclude that visual and motor representations differ in format. To see why, contrast imagining rotating a ball with imagining seeing a ball rotating.8 As already mentioned, how quickly the former can be done is a function of how long it would take the agent to rotate the ball, whereas how quickly the latter can be done depends on how rapidly the ball can rotate and still be perceived as rotating. Similarly, we mentioned that factors making actually acting more effortful also make imagining acting more effortful. For instance, in some cases rotating a ball clockwise is easier than rotating it anti-clockwise, and so is imagining rotating a ball. By contrast, the effort involved in actually seeing or imagining seeing a ball rotate does not similarly differ depending on direction. These and other performance differences are plausibly a consequence of a difference in format between motor and visual representations.
It may be objected that performance differences such as these can be explained without appealing to a difference in format. After all, rotating a ball involves an action whereas a ball rotating does not; consequently, imagining the former may be thought to differ from imagining the latter with respect to the contents of the representations involved. Supposing that there are differences in content here and in other cases, could these fully explain differences in performance profile? To see why not, consider two tasks involving mental rotation. Judging the laterality of a rotated letter is thought to involve phenomenologically vision-like imagination (Jordan et al. 2001), whereas judging the laterality of a rotated hand is thought to involve phenomenologically action-like imagination (Parsons 1987; Gentilucci et al. 1998). Ordinary subjects who are asked to judge the laterality of a hand rotated to various degrees are less accurate when the hand's position is biomechanically awkward. By contrast, no such effect occurs for comparable tasks involving letters rather than hands. How could this difference in performance in imagining hands and letters be explained? Consider the claim that the difference in performance can be fully explained by a difference in the content of the representations involved. Initially this might seem plausible because one task involves hands whereas the other involves letters. However, there are subjects who can perform both tasks but whose performance is not different for hands and letters (Fiori et al. 2012). These are subjects with amyotrophic lateral sclerosis (ALS), which impairs motor representation (Parsons et al. 1998). Since ALS and ordinary subjects encounter the same stimuli and perform the same tasks, there seems to be no reason (other than our hypothesis about a difference in format) to suppose that the two groups’ performance involves representations with different contents. So if the hand-letter difference in performance were entirely explained by a difference in content, we would expect ALS and ordinary subjects to exhibit the same difference in performance. But this is not the case. This is an obstacle to supposing that the hand-letter difference in performance in ordinary subjects could be explained by appeal to content.
The hypothesis that visual and motor representations differ in format is consistent with evidence that imagining acting and imagining seeing involve different processes (Kosslyn et al. 2001). For instance, each can be selectively impaired (Sirigu and Duhamel 2011); and factors such as limb amputation or hand posture can interfere with imagining acting without interfering with imagining seeing (Nico et al. 2004; Vargas et al. 2004; Fourkas et al. 2006).
So far we have been arguing that motor and visual representations differ in format. Why suppose that motor representations also differ in format from intentions? Contrast two ways of imagining taking a shot in basketball, one involving the phenomenologically action-like kind of imagination and the other involving a cognitive kind of imagination. The contrast we require is roughly between the way a former player might imagine this and the way that someone who has only ever read about basketball might imagine it. As we have seen, the way phenomenologically action-like imagination unfolds in time and the amount of effort it involves will depend on bio-mechanical, dynamical and postural constraints, among others. These constraints are closely related to those which govern actually performing such actions (Jeannerod 2001), and some can be altered by acquiring or losing motor expertise. By contrast no such constraints would be expected always to apply where a cognitive kind of imagination is involved. In line with the general strategy of inferring differences in format from differences in characteristic performance profile, we conclude that motor representations differ in format from those involved in cognitive kinds of imagination, which are plausibly propositional.
But are we too hasty in concluding that motor and propositional representations differ in format? It might be objected that their characteristic performance profiles are not so different, for in cognitive imagination the fact that an agent is imagining herself acting will mean that how she imagines the action unfolding will reflect constraints on what she can do. But while this will sometimes be the case, a cognitive kind of imagining need not involve imagining an action unfolding in a way consistent with one's actual abilities. What distinguishes the phenomenologically action-like form of imagination is that some bio-mechanical, dynamical and postural constraints are inescapable. To make this vivid consider imagining reaching for a distant object. If the object is manifestly far out of reach it will not normally be possible to do this using the phenomenologically action-like kind of imagination, whereas no such difficulty need occur where a cognitive kind of imagination is involved. After all, where a cognitive kind of imagination is involved one might imagine having much longer limbs (or an entirely different body) whereas this cannot be achieved at will where the phenomenologically action-like kind of imagination is involved. Finally, where there are constraints on a cognitive kind of imagination these are generally mediated by beliefs or suppositions about one's own abilities; it seems unlikely that this is true of the phenomenologically action-like kind of imagination.
To conclude that intention and motor representation are genuinely distinct phenomena it is not quite enough to know that motor representations are non-propositional, of course. In addition we must know that intentions are propositional. We take this claim to be a consequence of the role of intention in practical reasoning and of the fact that one can have intentions involving quantification and identity; for example, one can intend that one cross seven distinct bridges in 48 hours without yet specifying which bridges or hours.9
4. The Interface Problem
We have just argued for three claims. First, some motor representations represent outcomes (rather than, say, only bodily movements). Second, there are actions whose directedness to an outcome is grounded in motor representation. And third, motor representation differs from intention with respect to representational format. A consequence of these claims is that a single purposive action may involve representations of the outcomes to which it is directed in at least two different representational formats, motor and propositional. This contributes to a problem we call the interface problem; in this section we explain how it arises.
Imagine that you are strapped to a spinning wheel facing near certain death as it plunges you into freezing water. To your right you can see a lever and to your left there is a button. In deciding that pulling the lever offers you a better chance of survival than pushing the button, you form an intention to pull the lever, hoping that this will stop the wheel. If things go well, and if intentions are not mere epiphenomena, this intention will result in your reaching for, grasping and pulling the lever. These actions—reaching, grasping and pulling—may be directed to specific outcomes in virtue of motor representations which guide their execution. It shouldn't be an accident that, in your situation, you both intend to pull a lever and you end up with motor representations of reaching for, grasping and pulling that very lever, so that the outcomes specified by your intention match those specified by motor representations. If this match between outcomes variously specified by intentions and by motor representations is not to be accidental, what could explain it?
The interface problem is the problem of answering this question, of explaining how intentions and motor representations, with their distinct representational formats, are related in such a way that, in at least some cases, the outcomes they specify non-accidentally match. But why think that this question poses a problem at all?
Let us start by putting the question more precisely. First we should define the relevant notion of matching. Two collections of outcomes, A and B, match in a particular context just if, in that context, either the occurrence of the A-outcomes would normally constitute or cause, at least partially, the occurrence of the B-outcomes or vice versa. To illustrate, one way of matching is for the B-outcomes to be the A-outcomes. Another way of matching is for the B-outcomes to stand to the A-outcomes as elements of a more detailed plan stand to those of a less detailed one.
Now we can put the question more generally. There are cases in which a particular action is guided both by one or more intentions and by one or more motor representations. In at least some such cases, the outcomes specified by the intentions match the outcomes specified by the motor representations. Furthermore, this match is not always accidental. How do non-accidental matches come about?
In principle one might try to explain the match by supposing that intentions and motor representations have a common cause. If the mere presence (or the mere perception) of a lever invariably triggered intentions and motor representations specifying grasping (say), it might be possible to explain matching in this way. But this sort of consideration cannot provide a full explanation of non-accidental matching for two reasons. First, intentions are not always triggered in straightforward ways by agents’ environments or perceptions; to suppose otherwise is to ignore the very phenomena, decision and planning, which make intention so interesting. Second, motor representations are likewise not always triggered in any straightforward way by agents’ environments or perceptions. For these reasons there seems to be no hope of fully explaining matching by postulating common causes for intentions and motor representations.
If common cause explanations are ruled out, another natural approach is to appeal to content-respecting causal processes. Perhaps, for example, intentions with certain contents (concerning grasping, say) reliably cause motor representations with corresponding contents (also concerning grasping, say). Alternatively we might suppose, very crudely, that some comparator process checks that the contents of motor representations are appropriate given what the agent intends. Either way, the idea is to explain matches between outcomes specified by the contents of certain states in terms of content-respecting causal processes linking those states.
This type of explanation is arguably appropriate where the states in question have the same representational format. For example, this type of explanation would arguably be appropriate if our aim were to explain matches between large-scale intentions and the smaller-scale, more detailed intentions which serve as building blocks for them. But in fact we are concerned with intentions and motor representations which, as argued above, have different representational formats. This creates a potential difficulty.
In general, when two representations differ in format, postulating reliably content-respecting causal processes linking them requires us to explain how their contents are coordinated. To illustrate, suppose you are given some verbal instructions describing a route. You are then shown a representation of a route on a map and asked whether this is the same route that was verbally described. You are not allowed to find out by following the routes or by imagining following them. This puts you in something like the position of the comparator process envisaged above. Special cases aside, answering the question will involve a process of translation because two distinct representational formats are involved, propositional and cartographic. It is not enough that you could follow either representation of the route. You will also need to be able to translate from at least one representational format into at least one other format. Similarly, for there to be reliably content-respecting causal processes linking intentions with motor representations there would have to be some process of translation.
But why is this a potential difficulty? What is wrong with postulating a process of translation? The difficulty is that nothing at all is known about this hypothetical translation between intention and motor representation, nor about how it might be achieved, nor even about how it might be investigated. Of course this doesn't show that we couldn't fully explain matching by appeal to content-respecting causal processes. But it does show that no such explanation is currently available.
This, then, is why our question about the interface between intentions and motor representations amounts to a problem. It is a problem because, of the two natural routes to answering the question, the first (appealing to common causes of intentions and motor representations) is a non-starter and the second (appealing to content-respecting causal processes) amounts to no more than a stab in the dark. Our aim in what follows is to solve the interface problem without postulating either common causes or translation processes.
5. Demonstrative and Deferential Action Concepts
The interface problem is the problem of explaining how, some of the time, there could be non-accidental matches between outcomes variously specified by intentions and motor representations. As the previous section explained, the problem arises because intentions and motor representations have different representational formats.
There is a way to link representations with different formats that requires neither common causes nor translations. To illustrate, imagine once again that we have two representations of a route, one propositional, the other cartographic. But this time suppose that the propositional representation is simply ‘Follow this route!’ where the demonstrative phrase ‘this route’ refers to the route marked on the map. This instruction does not describe the route but merely defers to another representation of it. Because the representation deferred to is cartographic, comparing the instruction with the map no longer requires translation between representational formats. We shall suggest that something analogous holds concerning the relation between intention and motor representation. To anticipate what will be explained below, some intentions involve demonstrative concepts (or other constituents) which refer to actions by deferring to motor representations.
Let us first step back and consider thought generally. The existence of demonstrative thoughts and concepts is quite widely accepted. For instance, some philosophers have proposed that there are demonstrative colour concepts (McDowell 1996; Brewer 1999). While there are objections to uses to which these have been put (e.g., Heck Jr. 2000; Dokic and Pacherie 2001), the claim that there might be such concepts is barely controversial.
Consider the possibility that some action concepts are demonstrative. Someone who says to herself, ‘I wish I could do that too’ may be entertaining a proposition which involves a demonstrative concept, that action. Clearly this demonstrative concept cannot refer to a token action. The wish was not to perform another agent's action but to perform an action of a certain type. Now the mere fact that the demonstrative concept must refer to a type is not a problem. As the case of colour concepts indicates, it is plausible that propositional attitudes can involve elements which demonstrate types and not just individuals (see also Levine 2010, §3.4).
How could the existence of demonstrative concepts for actions be relevant to solving the interface problem? In our earlier illustration, the instruction ‘follow this route!’ succeeds in referring to a route by deferring to another representation with a different format. The instruction is about a route not a representation, but it succeeds in referring to a route by deferring to a representation of that route. Similarly, some demonstrative concepts may refer to types of action such as grasping or throwing by deferring to motor representations (see Levine 2010 for one way of developing this idea). These demonstrative concepts would be concepts of actions not of motor representations, but they would succeed in being concepts of actions by deferring to motor representations. For any such concept, it is a motor representation which ultimately determines what it is a concept of.
The idea that some demonstrative concepts refer to actions by deferring to motor representations immediately raises a question. Could a demonstrative concept really defer to a motor representation? It seems clear that we can't select a motor representation to defer to in the same way that we can select a map when we say ‘Follow this route!’. After all, motor representations are not things we can point to with our hands. Nevertheless, it does seem that motor representations are available in some sense. To start with an analogy, consider pantomiming an action to yourself. You are rehearsing part of an operation which involves precisely grasping a delicate structure with some tweezers. Just as someone might point to a map and say ‘Follow that route!’, so also they could point to your pantomime and say ‘Do that!’. In our analogy, the pantomime stands in for a motor representation of the action. The demonstrative in ‘Do that!’ refers to an action by deferring to the pantomime. Of course this analogy doesn't show that demonstrative concepts can defer to motor representations (at least not unless pantomimes are motor representations of actions). But now consider purely mental pantomime—that is, phenomenologically action-like imagination. One might use this kind of imagination to explore different ways of completing a task and then, having hit on a good solution, think to oneself ‘Do that!’. It seems possible that in some such cases the demonstrative refers by deferring to a motor representation of action involved in imagining acting.
But is this really possible? Someone might object that in appealing to imagining acting we are sneaking intentions in through the back door. How can we be sure that it is ever really a motor representation rather than an intention that one defers to in thinking ‘Do that!’? Contrast two cases of phenomenologically action-like imagination, both involving a tool. In the first case, imagine grasping an object with the tool; and in the second case, imagine releasing the same object with the tool. The two cases are different but in what does their difference consist? Could it be a difference in intention that explains how they differ? Since it is possible to grasp without intending to grasp, it is surely also possible to imagine grasping without imagining intending to grasp. And note that to imagine grasping without imagining intending to grasp does not necessarily require one to imagine non-intentionally grasping. (It is easy to miss this point by confusing it with the issue of whether the imagining must itself be intentional. Whether or not it is possible to imagine grasping or releasing without intending to imagine so acting, one can certainly imagine grasping or releasing without imagining intending to so act.) So to rule out the possibility that the two imaginings differ with respect to intention, let both be neutral with respect to intention. Is the difference between the two cases then due to differing movements or patterns of muscle contraction? To rule this possibility out, let the movements and muscle contractions involved in both cases be as similar as possible: let the difference between grasping and releasing the object be a matter of how the tool is configured rather than of how your body moves. With the contrast cases elaborated by these two stipulations, it is plausible that the difference between the two imaginings will be due to the different motor representations involved in grasping and releasing. 
So in these cases, the thought ‘Do that!’ will refer to grasping (or releasing) by deferring to a motor representation of grasping (or of releasing). It follows that demonstrative concepts can refer to actions by deferring to motor representations.
So far we have focused on imagining acting. We are not committed to claiming that demonstrative concepts can only refer to actions by deferring to motor representations involved in imagination. Given the parallels between the phenomenologically action-like kind of imagination and actually acting already discussed, it is plausible that demonstrative reference to action by deference to motor representation is made possible not just by experiences associated with imagining acting but also by experiences associated with actually acting.
Having an intention with such a demonstrative concept does not generally require imagining or actually acting at the time the intention is formed. To see why, first note that many of the things we do are either things we have already done or else novel combinations of actions we have already performed, such as reaching, grasping and throwing. Relatedly, motor representations of actions which, taken as a whole, are novel can often be built up from motor representations of familiar actions (Rizzolatti and Sinigaglia 2008, pp. 45–49). The second step in our proposal is similar to what one might say about demonstrative route concepts. Someone encounters a map with a route marked on it. Her experience of this route is necessary for her to acquire a demonstrative concept which refers to the route by deferring to the cartographic representation of it. But once she has this demonstrative concept, she can use it on future occasions without fresh experiences of the route (although there may be some dependence on memory); and her use of this concept does not depend on the continued existence of the original representation of the route. Similarly, on our view experience of action is necessary for the acquisition of demonstrative concepts of action such as concepts of grasping and reaching but, perhaps subject to requirements on memory, not for their continued use in practical thought.
We have been arguing for the existence of demonstrative concepts which refer to actions by deferring to motor representations in order to solve the interface problem. As already stated, the interface problem is the problem of explaining how it is sometimes no accident that an intention and a motor representation specify matching outcomes despite differing in format. As long as we think of intentions and motor representations as having logically independent contents, it seems that fully solving the problem would require appeal to processes of translation linking intention with motor representation. But where intentions involve demonstrative action concepts, their contents are not necessarily logically independent of the contents of motor representations. For a demonstrative component of an intention may refer to an action by deferring to a motor representation. Where this happens, which actions the intention specifies is partially or wholly determined by the motor representation, and so the interface problem is solved.
Or is it? So far we have not distinguished between two aspects of action outcomes. Action outcomes often specify both a way of acting (whether to grasp or release, say) and also what to act on (the mug or the pen, say). Since we have been focusing on ways of acting rather than on what is to be acted on, it would be consistent with our arguments to accept that there are demonstrative concepts of ways of acting (such as grasping and reaching, perhaps) which refer by deferring to motor representations while denying that any such demonstrative concepts are also about what is to be acted on. If this combination of views were correct, we would have only a partial solution to the interface problem. We would only have explained how the outcomes variously specified by intentions and motor representations non-accidentally match with respect to ways of acting, not with respect to what is to be acted on. However, a range of behavioural and neurophysiological evidence shows that motor representations represent not only ways of acting but also objects on which actions might be performed and some of their features related to possible action outcomes involving them (for a review see Gallese & Sinigaglia 2011; for discussion see Pacherie 2000, pp. 410–3). For example, encountering a mug sometimes involves representing features such as the orientation and shape of its handle in motor terms (Buccino et al. 2009; Costantini et al. 2010; Cardellicchio et al. 2011; Tucker and Ellis 1998, 2001). Reference by deference to motor representation could therefore explain how action outcomes match with respect not only to ways of acting but also to what is to be acted on.10
At this point we should acknowledge a complication. Above we argued that experience is needed for the acquisition of demonstrative concepts of ways of acting but not for their continued use in practical reasoning. This matters because forming an intention does not always involve newly experiencing a way of acting. However, things are more complicated concerning what to act on. Where an action involves particular things to be acted on, agents will not generally have past experiences of these. Accordingly, demonstrative reference by deference to motor representation requires that some motor representations of particular targets for action are associated with experiences of those targets. When we spontaneously intend to act on an object of which we have no previous experience (and perhaps in other cases too), our experience of that object would have to depend in part on motor representations of it. There is indeed some evidence for the ubiquity of motor aspects of experience (and also evidence that motor representation does not always modulate how objects are experienced, but that is not relevant here). For example, temporarily changing subjects’ motor abilities by artificially extending their reach systematically affects their judgements of how far away objects are (Linkenauger et al. 2009; Costantini et al. 2011). Given that such judgements are based on experience,11 this illustrates one way in which experience may be shaped in part by motor representation of objects. There is also evidence that experiential judgements of size can be influenced by ability (e.g., Witt and Dorsch 2009). It is plausible, then, that some motor representations of objects modulate experiences of them. This indicates that it is possible to refer not only to ways of acting but also to what to act on by deferring to motor representations.
Solving the interface problem may not always involve demonstrative concepts. What matters for solving the interface problem is deference, not demonstration. Suppose that someone acts on an intention to grasp the handle of a mug. Suppose also that the outcome to which her action is directed, grasping the mug's handle, is specified by motor representations. As long as the concept of grasp involved in her intention refers by deferring to a motor representation of grasping, the two specifications of action outcomes (in intention and in motor representation) will not be independent and so the interface problem will be solved.
While most philosophers would probably agree that, as a matter of fact, intentional action often involves motor representation, this is typically treated as only an enabling condition for intentional action or as merely a variety of intention. In fact most philosophical theories of action apply indifferently to agents whose actions involve motor planning and monitoring and to (imaginary) agents who need explicit practical reasoning for each muscle contraction. In our view this is a defect of those theories. To fully understand intention, we need to understand how intentions relate to motor representations. Or so we have argued.
The need to understand how intentions relate to motor representations follows from three claims defended in earlier sections of this paper. First, some motor representations represent outcomes (rather than, say, only bodily movements). Second, there are actions whose directedness to an outcome is grounded in motor representations. And, third, motor representations differ from intentions with respect to their representational format. These claims reveal that a single purposive action may involve representations of the outcomes to which it is directed in at least two different representational formats, motor and propositional. It is necessary to understand something of how intentions relate to motor representations in order to explain how the various representations involved in a single action sometimes non-accidentally specify matching outcomes.
The problem of explaining this we called the interface problem. Theoretically uncomplicated approaches to solving it would involve appeal to common causes or to processes of translation. The interface problem is a problem because no such approach is viable given the present state of knowledge. However, we have shown that there is another way of solving the interface problem. This solution hinges on the existence of concepts which refer to actions by deferring to motor representations. Where intentions involve such deferential concepts, their contents are logically linked to those of motor representations and so the interface problem can be solved. In short, then, we have shown that there is a theoretically coherent and empirically plausible solution to the interface problem, one which requires neither common causes nor processes of translation to link intention and motor representation.
A further conjecture is that whenever there is a non-accidental match between the contents of intentions and of motor representations, this is due at least in part to deferential action concepts. The plausibility of this conjecture rests in part on the absence of any other viable full solution to the interface problem, and of course we do not claim to have shown that no alternatives exist. If the conjecture is right, then concepts which refer by deferring to motor representations are what tie explicit practical reasoning to motor processes.
More speculatively still, we suggest that where an intention properly and reliably produces bodily movement, either acting on that intention involves a further intention or else the intention involves concepts which refer to actions by deferring to motor representations. If so, it is deferential action concepts that ultimately connect intentions to bodily movements. Only by recognising how intentions interlock with motor representations can we hope to understand how our intentions ever make a difference to the world around us.
On this view experience of acting plays a novel role. Experiences made possible by motor representation, such as those associated with phenomenologically action-like imagination and those associated with actual action, are arguably necessary for there to be concepts which are constituents of intentions and refer to actions by deferring to motor representations. But if, as we conjecture, such deference is necessary for intentions to properly and reliably result in bodily movements, it may turn out that intending to act in the world depends on experiences which are made possible by motor representation. Much as on some views thought about objects depends on perceptual experience (e.g., Campbell 2002), so also intending actions may depend on motor experience.12
On the notion of action outcome in motor representation, see Gallese (2000), Jeannerod (2006) and Rizzolatti and Sinigaglia (2008).
Of course some researchers have raised doubts concerning some of the evidence for the claim that there are markers of motor processing which carry information about action outcomes (e.g., Cavallo et al. 2011; Borroni et al. 2011). On balance, however, the evidence supports this conclusion.
Note that the distinction between content and format is orthogonal to issues about representational medium. The maps in our illustration may be paper maps or electronic maps, and the instructions may be spoken, signed or written. This difference is one of medium.
Readers already convinced that motor representation differs from intention in being non-conceptual will not need the following argument in order to conclude that they are distinct phenomena. However, the following considerations also indicate that motor and perceptual representations differ in format, which will be relevant later when we consider how motor representations and intentions jointly lead to action.
Note that we define this way of imagining seeing in terms of phenomenology and stipulate nothing about the processes and representations involved. This is essential for our purposes, since we wish to consider evidence for conjectures about the format of representations it involves.
These and further examples are discussed by Currie and Ravenscroft (1997, p. 165).
On distinguishing these two ways of imagining action, see Currie and Ravenscroft (1997, p. 161), Jeannerod and Decety (1995, p. 727), and Kosslyn et al. (2001, pp. 638–9). The former, phenomenologically action-like imagining is sometimes labelled ‘motor’ or ‘internal’ and occasionally identified by its links to motor processes or by features of the format or content of the representations involved (Annett 1995, p. 1400). We avoid these labels because we introduce the distinction by appeal to phenomenology only and do not stipulate that motor representations are involved. It is essential for what follows that the involvement of motor representations in the phenomenologically action-like way of imagining action is a discovery rather than a stipulation.
Imagining acting without also imagining seeing may be difficult in practice, and conversely; it may also sometimes be difficult to distinguish imagining acting from imagining seeing (as Currie and Ravenscroft 1997, p. 170 suggest). However ordinary subjects can separate the two well enough to confirm predictions about their differences (see, e.g., Kosslyn et al. 2001).
Of course some use the term ‘intention’ for non-propositional representations involved in the execution and control of action. This is a narrowly terminological issue.
In fact, there may be another way of explaining non-accidental matching with respect to what is to be acted on due to Campbell (2002, pp. 36–8, 44–5, 48–57). In outline, Campbell's idea is to note that intentions can refer to objects by means of perceptual demonstrative elements, and that perceptual and motor representations of objects may be sufficiently commensurate for a perceptually represented object to be reliably selected as a target for motor processes. It would be consistent to hold both that a view like Campbell's correctly explains non-accidental matching of action outcomes with respect to what is to be acted on, and also that non-accidental matching with respect to ways of acting is explained by demonstrative action concepts which defer to motor representations. We shall leave open the question of whether non-accidental matching involves elements additional to those involved in the view we have been developing. Our claim is just that a solution to the interface problem need not involve anything other than components of intentions which refer to actions by deferring to motor representations of action outcomes.
It may be objected that these judgements could reflect non-experiential expectations; Witt (2011, pp. 203–4) reviews evidence against this possibility.
Heartfelt thanks to Gabriella Bottini, Chiara Brozzo, Silvano Zipoli Caiani, Naomi Eilan, Vittorio Gallese, Francesco Guala, Élisabeth Pacherie, Natalie Sebanz and an anonymous referee. Corrado Sinigaglia was supported by EU grant TESIS, a Fellowship from the Institute of Philosophy (University of London) and a Fellowship from the Center for the Epistemology of Cognitive Sciences, École Normale Supérieure de Lyon.