Correspondence should be sent to Marco Mirolli, Istituto di Scienze e Tecnologie della Cognizione, CNR, Via San Martino della Battaglia 44, I-00185 Roma, Italy. E-mail: firstname.lastname@example.org
Understanding the role of ‘‘representations’’ in cognitive science is a fundamental problem facing the emerging framework of embodied, situated, dynamical cognition. To make progress, I follow the approach proposed by an influential representational skeptic, Randall Beer: building artificial agents capable of minimally cognitive behaviors and assessing whether their internal states can be considered to involve representations. Hence, I operationalize the concept of representing as ‘‘standing in,’’ and I look for representations in embodied agents involved in simple categorization tasks. In a first experiment, no representation can be found, but the relevance of the task is undermined by the fact that agents with no internal states can reach high performance. A simple modification makes the task more “representationally hungry,” and in this case, agents’ internal states are found to qualify as representations. I conclude by discussing the benefits of reconciling the embodied-dynamical approach with the notion of representation.
Recently, one of the most influential proponents of both the dynamical approach to cognition and the skepticism about the notion of representation, Randall Beer (Beer, 1995, 2000), has rightly noticed that the debate on representations has been mostly based on philosophical arguments, intuitions, and analogies, rather than on concrete models (Beer, 1996). To overcome this problem, Beer proposed using simple evolutionary simulations of adaptive behavior as a means to ground such a debate on concrete models relying on as few a priori assumptions as possible (Beer, 2003b). Beer's idea was to construct simple idealized models of a ‘‘minimally cognitive behavior’’ as ‘‘prosthetically controlled thought experiments’’ (Bedau, 2002; Dennett, 1994) with which to explore the fundamental issues posed by the embodied cognition framework, in particular, the issue of representations.
In line with this view, in a recent seminal paper (Beer, 2003b), Beer developed a detailed analysis of an artificial agent capable of exhibiting categorical perception behavior using exclusively the methods and concepts of dynamical systems theory. The provision of a detailed explanation of a minimally cognitive behavior in dynamical terms, without relying on the concept of representation, is rightly considered by Beer as a substantial contribution in favor of representational skepticism. But notwithstanding the importance of his work, Beer did not actually check whether his evolved agent's internal states could in fact be considered as representations. In this paper, I try to complete Beer's work: I assess whether (and, if so, under which conditions) minimally cognitive agents actually use representations.
The rest of the article is structured as follows: In the next section, I briefly discuss the shortcomings of Beer's representational skepticism. In section 3, I describe my replication of the simulations of Beer (2003b). In section 4, I operationalize the basic definition of representations described in section 2 and assess the presence of representations in my best evolved agent, with negative results. In section 5, I present a control simulation with which I check the relevance of the chosen task for the representational debate, again, with negative results. In section 6, I present a simple modification of the task that makes it more relevant for our purposes, and I assess the presence of representations in the best evolved agent of this task, this time with positive results. Finally, in section 7, I discuss the relevance of the presented work with respect to the debate on representations, and in section 8, I draw my conclusions.
2. Minimally cognitive agents and representations
The dynamical analyses of a minimally cognitive agent exhibiting categorization behavior presented in Beer (2003b) are truly impressive: a brilliant example of the application of the dynamical systems approach to cognition. Nonetheless, some perplexities remain about the conclusions that Beer wants to draw on the issue of representations. Several commentators on Beer's target article questioned the relevance of Beer's work on the grounds that the analyzed model was too simple, and that the same kind of analysis might not scale up to more complex agents (e.g., Clark, 2003; Edelman, 2003). While there might be some truth in this kind of criticism, as I will show later, just asserting that more complex cognitive skills need to be supported by representations is simply begging the question. The need for representations, as Beer rightly argues in his response (Beer, 2003a), is something which must be demonstrated, not established a priori.
But even if one agrees to play Beer's game, an even more compelling perplexity arises: Do Beer's evolved agents really not possess any representation, or have representations simply not been looked for? In fact, although Beer's declared strategy was to ‘‘evolve agents on tasks that are rich enough to be representationally interesting, then examine whether or not these agents actually use representations in their operation’’ (Beer, 2003a, p. 304, emphasis added), Beer's analyses did not include such a careful examination. While a great deal of effort went into providing an impressive amount of dynamical analyses, Beer never explicitly and concretely tried to assess whether representational content might be ascribed to his agent's internal states. Although apparently odd, this is quite typical of researchers in the adaptive behavior community who are critics of representations. Most probably, this is because these critics usually consider the notion of internal representation too vague and ill-defined to support such an investigation (Harvey, 1996). Beer seems to share this concern when he says that ‘‘‘representation’ is far too malleable a label’’ (Beer, 2003b, p. 237) and that ‘‘...this methodology will only work if advocates of particular sorts of representation are willing to specify clear criteria for identifying them’’ (Beer, 2003a, p. 304).
Notwithstanding the perplexities of the representational skeptics, close scrutiny of the debate over representations reveals not only that definitions of representation have indeed been given but also that there is a significant amount of consensus with respect to the basic idea. In fact, all the participants in the debate agree that not all internal states are representations (one of the skeptic's most pressing concerns), and that the simple fact that an internal state correlates with some external feature is not sufficient either (I use the term “feature” to refer to any possible content of a representation: objects, categories, events, actions, contexts...). Rather, for an internal state to be a representation, the correlation must be functional to the agent's behavior: in other words, the internal state must contribute to the agent's behavior in a way that is adaptive to what it correlates with. For example, Clark claims: ‘‘...standing-in requires not mere correlation, but adaptively or functionally intended correlation’’ (Clark, 1997b, p. 464, emphasis added). In the same spirit, Bechtel says: ‘‘...reliable covariation is not sufficient... the additional component is that a representation has as its function the carrying of specific information’’ (Bechtel, 1998, p. 298, emphasis added). Surprisingly, Beer himself seems to be aware of this consensus as he states: ‘‘The basic idea is that these internal representations stand in for (re-present) external things in a way that both carries a coherent semantic interpretation and plays a direct causal role in the cognitive machinery’’ (Beer, 2003b, p. 237, emphasis added). In my view, all these characterizations clearly convey the same basic idea, and they are also detailed enough to make the search for internal representations in a simple model agent feasible. Hence, it is time to complete what Beer has left undone; that is, to assess whether minimally cognitive agents actually use internal representations.1
I replicate the categorization experiments presented in Beer (1996, 2003b) with some modifications to irrelevant details (regarding the network's constraints and the genetic algorithm). The agent has a circular body with a diameter of 30 and is situated in an environment of size 400 × 275 (Fig. 1A). The agent's ‘‘eye’’ consists of seven rays of maximum length 220 uniformly distributed over a visual angle of π/6. The intersection between a ray and an object causes an input to be injected into the corresponding sensory node with a magnitude inversely proportional to the distance from the object. When the rays are at their maximum length, no input is injected, while the maximum input (1) is injected for rays of zero length. The agent can move horizontally as objects fall from above, with horizontal velocity proportional to the sum of the opposing forces produced by two motors (maximum speed = 5). The behavior of the agent is controlled by a network with seven input units (ray sensors) sending connections to three fully recurrent internal units, which in turn send connections to the two motor units (Fig. 1B). All neurons, including the input ones, are leaky integrators; their activation (oi) is updated according to the following equation:

oi(t) = τi oi(t − 1) + (1 − τi) σ(Σj wij oj(t − 1) + θi),   with σ(x) = 1/(1 + exp(−x)),

where τi is a time constant, θi is the bias (or the input injected to the sensor in the case of input neurons), and wij is the weight from neuron j to neuron i.
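To make the update rule concrete, the following is a minimal Python sketch of one discrete-time leaky-integrator step. The exact integration scheme (a leaky mixture weighted by τ, with a logistic output function) is an assumption, as the numerical details are not spelled out here, and all names are illustrative:

```python
import numpy as np

def logistic(x):
    # Standard logistic squashing function, mapping any real number to (0, 1).
    return 1.0 / (1.0 + np.exp(-x))

def update_activations(o, w, theta, tau, ext_input):
    """One discrete-time leaky-integrator update for all neurons at once.

    o:         current activations (one entry per neuron)
    w:         weight matrix, w[i, j] = weight from neuron j to neuron i
    theta:     biases (for input neurons, the injected sensory input)
    tau:       time constants, assumed normalized in [0, 1]
    ext_input: extra external input (zero for non-input neurons)
    """
    net = w @ o + theta + ext_input
    # Each neuron keeps a fraction tau of its old activation and moves
    # the remaining (1 - tau) toward the squashed net input.
    return tau * o + (1.0 - tau) * logistic(net)
```

With τ close to 1, a neuron changes slowly (long memory); with τ close to 0, it tracks its net input almost reactively.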
The parameters of the neural controllers are evolved using a standard genetic algorithm. All the weights, biases, and time constants are encoded in the genotype of evolving individuals as 8-bit strings. Weights and biases are normalized in [−5,5]; time constants are normalized in [0,1]. Agents are evolved for their ability to catch circles and avoid diamonds. Each individual is tested for 50 trials, half of which include a circle and half a diamond. In each trial, the agent starts from the middle of the environment and an object falls straight down with a random initial horizontal offset in the range [−50,50] (with respect to the agent) and a vertical velocity of 3. Circular objects have a diameter of 30 and diamonds have a side of 30.
The fitness of individuals is computed as F = (1/N) Σi pi, where N is the total number of trials (50) and pi = 1 − di for a circular object and pi = di for diamonds; di is the horizontal distance between the center of the object and the agent when their vertical separation goes to 0 in trial i, and it is clipped to maxDist and normalized in [0,1]; maxDist is 1.5 times the sum of the radii of the object and the agent (during post-evolutionary tests, I also calculate a discrete trial performance, pdi, which equals 1 if pi > 0.5, and 0 otherwise). The population is composed of 100 individuals. The 20 best genotypes of each generation are allowed to reproduce by generating five copies each, with 3% of their bits replaced with a new randomly selected value (with elitism: i.e., in each generation the best individual of the previous generation is retained without mutations). Evolution lasts 250 generations.
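The fitness computation just described (small final distances rewarded for circles, large ones for diamonds, with the distance clipped and normalized) can be sketched as follows; the function names are illustrative, not from the original code:

```python
def trial_performance(d, is_circle, max_dist):
    """Continuous performance p_i for one trial.

    d:         horizontal agent-object distance at the end of the trial
    is_circle: True for a circle (to be caught), False for a diamond
    max_dist:  clipping bound, 1.5 * (object radius + agent radius)
    """
    d_norm = min(d, max_dist) / max_dist  # clip and normalize to [0, 1]
    return 1.0 - d_norm if is_circle else d_norm

def fitness(distances, circle_flags, max_dist):
    # Mean performance over all trials: F = (1/N) * sum(p_i).
    perfs = [trial_performance(d, c, max_dist)
             for d, c in zip(distances, circle_flags)]
    return sum(perfs) / len(perfs)

def discrete_performance(p):
    # Post-evolutionary discrete measure: 1 if p_i > 0.5, else 0.
    return 1 if p > 0.5 else 0
```

With a body radius of 15 and a circle radius of 15, maxDist would be 1.5 × 30 = 45.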
The basic results of the target experiments were successfully replicated: Over 10 evolutionary runs, the best evolved agent reached a fitness of 0.999 on the 50 evaluation trials, and a mean performance of 0.9936 when tested on 1,000 new randomly generated trials (discrete average performance = 1). This is an almost perfect performance, which is even slightly higher than that of Beer's best evolved agent, which reached an average of 0.9708 on 10,000 test trials. As the focus of the present paper is on searching for internal representations, I will not provide a detailed analysis of the agent's behavior. Suffice it to say that, behaviorally, the strategy of our best evolved agent looks qualitatively very similar to that of the agent analyzed in Beer (2003b), being based on an initial foveation of the falling object, an active scanning of it through back-and-forth movements, and then a decrease in the amplitude of the scanning until circles are centered or, in the case of diamonds, a large avoidance movement is produced.
4. In search of the representational grail
Now that we have an embodied agent that is able to categorize objects thanks to its dynamical interactions with the environment, we want to assess whether the agent's internal states can be characterized as representations. To do that, we must operationalize the basic, well-accepted notion of representation that we have discussed in section 2. According to such a notion, to qualify as representational, an internal state must (a) tend to correlate with some relevant (for the agent) feature of the environment and (b) contribute to the agent's behavior in a way that is functional (adaptive) to the feature with which it correlates. Fortunately, the simplicity of the experimental set-up makes the task of operationalizing these two requisites relatively easy.
As all the agent has to do is catch circles and avoid diamonds, and as the agent's simple neural network cannot perform very complex perceptual computations, the most plausible (if not the only) features that our agent might represent are just circles and diamonds. Hence, for checking criterion (a) we must look at whether any of the agent's internal states do indeed reliably correlate with circles and diamonds. The agent's control system is composed of the input, hidden, and output units and of the connection weights (linking input-to-hidden, hidden-to-hidden, and hidden-to-output). The inputs constitute the agent's sensory system that is directly activated by the objects in the environment, and the outputs constitute the agent's motor system that directly affects behavior. Hence, inputs and outputs constitute the interface between the system and the environment and cannot be considered part of its internal machinery. The connection weights are what determine the behavior of the system, given the input received from the environment. As they never change during the whole “life” of the agent, they do not correlate with anything, and hence they cannot represent anything. The only things that are internal to the network and that do change during the agent's interactions with the environment are the activations of the hidden units. Hence, for checking criterion (a) we need to look at the activations of the three hidden units and check whether they correlate with circles and diamonds (as the pattern of the activations of the three hidden units is the only thing that is internal to the agent and that changes through time, I will refer to that pattern at each point in time as the “internal state” of the agent at that point in time).
Fig. 2 shows the trajectories of the internal state of the best-evolved agent during 50 tests, half of which involve interacting with circles and the other half with diamonds. While individual trajectories during different interactions with the same kind of object vary depending on the object's initial offset, the agent's internal state at the end of the trial does correlate very well with the category of the object with which the agent has interacted (and with the action that the agent has performed: catch or avoid). In fact, the vector of the activations of the hidden units is very similar every time the agent caught a circle (the centroid of the end points of the trajectories with circles is <0.0000,0.0002,0.8741>) and very similar every time the agent avoided a diamond (in this case, the centroid's coordinates are <0.8190,0.0186,0.7542>); but the centroids in the two cases are quite different, in particular, with respect to the activation of the first hidden unit.
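The correlation check behind criterion (a) reduces to grouping the end-of-trial hidden activations by object category and comparing their centroids; a minimal sketch (the data layout and names are assumptions):

```python
import numpy as np

def category_centroids(final_states, labels):
    """Centroid of end-of-trial hidden activations for each category.

    final_states: (n_trials, n_hidden) array of hidden-unit activations
                  recorded at the end of each trial
    labels:       per-trial category labels, e.g. 'circle' / 'diamond'
    """
    states = np.asarray(final_states, dtype=float)
    labels = np.asarray(labels)
    # Average the final states of all trials belonging to each category.
    return {cat: states[labels == cat].mean(axis=0)
            for cat in np.unique(labels)}
```

A large distance between the two centroids, relative to the within-category spread, is what licenses treating the end states as correlates of the two categories.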
Now that we have discovered a significant correlation between the patterns of activation of the hidden units at the end of the trial and the category to which objects in the environment belong, we have a plausible candidate for a representation, that is, the centroid of the trajectories’ end-points for each category. To check whether these internal states can be considered real representations, we must ensure that this correlation is functionally significant, as required by criterion (b) above. How can criterion (b) be operationalized in a simple embodied system like ours? As the requirement is that the candidate representation must contribute to the agent's behavior in a way that is adaptive with respect to what it is supposed to represent, the most straightforward way to operationalize this criterion is to impose the candidate representation in the agent's “brain” and check whether the resulting behavior is functional with respect to what is supposed to be represented. In particular, I implement this by imposing the candidate representations before the interaction with the object has started (i.e., at the beginning of the trial). If the resulting behavior is (immediately) adaptive with respect to the supposedly represented feature (in our case, the object's category), then the criterion is met, and we can say that we have found “real” representations; otherwise, the criterion is not met, and we have discovered that our agent does not represent anything after all, as Beer suggested. Note that, in line with the intuitive notion of representation that we are trying to operationalize here, this second criterion for representationhood is purely behavioral and does not make any assumption with respect to what should be happening inside the system. The internal state might be a point attractor that never changes throughout the trial, it might follow a cyclic trajectory, or it might (at least in theory) even change chaotically.
As far as our criterion for representationhood is concerned, the only thing that matters is that the internal state leads to an external behavior that is appropriate with respect to the object category that it is supposed to represent (either circle or diamond).
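Operationally, this test amounts to initializing the hidden units to the candidate representation and then running the trial as usual; a sketch under the assumption that one simulation step (network update plus agent-environment physics) is wrapped in a single function, with all names illustrative:

```python
import numpy as np

def run_trial_with_imposed_state(step_fn, imposed_state, n_steps,
                                 n_sensors=7):
    """Run one trial with the hidden units clamped at the start.

    step_fn:       assumed callable (hidden, sensors) -> (hidden, motor),
                   wrapping one network update and the trial physics
    imposed_state: candidate representation (e.g., a category centroid)
    """
    hidden = np.array(imposed_state, dtype=float)  # imposed, not zeroed
    sensors = np.zeros(n_sensors)  # all other units start, as usual, at 0
    motor_trace = []
    for _ in range(n_steps):
        hidden, motor = step_fn(hidden, sensors)
        motor_trace.append(motor)
    return motor_trace
```

Criterion (b) is then met only if the behavior produced from the imposed state is adaptive with respect to the supposedly represented category.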
Hence, I ran 50 random tests (half with circles and half with diamonds) in which I initialized the activations of the internal neurons of the agent to the centroid of the final patterns of the category of the object with which the agent has to interact (in this way, the agent is provided with the candidate representation of the correct category a few time steps—about 10—before the falling object can impact the sensors). All other units are initialized, as usual, to 0. The results of such a test are quite interesting. Imposing the candidate representations onto the agent before it can interact with the object does not help the agent. On the contrary, average performance is disrupted to chance level: 0.5 for both the continuous and the discrete measure. In fact, although starting with different internal states, the agent's behavior is almost identical in both circle and diamond trials: the agent always moves right, thus always producing an avoidance behavior irrespective of the category of the falling object (thus scoring 1 with all the diamonds and 0 with all the circles). The reason is that even though the regions in the internal state space correlating with the two categories of objects seem to correspond to two attractors, from which the internal dynamics tend not to move (Fig. 3), the attractor correlating with circles does not, by itself, make the agent behave in a way which is appropriate to circles, that is, staying still while centering the falling circle. Rather, it is only when the internal state is coupled with the appropriate agent–environment dynamical interactions that the whole system composed of the state, the agent, and the environment produces the appropriate behavior. Hence, the states correlating with the presence of circles cannot be considered a representation of circles, as they satisfy only one (the first) of the two criteria for representationhood.
At this point, a staunch defender of representations might claim that even if we have not found a representation of “circles,” at least we have found a representation of ‘‘diamonds.’’ This would be misleading because there is substantial agreement in the literature that for an internal state to be a representation, it must be part of a system of representations in which different states represent different things. If this is not the case, then the use of the notion of representation is of no value. Hence, we are forced to conclude that the internal states of our model do not have representational value. In fact, as they cannot appropriately guide behavior independently of their appropriate coupling with the agent–environment interactions, we cannot ascribe them any coherent semantic interpretation.
5. Representational “hungriness”
Now that we have assessed that no representational role can be ascribed to our minimally cognitive agent, a doubt can still come to mind (acknowledged by Beer himself): Is the chosen task sufficiently representationally hungry (Clark & Toribio, 1994) to ‘‘fully engage questions regarding the utility of representational thinking’’ (Beer, 2003b, p. 239)? Or is it too simple to be relevant for the debate, as some of Beer's critics (e.g., Clark, 2003; Edelman, 2003) suggest? With respect to this important issue, Beer seems to betray his own methodological rigor. In fact, he addresses the issue at the same level as his critics, that is, with verbal arguments and by appealing to intuitions (Beer, 2003a,b), instead of trying to ground the debate on concrete modeling work. In fact, even though we currently have no way to measure “representational hungriness,” at least we can set a clear, operational threshold for what cannot be said to be representationally hungry in any meaningful sense: A task is surely not representationally hungry if it can be solved by agents that do not possess internal states.
In the spirit of the minimally cognitive behavior approach, with our operationalized threshold, we can start to ground the debate on (minimal) representational hungriness on concrete simulations rather than on intuitions and verbal arguments. To do this, I re-ran the same evolutionary experiments described above with the only difference that in this case the agents’ neural networks have no hidden units, and the seven input units are directly connected to the two output units. As the only thing that is “internal” (in between the inputs and the outputs) to such a network is the connection weight matrix, which does not change during the agent's “life” and hence cannot correlate with anything, the agents of this simulation have no internal states: hence, they cannot, by definition, have internal representations.
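The controller of these reactive agents reduces to a direct sensor-to-motor mapping through a fixed weight matrix; a minimal sketch (names illustrative; in keeping with the set-up described earlier, the motor units are assumed to still be leaky integrators, but they are interface units, not internal states):

```python
import numpy as np

def logistic(x):
    return 1.0 / (1.0 + np.exp(-x))

def reactive_motor_step(sensors, w, theta, tau, prev_motor):
    """One update of a purely reactive controller.

    sensors:    seven ray-sensor activations
    w:          fixed 2 x 7 sensor-to-motor weight matrix (it never
                changes, so it cannot correlate with anything)
    prev_motor: previous activations of the two motor units
    """
    net = w @ sensors + theta
    return tau * prev_motor + (1.0 - tau) * logistic(net)
```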
The results of these simulations are quite illuminating: Over 10 evolutionary runs, the best-evolved agent reached a fitness of 0.97 on the 50 evaluation trials. The average performance of this agent on a test with 1,000 new randomly generated trials was 0.9027, with a discrete average performance of 0.93. This means that the agent miscategorizes the object in only 7% of the trials. All errors happen with circles, in particular, when they start falling far to the left of the agent. In those cases, the agent simply ignores the object: It moves right from the very beginning, immediately loses “visual” contact with the object, and ends the trial with an erroneous “avoid.” Every time the agent interacts with the object, which happens 93% of the time, it correctly categorizes it, catching circles and avoiding diamonds.
From these results, it is clear that the task cannot really be considered suitable to assess the utility of representational talk, as it can be solved by an agent that has no internal states and, a fortiori, cannot have any internal representations. Note that this does not mean that we have to raise the threshold for considering a behavior cognitively interesting so that categorical perception does not qualify. Categorization is surely one of the most (if not the most: Harnad, 2005) fundamental cognitive phenomena, and as our agents are able to produce a minimal form of categorization, they do qualify as minimally cognitive. But if agents that cannot possess representations are able to solve the task, this means that the task, although relevant for cognitive science, is not relevant for the representational debate. In fact, this is another confirmation of one of the main contributions that embodied cognition research has provided to cognitive science, namely, the realization that at least some of the problems which, from a classical stance, might seem to require the use of internal representations can in fact be solved by relying on purely sensory-motor, non-representational strategies (Brooks, 1991; Nolfi, 1998; Pfeifer & Scheier, 1999). As discussed above, to be relevant not only for cognitive science in general, but for the debate on representations in particular, a task should at least be such that it cannot be solved by agents without internal states. Again, this does not mean that we have to stop discussing the issue until we manage to evolve agents that are able to perform extremely complicated tasks requiring, for example, abstract thinking and problem-solving abilities or the use of language, as Beer's critics seemed to suggest. Even though it is always difficult to establish a priori whether a task is solvable by purely sensory-motor strategies, we can use our evolutionary simulations to find this out “empirically.”
To do so, we just need to find the simplest modification of the task we have used so far that makes it pass the operational threshold we have just set for representational hungriness; that is, we need to modify it so that it cannot be solved by agents that have no internal states. One possible way of achieving this might consist of making the task require some (minimal) form of memory: If the task requires the agent to “remember” the object category even after it has stopped its dynamical interactions with the object, it seems that an agent without internal states cannot solve it. A task with similar characteristics can be obtained by just slightly changing the behavior that is required of the agent: Instead of avoiding diamonds and catching circles, we just need to ask the agent to move as far leftward as possible with circles and as far rightward as possible with diamonds. In this way, once the agent has categorized the object and moves in the appropriate direction, the object disappears from sight. In such a situation, to keep moving correctly, the agent seems to require some form of memory of which kind of object it has interacted with.
To “empirically” test whether such a simple modification would indeed be sufficient for making the task more representationally hungry (according to our operational threshold), I ran another set of simulations in which the agents’ neural networks have no internal units and the categorization task is changed accordingly. In particular, in this new experiment, everything runs as in the previous simulation except for the fitness function. In this case, the contribution to fitness in each trial i is computed as pi = 1 − di for a circular object and pi = di for diamonds, where di is the normalized distance between the agent and the left wall of the environment when the vertical separation between the agent and the falling object goes to 0. The results of the simulations confirm our analysis. Indeed, purely reactive agents are not able to evolve a successful strategy: Over 10 evolutionary runs, the best-evolved agent reached a fitness of 0.675 on the 50 evaluation trials, and a mean performance of 0.5985 on a test with 1,000 new randomly generated trials. The agent's behavior has little to do with categorization: It almost always moves leftward, though it sometimes centers the object when the object falls from the left. As a result, the discrete average performance of this agent is 0.56, meaning that the agent's ability to categorize objects is only slightly above chance level.
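Under the natural reading of the modified fitness (circles reward ending near the left wall, diamonds reward ending near the right wall), the per-trial contribution can be sketched as follows (function name illustrative):

```python
def leftward_performance(d_left, is_circle):
    """Per-trial performance for the modified task.

    d_left: agent's distance from the left wall at the end of the trial,
            normalized in [0, 1] (0 = at the left wall, 1 = at the right)
    """
    # Circles reward leftward movement, diamonds reward rightward movement.
    return 1.0 - d_left if is_circle else d_left
```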
It is important to bear in mind that this modification to the task for increasing its representational hungriness is not the only possible one; it is only the simplest and most straightforward that could do the required job. In particular, although I made the task more representationally hungry by making it more memory hungry, this must not be taken to imply that there is a special, necessary relationship between the need for representations and the need for memory. Indeed, there may very well be tasks that do not require memory but that are nonetheless representationally hungry, and, vice versa, tasks that require some form of memory but do not require representations. Neither our operational criteria for representationhood nor the operational threshold for representational hungriness makes any reference to the need for memory. It just happened that, in this particular case, adding a need for memory to the task was sufficient for making it pass the operational threshold for representational hungriness.
6. Representations, at last
The slightly modified task just described is considerably more relevant for a discussion about representation than the original one, as it seems to require the possession of internal states to be solved (of course, it is possible that other kinds of neural networks without hidden units but with, for example, a memory of past motor states might be able to solve the task; nonetheless, if we do not provide this kind of information to the network, the previous simulation shows that the task requires internal states). So I ran a final set of evolutionary experiments in which the neural network is that of the original task (with three fully recurrent internal units), but the task is the slightly modified one, that is, going to the left for circles and to the right for diamonds. Agents provided with internal states are able to efficiently solve the task: Over 10 evolutionary runs, the best-evolved agent reached a fitness of 0.923 on the 50 evaluation trials, and a mean performance of 0.8985 on 1,000 new randomly generated trials (discrete average performance = 1).
Now we can apply the two tests for assessing the presence of representations to the best-evolved agent in this task. The results are shown in Figs. 4 and 5. As in the original experiment, the internal state of the agent always ends up in one of two very small regions of its state space depending on the object the agent has interacted with (Fig. 4). But, contrary to what happened with the agent of the original experiment, in this case, the internal state correlating with the object's category consistently guides the agent's behavior in a way that is appropriate to what it correlates with (i.e., the category). If we impose this internal state before the agent has interacted with the object (i.e., at the beginning of a trial), the internal state is maintained constant during the whole trial (Fig. 5). As a result, the agent always produces the appropriate behavior: moving leftward in the case of circles and rightward in the case of diamonds. Indeed, the performance in these tests is even significantly better (0.9568) than in the normal tests (0.8985). The reason is clear: As in this case the agent “knows” from the beginning of the trial which object is present, it can directly produce the appropriate behavior without wasting time in actively scanning the object to discover its category, as it needs to do in normal trials. Hence, the two attractors in the agent's internal state space satisfy both criteria for representationhood: They (a) correlate with a feature in the environment that is relevant for the agent's survival and reproduction (i.e., the category of the falling object) and (b) guide the agent's behavior in a way that is adaptive with respect to that feature (i.e., moving leftward or rightward).
It is important to clarify that the fact that the internal states that do qualify as representations are maintained constant throughout the trial is contingent on this particular case, and such constancy should not be considered as necessarily related to representationhood. Our operational criteria for representationhood do not mention any stability requirement. Hence, there may well be internal states that are not enduring but that qualify as representations and, vice versa, it is certainly possible to have enduring internal states that do not possess the requisites for qualifying as representations. Indeed, the states that were tested for representationhood in the original task in section 4 happened to be maintained relatively constant during the trial (see Fig. 3), but they did not pass the test for representationhood.
Having discovered that our agent solves its task by relying on internal representations, we can use this knowledge to make predictions about the agent's behavior. For example, we can predict that if the agent “thinks” that a circle is present in its environment (i.e., if its internal state is the one representing circles), it will behave accordingly, irrespective of whether a circle is actually present. I tested this prediction by running another set of 50 trials in which I imposed, at the beginning of each trial, the internal state representing the wrong object. The prediction was verified: in 100% of cases, the agent behaved in a way that was appropriate to the category of the (wrongly) represented object, and not to the one actually present in the environment. As a result, the average performance was 0.0448 (discrete average performance = 0).
Understanding the role (if any) that the concept of representation might play in the future science of behavior is one of the most important, fundamental problems facing the emerging framework of embodied cognition. To make progress in this endeavor, in the present article I have followed the approach recently proposed by one of the most influential representational skeptics, Randall Beer (Beer, 1996, 2000, 2003b): instead of continuing to argue endlessly on the basis of intuitions and speculations, build artificial, idealized models of agents capable of minimally cognitive behaviors, and then assess whether their internal states can be meaningfully considered to involve “representations.” In this spirit, I looked for the presence of representations in evolved agents that had to solve Beer's well-known categorical perception experiment. To do that, I had to operationalize a fairly standard concept of representation (Bechtel, 1998; Beer, 2003b; Clark, 1997b), according to which an internal state can be considered a representation if (a) it tends to correlate with some relevant (for the agent) feature of the environment, and (b) it contributes to the agent's behavior in a way that is functional with respect to the feature with which it correlates. While I was able to find states satisfying (a), criterion (b) was not met, thus supporting Beer's view that no representations could be found in these agents. On the other hand, the successful replication of the same experiments with agents whose controller did not possess any internal state (and hence could not have internal representations by definition) questioned the relevance of the chosen task for a fair assessment of the representational issue. A simple modification of the original task made it unsolvable by agents with no internal states, and consequently much more suitable for our purposes.
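For concreteness, the two criteria can be rendered as a simple two-part test. The sketch below is a hypothetical rendering: the thresholds, the single-unit correlation measure, and the `behave` interface are illustrative choices of mine, not the procedures actually used in the experiments. Criterion (a) is checked as a correlation between recorded internal states and the object's category; criterion (b) by imposing each category's typical state and checking whether the resulting behavior is category-appropriate.

```python
import numpy as np

def is_candidate_representation(states, categories, behave, correct_behavior,
                                corr_threshold=0.9, perf_threshold=0.75):
    """Operational test for representationhood (illustrative thresholds).

    states: (n_trials, n_hidden) array of end-of-trial internal states.
    categories: (n_trials,) array of 0/1 category labels.
    behave: function mapping an imposed internal state to a response label.
    correct_behavior: dict mapping category -> appropriate response label.
    """
    states, categories = np.asarray(states), np.asarray(categories)
    # Criterion (a): the internal state must correlate with the category,
    # taken here as the best single-unit point-biserial correlation.
    corrs = [abs(np.corrcoef(states[:, i], categories)[0, 1])
             for i in range(states.shape[1])]
    correlates = max(corrs) >= corr_threshold
    # Criterion (b): impose each category's typical (mean) state and
    # check that the produced behavior is appropriate to that category.
    hits = 0
    for cat in (0, 1):
        imposed = states[categories == cat].mean(axis=0)
        hits += behave(imposed) == correct_behavior[cat]
    guides = (hits / 2.0) >= perf_threshold
    return correlates and guides
```

A state that satisfies only (a), as in the original experiment, fails this test: correlation without behavioral guidance is not enough.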
The assessment of the presence of representations in agents provided with internal states and able to solve the modified task gave positive results: in this case, I found internal states that not only correlated with features of the environment but also guided the agent's behavior in a way that was functional with respect to those features, thus meeting both of the above-mentioned criteria for representationhood.
A few points still need to be clarified. First, the operationalization of the concept of representation I have provided is surely very limited and simple, although it is in line with what is commonly done both in neuroscientific research and in neural network modeling. When looking for internal representations in real brains, the vast majority of neuroscientists look at correlations between external events and the frequencies of spikes (firing rates) of neurons, which is just what the activation of an artificial neuron corresponds to. There are other, complementary proposals according to which some information in the brain might be encoded by the timing of single spikes, or by the synchronization between neurons or groups of neurons (see, e.g., Singer, 1999; VanRullen, Guyonneau, & Thorpe, 2005). Although this is certainly an interesting possibility that merits further investigation, such coding cannot have any counterpart in neural network models like the one used in this study, in which neurons are firing-rate units and do not produce spikes. Furthermore, the idea of looking at the patterns of activation of the hidden units for candidate representations in a neural network is in fact the standard approach in connectionist research, at least since the seminal paper by Rumelhart, Hinton, and Williams (1986a). Indeed, in a classical connectionist network, where the network's output patterns are produced in response to single input patterns, the network's “behavior” crucially depends on the patterns in the hidden units.
For this reason, if a network behaves correctly, one can be almost certain that internal representations can be found, because the activation patterns in (sub-sets of) the hidden units need to (a) correlate with something relevant, that is, the category of the input that determines what the appropriate response is, and (b) contribute to driving the output of the system in a manner that is appropriate to what they correlate with, because the appropriate response depends on just those patterns. Taking embodiment into account makes two critical differences with respect to the issue of representations. First, the presence of an internal state that would be classified as a representation can no longer be taken for granted, because in an embodied system behavior crucially depends not just on internal states but (also) on the dynamical interactions between the system and its environment. This is the ground on which proponents of embodied cognition and/or of the dynamical approach have based their skeptical challenge to the notion of representation. Second, in an embodied system there is no longer a single internal state that corresponds to a single input and a single response. This is why, to search for internal representations, we needed to look at the whole sequence of internal states produced during the agent's interaction with the environment and to try to find a sub-set of such states that correlated with something relevant in the environment. The way we did this was quite simple. Several researchers have been trying to develop more sophisticated methods for understanding the causal relationships between the internal dynamics of a neural network and the resulting behavior (e.g., Aharonov, Segev, Meilijson, & Ruppin, 2003; Hope, Stoianov, & Zorzi, 2010; Keinan, Meilijson, Ruppin, Hilgetag, & Meilijson, 2003; Seth, 2008). In particular, Hope et al. (2010) proposed what they call the “behavioral manipulation method” (BMM) and applied it to a version of Beer's categorical perception model also used in the present study. Such a method offers a statistically justifiable way of identifying the hidden units that are causally significant with respect to the network's behavior, and thus it represents a more principled (even if much more complex and computationally expensive) way to interpret the network's states than the one used in this study. However, the goal of the present study was not to propose the best possible way to study the internal activity of an embodied neural network, but rather to check whether something that could meaningfully be called “representations” could be found in a minimally cognitive model agent. With respect to this goal, and notwithstanding its limits, my operationalization of the notion of representation has at least four undeniable merits: (1) it is simple, (2) it is not vacuous, (3) it is in line with the standard, pre-theoretic notion of representation, and (4) it can be used for assessing the presence (or absence!) of representations in embodied agents. Hence, my operationalization has all the characteristics that are necessary for playing its role in the debate on representations: that is, preventing any future complaint that the notion of representation is too vague and malleable to be of any use in a rational debate (e.g., Beer, 2003b; Cliff & Noble, 1997; Harvey, 1996; Haselager et al., 2003).
Second, the notion of internal representation I have discussed in this article is not only meaningful and applicable but also predictive, and hence it can be profitably used for understanding an agent's behavior. For example, in the last simulation I successfully tested the prediction that the agent would behave according to its internal representation of objects irrespective of whether the object is actually present in the environment. This kind of test demonstrates another feature that is considered characteristic of representations, namely, the possibility of having wrong representations: ‘‘it seems an essential aspect of representation that misrepresentation is possible’’ (Bechtel, 1998, p. 298; see also, for example, Cummins, 1989; Millikan, 1984).
Third, it is probably worth clarifying the relationship between the broad notion of representation that has been used in this study and narrower notions. For example, Clark and Grush (1999) distinguish between a weak and a strong sense of representation (see also Clark, 1997b): for a state to qualify as a representation in the weak sense, only adaptive functional correlation is required, while representations in the strong sense are ‘‘inner items capable of playing their roles in the absence of the on-going perceptual inputs that ordinarily allow us to key our actions to our world’’ (Clark & Grush, 1999, p. 9). This stronger notion is in line with the idea that representations are related to the internal tracking of things that are currently absent (Haugeland, 1991; Smith, 1996). Clark and Grush introduced the distinction to discriminate between the kind of inner states that might be present in systems like the one I analyzed in the present study and the inner states of an emulator in the control-theoretic sense: that is, states that constitute predictions of the perceptual consequences of the system's actions. This notwithstanding, it is interesting to note that to make my agent develop internal representations (in the weak sense), I modified the original task so that the agent had to behave appropriately to the object's category even when not directly perceiving the object. This ruled out a purely sensory-motor reactive strategy: in fact, the category of the perceived object had to be internally stored so that the agent could keep producing the appropriate behavior after the object had exited the agent's visual field. In other words, the function of my agent's representational states is precisely to play the role of stand-ins for the category of the perceived object when the object is no longer providing any perceptual input to the agent.
Hence, our minimalist simulations seem to demonstrate that it is possible to have an agent possessing ‘‘full-blooded’’ representations (in the strong sense of Clark and Grush) without having coupled inverse and forward models, as the emulator theory of representation suggests (Grush, 2004). To clarify my position: I agree that there are good reasons to believe that something like emulators developed in real animals for motor control, and that they might also play a fundamental role in higher-level cognitive activities (Barsalou, 1999; Grush, 2004; Hesslow, 2002; Pezzulo, 2008), but the usefulness of the (broader) concept of representation is not limited to the cases in which emulations (or simulations) are present.
Fourth, the previous issue is related to another general point that is worth clarifying. Most accounts of representations involve reference to some kind of “cognizer” to which representations must be delivered for their representational status to be effective. This is consistent with the intuitive notion that the representational status of a given entity crucially depends on the ‘‘consumer’’ of the representation: the same object, action, or event can represent different things (including nothing) to different agents. It is on this ground that representational skeptic Inman Harvey stated that ‘‘when talking of representation it is imperative to make clear who the users of the representation are’’ (Harvey, 1996). How does our approach deal with the problem of identifying the user of the representation? The first thing to consider is that if the suggested ‘‘cognizer’’ is taken too literally (as is typically done), it just begs the question about representationhood: in its cognitive activities, the cognizer would need to manipulate its own representations, thus requiring another sub-cognizer to which its own representations have to be delivered, and so on. If one wants to escape an infinite regress, there must be a final “user” of the representation that is simple enough not to contain any representation (the potential problem of infinite regress in cognitive explanations is well known in the philosophy of mind; for a valuable discussion see, for example, Dennett, 1978).
In this respect, a simple three-layer neural network like the one used in this work can be conceived of as being composed of two sub-systems: the first, consisting of the connection weights linking the input to the hidden units plus the recurrent connections in the hidden layer, is the part of the system that ‘‘produces’’ the representations; the second, consisting of the connection weights linking the hidden to the output units, is the part of the system that “uses” the representations produced by the first. This way of framing the problem is interesting because, on the one hand, it satisfies the intuition that representations require a user (and thus responds to Harvey's concern); on the other hand, it demonstrates that this requirement does not necessarily lead to an infinite regress of homunculi inside the cognitive system.
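The producer/user decomposition can be made concrete with a toy implementation. The class names, weight shapes, and tanh activation below are my own illustrative choices, not part of the original model: the point is only that the hidden-to-output mapping (the “user”) is stateless, so it consumes representations without being able to harbor any of its own, which is what blocks the regress.

```python
import numpy as np

class Producer:
    """Input-to-hidden plus recurrent weights: builds the internal state
    from sensory input and its own history (the representation producer)."""
    def __init__(self, w_in, w_rec):
        self.w_in, self.w_rec = w_in, w_rec
        self.h = np.zeros(w_rec.shape[0])
    def step(self, x):
        self.h = np.tanh(self.w_in @ x + self.w_rec @ self.h)
        return self.h

class User:
    """Hidden-to-output weights: consumes the internal state. It holds no
    state of its own, so it cannot itself contain representations."""
    def __init__(self, w_out):
        self.w_out = w_out
    def step(self, h):
        return self.w_out @ h
```

A full controller is then just the composition `user.step(producer.step(x))`: the user is a fixed read-out, simple enough to end the chain of “consumers.”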
Fifth, I think that the exercise of looking for representations in a principled way has proved quite instructive and rewarding. For example, it has permitted us to verify once again that there are cognitively interesting tasks, like the original categorization task, that can be solved by relying only on purely sensory-motor reactive processes (for previous demonstrations of this point see, for example, Floreano, Kato, Marocco, & Sauser, 2004; Harvey, Husbands, & Cliff, 1994; Nolfi, 1997; Scheier et al., 1998). It has also allowed us to discover that other tasks, like the modified task involving memory, are much less tractable with purely sensory-motor strategies (on this point, see also Mirolli, Ferrauto, & Nolfi, 2010; Nolfi, 2002). Furthermore, and most important, our exercise has revealed another interesting fact: agents endowed with internal states that have to solve the first kind of simpler task tend to rely on sensory-motor strategies and not to develop internal representational states. On the other hand, the results of our last simulations suggest that agents dealing with more complex tasks that cannot be solved by relying on purely sensory-motor strategies might indeed tend to develop internal states to which we can ascribe a representational status. However, it is important to remember that regarding the question of which kinds of tasks require which kinds of abilities, we can only provide partial indications of general tendencies: definite conclusions and general rules can seldom, if ever, be drawn. To clarify, my simulations demonstrate that, in this particular case, making the task more memory-hungry resulted in making it more representationally hungry. This by no means implies that all and only the tasks that require some form of memory need internal representations to be solved.
As discussed above, artificial life/evolutionary robotics research has already sufficiently demonstrated that our intuitions about the space of possible solutions to a given task can be completely wrong, and that whether a task requires representations (and which kind of representations) cannot in general be established a priori. The only thing we can do is to keep investigating, both through real experimentation and through simplified computational models, which kinds of tasks tend to require which kinds of abilities. In this respect, our simulations do suggest that memory-hungry tasks tend to be representationally hungry. But clearly, much more work has to be done to find out which kinds of tasks tend to promote or even require the development of internal representations and which do not. And I think that the kind of idealized models that have been used in this study constitute very appropriate tools for investigating these kinds of issues.
Finally, it is useful to briefly clarify my general position with respect to the standard account of representations in Cognitive Science. The view of cognition that is emerging from embodied, situated, dynamical approaches strongly contrasts with the old view that cognition is to be understood as the manipulation of the kind of objective, discrete, abstract, amodal, rule-governed representations assumed by classical cognitive science. Indeed, even though it is possible that some aspects of cognitive activity might involve the use of internal representations that have some of the features of the representations assumed by classical cognitive science, as argued, for example, by Markman and Dietrich (2000a,b), I think that the best way to understand high-level human cognition is to investigate how lower level sub-symbolic cognitive functions are transformed by the internalization of ‘‘cognitive tricks’’ mediated by language. The idea, originally developed by Vygotsky in the 1930s (Vygotsky, 1978) and recently re-proposed in cognitive-science-oriented philosophy of mind (Clark, 1997b, 2006; Dennett, 1991, 1993) and in several areas of psychology (e.g., Gentner, 2003; Spelke, 2003; Tomasello, 2003), is that human cognition depends on the internalization of the social interactions that a developing child has with adults and more skillful peers. The behavior of the child is continually influenced by social interactions, typically mediated by language, that help the child in solving all kinds of (cognitive) tasks: linguistic social aid helps the child in learning how to categorize experiences, to focus attention on important aspects of the environment, to memorize and recall useful information, to inhibit inappropriate spontaneous behavior, to construct plans for solving complex tasks, and so on.
During development, the child learns to rehearse the social linguistic aid when faced with similar tasks all alone, and then, when the linguistic self-aid has been mastered, private speech is internalized, thus becoming inner speech. If this analysis is correct, human cognition just seems to be constituted by language-like symbol manipulation because it is the result of the transformation of sub-symbolic embodied and dynamic processes by internalized linguistic social aids (for more detailed discussions of these ideas, see Mirolli & Parisi, 2009, 2011; for preliminary modeling works along these lines, see Cangelosi & Harnad, 2000; Clowes & Morse, 2005; Lupyan, 2005; Mirolli & Parisi, 2005a,b, 2006; Schyns, 1991; Steels, 2003b).
The main goal of the present study was to show that the new embodied cognitive science is not in contrast with the idea of internal representations per se, but only with the symbolic representations that were prototypical of classical cognitive explanations. If this is the case, as I hope I have demonstrated, what must be done to further develop the new embodied cognitive science is to stop quarreling about the necessity of the concept of representation in general and to focus on the development of new ideas and theories regarding the kinds of representations that embodied systems might need to use for solving their cognitive tasks. Indeed, new ideas should be developed to re-think all aspects of representationhood.
I posit that this challenging but exciting endeavor will require fruitful collaboration among open-minded philosophers, psychologists, neuroscientists, and computational modelers.
In what follows, even when I use the term ‘‘representation’’ without further qualification, I actually mean ‘‘internal representation,’’ where by “internal” I simply mean ‘‘in between the input and the output of a system.’’ This is just to clarify that here we are not interested in external representations like paintings, writings, signs placed in the environment, and so on, nor in cases in which an embodied agent uses ‘‘the world as its own best model’’ (Brooks, 1990). Here we are exclusively concerned with the psychological notion of ‘‘representation’’ as an internal stand-in for external reality. Hence, an agent that has no internal states, like the one that will be described in section 5, cannot, by definition, have representations in our sense.
I thank Domenico Parisi, three anonymous reviewers, and the Editor for their comments and suggestions, which helped to substantially improve the manuscript. I also thank Andrew Szabados for his help with the linguistic editing of the manuscript.