Does empirical evidence support perceptual mindreading?

According to perceptual accounts of mindreading, we can see, rather than cognize, other people's mental states. On one version of this approach, certain mental properties figure in the contents of our perceptual experiences. In a recent paper, Varga has appealed to empirical research to argue that intentions and emotions can indeed be seen, rather than cognized. In this paper, I argue that none of the evidence adduced to support the perceptual account of mindreading shows that we see mental properties, as opposed to non-mental properties.

A version of this approach has recently been defended by Varga (2020). According to Varga, some mental properties are observable properties, similar to colour, texture, etc., and there is evidence in favour of the perceivability of those properties. If we accept this way of formulating the perceptual account of mindreading, the debate between proponents and opponents of such an account becomes a debate about the permissible contents of perceptual experience, that is, about whether we can experience mental high-level properties in perception.[2] In other words, according to this version of the view, perceptual mindreading amounts to an extension of the reach of perception to include mental properties in the list of properties that can be perceived.[3] What is the support for such a claim? Varga appeals to evidence from empirical studies on the detection of animacy and on perceptual adaptation to emotional expressions. His main focus is on the perception of emotional expressions, while the animacy literature is used to provide "some preliminary reasons to suspect that [visual experience] is able to present human bodies as instantiating mental properties like 'being angry'" (Varga, 2020, p. 392). In this article, I will challenge this appeal to research on animacy and on perceptual adaptation to emotional expressions in support of the perceptual account of mindreading. In doing so, I will grant as much as possible to the proponent of the claim and show that even if we accept the controversial view that we perceive high-level properties, this won't vindicate perceptual mindreading.
In what follows, I present the argument used by Varga to defend a perceptual account of mindreading formulated as a thesis about contents (Section 1). I critically present the empirical evidence invoked to support some of the steps of the argument (Section 2). Then I attack one premise in the argument for the perceptual account (Section 3). In conclusion, I show that even if we concede that the empirical data support the claim that we can perceive properties that go beyond low-level properties, nothing guarantees that these properties are mental.

1 | THE ARGUMENT IN FAVOUR OF THE PERCEPTION OF MENTAL PROPERTIES
An inference to the best explanation in favour of the perception of mental properties based on empirical evidence can be formulated as follows:
1. There are empirical effects that seem connected with the detection of some mental properties (intentions and emotions): detection of animacy and adaptation to emotional expressions.
2. The effects are perceptual, because they are automatic, encapsulated, and influence attention in a bottom-up manner.
3. The effects are due to the perception of a high-level property, and not to the perception of a low-level property (or a group of low-level properties).
4. This high-level property is a mental property.
5. The best explanation of the empirical effects is that subjects perceive some mental properties.
Premise (1) is supported by the empirical data that I will review in Section 2. These data on their own are not sufficient to show that some mental properties are perceived, because we also need to rule out alternative explanations. Premise (2) is aimed at ruling out the alternative explanation according to which the effects are not perceptual, but post-perceptual, or cognitive. To rule out this alternative, Varga appeals to some features that are usually associated with perception, and in particular automaticity, encapsulation, and attentional effects. While there are worries associated with this premise, I will set them aside in this article.
The focus of this paper is on Steps (3) and (4). I want to show that even if we concede as much as possible to the proponent of perceptual mindreading and assume that (3) is right, we still can't establish (4), that is, we cannot establish that the observational properties experienced in perception are psychological or mental properties (e.g., intention or emotion). My main argument is that there are high-level properties that are not mental properties and that could equally well account for the effects.

2 | THE DATA
The two main experimental effects used in support of perceptual mindreading are visual detection of animacy and adaptation to emotional expressions. The brief discussion of the experiments will show that, while there is little doubt that there is something perceptual going on, we can't conclude much about the contents of the perceptual experience from the experiments alone.
In empirical studies on detection of animacy, subjects look at displays with moving geometrical figures. In these studies, psychologists try to uncover perceptual cues that trigger an experience of animacy. One such cue is "chasing," when a geometrical figure is reliably detected as pursuing another one on the basis of some perceptual cues, such as directionality and chasing subtlety (Gao, Newman, & Scholl, 2009). Another cue of animacy is the "wolfpack effect" (Gao, McCarthy, & Scholl, 2010), where instead of a lone geometrical chaser, subjects detect a group of objects moving in a coordinated manner. Psychologists rule out alternative explanations in terms of domain-general detection of a spatio-temporal relation: Subjects can immediately detect a chaser in a group of randomly moving figures, even if the movements share low-level dynamical properties. The perceptual cues are very powerful attention-grabbers. The presence of chasers influences the subjects' performance even when their movements are irrelevant to the task. Yet, changes to the perceptual cues (such as a change in chasing subtlety) can easily disrupt the experience.
In the case of emotions, support comes from studies on perceptual adaptation to emotional expressions, an effect well known for low-level properties such as orientation (Gibson & Radner, 1937). For emotional expressions, adaptation works in the following way: subjects are presented with a sad face (or a happy face) for a certain amount of time; afterwards, they are presented with a neutral face (one that does not express a specific emotion) and are asked to make a judgement, prompted by a question about the display. Subjects report that the (objectively neutral) face appears sadder if they adapted to a happy face, and happier if they adapted to a sad face (Webster & MacLeod, 2011).
According to Varga, the best explanation of these phenomena is that we perceive mental properties. The defence of this interpretation proceeds in two steps. The first step consists in showing that these effects involve a high-level property by ruling out alternative explanations that appeal to low-level properties. The second step is to show that the high-level property in question corresponds to a mental property (such as an intention or an emotion).
In the case of intentions, Varga tries to rule out low-level property explanations by appealing to how subjects differentiate animacy from mechanical movements. To support the second step, Varga's argument is based on the idea that in order to perceive chasing one has to perceive geometrical figures as entities endowed with mental states, such as intentions and desires ("perceiving chasing requires perceiving chasers," p. 391).
For emotional expressions, some experiments on adaptation suggest that variations in the low-level properties of faces (such as colours, shapes, etc.) do not interfere with high-level adaptation to emotional expressions, and, in some cases, adaptation can even transfer between a body and a face, which plausibly share very few low-level properties[4] (Palumbo, D'Ascenzo, & Tommasi, 2015). While this rules out interpretations in terms of low-level properties, it is not sufficient to support the second step. However, Varga does not argue for the second step in the case of emotional expressions, and I suspect the reason is that seeing emotions simply seems like the most obvious explanation of adaptation effects.

3 | PROBLEMS FOR THE ARGUMENT
A closer look at the empirical evidence suggests two possible interpretations: a mental one and a non-mental one. The mental interpretation is that we perceive mental properties. The non-mental one is that we perceive a high-level property that is not mental. Both options agree that these effects cannot be accounted for by low-level properties alone. In this section, I will defend the non-mental interpretation. If I'm right, all that the argument in Section 1 has established is that there are rich perceptual contents in the animacy and adaptation cases, but not that the rich content represents a mental property. To clarify, the worry does not concern what makes a property "mental," nor how best to define "mental." It's relatively uncontroversial to count intentions and emotions as mental states. The question is whether their corresponding categorical properties (being an intention, being an emotion) can be seen.
I first want to rule out an objection: subjects in these experiments use mentalistic terms to describe what they detect. Contra this objection, we shouldn't take people's reports at face value. There are two worries about subjects' reports. The first is that, when given the freedom to simply describe what they "see," subjects might fail to distinguish between reporting their perceptual experiences and reporting the post-perceptual thoughts triggered by those experiences. While "chasing" might be a good candidate for being perceived, "being a nasty triangle"[5] stretches the boundaries of plausibility. The second is that, when their reports are constrained by the type of questions they are asked, the labels might provide them with categories that do not mirror their perceptual experiences, even if they aptly categorize the displays. In other words, when exploring the "murky zone between categorical perception and post-perception" (Bayne & Montague, 2011, p. 23), we need to replace reports with a close examination of the possible interpretative options, in order to find the best one.
Having blocked the objection from first-person reports, in what follows I show, case by case, starting with animacy and then proceeding to emotions, that an alternative high-level property can account for the effects.
Let's start with animacy: as mentioned in the introduction, Varga uses data from animacy as a preliminary case to support perceptual mindreading. In his appeal to animacy, Varga takes for granted that the representation of chasing cannot be separated from that of being propelled by psychological states ("mindedness"). Assuming this, he argues that if the subjects in the experiments had perceptual experiences that did not represent the relevant objects as propelled by such states, they would be more likely to describe their experiences in terms of mechanical movements than in terms of chasing.
I think that there are two problems with this suggestion. First, what the subjects report might mirror the contents of their perceptual beliefs rather than the contents of their perceptual states. Second, an alternative non-mental high-level property could be the one perceived in these experiments: goal-directed movement. These two critiques are connected: the first suggests that there might be a non-mental property whose detection by the perceptual module gives rise to an automatic ascription of mentality by the viewer to the objects in the display. This opens up the need to explore alternative non-mental properties that could be the ones perceived. The second point then identifies a candidate non-mental property, which could underlie subjects' impressions and ascriptions of mentality.
To support the first point, we might want to explore the possibility that subjects can see the chasing display as being of a different type from the mechanical movement display, even in the absence of an attribution of mindedness to the geometrical figures. In this case, observers could still experience the movement of the objects as not being mechanical, but this would not depend on seeing the entities as minded. In other words, one could perceptually detect animacy cues (goal-directedness, the wolfpack effect, etc.) without perceptually detecting "mindedness." Is there any evidence for this? Some studies on people with autism suggest such a possibility. Observers both with and without autism detect chasing in a visual chase paradigm, suggesting no relevant difference in their perceptual experiences (Vanmarcke, van de Cruys, Moors, & Wagemans, 2017). But people with autism are impaired in spontaneous attributions of mental states and mindedness to geometrical figures in a Heider and Simmel type of display (Klin, 2000), which suggests that, while they might experience the difference between animated entities and mechanically moving entities in perception, they don't assume that the former entities are "minded." As a result, they are impaired in attributions of mindedness and mental states to the geometrical figures, but not in the perception of "animacy." Further research is needed to establish definitively whether individuals with autism differ with respect to their attributions of mental states to entities exhibiting animacy cues, but the existing evidence suggests the possibility that perception of animacy cues can be dissociated from attributions of mental states (or of "mindedness" more generally).
That chasers are (normally) entities endowed with psychological states is part of our folk theory about minds. But perceptual contents might not mirror our folk theories. Varga himself grants that encapsulation (from central cognition or from other modules) is a feature that accompanies perceptual processes (Premise 2 of the argument in Section 1). Encapsulation could explain why we distinguish between animated and mechanical movement without assuming that the chasing movement of the chaser is due to the possession of psychological states. An alternative compatible with the evidence is that our perceptual judgements are formulated on the basis of the automatic assumption that chasing movements can only be performed by minded entities, while the contents of perception are the outputs of modular processes that distinguish between animated and non-animated movements without such an assumption.
How could this work? Take as an analogy essentialist thinking (Gelman, 2003): when seeing a living being, such as an animal, we might have an automatic activation of a theory that treats living beings as endowed with an "essential property" that causes their observable properties (their phenotypes). Indeed, very young infants show such patterns of thinking, even before they have a full-blown conceptual understanding of how natural kinds work. The automatic mechanism that treats living beings as endowed with "something" causing them to exhibit their observable properties is not a precondition on detecting them. Similarly, seeing perceptual cues for animacy might elicit the automatic attribution of mindedness, but mindedness is not necessary for detecting animacy.
A second problem with the appeal to data from animacy is that it fails to distinguish between perceiving a mental property and perceiving a high-level property that is not mental. In the case of animacy, the non-mental high-level property that could enter into the contents of perception is "being a goal-directed movement." A goal-directed movement can be analysed into a relation between perceivable components, corresponding to the type of elements one typically finds in the displays used in psychological experiments that rely on an operationalization of the notion (Gergely & Csibra, 2003).[6] These components are: a visually presented target (goal), a spatiotemporal path to the goal, an object moving along the path to the goal, and visually presented situational constraints, such as obstacles (if there are any). In the psychological literature, "goal" is usually used in a deflationary sense, to refer to the outcome of a visible trajectory, whether or not the movement towards the outcome is triggered by a mental state.[7] Goals, understood this way, are just visible objects, locations, or states of affairs that are the end-points of a movement (Csibra, 2008; Smortchkova, 2020).
Perceiving a goal-directed movement, as opposed to an intentional action, is closely related to adopting a teleological stance as opposed to an intentional stance (Gergely & Csibra, 2003). The teleological stance allows for the representation of actions by relating the different components of the display (movement, goal, situational constraints) via a principle of efficiency (which roughly corresponds to "take the shortest path to the goal"). An object might be perceived as moving in a goal-directed fashion, for example, when it takes a straight line to reach a point when no obstacles are present, whereas it "jumps" above an obstacle when one lies across its path to the end-point. Crucially, seeing a movement as an efficient means to reach a goal that is visually present doesn't require the detection of mindedness: all that is needed is sensitivity to the presence of the type of visual configuration just described and to the principle of efficiency. The latter does not depend on the attribution of psychological states to the observed entities. It is applied when relevant visual cues (self-propelledness, equifinality of paths towards one end-point, etc.) are present.[8]

In the end, what the preliminary argument establishes at most is that our default post-perceptual interpretation of these displays includes a defeasible assumption that entities exhibiting animacy cues (which, as we have seen in Section 2, can be multiple) are "minded" entities, that is, entities guided by psychological states. If the content of the perceptual state in seeing chasing is "goal-directed movement," being "minded" need not be part of the contents of perception.
Let's turn to emotions. I would like to suggest a competing property whose experiential representation would provide an at least equally good explanation of the data. This competing property is valence, and the competing explanation is that we experience valences rather than emotional categories (such as sadness or happiness). Valence indicates the "goodness" or "badness" of an emotion. Emotions with positive valence include joy and love, while emotions with negative valence include anger, fear, and disgust. While by no means perfect, such a classification corresponds to cross-cultural and cross-societal ways of characterising emotions[9] (Barrett, 1998). Valence is clearly a high-level property. First, it is not a property carried by sensory transducers. Second, valence cannot be reduced to low-level properties, because those low-level properties would have to be common to facial expressions in the same valence category (say, angry and fearful faces), but a simple look at an angry face and a fearful face suffices to show that they share very few low-level properties.
How could we arbitrate between the emotional category option and the valence category option? I have three arguments in favour of the valence option. The first is a broad methodological point about the kind of questions asked in experiments on adaptation. The second is based on categorical effects in adaptation. The third is based on ontogenetic data.
The first, methodological, point is that in the majority of the experiments on adaptation the subject is prompted with ready-made categories: they are asked either "does the face look sad?" or "how sad does the face look?" on a scale. This creates a confound: it is unclear whether their judgement about the display is based on the perceptual experience of the emotional expression, or on the perception of either low-level properties or of a high-level property that is not the target emotion. Of course, this observation does not by itself support the valence option, but it casts doubt on the emotional category option by pointing out that the categorical property is provided to the subjects. This suggests a general point: it's unclear how to distinguish, for cases of perceptual adaptation to emotional expressions, between perceptual effects and effects that might depend on the subject's judgement.
All the adaptation experiments involving emotional expressions involve emotions that belong to opposite valence categories. For example, after adapting to a happy face, a neutral face is classified as sadder or angrier; after adapting to a sad face or an angry face, a neutral face seems happier. What happens when stimuli are in the same valence category? In this case, the results become muddier: adaptation to disgust biases perception away from disgust, adaptation to fear biases perception towards disgust, adaptation to anger and disgust biases perception away from anger, while adaptation to anger has no effect on disgust or fear (Pell & Richards, 2011). This indicates that adaptation works well for emotions in opposite valence categories, but not for emotions in the same valence category. Such results are compatible with the hypothesis that valence categories, rather than discrete emotions, are perceived.
Finally, if we look at development, young infants have two initial categories for emotional expressions: good and bad. It takes time for them to develop full-blown emotional categories that sort out and distinguish between facial expressions within the same valence category (Widen, 2013). The limits of children's reasoning about emotions can be used as a guide to explore the contents of perception as opposed to the contents of post-perceptual psychological states. For creatures lacking sophisticated ways of thinking about emotions, facial expressions are detected as valences rather than as discrete emotions. This makes it likely that facial expressions are perceived as valences even by typical adults.
The possibility that we experience, for example, negative valence, as opposed to a specific emotion, undermines the relevance of the rich content view for perception-based accounts of mindreading. While experiencing valence is an example of perception of a high-level property, it falls short of constituting a case of perception-based mindreading of emotions.
In summary, I have shown that the appeal to empirical data about the detection of animacy and adaptation to emotional expressions does not establish the distinct claim that we perceive mental properties, as long as there are competing alternative explanations involving non-mental (high-level) properties. Even if the data do establish that we perceive some high-level properties that are not mental, this is not the vindication of perceptual mindreading that its proponents are looking for.

| NOTES
[1] Perceptual accounts of mindreading go back to Wittgenstein (1953) and McDowell (1978), among others, and versions of this approach have been recently defended by Gallagher (2008), Toribio (2015), Newen (2017), and Krueger (2018). While these approaches defend a (broadly) similar claim, they differ substantially in their arguments and background assumptions.
[2] A high-level property is a property that falls outside the cluster of properties usually associated with perceptual contents (orientation, shape, colour, motion…). The claim that we perceive high-level properties (of any kind) is of course controversial, but in this paper, I set aside the more general debate about the reach of perceptual content (see Hawley & Macpherson, 2011 for an overview) to concentrate only on the version of the debate focussed on perceptual mindreading. Some philosophers who have defended the rich content view (outside the context of mindreading) are Bayne and McClelland (2019) and Siegel (2009).
[3] In this paper, I assume the representationalist framework for perception (for the debate about the contents of perception, see the papers in Brogaard, 2014 and Siegel, 2016).
[4] This experiment uses the property "gender" (looking feminine or masculine) and not emotion, but, on the assumption that adaptation to high-level properties is a unified phenomenon, the result can be applied to emotions as well.
[5] Since there is no principled way of drawing the boundary between low-level and high-level properties, there are no actual limits on the kinds of properties that are candidates for entering the contents of perception.
[6] I am thankful to an anonymous reviewer for inviting me to clarify this point. The reviewer suggests that we need a criterion to distinguish mental from non-mental properties. One reason is that the distinction between goal-directed actions and intentions is not tenable if we conceptualise intentions not as internal, causal variables but as spatio-temporal relations. A defender of perceptual mindreading might indeed attempt to defend their view by drawing on this non-standard conception of intentions. However, this is not the view of intentions defended in Varga, 2020, so examination of this line of defence falls outside the scope of the present paper. Moreover, even if one endorses a view of intentions that rejects the standard Davidsonian view in favour of a view inspired by Ryle (Curry, 2018), it's not obvious that intentions are represented by mindreaders as mere spatio-temporal relations, since mindreaders also think of intentions as a way of gaining insight into the reasons for action and the personality of the person they mindread.
[7] A non-deflationary reading of "goal" is one on which a goal corresponds to an intention or a desire, as in "My goal is to bake a two-tiered cake."
[8] Note, however, that for Gergely and Csibra the teleological stance is a type of reasoning about visual displays, and is not introduced to describe the contents of perception. Even so, in Smortchkova (2020) I argue that humans possess a capacity for non-mentalistic detection of actions, thanks to which goal-directed movements are perceived rather than thought about. According to this account, the principle of efficiency is one of the perceptual constraints embedded in the modular non-mentalistic action detection system rather than a principle one thinks with. This position is supported by experimental data on the neural correlates of action perception and understanding, on perceptual learning, and on adaptation to visual displays representing goal-directed movements.
[9] For the purposes of this paper, I'm simplifying the story about emotions. Emotions don't only have a valence; they also have an arousal level, which indicates the intensity of the emotion. Since the topic is whether emotional expressions can be perceived, rather than what emotions are and what their components are, I bracket these complexities.