Does it matter how you deny it?: The role of demeanour in evaluations of criminal suspects




In some cases of wrongful convictions, demeanour seen as inappropriate can trigger suspicions of guilt. Two experiments systematically manipulated the demeanour of criminal suspects in interrogations to test its impact on guilt ratings.


In Experiment 1 (N = 60), participants saw a videotaped interrogation in which the suspect displayed flat demeanour or emotional demeanour. Before viewing the interrogation, participants were told that normal reactions to trauma consisted of either flat or emotional demeanour. In Experiment 2 (N = 147), the presence of the suspect's coerced confession and demeanour evidence were both manipulated.


In Experiment 1, a suspect who displayed flat demeanour during the interrogation produced higher ratings of guilt than did a suspect who displayed emotional demeanour, especially when participants were told to expect emotional demeanour. In Experiment 2, without a confession, flat demeanour inflated guilt ratings, whereas emotional demeanour slightly (but non-significantly) decreased guilt ratings compared with a no demeanour information condition. When a confession was introduced, guilt ratings increased for all groups, with the highest ratings in the emotional demeanour condition.


Flat demeanour biases judgments against defendants. On its own, emotional demeanour is neutral (or potentially exonerating), but when paired with a confession, it becomes just as incriminating as flat demeanour. Recommendations for educating police professionals on the wide range of appropriate reactions to trauma are described.


Although the testimony at trial showed that the defendant was upset and agitated that morning, the combination of emotions which one would think he should have been displaying, such as overwhelming grief, fear, panic, bewilderment, did not appear to be present.

Judge Stephen Braslow, in New York v. Tankleff, Motion for Vacatur of Judgment of Conviction/New Trial, March 17, 2006, p. 18.

Judge Braslow is not alone in factoring demeanour into evaluations of culpability and punishment. Judges report looking for emotional displays in statements of responsibility when considering sentence reductions (Everett & Nienstedt, 1992; as cited in Robinson, Smith-Lovin, & Tsoudis, 1994). Indeed, attention to demeanour evidence occurs in justice systems around the world (Blumenthal, 1993). In one Australian case, a mother was convicted of murdering her child after being evaluated as too ‘stoic’ in the wake of the child's disappearance (Chamberlain v. The Queen, 1983; Salekin, Ogloff, McFarland, & Rogers, 1995, p. 294). In a survey of Florida jurors, one third cited the defendant's demeanour as a factor in their decision to recommend the death penalty (Geimer & Amsterdam, 1988). In another case, even the US Supreme Court recognized that flattened demeanour resulting from psychotropic medications can ‘prejudice all facets of the defence’ because the defendant could not display the behaviour he claimed was typical for him at the time of the crime (Riggins v. Nevada, 1992; see also Commonwealth v. Louraine, 1983; for an analysis of this case, see Geller & Appelbaum, 1985). The two experiments reported here focus on how demeanour displayed during an interrogation affects judgments of criminal suspects, supplementing prior research that has only focused on demeanour as displayed by defendants or witnesses at trial. Below, we describe the case that inspired the empirical questions tested in these experiments.

In 1988, Marty Tankleff, then 17 years old, found his parents bludgeoned in their Long Island, New York home. Tankleff became the prime suspect, in part, because his demeanour at the scene of the crime was not consistent with what police assumed he should display. According to a CBS news report (2008), the lead detective, James McCready, recalled, ‘He was sitting as calm as could be, with his hands clasped…’ (p. 1). Pressed further on the demeanour he had expected Tankleff to display, Detective McCready said, ‘Oh, I think he would have been crying, I think he would have been shaken, been very upset’ (CBS, 2008, p. 1). During Tankleff's police interview, the police lied about his father's medical status, telling him that his father had awoken from a coma and named him as the attacker. In reality, his father never regained consciousness and died several weeks later from his injuries. However, based on the strength of this accusation from his father, Tankleff confessed. He never signed the confession and almost immediately recanted it. Nevertheless, on 29 June 1990, he was convicted of murdering his parents. All charges against Tankleff were eventually dismissed in July 2008 – 18 years later (for a detailed analysis of this case, see Kassin, 2006).

In the long struggle to clear his name, Tankleff and his family repeatedly encountered people who equated his demeanour with guilt. Indeed, most people agree that the only appropriate response to grief is significant, visible distress (Robinson et al., 1994; Wortman & Silver, 1989). In direct contrast, clinical research shows that responses to bereavement vary widely (e.g., Flynn & Norwood, 2004; see also Stroebe, van den Bout, & Schut, 1994), including reactions wherein distress is not apparent (Wortman & Silver, 1989, 2001). In a comprehensive review of reactions to bereavement, this point is clear: ‘the assumption that distress is inevitable … has resulted in its absence being treated as pathological, even if there is no objective reason to assume this to be true’ (Wortman & Silver, 1989, p. 355). As Heath (2009) notes, actual guilt is only one of several possible explanations of unemotional suspects or defendants. Other possible explanations provided by Heath include: when ‘one generally is subdued’, brain damage, medication, or the stress inherent in being accused of a crime (p. 320).

Because expectations about reactions to trauma are so strongly held, individuals who fail to display the expected visible distress risk increased scrutiny from police. This occurs because individuals expect a ‘fit’ between an event's seriousness and the intensity of emotional responses to it (Rose, Nadler, & Clark, 2006). For example, research investigating rape victims' emotional displays and perceptions of their credibility consistently demonstrates a link between emotional expressiveness and perceived credibility among lay individuals (Calhoun, Cann, Selby, & Magee, 1981; Kaufman, Drevland, Wessel, Overskeid, & Magnussen, 2003; Nadler & Rose, 2003) and police investigators (Bollingmo, Wessel, Eilertsen, & Magnussen, 2008). In a criminal context, violating expectations is likely to increase suspicions of guilt, in part, because these violations create arousal (Burgoon, 1993). According to Burgoon and Hale (1988) this arousal ‘directs [attention] towards the source of the arousal – the initiator of the violation’ (p. 62). Even in mundane circumstances, people who see unexpected behaviours generally interpret them as signalling deception. In one clever demonstration, actors were judged as deceptive when displaying odd behaviours (e.g., holding their arms forward, closing their eyes) while describing an acquaintance (Bond et al., 1992).

Investigators' conclusion that an expectancy violation suggests guilt is consistent with the fundamental attribution error, the tendency to attribute others' behaviours to dispositional factors without fully crediting situational factors (Gilbert & Jones, 1986; Ross, 1977). In the context of emotional displays in a criminal investigation, this bias can explain why investigators are likely to interpret a suspect's verbal and non-verbal behaviours as cues to guilt rather than as products of the stress of the situation (e.g., Heath, 2009; Kassin & Sukel, 1997).

Because all later phases of criminal justice proceedings build upon the information and perceptions generated during initial stages of an investigation, non-verbal ‘violations’ that suggest guilt could easily lead to ‘tunnel vision’ in which investigators filter all evidence in a case through the lens of their theory of the crime (e.g., Findley & Scott, 2006; Heath, 2009). When tunnel vision occurs, information that supports a chosen theory is deemed probative, whereas all evidence that is inconsistent with that theory is dismissed as irrelevant or unreliable (e.g., Hill, Memon, & McGeorge, 2008). This bias has significant consequences, including shaping the investigator's questioning tactics, resulting in an abundance of guilt-presumptive questions directed at the suspect (Kassin, Goldstein, & Savitsky, 2003; see also Meissner & Kassin, 2004). The case of Marty Tankleff makes clear that inappropriate assessments of emotion are possible ‘instigators’ of misguided prosecutions (Heath, 2009, p. 327).


The primary question for this experiment was how participants would react to a suspect whose behaviour does not conform to society's expectations for trauma victims (i.e., his demeanour is flat) and whether explicit information about appropriate reactions to trauma would moderate that pattern. Most research on this topic concerns defendants or victims (not potential suspects) who display flattened demeanour, typically in the context of witness statements or testimony provided in court. In those cases, research outcomes are inconsistent. In one study, children who testified in a calm or hysterical state were less persuasive than children who were moderately emotional (i.e., ‘teary’, Golding, Fryman, Marsil, & Yozwiak, 2003; see also Regan & Baker, 1998). In another, demeanour of the victim while making an impact statement had no effect on sentencing judgments of mock jurors (Myers, Lynn, & Arbuthnot, 2002; but see Ask & Landstrom, 2010). In yet another, videotaped testimony only produced changes in judgments of a female defendant: perceptions of her guilt increased when her demeanour was flat or high, but not when it was moderate (Salekin et al., 1995). Finally, a similar study found that a female defendant who displayed flat versus emotional demeanour was perceived as guiltier, especially when evidence strength was low (i.e., as operationalized by having a fingerprint expert testify to ‘an extremely low probability’ that the prints from the crime scene matched the defendant and a ‘very small possibility’ that the defendant fired a gun; Heath, Grannemann, & Peacock, 2004, Experiment 2, p. 644). Subsequent analyses indicated that reactions to defendant emotion were mediated by evaluations of the defendant's honesty: stronger emotional displays suggested that the defendant was being more honest.

The current research examined reactions to a suspect's demeanour observed during the initial stages of an investigation rather than during testimony in court. Participants watched a fictitious videotaped interrogation of a high school male who just found his parents murdered. Two variables were manipulated. First, participants' expectations regarding appropriate behaviour were manipulated by reading about the suspect's behaviour at the crime scene when the police first interviewed him. Half were led to believe that flat demeanour (i.e., shock) was an appropriate response to trauma; half were led to believe that emotional demeanour (i.e., crying) was appropriate. Second, the suspect displayed either flat or emotional demeanour during a subsequent videotaped interrogation viewed by participants. The resulting design was a 2 (Flat Demeanour Expectation vs. Emotional Demeanour Expectation) × 2 (Emotional Demeanour Suspect vs. Flat Demeanour Suspect) fully randomized, between-subjects factorial.

We expected that the suspect would be rated as more guilty when his demeanour was flat compared with emotional because people expect emotional displays in response to loss (e.g., Bond et al., 1992; Wortman & Silver, 1989). We also expected an interaction between the expectation and suspect demeanour variables. In particular, we predicted that the flat expectation manipulation would mute the effect of the suspect demeanour variable because it would encourage participants to see flat demeanour as a reasonable reaction to trauma. Therefore, we expected participants to produce a smaller difference in guilt ratings between the emotional and flat suspect interrogation videos when they were told to expect flat demeanour than when they were told to expect emotional demeanour.



Sixty participants at a small, private north-eastern college in the United States completed this study in exchange for extra credit in their introductory psychology class.


Expectation manipulation

Participants were randomly assigned to read one of two paragraphs describing the case of ‘Mark Dunlevy’. Both paragraphs began with the same basic case information, based closely on the circumstances surrounding the murder of Marty Tankleff's parents:

Mark Dunlevy is a 18-year-old male who lives with his adoptive parents and sister in Long Island, NY. On his first day of high school as a senior, he started his day as he normally would have by showering, getting dressed and going downstairs for breakfast. He found it strange that no one in the house was up yet. So, he decided to go back upstairs to see why his parents were still sleeping. Upon entering their room, he found his mother and father lying dead on the floor having been bludgeoned to death with a blunt object. He immediately responded by calling 911. When officials arrived at the scene, they scanned the house and it appeared that there had been no forced entry or signs of struggle. Because Mark was the only one home, he was the first person interviewed by police.

Following the case summary, participants read one of two descriptions of Mark's behaviour during the initial police interview at the crime scene. Embedded in each description was the expectation manipulation which described Mark's behaviour at the crime scene (either flat or emotional) as typical behaviour for individuals experiencing traumatic events. Based on our factorial design, participants' expectations for reactions to trauma were then reinforced or contradicted in the subsequent interrogation video (described below). In the flat demeanour expectation condition participants read:

[At the initial interview] Mark appeared to be in shock as would be expected in this situation. He displayed behaviors consistent with what is normal for individuals who have been traumatized as he was. A plethora of psychological research confirms that individuals often behave in ways consistent with symptoms of shock shortly after a trauma. Many researchers have tested the immediate effects of trauma on an individual's behavior. Like many, Mark displayed a severe reduction of emotional expressiveness. His responses to answers were delayed and were in a monotonous tone. He displayed few facial expressions and hand gestures, and appeared extremely flat throughout the course of the interview.

In the emotional demeanour expectation condition, participants read:

[At the initial interview] Mark was very emotional. He displayed behaviors consistent with what is normal for individuals who have been traumatized as he was. A plethora of psychological research confirms that individuals often behave in ways consistent with symptoms of shock shortly after a trauma. Many researchers have tested the immediate effects of trauma on an individual's behavior. Like many, Mark was hysterically crying, displayed rapidly changing facial expressions and hand gestures while answering questions. He spoke rapidly in a high-pitched voice.

Interrogation video

After reading the description of Mark's behaviour during the initial police interview at the crime scene (i.e., before the interrogation took place), participants viewed a 150 s video of a staged interrogation performed by paid actors. The interrogation was filmed with a Sony HandyCam video camera (New York City, NY, USA). The camera angle captured two people sitting across from one another at a desk in a non-descript room. Both people sat in straight-backed chairs. The camera was fixed on a tripod and captured profile views of both individuals. The interrogation consisted of the police interviewer asking the suspect questions about his family and the crime. Participants viewed one of two versions of this interrogation video. The only difference between the videos was the demeanour displayed by the suspect. In the emotional suspect demeanour condition, the suspect was visibly distressed and displayed non-verbal behaviour consistent with being distressed. For example, while describing how he found his parents murdered, the suspect sobbed audibly, placing his head on the desk in front of him. In the flat suspect demeanour condition, the suspect displayed no emotion while answering the investigator's questions.

Dependent measures questionnaire

After viewing one of the two interrogation videos, participants completed a 7-item questionnaire, including questions about the suspect's guilt from 1 (not guilty) to 7 (guilty); the likelihood of pursuing Mark Dunlevy as a suspect if the participant were an investigator, from 1 (not likely) to 7 (very likely); the extent to which the investigator was pressuring Mark to respond in a particular way, from 1 (not at all) to 7 (very much); and the appropriate sentence if the suspect were convicted, from 1 (1 year) to 7 (life in prison). The remaining three questions used a scale from 1 (not at all) to 7 (very much). They were as follows: how emotional the suspect was, how flat (emotionally unresponsive) the suspect was, and whether the suspect's behaviour was consistent with the participant's expectations.


Participants signed up for the experiment individually. Upon arrival at the experiment location, participants read and signed an informed consent form. At this point, participants were randomly assigned to condition. Participants then learned that their task was to read a short paragraph describing a recent crime, watch a video of the suspect being interrogated, and answer some questions about the video. When they finished reading their assigned paragraph, participants watched one of the two interrogation videos. After the video was over, participants completed the dependent measures questionnaire; all data were anonymous. Finally, participants were debriefed, thanked for their participation, and dismissed.


Manipulation checks

The interrogation videos successfully manipulated suspect demeanour. Participants who watched the emotional demeanour interrogation rated the suspect as significantly more emotional (M = 4.09, SD = 1.68) than did participants who watched the flat demeanour interrogation (M = 1.44, SD = 0.63), t(58) = 7.97, p < .001, d = 2.28, 95% CI for d [1.42, 2.68]. Similarly, participants who watched the flat demeanour interrogation rated the suspect as significantly more emotionally unresponsive (M = 5.48, SD = 2.08) than did participants who watched the emotional demeanour interrogation (M = 4.03, SD = 1.60), t(58) = 3.04, p < .001, d = 0.79, 95% CI for d [0.27, 1.31].

Is the suspect guilty?

Each of the remaining dependent variables was analysed with a two-way 2 (Emotional Demeanour Suspect vs. Flat Demeanour Suspect) × 2 (Flat Demeanour Expectation vs. Emotional Demeanour Expectation) analysis of variance (ANOVA). On ratings of the suspect's guilt, this analysis revealed two significant effects. Participants who viewed the flat demeanour interrogation rated the suspect as guiltier than did participants who viewed the emotional demeanour interrogation, F(1, 56) = 6.94, p < .01, d = 0.67, 95% CI for d [0.14, 1.18]. This main effect was qualified by a significant interaction, F(1, 56) = 9.11, p < .01, ηp2 = .14, 95% CI for ηp2 [.02, .30]. When participants expected emotional demeanour, guilt was higher for the flat demeanour suspect (M = 4.07, SD = 1.33) than for the emotional demeanour suspect (M = 2.25, SD = 1.13), t(29) = 4.11, p < .01, d = 1.48, 95% CI for d [0.67, 2.27]. In contrast, when participants expected flat demeanour, guilt ratings were equivalent for the flat demeanour suspect (M = 3.14, SD = 1.17) and for the emotional demeanour suspect (M = 3.27, SD = 1.34), t(27) = 0.27, p = .79, d = 0.09, 95% CI for d [−0.62, 0.83] (see Figure 1).
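As a rough check on the simple effects reported above, the following sketch recomputes Cohen's d from the reported cell means and standard deviations, and expresses the interaction as a difference of differences. The function name `cohens_d` is illustrative, and the pooled standard deviation assumes equal cell sizes, which the text does not report, so small discrepancies from the published values are to be expected.

```python
from math import sqrt


def cohens_d(m1, sd1, m2, sd2):
    """Cohen's d using a pooled SD.

    Assumption: equal cell sizes (the per-cell n is not reported),
    so the pooled variance is the simple average of the two variances.
    """
    pooled_sd = sqrt((sd1 ** 2 + sd2 ** 2) / 2)
    return (m1 - m2) / pooled_sd


# Guilt ratings (M, SD) as reported for each cell.
# Emotional expectation: flat suspect vs. emotional suspect.
d_emotional_expectation = cohens_d(4.07, 1.33, 2.25, 1.13)
# Flat expectation: emotional suspect vs. flat suspect.
d_flat_expectation = cohens_d(3.27, 1.34, 3.14, 1.17)

# Interaction as a difference of differences in cell means:
# (flat - emotional suspect | emotional expectation)
#   minus (flat - emotional suspect | flat expectation).
interaction_contrast = (4.07 - 2.25) - (3.14 - 3.27)

print(round(d_emotional_expectation, 2))
print(round(d_flat_expectation, 2))
print(round(interaction_contrast, 2))
```

Under the equal-n assumption, the two effect sizes land close to the reported values of 1.48 and 0.09. The contrast shows that the flat-versus-emotional gap in guilt ratings is nearly two scale points larger when participants expected emotional demeanour.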

Figure 1.

Significant interaction between demeanour expectation and suspect demeanour on guilt ratings in Experiment 1. Bars represent standard error values.

Should the investigator pursue the suspect?

To assess participants' opinions about whether the investigator should pursue the suspect, we conducted another 2 (Emotional Demeanour Suspect vs. Flat Demeanour Suspect) × 2 (Flat Demeanour Expectation vs. Emotional Demeanour Expectation) ANOVA. This analysis revealed no main effects, Fs(1, 56) < 1.86, ps > .18, ηp2s < .03. However, there was a significant interaction, F(1, 56) = 4.24, p = .04, ηp2 = .07, 95% CI for ηp2 [0, .21]. In the emotional expectation condition, participants who watched the flat demeanour interrogation were significantly more interested in pursuing the suspect (M = 5.20, SD = 1.42) compared with participants who watched the emotional demeanour interrogation (M = 3.75, SD = 1.91), t(29) = 2.38, p = .02, d = 0.79, 95% CI for d [0.11, 1.57]. However, participants in the flat expectation condition were equally likely to support pursuing the suspect, regardless of whether they watched the flat or emotional demeanour interrogation, Ms (SDs) = 4.57 (1.40) and 4.87 (1.73), respectively, t(27) = 0.50, p = .62, d = 0.19, 95% CI for d [−0.55, 0.91].


The only significant effect on participants' recommendations for the length of the suspect's sentence was that participants in the emotional expectation condition thought he deserved a longer sentence (M = 5.71, SD = 2.08), compared with participants in the flat expectation condition (M = 4.68, SD = 2.08), F(1, 56) = 5.74, p = .02, ηp2 = .09, 95% CI for ηp2 [0, .18]. No other effects were significant, Fs(1, 56) < 1.88, ps > .18, ηp2s < .03.
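The partial eta squared values reported for Experiment 1 can be recovered from the F ratios and their degrees of freedom using the standard identity ηp2 = (F × df_effect) / (F × df_effect + df_error). The sketch below applies it to the three significant ANOVA effects. The helper name and the dictionary labels are illustrative and are not taken from the original analysis.

```python
def partial_eta_squared(f_value, df_effect, df_error):
    """Partial eta squared recovered from an F ratio and its degrees of freedom."""
    return (f_value * df_effect) / (f_value * df_effect + df_error)


# (F value, reported partial eta squared) for effects tested on F(1, 56).
reported_effects = {
    "guilt: expectation x demeanour": (9.11, 0.14),
    "pursuit: expectation x demeanour": (4.24, 0.07),
    "sentence: expectation main effect": (5.74, 0.09),
}

for label, (f_value, reported_eta) in reported_effects.items():
    computed = partial_eta_squared(f_value, df_effect=1, df_error=56)
    print(f"{label}: computed {computed:.2f}, reported {reported_eta:.2f}")
```

Each computed value agrees with the reported effect size to two decimal places, which suggests the F ratios and effect sizes are internally consistent.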

Remaining dependent variables

No significant effects emerged on ratings of the extent to which the interviewer pressured the suspect or whether the suspect's behaviour was consistent with participants' expectations, Fs(1, 56) < 2.42, ps > .13, ηp2s < .03.


Before watching a brief video of a mock interrogation, participants were led to believe that flat demeanour (vs. emotional demeanour) was typical for someone in the suspect's position. The suspect's demeanour during the mock interrogation was also manipulated to be either flat or emotional. When the suspect displayed flat demeanour, participants thought he was guiltier than when he displayed emotional demeanour. This effect was qualified by an interaction in which participants' expectations about appropriate behaviour moderated the effect of demeanour. In particular, when participants were instructed that flat demeanour was typical, there was no effect of the suspect's actual demeanour on guilt ratings. In contrast, when participants were instructed that emotional demeanour was typical, flat demeanour produced higher ratings of guilt than did emotional demeanour. A similar interaction pattern appeared on participants' judgments of whether an investigator should pursue the suspect.

The interaction pattern on both guilt and recommendations to pursue the suspect can be explained by individuals' pre-existing assumptions regarding appropriate behaviour in response to trauma (e.g., Robinson et al., 1994; Wortman & Silver, 1989) and Bond et al.'s (1992) expectancy violation model (see also Burgoon, 1993). To confirm that people would expect emotional behaviour in a situation mirroring the one used in our experiment, we had a separate set of 15 undergraduates read the initial information about Mark Dunlevy. Participants did not read any information about how Mark behaved upon finding his parents or whether his behaviour was typical. After reading the paragraph, participants indicated their opinion about ‘a normal display of emotion’ under those circumstances from 1 (not at all emotional [flat]) to 7 (highly emotional [heightened]). Based on a one-sample test, participants' ratings were significantly higher than the midpoint of the scale, suggesting that people expect emotional displays in response to events paralleling those used in this experiment, t(14) = 5.33, p < .001.

One feature of our experimental design is that, for some participants, the description of Mark's behaviour at the crime scene was different from what they subsequently saw in the interrogation video. Because the videotaped interrogation immediately preceded the dependent measures questionnaire, this behaviour is what likely drove participants' answers to those questions. Future research should determine whether consistency between descriptions of crime scene behaviour and interrogation demeanour moderates reactions to unexpected (i.e., flat) demeanour. In addition, future research should address whether unexpected demeanour at the crime scene produces different reactions than does unexpected demeanour during an interrogation. For example, people may find flat demeanour at a crime scene to be less of an expectancy violation than during an interrogation because of the former's temporal proximity to the traumatic event.

Unexpectedly, participants in the flat expectation condition recommended a shorter sentence than did participants in the emotional expectation condition. This was the only significant main effect of the expectation variable and seems inconsistent with the higher guilt ratings in the flat demeanour interrogation condition. Although one would expect guilt and sentence recommendations to be positively correlated, they were not related here, r(58) = −.04, p = .79. The unexpected disconnect between guilt ratings and ostensibly related variables is apparent in other patterns of data (cf. alibi believability and guilt ratings, Olson & Wells, 2004). In the current context, it suggests that participants factored other considerations into sentence recommendations beyond mere guilt. For example, perhaps the suspect's youth prompted participants to consider the likelihood of rehabilitation when making a sentence recommendation. If they felt that a prison setting would not facilitate rehabilitation, they might have recommended a light sentence rather than a harsh one.
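To illustrate how weak the reported guilt-sentence correlation is, the sketch below converts r(58) = −.04 into a t statistic with t = r × sqrt(df / (1 − r²)) and then approximates the two-sided p value. The p value uses a standard normal approximation to the t distribution, which is an assumption made here to keep the example within Python's standard library and is not the method used in the original analysis. The function name `r_to_t` is likewise illustrative.

```python
from math import sqrt
from statistics import NormalDist


def r_to_t(r, df):
    """Convert a Pearson correlation to a t statistic with df = n - 2."""
    return r * sqrt(df / (1 - r ** 2))


# Reported correlation between guilt ratings and sentence recommendations.
t_value = r_to_t(-0.04, 58)

# Two-sided p value via a normal approximation (an assumption: with 58
# degrees of freedom the t distribution is close to the standard normal).
approx_p = 2 * (1 - NormalDist().cdf(abs(t_value)))

print(f"t(58) = {t_value:.2f}, approximate p = {approx_p:.2f}")
```

The approximate p value falls in the same range as the reported p = .79. The remaining gap plausibly reflects rounding of r to two decimal places and the normal approximation, although that cannot be confirmed from the text.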

There were no effects on participants' ratings of whether the interviewer pressured the suspect or whether the suspect's behaviour matched their expectations. That participants rated the interviewer's behaviour as consistent across interrogation videos is in line with the fact that the interviewer's behaviour was not intended to vary as a function of either independent variable. In contrast, we expected participants' ratings of whether the suspect's behaviour matched their expectations to serve as a manipulation check of the expectation manipulation. However, it is possible that at least some participants interpreted the question as one about their intuitive reactions to trauma rather than a question specific to the experimental context.


Experiment 1 clearly demonstrated the effect of flat demeanour evidence on ratings of guilt and recommendations to pursue the suspect. However, once the initial investigation has concluded and a suspect is charged with the crime, a jury is often tasked with determining whether they believe the defendant is guilty or not, given the totality of the evidence. Therefore, it is important to examine how lay people evaluate evidence about the suspect's demeanour during the initial questioning when it is embedded in the context of a trial along with other pieces of evidence. In Marty Tankleff's case, the other damning evidence against him was the fact that he confessed to his parents' murders. Therefore, the question of how demeanour evidence and confessions combine is highly relevant. To test the impact of this combination of evidence on guilt ratings, we manipulated the presence of confession evidence in Experiment 2.

One possibility is that a confession contaminates any other evidence, boosting its perceived probative value beyond what is found in a no confession control condition. Evidence for this hypothesis comes from multiple studies showing the dramatic effects of confession evidence on everything from evaluations of a defendant (e.g., Kassin & Neumann, 1997) to eyewitnesses, some of whom are even willing to change their identification decisions when confronted with information that another line-up member confessed (Hasel & Kassin, 2009). This possibility is also consistent with the conjunction fallacy, a heuristic in which people assume the probability of two events occurring together is higher than the likelihood that either of the two events occurred separately (Tversky & Kahneman, 1983; see also Briggs & Krantz, 1992). On the basis of this prior research, we expected that confession evidence would buttress evidence that is perceived as weak on its own. In particular, we expected demeanour testimony to be evaluated as even more damning when combined with a confession than when presented alone.

In addition to manipulating the presence of a confession, we made three other changes to Experiment 1 to create a conceptual replication in Experiment 2. First, instead of learning about the suspect's demeanour through the detective's description of the initial suspect interview, participants in Experiment 2 learned about suspect demeanour at the initial interview through the tone and content of the detective's testimony at trial. Second, we used different operationalizations of the flat and emotional demeanour evidence used in Experiment 1 so as to increase the generalizability of conclusions about demeanour evidence. Finally, to establish a base level of belief in the defendant's guilt with a confession present, we added a control condition in which no information was provided about the defendant's demeanour. The resulting design was a 2 (Confession vs. No Confession) × 3 (Flat Demeanour Testimony vs. Emotional Demeanour Testimony vs. No Demeanour Testimony) between-subjects, fully randomized factorial design.
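The 2 × 3 design can be made concrete by enumerating its six cells and assigning participants to them at random. The sketch below is only an illustration: the text does not describe the randomization procedure, so simple (unblocked) random assignment, the seed, and all names here are assumptions.

```python
import random
from itertools import product

# Levels of the two between-subjects factors described above.
CONFESSION_LEVELS = ("confession", "no confession")
DEMEANOUR_LEVELS = ("flat", "emotional", "none")

# Crossing the factors yields the six cells of the 2 x 3 design.
CELLS = list(product(CONFESSION_LEVELS, DEMEANOUR_LEVELS))


def assign_conditions(n_participants, seed=0):
    """Assign each participant to one cell by simple random assignment.

    Assumption: unblocked randomization, so cell sizes may be unequal.
    The seed is arbitrary and only makes the sketch reproducible.
    """
    rng = random.Random(seed)
    return [rng.choice(CELLS) for _ in range(n_participants)]


assignments = assign_conditions(147)
for cell in CELLS:
    print(cell, assignments.count(cell))
```

Because 147 is not divisible by six, the cells cannot all be the same size under any scheme. A blocked procedure would keep them within one participant of each other, whereas simple randomization allows larger imbalances.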



One hundred and forty-seven undergraduates at a large south-eastern university in the United States participated in exchange for partial course credit.


Participants were randomly assigned to read one of six versions of a trial summary of defendant ‘Ryan Willard’. In each version, participants read a brief sketch of the actual case against Marty Tankleff:

Ryan Willard said he woke up on the first day of his senior year in high school to discover his mother and father brutally stabbed and bludgeoned. His mother, Arlene, was dead. His father, Seymour, was unconscious but alive. Ryan called 911 and then said he tried to give first-aid to his father. When police arrived, led by Detective McCready, they became suspicious of Ryan, reporting that he gave confusing and contradictory accounts of what he had done that morning and previous night. Police also found that there were no signs of a forced entry into the large house, which was in an affluent neighbourhood. According to police, Ryan was stunned when told that his father was still alive. Then he tried to steer the detectives to Dan Hays, his father's bagel-store partner, as being responsible for the attacks. According to Ryan, Hays owed his father money, threatened him, and was the last guest at the Willard home the previous night.

Based on Detective McCready's suspicions, police took Ryan to the Suffolk County Police Headquarters for questioning. Ryan said he did not recall any details of the crime and claimed to be asleep for the whole attack. Police asked Ryan all about his background and his relationship with his parents and urged him to tell the truth. Ryan insisted that he loved his parents and did nothing wrong. Yet during questioning, Detective McCready got Willard to admit that parts of his story were inconsistent. At one point, police led Ryan to believe that they had physical evidence that implicated him in the attack.

Manipulation of confession evidence

At this point, participants in the no confession condition read: ‘Even after being told about that evidence, Ryan did not confess to the crime. He continued to deny that he had or would ever kill his parents’. Participants in the confession condition read:

After being told about that evidence, Ryan lowered his head and started to confess. Summarizing his confession, Detective McCready hand wrote the following statement: ‘Yesterday, I went to the mall shopping. I was supposed to be home early enough to set up the card table for my father and his friends. When I got home about 9:10 p.m. my mother was mad at me because I didn't do it. My father punished me mostly, but my mother had been siding with my father lately. I was angry because my father's partner Dan Hays was going to stay with me when they went to Florida in October. They ruined my summer by not letting me use the boat as much as I wanted. They wanted me to drive the crummy old Lincoln. They were fighting a lot and taking it out on me. When my mother sided with my father about the card table I was really mad. I decided that I wanted to kill them both. I set my alarm for 5:35 to make sure that I would be up before them. I decided to use the barbell when I went to bed. When I got up I was surprised to see the lights still on. When I looked in my parents' room my mother was there. She was sleeping. I went down to the office and saw my father sleeping in the chair. I decided to kill my mother first. I ran across the bed. I got to her quick. I hit her four or five times on the head. She fought me. I went to the kitchen and got a knife. I ran back with the knife. I cut her throat. I don't know how many times but I stabbed at her also. Mostly, I cut her throat and neck. I left to…’.

Detective McCready said that at that point, an attorney for Ryan Willard's family showed up and no further statements were taken. Ryan did not sign the confession. He later said that he was coerced into confessing because of how he was treated and what he was told during the interrogation. Ryan maintained that he was innocent, even though he started confessing during the interrogation.

Manipulation of demeanour evidence

At this point, participants read about Ryan's demeanour during the initial police encounter. In the emotional demeanour condition, participants read:

Both Detective McCready and another detective on the scene testified that Ryan was extremely emotional and distressed. ‘When we arrived, he was crying hysterically, and, whenever he talked with anyone, he spoke rapidly in a high pitched voice and used a lot of hand gestures’, McCready said. ‘It was very obvious that he was upset’. According to the second detective, ‘Considering what happened, and compared to everyone else in the family, he was frantic and distraught, like he might break down at any minute’.

In the flat demeanour condition, participants read:

Both Detective McCready and another detective on the scene testified that Ryan was unemotional and evasive. ‘When we arrived, he was sitting there with his legs crossed and his hands folded over his knees’, McCready said. ‘It struck me as odd that he would be so calm and didn't appear to be upset’. According to the second detective, ‘Considering what happened, and compared to everyone else in the family, he was cool and calm, like he had ice in his veins’.

Dependent measures questionnaire

Participants rated their impression of Ryan's probable guilt (guilty or not guilty) and their confidence in their guilt rating on a scale from 0% (not at all confident) to 100% (extremely confident). They rated the likelihood of the defendant's involvement in the crime on a scale from 0 (extremely unlikely) to 100 (extremely likely). Participants also indicated whether they read testimony from the detectives about the defendant's emotional state when they arrived at the scene of the crime, and if yes, they rated how the detectives described the defendant on a scale from 0 (highly unemotional) to 10 (highly emotional). Finally, participants were asked whether Ryan confessed to the crime and, if yes, whether he claimed his confession was coerced.


Participants signed up for the experiment through the departmental participant pool's SONA experiment management system and arrived at the laboratory to participate in the study individually. Participants read the case summary and responded to all questions through an online experiment-hosting website, Qualtrics; all data were anonymous. All case summaries began with instructions telling participants to imagine serving as a grand juror in the proceedings, People of the State of New York v. Willard. After participants read the case summary, they completed the dependent measures questionnaire, were thanked for their time, and were fully debriefed.


Manipulation checks

We conducted a 2 (Confession vs. No Confession) × 3 (Flat Demeanour Testimony vs. Emotional Demeanour Testimony vs. No Demeanour Testimony) ANOVA on ratings of the defendant's demeanour. There was a main effect of the defendant's demeanour, F(2, 141) = 95.52, p < .001, ηp² = .58, 95% CI for ηp² [.33, .95]. Pairwise comparisons using Tukey's post-hoc analysis revealed that participants rated the defendant's demeanour as more emotional when they were given testimony about the defendant's emotional demeanour (M = 8.63, SD = 1.38) than when they read no testimony about the defendant's demeanour (M = 6.58, SD = 2.24). They also rated the defendant's demeanour as less emotional when they were given testimony about the defendant's flat demeanour (M = 2.94, SD = 2.42) than when they were given no testimony about the defendant's demeanour.

Participants in the confession conditions reported that Ryan confessed more often than did participants in the no confession conditions, χ²(2, N = 147) = 139.21, p < .001, Cramér's V = .97, 95% CI for Cramér's V [.82, 1.13]. In addition, participants viewed the confession as coerced, as opposed to voluntary, χ²(1, N = 72) = 68.06, p < .001, ϕ² = .95.
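The effect sizes reported for these manipulation checks can be recovered from the chi-square statistics with the standard formulas ϕ² = χ²/N and Cramér's V = √(χ²/(N·(k − 1))), where k is the smaller dimension of the contingency table. The following is a minimal sketch; the function names are hypothetical, and taking k = 2 (two confession conditions) is an assumption about how the table was laid out.

```python
import math


def phi_squared(chi2, n):
    """Phi-squared effect size for a chi-square test with 1 df."""
    return chi2 / n


def cramers_v(chi2, n, min_dim):
    """Cramer's V; min_dim is the smaller of the table's row/column counts."""
    return math.sqrt(chi2 / (n * (min_dim - 1)))


# Values taken from the manipulation checks reported above.
coercion_phi2 = phi_squared(68.06, 72)
# min_dim=2 is an assumption (confession vs. no confession condition).
confession_v = cramers_v(139.21, 147, min_dim=2)
```

Rounded to two decimals, these give .95 and .97, matching the reported values.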

Three participants failed the categorical manipulation check question asking whether the defendant confessed. There was no difference in the rates at which people failed the manipulation check across all conditions, χ²(5, N = 147) = 7.28, p = .20, across confession conditions, χ²(1, N = 147) = 0.35, p = .55, or across demeanour conditions, χ²(2, N = 147) = 2.13, p = .35. When we excluded participants who failed the categorical manipulation check, our resulting sample size for the analyses reported below was N = 144.

Is the suspect guilty?

Ratings of probable guilt

To investigate whether confession evidence or demeanour testimony affected the likelihood of a guilty verdict, we used generalized linear models. Overall, 61 participants (42.4%) voted guilty. We found a main effect of confession evidence such that participants who read the confession had a 65% probability of believing the defendant was guilty, whereas participants who did not read this evidence had only a 15% probability of believing in guilt, Wald χ²(1, N = 144) = 23.37, p < .001. Regarding the impact of police demeanour testimony, we found a significant main effect, χ²(2, N = 144) = 6.26, p = .04. Further investigation revealed that this main effect was driven by the difference in participants' reactions to the flat demeanour testimony (54% voted that he was guilty) as compared to no demeanour testimony (31%), p = .025, and as compared to emotional demeanour testimony (27%), p = .035. Finally, we found a marginally significant interaction between confession evidence and police demeanour testimony, χ²(2, N = 144) = 5.91, p = .052. Planned comparisons revealed that in the no confession condition, flat demeanour produced higher guilt ratings (40%) than either no demeanour testimony (17%), p = .06, or emotional demeanour (4%), p = .001. However, in the confession condition, emotional demeanour (75%) produced marginally higher guilt ratings than no demeanour testimony (50%), p = .07. Guilt ratings in the flat demeanour condition (68%) were not significantly different from either the emotional demeanour condition or the no demeanour testimony condition (see Table 1 and Figure 2).

Table 1. Means (standard deviations) for primary dependent variables in Experiment 2

| Dependent variable | Confession condition | Testimony: flat demeanour | Testimony: emotional demeanour | Testimony: no demeanour |
|---|---|---|---|---|
| Probability of guilt | No confession | 40% (118%) | 4% (49%) | 17% (91%) |
| Probability of guilt | Confession | 68% (112%) | 75% (106%) | 50% (128%) |
| Likelihood of involvement | No confession | 46.76 (22.06) | 33.13 (15.95) | 42.58 (17.38) |
| Likelihood of involvement | Confession | 57.56 (32.20) | 66.42 (29.27) | 49.23 (33.73) |
| Confidence in guilt decision | No confession | 89.04 (60.79) | 47.08 (28.57) | 63.13 (39.80) |
| Confidence in guilt decision | Confession | 125.92 (64.59) | 135.17 (68.53) | 110.00 (68.18) |

Figure 2. Interaction between confession evidence and demeanour testimony on probability of guilt ratings in Experiment 2. Bars represent standard error values.

Confidence in guilt decision

To obtain a more sensitive measure of participants' decisions, we created a scalar variable called guilt confidence scores by combining guilt judgments and confidence ratings. Specifically, if participants rated the suspect as guilty we added their guilt confidence rating to 100. If participants rated the suspect as not guilty, we subtracted their guilt confidence rating from 101. Thus, we created a scale in which scores could range from 1 (100% confidence that Ryan was not guilty) to 200 (100% confidence that Ryan was guilty). We then conducted a two-way Confession × Demeanour Testimony ANOVA on the guilt confidence scores.
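The scoring rule just described can be sketched as a small function. This is an illustration of the rule as stated in the text, not the authors' analysis code; the function name and the verdict labels are hypothetical.

```python
def guilt_confidence_score(verdict, confidence):
    """Combine a binary verdict and a 0-100 confidence rating.

    Guilty verdicts map to 100 + confidence (100 to 200);
    not-guilty verdicts map to 101 - confidence (1 to 101).
    """
    if not 0 <= confidence <= 100:
        raise ValueError("confidence must be between 0 and 100")
    if verdict == "guilty":
        return 100 + confidence
    if verdict == "not guilty":
        return 101 - confidence
    raise ValueError("verdict must be 'guilty' or 'not guilty'")
```

For example, `guilt_confidence_score("guilty", 100)` returns 200 and `guilt_confidence_score("not guilty", 100)` returns 1, the two endpoints of the scale. Under this rule, a participant with zero confidence scores 100 after a guilty verdict and 101 after a not-guilty verdict, so the two halves of the scale meet near the midpoint.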

There was a significant main effect for confession, F(1, 138) = 36.18, p < .001, ηp² = .21, 95% CI for ηp² [.07, .40]. The defendant was seen as more guilty when he confessed (M = 124.11, SD = 66.89) than when he did not confess (M = 66.73, SD = 48.00). There was no main effect for demeanour testimony, F(2, 138) = 1.80, p = .17, ηp² = .03, 95% CI for ηp² [−.02, .08], but there was a marginally significant interaction between the confession evidence and demeanour testimony, F(2, 138) = 2.74, p = .068, ηp² = .04, 95% CI for ηp² [−.02, .11].
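As a consistency check, partial eta squared can be recovered from an F statistic and its degrees of freedom using the standard identity ηp² = (F × df_effect) / (F × df_effect + df_error). The sketch below applies it to two of the effects reported above; the function name is hypothetical.

```python
def partial_eta_squared(f_value, df_effect, df_error):
    """Partial eta squared from an F statistic and its degrees of freedom."""
    return (f_value * df_effect) / (f_value * df_effect + df_error)


# Main effect of confession on guilt confidence scores: F(1, 138) = 36.18
confession_effect = partial_eta_squared(36.18, 1, 138)

# Confession x Demeanour Testimony interaction: F(2, 138) = 2.74
interaction_effect = partial_eta_squared(2.74, 2, 138)
```

Rounded to two decimals, these come to .21 and .04, in agreement with the reported effect sizes.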

Planned one-way ANOVAs with Tukey's post-hoc analyses revealed that in the no confession condition, demeanour evidence affected guilt confidence, F(2, 70) = 5.37, p = .007, η² = .13, 95% CI for ηp² [−.02, .33]. In particular, participants had higher guilt confidence scores if the defendant had flat demeanour (M = 89.04, SD = 60.79) than if the defendant had emotional demeanour (M = 47.08, SD = 28.57), p = .005. No other comparisons were significant, ps > .12 (no demeanour condition, M = 63.13, SD = 39.80). In contrast, if participants received testimony about a confession, their guilt confidence scores did not differ across the three levels of demeanour testimony, F(2, 68) = 0.82, p = .44 (no demeanour testimony: M = 110.00, SD = 68.18; flat demeanour testimony: M = 125.92, SD = 64.59; and emotional demeanour testimony: M = 135.17, SD = 68.53).

Likelihood of involvement

We found similar results to the guilt confidence scores when we asked the participants to indicate on a scale from 0 to 100 how likely they thought it was that the defendant had committed the crime. There was a significant main effect for confession, F(1, 138) = 15.26, p < .001, ηp² = .10, 95% CI for ηp² [0, .22]. The defendant was seen as more likely to have committed the crime when he confessed (M = 57.97, SD = 32.05) than when he did not confess (M = 40.90, SD = 19.31). There was no significant main effect for demeanour testimony, F(2, 138) = 0.70, p = .50, ηp² = .01, 95% CI for ηp² [−.02, .04], but there was a significant interaction between confession evidence and demeanour testimony, F(2, 138) = 3.63, p = .029, ηp² = .05, 95% CI for ηp² [−.02, .13].

Planned one-way ANOVAs with Tukey's post-hoc analyses revealed that in the no confession condition, demeanour evidence affected perceptions of the suspect's likelihood of involvement, F(2, 70) = 3.40, p = .04, η² = .09, 95% CI for ηp² [−.03, .25]. In particular, participants reported higher likelihood of involvement ratings if the defendant had flat demeanour (M = 46.76, SD = 22.06) than if the defendant had emotional demeanour (M = 33.13, SD = 15.95), p = .03. No other pairwise comparisons were significant, ps > .19 (no demeanour testimony: M = 42.58, SD = 17.38). In contrast, if participants received testimony about a confession, their likelihood of involvement ratings did not differ across the three levels of demeanour testimony, F(2, 68) = 1.69, p = .19 (no demeanour testimony: M = 49.23, SD = 33.73; flat demeanour testimony: M = 57.56, SD = 32.20; and emotional demeanour testimony: M = 66.42, SD = 29.27).


Experiment 2 sheds light on two important points about demeanour evidence. First, using the same basic case facts as in Experiment 1, the current experiment confirms that flat demeanour can be incriminating. This is most evident in the probability of guilt ratings in the no confession condition: guilt ratings increased from 17%, when no demeanour information was available, to 40%, when flat demeanour evidence was available. By contrast, at least as operationalized here, emotional demeanour does not, by itself, inflate guilt ratings. Instead, emotional demeanour is potentially exonerating: the lowest guilt ratings in the experiment (4%) were obtained when the defendant was presented with emotional demeanour and no confession. Second, any exonerating potential from emotional demeanour is eliminated when a confession is present: guilt ratings went from 4%, when a confession was absent, to 75%, when a confession was present. Because flat demeanour is already seen as incriminating, adding a confession did not increase guilt ratings as dramatically; ratings increased from 40%, when a confession was absent, to 68%, when a confession was introduced.

The current experiment further underscores the problems inherent in weak confession evidence. In the absence of any demeanour evidence, knowing about the defendant's confession inflated guilt ratings from 17% to 50%. Introducing emotional demeanour evidence to the confession should have had no additional effect on guilt ratings because the emotional demeanour evidence on its own produced guilt ratings equivalent to the no confession control group. However, participants who had a confession plus emotional demeanour evidence produced guilt ratings marginally higher than people who read about a confession, but not demeanour evidence. At least in this context, emotional demeanour evidence that was previously ignored (or even potentially exonerating) in the absence of a confession became evidence of guilt when paired with a confession. The guilt confidence and likelihood of involvement measures support this interpretation. For both of these dependent measures, the presence of a confession eliminated differences between flat and emotional demeanour testimony suggesting that the exonerating potential of emotional demeanour disappears when a confession is present.

The data presented here offer a useful supplement to prior research on reactions to demeanour evidence. For example, in Heath et al. (2004) both emotional and flat demeanour testimony inflated guilt ratings compared with a control condition, but only when evidence against the defendant was weak. In the current research, when the confession evidence was present, neither type of demeanour testimony significantly inflated guilt ratings beyond the group that had a confession only (where guilt ratings were 50%). The discrepancy in these patterns is likely explained by how participants evaluated the confession evidence in the current experiment. Even though nearly all participants who read confession evidence reported that Ryan Willard's confession was coerced, it was probably not viewed as ‘weak’ evidence. Indeed, confessions have immense power to convict (e.g., Kassin & Neumann, 1997), even among people who claim to recognize that the confession has been coerced (Kassin & Sukel, 1997).


The questions examined in this research were inspired by the case of Marty Tankleff in which ‘inappropriate’ demeanour prompted assumptions of guilt and resulted in an eventual wrongful conviction (e.g., Kassin, 2006). The data presented here provide empirical confirmation of the important role of demeanour evidence. In Experiment 1, flat demeanour produced higher ratings of guilt than did emotional demeanour, especially when participants were instructed that emotional reactions to trauma were typical (a view that aligns with participants' naïve expectations, e.g., Robinson et al., 1994). In Experiment 2, using the same case facts, we manipulated participants' expectations about normal reactions to trauma through police detective testimony, rather than explicit instructions. We replicated the effect of flat demeanour on assessments of guilt. In addition, even though emotional demeanour did not inflate guilt probability ratings on its own, it bolstered participants' guilt assessments when they did have a piece of evidence that suggested guilt (i.e., a confession).

The current research included questions relevant to jury decision-making contexts and police investigative contexts. In both areas, these data have broad implications. First, data from both experiments suggest that demeanour information available from the interrogation can influence decisions about guilt. This finding is particularly relevant for jurisdictions requiring videotaped interrogations. In those jurisdictions, the suspect's interrogation demeanour will be readily apparent to jurors if the interrogation video is presented at trial. In evaluations of eyewitnesses, at least, recent research demonstrates the importance of tangible evidence of pre-trial events. In one experiment, defendants were judged as less guilty when an eyewitness's initial low confidence was presented in the context of a previously videotaped identification procedure rather than presented as a statement read by the witness at trial (Douglass & Jones, 2013). To the extent that interrogation demeanour biases judgments against defendants, the recommendation to videotape interrogations may actually work against defendants in certain cases. In spite of this possibility, we endorse the videotaping recommendation because the clear benefits of establishing a record of interrogation methods outweigh the potential cost in making demeanour evidence available. Future research should target interventions for circumstances in which interrogation demeanour may prejudice the case against a defendant (e.g., using expert testimony to educate the jury about the wide range of reactions to trauma).

Second, these data suggest the need for education on appropriate reactions to trauma for police personnel. Education on the wide range of appropriate responses might forestall premature judgments about guilt, preventing the kind of tunnel vision that characterized Tankleff's case. That education might guide police reactions to emotional displays (or lack thereof) is supported by research demonstrating that reactions to emotional displays can be moderated by instructions (e.g., Bollingmo, Wessel, Sandvold, Eilertsen, & Magnussen, 2009). Unfortunately, the potential for simple instructions to produce lasting changes in investigators' natural reactions to demeanour is probably limited. In general, long-term improvements in professional investigators' behaviour are notoriously difficult to achieve. For example, training on the cognitive interview does improve interviewers' questioning tactics, but only temporarily (e.g., Smith, Powell, & Lum, 2009). A superior alternative might be to formally emphasize thoroughness over efficiency so that police have room to consider unexpected demeanour as merely odd rather than incriminating (cf. Ask, Granhag, & Rebelius, 2011). If police are encouraged to value thoroughness, they might be less vulnerable to tunnel vision (e.g., Findley & Scott, 2006).

The imprimatur of guilt conveyed by the combination of demeanour and confession evidence may help explain another wrongful conviction, that of DNA exoneree Jeffrey Deskovic. After his high school classmate was raped and murdered, Deskovic visited her wake three times and started his own ‘investigation’ of the case, speaking with police eight times before becoming a suspect himself. Police interpreted his behaviour as that of someone who was ‘overly distraught’ at the victim's death and decided to interrogate him about the crimes. During a 6-hr interrogation, 17-year-old Deskovic was given coffee, but no food. He was told that he failed three polygraph tests, even though he passed all three. Eventually, the police informed Deskovic that there was DNA from the semen inside the victim. At that point, Deskovic confessed because he knew that the DNA would not match him. Even though the DNA did not match Deskovic, the prosecutor dismissed this evidence, claiming that the victim must have had unprotected sex with another man before Deskovic raped her. Almost 16 years later, DNA from the perpetrator's semen was tested with newer technology and matched a man who was incarcerated for strangling the sister of his live-in girlfriend. Deskovic was released from jail and his indictment was dismissed on the grounds of actual innocence. The details of the Deskovic case suggest that expectations about proper emotionality may in fact interact with the potential suspect's relationship with the victim (see Heath, 2009 for a description of other cases where defendant or suspect emotion played a role in the investigation).

In cases like those of Marty Tankleff and Jeffrey Deskovic, defence attorneys might introduce competing expert testimony about confession evidence. Unfortunately, this is not likely to be an entirely effective safeguard. First, jurors routinely ignore expert testimony on a wide range of issues including eyewitness identifications (e.g., Martire & Kemp, 2011) and forensic evidence (e.g., Jenkins & Schuller, 2007). Second, even if expert testimony was included, the trial stage could be too late. Research on alibi evaluations clearly shows that people are much more critical of evidence if placed in the role of a juror versus the role of an investigator, presumably because mock jurors reasonably conclude that the evidence must be weak, given that the case proceeded to trial (Sommers & Douglass, 2007). Therefore, the investigation stage is the ideal point of intervention precisely because this is the point at which ‘inappropriate’ demeanour may trigger guilt-presumptive questioning and high-pressure interrogation techniques, setting in motion an irrevocable conviction process (e.g., Kassin et al., 2003).

An additional avenue for future research is to determine whether defendant or suspect gender moderates reactions to demeanour (cf. Heath, 2009). The extent to which people view flat or emotional demeanour as more typical of one gender could determine evaluations of guilt. This question is directly relevant to research showing that reactions to mental illness vary considerably as a function of gender typicality. For example, people are less negative in their evaluations of individuals with gender atypical diagnoses. In practice, this means that men with depression and women with alcoholism (gender atypical diagnoses) produce less negative reactions than do men with alcoholism and women with depression (gender typical diagnoses, Wirth & Bodenhausen, 2009). If detectives and jurors expect more emotional reactions from women than from men, would a female suspect who presents with flat demeanour be viewed as even guiltier than a man who presents with the same demeanour? Conversely, would a man who presents with emotional demeanour be seen as guiltier than a woman with the same demeanour (cf. Salekin et al., 1995)?

Finally, extrapolating from our data leads to the troubling conclusion that weak confession evidence can contaminate any other ‘evidence’ with which it is paired, producing an effect that is greater than additive. That confessions have such power is consistent with legions of data indicating their persuasiveness in producing convictions (e.g., Kassin & Sukel, 1997). In addition, as noted above, confessions can even prompt eyewitnesses to change their prior identifications so that the new person identified matches the individual who confessed (Hasel & Kassin, 2009). If investigators understand that a wide range of emotions (or lack thereof) are appropriate, they might be less likely to use demeanour as grounds for an interrogation or as evidence against a defendant at trial. Had the investigators in Marty Tankleff's case understood that his demeanour reflected a teenage boy in shock instead of a calm, cold-blooded killer, the guilt-presumptive interrogation may have been avoided. Similarly, had the investigators in Jeffrey Deskovic's case considered the possibility that his excessive interest was motivated by a misguided desire to help investigate his classmate's murder, they may have turned their attention to other suspects. In both cases, without damning demeanour evidence, it seems likely that both men would have grown into adulthood in freedom, rather than spending years behind bars for crimes they did not commit.


Experiment 1 was conducted as a senior thesis by the fourth author, under the supervision of the first author. Experiment 2 was based on a senior honours thesis completed by the second author, under the expert guidance of Dr. Saul Kassin. We thank Dr. Erin Hennes and Dr. Patrick Shrout for their help with data analysis in Experiment 2.