Veracity is in the eye of the beholder: A lens model examination of consistency and deception

Summary
Research has attempted to explain perceived cues to deception based upon self-report of what participants believe are 'good' cues to deception, or self-report of what cues participants say they base their veracity judgements on. However, it is not clear to what extent participants can accurately self-report what influences their decision-making. Using a within-subjects design, 285 participants completed a questionnaire regarding their beliefs about deception before rating a selection of truthful and deceptive statements on a variety of cues. Expert coders also rated the statements for the same cues. Laypeople and expert coders do not conceptualise between-subject consistency in the same way. A lens model showed that whilst perceptions of cues, such as consistency and amount of detail, influence veracity judgements, these perceptions (and overall veracity judgements) are mostly inaccurate. Fundamentally, there seem to be inconsistencies between how deception research examines consistency and how it is understood and used by laypeople.

reported by both laypeople (Krix, Sauerland, Lorei, & Rispens, 2015) and legal professionals. When examining the perceptions of the deception-consistency relationship amongst uninformed laypeople and investigators, Masip et al. (2018) found that uninformed laypeople who were asked to make veracity judgements on a series of written statements most commonly reported (90%) utilising (in)consistency to assist in making their judgement.
The general belief in the 'consistency heuristic', as reported by both laypeople and practitioners, is that between-statement consistency implies honesty and accuracy, and that any inconsistencies are indicative of deception or inaccurate answers. However, when considering multiple statements from the same individual, the consistency heuristic is incongruent with both theoretical and empirical memory and deception research, which to date has shown there is little difference between the consistency of liars and truth tellers (Vredeveldt et al., 2014). Even when differences do occur, the evidence suggests that consistency is more indicative of deception, rather than honesty (Vredeveldt et al., 2014).
Whilst there is a large body of research showing that people generally agree how the consistency heuristic should be used, identifying consistency in a series of consecutive statements is still idiosyncratic. Granhag and Strömwall (2000) asked participants to act as lie detectors and rate a series of three videos of the same suspect interrogated on three separate occasions. When the subjective cues reported as justification for participants' veracity judgements were examined, consistency was the most commonly reported cue. Of 125 participants, 78 reported using the consistency heuristic in making their veracity judgement. However, 38 of these participants reported that they considered the three consecutive statements consistent over time, whereas the other 40 participants reported that the statements were inconsistent over time. Thus, the exact same series of consecutive statements was considered consistent by one judge, and inconsistent by another. Although people seem to agree that the consistency heuristic is a good cue to deception, they display little agreement as to whether a set of statements is consistent or not.
Given these discrepant findings, it seems that whether a series of statements is perceived as consistent or not is moderated by individual factors pertaining to the judge. It has been suggested that personality traits underpin deception production abilities, and as such should in turn affect lie detection abilities (Semrad, Scott-Parker, & Nagel, 2019). Previous findings conflict, with lie production skills having been found to facilitate lie detection abilities (Wright, Berry, & Bird, 2012; Wright, Berry, Catmur, & Bird, 2015), and also to have no effect (DePaulo & Rosenthal, 1979; Levine, 2016).
Previous meta-analyses have found no relationship between individual differences and accuracy in judging deception; however, the studies included in the meta-analyses did not have adequate sample sizes to explore the relationship between personality dimensions and accuracy in veracity judgements (Aamodt & Custer, 2006). The majority of past research has used samples of N ≤ 75, whereas the correlations required to determine meaningful relationships stabilise at N > 250 (Schönbrodt & Perugini, 2013). A recent study with a sample of 207 participants found that dark and maladaptive personality traits (captured in the honesty-humility domain) were associated with judgemental biases but did not relate to the ability to detect deception (Wissing & Reinhard, 2017). Honesty-humility traits (e.g., sincerity, fairness) may influence someone to be more likely to consider statements as truthful, whereas social skills may make a person more socially discerning. Therefore, there is a need for adequately powered research to examine personality traits and individual lie detection abilities.
To date, researchers have attempted to examine subjective cues to deception based on self-report (see meta-analysis by Hartwig & Bond Jr, 2011) of what participants believe are good cues to deception, or what cues they base their veracity judgements on. There is a methodological limitation to this approach, as there is no way to establish that the cues people report using are actually the ones that best explain their decision-making process. It is possible that, due to low cue validity, people are unaware of what drives their veracity judgements, and when asked about it choose to report explicit social stereotypes about deceptive behaviour.
One approach to address this question is to use lens modelling.
The lens model (Brunswik, 1952) was designed to examine the extent to which perception is influenced by differing cues available in a stimulus. Fiedler and Walka (1993) used lens modelling techniques to explore the contribution of common cues (e.g., head movements, smiling, speech rate) on judgments of deception. They demonstrated how certain perceived cues predicted lie condition and subjective evaluation when participants were informed about how to interpret the cues and given feedback.
A lens model meta-analysis by Hartwig and Bond Jr (2011) summarised (i) the relationship between non-verbal and verbal cues created by deception and (ii) how the non-verbal and verbal cues influenced perceptions of deception, to assess which behaviours were enabling accurate judgments of deception. Through the examination of 81 distinct judgement cues, it was found that individuals often based their veracity judgements on cues that were not indicative of deception. Furthermore, the cues that participants reported using did not influence their decision-making. Consequently, this meta-analysis suggests that individuals are not only holding false beliefs about valid information for deception detection but are also not aware of their inaccurate beliefs. Similarly, a meta-analysis lens model by Sporer and Schwandt (2007) examined 12 non-verbal cues to deception, taken from 54 studies. Only three of these cues (nodding, foot and leg movements and hand movements) were found to be reliably associated with deception, whereas those most commonly believed to be associated with deception (adaptors, foot and leg movements and illustrators) had either opposing or negligible effects, again suggesting that individuals hold false beliefs about valid information for deception detection. However, these meta-analyses do not fit within the traditional lens model approach. For a lens model analysis, all participants should be exposed to the same type of stimuli. In a meta-analytic approach, there is great variability in participants' experiences, stimulus material (e.g., written statements, videos or live interview) and study methodology.
Research is yet to examine beliefs about the consistency heuristic and veracity judgements using a lens model approach. In the current research, we examined how much of the variance in veracity judgments is explained by essential sources of information present in a pair of statements, such as the level of detail and between-statement consistency. Participants were asked to report their beliefs regarding a variety of cues to deception, before making veracity judgements on a series of statements and rating them for level of detail and consistency. As not all judges interpret cues equally, and in turn, not all statements contain all cues, using a within-subjects repeated-measures design we account for the variance that is contributed by the judge (i.e., a participant's personal beliefs about consistency and deception) as well as the statement. Given the widely reported use of between-statement consistency to inform veracity judgements (Granhag & Strömwall, 2001; Masip et al., 2018), it was predicted that participants will use the consistency heuristic to inform their credibility assessments. As such, participants' perception of between-statement consistency will influence their veracity judgements such that statements that are perceived to be higher in consistency will be rated significantly more truthful than statements that are lower in consistency (Hypothesis 1). As research shows that truth tellers provide more details than liars (DePaulo et al., 2003), it is predicted that statements that are perceived as more detailed will be rated as more truthful than statements that are lower in detail (Hypothesis 2). We also predicted that statements that are perceived to be higher in terms of both between-statement consistency and detail will be considered significantly more truthful than statements that are perceived to be lower in between-statement consistency and lower in detail (Hypothesis 3).
It was anticipated that participants will prioritise the consistency heuristic in their veracity decision-making, as it is a more commonly reported perceived cue to deception than amount of detail. As such, we predict that statements that are perceived to be higher in between-statement consistency and lower in detail will be considered significantly more truthful than statements that are perceived lower in between-statement consistency and higher in detail (Hypothesis 4). As participants can interpret the same cues in different ways (Granhag & Strömwall, 2000), we predicted there will be variability in veracity ratings. We propose this variability will be better accounted for by individual factors of the participant rating the statement than by level of detail and level of consistency between statements (Hypothesis 5). However, as the effect of individual factors of the participant and the influence of the statement cannot be empirically compared, we report the SD of the statement and participant intercepts separately in the model.
In total, 772 participants clicked on the survey link, but due to incomplete data (as per pre-registration, completion of the questionnaire was a requirement to include data for analysis, osf.io/vqga4), a final sample of 285 (36.81%) were retained for analysis. Our cessation of data collection rule was that the questionnaire would be taken offline on the Friday after a minimum of 120 participants had completed it. This sample was based upon a priori power calculations indicating that for the study to detect a medium effect size (r = .30) with high power (.90), given alpha = .05, a minimum of 112 participants were required.
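The a priori power calculation can be approximated with a short sketch. It assumes a two-tailed test of a Pearson correlation and uses the Fisher z approximation; the source does not say which software or method produced the figure of 112, and the function name `n_for_correlation` is hypothetical. Because it is an approximation, the result may differ by a participant or so from an exact calculation.

```python
import math
from statistics import NormalDist


def n_for_correlation(r: float, power: float = 0.90, alpha: float = 0.05) -> int:
    """Approximate N needed to detect a correlation r (two-tailed), via Fisher's z.

    Assumption: this is the textbook approximation, not necessarily the
    procedure the authors used.
    """
    z_alpha = NormalDist().inv_cdf(1 - alpha / 2)  # critical value for a two-tailed alpha
    z_power = NormalDist().inv_cdf(power)          # normal quantile for the desired power
    effect = math.atanh(r)                         # Fisher z-transform of the target r
    return math.ceil(((z_alpha + z_power) / effect) ** 2 + 3)


# Medium effect (r = .30), power = .90, alpha = .05, as stated in the text.
print(n_for_correlation(0.30))
```

Under these assumptions the estimate lands within a participant or two of the reported minimum, so the stopping rule of at least 120 completed questionnaires sits above either figure.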

| Design
This study used a within-subjects design, where participants reported their beliefs about cues to deception as well as their judgments of four statements for perception of: veracity, amount of detail, consistency, number of repetitions, number of omissions, number of reminiscences, number of contradictions and confidence for their veracity judgement.
To explore whether laypeople consider consistency in the same way as expert coders, participants provided an overall rating of between-statement consistency as well as ratings of the four factors of consistency identified by Fisher et al. (2013).

| Beliefs questionnaire
Participants completed a short 20-item questionnaire regarding their beliefs about cues to deception, including six critical questions (regarding detail, consistency, repetitions, omissions, reminiscences and contradictions) and 14 foil questions. The foil questions concerned individuals' beliefs in other identified cues to deception, taken from Reality Monitoring criteria (Sporer & Sharman, 2006), verifiability approach (Nahari, Vrij, & Fisher, 2014) and a meta-analysis regarding cues to deception (DePaulo et al., 2003). We included foil questions to prevent participants identifying that consistency and detail were the focus of the questionnaire, and do not analyse these beliefs as they are not relevant to our research questions. Participants were asked to rate the cues with respect to how much they believed they were useful cues for determining veracity. Each cue was presented as a statement (e.g., 'Statements that contain unexpected complications') and was rated on an 11-point Likert scale where 0% represented a strong cue to honesty, 50% was neutral, and 100% represented a strong cue to deception. Question order was randomised to reduce order effects, and the beliefs questionnaire took approximately 3 minutes to complete. A copy of the beliefs questionnaire is available at osf.io/ambcf/.

| Personality questionnaire
An exploratory component of this study was the inclusion of a personality measure. Participants were asked to complete a short personality questionnaire (HEXACO-60, Ashton & Lee, 2009)

| Rated statements
To generate statements for rating, 17 individuals (6 male, 11 female, aged 18-52, M age = 28.44, SD age = 9.35) were randomly allocated to either telling the truth or lying about a memorable event that took place in the past 2 years. They were interviewed by an instruction-blind interviewer. Truth tellers were asked to recall a positive memorable event that occurred within the past 2 years, and were given examples such as going on holiday or having a celebratory birthday dinner. They were instructed to choose an event that they were comfortable talking about honestly and could remember in detail. Those who were asked to lie were asked to invent the details of a positive memorable event provided by the researcher (e.g., 'Spending Christmas on a beach'). Following a delay of 1 week, all participants returned to be interviewed by the same interviewer for a second time.
Participants were either truthful in both interviews or deceptive in both interviews. Interviews were audio recorded and conducted using a standardised script, featuring one open-ended request for the interviewee to tell the interviewer everything about their memorable event. The transcripts were then used as stimulus materials within the subsequent online questionnaire.
The memorable event interviews were transcribed and coded for detail and between-statement consistency. Each interview transcript was first coded for the number of details provided (e.g., 'on Christmas day itself we went for a walk on the beach which was sandy and freezing cold' would contain five details: 'sandy', 'freezing cold', 'walk', 'Christmas day' and 'beach'). Details were only counted the first time they were mentioned for each account. For both truth tellers and liars, the details provided in their second accounts were compared with those provided in their first accounts, and categorised as the elements of consistency (specified as 'repetition', 'omission', 'reminiscent' and 'contradiction', as described by Fisher et al., 2013). Repetitions were details reported in both phases of the interview, omissions were details reported in the first phase but not in the second phase of the interviews, reminiscent details were details reported in the second phase but not in the first phase of the interviews, and contradictions were details reported in the first phase that were reported differently in the second phase. A subset of four interviews (25%) was coded by a second researcher, who was blind to the experimental conditions. The inter-rater reliability between the coders was high for details in the first accounts (intra-class correlation coefficient [ICC] = .96) and detail in the second accounts (ICC = .98). High reliability was also found across the two coders for repetitions (ICC = .98), reminiscences (ICC = .95), contradictions (ICC = .88) and omissions (ICC = .89). There were no significant differences between truth teller and liar statements for the amount of detail provided, or any of the factors of consistency, all t's < .81, all p's > .43.
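The four consistency categories can be illustrated with a minimal sketch. It assumes each account has already been reduced by a human coder to a mapping from a detail 'slot' to the value reported; this representation, the function name `code_consistency` and the second account in the example are all hypothetical, and the actual coding in the study was done manually by trained coders.

```python
from collections import Counter


def code_consistency(first: dict[str, str], second: dict[str, str]) -> Counter:
    """Count repetitions, omissions, reminiscences and contradictions.

    Each account maps a detail slot (e.g. 'temperature') to the value reported.
    """
    counts = Counter(repetition=0, omission=0, reminiscence=0, contradiction=0)
    for slot, value in first.items():
        if slot not in second:
            counts["omission"] += 1        # reported at time one only
        elif second[slot] == value:
            counts["repetition"] += 1      # reported identically both times
        else:
            counts["contradiction"] += 1   # reported differently at time two
    # Details that appear only in the second account are reminiscences.
    counts["reminiscence"] = sum(1 for slot in second if slot not in first)
    return counts


# First account: the five details from the example sentence in the text.
first_account = {
    "day": "Christmas day",
    "activity": "walk",
    "place": "beach",
    "surface": "sandy",
    "temperature": "freezing cold",
}
# Second account: invented purely for illustration.
second_account = {
    "day": "Christmas day",
    "activity": "walk",
    "place": "beach",
    "temperature": "mild",
    "companion": "sister",
}
print(code_consistency(first_account, second_account))
```

In this invented pair, three details are repeated, one is omitted ('surface'), one is contradicted ('temperature') and one is reminiscent ('companion').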
For the online questionnaire, we used the first 16 sets of the paired statements, and disregarded the statements provided by the 17th participant. Participants were shown pairs of statements so that they could compare the details provided at time one to the details at time two to assess between-statement consistency. We chose not to include the 17th pair of statements, to ensure an even number of Completing the ratings for four sets of paired statements took on average around 15 minutes to complete.

| Procedure
Participants were first presented with the 20-item questionnaire regarding their beliefs about cues to deception. Questions were presented one at a time and in a random order. Following this, participants completed the HEXACO-60 as a filler task, before moving on to the veracity judgement task. Participants were each asked to read a different selection of four randomly selected pairs of statements from the bank of 16 sets of paired statements. Participants were informed that each pair of statements was taken from the same individual 1 week apart, and that the individual may be lying in both statements or telling the truth in both statements about their experience. Participants were asked to read each pair of statements and select whether they believed the pair of statements were an honest or deceptive report, and provide a confidence rating for their judgement. They were then asked to rate the statement with respect to the amount of detail, consistency between the statements, number of repetitions, number of omitted details, number of reminiscent details and number of contradictions, which were sequentially presented below the statement pairs, and in a randomised order. Participants were informed that all of the rating variables were of equal importance, and as a statement evaluator, they needed to look for each of the qualities in the statements.
Once the ratings for all four paired statements were completed, participants were debriefed and thanked for their time. Data were collected using the online survey platform Qualtrics, and the study took approximately 30 minutes in total to complete.

| Analysis
The statistical tests used in this study (linear mixed effects modelling) as well as the overall analytic framework (lens modelling) and are not tor' in our design. Linear mixed models, much like standard regression, will allow us to see how participant ratings predict our fixed factor of statement condition and will additionally report how much variance in this effect is explained by our random factors.
We will use these models to evaluate the relationship between participant judgments and statement qualities. So we will first investigate the relationship between participant and expert coding of statement features, such as repetitions or omission, whilst controlling for random participant and statement factors. When we report linear mixed effects models in the article, we report the unstandardised overall sample slope (the 'estimate' in Table 2 or 'β = .xx' in text) with 95% CI of that effect ('[.xx, .xx]'), a p-value and the standard deviation of participant-level slopes ('SD = .xx'). An effect with a larger SD of slopes suggests more variability in the overall effect at the participant level than an effect with a smaller SD of slopes.
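One way to write down a model of this kind is sketched below. The exact random-effects structure is an assumption made for illustration: the text states only that participant and statement factors are treated as random and that the standard deviation of participant-level slopes is reported. Here $y_{ps}$ is the rating given by participant $p$ to statement pair $s$, and $x_{s}$ is the expert-coded value of the same cue.

```latex
\begin{align}
y_{ps} &= (\beta_0 + u_{0p} + w_{0s}) + (\beta_1 + u_{1p})\, x_{s} + \varepsilon_{ps}, \\
u_{1p} &\sim \mathcal{N}\!\left(0, \sigma_{\text{slope}}^{2}\right), \qquad
\varepsilon_{ps} \sim \mathcal{N}\!\left(0, \sigma^{2}\right).
\end{align}
```

In this notation $\beta_1$ corresponds to the unstandardised overall sample slope (the 'estimate'), $u_{0p}$ and $w_{0s}$ are the random intercepts for participants and statements, and $\sigma_{\text{slope}}$ corresponds to the reported SD of participant-level slopes. A larger $\sigma_{\text{slope}}$ means individual participants' slopes $\beta_1 + u_{1p}$ spread more widely around the sample slope.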
Second, we will use linear mixed models in a 'lens model' framework to understand the relationship between participant binary judgments of veracity, statement veracity and participant coded cues. A lens model framework allows us to test the 'achievement' of participant veracity judgments at detecting statement veracity and explore how participants' use of the cues they coded (i) influence their veracity judgment and (ii) detect the veracity condition of the statements.
Lens modelling is distinct from mediation analysis approaches as there is reasonable expectancy that the potential cues are not all correlated with judgments or statement veracity. Whilst it is possible that some cues relate to both participants' judgments and statements (a 'useful cue'), it is also expected that some of the coded cues relate to neither (a 'useless cue'), some coded cues may relate to statement veracity but not judgments (a 'missed cue') or some coded cues may influence judgment but not relate to veracity (a 'red herring'). Lens modelling is often presented graphically to show the effects most efficiently. Similarly, correlations will be used to explain the relationship between variance in HEXACO personality and variance in participant-level slopes for detecting statement veracity with veracity judgments (participant 'accuracy').
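The four cue types named in the lens model description amount to a two-by-two decision rule, which the following sketch spells out. The function names and the p-values in the example are hypothetical; treating an effect as present when p falls below the study's .005 criterion is a simplifying assumption about how 'relates to' would be operationalised.

```python
def classify_cue(relates_to_judgement: bool, relates_to_veracity: bool) -> str:
    """Label a cue by whether it relates to judgements and to actual veracity."""
    if relates_to_judgement and relates_to_veracity:
        return "useful cue"
    if relates_to_veracity:
        return "missed cue"    # diagnostic of veracity, but not used by judges
    if relates_to_judgement:
        return "red herring"   # used by judges, but not diagnostic of veracity
    return "useless cue"


def classify_from_p(p_judgement: float, p_veracity: float, alpha: float = 0.005) -> str:
    """Apply the rule using p-values and a significance criterion (assumed approach)."""
    return classify_cue(p_judgement < alpha, p_veracity < alpha)


# Hypothetical p-values, for illustration only.
print(classify_from_p(p_judgement=0.0004, p_veracity=0.62))  # red herring
```

A cue that shifts judgements without tracking veracity is exactly the pattern that produces confident but inaccurate veracity decisions.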
Due to the number of tests run in this study, we correct our alpha level so that we only report tests meeting a conservative criterion of p < .005 as statistically significant throughout. All our code for analysis can be found at osf.io/nd8ke/, and data are available at osf.io/a2yxq/ and osf.io/39qby/.

| Expert-participant agreement on cues
To establish whether laypeople interpreted statements in the same way expert coders are trained to, we examined a linear mixed effects model (Table 1). We found that participants' ratings of details concurred with those given by expert coders for the first account and the second account. To a lesser extent, participants agreed with expert coders about the number of reminiscences. There was no such agreement between participants and expert coders for omissions, contradictions or repetitions.

| Veracity cue perception and judgements
We explored whether participants' perceptions of between-statement consistency influenced their veracity judgements, and whether these perceptions of consistency were accurate in identifying truthful or deceptive statements. Participants were accurate in identifying statement veracity 50.88% of the time, and accurate in identifying truthful statements 65.63% of the time. First, we discuss the relationship between participants' ratings of the statements and their binary judgment of veracity. Full linear mixed effects models are reported in Table 2. We found that when participants rated statements as more detailed, they were more likely to consider statements as truthful, supporting Hypothesis 2. Furthermore, when participants rated statements as more consistent, they were significantly more likely to consider statements as truthful (see Table 2), supporting Hypothesis 1. In addition, perceiving fewer contradictions, fewer omissions and more repetitions made participants significantly more likely to judge statements as truthful (all p < .001, see Table 2).

| Actual and self-reported cue usage
We next investigated the relationship between self-reported cue influence and the extracted participant-level slopes of cue usage (i.e., their actual use of the cue, see the right-hand side of Figure 1) to explore whether participants were utilising the cues they report as being important to the decision-making process. A significant negative correlation was found between participants' self-reported utility of cues and participants' actual use of cues for reminiscence, r

| Personality and perception of deception
We conducted exploratory analyses to look at participant HEXACO personality traits and accuracy of veracity judgements to identify whether there was a difference between the personality traits of those who were accurate and inaccurate in their veracity judgements. There were no notable correlations between participant personality traits and accuracy of veracity judgements (all absolute r ≤ .14, all p ≥ .02).

| DISCUSSION
We examined the relationship between participants' perceptions of between-statement consistency, their perceptions of four components of consistency (repetitions, omissions, reminiscences and contradictions), and their veracity judgements. This research addressed three main questions. First, we wanted to establish whether laypeople would interpret statements in the same way as trained expert coders. The four components of consistency were operationalised by expert coders, and the participants' rating of the components were compared with this. We operationalised between-statement consistency using expert coders, as this is how consistency is commonly determined and empirically tested within a research setting. We found that while participants reported perceiving the same amount of overall detail and reminiscent detail in a statement as the expert coders, they did not agree on the amount of omission, repetition or contradiction. This would suggest that expert coders and laypeople are not relying on the same cues when they assess between-statement consistency across multiple statements. Therefore, research reporting the behaviours of experts who code statements for between-statement consistency in detail, with dedicated time and resources, may not examine the same behaviour as novices trying to identify 'consistency' for the first time. In this research, participants' holistic consistency judgements accurately predicted the veracity condition of a statement, but did not seem to relate to the components of consistency. It is therefore important for future research to more precisely delineate what is meant by participants when they report utilising consistency to inform their veracity judgements.
Second, we explored whether participants' perceptions of between-statement consistency influenced their veracity judgements, and whether these perceptions of consistency were accurate in identifying truthful or deceptive statements. We found that perception of amount  Table 2 for values amounts of repetition (an increase in consistency) led to more truthful judgements, and was positively correlated with participants' holistic perceptions of consistency. Conversely, greater amounts of omission and contradiction (a decrease in consistency) led to more judgements of deception, and both components were negatively correlated with participants' holistic perceptions of consistency. Reminiscence was not used in the formation of veracity judgements and had the weakest correlation with holistic consistency, which may be due to reminiscence being prevalent across truth tellers' repeated accounts (Gilbert & Fisher, 2006).
Finally, we were interested in whether participants used the cues they reported as being important to the decision-making process. There is a possibility that extraneous cues that were not examined informed participants' veracity judgements, and that the cues examined in the current research simply correlate with these extraneous cues. We therefore interpret the following results with caution. A significant negative correlation was found between participants' self-reported use and actual use of reminiscence; however, no other correlations between self-reported use and actual use were found. This suggests that although participants reported using reminiscence in their judgement-making, they displayed no evidence of this when completing their statement ratings. It is possible that reminiscence could simply be a correlate of deception judgements, as opposed to being the cause of the deception judgements (despite it being self-reported as useful).
Regardless, this discrepancy suggests a poor level of introspection for identifying the features of the statements used to make veracity judgements, and therefore a lack of insight regarding intended cue usage. It is likely that individuals report a priori, implicit causal theories for their behaviours (Nisbett & Wilson, 1977), and therefore self-reported cue usage does not represent participants' actual cue usage. We opted to use real statements, in which the consistency was not experimenter-manipulated, in order to maximise ecological validity. Future research could benefit from using manufactured statements, where the four components of consistency are deliberately manipulated and all other features are held constant, to determine whether between-statement consistency is used in veracity judgement-making.
There are a number of limitations to the current study, which may have implications for the results. The online questionnaire took around 30 minutes to finish, with only 37% of those who initially clicked the survey link completing all questions. It is reasonable to expect a level of fatigue and a biased sample for a voluntary online study of this length. Participants were asked to make multiple ratings of the same statements on different cues, which may have lowered the attentiveness of participants and resulted in less discrimination in cue perception. Statement presentation was randomised to statistically correct for this; however, the repeated nature of the task may have influenced subsequent judgements, and therefore there is a chance that cue perception altered across the task as a function of fatigue. Future research could examine a detection-accuracy curve when participants make multiple judgements, to establish whether previous judgements influence future judgements.
Participants were asked to make binary 'truthful' or 'deceptive' judgements about statements. However, it is possible that some participants believed the statements contained embedded lies, and as such were neither fully truthful nor deceptive, but were forced to make a choice. Participants may have made different binary judgements based upon their own biases. This study examined individual-level performance, and consequently these biases would have been accounted for in the participants' random slopes although it cannot be extrapolated from this dataset where these biases may have influenced the decision-making process.
A large amount of deception occurs online over a number of interactions in written format, such as internet scams, financial fraud and online grooming. In this research, participants were asked to make third-party veracity judgements based upon a written transcript of an interaction. It stands to reason that the scale labels 'low amount', 'average amount' and 'high amount' are somewhat ambiguous and could be interpreted differently without the social context of being involved in the interaction. For instance, what is considered an average amount of detail in a statement may vary, dependent on the sender and the context of the situation. Therefore, to establish what the 'average amount' of anything is, based on a written transcript provided by a stranger with little surrounding context to the interaction, is a challenging task, and can be considered a limitation of the design.
In conclusion, this research highlights several key points. Primarily, there seems to be a discrepancy between how research explores between-statement consistency and how it is understood by laypeople. Further to this, no notable correlations were found between self-reported reliance on cues and actual usage of cues, and as such it appears that individuals do not understand the cues they personally use to make veracity decisions. Further exploration is needed to establish what cues reliably influence veracity perceptions, and how individuals vary in the decision-making process. The lens model highlights that perceptions of factors of consistency and amount of detail directly influence the veracity judgement; however, these perceptions are often inaccurate, and consequently, so are the veracity judgements. A holistic perception of consistency could facilitate accurate perceptions of veracity, but it does not appear to be conceptualised in the same way it is constructed by researchers.

DATA AVAILABILITY STATEMENT
The data that support the findings of this study are openly available