Do Stronger Wise-Thinking Dispositions Facilitate Auditors' Objective Evaluation of Evidence When Assessing and Addressing Fraud Risk?*

The objective evaluation of evidence is imperative for audit effectiveness and the proper exercise of professional skepticism. However, numerous studies suggest that auditors fail to evaluate evidence objectively when assessing or addressing the risk of material misstatement due to fraud. We develop theory to predict that auditors do evaluate evidence objectively but only when they have stronger wise-thinking dispositions (WTDs), a construct that is new to the audit literature. We define WTDs as the tendency of individuals to naturally engage in the balanced revision of beliefs and doubts about target phenomena by thinking openly and reflectively about evidence. We report prediction-consistent results from two experiments that measure the strength of participants’ WTDs and manipulate whether the underlying evidence is less or more indicative of fraud. The experimental results also document that auditors vary considerably in WTD strength and collectively demonstrate the reproducibility of audit judgment-quality benefits of stronger WTDs. We further validate the WTD construct in auditing using confirmatory bi-factor analyses to show that it has one higher-order general factor along with several subfactors. Overall, our theory and results advance the literature by identifying WTDs as a determinant of auditors’ ability to objectively evaluate evidence. Additionally, our findings have implications for standard setters and audit firms as quality control standards and audit working paper review processes might benefit from revisions that take into account that auditors do not objectively evaluate evidence unless they have stronger WTDs.


I. Introduction
Audit standards prescribe that auditors should objectively evaluate evidence when assessing and addressing the risk of fraud (e.g., AICPA AU-C 200, 240, 330, 500; PCAOB AS 1015, 1101, 1105, 2110, 2401). Failure to objectively evaluate evidence heightens investors' exposure to loss and audit firms' exposure to adverse inspections and litigation (Knechel 2013; Rapoport 2018; Kowsmann et al. 2020; Rothenberg 2020). Despite these adverse consequences, numerous experimental studies suggest that auditors often fail to objectively evaluate evidence (e.g., Nelson and Tan 2005; Nelson 2009; Knechel et al. 2013; Bauer et al. 2020). Concerns about auditor failure to evaluate evidence objectively have triggered numerous calls by regulators and commentators for auditors to heighten their exercise of professional skepticism, that is, to do better at critically evaluating evidence with a questioning mind (Nagarajan 2020; Nicodemus 2020). 1 Academics also have called for research that identifies characteristics of individual auditors that facilitate or inhibit objective evaluation of evidence (e.g., Brown et al. 1999; Kachelmeier et al. 2014; Nolder and Kadous 2018).
We introduce wise-thinking dispositions (WTDs), a new construct to the audit judgment literature, and develop theory predicting that auditors objectively evaluate evidence of fraud when they have stronger, as opposed to weaker, WTDs. Thinking dispositions are behavioral capacities and inclinations to engage in particular patterns of cognitive behavior (Perkins et al. 1993), or higher-level cognitive styles (Stanovich 2001). They guide the application of decision-makers' intellect, knowledge, and abilities (Fuller and Kaplan 1994; West and Sá 1999; Sternberg and Grigorenko 1997; Perkins et al. 2000; Sternberg 2001). We define stronger WTDs as dispositions of individuals to naturally engage in wise thinking. Wise thinking entails the balanced revision of beliefs and doubt about target phenomena by thinking openly and reflectively about available evidence (Meacham 1983; Kitchener and Brenner 1990; Ardelt 2003; Staudinger and Glück 2011a). We develop the WTD construct by drawing on literatures that conceptualize thinking dispositions, wisdom, and epistemic rationality (e.g., Meacham 1983; Stanovich and West 1997, 1998; West and Sá 1999; Ardelt 2003, 2004; Birren and Svensson 2005; Stanovich and West 2007, 2008; Staudinger and Glück 2011a; Glück et al. 2013). One substantive contribution of this paper is introducing and articulating the WTD construct and demonstrating its amenability to low-cost, valid empirical measurement.
1 Questions about auditor objectivity often arise when alleged financial statement frauds are exposed (e.g., WorldCom and Wirecard). To the extent that auditors fail to objectively evaluate "red flag" evidence indicative of fraud during the audit, auditor detection of fraud is less likely (Davies 2020; Tokar and Davies 2021).
An open question which we address herein is, "To what extent do audit professionals vary in WTD strength?" On the one hand, if auditors with weaker WTDs consistently fail to objectively evaluate evidence, as we predict, audit firms may directly or indirectly prevent such auditors from entry and promotion if they do not possess or develop strong WTDs. On the other hand, research suggests that auditors often are not rewarded solely for objectively evaluating evidence of financial statement fraud (e.g., Peecher et al. 2013;Hobson et al. 2017). Further, auditors seemingly are rewarded for attaining management-preferred goals and for seeing the world through management's eyes (e.g., Kadous et al. 2003;Koch and Salterio 2017;Bhaskar et al. 2019). Thus, our second contribution is providing initial empirical evidence on variation in auditors' WTD strength.
A third contribution of the paper is providing theory-based evidence from two experiments showing that auditors with relatively stronger WTDs objectively evaluate and respond to audit evidence when assessing and addressing fraud risk. Our experiments use different audit contexts for enhanced confidence in the reproducibility of the predicted judgment benefits of stronger WTDs (Hail et al. 2020). 2 Importantly, the dependent variables in the two experiments are different in a key respect. Experiment 1 participants assess fraud risk after management makes a last-minute reduction to warranty expense that allows earnings to narrowly beat analysts' forecasts. In Experiment 2, participants also assess fraud risk, but in a goodwill impairment setting. Moreover, Experiment 2 participants also specify planned actions to address fraud risk. This latter dependent variable is important because, unless auditors undertake different actions to address heightened fraud risk, the chance of fraud detection does not change.

Accepted Article
This article is protected by copyright. All rights reserved.
Another key difference is that while all Experiment 1 conditions include a skepticism prompt, Experiment 2 excludes this manipulation. This design difference allows a fourth contribution: investigation of whether the presence or absence of skepticism prompts attenuates the judgment benefits of stronger WTDs. That is, even if stronger WTDs enable auditors to evaluate evidence objectively when coupled with a skepticism prompt, the question of whether stronger WTDs help unprompted auditors' objectivity remains unaddressed. Thus, reporting the results of a second experiment re-examining whether stronger WTDs enable auditors to objectively evaluate evidence given the absence of any skepticism prompt strengthens our theory-testing (Asay et al. 2021).

Findings in both experiments indicate that the strength of auditors' WTDs varies considerably and provide support for our prediction. In Experiment 1, only auditors with stronger WTDs assess fraud risk to be significantly higher (lower) when Evidence is more (less) indicative of fraud, and they do so regardless of skepticism Prompt condition. In Experiment 2, only auditors with stronger WTDs are significantly more likely to take actions that more persuasively address the heightened risk of fraud when the underlying Evidence is more indicative rather than less indicative of fraud. Finally, we examine WTDs within a confirmatory bi-factor model and use a third experiment to examine the divergence of WTDs from Hurtt Trait Skepticism (HTS) scores. Results confirm that WTDs comprise one higher-order general factor as well as three subfactors and that WTDs readily diverge from HTS. We conclude the paper by discussing implications and by identifying numerous opportunities for future research.

Auditors' "on average" failure to objectively respond to evidence
Audit standards (e.g., AU-C 240; PCAOB AS 1105; 2110; 2301) and audit research (e.g., Bell et al. 2005; Peecher et al. 2007) prescribe that auditors should objectively evaluate evidence when assessing and addressing the risk of material misstatement due to error or fraud. The objective evaluation of evidence necessitates a "balanced assessment of all relevant circumstances" when reaching audit conclusions (e.g., IIA, Code of Ethics 2021), and it is paramount for the proper exercise of professional skepticism (PCAOB AS 1101.08-.11; 1105). Nolder and Kadous (2018, 5) observe how audit standards' description of professional skepticism calls for ". . . receptivity, openness, or alertness to new information and an objective and unbiased assessment of the merits of evidence …" Auditors should increase (decrease) the assessed risk of fraud when evidence is objectively more (less) indicative of material misstatement due to fraud. They should also objectively alter their audit plans to address higher or lower fraud risk. However, numerous studies find that, on average, auditors often fail to adapt their plans (e.g., Brown et al. 1999; Waller and Zimbelman 2003; Nelson and Tan 2005; Nelson 2009; Hammersley et al. 2010; Kochetova-Kozloski et al. 2011; Trotman and Wright 2012; Trompeter et al. 2013; Knechel et al. 2013; Hammersley et al. 2011; Kachelmeier et al. 2014; Griffith et al. 2015). Zimbelman (1997), for example, finds that audit seniors and managers fail to alter assessed misstatement risk based on the presence versus absence of incentives for management to materially misstate assets. Similarly, audit seniors diagnosing the cause of an unexpected increase in income fail to prioritize evidence having greater diagnostic value (Brown et al. 1999). Hoffman and Zimbelman (2009) find that, unless prompted to think strategically, auditors fail to appropriately revise planned audit procedures to address seeded fraud cues. Trotman and Wright (2012) find that auditors fail to respond to external performance indicators that objectively heighten misstatement risk despite seemingly positive internal performance indicators under management's control. In an abstract setting, Kachelmeier et al. (2014) find that participants in an auditor role fail to distinguish between evidence that is more versus less indicative of intentional misstatement. Overall, auditors appear on average to have difficulty objectively evaluating evidence. 3

The cult of the average, thinking dispositions, and wise-thinking dispositions
Social psychologists warn against "the cult of the average" when studying determinants of human behavior. Achor (2010, 9-10), as an example, notes:
"The typical approach to understanding human behavior has always been to look for average behavior…. This misguided approach has created what I call 'the cult of the average' in the behavioral sciences.… If someone asks a question such as 'How fast can a child learn how to read in a classroom?' science changes that to 'How fast does the average child learn to read in the classroom?' We ignore the children who read faster or slower and tailor the classroom toward the 'average' child…. That's the first mistake psychology makes." 4
With few exceptions (e.g., fluid intelligence, as in Brown et al. 1999), the audit literature has yet to identify auditor-specific constructs that result in "above average" objectivity in evaluating evidence. To address this gap, we introduce the construct of a wise-thinking disposition and explain that this construct is a way to distinguish among auditors in terms of their propensity for objectively evaluating audit evidence. Thinking dispositions are behavioral capacities and inclinations to engage in particular patterns of cognitive behavior (Perkins et al. 1993), or higher-level cognitive styles (Stanovich 2001). They capture "…variation in peoples' goal management, epistemic values, and epistemic self-regulation-differences in the operation of the reflective mind" (Stanovich et al. 2018, 26). Thinking dispositions also guide the application of decision-makers' intellect, knowledge, and abilities (West and Sá 1999; Sternberg and Grigorenko 1997; Perkins et al. 2000; Sternberg 2001), which helps calibrate how strongly they believe something to be true in light of available evidence (Stanovich et al. 2018). A useful analogy is to compare individuals' intellect, knowledge, and abilities to the engines of cars, while thinking dispositions are the drivers of cars (Stanovich 2009).
Stanovich (2009, 35) also notes that "many different studies involving thousands of participants have indicated that measures of intelligence display only moderate to weak correlations (usually less than 0.30) with some thinking dispositions (for example, actively open-minded thinking, need for cognition) and near zero correlations with others (such as conscientiousness, curiosity, diligence)." Thinking dispositions also differ from knowledge in that while knowledge improves rapidly with instruction, thinking dispositions "are stable traits that help explain intellectual performance ..." This stability results in them not being easily influenced by situational factors (Perkins et al. 2000, 269). Thinking dispositions likely can improve, but only over the longer term, with certain instructional approaches. For example, compared to a traditional knowledge-transmission model of instruction, an enculturation model, one that regularly asks learners to exercise the desired disposition by thinking out loud or by listening to others do so and then reflecting on verbalized thoughts, better reinforces thinking dispositions (Tishman et al. 1993; Perkins et al. 2000).
4 … who don't fit the norm." He goes on to argue that "Often, it is those outside the norm, the exceptional ones, who point to the truth of what is possible." We similarly argue that stronger WTD auditors will demonstrate that auditors can objectively evaluate evidence, despite prior findings that "average" auditors fail to objectively evaluate evidence.
The disposition on which we focus concerns individuals' disposition to engage in wise thinking. Wise thinking is rooted in the construct of wisdom in both sociology (Ardelt 2003, 2004) and psychology (Sternberg 2003; Staudinger and Glück 2011a). 5 Until recently, few studies characterized the wisdom construct in ways that enabled empirical measurement (Staudinger et al. 1992; Birren and Svensson 2005). Several recent studies, however, find that reflective thinking, which balances the dialectic between knowing and doubting in light of uncertainty, is the paramount taproot of wisdom (Ardelt 2003; Stanovich and West 2008; Hall 2010; Staudinger and Glück 2011b). Birren and Fisher (1990, 325) also focus on this dialectic by describing wisdom as a "metacognitive style that involves the ability to make sound judgments; knowing that one does not know everything, seeking the truth to the extent that it is knowable." Our working definition of stronger WTDs, therefore, is dispositions by which individuals naturally engage in wise thinking, which is the balanced revision of beliefs and doubts about target phenomena by thinking openly and reflectively about available evidence (Birren and Fisher 1990; Meacham 1983; Kitchener and Brenner 1990; Ardelt 2003; Staudinger and Glück 2011a). Thus, stronger WTDs likely improve auditors' propensity to form different beliefs and take different actions given an objective evaluation of different states of evidence (Baron 1993, 2008; Meacham 1990; Tishman et al. 1993; Ardelt 2003; Stanovich 2009). For example, compared to weaker WTD auditors, stronger WTD auditors would be more concerned about an auditee management explanation that a spike in its inventory simply reflects a forthcoming season of favorable holiday sales when the auditee's competitors are writing down similar inventory items (Bell et al. 2005). Formally, we predict:
HYPOTHESIS 1. Auditors' fraud risk assessments and planned actions to address fraud risk will reflect a higher (lower) risk of fraud when evidence is more (less) indicative of fraud, but only when they have stronger wise-thinking dispositions.
5 We thank sociologist Monika Ardelt for her suggestion to use wise thinking instead of wisdom to describe our dispositional construct. As she notes, "Wisdom entails more than a thinking disposition, even though wise thinking is certainly worthwhile to investigate in itself" (Ardelt, personal e-mail communication, November 30, 2015).
Implicit in this hypothesis is the assumption that sufficient variation exists in WTD strength across auditors. While prior studies demonstrate that adults vary in wisdom and epistemic rationality, no study of which we are aware examines these two phenomena in professionals charged with a responsibility to be objective. Adults who self-select into the audit profession may do so in part because they have and enjoy applying relatively strong WTDs. It also could be that professions have developed screening processes to weed out individuals with weaker WTDs. This possibility is not easily reconciled, however, with prior experimental findings that, at least on average, auditors fail to objectively respond to audit evidence. Such findings suggest either our hypothesis is true but auditors generally have weaker WTDs, or that our hypothesis is false in that even auditors who have stronger WTDs fail to objectively evaluate evidence. As such, we first conduct an initial analysis in each experiment to investigate the following research question (RQ):

RQ1. To what extent do auditors vary in the strength of their WTDs?
In addition, we investigate whether the presence and the nature of external professional skepticism prompts constitute boundary conditions for our hypothesis. Audit standards implicitly assume that being mindful of the duty to exercise professional skepticism throughout the audit will help ensure the objective, good-faith evaluation of audit evidence (e.g., PCAOB AS 1015.07-.09). Skepticism prompts are not designed to be evidence of fraud in and of themselves, but they remind auditors of particular attitudes to take when evaluating evidence (e.g., Peecher 1996; Brown et al. 1999; Turner 2001; Peytcheva 2014; Grenier 2017). 6 Thus, we investigate whether skepticism prompts moderate our hypothesis. We do so by including two skepticism prompts in Experiment 1 and by excluding skepticism prompts in Experiment 2. Experiment 1's skepticism prompts capture elements of the ongoing evolution in audit standards' prescriptions about the exercise of professional skepticism. Until the late 1990s, audit standards had emphasized that professional skepticism requires a neutral attitude towards evidence (e.g., SAS No. 53). 7 Today's audit standards, by contrast, have evolved towards an attitude of presumptive doubt (Bell et al. 2005; Nelson 2009; Cohen et al. 2017). Current standards have removed some of the neutrality language and added language that requires auditors to set aside prior beliefs about management's honesty during fraud brainstorming sessions (e.g., SAS No. 99). 8 To capture this evolution, we employ both a less doubtful prompt (neutral) that reminds auditors to be skeptical if signs emerge that management lacks integrity and a more doubtful prompt (doubtful) that reminds them to be skeptical even if no signs emerge that management lacks integrity.
6 These studies provide mixed evidence that skepticism prompts can improve auditor judgment. In brief, compared to auditors prompted to be objective, auditors prompted to be skeptical of management's explanations for unusual account-balance fluctuations do not change how likely they assess these explanations as the cause of the fluctuations, nor do they generate more misstatement explanations for the fluctuation (Peecher 1996). Similarly, as compared to a prompt that expresses concern about how auditors approach management explanations, a prompt to be sufficiently skeptical does not result in auditors searching for more evidence, spending a longer time searching for evidence, or shifting away from a management-directed search strategy (Turner 2001). Of some concern, this type of skepticism prompt results in auditors becoming information prone by ascribing high value to evidence without regard to its diagnosticity (Brown et al. 1999). On perhaps a more positive note, skepticism prompts do improve the cognitive performance of audit students (but not auditors) in Peytcheva (2014), and skepticism prompts increase audit specialists' generation of fraud explanations for unusual account-balance fluctuations as well as increase their assessment of the likelihood of fraud in Grenier (2017).
7 SAS No. 53 ¶16 (AICPA 1988) states, "The auditor neither assumes that management is dishonest nor assumes unquestioned honesty." It further explains, "A presumption of management dishonesty … would be contrary to the accumulated experience of auditors. Moreover, if dishonesty were presumed, the auditor would potentially need to question the genuineness of all records and documents obtained from the client …. An audit conducted on these terms would be unreasonably costly and impractical" (SAS No. 53 ¶17, AICPA 1988).
8 SAS No. 99 ¶13 (AICPA 2006) also notes, "In exercising professional skepticism in gathering and evaluating evidence, the auditor should not be satisfied with less-than-persuasive evidence because of a belief that management is honest." This prescription is tenuous normatively, as how much evidential matter one needs to be sufficiently persuasive itself differs depending on whether the source is trustworthy or untrustworthy.
A priori, both the neutral and the doubtful levels of skepticism Prompt could be boundary conditions of our hypothesis that auditors objectively evaluate evidence only when they have stronger WTDs. One, auditors with stronger WTDs could opt to override their natural thinking disposition to objectively evaluate evidence of fraud if prompted to be doubtful. That is, when prompted to be doubtful, auditors with stronger WTDs could assess and address fraud risk in the way we predict that auditors with weaker WTDs naturally do so, that is, without regard to the underlying evidence of fraud. Two, prompting neutrality could motivate auditors with weaker WTDs to become more aware of whether or not red-flag signs of fraud exist. If prompted to be neutral, auditors with weaker WTDs could assess and address fraud risk in the way we predict that auditors with stronger WTDs do so naturally, that is, with objective regard to the underlying evidence of fraud.
Thus, we investigate the following additional RQs:
RQ2.1 Does a doubtful prompt impair the propensity for auditors with stronger WTDs to reach judgments and decisions that reflect a higher (lower) risk of fraud when the underlying audit evidence is more (less) indicative of fraud?
RQ2.2 Does a neutral prompt enhance the propensity for auditors with weaker WTDs to reach judgments and decisions that reflect a higher (lower) risk of fraud when the underlying audit evidence is more (less) indicative of fraud?
RQ2.3 Does the propensity for auditors with stronger WTDs to respond objectively to evidence that is more indicative versus less indicative of fraud (Hypothesis 1) hold in both the presence and the absence of skepticism prompts?

Participants and task
Participants are 87 auditors from a Big Four firm with an average audit experience of 44.5 months (standard deviation = 15.5). 9 Auditors at this level commonly evaluate management's nonrecurring journal entries. To verify that our participants had the requisite task experience, we asked them to report the number of engagements in which they had reviewed nonrecurring journal entries. The mean (standard deviation) of their response is 10.1 (19.7), so we conclude that our participants have the requisite task experience. 10 In Part I, participants first read a randomly assigned skepticism prompt (see below) and then read about a fictitious audit client that manufactures and sells auto parts. They receive summary financial performance, a description of a prior favorable relationship with management, and a mock Forbes magazine interview with the CFO emphasizing integrity and a commitment to high-quality financial reporting. 11 Participants next evaluate a nonrecurring journal entry that reduces warranty expense. Earnings fell short of analysts' forecasts before this entry but beat forecasts afterwards. The CFO explains that lower warranty expense reflects higher product quality and lower future product returns. The Evidence manipulation (see below) either corroborates or undermines management's explanation. Participants then assess the likelihood that management is trying to commit fraud, engage in within-GAAP earnings management, or use high-quality accounting. After Part I, participants place materials in an envelope and begin Part II, providing demographics and completing scales for WTD and Tacit Knowledge. 12
9 We obtained Institutional Review Board approval for each reported study.
10 Neither months of experience (t1,85 = 0.44, p = 0.663) nor number of prior engagements with nonrecurring entries (t1,73 = 0.39, p = 0.699) differ across auditors who have stronger versus weaker WTDs. We did not collect information on age, gender, or ethnicity.
11 Specifically, the CFO says, "… the whole mentality behind earnings management is unhealthy…." And, after being asked, "So integrity trumps earnings management in your eyes?" the CFO responds, "The two terms are essentially contradictory…. Let the short-term periodic earnings fall where they may…. If you were to give me a choice between high-quality earnings that barely miss analysts' forecasts versus low-quality earnings that beat their forecasts, I'd take the high-quality earnings because that's best for the long term."

Experimental design
We use a 2×2×2(×3) mixed design. The (×3) within-subjects part of the design entails the assessed likelihoods of Attempted Fraud, Within-GAAP Earnings Management, and High-Quality Attempt.
The 2×2×2 entails between-subjects manipulations of Evidence (more indicative vs. less indicative of fraud) and skepticism Prompt (doubtful vs. neutral) as well as a median-split measure of participants' WTDs (stronger vs weaker).
A standard way to reveal if participants rely on a given factor in reaching judgments or decisions is to manipulate that factor and observe whether or not it influences their responses (e.g., Tversky and Kahneman 1982;Toplak et al. 2011). The approach we use is as follows: In the less indicative Evidence condition, two key performance indicators (KPIs), lower product returns and less rework, corroborate management's explanation for reducing warranty liability (see Exhibit 1, panel A). In the more indicative Evidence condition (Exhibit 1, panel B) two additional KPIs cast doubt on management's higher product quality story: assembly-worker overtime and productionline utilization rates increased dramatically in the 4th quarter relative to the prior year, likely leading to higher worker fatigue and reduced product quality, thereby increasing future product returns.
To implement the skepticism Prompt, we use an FBI double-agent vignette in both levels. This vignette is unrelated to the client but captures the fact that auditors may face a strategic opponent who tries to commit and conceal fraud (e.g., Bowlin 2011). The doubtful level emphasizes being skeptical even if no signs appear to exist that management lacks integrity (see, e.g., SAS No. 99, para 13). The less doubtful neutral level emphasizes being skeptical if red-flag signs emerge that management lacks integrity (see, e.g., SAS No. 99, paras 31-40). The doubtful vignette describes an FBI double agent who provided no outward cues of treason (see Exhibit 2, panel A). The neutral vignette describes the same double agent but reveals red flags of treason, that is, a lot of cash inexplicably found in his home (see Exhibit 2, panel B).
12 We included tacit knowledge questions to rule out the possibility that auditors with stronger WTDs also tend to have higher tacit knowledge, in which case tacit knowledge could also be driving our results. However, the correlation (untabulated) between these two measures is not significant (r = +0.011, t = 0.14, p = 0.891). Further, untabulated analyses show that Tacit Knowledge does not influence auditors' risk assessments in isolation or in interaction with other independent variables (lowest p = 0.307).
To measure auditors' WTDs, we construct a survey with 22 statements from everyday life (see Exhibit 3). The survey adapts statements from Ardelt's (2003) Three-Dimensional Wisdom Scale (i.e., 3D-WS) and from Stanovich and West's (2007) Actively Open-Minded Thinking (AOMT) scale. 13 The three dimensions of 3D-WS are reflective, cognitive, and affective, with the reflective dimension being paramount (Ardelt 2003). As such, five of the WTD survey statements adapted from 3D-WS relate to its reflective dimension, and two are adapted from its cognitive dimension (see Exhibit 3). 14 Our remaining 15 WTD survey statements are reflective and cognitive items from the AOMT. A disposition to use active open-minded thinking is foundational for wise thinking, as prior literature notes (Stanovich 2001; Stanovich and West 2007, 226; Stanovich and West 2008, 130; Stanovich 2009). High AOMT scores, for example, "indicate openness to belief change and cognitive flexibility, whereas low scores indicate cognitive rigidity and resistance to change" (Stanovich and West 1997, 347). 15 Participants express agreement, neutrality, or disagreement with each statement, using two categories of agreement and disagreement (i.e., strongly agree, agree, neutral, disagree, or strongly disagree). For each statement, the wisest response is strongly agree or strongly disagree.
13 Like our scale, statements used in these two scales originate from other sources, including other thinking disposition scales. Questions from Ardelt's (2003) 3D-WS originate from the Perspective-Taking Scale of the Interpersonal Reactivity Index (Davis 1980), Need for Cognition (Cacioppo et al. 1996), Dogmatism (Rokeach 1960), Ambiguity Tolerance (MacDonald 1970), Intolerance of Ambiguity (King and Hunt 1975), and Personal Problem-Solving Inventory (Heppner and Peterson 1982). Questions from Stanovich and West's (2007) composite AOMT that we use also come from earlier-developed scales: Cognitive Flexibility (Stanovich and West 1997), Constructive Thinking (Epstein and Meier 1989), Dogmatism (Paulhus and Reid 1991), Willingness to Consider Contradictory Evidence (Stanovich and West 1997), Tolerance for Ambiguity (e.g., Epstein and Meier 1989), Need for Cognition (e.g., Cacioppo et al. 1996), Counterfactual Thinking (e.g., Stanovich and West 1997), and Belief Identification (e.g., Sá et al. 1999).
14 While the affective dimension is relevant for interpersonal contexts involving empathy and compassion (Ardelt 2003, 2004), it is less relevant to reflective, epistemically rational thought needed to evaluate evidence objectively. Future research can investigate whether adding the affective dimension improves measurement of WTDs.
We score WTDs using the sum of the squared deviations from the wisest response: for each statement, we record a deviation of 0, 1, 2, 3, or 4 and then square it (that is, 0, 1, 4, 9, or 16, going from most-wise to least-wise responses). We then sum the squared deviations across the 22 statements in Exhibit 3, with lower (higher) values indicative of stronger (weaker) WTDs.
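The scoring rule can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the authors' instrument: the numeric coding of responses (1 = strongly agree through 5 = strongly disagree), the function names, the four-item answer key, the sample responses, and the handling of ties at the median are all hypothetical. The actual 22 statements and their wisest responses are in Exhibit 3.

```python
from statistics import median

# Assumed coding (not from the paper): 1 = strongly agree ... 5 = strongly disagree.
STRONGLY_AGREE, STRONGLY_DISAGREE = 1, 5


def wtd_score(responses, wisest):
    """Sum of squared deviations from the wisest response per statement.

    Lower totals indicate a stronger WTD; higher totals a weaker one.
    """
    if len(responses) != len(wisest):
        raise ValueError("one response is needed per statement")
    total = 0
    for given, best in zip(responses, wisest):
        deviation = abs(given - best)  # 0, 1, 2, 3, or 4
        total += deviation ** 2        # 0, 1, 4, 9, or 16
    return total


def median_split(scores):
    """Label each score 'stronger' or 'weaker' relative to the sample median.

    Assigning ties at the median to 'stronger' is an assumption made here.
    """
    cut = median(scores)
    return ["stronger" if s <= cut else "weaker" for s in scores]


# Hypothetical 4-item key and responses, for illustration only.
key = [STRONGLY_AGREE, STRONGLY_DISAGREE, STRONGLY_DISAGREE, STRONGLY_AGREE]
participant = [1, 4, 3, 5]
print(wtd_score(participant, key))  # 0 + 1 + 4 + 16 = 21
```

Because lower totals mean stronger WTDs, the median split labels scores at or below the sample median as "stronger".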
Using the sum of squared deviations is a common way to score forms of practical intelligence. For example, tacit knowledge scales in psychology and accounting studies use this approach (e.g., Wagner 1987; Tan and Libby 1997; Bol et al. 2018). Relative to the sum of linear deviations, the sum of squared deviations not only increases the systematic variation in a measured construct but also is more appropriate for such scales, for the following reasons: (i) it exacts less of a relative penalty for responses that directionally agree with the best response; (ii) it better captures the intuition that responses that run in the opposite direction of the best response are qualitatively worse than being neutral about what is the best response; (iii) it more heavily penalizes strong dissent from the best response; and (iv) it more heavily penalizes responses that contain more random variation across the 22 items. 16
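The scoring rule described above can be illustrated with a minimal sketch. It assumes responses are coded 1 (strongly disagree) through 5 (strongly agree) and that each statement's wisest response is either 1 or 5; the function names, the numeric coding, and the three-item example are hypothetical illustrations and do not reproduce the actual instrument or its key.

```python
def wtd_score(responses, wisest):
    """Sum of squared deviations from the wisest response.

    Lower totals indicate stronger WTDs. Coding is an assumption:
    1 = strongly disagree ... 5 = strongly agree.
    """
    if len(responses) != len(wisest):
        raise ValueError("responses and wisest must have the same length")
    total = 0
    for r, w in zip(responses, wisest):
        if r not in (1, 2, 3, 4, 5) or w not in (1, 5):
            raise ValueError("responses must be 1-5; wisest must be 1 or 5")
        total += (r - w) ** 2  # deviation of 0..4 becomes 0, 1, 4, 9, or 16
    return total


def linear_score(responses, wisest):
    """Sum of absolute (linear) deviations, shown only for comparison."""
    return sum(abs(r - w) for r, w in zip(responses, wisest))


# Hypothetical three-item example: deviations are 1, 2, and 4.
example_wisest = [5, 1, 5]
example_responses = [4, 3, 1]
squared = wtd_score(example_responses, example_wisest)   # 1 + 4 + 16
linear = linear_score(example_responses, example_wisest)  # 1 + 2 + 4
```

Under this coding, a 22-item score can range from 0 (every response is the wisest one) to 22 × 16 = 352 (every response is the least wise one). The single four-point deviation contributes 16 of the 21 squared points in the example but only 4 of the 7 linear points, which is the heavier penalty for strong dissent noted in reason (iii).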

Dependent measures
We aimed to capture the degree of participants' concern about potential fraud in light of the fact that earnings distortion activities fall along a "continuum that ranges from complete legitimacy at one extreme to fraud at the other" (POB 2000, 78). As shown in Exhibit 4, we asked participants to assess the likelihood that management is attempting to engage in each of the following:
• Fraud or Near Fraud - An intentional attempt to distort business reality that clearly violates GAAP or that really "pushes the envelope" of GAAP.
• Earnings Management - An intentional attempt to "window dress" business reality that is acceptable under GAAP.
• High-Quality Attempt - A good-faith attempt by management to use accounting that fairly and neutrally reflects business reality. This attempt could be successful, resulting in High Quality Accounting, or unsuccessful, resulting in Erroneous Accounting.
We include the near fraud category to capture instances in which auditors are concerned about potential fraud but reluctant to believe management is committing out-and-out fraud (e.g., Hobson et al. 2017). A factor analysis (untabulated) on these five categories returns three factors, but with the first factor being the strongest. The first factor explains 95% of the common variance, equally weights Fraud (loading = 0.78) and Near Fraud (loading = 0.74), and has an eigenvalue of 1.36. Eigenvalues for the two other factors are both below 1.0, with the second factor equally weighting High-Quality Accounting (loading = 0.54) and Erroneous Accounting (loading = 0.57).
Earnings Management is the third factor (loading = 0.25). Given this factor structure, we use two dependent measures to test our hypothesis.

Accepted Article
This article is protected by copyright. All rights reserved.

Check on prompt manipulation
Right after reading the Prompt manipulation, participants assess whether the main point of the FBI double-agent narrative is that auditors should be skeptical when cues emerge that signal management lacks integrity (boldface in original). On a 7-point scale (1 = completely disagree; 4 = no opinion; 7 = completely agree), participants' evaluations fall below the midpoint in the doubtful condition but above the midpoint in the neutral condition (3.03 < 5.80; t(75) = -6.23, p < 0.001, one-tailed), consistent with a successful Prompt manipulation. 18

Research Question 1: Does the strength of auditors' WTDs vary?
In untabulated analyses, we find considerable variation in the strength of auditors

Hypothesis 1 Test
Operationally, Hypothesis 1 predicts an Evidence effect, but only for stronger WTD auditors, which would manifest as a two-way Evidence×WTD interaction. That is, we predict relatively higher likelihood assessments of fraud given Evidence that is more indicative as compared to less indicative of fraud, but only for stronger WTD auditors. Using the language of Guggenmos et al. (2018, 232), the hypothesis predicts a "Pac-Man" shaped interaction. 21 Table 1 (1), (2), and (3), respectively. Column (4)

Investigation of RQ2
RQ2 has two parts addressed by Experiment 1, with RQ2.1 focusing on stronger WTD participants and RQ2.2 focusing on weaker WTD participants. RQ2.1 asks: "Does a doubtful prompt impair the propensity for auditors with relatively stronger WTDs to make judgments and decisions that reflect a higher (lower) risk of fraud when the underlying audit evidence is more (less) indicative of fraud?" Table 3 reports a two-way ANOVA with Evidence, Prompt, and their interaction as independent factors on stronger WTD auditors' risk assessments. Panel A of Table 3 examines the assessed likelihood of fraud (Attempted Fraud), and panel B of Table 3
22 Our inferences are robust to using a continuous rather than median-split specification based on a measure of WTD strength that uses the sum of squared deviations. They also are robust to both a continuous and median-split specification based on a measure of WTD strength that uses the sum of linear deviations.

Next, we investigate RQ2.2, which asks, "Does a neutral prompt enhance the propensity for auditors with relatively weaker WTDs to reach judgments and decisions that reflect a higher (lower) risk of fraud when the underlying audit evidence is more (less) indicative of fraud?" Further, using the Tukey HSD test, no Evidence effect is significant (lowest p = 0.286).
What we observe instead is that the skepticism Prompt affects how highly auditors with weaker WTDs assess fraud risk: they assess it higher when reminded to be doubtful than when reminded to be neutral. Notably, the Prompt main effect is significant in each of the two-way ANOVAs without any Evidence×Prompt interaction. The exploratory simple effects analysis, however, reveals that the only simple effect of Prompt to attain significance occurs when weaker WTD auditors encounter less indicative Evidence. Overall, these results provide only limited evidence that our neutral skepticism prompt can be helpful to auditors with weaker WTDs.

Experiment 2
In Experiment 1, we find that, as hypothesized, auditors evaluate Evidence objectively, but only when they have stronger WTDs. However, in all of the Experiment 1 conditions auditors were prompted to exercise professional skepticism. Thus, an alternative interpretation of Experiment 1 findings is that the objective evaluation of evidence requires a combination of stronger WTDs and a skepticism prompt, rather than stronger WTDs only. Experiment 2 thus tests our hypothesis without use of skepticism prompts. We also designed the new experiment such that its dependent measures go beyond the assessment of fraud risk (i.e., a judgment) to include action plans to address fraud risk (i.e., a choice). We wanted to do so because, unless auditors undertake different actions to address fraud risk, the chance of fraud detection is unlikely to change.
In Experiment 2, the participants have less audit experience, and there are new case materials and new dependent variables that span both the assessment of fraud risk and plans to address it, all of which subject our hypothesis to a reproducibility test. Graduate students with an average (standard deviation) age of 26.9 (8.2) from a large state university participated. 25 These participants had completed an audit internship and/or were enrolled in a master's-level auditing course. We adapt experimental materials from two prior studies (Griffith et al. 2015; Kadous and Zhou 2019). Participants assess and address the risk that management is using unreasonable (overly aggressive) revenue projections to support their assertion that goodwill is properly valued (not impaired). Participants view the following audit evidence: summary of discussions with management regarding the revenue projections, historical revenue projections, peer company revenue projections, a sensitivity analysis, and market and industry information. 26 Next, participants assess the reasonableness of management's revenue growth projections and choose an action to address concerns about its reasonableness. Last, participants complete a post-test, including WTD items.
We use a 2×2 between-subjects design and manipulate Evidence at more indicative vs. less indicative of overly aggressive revenue projections. In the more indicative condition, for example, management's projections are more favorable than those of peer firms, while in the less indicative condition, they are in line with those of peer firms (see Exhibit 5 for other details). The second factor is a measured variable-that is, a median split of participants' WTDs (stronger vs. weaker).
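The measured second factor can be sketched as follows. This is a minimal illustration that assumes WTD strength is scored so that lower values indicate stronger WTDs (as with the sum of squared deviations) and that participants whose score equals the median are assigned to the weaker group; the tie rule, the function name, and the example scores are hypothetical assumptions rather than details reported in the paper.

```python
from statistics import median


def median_split(scores):
    """Label participants 'stronger' or 'weaker' by a median split.

    Assumes lower scores mean stronger WTDs. Scores equal to the
    median go to 'weaker' here, which is an arbitrary tie rule.
    """
    cut = median(scores)
    return ["stronger" if s < cut else "weaker" for s in scores]


# Hypothetical scores for five participants (median is 66).
example_scores = [32, 48, 66, 90, 115]
example_groups = median_split(example_scores)
```

Crossing these labels with the manipulated Evidence condition yields the four cells of the 2×2 design.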
For dependent variables, participants assess the reasonableness of management's revenue projections and address misstatement risk arising from management's revenue projections by choosing a Next Action. Our focus is whether the participants are sufficiently unsure about the reasonableness of revenue projections to warrant a discussion of their concerns with their senior associate. Following prior research (Griffith et al. 2015; Kadous and Zhou 2019), we code chosen actions as more appropriate or less appropriate. Figure 2 lists the five alternative Next Actions from which participants chose. Note that these listed Next Actions provide additional insight on how highly participants assess the risk of intentional misstatement and also their choice from alternative actions to address this risk. Two of these responses are more appropriate for more indicative Evidence (i.e., the third or fourth option), and two others are more appropriate responses for less indicative Evidence (i.e., the first or second option). 27 We infer more objective evaluation of evidence in assessing and addressing fraud risk if participants choose more appropriate responses. Hypothesis 1 predicts that auditors will choose objectively more appropriate actions based on Evidence, but only if they have stronger WTDs. We perform no manipulation check on Evidence as the findings below demonstrate a successful manipulation (Sigall and Mills 1998, 221).
25 We collected Experiment 2 data in 2019, and Experiment 1 data in 2004. The latter are still relevant today as theory provides no reason to expect the date of data collection to alter the WTD×Evidence interaction, and audit studies during the intervening period continue to demonstrate that, on average, auditors struggle to evaluate evidence objectively (e.g., Bauer et al. 2020; Griffith et al. 2015; Glover et al. 2017; Joe et al. 2017).
26 The auditee (an electronics manufacturer) uses a discounted cash flow model to estimate the fair value of its United States reporting unit. Participants are informed that other members of the audit team have evaluated and determined to be reasonable the operating expense projections, capital expenditure projections, and the model's discount rate.

RQ1 variation in the strength of auditors' WTD
We first investigate RQ1, again observing considerable variation in the strength of auditors' WTDs, ranging from 32 to 115 (untabulated mean = 66.30, standard deviation = 19.33). Of the 71 participants, 39 (54.9%) have prior audit experience, and of these 39, there is an average (standard deviation) of 0.9 (2.1) years of audit experience. Audit experience is not correlated with participants' WTD score (untabulated r = +0.023, p = 0.849). 28 Further, the average WTD strength
27 The fifth option involves concluding that the revenue projections are unreasonable without indicating that a follow up discussion with the senior associate is necessary. Consistent with our expectation, few participants (two) chose this option. We code this as a more (less) appropriate action for more (less) indicative Evidence given that respondents who choose this option express concern about the reasonableness of the revenue projections. However, untabulated analysis shows that statistical inferences would not change were we to instead code this option as less appropriate for more indicative Evidence.
28 We asked participants to report age and GPAs. Neither age nor GPA affects our inferences when included in our hypothesis tests. Untabulated analysis shows age is weakly positively correlated with WTD strength (Pearson r = +0.25, Bartlett Chi-Square(1) = 4.08, p = 0.043), but GPA is not significantly correlated with WTD strength (Pearson r = -0.13, Bartlett Chi-Square(1) = 1.18, p = 0.277). We did not ask participants to identify their gender or ethnicity.
Projection Reasonableness Assessments. Participants responded to the following question: "Based on your evaluation, how likely is it that the five-year revenue projections are reasonable?" on an 11-point scale (0 = Not likely at all, 5 = As likely as not, 10 = Extremely likely). We observe no significant effects in the two-way ANOVA (lowest p = 0.272).
Instead, most participants express high uncertainty about the reasonableness of these projections. When asked "… how likely is it that the five-year revenue projections are reasonable?" the average response falls between 5 and 6. In fact, over half (80%) of the participants respond within 1 point (2 points) of the midpoint 5 "As Likely as not" on the 11-point scale.

Test of Hypothesis 1's reproducibility
The more critical dependent variable asks participants how they would address concerns about the reasonableness of management's revenue projections. Figure 2 and Table 5 illustrate and provide results indicating a significant interaction between WTD strength and Evidence (Z = 2.07, p = 0.019, one-tailed). 29 Panel B of Table 5 also presents the planned contrast using weights of 0 (less indicative Evidence, weaker WTD), 0 (more indicative Evidence, weaker WTD), -1 (less indicative Evidence, stronger WTD), and +1 (more indicative Evidence, stronger WTD). As in Experiment 1, these contrast weights fit the predicted pattern: auditors respond to Evidence in their choice of additional procedures but only when they have a stronger WTD. The visual fit predominantly supports Hypothesis 1, as does the significant contrast test (Z = 1.84, p = 0.033, one-tailed), the insignificant semi-omnibus test (untabulated χ²(2) = 0.54, p = 0.763), and relatively small contrast variance residual (q²) of 26.9% (Guggenmos et al. 2018).
Importantly, simple effects at the bottom of panel B of Table 5 show that stronger WTD auditors follow up with their senior associate more often when Evidence suggests that management's revenue projections are overly aggressive (p = 0.020, one-tailed), but that weaker WTD auditors do not do so (p = 0.331, two-tailed). 30 From an audit effectiveness perspective, it is notable that the simple effect of stronger vs. weaker WTD is significant given Evidence that is more indicative of fraud (p = 0.020, one-tailed). Collectively, these findings demonstrate the reproducibility of our hypothesis, provide evidence that it holds in the absence of skepticism prompts (RQ2.3), and strengthen the validity of the WTD construct. 31 Having shown that auditors (fail to) objectively evaluate evidence if they have stronger (weaker) WTDs, we next probe whether WTD coheres as a multifaceted construct and diverges from Hurtt Trait Skepticism (HTS).
29 Untabulated analysis shows that the interaction remains significant when using a continuous measure of the strength of participants' WTDs (p = 0.009, one-tailed). Also, the same inferences hold if we include participants' assessed reasonableness of revenue projections as a covariate.
30 If anything, weaker WTD auditors follow up with their senior associate less often facing Evidence suggesting that management's revenue projections are overly aggressive, and this dysfunctionality increases the size of the contrast variance residual, q². However, this simple effect is not statistically significant, which does match our planned contrast weights (0 less indicative Evidence, weaker WTD; 0 more indicative Evidence, weaker WTD). Clearly, auditors respond less appropriately to Evidence when they have weaker WTDs versus stronger WTDs.
31 Untabulated analysis shows our inferences are robust to using a continuous rather than median-split specification based on a measure of WTD strength that uses the sum of squared deviations. They also are robust to both a continuous and median-split specification based on a measure of WTD strength that uses the sum of linear deviations.
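The planned contrast on Next Action proportions discussed in this subsection can be illustrated with a simple Wald-type Z statistic for a linear contrast of independent cell proportions. This is a sketch under stated assumptions and not necessarily the test the study used (the reported Z values may come from a different model, such as a logistic specification); the cell counts below are hypothetical and serve only to make the example run.

```python
import math


def contrast_z(successes, totals, weights):
    """Wald Z for a linear contrast of independent cell proportions.

    Z = sum(w_i * p_i) / sqrt(sum(w_i^2 * p_i * (1 - p_i) / n_i)).
    Undefined (division by zero) if every weighted cell has p = 0 or 1.
    """
    estimate = 0.0
    variance = 0.0
    for x, n, w in zip(successes, totals, weights):
        p = x / n
        estimate += w * p
        variance += (w ** 2) * p * (1 - p) / n
    return estimate / math.sqrt(variance)


def one_tailed_p(z):
    """Upper-tail probability of a standard normal variate."""
    return 0.5 * math.erfc(z / math.sqrt(2))


# Cell order: less/weaker, more/weaker, less/stronger, more/stronger.
weights = [0, 0, -1, 1]
appropriate = [9, 7, 6, 13]   # hypothetical counts of appropriate actions
cell_sizes = [18, 18, 17, 18]  # hypothetical cell sizes
z = contrast_z(appropriate, cell_sizes, weights)
p = one_tailed_p(z)
```

Because the weaker WTD cells carry zero weight, the contrast reduces to the Evidence effect within the stronger WTD group. As a consistency check on the reporting, `one_tailed_p(1.84)` is approximately 0.033, matching the one-tailed p-value reported for the contrast test.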

Further analyses to assess WTD construct and divergent validity
To further validate the multifaceted nature of our WTD construct, we perform a confirmatory bi-factor analysis. A bi-factor model is a hierarchical factor model that determines whether a construct includes a higher-order general factor that captures a commonality threading among two or more lower-order subfactors. In selecting each of our 22 scale items, we anticipated that they would entail a higher-order general wise thinking factor in addition to a small number of subfactors (see, e.g., Chen et al. 2012 Quadackers et al. (2014) find that regardless of their HTS score, auditors assess fraud risk to be higher in response to a stark manipulation by which management either: aggressively manages earnings, frequently gets into disputes with external auditors, and emphasizes productivity; or strives to report accurately, works harmoniously with external auditors, and emphasizes integrity. Auditors with higher versus lower HTS scores also assess fraud risk to be higher regardless of which management type they face. In other studies, higher HTS scores reportedly have no significant effect on auditors' evaluation of evidence (see, e.g., Rasso 2015, 47-49). In short, we know of no study that reports findings consistent with higher HTS scores enabling auditors to more objectively evaluate evidence. Thus, the observed effects of higher HTS on auditors' evidence evaluation diverge from those we observe in two separate experiments herein. We find that auditors objectively evaluate Evidence that is less versus more indicative of potential fraud when assessing fraud risk and planning follow-up procedures to address fraud risk, but only when they have stronger WTDs. As such, higher HTS scores and stronger WTD scores would appear to be capturing decidedly different constructs.
32 We do not rely on Cronbach's alpha as a metric for reliability for WTD. Rodriguez et al. (2016, 140) emphasize that, "…When the data are multidimensional, as in a bifactor model, alpha … loses its appropriateness as an indicator of how well a total or subscale score reflects a single latent variable." For completeness, our Cronbach's alpha is 0.90, well above standard cutoffs. A study of wisdom measures, including the Ardelt scale, reports Cronbach alphas between 0.47 and 0.88 (Glück et al. 2013).
Nonetheless, to further investigate the possibility that HTS and WTD capture essentially identical constructs, we perform further untabulated analyses using undergraduate senior participants from a large state university from the aforementioned Experiment 3. They completed our 22-item WTD scale and the 30-item HTS scale in a post-test. Four participants failed to fill out all measures, leaving 136 observations. To begin, we simply regress participants' HTS scores on their WTD scores. Unsurprisingly, HTS scores explain some of the variation in WTD (R² = 21%, p < 0.001), but the larger point is that they fail to explain most of the variation in WTD (i.e., 79%, 1 - R²). 33 We next conduct a factor analysis on 52 items, 22 from the WTD scale and 30 from the HTS scale. We use promax rotation, which allows extracted factors to be correlated. Following convention (Boateng et al. 2018; Nunnally 1978), we extract 10 factors as they have loadings of an absolute value of at least 0.40 (see online Appendix). Of these 10, 6 factors load exclusively on HTS items. 34 Further, the remaining four factors load exclusively on WTD items. Finally, only 1 of the 24 inter-factor correlations between the 6 HTS factors and the 4 WTD factors is even moderately strong (i.e., r ≥ +0.40, at +0.44). Of the 23 remaining inter-factor correlations, 9 are weak (i.e., +0.20 ≤ r ≤ +0.39) and 14 are very weak (+0 ≤ r ≤ +0.19). We conclude that HTS and WTD readily diverge from one another empirically. 35
33 For this regression, we use the sum of linear deviations for both WTD and HTS to increase the chance of finding a stronger linear association. Hurtt (2010) explains that six factors underlie HTS: Search for Knowledge, Suspension of Judgment, Interpersonal Understanding, Self-Determining, Self-Confidence, and Questioning Mind. Even larger percentages of the variation in WTD scores are unexplained by each of the six factors that comprise HTS, as follows: Search for Knowledge 80%, Suspension of Judgment 89%, Interpersonal Understanding 90%, Self-Confidence 92%, Self-Determining 94%, and Questioning Mind 96%.

Concluding comments
Audit theory and standards prescribe that auditors should evaluate evidence objectively and describe objective evaluation as indispensable for the proper exercise of professional skepticism. However, both anecdotal inspection findings and empirical findings from numerous prior experimental studies indicate that auditors, at least on average, fail to objectively evaluate evidence. This paper contributes to the literature by developing the hypothesis that auditors objectively evaluate evidence, provided that they have relatively strong WTDs. We find support for this hypothesis across two experiments that differ with regard to inclusion or exclusion of a prompt to exercise professional skepticism. The experiments also use different audit contexts that feature potential fraud as well as different dependent variables and different participants. In both experiments, auditors objectively evaluate whether evidence is more or less indicative of fraud only when they have stronger WTDs, demonstrating the reproducibility of the audit judgment-quality benefits of stronger WTDs. This finding has potential implications for standard setters and audit firms. In particular, quality control standards and audit working paper review processes might benefit from a revision that takes into account our theory-consistent findings that auditors do not objectively evaluate evidence unless they have stronger WTDs.
34 The 6-factor structure of HTS discussed in the prior footnote largely replicates in our data. Factor one includes all 5 Self-Confidence items, factor two includes 5 of the 6 Search for Knowledge items, factor three includes all 5 Interpersonal Understanding items, factor four includes all 5 Self-Determining items and 2 Questioning Mind items, factor five includes all 5 Suspension of Judgment items, and factor ten is one Interpersonal Understanding item.
35 In Experiment 3, we use Experiment 1 case materials and manipulate skepticism Prompt, but hold Evidence constant. Since Evidence is not manipulated, this experiment is not designed to provide data useful for testing our hypothesis.
A second contribution is identification of WTDs as a new construct, defining it as a thinking disposition by which individuals naturally engage in balanced revision of prior beliefs and doubts about target phenomena by thinking openly and reflectively about available evidence (Meacham 1983;Kitchener and Brenner 1990;Ardelt 2003;Staudinger and Glück 2011a that strengthen auditors' WTDs as well as endeavor to identify better measures of WTD strength.
As an example, one could use a multi-method approach by pairing our WTD survey with participants' reactions to scenarios depicting complicated life problems and solicit participants' goals, beliefs, doubts, and possible courses of action. Trained judges could then code participants' responses (Staudinger and Glück 2011b). While our studies conveniently provided auditors with evidence of fraud, future investigations could examine whether stronger WTDs improve auditors' evidence search or mitigate over-aversion or over-eagerness to use AI-based audit evidence.
Finally, future research could examine whether the failure of auditors with weaker WTDs to objectively evaluate evidence can be overcome by interventions that induce deliberative mindsets such as situational prompts or advisor roles (see Griffith et al. 2015 or Bauer et al. 2020).


Panel A: Less Indicative of Fraud
Audit team member's rationale: Why change this reserve? AUP has increased this reserve steadily in the past four years. In response to my inquiry, management explains that the change reflects the fact that AUP's production quality significantly surpassed its expectations this year. Management says re-work and returns have both been lower this year than both last year's results and this year's expectations. When I asked management whether other indicators would tell another story, management said, "No." While reviewing KPIs in two of AUP's core business processes (production and customer fulfillment), I did find some corroborative evidence. Specifically, some KPIs support management's statement in that this year's returns and re-work were both lower (by about 30% compared to last year).
Still, I remain concerned that AUP's real motivation for this entry may be to keep their earnings from falling short of analysts' forecasts and that changing the reserve is simply a way to "generate" earnings.

Panel B: More Indicative of Fraud
Audit team member's rationale: Why change this reserve? AUP has increased this reserve steadily in the past four years. In response to my inquiry, management explains that the change reflects the fact that AUP's production quality significantly surpassed its expectations this year. Management says re-work and returns have both been lower this year than both last year's results and this year's expectations. When I asked management whether other indicators would tell another story, management said, "No." However, while reviewing KPIs in two of AUP's core business processes (production and customer fulfillment), I found mixed evidence. While some KPIs support management's statement in that this year's returns and re-work are both lower (by about 30% compared to last year), other KPIs indicate that both assembly-worker overtime and production-line utilization rates increased this year. The increase in worker overtime was very high in the 4th quarter (187% of last year's 4th quarter overtime).
My concern is that management's story (inadvertently?) doesn't tell the whole story. Instead, it omits any mention of increased overtime and utilization rates. The 20X3 decrease in returns could well be quite transitory. And, 20X4 returns could well increase to historic levels once enough time passes for the effects of worker fatigue (due to 4th Qtr. 20X3 overtime) to show up in reduced product quality (in the form of 20X4 returns). I am concerned that AUP's real motivation for this entry may be to prevent their earnings from falling short of analysts' forecasts and that changing the reserve is simply a way to "generate" earnings.


Panel A: More Doubtful -Be Skeptical Even if No Signs Emerge that Management Lacks Integrity
Auditors should always be skeptical, and not just when they have had negative experiences with client management and not just when cues emerge that signal a lack of integrity.
The infamous case of FBI counterintelligence traitor Robert Hanssen provides a non-audit analogy that highlights how being insufficiently skeptical in the absence of cues that signal a lack of integrity increases one's exposure to devastating negative consequences. Hanssen's "home-for-dinner family-man" persona and other endearing qualities made him appear to be trustworthy; hence, he was a perfect spy. Over time, US intelligence officials became increasingly trusting of him and gave him access to secret and top-secret information. Hanssen exploited their misplaced trust by spying for the Russian government for 20 years. His espionage resulted in the transfer of a "store of national security secrets" and to the execution of three intelligence sources.
The Hanssen analogy clarifies why auditors always should be skeptical, even when no cues emerge that potentially signal that client-management lacks integrity.

Panel B: Less Doubtful (Neutral) -Be Skeptical if Signs Emerge that Management Lacks Integrity
Auditors should be skeptical when cues emerge that signal a lack of integrity, because it significantly constrains the effectiveness of an entity's controls.
The infamous case of FBI counterintelligence traitor Robert Hanssen provides a non-audit analogy that highlights how being insufficiently attentive to cues signalling a lack of integrity increases one's exposure to devastating negative consequences. Hanssen's 20 years of espionage resulted in the transfer of a "store of national security secrets" and to the execution of three intelligence sources. Yet, the FBI did not follow up on cues signaling that he was a Russian spy. As just one example, in 1990, eleven years before his arrest, Hanssen's family inexplicably found thousands of dollars in his home. Hanssen's brother-in-law, himself an FBI agent, became suspicious and told bureau members about the cash and that he suspected Hanssen was a Russian spy. Amazingly, the FBI failed to investigate.
The Hanssen analogy clarifies why auditors should be skeptical whenever cues emerge that potentially signal that client-management lacks integrity.

EXHIBIT 4 Risk assessment dependent measures (Experiments 1 and 3)
Following are five categories that one could use to describe AUP's preferred accounting treatment. Based on what you have learned from the case, interview, and audit team member's rationale, please judge how likely it is that AUP's preferred accounting treatment is a member of each category. (Place a vertical line at the appropriate points on each of the scales.)
[Response scale for each category, with anchors running from "Extremely Unlikely" through "As Likely As ..." to "Extremely ..."]
Notes: Excerpted in this exhibit are case materials participants used to record likelihood assessments regarding management's intentions for the nonrecurring warranty-reducing journal entry. We combined "Fraudulent Financial Reporting" and "Near Fraud" into one overall measure. We further combined "High-Quality Accounting" and "Erroneous Accounting" into one measure. Combining these items simplifies without qualitatively affecting the reported results.

they selected an appropriate action, and zero otherwise. The proportions of appropriate follow-up Next Actions are reported in Table 5. The two appropriate responses when Evidence is more indicative of possible fraud are the third and fourth options. The boldfaced items shown above were also in the instrument.
(1) and (4) are dependent variables for Hypothesis 1. Statistical significance of Evidence-effect differences greater than zero is indicated where *, **, and *** represent levels of 0.05, 0.01, and 0.001, respectively, using one-tailed tests.
Notes: *All p-values are two-tailed equivalents except for the boldfaced simple effects of Evidence given stronger WTD, which use a one-tailed equivalent as they pertain to our hypothesis. For variable definitions, see Table 1.
a Fisher's Least-Significant Difference reflects t-tests for all pairwise comparisons, with no adjustments to the observed significance levels for multiple comparisons.
b Tukey Honestly-Significant Difference test uses the largest value of the difference between all pairwise comparisons and the Studentized Range Distribution to control for experiment-wise alpha = 0.05. It is a conservative test of the significance of comparisons, which is useful for exploratory analyses.


Contrast and Simple Effects for Next Action (columns: Z, p-value)
Planned Contrast: Auditors will follow up with their senior associate more often when Evidence is more vs. less indicative that revenue projections are overly aggressive, but only when they have stronger WTDs. (Contrast weights: 0 more indicative Evidence, weaker WTD; +1 more indicative Evidence, stronger WTD; 0 less indicative Evidence, weaker WTD; and -1 less indicative Evidence, stronger WTD): Z = 1.84, p = 0.033*

Simple Effects
• More vs. less indicative Evidence given auditors with stronger WTD: Z = 2.06, p = 0.020*
• More vs. less indicative Evidence given auditors with weaker WTD: Z = -0.97, p = 0.331*
• Stronger vs. weaker WTD given more indicative Evidence: Z = 2.06, p = 0.020*
• Stronger vs. weaker WTD given less indicative Evidence: Z = -0.97, p = 0.331*
Notes: All p-values are two-tailed except for Hypothesis 1 contrast and indicated (*) simple effects for Next Action, which are one-tailed. Revenue Projection Reasonableness Assessments: auditors' assessment of the likelihood that the five-year revenue projections used in management's goodwill impairment assessment analysis are reasonable. Specifically, participants were asked, "Based on your evaluation, how likely is it that the five-year revenue projections are reasonable?" on an 11-point scale. The scale had the following labels: 0 "Not at all Likely", 5 "As Likely as not" and 10 "Extremely likely" with contrast weights of [0, +1, 0, -1]. Next Action: auditors' choice of whether or not to initiate discussion with their senior associate regarding any concerns about the reasonableness of the revenue projections prepared by management. See Figure 2 for the alternative responses. The contrast weights of [0, +1, 0, -1] correspond to [A, B, C, D] in Figure 2. See Table 1 for definitions of independent factors.