The influence of viewing time on visual diagnostic accuracy: Less is more



| BACKGROUND
Understanding the factors that contribute to diagnostic errors is critical if we are to correct or prevent them. 3 Some scholars influenced by the default interventionist dual-process theory of cognition (dual-process theory) emphasise a narrow focus on individual clinicians' faulty reasoning as a significant contributor. Furthermore, this narrow focus has inspired a popular conceptualisation that clinicians can learn to assess their own cognitive processes and consequently correct their reasoning independently, placing the burden for patient safety on individual clinicians. 4 Continuing to focus on individual clinicians as both the source of the problem and the solution may be distracting efforts from addressing more complex systems factors. In this paper, we examine the validity of claims that dual-process theory is a key to error reduction.

| INTRODUCTION
6][7] This can be a misleading description, as System 1 operates without conscious attention or intention. 10] Therefore, in this paper, when we refer to System 1, we mean a greater reliance on System 1 and very little effort in System 2; when we refer to System 2, we mean that both systems are operating. 8,9,11
In experimental studies of clinical reasoning, physicians are given a variety of written or verbal instructions that amount to 'use System 1' or 'use System 2' when diagnosing written patient vignettes. Instructions such as 'go fast, go with your instinct' encourage reliance on System 1, whereas 'go slow, think carefully about the rules' encourage waiting for input from System 2. 12,13 Response times, faster or slower, are taken as evidence that the manipulation induced different processing styles. Physicians with more experience are typically faster yet more accurate, whereas physicians with less experience tend to be less accurate while taking more time to respond. 12,14,15 [7]
Scholars who view System 1 as error-prone mainly describe ways in which cognitive bias facilitates investment in a wrong System 1 solution. 3,17,18 We have previously argued that this caution is unwarranted, albeit from a theoretical stance. 14,19 First, cognitive bias is not exclusively a non-analytic mechanism. Given that cognitive bias is a tendency to commit to the first idea and interrupt further analysis, this mechanism can operate in System 2 as well: a clinician may engage System 2 but still commit to the first solution. Second, evidence from cognitive psychology and neuroscience attributes several important benefits to non-analytic processes. System 1 is theorised to operate subconsciously, allowing many processes to work in parallel.
In this way, attention may be guided automatically to important information while irrelevant input is ignored. 20 Additionally, non-analytic processes are considered less susceptible to environmental and physiological stressors such as stimulants, fatigue and cognitive load. 8 Finally, frequently practised analytic strategies slowly transition to operating non-analytically. When clinicians have more experience, they will have acquired many associations and patterns, so System 1 reasoning should be sufficient and effective in guiding the next steps. When clinicians have less experience, System 2 is likely to be more effective.
Unfortunately, this work has not yet been translated into research on diagnostic error, and the influence of experience and processing style on diagnostic error remains to be clarified. Our goal in this paper was to describe the relationship between experience and accuracy as a function of viewing time, from which we may infer processing style.
As the transition from System 1 to System 2 is almost instantaneous, the two systems cannot be dissociated using instructional manipulations.
Drawing from psychology and neuroscience, we can translate methods for dissociating cognitive processes in perceptual identification tasks. 8,9,11,21,22 One paradigm used in these disciplines asks participants to categorise stimuli using only two possible responses: a forced-choice task. 8,9,11,21,22 While processing type is still inferred from response times, these times are less than 1 s, and electroencephalographic measures indicate two processes: an almost instantaneous process that emerges with the presentation of the image, and a slower process that joins in after about 500 ms. 8,9,11,21,22 Based on prior work, we may infer that System 1 operates alone before 500 ms and that both systems operate together after 500 ms. 8,9,21 By restricting visual processing to time windows shorter or longer than 500 ms, our study aims to dissociate the influence of non-analytic and analytic processing on diagnosis. 4][25][26][27]
Both studies were approved by the Hamilton Integrated Research Ethics Board (HiREB #1089). Upon completion of the study, participants received a small honorarium for their participation.

| Materials
We used LiveCode computer software (https://livecode.org/) to create a self-directed online study platform.
We used two kinds of diagnostic images: chest X-rays (CXRs) and electrocardiograms (ECGs). For both CXRs and ECGs, a set of 100 images was selected from anonymised patient charts by co-investigators. Each image was confirmed to be either normal or to show a single, clinically relevant, unambiguous diagnosis; in each set, 50 images were abnormal with a single most likely diagnosis, and 50 were normal. All diagnoses were confirmed by two clinician co-investigators (MS and JS). Diagnoses selected for the ECGs were left ventricular hypertrophy, pericarditis, left bundle branch block, inferior ST-elevation myocardial infarction, ventricular pacing and anterior ST-elevation myocardial infarction. Diagnoses selected for the CXRs were congestive heart failure, lung cancer, pneumonia and pneumothorax.

| Procedure
In one arm of the study, participants viewed ECGs, and in a second arm, they viewed CXRs. Participants were randomised to start with CXRs or ECGs. Each image was presented for one of four durations: 175, 250, 500 or 1000 ms. As mentioned previously, these viewing times offered a theoretical dissociation between non-analytic processing (175 and 250 ms) and analytic processing (500 and 1000 ms). We conducted systems and psychophysics testing with undergraduate participants to confirm that images (e.g. animals and landscapes) could be detected and identified at these viewing times. Another quarter of the participants would see Set B for 175 ms, then C for 250 ms, D for 500 ms and A for 1000 ms, and so on. However, the sequence of images was randomised, so that images were presented in a random order of viewing times.
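The rotation of image sets across viewing times described above has the structure of a cyclic Latin square: each counterbalancing group sees every set once, and across groups every set appears once at every viewing time. The Python sketch below illustrates that structure. The set labels A to D and the four viewing times come from the text; the function names, the round-robin assignment of participants to groups, the hypothetical image identifiers and the shuffling step are assumptions for illustration, not the study's LiveCode implementation.

```python
import random

def latin_square_schedule(sets, viewing_times):
    """Cyclic Latin square: counterbalancing group g views sets[(g + i) % n]
    at viewing_times[i], so each set appears once at each viewing time
    across the n groups."""
    n = len(sets)
    if n != len(viewing_times):
        raise ValueError("need one image set per viewing time")
    return [{t: sets[(g + i) % n] for i, t in enumerate(viewing_times)}
            for g in range(n)]

def build_session(group_schedule, images_by_set, rng):
    """Hypothetical helper: pair every image with its group's viewing time,
    then shuffle so the next viewing time cannot be predicted."""
    trials = [(image, t)
              for t, set_label in group_schedule.items()
              for image in images_by_set[set_label]]
    rng.shuffle(trials)
    return trials

schedule = latin_square_schedule(["A", "B", "C", "D"], [175, 250, 500, 1000])
# Hypothetical image identifiers: 25 per set, 100 in total.
images_by_set = {s: [f"{s}{k:02d}" for k in range(25)] for s in "ABCD"}
participant_index = 1                     # assumption: round-robin group assignment
group = schedule[participant_index % len(schedule)]
session = build_session(group, images_by_set, random.Random(42))
print(group)         # {175: 'B', 250: 'C', 500: 'D', 1000: 'A'}
print(len(session))  # 100
```

The second group's schedule reproduces the example in the text (Set B at 175 ms through Set A at 1000 ms). Round-robin assignment is only one way to fill the groups evenly; the text states only that presentation was counterbalanced.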
Each volunteer participant was sent a link to the study, which they could open on their own computer or device when convenient.
Participants were informed that they would view a total of 100 CXR images followed by 100 ECG images (or the reverse). Total study time did not exceed 10 min. Instructions were identical for the CXR and ECG study arms. Participants were told that some of the images would be normal and would not represent any pathology requiring intervention, further investigations or follow-up, whereas other images would show clinically important pathology. Each participant was instructed to view an image and, once the image disappeared, indicate whether it was normal or abnormal by clicking on one of two radio buttons: normal or abnormal. After each response, participants saw a blank screen and had to press the mouse to advance to the next image. Participants were informed that each image would be presented for a fraction of a second, so it was necessary to keep their attention on the screen. Participants could take breaks and pause between images and responses, as they controlled when each trial started, but they could not predict which viewing time they would get with each image. After reviewing the instructions, participants completed a practice set before proceeding with the study.

| Outcomes
We were interested in how viewing time and experience affected performance, and whether this varied for normal and abnormal images. We considered examining false positives (FPs), false negatives (FNs), true positives (TPs) and true negatives (TNs). As TP and FN are not independent measures, and FP and TN are not independent measures, we selected TP and FP as our primary outcomes.
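The dependence between these measures follows from how the rates are defined. Writing TP, FN, FP and TN for one participant's counts at one viewing time (notation introduced here for illustration), every abnormal image is either a TP or an FN, and every normal image is either an FP or a TN, so:

```latex
\[
\mathrm{TPR} = \frac{TP}{TP + FN}, \qquad
\mathrm{FNR} = \frac{FN}{TP + FN} = 1 - \mathrm{TPR},
\]
\[
\mathrm{FPR} = \frac{FP}{FP + TN}, \qquad
\mathrm{TNR} = \frac{TN}{FP + TN} = 1 - \mathrm{FPR}.
\]
```

The FN and TN rates can therefore be recovered from the TP and FP rates, which is why reporting the latter two loses no information.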
FIGURE 1 Study 1 trial sequence. A single trial included a blank screen, an image, a screen with two response options and a blank screen. Starting with a blank screen, the participant hit a key to advance to the next screen. An image was presented at one of four time windows (175, 250, 500 or 1000 ms), but participants did not know which time window. The next screen presented two options, normal and abnormal. Participants selected one with their mouse. The next screen was blank, and the participant pressed their mouse to advance.
Participants received one score for categorising each image as normal or abnormal; correct answers were scored 1 and incorrect answers 0. TP was calculated as the proportion of abnormal images correctly identified as abnormal, out of the total number of abnormal images presented to each participant. FP was calculated as the proportion of normal images incorrectly identified as abnormal, out of the total number of normal images presented to each participant. A TP and FP rate was calculated for each viewing time in each study arm.
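A minimal sketch of this calculation for one participant in one study arm follows. The trial representation (viewing time, true label, response) and the function name are assumptions for illustration; the text does not describe the authors' data format.

```python
from collections import defaultdict

def tp_fp_rates(trials):
    """Per-viewing-time TP and FP rates for one participant.

    trials: iterable of (viewing_time_ms, truth, response) tuples, where
    truth and response are each 'normal' or 'abnormal'.
    Returns {viewing_time: (tp_rate, fp_rate)}; a rate is None when no
    images of that kind were shown at that viewing time.
    """
    counts = defaultdict(lambda: {"abnormal": 0, "tp": 0, "normal": 0, "fp": 0})
    for viewing_time, truth, response in trials:
        c = counts[viewing_time]
        called_abnormal = int(response == "abnormal")
        if truth == "abnormal":
            c["abnormal"] += 1
            c["tp"] += called_abnormal   # abnormal image called abnormal
        else:
            c["normal"] += 1
            c["fp"] += called_abnormal   # normal image called abnormal
    return {
        t: (c["tp"] / c["abnormal"] if c["abnormal"] else None,
            c["fp"] / c["normal"] if c["normal"] else None)
        for t, c in sorted(counts.items())
    }

# Toy trials (not study data).
trials = [
    (175, "abnormal", "abnormal"), (175, "abnormal", "normal"),
    (175, "normal", "abnormal"), (175, "normal", "normal"),
    (1000, "abnormal", "abnormal"), (1000, "normal", "normal"),
]
print(tp_fp_rates(trials))  # {175: (0.5, 0.5), 1000: (1.0, 0.0)}
```

Because the denominators are counted per viewing time, the sketch also handles the case noted in the procedure, where the number of normal and abnormal images at each viewing time varied with the image set.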

| Analysis
Initial descriptive statistics were used to compare performance by level and time window.Confidence intervals (CIs) were set to 95%.
Separate repeated measures ANOVAs were conducted for ECG TPs, ECG FPs, CXR TPs and CXR FPs. For each analysis, the between-subjects factor was experience (resident physician versus staff physician), and the within-subjects factor was viewing time (175, 250, 500 and 1000 ms). For all results, we report CIs, p-values with alpha = 0.05, and effect sizes as partial eta squared (ηp²).
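To show what this design computes, the following is a textbook sum-of-squares decomposition for a mixed design with one between-subjects factor (experience) and one within-subjects factor (viewing time), including partial eta squared, ηp² = SS_effect / (SS_effect + SS_error). It is a sketch under stated assumptions: complete data for every participant, no sphericity correction, and hypothetical function and variable names. The text does not say which software or corrections the authors used, and packages that use Type III sums of squares may give slightly different main effects when group sizes are unequal.

```python
from statistics import fmean

def mixed_anova(groups):
    """Two-way mixed ANOVA with one between-subjects factor (the dict keys,
    e.g. experience level) and one within-subjects factor (the positions in
    each subject's score list, e.g. viewing times).

    groups: {group_label: [[score_at_level_1, ..., score_at_level_k], ...]}
    Assumes complete data and applies no sphericity correction. Returns F,
    degrees of freedom and partial eta squared for each effect.
    """
    subjects = [s for subs in groups.values() for s in subs]
    n_subj, k, n_groups = len(subjects), len(subjects[0]), len(groups)
    grand = fmean(x for s in subjects for x in s)

    ss_total = sum((x - grand) ** 2 for s in subjects for x in s)
    # Between-subjects variation, split into group and subjects-within-group.
    ss_subjects = k * sum((fmean(s) - grand) ** 2 for s in subjects)
    ss_group = k * sum(len(subs) * (fmean(x for s in subs for x in s) - grand) ** 2
                       for subs in groups.values())
    ss_subj_within = ss_subjects - ss_group
    # Within-subjects variation: viewing time, interaction and residual error.
    level_means = [fmean(s[j] for s in subjects) for j in range(k)]
    ss_time = n_subj * sum((m - grand) ** 2 for m in level_means)
    ss_cells = sum(len(subs) * (fmean(s[j] for s in subs) - grand) ** 2
                   for subs in groups.values() for j in range(k))
    ss_inter = ss_cells - ss_group - ss_time
    ss_error = ss_total - ss_subjects - ss_time - ss_inter

    df_group, df_subj = n_groups - 1, n_subj - n_groups
    df_time = k - 1
    df_inter, df_error = df_group * df_time, df_subj * df_time

    def effect(ss, df, ss_err, df_err):
        return {"F": (ss / df) / (ss_err / df_err),
                "df": (df, df_err),
                "eta_p2": ss / (ss + ss_err)}   # partial eta squared

    return {
        "experience": effect(ss_group, df_group, ss_subj_within, df_subj),
        "viewing time": effect(ss_time, df_time, ss_error, df_error),
        "interaction": effect(ss_inter, df_inter, ss_error, df_error),
    }

# Toy scores (not study data): 2 participants per group, 2 viewing times.
toy = {"resident": [[1, 3], [3, 3]], "staff": [[4, 4], [2, 4]]}
results = mixed_anova(toy)
print(results["viewing time"])  # {'F': 2.0, 'df': (1, 2), 'eta_p2': 0.5}
```

A p-value for each effect can be obtained from the F distribution, for example with SciPy's `scipy.stats.f.sf(F, dfn, dfd)`.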

| Descriptives
To determine whether the overall proportion of correct responses in our study design was similar to prior work, [23][24][25][26][27] we first explored the distribution of correct responses (TPs and TNs) for each level and time window. On average, staff physicians (0.75, 95% CI = 0.67-0.75) and resident physicians (0.71, 95% CI = 0.71-0.79) had similar performance when identifying ECGs. On average, staff physicians (0.78, 95% CI = 0.76-0.80) and resident physicians (0.78, 95% CI = 0.75-0.81) had similar performance when identifying CXRs. Table 1 summarises the average proportion correct by viewing time.

| True positives
For the CXR TP outcome, there was no overall benefit of longer viewing time, no effect of experience and no interaction between viewing time and experience; the average CXR TP rate (0.78, 95% CI = 0.62-0.94) remained consistent across viewing times and participant groups.

| False positives
The ECG FP rate of 0.21 (95% CI = 0-0.45) was consistent across viewing times and participant experience. There was no effect of viewing time or experience, and no interaction between viewing time and experience.

| Interim discussion
4][25][26][27] This was not exclusively due to higher performance at the longer viewing times of 500 and 1000 ms; performance was also high at the shortest viewing times of 175 and 250 ms, where we may infer that primarily non-analytic processing is operating. There was a small yet significant signal that experience improved ECG TP, but we did not detect an effect of experience in other comparisons. That is, resident and staff physicians' performance was comparable for ECG FPs and for CXR TPs and FPs.
In Study 2, we extended viewing times from 1 to 20 s to explore the possibility that experience facilitates performance at longer viewing times, when we may infer that both systems are operating.

| Materials
To accommodate the additional viewing time per image and keep participation time short, we reduced the total number of images to 50 per arm. Images were randomly selected from the Study 1 image sets. Because an explicit goal of the study was to examine specific diagnoses in addition to TP and FP, a greater proportion of abnormal stimuli was included (80% abnormal, 20% normal). 28 The same diagnoses used in Study 1 were represented. A similar randomisation and counterbalancing process ensured that images were presented equally at all viewing times.

| Procedure
The procedure for

| Outcomes
Similar to Study 1, the primary outcomes were TP and FP. We also calculated diagnostic accuracy as the proportion of correct diagnoses among abnormal images that were correctly identified as abnormal. The scoring scheme is described below.

| Diagnosis scoring scheme
Participants received two scores for each case. As in Study 1, participants received one score for categorising each image as normal or abnormal; correct answers were scored 1 and incorrect answers 0. Participants also received a score for their diagnosis using a three-level partial-credit scheme, an approach used in a number of studies by our group. 12,15,29,30 In both the CXR and ECG study, diagnoses for all FPs received a score of 0.
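The sketch below shows one way to turn the three-level diagnosis scores (0 = incorrect, 1 = partially correct, 2 = correct) into a per-participant diagnostic accuracy. The 0/1/2 scale, the zero score for FP diagnoses and the restriction of accuracy to TP calls come from the text; dividing the mean score by the maximum of 2 to obtain a proportion is an assumption, since the exact aggregation is not stated here, and the function names are hypothetical.

```python
MAX_SCORE = 2  # 0 = incorrect, 1 = partially correct, 2 = correct diagnosis

def diagnosis_score(truth, response, rater_score):
    """Score one Study 2 trial's diagnosis; None if no diagnosis was entered."""
    if response != "abnormal":
        return None        # image called normal: no diagnosis prompt
    if truth == "normal":
        return 0           # FP call: any diagnosis scores 0
    if rater_score not in (0, 1, 2):
        raise ValueError("rater_score must be 0, 1 or 2")
    return rater_score

def diagnostic_accuracy(trials):
    """Mean diagnosis score over TP trials, as a proportion of MAX_SCORE.

    trials: iterable of (truth, response, rater_score) tuples.
    The division by MAX_SCORE is an assumed normalisation.
    """
    scores = [diagnosis_score(truth, response, rater_score)
              for truth, response, rater_score in trials
              if truth == "abnormal" and response == "abnormal"]
    if not scores:
        return None
    return sum(scores) / (MAX_SCORE * len(scores))

# Toy trials (not study data): two TPs scored 2 and 1, one FN, one FP.
toy = [("abnormal", "abnormal", 2), ("abnormal", "abnormal", 1),
       ("abnormal", "normal", None), ("normal", "abnormal", 2)]
print(diagnostic_accuracy(toy))  # 0.75
```

Restricting the average to TP trials keeps the accuracy measure independent of the FP rate, so the FP trial in the toy data affects only its own zero diagnosis score, not the accuracy value.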

| Analysis
Initial descriptive statistics were used to compare performance by level and time window.

| RESULTS
Table 2 summarises the average proportion correct by viewing time.To facilitate comparison of performance across all viewing times, TP and FP for both staff and resident physicians across all viewing times for Studies 1 and 2 are presented in Figure 3 (CXR) and Figure 4 (ECG).

| True positives
The average CXR TP rate was 0.72 (95% CI = 0.52-0.92). There was a small effect of viewing time on TP (p < 0.01, ηp² = 0.1). TP increased from 1 to 5 s but then did not change at 10 or 20 s: 0.66 (95% CI = 0.61-0.71), 0.74 (95% CI = 0.70-0.78), 0.73 (95% CI = 0.696-0.77) and 0.74 (95% CI = 0.70-0.79) at 1, 5, 10 and 20 s, respectively. There was no effect of experience and no interaction.
FIGURE 3 Studies 1 and 2 proportion of true positives (TP) and false positives (FP) for chest X-ray (CXR) images. In the analyses of these data, there was a significant effect of viewing time for CXR FP in Study 1. All other comparisons were not statistically significant in Study 1. There was a significant effect of viewing time for CXR TP in Study 2. There was also an interaction for CXR FP in Study 2. All other comparisons were not statistically significant in Study 2.
FIGURE 4 Studies 1 and 2 proportion of true positives (TP) and false positives (FP) for electrocardiogram (ECG) images. In the analyses of these data, there was a significant effect of viewing time and experience for ECG TP in Study 1. All other comparisons were not statistically significant in Study 1. There was an effect of viewing time and an interaction for ECG TP in Study 2. There was an effect of experience on ECG FP in Study 2. Other comparisons were not significant.
FIGURE 5 Proportion of correct chest X-ray (CXR) diagnoses. There was a significant effect of viewing time on diagnostic accuracy, but no effect of experience and no interaction.
presented at short time windows, were normal or abnormal. In Study 1, we restricted most viewing times to less than 1 s, with the goal of examining the influence of non-analytic diagnostic reasoning. 6,9,23,27 Prior empirical explorations of dual-process theories 6,9,27 support our proposal that participants in Study 1 relied almost exclusively on non-analytic processing at viewing times of 175 and 250 ms. In Study 2, we extended viewing times to 20 s and included a diagnostic task. As analytic processing is theorised to join in after 500 ms, we aimed to measure the incremental and additive influence of analytic processing from 1 to 20 s. All viewing times in both studies were likely too brief to represent clinical practice. Despite these constraints, the average proportion correct for detecting abnormality was far above chance, ranging from 60% to 90%, consistent with prior work. 23,25,31 Our remaining results offer novel insights into the complex interplay between experience, the nature of the task and viewing time. Amongst these insights was a recurring pattern, seen in CXR FP, ECG TP and diagnostic accuracy overall, indicating that increased viewing time, and possibly increased analytic processing, may not be beneficial.
When we examined the influence of experience on TP and FP, staff physicians did not generally make fewer errors than resident physicians.
Where our findings did indicate better performance for staff physicians, this was only for ECG images. In Study 1, TP was generally higher for staff than for resident physicians, and in Study 2, FP was generally lower for staff than for resident physicians. 32 Studies that have reported a consistent benefit of increased experience have compared groups with distinct differences in experience, such as medical students versus resident physicians, or physicians versus lay people. 27,33 Our sample of physicians may have been more homogeneous than in prior work. It is also possible that experience affects the development of skill at diagnosing CXRs differently from ECGs. Therefore, it may be valuable to explore individual differences across multiple diagnostic tasks and clinical contexts to better understand the impact of experience on diagnosis.
Turning to the impact of viewing time, and by inference processing style, on TP and FP, our findings suggest that differences in how information is extracted from CXRs and ECGs may modulate the effect of experience and viewing time. CXR TP was relatively constant: 0.77 for both groups at 175 ms and 0.74 at 20 s. In contrast, CXR FP generally improved from 175 ms (0.31) to 20 s. However, there was an interaction between experience and viewing time, as staff physicians had significantly lower CXR FP (0.08) than resident physicians (0.27) at 20 s. It may be that, in the absence of any salient abnormality but with access to increased viewing time, normal CXRs were examined further, which allowed staff physicians to confirm normality, while resident physicians categorised more normal variants as abnormal. 28,34 Perhaps, for CXRs, our sample of resident physicians lacked sufficient exposure to normal variants, rendering further analytic processing less effective. 35 The smaller decrease in resident CXR FP with increased viewing time, and by inference with additional analytic processing, suggests that such processing can also introduce errors. However, the ECG TP and FP results tell a slightly different story.
Overall, ECG TP consistently increased with longer viewing time across both studies; however, staff physicians' performance dropped below that of resident physicians at 10 and 20 s, when viewing times facilitated extended analytic reasoning. These results present a second indicator that additional processing may not be beneficial. As mentioned in Section 2, System 2 may be least effective in routine situations. Therefore, the interaction observed for ECG TP in Study 2 may represent a benefit of experience up to the point at which further analytic processing became an ineffective strategy for those with more experience, while it remained effective for those with less experience. It is also possible that staff physicians were less likely to engage in extended analytic reasoning for ECGs and so did not benefit from the added viewing time.
Diagnostic accuracy for both CXRs and ECGs generally increased with viewing time for both groups, but there were some distinctions.
Staff and resident CXR diagnostic accuracy were almost identical at 1 s and then began to diverge, so that at 20 s average staff accuracy was higher than that of residents. For ECGs, there was an average difference of 0.24 between groups at 1 s, which then converged to an almost identical score at 20 s. As suggested earlier, these different patterns may reflect differences in how information is extracted from CXRs and ECGs. Indeed, aside from minor fluctuations, ECG and CXR diagnostic accuracy did not improve past 10 s. Note that diagnostic accuracy was based only on TP calls and was consequently not influenced by the change in FP rate with viewing time. Our results therefore present a third indicator that added viewing time, and possibly also analytic reasoning, may not be consistently beneficial.
There are several possible limitations and contextual influences in our study that must be considered. Our sampling strategy drew from an academic institution where differences in clinical experience may be more subtle. While prior work 12,15,29,30,36,37 has sampled from similar populations, those study designs incorporated patient vignettes describing complex cases in a rich context. In the present study, physicians diagnosed either CXRs or ECGs, representing a limited set of skills and diagnoses. As we targeted discrete skills, our study may have been underpowered to discriminate between groups based on TP and FP rates alone. While performance was also comparable across both groups for the diagnostic task, overall accuracy was higher than in previous work. 15,29 Therefore, our results likely reflect different processing demands than prior work. 12,15,29,30,38 It may also be that our findings do not generalise to community-based practice settings and that a larger sample would provide additional insights. Finally, diagnostic images used in this study were selected based on clear, salient visual findings, whereas in clinical practice most imaging is normal and frequently contains ambiguous findings. The influence of saliency on speeded judgements is difficult to predict and may limit the application of these results to more diverse inputs. Still, there were several significant differences that our sample was powered to detect, and these provide a nuanced perspective on the influence of processing time on diagnostic accuracy.
Whereas dual process theories have emphasised the role of prior experience in effective non-analytic processing, the dominant view in the clinical reasoning literature remains that analytic processing has a critical role in correcting errors derived from rapid, unconscious non-analytic processing.In the two studies presented in this paper, we reported findings that support both viewpoints, indicating that neither system may be effective alone.Given the experimental context of the data presented in this paper and the enforced manipulation of processing time, we also propose that there may be unintended consequences when either system is employed strategically, as some scholars have recommended.Overall, our findings raise concerns about the practical application of either viewpoint, perhaps even dual process theory, towards the specific development of diagnostic error reduction strategies.

| STUDY 1

| Methods

| Participants
Resident physicians (n = 12) and staff physicians (n = 17) in emergency medicine at McMaster University were recruited. Resident physicians' years of postgraduate training ranged from 1 to 5, and staff physicians' years in practice ranged from 1 to 22. All participants consented to participate in both arms of the experiment.

Figure 1 depicts the trial sequence for Study 1. For each participant, roughly equal numbers of normal and abnormal images were presented at each viewing time. As each set of 50 normal and 50 abnormal images was divided equally across the four viewing times, the number of normal and abnormal images presented to each participant at each viewing time varied as a function of the image set. A counterbalanced design ensured that all images were equally presented at all viewing times.

In Study 2, we also asked participants to enter a most likely diagnosis for each image to measure the influence of increased viewing time on diagnostic accuracy. Resident physicians (n = 26) and staff physicians (n = 20) in emergency medicine who had not participated in Study 1 were recruited from McMaster University. All resident and staff physicians participated in the CXR arm; 21 resident physicians and 17 staff physicians completed the ECG arm. Resident physicians' years of training ranged from 1 to 5, and staff physicians' years in practice ranged from 1 to 28.
Study 2 was similar to Study 1 except for the viewing time windows, the number of images and the additional requirement of entering a diagnosis. Normal (n = 10) and abnormal (n = 40) images were randomly distributed across the four viewing times: 1, 5, 10 and 20 s. Participants were again asked to first indicate whether the image was normal or abnormal. If they selected abnormal, they were then prompted to enter their most likely diagnosis as a free-text response. Figure 2 depicts the trial sequence for Study 2.

Diagnoses were scored as 0 = incorrect diagnosis, 1 = partially correct diagnosis or 2 = correct diagnosis. A partially correct diagnosis was one that would result in similar diagnostic tests or management. For example, a diagnosis of airspace disease was given a score of 1 for a CXR showing congestive heart failure. Responses were scored by two of the investigators (JS and MS).
FIGURE 2 Study 2 trial sequence. A single trial included a blank screen, an image, a screen with two response options and a blank screen. Starting with a blank screen, the participant hit a key to advance to the next screen. An image was presented at one of four time windows (1, 5, 10 or 20 s), but participants did not know which time window. The next screen presented two options, normal and abnormal. Participants selected one with their mouse. If participants selected abnormal, they could provide a diagnosis on the next screen. Once they submitted a diagnosis, the next screen was blank, and the participant pressed their mouse to advance. [Color figure can be viewed at wileyonlinelibrary.com]
CIs were set to 95%. Separate repeated measures ANOVAs were conducted for CXR TP, CXR FP, ECG TP, ECG FP, CXR diagnostic accuracy and ECG diagnostic accuracy. As in Study 1, the within-subjects factor was viewing time (1, 5, 10 and 20 s) and the between-subjects factor was experience level (resident physician versus staff physician).
CXR FP rate improved (i.e. FP decreased) with viewing time,