Performance assessment and simulation fidelity for dummies


Victoria Brazil, MB BS, FACEM, MBA, Staff Specialist

See also pp. 508–14

Identifying valid, reliable and feasible methods of clinician assessment is important yet challenging. Society expects transparency of certification and recertification processes for doctors and other health-care professionals. There is now an expanded range of health-care curricular domains for Australasian College for Emergency Medicine trainees that includes communication, professionalism, teamwork and system-based practice. However, performance across these domains is difficult to measure.

Scenario-based assessment of clinician performance offers a number of obvious advantages. Challenging candidates with standardized scenarios confers reliability, whereas the requirement to perform across clinical, procedural and communication domains in a realistic clinical context arguably makes the assessment more valid than tests of knowledge alone. Physician training courses, such as Early Management of Severe Trauma and Advanced Cardiac Life Support, have incorporated scenarios in their assessment processes for many years.

The use of human patient simulators and immersive environments offers further opportunities, especially in the realistic performance of procedures and for teamwork-based tasks. However, this method is resource-intensive, is inaccessible to many educational programmes and still cannot offer sufficient realism for many clinical and communication tasks.

In this issue of the journal, Lee et al. examine the use of two types of simulator in the assessment of intensive care paramedics.1 Using a cross-over design, they compare the test results of a small cohort of paramedics assessed with a high-fidelity mannequin versus a low-fidelity mannequin. They also seek subjective data on the acceptability of these assessment devices.

Scenario-based training and assessment have been an important part of paramedic education for many years. The move towards using expensive mannequins in these scenarios has not yet been accompanied by rigorous evaluation of ‘bang for buck’ in terms of validity, reliability or acceptability of this method, so the authors' contribution is a timely one.

The issue of scenario fidelity is central to assessment validity – how close to clinical reality is the assessment challenge? Fidelity is complex and multifaceted, and includes physical aspects, such as environment and equipment, and psychological aspects, such as ‘task fidelity’, including scenario design and tempo, and team composition.2,3 Lee et al. focus entirely on the equipment aspects of fidelity, a perspective naturally encouraged by mannequin manufacturers.1

The relative contributions of the physical and psychological aspects to the learning or assessment experience have not yet been quantified, but it is suggested that the psychological dimension is paramount in the ‘suspension of disbelief’. Although technology can increase the psychological fidelity of well-designed training scenarios, it cannot compensate for poorly designed ones.3 Contextual factors have been shown to be crucial in the perception of training by the participants.4,5

Scenarios need to be carefully designed to provide valid assessment tools. Lee et al. outline their scenario ‘storyboards’ in the appendices and provide a checklist of expected participant actions.1 As with most simulation-based assessment scenarios, theirs are designed by content experts, using an intuitive approach to the realism and expected difficulty of the challenge. This provides a degree of face validity. However, other authors have sought to establish content validity through examination of the consistency of performance across a number of scenarios, and by comparison with clinical supervisor ratings.6 For instance, the Simulation Module for Assessment of Resident Targeted Event Therapies tool takes an educational approach: it consists of an eight-step process that starts with competencies and ends with a scenario script accompanied by appropriate measures.7

Lee et al. found no significant difference between the two simulators used for assessments.1 However, the scenarios used for each simulator were not the same, which might have influenced that outcome. Additionally, it has been suggested that precise measures of performance might require more than six scenarios.6

Assessors require training to perform reliable simulation-based assessment, especially when assessing complex tasks and non-technical skills. Lee et al. used assessors who were experienced Advanced Paediatric Life Support instructors and who applied a formalized marking template.1 The published work on performance measurement suggests that scores based on checklists are highly correlated with scores based on global rating scales. Although checklists might be perceived to be more objective, they might not be as good as global ratings in capturing increasing levels of medical expertise.8 One of the key issues in using a scenario checklist approach for recertification purposes is that experts tend to use shortcuts and might achieve clinical end-points with variance in process.9

The use of videotaping in a simulated environment allows assessors to analyse performance extensively by replaying aspects of the scenario, and also provides defensibility when unsuccessful candidates challenge the assessment process.

Lee et al. used a simple feedback tool to ascertain the acceptability of this method to candidates and found a preference for the higher-fidelity mannequin, although the reasons for this preference were not explored further.1 Learning using simulation-based scenarios is generally popular among trainees, but its acceptability for ‘high stakes assessment’ has not been widely studied.

In conclusion, there remains great potential for scenario-based assessment using mannequin technology, especially for clinicians in procedure-rich, high-acuity health-care practice. However, the key educational aspects of the assessment are yet to be fully explored. Higher-fidelity mannequins are expensive, and significant validity and reliability advantages must be demonstrated before widespread adoption of this technology for assessment of clinician competence can be recommended. Lee et al.'s article suggests that such advantages are not yet demonstrated for this learner group.

Competing interests

Dr Victoria Brazil is the director of Medical Education Solutions, a private Brisbane-based medical education provider specializing in the delivery of simulation-based training.