Objectives: High-fidelity medical simulation (HFMS) is increasingly used in resident education and evaluation, yet no criterion standard for assessing performance currently exists. This study compared the intermethod reliability of real-time versus videotaped evaluation of HFMS participant performance.
Methods: Twenty-five emergency medicine residents and one transitional resident participated in a septic shock HFMS scenario. Four evaluators assessed participant performance on a technical scorecard (26 items, each scored yes/no for completion) and a nontechnical scorecard (seven items, each rated on a five-point Likert scale). Two evaluators assessed performance in real time, and two provided delayed videotape review. After 13 scenarios, the evaluators crossed over and evaluated scenarios using the opposite method. Real-time evaluations were completed immediately at the end of the simulation; videotape reviewers were allowed to review the scenarios with no time limit. Agreement between raters was tested using the intraclass correlation coefficient (ICC), and Cronbach’s alpha was used to measure the consistency of scores among the checklist items.
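As an illustration of the two statistics named above, the following is a minimal sketch, not the study's analysis code. The abstract does not state which ICC form was used; the sketch assumes the two-way random-effects, absolute-agreement, single-rater form, often written ICC(2,1). The function names and the input layout (one row per participant) are hypothetical.

```python
from statistics import mean, variance


def cronbach_alpha(scores):
    """Cronbach's alpha; scores holds one row per participant, one column per item."""
    k = len(scores[0])
    # Sample variance of each item across participants.
    item_vars = [variance([row[j] for row in scores]) for j in range(k)]
    # Sample variance of each participant's total score.
    total_var = variance([sum(row) for row in scores])
    return k / (k - 1) * (1 - sum(item_vars) / total_var)


def icc_2_1(ratings):
    """ICC(2,1) (assumed form); ratings holds one row per participant, one column per rater."""
    n, k = len(ratings), len(ratings[0])
    grand = mean(x for row in ratings for x in row)
    row_means = [mean(row) for row in ratings]
    col_means = [mean(row[j] for row in ratings) for j in range(k)]
    # Two-way ANOVA sums of squares: participants, raters, and residual.
    ss_rows = k * sum((m - grand) ** 2 for m in row_means)
    ss_cols = n * sum((m - grand) ** 2 for m in col_means)
    ss_total = sum((x - grand) ** 2 for row in ratings for x in row)
    ss_err = ss_total - ss_rows - ss_cols
    msr = ss_rows / (n - 1)              # between-participant mean square
    msc = ss_cols / (k - 1)              # between-rater mean square
    mse = ss_err / ((n - 1) * (k - 1))   # residual mean square
    return (msr - mse) / (msr + (k - 1) * mse + k * (msc - mse) / n)
```

Here `cronbach_alpha` would be called on a participants-by-items table for one scorecard, and `icc_2_1` on a participants-by-raters table of total scores. Other ICC forms (for example, consistency or average-rater variants) give different values from the same data, so the choice of form should be reported alongside the coefficient.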
Results: Bland-Altman analysis of both conditions revealed substantial agreement between the real-time and videotape review scores. The mean difference between the reviewers was 0.0 (95% confidence interval [CI] = –3.7 to 3.6) on the technical scorecard and –1.6 (95% CI = –11.4 to 8.2) on the nontechnical scorecard. For the technical scorecard, videotape evaluations demonstrated a Cronbach’s alpha of 0.914 with an ICC of 0.842 (95% CI = 0.679 to 0.926), and real-time evaluations demonstrated a Cronbach’s alpha of 0.899 with an ICC of 0.817 (95% CI = 0.633 to 0.914), indicating excellent intermethod reliability. For the nontechnical scorecard, videotape evaluations demonstrated a Cronbach’s alpha of 0.888 with an ICC of 0.798 (95% CI = 0.600 to 0.904), and real-time evaluations demonstrated a Cronbach’s alpha of 0.833 with an ICC of 0.714 (95% CI = 0.457 to 0.861), indicating substantial interrater reliability. The raters agreed consistently on performance within each level of training: analysis of variance demonstrated no significant differences for the technical scorecard (p = 0.176) or the nontechnical scorecard (p = 0.367).
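The Bland-Altman comparison reported above can be sketched as follows. This is an assumption-laden illustration: the abstract does not say how its intervals were computed, and the sketch uses the conventional 95% limits of agreement (mean difference ± 1.96 standard deviations of the paired differences). The function name and arguments are hypothetical.

```python
from statistics import mean, stdev


def bland_altman(method_a, method_b):
    """Return (bias, lower limit, upper limit) for paired scores from two methods."""
    # Paired differences, one per participant (for example, real-time minus videotape).
    diffs = [a - b for a, b in zip(method_a, method_b)]
    bias = mean(diffs)    # mean difference between the two methods
    spread = stdev(diffs) # sample standard deviation of the differences
    # Conventional 95% limits of agreement (assumed, not confirmed by the source).
    return bias, bias - 1.96 * spread, bias + 1.96 * spread
```

A bias near zero with narrow limits suggests that the two methods can be used interchangeably; wide limits indicate that individual scores may differ substantially between methods even when the mean difference is small.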
Conclusions: Real-time and videotape-based evaluations of resident performance of both technical and nontechnical skills during an HFMS septic shock scenario were equally reliable methods of assessment.