
Evaluation of a novel assessment form for observing medical residents: a randomised, controlled trial


Anthony A Donato, Reading Hospital and Medical Center, PO Box 16052, Reading, Pennsylvania 19612-6052, USA. Tel: + 1 610 988 8480; Fax: + 1 610 988 9003; E-mail:


Context  Teaching faculty cannot reliably distinguish satisfactory from unsatisfactory resident performances, and the feedback they give is often non-specific.

Objectives  This study aimed to test whether a novel rating form can improve faculty accuracy in detecting unsatisfactory performances, generate more rater observations and improve feedback quality.

Methods  Participants included two groups of 40 internal medicine residency faculty staff. Both groups received a 1-hour training session on rating trainees in the mini-clinical evaluation exercise (mini-CEX) format. The intervention group was given a new rating form structured with prompts, space for free-text comments, behavioural anchors and fewer scoring levels, whereas the control group used the current American Board of Internal Medicine Mini-CEX form. Participants watched and scored six scripted videotapes of resident performances 2–3 weeks after the training session.

Results  Intervention group participants were more accurate in discriminating satisfactory from unsatisfactory performances (85% versus 73% correct; odds ratio [OR] 2.13, 95% confidence interval [CI] 1.16–3.14, P = 0.02) and correctly identified more unsatisfactory performances (96% versus 52% correct; OR 25.35, 95% CI 9.12–70.46), but were less accurate in identifying satisfactory performances (73% versus 95% correct; OR 0.15, 95% CI 0.05–0.39). Intervention group participants declared, on average, one fewer intended feedback item (4.7 versus 5.7) and showed no difference in the amount of feedback that was above minimal in quality. Intervention group participants generated more written evaluative observations (10.8 versus 5.7). Inter-rater agreement improved with the new form (Fleiss' kappa, 0.52 versus 0.30).
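The two statistics reported above can be illustrated with a short sketch. The first function converts two proportions into a crude (unadjusted) odds ratio. The abstract does not say how the published ORs and CIs were estimated, so they may come from a model that accounts for repeated ratings per rater. For that reason, a crude ratio built from rounded percentages should only approximate them. The second function is the standard Fleiss' kappa for several raters assigning subjects to categories. The study's rating matrix is not given, so the function names and the example counts here are hypothetical illustrations, not the authors' analysis.

```python
from typing import Sequence


def odds(p: float) -> float:
    """Convert a proportion p (0 < p < 1) to odds, p / (1 - p)."""
    return p / (1.0 - p)


def crude_odds_ratio(p_intervention: float, p_control: float) -> float:
    """Crude (unadjusted) odds ratio of the intervention vs the control group."""
    return odds(p_intervention) / odds(p_control)


def fleiss_kappa(counts: Sequence[Sequence[int]]) -> float:
    """Fleiss' kappa for N subjects rated into k categories.

    counts[i][j] is the number of raters who placed subject i in category j.
    Every subject must be rated by the same number of raters n (n >= 2).
    """
    n_subjects = len(counts)
    n_raters = sum(counts[0])
    if any(sum(row) != n_raters for row in counts):
        raise ValueError("each subject needs the same number of ratings")
    k = len(counts[0])
    total = n_subjects * n_raters
    # Share of all assignments that fall into each category.
    p_j = [sum(row[j] for row in counts) / total for j in range(k)]
    # Per-subject agreement: fraction of rater pairs that agree on that subject.
    p_i = [
        (sum(c * c for c in row) - n_raters) / (n_raters * (n_raters - 1))
        for row in counts
    ]
    p_bar = sum(p_i) / n_subjects  # mean observed agreement
    p_e = sum(p * p for p in p_j)  # agreement expected by chance
    return (p_bar - p_e) / (1.0 - p_e)


if __name__ == "__main__":
    # Overall accuracy, 85% vs 73% correct, gives a crude OR of about 2.10.
    print(f"{crude_odds_ratio(0.85, 0.73):.2f}")
    # Hypothetical toy data: 2 videotapes, 3 raters, 2 categories
    # (satisfactory, unsatisfactory), with complete agreement on each tape.
    print(fleiss_kappa([[3, 0], [0, 3]]))
```

With the reported percentages, the crude ratios come out near the published values. They are roughly 2.10, 22.2 and 0.14 for overall, unsatisfactory and satisfactory accuracy respectively. The remaining gaps, most visibly for unsatisfactory performances, fit the rounding of the percentages and whatever model the authors used.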

Conclusions  Modifying the currently used direct observation process may produce more recorded observations, increase inter-rater agreement and improve overall rater accuracy, but it may also increase severity error.