Good questions, good answers: construct alignment improves the performance of workplace-based assessment scales
Article first published online: 18 APR 2011
© Blackwell Publishing Ltd 2011
Volume 45, Issue 6, pages 560–569, June 2011
How to Cite
Crossley, J., Johnson, G., Booth, J. and Wade, W. (2011), Good questions, good answers: construct alignment improves the performance of workplace-based assessment scales. Medical Education, 45: 560–569. doi: 10.1111/j.1365-2923.2010.03913.x
- Issue published online: 12 MAY 2011
- Received 20 July 2010; editorial comments to authors 17 September 2010; accepted for publication 8 November 2010
Context Assessment in the workplace is important, but many evaluations have shown that assessor agreement and discrimination are poor. Training discussions suggest that assessors find conventional scales invalid. We evaluate scales constructed to reflect developing clinical sophistication and independence in parallel with conventional scales.
Methods A valid scale should reduce assessor disagreement and increase assessor discrimination. We compare conventional and construct-aligned scales used in parallel to assess approximately 2000 medical trainees by each of three methods of workplace-based assessment (WBA): the mini-clinical evaluation exercise (mini-CEX); the acute care assessment tool (ACAT), and the case-based discussion (CBD). We evaluate how scores reflect assessor disagreement (the assessor and assessor-by-trainee variance components, Vj and Vj*p) and assessor discrimination (the trainee variance component, Vp), and we model reliability using generalisability theory.
Results In all three cases the conventional scale gave a performance similar to that in previous evaluations, but the construct-aligned scales substantially reduced assessor disagreement and substantially increased assessor discrimination. Reliability modelling shows that, using the new scales, the number of assessors required to achieve a generalisability coefficient ≥ 0.70 fell from six to three for the mini-CEX, from eight to three for the CBD, from 10 to nine for ‘on-take’ ACAT, and from 30 to 12 for ‘post-take’ ACAT.
Conclusions The results indicate that construct-aligned scales have greater utility, both because they are more reliable and because that reliability provides evidence of greater validity. There is also a wider implication: the disappointing reliability of existing WBA methods may reflect not assessors’ differing assessments of performance, but, rather, different interpretations of poorly aligned scales. Scales aligned to the expertise of clinician-assessors and the developing independence of trainees may improve confidence in WBA.
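The reliability modelling mentioned in the Methods and Results can be illustrated with a minimal sketch. It assumes a simple design in which every assessor contributes one score and both assessor-related components (Vj and Vj*p) count as error, so that the generalisability coefficient is Vp / (Vp + (Vj + Vj*p) / n) for n assessors. The abstract does not give the exact model, and the function names and variance values below are hypothetical, chosen only to show how lower disagreement and higher discrimination reduce the number of assessors needed to reach 0.70.

```python
def g_coefficient(v_p, v_j, v_jp, n_assessors):
    """Generalisability coefficient for the mean of n_assessors scores.

    Assumption (not taken from the paper): error variance is the sum of the
    assessor (v_j) and assessor-by-trainee (v_jp) components, averaged over
    the number of assessors.
    """
    error = (v_j + v_jp) / n_assessors
    return v_p / (v_p + error)


def assessors_needed(v_p, v_j, v_jp, target=0.70, max_n=1000):
    """Smallest number of assessors whose mean score reaches the target G."""
    for n in range(1, max_n + 1):
        if g_coefficient(v_p, v_j, v_jp, n) >= target:
            return n
    raise ValueError("target not reachable within max_n assessors")


# Hypothetical variance components, NOT values reported in the study.
conventional = {"v_p": 0.4, "v_j": 0.2, "v_jp": 0.4}
aligned = {"v_p": 0.6, "v_j": 0.1, "v_jp": 0.3}

print("conventional scale:", assessors_needed(**conventional), "assessors")
print("construct-aligned scale:", assessors_needed(**aligned), "assessors")
```

Under these invented values, the scale with the larger trainee variance and smaller assessor-related variance needs fewer assessors, which is the direction of the effect reported above; the actual figures depend on the variance components estimated in the study.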