Context Despite the impartiality implied in its title, the objective structured clinical examination (OSCE) is vulnerable to systematic biases, particularly those affecting raters’ performance. In this study our aim was to examine OSCE ratings for evidence of differential rater function over time (DRIFT), and to explore potential causes of DRIFT.
Methods We studied ratings for 14 internal medicine resident doctors over the course of a single formative OSCE, comprising ten 12-minute stations, each with a single rater. We evaluated the association between time-slot and rating for a station. We also explored a possible interaction between time-slot and station difficulty, which would support the hypothesis that rater fatigue causes DRIFT. To consider 'warm-up' as an alternative explanation for DRIFT, we repeated our analysis after excluding the first two OSCE stations.
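The two analyses described above can be illustrated with a minimal sketch. It is not the authors' model: the abstract does not specify the regression used, so the example assumes an ordinary least-squares fit of rating on time-slot, and reads the interaction as a difference between slopes fitted separately for more and less difficult stations. The ratings are synthetic and noise-free, and the number of time-slots, the hard/easy split, and the names `ols_slope`, `slope_by_group`, and `HARD` are hypothetical choices made only for illustration.

```python
from collections import defaultdict


def ols_slope(xs, ys):
    """Slope of the least-squares line of ys on xs."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    return sxy / sxx


def slope_by_group(records, group_of):
    """Fit a separate time-slot slope within each group of stations."""
    grouped = defaultdict(lambda: ([], []))
    for station, time_slot, rating in records:
        xs, ys = grouped[group_of(station)]
        xs.append(time_slot)
        ys.append(rating)
    return {group: ols_slope(xs, ys) for group, (xs, ys) in grouped.items()}


# Synthetic ratings (NOT study data): 10 stations, each rated in 10 time-slots.
# Assumption: stations 0-4 are "hard" (slope 1.0), stations 5-9 "easy" (slope 0.5).
HARD = set(range(5))
records = [
    (station, slot, 50.0 + (1.0 if station in HARD else 0.5) * slot)
    for station in range(10)
    for slot in range(1, 11)
]

# Pooled association between time-slot and rating.
overall = ols_slope([r[1] for r in records], [r[2] for r in records])

# Stratified slopes; their difference plays the role of the interaction.
by_difficulty = slope_by_group(records, lambda s: "hard" if s in HARD else "easy")

print("overall slope:", overall)
print("slopes by difficulty:", by_difficulty)
```

A positive pooled slope corresponds to ratings rising with time-slot, and a steeper slope in the difficult group corresponds to the interaction the fatigue hypothesis predicts. A real analysis would also need standard errors and would have to account for ratings being clustered within raters and residents, which this sketch omits.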
Results Time-slot was positively associated with rating on a station (regression coefficient 0.88, 95% confidence interval [CI] 0.38–1.38; P = 0.001). There was an interaction between time-slot and station difficulty: for the more difficult stations the regression coefficient for time-slot was 1.24 (95% CI 0.55–1.93; P = 0.001) compared with 0.52 (95% CI −0.08 to 1.13; P = 0.09) for the less difficult stations. Removing the first two stations from our analyses did not correct DRIFT.
Conclusions Systematic biases, such as DRIFT, may compromise internal validity in an OSCE. Further work is needed to confirm this finding and to explore whether DRIFT also affects ratings on summative OSCEs. If confirmed, the factors contributing to DRIFT, and ways to reduce these, should then be explored.