Summary
The difficulties of measuring the clinical performance of students in the health professions are well known to educators. One innovative measure used in several educational programmes in the Faculty of Health Sciences at McMaster University, Hamilton, Ontario, Canada, including the BSc in Nursing programme, is the objective structured clinical examination (OSCE). The purpose of this study was to determine the reliability of this evaluation method, both within and between stations.
One problem noted by users of the OSCE method is that performance is poorly correlated across stations, apparently regardless of the particular content of each station. Three hypotheses have been advanced to explain this phenomenon: (1) performance of any skill is so variable that the correlation is poor; (2) different skills have little common basis, so that performance does not generalize from one to another; or (3) the reliability of assessment in any one station is low. To test these hypotheses, a study was designed to assess test-retest and interrater reliability. Students undergoing a 10-station OSCE repeated their starting station at the end of the examination circuit (test-retest). In addition, several stations were rated by more than one observer (interrater).
This study of 71 first-year BScN students showed that interrater reliability was high (ICC = 0.80 to 0.99) and test-retest reliability on the same station was good (ICC = 0.66 to 0.86); however, correlation across stations was low (α = 0.198). Thus there is high consistency in repeated performance of the same skill but little consistency in performance across different skills.
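
For readers unfamiliar with the two statistics reported above, the following is a minimal sketch of how such coefficients can be computed. It rests on assumptions not stated in the summary: that α denotes Cronbach's alpha calculated over station scores, and that the ICC is the one-way random-effects, single-measure form. The study may have used a different ICC model. The function names and the small data sets in the usage note are hypothetical illustrations, not study data.

```python
from statistics import mean, variance


def cronbach_alpha(scores):
    """Cronbach's alpha for a table of scores.

    scores: list of rows, one row per student, one column per station.
    Assumption: alpha in the summary refers to this coefficient.
    """
    n, k = len(scores), len(scores[0])
    if n < 2 or k < 2:
        raise ValueError("need at least two students and two stations")
    # Sample variance of each station (column) across students.
    item_vars = [variance(col) for col in zip(*scores)]
    # Sample variance of each student's total score.
    total_var = variance([sum(row) for row in scores])
    if total_var == 0:
        raise ValueError("total scores have zero variance")
    return k / (k - 1) * (1 - sum(item_vars) / total_var)


def icc_oneway(ratings):
    """One-way random-effects, single-measure ICC.

    ratings: list of rows, one row per student, one column per
    rating (a second rater, or a repeat attempt at the same station).
    Assumption: this ICC form is chosen for illustration only.
    """
    n, k = len(ratings), len(ratings[0])
    if n < 2 or k < 2:
        raise ValueError("need at least two students and two ratings")
    grand = mean([v for row in ratings for v in row])
    row_means = [mean(row) for row in ratings]
    # Between-student and within-student mean squares.
    msb = k * sum((m - grand) ** 2 for m in row_means) / (n - 1)
    msw = sum(
        (v - m) ** 2 for row, m in zip(ratings, row_means) for v in row
    ) / (n * (k - 1))
    denom = msb + (k - 1) * msw
    if denom == 0:
        raise ValueError("all ratings are identical")
    return (msb - msw) / denom
```

As a usage illustration with made-up numbers, `icc_oneway([[1, 2], [2, 3], [3, 4]])` gives 0.6, and `cronbach_alpha([[1, 1, 1], [2, 2, 2], [3, 3, 3]])` gives 1.0, because every station ranks the students identically. A low alpha alongside high ICCs, as reported above, is the pattern expected when each station is measured consistently but different stations do not rank students alike.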