Context The College of Medicine and Medical Sciences at the Arabian Gulf University, Bahrain, replaced the traditional long case/short case clinical examination on the final MD examination with a direct observation clinical encounter examination (DOCEE). Each student encountered four real patients. Two pairs of examiners from different disciplines observed the students taking history and conducting physical examinations and jointly assessed their clinical competence.
Objectives To determine the reliability and validity of the DOCEE by investigating whether examiners agree when scoring, ranking and classifying students; to determine the number of cases and examiners necessary to produce a reliable examination; and to establish whether the examination has content and concurrent validity.
Subjects Fifty-six final year medical students and 22 examiners (in pairs) participated in the DOCEE in 2001.
Methods Generalisability theory, intraclass correlation, Pearson correlation and kappa were used to study reliability and agreement between the examiners. Case content and Pearson correlation between DOCEE and other examination components were used to study validity.
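As an illustration of the agreement statistic named above, the following is a minimal sketch of a kappa calculation for two examiners' pass/fail decisions. It assumes the unweighted Cohen's kappa (the abstract does not say which variant was used), and the function name `cohen_kappa` and the example ratings are hypothetical, not study data.

```python
from collections import Counter


def cohen_kappa(rater_a, rater_b):
    """Unweighted Cohen's kappa for two raters judging the same students."""
    if len(rater_a) != len(rater_b) or not rater_a:
        raise ValueError("ratings must be non-empty and of equal length")
    n = len(rater_a)

    # Proportion of students on whom the two raters agree.
    observed = sum(a == b for a, b in zip(rater_a, rater_b)) / n

    # Agreement expected by chance, from each rater's marginal frequencies.
    freq_a = Counter(rater_a)
    freq_b = Counter(rater_b)
    categories = freq_a.keys() | freq_b.keys()
    expected = sum(freq_a[c] * freq_b[c] for c in categories) / (n * n)

    if expected == 1.0:
        # Both raters used a single identical category; kappa is undefined,
        # so this sketch reports perfect agreement by convention.
        return 1.0
    return (observed - expected) / (1 - expected)


# Hypothetical pass/fail decisions for eight students (not study data).
examiner_1 = ["pass", "pass", "fail", "pass", "fail", "pass", "pass", "fail"]
examiner_2 = ["pass", "pass", "fail", "pass", "pass", "pass", "pass", "fail"]
kappa = cohen_kappa(examiner_1, examiner_2)
```

Kappa corrects raw percentage agreement for the agreement two examiners would reach by chance alone, which matters when most students pass.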
Results Cronbach's alpha for the DOCEE was 0·85. The intraclass and Pearson correlations of scores given by specialists and non-specialists ranged from 0·82 to 0·93. Kappa values ranged from 0·56 to 1·00. The overall intraclass correlation of students' scores was 0·86. The generalisability coefficient with four cases and two raters was 0·84. Decision studies showed that increasing the number of cases from one to four improved reliability to above 0·8. However, increasing the number of raters had little impact on reliability. The use of a pre-examination blueprint for selecting the cases improved the content validity. The disattenuated Pearson correlations between the DOCEE and other performance measures, used as a measure of concurrent validity, ranged from 0·67 to 0·79.
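The decision-study and disattenuation results above rest on two standard formulas, sketched here under stated assumptions. The sketch assumes a fully crossed students x cases x raters design, which may differ from the design actually analysed. The variance components, the observed correlation and the reliability of the second measure are invented for illustration; only the DOCEE alpha of 0.85 comes from the abstract. The function names `g_coefficient` and `disattenuate` are hypothetical.

```python
import math


def g_coefficient(var_p, var_pc, var_pr, var_pcr_e, n_cases, n_raters):
    """Relative generalisability coefficient for a crossed p x c x r design.

    var_p      -- variance between students (the signal of interest)
    var_pc     -- student-by-case interaction variance
    var_pr     -- student-by-rater interaction variance
    var_pcr_e  -- residual variance (three-way interaction plus error)
    """
    relative_error = (
        var_pc / n_cases
        + var_pr / n_raters
        + var_pcr_e / (n_cases * n_raters)
    )
    return var_p / (var_p + relative_error)


def disattenuate(r_xy, rel_x, rel_y):
    """Correct an observed correlation for unreliability in both measures."""
    return r_xy / math.sqrt(rel_x * rel_y)


# Hypothetical variance components (not study data): a large student-by-case
# interaction and a small student-by-rater interaction.
components = dict(var_p=1.0, var_pc=0.6, var_pr=0.02, var_pcr_e=0.2)

# Decision study: project reliability for different numbers of cases and raters.
d_study = {
    (n_cases, n_raters): g_coefficient(
        n_cases=n_cases, n_raters=n_raters, **components
    )
    for n_cases in (1, 2, 4)
    for n_raters in (1, 2, 4)
}

# Hypothetical observed correlation of 0.60 with a second measure whose
# reliability is assumed to be 0.80; 0.85 is the DOCEE alpha reported above.
corrected_r = disattenuate(0.60, rel_x=0.85, rel_y=0.80)
```

When case-to-case variation in a student's performance dominates the error, as in these invented components, adding cases raises the coefficient far more than adding raters does, which is the pattern the decision studies report.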
Conclusions The DOCEE was shown to have good reliability and interrater agreement between two independent specialist and non-specialist examiners on the scoring, ranking and pass/fail classification of student performance. It has adequate content and concurrent validity and provides unique information about students' clinical competence.