• medical education;
• clinical anatomy;
• assessment;
• latent trait model;
• item response theory


The nature of anatomy education has changed substantially in recent decades, yet the traditional multiple-choice written examination remains the cornerstone of assessing students' knowledge. This study sought to measure the quality of a clinical anatomy multiple-choice final examination using item response theory (IRT) models. One hundred seventy-six students took the examination. One- and two-parameter IRT models (difficulty and discrimination parameters) were used to assess item quality. Under the two-parameter model, item difficulty varied widely, with a median of −1.0 and an interquartile range (25th to 75th percentile) of −2.0 to 0.0. Item discrimination had a median of 0.6 (25th to 75th percentile, 0.4–0.8). The test information curve peaked at an ability level one standard deviation below the mean. Fifteen items had standardized loadings below 0.3, for several reasons: two items had two correct responses, one was poorly constructed, two were too easy, and the remaining ten revealed a lack of detailed knowledge among students. Consistent with the shape of the information curve, the examination was more effective at discriminating among students of lower ability than among those of higher ability. Overall, the IRT models confirmed the quality of the clinical anatomy examination. Anat Sci Educ 3:17–24, 2010. © 2009 American Association of Anatomists.