Components of Rater Error in a Complex Performance Assessment


  • The authors thank Lee J. Cronbach, Rebecca Zwick, and three anonymous reviewers for their useful comments about this manuscript.

BRIAN E. CLAUSER is Senior Psychometrician, National Board of Medical Examiners, 3750 Market Street, Philadelphia, PA 19104; Degrees: BA, Lehigh University; MEd, EdD, University of Massachusetts at Amherst. Specialization: psychometric methods.

STEPHEN G. CLYMAN is Senior Medical Evaluation Officer and Project Director for CCS National Board of Medical Examiners, 3750 Market Street, Philadelphia, PA 19104; Degrees: BS, Pennsylvania State University; MD, Thomas Jefferson Medical College/Thomas Jefferson University; MS, University of Califomia, San Francisco. Specialization: medical informatics.

DAVID B. SWANSON is Senior Evaluation Officer, National Board of Medical Examiners, 3750 Market Street, Philadelphia, PA 19104; Degrees: BS, PhD, University of Minnesota. Specialization: educational measurement.


Numerous studies have examined performance assessment data using generalizability theory. Typically, these studies have treated raters as randomly sampled from a population, with each rater judging a given performance on a single occasion. This paper presents two studies that focus on aspects of the rating process that are not explicitly accounted for in this typical design. The first study makes explicit the “committee” facet, acknowledging that raters often work within groups. The second study makes explicit the “rating-occasion” facet by having each rater judge each performance on two separate occasions. The results of the first study highlight the importance of clearly specifying the relevant facets of the universe of interest. Failing to include the committee facet led to an overly optimistic estimate of the precision of the measurement procedure. By contrast, failing to include the rating-occasion facet, in the second study, had minimal impact on the estimated error variance.
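To illustrate the kind of analysis the abstract refers to, the sketch below estimates variance components for the simplest fully crossed persons × raters (p × r) generalizability study via expected mean squares. This is a generic textbook decomposition, not the authors' design (which adds committee and rating-occasion facets); the function name and data are hypothetical.

```python
import numpy as np

def g_study_p_by_r(X):
    """Estimate variance components for a fully crossed p x r design.

    X: n_persons x n_raters matrix of scores.
    Returns (person, rater, residual) variance component estimates,
    obtained by equating observed mean squares to their expectations.
    """
    n_p, n_r = X.shape
    grand = X.mean()
    person_means = X.mean(axis=1)
    rater_means = X.mean(axis=0)

    # Sums of squares for persons, raters, and residual (interaction + error)
    ss_p = n_r * ((person_means - grand) ** 2).sum()
    ss_r = n_p * ((rater_means - grand) ** 2).sum()
    ss_res = ((X - grand) ** 2).sum() - ss_p - ss_r

    # Mean squares
    ms_p = ss_p / (n_p - 1)
    ms_r = ss_r / (n_r - 1)
    ms_res = ss_res / ((n_p - 1) * (n_r - 1))

    # Solve the expected-mean-square equations for the components
    var_res = ms_res
    var_p = (ms_p - ms_res) / n_r
    var_r = (ms_r - ms_res) / n_p
    return var_p, var_r, var_res
```

In a design like the one the paper describes, additional facets (committee, rating occasion) would add corresponding terms to the decomposition; omitting a facet folds its variance into other components, which is how the overly optimistic precision estimate in the first study can arise.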