Item Clusters and Computerized Adaptive Testing: A Case for Testlets


  • This paper is a result of work done while the second author was a summer predoctoral fellow at the Educational Testing Service, Princeton, NJ.

  • This work was supported by the Educational Testing Service. Our thanks for helpful comments to: Robert Mislevy, Malcolm Ree, Fumiko Samejima, David Thissen, and the test development staff at ETS. This version has benefited massively from critical comments from Bert Green and his (unnamed) graduate students, two anonymous reviewers, and Wendy Yen. Special thanks to Paul Rosenbaum, whose item bundles keep everything together. Despite this help, all remaining errors are ours.

HOWARD WAINER, Senior Research Scientist, Educational Testing Service, Princeton, NJ 08541. Degrees: BS, Rensselaer Polytechnic Institute; AM, PhD, Princeton University. Specializations: graphics, psychometrics, statistics.

GERARD L. KIELY, PhD Candidate in Psychometric Methods, University of Minnesota, Department of Psychology, 75 East River Rd., Minneapolis, MN 55455. Degrees: BA, MS, Upsala College. Specializations: psychometrics, item response theory, computerized adaptive testing.


It is observed that many sorts of difficulties may preclude the uneventful construction of tests by computerized algorithms such as those currently in favor in Computerized Adaptive Testing (CAT). In this essay we discuss a number of these problems, as well as some possible avenues of solution. We conclude with the development of the “testlet,” a bundle of items that can be arranged either hierarchically or linearly, thus maintaining the efficiency of an adaptive test while retaining the quality control of test construction that is currently possible only with careful expert scrutiny. Performance on the separate testlets is aggregated to yield ability estimates.
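The between-testlet branching described in the abstract can be illustrated with a toy two-stage sketch. Everything here is an assumption for illustration only: the item names, the four-item testlet size, the routing cutoff, and the number-correct aggregate are all stand-ins, not the procedure developed in the paper.

```python
# Illustrative sketch only, not the authors' algorithm.  Each testlet is
# a fixed, expert-assembled bundle of items; adaptation happens between
# testlets rather than between individual items.

ROUTING_TESTLET = ["r1", "r2", "r3", "r4"]   # hypothetical first-stage bundle
EASY_TESTLET    = ["e1", "e2", "e3", "e4"]   # hypothetical easy branch
HARD_TESTLET    = ["h1", "h2", "h3", "h4"]   # hypothetical hard branch

def administer(testlet, answer_key, responses):
    """Score one testlet as number correct (a deliberately simple aggregate)."""
    return sum(responses[item] == answer_key[item] for item in testlet)

def two_stage_test(answer_key, responses):
    """Route to an easy or hard second-stage testlet based on the
    first-stage score, then aggregate performance across the testlets
    actually administered."""
    first = administer(ROUTING_TESTLET, answer_key, responses)
    branch = HARD_TESTLET if first >= 3 else EASY_TESTLET   # assumed cutoff
    second = administer(branch, answer_key, responses)
    return first + second   # crude stand-in for an ability estimate
```

The point of the sketch is only the structure: the unit that the algorithm selects and scores is the pre-built testlet, so every examinee sees item bundles whose internal ordering and content were fixed by test developers in advance.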