Volume 43, Issue 4

A Comparative Study of IRT Fixed Parameter Calibration Methods

First published: 11 December 2006
Citations: 41

Abstract

This article provides technical descriptions of five fixed parameter calibration (FPC) methods based on marginal maximum likelihood estimation via the EM algorithm and evaluates them through simulation. The five FPC methods are distinguished by how many times they update the prior ability distribution and by how many EM cycles they use: no prior weights updating with one EM cycle (NWU-OEM) or multiple EM cycles (NWU-MEM), one prior weights updating with one EM cycle (OWU-OEM) or multiple EM cycles (OWU-MEM), and multiple weights updating with multiple EM cycles (MWU-MEM). All five FPC methods were evaluated in terms of recovery of the underlying ability distribution and of the item parameters. A key factor in the simulation was the ability distribution of the FPC group, which was normal with one of three parameter sets—N(0, 1), N(0.5, 1.2²), or N(1, 1.4²)—while the fixed item parameters were obtained from a reference N(0, 1) group. Only the MWU-MEM method appeared to perform properly under all three distributions. Under the N(0, 1) distribution, the NWU-MEM and OWU-MEM methods also appeared to perform properly. Under the N(0.5, 1.2²) and N(1, 1.4²) distributions, however, the four methods other than MWU-MEM showed some degree of underestimation in the recovery, severe in some conditions.
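The distinction the abstract draws among the five methods (how often the prior quadrature weights are updated, and how many EM cycles are run) can be illustrated with a small sketch. Below is a minimal, hypothetical numpy implementation of the MWU-MEM idea for a Rasch model with dichotomous responses; the function name fpc_mwu_mem, the quadrature grid, and the choice of the Rasch model are assumptions made for illustration only and do not reproduce the paper's actual estimation program or model.

```python
import numpy as np

def fpc_mwu_mem(resp, fixed_b, n_fixed, n_cycles=50, n_quad=41):
    """Sketch of MWU-MEM fixed parameter calibration for a Rasch model.

    resp    : (N, J) 0/1 response matrix; the first n_fixed columns are
              items whose difficulties (fixed_b) are held fixed, and the
              remaining columns are new items to be calibrated.
    Returns the estimated difficulties of the new items, the quadrature
    nodes, and the final weights approximating the group's ability
    distribution.
    """
    N, J = resp.shape
    theta = np.linspace(-4.0, 4.0, n_quad)                 # quadrature nodes
    w = np.exp(-0.5 * theta ** 2)
    w /= w.sum()                                           # start from an N(0, 1) prior
    b = np.concatenate([fixed_b, np.zeros(J - n_fixed)])   # new items start at 0

    for _ in range(n_cycles):                              # multiple EM cycles (MEM)
        # E-step: posterior weight of each quadrature node for each examinee
        P = 1.0 / (1.0 + np.exp(-(theta[None, :] - b[:, None])))  # (J, Q)
        logL = resp @ np.log(P) + (1 - resp) @ np.log(1 - P)      # (N, Q)
        logL -= logL.max(axis=1, keepdims=True)                   # numerical safety
        post = np.exp(logL) * w
        post /= post.sum(axis=1, keepdims=True)
        n_q = post.sum(axis=0)           # expected number of examinees at each node
        r_jq = resp.T @ post             # expected number correct, per item and node

        # M-step: one Newton step per NEW item; fixed item parameters stay untouched
        for j in range(n_fixed, J):
            Pj = P[j]
            grad = (r_jq[j] - n_q * Pj).sum()
            info = (n_q * Pj * (1.0 - Pj)).sum()
            b[j] -= grad / info

        # Multiple weights updating (MWU): re-estimate the prior on every cycle
        w = n_q / N

    return b[n_fixed:], theta, w
```

Roughly, and under the same illustrative assumptions, the NWU variants would skip the final weight-update line entirely, the OWU variants would perform it only once, and OEM versus MEM corresponds to running the outer loop once versus to convergence.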

Number of times cited according to CrossRef: 41

  • Competence development of high achievers within the highest track in German secondary school: Evidence for Matthew effects or compensation?, Learning and Individual Differences, 10.1016/j.lindif.2019.101816, 77, (101816), (2020).
  • Working with Atypical Samples, Educational Measurement: Issues and Practice, 10.1111/emip.12360, 39, 3, (19-21), (2020).
  • Estimating standard errors of IRT true score equating coefficients using imputed item parameters, The Journal of Experimental Education, 10.1080/00220973.2020.1751579, (1-23), (2020).
  • irtplay: An R Package for Online Item Calibration, Scoring, Evaluation of Model Fit, and Useful Functions for Unidimensional IRT, Applied Psychological Measurement, 10.1177/0146621620921247, (014662162092124), (2020).
  • Item Calibration Methods With Multiple Subscale Multistage Testing, Journal of Educational Measurement, 10.1111/jedm.12241, 57, 1, (3-28), (2019).
  • Two IRT Fixed Parameter Calibration Methods for the Bifactor Model, Journal of Educational Measurement, 10.1111/jedm.12230, 57, 1, (29-50), (2019).
  • Application of IRT Fixed Parameter Calibration to Multiple-Group Test Data, Applied Measurement in Education, 10.1080/08957347.2019.1660344, 32, 4, (310-324), (2019).
  • When Nonresponse Mechanisms Change: Effects on Trends and Group Comparisons in International Large-Scale Assessments, Educational and Psychological Measurement, 10.1177/0013164419829196, (001316441982919), (2019).
  • New Efficient and Practicable Adaptive Designs for Calibrating Items Online, Applied Psychological Measurement, 10.1177/0146621618824854, (014662161882485), (2019).
  • Modeling Response Time and Responses in Multidimensional Health Measurement, Frontiers in Psychology, 10.3389/fpsyg.2019.00051, 10, (2019).
  • Restricted Recalibration of Item Response Theory Models, Psychometrika, 10.1007/s11336-019-09667-4, (2019).
  • Detection and Treatment of Careless Responses to Improve Item Parameter Estimation, Journal of Educational and Behavioral Statistics, 10.3102/1076998618825116, (107699861882511), (2019).
  • Addressing score comparability in diagnostic classification models: an observed-score equating and linking approach, Behaviormetrika, 10.1007/s41237-019-00102-7, (2019).
  • Optimal Online Calibration Designs for Item Replenishment in Adaptive Testing, Psychometrika, 10.1007/s11336-019-09687-0, (2019).
  • FIPC Linking Across Multidimensional Test Forms: Effects of Confounding Difficulty within Dimensions, International Journal of Testing, 10.1080/15305058.2018.1428980, 18, 4, (323-345), (2018).
  • High agreement was obtained across scores from multiple equated scales for social anxiety disorder using item response theory, Journal of Clinical Epidemiology, 10.1016/j.jclinepi.2018.04.003, 99, (132-143), (2018).
  • IRT Linking and Equating, The Wiley Handbook of Psychometric Testing, 10.1002/9781118489772, (639-673), (2018).
  • Applying Kaplan-Meier to Item Response Data, The Journal of Experimental Education, 10.1080/00220973.2017.1301355, 86, 2, (308-324), (2017).
  • A New Online Calibration Method Based on Lord’s Bias-Correction, Applied Psychological Measurement, 10.1177/0146621617697958, 41, 6, (456-471), (2017).
  • Item Response Theory Equating, Applying Test Equating Methods, 10.1007/978-3-319-51824-4_5, (111-136), (2017).
  • Developing new online calibration methods for multidimensional computerized adaptive testing, British Journal of Mathematical and Statistical Psychology, 10.1111/bmsp.12083, 70, 1, (81-117), (2017).
  • Practical Consequences of Item Response Theory Model Misfit in the Context of Test Equating with Mixed-Format Test Data, Frontiers in Psychology, 10.3389/fpsyg.2017.00484, 8, (2017).
  • Item Response Theory Observed-Score Kernel Equating, Psychometrika, 10.1007/s11336-016-9528-7, 82, 1, (48-66), (2016).
  • An exploratory study on the ability parameter estimation method considering the differential item weighing in multiple choice items: use of the graded response model, The Korean Journal of Educational Methodology Studies, 10.17927/tkjems.2016.28.3.521, 28, 3, (521-538), (2016).
  • A Comparison of Linking Methods for Estimating National Trends in International Comparative Large‐Scale Assessments in the Presence of Cross‐National DIF, Journal of Educational Measurement, 10.1111/jedm.12106, 53, 2, (152-171), (2016).
  • How Does Calibration Timing and Seasonality Affect Item Parameter Estimates?, Educational and Psychological Measurement, 10.1177/0013164415588947, 76, 3, (508-527), (2015).
  • The Effect of Changing Content on IRT Scaling Methods, Applied Measurement in Education, 10.1080/08957347.2014.1002922, 28, 2, (99-114), (2015).
  • Item Response Theory Methods, Test Equating, Scaling, and Linking, 10.1007/978-1-4939-0317-7, (171-245), (2014).
  • The Long‐Term Sustainability of IRT Scaling Methods in Mixed‐Format Tests, Journal of Educational Measurement, 10.1111/jedm.12025, 50, 4, (390-407), (2014).
  • Capturing specific abilities as a window into human individuality: The example of face recognition, Cognitive Neuropsychology, 10.1080/02643294.2012.753433, 29, 5-6, (360-392), (2013).
  • Content-Based Collaborative Filtering for Question Difficulty Calibration, PRICAI 2012: Trends in Artificial Intelligence, 10.1007/978-3-642-32695-0_33, (359-371), (2012).
  • Software Note, Applied Psychological Measurement, 10.1177/0146621612438726, 36, 3, (232-236), (2012).
  • Item difficulty estimation: An auspicious collaboration between data and judgment, Computers & Education, 10.1016/j.compedu.2011.11.020, 58, 4, (1183-1193), (2012).
  • Linking item parameters to a base scale, Asia Pacific Education Review, 10.1007/s12564-011-9197-2, 13, 2, (311-321), (2011).
  • The Long-Term Sustainability of Different Item Response Theory Scaling Methods, Educational and Psychological Measurement, 10.1177/0013164410375111, 71, 2, (362-379), (2011).
  • Sensitivity to initial values in full non‐parametric maximum‐likelihood estimation of the two‐parameter logistic model, British Journal of Mathematical and Statistical Psychology, 10.1348/000711010X531957, 64, 2, (320-336), (2011).
  • A Graphical Approach to Evaluating Equating Using Test Characteristic Curves, Applied Psychological Measurement, 10.1177/0146621610377082, 35, 3, (217-234), (2010).
  • The Examination of the Classification of Students into Performance Categories by Two Different Equating Methods, The Journal of Experimental Education, 10.1080/00220970903292959, 79, 1, (30-52), (2010).
  • A Comparison of IRT Linking Procedures, Applied Measurement in Education, 10.1080/08957340903423537, 23, 1, (23-48), (2009).
  • A Comparison of the Common‐Item and Random‐Groups Equating Designs Using Empirical Data, International Journal of Selection and Assessment, 10.1111/j.1468-2389.2008.00413.x, 16, 2, (83-92), (2008).
  • The Benefits of Fixed Item Parameter Calibration for Parameter Accuracy in Small Sample Situations in Large‐Scale Assessments, Educational Measurement: Issues and Practice, 10.1111/emip.12381.
