A Comparative Study of IRT Fixed Parameter Calibration Methods
Abstract
This article provides technical descriptions of five fixed parameter calibration (FPC) methods, all based on marginal maximum likelihood estimation via the EM algorithm, and evaluates them through simulation. The five methods are distinguished by how many times they update the prior ability distribution and by how many EM cycles they use: no prior weights updating with one EM cycle (NWU‐OEM) or multiple EM cycles (NWU‐MEM), one prior weights updating with one EM cycle (OWU‐OEM) or multiple EM cycles (OWU‐MEM), and multiple weights updating with multiple EM cycles (MWU‐MEM). All five methods were evaluated in terms of how well they recovered the underlying ability distribution and the item parameters. A key simulation factor was the ability distribution of the FPC group, which was normal with one of three settings, N(0, 1), N(0.5, 1.2²), or N(1, 1.4²), while the fixed item parameters were obtained from a reference N(0, 1) group. Only the MWU‐MEM method performed properly under all three distributions. Under the N(0, 1) distribution, the NWU‐MEM and OWU‐MEM methods also performed properly. Under the N(0.5, 1.2²) and N(1, 1.4²) distributions, however, the four methods other than MWU‐MEM showed underestimation in the recovery, in some cases severe.
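For readers who want a concrete picture of how the five methods differ, the sketch below illustrates the general idea in Python. It is a minimal illustration under stated assumptions, not the implementation evaluated in the article: a Rasch model and a single Newton step per M step are used for brevity, the quadrature grid and the function names (`fpc_em`, `rasch_p`) are hypothetical, and the mapping of `n_cycles` and `weight_updates` onto the NWU/OWU/MWU and OEM/MEM labels is schematic only.

```python
import numpy as np

def rasch_p(theta, b):
    # Rasch probability of a correct response for each theta (rows) and item (cols).
    return 1.0 / (1.0 + np.exp(-(theta[:, None] - b[None, :])))

def fpc_em(resp, b, fixed, n_cycles=1, weight_updates=0):
    # resp: (N, J) 0/1 response matrix; b: starting difficulties (J,);
    # fixed: boolean mask of items whose difficulties stay frozen;
    # n_cycles: 1 ~ "OEM", >1 ~ "MEM";
    # weight_updates: 0 ~ NWU, 1 ~ OWU, n_cycles ~ MWU (schematic mapping only).
    theta = np.linspace(-4.0, 4.0, 41)             # quadrature points (assumed grid)
    w = np.exp(-0.5 * theta ** 2)
    w /= w.sum()                                   # N(0, 1) prior weights to start
    b = b.astype(float).copy()
    for cycle in range(n_cycles):
        P = rasch_p(theta, b)                      # (Q, J)
        # E step: posterior over quadrature points for each examinee.
        loglik = resp @ np.log(P).T + (1 - resp) @ np.log(1 - P).T   # (N, Q)
        post = np.exp(loglik) * w
        post /= post.sum(axis=1, keepdims=True)
        nq = post.sum(axis=0)                      # expected examinees per point (Q,)
        rq = post.T @ resp                         # expected corrects per point/item (Q, J)
        # Re-estimate the prior ability weights only as often as the method allows.
        if cycle < weight_updates:
            w = nq / nq.sum()
        # M step: one Newton step on each *free* item; fixed items never move.
        for j in np.where(~fixed)[0]:
            res = rq[:, j] - nq * P[:, j]          # expected score residual
            info = nq * P[:, j] * (1.0 - P[:, j])  # expected information
            b[j] -= res.sum() / info.sum()
    return b, w

# Hypothetical usage: 20 items fixed from a reference N(0, 1) calibration,
# 10 new items calibrated on a group drawn from N(0.5, 1.2**2).
rng = np.random.default_rng(0)
N, J = 1000, 30
b_true = rng.normal(0.0, 1.0, J)
theta_true = rng.normal(0.5, 1.2, N)
resp = (rng.random((N, J)) < rasch_p(theta_true, b_true)).astype(float)
fixed = np.arange(J) < 20
b0 = np.where(fixed, b_true, 0.0)
b_mwu, w_mwu = fpc_em(resp, b0, fixed, n_cycles=30, weight_updates=30)  # MWU-MEM
b_nwu, w_nwu = fpc_em(resp, b0, fixed, n_cycles=30, weight_updates=0)   # NWU-MEM
```

Under this schematic mapping, the contrast the article studies is visible directly: with `weight_updates=0` the prior stays at N(0, 1) even when the calibration group is not N(0, 1), which is the situation in which the abstract reports underestimation for the non-MWU methods.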
Citing Literature
Number of times cited according to CrossRef: 41
- Claudia Neuendorf, Malte Jansen, Poldi Kuhl, Competence development of high achievers within the highest track in German secondary school: Evidence for Matthew effects or compensation?, Learning and Individual Differences, 10.1016/j.lindif.2019.101816, 77, (101816), (2020).
- Zhongmin Cui, Working with Atypical Samples, Educational Measurement: Issues and Practice, 10.1111/emip.12360, 39, 3, (19-21), (2020).
- Zhonghua Zhang, Estimating standard errors of IRT true score equating coefficients using imputed item parameters, The Journal of Experimental Education, 10.1080/00220973.2020.1751579, (1-23), (2020).
- Hwanggyu Lim, Craig S. Wells, irtplay: An R Package for Online Item Calibration, Scoring, Evaluation of Model Fit, and Useful Functions for Unidimensional IRT, Applied Psychological Measurement, 10.1177/0146621620921247, (014662162092124), (2020).
- Chun Wang, Ping Chen, Shengyu Jiang, Item Calibration Methods With Multiple Subscale Multistage Testing, Journal of Educational Measurement, 10.1111/jedm.12241, 57, 1, (3-28), (2019).
- Kyung Yong Kim, Two IRT Fixed Parameter Calibration Methods for the Bifactor Model, Journal of Educational Measurement, 10.1111/jedm.12230, 57, 1, (29-50), (2019).
- Seonghoon Kim, Michael J. Kolen, Application of IRT Fixed Parameter Calibration to Multiple-Group Test Data, Applied Measurement in Education, 10.1080/08957347.2019.1660344, 32, 4, (310-324), (2019).
- Karoline A. Sachse, Nicole Mahler, Steffi Pohl, When Nonresponse Mechanisms Change: Effects on Trends and Group Comparisons in International Large-Scale Assessments, Educational and Psychological Measurement, 10.1177/0013164419829196, (001316441982919), (2019).
- Yinhong He, Ping Chen, Yong Li, New Efficient and Practicable Adaptive Designs for Calibrating Items Online, Applied Psychological Measurement, 10.1177/0146621618824854, (014662161882485), (2019).
- Chun Wang, David J. Weiss, Shiyang Su, Modeling Response Time and Responses in Multidimensional Health Measurement, Frontiers in Psychology, 10.3389/fpsyg.2019.00051, 10, (2019).
- Yang Liu, Ji Seung Yang, Alberto Maydeu-Olivares, Restricted Recalibration of Item Response Theory Models, Psychometrika, 10.1007/s11336-019-09667-4, (2019).
- Jeffrey M. Patton, Ying Cheng, Maxwell Hong, Qi Diao, Detection and Treatment of Careless Responses to Improve Item Parameter Estimation, Journal of Educational and Behavioral Statistics, 10.3102/1076998618825116, (107699861882511), (2019).
- Ren Liu, Addressing score comparability in diagnostic classification models: an observed-score equating and linking approach, Behaviormetrika, 10.1007/s41237-019-00102-7, (2019).
- Yinhong He, Ping Chen, Optimal Online Calibration Designs for Item Replenishment in Adaptive Testing, Psychometrika, 10.1007/s11336-019-09687-0, (2019).
- Sohee Kim, Ki Lynn Cole, Mwarumba Mwavita, FIPC Linking Across Multidimensional Test Forms: Effects of Confounding Difficulty within Dimensions, International Journal of Testing, 10.1080/15305058.2018.1428980, 18, 4, (323-345), (2018).
- Matthew Sunderland, Philip Batterham, Alison Calear, Natacha Carragher, Andrew Baillie, Tim Slade, High agreement was obtained across scores from multiple equated scales for social anxiety disorder using item response theory, Journal of Clinical Epidemiology, 10.1016/j.jclinepi.2018.04.003, 99, (132-143), (2018).
- Won‐Chan Lee, Guemin Lee, IRT Linking and Equating, The Wiley Handbook of Psychometric Testing, 10.1002/9781118489772, (639-673), (2018).
- Daniel McNeish, Applying Kaplan-Meier to Item Response Data, The Journal of Experimental Education, 10.1080/00220973.2017.1301355, 86, 2, (308-324), (2017).
- Yinhong He, Ping Chen, Yong Li, Shumei Zhang, A New Online Calibration Method Based on Lord’s Bias-Correction, Applied Psychological Measurement, 10.1177/0146621617697958, 41, 6, (456-471), (2017).
- Jorge González, Marie Wiberg, Item Response Theory Equating, Applying Test Equating Methods, 10.1007/978-3-319-51824-4_5, (111-136), (2017).
- Ping Chen, Chun Wang, Tao Xin, Hua‐Hua Chang, Developing new online calibration methods for multidimensional computerized adaptive testing, British Journal of Mathematical and Statistical Psychology, 10.1111/bmsp.12083, 70, 1, (81-117), (2017).
- Yue Zhao, Ronald K. Hambleton, Practical Consequences of Item Response Theory Model Misfit in the Context of Test Equating with Mixed-Format Test Data, Frontiers in Psychology, 10.3389/fpsyg.2017.00484, 8, (2017).
- Björn Andersson, Marie Wiberg, Item Response Theory Observed-Score Kernel Equating, Psychometrika, 10.1007/s11336-016-9528-7, 82, 1, (48-66), (2016).
- 김윤주, 강태훈, An exploratory study on the ability parameter estimation method considering the differential item weighing in multiple choice items: use of the graded response model, The Korean Journal of Educational Methodology Studies, 10.17927/tkjems.2016.28.3.521, 28, 3, (521-538), (2016).
- Karoline A. Sachse, Alexander Roppelt, Nicole Haag, A Comparison of Linking Methods for Estimating National Trends in International Comparative Large‐Scale Assessments in the Presence of Cross‐National DIF, Journal of Educational Measurement, 10.1111/jedm.12106, 53, 2, (152-171), (2016).
- Adam E. Wyse, Ben Babcock, How Does Calibration Timing and Seasonality Affect Item Parameter Estimates?, Educational and Psychological Measurement, 10.1177/0013164415588947, 76, 3, (508-527), (2015).
- Lisa A. Keller, Robert R. Keller, The Effect of Changing Content on IRT Scaling Methods, Applied Measurement in Education, 10.1080/08957347.2014.1002922, 28, 2, (99-114), (2015).
- Michael J. Kolen, Robert L. Brennan, Item Response Theory Methods, Test Equating, Scaling, and Linking, 10.1007/978-1-4939-0317-7, (171-245), (2014).
- Lisa A. Keller, Ronald K. Hambleton, The Long‐Term Sustainability of IRT Scaling Methods in Mixed‐Format Tests, Journal of Educational Measurement, 10.1111/jedm.12025, 50, 4, (390-407), (2014).
- Jeremy B. Wilmer, Laura Germine, Christopher F. Chabris, Garga Chatterjee, Margaret Gerbasi, Ken Nakayama, Capturing specific abilities as a window into human individuality: The example of face recognition, Cognitive Neuropsychology, 10.1080/02643294.2012.753433, 29, 5-6, (360-392), (2013).
- Minh Luan Nguyen, Siu Cheung Hui, Alvis C. M. Fong, Content-Based Collaborative Filtering for Question Difficulty Calibration, PRICAI 2012: Trends in Artificial Intelligence, 10.1007/978-3-642-32695-0_33, (359-371), (2012).
- Christine E. DeMars, Daniel P. Jurich, Software Note, Applied Psychological Measurement, 10.1177/0146621612438726, 36, 3, (232-236), (2012).
- Kelly Wauters, Piet Desmet, Wim Van Den Noortgate, Item difficulty estimation: An auspicious collaboration between data and judgment, Computers & Education, 10.1016/j.compedu.2011.11.020, 58, 4, (1183-1193), (2012).
- Taehoon Kang, Nancy S. Petersen, Linking item parameters to a base scale, Asia Pacific Education Review, 10.1007/s12564-011-9197-2, 13, 2, (311-321), (2011).
- Lisa A. Keller, Robert R. Keller, The Long-Term Sustainability of Different Item Response Theory Scaling Methods, Educational and Psychological Measurement, 10.1177/0013164410375111, 71, 2, (362-379), (2011).
- Ingo W. Nader, Ulrich S. Tran, Anton K. Formann, Sensitivity to initial values in full non‐parametric maximum‐likelihood estimation of the two‐parameter logistic model, British Journal of Mathematical and Statistical Psychology, 10.1348/000711010X531957, 64, 2, (320-336), (2011).
- Adam E. Wyse, Mark D. Reckase, A Graphical Approach to Evaluating Equating Using Test Characteristic Curves, Applied Psychological Measurement, 10.1177/0146621610377082, 35, 3, (217-234), (2010).
- Lisa A. Keller, Robert R. Keller, Pauline A. Parker, The Examination of the Classification of Students into Performance Categories by Two Different Equating Methods, The Journal of Experimental Education, 10.1080/00220970903292959, 79, 1, (30-52), (2010).
- Won-Chan Lee, Jae-Chun Ban, A Comparison of IRT Linking Procedures, Applied Measurement in Education, 10.1080/08957340903423537, 23, 1, (23-48), (2009).
- Dong‐In Kim, Seung W. Choi, Guemin Lee, Kooghyang R. Um, A Comparison of the Common‐Item and Random‐Groups Equating Designs Using Empirical Data, International Journal of Selection and Assessment, 10.1111/j.1468-2389.2008.00413.x, 16, 2, (83-92), (2008).
- Christoph König, Lale Khorramdel, Kentaro Yamamoto, Andreas Frey, The Benefits of Fixed Item Parameter Calibration for Parameter Accuracy in Small Sample Situations in Large‐Scale Assessments, Educational Measurement: Issues and Practice, 10.1111/emip.12381.




