A Hierarchical Rater Model for Constructed Responses, with a Signal Detection Rater Model
Abstract
The hierarchical rater model (HRM) recognizes the hierarchical structure of data that arises when raters score constructed response items. In this approach, raters’ scores are not viewed as being direct indicators of examinee proficiency but rather as indicators of essay quality; the (latent categorical) quality of an examinee's essay in turn serves as an indicator of the examinee's proficiency, thus yielding a hierarchical structure. Here it is shown that a latent class model motivated by signal detection theory (SDT) is a natural candidate for the first level of the HRM, the rater model. The latent class SDT model provides measures of rater precision and various rater effects, above and beyond simply severity or leniency. The HRM‐SDT model is applied to data from a large‐scale assessment and is shown to provide a useful summary of various aspects of the raters’ performance.
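The two-level structure the abstract describes can be illustrated with a minimal simulation sketch. Everything here is an assumption for illustration, not the paper's specification: the parameter values, the normal-ogive cutpoints linking proficiency to latent essay quality, and the single rater with discrimination `d` and criteria `criteria` (shifting all criteria up or down would model severity/leniency).

```python
import numpy as np

rng = np.random.default_rng(0)

n_examinees = 2000
n_categories = 4          # hypothetical latent essay-quality categories 0..3
d = 2.0                   # hypothetical rater discrimination (precision)
# Hypothetical rater criteria; a uniform shift models severity/leniency
criteria = d * (np.arange(1, n_categories) - 0.5)   # [1.0, 3.0, 5.0]

# Second level: examinee proficiency determines latent essay quality
# via assumed ordered cutpoints on a standard normal proficiency scale
theta = rng.normal(size=n_examinees)
cutpoints = np.array([-1.0, 0.0, 1.0])
quality = np.searchsorted(cutpoints, theta)         # latent category 0..3

# First level (latent-class SDT rater model): the rater perceives the
# latent quality plus Gaussian noise, then assigns the score whose
# criteria bracket the perception
perception = d * quality + rng.normal(size=n_examinees)
score = np.searchsorted(criteria, perception)       # observed rating 0..3

# Fraction of ratings that match the latent category (a crude precision index)
agreement = np.mean(score == quality)
print(round(agreement, 2))
```

Under these assumed values, exact agreement between the observed score and the latent category is well below 1 even for an unbiased rater, which is the sense in which ratings index essay quality rather than proficiency directly.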
Citing Literature
Number of times cited according to CrossRef: 36
- Evelyn S. Johnson, Yuzhu Zheng, Angela R. Crawford, Laura A. Moylan, Examining rater accuracy and consistency with a special education observation protocol, Studies in Educational Evaluation, 10.1016/j.stueduc.2019.100827, 64, (100827), (2020).
- Masaki Uto, Duc-Thien Nguyen, Maomi Ueno, Group Optimization to Maximize Peer Assessment Accuracy Using Item Response Theory and Integer Programming, IEEE Transactions on Learning Technologies, 10.1109/TLT.2019.2896966, 13, 1, (91-106), (2020).
- Lawrence T. DeCarlo, Xiaoliang Zhou, A Latent Class Signal Detection Model for Rater Scoring with Ordered Perceptual Distributions, Journal of Educational Measurement, 10.1111/jedm.12265, 0, 0, (2020).
- Xiaomin Li, Wen-Chung Wang, Qin Xie, Cognitive Diagnostic Models for Rater Effects, Frontiers in Psychology, 10.3389/fpsyg.2020.00525, 11, (2020).
- Martin Hecht, Steffen Zitzmann, A Computationally More Efficient Bayesian Approach for Estimating Continuous-Time Models, Structural Equation Modeling: A Multidisciplinary Journal, 10.1080/10705511.2020.1719107, (1-12), (2020).
- Masaki Uto, Maomi Ueno, A generalized many-facet Rasch model and its Bayesian estimation using Hamiltonian Monte Carlo, Behaviormetrika, 10.1007/s41237-020-00115-7, (2020).
- Lawrence T. DeCarlo, Insights from Reparameterized DINA and Beyond, Handbook of Diagnostic Classification Models, 10.1007/978-3-030-05584-4_11, (223-243), (2019).
- Ricardo Nieto, Jodi M. Casabianca, Accounting for Rater Effects With the Hierarchical Rater Model Framework When Scoring Simple Structured Constructed Response Tests, Journal of Educational Measurement, 10.1111/jedm.12225, 56, 3, (547-581), (2019).
- Hyo Jeong Shin, Sophia Rabe-Hesketh, Mark Wilson, Trifactor Models for Multiple-Ratings Data, Multivariate Behavioral Research, 10.1080/00273171.2018.1530091, 54, 3, (360-381), (2019).
- Yu He, Xinying Hu, Guangzhong Sun, Proceedings of the ACM Turing Celebration Conference - China on - ACM TURC '19, 10.1145/3321408.3322850, (1-6), (2019).
- Stefanie A. Wind, Wenjing Guo, Exploring the Combined Effects of Rater Misfit and Differential Rater Functioning in Performance Assessments, Educational and Psychological Measurement, 10.1177/0013164419834613, (001316441983461), (2019).
- Chris Bradley, Robert W. Massof, Estimating measures of latent variables from m-alternative forced choice responses, PLOS ONE, 10.1371/journal.pone.0225581, 14, 11, (e0225581), (2019).
- Letty Koopman, Bonne J. H. Zijlstra, Mark de Rooij, L. Andries van der Ark, Bias of Two-Level Scalability Coefficients and Their Standard Errors, Applied Psychological Measurement, 10.1177/0146621619843821, (014662161984382), (2019).
- Martin Hecht, Christian Gische, Daniel Vogel, Steffen Zitzmann, Integrating Out Nuisance Parameters for Computationally More Efficient Bayesian Estimation – An Illustration and Tutorial, Structural Equation Modeling: A Multidisciplinary Journal, 10.1080/10705511.2019.1647432, (1-11), (2019).
- Steffen Zitzmann, Martin Hecht, Going Beyond Convergence in Bayesian Estimation: Why Precision Matters Too and How to Assess It, Structural Equation Modeling: A Multidisciplinary Journal, 10.1080/10705511.2018.1545232, (1-16), (2019).
- Kuan‐Yu Jin, Wen‐Chung Wang, A New Facets Model for Rater's Centrality/Extremity Response Style, Journal of Educational Measurement, 10.1111/jedm.12191, 55, 4, (543-563), (2018).
- Masaki Uto, Maomi Ueno, Empirical comparison of item response theory models with rater's parameters, Heliyon, 10.1016/j.heliyon.2018.e00622, 4, 5, (e00622), (2018).
- Oliver Lüdtke, Alexander Robitzsch, Ulrich Trautwein, Integrating Covariates into Social Relations Models: A Plausible Values Approach for Handling Measurement Error in Perceiver and Target Effects, Multivariate Behavioral Research, 10.1080/00273171.2017.1406793, 53, 1, (102-124), (2018).
- Yoon Soo Park, Kuan Xing, Rater Model Using Signal Detection Theory for Latent Differential Rater Functioning, Multivariate Behavioral Research, 10.1080/00273171.2018.1522496, (1-13), (2018).
- Emine Burcu Tunç, Müge Uluman, Determining Preservice Teachers' Perceptions of Open-Ended and Multiple-Choice Items Through Metaphors, Elektronik Sosyal Bilimler Dergisi, 10.17755/esosder.312930, (2018).
- Chen-Wei Liu, Xue-Lan Qiu, Wen-Chung Wang, Item Response Theory Modeling for Examinee-selected Items with Rater Effect, Applied Psychological Measurement, 10.1177/0146621618798667, (014662161879866), (2018).
- Kaja Zupanc, Erik Štrumbelj, A Bayesian hierarchical latent trait model for estimating rater bias and reliability in large-scale performance assessment, PLOS ONE, 10.1371/journal.pone.0195297, 13, 4, (e0195297), (2018).
- Adrienne Sgammato, John R. Donoghue, On the Performance of the Marginal Homogeneity Test to Detect Rater Drift, Applied Psychological Measurement, 10.1177/0146621617730390, 42, 4, (307-320), (2017).
- Brian F. Patterson, Stefanie A. Wind, George Engelhard, Incorporating Criterion Ratings Into Model-Based Rater Monitoring Procedures Using Latent-Class Signal Detection Theory, Applied Psychological Measurement, 10.1177/0146621617698452, 41, 6, (472-491), (2017).
- Jodi M. Casabianca, Brian W. Junker, Ricardo Nieto, Mark A. Bond, A Hierarchical Rater Model for Longitudinal Data, Multivariate Behavioral Research, 10.1080/00273171.2017.1342202, 52, 5, (576-592), (2017).
- Kuan-Yu Jin, Wen-Chung Wang, Assessment of Differential Rater Functioning in Latent Classes with New Mixture Facets Models, Multivariate Behavioral Research, 10.1080/00273171.2017.1299615, 52, 3, (391-402), (2017).
- Chun Wang, Tian Song, Zhuoran Wang, Edward Wolfe, Essay Selection Methods for Adaptive Rater Monitoring, Applied Psychological Measurement, 10.1177/0146621616672855, 41, 1, (60-79), (2016).
- Jue Wang, George Engelhard, Edward W. Wolfe, Evaluating Rater Accuracy in Rater-Mediated Assessments Using an Unfolding Model, Educational and Psychological Measurement, 10.1177/0013164415621606, 76, 6, (1005-1025), (2016).
- Masaki Uto, Maomi Ueno, Item Response Theory for Peer Assessment, IEEE Transactions on Learning Technologies, 10.1109/TLT.2015.2476806, 9, 2, (157-170), (2016).
- Steffen Zitzmann, Oliver Lüdtke, Alexander Robitzsch, Herbert W. Marsh, A Bayesian Approach for Estimating Multilevel Latent Contextual Models, Structural Equation Modeling: A Multidisciplinary Journal, 10.1080/10705511.2016.1207179, 23, 5, (661-679), (2016).
- Andriy Olenko, Vitaliy Tsyganok, Double Entropy Inter-Rater Agreement Indices, Applied Psychological Measurement, 10.1177/0146621615592718, 40, 1, (37-55), (2015).
- Xiaomin Li, Wen‐Chung Wang, Assessment of Differential Item Functioning Under Cognitive Diagnosis Models: The DINA Model Example, Journal of Educational Measurement, 10.1111/jedm.12061, 52, 1, (28-54), (2015).
- Steffen Zitzmann, Oliver Lüdtke, Alexander Robitzsch, A Bayesian Approach to More Stable Estimates of Group-Level Effects in Contextual Studies, Multivariate Behavioral Research, 10.1080/00273171.2015.1090899, 50, 6, (688-705), (2015).
- Ju-Ho Lee, Sung Chang Ryoo, Sam-Ho Lee, From Multiple Choices to Performance Assessment: Theory, Practice, and Strategy, SSRN Electronic Journal, 10.2139/ssrn.2543415, (2014).
- Wen‐Chung Wang, Chi‐Ming Su, Xue‐Lan Qiu, Item Response Models for Local Dependence Among Multiple Ratings, Journal of Educational Measurement, 10.1111/jedm.12045, 51, 3, (260-280), (2014).
- Zhen Wang, Lihua Yao, The Effects of Rater Severity and Rater Distribution on Examinees' Ability Estimation for Constructed‐Response Items, ETS Research Report Series, 10.1002/j.2333-8504.2013.tb02330.x, 2013, 2, (i-22), (2014).