Assessment on homogeneity tests for kappa statistics under equal prevalence across studies in reliability†
This article is a U.S. Government work and is in the public domain in the U.S.A.
Abstract
In this paper, we assess the performance of homogeneity tests for two or more kappa statistics when prevalence rates across reliability studies are assumed to be equal. The likelihood score method and the chi‐square goodness‐of‐fit (GOF) test provide type 1 error rates that are satisfactorily close to the nominal level, but a Fleiss‐like test does not perform satisfactorily for small or moderate sample sizes. Simulations show that the score test is more powerful than the chi‐square GOF test, and that the approximate sample size required to achieve a specified power is substantially smaller for the score test than for the GOF test. In addition, the score test is robust to deviations from the equal prevalence assumption, whereas the GOF test is highly sensitive to them and may give a grossly misleading type 1 error rate when the assumption is violated. We conclude that the homogeneity score test is the preferred method. Published in 2005 by John Wiley & Sons, Ltd.




