Detecting systematic bias between two raters


Dr John Ludbrook, 563 Canning Street, Carlton North, Victoria 3054, Australia. Email:


1. I recently reviewed, inter alia, methods for comparing two raters who make judgements on an ordered categorical scale. The review was directed principally at the kappa statistic and its weaknesses.

2. The main weakness of the kappa statistic is that it fails to detect the all-important feature of systematic bias between raters.

3. I described various methods for detecting bias between two raters. These included a modified McNemar test and the single binomial test. Others that have been suggested are the symmetry of disagreement index (SD) and the marginal homogeneity test.

4. I now realize that none of the above four tests for bias is satisfactory, because all ignore the extent of agreement.
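
The point can be illustrated with a minimal sketch. It assumes the simplest case of a 2 × 2 table [[a, b], [c, d]], in which a and d count agreements and b and c count the two kinds of disagreement. The function names and the example counts are hypothetical. The code computes the classical McNemar chi-square and an exact two-sided binomial test on the discordant pairs; it does not reproduce the particular modification of the McNemar test referred to above. Both statistics depend only on b and c, so they are unchanged however many agreements there are.

```python
from math import comb

def mcnemar_chi2(b, c):
    """Classical McNemar chi-square (1 df, no continuity correction).
    Requires b + c > 0."""
    return (b - c) ** 2 / (b + c)

def exact_binomial_p(b, c):
    """Two-sided exact binomial p-value for H0: discordant pairs split 50:50."""
    n = b + c
    # Probability of a split at least as extreme as min(b, c) in one tail.
    tail = sum(comb(n, k) for k in range(min(b, c) + 1)) / 2 ** n
    # Double the tail (the null distribution is symmetric) and cap at 1.
    return min(1.0, 2 * tail)

def discordant_tests(table):
    """Return (chi-square, exact p) using only the off-diagonal cells."""
    (a, b), (c, d) = table  # a and d are deliberately unused
    return mcnemar_chi2(b, c), exact_binomial_p(b, c)

# Hypothetical tables with identical disagreements (b = 15, c = 5)
# but very different numbers of agreements.
low_agreement = [[10, 15], [5, 10]]
high_agreement = [[200, 15], [5, 180]]

print(discordant_tests(low_agreement))
print(discordant_tests(high_agreement))
```

In this sketch the two tables give identical results, although the raters disagree in half of the cases in the first table and in only one case in twenty in the second.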

5. The bias index (BI) does take into account the extent of agreement, but its inventors did not propose how BI could be evaluated. I now describe a method for doing this.
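
As a hedged illustration only, the sketch below assumes the definition of BI commonly given for a 2 × 2 table [[a, b], [c, d]], namely BI = (b − c)/N with N = a + b + c + d. That definition is not stated in this text and should be checked against the original description of the index. The function name and the counts are hypothetical, and the code computes BI only; it does not implement any method of evaluating it.

```python
def bias_index(table):
    """Bias index under the assumed definition BI = (b - c) / N."""
    (a, b), (c, d) = table
    n = a + b + c + d  # agreements (a, d) enter through the denominator
    return (b - c) / n

# Hypothetical tables with the same disagreements (b = 15, c = 5)
# but different numbers of agreements.
low_agreement = [[10, 15], [5, 10]]      # N = 40
high_agreement = [[200, 15], [5, 180]]   # N = 400

print(bias_index(low_agreement))
print(bias_index(high_agreement))
```

Under this assumed definition the same excess of ten disagreements in one direction gives a BI ten times smaller when N is ten times larger, which is the sense in which the index reflects the extent of agreement.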