A comparison of methods for calculating a stratified kappa



Investigators use the kappa coefficient to measure chance-corrected agreement among observers in the classification of subjects into nominal categories. The marginal probability of classification may depend, however, on one or more confounding variables. We consider assessment of interrater agreement with subjects grouped into strata on the basis of these confounders. We assume that the underlying agreement is constant across strata and consider a stratified index of agreement, or ‘stratified kappa’, based on weighted summations of the individual kappas. We use three weighting schemes: (1) equal weighting; (2) weighting by the sample size of each table; and (3) weighting by the inverse of the variance. In a simulation study, we compare these methods under differing probability structures and differing sample sizes for the tables. We find weighting by sample size moderately efficient under most conditions. We illustrate the techniques by assessing agreement between surgeons and graders of fundus photographs with respect to retinal characteristics, with stratification by initial severity of the disease.
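As an illustration of the three weighting schemes, the following Python sketch computes a two-rater kappa for each stratum and combines the stratum kappas as a weighted average. It is a minimal sketch under stated assumptions, not the procedure used in the paper: the function names `cohen_kappa` and `stratified_kappa` are hypothetical, each stratum is assumed to be a square rater-by-rater table of counts, and the variance is the simple large-sample approximation po(1 - po) / (n(1 - pe)^2), which may differ from the variance estimator the authors use.

```python
from typing import List, Sequence, Tuple

Table = Sequence[Sequence[float]]  # square table of counts: rows = rater A, columns = rater B


def cohen_kappa(table: Table) -> Tuple[float, float, float]:
    """Return (kappa, approximate variance, n) for one square agreement table."""
    k = len(table)
    n = sum(sum(row) for row in table)
    # Observed agreement: proportion of subjects on the diagonal.
    po = sum(table[i][i] for i in range(k)) / n
    # Marginal proportions for each rater.
    row_p = [sum(table[i]) / n for i in range(k)]
    col_p = [sum(table[i][j] for i in range(k)) / n for j in range(k)]
    # Agreement expected by chance from the marginals.
    pe = sum(row_p[i] * col_p[i] for i in range(k))
    kappa = (po - pe) / (1 - pe)
    # Assumption: simple large-sample variance approximation.
    var = po * (1 - po) / (n * (1 - pe) ** 2)
    return kappa, var, n


def stratified_kappa(tables: List[Table], scheme: str) -> float:
    """Weighted average of stratum kappas.

    scheme is one of "equal", "size", or "inverse_variance".
    """
    stats = [cohen_kappa(t) for t in tables]
    if scheme == "equal":
        weights = [1.0 for _ in stats]
    elif scheme == "size":
        weights = [n for _, _, n in stats]
    elif scheme == "inverse_variance":
        # Undefined when a stratum shows perfect agreement (variance of zero).
        weights = [1.0 / var for _, var, _ in stats]
    else:
        raise ValueError(f"unknown weighting scheme: {scheme}")
    total = sum(weights)
    return sum(w * kap for w, (kap, _, _) in zip(weights, stats)) / total


# Two made-up strata, for illustration only (not data from the study).
strata = [
    [[20, 5], [10, 15]],
    [[40, 10], [10, 40]],
]
for scheme in ("equal", "size", "inverse_variance"):
    print(scheme, round(stratified_kappa(strata, scheme), 4))
```

With these made-up tables, the two larger-sample weightings pull the combined estimate toward the kappa of the bigger stratum, whereas equal weighting treats both strata alike regardless of how precisely each kappa is estimated.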