• inter-rater reliability;
  • κ statistic;
  • interview;
  • qualitative;
  • Mezzich;
  • proportional overlap;
  • parents;
  • accident and emergency;
  • health services research;
  • service users

Accounting for overlap? An application of Mezzich’s κ statistic to test interrater reliability of interview data on parental accident and emergency attendance

Study rationale. The number of interview studies with service users is rising because of growth in health services research. The level of agreement between multiple coders of interview data requires statistical calculation to support results. Basic κ statistics are often used, but they depend on the coding categories being mutually exclusive. Researchers should be aware that the basic κ is not valid when an interview word or paragraph can be coded into more than one category. The ‘proportional overlap’ κ extension by Mezzich et al. (1981, Journal of Psychiatric Research 16, 29–39) has been investigated as an original solution.

Objectives. To assess the level of agreement beyond chance between several raters by applying the ‘proportional overlap’ κ statistic of Mezzich et al. to verbal interview data. The clinical area investigated was child attendance at an Accident and Emergency Department, where parental attendance experiences have been under-explored.

Methods. Two researchers coded a random sample of interview transcripts using a coding schedule. Mezzich’s procedure was then applied to these data. For example, if coder 1 notes that a paragraph refers to categories A and B but coder 2 notes A, B and C, the agreement overlap is 0·66, because two actual agreements were made out of three possible agreements. This calculation was repeated for each paragraph and divided by the number of coding pairs. All agreement values were summed and then divided by the total number of paragraphs to get Po (the proportion of observed agreement) and by the total number of coding pairs to get Pe (the proportion of agreement expected by chance alone). Po and Pe were then entered into the basic κ formula to assess interview coding reliability.
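The overlap calculation described above can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the authors' implementation: the function names are hypothetical, the overlap is taken as shared categories divided by all distinct categories used by either coder (which reproduces the two-out-of-three example), and Pe is assumed to be the mean overlap across every cross-pairing of coder 1's and coder 2's codings, which is one possible reading of "the total number of coding pairs".

```python
from itertools import product


def proportional_overlap(codes_a, codes_b):
    """Shared categories divided by all distinct categories used by either coder."""
    a, b = set(codes_a), set(codes_b)
    union = a | b
    if not union:
        # Assumption: two empty codings count as full agreement.
        return 1.0
    return len(a & b) / len(union)


def observed_agreement(coder1, coder2):
    """Po: mean overlap over paragraphs coded by both coders (same order assumed)."""
    overlaps = [proportional_overlap(a, b) for a, b in zip(coder1, coder2)]
    return sum(overlaps) / len(overlaps)


def chance_agreement(coder1, coder2):
    """Pe (assumed reading): mean overlap over all cross-pairings of codings."""
    pairs = list(product(coder1, coder2))
    return sum(proportional_overlap(a, b) for a, b in pairs) / len(pairs)


def kappa(po, pe):
    """Basic kappa formula: agreement beyond chance, scaled by its maximum."""
    return (po - pe) / (1 - pe)


# Hypothetical toy data: two paragraphs, each coded by two coders.
coder1 = [{"A", "B"}, {"C"}]
coder2 = [{"A", "B", "C"}, {"C"}]

po = observed_agreement(coder1, coder2)
pe = chance_agreement(coder1, coder2)
print(po, pe, kappa(po, pe))
```

The first paragraph of the toy data mirrors the example in the text: coders agree on A and B out of the three categories A, B and C, giving an overlap of 2/3.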

Results. The overall mean Po was 0·61, the mean Pe was 0·32, with a κ score of 0·43; a moderate level of agreement which was statistically significant (t=4·8, P < 0·001, d.f.=23).
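As a check on the reported figures, substituting the mean Po and Pe into the basic κ formula reproduces the stated score. The formula shown is the standard κ definition, which is assumed to be the "basic κ formula" the Methods refer to.

```latex
\kappa = \frac{P_o - P_e}{1 - P_e}
       = \frac{0.61 - 0.32}{1 - 0.32}
       = \frac{0.29}{0.68}
       \approx 0.43
```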

Conclusion. Mezzich’s procedure may be applied to interview data to calculate agreement levels between several coders.