
Peer Assessment of Aviation Performance: Inconsistent for Good Reasons


  • Wolff-Michael Roth,

    Corresponding author
    1. Faculty of Education, University of Victoria
    2. Griffith Institute of Educational Research, Griffith University
    • Correspondence should be sent to Wolff-Michael Roth, Lansdowne Professor, Applied Cognitive Science, Faculty of Education, MacLaurin Building A567, University of Victoria, Victoria, BC V8P 5C2 Canada.

  • Timothy J. Mavin

    1. Griffith Institute of Educational Research, Griffith University


Research into expertise across many domains is relatively common in cognitive science. However, much less research has examined how experts within the same domain assess the performance of their peers. We report the results of a modified think-aloud study conducted with 18 pilots (6 first officers, 6 captains, and 6 flight examiners). Pairs of same-ranked pilots were asked to rate the performance of a captain flying in a critical pre-recorded simulator scenario. Findings reveal (a) considerable variance within performance categories, (b) differences in the process used as evidence in support of a performance rating, (c) different numbers and types of facts (cues) identified, and (d) differences in how specific performance events affect choice of performance category and gravity of performance assessment. Such variance is consistent with low inter-rater reliability. Because raters exhibited good, albeit imprecise, reasons and facts, a fuzzy mathematical model of performance rating was developed. The model provides good agreement with observed variations.