• rater cognition;
  • qualitative methods;
  • scoring;
  • assessment judgments

Internationally, many assessment systems rely predominantly on human raters to score examinations. Arguably, this facilitates the assessment of multiple sophisticated educational constructs, strengthening assessment validity. It can introduce subjectivity into the scoring process, however, engendering threats to accuracy. The present objectives are to examine some key qualitative data collection methods used internationally to research this potential trade-off, and to consider some theoretical contexts within which the methods are usable. Self-report methods such as Kelly's Repertory Grid, think aloud, stimulated recall, and the NASA task load index have yielded important insights into the competencies needed for scoring expertise, as well as the sequences of mental activity that scoring typically involves. Examples of new data and of recent studies are used to illustrate these methods’ strengths and weaknesses. This investigation has significance for assessment designers, developers and administrators. It may inform decisions on the methods’ applicability in American and other rater cognition research contexts.