The purpose of this study was to define the psychometric properties of a simulation-based assessment of anaesthetists. Twenty-one anaesthetic trainees took part in three highly standardised simulations of anaesthetic emergencies. Scenarios were videotaped and rated independently by four judges. Trainees also assessed their own performance in the simulations. Results were analysed using generalisability theory to determine the influence of subject, case and judge on the variance in judges' scores and to determine the number of cases and judges required to produce a reliable result. Self-assessed scores were compared with the mean score of the judges. The results suggest that 12–15 cases are required to rank trainees reliably on their ability to manage simulated crises. Greater reliability is gained by increasing the number of cases than by increasing the number of judges. There was a modest but significant correlation between self-assessed scores and external assessors' scores (rho = 0.321; p = 0.01). Trainees performing at lower levels consistently overrated their performance relative to those performing at higher levels (p = 0.0001).
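The trade-off between cases and judges can be illustrated with a decision-study projection from generalisability theory. The following is a minimal sketch, assuming a fully crossed trainee × case × judge design and the standard relative generalisability coefficient; the abstract does not state the exact design or the estimated variance components, so the function names and every numeric value below are hypothetical and chosen only to show the mechanics.

```python
from typing import Optional


def g_coefficient(var_p: float, var_pc: float, var_pj: float,
                  var_pcj_e: float, n_cases: int, n_judges: int) -> float:
    """Relative G coefficient for a crossed person x case x judge design.

    var_p     : variance due to trainees (the signal of interest)
    var_pc    : trainee-by-case interaction variance
    var_pj    : trainee-by-judge interaction variance
    var_pcj_e : three-way interaction confounded with residual error
    """
    relative_error = (var_pc / n_cases
                      + var_pj / n_judges
                      + var_pcj_e / (n_cases * n_judges))
    return var_p / (var_p + relative_error)


def min_cases_for(target: float, n_judges: int, components: dict,
                  max_cases: int = 200) -> Optional[int]:
    """Smallest number of cases reaching the target coefficient, or None."""
    for n_cases in range(1, max_cases + 1):
        if g_coefficient(n_cases=n_cases, n_judges=n_judges,
                         **components) >= target:
            return n_cases
    return None


# HYPOTHETICAL variance components (not taken from the study), with a
# large trainee-by-case interaction relative to the judge-related terms.
components = dict(var_p=0.20, var_pc=0.50, var_pj=0.02, var_pcj_e=0.30)

if __name__ == "__main__":
    # Doubling the cases versus doubling the judges from a 3 x 4 baseline.
    print("baseline (3 cases, 4 judges):",
          round(g_coefficient(n_cases=3, n_judges=4, **components), 3))
    print("more cases (6 cases, 4 judges):",
          round(g_coefficient(n_cases=6, n_judges=4, **components), 3))
    print("more judges (3 cases, 8 judges):",
          round(g_coefficient(n_cases=3, n_judges=8, **components), 3))
    print("cases needed for 0.8 with 2 judges:",
          min_cases_for(0.8, n_judges=2, components=components))
```

When the trainee-by-case component dominates, as in these illustrative values, adding cases shrinks the largest error term directly, whereas adding judges only affects the smaller judge-related terms. The projected counts depend entirely on the assumed components and should not be read as a reproduction of the 12–15 cases reported above.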