• decadal hindcasts;
• ensembles;
• forecasts;
• scores;
• spurious skill;
• verification


Evaluation is important for improving climate prediction systems and establishing the credibility of their predictions of the future. This paper shows how the choices that must be made about how to evaluate predictions affect the outcome and ultimately our view of the prediction system's quality. The aim of evaluation is to measure selected attributes of the predictions, but some attributes are susceptible to having their apparent performance artificially inflated by the presence of climate trends, thus rendering past performance an unreliable indicator of future performance. We describe a class of performance measures that are immune to such spurious skill. The way in which an ensemble prediction is interpreted also has strong implications for the apparent performance, so we give recommendations about how evaluation should be tailored to different interpretations. Finally, we explore the role of the timescale of the predictand in evaluation and suggest ways to describe the relationship between timescale and performance. The ideas in this paper are illustrated using decadal temperature hindcasts from the CMIP5 archive.

© 2013 The Authors. Meteorological Applications published by John Wiley & Sons Ltd on behalf of the Royal Meteorological Society.
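
The notion of trend-induced spurious skill can be illustrated with a small synthetic sketch. It is not the class of performance measures described in this paper, and it uses no CMIP5 data. Every name and number in it (`pearson`, `detrend`, the trend slope, the noise level, the series length) is a hypothetical choice made for illustration only. The forecasts share a linear trend with the observations but have no skill at predicting departures from that trend, so comparing the correlation before and after removing a fitted linear trend shows how much of the apparent performance is owed to the trend alone.

```python
import random
from statistics import mean


def pearson(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    mx, my = mean(x), mean(y)
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    var_x = sum((a - mx) ** 2 for a in x)
    var_y = sum((b - my) ** 2 for b in y)
    return cov / (var_x * var_y) ** 0.5


def detrend(series):
    """Remove a least-squares linear trend (and the mean) from a series."""
    t = range(len(series))
    t_mean, s_mean = mean(t), mean(series)
    slope = sum((ti - t_mean) * (si - s_mean) for ti, si in zip(t, series)) / sum(
        (ti - t_mean) ** 2 for ti in t
    )
    return [si - s_mean - slope * (ti - t_mean) for ti, si in zip(t, series)]


# Assumed synthetic setup: both series contain the same linear trend,
# but their departures from the trend are independent random noise.
rng = random.Random(0)
n_steps = 200
trend = [0.01 * t for t in range(n_steps)]
observations = [tr + rng.gauss(0, 0.2) for tr in trend]
forecasts = [tr + rng.gauss(0, 0.2) for tr in trend]

raw_corr = pearson(observations, forecasts)
detrended_corr = pearson(detrend(observations), detrend(forecasts))

print(f"correlation with trend retained: {raw_corr:.2f}")
print(f"correlation after detrending:    {detrended_corr:.2f}")
```

Under these assumptions the correlation with the trend retained is high, while the correlation of the detrended series is close to zero, since the forecasts carry no information beyond the trend itself. Detrending is used here only as a simple diagnostic; it is one of several possible ways to expose the effect and should not be read as a recommendation.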