Measuring forecast skill: is it real skill or is it the varying climatology?
Article first published online: 1 MAY 2007
Copyright © 2006 Royal Meteorological Society
Quarterly Journal of the Royal Meteorological Society
Volume 132, Issue 621C, pages 2905–2923, October 2006 Part C
How to Cite
Hamill, T. M. and Juras, J. (2006), Measuring forecast skill: is it real skill or is it the varying climatology?. Q.J.R. Meteorol. Soc., 132: 2905–2923. doi: 10.1256/qj.06.25
- Manuscript Received: 14 FEB 2005
- Manuscript Revised: 16 MAY 2006
- Issue published online: 1 MAY 2007
Keywords
- Brier skill score
- Contingency tables
- Ensemble forecasting
- Equitable threat score
- Forecast verification
- Probabilistic weather forecasts
- Relative operating characteristic
It is common practice to summarize the skill of weather forecasts from an accumulation of samples spanning many locations and dates. Many of these scores implicitly assume that the climatological frequency of event occurrence is approximately invariant over all samples. If the event frequency actually varies among the samples, the metrics may report a skill different from that expected. Many common deterministic verification metrics, such as threat scores, are prone to misreporting skill, and probabilistic forecast metrics such as the Brier skill score and relative operating characteristic skill score can also be affected.
Three examples are provided that demonstrate unexpected skill, two from synthetic data and one from actual forecast data. In the first example, positive skill was reported in a situation where metrics were calculated from a composite of forecasts composed of random draws from the climatologies of two distinct locations. As the difference in climatological event frequency between the two locations was increased, the reported skill also increased. A second example demonstrates that when the climatological event frequency varies among samples, the metrics may excessively weight samples with the greatest observational uncertainty. A final example demonstrates unexpectedly large skill in the equitable threat score of deterministic precipitation forecasts.
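The first example can be illustrated with a short calculation. The sketch below (a minimal illustration, not the authors' code; the function name and the particular event frequencies are assumptions for demonstration) computes the expected Brier skill score when each of two locations issues its own climatological event probability as the forecast, a forecast with no real skill, while the reference Brier score is computed from the climatology pooled over both locations:

```python
def pooled_bss(p1, p2):
    """Expected Brier skill score when each location forecasts its own
    climatological probability (zero real skill), but the reference
    climatology is pooled over both locations.

    p1, p2 : climatological event frequencies at the two locations.
    """
    # Expected Brier score of the local-climatology forecast:
    # E[(p_i - O)^2] = p_i * (1 - p_i) at each location, averaged equally.
    bs_forecast = 0.5 * (p1 * (1 - p1) + p2 * (1 - p2))
    # The reference forecast uses the pooled event frequency p_bar,
    # whose expected Brier score is p_bar * (1 - p_bar).
    p_bar = 0.5 * (p1 + p2)
    bs_reference = p_bar * (1 - p_bar)
    return 1.0 - bs_forecast / bs_reference

# Identical climatologies: no spurious skill.
print(pooled_bss(0.2, 0.2))    # 0.0
# Differing climatologies: positive "skill" from pooling alone,
# and it grows as the two event frequencies are pulled further apart.
print(pooled_bss(0.05, 0.35))  # 0.140625
```

Because the mean of p(1 - p) over the two locations is always less than p̄(1 - p̄) when p1 ≠ p2 (concavity), the pooled score is positive even though neither forecast contains any information beyond local climatology, consistent with the behaviour described above.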
Guidelines are suggested for how to adjust skill computations to minimize these effects.