Understanding forecast verification statistics
Article first published online: 20 MAR 2008
Copyright © 2008 Royal Meteorological Society
Special Issue: Forecast Verification
Volume 15, Issue 1, pages 31–40, March 2008
How to Cite
Mason, S. J. (2008), Understanding forecast verification statistics. Met. Apps, 15: 31–40. doi: 10.1002/met.51
- Issue published online: 20 MAR 2008
- Manuscript Accepted: 2 JAN 2008
- Manuscript Revised: 6 DEC 2007
- Manuscript Received: 18 SEP 2007
- Funding: National Oceanic and Atmospheric Administration. Grant Number: AN07GP0213