
A refined index of model performance: a rejoinder



Willmott et al. [Willmott CJ, Robeson SM, Matsuura K. 2012. A refined index of model performance. International Journal of Climatology, forthcoming. DOI:10.1002/joc.2419.] recently suggested a refined index of model performance (dr) that they purport to be superior to other methods. Their refined index ranges from −1.0 to 1.0 to resemble a correlation coefficient, but it is merely a linear rescaling of our modified coefficient of efficiency (E1) over the positive portion of the domain of dr. We disagree with Willmott et al. (2012) that dr provides a better interpretation; rather, E1 is more easily interpreted: a value of E1 = 1.0 indicates a perfect model (no errors), while E1 = 0.0 indicates a model that is no better than the baseline comparison (usually the observed mean). Negative values of E1 (and, for that matter, dr < 0.5) indicate a substantially flawed model, as they simply describe a ‘level of inefficacy’ for a model that is worse than the comparison baseline. Moreover, while dr is piecewise continuous, it is not continuous through the second and higher derivatives. We explain why the coefficient of efficiency (E or E2) and its modified form (E1) are superior and preferable to many other statistics, including dr, because of intuitive interpretability and because these indices have a fundamental meaning at zero.
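
The rescaling argument can be illustrated numerically. The Python sketch below is not taken from either paper; it assumes the commonly cited definitions E1 = 1 − Σ|P − O| / Σ|O − B|, with baseline B defaulting to the observed mean, and dr = 1 − Σ|P − O| / (2 Σ|O − Ō|) when Σ|P − O| ≤ 2 Σ|O − Ō|, otherwise dr = 2 Σ|O − Ō| / Σ|P − O| − 1. The function names `e1` and `d_r`, the optional `baseline` argument, and the sample data are hypothetical and chosen only for illustration.

```python
import numpy as np


def e1(obs, pred, baseline=None):
    """Modified coefficient of efficiency (absolute-error form).

    `baseline` is an assumed optional argument; it defaults to the
    observed mean, the usual comparison baseline.
    """
    obs = np.asarray(obs, dtype=float)
    pred = np.asarray(pred, dtype=float)
    if baseline is None:
        baseline = obs.mean()
    return 1.0 - np.abs(pred - obs).sum() / np.abs(obs - baseline).sum()


def d_r(obs, pred):
    """Refined index of agreement, using the assumed two-branch definition."""
    obs = np.asarray(obs, dtype=float)
    pred = np.asarray(pred, dtype=float)
    err = np.abs(pred - obs).sum()
    dev = 2.0 * np.abs(obs - obs.mean()).sum()
    if err <= dev:
        return 1.0 - err / dev  # branch where dr >= 0
    return dev / err - 1.0      # branch where dr < 0


# Hypothetical data: a reasonable model, a mean-only model, a poor model.
obs = np.array([1.0, 2.0, 3.0, 4.0, 5.0])
good = obs + np.array([0.5, -0.5, 0.5, -0.5, 0.5])
mean_only = np.full_like(obs, obs.mean())
poor = np.full_like(obs, 20.0)

for name, pred in [("good", good), ("mean_only", mean_only), ("poor", poor)]:
    print(name, e1(obs, pred), d_r(obs, pred), (1.0 + e1(obs, pred)) / 2.0)
```

Under these assumed definitions, dr equals (1 + E1) / 2 wherever dr ≥ 0, so a model that merely reproduces the observed mean scores E1 = 0.0 but dr = 0.5. The two indices diverge only on the negative branch of dr, as the poor model shows.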

We also expand on the discussion begun by Garrick et al. [Garrick M, Cunnane C, Nash JE. 1978. A criterion of efficiency for rainfall-runoff models. Journal of Hydrology 36: 375-381.] and continued by Legates and McCabe [Legates DR, McCabe GJ. 1999. Evaluating the use of “goodness-of-fit” measures in hydrologic and hydroclimatic model validation. Water Resources Research 35(1): 233-241.] and Schaefli and Gupta [Schaefli B, Gupta HV. 2007. Do Nash values have value? Hydrological Processes 21: 2075-2080. DOI: 10.1002/hyp.6825.]. This important discussion focuses on the appropriate baseline comparison to use, and why the observed mean often may be an inadequate choice for model evaluation and development. Copyright © 2012 Royal Meteorological Society
