## 1. Introduction

[2] Uncertainty has always been inherent in water resources engineering and management. In coastal and river flood defenses, for example, it was treated implicitly through conservative design rules, or explicitly through probabilistic characterization of the meteorological events leading to extreme floods. Along with the recognition of the uncertainty of physical processes, uncertainty analysis of the models of these processes has become a popular research topic over the last decade. Rapid growth in computational power, the increased availability of distributed hydrological observations, and an improved understanding of the physics and dynamics of water systems permit more complex and sophisticated models to be built. While these advances in principle lead to more accurate (less uncertain) models, such complex (distributed) models, with their many parameters and data inputs, can misrepresent reality if they are not parameterized properly or lack input data. This prompts further studies into model uncertainty of various types.

[3] Model errors are typically seen as the mismatch between the observed and the simulated system behavior. In the context of hydrological modeling they are unavoidable owing to the inherent uncertainties in the modeling process. These uncertainties stem mainly from four important sources [see, e.g., *Melching*, 1995; *Refsgaard and Storm*, 1996; *Gupta et al.*, 2005] and relate to our understanding and measurement capabilities regarding the real-world system under study: (1) uncertainties in input data (e.g., precipitation and temperature); (2) uncertainties in the data used for calibration (e.g., output data such as streamflow); (3) uncertainties in model parameters; and (4) uncertainties due to imperfect model structure.

[4] Explicit recognition of uncertainty is not enough; for this notion to be adopted by decision makers in water resources management, uncertainty should be properly estimated and communicated [*Pappenberger and Beven*, 2006]. The research community has, however, done quite a lot in moving toward recognizing the necessity of complementing point forecasts of decision variables with uncertainty estimates. Hence, there is a widening recognition of the necessity to (1) understand and identify the sources of uncertainty; (2) quantify uncertainty; (3) evaluate the propagation of uncertainty through the models; and (4) find means to reduce uncertainty. A number of methods have been proposed in the literature to estimate model uncertainty. In general, these methods fall into six categories [see, e.g., *Montanari*, 2007; *Shrestha and Solomatine*, 2008]: (1) analytical methods [see, e.g., *Tung*, 1996]; (2) approximation methods (e.g., the first-order second moment method [*Melching*, 1992]); (3) simulation and sampling-based methods [e.g., *Kuczera and Parent*, 1998]; (4) Bayesian methods (e.g., “generalized likelihood uncertainty estimation” (GLUE) by *Beven and Binley* [1992]); (5) methods based on the analysis of model errors [e.g., *Montanari and Brath*, 2004]; and (6) methods based on fuzzy set theory [see, e.g., *Maskey et al.*, 2004].

[5] Most of the existing methods (e.g., categories 3 and 4) analyze the uncertainty of the uncertain input variables by propagating it through the deterministic model to the outputs, and hence require assumptions about the distributions of these variables. Most of the approaches based on the analysis of model errors require certain assumptions regarding the residuals (e.g., normality and homoscedasticity). Obviously, the relevance and accuracy of such approaches depend on the validity of these assumptions. The fuzzy set theory-based approach requires knowledge of the membership function of the quantity subject to uncertainty, which can be very subjective. Furthermore, the majority of the uncertainty methods deal with only a single source of uncertainty. For instance, Monte Carlo-based methods analyze the propagation of the uncertainty of parameters (characterized by a probability distribution function, pdf) to the pdf of the output. Similar types of analysis are performed for input or structural uncertainty independently. Note that the methods based on the analysis of model errors typically compute the uncertainty of the “optimal model” (i.e., the model with the calibrated parameters and the fixed structure), and not of a “class of models” (i.e., a group of models with the same structure but parameterized differently), as, for example, Monte Carlo methods do.
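The Monte Carlo propagation of parameter uncertainty mentioned above can be sketched in a few lines. The toy linear-reservoir model and the uniform parameter prior below are illustrative assumptions only, not taken from any of the cited studies; the point is merely that an assumed parameter pdf is sampled and pushed through a deterministic model to obtain an empirical output pdf.

```python
import numpy as np

rng = np.random.default_rng(42)

# Hypothetical deterministic model: a simple linear-reservoir-like
# transformation of precipitation into runoff (illustrative only).
def model(precip, k):
    return precip * (1.0 - np.exp(-k))

precip = 20.0                       # fixed input (mm), assumed known
k = rng.uniform(0.1, 1.0, 10_000)   # assumed parameter prior (pdf of k)

# Propagate the parameter pdf through the model to the output pdf
q = model(precip, k)

# Summarize the resulting empirical output distribution by quantiles
lower, upper = np.quantile(q, [0.05, 0.95])
print(f"90% prediction interval: [{lower:.2f}, {upper:.2f}] mm")
```

Note that this analyzes only one source of uncertainty (the parameter k); input and structural uncertainties would require their own, analogous treatments.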

[6] The contribution of various sources of errors to the total model error is typically not known and, as pointed out by *Gupta et al.* [2005], disaggregation of errors into their source components is often difficult, particularly in hydrology, where models are nonlinear and different sources of errors may interact to produce the measured deviation. Nevertheless, evaluating the contribution of different sources of uncertainty to the overall uncertainty in model prediction is important, for instance, for understanding where the greatest sources of uncertainty reside and, therefore, for directing efforts toward these sources [*Brown and Heuvelink*, 2005]. In general, relatively few studies have been conducted to investigate the interaction between different sources of uncertainty and their contributions to the total model uncertainty [*Engeland et al.*, 2005; *Gupta et al.*, 2005]. For the decision-making process, especially in water resources management, it is more important to know the total model uncertainty accounting for all sources of uncertainty than the uncertainty resulting from individual sources. Recently, *Shrestha and Solomatine* [2006, 2008] presented the basis of a novel method to estimate the uncertainty of the optimal model that takes into account all sources of errors without attempting to disaggregate the contributions of their individual sources. The approach is referred to as “uncertainty estimation based on local errors and clustering” (UNEEC). The method uses clustering and machine learning techniques to estimate the uncertainty of a process model by analyzing its residuals (errors). The distribution of the model error is conditioned on the input variables and, possibly, state variables of the model, including lagged values of the observed response variable. Since the pdf of the model error is estimated as an empirical distribution, it is not necessary to make any assumptions about the residuals. The method is computationally efficient and can therefore be easily applied to computationally demanding process models. The method described here is based on the concept of optimality instead of equifinality, as it analyzes the historical model residuals resulting from the optimal model (in both structure and parameters). Compared to earlier publications, in this paper the UNEEC method is extended further by introducing several quantiles of the error distribution, another case study is considered, and the results are compared to those produced by several other methods of uncertainty estimation.
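The core idea of conditioning an empirical error distribution on the model's inputs can be sketched as follows. All data here are synthetic, and simple binning of a single input variable stands in for the clustering step of UNEEC; the heteroscedastic residuals merely illustrate why locally estimated quantiles are useful when no distributional assumption is made.

```python
import numpy as np

rng = np.random.default_rng(0)

# Synthetic illustration (made-up data): residuals of some calibrated
# model, whose spread grows with an input variable x (heteroscedastic).
x = rng.uniform(0.0, 10.0, 2000)            # model input (e.g., rainfall)
errors = rng.normal(0.0, 0.2 + 0.1 * x)     # residuals of the optimal model

# Partition the input space; UNEEC uses clustering, simple bins
# of a single variable serve as a stand-in here.
edges = np.linspace(0.0, 10.0, 6)           # 5 local regions ("clusters")
labels = np.clip(np.digitize(x, edges) - 1, 0, 4)

# Empirical 5% and 95% error quantiles per region: no assumption of
# normality or homoscedasticity is needed.
bounds = np.array([np.quantile(errors[labels == c], [0.05, 0.95])
                   for c in range(5)])

# Uncertainty estimate for a new input: look up its region's quantiles
x_new = 8.5
c_new = np.clip(np.digitize(x_new, edges) - 1, 0, 4)
lo, hi = bounds[c_new]
print(f"90% error bounds near x={x_new}: [{lo:.2f}, {hi:.2f}]")
```

In the full method, a machine learning model is then trained to generalize these locally estimated quantiles to unseen inputs; the sketch above only shows the local, assumption-free estimation step.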