## 1. Introduction

[2] Like other mathematical models, conceptual rainfall-runoff models (CRRMs) are simplified representations of heterogeneous and complex systems. Since simplification and aggregation (spatial and/or temporal) form the backbone of the modeling process, model predictions can at best be reasonable approximations and will never be highly accurate. This limitation makes it essential to estimate the confidence we can place in model-based predictions: one has to quantify the expected divergence between the model predictions and reality.

[3] Uncertainty can arise at any stage of the modeling process. When describing observed output, one can distinguish between the uncertainty of input, of model structure and parameters, and of the observations. The classical approach considers parameter and observation uncertainty only and describes the deviations between the deterministic model output and the observations by a random noise term corresponding to an assumed measurement error. This corresponds to a calibration procedure that determines parameter estimates by minimizing the sum of squared deviations between the deterministic model results and the measurements. However, uncertainty affects CRRMs and other hydrological models in much more profound ways. Generally, there is substantial input uncertainty in the atmospheric drivers (i.e., precipitation and evapotranspiration) of the hydrological response of a catchment. Because gauging stations are scarce relative to the spatial variability of precipitation, even a structurally perfect model would fail to reproduce the true discharge: its input is already affected by observation error. Additional uncertainty arises from the necessary simplification and (spatial) aggregation. Such structural model errors affect the model predictions differently than purely random measurement errors do. Like input uncertainty, structural uncertainty produces model residuals that are autocorrelated in time and leads to “wrong” representations of the internal state of the catchment (e.g., soil moisture status and groundwater levels). Because the hydrological response is generally state dependent, such errors also affect subsequent time steps of the CRRM predictions. A further general issue in hydrological modeling is the identifiability problem of model parameters.
This problem stems from the fact that a CRRM calibration data set consisting of time series of atmospheric input and observed discharge (at a single or a few monitoring stations in most cases) is typically insufficient to identify all parameters. As a consequence, numerous parameter sets may yield simulations that agree similarly well with the observed data. All these factors typically lead to model residuals with more complex statistical properties (i.e., autocorrelation, heteroscedasticity, heavy tails, and skewness) than the classical white noise error.

[4] Since discrepancies between the statistical assumptions and the true properties of model residuals result in biased parameters and unreliable prediction uncertainty intervals, several approaches have been developed to build more realistic statistical error models for rainfall-runoff simulations. The normality of residuals can be improved by the standard statistical procedure of transformation: power transforming both the model output and the measurements simultaneously reduces heteroscedasticity, skewness, and heavy tails [*Abdulla et al*., 1999; *Bates and Campbell*, 2001; *Demaria et al*., 2007; *Duan et al*., 2007; *Yang et al*., 2007; *Frey et al*., 2011]. High autocorrelation can be treated with an autoregressive error model [*Sorooshian and Dracup*, 1980; *Bates and Campbell*, 2001; *Yang et al*., 2007; *Frey et al*., 2011]. *Schoups and Vrugt* [2010] combined a deterministic bias correction with a heteroscedastic autoregressive process and a versatile Skew Exponential Power (SEP) distribution to build a universal, yet entirely statistical, error model.
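These two standard remedies can be sketched in a few lines of code (a minimal illustration, assuming a simple Box-Cox power transform and an AR(1) residual model; the function names and parameter values are hypothetical, not those of the cited studies):

```python
import numpy as np

def boxcox(y, lam=0.3):
    """Box-Cox power transform; applied simultaneously to simulated and
    observed discharge, it reduces heteroscedasticity, skewness, and
    heavy tails of the residuals."""
    return np.log(y) if lam == 0 else (y**lam - 1.0) / lam

def ar1_innovations(residuals, phi):
    """Strip lag-1 autocorrelation from (transformed) residuals:
    eta_t = r_t - phi * r_{t-1} should be close to white noise."""
    r = np.asarray(residuals, dtype=float)
    return r[1:] - phi * r[:-1]

# toy example: synthetic residuals with strong AR(1) structure
rng = np.random.default_rng(0)
r = np.zeros(1000)
for t in range(1, 1000):
    r[t] = 0.8 * r[t - 1] + rng.normal()
eta = ar1_innovations(r, 0.8)  # innovations, approximately white noise
```

In practice the transformation parameter and the autoregression coefficient are estimated jointly with the hydrological model parameters rather than fixed as above.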

[5] While these techniques allow us to make less restrictive and thus more realistic statistical assumptions about the total model error, they yield practically no insight into the origin and propagation of uncertainty. This is especially true for input uncertainty, which has the most complex propagation mechanism. Therefore, Bayesian uncertainty assessment frameworks have been developed that can propagate errors through the nonlinear deterministic model [*Kuczera et al*., 2006; *Ajami et al*., 2007]. The Bayesian foundation enables the analyst to treat model parameters as random variables and to incorporate existing knowledge about them via prior distributions. The Bayesian Total Error Analysis (BATEA) [*Kavetski et al*., 2006] and Integrated Bayesian Uncertainty Estimator (IBUNE) [*Ajami et al*., 2007] uncertainty assessment concepts, as well as the study by *Vrugt et al*. [2008] using the Differential Evolution Adaptive Metropolis (DREAM) sampler (we refer to that whole study hereinafter as DREAM), all provide methods to treat uncertainty in rainfall measurements. Considering input uncertainty adds complexity to these calibration methods. BATEA and DREAM introduce storm-specific rainfall multipliers and infer them together with the model parameters. The technical difficulty is that the storm-specific parameters greatly increase the number of estimated quantities, which makes sampling the posterior more demanding. IBUNE applies a set of rainfall multipliers drawn a priori from a normal distribution, which are later shifted and scaled according to two additional input error parameters (the unknown mean and variance) for the estimation of the likelihood.
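The storm-multiplier idea can be illustrated as follows (a simplified sketch, not the actual BATEA/DREAM implementation; the rainfall series, storm segmentation, and lognormal multiplier distribution are hypothetical):

```python
import numpy as np

rng = np.random.default_rng(42)

# hypothetical observed rainfall series (mm) containing three storms
rain = np.array([0, 0, 5, 12, 8, 0, 0, 20, 15, 0, 3, 6, 0], dtype=float)
storm_id = np.array([-1, -1, 0, 0, 0, -1, -1, 1, 1, -1, 2, 2, -1])  # -1 = dry

# one multiplier per storm; in BATEA/DREAM these are inferred jointly
# with the CRRM parameters, here they are simply drawn for illustration
n_storms = storm_id.max() + 1
multipliers = rng.lognormal(mean=0.0, sigma=0.3, size=n_storms)

# candidate "true" areal rainfall: each storm scaled by its multiplier
true_rain = rain.copy()
wet = storm_id >= 0
true_rain[wet] = rain[wet] * multipliers[storm_id[wet]]
```

Because every storm contributes one additional unknown, the dimension of the inference problem grows with the length of the calibration period, which is precisely the sampling burden noted above.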
Besides the estimation of parameter uncertainty, error propagation can support the detection of structural deficiencies in models through time-variable parameters [*Reichert and Mieleitner*, 2009] or can quantify other sources of uncertainty to derive more precise prediction uncertainty intervals [*Renard et al*., 2010]. These frameworks offer great flexibility, as one can account for almost any desired uncertainty component, but this comes at the price of a high computational burden and mathematical complexity.

[6] In such studies, there is still a “remnant error” that contains all uncertainty not accounted for elsewhere in the error model. While the remnant error represents only a part of the total uncertainty, it can still show considerable statistical complexity, depending on how well the rest of the error model describes the individual sources of uncertainty. In the worst case (a totally inappropriate error model), the remnant error can be identical to the model residuals. Despite this, and likely because of the limited theoretical relevance attributed to the remnant error, some error-propagation studies still assume that it is independent and normally distributed [*Kuczera et al*., 2006; *Renard et al*., 2010].

[7] The complexity of Bayesian uncertainty assessment frameworks prevents their widespread use in cases where an exact description of error propagation is not strictly required. *Götzinger and Bárdossy* [2008] attempted to provide a simple standalone error model that separates the effects of the various sources of uncertainty. Nevertheless, they still assumed that the errors were independent and that the structural uncertainty was tied to the process sensitivities through a linear combination. This neglects the fact that sensitivities derived from a potentially incorrect model structure are not guaranteed to reflect the true importance of the main hydrological processes.

[8] Therefore, the goal of this study is to develop a formal statistical error model that accounts for the effects of all sources of uncertainty by emulating the key properties of error propagation through the CRRM. Such a method could bridge the gap between fast yet typically unsatisfactory traditional statistical error models and accurate yet computationally demanding mechanistic error-propagation methods. In addition, the method could also be used alongside mechanistic error propagation to describe remnant errors. We inspect whether the new method fulfills the requirements of reasonable speed and realistic statistical assumptions by comparing it to three existing Gaussian error models: (i) the traditional model of independent, normally distributed (measurement) errors (error model E), (ii) a first-order autoregressive model (error model B), and (iii) a recently introduced error model (B + E), which describes the residual series as the composite of a systematic and an independent error process.
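The qualitative difference between these three residual structures can be seen in a short simulation (an illustrative sketch only: parameter values are arbitrary, and the B + E composite is represented here simply as an AR(1) process plus white noise, not the exact formulation of the cited study):

```python
import numpy as np

rng = np.random.default_rng(1)
n = 2000

# error model E: independent, normally distributed errors
e = rng.normal(0.0, 1.0, n)

# error model B: first-order autoregressive (systematic) errors
phi, sigma_b = 0.9, 1.0
b = np.zeros(n)
for t in range(1, n):
    b[t] = phi * b[t - 1] + rng.normal(0.0, sigma_b)

# error model B + E: systematic component plus independent component
be = b + rng.normal(0.0, 0.5, n)

def acf1(x):
    """Sample lag-1 autocorrelation."""
    x = x - x.mean()
    return float(x[:-1] @ x[1:] / (x @ x))
```

Model E produces residuals with no memory, model B produces strongly persistent residuals, and B + E lies in between, which is what makes the composite structure attractive for describing both systematic and measurement-type errors.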