#### 2.1. Nature of Data and Structural Uncertainties

[14] There is a fundamental difference between the uncertainty in the data and the structural uncertainty in the CRR model itself.

[15] 1. Data uncertainty stems from sampling, measurement and interpretation errors in the observed input/output data. Since these errors arise independently of the CRR model, their properties (e.g., means and variances of rainfall and runoff errors) can, at least in principle, be estimated prior to calibration by analyzing the data acquisition instruments and procedures. However, current practice seldom reports statistical measures of the accuracy and precision of hydrological data (but see *Di Baldassarre and Montanari* [2009] and *Dottori et al.* [2009] for recent exceptions). This paper investigates the impact of this deficiency on the predictive capabilities of hydrological models and on the decomposition of input and structural errors.

[16] 2. Structural uncertainty is an inherent feature of the CRR model: it is a consequence of the simplifying assumptions made in approximating the actual environmental system with a mathematical hypothesis. In general, the structural error of a CRR model depends on the model formulation (e.g., number and connectivity of stores, choice of constitutive functions, etc.), on the specific catchment, and on the spatial and temporal scale of the analysis. Moreover, it may vary from storm to storm, or on some other time scale. Since this uncertainty is poorly understood, specifying a meaningful prior for structural uncertainty (indeed, even formulating it mathematically) is problematic.

[17] In practice, uncertainties in the calibration data, as well as its finite length, necessarily translate into uncertainties in the estimated CRR parameters and other inferred quantities (in a Bayesian context, “posterior parameter uncertainty”). This would occur even for an exact model, but can be particularly pronounced when the model is approximate. In Bayesian (and frequentist) inference, this “derived” parametric uncertainty declines as more data are included in the calibration. However, if the likelihood and/or priors are misspecified (which, as discussed in this paper, can be detected using posterior diagnostics), the posterior will be in error [also see *Mantovan and Todini*, 2006; *Beven et al.*, 2008]. Despite its asymptotic behavior, parametric uncertainty should not be ignored because it may contribute significantly to the total predictive uncertainty.

#### 2.2. Characterizing Structural Uncertainty

[18] This section outlines two broad classes of probabilistic approaches used in this paper for characterizing structural error. We also briefly survey alternative approaches.

[19] Traditional approaches treat the CRR model as deterministic and represent structural error using an exogenous term, usually additive. Several options are possible.

[20] A1. Lump output and structural errors into a single “residual” error term, defined as the difference between simulated and observed outputs, possibly after a transformation. This approach can be implemented both within schemes that ignore input errors (e.g., standard least squares calibration) and within input-error-sensitive methodologies [e.g., *Kavetski et al.*, 2006a].
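As a minimal sketch, approach A1 reduces to a single Gaussian likelihood on the lumped residuals. In the Python fragment below, the `crr_model` simulator, its arguments, and the error standard deviation `sigma` are illustrative placeholders, not part of any specific scheme cited above:

```python
import numpy as np

def residual_log_likelihood(theta, sigma, rainfall, q_obs, crr_model):
    """Approach A1: all structural and data errors are absorbed into a
    single additive Gaussian 'residual' error with standard deviation sigma."""
    q_sim = crr_model(theta, rainfall)                 # deterministic CRR simulation
    resid = np.asarray(q_obs) - np.asarray(q_sim)      # lumped residual error
    n = resid.size
    return (-0.5 * n * np.log(2.0 * np.pi * sigma**2)
            - np.sum(resid**2) / (2.0 * sigma**2))
```

Under a transformation of the flows (as mentioned above), the same form would apply to the transformed residuals.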

[21] A2. Represent output and structural errors using two separate terms, e.g., such that the difference between simulated and true outputs is structural error, while the difference between true and observed outputs is output error [e.g., *Huard and Mailhot*, 2008]. Though this allows using more specialized error models and priors, e.g., estimating streamflow uncertainty from independent gauge data, specifying a meaningful prior for structural errors remains problematic (see section 2.1).
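The two-term decomposition of approach A2 can be written generatively: structural error separates simulated from true output, and output error separates true from observed output. The Gaussian forms, standard deviations, and names below are illustrative assumptions:

```python
import numpy as np

rng = np.random.default_rng(1)

def generate_observed(q_sim, sigma_struct, sigma_out):
    """Approach A2 as a generative model with two separate error terms:
    true flow     = simulated flow + structural error,
    observed flow = true flow + output (measurement) error."""
    q_sim = np.asarray(q_sim)
    q_true = q_sim + rng.normal(0.0, sigma_struct, q_sim.shape)
    q_obs = q_true + rng.normal(0.0, sigma_out, q_sim.shape)
    return q_true, q_obs
```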

[22] More recent approaches abandon the notion that CRR models are deterministic. This is motivated by the stochastic nature of errors arising from spatial and temporal averaging of distributed and heterogeneous model inputs and internal fluxes, which are unavoidable in lumped models. Several related approaches have been proposed.

[23] B1. Stochastic perturbations of the internal model states. This approach has been used in state space approaches, such as the Ensemble Kalman Filter (EnKF) [e.g., *Moradkhani et al.*, 2005].
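The state perturbation idea can be sketched as follows (this is only the perturbation step, not the full EnKF update; the ensemble layout and noise scale are illustrative):

```python
import numpy as np

rng = np.random.default_rng(42)

def perturb_states(ensemble, sigma_s):
    """Approach B1: stochastic perturbation of internal model states.
    `ensemble` is an (n_members, n_states) array of store levels; each
    member receives an independent Gaussian perturbation between time
    steps, so identical inputs no longer yield identical outputs.
    Store levels are clipped to remain non-negative."""
    noise = rng.normal(0.0, sigma_s, size=ensemble.shape)
    return np.clip(ensemble + noise, 0.0, None)
```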

[24] B2. Stochastic variation of one or more CRR parameters through time. This approach can be used with transfer function models estimated using instrumental variables [*Young*, 1998], or with general CRR models within BATEA [*Kuczera et al.*, 2006].
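A minimal illustration of approach B2, in which a hypothetical store coefficient `k` is redrawn for each epoch from an assumed lognormal hyper-distribution:

```python
import numpy as np

rng = np.random.default_rng(0)

def sample_epoch_parameters(median_k, sigma_k, n_epochs):
    """Approach B2: a CRR parameter (here a hypothetical store
    coefficient k) is redrawn for each epoch (e.g., each storm) from a
    lognormal hyper-distribution, making the model output stochastic
    even for fixed inputs and fixed hyperparameters."""
    return rng.lognormal(mean=np.log(median_k), sigma=sigma_k, size=n_epochs)
```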

[25] B3. Probabilistic formulation of the model structure itself, so that the governing model equations describe stochastic, rather than deterministic, relationships between inputs, states and outputs.

[26] In approaches A1–A2, the CRR model is deterministic in the sense that, given fixed inputs, parameters and initial conditions, it generates the same output. Conversely, in approaches B1–B3, the CRR model is viewed as stochastic: it generates a random output even for fixed inputs, parameters and initial conditions. More specifically, output randomness arises due to random variations of internal states (B1) or stochastic parameters (B2), or, more generally, due to probabilistic formulation of the model structure (B3).

[27] As a result, in approaches A1–A2, as posterior CRR parameter uncertainty declines, the CRR model predictions quickly become deterministic and the total predictive uncertainty is dominated by the exogenous error term. Conversely, in approaches B1–B3, the CRR predictions are inherently stochastic even if the posterior uncertainty in its parameters is negligible.

[28] Also note that approaches B1–B3 can be used to (implicitly or explicitly) reflect all sources of uncertainty, rather than just inadequacies of the model structure. Indeed, even when *intended* solely for structural errors, they may also capture at least some effects of data errors. This interaction is a key focus of our study.

[29] The list above is not exhaustive. Assuming that structural uncertainty is epistemic rather than strictly stochastic, some authors have abandoned the formal probabilistic framework, e.g., GLUE [*Beven and Binley*, 1992] and possibilistic methods [*Jacquin and Shamseldin*, 2007]. Yet even when structural errors are epistemic, i.e., arise as a consequence of lack of knowledge of catchment dynamics, they may still behave stochastically and be characterized using standard probability theory, in particular, Bayesian methods.

[30] Alternatively, Bayesian Model Averaging (BMA) approaches [e.g., *Duan et al.*, 2007; *Marshall et al.*, 2007] attempt to quantify structural uncertainty by combining the predictions of multiple CRR models. However, BMA's key assumption that the supplied set of models is complete is difficult to achieve and scrutinize in practice; it is unclear what the posterior predictive uncertainty actually represents when this assumption is not met.
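For concreteness, the BMA predictive mean is a weighted combination of the individual model predictions, with weights given by the posterior model probabilities. This is only a sketch (a full BMA predictive distribution also mixes each model's predictive variance), and all names are illustrative:

```python
import numpy as np

def bma_predictive_mean(preds, weights):
    """BMA sketch: combine predictions from several CRR models using
    posterior model probabilities as weights (weights must sum to one)."""
    preds = np.asarray(preds)        # shape (n_models, n_times)
    weights = np.asarray(weights)    # shape (n_models,)
    assert np.isclose(weights.sum(), 1.0)
    return weights @ preds           # weighted average at each time step
```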

[31] Consequently, the calibration methods investigated in this paper are based on the hypothesis that structural uncertainty, whatever its cause, can be described by an explicit probabilistic model that is then subjected to direct scrutiny.

#### 2.3. Prior Specification of Data and Structural Uncertainties

[32] A critical aspect of uncertainty quantification is the specification of the parameters of the data and structural error models (e.g., variances of rainfall and runoff errors, variance of structural errors).

[33] Early applications of BATEA [*Kavetski et al.*, 2006a] used fixed rainfall error parameters, while *Huard and Mailhot* [2008] used fixed input/output/structural error parameters. In Bayesian theory, this corresponds to the strongest possible prior (parameters known exactly) and would be appropriate if the statistical properties of the errors were well understood. Since this remains a challenge in hydrology, a more general formulation of BATEA treats the error model parameters as unknown quantities that are inferred along with CRR parameters and other quantities of interest [*Kuczera et al.*, 2006]. This corresponds to weaker (more vague) priors.
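The weaker-prior formulation can be sketched by treating the output error standard deviation as an unknown inferred jointly with the CRR parameters. The function names, the `crr_model` simulator, and the vague prior below are illustrative assumptions, not the specific BATEA formulation:

```python
import numpy as np

def log_posterior(theta, log_sigma, rainfall, q_obs, crr_model):
    """Error model parameter sigma treated as unknown: it is inferred
    jointly with the CRR parameters theta under a vague (wide Gaussian)
    prior on log(sigma), rather than being fixed a priori."""
    sigma = np.exp(log_sigma)
    resid = np.asarray(q_obs) - np.asarray(crr_model(theta, rainfall))
    n = resid.size
    log_lik = (-0.5 * n * np.log(2.0 * np.pi * sigma**2)
               - np.sum(resid**2) / (2.0 * sigma**2))
    log_prior = -0.5 * (log_sigma / 10.0)**2   # vague prior on log(sigma)
    return log_lik + log_prior
```

Fixing `sigma` (the strongest possible prior, as in the early applications above) corresponds to collapsing the prior on `log_sigma` to a point mass.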

[34] A major practical question considered in this paper is the accuracy and precision of the prior information needed for (1) meaningful estimation of the total predictive uncertainty and (2) accurate attribution of the predictive uncertainty to individual sources. The influence of the priors on the reliability of the inference is of critical practical significance: it motivates the development of accurate and precise independent prior knowledge, e.g., based on densely gauged experimental basins.