## 1. Introduction

### 1.1. Hydrological Modeling in the Presence of Rainfall and Runoff Errors

[2] Data and model errors conspire to make reliable and robust calibration of hydrological models a difficult task. Consequently, a multitude of paradigms for model estimation and prediction have been proposed and used over the last few decades, ranging from optimization approaches to probabilistic inference schemes (e.g., see the review by *Moradkhani and Sorooshian* [2008]).

[3] The use of rain gauges to estimate catchment average precipitation is currently prevalent in hydrological modeling [*Moulin et al.*, 2009]. A major source of uncertainty is then the limited ability of an often small set of gauges to represent the entire areal rain field, which is highly variable in both space and time [e.g., *Severino and Alpuim*, 2005; *Villarini et al.*, 2008]. The rain gauges themselves are subject to both systematic and random measurement errors, arising from mechanical limitations, wind effects, and evaporation losses, all of which are design specific and can vary substantially with rainfall intensity [*Molini et al.*, 2005]. Methods for quantifying rainfall uncertainty include geostatistical approaches such as kriging [e.g., *Goovaerts*, 2000; *Kuczera and Williams*, 1992] and conditional simulation [e.g., *Clark and Slater*, 2006; *Gotzinger and Bardossy*, 2008; *Onibon et al.*, 2004; *Vischel et al.*, 2009], as well as approaches based on dense rain gauge networks [e.g., *Villarini et al.*, 2008; *Willems*, 2001].
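To make the geostatistical idea concrete, the following minimal sketch (Python/NumPy; the exponential variogram and its parameter values are illustrative assumptions, not the analysis of any of the studies cited above) estimates rainfall at an unobserved location by ordinary kriging and returns the associated kriging variance, i.e., a quantitative measure of the interpolation uncertainty:

```python
import numpy as np

def ordinary_kriging(gauge_xy, gauge_rain, target_xy, sill=1.0, corr_len=10.0):
    """Ordinary kriging of rainfall at target_xy from a set of gauges,
    using an exponential variogram gamma(h) = sill*(1 - exp(-h/corr_len)).
    Returns the estimate and the kriging (error) variance."""
    def gamma(h):
        return sill * (1.0 - np.exp(-h / corr_len))

    n = len(gauge_rain)
    # Pairwise gauge-to-gauge and gauge-to-target distances
    d = np.linalg.norm(gauge_xy[:, None, :] - gauge_xy[None, :, :], axis=-1)
    d0 = np.linalg.norm(gauge_xy - target_xy, axis=-1)
    # Ordinary kriging system, with a Lagrange multiplier enforcing sum(w) = 1
    A = np.ones((n + 1, n + 1))
    A[:n, :n] = gamma(d)
    A[n, n] = 0.0
    b = np.ones(n + 1)
    b[:n] = gamma(d0)
    sol = np.linalg.solve(A, b)
    weights, mu = sol[:n], sol[n]
    estimate = weights @ gauge_rain
    variance = weights @ b[:n] + mu  # kriging variance at the target
    return estimate, variance
```

The kriging variance is zero at a gauge location and grows away from the network; conditional simulation then generates rainfall fields consistent with both the gauge values and this spatial error structure.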

[4] Similarly, runoff data also contain significant observational errors because of discharge gauging errors, extrapolation of rating curves, unsteady flow conditions, flow regime hysteresis, and temporal changes in the channel properties. Several approaches have been proposed to quantify this uncertainty [e.g., *Di Baldassarre and Montanari*, 2009; *Herschy*, 1994; *Lang et al.*, 2010; *McMillan et al.*, 2010; *Reitan and Petersen-Overleir*, 2009].

[5] Finally, the characterization of structural uncertainty is a particularly challenging task, and the hydrological community has yet to agree on suitable definitions and approaches for handling structural model errors in the context of model calibration (e.g., see the conceptualizations proposed by *Beven* [2005], *Doherty and Welter* [2010], and *Kuczera et al.* [2006]).

### 1.2. Decomposing Predictive Uncertainty

[6] The focus of this paper is on the decomposition of the total uncertainty in hydrological predictions into its contributing sources. This is important in several scientific and operational contexts:

[7] 1. In operational prediction, separating data and structural uncertainties is important when data of differing quality are used in calibration and prediction.

[8] 2. Separating data and structural uncertainties also enables a more meaningful model comparison because structural errors are not obscured by data uncertainty.

[9] 3. Insights into the relative contributions of data and model structural errors may be useful when a calibrated model is transferred to a different catchment (prediction in ungauged basins). In addition, potential relationships between catchment characteristics and hydrological model parameters may be hidden or biased by data errors.

[10] 4. Insights into the relative contributions of individual sources of error suggest strategic guidance for reducing total predictive uncertainty. Such insights support more informed research and experimental resource allocation and, importantly, allow a meaningful a posteriori evaluation of these efforts.

[11] Uncertainty decomposition has a considerable history in the hydrologic forecasting community. For example, the Bayesian forecasting system (BFS) [*Krzysztofowicz*, 1999, 2002] distinguishes between two sources of uncertainty in hydrologic forecasts: (1) “input uncertainty” refers to the uncertainty in forecasting an unknown future rainfall, and (2) “hydrologic uncertainty” collectively refers to all other uncertainties, in particular structural errors of the hydrologic model, parameter estimation errors, and input-output measurement and sampling errors [*Krzysztofowicz*, 1999].

[12] This description highlights a major difference between the uncertainty decomposition in forecasting mode versus the decomposition in prediction mode. In the former, input uncertainty is due to forecast errors, while in the latter, input uncertainty is due to errors in the estimation of areal rainfall using observations. Note that the word prediction is used here to denote an application where the hydrologic model is forced with observed inputs (as opposed to forecasted inputs).

[13] This paper focuses on decomposing uncertainty in the prediction context. This can be viewed as an attempt to further decompose what is termed “hydrologic uncertainty” in *Krzysztofowicz*'s [1999] BFS framework. Although *Seo et al.* [2006] discussed the potential benefits of such an additional decomposition, it is usually not viewed as a major objective because at least for forecast lead times exceeding the routing time of the catchment, rainfall forecast uncertainty will usually dominate other sources of error [*Krzysztofowicz*, 1999]. However, the situation is different in a prediction context, where no rainfall forecast is involved. In this case, the relative contributions of rainfall, runoff, and structural errors to the total predictive uncertainty are unclear and likely case specific.

[14] In a prediction context, attempts to decompose the total uncertainty into its three main sources have been made using several related methods. Multiple studies have employed recursive data assimilation methods such as extended and ensemble Kalman filters [*Evensen*, 1994; *Moradkhani et al.*, 2005b; *Rajaram and Georgakakos*, 1989; *Reichle et al.*, 2002; *Vrugt et al.*, 2005] or Bayesian filtering [*Moradkhani et al.*, 2005a, 2006; *Salamon and Feyen*, 2009; *Smith et al.*, 2008; *Weerts and El Serafy*, 2006]. In this paper, we consider Bayesian hierarchical approaches [e.g., *Huard and Mailhot*, 2008; *Kuczera et al.*, 2006], which to date have been implemented in batch estimation form (but can also be formulated recursively). While the distinction between recursive versus batch processing strategies is important from the computational perspective, our focus here is on the fundamental issues of the derivation of informative error models and their incorporation into the inference framework.

### 1.3. Specifying Data and Structural Error Models

[15] Although the importance of adequate descriptions of input, output, and structural errors is well known, developing quantitative error models is a considerable challenge in hydrological applications. In particular, assigning reasonable values to the variances of rainfall and runoff errors is notoriously difficult [e.g., *Huard and Mailhot*, 2008; *Reichle*, 2008; *Weerts and El Serafy*, 2006]. The characterization of structural errors of hydrological models is also a major research challenge (e.g., see the discussions by *Beven* [2005], *Doherty and Welter* [2010], and *Renard et al.* [2010]).

[16] As a result, it is currently common to use rule-of-thumb or literature values to fully specify the input, output, and structural error models and keep their parameters fixed during the hydrological model calibration. For example, *Huard and Mailhot* [2008] used literature values for rainfall errors and rule-of-thumb values for structural errors (∼15% standard error). Similarly, *Salamon and Feyen* [2010] used literature values for runoff errors (∼12.5% standard error for large runoff) and rule-of-thumb values for rainfall and structural errors (∼15% standard error).

[17] However, recent empirical and theoretical evidence reemphasizes the need for reliable descriptions of uncertainties in both the forcing and response data if a meaningful decomposition of predictive uncertainty is required [e.g., *Huard and Mailhot*, 2008; *Renard et al.*, 2010]. Since the inference can be sensitive to these specifications [*Renard et al.*, 2010; *Weerts and El Serafy*, 2006], using an unreliable error model will generally yield an unreliable uncertainty decomposition. Hence, using literature values from other studies may not always be adequate. For instance, rating curve errors depend on the hydraulic configuration of the gauging section, the number of stage-discharge measurements, the degree of extrapolation, etc., all of which are site specific. Similarly, structural errors of a hydrological model are likely to depend on the catchment, time period, etc., and are difficult to estimate a priori.

[18] An alternative to fixing the error model parameters a priori is to include them in the inference. For instance, the variance of rainfall errors can be estimated during hydrological model calibration, rather than being fixed a priori. Although this distinction may appear to be a superficial technicality, it is highly pertinent to the inference in the presence of multiple sources of errors [*Huard and Mailhot*, 2008; *Renard et al.*, 2010; *Weerts and El Serafy*, 2006]. In particular, fixing the error model parameters to incorrect values may yield a computationally tractable, yet statistically unreliable inference. On the other hand, the information content of the data may not be sufficient to support the inference of the error model parameters.
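To illustrate the distinction, consider a deliberately simplified toy problem (Python; the names and the prior are hypothetical, and this is not the inference scheme of this paper): a random-walk Metropolis sampler that jointly infers a "hydrological" parameter (here, a mean) and the error standard deviation, instead of fixing the latter a priori.

```python
import numpy as np

def metropolis(log_post, x0, n_iter=6000, step=0.15, seed=1):
    """Minimal random-walk Metropolis sampler."""
    rng = np.random.default_rng(seed)
    x = np.asarray(x0, dtype=float)
    lp = log_post(x)
    chain = np.empty((n_iter, x.size))
    for i in range(n_iter):
        prop = x + step * rng.standard_normal(x.size)
        lp_prop = log_post(prop)
        if np.log(rng.random()) < lp_prop - lp:  # accept/reject step
            x, lp = prop, lp_prop
        chain[i] = x
    return chain

def make_log_post(y):
    """Joint posterior of a mean (the 'model' parameter) and the log of
    the error standard deviation (the 'error model' parameter), with a
    weak Gaussian prior on log-sigma rather than a fixed value."""
    def log_post(x):
        mu, log_sigma = x
        sigma2 = np.exp(2.0 * log_sigma)
        return (-len(y) * log_sigma
                - 0.5 * np.sum((y - mu) ** 2) / sigma2
                - 0.5 * log_sigma ** 2 / 4.0)
    return log_post
```

When the data are informative, the posterior of the error standard deviation concentrates around a data-supported value; with too little data it simply returns the prior, which is the identifiability caveat noted above.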

[19] The approach of inferring the error model parameters was used in the studies of *Kavetski et al.* [2006c], *Reichert and Mieleitner* [2009], and *Thyer et al.* [2009]. However, these studies did not attempt to fully decompose predictive uncertainty. *Kuczera et al.* [2006] attempted to simultaneously infer rainfall and structural errors but limited themselves to point estimates of inferred quantities, thus leaving open questions regarding parameter identifiability and posterior well posedness. More recently, *Renard et al.* [2010] and *Kuczera et al.* [2010b] quantitatively demonstrated the difficulties of simultaneously identifying rainfall and structural errors from rainfall-runoff data when only vague estimates of data uncertainty are known prior to the hydrological model calibration. This result confirms the earlier discussions by *Beven* [2005, 2006] of potential interactions between multiple sources of error. However, *Renard et al.* [2010] also illustrated that the use of more precise (though still inexact) statistical descriptions of data errors makes the posterior distribution well posed.

[20] It is therefore vital that priors on individual sources of error reflect actual knowledge, rather than be used as mere numerical tricks to achieve well posedness. Given the difficulty of obtaining prior estimates of structural errors (especially for highly conceptualized rainfall-runoff models), it may be more practical to first focus on the observational uncertainty in the rainfall-runoff data. Provided the data error models are reliable, they achieve closure on the total errors and allow structural errors to be estimated reliably as “what remains” once data errors are accounted for.
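The "what remains" logic can be stated as a back-of-the-envelope calculation (Python; all variance values are hypothetical, and the actual decomposition in this paper is performed within the Bayesian inference rather than by simple subtraction):

```python
import math

# Back-of-the-envelope sketch of structural error as "what remains"
# (all numbers hypothetical). Under the simplifying assumption that
# rainfall, runoff, and structural errors are mutually independent,
# their variances add, so the structural variance is the residual:
total_var = 1.00**2    # variance of calibration residuals (assumed)
rain_var = 0.60**2     # propagated rainfall error variance (assumed)
runoff_var = 0.50**2   # runoff measurement error variance (assumed)

struct_var = total_var - rain_var - runoff_var
struct_sd = math.sqrt(struct_var)  # standard deviation attributed to structure
```

If the assumed data error variances are too small, the residual is inflated and misattributed to model structure; if they are too large, the subtraction can even turn negative, signaling an inconsistent error budget.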

### 1.4. Study Aims

[21] The aims of this paper are the following: (1) demonstrate the development and incorporation of uncertainty models for forcing and response data into a Bayesian methodology for hydrological calibration and prediction, (2) examine the resulting improvements in the predictive performance, (3) evaluate whether using informative models for data errors enables inference of structural errors as part of the model calibration process, and (4) evaluate the ability of the inference to provide quantitative insights into the relative contributions of individual sources of uncertainty. Point 3 is of primary importance because of the intrinsic difficulty in defining structural error models a priori. This constitutes a major contribution of this paper since previous attempts at isolating the contribution of structural errors to predictive uncertainty [*Huard and Mailhot*, 2008; *Salamon and Feyen*, 2010] were based on assuming known parameters of the structural error model.

[22] This paper uses the Bayesian total error analysis (BATEA) methodology [*Kavetski et al.*, 2002, 2006b; *Kuczera et al.*, 2006]. The Bayesian foundation of BATEA, in particular, its ability to exploit quantitative (though potentially vague) probabilistic insights into individual sources of error, makes it well suited for using independent knowledge to improve parameter inference and predictions and to quantify individual contributions to predictive uncertainties. However, the development of realistic error models for rainfall and runoff errors is of general interest for any method aiming at decomposing the predictive uncertainty into its three main contributive sources.

[23] Here the rainfall error model is developed using a geostatistical analysis of the rain gauge network coupled with conditional simulation (CS) [e.g., *Vischel et al.*, 2009]. For the runoff data, the rating curve and stage-discharge measurements are used to derive a heteroscedastic error model [*Thyer et al.*, 2009]. The BATEA framework is then used to explore different calibration schemes for integrating observational uncertainty into the inference and to evaluate their influence on calibration and validation, focusing on objectives 2–4.
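A heteroscedastic runoff error model of this general kind can be sketched as follows (Python; the linear form σ(q) = a + bq and the parameter values are illustrative assumptions, not the model fitted in this study):

```python
import numpy as np

def runoff_log_likelihood(q_obs, q_sim, a=0.05, b=0.125):
    """Gaussian log-likelihood with a heteroscedastic standard deviation
    sigma(q) = a + b*q: a small absolute floor at low flows plus a
    relative error (here ~12.5%) that dominates at high flows."""
    sigma = a + b * np.asarray(q_sim)
    resid = np.asarray(q_obs) - np.asarray(q_sim)
    return np.sum(-0.5 * np.log(2.0 * np.pi * sigma**2)
                  - 0.5 * (resid / sigma) ** 2)
```

Under such a model, a given absolute runoff error is penalized far less at high flows, where the relative error dominates, than at low flows.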

[24] This work is innovative in several aspects. First, while the characterization of rainfall errors has received considerable attention [e.g., *Krajewski et al.*, 2003; *Villarini et al.*, 2008], a comprehensive integration of this knowledge within a Bayesian statistical inference for hydrological models has yet to be demonstrated in a real catchment case study. More generally, the integration of independently derived data error models into a Bayesian framework for probabilistic predictions and a stringent verification and refinement of all error models are of increasing interest not just in hydrology but elsewhere in environmental sciences [e.g., *Cressie et al.*, 2009]. Furthermore, a systematic disaggregation of predictive uncertainty into its contributing components in realistic case studies is still in its infancy. Previous studies in this area [e.g., *Huard and Mailhot*, 2008; *Salamon and Feyen*, 2010] were based on assuming known fixed values for the structural error parameters, which is hardly tenable, as discussed in section 1.3.

[25] Second, this study further develops the BATEA approach. Previous applications of BATEA focused primarily on rainfall errors and lacked a separate characterization of structural errors [*Kavetski et al.*, 2006a; *Thyer et al.*, 2009]. *Kuczera et al.* [2006] explored separate specifications of rainfall, runoff, and structural errors but neither used informative priors on the parameters of their error models nor carried out a full Bayesian treatment of the posterior distribution (they limited themselves to finding the posterior mode). *Renard et al.* [2010] illustrated, on the basis of synthetic experiments, the necessity of deriving reliable and precise prior descriptions of data errors to achieve well-posed inferences. The present paper builds on the latter work and proposes a practical strategy toward these objectives. Moreover, it explicitly demonstrates the utility of independent rainfall error analysis for improving predictive reliability and for gaining quantitative and qualitative insights into the contributions of different sources of errors in hydrological prediction.

### 1.5. Outline of Presentation

[26] The Bayesian inference framework is outlined in section 2. Section 3 describes the specific data and methods used in this case study: the hydrological model and catchment data are described in section 3.1; section 3.2 describes the geostatistical rain gauge analysis, the development of an error model for the catchment average rainfall data, and its incorporation into the Bayesian inference; section 3.3 describes the runoff error model, and section 3.4 discusses the treatment of structural errors. Section 4 presents the results of a case study that evaluates the utility of this information in improving the quantification and decomposition of the runoff predictive uncertainty, with an emphasis on posterior scrutiny of the hypotheses made during calibration. The results are discussed in section 5, followed by a summary of key conclusions in section 6.