This study explores the decomposition of predictive uncertainty in hydrological modeling into its contributing sources. This is pursued by developing data-based probability models describing uncertainties in rainfall and runoff data and incorporating them into the Bayesian total error analysis methodology (BATEA). A case study based on the Yzeron catchment (France) and the conceptual rainfall-runoff model GR4J is presented. It exploits a calibration period where dense rain gauge data are available to characterize the uncertainty in the catchment average rainfall using geostatistical conditional simulation. The inclusion of information about rainfall and runoff data uncertainties overcomes ill-posedness problems and enables simultaneous estimation of forcing and structural errors as part of the Bayesian inference. This yields more reliable predictions than approaches that ignore or lump different sources of uncertainty in a simplistic way (e.g., standard least squares). It is shown that independently derived data quality estimates are needed to decompose the total uncertainty in the runoff predictions into the individual contributions of rainfall, runoff, and structural errors. In this case study, the total predictive uncertainty appears dominated by structural errors. Although further research is needed to interpret and verify this decomposition, it can provide strategic guidance for investments in environmental data collection and/or modeling improvement. More generally, this study demonstrates the power of the Bayesian paradigm to improve the reliability of environmental modeling using independent estimates of sampling and instrumental data uncertainties.
1.1. Hydrological Modeling in the Presence of Rainfall and Runoff Errors
 Data and model errors conspire to make reliable and robust calibration of hydrological models a difficult task. Consequently, a multitude of paradigms for model estimation and prediction have been proposed and used over the last few decades, ranging from optimization approaches to probabilistic inference schemes (e.g., see the review by Moradkhani and Sorooshian).
 Finally, the characterization of structural uncertainty is a particularly challenging task, and the hydrological community has yet to agree on suitable definitions and approaches for handling structural model errors in the context of model calibration (e.g., see the conceptualizations proposed by Beven, Doherty and Welter, and Kuczera et al.).
1.2. Decomposing Predictive Uncertainty
 The focus of this paper is on the decomposition of the total uncertainty in hydrological predictions into its contributing sources. This is important in several scientific and operational contexts:
 1. In operational prediction, separating data and structural uncertainties is important when data of differing quality are used in calibration and prediction.
 2. Separating data and structural uncertainties also enables a more meaningful model comparison because structural errors are not obscured by data uncertainty.
 3. Insights into the relative contributions of data and model structural errors may be useful when a calibrated model is transferred to a different catchment (prediction in ungauged basins). In addition, potential relationships between catchment characteristics and hydrological model parameters may be hidden or biased by data errors.
 4. Insights into the relative contributions of individual sources of error provide strategic guidance for reducing total predictive uncertainty. This helps allocate research and experimental resources more effectively and, importantly, allows a meaningful a posteriori evaluation of these efforts.
 Uncertainty decomposition has a considerable history in the hydrologic forecasting community. For example, the Bayesian forecasting system (BFS) [Krzysztofowicz, 1999, 2002] distinguishes between two sources of uncertainties in hydrologic forecasts: (1) “input uncertainty” refers to the uncertainty in forecasting an unknown future rainfall, and (2) “hydrologic uncertainty” collectively refers to all other uncertainties, in particular structural errors of the hydrologic model, parameter estimation errors, input-output measurement and sampling errors [Krzysztofowicz, 1999].
 This description highlights a major difference between the uncertainty decomposition in forecasting mode versus the decomposition in prediction mode. In the former, input uncertainty is due to forecast errors, while in the latter, input uncertainty is due to errors in the estimation of areal rainfall using observations. Note that the word prediction is used here to denote an application where the hydrologic model is forced with observed inputs (as opposed to forecasted inputs).
 This paper focuses on decomposing uncertainty in the prediction context. This can be viewed as an attempt to further decompose what is termed “hydrologic uncertainty” in Krzysztofowicz's BFS framework. Although Seo et al. discussed the potential benefits of such an additional decomposition, it is usually not viewed as a major objective because, at least for forecast lead times exceeding the routing time of the catchment, rainfall forecast uncertainty will usually dominate other sources of error [Krzysztofowicz, 1999]. However, the situation is different in a prediction context, where no rainfall forecast is involved. In this case, the relative contributions of rainfall, runoff, and structural errors to the total predictive uncertainty are unclear and likely case specific.
 As a result, it is currently common to use rule-of-thumb or literature values to fully specify the input, output, and structural error models and to keep their parameters fixed during the hydrological model calibration. For example, Huard and Mailhot used literature values for rainfall errors and rule-of-thumb values for structural errors (∼15% standard error). Similarly, Salamon and Feyen used literature values for runoff errors (∼12.5% standard error for large runoff) and rule-of-thumb values for rainfall and structural errors (∼15% standard error).
 However, recent empirical and theoretical evidence reemphasizes the need for reliable descriptions of uncertainties in both the forcing and response data if a meaningful decomposition of predictive uncertainty is required [e.g., Huard and Mailhot, 2008; Renard et al., 2010]. Since the inference can be sensitive to these specifications [Renard et al., 2010; Weerts and El Serafy, 2006], using an unreliable error model will generally yield an unreliable uncertainty decomposition. Hence, using literature values from other studies may not always be adequate. For instance, rating curve errors depend on the hydraulic configuration of the gauging section, the number of stage-discharge measurements, the degree of extrapolation, etc., all of which are site specific. Similarly, structural errors of a hydrological model are likely to depend on the catchment, time period, etc., and are difficult to estimate a priori.
 An alternative to fixing the error model parameters a priori is to include them in the inference. For instance, the variance of rainfall errors can be estimated during hydrological model calibration rather than being fixed a priori. Although this distinction may appear to be a superficial technicality, it is highly pertinent to the inference in the presence of multiple sources of errors [Huard and Mailhot, 2008; Renard et al., 2010; Weerts and El Serafy, 2006]. In particular, fixing the error model parameters to incorrect values may yield a computationally tractable yet statistically unreliable inference. On the other hand, the information content of the data may not be sufficient to support the inference of the error model parameters.
 The approach of inferring the error model parameters was used in the studies of Kavetski et al. [2006c], Reichert and Mieleitner, and Thyer et al. However, these studies did not attempt to fully decompose predictive uncertainty. Kuczera et al. attempted to simultaneously infer rainfall and structural errors but limited themselves to point estimates of inferred quantities, thus leaving open questions regarding parameter identifiability and posterior well posedness. More recently, Renard et al. and Kuczera et al. [2010b] quantitatively demonstrated the difficulties of simultaneously identifying rainfall and structural errors from rainfall-runoff data when only vague estimates of data uncertainty are known prior to the hydrological model calibration. This result confirms the earlier discussions by Beven [2005, 2006] of potential interactions between multiple sources of error. However, Renard et al. also illustrated that the use of more precise (though still inexact) statistical descriptions of data errors makes the posterior distribution well posed.
 It is therefore vital that priors on individual sources of error reflect actual knowledge, rather than be used as mere numerical tricks to achieve well posedness. Given the difficulty of obtaining prior estimates of structural errors (especially for highly conceptualized rainfall-runoff models), it may be more practical to first focus on the observational uncertainty in the rainfall-runoff data. Provided the data error models are reliable, they can achieve closure on the total errors and can allow reliably estimating structural errors as “what remains” once data errors are accounted for.
1.4. Study Aims
 The aims of this paper are the following: (1) demonstrate the development and incorporation of uncertainty models for forcing and response data into a Bayesian methodology for hydrological calibration and prediction, (2) examine the resulting improvements in the predictive performance, (3) evaluate whether using informative models for data errors enables inference of structural errors as part of the model calibration process, and (4) evaluate the ability of the inference to provide quantitative insights into the relative contributions of individual sources of uncertainty. Point 3 is of primary importance because of the intrinsic difficulty in defining structural error models a priori. This constitutes a major contribution of this paper since previous attempts at isolating the contribution of structural errors to predictive uncertainty [Huard and Mailhot, 2008; Salamon and Feyen, 2010] were based on assuming known parameters of the structural error model.
 This paper uses the Bayesian total error analysis (BATEA) methodology [Kavetski et al., 2002, 2006b; Kuczera et al., 2006]. The Bayesian foundation of BATEA, in particular its ability to exploit quantitative (though potentially vague) probabilistic insights into individual sources of error, makes it well suited for using independent knowledge to improve parameter inference and predictions and to quantify individual contributions to predictive uncertainty. However, the development of realistic error models for rainfall and runoff errors is of general interest for any method aiming to decompose the predictive uncertainty into its three main contributing sources.
 Here the rainfall error model is developed using a geostatistical analysis of the rain gauge network coupled with conditional simulation (CS) [e.g., Vischel et al., 2009]. For the runoff data, the rating curve data and stage-discharge measurements are used to derive a heteroscedastic error model [Thyer et al., 2009]. The BATEA framework is then used to explore different calibration schemes for integrating observational uncertainty into the inference and to evaluate their influence on calibration and validation, focusing on objectives 2–4.
 This work is innovative in several respects. First, while the characterization of rainfall errors has received considerable attention [e.g., Krajewski et al., 2003; Villarini et al., 2008], a comprehensive integration of this knowledge within a Bayesian statistical inference for hydrological models has yet to be demonstrated in a real catchment case study. More generally, the integration of independently derived data error models into a Bayesian framework for probabilistic predictions, together with a stringent verification and refinement of all error models, is of increasing interest not just in hydrology but elsewhere in the environmental sciences [e.g., Cressie et al., 2009]. Finally, a systematic disaggregation of predictive uncertainty into its contributing components in realistic case studies is still in its infancy. Previous studies in this area [e.g., Huard and Mailhot, 2008; Salamon and Feyen, 2010] assumed known fixed values for the structural error parameters, which is hardly tenable, as discussed in section 1.3.
 Second, this study further develops the BATEA approach. Previous applications of BATEA focused primarily on rainfall errors and lacked a separate characterization of structural errors [Kavetski et al., 2006a; Thyer et al., 2009]. Kuczera et al. explored separate specifications of rainfall, runoff, and structural errors but did not use informative priors on the parameters of their error models, nor did they carry out a full Bayesian treatment of the posterior distribution (they limited themselves to finding the posterior mode only). Renard et al. illustrated, on the basis of synthetic experiments, the necessity of deriving reliable and precise prior descriptions of data errors to achieve well-posed inferences. The present paper builds on the latter work and proposes a practical strategy toward these objectives. Moreover, it explicitly demonstrates the utility of independent rainfall error analysis for improving predictive reliability and for gaining quantitative and qualitative insights into the contributions of different sources of errors in hydrological prediction.
1.5. Outline of Presentation
 The Bayesian inference framework is outlined in section 2. Section 3 describes the specific data and methods used in this case study: the hydrological model and catchment data are described in section 3.1; section 3.2 describes the geostatistical rain gauge analysis, the development of an error model for the catchment average rainfall data, and its incorporation into the Bayesian inference; section 3.3 describes the runoff error model, and section 3.4 discusses the treatment of structural errors. Section 4 presents the results of a case study that evaluates the utility of this information in improving the quantification and decomposition of the runoff predictive uncertainty, with an emphasis on posterior scrutiny of the hypotheses made during calibration. The results are discussed in section 5, followed by a summary of key conclusions in section 6.
2. Theory: Bayesian Framework
2.1. General Setup: Data and Model
 In general, a rainfall-runoff (RR) model H(·) hypothesizes a mapping between rainfall and runoff, given a set of (usually time invariant) parameters θ. Let R and Ỹ denote the true rainfall and true runoff time series of length Nt, respectively. Let Y denote the runoff predicted by the RR model, such that

Y = H(R; θ)
 Hydrological models are usually also forced with potential evapotranspiration (PET). However, sensitivity to PET random errors is minor, and the impact of PET systematic errors remains much smaller than that of rainfall errors [e.g., Oudin et al., 2006]. We therefore exclude PET uncertainty from the analysis and notation. The influence of initial conditions is minimized using a warm-up.
2.2. Data Uncertainty
 The uncertainty in the rainfall-runoff data can be characterized using statistical error models, which describe what is known about the true values given the observations,

p(R | R̂; ΘR),   (2)

p(Ỹ | Ŷ; ΘY),   (3)

where R̂ and Ŷ denote the observed rainfall and runoff series, and ΘR and ΘY are error model parameters describing the statistical properties of the rainfall and runoff errors, respectively (e.g., means, variances, and autocorrelations of observation errors). The specification of these error models is a major focus of this paper. It will be described in detail in sections 3.2 (rainfall) and 3.3 (runoff).
 In this paper, we use a hierarchical structural error model that hypothesizes a single stochastic RR parameter θs, which varies on a characteristic time scale represented using epochs e = 1, …, Ne,

θs,e ~ p(θs | Φ),   (7)

where e(t) is the epoch associated with the tth time step and Φ are hyperparameters describing the statistical properties of the stochastic parameters (e.g., Φ could contain the mean and variance of storm-dependent parameters).
 A key challenge in using approach (7) is the meaningful specification of its hyperparameters. Since structural error remains the least understood source of uncertainty, scarce guidance exists for specifying anything other than vague priors, whether on exogenous structural error terms or on stochastic parameters.
2.4. Remnant Errors
 In addition to error models developed for particular error sources, we also account for “remnant” errors [Renard et al., 2010; Thyer et al., 2009]. These are related to the notions of “model inadequacy” [Kennedy and O'Hagan, 2001] and “model discrepancy” [Goldstein and Rougier, 2009] but are intended to capture not only unaccounted structural errors of the hydrological model but also inevitable imperfections and omissions in the descriptions of data uncertainty.
 Here we assume additive Gaussian remnant errors εt with unknown variance σε²,

Ỹt = Yt + εt,   εt ~ N(0, σε²),   (8)

where Yt is the runoff predicted by the RR model and Ỹt is the true runoff.
 Note that in traditional regression, remnant errors such as (8) represent the lumped effects of all sources of error and correspond to “residual” errors.
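 As a concrete illustration, an additive Gaussian error model such as equation (8) implies the familiar least squares log-likelihood when the errors are assumed independent. The sketch below is our own minimal Python rendering (the function name and toy inputs are not from the paper); written against observed runoff, it is the likelihood used by the SLS-type schemes in which remnant errors absorb all error sources.

```python
import math

def remnant_loglik(y_obs, y_sim, sigma_eps):
    """Gaussian log-likelihood of remnant errors eps_t = y_obs_t - y_sim_t,
    assumed independent with zero mean and standard deviation sigma_eps."""
    n = len(y_obs)
    sse = sum((yo - ys) ** 2 for yo, ys in zip(y_obs, y_sim))
    return -0.5 * n * math.log(2 * math.pi * sigma_eps ** 2) - sse / (2 * sigma_eps ** 2)

# Toy example: three time steps of "observed" and simulated runoff (mm/d)
ll = remnant_loglik([1.2, 3.4, 2.0], [1.0, 3.0, 2.5], sigma_eps=0.5)
```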
 The BATEA posterior in equation (10) explicitly represents individual sources of uncertainty in the hydrological model-data system as follows.
 1. The “runoff likelihood” describes runoff and remnant errors. We refer to Kuczera et al. [2010b] for a fully general derivation of this likelihood and to section 3.3 for its derivation with the specific error models used in the case study.
 2. The “rainfall likelihood” describes rainfall errors.
 3. The “stochastic-parameter term” characterizes structural errors.
 In addition, independent information on any quantity of inference can be supplied via the priors:
 1. p(ΘR) and p(ΘY) are the priors on the parameters describing, respectively, rainfall and runoff data uncertainties.
 2. p(θ) is the prior on the time-invariant RR parameters.
 3. p(Φ) is the prior on the parameters of the probability model of the stochastic parameters.
 4. p(R) is the prior distribution of the true rainfall time series; note that the product of this prior with the “rainfall likelihood” is proportional to the rainfall error model (2).
 5. p(σε) is the prior on the parameters of the remnant error model in equation (8), here the remnant standard deviation σε.
 The posterior in equation (10) can be explored using Markov chain Monte Carlo (MCMC) sampling. In this study, we use a multistage limited-memory MCMC strategy detailed by Kuczera et al. [2010a]. Also note that equation (10) can be modified to use joint priors on any quantity of inference. This would be needed, for example, if BATEA were applied recursively as new data arrives.
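 The multistage limited-memory sampler of Kuczera et al. [2010a] is beyond the scope of this outline, but the basic mechanism of exploring a posterior by MCMC can be sketched with a plain random-walk Metropolis step. The target and tuning constants below are our own toy choices, not those of the study:

```python
import math, random

random.seed(1)

def log_post(x):
    """Toy log-posterior: a standard Gaussian stands in for equation (10)."""
    return -0.5 * x * x

x, samples = 0.0, []
for i in range(22000):
    prop = x + random.gauss(0, 1.0)          # random-walk proposal
    if math.log(random.random()) < log_post(prop) - log_post(x):
        x = prop                             # accept; otherwise keep current x
    samples.append(x)
samples = samples[2000:]                     # discard burn-in

mean = sum(samples) / len(samples)           # should be near 0 for this target
```

The retained samples approximate draws from the target, so posterior summaries (means, quantiles, predictive distributions) follow directly from them.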
 The key scientific (as opposed to computational) challenge in using BATEA or any other Bayesian approach for the decomposition of individual sources of error is to develop accurate and precise probabilistic models for the individual terms in the posterior (10). This will generally require independent information to augment and constrain the inference. Illustrating these developments in a practical study is a major objective of this paper.
2.6. Calibration Schemes
 The BATEA framework can be used to derive several calibration schemes, differing in the type of error models and the amount of prior knowledge utilized in the inference. This allows exploring the benefits and challenges of explicitly describing each source of uncertainty and of including additional prior information. The following schemes are considered in this study (Table 1).
Table 1. Summary of Calibration Schemes and Markov Chain Monte Carlo Convergence Resultsa
 1. The standard least squares (SLS) scheme lumps the effects of all sources of errors into the remnant error term in equation (8).
 2. The output-input (OI) scheme explicitly accounts for rainfall and runoff uncertainty. Structural errors are handled entirely by the remnant error term. Vague priors are used for the term p(R) and for the rainfall error parameters. However, prior information, derived from rating curve analysis, is used for the runoff error parameters.
 3. The OI-CS scheme is an “enhanced” OI scheme, augmented using an informative prior for the term p(R). This prior is derived using CS as described in section 3.2.
 4. The output-input-structural (OIS) scheme explicitly accounts for rainfall and runoff uncertainty and characterizes structural errors using a stochastic RR parameter. Note that it still uses the remnant error (8) to account for the inevitable imperfections of the uncertainty models.
 5. The OIS-CS scheme is an enhanced OIS scheme, augmented using the CS prior for the term p(R).
2.7. Quantification and Decomposition of Predictive Uncertainty
2.7.1. Total Predictive Distributions
 In Bayesian methods, the uncertainty in a quantity of interest (e.g., runoff Y) is usually quantified by means of the predictive distribution. Let θ denote the vector of all inferred quantities and p(θ | D̂) denote the posterior of θ given the observed data D̂. By definition, the predictive distribution of Y is [Gelman et al., 2004]

p(Y | D̂) = ∫ p(Y | θ) p(θ | D̂) dθ
 In this study, the individual contributions of distinct sources of uncertainty are quantified by formulating “partial” predictive distributions (PPDs). The derivation of a PPD is illustrated using a simple two-parameter model.
 Let p(θ1, θ2 | D̂) be the posterior of parameters θ1 and θ2 given observed data D̂. For example, θ1 and θ2 could be viewed as representing input and structural errors, which we are trying to disaggregate in this study. Now, consider the conditional distribution

p(Y | θ2 = θ2*, D̂) = ∫ p(Y | θ1, θ2 = θ2*) p(θ1 | θ2 = θ2*, D̂) dθ1,   (12)

where θ2* is a given conditioning value (e.g., the posterior mode). Equation (12) represents the uncertainty in Y contributed by the uncertainty in θ1, conditional on θ2 = θ2*. We hence refer to it as the “PPD of Y arising from the uncertainty in θ1.” The PPD p(Y | θ1 = θ1*, D̂), representing the uncertainty contributed by θ2, can be defined in a similar manner.
 Unlike the total predictive distribution (TPD), PPDs cannot, in general, be constructed directly from MCMC samples of the joint posterior distribution. Sampling from the conditional posterior distribution in equation (12) would, in general, require separate MCMC sampling. However, in the special case where the posteriors of θ1 and θ2 are independent, the conditional distribution p(θ1 | θ2 = θ2*, D̂) is equal to the marginal distribution p(θ1 | D̂). The PPD then reduces to

p(Y | θ2 = θ2*, D̂) = ∫ p(Y | θ1, θ2 = θ2*) p(θ1 | D̂) dθ1.   (13)
 Consequently, if the analysis of the full posterior suggests that θ1 and θ2 are nearly independent, the PPD in equation (13) can be approximated from the MCMC samples by generating a realization Y(i) for each sampled parameter θ1(i), with θ2 held fixed at θ2*. To the extent that θ1 and θ2 are independent, the sample {Y(i)} is then an approximate realization from the PPD.
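 The mechanics of this sampling shortcut can be sketched with a toy two-parameter model of our own (it only illustrates how a PPD is built alongside the TPD; the model and numbers are not from the case study):

```python
import random, statistics

random.seed(42)
# Toy "posterior" samples: theta1 and theta2 assumed (nearly) independent
theta1 = [random.gauss(0, 1.0) for _ in range(5000)]
theta2 = [random.gauss(0, 0.5) for _ in range(5000)]

def simulate(t1, t2):
    # Toy predictive model: Y depends on both parameters plus remnant noise
    return t1 + t2 + random.gauss(0, 0.1)

# TPD: propagate the uncertainty of all inferred quantities
tpd = [simulate(t1, t2) for t1, t2 in zip(theta1, theta2)]

# PPD for theta1: fix theta2 at a conditioning value (here its posterior
# mean) and vary theta1 only -- valid when the posteriors are independent
t2_star = statistics.fmean(theta2)
ppd1 = [simulate(t1, t2_star) for t1 in theta1]

# The PPD is narrower than the TPD: the removed spread is the contribution
# of theta2 to the total predictive uncertainty
```

Comparing the spread of `ppd1` with that of `tpd` quantifies the share of predictive uncertainty attributable to θ2, which is exactly the decomposition sought in this section.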
 This study distinguishes between the following sources of errors: (1) rainfall errors, (2) structural errors, and (3) runoff plus remnant errors. The corresponding PPDs are constructed from the MCMC samples generated during the inference by iterating the flowchart in Figure 1 for i = 1:Nsim.
3. Materials and Methods
3.1. Study Area and Hydrological Model
3.1.1. The Yzeron Catchment
 The case study is based on the 129 km2 Yzeron catchment in the Rhône-Alpes region of France, near Lyon (Figure 2a). Its regime is rainfall dominated, with floods between autumn and spring and extended periods of low flows in summer. The annual average rainfall and runoff are approximately 845 and 150 mm respectively, yielding an annual runoff coefficient of 0.18. The upstream elevations range from 400 to 917 m, with steep slopes often exceeding 10%.
 Nearly 8 years of daily runoff (shown in Figure 2b) are used in this study. The last 2 years, 2007 and 2008, are used for calibration, while the preceding 6 years are used for validation.
 Two separate sets of rain gauges are used. The first set, denoted as R3D, comprises three rain gauges in the lower areas of the catchment (squares in Figure 2a), with daily totals available for the whole period of study. The daily mean of the R3D rain gauges provides an estimate of the daily areal rainfall (inverted bars in Figure 2b) that was used in the calibration and validation experiments.
 The second set, R13H, comprises 13 rain gauges located within the vicinity of the Yzeron catchment, shown as dots in Figure 2a. The spatial density of this network is quite high considering the moderate catchment size; moreover, it provides measurements at an hourly resolution. However, its observations are available only for the last 2 years of the study period. Consequently, the R13H data are used solely to investigate the error properties of the R3D estimates of the catchment average rainfall. In particular, the high spatial density of the R13H gauges permits the spatial variability of rainfall to be described using conditional simulation (section 3.2). The concurrent availability of the R3D and R13H data explains the use of the last 2 years of the study period for calibration, while only R3D data are used in the validation period.
3.1.2. The GR4J Rainfall-Runoff Model
 This study applies the widely used GR4J model [Perrin et al., 2003], which simulates catchment runoff using rainfall and potential evapotranspiration at a daily time step (Figure 2c). The model has two conceptual stores (production and routing), two unit hydrograph elements, and four calibration parameters: the maximum production storage (L, mm), the groundwater exchange parameter (L T−1, mm d−1), the maximum routing storage (L, mm), and the unit hydrograph time delay parameter (T, days). Further details are given by Perrin et al.
3.2. Development of the Rainfall Error Model
3.2.1. Conditional Simulation
 The uncertainty of areal rainfall estimates is generally dominated by sampling errors, i.e., errors due to the incomplete description of the rainfall spatial field using rain gauges [Moulin et al., 2009; Severino and Alpuim, 2005]. Conditional simulation is a geostatistical method that generates multiple replicates of the rainfall field based on the values measured at individual rain gauges [e.g., Vischel et al., 2009]. In most common CS methods, the replicates match the observed values at the rain gauge locations but differ elsewhere. The spatial variability of the replicates depends on the geostatistical properties (distribution, variogram, etc.) of the rainfall fields, which are estimated prior to generating the CS replicates.
 CS provides a natural means to describe the uncertainty in the areal rainfall forcing and is therefore well suited for augmenting the statistical inference of hydrological models.
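 The mechanics of conditioning can be illustrated in one dimension: generate an unconditional Gaussian replicate, then krige the gauge mismatches back onto the grid so the replicate honors the observations. This is the classical conditioning-by-kriging construction, sketched below with an assumed exponential covariance and invented "gauge" values; it is not the specific algorithm used in this study.

```python
import math, random

random.seed(0)

def cov(h, sill=1.0, corr_len=8.0):
    """Exponential covariance model (an illustrative choice)."""
    return sill * math.exp(-abs(h) / corr_len)

def cholesky(a):
    """Cholesky factor of a symmetric positive definite matrix."""
    n = len(a)
    L = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + 1):
            s = sum(L[i][k] * L[j][k] for k in range(j))
            L[i][j] = math.sqrt(a[i][i] - s) if i == j else (a[i][j] - s) / L[j][j]
    return L

def solve2(a, b):
    """Solve a 2x2 linear system by Cramer's rule."""
    det = a[0][0] * a[1][1] - a[0][1] * a[1][0]
    return [(b[0] * a[1][1] - b[1] * a[0][1]) / det,
            (a[0][0] * b[1] - a[1][0] * b[0]) / det]

grid = list(range(30))            # 1-D simulation grid
obs = {5: 2.0, 20: -1.0}          # "gauge" locations and observed values
locs = sorted(obs)

# 1. Unconditional Gaussian replicate with the chosen covariance
C = [[cov(i - j) for j in grid] for i in grid]
L = cholesky(C)
eps = [random.gauss(0, 1) for _ in grid]
z_uc = [sum(L[i][k] * eps[k] for k in range(i + 1)) for i in grid]

# 2. Condition by kriging the gauge mismatches back onto the grid
C_gg = [[cov(a - b) for b in locs] for a in locs]
resid = [obs[p] - z_uc[p] for p in locs]
z_cs = []
for x in grid:
    lam = solve2(C_gg, [cov(x - p) for p in locs])  # simple kriging weights
    z_cs.append(z_uc[x] + sum(l * r for l, r in zip(lam, resid)))
# z_cs matches the data at the gauge locations but varies elsewhere
```

Repeating step 1 with fresh random numbers yields an ensemble of replicates that all honor the gauges, which is the property exploited here to quantify areal rainfall uncertainty.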
3.2.2. The Turning-Band Method Rainfall Generator For CS
 The CS method used in this study is the turning-band method (TBM) rainfall generator. The main equations of the TBM geostatistical model are provided in Appendix A. Further details are provided by Tompson et al. A summary of the main characteristics is provided below.
 TBM generates three-dimensional fields that describe rainfall variability in two spatial (areal) dimensions and in the time dimension. Rainfall fields are constructed from the product of two independent fields: (1) a Boolean indicator field representing pixels with zero and nonzero rainfall and (2) a field of nonzero precipitation generated from a prespecified distribution.
 The TBM simulation depends on parameters describing the at-site rainfall distribution (e.g., mean and variance of a lognormal distribution) and the spatiotemporal properties of the observed rainfall fields (e.g., the spatiotemporal variogram). The simulated field is constructed to be consistent not only with the observed variogram of raw data (e.g., hourly rainfall) but also with the variograms of data aggregated over various durations (e.g., 2, 4, 6, 12, and 24 h intervals). This constraint is addressed using the integrative properties of random fields: given the variogram of the point process that generates (unobserved) instantaneous rainfall, it is possible to derive the variograms of the aggregated fields. This operation is known as the regularization of the point variable to the aggregated variable [e.g., Journel and Huijbregts, 1978, chapter II]. Consequently, the inference of the variogram parameters of the (unobserved) point process is based on the (observed) variograms of observations aggregated over various durations. This allows the generated field to be consistent with the spatiotemporal properties of aggregated rainfall.
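 The regularization idea can be checked numerically: given the covariance of the point process, the variance (and hence the variogram) of any aggregated average follows by summing the point covariance over the aggregation window. In the toy sketch below, an AR(1) "hourly" series of our own stands in for instantaneous rainfall; the aggregated variance predicted from the point covariance is compared against a Monte Carlo estimate:

```python
import math, random, statistics

random.seed(3)
rho = math.exp(-1 / 6.0)   # lag-1 correlation of a toy AR(1) "hourly" process
k = 24                     # aggregate to "daily" values

def point_cov(h):
    return rho ** abs(h)   # covariance of the (unit variance) point process

# Regularization: variance of the k-hour average from the point covariance
var_agg = sum(point_cov(i - j) for i in range(k) for j in range(k)) / k ** 2

# Monte Carlo check: simulate the AR(1), aggregate, compare variances
n_days = 20000
sd_innov = math.sqrt(1 - rho ** 2)
x, day_means = 0.0, []
for _ in range(n_days):
    block = []
    for _ in range(k):
        x = rho * x + random.gauss(0, sd_innov)
        block.append(x)
    day_means.append(statistics.fmean(block))
var_mc = statistics.pvariance(day_means)   # close to var_agg
```

The same bookkeeping, applied to a spatiotemporal variogram, is what allows the point-process parameters to be inferred from variograms of data aggregated over several durations.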
 The TBM method generates Gaussian random fields, which are then transformed to obtain the indicator field and the nonzero precipitation field. This transformation is based on thresholding for the indicator field and on the transformation P = F−1(Φ(Z)) for the nonzero precipitation field, where Φ is the cumulative distribution function (CDF) of the standard Gaussian distribution and F−1 is the inverse CDF of nonzero rainfalls. Care is needed at this step because these transformations alter the spatial correlations of the simulated random field. Therefore, empirical and analytical correction formulae are used to match the correlation structure of the final rainfall field to the observations (see Appendix A for details). Finally, Gibbs sampling is used to condition the simulations at the rain gauge locations. Onibon et al. provide further details.
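 The two transformations can be sketched as follows. The wet fraction and the lognormal distribution of nonzero amounts below are assumed example choices of ours, and the correlation corrections mentioned above are omitted:

```python
import math, random

random.seed(7)

def phi(z):
    """Standard Gaussian CDF."""
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))

def phi_inv(p, lo=-8.0, hi=8.0):
    """Gaussian quantile by bisection (sufficient for this sketch)."""
    for _ in range(60):
        mid = (lo + hi) / 2.0
        if phi(mid) < p:
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2.0

p_wet = 0.4                      # assumed probability of nonzero rainfall
z0 = phi_inv(1.0 - p_wet)        # indicator threshold: P(Z > z0) = p_wet
mu, sigma = 0.5, 0.8             # assumed lognormal parameters of wet amounts

def rainfall(z_ind, z_amt):
    """Combine an indicator Gaussian and an amount Gaussian into rainfall."""
    if z_ind <= z0:
        return 0.0               # dry pixel (thresholded indicator field)
    # F_inv(Phi(z)) with a lognormal F reduces to exp(mu + sigma * z)
    return math.exp(mu + sigma * z_amt)

sample = [rainfall(random.gauss(0, 1), random.gauss(0, 1)) for _ in range(100000)]
wet_frac = sum(r > 0 for r in sample) / len(sample)   # close to p_wet
```

In the actual generator the two Gaussian inputs are spatially correlated fields rather than independent draws, which is why the correction formulae of Appendix A are needed.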
3.2.3. Derivation of the Rainfall Error Model Using CS Data
Figure 3 depicts three representative CS replicates over four consecutive hourly steps. In all replicates, rainfall values match the observations at the conditioning gauge locations (open squares, at the R13H locations) but differ elsewhere. For each replicate, the hourly rainfalls are aggregated to the daily scale and averaged over the catchment area. This yields the daily areal rainfall of the Yzeron catchment associated with a particular conditional replicate.
Figure 4a compares the time series of areal rainfall estimated from the R3D network to the distribution estimated from 34 CS replicates (conditioned on rainfall values from the R13H network). The limited number of replications is due to current computational constraints: a single CS of 2 years of hourly data over a 49 × 49 grid takes several hours on a standard desktop CPU. Improved computational strategies are beyond our scope and will be investigated in future work.
Figure 4a shows that the spread of the conditional replications is highly variable on a daily scale. For example, the individual replicates varied from 15 to 40 mm on day 119, while the R3D estimate was 25 mm. This suggests considerable uncertainty in the R3D areal rainfall estimates during this particular event. Conversely, the replicates ranged from 55 to 67 mm for day 134, with the R3D estimate of 62 mm, suggesting a markedly smaller uncertainty in the R3D data. Figure 4b also shows that the standard deviation of CS replicates (computed for each day of the calibration period) has no clear relationship with the R3D-estimated areal rainfall. This implies that larger rainfall events are not necessarily subject to larger uncertainties.
 Figure 4c compares the mean of the conditional replications with the R3D-estimated values for each day of the calibration period. Overall, they are in acceptable agreement, suggesting the absence of strong systematic bias in the R3D estimates. However, a closer inspection reveals considerable discrepancies between the two estimates of the areal rainfall for small events. More precisely, on some days the R3D estimates are zero even though the CS suggests considerable precipitation (up to 20 mm). This suggests that significant rainfall events can be missed with only three rain gauges, that the CS overestimates small events, or both.
 The errors in the R3D estimates can be approximated using multiplicative errors,

Rt = φt R̂t,   (14)

where R̂t is the R3D estimate of the areal rainfall for day t, Rt is the corresponding true areal rainfall (here approximated by the CS replicates), and φt is a rainfall multiplier.
Figure 4d, which plots the multipliers φ versus the R3D rainfall estimates, reveals a complex distributional structure of the rainfall errors. Multipliers associated with small recorded rainfall values are predominantly larger than 1.0 (corresponding to the underestimation reported earlier) and are highly variable. The discrepancies in small rainfall events have several possible explanations: (1) biases in the R3D areal averages due to insufficient spatial coverage and/or (2) biases in the CS of small rainfall events. Multipliers tend to stabilize around 1.0 for higher rainfall values, suggesting an absence of strong systematic biases and limited heteroscedasticity. While the low heteroscedasticity of multipliers associated with larger events supports the multiplicative error model (14), the difficulty in describing errors in small rainfall suggests that simple models, such as Gaussian multipliers, may not be adequate over the entire rainfall range and will need future refinement.
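 The computation of the multipliers from the CS replicates can be sketched as follows. The numbers are invented for illustration and merely mimic the Figure 4 discussion; note that days with a zero R3D estimate must be excluded, since the multiplier is then undefined:

```python
# R3D areal estimates (mm/d) and CS replicate areal rainfalls for three days
r3d = [25.0, 62.0, 0.0]
cs_reps = [[15.0, 30.0, 40.0],    # wide spread: large uncertainty in R3D value
           [55.0, 62.0, 67.0],    # tight spread: small uncertainty
           [4.0, 10.0, 20.0]]     # recorded zero: multiplier undefined

# One multiplier per (day, replicate) pair: phi = CS areal rainfall / R3D
multipliers = [(obs, [cs / obs for cs in reps])
               for obs, reps in zip(r3d, cs_reps) if obs > 0]
```

The pooled multipliers (one sample per day and replicate) are the raw material behind the scatter of Figure 4d.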
3.2.4. Diagnostic Evaluation of CS Predictions Versus R3D Gauges
 To investigate the reliability of CS for small rainfall events, we evaluated the CS replicates against R3D rain gauge values (as opposed to areal averages) by comparing the rainfall series from a given R3D rain gauge gk with the CS predictive distribution at the pixel containing gk. The reliability of the CS predictive distribution is evaluated using the predictive QQ plot, which displays the p-values of the observations within the predictive distribution against the quantiles of the uniform distribution. A statistically reliable predictive distribution yields p-values close to the 1:1 line. Departures from the bisector have specific diagnostic interpretations (see Laio and Tamea [2007] and Thyer et al. [2009] for further details).
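The p-values underlying the predictive QQ plot can be approximated by the empirical CDF of the predictive samples at each observation; a minimal sketch (function name and array layout are ours):

```python
import numpy as np

def predictive_pvalues(obs, replicates):
    """p-value of each observation within its predictive distribution,
    approximated by the empirical CDF of the predictive samples.

    obs: (n,) observed values; replicates: (n, n_reps) predictive samples.
    For a reliable predictive distribution, the sorted p-values fall close
    to the 1:1 line when plotted against uniform quantiles."""
    obs = np.asarray(obs, dtype=float)
    reps = np.asarray(replicates, dtype=float)
    return (reps <= obs[:, None]).mean(axis=1)
```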
 While Figure 5b suggests that the CS predictive distribution is reliable for daily rainfall exceeding 2 mm, Figure 5a suggests poorer reliability for small rainfalls. In particular, numerous observations have p-values of zero, suggesting a tendency of the CS to overestimate the actual rainfall. The discrepancies in small rainfall events discussed in section 3.2.3 are therefore at least partly due to biases in the CS of small events.
 Since the current analysis shows that CS reliably quantifies the uncertainty in the larger rainfall events, which are generally (though not always) of primary interest, it supports the use of CS as a tool to derive rainfall error estimates for hydrological applications. The investigation of the apparently poor CS performance for small rainfall is deferred to a future study.
3.2.5. Conditional Simulation as a Prior on the True Rainfall in BATEA
 A key advantage of the Bayesian paradigm is its ability to augment the inference with independent knowledge. In this study, we incorporate the information from the geostatistical analysis of the R13H network into a BATEA calibration of a hydrological model forced with the R3D rainfall. This is achieved by using the CS replicates to specify the term p(R) in the BATEA posterior (10).
 The prior p(R) is described using independent Gamma distributions with time-varying scale and shape parameters α_t and λ_t, describing the rainfall at all time steps t where rainfall exceeds 2 mm,

p(R_t) = Gamma(R_t | α_t, λ_t).   (15)
 The scale and shape at step t are estimated by matching the moments of the Gamma distribution to the moments of the CS replicates described in section 3.2.3. Note that the specification of the prior is based solely on the R13H data (analyzed using CS) and does not use data from the R3D network. A posteriori, the R3D data are used indirectly in the exploratory analyses reported in sections 3.2.3 and 3.2.4.
 Note that the exclusion of rainfalls below 2 mm from the error model is used as a computational acceleration strategy to remove insensitive degrees of freedom from the inference. This approximation has little effect on the inference results because the predicted runoff is largely insensitive to small rainfalls.
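The moment matching in section 3.2.5 has a closed form: for replicate mean m and variance v, a Gamma distribution with shape m²/v and scale v/m reproduces both moments exactly. A sketch (the function name is ours):

```python
import numpy as np

def gamma_from_moments(replicates):
    """Moment-matched Gamma prior for the true areal rainfall at one time step.

    Matches the sample mean m and variance v of the CS replicates:
        shape = m**2 / v,   scale = v / m
    so the resulting Gamma variate has mean m and variance v."""
    reps = np.asarray(replicates, dtype=float)
    m = reps.mean()
    v = reps.var(ddof=1)
    return m * m / v, v / m  # (shape, scale)
```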
3.2.6. Rainfall Error Model
 The likelihood of rainfall errors in equation (10) is specified by modeling the multipliers as

φ_t ~ TN(μ_φ, σ_φ², 0),   (16)

where TN(a, b², 0) denotes a Gaussian distribution with mean a and standard deviation b, truncated at zero. Similar to equation (15), the error model (16) is applied only on days where R_t > 2 mm. The advantages and limitations of rainfall error models (15) and (16) are discussed in section 5.4.
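Simulating multipliers from TN(a, b², 0) can be done by rejection sampling, which is adequate here because the truncation point lies well below the mean for realistic multiplier values; a sketch (names are ours):

```python
import numpy as np

def sample_tn(mean, sd, size, seed=None):
    """Draw `size` samples from TN(mean, sd**2, 0): a Gaussian truncated
    below at zero, via rejection sampling. Efficient whenever mean/sd is
    not far below zero, as for the rainfall multipliers here."""
    rng = np.random.default_rng(seed)
    out = np.empty(0)
    while out.size < size:
        draw = rng.normal(mean, sd, size=2 * size)
        out = np.concatenate([out, draw[draw > 0.0]])
    return out[:size]
```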
3.3. Development of the Runoff Error Model
 Runoff uncertainty was investigated by analyzing the rating curve and related stage-discharge measurements. The Yzeron catchment can be considered well gauged, with stage-discharge measurements covering a large fraction of the flow duration curve.
Figure 6 shows the runoff measurement errors, defined as the difference between the runoff measurements and the runoff predicted by the rating curve (“RCP runoff”). There is a clear trend of runoff measurement errors increasing with the RCP runoff.
 In view of Figure 6, we hypothesized a heteroscedastic error model, where runoff uncertainty is Gaussian with a zero mean and a standard deviation increasing linearly with the RCP runoff,

Q ~ N(Q_rc, (a + b Q_rc)²),   (17)

where Q is the gauged runoff and Q_rc is the RCP runoff. In the context of equation (3), this specifies the standard deviation of the runoff observation errors as a + b Q_rc.
Equation (17) was fitted to the Yzeron runoff data (with vague priors on a and b) using the WinBUGS software [Spiegelhalter et al., 2003]. The 90% predictive limits of the runoff measurement error model are shown in Figure 6. The fanning out of the uncertainty bounds for large runoff values is dominated by extrapolation from lower flows, where many more stage-discharge measurements are available. This deficiency arises because of the limited gauging data in the high-flow range (a single measurement for flows exceeding 10 mm).
 The posterior mean and standard deviation for the parameters of the rating curve error model (17) were a = 0.0032 ± 0.0015 and b = 0.096 ± 0.014. Since the precision of these estimates is relatively high, they were fixed at their posterior means during the subsequent BATEA calibration.
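With a and b fixed at their posterior means, the 90% limits in Figure 6 follow directly from the Gaussian error model. A sketch using the quoted values (the linear form a + b·q for the standard deviation is our reading of equation (17); the function name is ours):

```python
def runoff_error_limits(q_rcp, a=0.0032, b=0.096, z90=1.645):
    """90% limits of the heteroscedastic runoff measurement error model:
    errors ~ N(0, (a + b*q)**2), with a and b fixed at the posterior means
    reported in the text. z90 is the standard Gaussian 95% quantile, so the
    symmetric interval covers 90% probability."""
    sd = a + b * q_rcp
    return q_rcp - z90 * sd, q_rcp + z90 * sd
```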
 Note that equation (17), in combination with the remnant error model (8), allows deriving the runoff likelihood term in equation (10). Given the error models selected here, the observed runoff is treated as a realization from a Gaussian distribution with mean equal to the simulated runoff and variance equal to the sum of the runoff error variance (17) and the remnant error variance (8).
3.4. Representation of Structural Errors
 The characterization of structural error of the GR4J model is explored using stochastic daily variation of its production store capacity parameter x1. We also investigate a more traditional exogenous treatment of structural errors using the remnant error term (see also section 2.4).
 When x1 is treated as stochastic, it is assumed to follow a truncated Gaussian distribution with unknown mean μ_x1 and standard deviation σ_x1,

x1(t) ~ TN(μ_x1, σ_x1², 0).
 Note that x1 controls the maximum storage of the production store (Figure 2c). It may seem surprising, or even imprudent, to make this quantity time dependent because the actual storage can then, in principle, exceed the maximum capacity. However, a separate sensitivity analysis (similar to Figure 5 of Kuczera et al. [2006]) indicated that this parameter, when made stochastic, had the largest impact on model predictions. Importantly, we examined the inferred stochastic variability of x1 to determine its effect on the storage values and long-term water balance (section 4.4.2).
 The convergence of MCMC samples reflects the statistical characteristics of the target distribution. In particular, slowly convergent sampling is often indicative of ill-posed posteriors [Renard et al., 2010]. Such posteriors arise when the data contain insufficient information to identify the quantities of interest and no prior information is available or used.
 The MCMC convergence was assessed using the Gelman-Rubin (GR) criterion [Gelman et al., 2004; see also Cowles and Carlin, 1996, for a broader review]. For all calibration schemes except OIS (see below), GR statistics were below 1.2 for all inferred quantities, suggesting a well-posed inference. As expected intuitively, Table 1 shows that convergence is faster for lower-dimensional inference schemes. Yet it also highlights the impact of the prior on the speed of MCMC convergence. Despite having exactly the same likelihood function and the same number of inferred quantities, scheme OI-CS converges 10 times faster than OI because it uses the informative CS prior. This emphasizes that the computational cost of an inference depends more on its structure than on its dimensionality alone. An in-depth discussion of dimensionality and computation in hierarchical models is provided by Spiegelhalter et al. [2002]; see also the synthetic investigations by Renard et al. [2010].
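For reference, the GR statistic for one scalar quantity compares between-chain and within-chain variances; a minimal sketch assuming m parallel chains stored row-wise (the function name is ours):

```python
import numpy as np

def gelman_rubin(chains):
    """Gelman-Rubin potential scale reduction factor for one scalar quantity.

    chains: array (m, n) holding m parallel MCMC chains of length n.
    Values near 1 (e.g., below the 1.2 threshold used in the text) suggest
    convergence; large values indicate the chains have not mixed."""
    chains = np.asarray(chains, dtype=float)
    m, n = chains.shape
    B = n * chains.mean(axis=1).var(ddof=1)   # between-chain variance
    W = chains.var(axis=1, ddof=1).mean()     # mean within-chain variance
    var_hat = (n - 1) / n * W + B / n         # pooled variance estimate
    return np.sqrt(var_hat / W)
```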
 The OIS scheme, which attempts to infer both rainfall and structural errors without using the CS prior, suffered from a prohibitively slow rate of MCMC convergence, with GR statistics for several inferred quantities (including hydrological parameters and latent variables) still exceeding 3.0 after 10^6 MCMC iterations. Inspection of the simulated values revealed strong negative correlations between the latent variables for input and structural errors (with some posterior cross correlations exceeding −0.9). Moreover, the posterior standard deviations of inferred quantities were higher than in the OIS-CS scheme by a factor of about 3 on average, but exceeding 10 for some latent variables. This computational behavior is symptomatic of ill posedness (see detailed discussion by Renard et al. [2010]). In practical terms, this means that rainfall and structural errors are not simultaneously identifiable solely from the given forcing response time series with no associated error estimates.
 The nonconvergence of the OIS scheme, contrasted with the convergence of the OIS-CS scheme, supports a key conclusion of Renard et al. [2010], namely, that the specification of informative priors for rainfall and runoff uncertainty is a necessary step to ensure well posedness when both forcing and structural errors are modeled hierarchically using latent variables.
4.2. Reliability of Total Predictive Uncertainty (All Schemes)
Section 4.2 examines the adequacy of the predictive distribution of runoff. Posterior scrutiny of the predictive distribution is important because violations of calibration assumptions can result in unreliable and misleading predictions [Hall et al., 2007; Thyer et al., 2009]. In addition to visual appraisals, which are of clear value to a hydrological expert, a more formal approach for evaluating the reliability of a predictive distribution is given by the predictive QQ plot (see section 3.2.3). However, reliability alone is insufficient to demonstrate that a particular predictive method is superior to another [e.g., Gneiting et al., 2007]. In particular, the precision of the predictive distribution also needs to be assessed. Moreover, the reliability of the total predictive distribution does not prove that all individual error models are correctly specified; it is a necessary but insufficient condition. This topic is further discussed in section 5.2.
Figure 7 shows the predictive QQ plots constructed for the validation period. In addition, Figure 8 shows the total predictive distributions (TPDs) from schemes OI, OI-CS, and OIS-CS for several flood events. Figure 8 allows a visual appraisal of the precision (i.e., sharpness or resolution) of the TPDs. Several important results can be noted.
 1. The SLS scheme produces an unreliable predictive distribution. The shape of the QQ curve in Figure 7a suggests a general overestimation of predictive uncertainty. However, when restricted to runoffs above 2 mm (65 days, Figure 7b), it shows that predictive uncertainty is actually severely underestimated for large runoffs, with many observations outside of their predicted range.
 2. The shape of the OI curve in Figure 7 suggests a severe underprediction of observations. This is confirmed by Figure 8 (top), with the predicted runoff being consistently lower than the observed values.
 3. The OI-CS scheme slightly underestimates the predictive uncertainty, with about 1% of the observations lying outside of the predictive range (p-values of 0 and 1 by convention). On the other hand, Figure 8 shows that OI-CS yields markedly more precise predictions compared to other schemes.
 4. The OIS-CS scheme yields a reliable estimation of the predictive uncertainty for all runoff ranges (Figures 7a and 7b). However, Figure 8 shows that its predictive precision is the lowest, suggesting that in this application, representing structural errors using a stochastic parameter has increased the predictive uncertainty compared to the OI-CS setup, where structural errors were represented as part of the additive remnant error term.
 Further insights can be gained by examining the estimated parameters of the rainfall and structural error models, as listed in Table 2.
Table 2. Rainfall and Structural Uncertainty Estimated as Part of the Hydrological Model Inference Using BATEA^a

Scheme | Mean of Rainfall Multipliers | SD of Rainfall Multipliers | Mean of x1 (mm) | SD of x1 (mm)
OI     | 0.20 ± 0.17 | 1.54 ± 0.15 | -        | -
OI-CS  | 1.18 ± 0.03 | 0.30 ± 0.03 | -        | -
OIS-CS | 1.15 ± 0.03 | 0.27 ± 0.02 | 218 ± 23 | 84 ± 17

^a The posterior means are reported, followed by the corresponding posterior standard deviations.
 1. The standard deviation of the rainfall multipliers is estimated as 1.54 in the OI scheme but reduces to 0.27 in the OIS-CS scheme. This occurs because the OI scheme lacks an adequate description of structural errors (the homoscedastic Gaussian remnant error term is poorly suited to this) and, by increasing its standard deviation, the rainfall error model can compensate for unaccounted structural errors. This compensation is detectable in this case study because of the availability of independent prior knowledge on rainfall errors.
 2. Conversely, the estimated parameters of the rainfall error models are similar in the OI-CS and OIS-CS schemes. This illustrates the constraint exerted by the CS prior, limiting the interactions between rainfall and structural errors. However, recall that removing the stochastic variability in x1 (OI-CS) leads to a slight underestimation of the predictive uncertainty.
4.3. Decomposition of Total Predictive Uncertainty Into Forcing, Response, and Structural Components (OIS-CS Only)
Section 4.2 showed that the BATEA methodology yields reliable estimates of predictive uncertainty when prior information on rainfall and runoff errors is available (OIS-CS scheme). It is then of practical significance and scientific interest to explore and evaluate the relative contributions of forcing and structural errors to the total predictive uncertainty.
Figure 9 shows the TPD and the partial predictive distributions (PPDs) for the forcing, structural, and response errors (see section 2.7 for details). Under the hypotheses made in this case study (including the hydrological model and the data error models), predictive uncertainty in the runoff appears to be dominated by structural errors. Although significant, rainfall errors explain a smaller part of the TPD, with runoff and remnant errors contributing even less. The identifiability of the parameters of the rainfall and structural error models (in particular, their near-independence, with a maximum absolute posterior correlation of about 0.12) provides confidence that the PPDs with respect to rainfall and structural errors can be interpreted as representing the individual contributions of these random variables to the TPD.
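The logic of contrasting total and partial predictive distributions can be illustrated with a deliberately simplified Monte Carlo toy (not GR4J; the response function and all numbers except the error model parameters, which are loosely based on Table 2, are invented): the total distribution propagates all error sources, while a structural partial distribution fixes the rainfall multipliers and remnant errors at their means and lets only the stochastic parameter vary.

```python
import numpy as np

def toy_decomposition(n=20000, seed=0):
    """Toy total-vs-partial predictive contrast for a made-up runoff
    response q = h(phi * r, x) + eta, with rainfall multiplier phi,
    stochastic parameter x and remnant error eta.
    Returns (total_sd, structural_partial_sd)."""
    rng = np.random.default_rng(seed)
    r = 25.0                                          # observed rainfall (mm)
    phi = rng.normal(1.0, 0.27, n)                    # rainfall multipliers
    x = np.maximum(rng.normal(218.0, 84.0, n), 1.0)   # stochastic parameter
    eta = rng.normal(0.0, 0.5, n)                     # remnant errors
    h = lambda rain, cap: rain * rain / (rain + cap)  # invented response
    total = h(phi * r, x) + eta                       # all error sources random
    partial_struct = h(1.0 * r, x)                    # phi, eta fixed at means
    return total.std(), partial_struct.std()
```

Comparing the spreads of such conditional runs is one way to apportion the total uncertainty among sources, subject to the conditioning caveats noted in the text.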
 Note that this study uses partial predictive intervals in the decomposition of uncertainty. Since these correspond to conditional distributions (see section 2.7), the choice of the conditioning values may affect the decomposition of uncertainty. In the validation analyses presented here, we condition all latent variables on the modal estimate of their mean (the mean of the multiplier distribution in equation (16); see also Figure 1) because it represents the "most likely" estimate of individual latent variables. Note that a PPD derived with such conditioning excludes the effects of random rainfall errors (as intended for a PPD reflecting structural uncertainty only) but includes the effects of systematic rainfall biases (since the posterior mean of the multipliers in general deviates from unity). Further design and interpretation of partial predictive limits will be carried out in a separate development.
 These results suggest that in this particular application a greater reduction in predictive uncertainty can be achieved by improving the hydrological model rather than by improving the accuracy of the rainfall and runoff data. We also stress that insights such as those above could not have been obtained with approaches that do not attempt to isolate structural uncertainty and therefore motivate further research efforts on the decompositional approach.
4.4. Posterior Scrutiny of Error Model Hypotheses
4.4.1. Input Errors (OIS-CS Only)
Figure 10 shows diagnostic plots to scrutinize the rainfall error model in equation (16). In particular, Figures 10a and 10b suggest that the assumption of independent rainfall multipliers from a truncated Gaussian distribution is plausible (however, note the considerable posterior uncertainty).
Figures 10c and 10d yield further insights into the identifiability of rainfall errors. They assess the extent to which the posterior estimates of true rainfall differ from the prior, i.e., whether the rainfall-runoff data and hydrological model contain sufficient information to modify the prior estimates of the true rainfall estimated using CS. Figure 10c compares the 90% credibility intervals of the true rainfall arising from the prior and the posterior distributions. In most cases, these intervals are similar, suggesting that the information content of the calibration data only marginally modifies the prior CS-based estimates of true areal rainfall. A few exceptions can be observed: e.g., on day 136, the posterior is considerably tighter than the prior.
 The contribution of the rainfall-runoff data to the refinement of the rainfall error estimates during the hydrological model calibration can be quantified using an "uncertainty reduction factor" (URF). In this work, the URF is defined as the ratio of the posterior and prior standard deviations of each inferred rainfall multiplier. It can be interpreted as follows: (1) URF ≈ 0 implies a significant reduction of uncertainty in the areal rainfall estimates (high information content in the calibration data); (2) URF > 1 indicates increased uncertainty (e.g., if the calibration data conflict with the prior); (3) URF ≈ 1 indicates that the calibration data have not refined the rainfall error model and the inference of the rainfall multipliers is governed by the prior. Note that case 3 is "noninformative" solely with respect to the inference of rainfall errors and does not imply that the inference of the hydrological model parameters is noninformative or governed by the priors.
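The URF definition above is a one-line computation; a sketch (the function name is ours):

```python
import numpy as np

def uncertainty_reduction_factor(prior_sd, posterior_sd):
    """URF for each inferred rainfall multiplier: ratio of posterior to
    prior standard deviation. URF ~ 0: calibration data strongly refine
    the prior; URF ~ 1: inference governed by the prior; URF > 1: the
    data conflict with the prior."""
    post = np.asarray(posterior_sd, dtype=float)
    prior = np.asarray(prior_sd, dtype=float)
    return post / prior
```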
Figure 10d plots the URFs versus the corresponding R3D rainfall values. Two points can be made.
 1. For large rainfall values (>20 mm), the URFs are mainly between 0.8 and 1, indicating little reduction in uncertainty. This implies that the prior (rather than the data) controls the inference of rainfall errors affecting large events. This is the likely reason for the ill posedness of scheme OIS, which does not use the prior information in equation (15).
 2. URFs for smaller rainfall events are highly variable, with some multipliers having a significant reduction in uncertainty after calibration. Although perhaps unexpected given the low sensitivity of the hydrological model to small rainfalls, such reductions could be explained by the constraint exerted by the error model in equation (16): during calibration, multipliers with a large prior variance will have their posterior variances tightened approximately to the variance of the multiplier distribution (16). Inadequacies of the simple rainfall error model (16) and of the CS replicates for small rainfalls (section 3.2.3) may also be responsible for the differences in the URF patterns.
4.4.2. Structural Errors (OIS-CS Only)
 Similar to the rainfall errors, Figures 11a and 11b suggest that the structural error model based on stochastic variation of x1 at the storm time scale is plausible. However, as noted in section 3.4, it is important to check the evolution of storage with respect to the production store capacity because stochastic variations of parameter x1 may lead to a store content exceeding the store capacity.
Figures 11c and 11d show the evolution of storage during the calibration period. While the store remained consistently below its full capacity, exceedances did occur on some rare occasions. A closer inspection of GR4J [Perrin et al., 2003] suggests two possible problematic scenarios.
 1. If rainfall exceeds PET, a part of the net rainfall fills the production store, with the remainder being routed through unit hydrographs. However, when the store exceeds its capacity, some water is subtracted from the production store. Note that this does not create a water balance error because this overflowing water is simply transferred to the routing components. Moreover, in the 2 year calibration period, overflows due to stochastic variations of x1 amounted to a total of <1.5 mm, which is minor in the overall context of a 2 year runoff volume of nearly 300 mm.
 2. If PET exceeds rainfall, a part of the store content is evaporated. The actual evaporation is computed as a function of the net PET and the store level. While exceeding the store capacity could result in the actual ET exceeding PET, this never occurred in this study.
4.4.3. Residual Diagnostics
Figure 12 shows distributional and autocorrelation diagnostics for the standardized residuals. Note that for all schemes except SLS, the residuals combine runoff and remnant errors (section 2). For those schemes, standardization is therefore performed by dividing the raw residual at time step t by the combined standard deviation sqrt(σ_Q,t² + σ_r²), where σ_Q,t is the standard deviation of the runoff errors (which are heteroscedastic with respect to the runoff magnitude, as shown in equation (17)) and σ_r is the standard deviation of the remnant errors (which, in this case study, are homoscedastic, as shown in equation (8)).
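The standardization and the moment diagnostics of Figures 12a and 12b can be sketched as follows (names are ours; independence of runoff and remnant errors is assumed, as in the text):

```python
import numpy as np

def standardize_residuals(raw_residuals, sigma_q, sigma_r):
    """Divide each raw residual by the combined standard deviation of the
    heteroscedastic runoff errors sigma_q[t] and the homoscedastic remnant
    error sigma_r, assuming the two error sources are independent."""
    raw = np.asarray(raw_residuals, dtype=float)
    sq = np.asarray(sigma_q, dtype=float)
    return raw / np.sqrt(sq ** 2 + sigma_r ** 2)

def sample_skew_kurtosis(z):
    """Sample skewness and excess kurtosis (both zero for a Gaussian)."""
    z = np.asarray(z, dtype=float)
    d = z - z.mean()
    s = d.std()
    return (d ** 3).mean() / s ** 3, (d ** 4).mean() / s ** 4 - 3.0
```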
 Several comments can be made.
 1. Accounting for data errors (OI-CS, OI, and OIS-CS schemes) markedly reduces the skewness and excess kurtosis (Figures 12a and 12b) of the standardized residuals. However, skewness and kurtosis remained statistically significant for all calibration schemes, including OIS-CS. This further discredits the assumption of homoscedastic Gaussian remnant errors and needs to be addressed.
 2. The autocorrelation tends to decrease when more sources of errors are represented explicitly in the inference scheme (Figure 12c). The amount of prior information also appears to be an important factor, with markedly higher autocorrelations for the OI-CS scheme than for the OI scheme. Nevertheless, given appreciable remaining autocorrelation, the remnant error model may need autoregressive components.
5.1. Quantification of Predictive Uncertainty
Section 4.2 indicated that the predictive distribution of runoff was fairly reliable for the OI-CS and OIS-CS schemes. It can be seen that the OI-CS scheme slightly underestimates predictive uncertainty (see Figure 7 and section 4.2), while the OI scheme yields significantly larger estimated input errors and predictive uncertainty. This suggests that the CS prior constrains the input error estimates and reduces their ability to interact with structural errors and compensate for unaccounted errors.
 Arguably the most reliable predictive distribution is obtained with the OIS-CS scheme (section 4.2), which includes the CS prior and an explicit characterization of structural errors using stochastic parameters. Importantly, the OIS scheme, which omits prior information on the rainfall errors, leads to an ill-posed inference (section 4.1). This is consistent with previous findings that priors on rainfall and runoff uncertainty control the well posedness of Bayesian hierarchical inferences in hydrology [Renard et al., 2010]. However, further work is warranted to improve the predictive precision of the OIS-CS scheme. If, as it appears for this case study, structural uncertainty is the dominant uncertainty, improving the predictive precision will likely require tightening the characterization of structural errors, as well as improving the hydrological model.
5.2. Decomposition of Predictive Uncertainty
 The empirical results in section 4.3 suggest that decomposing predictive uncertainty into its contributing sources is possible when independent estimates of rainfall and runoff data uncertainty are available and used in the BATEA inference. The reliability of this decomposition can be examined by considering (1) the reliability of the total predictive distribution in combination with (2) the reliability of the individual data and structural components. However, scrutinizing individual components of a predictive distribution is considerably more challenging than scrutinizing the full predictive uncertainty, as discussed next.
 Direct scrutiny of the estimated contribution of rainfall uncertainty to the uncertainty in the predicted runoff requires accurate areal rainfall estimates. Since this is rarely available, the adequacy of the decomposition can be investigated indirectly by scrutinizing the inferred distribution of latent variables. In this study, this posterior diagnostic was carried out only partially by comparing the inferred and predicted rainfall errors with those suggested by the R3D rain gauge network. Because of the short length of the densely gauged R13H rainfall time series for this catchment, it was used entirely to construct the rainfall error model for the calibration period and was not used to check the rainfall PD in the validation period. In applications where longer periods of densely gauged rainfall are available, it could be partitioned between calibration and validation.
 Future avenues for scrutinizing the rainfall component include comparing inferred rainfall errors with the errors suggested by other sources, such as radar. Although radar estimates are affected by complex measurement errors [e.g., Kirstetter et al., 2010], they can provide spatial information that is not captured by sparse rain gauge networks. For instance, comparing the location of the main mass of a rainstorm with the location of the rain gauges may shed light at least on the sign of the error (i.e., whether the rain gauge network has underestimated or overestimated the areal rainfall).
 Direct validation of the estimated structural uncertainty requires highly accurate forcing and response data, so that structural errors can be isolated. This is seldom achievable in practice, except in densely gauged experimental catchments. However, indirect strategies are possible. For instance, assessing the stability of structural error estimates when different rainfall and runoff data are used provides a useful measure of the interactions between data and structural errors.
5.3. On the Treatment of Structural Error
 The treatment of structural error remains a topic of active research (e.g., see the discussions by Beven and by Doherty and Welter). This study does not aim to compare or improve methods for representing structural errors. Instead, it uses two particular structural error methods as part of a study pursuing error decomposition by exploiting independently derived data error models. We view this as a logical first step before structural error characterization is tackled.
 Many other distinct strategies have been proposed to represent structural errors, including stochastic state errors [Moradkhani et al., 2005a], model-averaging schemes [e.g., Duan et al., 2007; Marshall et al., 2007], multimodel frameworks [e.g., Clark et al., 2008], and other approaches [e.g., Bulygina and Gupta, 2009; Jacquin and Shamseldin, 2007]. Which of these approaches, if any, provide an adequate description of structural errors remains an open question. In particular, some authors have argued that the epistemic nature of structural uncertainty makes it poorly suited to a statistical treatment [e.g., Beven, 2008]. Our view is that in a particular modeling context, such as hydrological modeling, such a proposition is impossible to prove or refute a priori. Yet the extent to which a statistical scheme succeeds in representing structural error can be scrutinized a posteriori by inspecting total and partial predictive uncertainties, applying residual and other diagnostics, etc.
5.4. Limitations and Future Work
 While we are optimistic with respect to the practical feasibility of the Bayesian approach in the context of hydrologic prediction, several significant challenges remain to be tackled.
 First, immediate limitations with respect to data availability are noted. In particular, CS requires a distributed rain gauge network to calibrate the CS parameters and variograms. Applications where no reliable information exists to inform the data error models are unlikely to be suitable for a decomposition of sources of error. This provides a strong argument in favor of continuing measurement and experimental campaigns and improving operational networks.
 Second, the geostatistical rainfall model used in this paper can be improved to overcome the lack of reliability for small rainfall (see section 3.2.4). For example, an approximate classification of rainfall events into more homogeneous rainfall types (e.g., localized convective storms versus frontal rainfall events) could be performed, and the geostatistical properties (e.g., variograms) could be estimated separately for each type. Similarly, orographic effects could be included through a regression with respect to elevation.
 Third, while the error model in equation (16) is geared primarily toward characterizing the errors in the larger rainfall events, the limitations of the multiplicative error model are noted. It is unable to handle errors in zero rainfalls (i.e., for a localized storm not recorded by the rain gauge network) and appears poorly suited for errors in small rainfalls (see section 3.2.3).
 Finally, applications at a subdaily scale would require additional development of the rainfall and runoff error models, particularly including autocorrelation [McMillan et al., 2011].
 Other areas in need of further research attention include the following.
 1. The rating curve error model needs to be generalized to rigorously distinguish between random and systematic rating curve errors and to account for their likely autocorrelation at short time scales. Several options are emerging, including Bayesian approaches [e.g., Moyeed and Clarke, 2005], dynamic schemes [Dottori et al., 2009], and other methods [e.g., McMillan et al., 2010].
 2. The stochastic parameter approach needs further appraisal, and more informative structural error models should be developed. The work by Reichert and Mieleitner [2009], who used an Ornstein-Uhlenbeck process to characterize the time structure of stochastic parameters in lieu of the epoch dependence assumption [Kuczera et al., 2006], is an important advance in this direction.
 4. The understanding of structural errors can be improved. In particular, the use of structural errors to diagnose, compare, or improve hydrological models remains an important area of future research [e.g., Reichert and Mieleitner, 2009; Smith et al., 2008].
 5. The computational implementation is an area of ongoing work [e.g., Kuczera et al., 2010a; Vrugt et al., 2008]. Moreover, given the emerging evidence that in many cases the geometrical complexity of parameter distributions is an artifact of the numerical implementation of the hydrological model, the use of efficient gradient-based schemes for optimization and uncertainty analysis is of interest [e.g., Kavetski and Clark, 2010; Kavetski et al., 2006d].
 The utility of these developments should be scrutinized using stringent posterior diagnostics. In particular, the predictive QQ plot [e.g., Laio and Tamea, 2007; Thyer et al., 2009] and similar reliability checks in combination with appraisals of the predictive precision provide an objective yardstick to empirically evaluate the practical performance of the inference and make quantitative judgments on their suitability for operational purposes.
6. Conclusions
 The application of the Bayesian framework in a real-data case study confirms earlier findings that prior information on data uncertainty is not merely beneficial but essential for a meaningful and reliable quantification and decomposition of the predictive uncertainty. The key findings are as follows.
 1. Simultaneous inference of forcing and structural errors within the hierarchical framework is ill posed unless informative priors on forcing and response uncertainties are specified.
 2. Ignoring sources of error may lead to unreliable predictions. Conversely, including additional error models improved the reliability of the total uncertainty estimates. We stress that this improvement was demonstrated in the validation period and thus is unlikely to be due to potential overfitting.
 3. Including informative priors on rainfall uncertainty demonstrably improves the reliability of runoff predictions (scrutinized in a validation time period) and paves the way for a quantitative decomposition of the total predictive uncertainty into its contributing causes.
 In this study, where the GR4J model was calibrated to a three-gauge daily rainfall observation network in the Yzeron catchment (France), structural uncertainty appears to dominate data uncertainty. This conclusion is likely to be catchment and model dependent. In addition, further work is needed to develop and test techniques for analyzing and communicating partial uncertainties.
 The use of rainfall conditional simulation as part of hydrological model calibration represents a significant advance in the treatment of rainfall uncertainty. Whereas earlier work (including data assimilation approaches and previous applications of BATEA and analogous hierarchical Bayesian methods) relied largely on heuristic rule-of-thumb specifications of rainfall uncertainty, this study demonstrates that conditional simulation can provide more reliable and precise estimates of the uncertainties in individual rainfall measurements and of how these uncertainties vary in time. This demonstrably improves the statistical reliability of the model predictions compared with methods that disregard such information. Perhaps more importantly, approximate decompositions of predictive uncertainty become possible, including separate estimation of the structural errors of the hydrological model.
 More generally, this study takes an important step toward more reliable uncertainty quantification and decomposition, which would be beneficial for many key scientific and operational purposes in hydrological and environmental modeling, including (1) improved probabilistic forecasts and predictions, (2) meaningful hydrological model evaluations unobscured by data errors, and (3) more efficient research and operational resource allocation to reduce predictive uncertainty.
 Given the manifest significance of a robust quantitative understanding of data and modeling uncertainties in environmental studies, further development and implementation of instrumental and statistical procedures is needed to estimate the accuracy and precision of environmental data at the data collection and postprocessing stages. The Bayesian paradigm, with its philosophy of using and refining knowledge of all uncertain quantities (be they model parameters, true forcings, or the error characteristics of the data), provides a very appealing platform for the systematic integration of these insights into environmental model inference and prediction.
Appendix A: Details of the TBM Rainfall Generator
 The models given in Table A1 apply to the point process generating the (unobserved) instantaneous rainfall. Moreover, the variograms are those used to generate the Gaussian random fields and may differ from the empirical variograms of the observed data. The following steps are necessary.
Table A1. Geostatistical Rainfall Models Used in This Case Study
[Table giving, for the indicator field I(x,y,t) and the nonzero rainfall field W(x,y,t), the variogram models (in Gaussian space) and the at-site distribution; the model formulas and the seven parameter symbols were lost in extraction.]
A1. Step 1: Pass From Gaussian to Real Space
 A Gaussian field U = U(x,y,t) generated using the variograms in Table A1 is transformed into a real-space random field R = R(x,y,t) using the following transformations. For the indicator field, the Gaussian field is truncated at a threshold set by the probability of zero rainfall p0,

I(x,y,t) = 1 if Φ(U(x,y,t)) > p0, and 0 otherwise.

For the nonzero field,

W(x,y,t) = F_W^{-1}(Φ(U(x,y,t))),

where Φ is the standard Gaussian CDF and F_W is the CDF of the at-site distribution in Table A1.
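The two Gaussian-to-real-space transformations can be sketched numerically. The gamma at-site distribution and the numerical values below are purely illustrative stand-ins, since the actual distribution and parameters of Table A1 are not reproduced here; only the structure of the transforms (truncation for the indicator, quantile mapping for the nonzero field) follows the text.

```python
import numpy as np
from scipy.stats import norm, gamma

def indicator_from_gaussian(u, p0):
    """Truncate the Gaussian field: rain occurs where Phi(U) exceeds
    the probability of zero rainfall p0 (illustrative threshold rule)."""
    return (norm.cdf(u) > p0).astype(float)

def nonzero_from_gaussian(u, at_site_ppf):
    """W = F_W^{-1}(Phi(U)): map Gaussian quantiles onto the at-site
    rainfall distribution (quantile-quantile anamorphosis)."""
    return at_site_ppf(norm.cdf(u))

u = np.random.default_rng(1).normal(size=10000)      # standard Gaussian field values
i_field = indicator_from_gaussian(u, p0=0.6)          # ~40% wet fraction
w_field = nonzero_from_gaussian(u, gamma(a=2.0, scale=3.0).ppf)
```

Because Φ(U) is uniform for a standard Gaussian U, the transformed values of `w_field` follow the chosen at-site distribution exactly, which is the point of the anamorphosis.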
 These transformations alter the correlations of the transformed field, which will not match those of the Gaussian field. The variograms in real space are therefore derived as follows.
 For the indicator field, an exact formula linking the indicator variogram to the Gaussian variogram and the truncation threshold can be used.
 For the nonzero field, the transformation is assumed to affect only the sill of the variogram,

γ_W(d) = Var[W] γ_U(d),

where γ_U is the unit-sill variogram of the Gaussian field and Var[W] is the variance of the at-site distribution (Table A1). Simulation studies suggest that this is a reasonable hypothesis when the coefficient of variation of the at-site distribution is moderate.
A2. Step 2: Convert Partial Variograms Into the Total Variogram
 Given the variograms of the indicator field I and the nonzero field W, the variogram of the rainfall field Z = IW can be derived, together with the transition variogram between zero and nonzero rainfall [Lepioufle, 2009]. If I and W are independent, the transition variogram does not depend on the distance d [Lepioufle, 2009].
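The composition Z = IW can be illustrated on a synthetic transect, together with the classical Matheron variogram estimator used to examine the spatial structure of the composed field. The white-noise fields, parameter values, and distance bins below are illustrative assumptions, not the study's models.

```python
import numpy as np

def empirical_variogram(values, coords, bins):
    """Classical Matheron estimator: half the mean squared difference
    of pairs whose separation distance falls in each bin."""
    d = np.abs(coords[:, None] - coords[None, :])        # pairwise distances (1-D)
    sq = 0.5 * (values[:, None] - values[None, :]) ** 2  # half squared differences
    out = np.empty(len(bins) - 1)
    for k in range(len(bins) - 1):
        mask = (d >= bins[k]) & (d < bins[k + 1]) & (d > 0)
        out[k] = sq[mask].mean()
    return out

# Illustrative 1-D transect: independent indicator and nonzero fields,
# composed into the rainfall field Z = I * W.
rng = np.random.default_rng(2)
coords = np.linspace(0.0, 100.0, 400)
i_field = (rng.uniform(size=400) > 0.5).astype(float)  # white-noise indicator
w_field = rng.gamma(shape=2.0, scale=3.0, size=400)    # white-noise nonzero field
z = i_field * w_field
gz = empirical_variogram(z, coords, bins=np.array([0.0, 5.0, 10.0, 20.0]))
```

For spatially uncorrelated fields the variogram of Z flattens at its total variance; with the correlated fields of Table A1, the estimator would instead trace the composed variogram of step 2.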
A3. Step 3: Convert Instantaneous Rainfall Into Cumulated Rainfall
 The variogram derived in step 2 describes the instantaneous rainfall field, yet the observed data are rainfall cumulated over a given duration (e.g., 1 h). It is therefore necessary to derive the variogram of the cumulated rainfall field.
 Let (x1, y1) and (x2, y2) be two pixels in the simulation domain, and let t1 and t2 be two time points. Let d be the spatial distance between (x1, y1) and (x2, y2). The spatial variogram of the rainfall cumulated over a duration D can be derived by time regularization of the space-time variogram γ(d, t1 − t2) [e.g., Journel and Huijbregts, 1978]:

γ_D(d) = (1/D²) ∫₀ᴰ ∫₀ᴰ [γ(d, t − t′) − γ(0, t − t′)] dt dt′.
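The time regularization of step 3 can be approximated numerically with a midpoint rule on the double time integral. The separable exponential space-time variogram below is purely illustrative; the study's actual models are those of Table A1.

```python
import numpy as np

def regularized_variogram(gamma_st, d, D, n=200):
    """Approximate the variogram of rainfall cumulated over duration D:
    gamma_D(d) = (1/D^2) * int_0^D int_0^D
                 [gamma_st(d, t - t') - gamma_st(0, t - t')] dt dt',
    using a midpoint rule on an n x n time grid."""
    t = (np.arange(n) + 0.5) * D / n
    tau = np.abs(t[:, None] - t[None, :])  # pairwise time lags |t - t'|
    return (gamma_st(d, tau) - gamma_st(0.0, tau)).mean()

# Illustrative separable space-time variogram with unit sill.
def gamma_st(d, tau, range_d=10.0, range_t=2.0):
    return 1.0 - np.exp(-d / range_d) * np.exp(-tau / range_t)

g5 = regularized_variogram(gamma_st, d=5.0, D=24.0)
g20 = regularized_variogram(gamma_st, d=20.0, D=24.0)
```

As expected, the regularized variogram increases with distance (g5 < g20) and its sill is damped relative to the instantaneous variogram by the temporal averaging over D.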
A4. Step 4: Estimation
 Step 3 yields the spatial variogram of rainfall cumulated over a duration D. The parameters in Table A1 can therefore be estimated by fitting these variograms to the empirical variograms of observed data cumulated over durations D1, …, Dk. A simple least squares fitting criterion is used in this case study, with durations of 1, 3, 6, 12, and 24 h. The inferred parameters are given in Table A2.
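A minimal sketch of the least squares fitting in step 4, using a single synthetic exponential variogram; the actual fit in the study spans several cumulation durations simultaneously, and the model, parameter values, and noise level below are illustrative assumptions.

```python
import numpy as np
from scipy.optimize import curve_fit

def exp_variogram(d, sill, corr_range):
    """Exponential variogram model (illustrative stand-in for Table A1)."""
    return sill * (1.0 - np.exp(-d / corr_range))

# Synthetic "empirical" variogram: model values plus small noise.
d = np.linspace(0.5, 50.0, 25)
true_sill, true_range = 18.0, 12.0
noise = np.random.default_rng(3).normal(scale=0.2, size=d.size)
gamma_emp = exp_variogram(d, true_sill, true_range) + noise

# Ordinary least squares fit of the variogram parameters.
popt, _ = curve_fit(exp_variogram, d, gamma_emp, p0=[10.0, 5.0])
sill_hat, range_hat = popt
```

In the multi-duration case, the residuals from all durations D1, …, Dk would simply be stacked into one least squares objective before minimization.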
Table A2. Estimated Parameters and Simulation Grid for the Geostatistical Models
[Table giving the estimated parameter values of the variograms (in Gaussian space) for the indicator field I(x,y,t) and the nonzero field W(x,y,t), along with the simulation grid; the numerical values were lost in extraction.]
Acknowledgments
 This work was supported by a FAST grant from the Department of Innovation, Industry, Science and Research (Australia), the Ministry of Higher Education and Research (France), and the Ministry of Foreign and European Affairs (France). The helpful comments from Jasper Vrugt, Keith Beven, and three anonymous reviewers substantially improved this paper and are gratefully acknowledged.