## 1. Introduction

[2] Model simulations and predictions are subject to various uncertainties and sources of forecast error. Uncertainties may stem from model initialization because of incomplete data coverage, observation errors, or improper data assimilation procedures. Other sources of uncertainty in prediction are associated with model input (i.e., forcing data) and with imperfections of the model structure itself, mainly due to parameterization and spatiotemporal discretization of physical processes. Model structure uncertainty is associated with the assumptions reflected in model conceptualization and mathematical structure. An unfortunate truth in model development is that no matter how many resources are invested in developing a particular model, there remain conditions and situations in which the model cannot give an accurate forecast. Reliance on a single model therefore tends to produce overconfident forecasts and increases their statistical bias.

[3] Model-averaging techniques seek to overcome the limitations of a single model by linearly combining the forecasts of several competing models into a single new forecast. The approach dates back to work by *Bates and Granger* [1969], who applied model averaging in economic forecasting and showed that a pooled forecast from competing models outperformed any single model's forecast.
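To illustrate the linear pooling idea (this is an illustrative sketch, not code from any of the cited studies), the following computes Bates-Granger weights, which are inversely proportional to each model's mean squared error; the function name and the synthetic data are our own assumptions:

```python
import numpy as np

def bates_granger_weights(forecasts, obs):
    """Weights inversely proportional to each model's mean squared error."""
    mse = np.mean((forecasts - obs[:, None]) ** 2, axis=0)
    inv = 1.0 / mse
    return inv / inv.sum()

# Toy example: model 1 is biased low, model 2 is unbiased but noisy.
rng = np.random.default_rng(0)
obs = rng.gamma(2.0, 5.0, size=200)                      # synthetic "streamflow"
forecasts = np.column_stack([0.8 * obs + rng.normal(0.0, 1.0, 200),
                             obs + rng.normal(0.0, 3.0, 200)])
w = bates_granger_weights(forecasts, obs)
pooled = forecasts @ w                                    # combined point forecast
```

Because the pooled forecast is a convex combination of the individual forecasts, its mean squared error can never exceed that of the worst ensemble member over the training data.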

[4] The early applications of model averaging for hydrological systems resulted in a point forecast [*Shamseldin et al.*, 1997]. Techniques such as equal-weight, Granger-Ramanathan, and Bates-Granger averaging [*Granger and Ramanathan*, 1984; *Diks and Vrugt*, 2010] linearly combine the deterministic model outputs into another single-point deterministic forecast. *Doblas-Reyes et al.* [2005] extended these approaches by using a multiple linear regression model to compute model weights while assuming a Gaussian model error distribution, which allows for a probabilistic performance evaluation. Although these techniques outperform any single model's predictions, *Hoeting et al.* [1999] and *Raftery et al.* [2005] argued that the resulting weights are unintuitive and do not necessarily reflect the strength of a particular model's performance. As an alternative approach that overcomes these criticisms, *Hoeting et al.* [1999] considered Bayesian model averaging (BMA), a technique that weights each model by its performance and its likelihood of predicting the observation, resulting in a probabilistic forecast. The model weights are nonnegative and sum to 1, and therefore act as a probability measure of each model's likelihood of success. *Raftery et al.* [2005] applied these techniques to an ensemble of meteorological models, while others [*Duan et al.*, 2007; *Vrugt and Robinson*, 2007; *Rojas et al.*, 2008] applied them to hydrological models.
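Because the BMA weights are nonnegative and sum to 1, the resulting forecast is a finite mixture density centered on the individual model forecasts. A minimal sketch with hypothetical weights, forecasts, and spreads (all numbers below are illustrative assumptions):

```python
import numpy as np

def gaussian_pdf(y, mu, sigma):
    return np.exp(-0.5 * ((y - mu) / sigma) ** 2) / (sigma * np.sqrt(2.0 * np.pi))

def bma_density(y, forecasts, weights, sigmas):
    """BMA predictive PDF: a mixture of Gaussians, one per model,
    mixed with nonnegative weights that sum to 1."""
    return sum(w * gaussian_pdf(y, f, s)
               for w, f, s in zip(weights, forecasts, sigmas))

weights = np.array([0.5, 0.3, 0.2])       # hypothetical BMA weights
forecasts = np.array([10.0, 12.0, 9.0])   # three models' point forecasts
sigmas = np.array([1.0, 2.0, 1.5])        # per-model predictive spreads
y = np.linspace(0.0, 20.0, 4001)
pdf = bma_density(y, forecasts, weights, sigmas)
bma_mean = weights @ forecasts            # mixture mean = weighted point forecast
```

The mixture integrates to 1 like any probability density, and its mean is simply the weighted combination of the individual point forecasts.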

[5] *Duan et al.* [2007] showed that BMA techniques outperform, or are comparable to, an individual model forecast under a variety of pointwise performance measures applied to a number of conceptual rainfall-runoff (CRR) models. These results were obtained by using a single training period to determine the weight of each model. Marked improvements were noted when the training period was split into specific flow regimes, thereby accentuating particular models' strengths by increasing their weights during the corresponding flow sequence. Others [*Raftery et al.*, 2005; *Vrugt and Robinson*, 2007] used a sliding window of training periods to estimate the BMA weights. Another Bayesian formulation, the hierarchical mixture of experts, was applied by *Marshall et al.* [2007]; it dynamically adjusts the model weights based on the watershed's predictor variables, such as system storage. Recently, *Hsu et al.* [2009] introduced the sequential Bayesian combination approach as an alternative to BMA, using sequential Bayes' law to recursively update the posterior probability of each model's likelihood function as new observations arrive. The posterior probabilities act as weights for the multimodel averaging.
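The recursive updating of model probabilities can be sketched in a few lines; this is an illustrative simplification with Gaussian likelihoods and made-up forecasts, not the implementation of *Hsu et al.* [2009]:

```python
import numpy as np

def gaussian_like(obs, forecasts, sigma):
    return np.exp(-0.5 * ((obs - forecasts) / sigma) ** 2) / (sigma * np.sqrt(2.0 * np.pi))

def sequential_update(weights, likelihoods):
    """One recursive Bayes step: prior model probabilities are reweighted
    by each model's likelihood of the new observation, then renormalized."""
    posterior = weights * likelihoods
    return posterior / posterior.sum()

w = np.full(3, 1.0 / 3.0)                  # uninformative prior over 3 models
steps = [(10.2, np.array([10.0, 12.0, 8.0])),
         (10.8, np.array([10.5, 11.5, 9.0])),
         (11.1, np.array([11.0, 12.5, 9.5]))]
for obs, fc in steps:                      # (observation, per-model forecasts)
    w = sequential_update(w, gaussian_like(obs, fc, sigma=1.0))
```

After a few observations, the posterior probability mass concentrates on the model whose forecasts track the observations most closely, and these probabilities serve directly as averaging weights.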

[6] To apply BMA techniques to deterministic models, some measure of the uncertainty surrounding the model forecast is required. This uncertainty arises from a mixture of errors in the observations and in the model state estimates. Current techniques assume a fixed probability density function (PDF) to represent the uncertainty in the forecast. The parameters of this PDF are estimated either from the model's prediction errors with respect to the observations during a calibration period [*Hsu et al.*, 2009] or by optimization techniques such as the expectation-maximization algorithm [e.g., *Raftery et al.*, 2005; *Ajami et al.*, 2007; *Duan et al.*, 2007]. Sequential data assimilation techniques, however, provide a more direct way of accounting for this uncertainty, while having the potential to reduce it.
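For concreteness, a bare-bones sketch of expectation-maximization for BMA weights with a single shared Gaussian error variance (loosely in the spirit of the cited formulations; refinements such as bias correction and per-model variances are omitted, and the synthetic data are our own assumptions):

```python
import numpy as np

def em_bma(forecasts, obs, n_iter=100):
    """EM for BMA weights and a shared Gaussian error variance.
    forecasts: (T, K) array of K models' forecasts; obs: (T,) observations."""
    T, K = forecasts.shape
    w = np.full(K, 1.0 / K)
    var = np.var(obs[:, None] - forecasts)
    for _ in range(n_iter):
        # E-step: responsibility of model k for observation t
        dens = np.exp(-0.5 * (obs[:, None] - forecasts) ** 2 / var)
        dens /= np.sqrt(2.0 * np.pi * var)
        z = w * dens
        z /= z.sum(axis=1, keepdims=True)
        # M-step: re-estimate weights and the shared variance
        w = z.mean(axis=0)
        var = np.sum(z * (obs[:, None] - forecasts) ** 2) / T
    return w, var

# Synthetic check: model 1 tracks the observations far better than model 2.
rng = np.random.default_rng(1)
obs = rng.normal(10.0, 2.0, size=300)
forecasts = np.column_stack([obs + rng.normal(0.0, 0.5, 300),
                             obs + rng.normal(0.0, 3.0, 300)])
w, var = em_bma(forecasts, obs)
```

EM alternates between attributing each observation to the models (E-step) and re-estimating the weights and spread from those attributions (M-step), so the better-performing model ends up with the larger weight.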

[7] Sequential data assimilation techniques merge observations, as they become available, with the dynamic model to update and correct the forecast of the model states. The ensemble Kalman filter (EnKF), the most commonly used ensemble data assimilation method in hydrometeorologic forecasting, relies on Gaussian distributions to represent the model error and observation uncertainties by randomly sampling from each distribution. Assuming that the model states are linearly correlated with the prediction, the assimilation process corrects the uncertain model states to build a posterior distribution [*Moradkhani and Sorooshian*, 2008]. Another data assimilation technique, the particle filter (PF), was developed to relax these assumptions, allowing non-Gaussian error distributions and providing a complete representation of the forecast density [e.g., *Moradkhani et al.*, 2005a, 2006; *Weerts and El Serafy*, 2006; *Matgen et al.*, 2010]. *Moradkhani et al.* [2005a] extended this approach to dual state-parameter estimation, in which state and parameter estimation are performed concurrently, accounting for interdependencies among the state variables and parameters during model simulation. *Leisenring and Moradkhani* [2010] extended the application of the PF to snow water equivalent estimation. In a parallel effort, *DeChant and Moradkhani* [2011a] assimilated brightness temperature from the AMSR-E satellite into the National Weather Service (NWS) SNOW-17 model within a particle filtering framework; the resulting snowmelt fluxes were used as forcing data for the Sacramento Soil Moisture Accounting Model (SAC-SMA) to improve streamflow forecasting. *Montzka et al.* [2011] used the PF to update both hydraulic parameters and state variables, exploring the potential of surface soil moisture measurements from different satellite platforms for retrieving soil moisture profiles and soil hydraulic properties with the HYDRUS-1D model. More recently, *DeChant and Moradkhani* [2011b] combined PF data assimilation with the NWS ensemble streamflow prediction (ESP) system to more accurately and reliably characterize the initial condition uncertainty when generating the ensemble of streamflow forecasts.
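A single analysis step of a sequential-importance-resampling particle filter can be sketched as follows; the scalar state, identity observation operator, and toy numbers are illustrative assumptions rather than any of the cited implementations:

```python
import numpy as np

def pf_analysis(particles, obs, obs_sigma, rng):
    """One SIR particle filter step: weight the prior (forecast) particles
    by the observation likelihood, then resample to avoid weight degeneracy."""
    w = np.exp(-0.5 * ((obs - particles) / obs_sigma) ** 2)
    w /= w.sum()
    idx = rng.choice(particles.size, size=particles.size, p=w)
    return particles[idx]

rng = np.random.default_rng(42)
prior = rng.normal(5.0, 2.0, size=5000)    # forecast (prior) state ensemble
posterior = pf_analysis(prior, obs=7.0, obs_sigma=0.5, rng=rng)
```

Particles far from the observation receive negligible weight and are rarely resampled, so the posterior ensemble shifts toward the observation and its spread shrinks; no Gaussian assumption is placed on the prior ensemble itself.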

[8] Studies applying BMA strategies to CRR models have to date focused primarily on performance over multiple years and hydrologic situations. In a more theoretical study of temperature forecasts, *Weigel et al.* [2008] showed that prediction skill improves with multimodel averaging schemes only when the individual models are overconfident. Since the confidence in individual CRR models is highly correlated with precipitation pulses, this suggests that the performance of different BMA strategies applied to CRR models might also change with different stages of the hydrograph.

[9] In this paper, we compare several BMA strategies applied to CRR models and evaluate their forecast skill over a range of stages of the hydrograph. In section 2, a brief description of the different BMA strategies considered in this study is provided. In section 3, we outline the experimental design, including the individual models used and a brief description of the study area. In section 4, the performance measures used in our analysis are described and applied over the entire validation period, followed by an analysis over different ranges of volatility in the hydrograph. Section 5 contains the summary and conclusions of our analysis.