## 1. Introduction

[2] Seasonal streamflow forecasts are important for the management of water resources. In Australia, the Bureau of Meteorology (BOM) provides a seasonal streamflow forecasting service. The forecasts are issued at the start of each month and predict the total unregulated inflow (hereinafter called streamflow) volumes for the next 3 months (hereinafter called season) at each forecast site (hereinafter called catchment). The forecasts are produced using a statistical technique, the Bayesian joint probability (BJP) modeling approach [*Wang et al*., 2009; *Wang and Robertson*, 2011]. Separate forecasting models are established for each catchment and each overlapping season to account for spatial variability and seasonality.

[3] The seasonal streamflow forecasts exploit two sources of predictability. The first is the amount of water held in a catchment (as snow, in surface storages, in the soil and in groundwater) at the time the forecast is made, which we term “catchment wetness”. The second is the climate during the forecast period. In the operational forecasting system, one predictor is used to represent each source of predictability. The predictors used in the forecasting models are chosen from a pool of candidates according to their predictive performance in retrospective cross-validation forecasts [*Robertson and Wang*, 2012a]. Candidate predictors representing catchment wetness include observed antecedent rainfall and streamflow totals. Candidate predictors representing the climate during the forecast period include lagged climate indices.

[4] Choosing the best predictors can be problematic. A choice based on historical data is subject to sampling error, especially when the underlying relationships are relatively weak [*Wang et al*., 2012a]. In addition, several competing models can show similar overall performance but produce quite different forecasts for individual events. Choosing only one model excludes other plausible models and ignores model uncertainty [*Beven and Binley*, 1992; *Raftery et al*., 2005; *Wang et al*., 2012a].

[5] An alternative is to include all candidate models and weight each model according to its predictive performance [*Casanova and Ahrens*, 2009; *Wang et al*., 2012a]. Several studies have pointed out the advantages of combining forecasts from multiple models [*Casey*, 1995; *Rajagopalan et al*., 2002; *Coelho et al*., 2004; *Luo et al*., 2007; *Stephenson et al*., 2005; *Regonda et al*., 2006; *Devineni et al*., 2008; *Bracken et al*., 2010; *Wang et al*., 2012a]. For example, *Rajagopalan et al*. [2002] used a Bayesian approach to combine categorical climate forecasts derived from the forecast ensembles of a number of general circulation models (GCMs). The posterior probabilities of the combined forecast, calculated assuming a multinomial process, showed improvement over individual model forecasts. *Bracken et al*. [2010] and *Regonda et al*. [2006] combined forecasts from multiple candidate models based on an objective criterion measuring the “predictive risk” of the candidate models, and reported that forecast combination improved seasonal streamflow forecast performance compared with using only the “best” model.

[6] Among the various methods of model combination, Bayesian model averaging (BMA) has been reported to be an effective way of combining forecasts from multiple models [*Hoeting et al*., 1999; *Neuman*, 2003; *Raftery et al*., 1997, 2005; *Ajami et al*., 2007]. In the classical BMA approach, model weights are based on posterior model probabilities [*Hoeting et al*., 1999; *Neuman*, 2003]. However, BMA can also be formulated as a mixture model problem, in which the weights are derived by maximizing the likelihood function of the combined model. The maximization can be achieved using the expectation-maximization (EM) algorithm [*Raftery et al*., 1997, 2005; *Ajami et al*., 2007].
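To make the mixture-model formulation concrete, the following minimal Python sketch (NumPy only) runs EM on a matrix `f` of predictive densities, where `f[t, k]` is the density that candidate model k assigns to the observed value in event t. The combined density for event t is taken to be the weighted sum of the model densities, and EM maximizes the log-likelihood of this mixture with respect to the weights. The function names `normal_pdf` and `bma_em` and the two Gaussian toy models are hypothetical illustrations, not the operational system; in practice each candidate model would supply its own predictive densities.

```python
import numpy as np


def normal_pdf(y, mean, sd):
    """Gaussian density, evaluated elementwise."""
    return np.exp(-0.5 * ((y - mean) / sd) ** 2) / (sd * np.sqrt(2.0 * np.pi))


def bma_em(f, n_iter=200):
    """EM estimate of mixture weights w maximising sum_t log(sum_k w_k * f[t, k]).

    f[t, k] is the predictive density that model k assigns to the observed
    value of event t (array of shape T x K, all entries positive).
    Returns the weights and the log-likelihood recorded before each update.
    """
    _, K = f.shape
    w = np.full(K, 1.0 / K)  # start from equal weights
    loglik = []
    for _ in range(n_iter):
        mix = f @ w  # combined (mixture) density for each event
        loglik.append(np.log(mix).sum())
        z = f * w / mix[:, None]  # E-step: responsibility of model k for event t
        w = z.mean(axis=0)  # M-step: new weight = mean responsibility
    return w, np.array(loglik)


# Toy check: observations come from N(0, 1); model A predicts N(0, 1),
# model B predicts N(2, 1), so A should receive most of the weight.
rng = np.random.default_rng(0)
y = rng.normal(0.0, 1.0, size=200)
f = np.column_stack([normal_pdf(y, 0.0, 1.0), normal_pdf(y, 2.0, 1.0)])
w, loglik = bma_em(f)
print(w.round(3), loglik[-1].round(2))
```

Because the component densities are held fixed, the mixture log-likelihood is concave in the weights, so EM climbs monotonically toward the maximum-likelihood weights; the recorded `loglik` sequence should never decrease.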

[7] *Wang et al*. [2012a] further developed the mixture model approach to BMA. First, they applied a prior to the weights that slightly favors more evenly distributed weights (and thus consensus forecasts). The prior yields more stable weights in the face of large sampling uncertainty. Second, they used a cross-validation likelihood function, instead of the classical likelihood function, to maximize the “predictive” rather than the “fitting” ability of the combined model [*Shinozaki et al*., 2010]. *Wang et al*. [2012a] applied BMA to combine seasonal rainfall forecasts from multiple statistical models and showed that the combined rainfall forecasts are superior to forecasts from the best individual model.
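The two modifications described in this paragraph can be illustrated with a minimal Python sketch, under stated assumptions. The cross-validation likelihood is represented by leave-one-out densities, so each event is scored by models fitted without it. The prior is assumed here to be a symmetric Dirichlet with parameter a = 1 + alpha0/K, which adds (a - 1) pseudo-counts to every model in the M-step and so nudges the weights toward equality; the exact prior form and the value of `alpha0` are assumptions for illustration. The leave-one-out linear-Gaussian regression in `loo_densities` is a simple stand-in swapped in for the BJP models and is not the BJP approach; all function names and the synthetic data are hypothetical.

```python
import numpy as np


def normal_pdf(y, mean, sd):
    """Gaussian density, evaluated elementwise."""
    return np.exp(-0.5 * ((y - mean) / sd) ** 2) / (sd * np.sqrt(2.0 * np.pi))


def loo_densities(y, x):
    """Leave-one-out predictive density of y[t] given predictor x.

    Hypothetical stand-in model (NOT BJP): ordinary least squares with an
    intercept, refitted without event t, with a Gaussian predictive density.
    """
    n = len(y)
    dens = np.empty(n)
    for t in range(n):
        keep = np.arange(n) != t
        X = np.column_stack([np.ones(n - 1), x[keep]])
        beta, *_ = np.linalg.lstsq(X, y[keep], rcond=None)
        resid = y[keep] - X @ beta
        sd = np.sqrt(resid @ resid / (n - 3))  # n - 1 points, 2 fitted coefficients
        dens[t] = normal_pdf(y[t], beta[0] + beta[1] * x[t], sd)
    return dens


def bma_weights_map(f, alpha0=1.0, n_iter=1000, tol=1e-10):
    """MAP estimate of mixture weights under an assumed symmetric Dirichlet prior.

    f[t, k]: cross-validated density of model k at the observation of event t.
    a = 1 + alpha0 / K; a > 1 favours more even weights, alpha0 = 0 gives
    plain maximum-likelihood EM.
    """
    T, K = f.shape
    a = 1.0 + alpha0 / K
    w = np.full(K, 1.0 / K)
    prev = -np.inf
    for _ in range(n_iter):
        z = f * w / (f @ w)[:, None]  # E-step: responsibilities
        # M-step with (a - 1) pseudo-counts added to every model
        w = (z.sum(axis=0) + a - 1.0) / (T + K * (a - 1.0))
        # log posterior up to a constant: CV log-likelihood (+ log prior if a > 1)
        obj = np.log(f @ w).sum()
        if a > 1.0:
            obj += (a - 1.0) * np.log(w).sum()
        if obj - prev < tol:
            break
        prev = obj
    return w


rng = np.random.default_rng(1)
n_years = 40
x = rng.normal(size=(n_years, 3))  # three candidate predictors
y = x[:, 0] + 0.5 * rng.normal(size=n_years)  # only the first one is informative
f = np.column_stack([loo_densities(y, x[:, k]) for k in range(3)])
w = bma_weights_map(f, alpha0=1.0)
w_ml = bma_weights_map(f, alpha0=0.0)
print(w.round(3), w_ml.round(3))
```

Setting `alpha0=0.0` recovers the unpenalized maximum-likelihood EM update, so comparing `w` with `w_ml` shows how strongly the prior evens out the weights for a short record.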

[8] While the approach of combining forecasts from multiple models can be effective, it relies on candidate models that perform well individually. Recent studies show that it is possible to improve seasonal streamflow forecasts by incorporating outputs of dynamical hydrological models into statistical forecasting methods [*Robertson et al*., 2013; *Rosenberg et al*., 2011]. Hydrological models capture some of the catchment physical processes and can therefore better represent catchment wetness than simply using antecedent streamflow or rainfall as a predictor.

[9] In the BJP models currently used by the BOM, lagged climate indices serve as predictors of future climate. While there is evidence that lagged climate indices can be useful for forecasting seasonal rainfall, the achievable forecast skill is modest [*Schepen et al*., 2012b; *Wang et al*., 2012a]. Indeed, *Robertson and Wang* [2012b] and *Robertson et al*. [2013] showed that most of the seasonal streamflow forecast skill comes from knowledge of the initial catchment wetness. A natural question is therefore whether seasonal streamflow forecasts can be further improved by incorporating predictions from dynamical seasonal climate forecast models.

[10] Dynamical climate models simulate the evolution of the climate system using physically based mathematical representations of the atmosphere, land and oceans. They predict dominant modes of climate variability such as the El Niño Southern Oscillation (ENSO) and the Indian Ocean Dipole (IOD) [*Lim et al*., 2009, 2011]. They also explicitly model transient processes and capture concurrent relationships between forecast variables, such as rainfall and sea surface temperatures (SST) [*Schepen et al*., 2012b]. Dynamical climate models can therefore be expected to predict climate better than statistical models that simply use lagged climate indices as predictors.

[11] However, seasonal climate forecasting with dynamical models is challenging, and dynamical models often do not produce predictions that are more skillful than those of statistical models. Dynamical climate predictions are usually biased and have overconfident estimates of forecast uncertainty [*Schepen et al*., 2012a; *Lim et al*., 2011]. *Rajagopalan et al*. [2002] reported that climate predictions from a combination of a number of global climate models were superior to climatology in only a few regions of the world. *Schepen et al*. [2012b] reported that forecasts from “calibrated” dynamical climate models performed comparably to, yet “differently” from, their statistical counterparts, and combining the two has been shown to extend the spatial and temporal coverage of forecast skill [*Schepen et al*., 2012a, 2012b].

[12] In this study, we investigate two strategies for improving the statistical method currently used by the BOM for seasonal streamflow forecasting. The first strategy is to use BMA to combine forecasts from multiple candidate models (established using the BJP modeling approach) that use lagged climate indices as predictors, instead of selecting the single “best” forecast as is currently done. We use the BMA approach of *Wang et al*. [2012a] for forecast combination. The second strategy is to exploit a global climate model's direct simulation of various dynamical processes by using its rainfall and SST predictions as predictors to establish additional BJP candidate models, which are then added to the existing pool for forecast combination.

[13] The next section describes the catchments and hydrological data. Section 3 describes the BJP modeling approach and the candidate models, including the predictors used to establish them. Section 4 describes four different methods used to produce the seasonal streamflow forecasts. Section 5 assesses the skill and reliability of the forecasts. Section 6 presents further analysis and discussion, and section 7 summarizes the study and presents overall conclusions.