Effect of different uncertainty sources on the skill of 10 day ensemble low flow forecasts for two hydrological models
Mehmet C. Demirel,
Water Engineering and Management, Faculty of Engineering Technology, University of Twente, Enschede, Netherlands
Corresponding author: M. C. Demirel, Water Engineering and Management, Faculty of Engineering Technology, University of Twente, PO Box 217, NL-7500 AE Enschede, Netherlands. (email@example.com)
 This paper investigates the effect of uncertainty originating from model inputs, parameters, and initial conditions on 10 day ensemble low flow forecasts. Two hydrological models, GR4J and HBV, are applied to the Moselle River, and their performance in the calibration, validation, and forecast periods, as well as the effect of different uncertainty sources on the quality of low flow forecasts, are compared. The forecasts are generated by using meteorological ensemble forecasts as input to GR4J and HBV. The ensembles provide the uncertainty range for the model inputs. The Generalized Likelihood Uncertainty Estimation (GLUE) approach is used to estimate parameter uncertainty. The quality of the probabilistic low flow forecasts is assessed by the relative confidence interval, reliability, and hit/false alarm rates. The daily observed low flows are mostly captured by the 90% confidence interval for both models. However, GR4J usually overestimates low flows, whereas HBV is prone to underestimate them, particularly when the parameter uncertainty is included in the forecasts. The total uncertainty in GR4J outputs is higher than in HBV. The forecasts issued by HBV incorporating input uncertainty result in the most reliable forecast distribution. The parameter uncertainty is the main reason for the reduced number of hits. The number of false alarms in GR4J is twice that in HBV when considering all uncertainty sources. The results of this study show that the parameter uncertainty has the largest effect, whereas the input uncertainty has the smallest effect, on the medium-range low flow forecasts.
 Rain-fed rivers in Western and Central Europe have a discharge regime with high flows in winter and low flows in late summer due to the temperate climate. The rivers, e.g., the River Rhine, are generally navigable throughout the year, a situation which has contributed to the region's industrial and trade development. The rivers are used for drinking water supply, irrigation, industrial use, power production, and freight shipment and also fulfill ecological and recreational functions [De Wit et al., 2007]. Floods and low flows are seasonal phenomena that may cause several problems to society. Since floods are eye-catching, quick, and violent events risking human life, contingency plans and water management bodies often focus on flood issues. In contrast, low flows are slowly developing events affecting a much larger area than floods. There is a growing concern that low flows will intensify due to climate change [Arnell, 1999; Grabs et al., 1997; Hagemann et al., 2008; Middelkoop et al., 2001]. Low flows in rivers may negatively affect the above-mentioned river functions. Severe problems, e.g., water scarcity for drinking water supply and power production, hindrance to navigation and deterioration of water quality, have already been experienced during low flow events in the River Rhine in the dry summers such as in 1969, 1976, 1985, and 2003, indicating the importance of considering these events in addition to flood events.
 To anticipate possible low flow events, it is crucial that 10 day low flow forecasts become available in addition to short-range (1–4 days) forecasts. The forecasted low flow is commonly given as a single value, even though it is uncertain. There is an increasing interest in accounting for uncertain information in decision support systems, e.g., how to operate river navigation and power plants during low flow periods to maximize the gain. One challenge is to develop systems that can use uncertain information [Engeland et al., 2010]. We are interested in forecasting low flows with a lead time of 10 days and in presenting the corresponding uncertainty to provide low flow information to major river users. This study focuses on assessing the uncertainty in ensemble 10 day low flow forecasts for two conceptual hydrological models.
 Systematic uncertainty analysis in hydrological modeling is an important field in hydrology, as shown by the numerous recent contributions in well-known journals [Cunha et al., 2012; Rossa et al., 2011; Salamon and Feyen, 2009; Tolson and Shoemaker, 2008]. Uncertainty assessment has been one of the main goals of the Prediction in Ungauged Basins initiative promoted by the International Association of Hydrological Sciences [Montanari, 2011]. Similarly, the Hydrological Ensemble Prediction Experiment, another international initiative, published a special issue on the results of the intercomparison experiment for postprocessing techniques for ensemble forecasts [Van Andel et al., 2012]. Systematic uncertainty analysis consists of several steps, such as classification, importance assessment, quantification, uncertainty propagation through the model, and finally, communication of the uncertainty to the end users. After the identification of the main sources of uncertainty [Ewen et al., 2006], these sources must be classified. There are many different approaches for source classification. For instance, Walker et al. (2003) classified uncertainty as originating from model context, input, model structure, and parameters. Other studies distinguished the uncertainty in observations, instruments, and the context of the problem, expert judgment, and indicators [Janssen et al., 2005; Van der Sluijs et al., 2005; Warmink et al., 2011]. It is commonly accepted that model inputs, parameters, initial conditions, and structure are the major sources of uncertainty in conceptual hydrological models [Refsgaard et al., 2006; Zappa et al., 2011]. We focus on these sources for further analysis. Quantification of the uncertainty sources is probably the most difficult step of the uncertainty analysis.
 Uncertainty in forecasted input data, e.g., precipitation and temperature, stems mainly from the assumptions and simplifications made when describing atmospheric processes in weather forecast models. In particular, future precipitation amounts are assumed to be very uncertain [Cunha et al., 2012; Roulin, 2007]. To quantify the uncertainty in weather forecasts, an ensemble of lower-resolution forecasts (ENS) has been developed by the European Centre for Medium-Range Weather Forecasts (ECMWF) and other national meteorological services [European Centre for Medium-Range Weather Forecasts, 2012]. The system has been operational since 1992, and a number of modifications have been made to its structure and grid resolution to improve the numerical weather predictions. In this system, there are 50 different perturbed weather forecasts and an unperturbed control forecast. The 50 members, comprising an ensemble, are computed for a lead time of 15 days using perturbed initial conditions and model physics [Pappenberger et al., 2005; Roulin and Vannitsem, 2005]. Each member of an ensemble is assumed to be equally probable and to provide useful information on the uncertainty in future precipitation amounts [Roulin, 2007]. However, in the context of flow forecasting it is important to assess the precipitation uncertainty in terms of the effect on runoff rather than in terms of comparing forecasted precipitation against observed precipitation [Arnaud et al., 2011; Nester et al., 2012]. For example, Pappenberger et al.  and more recently Pappenberger et al.  used meteorological ensembles in hydrological models with different parameter settings to assess the uncertainty in flood forecasts. Similarly, Randrianasolo et al.  coupled weather ensemble prediction system products from Météo-France with two hydrological models for forecasting discharges of 211 catchments in France for a lead time of 2 days.
 Obviously, there are other sources of uncertainty in low flow forecasts in addition to the model input [Meißner et al., 2012; Zappa et al., 2011]. Hydrological models, whether using observed [Cunha et al., 2012] or forecasted [Nester et al., 2012] rainfall, are also limited by their capacity to represent the dominant processes in the river basin with appropriate spatial and temporal scales. Effective values of model parameters, affected by local spatial heterogeneities and nonstationarities, usually provide only loose associations with dominant processes [Lawal et al., 1997; Pappenberger et al., 2005; Stravs and Brilly, 2007]. Therefore, the uncertainty due to model parameters will inevitably influence model outputs. There is a range of methods for quantifying model parameter errors, including Monte Carlo simulations and analytical approaches [Montanari and Grossi, 2008]. Generalized likelihood uncertainty estimation (GLUE) is a Monte Carlo based technique developed for calibration and uncertainty estimation of predictive models using the equifinality concept [Beven and Freer, 2001; Stedinger et al., 2008; Viola et al., 2009]. Concerning the choice of the likelihood measure, Beven and Binley  pointed out that many likelihood measures in GLUE can be appropriate for a given application. Jin et al.  compared different likelihood measures and the resulting model uncertainty. They found that a less strict likelihood function leads to a wider confidence interval of the output uncertainty. Therefore, neither a too strict nor a too relaxed likelihood is appropriate for the GLUE assessment. The GLUE method has been widely used for flood forecasting [Pappenberger et al., 2004, 2005] and for simulation of both high and low flows [Freer et al., 1996; Tolson and Shoemaker, 2008; Vázquez et al., 2009; Viola et al., 2009; Tian et al., 2012]. In addition, the GLUE method is simple and relatively easy to implement. Therefore, GLUE is used in this study for model calibration and uncertainty analysis.
 The drawbacks and advantages of the GLUE method have been extensively discussed in the hydrology literature [Beven and Young, 2003; Beven et al., 2007, 2008; Li et al., 2010; Mantovan and Todini, 2006; Montanari, 2005; Stedinger et al., 2008]. Beven et al.  showed that if a correct formal likelihood was used in the GLUE method, the results would be identical to those of the formal Bayesian technique. Stedinger et al.  also showed that GLUE can produce meaningful uncertainty and prediction intervals using a correct likelihood function. Beven  questions whether the assumptions used in a formal Bayesian analysis are valid for any nonsynthetic hydrologic system being modeled. In our ensemble low flow forecasting case study, a formal Bayesian method would require the very difficult derivation of a correct description of the residual errors, which are correlated in time and space. As a result, our study utilizes an elaborated low flow likelihood function within the GLUE method to assess parameter uncertainty.
 Uncertainty in the initial conditions of state variables can have a significant effect on low flow forecasts. Summer forecasts from a model initialized with an unsaturated soil state will be very different from those initialized with a saturated soil. During prolonged dry periods the discharge largely originates from the release of groundwater storage [De Wit et al., 2007; Tallaksen and Van Lanen, 2004]. Therefore, uncertainty coming from the model initial conditions, and groundwater storage in particular, should be treated separately from the model parameter uncertainty, although model parameters and states are part of the model structure [Butts et al., 2004]. The uncertainties may be amplified when cascaded through the hydrological model [Nester et al., 2012]. Komma et al.  showed that small errors in rainfall may result in larger errors in model outputs. They showed that an uncertainty range of 70% in the precipitation ensemble increased to an uncertainty range of 200% in forecasted runoff with a lead time of 48 h. Although they related this to the nonlinearity of the catchment response, this amplification could also be caused by uncertainties from precipitation measurements and model parameters [Nester et al., 2012]. Komma et al.  also showed that the assessment of precipitation uncertainty should be in terms of the effect on model outputs (herein forecasted low flows) instead of comparing only forecasted precipitation and observed precipitation.
 Different methods, such as the particle filter [DeChant and Moradkhani, 2011] and the ensemble Kalman filter [Pasetto et al., 2012], have been applied to assess initial condition uncertainty in the framework of ensemble streamflow prediction. The resulting update is similar to the static GLUE application [Pasetto et al., 2012]. These filters are examples of data assimilation techniques that are often used in short-term flood forecasts [Liu et al., 2012; Moradkhani et al., 2012; Parrish et al., 2012].
 The aforementioned studies demonstrate the need for a systematic uncertainty analysis framework that isolates uncertainties due to various weather inputs, parameter estimation, and initial conditions. Understanding the relative contributions of these sources to the total low flow forecast uncertainty and to the quality of forecasts can assist in the future development of ensemble forecasting systems. All studies mentioned are restricted either to flood forecast models or to simulation models used for low flows, and no similar application to low flow forecasts is known to the authors that also uses a sound model state updating procedure for assessing the effect of initial condition uncertainty. There have been studies using ENS products for flood forecasting [Devineni et al., 2008; Fundel and Zappa, 2011; Jaun and Ahrens, 2009; Muluye, 2011; Nester et al., 2012; Pagano et al., 2012; Renner et al., 2009; Thirel et al., 2008, 2010] and high-resolution precipitation ensemble forecasts of a regional climate model, i.e., the Limited Area Ensemble Prediction System developed within the COSMO consortium (COSMO-LEPS) [Addor et al., 2011; Zappa et al., 2011], but no study is known to the authors that focuses on 10 day low flow forecasts. Only short-range low flow forecasts up to 4 days are issued by different water authorities for the entire Rhine basin [De Bruijn and Passchier, 2006]. There have been different cross-border projects such as “Floods and low flow management in the Moselle and Saar Basin (FLOW MS)” focusing on climate impacts on low flows [Görgen et al., 2010]. However, low flow forecasts followed by a systematic uncertainty analysis do not exist for the Rhine basin, and the Moselle River in particular, although there is a high demand [Meißner et al., 2012; Rutten et al., 2008] from different sectors (e.g., freight shipment, drinking water supply, and energy production).
 The objectives of this study are to assess (1) the uncertainty from ECMWF ensemble forecasted precipitation and potential evapotranspiration; (2) the uncertainty from the parameters of two hydrological models by using the GLUE framework; (3) the uncertainty due to the initial conditions; and (4) the effect of these uncertainties on different low flow forecast quality and reliability measures.
 Assessing the three major sources of uncertainty in isolation, i.e., model input, parameters, and initial conditions, in ensemble low flow forecasts by applying all steps from identification to communication of uncertainty is an innovative way to understand and explain the effects of different uncertainties on the skill of low flow forecasts. In terms of model storage update, i.e., estimation of the model states at the forecast issue day, a new method is proposed using observed discharge. This is preferable to using only the calibrated model run, as inevitable errors between simulated and observed discharge affect the model initial condition. A further interesting aspect of this study is the use of a new hybrid low flow likelihood function for GLUE, which allows evaluation of low flows. Low flows in the Moselle River are investigated to allow the navigation and energy sectors to prepare for low flow conditions in a timely manner, as they are the most important economic river functions [Li et al., 2008; Rutten et al., 2008; Svensson and Prudhomme, 2005]. Since the River Rhine is a large-scale river, the Moselle River is selected as a case study. We use an exceedance probability of 75% (Q75) as a threshold for the definition of low flows [Demirel et al., 2012]. The number of days with low flows is sufficient to calibrate a forecast model, and low flows at this threshold still affect the important river functions. Several types of ensemble weather forecast products from the ENS data set are incorporated in this study to prepare model inputs, i.e., daily precipitation (P) and potential evapotranspiration (PET) for a lead time of 10 days. We address the model structure uncertainty by comparing two conceptual models with different complexities: the GR4J conceptual model (Génie Rural à 4 paramètres Journalier) with four parameters [Perrin et al., 2003] and the HBV conceptual model (Hydrologiska Byråns Vattenbalansavdelning) with eight parameters [Lindström et al., 1997]. These models are assumed to represent dominant low flow indicators (predictors) with their appropriate temporal scales as identified by Demirel et al. . The GR4J model is a French conceptual hydrological model with a simple structure [Perrin et al., 2003]. With four parameters, it provides a minimum level of complexity. The HBV model has been calibrated and operationally used for the River Rhine [Renner et al., 2009]. Moreover, this model has been widely used in Rhine studies such as for real-time flow forecasts [Reggiani and Weerts, 2008b], for climate impact assessment [Eberle, 2005; Hurkmans et al., 2010; Te Linde et al., 2010, 2011], and for assessing uncertainties in flood forecasts due to ensemble weather forecasts by using a Bayesian postprocessor [Reggiani and Weerts, 2008a; Reggiani et al., 2009].
 This paper is organized as follows. In the next section, the study area and data are presented. The model structures and the uncertainty analysis method are described in section 'Methodology'. The results are presented in section 'Results and Discussion', and the conclusions are drawn in section 'Conclusions'.
2. Study Area and Data
2.1. Study Area
 The Moselle River basin has a surface area of approximately 27,262 km², and the river has a length of 545 km. The river rises on the forested slopes of the Vosges massif and meanders through France before forming the border between Germany and Luxembourg for a short distance. It then enters Germany and flows past Trier to its confluence with the Rhine at Koblenz. Two major tributaries, the Sauer and Saar rivers, flow into the Moselle before the Trier dam. There are other dams in the Moselle and Saar rivers, whereas the Sauer river has a natural flow [Ackermann et al., 2000; Belz et al., 1999]. Moreover, the river channels in the Moselle and Saar are mostly canalized for water management purposes and available for river navigation, while the Sauer is not navigable [Behrmann-Godel and Eckmann, 2003]. Annual generated discharge in the Moselle basin is about 410 mm (∼130 m³ s⁻¹). The measured discharge at Cochem station fluctuates between 14 m³ s⁻¹ in dry summers and a maximum of 4000 m³ s⁻¹ during winter floods. The altitude ranges from 59 to 1326 m with a mean altitude of 340 m [Demirel et al., 2012].
2.2. Data
2.2.1. Observed Data
 Observed daily precipitation (P) and potential evapotranspiration (PET) estimated with the Penman-Wendling equation [Abwassertechnische Vereinigung - Deutscher Verband für Wasserwirtschaft und Kulturbau e. V. (ATV-DVWK), 2002] were obtained from the German Federal Institute of Hydrology (BfG) in Koblenz (Germany). Both variables are spatially averaged over each of the 26 Moselle subcatchments.
 The mean altitude of these subcatchments has also been provided by BfG. The outlet discharge (Q) for the Moselle (station 6336050 at Cochem) has been provided by the Global Runoff Data Centre (GRDC), Koblenz (Germany). The daily P, PET, and Q data series span from 1951 to 2006 (Table 1).
Table 1. Observed Data (recoverable entries: column "Number of Stations/Subbasins"; row "Discharge at Cochem station")
2.2.2. Meteorological Ensemble Forecast Data
 The precipitation and other meteorological forecast data used in this study originate from the ECMWF-ENS control and ensemble forecasts. These ensembles are computed for a lead time of 1–10 days using perturbed initial conditions and model physics (Table 2). A grid size of 0.25° (∼28 km) is chosen to retrieve weather forecast products using the ECMWF MARS retrieval system. The PET forecasts are determined by the Penman-Wendling equation, which requires only forecasted surface solar radiation and 2 m temperature data [ATV-DVWK, 2002]. This is consistent with the observed PET estimation carried out by the Federal Institute of Hydrology in Koblenz, Germany. Both grid-based P and PET ensemble forecast data are first interpolated over 26 Moselle subcatchments using areal weights. These subcatchment averaged data are then aggregated to the Moselle basin level.
Table 2. Overview of the Ensemble Forecast Data (recoverable entries: columns "R2 (240 h)" and "MAE (240 h)"; ensemble size "50 + 1 control")
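The two-stage spatial aggregation of the gridded forecasts (grid cells to subcatchments, then subcatchments to the basin) can be illustrated with a minimal sketch. It assumes that the areal weights express how much each grid cell overlaps a subcatchment and that subcatchments are combined in proportion to their area; the function names and all numbers are hypothetical and are not taken from the study.

```python
def weighted_mean(values, weights):
    """Weighted average; the weights need not sum to one."""
    total = sum(weights)
    if total <= 0:
        raise ValueError("weights must sum to a positive number")
    return sum(v * w for v, w in zip(values, weights)) / total


def basin_average(grid_values, cell_weights, subcatchment_areas):
    """Aggregate gridded forecast values to a single basin-level value.

    grid_values: forecast value per grid cell (e.g., P in mm/day)
    cell_weights: one list per subcatchment with the overlap weight of each cell
    subcatchment_areas: area of each subcatchment (assumed weighting for step 2)
    """
    # Step 1: grid cells -> subcatchment averages using areal weights
    sub_means = [weighted_mean(grid_values, w) for w in cell_weights]
    # Step 2: subcatchment averages -> basin average, weighted by area
    return weighted_mean(sub_means, subcatchment_areas)


# Toy example: 3 grid cells and 2 subcatchments (all numbers are made up)
p_grid = [2.0, 4.0, 6.0]
weights = [[0.5, 0.5, 0.0], [0.0, 0.25, 0.75]]
areas = [1000.0, 3000.0]
p_basin = basin_average(p_grid, weights, areas)
```

In an ensemble setting the same aggregation would simply be repeated for every member and every lead day.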
3. Methodology
3.1. Overview of the Model Structures
 The two hydrological models (GR4J and HBV) are briefly described below. Figure 1 shows the simplified model structures.
 The GR4J conceptual model has a parsimonious structure with only four calibration parameters and has been frequently used over hundreds of catchments worldwide, with a broad range of climatic conditions from tropical to temperate and semiarid catchments [Perrin et al., 2003]. The GR4J model requires only daily time series of precipitation (P) and potential evapotranspiration (PET) as inputs (Figure 1a). The four parameters in GR4J represent the maximum capacity of the production store (X1), the groundwater exchange coefficient (X2), the 1 day ahead capacity of the routing store (X3), and the time base of the unit hydrograph (X4). All four parameters are used to calibrate the model and estimate the parameter uncertainty (Table 3) based on Tian et al.  and Thyer et al. . The upper and lower limits are selected based on previous works [Booij, 2005; Eberle, 2005; Perrin et al., 2003; Pushpalatha et al., 2011; Tian et al., 2012].
Table 3. Parameters of the Models Used and Their Prior Uncertainty Ranges
GR4J parameters:
  X1: Capacity of the production store
  X2: Groundwater exchange coefficient (−8 to +6)
  X3: One day ahead capacity of the routing store
  X4: Time base of the unit hydrograph
HBV parameters:
  Maximum soil moisture capacity
  Soil moisture threshold for reduction of evapotranspiration
  Maximum capillary flow from upper response box to soil moisture zone
  Measure for nonlinearity of low flow in quick runoff reservoir
  Recession coefficient for quick flow reservoir
  Recession coefficient for base flow reservoir
  Maximum flow from upper to lower response box
 The HBV conceptual model was developed by the Swedish Meteorological and Hydrological Institute in the early 1970s [Lindström et al., 1997]. The HBV model consists of four subroutines: a precipitation and snow accumulation and melt routine, a soil moisture accounting routine, and two runoff generation routines. The input data are daily P and PET. Since the Moselle basin is a rain-fed basin, the snow routine and daily temperature data are not used in this study (Figure 1b). The eight most important parameters in the HBV model (Table 3) are used to estimate the parameter uncertainty [Engeland et al., 2010; Tian et al., 2012; Van den Tillaart et al., 2012].
3.2. Calibration and Validation
 The GR4J and HBV models are calibrated using the GLUE method and historical Moselle low flows for the period from 1 January 1971 to 31 December 2001. The calibration period ends immediately before the first forecast issue date (1 January 2002), and the number of low flow events (i.e., 567 days with low flows) in the calibration period is sufficient for calibrating hydrological models [Perrin et al., 2007]. The validation period spans from 1 January 1951 to 31 December 1970. The definition of low flows, i.e., discharges below the Q75 threshold of ∼113 m³ s⁻¹, is based on previous work by Demirel et al. .
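As an illustration of the low flow definition, the Q75 threshold is the discharge that is exceeded on 75% of the days, i.e., the 25th percentile of the daily flow series. The sketch below assumes a simple empirical quantile with linear interpolation; the function names and the short discharge series are hypothetical and do not reproduce the Moselle record.

```python
def exceedance_threshold(flows, exceedance=0.75):
    """Discharge that is equalled or exceeded in the given fraction of days."""
    if not flows:
        raise ValueError("flows must not be empty")
    ordered = sorted(flows)
    # Exceeded 75% of the time <=> non-exceedance probability of 25%
    pos = (1.0 - exceedance) * (len(ordered) - 1)
    lower = int(pos)
    upper = min(lower + 1, len(ordered) - 1)
    frac = pos - lower
    return ordered[lower] + frac * (ordered[upper] - ordered[lower])


def low_flow_days(flows, threshold):
    """Indices of days whose discharge is below the low flow threshold."""
    return [i for i, q in enumerate(flows) if q < threshold]


# Made-up daily discharges (m3/s), for illustration only
daily_q = [300.0, 90.0, 150.0, 110.0, 500.0, 80.0, 220.0, 130.0, 100.0]
q75 = exceedance_threshold(daily_q)
low_days = low_flow_days(daily_q, q75)
```

Other quantile conventions (e.g., without interpolation) would give slightly different thresholds for short series; for multi-decadal daily records the difference is negligible.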
 The GLUE method [Beven and Binley, 1992] is built on the “equifinality” concept: rather than accepting a single optimal parameter set, it uses many parameter sets that provide roughly equal performance [Beven and Freer, 2001]. The method was developed as an extension of the generalized sensitivity analysis (GSA) of Spear and Hornberger  based on Bayesian Monte Carlo simulations. GLUE has been widely used for calibration of hydrological models since it is easy to implement and allows flexible definition of a likelihood function to evaluate the model outputs and to distinguish between behavioral (accepted) and nonbehavioral (rejected) parameter sets [Freer et al., 1996; Ratto et al., 2007; Renard et al., 2010; Shen et al., 2012]. Behavioral parameter sets are those that provide predicted low flows that fall within the limits of acceptability with regard to a given likelihood measure [Zheng and Keller, 2007]. It should be noted that the selection of the behavioral parameter sets is based only on the calibration period runs. In this study, the GLUE method, consisting of the three steps described below, is applied for the selection of behavioral parameter sets. It is assumed that these parameter sets represent the uncertainty in model parameters.
3.2.1. Step 1: Definition of a Hybrid Likelihood Function for Low Flows
 The most commonly used likelihood function in the GLUE literature is the Nash-Sutcliffe (NS) coefficient [Beven and Freer, 2001; Nash and Sutcliffe, 1970; Shen et al., 2012]. However, other likelihood functions have been used for low flows [Pushpalatha et al., 2012]. In our study, we combined two low flow likelihood functions using subjectively selected weights. The new hybrid likelihood function (NShybrid) substantially improves the low flow forecasts as it combines NS based on only low flows (NSa) and NS based on inverse discharge values (NSb) (see equations (1)-(3)).
$$NS_a = 1 - \frac{\sum_{j=1}^{m} \left(Q_{sim}(j) - Q_{obs}(j)\right)^2}{\sum_{j=1}^{m} \left(Q_{obs}(j) - \overline{Q}_{obs}\right)^2} \qquad (1)$$
where Qobs and Qsim are the observed and simulated values for the jth observed low flow day (i.e., Qobs < Q75), and m is the total number of low flow days.
$$NS_b = 1 - \frac{\sum_{i=1}^{n} \left(\frac{1}{Q_{sim}(i)+\epsilon} - \frac{1}{Q_{obs}(i)+\epsilon}\right)^2}{\sum_{i=1}^{n} \left(\frac{1}{Q_{obs}(i)+\epsilon} - \overline{\frac{1}{Q_{obs}+\epsilon}}\right)^2} \qquad (2)$$
 where n is the total number of days (i.e., m < n), and ε is 1% of the mean observed discharge, added to avoid infinity during zero discharge days.
$$NS_{hybrid} = \alpha \, NS_a + \beta \, NS_b \qquad (3)$$
where both NSa and NSb values range from −∞ to 1, with 1 indicating a perfect fit [Pushpalatha et al., 2012]. The weights α and β are selected as 0.3 and 0.7, respectively. These weights have been determined during the calibration period. The first component of our hybrid likelihood function is developed strictly for low flows. Therefore, the resultant scores for this component can often be negative. The second component considers the inverse of all discharge values. The weights are chosen so that the values of our hybrid likelihood function are positive for the calibration runs. In other words, the weights keep the balance between very strict and less strict likelihood functions: with a very strict low flow calibration, i.e., high α values, both the GR4J and HBV models yield very low likelihood values because the NSa values are negative for both models.
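A minimal sketch of how such a hybrid likelihood could be evaluated is given below. It assumes the standard Nash-Sutcliffe form for both components, with the reference mean in NSa taken over the observed low flow days and the reference mean in NSb taken over the inverse-transformed observations; these choices, like the function names and test values, are assumptions for illustration and may differ in detail from equations (1)-(3).

```python
def nash_sutcliffe(obs, sim):
    """Standard Nash-Sutcliffe efficiency of sim against obs."""
    mean_obs = sum(obs) / len(obs)
    num = sum((s - o) ** 2 for o, s in zip(obs, sim))
    den = sum((o - mean_obs) ** 2 for o in obs)
    return 1.0 - num / den


def ns_hybrid(obs, sim, q75, alpha=0.3, beta=0.7):
    """Weighted sum of NS on low flow days (NSa) and NS on inverse flows (NSb)."""
    # NSa: only days with observed discharge below the Q75 threshold
    # (needs at least two distinct low flow observations)
    low = [(o, s) for o, s in zip(obs, sim) if o < q75]
    ns_a = nash_sutcliffe([o for o, _ in low], [s for _, s in low])
    # NSb: all days, inverse-transformed; eps (1% of mean flow) avoids 1/0
    eps = 0.01 * sum(obs) / len(obs)
    inv_obs = [1.0 / (o + eps) for o in obs]
    inv_sim = [1.0 / (s + eps) for s in sim]
    ns_b = nash_sutcliffe(inv_obs, inv_sim)
    return alpha * ns_a + beta * ns_b


# Made-up discharges (m3/s); 113 is the Q75 value quoted in the text
obs = [50.0, 80.0, 100.0, 200.0, 300.0]
perfect = ns_hybrid(obs, list(obs), q75=113.0)
biased = ns_hybrid(obs, [1.2 * o for o in obs], q75=113.0)
```

Because NSa is computed on a small subset of days with little variance, even modest errors can push it below zero, which is consistent with the lower weight (α = 0.3) given to this component.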
3.2.2. Step 2: Sampling Parameter Sets for Two Conceptual Models
 Previous model calibration and sensitivity analyses of GR4J [Perrin et al., 2003; Pushpalatha et al., 2011] and HBV [Booij, 2005; Eberle, 2005; Tian et al., 2012] in other rain-fed basins have allowed the prior uncertainty ranges of sensitive parameters to be assessed. These studies also indicated significant uncertainties for the sensitive parameters and emphasized the importance of inspecting the upper and lower parameter limits in more detail. Therefore, a sensitivity analysis is carried out using a large parameter space to select the most important parameters and their appropriate upper and lower limit values.
 Independent uniform distributions for each effective parameter are chosen due to the lack of prior knowledge about the true distributions. The typical drawback of the GLUE method is the computational time caused by its random sampling strategy. Therefore, an improved sampling technique, i.e., Latin hypercube sampling (LHS), is used with the GLUE method [McKay et al., 1979]. Compared to standard GLUE random sampling, LHS substantially reduces the computational burden for sampling and provides a tenfold greater efficiency in parameter space coverage [Shen et al., 2012]. The sampling size should be large enough to ensure a sufficient calibration of the model. In this study, we generated 120,000 parameter sets for each conceptual model using LHS in the range of lower and upper limits given in Table 3. To our knowledge, this is the largest LHS sample size tested in low flow hydrology.
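The idea behind Latin hypercube sampling with independent uniform priors can be sketched as follows: the range of each parameter is split into as many equal strata as there are samples, one value is drawn in each stratum, and the strata are shuffled independently per parameter. This is a generic illustration written with the standard library only; the function name, the seed, and the first pair of bounds are hypothetical, and the sample size is kept tiny compared with the 120,000 sets used in the study.

```python
import random


def latin_hypercube(bounds, n_samples, seed=None):
    """Draw n_samples parameter sets; every stratum of every parameter
    range is used exactly once."""
    rng = random.Random(seed)
    columns = []
    for lower, upper in bounds:
        # One uniform draw inside each of the n_samples equal strata
        points = [(k + rng.random()) / n_samples for k in range(n_samples)]
        # Shuffle so that strata are paired randomly across parameters
        rng.shuffle(points)
        columns.append([lower + p * (upper - lower) for p in points])
    # Transpose: one tuple of parameter values per sample
    return list(zip(*columns))


# Two parameters; only the -8 to +6 range appears in the text, the other is made up
bounds = [(0.0, 1.0), (-8.0, 6.0)]
samples = latin_hypercube(bounds, n_samples=10, seed=42)
```

With plain random sampling some strata would typically remain empty for the same sample size, which is the coverage advantage referred to above.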
3.2.3. Step 3: Threshold Definition for Behavioral Model Selection
 The GR4J and HBV models are run for each of the 120,000 sets in the calibration. The output is evaluated against the observed daily discharge at Cochem station located at the outlet of the Moselle subbasin using the NShybrid likelihood function to distinguish between behavioral parameter sets (accepted) and nonbehavioral parameter sets (rejected). The parameter sets meeting the predefined threshold criterion (NShybrid > 0.40) are accepted. Although the threshold value is a subjective decision [Jin et al., 2010], we rigorously tested several thresholds based on low flow simulations and the size of the behavioral parameter sets for each model. The selected threshold resulted in two large behavioral parameter sets for parameter uncertainty analysis, i.e., 9770 × 4 (GR4J) corresponding to ∼8% of the sample parameter set and 10,909 × 8 (HBV) corresponding to ∼9% of the sample parameter set.
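The acceptance step amounts to a simple filter on the likelihood scores. The sketch below applies the NShybrid > 0.40 criterion to a handful of made-up scores and recomputes the acceptance rates implied by the behavioral set sizes reported above; the function names and the toy scores are hypothetical.

```python
def select_behavioral(parameter_sets, likelihoods, threshold=0.40):
    """Keep the parameter sets whose likelihood exceeds the threshold."""
    return [p for p, score in zip(parameter_sets, likelihoods) if score > threshold]


def acceptance_rate(n_behavioral, n_sampled):
    """Share of sampled parameter sets accepted as behavioral, in percent."""
    return 100.0 * n_behavioral / n_sampled


# Made-up likelihood scores for five hypothetical parameter sets
sets = ["A", "B", "C", "D", "E"]
scores = [0.55, 0.12, 0.40, -0.30, 0.41]
behavioral = select_behavioral(sets, scores)  # "C" is rejected: 0.40 is not > 0.40

# Acceptance rates implied by the counts reported in the text
rate_gr4j = acceptance_rate(9770, 120000)   # about 8.1%
rate_hbv = acceptance_rate(10909, 120000)   # about 9.1%
```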
3.3. Model Storage Update Procedure
 Model storage updating is based on the observed discharge on the forecast issue day (Qobs). This is a crucial step for medium-range and seasonal low flow forecasts since the model initial state determines the model outputs [Wöhling et al., 2006].
 There are two storages in the GR4J model and three storages in the HBV model which are updated during low flow forecasts. The reader is referred to Perrin et al.  and Lindström et al.  for details of the process formulations in these models. A practical approach is used for both models. First, the two calibrated models are run with their best performing parameter sets, and model states are analyzed. This run is called the “reference run” in other recent works [Fundel and Zappa, 2011; Roulin, 2007; Roulin and Vannitsem, 2005]. The empirical relations between the simulated discharge and the fast runoff for each model are used to divide the observed discharge between the fast and slow runoff components (equations (4) and (5)).
 The Qr and Qd in the GR4J model, and Qf and Qs in the HBV model are estimated using these fractions together with the observed discharge value at the forecast issue day. Subsequently, the routing storage (R) in the GR4J model is updated for a given value of the X3 parameter using equation (6). Further, the surface water (SW) and groundwater (GW) storages in the HBV model can be updated for a given value of KF, ALFA, and KS parameters using equations (7) and (8).
 The other two storages, S (in GR4J) and SM (in HBV), are difficult to update using an empirical bottom-up approach. Instead, these two storages are updated using the calibrated model run until the forecast issue day (i.e., top-down approach). It is assumed that the two storages (S and SM) obtained from the calibrated model run represent reality. However, we are aware that there are inevitable uncertainties associated with this rough estimation of the initial conditions based on observed discharges and calibrated model simulations.
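The bottom-up update of the routing, surface water, and groundwater storages can be sketched by inverting the outflow relations of the two models. The sketch assumes the commonly published forms Qf = KF·SW^(1+ALFA) and Qs = KS·GW for HBV and Qr = R·(1 − [1 + (R/X3)^4]^(−1/4)) for the GR4J routing store, which may differ from the exact equations (6)-(8) used here; the fast runoff fraction, which the study derives from empirical relations, is treated as a given input, and all names and numbers are hypothetical. Discharge and storages are assumed to be in consistent units (e.g., mm/day and mm).

```python
def hbv_storages(q_obs, fast_fraction, kf, alfa, ks):
    """Invert the assumed HBV relations Qf = KF*SW**(1+ALFA) and Qs = KS*GW."""
    if q_obs < 0 or not 0.0 <= fast_fraction <= 1.0:
        raise ValueError("need q_obs >= 0 and 0 <= fast_fraction <= 1")
    qf = fast_fraction * q_obs   # fast runoff share of the observed discharge
    qs = q_obs - qf              # slow runoff share
    sw = (qf / kf) ** (1.0 / (1.0 + alfa))
    gw = qs / ks
    return sw, gw


def gr4j_routing_outflow(r, x3):
    """Assumed GR4J routing store outflow for a store level r."""
    return r * (1.0 - (1.0 + (r / x3) ** 4) ** -0.25)


def gr4j_routing_store(qr, x3, iterations=100):
    """Find the store level whose outflow equals qr, by bisection."""
    lo, hi = 0.0, x3
    # The outflow increases monotonically with r; widen the bracket if needed
    while gr4j_routing_outflow(hi, x3) < qr:
        hi *= 2.0
    for _ in range(iterations):
        mid = 0.5 * (lo + hi)
        if gr4j_routing_outflow(mid, x3) < qr:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)


# Made-up values, for illustration only
sw, gw = hbv_storages(q_obs=2.0, fast_fraction=0.25, kf=0.05, alfa=1.0, ks=0.01)
r = gr4j_routing_store(gr4j_routing_outflow(60.0, 100.0), x3=100.0)
```

Since every behavioral parameter set has its own KF, ALFA, KS, and X3 values, such an inversion would have to be repeated for each set, so the same observed discharge maps to different initial storages.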
3.4. Uncertainty Sources and Quantification
 A robust assessment of uncertainties begins with identification of all sources [Refsgaard et al., 2007]. Obviously, not all sources can be quantified. We used available classification schemes to identify and select the most important uncertainty sources in the two conceptual models and for low flow forecasting [Walker et al., 2003; Warmink et al., 2010]. Three uncertainty sources and their quantification are described in the following sections. It should be noted that errors in model structure are also important for hydrological models [Götzinger and Bárdossy, 2008; Gupta et al., 2012; Renard et al., 2010; Tian et al., 2012]. In this study, the model structure uncertainty is addressed partly by comparing two different model structures.
3.4.1. Input Uncertainty
 A rainfall event after the forecast issue day can easily increase flows above the low flow threshold in the Moselle River. Therefore, low flow forecasts are highly dependent on the quality of ENS weather forecasts. These forecasts are available at a spatial resolution of approximately 28 km × 28 km for daily time steps. It has been reported that after several days these forecasts are highly uncertain due to modeling limitations and the complexity of the physical processes involved in the atmosphere [Fundel and Zappa, 2011; Reggiani et al., 2009]. In this study, the 51 ensemble members are used to quantify the uncertainty of future precipitation and potential evapotranspiration amounts. Obviously, new uncertainties are introduced by using the empirical PET formula [ATV-DVWK, 2002] and by interpolating the grid data over 26 Moselle subcatchments.
3.4.2. Parameter Uncertainty
 The GLUE method is used for quantification of parameter uncertainties. This method rejects the idea of an optimal system representation and applies the equifinality concept, accepting all forecasts made with the behavioral parameter sets. This parameter ensemble then allows assessment of the output uncertainty arising from model parameters and partly from model structure [Pappenberger et al., 2005]. All four parameters of GR4J and all eight parameters of HBV are selected to estimate the parameter uncertainty.
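 The selection of behavioral parameter sets in a GLUE-type analysis can be illustrated with a short sketch. It is not the configuration used in this study: the toy recession model, the parameter bounds, the use of the Nash-Sutcliffe efficiency as the likelihood measure, and the behavioral threshold of 0.5 are all assumptions made for illustration.

```python
import numpy as np


def glue_behavioral_sets(simulate, q_obs, bounds, likelihood, threshold,
                         n_samples=1000, seed=0):
    """Sample parameter sets uniformly and keep the behavioral ones."""
    rng = np.random.default_rng(seed)
    lower = np.array([b[0] for b in bounds], dtype=float)
    upper = np.array([b[1] for b in bounds], dtype=float)
    samples = rng.uniform(lower, upper, size=(n_samples, len(bounds)))
    scores = np.array([likelihood(simulate(p), q_obs) for p in samples])
    keep = scores >= threshold  # behavioral = likelihood above the threshold
    return samples[keep], scores[keep]


def nash_sutcliffe(q_sim, q_obs):
    """Nash-Sutcliffe efficiency, used here as a stand-in likelihood."""
    return 1.0 - np.sum((q_sim - q_obs) ** 2) / np.sum((q_obs - q_obs.mean()) ** 2)


# Toy stand-in for a hydrological model: an exponential recession curve.
t = np.arange(30.0)


def toy_recession(params):
    q0, k = params
    return q0 * np.exp(-k * t)


q_obs = toy_recession((5.0, 0.05))  # synthetic "observations"
behavioral, scores = glue_behavioral_sets(
    toy_recession, q_obs,
    bounds=[(1.0, 10.0), (0.01, 0.2)],
    likelihood=nash_sutcliffe, threshold=0.5)
```

 Each retained row of `behavioral` is treated as an equally acceptable model; running all of them gives the spread attributed to parameter uncertainty.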
3.4.3. Initial Condition Uncertainty
 The importance of initial conditions for hydrologic forecasting is well established [Shukla and Lettenmaier, 2011; Wood and Lettenmaier, 2008]. In most hydrologic modeling studies, initial conditions refer only to land surface states including soil moisture and snow cover [Li et al., 2009]. In this study, however, all model states present in the conceptual models used are included in the uncertainty analysis. This is because errors in the estimated initial slow and fast runoff storages directly affect the low flow forecasts. We demonstrate a dynamic inverse-modeling approach based on observed discharge and uniformly distributed behavioral parameter sets on the forecast issue day for exploring the initial condition uncertainties and characterizing the relative importance of this uncertainty source for low flow forecasts. The X3 parameter of GR4J and the KF, ALFA, and KS parameters of HBV are uncertain parameters that are directly linked to the model states (i.e., initial conditions). Other parameters are assumed uncertain only in the parameter uncertainty assessment.
3.5. Uncertainty Propagation
 The three sources of uncertainty described earlier are propagated through the GR4J and HBV models both separately and together. The latter case is executed to encapsulate the total uncertainty arising from all three sources together. In other words, this study employs GLUE, an extended global sensitivity analysis (GSA) method [Freer et al., 1996; Ratto et al., 2007], to apportion the output uncertainty of a model to different sources of uncertainty.
 The 10 day low flow forecasts are issued every day for the test period from 1 January 2002 until 31 December 2005. The posterior distribution of the model outputs (e.g., confidence interval) is based on 10,000 Monte Carlo runs for each day (a total of 1461 days). The size of the Monte Carlo sample is assumed to be reasonable based on the number of behavioral parameter sets and on the relevant literature [Blasone et al., 2008; Franz and Hogue, 2011; Rossa et al., 2011; Shen et al., 2012].
 For assessing the effect of the uncertainty in the forecasted input data, we run the models using randomly selected P and PET values from 51 members, while the model parameters and model states are fixed according to the best performing calibrated parameter values (Table 4).
Table 4. Overview of the Uncertainty Propagation Test Scheme
Forecasted P and PET
GLUE set (X3, KF, ALFA, and KS)
 For assessing uncertainty in model parameters, we run the models using randomly selected behavioral parameter sets, while the model inputs are fixed to ECMWF-ENS control forecast P and PET, and the model states are updated using the best performing calibrated parameter values.
 For assessing the uncertainty in model states at forecast issue day, the routing storage (R) in the GR4J model is updated using randomly selected behavioral X3 parameter values, and the surface water (SW) and groundwater (GW) storages in the HBV model are updated using randomly selected values of KF, ALFA, and KS parameters from behavioral parameter sets for each of the 10,000 Monte Carlo runs for each day. The remaining model parameters are fixed to the best performing calibrated parameter values, and the model inputs are fixed to ECMWF-ENS control forecast P and PET to evaluate the initial condition uncertainty (Table 4).
 For assessing the total uncertainty, we run the models using randomly selected model inputs, behavioral parameter sets, and corresponding model states. It should be noted that the spatial and temporal consistency of the inputs is preserved to avoid nonphysical outcomes. The same holds for the model states: for example, the storage S can never exceed the X1 parameter value in the GR4J model [Perrin et al., 2003]. Similarly, we defined a “saturation rate” as the fraction of SM storage to the calibrated FC parameter.
 For each kth Monte Carlo run at each tth forecast issue day, a new parameter set is randomly selected from the behavioral set, and SM is calculated using equation (9). Therefore, the saturation rate is kept constant for a particular day throughout the entire uncertainty propagation framework.
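 The test scheme described above can be summarized in a short sketch. It is one reading of the text and not the authors' implementation: the function names are hypothetical, and two points are assumptions, namely that in the initial condition case the randomly drawn state-linked parameters are also used in the model run, and that equation (9) scales the reference soil moisture as SM = (SM_ref / FC_calibrated) × FC_k.

```python
import random

N_MEMBERS = 51  # number of ensemble members, from the text
STATE_LINKED = {"GR4J": ("X3",), "HBV": ("KF", "ALFA", "KS")}


def draw_run(case, model, best, behavioral, rng):
    """Choose inputs, run parameters, and state-update parameters for one run.

    case is one of "input", "parameter", "initial", or "total".
    Returns (member, run_params, state_params); member is None when the
    control forecast is used instead of a random ensemble member.
    """
    drawn = rng.choice(behavioral)
    member = rng.randrange(N_MEMBERS) if case in ("input", "total") else None
    run_params = dict(drawn) if case in ("parameter", "total") else dict(best)
    source = drawn if case in ("initial", "total") else best
    state_params = {name: source[name] for name in STATE_LINKED[model]}
    if case == "initial":
        # Assumption: state-linked parameters are random in the run as well.
        run_params.update(state_params)
    return member, run_params, state_params


def soil_moisture_for_run(sm_reference, fc_calibrated, fc_run):
    """Keep the saturation rate of the reference run (assumed form of eq. 9)."""
    saturation_rate = sm_reference / fc_calibrated
    return saturation_rate * fc_run


# Hypothetical GR4J parameter values, for illustration only.
best = {"X1": 300.0, "X2": 0.5, "X3": 90.0, "X4": 2.0}
behavioral = [
    {"X1": 250.0, "X2": 0.3, "X3": 70.0, "X4": 1.8},
    {"X1": 350.0, "X2": 0.7, "X3": 110.0, "X4": 2.2},
]
rng = random.Random(1)
member, run_params, state_params = draw_run("total", "GR4J", best, behavioral, rng)
```

 Repeating `draw_run` 10,000 times per forecast issue day for each of the four cases would yield the ensembles from which the forecast quality measures are computed.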
3.6. Uncertainty Presentation
 Uncertainty presentation allows the low flow forecasts to be monitored, thus helping to improve forecast quality by analyzing uncertainties in the model outputs and allowing comparison of different models. Obviously, the added value of a low flow forecast for decision makers depends on its uncertainty characteristics. We employed three forecast quality measures to analyze the results of the uncertainty quantification in 10 day low flow forecasts. These measures have often been used in meteorology [WMO, 2012] and flood hydrology [Renner et al., 2009; Thirel et al., 2008; Velázquez et al., 2010]. In World Meteorological Organization [2012], three properties of an accurate probabilistic forecast are defined as reliability, sharpness, and resolution. In this study, three forecast quality measures have been selected to evaluate these three properties of the forecasts: the reliability diagram for reliability, the RCI for sharpness, and the contingency table for resolution.
3.6.1. Relative Confidence Interval
 The standard 90% confidence interval (90CI) was derived by ordering the 10,000 outputs on every day in the test period and then identifying the 5th and 95th percentiles (i.e., Q5 and Q95). The 90CI, observed discharge, and 50th percentile (i.e., Q50, the forecast median) are presented together. The relative confidence interval (RCI) is then calculated for only low flow days j using equation (10) to monitor the evolution of uncertainties with increasing lead time and to compare the effect of different uncertainty sources on the relative confidence interval.
where m is the total number of low flow days.
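 A minimal sketch of the RCI calculation is given below. It assumes that the width of the 90CI on each low flow day is normalized by the observed discharge on that day and that the result is averaged over the m low flow days and expressed in percent; the exact form of equation (10) may differ. The function name and the use of "at or below the threshold" to identify low flow days are likewise assumptions.

```python
import numpy as np


def relative_confidence_interval(ensemble, q_obs, low_flow_threshold):
    """Mean relative width (%) of the 90% interval over low flow days.

    ensemble: array of shape (n_runs, n_days) with forecast discharges
    q_obs:    array of shape (n_days,) with observed discharges
    """
    q5, q95 = np.percentile(ensemble, [5, 95], axis=0)
    low = q_obs <= low_flow_threshold           # the m low flow days
    relative_width = (q95[low] - q5[low]) / q_obs[low]
    return 100.0 * relative_width.mean()


# Toy ensemble: 101 runs spanning 0 to 100 on each of three days.
ensemble = np.tile(np.linspace(0.0, 100.0, 101)[:, None], (1, 3))
q_obs = np.array([50.0, 90.0, 200.0])
rci = relative_confidence_interval(ensemble, q_obs, low_flow_threshold=100.0)
```

 In this toy case the interval width is 90 on every day and only the first two days are low flow days, so the relative widths are 1.8 and 1.0.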
3.6.2. Reliability Diagram
 The reliability diagram is an approach used to represent the performance of probabilistic forecasts of selected events, i.e., low flows [Bröcker and Smith, 2007]. A reliability diagram shows the observed relative frequency as a function of forecast probability, and the 1:1 diagonal represents the perfect reliability line [Olsson and Lindström, 2008; Velázquez et al., 2010]. In the present study, nonexceedance probabilities of 50%, 75%, 85%, 95%, and 99% are chosen as thresholds to categorize the discharges from mean flows to extreme low flows. The forecast probability for each forecast day is estimated as the number of ensemble forecasts exceeding these thresholds divided by the total number of ensemble forecasts (i.e., 10,000 ensembles) on that forecast day. The forecasts are then divided into bins of probability categories; here, five bins (categories) are chosen: 0%–20%, 20%–40%, 40%–60%, 60%–80%, and 80%–100%. The observed frequency for each day is set to one if the observed discharge exceeds the threshold and to zero if not. The forecast probability and observed frequency can then be plotted against each other.
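 The binning behind a reliability diagram can be sketched as follows. This is an illustrative sketch with hypothetical function names. The event is taken here to be a discharge at or below the threshold (a low flow), which is an assumption; the comparison can be flipped if the event is defined as an exceedance.

```python
import numpy as np


def event_probability(ensemble, threshold):
    """Fraction of ensemble members at or below the threshold on each day."""
    return (ensemble <= threshold).mean(axis=0)


def reliability_curve(forecast_prob, observed_event, n_bins=5):
    """Mean forecast probability and observed frequency per probability bin."""
    forecast_prob = np.asarray(forecast_prob, dtype=float)
    observed_event = np.asarray(observed_event, dtype=float)
    edges = np.linspace(0.0, 1.0, n_bins + 1)
    # Interior edges give bin indices 0..n_bins-1; probability 1.0 lands in the last bin.
    bin_idx = np.digitize(forecast_prob, edges[1:-1])
    mean_prob = np.full(n_bins, np.nan)
    obs_freq = np.full(n_bins, np.nan)
    for b in range(n_bins):
        in_bin = bin_idx == b
        if in_bin.any():                       # empty bins stay NaN
            mean_prob[b] = forecast_prob[in_bin].mean()
            obs_freq[b] = observed_event[in_bin].mean()
    return mean_prob, obs_freq


# Toy data: five forecast days with binary observations.
mean_prob, obs_freq = reliability_curve(
    forecast_prob=[0.1, 0.1, 0.9, 0.9, 0.5],
    observed_event=[0, 1, 1, 1, 0])
```

 Plotting `obs_freq` against `mean_prob` and comparing with the 1:1 diagonal gives the reliability diagram; points below the diagonal indicate forecast probabilities that are too high.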
3.6.3. Contingency Table
 We used contingency tables to assess the effect of uncertainty on the performance of low flow forecasts. Contingency tables, used particularly in flood warnings [Martina et al., 2005], can be used to estimate the utility of hydrological forecasts and, in their simplest form, indicate the forecast model's ability to correctly anticipate the occurrence or nonoccurrence of preselected events (i.e., Q75 low flows). The definitions of the four cases are given in a two-by-two contingency table (Table 5).
Table 5. Contingency Table for the Assessment of Threshold-Based Forecasts
Low Flow Event (Q75) | Observed | Not Observed
Forecasted | Hit: the event was forecast to occur and did occur | False alarm: the event was forecast to occur but did not occur
Not forecasted | Miss: the event was forecast not to occur but did occur | Correct negative: the event was forecast not to occur and did not occur
 The skill of a forecasting model can be represented on the basis of the hit rate and the false-alarm rate [Cloke and Pappenberger, 2009; Martina et al., 2005]. Both ratios can be easily calculated from the contingency table using equations (11) and (12).
 It should be noted that these ratios are also known as the probability of detection and the probability of false detection in other hydrological studies [Velázquez et al., 2010]. The hit and false-alarm rates indicate, respectively, the proportion of events for which a correct warning was issued and the proportion of nonevents for which a false warning was issued by the forecast model.
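 The two ratios follow directly from the four cells of the contingency table. The sketch below uses the standard definitions of the probability of detection and the probability of false detection, consistent with the description above; the function names and the counts in the usage lines are hypothetical.

```python
def hit_rate(hits, misses):
    """Proportion of observed events for which a warning was issued."""
    n_events = hits + misses
    return hits / n_events if n_events else float("nan")


def false_alarm_rate(false_alarms, correct_negatives):
    """Proportion of nonevents for which a warning was issued."""
    n_nonevents = false_alarms + correct_negatives
    return false_alarms / n_nonevents if n_nonevents else float("nan")


# Hypothetical counts, for illustration only.
h = hit_rate(hits=90, misses=10)
f = false_alarm_rate(false_alarms=20, correct_negatives=180)
```

 A perfect forecast would have a hit rate of one and a false-alarm rate of zero.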
4. Results and Discussion
4.1. Calibration and Validation
 The best performing parameter sets of the two models are shown in Table 6. The corresponding highest NShybrid values are 0.62 for the GR4J model and 0.56 for the HBV model. The GR4J model performs better than the HBV model on low flows based on only the best performing simulation in the calibration period. However, the HBV model performed better in the validation period. The highest NShybrid values did not change using another global optimization technique, i.e., a genetic algorithm [Velázquez et al., 2010], showing that 120,000 parameter sets for each model are enough for calibrating the models. Considering the performance only in the low flow period (i.e., NSa), the performance of the HBV model is better than that of the GR4J model in the calibration period. However, the drop in performance of the HBV model in the validation and the forecast periods is much larger than for the GR4J model. Such a drastic drop outside the calibration period is expected from a relatively complex model like HBV with eight parameters since it has more degrees of freedom to adjust to the basin behavior during the calibration period. This characteristic is somewhat concealed by the NShybrid results due to the subjective weights and due to the insensitive inverse performance index (i.e., NSb). That is why all three performance indices have been presented in Table 6. The climate in the calibration period was wetter (∼910 mm mean annual precipitation) than in the validation period (∼890 mm) and the forecast period (∼830 mm).
Table 6. Calibration, Validation, and Forecast Results
Parameter and Likelihood
 The calibrated models are run for the test period (i.e., 2002–2005) to estimate the fraction of fast runoff to total runoff in the two models (Figure 2) and to update storages S and SM (i.e., top-down approach). The exponential relation between the fraction values and the simulated discharge shows that the total discharge is dominated by flows from the fast runoff storage during high flows. These categorized values have been used to estimate fast and slow runoff storages in the GR4J and HBV models. The k_GR4J fraction is zero for low flows and 0.04 for high flows above 6 mm, whereas the k_HBV fraction is about zero for low flows and one for high flows above 6 mm. Table 7 shows the empirical equations fitted to the simulation data presented in Figure 2.
Table 7. Empirical Equations to Divide Observed Discharge (Qobs) Into Fast and Slow Runoff
Qobs Category (mm)
>Q75 and ≤6
4.2. Effect of Uncertainty on Confidence Intervals of Low Flow Forecasts
 For the purpose of determining the extent to which different sources of uncertainty affect low flow forecasts for a lead time of 10 days, the degree of uncertainty in model outputs is expressed by a 90% confidence interval (90CI). The 90CI, the forecast median, and the observed low flows for both models are shown in Figure 3. Daily discharge values (m3 s−1) have been presented on a logarithmic y axis.
 There are significant differences between the two model results, as 10 day ahead low flows are mostly overestimated by the GR4J model under uncertain conditions. As can be seen from Figure 3, the overestimation is more pronounced for the parameter uncertainty case than for the other cases. The first things to consider are the dependencies and interactions between groundwater storages and model parameters, since the fraction of fast runoff to total runoff is about zero, showing that the discharge during low flows is mainly produced in the groundwater storage (Figure 2). The more pronounced overestimation of GR4J compared with the underestimation of HBV may indicate that the slow responding groundwater storage of HBV is less sensitive to different behavioral parameter sets. The more complex soil moisture and percolation components of the HBV model can also be a reason for the successful low flow forecasts of the HBV model under uncertain conditions. The systematic overestimation of forecasted precipitation is, therefore, well handled by the HBV model. However, it should be noted that the GR4J model is slightly better than HBV in deterministic 10 day forecasts (Table 6). Further, the low flows are usually underestimated by the HBV model, as shown in the last plot in Figure 3b.
 Surprisingly, there have been excessive rainfall forecasts for several days in summer months (e.g., in August 2005) causing high forecasted discharges (Figure 3). These days were carefully examined to determine if they significantly change the overall RCI results. However, they caused very minor effects on the RCI results that are based on the mean of the confidence interval statistics from 567 low flow days. Thus, the uncertainty in 10 day low flow forecasts is larger in the GR4J model compared to the HBV model. The GR4J model overestimates low flows for all sources of uncertainty and for parameter uncertainty, in particular, whereas the HBV model tends to underestimate low flows.
 Figure 4 compares the two models and the effect of different uncertainty sources on the RCI of low flow forecasts with increasing lead time. From Figure 4, we can clearly see that the total uncertainty in the GR4J outputs is much higher than in the HBV outputs. This is similar to the results that we have seen in Figure 3. Comparing only 10 day forecasts issued by the two models, the RCI is ∼110% for the HBV model and ∼300% for the GR4J model (i.e., nearly tripled). One anticipated finding is that the RCI tends to increase with increasing lead time for both models and for all evaluated uncertainty cases. The increase of RCI for the initial condition uncertainty is slowest, showing that the initial condition uncertainty is less sensitive to increasing lead time. This is expected since our storage update procedure only depends on observed discharge and some of the model parameters. However, the uncertainty due to the model inputs (forecast P and PET) increases considerably with increasing lead time. This is because the error in the ensemble meteorological forecasts increases for longer lead times due to the atmospheric model limitations. It is interesting to note that the 10 day forecasts are even better using zero precipitation as model input, i.e., NShybrid for the GR4J results increases from 0.45 to 0.54 in the test period from 2002 to 2005.
 The total uncertainty for the GR4J model is the sum of the three sources of uncertainty assessed in this study. Moreover, only half of the RCI comes from uncertain GR4J parameters. The most striking result to emerge from the RCI results is that the parameter uncertainty dominates the total uncertainty in the HBV model outputs. It is somewhat surprising that nearly all uncertainty comes from the HBV parameters. Parameter interactions in the HBV model can be the main reason for the unexpected total uncertainty, which is not the sum of the three sources of uncertainty. Interestingly, the results, as shown in Figure 4, indicate that input uncertainty is even smaller than initial condition uncertainty. On the one hand, this is not expected as ensemble meteorological forecasts are assumed to be one of the most important uncertainty sources in streamflow forecasts [Engeland et al., 2010; Thirel et al., 2008; Vrugt et al., 2008; Zappa et al., 2011]. On the other hand, the large range of soil moisture related parameters randomly selected from the GLUE behavioral parameter set could enhance the impact of initial condition uncertainty compared to input uncertainty from only 51 P and PET forecast ensemble members. In other words, the dominating effect of parameter uncertainty certainly determines the impact of the initial condition uncertainty due to the parameters used in the storage update procedure. Moreover, during low flow periods slow responding processes like groundwater are more dominant than precipitation, as low flows usually occur after prolonged dry periods. This can explain the smaller effect of uncertainty from precipitation compared to the uncertainty from initial conditions. The total uncertainty in 10 day lead time low flow forecasts is not a linear sum of the three uncertainty sources in the HBV model due to parameter interactions. This finding is in agreement with the findings of Zappa et al. [2011], who showed that the full spread obtained from uncertainty superposition of three sources grows nonlinearly for a hydrological model (i.e., PREVAH (PREcipitation-Runoff-EVApotranspiration)) similar to HBV.
4.3. Effect of Uncertainty on Reliability of Low Flow Forecasts
 Figure 5 compares the reliability of 10 day ensemble forecasts of low flows for below Q75 and Q95 thresholds using the GR4J and HBV models. Figure 5 exhibits the portion of observed data inside predefined forecast intervals. The reliability plots based on forecasts associated with different uncertainties show that GR4J and HBV overestimate or underestimate middle forecast intervals, but the narrowest (i.e., 0%–20%) and the 90% intervals are correctly estimated. From Figure 5 we see clearly that the ensemble Q75 forecasts issued by HBV including only the input uncertainty are the most reliable forecasts, and it is confirmed that GR4J provides too wide forecast intervals if all sources of uncertainty are included. The plot for the evaluation of the Q75 forecasts using the HBV model in Figure 5 shows that the average overestimation for the total uncertainty is ∼25% inside the highest interval bin (i.e., 80%–100%). The Q75 low flow event will then occur in only ∼75% of the cases when it is forecasted to almost certainly happen, indicating that every fourth low flow warning will be a false alarm. Initial condition uncertainty has less effect than parameter uncertainty on the reliability of Q75 forecasts by the two models. For the Q95 low flows, all intervals except for the narrowest interval (i.e., 0%–20%) are overestimated by the HBV model. A comparison of the four subplots in Figure 5 reveals that the parameter uncertainty has a negative effect on the reliability of the forecasts. Moreover, the overestimation of the GR4J model and underestimation of the HBV model are also visible in Figure 5. Finally, the forecasts of extreme low flows (Q95) issued by the GR4J model are more reliable than the forecasts by the HBV model.
4.4. Effect of Uncertainty on Contingency Table of Low Flow Forecasts: Hits and False Alarms
 From an operational point of view, the main purpose of investigating uncertainty from 10 day ensemble low flow forecasts is to improve the forecasts (e.g., hits) and to reduce false alarms and missed targets in the low flow contingency measures. Figure 6 shows the comparison between the GR4J and HBV models, based on the number of hits, false alarms, misses, and correct rejections for the preselected Q75 low flow events. It should be noted that a threshold probability of 0.5 is used to issue a low flow forecast alarm from 10,000 forecasts each day in the test period. That is, a low flow event is assumed to occur if more than half of the 10,000 forecasts are low flows. By applying this warning threshold of 0.5, the 10,000 × 1461 forecast ensemble matrix is transformed into a 1 × 1461 binary vector consisting of zero (no low flow) and one (low flow) values. This corresponds to a total of 1461 forecasts divided into four subplots in Figure 6. The threshold approach was not necessary for the deterministic run as we run the models only one time with calibrated parameters and control forecasts. The y axes of the subplots show the percentage of different contingency measures for each evaluated uncertainty source, i.e., input, parameter, initial condition, and total. We are aware that the contingency table is very sensitive to the preselected threshold and the number of forecasts [Devineni et al., 2008]. From Figure 6a, we can clearly see that the two models perform similarly for the deterministic run, whereas the number of hits declines for the GR4J model forecasts for the total uncertainty. This suggests that adding parameter uncertainty to the model certainly reduces the number of hits. This is what we have also seen in Figure 3a. In the case of the HBV model, there is no drop in the number of hits, indicating that most of the low flow events (a total of 567 events in the test period) are correctly forewarned by the ensemble forecasts. This is a significant success for a hydrological model calibrated for low flows. Moreover, nonoccurrence of low flow events was also correctly indicated by the HBV model. This can be inferred from the correct rejections plot in Figure 6d.
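 The transformation from the forecast ensemble matrix to a binary warning vector, and the counting of the four contingency cases, can be sketched as follows. The function name is hypothetical, and treating a discharge at or below the Q75 threshold as a low flow is an assumption.

```python
import numpy as np


def contingency_counts(ensemble, q_obs, threshold, warn_prob=0.5):
    """Count hits, false alarms, misses, and correct negatives.

    ensemble: array of shape (n_runs, n_days), e.g., 10,000 x 1461
    q_obs:    array of shape (n_days,) with observed discharges
    """
    low_flow_fraction = (ensemble <= threshold).mean(axis=0)
    warning = low_flow_fraction > warn_prob    # more than half are low flows
    event = q_obs <= threshold                 # observed low flow
    return {
        "hits": int(np.sum(warning & event)),
        "false_alarms": int(np.sum(warning & ~event)),
        "misses": int(np.sum(~warning & event)),
        "correct_negatives": int(np.sum(~warning & ~event)),
    }


# Toy example: 4 runs (rows) for 4 days (columns), threshold of 5.
ensemble = np.array([[1.0, 1.0, 1.0, 9.0],
                     [1.0, 1.0, 1.0, 9.0],
                     [1.0, 9.0, 1.0, 9.0],
                     [9.0, 9.0, 1.0, 9.0]])
q_obs = np.array([2.0, 2.0, 9.0, 9.0])
counts = contingency_counts(ensemble, q_obs, threshold=5.0)
```

 In the toy example the second day has exactly half of the runs below the threshold, so no warning is issued and the observed low flow counts as a miss.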
 The most striking result to emerge from Figure 6b is that the percentage of false alarms is highest (∼50%) for the forecasts issued by HBV including only input uncertainty. This may seem to contradict the results presented in Figure 5, as the same ensemble forecasts have been indicated as the most reliable forecasts. However, it should be noted that these two quality measures evaluate different aspects of the forecasts, namely, the reliability diagram evaluates the reliability and the contingency table evaluates the resolution of the forecasts [WMO, 2012].
 Figure 7 shows the utility of the low flow forecasts as a function of lead time using the hit rate and the false-alarm rate derived from contingency tables. What is surprising is that the hit rate of GR4J drops significantly from 0.9 to 0.1 with increasing lead time, whereas the hit rate of HBV increases slightly from 0.9 to 1. Another important finding was that the hit rate and false-alarm rate of the GR4J and HBV models do not vary significantly as a function of lead time for the deterministic forecasts. Moreover, the false-alarm rate of GR4J does not change considerably with increasing lead time or for different uncertainty sources. The drop in false-alarm rate is higher for the HBV model. The importance of parameter uncertainty for both models can be clearly seen in Figure 7. The effect of the storage updating procedure and input uncertainty on both models' outputs is much smaller.
 The findings of the current study (see Figure 7) are consistent with those of Zappa et al. [2011], who found slight decreases in the hit rates of low flow forecasts for lead times of 1 day and 5 days.
5. Conclusions
 The performance of two hydrological models in the calibration, validation, and forecast periods has been compared, and the effect of different uncertainty sources on the quality of 10 day low flow forecasts has been assessed. We applied a systematic uncertainty analysis to identify where the uncertainty comes from and to provide quantified model output uncertainty information to make a robust model comparison. A hybrid performance metric is used for evaluating low flow simulations, whereas the quality of the probabilistic low flow forecasts has been assessed based on relative confidence intervals, reliability, and hit/false-alarm rates. Based on the results presented in this study we can draw the following conclusions.
 (1) The 10 day ensemble forecast results show that the daily observed low flows are captured by the 90% confidence interval for both models most of the time. However, the GR4J model usually overestimates low flows, whereas HBV is prone to underestimate low flows. This is particularly the case if the parameter uncertainty is included into the forecasts.
 (2) The total uncertainty in the GR4J model outputs is higher than in the HBV model.
 (3) The parameter uncertainty has the highest effect, and the input uncertainty has the smallest effect on the low flow forecasts.
 (4) A direct relation is found between the number of parameters and the parameter uncertainty according to the RCI results.
 (5) The parameter uncertainty for 10 day low flow forecasts issued by the HBV model with eight parameters is almost half of the parameter uncertainty coming from the GR4J with four parameters. This is because the rainfall-runoff process resulting in low flows in the study area is better described by the HBV model.
 (6) The forecast distribution based on 10 day low flow forecasts (i.e., Q75) issued by the HBV model was the most reliable forecast distribution if only input uncertainty is considered.
 (7) The number of hits is about equal for the two models only if the input uncertainty is considered. The parameter uncertainty was the main reason for the reduced number of hits.
 (8) The deterministic forecasts using the GR4J and HBV models resulted in similar performance indices and also similar hit and false-alarm rates.
 (9) The performance of the HBV model for correct rejections is remarkable, indicating that the model is not only successful for low flows, but also correctly indicates other flows above the Q75 threshold while being calibrated on low flows below Q75.
 (10) The number of false alarms is almost doubled for the GR4J model considering total uncertainty. The importance of parameter uncertainty on the quality of forecast is emphasized by all forecast quality measures used in this study.
 In essence, this paper has shown that the output from two conceptual hydrological models, calibrated for a medium sized ∼27,000 km2 river basin and fed by raw ECMWF meteorological forecasts, is characterized by substantial uncertainty from model parameters. This source of uncertainty affects both the reliability and the sharpness of the forecasts. This finding is new for low flow forecasts, as the significance of the rainfall prediction error is well known and documented for high flows [Pappenberger et al., 2005]. This study has taken a step in the direction of assessing major sources of uncertainties in medium-range low flow forecasts, in addition to flood forecasts for the Moselle River. However, further research has to be conducted on the effect of uncertainties on seasonal low flow forecasts using coarse seasonal weather products. Different types of models, especially data-driven models, may be included in the uncertainty analysis framework for assessing model structure uncertainty explicitly.
Acknowledgments
 We acknowledge the financial support of the Ir. Cornelis Lely Stichting (CLS), project 20957310. The research is part of the program of the Water Engineering and Management in the University of Twente, and it supports the work of the UNESCO-IHP VII FRIEND-Water program. Discharge data for the River Rhine were provided by the Global Runoff Data Centre (GRDC) in Koblenz (Germany). Areal precipitation and evapotranspiration data were supplied by the Federal Institute of Hydrology (BfG), Koblenz (Germany). REGNIE grid data were extracted from the archive of the Deutscher Wetterdienst (DWD: German Weather Service), Offenbach (Germany). ECMWF-ENS data used in this study have been obtained from the ECMWF data server. We thank Dominique Lucas from ECMWF who kindly guided us through the data retrieval process. The GIS base maps with delineated 134 catchments of the Rhine basin were provided by Eric Sprokkereef, the secretary general of the Rhine Commission (CHR). The GR4J and HBV model codes were provided by Tian Ye. We are grateful to the members of the Referat M2–Mitarbeiter/innen group in BfG, Koblenz, in particular, Peter Krahe, Dennis Meißner, Bastian Klein, Robert Pinzinger, Silke Rademacher, and Imke Lingemann for discussions on the value of low flow forecasts, biases in the models, and statistical tests adapted to ensemble forecasts. The constructive review comments of Hamid Moradkhani (Associate Editor), Florian Pappenberger, Thomas E. Adams, and one anonymous reviewer significantly improved this paper.