In hydrologic modeling, state-parameter estimation using data assimilation techniques is increasing in popularity. Several studies using the ensemble Kalman filter (EnKF) and the particle filter (PF) to estimate both model states and parameters have been published in recent years. Though interest and the literature in this area are growing, relatively little research has examined the effectiveness and robustness of these methods for estimating uncertainty. This study suggests that state-parameter estimation studies need to test these techniques more rigorously than has previously been done. With this in mind, this paper presents a study with multiple calibration replicates and a range of performance measures to test the ability of each technique to calibrate two separate hydrologic models. The results show that the EnKF is consistently overconfident in predicting streamflow, which relates to its assumption of a Gaussian error structure. In addition, the EnKF and PF were found to perform similarly in terms of tracking the observations with an expected value, but the potential for filter divergence in the EnKF is highlighted.
 The study presented here focuses on the use of data assimilation techniques to manage the uncertainty in the modeling framework. Of the above-mentioned methods, data assimilation is attractive for a number of reasons. First, the data assimilation framework provides a methodology for handling all sources of modeling error simultaneously. Second, data assimilation is performed sequentially and therefore has potential in an operational framework, where the estimation of hydrologic quantities is desired at regular intervals. The last benefit of data assimilation is that it does not rely on the assumption of stationarity. Through a sequential estimation of parameters, data assimilation has the potential to handle changes in hydrologic flow patterns.
In the hydrologic data assimilation literature, recent studies have examined the estimation of uncertainty in the parameters of a hydrologic model, in addition to the more traditional state estimation [Moradkhani and Sorooshian, 2008]. By including parameters in the data assimilation process, it is hypothesized that the total uncertainty in the prediction can be characterized more accurately. Several recent studies of state-parameter estimation in hydrologic models have used the popular EnKF [DeChant and Moradkhani, 2011a; Franssen and Kinzelbach, 2008; Leisenring and Moradkhani, 2011; Moradkhani et al., 2005b; Wang et al., 2009]. In addition to the EnKF, particle filters (PF) have been increasing in popularity for both state and state-parameter estimation [DeChant and Moradkhani, 2011a; Leisenring and Moradkhani, 2011; Montzka et al., 2010; Moradkhani et al., 2005a; Nagarajan et al., 2010; Rings et al., 2010; Salamon and Feyen, 2009; Smith et al., 2008; Weerts and El Serafy, 2006]. Despite the recent attention paid to state-parameter estimation with the EnKF and PF, little has been shown about the robustness of these two techniques. The hydrologic data assimilation community needs to assess the effectiveness of both techniques for state-parameter estimation over different scenarios, to establish their applicability, and to relate the results back to the statistical theory and its inherent assumptions. This study aims to perform such an analysis with two conceptual rainfall-runoff models of differing complexity. Throughout this analysis, the importance of examining the behavior of the techniques over many different scenarios is highlighted. This study is organized as follows. Section 2 describes the formulation of the data assimilation techniques. Section 3 describes the study basin and the experimental setup, including the hydrologic models, the time-lagged replicates of the experiment, and the methods through which these replicates are validated. Section 4 presents the results of the data assimilation techniques, followed by a discussion of the results and the conclusion in section 5.
2. Data Assimilation Techniques
2.1. Ensemble Kalman Filter
The EnKF is an ensemble version of the Kalman filter, performed as a Monte Carlo simulation, that overcomes the need for a linear model (Kalman filter) and the need to obtain the derivative of the model to calculate the error covariances (extended Kalman filter) [Evensen, 2003]. In the ensemble framework, the need for model linearization is relaxed and the error covariances can be calculated from the ensemble members [Moradkhani et al., 2005b]. Implementation of the EnKF begins at the initial time step of modeling, at which the model is supplied with an initial distribution of states and parameters. As the model progresses forward in time, the prior distribution of states is produced according to equation (1):
$$x^-_{t,i} = f\left(x^+_{t-1,i}, u_{t,i}, \theta^-_{t,i}\right) + \omega_{t,i} \qquad (1)$$
where $f$ is the forward operator (hydrologic model), $x^-_{t,i}$ represents the model predicted (prior) states, $x^+_{t-1,i}$ represents the posterior model states at the previous time step, $u_{t,i}$ represents the meteorological forcing data, $\theta^-_{t,i}$ represents the prior model parameters at the current time step, $\omega_{t,i}$ represents the model error, $i$ is the ensemble member, and $t$ is the time step. To describe parameter estimation, it is also necessary to describe the estimation of the prior parameter distribution, which is shown in equation (2):
$$\theta^-_{t,i} = \theta^+_{t-1,i} + \tau_{t,i}, \qquad \tau_{t,i} \sim N\left(0, \left(\eta\, \sigma^{\theta}_{t-1}\right)^2\right) \qquad (2)$$
where $\tau_{t,i}$ is the parameter perturbation, $\eta$ is a hyper-parameter to retain diversity in the parameters, which was tuned to 0.01 for this application, and $\sigma^{\theta}_{t-1}$ is the standard deviation of the prior parameter distribution at the previous time step ($\theta^-_{t-1}$). In addition, the prior parameter distribution at the initial time step is developed using Latin-hypercube sampling. Prior to the update of the model states and parameters, an observational operator must be applied to transfer the states into the observation space, as in equation (3):
$$\hat{y}_{t,i} = h\left(x^-_{t,i}, \theta^-_{t,i}\right) + \nu_{t,i} \qquad (3)$$
where $h$ is the observational operator (hydrologic routing), which translates the surface water and storages to flow at the watershed outlet, $\hat{y}_{t,i}$ is the prediction (streamflow), and $\nu_{t,i}$ represents the prediction error. After the prediction is obtained, the posterior states and parameters are estimated with the Kalman update equations as follows [Moradkhani, 2008]:
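$$x^+_{t,i} = x^-_{t,i} + K^x_t\left[y_t + \epsilon_{t,i} - \hat{y}_{t,i}\right] \qquad (4)$$
$$\theta^+_{t,i} = \theta^-_{t,i} + K^{\theta}_t\left[y_t + \epsilon_{t,i} - \hat{y}_{t,i}\right] \qquad (5)$$
where $y_t$ is the observed flow, $\epsilon_{t,i}$ represents the observation error, and $K^x_t$ and $K^{\theta}_t$ are the Kalman gains for states and parameters, respectively. The Kalman gain in state space is calculated from equation (6):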
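$$K^x_t = \Sigma^{xy}_t\left[\Sigma^{yy}_t + R_t\right]^{-1} \approx P_t H_t^{T}\left[H_t P_t H_t^{T} + R_t\right]^{-1} \qquad (6)$$
In equation (6), $\Sigma^{xy}_t$ is the covariance of the state ensemble with the predicted observations, $\Sigma^{yy}_t$ is the variance of the predicted observations, and $R_t$ is the observation error variance at time $t$. $H_t$ is the linearized observation operator $h$, so that $\Sigma^{xy}_t \approx P_t H_t^{T}$ and $\Sigma^{yy}_t \approx H_t P_t H_t^{T}$. The Kalman gain for the parameters, $K^{\theta}_t$, is obtained analogously to equation (6), with the covariance of the parameter ensemble with the predicted observations in place of $\Sigma^{xy}_t$, as shown by Moradkhani et al. [2005b]. The model state error covariance $P_t$ can now be computed directly from the matrix of ensemble deviations ($E_t$):
$$\bar{x}^-_t = \frac{1}{n}\sum_{i=1}^{n} x^-_{t,i} \qquad (7)$$
$$E_t = \left[x^-_{t,1} - \bar{x}^-_t, \; \ldots, \; x^-_{t,n} - \bar{x}^-_t\right] \qquad (8)$$
$$P_t = \frac{1}{n-1} E_t E_t^{T} \qquad (9)$$
where $n$ is the ensemble size.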
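The EnKF cycle in equations (1)-(9) can be sketched in plain Python as below. This is a minimal illustration under stated assumptions, not the implementation used in this study: a single scalar observation (daily streamflow) is assumed, so the inverse in equation (6) reduces to a division; `model` and `obs_op` are hypothetical callables standing in for the forward operator f and the observational operator h, expected to add their own errors ω and ν; the spread that scales the perturbation in equation (2) is taken from the ensemble passed in; and all function names are illustrative.

```python
import random
import statistics


def latin_hypercube(bounds, n, rng):
    """Initial parameter ensemble: one uniform draw per equal-probability stratum."""
    columns = []
    for low, high in bounds:
        strata = [(k + rng.random()) / n for k in range(n)]
        rng.shuffle(strata)  # pair strata randomly across parameters
        columns.append([low + u * (high - low) for u in strata])
    return [list(member) for member in zip(*columns)]


def perturb_parameters(params, eta, rng):
    """Equation (2): add N(0, (eta * sd)^2) noise, sd = spread of the given ensemble."""
    sds = [statistics.stdev([m[j] for m in params]) for j in range(len(params[0]))]
    return [[p + rng.gauss(0.0, eta * sd) for p, sd in zip(m, sds)] for m in params]


def kalman_gains(columns, predicted, obs_var):
    """Equation (6) for one scalar observation: cov(column, y_hat) / (var(y_hat) + R)."""
    n = len(predicted)
    y_bar = statistics.fmean(predicted)
    var_y = sum((y - y_bar) ** 2 for y in predicted) / (n - 1)
    gains = []
    for col in columns:
        c_bar = statistics.fmean(col)
        cov = sum((c - c_bar) * (y - y_bar) for c, y in zip(col, predicted)) / (n - 1)
        gains.append(cov / (var_y + obs_var))
    return gains


def enkf_update(states, params, predicted, obs, obs_sd, rng):
    """Equations (4)-(5): shift each member by gain * (perturbed obs - prediction)."""
    k_x = kalman_gains(list(zip(*states)), predicted, obs_sd ** 2)
    k_th = kalman_gains(list(zip(*params)), predicted, obs_sd ** 2)
    post_x, post_th = [], []
    for x, th, y_hat in zip(states, params, predicted):
        innovation = obs + rng.gauss(0.0, obs_sd) - y_hat  # y_t + eps_{t,i} - y_hat_{t,i}
        post_x.append([v + k * innovation for v, k in zip(x, k_x)])
        post_th.append([v + k * innovation for v, k in zip(th, k_th)])
    return post_x, post_th


def enkf_step(states, params, forcing, obs, model, obs_op, eta, obs_sd, rng):
    """One cycle: perturb parameters (2), forecast (1), predict (3), update (4)-(6).

    `model` and `obs_op` are hypothetical callables; any model error (omega) or
    prediction error (nu) is expected to be added inside them.
    """
    params = perturb_parameters(params, eta, rng)
    states = [model(x, forcing, th) for x, th in zip(states, params)]
    predicted = [obs_op(x, th) for x, th in zip(states, params)]
    return enkf_update(states, params, predicted, obs, obs_sd, rng)
```

Because the gains come from sample covariances, every member is shifted by a linear function of its own innovation; this is the mechanism behind the large parameter jumps discussed in section 4.3.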
2.2. Particle Filter
The PF, similar to the EnKF, sequentially calculates the posterior distribution of states and parameters. The advantage of the PF over the EnKF is that it relaxes the assumption of a Gaussian error structure, which allows the PF to predict the posterior distribution more accurately in the presence of skewed distributions [Moradkhani et al., 2005a]. This is accomplished by resampling sets of states and parameters, or “particles,” with higher posterior weights, as opposed to the linear state updating of the EnKF. The PF used in this study is the sequential importance resampling (SIR) PF, which is referred to as PF-SIR when presenting the results.
Based on the recursive form of Bayes' law (equation (10)), the PF sequentially samples prior states and parameters to approximate the posterior distribution at each observation time step:
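$$p\left(x_t, \theta_t \mid y_{1:t}\right) = \frac{p\left(y_t \mid x_t, \theta_t\right) p\left(x_t, \theta_t \mid y_{1:t-1}\right)}{\int\!\!\int p\left(y_t \mid x_t, \theta_t\right) p\left(x_t, \theta_t \mid y_{1:t-1}\right) dx_t \, d\theta_t} \qquad (10)$$
Equation (10) shows that the posterior distribution of the model-predicted states ($x_t$) and parameters ($\theta_t$), given the observations ($y_{1:t}$), can be computed sequentially in time. In this study, the likelihood of each particle is calculated with the normal likelihood function in equation (11):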
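$$L\left(y_t \mid x^-_{t,i}, \theta^-_{t,i}\right) = \frac{1}{\sqrt{2\pi R_t}} \exp\left[-\frac{\left(y_t - \hat{y}_{t,i}\right)^2}{2R_t}\right] \qquad (11)$$
The normalized likelihood, $\tilde{L}_{t,i}$, can easily be calculated by: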
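$$\tilde{L}_{t,i} = \frac{L\left(y_t \mid x^-_{t,i}, \theta^-_{t,i}\right)}{\sum_{j=1}^{n} L\left(y_t \mid x^-_{t,j}, \theta^-_{t,j}\right)} \qquad (12)$$
This probability is necessary to transform the prior particle weights into the posterior via equation (13):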
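$$w^+_{t,i} = \frac{w^-_{t,i}\, \tilde{L}_{t,i}}{\sum_{j=1}^{n} w^-_{t,j}\, \tilde{L}_{t,j}} \qquad (13)$$
In the PF-SIR, the prior particle weights, $w^-_{t,i}$, are set equal to $1/n$ before moving on to the next time step. This results in a posterior weight, $w^+_{t,i}$, equal to $\tilde{L}_{t,i}$, the normalized likelihood. The SIR algorithm then resamples the particles in proportion to these posterior weights, so that particles with a weight greater than the uniform weight $1/n$ tend to be replicated and those with a smaller weight tend to be discarded. Leisenring and Moradkhani examined weighted random resampling (WRR) in comparison with SIR for the SNOW-17 model and found only a marginal improvement in the performance of the PF. In this study, the SIR method is implemented as elaborated by Moradkhani et al. [2005a].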
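A minimal sketch of one PF-SIR cycle, equations (11)-(13) followed by resampling, is given below. It assumes a scalar observation with error variance `obs_var`, particles stored as lists holding states and parameters, and multinomial resampling (draws with replacement in proportion to the weights). Multinomial resampling is one common SIR variant and is not necessarily the exact scheme detailed by Moradkhani et al. [2005a]; the function names are hypothetical.

```python
import math
import random


def normalized_likelihood(obs, predicted, obs_var):
    """Equations (11)-(12): Gaussian likelihood of each particle, normalized to sum to 1.

    Computed in log space; the constant 1 / sqrt(2 * pi * R) cancels in the normalization.
    """
    log_l = [-((obs - y) ** 2) / (2.0 * obs_var) for y in predicted]
    top = max(log_l)
    scaled = [math.exp(v - top) for v in log_l]  # best particle maps to exp(0) = 1
    total = sum(scaled)
    return [s / total for s in scaled]


def posterior_weights(prior_weights, l_norm):
    """Equation (13): reweight the prior weights by the normalized likelihood."""
    unnorm = [w * l for w, l in zip(prior_weights, l_norm)]
    total = sum(unnorm)
    return [u / total for u in unnorm]


def pf_sir_step(particles, predicted, obs, obs_var, rng):
    """One SIR cycle with uniform prior weights 1/n, followed by multinomial resampling."""
    n = len(particles)
    weights = posterior_weights([1.0 / n] * n, normalized_likelihood(obs, predicted, obs_var))
    # Draw n particles with replacement, each with probability equal to its weight;
    # copy every draw so duplicated particles can evolve independently afterwards.
    resampled = [list(p) for p in rng.choices(particles, weights=weights, k=n)]
    return resampled, weights  # after resampling, every weight is reset to 1/n
```

With uniform prior weights, `posterior_weights` simply returns the normalized likelihood, which matches the statement above that the posterior weight equals the normalized likelihood.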
3. Experimental Setup
3.1. Case Study: Leaf River Basin
This study takes place over the Leaf River Basin in southern Mississippi. The basin is 1944 km2 and is the main tributary of the Pascagoula River, which drains into the Gulf of Mexico. Data for this study were obtained from the National Weather Service Hydrology Laboratory and consist of precipitation (mm d−1), potential evapotranspiration (mm d−1), and streamflow (m3 s−1). This data set has observations from October 1948 through September 1988, providing 40 yr of data for analysis. The methods for utilizing the entire data set are described in section 3.5. A map of the Leaf River Basin is presented in Figure 1.
3.2. HyMod Model
 The HyMod model is a simple, conceptual, lumped model containing five calibration parameters. Based on these five parameters, the model allocates water between a series of three quick-flow tanks and one slow-flow tank, then routes the runoff to the outlet. A description of the parameters, and the possible range of their values, is provided in Table 1. In addition to the parameters, all five HyMod states are estimated as well. For a more detailed description of the model processes, see Moradkhani et al. [2005a].
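The quick/slow routing structure described above can be sketched as below. This is an illustration under assumptions, not the HyMod code used here: each tank is treated as a linear reservoir that releases a fixed fraction of its storage per day, `k_quick` and `k_slow` stand in for the quick and slow flow tank parameters of Table 1, and a hypothetical partition fraction `alpha` splits effective rainfall between the two paths. The Pareto-distributed soil moisture store that generates effective rainfall is omitted, and releasing inflow in the same step it arrives is a discretization choice of this sketch.

```python
def linear_reservoir(storage, inflow, k):
    """One daily step of a linear reservoir: add inflow, then release k * storage."""
    storage += inflow
    outflow = k * storage
    return storage - outflow, outflow


def hymod_routing_step(quick, slow, effective_rain, alpha, k_quick, k_slow):
    """Route one day of effective rainfall through 3 quick tanks and 1 slow tank.

    quick: list of quick-tank storages; slow: slow-tank storage.
    alpha (hypothetical name): fraction of effective rainfall sent to the quick path.
    """
    flow = alpha * effective_rain
    new_quick = []
    for storage in quick:  # cascade: each tank's outflow is the next tank's inflow
        storage, flow = linear_reservoir(storage, flow, k_quick)
        new_quick.append(storage)
    slow, slow_out = linear_reservoir(slow, (1.0 - alpha) * effective_rain, k_slow)
    return new_quick, slow, flow + slow_out


if __name__ == "__main__":
    quick, slow = [0.0, 0.0, 0.0], 0.0
    for day in range(10):
        rain = 10.0 if day == 0 else 0.0  # a single 10 mm pulse of effective rainfall
        quick, slow, q = hymod_routing_step(quick, slow, rain, 0.7, 0.5, 0.05)
        print(day, round(q, 3))
```

Summing the routed outflow and the storage left in the tanks recovers the input pulse, a quick mass-balance check for any routing scheme of this kind.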
Table 1. HyMod Model Parameter Descriptions With Feasible Ranges
Quick flow tank parameter
Slow flow tank parameter
Variability of soil moisture capacity
Maximum watershed storage capacity
3.3. Sacramento Soil Moisture Accounting (SAC-SMA) Model
The SAC-SMA model, first introduced by Burnash et al., is a conceptual water balance model used operationally at National Weather Service River Forecast Centers. The model simulates water storage with two soil moisture zones: an upper and a lower zone. The upper zone accounts for short-term storage of water in the soil, while the lower zone models the longer-term groundwater storage. Water can move vertically from the upper zone to the lower zone, laterally out of the system depending on the state variables and the parameterization, or vertically out of the system through evapotranspiration. Excess runoff is routed to the watershed outlet using a Nash cascade of three linear reservoirs. The SAC-SMA model parameters are summarized in Table 2. In addition to these parameters, all six SAC-SMA states and the storage in the three Nash-cascade reservoirs are estimated.
Table 2. SAC-SMA Model Parameters Description With Feasible Ranges
Upper zone tension water maximum
Upper zone free water maximum
Lower zone tension water maximum
Lower zone free primary maximum
Lower zone free secondary maximum
Additional impervious area
Upper zone depletion parameter
Lower zone primary depletion parameter
Lower zone secondary depletion parameter
Percolation and other
Maximum percolation rate
Percolation equation exponent
Impervious area of watershed
Free water percolation from upper to lower zone
Nash-cascade routing parameter
Riparian vegetated area
Deep recharge to channel base flow
Lower zone free water not transferable to tension water
3.4. Assumed Errors
In any data assimilation framework, it is necessary to assume error values for every quantity that contains uncertainty. This study applies noise directly to the precipitation, potential evapotranspiration (PET), model predictions, and streamflow observations to account for their uncertainties. Precipitation is assumed to have a lognormal error distribution with a relative error of 25%. PET error is assumed to follow a normal distribution, also with a relative error of 25%. Both values are necessary to account for errors in meteorological measurements due to the spatial heterogeneity of these variables and to sensor errors. All prediction errors are assumed to be normally distributed with a relative error of 30% for HyMod and 25% for SAC-SMA; the differing values reflect the different accuracies of the two models in streamflow prediction. Last, the streamflow observation errors are assumed to be normally distributed with a relative error of 15%. All errors in this study are assumed to be uncorrelated, and the error assumptions are applied with the same magnitude in both the EnKF and PF-SIR. The assumed values were determined through manual tuning to achieve the most reliable predictions over the time-lagged calibration replicates. Although the assumed values were tuned to achieve the most reliable ensemble prediction over the entire observation period, the reader is cautioned that these are not necessarily the physically correct error terms. Since these errors were determined with very little a priori knowledge about the real error magnitudes, their estimation is ill-posed, as explained by Renard et al., and they are therefore uncertain.
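The error model of this section can be sketched as follows. Two readings are assumptions rather than statements from the text: "relative error" is interpreted as a coefficient of variation (standard deviation divided by the value), and the lognormal precipitation error is taken as a mean-preserving multiplier, so its log-space variance is ln(1 + 0.25²). Keeping dry days dry and optionally clipping PET at zero are also assumptions of this sketch; the function and dictionary names are hypothetical.

```python
import math
import random

# Relative errors stated in this section (fractions of the perturbed quantity).
REL_ERR = {"precip": 0.25, "pet": 0.25, "obs": 0.15,
           "pred_hymod": 0.30, "pred_sacsma": 0.25}


def perturb_precip(p, rel_err, rng):
    """Multiplicative lognormal error with mean 1 and coefficient of variation rel_err."""
    if p <= 0.0:
        return 0.0  # assumption: dry days stay dry
    sigma2 = math.log(1.0 + rel_err ** 2)
    return p * rng.lognormvariate(-0.5 * sigma2, math.sqrt(sigma2))


def perturb_normal(value, rel_err, rng, floor=None):
    """Additive Gaussian error with standard deviation rel_err * |value|, optionally floored."""
    noisy = value + rng.gauss(0.0, rel_err * abs(value))
    return noisy if floor is None else max(floor, noisy)


if __name__ == "__main__":
    rng = random.Random(1)
    print(perturb_precip(12.0, REL_ERR["precip"], rng))         # perturbed rainfall, mm/d
    print(perturb_normal(3.5, REL_ERR["pet"], rng, floor=0.0))  # perturbed PET, mm/d
    print(perturb_normal(150.0, REL_ERR["obs"], rng))           # perturbed streamflow
```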
3.5. Time-Lagged Calibration Replicates
In order to examine the robustness of both the EnKF and PF, this study implements each assimilation technique in time-lagged calibration periods between October 1948 and August 1981. The setup of the time-lagged calibration periods is shown in Figure 2, which shows the 21 different calibration time periods (lagged by 500 d) for the HyMod model and the 10 different calibration time periods (lagged by 1000 d) for the SAC-SMA model. Each model calibration is 2000 time steps long and assimilates streamflow observations at a daily frequency. During each calibration time period, performance measures were calculated only for the second half (1000 d) of the model run, to allow the states and parameters to converge to plausible values. A smaller number of calibration replicates was used for the SAC-SMA model because of the increased computational time and storage space required by its larger number of states and parameters. Following the model calibrations, the posterior parameters at the last time step of each calibration are used to run the model during a validation period from September 1981 through January 1987. Each validation is performed with state-only estimation, using the posterior parameter distribution estimated during the calibration. During these validation experiments, all noise terms are consistent with the calibration except that no parameter perturbation or evolution is considered, because the parameter distribution is assumed to be constant during the validation. The validation is intended to assess the performance of the posterior parameter distribution created by each method on an independent data set, which provides insight into the accuracy of that posterior parameter distribution.
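The window arithmetic above can be checked with a short sketch, assuming the record begins on 1 October 1948 and that each window covers 2000 consecutive days with performance measures computed over the last 1000. Under these assumptions the last HyMod window ends in early August 1981, consistent with the August 1981 end date stated above and just before the September 1981 start of the validation period. The function name is hypothetical.

```python
from datetime import date, timedelta


def lagged_windows(first_day, n_windows, lag_days, length_days=2000, spinup_days=1000):
    """Start, scoring start, and end date of each time-lagged calibration window."""
    windows = []
    for k in range(n_windows):
        start = first_day + timedelta(days=k * lag_days)
        windows.append({
            "start": start,
            "score_from": start + timedelta(days=spinup_days),  # metrics use the last 1000 d
            "end": start + timedelta(days=length_days - 1),      # 2000 daily steps, inclusive
        })
    return windows


FIRST_DAY = date(1948, 10, 1)  # assumed first day of the record
hymod = lagged_windows(FIRST_DAY, 21, 500)
sacsma = lagged_windows(FIRST_DAY, 10, 1000)
print(hymod[-1]["end"], sacsma[-1]["end"])  # prints 1981-08-08 1978-11-12
```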
3.6. Deterministic and Probabilistic Performance Assessment
In order to provide a robust analysis of each assimilation run, it was necessary to calculate multiple performance measures. Four quantitative measures and two graphical measures were used to check assimilation performance. The first is the Nash-Sutcliffe efficiency (NSE), which is the only measure of the accuracy of the expected value (EV) and shows the ability of each technique to track the observations. In terms of probabilistic measures, the normalized root-mean-square error ratio (NRR), 95% exceedance ratio (ER95) [Moradkhani et al., 2006], reliability (α), rank histogram, and quantile-quantile plot (Q-Q plot) were examined. All probabilistic measures are ensemble verification techniques applied over a time series of observations (i.e., streamflow). Note that α, suggested by Renard et al., measures the proximity of the Q-Q plot to the uniform (1:1) line. Renard et al. also proposed a second reliability score (ξ), which measures the percentage of observations falling within the ensemble prediction. In the analysis of this experiment, ξ is not used because the ER95 provides similar information. All measures are described in Table 3. Each of these performance metrics examines the ability of prior streamflow forecasts to predict the observed streamflow.
Table 3. Summary of Performance Measures
Nash-Sutcliffe efficiency (NSE). Interpretation: an NSE equal to 1 is a perfect prediction, while a value of 0 indicates no more skill than using the mean observed streamflow as the prediction.
Reliability (α). Calculation: see the Q-Q plot entry. Interpretation: a measure of the fit of the Q-Q plot to the uniform line; a value of 1 is exactly uniform and a value of 0 is the furthest possible from uniform.
Normalized root-mean-square error ratio (NRR). Interpretation: a measure of the spread of the ensemble in relation to the accuracy of the EV; a value of 1 indicates accurate spread, >1 a distribution that is too narrow, and <1 a distribution that is too wide.
95% exceedance ratio (ER95). Interpretation: a perfect ensemble would have a 5% exceedance of the 95% predictive bounds.
Rank histogram. Calculation: rank every observation by its location in the sorted (ascending) ensemble, then create a histogram of the ranks over all time steps. Interpretation: a uniform histogram indicates accurate representation of uncertainty; for a detailed description of rank histogram interpretation, see Hamill.
Quantile-quantile plot (Q-Q plot). Calculation: calculate the quantile of the observation within the ensemble at every time step, sort these quantiles, and compare them with the uniform distribution. Interpretation: a Q-Q plot matching the uniform line indicates an optimal ensemble prediction; for details of the interpretation of a Q-Q plot, see Laio and Tamea.
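The quantitative measures in Table 3 can be computed as sketched below. The formulas are common forms assumed for illustration, not necessarily the exact definitions used in this study. NRR is taken as the RMSE of the ensemble mean divided by the pooled RMSE of the individual members, normalized by sqrt((n + 1)/(2n)): if the observation and the n members are independent draws from the same distribution with variance σ², the mean-square error of the ensemble mean is σ²(n + 1)/n and that of a single member is 2σ², so a consistent ensemble gives NRR ≈ 1. ER95 uses the 2.5th and 97.5th ensemble percentiles from Python's `statistics.quantiles`.

```python
import math
import random
import statistics


def nse(obs, sim):
    """Nash-Sutcliffe efficiency of a deterministic series such as the ensemble mean."""
    o_bar = statistics.fmean(obs)
    sse = sum((o - s) ** 2 for o, s in zip(obs, sim))
    sst = sum((o - o_bar) ** 2 for o in obs)
    return 1.0 - sse / sst


def nrr(obs, ensembles):
    """RMSE of the ensemble mean over the pooled member RMSE, divided by sqrt((n+1)/(2n))."""
    n, t = len(ensembles[0]), len(obs)
    mse_mean = sum((statistics.fmean(e) - o) ** 2 for o, e in zip(obs, ensembles)) / t
    mse_members = sum((m - o) ** 2 for o, e in zip(obs, ensembles) for m in e) / (t * n)
    return math.sqrt(mse_mean / mse_members) / math.sqrt((n + 1) / (2 * n))


def er95(obs, ensembles):
    """Percentage of observations outside the 2.5th-97.5th percentile ensemble bounds."""
    outside = 0
    for o, e in zip(obs, ensembles):
        cuts = statistics.quantiles(e, n=40)  # 39 cut points at 2.5%, 5%, ..., 97.5%
        if o < cuts[0] or o > cuts[-1]:
            outside += 1
    return 100.0 * outside / len(obs)


if __name__ == "__main__":
    rng = random.Random(5)  # synthetic data only, not Leaf River results
    centers = [rng.gauss(0.0, 5.0) for _ in range(2000)]
    obs = [c + rng.gauss(0.0, 1.0) for c in centers]
    consistent = [[c + rng.gauss(0.0, 1.0) for _ in range(50)] for c in centers]
    narrow = [[c + rng.gauss(0.0, 0.2) for _ in range(50)] for c in centers]
    for name, ens in (("consistent", consistent), ("narrow", narrow)):
        ev = [statistics.fmean(e) for e in ens]
        # "consistent": NRR near 1 and ER95 near 5%; "narrow" (overconfident): both inflated
        print(name, round(nse(obs, ev), 3), round(nrr(obs, ens), 3), round(er95(obs, ens), 1))
```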
4.1. HyMod Results
 Each calibration of the HyMod model was performed with different ensemble sizes from 10 to 1000. By applying the EnKF and PF-SIR with 15 different ensemble sizes, the performance of each assimilation technique, with respect to ensemble size, is analyzed. In order to display the results of all model calibrations, the performance measures for all 21 different lagged calibration periods are averaged at each ensemble size and plotted in Figure 3.
Over the four subplots in Figure 3, some contradictory results are observed. First, the EnKF produces a greater NSE and α than the PF-SIR at all ensemble sizes, suggesting that the EnKF produced a more accurate expected value and a more reliable ensemble prediction than the PF-SIR. Though this suggests the EnKF is more effective than the PF-SIR, the NRR and ER95 suggest otherwise. The NRR indicates a perfect characterization of uncertainty when equal to one, while values less than one indicate too much spread (underconfidence) in the ensemble prediction and values greater than one indicate too little spread (overconfidence). Likewise, the ER95 is 5% for an ideal distribution; an ER95 greater than 5% suggests the distribution is too narrow, and an ER95 less than 5% suggests it is too wide. With an NRR closer to one and an ER95 closer to 5%, the PF-SIR appears to have produced a more accurate characterization of the uncertainty. The EnKF produced a more accurate prediction but had a stronger tendency to be overconfident (the uncertainty in the system is routinely underestimated). This suggests that, although the EnKF predicted the mode of the posterior distribution more accurately, it struggled to estimate the tails of the posterior distribution. In comparison, the PF-SIR was less accurate in estimating the mode of the posterior distribution but more accurate in estimating the tails. To verify that the averaged results of Figure 3 are representative of all 21 calibration time periods, Figure 4 shows the performance measures of each time-lagged model calibration.
Figure 4 provides validation that the averaged results of Figure 3 are representative of all 21 time-lagged model calibrations. The NSE and α subplots show that the EnKF produced a more accurate expected value and reliable ensemble prediction in nearly every model calibration, confirming the results found in Figure 3. Although the mode of the posterior distribution is more accurately characterized by the EnKF than the PF-SIR, the NRR and ER95 subplots confirm that the EnKF had a greater tendency toward overconfidence than the PF-SIR. This also confirms the conclusion that the PF-SIR more accurately characterized the tails of the posterior distribution. This is an important observation with respect to the assessment of uncertainty in hydrologic forecasting. In order to accurately estimate the uncertainty in the modeling framework, it is necessary to accurately estimate the entire posterior distribution. The need for accurate estimation throughout the posterior is discussed further in the analysis of subsequent figures.
While the performance measures presented thus far provide insight into the general accuracy of the predictive distribution, a visual representation is necessary to provide further insight into the behavior of each technique. Therefore, we extend this analysis by visualizing the data as rank histograms and Q-Q plots in Figure 5. The rank histograms and Q-Q plots presented here show the predictive distributions with an ensemble size of 1000 for the last 1000 d of each model calibration, for both the EnKF and PF-SIR. The rank histograms show that both the EnKF and PF-SIR have a large number of observations falling in the outer bins of the distribution, indicating an overconfidence problem for each method. While this confirms the previous results, it also provides information on the bias in the predictive distribution, which was not previously addressed. Each method tended to overpredict the low flows, indicating that the model struggles to predict low flows. Though both methods struggled with the low flows, only the EnKF underpredicted the high flows. This poor characterization of the high flows caused the higher NRR and ER95 values observed in Figures 3 and 4. The Q-Q plots also indicate a tendency toward overprediction of the low flows with both techniques, and underprediction of the high flows with the EnKF. The Q-Q plot shows how bias can make the interpretation of α difficult: since the Q-Q plot for the EnKF crosses the uniform line, the EnKF actually produces a higher α but a worse ensemble prediction than the PF-SIR. As suggested previously, the EnKF struggled to predict the posterior tails, particularly the tail producing the high flows, in comparison to the PF-SIR. In general, this poor estimation of the full posterior by the EnKF is caused by its assumption of a Gaussian error structure.
In predicting streamflow, highly skewed error structures are quite common, especially in models as simple as HyMod. In the presence of non-Gaussian error structures, the EnKF can still predict the mode of the distribution but is incapable of estimating the full posterior distribution. As observed thus far, the EnKF accurately predicted the mode of the posterior but struggled, in comparison to the PF-SIR, to characterize the full uncertainty, suggesting that the PF-SIR is a more robust uncertainty estimator. This is the result that should be expected of a simple rainfall-runoff model.
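The rank histogram, the predictive quantiles plotted in a Q-Q plot, and α from Table 3 can be sketched as below. The plotting positions i/(T + 1) and the form α = 1 − 2 × mean |sorted quantile − uniform quantile| are assumptions made for this sketch, and ties between members and observations are assumed absent. The demo uses synthetic data, not Leaf River results, to show the U-shaped histogram produced by an overconfident ensemble like those described above.

```python
import random


def rank_histogram(obs, ensembles):
    """Counts per rank bin (0..n), where rank = number of members below the observation."""
    n = len(ensembles[0])
    counts = [0] * (n + 1)
    for o, e in zip(obs, ensembles):
        counts[sum(m < o for m in e)] += 1
    return counts


def pit_values(obs, ensembles):
    """Predictive quantile of each observation: fraction of members below it."""
    return [sum(m < o for m in e) / len(e) for o, e in zip(obs, ensembles)]


def reliability_alpha(pit):
    """alpha = 1 - 2 * mean |sorted quantile - uniform quantile i / (T + 1)|."""
    t = len(pit)
    dev = sum(abs(p - (i + 1) / (t + 1)) for i, p in enumerate(sorted(pit))) / t
    return 1.0 - 2.0 * dev


if __name__ == "__main__":
    rng = random.Random(3)
    centers = [rng.gauss(0.0, 5.0) for _ in range(5000)]
    obs = [c + rng.gauss(0.0, 1.0) for c in centers]
    narrow = [[c + rng.gauss(0.0, 0.2) for _ in range(20)] for c in centers]
    print(rank_histogram(obs, narrow))  # U shape: most counts in the two outer bins
    print(round(reliability_alpha(pit_values(obs, narrow)), 3))
```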
Up to this point, the ability of the EnKF and PF-SIR to estimate streamflow during calibration has been assessed, but the ability of each technique to estimate a posterior parameter distribution must be analyzed separately. To determine the accuracy of the posterior parameters from each data assimilation technique, a validation of each model calibration is presented. Similar to Figure 3, the validation performance measures, averaged over the time-lagged replicates at each of the 15 ensemble sizes, are shown in Figure 6. Note the large differences between the calibration results in Figure 3 and the validation results in Figure 6. Overall, all four performance measures in Figure 6 indicate that the PF-SIR was more accurate for ensemble sizes above 200. While the EnKF more accurately predicted the mode of the predictive distribution during calibration, the validation results suggest that the PF-SIR performed better in all measures. To understand the cause of this shift from calibration to validation, the reader is reminded that a cross validation is performed, in which the parameters from each calibration are applied to an independent data set. Since the parameters are applied to an independent data set with a slightly different flow regime, it is essential to estimate the parameter uncertainty accurately. The previous results showed that the EnKF had a stronger tendency toward overconfidence than the PF-SIR. This overconfidence led to overfitting, that is, an underestimation of the uncertainty in the posterior parameters from each time-lagged calibration. While the overfitting of these parameters seemed beneficial to streamflow prediction accuracy during calibration, the validation highlights the negative effects of an incomplete characterization of parameter uncertainty.
To illustrate the consistency of the previous results, scatterplots of the performance measures for the validation are presented in Figure 7. Results in this figure show that, for nearly every validation, the PF-SIR not only was more accurate than the EnKF in terms of expected value and ensemble prediction but also estimated the total uncertainty more accurately.
 Further insight into the behavior of these two techniques is observed through rank histograms and Q-Q plots in Figure 8. First, and most importantly, differences between the calibration and validation rank histograms must be evaluated. During the validation time steps, the overconfidence in the EnKF prediction appears to be exacerbated. Both the lower and upper bins of the histogram are taller during validation than the calibration. Further, over half of the observations fell below the predictive distribution during validation. In addition, a highly overconfident prediction is indicated by the flat Q-Q plot. While the EnKF became more overconfident during the validation, the PF-SIR estimated the uncertainty with similar accuracy to the calibration. This again suggests that the posterior parameters produced by the PF-SIR are a more accurate representation of the uncertainty than the posterior created by the EnKF. This is examined further in Figure 9, which shows the combined posterior distribution of each parameter from all calibration replicates. From Figure 9, it is clear that the EnKF converges to a smaller posterior distribution than the PF-SIR for each parameter. In conjunction with the streamflow results, this provides evidence that the EnKF poorly estimated the full posterior parameter distribution. Overall, the EnKF appears to be overfitting the parameters to the data during the calibration runs. A poor characterization of the full posterior in the EnKF is a result of the skewed error structure during the lagged calibrations. Since the EnKF assumes a normal error structure, the tails of the posterior distribution are incorrectly estimated, leading to narrow parameter distributions. This skewed error structure is theoretically less problematic in the PF-SIR, and the results support that.
4.2. SAC-SMA Results
 In section 4.2, results from the SAC-SMA model are presented. The additional model analysis is necessary for two reasons. First, it is important to analyze the effects of greater model complexity on the performance of each method. Second, using a different model allows for analysis of each technique's behavior under a different model structure. Similar to the HyMod model, the SAC-SMA was calibrated over each time-lagged period and analyzed with NSE, α, NRR, and ER95. In the SAC-SMA analysis, results of the performance, with respect to ensemble size, were similar to the HyMod results. In the interest of simplifying the results presentation, the quantitative performance measures are summarized in Table 4 for the 2000 ensemble member case.
Table 4. Performance Measures for the SAC-SMA Model During Calibration and Validation
Some of the results in Table 4 contrast with those found during the HyMod model calibration. First, the NSE suggests that the PF-SIR reproduced the observations more accurately than the EnKF during the calibration runs. This result is skewed by a single poor calibration by the EnKF (December 1967–May 1973); in most calibration runs the expected value from the EnKF was nearly equivalent to that of the PF-SIR. During the single poor calibration run, filter divergence occurred in the EnKF. This case of filter divergence hints that the EnKF may be less robust than the PF-SIR in terms of parameter estimation, as discussed in detail in section 4.3. In comparison to the HyMod model, the accuracy of the ensemble prediction during calibration also differs. Unlike the NSE, the average α value during the calibration replicates is not subject to an outlier. This highlights the importance of model structure in the comparison of the EnKF and PF-SIR. Model structure strongly affects the ability of data assimilation techniques to update model states and parameters. This point is discussed further in section 5.1. In terms of ensemble spread and 95% predictive bounds, the results are consistent with the HyMod model. During the calibration, results suggest that both techniques have a tendency toward overconfidence, but to a lesser extent than in the HyMod results. A consistent trend is observed of the EnKF producing results that are more overconfident than the PF-SIR.
 In order to further examine the uncertainty estimation in the SAC-SMA model, the rank histograms and Q-Q plots are provided as well. From Figure 10, it is important to note that the accuracy of the SAC-SMA during extreme flow events is quite different from the HyMod model. In the HyMod model, both methods showed difficulty predicting the low flows, but in the SAC-SMA model both techniques have difficulty predicting the high flows. Though both techniques struggled to predict the high flows, the problem is amplified when using the EnKF. Overconfidence persists when using the EnKF to calibrate the SAC-SMA model, indicating that the error structure in the SAC-SMA model predictions is sufficiently skewed to violate the Gaussian assumption. Though the assumption is violated, the results appear to be less adverse than in the HyMod model, which is likely a result of generally more accurate predictions from the SAC-SMA model.
 To analyze the performance of the EnKF and PF-SIR in estimating the posterior parameters in the SAC-SMA model, the reader is directed to the validation results in Table 4. Interestingly, both techniques showed similar accuracy in the expected value of prediction. Though significant differences are found in the ability of each method to estimate the posterior distribution, the EnKF and PF-SIR perform similarly in terms of expected value. Similar to the HyMod validation, the PF-SIR produced both a more accurate ensemble prediction and more accurate 95% predictive bounds, according to Table 4. It is also important to note that the PF-SIR is underconfident during the validation according to the NRR, but the EnKF remains overconfident. This provides further evidence of the tendency of the EnKF to overfit the model parameters. While the EnKF overfit the parameters, the PF-SIR estimated a more accurate posterior, which in this case led to underconfidence in the validation. Underconfidence in a validation scenario would be expected because a wider range of flows was observed during the 10 calibration time periods than during the one validation time period. This behavior is also suggested by the rank histograms and Q-Q plots in Figure 11. Similar to the results found in the HyMod model, it appears that the predictive distribution from the EnKF is more overconfident during the validation than the calibration. The reverse trend is observed when applying the PF-SIR. Since the PF-SIR more accurately characterized the uncertainty in the parameters during the calibration, it can still effectively estimate the uncertainty during the validation. Overall, the predictive distribution produced by the PF-SIR appears to give a more accurate representation of the uncertainty than the EnKF, but both methods displayed similar ability to track the observation with an expected value. 
The consistently more accurate estimation of uncertainty in the PF-SIR, over multiple replicates in two separate models, suggests that it is a more robust estimator of uncertainty than the EnKF.
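The NRR referred to above compares the error of the ensemble mean with the error that the ensemble spread would lead one to expect. As a minimal sketch, assuming the form commonly used in ensemble verification, NRR = R_a / E[R_a], where R_a is the RMSE of the ensemble mean and E[R_a] = sqrt((N + 1)/(2N)) R_m is its expected value for a statistically consistent ensemble of N members whose average member RMSE is R_m, the calculation could look as follows. Under this assumed definition, NRR > 1 indicates too little spread (overconfidence) and NRR < 1 too much spread (underconfidence); the function name and data are hypothetical.

```python
import numpy as np

def nrr(ensemble, obs):
    """Normalized RMSE ratio (assumed form): R_a / E[R_a].

    ensemble: array of shape (T, N); obs: array of shape (T,).
    """
    n_members = ensemble.shape[1]
    # R_a: RMSE of the ensemble mean.
    r_a = np.sqrt(np.mean((ensemble.mean(axis=1) - obs) ** 2))
    # R_m: average RMSE of the individual members.
    r_m = np.mean(np.sqrt(np.mean((ensemble - obs[:, None]) ** 2, axis=0)))
    # Expected R_a if observations and members come from the same distribution.
    expected_r_a = np.sqrt((n_members + 1) / (2 * n_members)) * r_m
    return r_a / expected_r_a

rng = np.random.default_rng(1)
T, N = 5000, 20
obs = rng.normal(0.0, 1.0, T)
nrr_narrow = nrr(rng.normal(0.0, 0.3, (T, N)), obs)      # spread too small
nrr_wide = nrr(rng.normal(0.0, 3.0, (T, N)), obs)        # spread too large
nrr_consistent = nrr(rng.normal(0.0, 1.0, (T, N)), obs)  # spread matches obs
print(nrr_narrow, nrr_wide, nrr_consistent)
```

With these synthetic ensembles the narrow one gives NRR above 1, the wide one below 1, and the consistent one close to 1.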
4.3. Divergence in the Ensemble Kalman Filter
 This section analyzes the divergence observed in the EnKF during the December 1967 to May 1973 model calibration. Filter divergence can refer to two scenarios: a slow loss of sensitivity of the model to the observations due to poorly defined error terms [Houtekamer and Mitchell, 1998], and catastrophic filter divergence, in which extreme nonlinearities in the model violate the Gaussian assumption and lead to severe overadjustments in the updates [Harlim and Majda, 2010]. The latter was observed in this experiment. To understand the problems associated with parameter estimation in the EnKF during this calibration period, it is useful to compare the streamflow hydrograph produced by the EnKF with the evolution of the lower zone tension water maximum (LZTWM) parameter. Figure 12 shows that over 300 d of the EnKF calibration period, the peaks in the EnKF prediction far overestimate the observations, in some cases by more than 1000 m3 s−1. At the time steps where this overestimation is noticeable, the lower bound of the 95% interval of the LZTWM distribution shows sudden downward spikes. During these events, the LZTWM of several ensemble members (58 in the largest event) is adjusted from 500 mm to 10 mm, that is, from the maximum to the minimum allowed value. Because this parameter is the capacity of the lower zone tension water storage, this sudden drop forced the affected ensemble members to release excessive amounts of water, leading to significant overestimation of the streamflow. This phenomenon is not observed in the PF-SIR. Unlike the EnKF, which can make large adjustments to state and parameter values in the event of large errors, the PF-SIR is more constrained because it resamples, rather than adjusts, states and parameters. This makes the PF-SIR a more robust estimation technique, provided the ensemble is sufficiently large. 
Although this behavior appears to be rare for the EnKF (it was observed only once in this study and is not well documented in the literature), it raises questions about the confidence that can be placed in the EnKF to accurately estimate model parameters, particularly as model nonlinearity increases.
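The contrast between adjusting and resampling parameters can be illustrated with a single update of one bounded parameter. The sketch below is hypothetical and is not the SAC-SMA model: it assumes a toy response in which predicted flow is inversely proportional to a storage capacity (flow = 5000 / theta), a perturbed-observation EnKF update clipped to limits of 10 and 500 (the LZTWM values quoted above), and a multinomial SIR step with a Gaussian likelihood. When the observed peak lies far above every prediction, the linear Kalman gain pushes members past the lower limit, whereas the SIR step can only reuse capacities already present in the ensemble.

```python
import numpy as np

BOUNDS = (10.0, 500.0)  # assumed parameter limits (mm)

def toy_flow(theta):
    """Hypothetical nonlinear response: smaller storage capacity -> more flow."""
    return 5000.0 / theta

def enkf_parameter_update(theta, y_pred, y_obs, obs_var, rng):
    """Perturbed-observation EnKF update of one parameter, clipped to BOUNDS."""
    cov_ty = np.cov(theta, y_pred)[0, 1]
    gain = cov_ty / (np.var(y_pred, ddof=1) + obs_var)
    y_pert = y_obs + rng.normal(0.0, np.sqrt(obs_var), theta.size)
    # A large innovation times a large gain can overshoot the feasible range.
    return np.clip(theta + gain * (y_pert - y_pred), *BOUNDS)

def sir_update(theta, y_pred, y_obs, obs_var, rng):
    """SIR step: Gaussian likelihood weights, then multinomial resampling."""
    log_w = -0.5 * (y_obs - y_pred) ** 2 / obs_var
    w = np.exp(log_w - log_w.max())  # subtract the max to avoid underflow
    w /= w.sum()
    return theta[rng.choice(theta.size, size=theta.size, p=w)]

rng = np.random.default_rng(3)
theta = rng.uniform(300.0, 500.0, 100)  # prior capacities
y_pred = toy_flow(theta)                # roughly 10 to 17
y_obs, obs_var = 40.0, 1.0              # observed peak far above all predictions

theta_enkf = enkf_parameter_update(theta, y_pred, y_obs, obs_var, rng)
theta_sir = sir_update(theta, y_pred, y_obs, obs_var, rng)
print("EnKF members at lower bound:", np.sum(theta_enkf == BOUNDS[0]))
print("Max flow after EnKF update:", toy_flow(theta_enkf).max())
print("Max flow after SIR update:", toy_flow(theta_sir).max())
```

In this toy setting, members sent to the lower limit produce flows far above anything in the prior ensemble, which is qualitatively the overestimation described above; the resampled ensemble stays within the range of existing predictions.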
4.4. Computational Time
 In addition to presenting results on the accuracy of the EnKF and PF-SIR with respect to ensemble size, this study examines the computational requirements of each technique as a function of ensemble size. Figure 13 shows the growth of computational demand with ensemble size for each technique and reveals a trend not commonly presented in the literature: at each ensemble size, the EnKF is more computationally demanding than the PF-SIR. In addition, larger ensemble sizes and a larger number of states and parameters lead to a larger difference in computational demand between the EnKF and PF-SIR. The additional demand in the EnKF is caused by the calculation of covariances between the predictions and all model states and parameters. While a single such calculation is fast, repeating it over 2000 time steps with 1000 ensemble members creates a significant computational demand. The computational demand of the PF-SIR grows less steeply because it only requires calculating a weight for each ensemble member and resampling the ensemble members. This figure is not meant to suggest that the PF-SIR is more computationally efficient than the EnKF; it merely illustrates the need to factor in the execution time of each technique, and not just the ensemble size, when determining which method is more efficient for a given application.
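Because execution time depends on the implementation, the comparison is worth repeating for any given code base. The sketch below is a hypothetical timing harness, not the code behind Figure 13: it times only the analysis step of each filter (model propagation, which often dominates in practice, is omitted), fills d states and parameters with random placeholders, uses a linear stand-in for the predicted flow, and performs the EnKF update through the cross-covariance between the predicted flow and every state or parameter. Absolute timings will vary with hardware and the linear algebra backend.

```python
import time
import numpy as np

def enkf_update_all(X, y_pred, y_obs, obs_var, rng):
    """EnKF analysis for all d states/parameters stored as rows of X (d, N)."""
    n = X.shape[1]
    X_anom = X - X.mean(axis=1, keepdims=True)
    y_anom = y_pred - y_pred.mean()
    # Cross-covariance of every state/parameter with the predicted flow: O(d * N).
    cov_xy = X_anom @ y_anom / (n - 1)
    gain = cov_xy / (y_anom @ y_anom / (n - 1) + obs_var)
    y_pert = y_obs + rng.normal(0.0, np.sqrt(obs_var), n)
    return X + np.outer(gain, y_pert - y_pred)

def sir_update_all(X, y_pred, y_obs, obs_var, rng):
    """SIR analysis: one weight per member (O(N)), then copy resampled columns."""
    log_w = -0.5 * (y_obs - y_pred) ** 2 / obs_var
    w = np.exp(log_w - log_w.max())
    w /= w.sum()
    idx = rng.choice(X.shape[1], size=X.shape[1], p=w)
    return X[:, idx]

def time_filters(ensemble_sizes, d=20, n_steps=200, seed=0):
    """Return {N: (enkf_seconds, sir_seconds)} for repeated analysis steps."""
    rng = np.random.default_rng(seed)
    results = {}
    for n in ensemble_sizes:
        X = rng.normal(size=(d, n))
        y_pred = X.sum(axis=0)  # hypothetical linear prediction operator
        timings = []
        for update in (enkf_update_all, sir_update_all):
            start = time.perf_counter()
            for _ in range(n_steps):
                update(X, y_pred, 0.5, 1.0, rng)
            timings.append(time.perf_counter() - start)
        results[n] = tuple(timings)
    return results

for n, (t_enkf, t_sir) in time_filters([100, 1000]).items():
    print(f"N={n}: EnKF {t_enkf:.4f} s, SIR {t_sir:.4f} s")
```

Varying `d` and the ensemble sizes in such a harness shows how each analysis step scales on a particular machine, which is the kind of information the text recommends weighing alongside ensemble size.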
5. Discussion and Conclusion
5.1. Effects of Model Structure on Data Assimilation Techniques
 The results of this study show that the structures of the HyMod and SAC-SMA models affect the assimilation techniques in significantly different ways. These differences are observed in both bias and uncertainty estimation. All of the HyMod results were biased, whereas in the SAC-SMA experiments each method displayed a low bias. In addition, the ability of each method to characterize the prediction uncertainty differs considerably between the HyMod and SAC-SMA models. With the increased complexity of the SAC-SMA model, uncertainty estimation through data assimilation appears to be more accurate than is possible with the HyMod model. This is somewhat intuitive, as the larger number of parameters provides more flexibility in model structure. This comparison highlights that the accuracy of data assimilation techniques is model dependent; model behavior must therefore be examined when determining the effectiveness of a given data assimilation technique.
5.2. Overconfidence and Divergence in the EnKF
 The results identified two problems with the EnKF: a general tendency toward overconfidence in streamflow prediction and a specific case of filter divergence. The causes of these errors can be inferred by comparison with the PF-SIR. While both the EnKF and PF-SIR are overconfident in the HyMod model, only the EnKF was overconfident in the SAC-SMA model. In addition, the EnKF was found to overfit the parameters during calibration in both models. This suggests a deficiency in the EnKF for estimating both parameter and predictive uncertainty. Because the EnKF relies on a Gaussian assumption, its poor estimation of the full posterior distribution suggests that the error structure is too skewed for that assumption to hold. When the error structure is sufficiently non-Gaussian, the tails of the posterior distribution will be poorly estimated. This problem is consistent across the EnKF experiments but is less severe in the SAC-SMA model, whose error structure is likely less skewed than that of the HyMod model. Although the higher accuracy of the SAC-SMA model improved the ability of the EnKF to estimate uncertainty, its greater complexity led to difficulties in the linear estimation of states and parameters. Filter divergence was caused by a nonlinear relationship between the prediction and the LZTWM parameter under certain flow conditions. Because this relationship is strongly nonlinear under these conditions, the Kalman update was severely overestimated and several ensemble members were shifted to the opposite end of the parameter range, leading to significant errors in streamflow estimation. Because this occurred during only one of the time-lagged replicates, the model does not appear to be nonlinear enough to degrade predictions under most flow conditions, which makes the phenomenon difficult to document. 
Though it is rare, the potential for filter divergence raises questions about the robustness of the EnKF technique in increasingly nonlinear models.
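The effect of a skewed error structure on a Gaussian description can be seen in a small numerical experiment. In the sketch below, a lognormal sample stands in for a right-skewed, strictly positive error or flow distribution (a hypothetical choice, not the error distribution estimated in this study). A Gaussian 95% interval fitted by the sample mean and standard deviation extends below zero, so it misses nothing in the lower tail, while too many values exceed its upper bound; for a symmetric Gaussian sample, each tail holds close to the nominal 2.5%.

```python
import numpy as np

def gaussian_interval_tail_rates(samples, z=1.96):
    """Fractions of samples below and above a moment-fitted Gaussian 95% interval."""
    mu, sd = samples.mean(), samples.std(ddof=1)
    lower, upper = mu - z * sd, mu + z * sd
    return np.mean(samples < lower), np.mean(samples > upper)

rng = np.random.default_rng(42)
skewed = rng.lognormal(mean=0.0, sigma=1.0, size=200_000)  # hypothetical skewed errors
symmetric = rng.normal(0.0, 1.0, size=200_000)

lo_skew, hi_skew = gaussian_interval_tail_rates(skewed)
lo_sym, hi_sym = gaussian_interval_tail_rates(symmetric)
print(f"skewed:    below {lo_skew:.4f}, above {hi_skew:.4f}")
print(f"symmetric: below {lo_sym:.4f}, above {hi_sym:.4f}")
```

For this lognormal the fitted lower bound is negative, so no sample falls below it, while about 3.8% of the samples exceed the upper bound instead of the nominal 2.5%: the Gaussian interval puts probability on impossible values and understates the upper tail.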
5.3. Expected Value and Uncertainty
 In this study, both techniques were verified through an analysis of the expected value and the predictive uncertainty to determine the benefits of each data assimilation method. Analyzing both was important to measure the ability of the model to track the observations as well as to represent the inherent uncertainty in the prediction. In section 4, the accuracy of the expected value and that of the uncertainty estimates gave contrasting results. In general, the EnKF and PF-SIR showed a similar ability to track the observations with the expected value, but they differed in uncertainty estimation. Although the EnKF can be quite effective in predicting streamflow values, its restrictive assumptions prevent it from estimating uncertainty as accurately as the PF-SIR. This result highlights the importance of defining the goals of a study when applying data assimilation to hydrologic models. If the goal is to track streamflow with an expected value, the EnKF may be able to do so even with a smaller ensemble, leading to higher computational efficiency, but the modeler must take precautions to ensure that filter divergence does not occur. This result is consistent with previous studies [Zhou et al., 2006; Nagarajan et al., 2010; Weerts and El Serafy, 2006]. If quantifying the uncertainty in the prediction is important, the PF-SIR is likely the better choice. In general, it is suggested here that the characterization of uncertainty is important in most applications in the hydrologic sciences and therefore needs to be discussed when using these techniques. Uncertainty quantification is valuable from both an operational and a research standpoint and should be examined closely for the application at hand.
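The distinction between tracking the observations with the expected value and representing the uncertainty can be made concrete with two synthetic ensembles that share exactly the same ensemble mean but differ in spread. In the sketch below (hypothetical data; the metrics are the RMSE of the ensemble mean and the fraction of observations inside the ensemble's 2.5 to 97.5 percentile range), both ensembles score identically on the expected value, while only the wider one covers close to 95% of the observations.

```python
import numpy as np

def mean_rmse(ensemble, obs):
    """RMSE of the ensemble mean (expected-value accuracy)."""
    return np.sqrt(np.mean((ensemble.mean(axis=1) - obs) ** 2))

def coverage_95(ensemble, obs):
    """Fraction of observations inside the 2.5-97.5 percentile ensemble range."""
    lo, hi = np.percentile(ensemble, [2.5, 97.5], axis=1)
    return np.mean((obs >= lo) & (obs <= hi))

rng = np.random.default_rng(7)
T, N = 2000, 100
center = 5.0 * rng.normal(size=T)        # hypothetical ensemble-mean predictions
obs = center + rng.normal(0.0, 1.0, T)   # observations with unit error around the mean
anomalies = rng.normal(size=(T, N))
# Remove each row's mean so both ensembles below share the same ensemble mean.
anomalies -= anomalies.mean(axis=1, keepdims=True)

wide = center[:, None] + 1.0 * anomalies    # spread matches the observation error
narrow = center[:, None] + 0.2 * anomalies  # same mean, far too little spread

for name, ens in (("wide", wide), ("narrow", narrow)):
    print(name, round(mean_rmse(ens, obs), 3), round(coverage_95(ens, obs), 3))
```

Judged on the expected value alone, the two ensembles are indistinguishable; the coverage exposes the narrow one as overconfident, which is the kind of difference reported above between the EnKF and PF-SIR.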
 Key Conclusions: The EnKF and PF show similar abilities to track the observations; the EnKF consistently produces overconfident results compared with the PF; and the PF is a more robust parameter estimation technique than the EnKF.
 Partial financial support for this study was provided by NOAA-CPPA grant NA070AR4310203 and NOAA-MAPP grant NA110AR4310140. We would like to thank the three anonymous reviewers for their constructive comments that improved the clarity of this manuscript.