Development of a formal likelihood function for improved Bayesian inference of ephemeral catchments



[1] The application of formal Bayesian inferential approaches in hydrologic modeling is often criticized for requiring explicit assumptions to be made about the distribution of the errors via the likelihood function. These assumptions can be adequate in some situations, but often little attention is paid to the selection of an appropriate likelihood function. This paper investigates the application of Bayesian methods in modeling ephemeral catchments. We consider two modeling case studies: a synthetically generated data set and a real data set from an arid Australian catchment. The case studies examine some typical forms of the likelihood function that have been widely applied by previous researchers and introduce new implementations aimed at better satisfying likelihood function assumptions in dry catchments. The results of the case studies indicate the importance of explicitly accounting for model residuals that are highly positively skewed because of the presence of many zeros (zero inflation) arising from the dry spells experienced by the catchment. Specifically, the form of the likelihood function was found to significantly affect the calibrated parameter posterior distributions, which in turn can greatly affect the uncertainty estimates. In each application, the likelihood function that explicitly accounted for the nonconstant variance of the errors and the zero inflation of the errors resulted in (1) equivalent or better fits to the observed discharge in both timing and volume, (2) superior estimation of the uncertainty as measured by the reliability and sharpness metrics, and (3) more linear quantile-quantile plots, indicating that the errors more closely matched the assumptions of this form of the likelihood function.

1. Introduction

[2] The use of conceptual hydrologic models necessitates some type of calibration procedure to determine optimum values for the model parameters such that the predicted outcome most closely matches the observed. Typically this is done either manually or automatically, with automatic methods having become the dominant approach [Madsen et al., 2002]. Automatic calibration can be performed by a variety of methods, such as global optimization algorithms [e.g., Duan et al., 1992], Monte Carlo methods [e.g., Beven and Binley, 1992], and Bayesian inferential approaches [e.g., Kuczera and Parent, 1998], and many techniques can incorporate multiple objective measures into their calibration logic [e.g., Vrugt et al., 2003].

[3] As interest in quantifying the uncertainty in model predictions has grown, the use of Monte Carlo approaches (such as the generalized likelihood uncertainty estimation (GLUE) methodology) and Bayesian inferential approaches has grown as well. Much debate has centered on the differences between these methods [see Beven et al., 2007; Mantovan and Todini, 2006; Mantovan et al., 2007]; in actuality, the two can be viewed as special cases of one another (i.e., GLUE is a statistically less strict form of the formal Bayesian approach).

[4] Bayesian inference is based on the application of Bayes' theorem, which states that

$$P(\theta \mid Q) = \frac{P(Q \mid \theta)\, P(\theta)}{P(Q)} \qquad (1)$$

where Q is the data, θ is the parameter set, P(θ∣Q) is the posterior distribution, P(θ) is the prior distribution, and P(Q∣θ) is the likelihood function summarizing the model (for the input data) given the parameters. Even a cursory examination of equation (1) reveals that the choice of the likelihood function will play an important role. Because of its importance to the resultant posterior distribution, the likelihood function must make explicit assumptions about the form of the model residuals [Stedinger et al., 2008].

[5] It is in the likelihood function that the GLUE method deviates most from the formal Bayesian approach. The GLUE methodology typically opts for informal likelihood (objective) functions rather than risk violating the strong assumptions that accompany formal likelihood functions. Critics of the Bayesian approach hold that the appropriateness of such assumptions is inherently difficult to satisfy in hydrologic problems and that the use of Bayesian methods is therefore problematic [Beven, 2006a].

[6] Despite this criticism, the use of Bayesian methods has become increasingly common [e.g., Kuczera and Parent, 1998; Marshall et al., 2004; Samanta et al., 2007; Smith and Marshall, 2008, 2010; Vrugt et al., 2009]. However, the predominant usage of Bayesian methods in hydrology has been under the assumption of uncorrelated, Gaussian errors, as evidenced by a plethora of studies [e.g., Ajami et al., 2007; Campbell et al., 1999; Duan et al., 2007; Hsu et al., 2009; Marshall et al., 2004; Samanta et al., 2008; Smith and Marshall, 2010; Vrugt et al., 2006]. Although there are undoubtedly situations (perhaps catchments with a very simple hydrograph) in which the errors are adequately represented by such a likelihood function, these scenarios are more likely to be the exception than the rule in hydrology.

[7] Because the posterior distribution is proportional to the prior distribution multiplied by the likelihood function (equation (1)), it is important to consider the form of the errors (assumed by the likelihood) when implementing the Bayesian method. Given the tendency for errors arising from hydrologic models to be (at least) heteroscedastic [e.g., Kuczera, 1983; Sorooshian and Dracup, 1980], the application of likelihood functions should not be chosen on the basis of computational simplicity. The ultimate effectiveness of Bayesian inference relies on properly characterizing the form of the errors via the formal likelihood function and holds untapped potential for improvement in uncertainty estimation.

[8] Xu [2001, p. 77] points out that “in the field of hydrological modeling, few writers examine and describe any properties of residuals given by their models when fitted to the data”. This is a potential constraint on the more appropriate and widespread use of Bayesian inference. The need to address the lack of normality in hydrologic modeling errors is not new, but there is little guidance on exactly how to select an appropriate likelihood for a particular data-model combination.

[9] Although the assumption of normality is common in hydrological modeling studies, several alternatives aimed at addressing typical characteristics of rainfall-runoff modeling errors have been proposed. For example, Bates and Campbell [2001] considered multiple likelihood functions that addressed the issues of nonconstant variance (via a Box-Cox transformation) and correlation (via an autoregressive model) in the modeled errors. Marshall et al. [2006] considered a Student's t distribution with four degrees of freedom because of its higher peak and heavier tails. More recently, Schaefli et al. [2007] addressed the nonconstant variance problem through the use of a mixture likelihood corresponding to high and low flow states. In arid regions, there is further difficulty in appropriately capturing the form of the errors because of extended periods of no-flow conditions. These can lead to errors that are severely zero inflated (i.e., a significant proportion of the residuals are zero) owing to the ability of the model to correctly predict a zero flow at a zero flow observation.

[10] In spite of the variations in the likelihood function outlined above, the issue of residuals originating from distinct flow states, as is the case in ephemeral catchments worldwide, has not been addressed at length in the literature. In this paper, we focus on the development and application of a likelihood function specifically designed to address the problems uniquely caused by zero-inflated errors under a Bayesian inferential approach. This paper is divided into the following sections: section 2 introduces a formal likelihood function that addresses the zero-inflation problem common in arid catchments, section 3 introduces two test applications in which the formal likelihood function is analyzed, and section 4 provides a discussion of the results and the important conclusions of the study.

2. Formal Likelihood Function Development

[11] Conceptual hydrologic models can be thought of, in a generic sense, as being of the form

$$Q_t = f(x_t; \theta) + \varepsilon_t \qquad (2)$$

where t indexes time, Q_t is the observed discharge, f(x_t; θ) is the model-predicted discharge, x_t is the model forcing data (typically, rainfall and evapotranspiration), θ is the set of unknown model parameters, and ε_t is the error.

[12] Proper understanding of the form of the errors (ɛt) is vital to the success of Bayesian inference and must be properly modeled via the formal likelihood function. Because arid catchments can have extended periods of no streamflow, the entire streamflow record can be thought of as being composed of a mixture of two dominant streamflow states (zero discharge and nonzero discharge). Conceptualization of the observed streamflow record into multiple states or regimes is typical of operational rainfall-runoff modeling where the design of an engineered feature depends on a corresponding flow regime [see Wagener and McIntyre, 2005]. We propose the use of a formal likelihood function that is formulated as a mixture model with components corresponding to the dominant discharge states. The standard form of the finite mixture distribution [e.g., Robert, 1996] can be defined as

$$g(\varepsilon_t) \approx \sum_{m=1}^{M} w_m\, f_m(\varepsilon_t \mid \phi_m) \qquad (3a)$$

where g(ε_t) is the true error distribution, approximated by a mixture of components with weights satisfying w_1 + … + w_M = 1; f_m(ε_t ∣ φ_m) are the components of the mixture (probability density functions of the errors ε_t with parameters φ_m); and M is the number of mixture components. For a mixture where the error-generating component is known, the likelihood function can be written as [Robert, 1996]

$$L(\theta, \phi \mid \varepsilon) = \prod_{t=1}^{n} f_m(\varepsilon_t \mid \phi_m) \qquad (3b)$$

where n is the number of observed values, ε_t are the errors generated by component m, and all other terms are as previously defined. Mixture models have not been used extensively in hydrology, although Schaefli et al. [2007] recently introduced a likelihood function based on a mixture of two normal distributions (corresponding to low and high discharge states) that attempts to address the nonconstant variance problem common to residuals in hydrological modeling.
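To make the finite-mixture idea of equation (3a) concrete, the sketch below evaluates the log-likelihood of a residual series under a Gaussian mixture. The function names and the choice of Gaussian components are illustrative assumptions, not part of the study.

```python
import math

def mixture_logpdf(eps, weights, means, sds):
    """Log-density of one residual under a finite Gaussian mixture.

    weights must sum to 1; component m is Normal(means[m], sds[m]**2).
    """
    dens = 0.0
    for w, mu, sd in zip(weights, means, sds):
        dens += w * math.exp(-0.5 * ((eps - mu) / sd) ** 2) / (sd * math.sqrt(2 * math.pi))
    return math.log(dens)

def mixture_loglik(residuals, weights, means, sds):
    # Sum of log mixture densities over the residual series (equation (3a) form)
    return sum(mixture_logpdf(e, weights, means, sds) for e in residuals)
```

With a single component of weight 1, the expression reduces to an ordinary Gaussian log-likelihood, which is the usual special case in hydrologic calibration.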

[13] The mixture likelihood proposed here for arid catchments is conditioned on the discharge state and is formed from two primary components: (1) a component corresponding to the zero discharge state and (2) a component corresponding to the nonzero discharge state. The component addressing the zero discharge state is further separated into a secondary mixture based on the zero/nonzero residual state, modeled as a mixture of a binomial distribution for the zero residual state and a Gaussian (or transformed Gaussian) distribution for the nonzero residual state. The component corresponding to the nonzero discharge state is modeled as a simple Gaussian (or transformed Gaussian) distribution. Figure 1 represents this mixture likelihood graphically. The use of a multilevel mixture model is necessary because only zero discharges have a realistic probability of being modeled exactly (ε_t = 0), given the double-precision accuracy of the machine. With this knowledge, the multilevel mixture likelihood is able to properly assign uncertainty to the different discharge states. The use of a zero/nonzero mixture model is not entirely new in environmental applications [see Tu, 2002], dating back to the 1950s and the delta model introduced by Aitchison [1955], and it has been used more recently in medical cost data applications [see Tu and Zhou, 1999; Zhou and Tu, 1999, 2000].

Figure 1.

Visual representation of the multilevel mixture logic at the foundation of the proposed likelihood architecture for use in ephemeral, zero-inflated catchments.

[14] Mathematically, the likelihood function can then be expressed as the product of three components such that equation (3b) becomes

$$L = \rho^{\,n_1}\,(1-\rho)^{\,n_2}\, L_0\, L_1 \qquad (4)$$

where ρ is the probability of a zero residual and is computed as n1/(n1 + n2), n1 is the number of zero discharge observations modeled with zero error, n2 is the number of zero discharge observations modeled with nonzero error, n3 is the number of nonzero discharge observations (modeled with nonzero error), the total number of observations is N = n1 + n2 + n3, L0 is the likelihood function for the zero discharge observations with nonzero errors (evaluated over n2), and L1 is the likelihood function for the nonzero discharge observations with nonzero errors (evaluated over n3).
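A minimal sketch of how equation (4) could be evaluated in practice is given below, assuming Gaussian error components with a common variance σ² (the untransformed, likelihood B form). The function name, the numerical tolerance used to classify zero states, and the bookkeeping are our own illustrative choices.

```python
import math

def zero_inflated_loglik(q_obs, q_sim, sigma2, tol=1e-12):
    """Log of equation (4) under Gaussian error components.

    Residual states are classified from the simulations themselves:
      n1 -- zero discharge observations matched exactly by the model,
      n2 -- zero discharge observations with nonzero error,
      n3 -- nonzero discharge observations (always treated as nonzero error).
    """
    n1 = n2 = 0
    nonzero_resid = []              # residuals entering L0 and L1
    for qo, qs in zip(q_obs, q_sim):
        eps = qo - qs
        if abs(qo) < tol:           # zero discharge state
            if abs(eps) < tol:
                n1 += 1
            else:
                n2 += 1
                nonzero_resid.append(eps)
        else:                       # nonzero discharge state
            nonzero_resid.append(eps)

    loglik = 0.0
    if n1 + n2 > 0:
        rho = n1 / (n1 + n2)        # probability of a zero residual
        if n1 > 0:
            loglik += n1 * math.log(rho)
        if n2 > 0:
            loglik += n2 * math.log(1.0 - rho)
    # Gaussian log-density over the nonzero residuals (L0 and L1 combined)
    n = len(nonzero_resid)
    loglik += -0.5 * n * math.log(2 * math.pi * sigma2)
    loglik += -sum(e * e for e in nonzero_resid) / (2 * sigma2)
    return loglik
```

Note that n1, n2, and ρ are recomputed for every candidate parameter set, since the classification of residual states depends on the simulated series.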

[15] In this study we compare the appropriateness of this mixture likelihood to several others. Table 1 introduces the four likelihood functions to be examined. For simplicity the likelihood functions have been labeled (A–D). Likelihood A represents the typical likelihood function used in hydrologic modeling studies; it assumes the errors are Gaussian and uncorrelated and does not account for zero inflation in the data. Likelihood B holds the same assumptions of normality that are made in likelihood A; however, the zero-inflation problem is considered explicitly under the form of equation (4). Likelihood C again ignores the zero inflation (as with likelihood A), but assumes the errors are heteroscedastic in normal space. A Box-Cox transformation [Box and Cox, 1964] was applied to the data such that

$$Q^{*} = \log(Q + \lambda) \qquad (5)$$

The value of the transformation parameter (λ) was selected following the recommendation of Bates and Campbell [2001] for medium-sized Australian catchments that are highly positively skewed. Likelihood C assumes that the errors are homoscedastic in transformed space. Finally, likelihood D makes use of the conditional mixed likelihood function logic presented in equation (4) and the Box-Cox transformation presented in equation (5).
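The log-offset transformation and the transformed residual of Table 1 can be sketched as follows; the helper names are hypothetical, and λ is fixed at 0.5 as in Table 1.

```python
import math

LAMBDA = 0.5  # fixed transformation parameter following Bates and Campbell [2001]

def boxcox_log(q, lam=LAMBDA):
    """Log-offset form of the Box-Cox transformation (equation (5))."""
    return math.log(q + lam)

def boxcox_log_inverse(z, lam=LAMBDA):
    """Back-transform from transformed space to discharge units."""
    return math.exp(z) - lam

def transformed_residual(q_obs, q_sim, lam=LAMBDA):
    # epsilon* = log(Q_obs + lambda) - log(Q_sim + lambda), as in Table 1
    return boxcox_log(q_obs, lam) - boxcox_log(q_sim, lam)
```

Because the transform compresses large values, a constant error variance in transformed space corresponds to errors that grow with discharge in untransformed space, which is the heteroscedastic behavior likelihoods C and D assume.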

Table 1. Mathematical Formulas and Parameters for Each of the Likelihood Functions Considered in This Study

  Likelihood | Assumption | Mathematical Formula^a | Calibrated Parameter
  A | Independent, homoscedastic errors | L_A = (2πσ²)^(−N/2) exp[−Σ ε_t²/(2σ²)] | σ²
  B | Zero-inflated, independent, homoscedastic errors | L_B = ρ^(n₁) (1−ρ)^(n₂) (2πσ²)^(−(n₂+n₃)/2) exp[−Σ ε_t²/(2σ²)] | σ²
  C | Independent, heteroscedastic errors transformed via Box-Cox transformation | L_C = (2πσ²)^(−N/2) exp[−Σ ε*_t²/(2σ²)] Π (Q_obs,t + λ)^(−1) | σ²
  D | Zero-inflated, independent, heteroscedastic errors transformed via Box-Cox transformation | L_D = ρ^(n₁) (1−ρ)^(n₂) (2πσ²)^(−(n₂+n₃)/2) exp[−Σ ε*_t²/(2σ²)] Π (Q_obs,t + λ)^(−1) | σ²

  ^a Here N is the number of observations in the data set, n₁ is the number of zero observations with zero error, n₂ is the number of zero observations with nonzero error, n₃ is the number of nonzero observations, ρ is the probability of a zero error given a zero observation, ε = Q_obs − Q_p, ε* = log(Q_obs + λ) − log(Q_p + λ), λ = 0.5, and σ² is the variance of the errors. For likelihoods B and D the sums and products run over the n₂ + n₃ nonzero residuals; the product term in C and D is the Jacobian of the transformation.

[16] Each of the likelihood functions has a single calibrated parameter (σ²) representing the variance of the errors. Likelihoods B and D could be constructed with two variance parameters corresponding to the zero discharge with nonzero error state and the nonzero discharge with nonzero error state; however, owing to the small number of zero discharge observations with nonzero error, this was avoided to reduce the computational complexity of the method. The values of n1 and n2 (and hence ρ) depend on the model simulations but are not calibrated directly; they are computed from the simulations generated by each calibrated parameter set and are therefore functions of the calibration without being calibrated themselves. The Box-Cox transformation parameter used in likelihoods C and D was set to a fixed value following standard convention, but could be calibrated as well. Under the multilevel mixture likelihood architecture, likelihood B reduces to likelihood A (and likelihood D to likelihood C) as the number of zero discharge observations approaches zero.

3. Test Applications

[17] In this section, we present two case studies to explore the differences between the likelihood functions (Table 1) described in the previous section in terms of their impact on model performance and predictive uncertainty assessment. First, a synthetic example is developed to provide a controlled situation in which the likelihood functions can be compared with the true solution known a priori. A second application is provided using real streamflow data from Wanalta Creek (an ephemeral catchment) in Victoria, Australia, to analyze the likelihood functions in a more realistic setting.

3.1. Implementation of Bayesian Inference

[18] The use of Bayesian inference in hydrology typically requires numerical techniques to estimate the posterior distributions because of analytical intractabilities that arise from nonlinearities common in even simple hydrologic models. Markov chain Monte Carlo (MCMC) sampling methods numerically approximate the posterior distributions and have seen increasing use in hydrologic applications [e.g., Kuczera and Parent, 1998; Marshall et al., 2004; Smith and Marshall, 2008; Vrugt et al., 2008]. For this study, the adaptive Metropolis (AM) algorithm [Haario et al., 2001] was selected as the MCMC method for implementing the Bayesian approach. This algorithm has been shown to perform well in hydrologic problems [Marshall et al., 2004] and is simple to implement. For further detail on the application of the AM algorithm to hydrologic modeling studies, refer to Marshall et al. [2004].
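The core logic of the AM algorithm can be sketched compactly: a Gaussian random-walk proposal whose covariance is periodically adapted to the empirical covariance of the chain history, scaled by 2.4²/d as in Haario et al. [2001]. The initialization choices below (burn-in before adaptation, initial covariance, regularization ε) are illustrative assumptions.

```python
import numpy as np

def adaptive_metropolis(log_post, theta0, n_iter=5000, t0=500, eps=1e-6, seed=0):
    """Minimal sketch of the adaptive Metropolis sampler of Haario et al. [2001].

    log_post: function returning the log posterior density of a parameter vector.
    After t0 iterations, the proposal covariance is set to the empirical
    covariance of the chain so far, scaled by s_d = 2.4**2 / d (plus eps*I).
    """
    rng = np.random.default_rng(seed)
    d = len(theta0)
    s_d = 2.4 ** 2 / d
    chain = np.empty((n_iter, d))
    theta, lp = np.asarray(theta0, float), log_post(theta0)
    cov = np.eye(d) * 0.1  # initial proposal covariance (assumed)
    for t in range(n_iter):
        if t > t0:
            cov = s_d * (np.cov(chain[:t].T).reshape(d, d) + eps * np.eye(d))
        prop = rng.multivariate_normal(theta, cov)
        lp_prop = log_post(prop)
        if np.log(rng.random()) < lp_prop - lp:  # Metropolis accept/reject
            theta, lp = prop, lp_prop
        chain[t] = theta
    return chain
```

In this study's setting, `log_post` would combine one of the likelihoods of Table 1 with the priors on the AWBM parameters.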

3.2. Hydrologic Model and Forcing Data

[19] The hydrologic model selected for analysis and synthetic data generation was the simplified version of the Australian water balance model (AWBM) developed by Boughton [2004]. The model was selected for its conceptual simplicity and common usage in Australia. The AWBM (see Figure 2) requires daily rainfall and potential evapotranspiration data to produce estimates of stream discharge. The simplified version has three parameters: surface storage capacity (S), recession constant (K), and base flow index (BFI).

Figure 2.

Simplified version of the Australian water balance model structure with parameters S, BFI, and K.
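For orientation, a plausible single-store reading of the simplified structure is sketched below. This is illustrative only and not a verified reproduction of Boughton's [2004] model: the exact routing of moisture through S, BFI, and K here is our assumption.

```python
def awbm_simplified(rain, pet, S, K, BFI, init_store=0.0):
    """Illustrative single-store water balance in the spirit of the simplified AWBM.

    A surface store of capacity S fills with rain and empties with PET;
    overflow is split by BFI between a baseflow store (recession constant K)
    and direct surface runoff. Assumed structure, for illustration only.
    """
    store, base = init_store, 0.0
    q = []
    for p, e in zip(rain, pet):
        store = max(store + p - e, 0.0)
        excess = max(store - S, 0.0)   # overflow when capacity is exceeded
        store -= excess
        base += BFI * excess           # recharge of the baseflow store
        baseflow = (1.0 - K) * base    # recession: release a fixed fraction
        base -= baseflow
        q.append((1.0 - BFI) * excess + baseflow)
    return q
```

The ephemeral behavior central to this study appears naturally: whenever evaporative demand exceeds rainfall for long enough, the store never overflows and the simulated discharge is exactly zero.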

[20] The model forcing data used here included catchment-average daily rainfall and mean monthly areal potential evapotranspiration. Stream discharge is used to calibrate the AWBM's parameters to optimize the selected likelihood function. The catchments and data selected come from a larger collection of 331 unimpaired catchments that were part of an Australian Land and Water Resources Audit project [Peel et al., 2000].

3.3. Synthetic Case Study

[21] In order to maintain an unambiguous understanding of cause and effect, a synthetic case study was devised. A time series of synthetic discharge was created using observed rainfall and evapotranspiration data from Warrambine Creek (Victoria, Australia) to force the AWBM with a set of fixed parameter values. This synthetic data series was then corrupted with noise consistent with the characteristics of the true data; the zeros in the discharge data were preserved to ensure the catchment remained ephemeral and the corrupting noise was formulated to exhibit heteroscedasticity (i.e., noise was added to Box-Cox transformed flows such that in untransformed space small discharges had less noise than large discharges). The design of the corrupting noise that was applied to the synthetic discharge followed the assumptions made in the mixture likelihood formulation presented in equation (4).
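The corruption step described above might be sketched as follows, with the noise standard deviation and seed as illustrative assumptions: zeros are passed through untouched so the synthetic catchment stays ephemeral, and Gaussian noise is added in log-offset (Box-Cox) space so that small flows receive less noise than large flows in untransformed space.

```python
import math, random

def corrupt_discharge(q_true, sigma, lam=0.5, seed=42):
    """Add heteroscedastic noise in Box-Cox (log-offset) space, preserving zeros.

    sigma is the noise standard deviation in transformed space (assumed value).
    """
    rng = random.Random(seed)
    out = []
    for q in q_true:
        if q == 0.0:
            out.append(0.0)                      # keep the dry spells dry
        else:
            z = math.log(q + lam) + rng.gauss(0.0, sigma)
            out.append(max(math.exp(z) - lam, 0.0))
    return out
```

This construction matches the assumptions of the mixture likelihood of equation (4): zero observations carry zero error, and nonzero observations carry errors that are Gaussian in transformed space.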

[22] In this application, for simplicity we fixed two of the three AWBM parameters (BFI and K) to their known, true values. The remaining calibration problem involved estimating the AWBM storage parameter (S) and the variance parameter (σ²) associated with each of the likelihood functions of interest (refer to Table 1) using the AM algorithm described in section 3.1. Diffuse priors were used for both S and σ² to reflect the lack of prior knowledge about the parameters. The AM algorithm was implemented for each of the likelihood functions separately and run until convergence of the posterior distributions was achieved. Convergence was determined both by visual assessment of the posterior chains and by the Gelman and Rubin [1992] R statistic, which assesses convergence by comparing the variance within a single chain with the variance between multiple parallel chains.
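The R statistic can be computed for a single parameter from m parallel chains as sketched below; this is a standard formulation of the diagnostic, written by us for illustration rather than reproduced from the study.

```python
import numpy as np

def gelman_rubin(chains):
    """Potential scale reduction factor for m parallel chains of length n.

    chains: array of shape (m, n) for one parameter. Values near 1 indicate
    that within-chain and between-chain variances agree (convergence).
    """
    chains = np.asarray(chains, float)
    m, n = chains.shape
    means = chains.mean(axis=1)
    W = chains.var(axis=1, ddof=1).mean()      # mean within-chain variance
    B = n * means.var(ddof=1)                  # between-chain variance
    var_hat = (n - 1) / n * W + B / n          # pooled variance estimate
    return np.sqrt(var_hat / W)
```

In practice the statistic is computed for each calibrated parameter separately, and sampling continues until all values are close to 1.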

[23] Figure 3 shows box plots derived from the posterior distributions of the AWBM storage parameter as a result of the model calibration corresponding to each of the four likelihood functions. The box indicates the interquartile range (25th–75th percentile), the central red line represents the median value, and the whiskers show the extremes of the posterior distribution. It is clear that likelihoods A and B (which assume the errors have constant variance) fail to identify the true value of the storage parameter (indicated by the green line), while likelihoods C and D (which assume the errors are heteroscedastic) are able to obtain optimal values very similar to the true value (optimal value deviates 0.48% from true value).

Figure 3.

Box plots of storage parameter derived from calibrated posterior distributions for each likelihood function of the synthetic case study with true value shown.

[24] From this case study, it is clear that likelihoods C and D provide better optimal values for the calibrated parameter and therefore better performance than was achieved with likelihoods A and B. This result is indicative of the importance of properly characterizing the error distribution in Bayesian approaches. However, while model performance is the most commonly cited factor in determining the fitness of one model versus another, predictive uncertainty is another aspect that should be addressed.

[25] A useful way of quantifying the uncertainty estimates obtained from the posterior distributions is through the reliability and sharpness summary measures [Smith and Marshall, 2010; Yadav et al., 2007]. The reliability measures the percentage of the discharge observations that are captured by the prediction interval, and the sharpness measures the mean width of the prediction interval. Ideally, the reliability should be equal to the desired interval percentage (i.e., 90% of observations should be captured by a 90% interval) with the smallest possible value of the sharpness.
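These two summary measures reduce to a few lines of code; the function below is a sketch under exactly the definitions just given, with the function name as our own choice.

```python
def reliability_sharpness(obs, lower, upper):
    """Reliability (% of observations inside the interval) and sharpness
    (mean interval width) for a set of prediction interval bounds."""
    n = len(obs)
    inside = sum(1 for o, lo, hi in zip(obs, lower, upper) if lo <= o <= hi)
    reliability = 100.0 * inside / n
    sharpness = sum(hi - lo for lo, hi in zip(lower, upper)) / n
    return reliability, sharpness
```

For a 90% interval, a reliability near 90 combined with the smallest possible sharpness indicates well-calibrated, informative uncertainty bounds.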

[26] Table 2 contains the reliability and sharpness values corresponding to each of the likelihood functions, based on a 90% prediction interval. Note that values are presented for two cases: (1) for residuals on the entire simulation (i.e., zero and nonzero states) and (2) for residuals with nonzero values only. Because likelihoods B and D are conditioned on the zero/nonzero state of the discharge, the intervals over the zero observations that were modeled with zero error are not directly interpretable (interval width equal to zero). In the cases of both likelihoods B and D, all the zero discharge observations were modeled with zero error.

Table 2. Summary of the Reliability and Sharpness Results for the Synthetic Case Study for Each of the Likelihood Functions

                                 A       B       C       D
  Reliability^a (%)     All      96.3    98.2    89.5    96.6
  Sharpness (mm/day)    All      4.01    2.21    0.99    1.17

  ^a The optimal value of reliability is equal to the value of the desired interval, 90% in this case.
  ^b The nonzero category values represent the collection of data observations which are modeled with nonzero error by likelihood functions B and D (all nonzero observations).
  ^c The over/under values indicate whether the observations that were not captured by the interval lie above or below the interval bounds.
[27] From Table 2, it can be seen that likelihood C has a reliability value for the entire data set of 89.52%; however, much of that is accounted for by capturing the errors on the zero streamflows and missing the peak values. Only 65.3% of the nonzero observations are captured by the interval. On the other hand, likelihood D captures 88.9% of the nonzero observations and models all zero observations with zero error. Similar results are seen in comparing likelihoods A and B, and interval accuracy tends to be worse (reliability deviates further from the target) due to the false assumption of homoscedasticity. It should be noted that an exact comparison between the reliability and sharpness between likelihoods C and D (or likelihoods A and B) is not directly applicable because they are not conditioned in the same manner (i.e., the likelihood functions use “different” data). Note that the model simulations are comparable for the different assumptions of the likelihood functions, just not the likelihood function values themselves.

[28] Figure 4 introduces quantile-quantile (QQ) plots of the residuals to provide a visual estimate of the accuracy of the distributional assumptions that underpin each of the likelihood functions tested. The use of QQ plots as an assessment tool for residual assumptions has been utilized in recent studies including those of Thyer et al. [2009] and Schoups and Vrugt [2010]. If the residuals belong to the assumed distribution, the QQ plot should be close to linear. A quantitative measure of the linearity of the QQ plot is given by the Filliben r statistic [Filliben, 1975] and is provided for each plot. The Filliben r statistic takes on values near unity as the plot becomes increasingly linear. In Figure 4, the residuals corresponding to likelihoods B and D are only those that are nonzero (based on the assumption of the mixture likelihood) and the residuals corresponding to likelihoods C and D have been Box-Cox transformed (based on the assumption of normality only in transformed space).

Figure 4.

Quantile-quantile plots of the errors for each likelihood function of the synthetic case study. The Filliben r statistic is given on each subplot, representing the degree of similarity (coefficient of correlation) between the two quantiles. Likelihoods B and D include only the nonzero errors, as these likelihoods assume the zero errors do not belong to the same distribution as the nonzero errors.
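The linearity measure reported in Figure 4 can be sketched as follows, using Filliben's [1975] approximation to the uniform order-statistic medians; the implementation details are ours.

```python
from math import sqrt
from statistics import NormalDist

def filliben_r(residuals):
    """Filliben probability-plot correlation coefficient.

    Correlation between the sorted residuals and the normal order-statistic
    medians; values near 1 indicate a nearly linear QQ plot.
    """
    x = sorted(residuals)
    n = len(x)
    # Filliben's approximation to the uniform order-statistic medians
    u = [(i + 1 - 0.3175) / (n + 0.365) for i in range(n)]
    u[-1] = 0.5 ** (1.0 / n)     # endpoint adjustments
    u[0] = 1.0 - u[-1]
    m = [NormalDist().inv_cdf(p) for p in u]
    # Pearson correlation between sorted residuals and normal medians
    mx, mm = sum(x) / n, sum(m) / n
    cov = sum((a - mx) * (b - mm) for a, b in zip(x, m))
    sx = sqrt(sum((a - mx) ** 2 for a in x))
    sm = sqrt(sum((b - mm) ** 2 for b in m))
    return cov / (sx * sm)
```

For likelihoods C and D the residuals would be Box-Cox transformed before this statistic is computed, and for B and D only the nonzero residuals would be passed in, mirroring the assumptions of each likelihood.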

[29] Comparing the QQ plots for likelihoods A and B, it is clear that there is a tangible benefit in accounting for zero inflation as measured by the increase in the Filliben statistic. Likewise, comparing the QQ plots for likelihoods A and C shows the benefit of accounting for the heteroscedastic nature of the residuals. By accounting for both zero inflation and heteroscedasticity (likelihood D), the QQ plot becomes very linear (Filliben statistic near 1) and indicates the errors come from the same distribution as assumed in the likelihood function.

[30] This synthetic test application provides a relatively objective setting in which to compare the four likelihood functions considered, because the true form of the errors and the values of the AWBM parameters are known in advance. The results illustrate the potential problems caused by selecting an incorrect likelihood function in a Bayesian setting. All points of comparison (calibration performance, uncertainty quantification, distributional correctness) identified the likelihood function whose assumptions coincided with the form of the noise used to corrupt the synthetic discharge time series (likelihood D) as the “best” likelihood, as expected. The application to real data that follows implements the same general analysis procedure but focuses only on likelihoods C and D, given the observed heteroscedasticity in the errors.

3.4. Real Case Study: Wanalta Creek

[31] The second application focuses on the assessment of likelihood functions C and D, which were compared with regard to model performance and uncertainty quantification. As in the synthetic case study, the simplified version of the AWBM was used to generate predictions of discharge; however, all three of the parameters (S, BFI, K) along with the variance parameter associated with the likelihood functions (σ2) were estimated with the AM algorithm. Again, diffuse priors were used on S and σ2, while the priors used for BFI and K were very similar to those suggested by Bates and Campbell [2001] for dry Australian catchments:

$$\mathrm{BFI} \sim \mathrm{MB}(\tau, \alpha_1, \delta_1, \alpha_2, \delta_2)$$
$$K \sim \mathrm{MB}(\tau, \alpha_1, \delta_1, \alpha_2, \delta_2)$$

where MB(τ, α1, δ1, α2, δ2) denotes a mixed beta distribution with mixing parameter τ and shape parameters (α1, δ1) and (α2, δ2) corresponding to the first and second components of the mixture.

[32] Thirty-six years of rainfall, evapotranspiration, and discharge data were available for Wanalta Creek. The mean runoff ratio over the entire data set was 0.074, with 69.5% of the observed discharges recorded as zeros; however, regular discharge events occurred throughout the record (see Figure 5). To initialize the conceptual storage of the AWBM structure, the first year of the record was used as a warm-up period prior to calibration. Of the remaining 35 years, 25 were used for calibration of the model parameters and the final 10 years were reserved for model validation testing.

Figure 5.

Observed discharge at Wanalta Creek from 1961 through 1996.

[33] Figure 6 presents the calibrated posterior distributions graphically as box plots. For the storage parameter (S), similar optimal values are obtained under both likelihood functions, but the posterior distribution found with likelihood D is less peaked, as seen in the wider interquartile range on the box plot. Similar posterior width characteristics were found for the BFI parameter, where the posterior from likelihood D is less peaked than that from likelihood C. For the recession constant parameter (K), it is clear that the choice of likelihood function affects the optimal value: the posteriors do not intersect at all. The variance parameter (VARP) shows a much smaller optimal value for likelihood C than for likelihood D because of differences in how the data are conditioned; because likelihood C assumes that the errors corresponding to all observations arise from a single distribution, the estimated variance is influenced by the residuals at the zero discharge observations, which tend to be small.

Figure 6.

Box plots of each calibrated parameter derived from the posterior distributions for each likelihood function of the Wanalta Creek case study.

[34] However, despite the differences in the calibrated optimal parameter sets (S, BFI, K), the overall performances associated with each of the likelihood functions remain very similar (during both calibration and validation phases) for measures emphasizing errors in timing and volume. To assess the fit of the parameter sets, the root-mean-square error (RMSE; applied to the Box-Cox transformed discharges) was selected to indicate errors in timing and the deviation of runoff volume (DRV; as the ratio of the sums of predicted and observed discharges) was selected to indicate errors in total volume. Table 3 provides a summary of these results for likelihood functions C and D over the calibration and validation periods.

Table 3. Summary of the Reliability and Sharpness Results for the Wanalta Creek Case Study During Calibration and Validation Periods for Each of the Likelihood Functions C and D

                                 Calibration          Validation
                                 C         D          C         D
  RMSE                           0.2501    0.2512     0.2484    0.2481
  DRV^a                          1.0332    1.0399     1.0312    1.0381
  Reliability^b (%)     All      95.0      96.9       95.5      97.2
  Sharpness (mm/day)    All      0.54      0.48       0.54      0.40

  ^a DRV, deviation of runoff volume.
  ^b The optimal value of reliability is equal to the value of the desired interval, 90% in this case.
  ^c The nonzero category values represent the collection of data observations which are modeled with nonzero error by likelihood D.
  ^d The over/under values indicate whether the observations that were not captured by the interval lie above or below the interval bounds.

[35] As with the synthetic application, 90% uncertainty intervals were generated and the reliability and sharpness measures were computed. Table 3 shows these measures for the entire data set (under the heading “All”) and for the portion of the observations that were simulated with nonzero errors (as identified by likelihood D; under the heading “Nonzero”). Because likelihood D is conditioned on the data that produce nonzero errors, comparing reliability and sharpness criteria across the entire data set is problematic. To address this complication, the uncertainty-based criteria were also computed for likelihood C on the “nonzero” portion of the data alone. Focusing on the reliability of likelihood C, the 90% interval actually captures 95% of the entire set of observations, indicating that the interval is too wide. When considering only the observations that likelihood D models with nonzero error, the reliability drops to 85.5%, highlighting that the interval constructed from the posterior information of likelihood C is narrowed by the zero observations and does not capture the nonzero observations properly. Similar patterns exist for the validation period, but with even greater impacts. Likelihood D, on the other hand, shows better performance in the quantification of uncertainty, capturing 91% of the nonzero observations for the calibration period and 88% for the validation period. These results indicate likelihood D's superior ability to quantify the uncertainty associated with the nonzero discharge observations, while at the same time perfectly modeling 5992 of the 5993 (99.98%) zero discharge observations during the calibration phase and all 2802 zero discharge observations during the validation phase.
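
The two uncertainty metrics used above can be sketched directly: reliability is the fraction of observations falling inside the prediction interval (ideally equal to the nominal 90% coverage), and sharpness is the mean interval width. The data below are hypothetical.

```python
import numpy as np

def reliability(obs, lower, upper):
    # Fraction of observations captured by the interval; the ideal value
    # equals the nominal coverage (90% in this study)
    return float(np.mean((obs >= lower) & (obs <= upper)))

def sharpness(lower, upper):
    # Mean interval width (mm/day); narrower is better, given good reliability
    return float(np.mean(upper - lower))

obs   = np.array([0.5, 1.2, 0.0, 2.0])  # hypothetical observations (mm/day)
lower = np.array([0.3, 0.9, 0.0, 2.1])  # hypothetical 90% interval bounds
upper = np.array([0.8, 1.5, 0.1, 2.6])
print(reliability(obs, lower, upper))  # 0.75: the last observation lies below the interval
print(sharpness(lower, upper))         # 0.425
```

Reporting whether missed observations lie above or below the bounds (the over/under breakdown in Table 3) only requires comparing `obs` against `lower` and `upper` separately.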

[36] The appropriateness of the assumptions of the likelihood functions is examined visually with quantile-quantile plots in Figure 7. The difficulty of modeling severely zero-inflated and, more generally, low-yielding catchments is clearly seen in these plots. There is an obvious improvement in linearity (a higher Filliben statistic) when moving from likelihood C to likelihood D, a consequence of the way likelihood D treats zero streamflow observations. However, while likelihood D removes the zero inflation from the residuals, the results are still affected by a number of predominantly small model residuals associated with very low discharge observations. Despite this, likelihood D retains the advantage of properly characterizing the uncertainty of the predictions without any noticeable loss of predictive performance.

Figure 7.

Quantile-quantile plots of the errors for both likelihood functions of the Wanalta Creek case study. The left panels show the results for the 25 year calibration period, and the right panels show the results for the 10 year validation period. The Filliben r statistic is given on each subplot, representing the degree of similarity between the two samples. Likelihood D includes only the nonzero errors, as this likelihood assumes the zero errors do not belong to the same distribution as the nonzero errors.
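
The Filliben r statistic reported on each subplot is the probability-plot correlation coefficient: the correlation between the sorted residuals and the normal order-statistic medians. A minimal sketch, using Filliben's standard plotting positions:

```python
import numpy as np
from statistics import NormalDist

def filliben_r(residuals):
    # Correlation between sorted residuals and the normal order-statistic
    # medians (Filliben's probability-plot correlation coefficient);
    # values near 1 indicate approximately Gaussian residuals
    x = np.sort(np.asarray(residuals, dtype=float))
    n = x.size
    m = (np.arange(1, n + 1) - 0.3175) / (n + 0.365)  # Filliben plotting positions
    m[0] = 1.0 - 0.5 ** (1.0 / n)
    m[-1] = 0.5 ** (1.0 / n)
    q = np.array([NormalDist().inv_cdf(p) for p in m])  # theoretical quantiles
    return float(np.corrcoef(x, q)[0, 1])

rng = np.random.default_rng(0)
print(filliben_r(rng.normal(size=500)))       # near 1 for Gaussian residuals
print(filliben_r(rng.exponential(size=500)))  # lower for skewed residuals
```

Plotting the sorted residuals against `q` gives the QQ plot itself; r is simply the linearity of that scatter.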

[37] This test application to the discharge data recorded at Wanalta Creek in Victoria, Australia, has illustrated some of the difficulties that arise when modeling low-yield catchments. The posterior distributions estimated by the AM algorithm differed markedly between the two alternative likelihood functions (C and D). Although the posterior distributions were different (and, for the recession constant parameter, nonintersecting), similar performance was obtained in both timing (RMSE) and volume (DRV) measures, consistent with the equifinality principle. Nonetheless, the posterior distributions form the basis of uncertainty analysis procedures in Bayesian statistics, and the differences in these posteriors translate into the observed differences in uncertainty quantification (as measured by the reliability and sharpness metrics). Likelihood D also provided a better fit to the distributional assumptions of the likelihood function, but the QQ plot revealed that further refinement of the likelihood function's form should be considered.

4. Discussions and Conclusions

[38] The growing interest in uncertainty analysis in hydrologic modeling studies has led to widespread use of Bayesian methods. Such approaches, however, require the modeler to make formal assumptions about the form of the modeled errors via the likelihood function. Critics of the Bayesian approach point to these assumptions as the main weakness of such methods and advocate less statistically formal approaches to avoid spuriously precise quantification of the associated uncertainty.

[39] The importance of selecting an appropriate likelihood function was investigated through two applications: (1) a synthetically produced time series of streamflow that was corrupted with zero-inflated, heteroscedastic noise and (2) a real time series of streamflow from Wanalta Creek in Victoria, Australia. In both applications the simplified, three-parameter Australian water balance model was selected as the hydrologic model and parameter estimation and uncertainty analysis was carried out by the adaptive Metropolis algorithm under a Bayesian framework.
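
The adaptive Metropolis (AM) algorithm of Haario et al. adapts its Gaussian proposal covariance from the history of the chain. The sketch below is a minimal illustration of that idea; the target density, chain length, and adaptation settings are illustrative assumptions, not the configuration used in this study.

```python
import numpy as np

def adaptive_metropolis(log_post, x0, n_iter=4000, adapt_start=500, rng=None):
    # Minimal AM sketch: after adapt_start iterations the proposal covariance
    # becomes the empirical covariance of the chain so far, scaled by 2.4^2 / d
    rng = rng or np.random.default_rng(1)
    x = np.asarray(x0, dtype=float)
    d = x.size
    scale = 2.4 ** 2 / d
    eps = 1e-8 * np.eye(d)  # keeps the adapted covariance nonsingular
    cov = np.eye(d)         # fixed proposal during the initial period
    chain = np.empty((n_iter, d))
    lp = log_post(x)
    for i in range(n_iter):
        if i >= adapt_start:
            cov = scale * np.cov(chain[:i].T) + eps
        prop = rng.multivariate_normal(x, cov)
        lp_prop = log_post(prop)
        if np.log(rng.random()) < lp_prop - lp:  # Metropolis acceptance rule
            x, lp = prop, lp_prop
        chain[i] = x
    return chain

# Illustrative target: a standard bivariate Gaussian posterior
chain = adaptive_metropolis(lambda t: -0.5 * float(t @ t), np.zeros(2))
print(chain[1000:].mean(axis=0))  # should be near [0, 0]
```

In a hydrologic application, `log_post` would evaluate the chosen likelihood function (plus the log prior) at a candidate parameter set after running the rainfall-runoff model.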

[40] In the synthetic case study we examined four distinct likelihood functions, each making different assumptions about the form of the errors. Of the four, likelihood D was the true error model: the errors used to corrupt the synthetic discharge record were generated to match its assumed form. From our analyses it was apparent that the selection of a misinformed likelihood function (such as likelihood A) can impair the ability of the AM algorithm to identify the true parameter values (Figure 3) and to properly quantify the associated uncertainty (Table 2 and Figure 4). Likelihood D, which assumed the errors to be heteroscedastic and zero inflated, was best able to achieve uncertainty intervals consistent with expectations (i.e., reliability close to the interval value of 90% and the smallest possible value of sharpness).
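
The kind of corruption used in the synthetic case can be illustrated with a hypothetical noise model: nonzero flows receive multiplicative Gaussian noise (heteroscedastic, since the error standard deviation grows with flow) while zero flows remain exactly zero (zero inflation). The coefficient of variation here is an assumed value, not the one used to generate the study's synthetic record.

```python
import numpy as np

def corrupt_discharge(q_true, cv=0.2, rng=None):
    # Hypothetical zero-inflated, heteroscedastic noise: multiplicative
    # Gaussian noise on nonzero flows (standard deviation proportional to
    # the flow), while dry-spell zeros stay exactly zero
    rng = rng or np.random.default_rng(7)
    q_true = np.asarray(q_true, dtype=float)
    noise = rng.normal(1.0, cv, size=q_true.shape)
    return np.where(q_true > 0.0, np.maximum(q_true * noise, 0.0), 0.0)

q_true = np.array([0.0, 0.0, 1.2, 0.4, 0.0, 2.5])  # hypothetical record (mm/day)
q_obs = corrupt_discharge(q_true)
print(q_obs)  # zeros preserved; nonzero flows perturbed
```

A record corrupted this way has the two properties the true error model (likelihood D) must capture: many exactly zero residuals and residual variance that scales with flow.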

[41] In the Wanalta Creek case study we considered only two (C and D) of the four likelihood functions tested in the synthetic application based on the heteroscedastic nature of the residuals. Again, we considered both of the likelihood functions in terms of their ability to fit the observed data, properly quantify the uncertainty, and correctly match the assumed form of the errors. On the basis of the composite results of these criteria, likelihood D was again shown to be preferred due to its ability to account for the zero inflation of the residuals. Although each likelihood function's optimal parameter set resulted in similar fits to the observed streamflow during both calibration and validation phases (Table 3), the calibrated posterior distributions deviated significantly from one another (Figure 6). The differences in posterior distributions resulted in distinct characterization of the uncertainty intervals for each of the likelihood functions, as quantified by the reliability and sharpness metrics, which indicated more appropriate assignment of uncertainty by likelihood D (Table 3). The potential hazards of using an inappropriate likelihood function were also illustrated by the QQ plots (Figure 7), which showed a more linear relationship for likelihood D than for likelihood C but also indicated that likelihood D still has much room for potential improvement.

[42] Although the zero-inflation adaptation was shown to effectively improve the simulations for the synthetic case study (as quantified by the QQ plots, the uncertainty metrics, and the identification of the true parameter values), the S-shaped nature of the QQ plots (Figure 7) for the real case study indicates that other factors are complicating the error structure (even though the uncertainty metrics indicate decent performance). These factors include autocorrelated residuals, remaining heteroscedasticity that may require better transformations of the data (potentially by further separating the nonzero discharge component of the likelihood function into low flow and high flow states), input data uncertainty, and model structural uncertainty.

[43] While an investigation into each of these potential complicating factors is well outside the scope of this study's objective, many of these areas are topics of ongoing research into a generic likelihood selection framework. In the specific areas of input data uncertainty and model structural uncertainty, a wealth of recent research from the University of Newcastle research group [e.g., Kavetski et al., 2006; Renard et al., 2010; Thyer et al., 2009] has indicated the importance of addressing these components for improved quantification of total uncertainty. Further, Schoups and Vrugt [2010] recently introduced a flexible formal likelihood function that aims to address many of the issues caused by heteroscedasticity and autocorrelation. Although their study offers an intriguing and valuable tool for future work, their flexible likelihood function still suffered from many of the issues outlined here. In particular, their method was found to work well for a wet catchment but broke down for a semiarid basin (nonephemeral, continuous streamflow).

[44] The test applications detailed in this study highlight the importance of checking the assumptions made in the likelihood function, while also demonstrating the potential consequences of failing to do so. Modeling studies in hydrology increasingly incorporate uncertainty assessment into their results, yet too few studies perform simple checks on the uncertainty estimates themselves. In this paper, we have shown that false assumptions in the likelihood function might not result in noticeable differences in the ability of the optimum parameter set to fit the observed data. This is not overly surprising given the attention and support that the equifinality thesis has received [Beven, 2006b]. However, the unique benefit of the Bayesian approach is that it yields a proper probability distribution for each of the calibrated parameters (the posterior distribution). It is in the posterior distributions that false assumptions of the likelihood function persist, and from there they can propagate into false estimates of the associated uncertainty. In both the synthetic and real data case studies, likelihood D was found to perform well in terms of the uncertainty metrics (reliability and sharpness). This result underscores the importance of a multifaceted assessment strategy that considers multiple aspects of “performance”. Despite this, we acknowledge that further refinement is necessary to improve the linearity of the QQ plots.

[45] The formal likelihood function introduced here sought to address the problem of severely zero-inflated residuals, a problem that has not been addressed in previous hydrologic modeling studies performed under a Bayesian inferential approach. The results indicate improved composite performance (fit, uncertainty, distributional correctness) in both synthetic and real data settings when this zero inflation is explicitly accounted for through the multilevel mixture logic summarized in Figure 1 and equation (4), compared to the more traditionally applied likelihood functions. This type of conditional mixing logic is advantageous because of its computational simplicity (the same number of parameters as the simpler approaches) and its extendibility to other settings. In fact, as a catchment moves away from zero inflation, likelihoods C and D become more and more similar, and when there are no zero discharge observations they become identical. Future work is under way to combine the likelihood function adaptation implemented here (designed to accommodate zero-inflated errors) with other previously used adaptations (autocorrelation, heteroscedasticity, distinct discharge state mixtures, etc.) into a generic framework that assists modelers in selecting an appropriate likelihood function for a given data-model coupling in Bayesian studies; this work aims to fill a void in the advancement of such methods in hydrology.
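
The conditional mixture logic can be sketched as follows. This is a hypothetical simplification, not the paper's equation (4): time steps where both the observed and simulated discharge are zero contribute a point mass (log-contribution 0), and the likelihood is otherwise evaluated on the remaining observations, whose Box-Cox transformed residuals are treated as independent Gaussians. The transform exponent and offset are assumed values.

```python
import numpy as np

def zero_inflated_loglik(obs, sim, sigma2, lam=0.5, offset=1e-6):
    # Hypothetical sketch of conditional mixture logic (not the paper's
    # eq. (4)): perfectly reproduced dry spells contribute a point mass
    # with log-contribution 0; all other time steps contribute Gaussian
    # densities on Box-Cox transformed residuals with variance sigma2
    obs, sim = np.asarray(obs, dtype=float), np.asarray(sim, dtype=float)
    exact = (obs == 0.0) & (sim == 0.0)       # perfectly reproduced dry spells
    bc = lambda q: ((q + offset) ** lam - 1.0) / lam
    e = bc(obs[~exact]) - bc(sim[~exact])     # nonzero-error residuals
    n = e.size
    return float(-0.5 * n * np.log(2.0 * np.pi * sigma2)
                 - (e @ e) / (2.0 * sigma2))

obs = np.array([0.0, 0.0, 1.2, 0.4])
print(zero_inflated_loglik(obs, np.array([0.0, 0.0, 1.2, 0.4]), 0.1))  # perfect fit
print(zero_inflated_loglik(obs, np.array([0.0, 0.1, 1.0, 0.6]), 0.1))  # lower
```

Note how the mixture collapses as zero inflation disappears: with no zero discharge observations, `exact` is everywhere false and the expression reduces to an ordinary Gaussian likelihood on transformed residuals, mirroring the convergence of likelihoods C and D described above.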


Acknowledgments

[46] This work was made possible by funding from the Australian Research Council. We thank Francis Chiew for providing the data used in this study.