Specifying a hierarchical mixture of experts for hydrologic modeling: Gating function variable selection

Authors

  • Erwin Jeremiah,

    1. School of Civil and Environmental Engineering, University of New South Wales, Sydney, New South Wales, Australia
    2. Now at the Strategic Asset Unit of State Water Corporation, New South Wales, Australia
  • Lucy Marshall,

    1. Land Resources and Environmental Science, Montana State University, Bozeman, Montana, USA
  • Scott A Sisson,

    1. School of Mathematics and Statistics, University of New South Wales, Sydney, New South Wales, Australia
  • Ashish Sharma

    Corresponding author
    1. School of Civil and Environmental Engineering, University of New South Wales, Sydney, New South Wales, Australia
    • Corresponding author: A. Sharma, School of Civil and Environmental Engineering, University of New South Wales, Sydney, NSW 2052, Australia. (a.sharma@unsw.edu.au)


Abstract

[1] The complexity of predicting surface runoff from hydrological models is compounded by uncertainties associated with the model structure, parameters and inputs. A hierarchical mixture of experts (HME) is recognized as one of the ways of incorporating model structural uncertainty into hydrological simulations. In this article, a framework capable of incorporating parameter and structural uncertainties is implemented via the use of a hierarchical mixture of experts together with sequential Monte Carlo parameter sampling. The use of a HME enables aggregation of multiple constituent models at the same instance, mixed to different extents in a dynamic manner as specified by a gating function, allowing the modeler to better characterize the uncertainty associated with the obtained predictions. This article presents a mechanism for better specifying the structure of the gating function used for combining models in a HME approach, by investigating the combination of predictor variables that allows the best model mixing. These predictors exist in various forms, each of which represents information on the catchment. We apply three different types of predictors to a case study, the Never Never River catchment in Australia. The outcomes from this case study consistently demonstrate improved Bayesian information criterion (BIC) readings for the HME especially when used with a combination of predictors. The predictor coefficients are further used for regionalization with the Manning River catchment, having similar characteristics to the Never Never River catchment, and also demonstrate satisfactory improvement in BIC when compared with a single structure model.

1. Introduction

[2] Hydrological models such as conceptual models are designed to predict surface runoff with a given set of model parameters. However, if these models are applied deterministically, they are often unable to generate reliable predictions of surface runoff [Gupta et al., 1998; Vrugt et al., 2003] due to the presence of a range of uncertainties, including parameter, input data, and structural uncertainties [Clark et al., 2008; Gupta et al., 2006; Kuczera and Parent, 1998; Liu and Gupta, 2007; Thyer et al., 2009]. These uncertainties exist as a natural consequence of the design of the model, which comprises complex mathematical equations that attempt to represent the dynamics of an actual catchment [Wagener and Gupta, 2005].

[3] A number of approaches have been developed to address this issue, either by incorporating parameter uncertainties [Beven and Binley, 1992; Kuczera and Parent, 1998; Micevski and Kuczera, 2009; Sisson et al., 2006; Smith and Marshall, 2008; Wagener and Montanari, 2011], addressing structural uncertainties [Clark et al., 2008; Freer et al., 2004; Gupta et al., 1998; Kuczera et al., 2006; Posada and Buckley, 2004; Refsgaard et al., 2006; Reichert and Mieleitner, 2009], or both [Jackson et al., 2010; Liu and Gupta, 2007; Wagener and Gupta, 2005]. The idea of incorporating both uncertainties within a single framework is appealing since these uncertainties are interdependent and important in improving surface runoff prediction.

[4] The concept of combining information from multiple models to address both uncertainties has been practiced and has gained much popularity in other fields, as well as hydrology [Chowdhury and Sharma, 2006, 2009; Draper, 1995; Fuentes and Raftery, 2005; Hoeting et al., 1999; Raftery et al., 2005]. Similarly, Gupta et al. [2006] and Shamseldin et al. [1997] explored the possibilities of combining the results from multiple hydrologic models, which leads to the principle that each of the models can process information from a different region of the hydrograph (such as high or low flows), and when combined, a better representation of the hydrograph is obtained. Other approaches with similar concepts have been proposed, but one in particular, the hierarchical mixture of experts (HME) [Marshall et al., 2006, 2007], has displayed advantages when applied to hydrological data: it is capable of accommodating both parameter and structural uncertainties, and it provides a platform for multiple models to dynamically aggregate information at the time step scale.

[5] The HME [Hastie et al., 2001; Young and Hunter, 2010] has successfully been implemented in the fields of speech recognition [Jacobs et al., 1997; Peng et al., 1996] and robotics [Jordan and Jacobs, 1994]. In hydrology, the HME approach (Figure 1) probabilistically combines the contributions from multiple hydrological models, with the flexibility for the models to contribute similar or different types of likelihoods. Usefully, the model that best describes each data point is assessed at each time step, rather than the models being averaged at the end of each simulation. This temporally dynamic averaging approach is important because hydrological data contain numerous active regions (such as the rising and falling limbs, receding base flows, lag time and peak discharge of the hydrograph) and these regions should be addressed individually.

Figure 1.

Schematic of a single tier HME with two hydrological models. The models receive similar inputs (e.g., precipitation and evapotranspiration), and the outputs from the models, Qt,1 and Qt,2 respectively, are weighted probabilistically by the gating function. The final output from the HME consists of the combined runoff from the models.

[6] Most hydrological streamflow predictions are based on the output from a single hydrological model structure, and this single structure represents the range of potential hydrologic responses to the input climatic drivers. However, a single hydrological model is often unable to map the potential range of events effectively, especially when a single objective function is used for model calibration. The HME provides a solution by probabilistically combining multiple donor models within a Bayesian inferential procedure. One practical mechanism for implementing the HME is the sequential Monte Carlo (SMC) sampler [Arulampalam et al., 2002; Del Moral et al., 2006, 2007; Doucet et al., 2001; Fan et al., 2008; Moradkhani et al., 2005; Sisson et al., 2009; Jeremiah et al., 2011, 2012].

[7] Since hydrological model structures are considered rigid when compared with the dynamics of an actual catchment [Fenicia et al., 2011; Kavetski and Fenicia, 2011], additional data related to the catchment can give added information to improve streamflow prediction. This information can exist in various forms such as landscape characteristics, climate derivatives, and model parameters, each of which may be considered a catchment predictor. Previous work by Marshall et al. [2007] demonstrated that, for example, using a single predictor, the preceding 7 days' cumulative rainfall, within the gating (i.e., model-averaging) function of a HME clearly improved the performance of the fitted models.

[8] In this article, we extend the initial work by Marshall et al. [2007] and investigate the utility of multiple catchment predictors to improve the performance of a HME in modeling catchment data. We consider predictors from both climatic derivatives (i.e., precipitation) and generated streamflows (i.e., excess surface runoffs and base flows), and select the most influential predictors using the Bayesian information criterion (BIC). Our case study for this method is the Never Never River catchment in Australia, where we evaluated the performance of a predictor-dependent HME when using multiple predictors, compared to a predictor-free model.

[9] Such predictors hold essential information about a catchment. Quite usefully, each predictor has the potential to be transposed to an ungauged catchment with similar catchment characteristics [Blöschl and Sivapalan, 1995; Singh et al., 2012; Yadav et al., 2007]. One of the main difficulties in calibrating an ungauged catchment is a lack of data [Sivapalan et al., 2003; Wagener et al., 2004]. Since these predictors' coefficients contain the records of the active regions of the calibrated catchment with sufficient data, there is a good possibility that they can provide a simple way to regionalize information to a relevant ungauged catchment. Nonmodel-parameter derivatives such as streamflow dynamics and precipitation records are thought to be better catchment predictors than transposed model parameters [Wagener and Wheater, 2006; Young and Hunter, 2010; Zhang et al., 2008].

[10] In this article, we extend the concept of regionalization to the gating function of the HME, thereby allowing the predictor dependent averaging of component models also to be transposed to relevant ungauged catchments. We evaluated this approach to regionalization with the Manning River catchment in Australia, which has similar characteristics to the Never Never River catchment.

[11] The structure of this article is as follows: Section 2 discusses the setup of the HME model framework, and its implementation using a SMC sampler. Section 3 provides the implementation details of the case study and experimental setup, and the results are discussed in section 4. The final section contains the conclusion.

2. The Hierarchical Mixture of Experts Framework

[12] The HME framework has the potential to account for model structural uncertainty and to dynamically address parameter uncertainty through the aggregation and mixing of information from different hydrological models. However, in order to account for the parameter uncertainty in a HME in a Bayesian framework, a suitable Monte Carlo sampler is required. In this article, we utilize the SMC sampler due to its robustness and efficiency in sampling from the posterior distributions of hydrological models [Jeremiah et al., 2011, 2012].

2.1. Hierarchical Mixture of Experts

[13] Often, many uncertainties generated from surface runoff predictions are assumed to be a product of poorly specified model parameters, but these errors are also partly a function of misspecification of the model structure [Marshall et al., 2007] and input uncertainty [Thyer et al., 2009]. Hydrological models are designed to represent the dynamic interaction of catchments through complex mathematical equations [Wagener and Gupta, 2005]. These models are only able to provide an approximation of the overall events and inevitably fail to provide a perfect simulation [Liu and Gupta, 2007]. Model uncertainty [Gupta et al., 2006] contributes significantly to predictive uncertainty, though it is not easy to quantify. Similarly, input uncertainty may have a detrimental effect on parameter estimation and model predictions, unless it is explicitly incorporated into the model [Kavetski et al., 2006; Thyer et al., 2009].

[14] Using a single set of model parameters obtained through one type of objective function [Gupta et al., 2006] realistically only represents one potential outcome for any modeling exercise. By using a multimodel framework, the modeler has the flexibility of using multiple models and their interaction [Butts et al., 2004] to better represent different aspects of the hydrologic response, such as receding limbs, low- and high-flow regimes. The advantage in aggregating information from a mixture of models is to obtain a pool of information that describes various aspects and characteristics of the hydrograph [Freer et al., 2004], hence providing a better representation of the runoff prediction.

[15] In the HME approach to model aggregation, each of the models provides insight into the hydrologic data it models, and the final predicted runoff is a weighted aggregation of the individual model predictions. The models are weighted probabilistically in time based on information from the catchment characteristics or dynamics. The benefit of assessing the models at each time interval of the data is that the approach better captures the dynamic processes that occur in different regions of the data, depending on the state of the catchment or relevant dynamic factors [Wagener and Gupta, 2005].

[16] An example single tier HME with two hydrological model components is shown in Figure 1. These models receive similar inputs (e.g., precipitation and evapotranspiration), which are represented by xt in equation (4). The individual model outputs are denoted Qt,1 and Qt,2, respectively, and are weighted by the gating function in accordance with information from the catchment predictors. The HME output, Qt, is therefore the probabilistic aggregation of the component outputs.

[17] For the model outputs to be considered probabilistically, the outputs are weighted via a mathematical function known as the gating function, g{t,h}. The gating function essentially acts as a switch selecting probabilistically each of the models used in the HME framework. The gating function can be expressed as

g_{t,h} = f_h(X_t), \quad h = 1, \ldots, H \qquad (1)

where gt,h is the weight of model h at time step t, fh is a user-specified gating function, H is the number of component models, and Xt is a matrix of predictors describing the catchment states. These weights determine the preferred model at each time step, t, based on information from selected catchment predictors. As a result, the parameters for each component model are conditioned on the model inputs and the probability as specified by the gating function [Marshall et al., 2007].

[18] When implementing a two-component HME, Marshall et al. [2006, 2007] found that a linear logistic function [Darlington, 1990] was appropriate and effective in mapping a predictor to the models. A detailed examination of different types of gating functions can be found in Marshall et al. [2006]. In the logistic case, for a two-component HME, the gating function is

g_{t,1} = \frac{\exp(X_t b)}{1 + \exp(X_t b)}, \qquad g_{t,2} = 1 - g_{t,1} \qquad (2)

where b is a vector of predictor coefficients (gating parameters). Gating functions with three or more components could, for example, express g{t,h} (for h=1,…,H, where H is the number of components) as a logistic regression on Xt as in (2), with a component-specific vector of predictor coefficients, and then normalize these over the number of components. See, e.g., Marshall et al. [2006] for discussion on various regression model implementations, and other variations including multiple gating functions.
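As a minimal sketch of the two-component logistic gate described around (2), the weights can be computed from one row of predictors and a coefficient vector. The function name `gating_weights`, the example predictor values, and the absence of an intercept term are assumptions made for illustration and are not taken from the original study.

```python
import math

def gating_weights(x_t, b):
    """Return (g_t1, g_t2) for one time step of a two-component logistic gate.

    x_t : predictor values at time t (hypothetical example inputs)
    b   : gating coefficients, one per predictor
    """
    eta = sum(x * coef for x, coef in zip(x_t, b))  # linear predictor X_t b
    # Numerically stable logistic transform.
    if eta >= 0:
        g1 = 1.0 / (1.0 + math.exp(-eta))
    else:
        e = math.exp(eta)
        g1 = e / (1.0 + e)
    return g1, 1.0 - g1

# Hypothetical predictors: 12 mm antecedent rain and a base flow of 0.4.
g1, g2 = gating_weights([12.0, 0.4], [0.05, -1.0])
```

A positive linear predictor pushes the weight toward component 1, and the two weights always sum to one.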

[19] Marshall et al. [2006] used a HME with two Australian Water Balance Model (AWBM, see section 3) components with t-distributions to describe the model errors in a hydrologic application. Specifically, the model errors were assumed to be independent and identically distributed with d=4 degrees of freedom [Bates and Campbell, 2001; Marshall et al., 2004], so that the likelihood function (error distribution) for each model is given by

display math(3)

where σ2 is a scale parameter and inline image is a vector of model parameters, distinct for each HME model component. For the remainder of this article, we will use a two-component HME with a t error distribution for each component. As with any model-fitting procedure, it is important to verify that the model assumptions are satisfied in practice, e.g., through residual and other diagnostic checks.
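To illustrate a t-distributed error model with d = 4 degrees of freedom, the sketch below evaluates the log-likelihood of a set of residuals. It assumes the standard location-scale Student t density with scale σ2; the exact parameterization used in (3) may differ, and the function name and example flows are hypothetical.

```python
import math

def t_log_likelihood(observed, simulated, sigma2, d=4):
    """Sum of log Student-t densities of the residuals (scale sigma2, d dof)."""
    # Log of the normalizing constant, shared by every time step.
    const = (math.lgamma((d + 1) / 2) - math.lgamma(d / 2)
             - 0.5 * math.log(d * math.pi * sigma2))
    total = 0.0
    for y, q in zip(observed, simulated):
        resid = y - q
        total += const - 0.5 * (d + 1) * math.log1p(resid * resid / (d * sigma2))
    return total

# Hypothetical flows: a close fit should score higher than a poor one.
ll_good = t_log_likelihood([1.0, 2.0, 3.0], [1.1, 1.9, 3.2], sigma2=1.0)
ll_bad = t_log_likelihood([1.0, 2.0, 3.0], [4.0, 0.0, 9.0], sigma2=1.0)
```

The heavy tails of the t distribution penalize large residuals less severely than a Gaussian error model would.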

2.2. SMC Samplers

[20] Parameter uncertainties are inherent errors from hydrological modeling where the ideal model parameters are not achieved via the calibration process. Hydrological models exist in various forms, ranging from simple lumped models such as the AWBM [Boughton, 2004] and the Probability Distributed Model (PDM) [Moore, 2007], to more sophisticated models such as MODHYDROLOG [Chiew and McMahon, 1994]. All these models have a similar aim, which is to calculate catchment runoff based on a given set of parameters. However, despite the best combination of parameters and model structure, these models produce errors in their predictions. Bayesian inference (implemented by an appropriate Monte Carlo sampler) is able to probabilistically represent these errors and summarize parameter uncertainties in a distributional form.

[21] In this article, the predicted surface runoff of a model is defined as

display math(4)

where yt is the observed flow for a catchment at time t, inline image is the corresponding model output (estimated flow), xt is the model input (precipitation and evapotranspiration) at time t, inline image is a vector of unknown variables or model parameters, and inline image is an error term assumed to be independent and identically distributed. The complete vector of unknown parameters is given by inline image, where inline image denotes any unknown parameters in the distribution of the error term inline image.

[22] Bayesian inference provides a framework to probabilistically represent model parameter uncertainties, through the posterior distribution inline image. The posterior distribution is proportional to the product of the likelihood (model) function, inline image and the prior distribution inline image, which represents prior information on the parameters relevant to the catchment, so that inline image.

[23] An ideal Monte Carlo sampler should possess characteristics such as the ability to effectively sample in high-dimensional spaces, including those exhibiting complex nonlinear dependencies between the parameters, as well as the ability to identify a multimodal posterior distribution even if the posterior is located in the tail of the prior distribution. The SMC approach is capable of fulfilling these demands and has been gaining attention in the hydrologic literature due to its robust nature [Jeremiah et al., 2011, 2012; Moradkhani et al., 2005].

[24] The SMC algorithm initializes with a population of weighted samples, known as particles, generated randomly from an initial sampling distribution, commonly the prior, inline image. The particle population is then propagated from inline image through a sequence of intermediate distributions, inline image, until ultimately the particle population represents a weighted sample from the target (i.e., posterior) distribution, inline image. The particle population in this article is represented as N. The sequence of intermediary distributions may be constructed in a number of ways. In this article, we make use of the geometric bridge [Del Moral et al., 2006, 2007; Doucet et al., 2001; Drovandi and Pettitt, 2011; Fan et al., 2008] which defines inline image as

display math(5)

for s=0,…,S, where the distributional sequence is described by inline image. The distribution inline image (when s = 0 and inline image) corresponds to the initial sampling distribution, and inline image (with s = S and inline image) corresponds to the targeted posterior distribution. The sequence of the inline image accordingly represents a smooth transition between the initial sampling distribution and the posterior distribution. The geometric bridge approach has proven to be effective in sampling from the targeted posterior distribution in complex hydrological models [Jeremiah et al., 2011, 2012].
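When the initial distribution is the prior and the final one is the posterior, a geometric bridge is commonly written by tempering the likelihood, so that the unnormalized log density at stage s is the log prior plus βs times the log-likelihood. The sketch below assumes that form and an evenly spaced schedule; the spacing and the function names are illustrative assumptions, not details reported in the source.

```python
def bridge_log_density(log_prior, log_likelihood, beta):
    """Unnormalized log density of the intermediate target at temperature beta."""
    return log_prior + beta * log_likelihood

def linear_schedule(n_stages):
    """Evenly spaced temperatures 0 = beta_0 <= ... <= beta_S = 1 (assumed spacing)."""
    return [s / n_stages for s in range(n_stages + 1)]

betas = linear_schedule(4)
# At beta = 0 only the prior contributes; at beta = 1 the full posterior is targeted.
start = bridge_log_density(log_prior=-1.0, log_likelihood=-10.0, beta=betas[0])
end = bridge_log_density(log_prior=-1.0, log_likelihood=-10.0, beta=betas[-1])
```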

[25] The SMC sampler involves three mechanisms at each stage of the sampler: reweighting, resampling (when needed) and mutation (move). Resampling allows particles with smaller weights to be discarded in favour of particles with higher weights, while the mutation step is responsible for increasing the particle diversity. Here, we follow the approach of Fan et al. [2008] and implement a Metropolis-Hastings MCMC update on each of the particles, with the stationary distribution of the update given by πs. However, through the process of reweighting and mutation, the quality of the particle approximation to πs will deteriorate as s increases. Therefore, in order to control particle degeneracy, a threshold is set on the effective sample size (ESS). The ESS corresponds to the effective number of independent particles in the sample, and satisfies the condition 1 ≤ ESS ≤ N. Once the ESS of the particle population drops below the specified threshold, the resampling process is executed. Specifically, we implement the SMC algorithm as presented in Jeremiah et al. [2012].
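The ESS check and the resampling step can be sketched as follows. The sketch uses the widely used estimate ESS = 1 / Σ w_i² for normalized weights and plain multinomial resampling; both choices, along with the threshold value and the function names, are assumptions for illustration rather than the settings of the cited implementation.

```python
import random

def effective_sample_size(weights):
    """ESS = 1 / sum(w_i^2) for weights normalized to sum to one."""
    total = sum(weights)
    normalized = [w / total for w in weights]
    return 1.0 / sum(w * w for w in normalized)

def resample_if_needed(particles, weights, threshold, rng):
    """Multinomial resampling, triggered only when the ESS is below threshold."""
    n = len(particles)
    if effective_sample_size(weights) >= threshold:
        return particles, weights
    chosen = rng.choices(particles, weights=weights, k=n)
    return chosen, [1.0 / n] * n  # weights are reset to uniform after resampling

rng = random.Random(0)
particles = ["a", "b", "c", "d"]          # stand-ins for parameter vectors
weights = [0.97, 0.01, 0.01, 0.01]        # one dominant particle, so ESS is near 1
new_particles, new_weights = resample_if_needed(particles, weights, 2.0, rng)
```

With equal weights the ESS equals N, and with all weight on one particle it equals 1, which matches the bounds stated above.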

2.3. Incorporating the HME in the SMC Sampler

[26] In implementing an SMC sampler for the HME, it is convenient to introduce a set of latent indicator variables, {zt}, that, in each posterior sample, define when the HME output is considered to be generated by a given component model through the gating function. Use of indicator variables is common in the Bayesian analysis of mixture models [Gelman et al., 1995] as it considerably simplifies the conditional independence structure of the model, and allows more convenient, lower dimensional sampler updates to be made. Naturally, for a two-component HME, {zt} is sampled as a conditional simulation of independent Bernoulli random variables with probabilities given by

display math(6)

[27] See e.g., Marshall et al. [2006] and Gelman et al. [1995] for further discussion on this idea.
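In a two-component mixture, the standard conditional probability that an observation belongs to component 1 is the gating weight times the component 1 density, divided by the sum of the weighted densities of both components. The sketch below assumes that (6) takes this usual form; the function names and the numerical inputs are hypothetical.

```python
import random

def indicator_probability(g_t1, dens1, dens2):
    """P(z_t = 1) given the gate weight and each component's error density."""
    num = g_t1 * dens1
    return num / (num + (1.0 - g_t1) * dens2)

def sample_indicators(g1_series, dens1_series, dens2_series, rng):
    """Draw one independent Bernoulli indicator (1 or 2) per time step."""
    z = []
    for g, p1, p2 in zip(g1_series, dens1_series, dens2_series):
        z.append(1 if rng.random() < indicator_probability(g, p1, p2) else 2)
    return z

rng = random.Random(42)
# Component 2 has zero density at every step, so all indicators must equal 1.
z = sample_indicators([0.3, 0.6, 0.9], [0.2, 0.1, 0.4], [0.0, 0.0, 0.0], rng)
```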

[28] Defining inline image and inline image to be the parameters of the model and error distribution of each HME component, the full vector of unknown model parameters is given by inline image. The following steps summarize the implementation of the SMC sampler for the HME. The initialization stage, s = 0, starts with the random generation of the unknown parameters: the hydrological model parameters, inline image and inline image, the gating parameters, bs, and the latent variables, inline image. It is also necessary to select the catchment predictor, Xt, that will be used within the HME. Further details on SMC initial condition settings can be found in Fan et al. [2008].

[29] Initialization:

[30] For i =1,…,N (N representing the number of particles)

[31] 1. Generate unknown parameters

[32] 2. Identify predictor variables, Xt,

[33] 3. Identify the distributional sequence, 0 = β0 ≤ β1 ≤ … ≤ βS

[34] 4. Set SMC weights, inline image

[35] The sampling and acceptance of the unknown parameters rely on the three SMC processes: reweighting, resampling and mutation. Within the mutation step of the SMC sampler the parameters from each model, inline image, are updated in turn, followed by an update for the indicator variables, inline image. Then the gating parameters, b, are updated (which in turn modifies the relative weightings of each model), again followed by an update for inline image. However, in order to ensure that the updating processes are accomplished efficiently, parameters from component 1 are updated prior to the other parameters by proposing inline image from a proposal density. Similar to an MCMC independence sampler, the proposal density is specified as inline image, where inline image represents the weighted sample covariance matrix of inline image. The remaining unknown parameters are kept in the previous state, s−1, and used to calculate the acceptance probability, inline image;

display math(7)

where inline image for this instance represents inline image and inline image is the joint likelihood of the parameters. Accepted component 1 parameters are denoted as inline image. This procedure is then repeated for component 2 parameters, gating parameters, model weights and finally latent variables respectively.

[36] Update parameters for Model 1

[37] 1. Update inline image with parameters inline image.

[38] Update parameters for Model 2

[39] 1. Update inline image with parameters inline image.

[40] Update predictor coefficients (gating parameters)

[41] 1. Update bs with Xt and parameter inline image.

[42] 2. Update Xt if applicable.

[43] 3. Update the model weights.

[44] Update new latent variable

[45] 1. Update inline image with parameters inline image.

[46] Once all the variables are updated, the subsequent steps are SMC reweighting and resampling. Reweighting is implemented by setting the weighted particle population, inline image from distribution inline image and from step πs−1(θ) as inline image. The weighted particle population is then calculated as:

display math(8)

[47] Resampling will be executed once the ESS value drops below the predetermined threshold. Details for the SMC processes are presented in Jeremiah et al. [2011, 2012].
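For a tempered-likelihood bridge combined with an MCMC move kernel, a commonly used incremental weight is the likelihood raised to the power βs − βs−1. The sketch below assumes that form, which may differ in detail from (8), and works in log space for numerical stability; the function name and inputs are hypothetical.

```python
import math

def reweight(log_weights, log_likelihoods, beta_prev, beta_next):
    """Return normalized log weights after moving from beta_prev to beta_next."""
    updated = [lw + (beta_next - beta_prev) * ll
               for lw, ll in zip(log_weights, log_likelihoods)]
    # Log-sum-exp normalization so that the weights sum to one.
    m = max(updated)
    log_norm = m + math.log(sum(math.exp(u - m) for u in updated))
    return [u - log_norm for u in updated]

# Two equally weighted particles; the first has the higher likelihood.
log_w = reweight([math.log(0.5), math.log(0.5)], [-1.0, -3.0], 0.0, 0.5)
```

Particles with higher likelihood gain weight at each stage, which is what gradually concentrates the population on the posterior.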

3. Implementation Details

3.1. Case Study

[48] We implement a case study for the HME approach using two parameterizations of a conceptual rainfall runoff model, to investigate the importance of multiple catchment predictors within the gating function of the HME. The model used here is the AWBM with eight parameters. The AWBM (Figure 2) consists of three surface storages, C1, C2, and C3, with fractional areas, A1, A2, and A3, a base flow index, BFI, and a daily recession constant, K. The AWBM is used to analyze the flow from the Never Never River, a 51 km2 catchment located at Gleniffer Bridge in New South Wales, Australia, which has a total of 16 years of recorded data. This catchment has a rainfall-runoff ratio of 0.55. We use a two-component HME where both hydrological models are an eight-parameter AWBM with a t-distributed error model. It is anticipated that the component models will describe different types of flow events. Prior distributions for each model are specified as C1, C2, C3 ∼ U(0,3000), (A1, A2, A3) ∼ Dirichlet(1,1,1), K, BFI ∼ U(0,1), σ2 ∼ U(0, 150), and for the gating function coefficient vector b=(b1,…,bK) we have bk ∼ U(−100, 100) for k=1,…,K, where K here denotes the number of predictors (distinct from the recession constant K). We base all computations on N=2000 weighted particles drawn from the posterior distribution, where the initial SMC sampling distribution is given by the prior, inline image.
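Drawing the initial particle population from the stated priors can be sketched with the standard library alone. For brevity the sketch draws the parameters of a single AWBM component plus the gating coefficients, whereas a full HME particle would hold two components; the dictionary layout, the function name, and the seed are assumptions. The Dirichlet(1,1,1) draw is obtained by normalizing three Gamma(1,1) variates.

```python
import random

def draw_prior_particle(rng, n_predictors):
    """Draw one AWBM component and the gating coefficients from the stated priors."""
    shares = [rng.gammavariate(1.0, 1.0) for _ in range(3)]
    total = sum(shares)
    return {
        "C": [rng.uniform(0.0, 3000.0) for _ in range(3)],   # surface storages
        "A": [s / total for s in shares],                    # fractional areas
        "K": rng.uniform(0.0, 1.0),                          # recession constant
        "BFI": rng.uniform(0.0, 1.0),                        # base flow index
        "sigma2": rng.uniform(0.0, 150.0),                   # error scale
        "b": [rng.uniform(-100.0, 100.0) for _ in range(n_predictors)],
    }

rng = random.Random(1)
prior_particles = [draw_prior_particle(rng, n_predictors=2) for _ in range(2000)]
```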

Figure 2.

Schematic of the AWBM with eight parameters. The total runoff is the combination of the surface runoff and the base flow. The model has three surface storage parameters, each with an accompanying fractional area parameter. The other two parameters determine the amount of water released as excess surface runoff and as base flow.

3.2. Calculation of the Bayesian Information Criterion

[49] We assess the performance of the HME using the Bayesian information criterion (BIC) [Schwarz, 1978]. This is obtained through the likelihood function

display math(9)

[50] where n is the sample size, g{t,1} is the probability of selecting component model 1 at time t, and inline image is the model error distribution function of component model 1 while inline image is the model error distribution function of component model 2. Note that while the likelihood function within the SMC sampler also includes the indicator variables, inline image, these exist only to facilitate sampler performance. Integrating out the auxiliary variables results in the likelihood (9), and as such the auxiliary variables play no part in the computation of the BIC [Marshall et al., 2006]. The BIC is computed as BIC = −2 log(L̂) + m log(n), where L̂ denotes the likelihood function evaluated at the maximum likelihood estimates, and m is the number of model parameters.
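A short sketch of the two-component mixture likelihood and the resulting BIC follows. It assumes that (9) is the usual product over time steps of the gate-weighted sum of the two component densities, and it uses the Schwarz form of the BIC in which lower values indicate a better trade-off between fit and complexity. The function names and inputs are hypothetical.

```python
import math

def hme_log_likelihood(g1_series, dens1_series, dens2_series):
    """Log of the product over t of g_t1 * p1_t + (1 - g_t1) * p2_t."""
    return sum(math.log(g * p1 + (1.0 - g) * p2)
               for g, p1, p2 in zip(g1_series, dens1_series, dens2_series))

def bic(max_log_likelihood, n_params, n_obs):
    """Schwarz criterion: -2 log L + m log n."""
    return -2.0 * max_log_likelihood + n_params * math.log(n_obs)

# Hypothetical single time step with equal gating weights.
ll = hme_log_likelihood([0.5], [0.2], [0.4])
score = bic(max_log_likelihood=-10.0, n_params=3, n_obs=100)
```

Because the penalty grows with the number of parameters, adding a predictor to the gating function lowers the BIC only if the gain in likelihood outweighs the extra coefficient.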

3.3. Predictors

[51] Catchment predictors, directly or indirectly related to the catchment, are potential sources of important information for model fitting. Antecedent precipitation is a convenient way of converting information concerning the catchment hydrologic state into a potential predictor [Andrew, 2006]. However, there are also other important hydrologic variables that can characterize hydrologic states, such as estimated excess surface runoff and base flow. The excess runoff and base flow may be directly obtained from the AWBM (Figure 2), since the AWBM computes both components, which together make up the total runoff. The excess runoff and base flow are dynamically updated at each model time step, t, and potentially indicate event information in the data. The predicted excess runoff (EFt) and base flow (BFt) at time t in the HME are given by

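\mathrm{EF}_t = \sum_{h=1}^{2} g_{t,h} \, \mathrm{EF}_{t,h} \qquad (10)
\mathrm{BF}_t = \sum_{h=1}^{2} g_{t,h} \, \mathrm{BF}_{t,h} \qquad (11)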

where EFt,h and BFt,h (for h=1,2) denote the respective flows from each component model. While (10) and (11) represent the overall predicted flows of the HME at time t, when EFt,h and BFt,h are used as predictors in the gating function within the SMC sampler, they are combined with b in (2) to calculate g{t,h}. Eventually g{t,h} is replaced with the appropriate model indicators {zt}. As is standard in Monte Carlo sampling, this occurs because, for any given particle, the output of the HME is given by a single model component, which is ultimately determined by {zt}.

[52] Marshall et al. [2006] demonstrated the utility of a single predictor, the preceding 7 days' cumulative rainfall, within the gating function of the HME. However, there is a clear argument for the incorporation of multiple predictors in this manner, as this would allow the gating function greater flexibility in utilizing different hydrological indicators. Hydrologic data contain complex and dynamic regions, from the sharply rising limbs to the long recession curves of the hydrograph and hyetograph. The likelihood of successfully capturing all such hydrologic catchment dynamics is expected to improve through the use of multiple predictors.

[53] In this article, we use antecedent precipitation, excess runoff and base flow as predictors. However, we consider several cumulative versions of each, namely over the previous 1, 3, 7, and 14 days, in order to capture catchment characteristics with different time dynamics. Figure 3 illustrates antecedent precipitation accumulated over each of these time periods for the Never Never River catchment. The single day antecedent precipitation (Figure 3a) is highly responsive to days with no moisture, whereas the 14 day cumulative antecedent precipitation (Figure 3d) represents a more slowly varying measure of precipitation over the previous fortnight. Estimation of the predictor coefficients, b, of the gating function will allow the HME to automatically determine the most suitable combination of predictor accumulations in order to accurately predict the observed hydrologic flows.
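The cumulative predictors over 1, 3, 7, and 14 days can be sketched as trailing sums of a daily series. The sketch assumes that "previous" means the days strictly before the current time step and that shorter windows are used at the start of the record; both conventions, the function name, and the rainfall values are assumptions for illustration.

```python
def antecedent_sum(series, window):
    """Sum of the `window` values preceding each time step (shorter at the start)."""
    out = []
    for t in range(len(series)):
        start = max(0, t - window)
        out.append(sum(series[start:t]))  # excludes the current day t
    return out

rain = [0, 5, 0, 2, 10, 0, 0, 1]  # hypothetical daily precipitation in mm
predictors = {w: antecedent_sum(rain, w) for w in (1, 3, 7, 14)}
```

The same helper applies unchanged to simulated excess runoff or base flow series, giving one candidate column of Xt per variable and window length.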

Figure 3.

Comparison of the four different types of Antecedent Precipitation predictors plotted with the hydrograph. The predictors are presented according to the antecedent precipitation accumulated over (a) 1 day, (b) 3 days, (c) 7 days, and (d) 14 days for the Never Never River catchment. The assessed time period presented here starts from day 500 until day 1000.

[54] In the following analyses (section 4), we assess the performance of the multiple-predictor HME by partitioning the observed data into model calibration and model validation subsets, whereby the model is fitted to the first portion of the observed data and then its performance evaluated on the second portion. We consider calibration/validation ratios of 20/80, 40/60, 60/40, and 80/20, and use the Nash-Sutcliffe efficiency (NSE) as a measure to evaluate the performance of the model.
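The calibration/validation split and the NSE can be sketched as follows, using the standard definition NSE = 1 − Σ(obs − sim)² / Σ(obs − mean(obs))². The function names, the rounding rule for the split point, and the example flows are assumptions.

```python
def nash_sutcliffe(observed, simulated):
    """Nash-Sutcliffe efficiency: 1 is a perfect fit, 0 matches the observed mean."""
    mean_obs = sum(observed) / len(observed)
    num = sum((o - s) ** 2 for o, s in zip(observed, simulated))
    den = sum((o - mean_obs) ** 2 for o in observed)
    return 1.0 - num / den

def split_series(series, calibration_fraction):
    """Use the first portion for calibration and the remainder for validation."""
    cut = int(round(len(series) * calibration_fraction))
    return series[:cut], series[cut:]

flows = [1.0, 2.0, 3.0, 4.0, 5.0]                 # hypothetical observed flows
calibration, validation = split_series(list(range(10)), 0.2)
perfect = nash_sutcliffe(flows, flows)
baseline = nash_sutcliffe(flows, [3.0] * 5)       # simulating the mean everywhere
```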

3.4. Regionalization

[55] In terms of regionalization to ungauged catchments, translating catchment characteristics such as streamflow dynamics and precipitation records to similar catchments is thought to be better than translating estimated model parameters [Yadav et al., 2007; Zhang et al., 2008]. In our HME framework, catchment characteristics and their relative importance are captured through the estimated vector of predictor coefficients, b, in the gating function (2). Regionalization using the estimated gating function coefficients will allow relevant catchment information to be translated for calibration, and reduce the amount of data required for complete calibration of the ungauged catchment. Regionalization is performed by first computing the posterior mean of the gating function coefficients, inline image, for the gauged catchment, and then transposing these as fixed values, inline image, in the HME gating function (2) of the ungauged catchment.
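The posterior mean of the gating coefficients can be estimated from the weighted particle population of the gauged catchment and then held fixed for the ungauged one. The sketch below shows only the weighted averaging step; the function name and the toy particles are hypothetical.

```python
def weighted_posterior_mean(coefficient_particles, weights):
    """Weighted mean of each gating coefficient across the particle population."""
    total = sum(weights)
    k = len(coefficient_particles[0])
    return [
        sum(w * p[j] for p, w in zip(coefficient_particles, weights)) / total
        for j in range(k)
    ]

# Two toy particles, each holding two gating coefficients.
donor_b = weighted_posterior_mean([[1.0, 2.0], [3.0, 4.0]], [3.0, 1.0])
# donor_b would then be passed unchanged to the gating function of the
# ungauged catchment, leaving only the component model parameters to calibrate.
```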

[56] In the following, we evaluate the potential utility of regionalization using the gating function regression coefficients of the HME. The regression coefficients calibrated on the Never Never River catchment, located at latitude 30.39° and longitude 152.85°, are regionalized to the Manning River catchment, located at latitude 31.84° and longitude 151.89°. The two catchments have similar rainfall-runoff ratios and areas, so the Never Never River catchment serves as an appropriate donor catchment for this purpose.

4. Results and Discussion

[57] We first establish, for the Never Never catchment, the utility of a HME with a constant gating function (component weight), g_{t,1} = g_1 for all time points t, over a stand-alone AWBM. We then demonstrate that the performance of the HME can be further improved by incorporating a single predictor, the 14 day cumulative antecedent precipitation, into the gating function. Table 1 presents summary posterior quantities for each model parameter for each of the three models, and the associated BIC for each model. The BIC results indicate a clear improvement of the HME models over the stand-alone AWBM model, with the HME with the single predictor in the gating function demonstrating the best model fit by a considerable margin. The clearest interpretable model parameter differences between the two HME component models are observed through the BFI, which relates to the excess surface runoff from the surface storages. The differences in the posterior mean BFI estimates indicate that one HME component is primarily responsible for modeling low flows, and the other for higher flows. This supports the case for using the flexibility of the HME over the stand-alone hydrological model.
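To make the model comparison concrete, the sketch below counts parameters using the rule given in the footnote to Table 1 (8 per AWBM component plus the gating function parameters) and evaluates the BIC. The standard definition, BIC = k ln(n) − 2 ln(L), is assumed here because this section does not restate the formula, and the function names are illustrative.

```python
import math

def n_parameters(n_components, n_gating):
    """8 parameters per AWBM component plus the gating function parameters."""
    return 8 * n_components + n_gating

def bic(log_likelihood, k, n_obs):
    """Standard BIC (assumed form): smaller values indicate a better trade-off of fit and complexity."""
    return k * math.log(n_obs) - 2.0 * log_likelihood

# Parameter counts for the three models compared in Table 1
k_awbm = n_parameters(1, 0)         # stand-alone AWBM
k_hme_constant = n_parameters(2, 1) # HME with constant gate (b0 only)
k_hme_single = n_parameters(2, 2)   # HME with b0 and one predictor coefficient
```

These counts reproduce the 8, 17, and 18 parameters quoted in the table footnote.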

Table 1. Single-Structure AWBM Compared Against the HME With No Predictors and the HME With 14 Day Cumulative Antecedent Precipitation as Predictor^a

Column groups, left to right: AWBM Stand Alone (BIC = 21117) | HME with No Predictors (BIC = 20037): Model 1, Model 2 | HME with 14 Day Cumulative Antecedent Precipitation (BIC = 15911): Model 1, Model 2. Each row below starts with the parameter name, and each model column reports the Mean followed by the Standard Deviation.

^a The mean, standard deviation and skewness of the model parameters are obtained from the posterior mean of the particles. The number of parameters estimated is 8 per AWBM model component (A3=1−A1−A2), plus the number of gating function parameters. For example, left to right, the above models have 8, 17, and 18 parameters, respectively.

C10.130.1251.922.130.150.142.282.0345.70.49
C2187.37.50553.6305.642.92.12178.520.9451.0460.5
C3714.944.11723.6918.6467.816.63962.551.11409.1724.0
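A1 | 0.18 | 0.01 | 0.40 | 0.09 | 0.12 | 0.005 | 0.19 | 0.03 | 0.98 | 0.01
A2 | 0.34 | 0.03 | 0.28 | 0.13 | 0.35 | 0.02 | 0.37 | 0.06 | 0.01 | 0.01
A3 | 0.48 | 0.03 | 0.32 | 0.18 | 0.53 | 0.02 | 0.44 | 0.06 | 0.01 | 0.02
K | 0.92 | 0.002 | 0.87 | 0.005 | 0.95 | 0.001 | 0.92 | 0.02 | 0.98 | 0.0004
BFI | 0.58 | 0.01 | 0.70 | 0.01 | 0.49 | 0.005 | 0.61 | 0.01 | 0.26 | 0.004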
σ20.630.0210.950.360.120.0045.350.60.060.003
Gating parameters (Mean | Standard Deviation for each of the three models; blank where the parameter is not in the model)
b0 | | | −1.27 | 0.04 | −10.01 | 3.77
b4 | | | | | 0.14 | 0.07

[58] Figure 4 illustrates the observed (black line) and predicted (red line) hydrographs for a section of the data under each of the models. The hydrographs are shown on full (left) and restricted (right) scales. The performance of the models appears similar for the large-scale events (Figures 4a, 4c, and 4e). However, clear differences emerge for the low-flow events. Compared to the stand-alone AWBM fit (Figure 4b), using a HME to model the runoff (Figure 4d) results in a slightly worse prediction of the magnitude of the runoff spikes observed roughly every 18 days. This is offset by the improved predictions for the large number of low flows, most obviously between July 1996 and August 1996 and between December 1996 and January 1997. Incorporating the single predictor into the gating function of the HME (Figure 4f) results in a further visual improvement in the predicted model fit in the same ranges, although the low-flow prediction worsens in the range September 1996–November 1996. Note that pointwise 95% central credible intervals are only visually apparent in Figure 4f. This uncertainty, which appears only in certain regions under the HME using a single predictor, is indicative of the greater flexibility of this model to fit the observed data. The credible intervals are widest in regions where there is some uncertainty as to which model component is more suitable to describe the observed runoff (i.e., flows that are neither clearly low nor high). Overall, however, the relative BIC scores (Table 1) indicate that there is a clear gain in the use of gating function-based predictors in the HME.

Figure 4.

Leftmost hydrographs (a, c, and e) show the full comparison of the observed runoff (black line) versus the predicted runoff (red line) over the period July 1996–February 1997. The rightmost hydrographs (b, d, and f) show the same information, but with the y-axis restricted to the low-flow range. The comparison is made for (a and b) the stand-alone AWBM model, (c and d) the HME with no predictors, and (e and f) the HME with a single predictor, the 14 day cumulative antecedent precipitation.

4.1. Model Fitting Using Catchment Predictors

[59] We now consider the potential benefit of incorporating multiple predictors into the gating function of the HME. We consider the same two-component HME as before, but now add predictors to the model using a forward selection procedure, so that the most informative remaining predictor is added to the model at each stage. We repeat this process separately for each of antecedent precipitation, excess runoff and base flow as predictors, and perform forward selection on the different cumulative versions of each predictor.
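The forward selection procedure can be sketched as a greedy loop over the candidate accumulation windows. In this illustration the `score` callable stands in for fitting the HME and returning its BIC, which is not reproduced here. The demonstration uses the antecedent precipitation BIC values reported in Table 2, with every unreported combination set to infinity as a placeholder, so it only retraces the reported path and does not re-derive it.

```python
def forward_selection(candidates, score):
    """Greedy forward selection: at each stage add the candidate giving the lowest score."""
    selected, remaining, path = [], list(candidates), []
    while remaining:
        best = min(remaining, key=lambda c: score(selected + [c]))
        selected = selected + [best]
        remaining.remove(best)
        path.append((tuple(selected), score(selected)))
    return path

# BIC values for antecedent precipitation as reported in Table 2 (windows in days).
TABLE2_BIC = {
    frozenset([14]): 15911,
    frozenset([1, 14]): 14562,
    frozenset([1, 3, 14]): 14512,
    frozenset([1, 3, 7, 14]): 14899,
}

def lookup_bic(windows):
    # Placeholder: combinations not reported in Table 2 are treated as infinitely poor.
    return TABLE2_BIC.get(frozenset(windows), float("inf"))

path = forward_selection([1, 3, 7, 14], lookup_bic)
best_set, best_bic = min(path, key=lambda item: item[1])
```

The selected configuration is the point on the path with the lowest BIC, which here is the three-predictor model with the 1, 3, and 14 day series.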

[60] Table 2 presents the results of this procedure, displaying the ordered sequence of predictors included in the model, the posterior mean estimates of the gating function coefficients (bk) and the overall BIC of the fitted model. In all cases there is a clear initial improvement in the model fit as more cumulative versions of each predictor are included. However, the model fit worsens again once the final predictor version is added, as the information content received from each predictor series into the gating function becomes saturated. The optimal configuration in each case includes both the single-day and 14 day cumulative predictor series, in addition to one cumulative series between these extreme durations.

Table 2. BIC and Parameter Estimates of the HME With the Best Combination of 1, 2, 3, and 4 Gating-Function Predictors, Based on the Three Predictor Sets^a

| Gating Parameters | b0 | b1 | b2 | b3 | b4 | BIC |
Antecedent Precipitation
| 14 days | −10.01 | | | | 0.14 | 15911 |
| 1 day and 14 days | −9.49 | 0.75 | | | 0.09 | 14562 |
| 1 day, 3 days and 14 days | −8.40 | 0.37 | 0.03 | | 0.06 | 14512 |
| 1 day, 3 days, 7 days and 14 days | −44.2 | 3.46 | −0.01 | 0.15 | 0.45 | 14899 |
Antecedent Surface Runoff
| 7 days | 7.09 | | | −8.75 | | 15229 |
| 1 day and 7 days | 7.17 | −41.26 | | −0.90 | | 14186 |
| 1 day, 7 days and 14 days | 6.60 | −22.09 | | −0.77 | −0.04 | 14095 |
| 1 day, 3 days, 7 days and 14 days | 6.62 | −18.43 | −0.60 | −0.64 | −0.03 | 14161 |
Antecedent Base Flow
| 1 day | −22.87 | 7.78 | | | | 15675 |
| 1 day and 14 days | −8.14 | 7.12 | | | −0.54 | 15138 |
| 1 day, 3 days and 14 days | −17.5 | 108.07 | −38.49 | | −2.56 | 14363 |
| 1 day, 3 days, 7 days and 14 days | −18.94 | 106.92 | −35.44 | −2.28 | −1.96 | 14429 |

^a HME with no predictor: BIC = 20037.

[61] The improvement in model fit can be seen by visual inspection of the hydrographs. Figure 5 displays the observed (black line) and predicted runoff of the HME under several antecedent precipitation predictor combinations. The HME performance with no gating function predictors is clearly poor, most notably in its strong underestimation of low flows. Introducing the single best predictor, i.e., the 14 day cumulative precipitation (red line), strongly improves the overall low-flow prediction, although some small spikes are missed, e.g., in December 1990. The best fitting HME with three gating function predictors generally performs better still in most regions of low flow, while also tracking small spikes more accurately. Clearly, by improving predictor usage in the gating function, the HME is able to better respond to the dynamic complexities in the catchment state and thereby improve model predictions.

Figure 5.

Comparison of observed (black line) and predicted hydrographs for a subset of the Never Never River catchment data. HME model predictions are based on no predictors (green line) in the gating function of the HME, the best single predictor (red line) and the best three predictors (blue line). The best antecedent precipitation predictors are listed in Table 2.

[62] The improvement in model fit can also be seen through inspection of the estimated posterior distributions of the model parameters. Figure 6 illustrates the posterior distributions of the parameters C1, BFI, and σ2 for both HME model components (left/right panels). Most obviously, the model error variance (σ2) is clearly reduced for each model component when using the best three cumulative predictors (Trial 3), indicating an improvement in model predictions. Further, the posterior variances under the three-predictor model are typically smaller than under the best single-predictor model, indicating an improved model fit. Location shifts in C1 and BFI, particularly for component 2, clearly show that the model components of the HME behave differently in the presence of more informative predictors in the gating function. This occurs because each component is now more precisely responsible for modeling particular aspects of the hydrograph (lower flows, in the case of component 2).

Figure 6.

Comparison of selected HME model component (AWBM) parameters when using different predictors in the gating function. Panels show the posterior distributions for the parameters C1, BFI, and σ2, with the red and blue lines representing the best single and best three cumulative predictors based on antecedent precipitation (cf. Table 2). Left and right panels correspond to HME components 1 and 2, respectively.

[63] Figure 7 illustrates how the posterior probability of HME component 1 changes temporally in line with changes in the hydrograph (at iteration 200) for a subset of the Never Never River catchment data. The red line corresponds to the pointwise posterior mean, and the shaded region to pointwise 95% central credible intervals. Quite clearly, the HME places most posterior weight on component 1 when higher values of runoff are observed. This is consistent with previous indications that component 2 is more responsible for modeling lower flows. Also shown in Figure 7 is the evolution of these quantities as the algorithm progresses, indexed by increasing iteration number (out of S=200). This progression satisfies natural intuition, in that when few data are considered, the HME is both equivocal and uncertain as to the more appropriate component for each time period. As more data are included, the HME component allocations become more certain and more precise.
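The pointwise summaries plotted in Figure 7 can be sketched as follows, assuming the posterior is represented by equally weighted particles, each giving a full time series of the component 1 probability (or of predicted runoff). The function name and array layout are illustrative assumptions.

```python
import numpy as np

def pointwise_summary(samples, level=0.95):
    """Pointwise posterior mean and central credible interval.

    `samples` has shape (n_particles, T): one row per posterior particle,
    one column per time step. Particles are assumed to be equally weighted.
    """
    samples = np.asarray(samples, dtype=float)
    tail = 100.0 * (1.0 - level) / 2.0  # 2.5 for a 95% central interval
    mean = samples.mean(axis=0)
    lower, upper = np.percentile(samples, [tail, 100.0 - tail], axis=0)
    return mean, lower, upper
```

A narrow band between `lower` and `upper` at a given time step corresponds to the confident component allocations seen at later iterations, and a wide band to the equivocal allocations seen early on.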

Figure 7.

Plot of pointwise posterior estimates of the probability of HME component 1 (a, c, e, and g) and the corresponding predicted runoff (b, d, f, and h) for a subset of the Never Never River catchment data. The model used is the HME with the best three predictors based on antecedent precipitation (cf. Table 2). Iteration indicates the stage of the posterior simulation (out of S=200 in total), where iteration 200 corresponds to the final posterior prediction. The red line and shaded region correspond to the pointwise posterior mean and 95% central credible intervals, respectively. The observed runoff is denoted by a blue line.

[64] Figure 8a compares the hydrographs from the best predictor combinations reported in Table 2 with the observed runoff (black line) for March 1988–May 1988: the HME with antecedent precipitation (red line), the HME with antecedent base flow (blue line), and the HME with antecedent surface runoff (green line). Over much of this period the three predicted hydrographs show little difference. However, the hydrographs start to deviate in March 1988, with the HME with antecedent precipitation predicting runoff better over the receding limb and the lower flows. Figure 8b shows the corresponding model 1 weights for each of these HME models over the same period. The figure clearly shows that the model 1 weights differ between the HME models when describing the different events in the data, which also explains the differences between the hydrographs.

Figure 8.

Comparison of the HME with the best three predictors, according to the results in Table 2. The HME with antecedent precipitation is shown in red, the HME with antecedent base flow in blue, and the HME with antecedent surface runoff in green. (a) Comparison of the predicted runoffs with the observed runoff (black hydrograph). (b) The corresponding model 1 weights, from March 1988 to May 1988.

4.2. Regionalization of Predictor Coefficients

[65] Regionalization aims to improve the predictive capabilities of models in ungauged catchments by reusing information estimated from gauged catchments with similar characteristics. In this capacity, Zhang et al. [2008] demonstrated that nonparametric catchment characteristics have a higher impact on hydrologic predictability than model parameters. In the present setting, where we have focused on making use of catchment predictors in the HME gating function, the relevant (nonparametric) catchment characteristics are contained within the posterior distributions of the catchment predictor coefficients, b.

[66] To assess the potential of regionalization with the catchment predictor coefficients, we first perform a controlled study on the Never Never River catchment. We fit the HME on a time-contiguous subset of calibration data (variously, the first 20%, 40%, 60%, and 80% of the full data set). We then fit the HME on the remaining validation data, but holding the catchment predictor coefficients fixed at the posterior means estimated from the calibration data. The analysis of this validation data therefore mimics regionalization to an ungauged catchment with similar (in this case identical) catchment characteristics. In particular, the case of 80%/20% calibration/validation data most closely represents the scenario in which regionalization is most likely to be applied. Table 3 displays the Nash-Sutcliffe efficiency (NSE) of each model fit, whether based on calibration data or validation data with the transposed values for b. Table 3 also shows the NSE obtained if the HME is fitted directly to the validation data, with the gating function parameters estimated directly from those data, in order to determine whether transposing the gating function coefficients is actually beneficial.

Table 3. Nash-Sutcliffe Efficiency (NSE) From Calibration and Validation Data as a Proportion of the Never Never River Data, Obtained Using the Best Three Cumulative Predictors Derived From Antecedent Precipitation, Excess Runoff and Base Flow^a

| Data subset | NSE Antecedent Precipitation | NSE Antecedent Excess Runoff | NSE Antecedent Base Flow |
| Calibrate 20% | 0.67 | 0.67 | 0.67 |
| Validate 80% | 0.63 | 0.62 | 0.62 |
| Fit to last 80% | 0.60 | 0.61 | 0.62 |
| Calibrate 40% | 0.69 | 0.66 | 0.66 |
| Validate 60% | 0.60 | 0.60 | 0.61 |
| Fit to last 60% | 0.58 | 0.60 | 0.59 |
| Calibrate 60% | 0.66 | 0.64 | 0.64 |
| Validate 40% | 0.65 | 0.65 | 0.65 |
| Fit to last 40% | 0.30 | 0.59 | 0.59 |
| Calibrate 80% | 0.68 | 0.66 | 0.66 |
| Validate 20% | 0.54 | 0.56 | 0.57 |
| Fit to last 20% | 0.28 | 0.31 | 0.40 |

^a Within each calibration/validation split, the first and last rows give the NSE obtained when fitting the full HME to the initial and final proportion of the data, respectively. The second row gives the NSE obtained when fixing the gating function coefficients at the values estimated from the initial calibration model.

[67] The Never Never catchment is fairly predictable for the initial 50%–60% of the observed data period, which contains the majority of the major storm events and 72% of the total runoff volume. It is more variable in the later part of the observed data period, where the last 40% of the data represent 28% of the runoff volume. For this reason, the NSE in Table 3 is at its highest (in the range 0.64–0.69) when fitting the HME to the initial calibration data set, regardless of the calibration proportion. The greater variability in the latter part of the observed data set becomes apparent in the NSE scores when fitting the HME to the last portion of the data, as the NSE is systematically reduced as the final portion of the data set becomes smaller. This is particularly apparent when using the final 20% and 40% of the observed data. When fitting the HME to the final 60% or a higher proportion, the HME once again covers major runoff events, which means that the overall NSE score becomes higher.

[68] The case for regionalization is made by comparing the NSE scores of the validation data sets in Table 3 with those obtained by fitting the full HME to each validation data set (the "Fit to last" rows). Here the NSE scores using gating function coefficients estimated from the (initial) calibration data set are never lower, and in most cases higher, than the NSE scores obtained when also estimating the gating function coefficients from the same (validation) data set. This implies that the transference of the catchment characteristics from the calibration to the validation data set, via the gating function coefficients of the HME, can considerably improve the predictive capability of the HME. The evidence is particularly clear in the case of 80%/20% calibration/validation data (which represents the most likely regionalization situation), where there is a dramatic fall in the NSE score when regionalization is not performed (Table 3, last row).

[69] We now examine the ungauged Manning River catchment, which has similar catchment characteristics (e.g., areas and rainfall-runoff ratios) to the Never Never River catchment. As before, we perform regionalization by estimating the gating function coefficients of the HME using the best three predictors based on the Never Never River data, and use the posterior means of these coefficients when analyzing the Manning River data. For comparison, we also fit the full HME model to the Manning River data and estimate the gating function coefficients directly. Hydrographs of the HME model for the Manning River data, based on the transposed Never Never River gating function coefficients, are presented in Figure 9. The BIC scores of the fitted HME for each of the three sets of predictors, both with and without regionalization, are shown in Table 4.

Table 4. Comparison of BIC Scores Both With and Without Regionalization, for the Manning River Catchment^a

| | BIC Antecedent Precipitation | BIC Antecedent Excess Runoff | BIC Antecedent Base Flow |
| BIC values without regionalization | 21584 | 19880 | 22385 |
| BIC values with regionalization | 20414 | 20041 | 19801 |

^a With regionalization, the HME gating function coefficients are estimated from the Never Never River catchment based on the best three cumulative predictors derived from each primary predictor (precipitation, excess runoff and base flow).

Figure 9.

Comparison of Manning River catchment observed runoff (gray line) to HME with gating function coefficients transposed from the Never Never River catchment. The gating function model is based on using antecedent precipitation (red line), excess runoff (blue line) and base flow (green line) as predictors.

[70] Table 4 makes a strong case for an improvement in the model fit when using regionalization with antecedent precipitation and base flow as predictors, with differences of more than 1000 in the BIC scores. However, when considering excess runoff as a predictor, the model fit is slightly in favor of not performing regionalization, by a BIC difference of 161. While including excess runoff as a predictor within the gating function of the HME strongly improved the model fit of the Never Never River (Table 2), quite clearly one cannot automatically assume that all predictors, however useful, will be equally transposable to other catchments. In the case of excess runoff, there is sufficient information in the catchment data to adequately estimate the predictor coefficient. Because this estimate is obtained directly from the Manning River catchment data, it is likely to be closer to optimal than the same parameter estimated from a different catchment data set.
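The BIC differences quoted above follow directly from Table 4, as this short check shows. The dictionary keys are illustrative labels, and a positive gain means regionalization lowered (improved) the BIC.

```python
# BIC values for the Manning River catchment, copied from Table 4
bic_without = {"antecedent precipitation": 21584, "excess runoff": 19880, "base flow": 22385}
bic_with = {"antecedent precipitation": 20414, "excess runoff": 20041, "base flow": 19801}

# Gain from regionalization: BIC without minus BIC with (positive = regionalization is better)
gain = {name: bic_without[name] - bic_with[name] for name in bic_without}
# gain == {"antecedent precipitation": 1170, "excess runoff": -161, "base flow": 2584}
```

Only excess runoff has a negative gain, which is the case discussed in the paragraph above.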

5. Conclusion

[71] Stand-alone hydrological models with rigid structures, such as the AWBM, are often not capable of capturing the active dynamics within a catchment, and as a result can produce poor predictions. Using a HME can help alleviate this problem by permitting different component models to specialize their modeling to particular characteristics of the catchment, thereby increasing predictive accuracy. In this article, extending initial work by Marshall et al. [2006], we have demonstrated that even greater HME performance can be gained by the inclusion of multiple predictors into the gating function of the HME (Table 2). Providing improved information about the catchment dynamics directly into the gating function allows the HME to more accurately determine which model component is likely to best describe the observed flow at each time point, based on increased insight into the various active regions of the hydrologic data. This leads to improved overall predictions.

[72] We have also examined the benefits of regionalization within the HME framework, by transposing the posterior means of the gating function coefficients estimated from a donor catchment to a hydrologically similar ungauged catchment. On the whole, our results (Tables 3 and 4, Figure 9) indicate that regionalization in this manner is likely to result in improved HME predictions at ungauged catchments. In general, the possibility of fully utilizing these coefficients on future catchments with limited data looks promising.

[73] However, this finding should be treated with caution. In our analysis of the Manning River catchment (Table 4), we found that the model fit at the ungauged catchment can actually worsen when transposing the gating function coefficients, if there is sufficient information at the catchment to estimate the gating function coefficients directly from the ungauged catchment data. This scenario is intuitive and straightforward. However, it does warn against assuming that an improved model fit at an ungauged catchment obtained with the transposed coefficients of one predictor implies similar improvements when using other predictors.

Acknowledgments

[74] Funding for this research came from an Australian Research Council Discovery project. The streamflow datasets used in generating our results were provided to us by Francis Chiew and Jai Vaze. Finally, the authors wish to thank the two anonymous reviewers and the Associate Editor, Martyn Clark, for their helpful suggestions throughout the review process.
