Bayesian methods in hydrologic modeling: A study of recent advancements in Markov chain Monte Carlo techniques



[1] Bayesian methods, and particularly Markov chain Monte Carlo (MCMC) techniques, are extremely useful in uncertainty assessment and parameter estimation of hydrologic models. However, MCMC algorithms can be difficult to implement successfully because of the sensitivity of an algorithm to model initialization and the complexity of the parameter space. Many hydrologic studies, even those using relatively simple conceptualizations, are hindered by complex parameter interactions that make typical uncertainty methods harder to apply. This paper presents comparisons between three recently introduced MCMC approaches, the adaptive Metropolis, the delayed rejection adaptive Metropolis, and the differential evolution Markov chain algorithms, via two case studies: (1) a synthetic Gaussian mixture with five parameters and two modes and (2) a real-world hydrologic modeling scenario in which each algorithm serves as the uncertainty and parameter estimation framework for a conceptual precipitation-runoff model.

1. Introduction

[2] With a growing desire to better quantify watershed processes and responses, many modeling studies have been undertaken, ranging from attempts at developing models based completely on physical process understanding to simple black box methods [e.g., Atkinson et al., 2003; Beven, 1989]. Recently, there has been a movement toward a combined approach, including as much of the known physics of the system as possible while maintaining model structure parsimony [Littlewood et al., 2003]. A driving force in this has been the advent of more sophisticated algorithms capable of automatic parameter estimation and uncertainty quantification, given that such model parameters are “effective,” nonmeasurable values.

[3] Model calibration techniques have changed with the availability of ever-faster computing systems, from simple trial-and-error methods to fully computerized algorithms designed to completely investigate the parameter space [Vrugt et al., 2003]. Automatic calibration techniques are varied in how they attempt to implement objective mathematical procedures and search the parameter space to optimize the model simulation. Commonly used calibration methods range from stochastic global optimization techniques [e.g., Duan et al., 1992; Sorooshian and Dracup, 1980; Thyer et al., 1999] to Monte Carlo methods [e.g., Beven and Binley, 1992; Freer et al., 1996; Uhlenbrook et al., 1999] to Markov chain Monte Carlo routines [e.g., Bates and Campbell, 2001; Campbell et al., 1999; Kuczera and Parent, 1998]. In a push to characterize the predictive uncertainty associated with estimated parameter values, Monte Carlo–based approaches have moved to the forefront of automatic calibration routines [Feyen et al., 2007]. The most frequently implemented variants of Monte Carlo methods include uniform random sampling (URS) (often implemented in the popular generalized likelihood uncertainty estimation (GLUE) approach) and Markov chain Monte Carlo (MCMC) schemes [Bates and Campbell, 2001; Marshall et al., 2004].

[4] While these methods have been successfully implemented in hydrologic studies, in many cases they also suffer from a variety of problems. In general, all Monte Carlo–based techniques suffer from inefficiency in the exploration of the parameter space [Bates and Campbell, 2001]. This is especially true for highly parameterized models, where parameter interactions can be very involved and not adequately explored by the algorithm without an extremely large number of samples [Kuczera and Parent, 1998]. Markov chain Monte Carlo–based approaches are more adept at exploring the parameter space in an “intelligent” manner. However, such approaches often suffer greatly from initialization problems associated with the variance of the proposal being either too large or too small, preventing the algorithm from efficiently reaching convergence [Haario et al., 2006]. The additional problem of convergence to the posterior distribution is of significant concern for hydrologic models, as their nonlinear nature often leads to a complex parameter response surface with many local optima [Duan et al., 1992].

[5] The study presented here provides a comparison of three recently developed Markov chain Monte Carlo algorithms intended to overcome some of the inefficiencies common to other well-established MCMC methods. This paper is divided into the following sections: section 2 presents a brief description of Bayesian methods in hydrology and introduces the MCMC algorithms featured in this research; section 3 presents two case studies and their results; and section 4 offers conclusions drawn from this research that are applicable to a variety of modeling problems.

2. Bayesian Methods in Hydrology

[6] Bayesian inference provides a framework for explicitly accounting for modeling uncertainties; its essential characteristic is the use of probability distributions to describe parameter and model uncertainty. At the heart of Bayesian inference is the use of formal likelihood functions to analyze parameter uncertainty. Bayesian methods are particularly popular in environmental science because they allow the incorporation of expert knowledge and can learn from additional data as they become available [Mantovan and Todini, 2006], an especially desirable property in hydrologic forecasting.
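For reference, Bayes' theorem in its standard form is written out below, with θ the parameter set and Y the observed data (the symbol Y is introduced here for illustration only). The prior p(θ) carries expert knowledge, the likelihood p(Y | θ) carries the information in the data, and the normalizing integral in the denominator is what is rarely tractable for hydrologic models.

```latex
p(\theta \mid Y) = \frac{p(Y \mid \theta)\, p(\theta)}{\int p(Y \mid \theta')\, p(\theta')\, d\theta'} \;\propto\; p(Y \mid \theta)\, p(\theta)
```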

[7] Given the analytically intractable nature of many hydrologic models, implementation of Bayesian methods is usually aided by Markov chain Monte Carlo (MCMC) techniques. MCMC is based on the favorable mathematical properties of Markov chains as they relate to Monte Carlo sampling and distribution estimation [Tierney, 1994]. Markov chains can be used to generate samples from the posterior distribution of the model parameters using a random walk approach [Kuczera and Parent, 1998]. Many MCMC algorithms have been developed with the aim of constructing useful and statistically relevant Markov chains [see, e.g., Brooks, 1998].
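As an illustration of the random walk idea, the following is a minimal random-walk Metropolis sampler in Python (NumPy). It assumes an isotropic Gaussian proposal with a user-chosen standard deviation `step_sd`; the function and parameter names are hypothetical and not taken from any of the cited studies. Because the proposal is symmetric, only the ratio of posterior densities is needed, so the intractable normalizing constant never has to be computed.

```python
import numpy as np

def random_walk_metropolis(log_post, theta0, step_sd, n_iter, seed=0):
    """Sample from a density known only up to a constant.

    log_post: returns the log of the unnormalized posterior density.
    step_sd:  standard deviation of the isotropic Gaussian proposal (a tuning value).
    Returns the chain (n_iter x d) and the acceptance rate.
    """
    rng = np.random.default_rng(seed)
    theta = np.asarray(theta0, dtype=float)
    lp = log_post(theta)
    chain = np.empty((n_iter, theta.size))
    n_accept = 0
    for i in range(n_iter):
        # Random walk step centred at the current state.
        proposal = theta + step_sd * rng.standard_normal(theta.size)
        lp_prop = log_post(proposal)
        # Symmetric proposal: accept with probability min(1, pi(proposal) / pi(theta)).
        if np.log(rng.uniform()) < lp_prop - lp:
            theta, lp = proposal, lp_prop
            n_accept += 1
        chain[i] = theta  # a rejected move repeats the current state
    return chain, n_accept / n_iter
```

Working on the log scale avoids numerical underflow, which matters when a likelihood is a product over a long hydrologic record.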

2.1. Advancements in Markov Chain Monte Carlo Techniques

[8] Recent efforts have yielded several new methods designed to improve the efficiency and effectiveness of MCMC algorithms [e.g., Kou et al., 2006; Laskey and Myers, 2003; Tierney and Mira, 1999]. Three such advancements are compared here by way of two case studies: the adaptive Metropolis (AM) algorithm [Haario et al., 2001], the delayed rejection adaptive Metropolis (DRAM) algorithm [Haario et al., 2006], and the differential evolution Markov chain (DE-MC) algorithm [ter Braak, 2006].

2.2. Adaptive Metropolis Algorithm

[9] The adaptive Metropolis algorithm is a modification of the standard random walk Metropolis algorithm. The key attribute of the AM algorithm is its continuous adaptation toward the target distribution, achieved by calculating the covariance of the proposal distribution from all previous states. In this way, the proposal distribution is updated with the information about the posterior distribution gathered so far. At step i, Haario et al. [2001] consider a multivariate normal proposal with mean given by the current value and covariance matrix Ci. The covariance Ci has a fixed value C0 for the first i0 iterations and is updated subsequently as

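$$C_i = \begin{cases} C_0, & i \le i_0 \\ s_d\,\mathrm{Cov}(\theta_0, \ldots, \theta_{i-1}) + s_d\,\varepsilon\, I_d, & i > i_0 \end{cases} \qquad (1)$$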

where ɛ is a small parameter chosen to ensure that Ci does not become singular, Id denotes the d-dimensional identity matrix, and sd is a scaling parameter that depends on the dimension, d, of the parameter set θ and ensures reasonable acceptance rates of proposed states. As a basic guideline, Haario et al. [2001] suggest choosing sd = (2.4)²/d for a model of dimension d. An initial, arbitrary covariance, C0, must be defined before the proposal covariance can be calculated. The steps involved in implementing the AM algorithm are discussed by Marshall et al. [2004].

[10] Block updating (sampling all parameters concurrently) is used in the AM algorithm, enhancing computational efficiency and reducing run time. While the AM algorithm has many beneficial traits, it can experience difficulties with initialization (sampling an appropriate starting parameter set from a region of high posterior density) and with exploring the parameter space if the posterior distribution of the parameters is strongly non-Gaussian, given that the proposal distribution is a multivariate Gaussian. Although the adaptive Metropolis algorithm is not a true Markov chain because of its adaptive component, Haario et al. [2001] proved that it has the correct ergodic properties.
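The adaptation rule can be sketched in Python (NumPy) as below. This is a minimal illustration that assumes a symmetric Gaussian proposal with block updating, sd = 2.4²/d, and a running (Welford-style) update of the empirical covariance so that it is not recomputed from the whole history at every step; the function and argument names are hypothetical, and the implementation steps described by Marshall et al. [2004] may differ in detail.

```python
import numpy as np

def adaptive_metropolis(log_post, theta0, C0, n_iter, i0=1000, eps=1e-6, seed=0):
    """Adaptive Metropolis sketch: proposal covariance C0 for the first i0 steps,
    then s_d * Cov(all previous states) + s_d * eps * I_d (requires i0 >= 1)."""
    rng = np.random.default_rng(seed)
    theta = np.asarray(theta0, dtype=float)
    d = theta.size
    s_d = 2.4 ** 2 / d                       # scaling guideline of Haario et al. [2001]
    lp = log_post(theta)
    chain = np.empty((n_iter, d))
    # Running count, mean and scatter matrix of every state recorded so far.
    n, mean, scatter = 1, theta.copy(), np.zeros((d, d))
    n_accept = 0
    for i in range(n_iter):
        if i < i0:
            cov = np.asarray(C0, dtype=float)
        else:
            cov = s_d * scatter / (n - 1) + s_d * eps * np.eye(d)
        # Block update: all parameters are proposed jointly.
        proposal = theta + np.linalg.cholesky(cov) @ rng.standard_normal(d)
        lp_prop = log_post(proposal)
        if np.log(rng.uniform()) < lp_prop - lp:   # symmetric proposal
            theta, lp = proposal, lp_prop
            n_accept += 1
        chain[i] = theta
        # Welford update of the running mean and scatter with the new state.
        n += 1
        delta = theta - mean
        mean = mean + delta / n
        scatter = scatter + np.outer(delta, theta - mean)
    return chain, n_accept / n_iter
```

For a strongly correlated target, the adapted proposal takes on the target's orientation, which is the property the AM algorithm relies on.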

2.3. Delayed Rejection Adaptive Metropolis Algorithm

[11] The delayed rejection adaptive Metropolis (DRAM) algorithm is based on the combination of the adaptive Metropolis algorithm and the delayed rejection (DR) algorithm, introduced by Tierney and Mira [1999] and applied to a basic Metropolis-Hastings approach. A detailed explanation of the steps involved in combining the DR and AM algorithms and implementing the DRAM algorithm is provided by Haario et al. [2006].

[12] The DR algorithm is based on the concept that the performance of MCMC methods is improved by decreasing the probability that the algorithm remains at the current state, as proven by Peskun [1973] for finite state spaces and Tierney [1998] for general state spaces. In general terms, the DR algorithm allows a more expansive search than traditional MCMC algorithms, because the first-stage proposal can be given a large variance so that it attempts to sample regions of the response surface far from the current location. If the parameter walk does not move, the variance of the proposal is reduced, and a region of the parameter space closer to the current position is searched. Creating multiple proposal stages in this way reduces the probability of remaining at the current state.

[13] A stage one proposal is generated and accepted or rejected in the same manner as in the Metropolis algorithm; however, if the stage one proposal is rejected, a stage two proposal is drawn from a region closer to the current position. If the stage two proposal is accepted, it becomes the current position. If the stage two proposal is also rejected, the chain remains at the position held before the failed stage one and stage two proposals. Although only two proposal stages are outlined here, this structure allows for as many proposal stages as desired.

[14] In mathematical terms, suppose the current position is x and a proposal to move to a new candidate, y, is generated from q1(x, ·), a multivariate proposal distribution, and accepted with probability

$$\alpha_1(x, y) = \min\left\{1, \frac{\pi(y)\, q_1(y, x)}{\pi(x)\, q_1(x, y)}\right\} \qquad (2)$$

where π(·) denotes the target (posterior) density evaluated at the given point (x, y or z). Upon rejection, instead of remaining at the current state (θi+1 = θi), a second stage proposal, z, is produced. This second stage proposal depends on the current state and on the suggested and failed stage one proposal, q2(x, y, ·); the second stage is accepted with probability

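$$\alpha_2(x, y, z) = \min\left\{1, \frac{\pi(z)\, q_1(z, y)\, q_2(z, y, x)\,[1 - \alpha_1(z, y)]}{\pi(x)\, q_1(x, y)\, q_2(x, y, z)\,[1 - \alpha_1(x, y)]}\right\} \qquad (3)$$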

If the stage two candidate is rejected, either a stage three proposal is invoked or the algorithm remains at the current position.
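For a two-stage version with Gaussian proposals centred at the current state, the acceptance rule can be sketched as follows. The sketch assumes that the stage two covariance is the stage one covariance multiplied by a user-set factor `shrink`, and it omits the adaptive (AM) part of DRAM; all names are hypothetical. Because the stage two Gaussian is centred at x, q2(x, y, z) = q2(z, y, x) and those terms cancel, while the q1 terms do not.

```python
import numpy as np

def _log_q(center, point, cov_inv):
    """Log Gaussian proposal density at `point`, centred at `center` (constant dropped)."""
    r = point - center
    return -0.5 * r @ cov_inv @ r

def dr_step(log_post, x, lp_x, C1, C2, rng):
    """One two-stage delayed-rejection step; returns (state, log_post, model_runs)."""
    # Stage one: ordinary Metropolis proposal with the wider covariance C1.
    y = x + np.linalg.cholesky(C1) @ rng.standard_normal(x.size)
    lp_y = log_post(y)
    a1_xy = np.exp(min(0.0, lp_y - lp_x))          # alpha_1(x, y)
    if rng.uniform() < a1_xy:
        return y, lp_y, 1
    # Stage two: narrower proposal, still centred at the current state x.
    z = x + np.linalg.cholesky(C2) @ rng.standard_normal(x.size)
    lp_z = log_post(z)
    a1_zy = np.exp(min(0.0, lp_y - lp_z))          # alpha_1(z, y)
    if a1_zy >= 1.0:
        return x, lp_x, 2                          # numerator of alpha_2 is zero
    C1_inv = np.linalg.inv(C1)
    log_num = lp_z + _log_q(z, y, C1_inv) + np.log(1.0 - a1_zy)
    log_den = lp_x + _log_q(x, y, C1_inv) + np.log(1.0 - a1_xy)
    if np.log(rng.uniform()) < log_num - log_den:  # accept with probability alpha_2
        return z, lp_z, 2
    return x, lp_x, 2

def delayed_rejection_metropolis(log_post, theta0, C1, shrink, n_iter, seed=0):
    """Non-adaptive two-stage DR sampler; returns (chain, total model runs)."""
    rng = np.random.default_rng(seed)
    x = np.asarray(theta0, dtype=float)
    lp_x = log_post(x)
    C1 = np.asarray(C1, dtype=float)
    C2 = shrink * C1                               # assumed stage-two covariance
    chain = np.empty((n_iter, x.size))
    runs = 0
    for i in range(n_iter):
        x, lp_x, k = dr_step(log_post, x, lp_x, C1, C2, rng)
        runs += k
        chain[i] = x
    return chain, runs
```

The returned run count makes the extra cost of the second stage explicit: every rejected stage one proposal costs one additional model evaluation.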

[15] The relationship between the AM and DRAM algorithms may be illustrated with a simple diagram. Figure 1 shows a hypothetical two-dimensional parameter space, with the model's log-likelihood function on the vertical axis; the algorithm is initialized at the parameter set indicated by point X. The figure illustrates the ability of the DRAM algorithm to search a larger portion of the parameter space in its stage one proposal (red ellipse) than the AM (orange ellipse). If the stage one proposal of the DRAM fails, its search of the parameter space is reduced by a user-set factor. Although the AM is a subalgorithm of the DRAM, the DRAM's computational efficiency is reduced because it may perform multiple model executions in a single algorithm iteration. The AM algorithm therefore retains an advantage in computational cost per iteration over the DRAM algorithm.

Figure 1.

Hypothetical parameter surface illustrating the ability of the AM and DRAM algorithms to explore the parameter surface. Rings represent distance from the current location an algorithm can explore.

[16] The DRAM algorithm benefits from combining two concepts, each of which alone has improved on standard MCMC approaches. Namely, DRAM adapts continuously toward the target distribution, constantly updating the covariance of the proposal distribution using all previous states, and it searches the parameter space more efficiently by reducing the probability that the algorithm remains at the current state. Haario et al. [2006] point out that, from the DR point of view, the first stage is aided by the adaptiveness of the AM component, while, from the AM point of view, the DR component offsets the initialization difficulties often associated with the AM component by searching for sets far from the current position in stage one proposals and sets close to the current position in stage two proposals. The AM component can be seen as responsible for “global” adaptation, while the DR component is responsible for “local” adaptation within the DRAM algorithm.

2.4. Differential Evolution Markov Chain Algorithm

[17] The differential evolution Markov chain algorithm is formed by combining the differential evolution algorithm of Storn and Price [1997], designed for global optimization in real parameter spaces, with MCMC simulation based on standard Metropolis principles. The result is a population MCMC algorithm, in which multiple chains are run in parallel and allowed to learn from each other.

[18] This combination is intended to overcome the difficulty, common to MCMC methods, of choosing an appropriate scale and orientation for the proposal distribution, while also addressing the issue of computational efficiency related to the time needed to reach convergence [ter Braak, 2006]. Only two parameters must be defined by the user: a scaling factor, γ, and the number of parallel chains, N.

[19] ter Braak [2006] provides the basic algorithmic setup for DE-MC. N chains are run in parallel, and the jumps for the current chain are derived from the remaining N − 1 chains. The DE-MC algorithm generates a proposal by taking the difference between the current states of two randomly selected chains, multiplying it by a scaling factor, and adding it to the current state of the chain being updated,

$$\theta_p = \theta_i + \gamma\,(\theta_{R1} - \theta_{R2}) + \varepsilon \qquad (4)$$

where θp is the proposed parameter set, θi is the current parameter set, θR1 and θR2 are randomly selected parameter sets from the population excluding θi, and γ is the scaling factor. The final term, ɛ ∼ N(0, b)^d with b small, is a small random perturbation that keeps proposals from being restricted to the differences present in the current population, so that the whole parameter space can be reached [ter Braak, 2006]. The proposals are then accepted or rejected on the basis of the Metropolis ratio, defined in (2) previously.
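A minimal sketch of one DE-MC run in Python (NumPy) follows, with the chains updated in turn. The function name `de_mc`, the default γ = 2.38/√(2d), and treating b as the standard deviation of ɛ are assumptions of this sketch. Because the difference-vector proposal is symmetric, the Metropolis ratio reduces to π(θp)/π(θi).

```python
import numpy as np

def de_mc(log_post, init_pop, n_gen, gamma=None, b=1e-4, seed=0):
    """Differential evolution Markov chain sketch; returns states of shape (n_gen, N, d)."""
    rng = np.random.default_rng(seed)
    pop = np.array(init_pop, dtype=float)          # one row per parallel chain
    N, d = pop.shape
    if gamma is None:
        gamma = 2.38 / np.sqrt(2 * d)              # value suggested by ter Braak [2006]
    lp = np.array([log_post(p) for p in pop])
    history = np.empty((n_gen, N, d))
    for g in range(n_gen):
        for i in range(N):
            # Two distinct chains, both different from chain i.
            others = [j for j in range(N) if j != i]
            r1, r2 = rng.choice(others, size=2, replace=False)
            # Scaled difference vector plus a small perturbation (b used as its sd).
            prop = pop[i] + gamma * (pop[r1] - pop[r2]) + rng.normal(0.0, b, size=d)
            lp_prop = log_post(prop)
            # Symmetric proposal, so the Metropolis ratio is pi(prop) / pi(current).
            if np.log(rng.uniform()) < lp_prop - lp[i]:
                pop[i], lp[i] = prop, lp_prop
        history[g] = pop
    return history
```

Each generation costs N model evaluations, one per chain, which is the source of the DE-MC's higher computational cost discussed in the case studies.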

[20] The DE-MC algorithm is simple to implement and useful in practice. Because its multiple chains are asymptotically independent, convergence can be tested with the Gelman and Rubin [1992] R statistic from a single DE-MC run.

[21] The potential benefits of these newly introduced MCMC approaches for hydrologic studies are of great interest. The ability to search the parameter space more accurately and efficiently offers potentially lasting gains for conceptual precipitation-runoff modeling. Multimodality is of great concern in hydrologic modeling, and the ability of each algorithm to handle such complexities should be considered. The AM and DRAM algorithms add the same adaptive component to the traditional Metropolis formulation, with the DRAM adding the further component of delayed rejection. The DE-MC algorithm combines an evolutionary optimization algorithm (differential evolution) with a traditional Metropolis approach. Each algorithm's efficiency depends on the specific components included in its formulation; the tradeoff between computational effort and efficiency in characterizing the true posterior distribution is considered for each of the three algorithms.

3. Case Studies

[22] In this section, we present two case studies to illustrate the differences in performance between the AM, DRAM and DE-MC algorithms. First, a synthetic example is constructed on the basis of a mixture of two multivariate normal distributions, generating a bimodal response surface. The second example illustrates the capabilities of the algorithms applied to a hydrologic setting, featuring a simple conceptual precipitation-runoff model with data from the Tenderfoot Creek Experimental Forest, Montana. In each example, the three algorithms will be compared in their ability to explore the parameter space and converge to the posterior, with additional consideration given to computational efficiency.

3.1. Case Study 1: Synthetic Bimodal Gaussian Mixture

[23] To provide a controlled setting in which to compare the algorithms, and because multimodal response surfaces are common in hydrologic modeling, a mixture of two multivariate Gaussian distributions was defined to emulate a parameter surface with multiple modes. The likelihood of this mixture of normals is proportional to

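$$L(\theta) \propto \sum_{k=1}^{2} \exp\left[-\tfrac{1}{2}(\theta - \mu_k)^{\mathsf T}\,\Sigma_k^{-1}\,(\theta - \mu_k)\right] \qquad (5)$$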

where μ represents the mean of the given mode, Σ is the covariance of the given mode and θ is the parameter set. For this study, we set the model dimension at five parameters.
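The bimodal target can be coded as follows, assuming the mixture is an equally weighted sum of two unnormalized Gaussian kernels. The mode locations (±3 in every coordinate) and identity covariances are hypothetical, since the values used in the study are not reported. The resulting `log_mixture` function could be passed as the log-posterior to any of the samplers discussed in section 2.

```python
import numpy as np

# Hypothetical five-dimensional mixture; the study's means and covariances are not given.
D = 5
MEANS = [np.full(D, -3.0), np.full(D, 3.0)]
COVS = [np.eye(D), np.eye(D)]
PRECS = [np.linalg.inv(c) for c in COVS]

def log_mixture(theta):
    """Log of an equally weighted sum of two unnormalized Gaussian kernels."""
    theta = np.asarray(theta, dtype=float)
    terms = [-0.5 * (theta - m) @ P @ (theta - m) for m, P in zip(MEANS, PRECS)]
    # logaddexp keeps the sum stable when theta is far from both modes.
    return np.logaddexp.reduce(terms)
```

With the modes this far apart, a sampler with a single Gaussian proposal tends to stay near whichever mode it finds first, which is the behavior the case study probes.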

[24] The AM, DRAM and DE-MC algorithms were applied to the defined model. Each algorithm was run for two fixed numbers of iterations, 10,000 and 50,000, to assess its parameter space exploration and computational efficiency under fixed iteration budgets of different sizes. The scaling factors for the DRAM algorithm were set so that the stage one proposal was approximately 1.35 times larger than the AM algorithm proposal and the stage two proposal was one hundredth as wide as the stage one proposal. The DE-MC was initialized with γ set to the value suggested by ter Braak [2006], 2.38/√(2d), and N set to 10 (twice the number of parameters in the study). Although it is suggested that at least ten times as many parallel chains as model dimensions be used for multimodal response surfaces with the DE-MC algorithm, this requirement was found to be prohibitive even for relatively modest numbers of iterations (50,000 iterations with 50 chains would require 2.5 million model evaluations).
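The arithmetic behind these settings can be checked directly. The sketch below assumes only what the paragraph states: d = 5, γ = 2.38/√(2d), and one model evaluation per chain per DE-MC iteration; the helper name is hypothetical.

```python
import math

d = 5                                   # dimension of the synthetic mixture
gamma = 2.38 / math.sqrt(2 * d)         # DE-MC jump factor suggested by ter Braak [2006]
print(f"gamma = {gamma:.3f}")           # prints "gamma = 0.753"

def de_mc_evaluations(n_chains, n_iterations):
    """Model runs for a DE-MC run: one evaluation per chain per iteration."""
    return n_chains * n_iterations

print(de_mc_evaluations(2 * d, 10_000))   # N = 2d = 10 chains, as used here: 100000
print(de_mc_evaluations(10 * d, 50_000))  # N = 10d = 50 chains: 2500000
```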

[25] Figure 2 shows the sampled parameter values for each algorithm run for 50,000 iterations, along with the true distribution of the Gaussian mixture. The results for each algorithm were similar at both iteration levels (10,000 and 50,000 iterations): the AM algorithm identified only one mode, while the DRAM and DE-MC algorithms sampled from both modes. In all cases, even 50,000 iterations were insufficient to characterize the response surface completely. Each algorithm maintained acceptance rates within the range of approximately 20–45% recommended for optimal algorithm efficiency [e.g., Gelman et al., 2004], and all algorithms maintained similar acceptance rates at both iteration levels.

Figure 2.

Comparison of parameter estimation between the AM, DRAM, and DE-MC algorithms for 50,000 iterations and one parameter for a five-dimensional Gaussian mixture. Sampled values shown in blue were sampled in 5-D space and were projected into 1-D space. The exploration of the parameter surface is weakest in the AM algorithm, with only one mode sampled, while the DRAM and DE-MC algorithms manage to sample each mode.

[26] From this simple case study, it is clear that both the DE-MC and DRAM algorithms traverse the parameter space of complex, multimodal problems more readily than the AM algorithm. Not surprisingly, the DRAM algorithm improved upon the adaptive Metropolis algorithm because its DR component increases the probability of moving from the current state. The DE-MC algorithm also proved more efficient than the AM algorithm (and perhaps the DRAM algorithm as well), but it benefits from having multiple parallel chains that learn from each other. In this case, ten parallel chains were used, giving the DE-MC algorithm ten times the number of sample draws. To investigate this further, both the AM and DRAM algorithms were set up to perform the same total number of sample draws as the DE-MC performed at the 10,000 iteration level (100,000 for this study). Under this setup, the AM was able to characterize both modes, and the DRAM's characterization of the entire distribution was more complete than at 10,000 and 50,000 iterations.

[27] This simple mixture of multivariate normal distributions provides a relatively objective testing ground for the three algorithms, because the end result is known a priori. The results of this case study reveal the ability of the DRAM and DE-MC algorithms to explore the parameter space in a wider-reaching, more efficient manner than the AM algorithm. Although the differential evolution Markov chain algorithm potentially characterized the multimodal distribution better than the DRAM, it relies on interaction among multiple parallel chains, unlike the AM and DRAM algorithms, and hence incurs a greater computational cost. The increased search efficiency of the DRAM and DE-MC algorithms is particularly important for hydrologic modeling because of the tremendous complexity of the systems being modeled.

3.2. Case Study 2: Tenderfoot Creek Experimental Forest, Montana

[28] This second case study applies the AM, DRAM and DE-MC algorithms to a hydrologic model for an experimental watershed. The following subsections discuss the site, the conceptual precipitation-runoff model, the input data and the outcomes associated with each algorithm.

3.2.1. Site Description

[29] The Tenderfoot Creek Experimental Forest (TCEF) is located at the headwaters of Tenderfoot Creek in the Little Belt Mountains of the Lewis and Clark National Forest in Montana, USA. TCEF was established in 1961, encompasses an area of nearly 3,700 ha, and is representative of the vast expanses of lodgepole pine (Pinus contorta) found east of the continental divide.

[30] TCEF consists of seven distinct watersheds with differing vegetation, topographic characteristics and silvicultural treatments. The subwatershed studied in this project was the 555 ha Stringer Creek Watershed. For a more detailed description of TCEF, see K. G. Jencso et al. (Hydrologic connectivity between landscapes and streams: Transferring reach and plot scale understanding to the catchment scale, submitted to Water Resources Research, 2008).

3.2.2. Precipitation-Runoff Model Description

[31] This study implemented a model based on the probability distributed model (PDM, Figure 3), first developed by Moore [1985]. The PDM is a conceptual rainfall-runoff model that seeks to balance model structural parsimony with watershed physical complexities. As a conceptual model, the PDM is concerned only with “the frequency of occurrence of hydrological variables of certain magnitudes over the basin without regard to the location of a particular occurrence within the basin” [Moore, 1985, p. 274].

Figure 3.

The probability distributed model with snowmelt as an additional precipitation input.

[32] Soil absorption capacity, which varies spatially across the watershed, controls the runoff produced by the model. Water in excess of the soil capacity is routed to the surface storage component, while infiltrated water eventually enters the subsurface storage component; the sum of the outflows from the two storage components is the total outflow of the watershed.
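The runoff-generation step can be sketched as follows, assuming the Pareto distribution of store capacities used in Moore's [1985] original PDM, F(c) = 1 − (1 − c/cmax)^b, where cmax is the maximum soil storage capacity and b the spatial-variability parameter. The text does not state which capacity distribution this study's model uses, and the drainage and routing steps are omitted; the function name `pdm_direct_runoff` is hypothetical.

```python
def pdm_direct_runoff(c_star, net_rain, cmax, b):
    """Direct runoff for one time step from a Pareto store-capacity distribution.

    c_star is the critical capacity: all stores with capacity below c_star are full.
    Returns (runoff, new_c_star). Evaporation, drainage and routing are ignored.
    """
    def storage(c):
        # Total water held by the population of stores when the critical capacity is c.
        return cmax / (b + 1.0) * (1.0 - (1.0 - c / cmax) ** (b + 1.0))

    new_c_star = min(c_star + net_rain, cmax)
    # Rain that does not fit into the (partly full) stores becomes direct runoff.
    runoff = net_rain - (storage(new_c_star) - storage(c_star))
    return runoff, new_c_star
```

With b = 0 every store has capacity cmax, so no runoff is produced until the soil is full; larger b spreads capacities and produces runoff earlier.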

[33] The model was supplemented with a snowmelt routine to account for the dominant form of precipitation common to the study area. In this case, a relatively simple combined temperature index and radiation index approach was used. Previous studies [e.g., Brubaker et al., 1996; Kustas et al., 1994] have shown that incorporating radiation into the temperature index approach can greatly improve modeled results, while also improving the physical reality of the model.

[34] The model then comprises nine effective parameters: maximum soil storage capacity (cmax), spatial variability within the watershed (b), rate of drainage into subsurface storage (kb), fraction of subsurface storage released to outflow (Tres1), fraction of surface storage released to outflow (Tres2), soil storage threshold for subsurface inflow (cf), threshold temperature for snowmelt (Tf), degree day factor for snowmelt (DDF) and net radiation factor for snowmelt (NRF).
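The exact melt equation is not given in the text, so the following is only a hypothetical sketch of a combined temperature-index and radiation-index rule using the parameter names Tf, DDF and NRF: melt = DDF(T − Tf) + NRF·Rn when the air temperature T exceeds Tf, capped by the snow water equivalent on the ground. Both the functional form and the cap are assumptions.

```python
def snowmelt(temp, net_rad, swe, Tf, DDF, NRF):
    """Hypothetical combined temperature-index / radiation-index melt for one time step.

    temp: air temperature, net_rad: net radiation, swe: available snow water equivalent.
    Units follow the inputs; the functional form is an assumption of this sketch.
    """
    if temp <= Tf or swe <= 0.0:
        return 0.0                      # no melt below the threshold temperature
    potential = DDF * (temp - Tf) + NRF * net_rad
    # Melt can be neither negative nor larger than the available snowpack.
    return min(max(potential, 0.0), swe)
```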

3.2.3. Input Data

[35] The required model inputs include precipitation (rain, snow), evapotranspiration, temperature and net radiation. Data required for model operation were obtained from the Onion Park SNOTEL site located within TCEF, which has sensors recording snowpack depth, snow water equivalent, accumulated precipitation (snow and rain), temperature, radiation and other variables. To better characterize the diurnal dynamics of the system, a 12-hour time step was selected, and data were retrieved from the NRCS National Weather and Climate Center, which manages data collection for all SNOTEL sites. All streamflow data were obtained from the network of stream gauges within TCEF managed by the Rocky Mountain Research Station of the U.S. Forest Service.

3.2.4. Algorithm Evaluations

[36] For this case study, the model and watershed data were kept the same for all analyses, and each of the three MCMC algorithms (AM, DRAM, DE-MC) was implemented. Each algorithm's internal parameters (scaling factors, number of chains, etc.) were selected to maintain appropriate acceptance rates. The stage one jump space of the DRAM algorithm was approximately 1.67 times as wide as that of the AM algorithm, and the stage two jump space of the DRAM algorithm was one hundredth as wide as its stage one proposal. The DE-MC algorithm was initialized with its jumping factor (γ) set to the suggested value of 2.38/√(2d). As in the synthetic case study, the number of parallel chains was set to twice the model dimension because of computational constraints. All algorithms maintained appropriate acceptance rates [see Gelman et al., 2004] in all cases. The algorithms were compared in their ability to converge to the posterior density and their effectiveness in searching the posterior parameter space.

[37] To diagnose the necessary number of iterations to be performed for the algorithm to obtain convergence, multiple runs of the algorithms were conducted and the between- and within-chain variance was estimated, using the method described by Gelman and Rubin [1992]. Two hundred thousand iterations (for each parallel run) were chosen for this study to ensure that the estimated potential scale reduction (√R) for each parameter was less than 1.2.
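A minimal sketch of the potential scale reduction for a single parameter is given below, using the basic form given by Gelman et al. [2004], without the degrees-of-freedom correction of Gelman and Rubin [1992]. Here `chains` holds m parallel runs of n retained iterations each, and the function name is hypothetical.

```python
import numpy as np

def potential_scale_reduction(chains):
    """Estimated potential scale reduction (sqrt of R) for one parameter; chains: (m, n)."""
    chains = np.asarray(chains, dtype=float)
    m, n = chains.shape
    chain_means = chains.mean(axis=1)
    B = n * chain_means.var(ddof=1)              # between-chain variance
    W = chains.var(axis=1, ddof=1).mean()        # mean within-chain variance
    var_hat = (n - 1) / n * W + B / n            # pooled estimate of the posterior variance
    return np.sqrt(var_hat / W)
```

Values close to 1 indicate that the parallel runs agree; a threshold such as 1.2 would be applied to each parameter separately.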

[38] Two situations were considered in this case study: the performance of each algorithm when started from points of known high posterior density, and its performance when started from points of known low posterior density, such as a local optimum. In each case, the points of known posterior density were determined from tuning runs, and a series of multiple parallel runs was performed using this pilot information.

[39] Figure 4 shows the trace of the log-likelihood value for each algorithm when started from a point of low posterior density. After approximately 195,000 iterations, the DRAM samples a region of higher posterior density, while the AM and DE-MC algorithms falsely converge to a local optimum for all sampled values. Although the AM algorithm is more likely than the DRAM to converge to local optima (because of its single-stage proposal), the convergence of the DE-MC algorithm to a local optimum is somewhat surprising given the interaction among its multiple chains. On the basis of the convergence criterion of Gelman and Rubin [1992], the results for both the DE-MC and AM algorithms indicate convergence for all ten model parameters. The DRAM algorithm, on the other hand, has not converged according to this criterion, as can be clearly seen in Figure 4, where the likelihood trace increases sharply near 190,000 iterations. Because the DRAM algorithm locates a region of higher posterior density late in the run, the between-chain variance is no longer small relative to the within-chain variance, and convergence is not confirmed.

Figure 4.

Comparison of AM, DRAM, and DE-MC parameter searches, started from a point of known low posterior density. The AM and DE-MC falsely converge to a local optimum, while the DRAM locates a region of higher posterior density.

[40] In the other scenario of interest, starting from a point of known high posterior density, the convergence criterion suggests that all three algorithms have converged (on the basis of ten parallel runs for AM and DRAM and 20 for DE-MC). Although each algorithm converges to the same maximum likelihood value, the DRAM algorithm appears to sample the tails of the parameter distributions better. Table 1 shows, for each algorithm, the 2.5 and 97.5 percentiles of each parameter and the parameter value giving the maximum likelihood. The DRAM algorithm samples a wider range of values for each parameter than the adaptive Metropolis algorithm and even the DE-MC algorithm. However, this improved sampling coverage comes at a higher computational cost.

Table 1. The 2.5 and 97.5 Percentile Sampled Parameter Values for the AM, DRAM, and DE-MC Algorithms and Values Corresponding to the Maximum Likelihood(a)
Parameter | AM 2.5% | AM Maximum Likelihood Value | AM 97.5% | DRAM 2.5% | DRAM Maximum Likelihood Value | DRAM 97.5% | DE-MC 2.5% | DE-MC Maximum Likelihood Value | DE-MC 97.5%

(a) Parameter values start from a point of known high posterior density. For each parameter, the DRAM samples a wider range of values, indicating its ability to better characterize the tails of the distribution.


[41] In considering the efficiency of the algorithms evaluated in this study, the number of model runs performed by each is of significant concern. In all cases, the adaptive Metropolis algorithm performed as many model evaluations as algorithm iterations: 200,000 evaluations for each run of the algorithm. The differential evolution Markov chain algorithm, because it requires multiple parallel chains, performed 4 million model evaluations, 20 (the value of the algorithm parameter N) times as many evaluations as algorithm iterations. The number of model evaluations performed by the DRAM algorithm varies from run to run because of the multiple proposal stages of the delayed rejection step. When started from points of known high posterior density, the DRAM algorithm performed approximately 360,000 model evaluations for each run; when started from points of known low posterior density, it performed approximately 350,000. The delayed rejection step of the DRAM algorithm is responsible both for the algorithm's increased computational cost and for its enhanced efficiency in characterizing the true posterior distribution.
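Under the assumption that each DRAM iteration costs one model run plus exactly one more whenever the stage one proposal is rejected (two stages, no third), the reported evaluation counts imply the fraction of iterations that needed a second stage. This is a back-of-the-envelope reading of the numbers above, not a figure reported by the study, and the helper name is hypothetical.

```python
def implied_second_stage_fraction(model_runs, iterations):
    """Fraction of DRAM iterations that needed a stage two proposal.

    Assumes two stages only: one model run per iteration, plus one more
    whenever the stage one proposal is rejected.
    """
    return model_runs / iterations - 1.0

iterations = 200_000
for label, runs in [("high-density start", 360_000), ("low-density start", 350_000)]:
    print(label, round(implied_second_stage_fraction(runs, iterations), 2))
# prints "high-density start 0.8" and "low-density start 0.75"
```

On this reading, roughly three quarters or more of the stage one proposals were rejected, which is consistent with a deliberately wide stage one proposal.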

4. Conclusions

[42] The implementation and use of Markov chain Monte Carlo algorithms in hydrologic modeling is becoming increasingly common, especially as computing power becomes less of a limiting factor. Although traditional MCMC algorithms are not uncommon in hydrology [Bates and Campbell, 2001; Kuczera and Parent, 1998; Qian et al., 2003; Renard et al., 2006], few studies have taken advantage of recent advances in such algorithms.

[43] Our case studies show that the delayed rejection adaptive Metropolis algorithm can improve modeling results where the posterior density is low and the parameter surface is extremely complex, with many local optima. In a synthetic example featuring a bimodal parameter distribution, the DRAM algorithm outperformed the adaptive Metropolis algorithm in its ability to explore the parameter surface with a limited number of iterations. The DE-MC algorithm proved best at defining the parameter surface in this case; however, it benefited largely from its tenfold advantage in sample evaluations.

[44] A second case study illustrated the utility of the DRAM algorithm when applied to real data and a hydrologic model. When the modeling began from a point of low posterior density, the DRAM algorithm was better able to traverse the complex parameter surface than either the differential evolution Markov chain or the adaptive Metropolis algorithm. While the AM and DE-MC algorithms falsely converged to a local optimum, the DRAM algorithm discovered a region of higher posterior density, providing the best set of parameter values on the basis of the calibration data from Tenderfoot Creek Experimental Forest, Montana. In addition to finding a better likelihood value, the DRAM algorithm searched the tails of the parameter distributions more efficiently and more extensively.

[45] This case study has illustrated the potential consequences of starting hydrologic model calibration from a point of low posterior density. While the DE-MC and AM algorithms became trapped in a local optimum, the DRAM algorithm, benefiting from a more robust sampling of the tails of the parameter distributions, avoided convergence to areas of local optima.

[46] As with the first case study, the benefits of increasing the probability of moving from the current state can be clearly seen in Figure 4, with the DRAM locating a point of higher posterior density. Again, this is the expected result, especially for complex situations where points of high posterior density are unknown at the onset of the study. While the AM algorithm is more likely to falsely converge to a local optimum (refer to Figure 1), the delayed rejection component of the DRAM algorithm reduces the probability of such an outcome.
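The increased probability of moving can be made explicit. In the standard two-stage form of delayed rejection, written here with notation chosen for this illustration (π is the posterior density, x the current state, y_1 and y_2 the first- and second-stage candidates, and q_1 and q_2 the corresponding proposal densities), the candidates are accepted with the probabilities below. The second stage is attempted only after a first-stage rejection, so for given candidates the probability of leaving x never falls below the plain Metropolis value α_1.

```latex
\begin{aligned}
\alpha_1(x, y_1) &= \min\left\{1,\ \frac{\pi(y_1)\, q_1(y_1, x)}{\pi(x)\, q_1(x, y_1)}\right\} \\
\alpha_2(x, y_1, y_2) &= \min\left\{1,\ \frac{\pi(y_2)\, q_1(y_2, y_1)\, q_2(y_2, y_1, x)\,\bigl[1 - \alpha_1(y_2, y_1)\bigr]}
  {\pi(x)\, q_1(x, y_1)\, q_2(x, y_1, y_2)\,\bigl[1 - \alpha_1(x, y_1)\bigr]}\right\} \\
P(\text{move} \mid y_1, y_2) &= \alpha_1(x, y_1) + \bigl[1 - \alpha_1(x, y_1)\bigr]\,\alpha_2(x, y_1, y_2) \;\ge\; \alpha_1(x, y_1)
\end{aligned}
```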

[47] The DE-MC algorithm potentially faces tremendous computational demand for multimodal, high-dimensional problems that require large numbers of iterations. The guideline suggested by ter Braak [2006], to use at least ten times as many chains as there are model dimensions, severely limits the range of problems that can be solved in a reasonable time using DE-MC. Although the delayed rejection step makes the DRAM algorithm more demanding than the AM algorithm, both to implement and to execute, an algorithm that greatly reduces the risk of false convergence to local optima is fundamental to the application of hydrologic models for predictive purposes, and its value should not be underestimated.
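The cost structure behind this guideline is visible in the DE-MC update itself. The Python sketch below uses the difference-vector proposal x_p = x_i + γ(x_r1 − x_r2) + e described by ter Braak [2006] and updates the N chains one at a time. The jump scale γ = 2.38/√(2d), the noise scale, the function name, and the standard-normal stand-in posterior are assumptions made for illustration, not the configuration used in this study.

```python
import numpy as np


def demc_generation(chains, log_post_vals, log_post, rng, noise_sd=1e-4):
    """One DE-MC generation (sketch): the N chains are updated in turn,
    so each generation costs N posterior (model) evaluations."""
    chains = chains.copy()
    log_post_vals = log_post_vals.copy()
    n, d = chains.shape
    gamma = 2.38 / np.sqrt(2 * d)          # commonly cited default jump scale
    for i in range(n):
        # Two distinct partner chains, both different from chain i.
        r1, r2 = rng.choice(np.delete(np.arange(n), i), size=2, replace=False)
        proposal = (chains[i] + gamma * (chains[r1] - chains[r2])
                    + rng.normal(0.0, noise_sd, size=d))  # small noise term e
        lp = log_post(proposal)             # one model evaluation
        # The difference proposal is symmetric, so a Metropolis ratio suffices.
        if np.log(rng.uniform()) < lp - log_post_vals[i]:
            chains[i] = proposal
            log_post_vals[i] = lp
    return chains, log_post_vals


n_calls = 0


def log_post(theta):
    # Hypothetical stand-in posterior (standard normal); counts calls as model runs.
    global n_calls
    n_calls += 1
    return -0.5 * float(theta @ theta)


rng = np.random.default_rng(1)
d, n = 2, 20                                # 20 chains, as with N = 20 in [41]
chains = rng.normal(size=(n, d))
lp = np.array([log_post(c) for c in chains])
n_calls = 0
for _ in range(500):
    chains, lp = demc_generation(chains, lp, log_post, rng)
print(n_calls)   # 500 generations x 20 chains = 10000
```

Every generation calls the posterior, and hence the hydrologic model, once per chain. Under the guideline as quoted above, a five-parameter problem would therefore need at least 50 model runs per generation.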

[48] The success of the DRAM algorithm rests on combining the adaptive nature of the AM algorithm with the delayed rejection step of the DR algorithm: each component alleviates the deficiencies of the other, while the combination maintains the statistical properties fundamental to MCMC.
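A compact sketch of how the two components fit together is given below. It is not the implementation used in this study. It assumes Gaussian random-walk proposals, two delayed rejection stages, a second-stage covariance shrunk by a hypothetical factor dr_scale, and AM covariance adaptation with the commonly used scaling s_d = 2.4²/d. The function and argument names, the tuning values, and the bimodal stand-in target are illustrative only.

```python
import numpy as np


def dram(log_post, x0, n_iter, rng, init_cov, adapt_start=500,
         dr_scale=0.1, eps=1e-8):
    """Two-stage DRAM sketch. Hypothetical signature; adapt_start (>= 2),
    dr_scale and eps are illustrative tuning values, not the study's."""
    x = np.array(x0, dtype=float)
    d = x.size
    s_d = 2.4 ** 2 / d                 # common AM scaling factor
    cov = np.array(init_cov, dtype=float)
    lp_x = log_post(x)
    n_evals = 1
    chain = np.empty((n_iter, d))
    mean = np.zeros(d)                 # running mean of the chain
    scatter = np.zeros((d, d))         # running sum of centred outer products

    def log_q1(v, chol):
        # Log Gaussian proposal density of a jump v, up to a constant.
        z = np.linalg.solve(chol, v)
        return -0.5 * float(z @ z)

    for t in range(n_iter):
        chol = np.linalg.cholesky(cov)
        # Stage 1: random-walk Metropolis proposal from the adapted covariance.
        y1 = x + chol @ rng.standard_normal(d)
        lp_y1 = log_post(y1)
        n_evals += 1
        if np.log(rng.uniform()) < min(0.0, lp_y1 - lp_x):
            x, lp_x = y1, lp_y1
        else:
            # Stage 2 (delayed rejection): a narrower proposal from x.
            y2 = x + np.sqrt(dr_scale) * (chol @ rng.standard_normal(d))
            lp_y2 = log_post(y2)
            n_evals += 1
            a1_rev = np.exp(min(0.0, lp_y1 - lp_y2))  # alpha_1(y2, y1)
            if a1_rev < 1.0:
                a1_fwd = np.exp(min(0.0, lp_y1 - lp_x))  # alpha_1(x, y1)
                # Symmetric stage-2 proposal: its density terms cancel.
                log_num = lp_y2 + log_q1(y1 - y2, chol) + np.log1p(-a1_rev)
                log_den = lp_x + log_q1(y1 - x, chol) + np.log1p(-a1_fwd)
                if np.log(rng.uniform()) < min(0.0, log_num - log_den):
                    x, lp_x = y2, lp_y2
        chain[t] = x
        # Adaptive Metropolis: Welford update of the chain mean/covariance.
        delta = x - mean
        mean += delta / (t + 1)
        scatter += np.outer(delta, x - mean)
        if t + 1 >= adapt_start:
            cov = s_d * (scatter / t + eps * np.eye(d))
    return chain, n_evals


def log_mixture(theta):
    # Hypothetical bimodal target: equal mixture of N(-3, I) and N(3, I).
    a = -0.5 * np.sum((theta + 3.0) ** 2)
    b = -0.5 * np.sum((theta - 3.0) ** 2)
    return float(np.logaddexp(a, b))


rng = np.random.default_rng(0)
chain, n_evals = dram(log_mixture, [-3.0, -3.0], 5000, rng, np.eye(2))
print(chain.shape, n_evals)   # n_evals lies between 5001 and 10001
```

Because the second-stage proposal here is centred on the current state and is symmetric, its density terms cancel in the acceptance ratio. Only the posterior values, the first-stage proposal densities, and the 1 − α_1 terms remain to be computed. Each rejected first-stage proposal costs one extra model run, which is the source of the added cost noted in [41].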


[49] This work was made possible by partial funding from the Montana Water Center. We thank the U.S. Forest Service Rocky Mountain Research Station and NRCS National Weather and Climate Center for providing the data necessary to perform this study. Finally, we thank the three anonymous reviewers for their comments that greatly improved the quality of the manuscript.