Bayesian analysis has become vital to uncertainty quantification in groundwater modeling, but its application has been hindered by the computational cost of the numerous model executions required to explore the posterior probability density function (PPDF) of model parameters. This is particularly the case when the PPDF is estimated using Markov Chain Monte Carlo (MCMC) sampling. In this study, a new approach is developed to improve the computational efficiency of Bayesian inference by constructing a surrogate of the PPDF, using an adaptive sparse-grid high-order stochastic collocation (aSG-hSC) method. Unlike previous works using a first-order hierarchical basis, this paper utilizes a compactly supported higher-order hierarchical basis to construct the surrogate system, resulting in a significant reduction in the number of required model executions. In addition, using the hierarchical surplus as an error indicator allows locally adaptive refinement of sparse grids in the parameter space, which further improves computational efficiency. To efficiently build the surrogate system for a PPDF with multiple significant modes, optimization techniques are used to identify the modes, for which high-probability regions are defined and components of the aSG-hSC approximation are constructed. After the surrogate is determined, the PPDF can be evaluated by sampling the surrogate system directly without model execution, making the surrogate-based MCMC more efficient than conventional MCMC. The developed method is evaluated using two synthetic groundwater reactive transport models. The first example involves coupled linear reactions and demonstrates the accuracy of our high-order hierarchical basis approach in approximating a high-dimensional posterior distribution.
The second example is highly nonlinear because of the reactions of uranium surface complexation, and demonstrates how the iterative aSG-hSC method is able to capture multimodal and non-Gaussian features of the PPDF caused by model nonlinearity. Both examples show that aSG-hSC is an effective and efficient tool for Bayesian inference.
 Groundwater models are vital tools for predicting the effects of future anthropogenic and/or natural occurrences in the subsurface environment. Model predictions are inherently uncertain due to epistemic and aleatory uncertainties in data and in model parameters and structures; uncertainty quantification in groundwater modeling is therefore indispensable, and many methods of uncertainty quantification have been developed to facilitate science-informed decision making in water resource management (see the recent review articles of Matott et al. and Tartakovsky, and references therein). While this study addresses only the quantification of parametric uncertainty, its results can be used directly for quantification of model uncertainty, because quantifying parametric uncertainty is the basis of quantifying model structure uncertainty in the popular multimodel analysis methods [Neuman, 2003; Ye et al., 2004; Poeter and Hill, 2007; Refsgaard et al., 2012; Neuman et al., 2012; Lu et al., 2012a]. The Bayesian method is one of the most widely utilized approaches for quantifying parametric uncertainty [Kitanidis, 1986; Box and Tiao, 1992; Ezzedine et al., 1999; Beck and Au, 2002; Marshall et al., 2005; Marzouk et al., 2007; Ma and Zabaras, 2009; Marzouk and Xiu, 2009; Allaire and Willcox, 2010; Renard, 2011; Zeng et al., 2012; Kitanidis, 2012; Lu et al., 2012b; Shi et al., 2012], wherein model parameters and predictions are modeled as random variables. Bayesian methods are well connected with, and complementary to, other methods of uncertainty quantification [e.g., Woodbury, 2011; Nott et al., 2012]. They are flexible and can be applied to different models to incorporate multiple types of data and prior information [e.g., Woodbury, 2007; Rubin et al., 2010; Chen et al., 2012]. The outputs of Bayesian methods are probability density functions of quantities of interest that can be directly used for uncertainty quantification, risk assessment, and decision making.
 In the Bayesian inference framework, this paper presents a computationally efficient method, developed using an adaptive sparse-grid high-order stochastic collocation (aSG-hSC) method, to reduce the cost of Bayesian computation, which is always a burden for practical Bayesian applications, especially for computationally demanding models with a large number of parameters. When estimating the posterior probability density function (PPDF) in Bayesian inference, except in special cases in which analytical expressions of the PPDF can be derived [Woodbury and Ulrych, 2000; Hou and Rubin, 2005], the PPDF is usually estimated numerically using sampling techniques. One of the most popular and robust sampling techniques is the Markov Chain Monte Carlo (MCMC) method [Marshall et al., 2005; Gamerman and Lopes, 2006; Vrugt et al., 2008, 2009; Keating et al., 2010; Liu et al., 2010]. However, MCMC methods are in general computationally expensive, because a large number of model executions are needed to estimate the PPDF and sample from it. Many MCMC algorithms, such as delayed rejection adaptive Metropolis sampling [Haario et al., 2006] and differential evolution adaptive Metropolis (DREAM) sampling [Vrugt et al., 2008, 2009], have been developed to improve computational efficiency by reducing the number of model executions needed. The number of model executions is of primary interest because the computational cost of solving the models dominates that of the other MCMC calculations, which are simple algebraic operations. However, even with these advanced methods, the number of model executions is still often on the order of tens of thousands or even hundreds of thousands. As a result, applications of MCMC approaches are prohibitive for computationally demanding models such as those of groundwater reactive transport, a single run of which may take tens of minutes or even hours [Zhang et al., 2012].
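To make the cost argument concrete, the following sketch (a toy model and variable names of our own, not from the paper) runs a plain random-walk Metropolis sampler and counts forward-model executions. Essentially every proposal costs one model execution, which is why surrogates pay off when a single execution takes minutes or hours.

```python
import numpy as np

rng = np.random.default_rng(0)
n_calls = {"model": 0}

def forward_model(theta):
    """Toy stand-in for an expensive groundwater model (hypothetical)."""
    n_calls["model"] += 1
    return np.array([theta[0] + theta[1], theta[0] * theta[1]])

data = np.array([1.5, 0.5])   # synthetic observations
sigma = 0.1                   # assumed i.i.d. residual standard deviation

def log_posterior(theta):
    """Gaussian log-likelihood with a uniform prior on [0, 2]^2."""
    if np.any(theta < 0.0) or np.any(theta > 2.0):
        return -np.inf        # outside the prior: reject without a model run
    resid = data - forward_model(theta)
    return -0.5 * np.sum(resid ** 2) / sigma ** 2

n_steps = 5000
theta = np.array([1.0, 1.0])
logp = log_posterior(theta)
samples = []
for _ in range(n_steps):
    prop = theta + 0.1 * rng.standard_normal(2)    # random-walk proposal
    logp_prop = log_posterior(prop)
    if np.log(rng.random()) < logp_prop - logp:    # Metropolis rule
        theta, logp = prop, logp_prop
    samples.append(theta.copy())

# Nearly every proposal costs one forward-model execution:
print(n_calls["model"], "model runs for", n_steps, "MCMC steps")
```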
 In this study, the problem of the high computational cost of MCMC simulations is resolved by incorporating sparse-grid methods into MCMC operation to develop sparse-grid-based MCMC algorithms. Sparse-grid methods are, in a broader sense, one class of surrogate methods that have been used to improve computational efficiency in water resources research [Razavi et al., 2012]. The key idea of sparse-grid methods is to place a grid in the parameter space with sparse parameter samples (as opposed to a full tensor-product grid). The forward model is then solved only for the sparse parameter samples to save computational cost. More specifically, the method used in this study is a stochastic collocation method on sparse grids, also known as the sparse-grid stochastic collocation method [Nobile et al., 2008a, 2008b]. Another popular collocation method is the probabilistic collocation method that uses the finite-dimensional polynomial chaos expansion [Marzouk et al., 2007; Li and Zhang, 2007; Shi et al., 2009]. A comprehensive comparison of the accuracy and efficiency of the two stochastic collocation methods can be found in the study of Chang and Zhang. While such a comparison is highly relevant to selecting an appropriate method for different applications, it is beyond the scope of this study. Sparse-grid methods have been demonstrated to be efficient and effective for high-dimensional interpolation and integration, and they have recently been used in groundwater uncertainty quantification. In the studies of, e.g., Shi and Yang, Lin and Tartakovsky [2009, 2010], and Lin et al., sparse-grid methods were used to estimate the mean and covariance of groundwater state variables such as hydraulic head and solute concentrations. In these studies, parameter distributions were assumed known, and Bayesian inference was not conducted.
Bayesian inference using the sparse-grid method was conducted in the studies of Ma and Zabaras and Zeng et al., in which surrogates of geophysical models were built and then used to evaluate parameter distributions using observations of state variables.
 While the aSG-hSC method presented in this paper is similar in spirit to that of Ma and Zabaras and Zeng et al. in its use of the sparse-grid method to improve the computational efficiency of Bayesian inference, our method tackles a more challenging uncertainty quantification problem and offers more computationally efficient sparse-grid structures. Unlike previous studies of sparse-grid methods that only quantify uncertainty in flow and advection-dispersion problems, this study conducts uncertainty quantification for groundwater reactive transport models, which are significantly more nonlinear due to nonlinear reactions and the coupling between flow, transport, and biogeochemical processes. The nonlinearity poses two challenges to applications of sparse-grid methods. First, if the surrogate systems of the nonlinear models are constructed using linear hierarchical basis functions, as in previous groundwater applications, more sparse-grid interpolation points, i.e., more model executions, are needed to attain a prescribed interpolation accuracy, which defeats the purpose of using sparse-grid methods. The other challenge is that the nonlinearity often leads to an extremely complex surface of the likelihood function (or its least squares equivalent) with a large number of local minima, such as those reported in Matott and Rabideau and Shi et al. (Assessment of parametric uncertainty for surface complexation modeling of uranium reactive transport, submitted to Water Resources Research, 2013) for nitrogen and uranium reactive transport, respectively. The multiple local minima correspond to multiple modes (significant or insignificant) on the surface of the PPDF. Existing algorithms may fail to capture all the significant modes or may succeed only with significantly increased computational effort. The two problems caused by nonlinearity are not limited to groundwater reactive transport models but are prevalent in all nonlinear models.
 The aSG-hSC method is developed to resolve the two challenges above. To resolve the first challenge of efficiently approximating the PPDF involving nonlinear groundwater reactive transport models, the surrogate system with a sparse-grid interpolation is constructed with the high-order stochastic collocation (hSC) approach, i.e., utilizing a high-order hierarchical polynomial basis with quadratic or cubic polynomials as in Griebel and in Bungartz and Griebel. Due to their increased accuracy compared to the linear hierarchical basis, the number of model executions needed to construct the surrogate system can be greatly reduced. The high-order approach is not a trivial extension of the linear technique [Zhang et al., 2010], and this is the first time that the high-order stochastic collocation method has been used not only in groundwater modeling but also in surrogate modeling for Bayesian inference. Furthermore, instead of building the approximate PPDF using isotropic sparse-grid interpolation [Nobile et al., 2008a; Barthelmann et al., 2000] or dimension-adaptive sparse-grid interpolation [Nobile et al., 2008b], a locally adaptive sparse-grid (aSG) interpolation [Griebel, 1998] is used. This technique utilizes the hierarchical surplus (discussed in section 3.2) as an error indicator to detect nonsmooth and/or important regions in the parameter space and adaptively places more points in those regions. This results in further computational gains and guarantees that a user-defined accuracy of the surrogate system is realized.
 To resolve the second challenge of reducing the computational cost of constructing the surrogate system for a PPDF with multiple modes, an iterative procedure is developed for the aSG-hSC method to incorporate optimization results into the surrogate construction. Using aSG-hSC together with optimization is considered a strength, since it can leverage extensive research in the area of optimization. The design of the iterative procedure is based on the following observations. In MCMC-based Bayesian inference, large parameter ranges are usually specified in the prior distribution due to lack of information. If multiple modes exist on the PPDF, there are high-probability regions around each significant mode (a definition of the high-probability regions is given in section 3 below). Markov chains move toward the high-probability regions and generate random samples by following the Metropolis rule [Gamerman and Lopes, 2006]. During this process, a large number of samples are discarded in the burn-in period or rejected under the Metropolis rule, and the model executions corresponding to these samples are wasted. This sampling procedure can be made more computationally efficient using the adaptive sparse-grid techniques if the approximate locations of the modes are known from optimization, which motivates the iterative aSG-hSC method. In each iteration, global or local optimization is used to detect a significant mode of the PPDF, and the corresponding high-probability region is determined based on optimization results, such as the Hessian matrix at the identified optimum. Subsequently, the high-probability region is incorporated into the prior distribution, and the aSG-hSC method is used to construct the surrogate within the high-probability region. This is the key to saving computational cost, because the surrogate is not constructed over a large parameter space where a significant number of sparse-grid points would be placed blindly in low-probability regions.
However, there is a trade-off between the computational cost saved and that spent on optimization, which is discussed in the numerical examples in section 4. The iteration continues until all significant modes are identified according to a user-specified significance tolerance. It is demonstrated in section 4 that our method can find all the modes whose significance is larger than a user-defined significance tolerance. Note that the aSG-hSC method is independent of MCMC methods, so it can be used together with any MCMC method. In addition, because both the aSG-hSC and MCMC methods are model independent, the resulting sparse-grid-based MCMC algorithms can be applied to a wide range of problems.
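The mode-detection loop described above can be sketched as follows. This is a schematic only: the bimodal surface, the starting points, and all thresholds are hypothetical, and multistart local optimization stands in for the global search used in the paper.

```python
import numpy as np
from scipy.optimize import minimize

def log_post(theta):
    """Hypothetical bimodal (unnormalized) log-posterior surface."""
    p = (np.exp(-50.0 * np.sum((theta - 0.3) ** 2))           # mode near (0.3, 0.3)
         + 0.5 * np.exp(-50.0 * np.sum((theta - 0.7) ** 2)))  # weaker mode near (0.7, 0.7)
    return np.log(p + 1e-300)

tol = 0.05                  # user-specified significance tolerance
starts = [(0.2, 0.2), (0.8, 0.8), (0.45, 0.55), (0.6, 0.7)]   # multistart search
modes, p_best = [], 0.0
for start in starts:
    res = minimize(lambda t: -log_post(t), np.array(start), method="Nelder-Mead")
    p = float(np.exp(-res.fun))
    p_best = max(p_best, p)
    # keep the optimum if it is not a duplicate of an already-found mode
    if all(np.linalg.norm(res.x - m) > 0.2 for m in modes):
        modes.append(res.x)

# discard modes whose significance falls below the tolerance
modes = [m for m in modes if np.exp(log_post(m)) >= tol * p_best]
print(len(modes), "significant modes found")
```

In the full method, each retained mode would next get its own high-probability region (section 3.1) and its own surrogate component.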
 The rest of the paper is organized as follows. In section 2, the Bayesian framework and the conventional MCMC method used in this study are briefly introduced, followed by the iterative aSG-hSC method of constructing the surrogate system presented in section 3. In section 4, the new approach is applied to reactive transport problems, and its effectiveness and efficiency in comparison with the conventional MCMC method are demonstrated.
2. Bayesian Inference and MCMC Simulation
 Consider the Bayesian inference problem for a nonlinear model
$$\mathbf{d} = f(\boldsymbol{\theta}) + \boldsymbol{\varepsilon}, \qquad (1)$$
where $\mathbf{d}$ is a vector of $N_d$ measurement data, $\boldsymbol{\theta}$ is a vector of model parameters, $f(\boldsymbol{\theta})$ is the forward model with inputs $\boldsymbol{\theta}$ and $N_d$ outputs, and $\boldsymbol{\varepsilon}$ is a vector of residuals, including measurement, model parametric, and structural errors.
 The posterior distribution of the model parameters $\boldsymbol{\theta}$, given the data $\mathbf{d}$, can be estimated using Bayes' theorem [Box and Tiao, 1992] via
$$p(\boldsymbol{\theta} \mid \mathbf{d}) = \frac{L(\boldsymbol{\theta} \mid \mathbf{d})\, p(\boldsymbol{\theta})}{\int L(\boldsymbol{\theta} \mid \mathbf{d})\, p(\boldsymbol{\theta})\, d\boldsymbol{\theta}}, \qquad (2)$$
where $p(\boldsymbol{\theta})$ is the prior distribution and $L(\boldsymbol{\theta} \mid \mathbf{d})$ is the likelihood function that measures the goodness-of-fit between model simulations and observations. The prior distribution can be specified using data of previous studies at similar sites or expert judgment. When prior information is lacking, a common practice is to assume uniform distributions with relatively large parameter ranges so that the prior distribution does not affect the estimation of the posterior distribution. Selection of a likelihood function appropriate to a specific problem is still an open question. Generally speaking, two types of likelihood functions, formal and informal, appear in the literature. A commonly used formal likelihood function is based on the assumption that the residual term $\boldsymbol{\varepsilon}$ in equation (1) follows a multivariate Gaussian distribution, which leads to the formal Gaussian likelihood function listed in Table 1. However, the validity of the explicit Gaussian assumption in practice is often criticized, although the Gaussian likelihood function has been used with success for decades. Informal likelihood functions are designed to implicitly account for errors in measurements, model inputs, and model structure and to avoid overfitting to measurement data [Beven and Binley, 1992; Smith et al., 2008; Schoups and Vrugt, 2010; Smith et al., 2010]. Several widely used informal likelihood functions in hydrology are also listed in Table 1. The definition of an informal likelihood function is problem specific in nature, and there is no consensus on which informal likelihood function outperforms the others.
Table 1. Formal and informal likelihood functions. Formal: the multivariate normal (MVN) likelihood. Informal: the exponential, mean cumulative error (MCE), Nash-Sutcliffe, and normalized sum of squared errors (NSSE) likelihood functions. Notation: $\bar{\mathbf{d}}$, mean of the observations; $\bar{f}$, mean of the outputs of the forward model; $\zeta$, scaling constant for the exponential likelihood function; $\boldsymbol{\Sigma}$, covariance matrix of the residuals for the Gaussian likelihood function.
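As a concrete instance of the formal Gaussian likelihood in Table 1, the log-likelihood of the residuals with covariance $\boldsymbol{\Sigma}$ can be evaluated as below (a minimal sketch; the function name, the toy data, and the i.i.d. covariance choice are our own):

```python
import numpy as np

def gaussian_log_likelihood(d_obs, d_sim, cov):
    """Formal multivariate Gaussian log-likelihood of residuals d_obs - d_sim."""
    resid = d_obs - d_sim
    n = resid.size
    sign, logdet = np.linalg.slogdet(cov)          # log |Sigma|, numerically stable
    quad = resid @ np.linalg.solve(cov, resid)     # r^T Sigma^{-1} r without inverting
    return -0.5 * (n * np.log(2.0 * np.pi) + logdet + quad)

d_obs = np.array([1.0, 2.0, 3.0])
d_sim = np.array([1.1, 1.9, 3.2])
cov = 0.04 * np.eye(3)        # i.i.d. errors with standard deviation 0.2
val = gaussian_log_likelihood(d_obs, d_sim, cov)
print(round(val, 3))
```

In practice, only the log-likelihood is ever formed; exponentiating it can underflow for large $N_d$, which is also why the optimization in section 3.1 works with logarithms.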
 The purpose of this study is not to investigate how to define likelihood functions but to efficiently build surrogate models for a chosen likelihood function using the aSG-hSC approach. As a function approximation method, aSG-hSC only requires that the likelihood $L(\boldsymbol{\theta} \mid \mathbf{d})$ in equation (2) be a continuous function, which is satisfied by all likelihood functions in the literature, including those listed in Table 1. In the numerical examples of section 4, for the sake of illustration, the Gaussian likelihood function is used for the first example and the likelihood function of exponential type for the second.
 While this study is focused on quantification of parametric uncertainty, its results can be used directly for quantification of model uncertainty. In parametric uncertainty quantification, the denominator of the Bayes' formula in equation (2) is a normalization constant that does not affect the shape of the PPDF. As such, in the discussion of building surrogate systems hereafter, the notation $p(\boldsymbol{\theta} \mid \mathbf{d})$ and the terminology PPDF refer only to the product $L(\boldsymbol{\theta} \mid \mathbf{d})\, p(\boldsymbol{\theta})$. When extending this research to quantification of model uncertainty, the denominator becomes critical. In the Bayesian model averaging method that considers model uncertainty due to alternative models [e.g., Ye et al., 2004, 2008, 2010], for an individual model $M_k$, this term becomes the model likelihood function, $p(\mathbf{d} \mid M_k) = \int L(\boldsymbol{\theta}, M_k \mid \mathbf{d})\, p(\boldsymbol{\theta} \mid M_k)\, d\boldsymbol{\theta}$, where $L(\boldsymbol{\theta}, M_k \mid \mathbf{d})$ is the joint likelihood function of the model and its parameters. The model likelihood function is the most critical variable for evaluating the model probability used to quantify model uncertainty. Although this term can be evaluated using the aSG-hSC method, doing so is beyond the scope of this study.
 Due to the nonlinearity of the model $f(\boldsymbol{\theta})$ with respect to the parameters $\boldsymbol{\theta}$, it is often difficult to draw samples from the PPDF directly, so MCMC methods, such as the Metropolis-Hastings (M-H) algorithm [Gamerman and Lopes, 2006] and its variants, are often used for sampling. The essence of the MCMC methods is that parameter samples are drawn from a proposal distribution instead of the PPDF, and the Markov property guarantees convergence of the sampled distribution to the posterior distribution. However, in practice, the convergence is often slow when the proposal distribution deviates from the posterior distribution. Many advanced MCMC methods have been developed, and one of them is the Differential Evolution Adaptive Metropolis (DREAM) approach developed by Vrugt et al. [2008, 2009]. The DREAM algorithm uses multiple Markov chains simultaneously; all chains are viewed as members of the same population, and the sampling procedure is treated as the evolution of the population. As such, the classic proposal distribution used in the M-H algorithm is not necessary, and the jump of each Markov chain at each step is determined by the differential evolution of a genetic algorithm. It was shown by Vrugt et al. [2008, 2009] that DREAM is generally more efficient than traditional MCMC algorithms in the absence of additional information about the PPDF. Moreover, DREAM is more advantageous in dealing with multimodal posterior distributions, which matches our goal of building a surrogate system for a posterior distribution with multiple significant modes. For these reasons, DREAM is chosen in this study as the framework of Bayesian inference. Using aSG-hSC together with DREAM is considered a strength, because it leverages a recently developed MCMC algorithm. However, it should be noted that the aSG-hSC method of building the surrogate system can be used with other MCMC algorithms.
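The core differential-evolution jump underlying DREAM-type samplers can be sketched as follows. This is a simplified illustration with hypothetical names; the full DREAM algorithm adds subspace sampling, crossover adaptation, and outlier handling.

```python
import numpy as np

rng = np.random.default_rng(2)

def de_proposal(chains, i, gamma=None, eps=1e-6):
    """Propose a jump for chain i from the difference of two other chains."""
    n_chains, n_dim = chains.shape
    if gamma is None:
        gamma = 2.38 / np.sqrt(2 * n_dim)   # standard DE-MC jump rate
    r1, r2 = rng.choice([k for k in range(n_chains) if k != i],
                        size=2, replace=False)
    jump = gamma * (chains[r1] - chains[r2]) + eps * rng.standard_normal(n_dim)
    return chains[i] + jump

chains = rng.uniform(0.0, 1.0, size=(8, 3))   # 8 parallel chains, 3 parameters
prop = de_proposal(chains, 0)
print(prop.shape)
```

Because the jump is built from differences of current chain states, its scale and orientation automatically adapt to the shape of the posterior being sampled, with no hand-tuned proposal covariance.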
3. Iterative aSG-hSC Methodology
 This section describes the iterative aSG-hSC method of constructing the surrogate system for estimating the posterior distribution. To provide context for the method described in section 3.3, determination of the high-probability region is first introduced in section 3.1, followed by a description of the high-order hierarchical polynomial basis and the adaptive sparse-grid interpolation in section 3.2.
3.1. Determining High-Probability Region as the Prior
 In the context of Bayesian inference, the searching region for MCMC sampling can be represented by
$$\Gamma = \{\, \boldsymbol{\theta} : a_n \le \theta_n \le b_n,\; n = 1, \dots, N_p \,\}, \qquad (3)$$
which is usually large due to lack of prior knowledge about the posterior distribution $p(\boldsymbol{\theta} \mid \mathbf{d})$. In the parameter space, it is often the case that $p(\boldsymbol{\theta} \mid \mathbf{d})$ is very small (close to zero) within a large part of $\Gamma$ but significant in one or several subregions. These subregions are referred to as high-probability regions in this paper, denoted by $\Omega_\delta$ and rigorously defined as
$$\Omega_\delta = \{\, \boldsymbol{\theta} \in \Gamma : p(\boldsymbol{\theta} \mid \mathbf{d}) \ge \delta\, p_{\max} \,\}, \qquad (4)$$
where $\delta$ is a user-specified threshold and $p_{\max}$ is the maximum value of the PPDF. Equation (4) indicates that the high-probability region, $\Omega_\delta$, is within the contour $p(\boldsymbol{\theta} \mid \mathbf{d}) = \delta\, p_{\max}$. For MCMC sampling, after convergence, all Markov chains move only around the high-probability region $\Omega_\delta$. Although some trial samples of MCMC can jump out of the high-probability regions, most of them will be rejected, so that almost all accepted MCMC samples after the burn-in period fall into $\Omega_\delta$. Thus, it is a waste of computational effort to build an accurate surrogate system for $p(\boldsymbol{\theta} \mid \mathbf{d})$ over the whole searching region $\Gamma$. Instead, it is computationally more efficient to approximate $p(\boldsymbol{\theta} \mid \mathbf{d})$ in the high-probability region $\Omega_\delta$. Thus, we seek to define the high-probability region for each significant mode. When the PPDF has multiple significant modes, $\Omega_\delta$ consists of several disjoint subregions in $\Gamma$, which can be defined iteratively as discussed in section 3.3.
 Defining an individual high-probability region starts with searching for the global maximum $\boldsymbol{\theta}^*$ of $p(\boldsymbol{\theta} \mid \mathbf{d})$ using global optimization. The objective function used for optimization may be different from the posterior distribution. For example, if the selected likelihood function is of an exponential type, such as the formal multivariate normal likelihood and the informal exponential likelihood in Table 1, using the logarithm of the likelihood function in optimization gives more stable results than using the likelihood function itself [Pflüger, 2005]. This strategy is also applicable to the prior distribution. Thus, for convenience of notation, $J(\boldsymbol{\theta})$ is used to represent the objective function, which can be viewed as a preprocessing of $p(\boldsymbol{\theta} \mid \mathbf{d})$ for optimization. For example, when using the uniform prior and the informal exponential likelihood in Table 1, the objective function is defined in terms of $\bar{\mathbf{d}}$ and $\bar{f}$, the means of the observations and of the outputs of the forward model, respectively. When using the Nash-Sutcliffe likelihood, the objective function can be defined so as to guarantee enough gradient information over the searching region. Note that the preprocessing function must be monotonic such that it is invertible. While any global optimization algorithm can be used, the DIRECT algorithm is used in this study. DIRECT, first proposed by Jones et al., is a derivative-free global optimization algorithm and an improvement of the standard Lipschitzian approach that eliminates the need to specify a Lipschitz constant.
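For illustration, SciPy (version 1.8 and later) ships an implementation of DIRECT; a minimal sketch of locating the maximum of a toy bimodal objective follows (the objective, bounds, and evaluation budget are hypothetical, and the surface is the negative of the objective because DIRECT minimizes):

```python
import numpy as np
from scipy.optimize import direct, Bounds

def neg_objective(theta):
    """Negative of a toy bimodal (unnormalized) posterior surface."""
    return -(np.exp(-50.0 * np.sum((theta - 0.3) ** 2))
             + 0.5 * np.exp(-50.0 * np.sum((theta - 0.7) ** 2)))

# DIRECT needs only box bounds and a function-evaluation budget:
res = direct(neg_objective, Bounds([0.0, 0.0], [1.0, 1.0]), maxfun=2000)
print(np.round(res.x, 2))   # located maximum, expected near the stronger mode
```

Note that DIRECT requires no derivatives and no starting point, only the box bounds of the searching region $\Gamma$, which matches the setting of equation (3).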
 Once the global parameter optimum $\boldsymbol{\theta}^*$ is obtained, the next step is to define the high-probability region around $\boldsymbol{\theta}^*$. This can be done by investigating how fast the posterior distribution decays to zero away from $\boldsymbol{\theta}^*$ through estimating the curvature of $J(\boldsymbol{\theta})$, as a more curved function decays faster. Since the second-order derivative is a measure of a function's curvature, the Hessian matrix of $J$ at $\boldsymbol{\theta}^*$, denoted by $\mathbf{H}$, is used to determine the high-probability region. In the case of a multivariate Gaussian distribution, $\mathbf{H}$ is the inverse of the covariance matrix and is sufficient to define the Gaussian shape; in non-Gaussian cases, $\mathbf{H}$ still provides sufficient information about the curvature of $J$ around $\boldsymbol{\theta}^*$.
 In this work, $\mathbf{H}$ is estimated via finite differences [Nocedal and Wright, 2006]
$$H_{ll} \approx \frac{J(\boldsymbol{\theta}^* + \mathbf{h}_l) - 2 J(\boldsymbol{\theta}^*) + J(\boldsymbol{\theta}^* - \mathbf{h}_l)}{h_l^2} \qquad (5)$$
for the diagonal entries and
$$H_{lm} \approx \frac{J(\boldsymbol{\theta}^* + \mathbf{h}_l + \mathbf{h}_m) - J(\boldsymbol{\theta}^* + \mathbf{h}_l) - J(\boldsymbol{\theta}^* + \mathbf{h}_m) + J(\boldsymbol{\theta}^*)}{h_l h_m} \qquad (6)$$
for the off-diagonal entries. Here $\mathbf{h}_l$ and $\mathbf{h}_m$ are vectors with zero elements except for the $l$th and $m$th entries, which are equal to properly selected steps $h_l$ and $h_m$. $\mathbf{H}$ can be decomposed using singular value decomposition (SVD), i.e.,
$$\mathbf{H} = \mathbf{U} \mathbf{S} \mathbf{V}^{\mathrm{T}}, \qquad (7)$$
where $\mathbf{S} = \mathrm{diag}(s_1, \dots, s_{N_p})$ contains the singular values, each of which characterizes the variance of the posterior distribution along the corresponding orthogonal singular vector in $\mathbf{V}$. Note that when $\boldsymbol{\theta}$ is a vector of uncorrelated random variables, both $\mathbf{V}$ and $\mathbf{U}$ are identity matrices, and the singular values characterize the variances along the axes of the parameters. $\mathbf{V}\mathbf{S}^{-1/2}$ also represents a linear transform in the $N_p$-dimensional space, where $\mathbf{V}$ determines the rotation and $\mathbf{S}^{-1/2}$ determines the stretching along orthogonal directions. Based on these results, the high-probability region, $\Omega_\delta$, corresponding to $\boldsymbol{\theta}^*$ is defined by transforming a unit cube via
$$\Omega_\delta = \{\, \boldsymbol{\theta} : \boldsymbol{\theta} = \boldsymbol{\theta}^* + \mathbf{V} \tilde{\mathbf{S}}^{-1/2} \mathbf{c},\; \mathbf{c} \in [-1, 1]^{N_p} \,\}, \qquad (8)$$
where $\tilde{\mathbf{S}}$ is the matrix $\mathbf{S}$ scaled by the vector $\boldsymbol{\gamma}$, i.e., the diagonal entries of $\tilde{\mathbf{S}}$ are $s_n / \gamma_n^2$. The scaling vector $\boldsymbol{\gamma}$ can be a user-defined constant vector that determines the volume of $\Omega_\delta$. Its value can be easily determined in a Gaussian case based on probability tables; for example, 99.7% of the samples are within three standard deviations of the mean. In a non-Gaussian case, although it is not straightforward to automatically find an optimal value of $\boldsymbol{\gamma}$, it can be set somewhat large to guarantee that $\Omega_\delta$ is covered in most common cases.
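The chain from equations (5)-(8) can be sketched end to end: finite-difference Hessian, SVD, and mapping the unit cube into a high-probability box. The quadratic toy objective, the step size, and the scaling vector below are our own hypothetical choices, not the paper's.

```python
import numpy as np

def neg_log_post(theta):
    """Toy objective J(theta): negative log of a correlated Gaussian."""
    prec = np.array([[60.0, 20.0], [20.0, 40.0]])   # inverse covariance (Hessian)
    r = theta - np.array([0.5, 0.5])
    return 0.5 * r @ prec @ r

def fd_hessian(J, x, h=1e-4):
    """Finite-difference Hessian of J at x, as in equations (5) and (6)."""
    n = x.size
    H = np.zeros((n, n))
    for l in range(n):
        e_l = np.zeros(n); e_l[l] = h
        H[l, l] = (J(x + e_l) - 2.0 * J(x) + J(x - e_l)) / h ** 2
        for m in range(l + 1, n):
            e_m = np.zeros(n); e_m[m] = h
            H[l, m] = H[m, l] = (J(x + e_l + e_m) - J(x + e_l)
                                 - J(x + e_m) + J(x)) / (h * h)
    return H

theta_star = np.array([0.5, 0.5])
H = fd_hessian(neg_log_post, theta_star)
U, s, Vt = np.linalg.svd(H)                  # H = U diag(s) V^T, equation (7)
gamma = np.array([3.0, 3.0])                 # user-chosen scaling vector
# Map the corners of the unit cube [-1, 1]^2 into the high-probability box:
corners = np.array([[c1, c2] for c1 in (-1, 1) for c2 in (-1, 1)], dtype=float)
region = theta_star + corners @ (Vt.T * (gamma / np.sqrt(s))).T
print(np.round(H, 1))
```

For this quadratic objective the finite-difference estimate recovers the exact precision matrix, and the mapped corners delimit the rotated, stretched box of equation (8).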
 The numerical example shown in Figure 1 illustrates how to define the high-probability region for both Gaussian and non-Gaussian densities. Consider two two-dimensional density functions, $p_1$ and $p_2$, defined in terms of a Gaussian density $p_G(\boldsymbol{\mu}, \boldsymbol{\Sigma})$ with mean $\boldsymbol{\mu}$ and covariance matrix $\boldsymbol{\Sigma}$. While $p_1$ is a Gaussian density itself, $p_2$ is non-Gaussian. The contours of the two density functions are shown in Figures 1a and 1b, respectively. The searching region $\Gamma$ is set to be large. The maxima of $p_1$ and $p_2$, both 44.5, are found at (0.5, 0.5) and (0.5, 0.4), respectively. The threshold $\delta$ in equation (4) is set to 0.01, meaning that the defined high-probability region should cover the area within the contour $p = 0.01\, p_{\max}$. Figure 1 plots the high-probability regions for $p_1$ and $p_2$. In Figure 1a, for the Gaussian density, the prior region covers $\Omega_\delta$ very well; in Figure 1b, for the non-Gaussian density, a larger scaling factor is needed to fully cover the 0.01 contour. In either case, the prior regions are dramatically smaller than the initial searching region $\Gamma$.
 Defining such high-probability regions for MCMC simulation has two advantages. First, since the volume of the high-probability region is often significantly smaller than that of the searching region, the computational cost of building a surrogate system of desired accuracy can be considerably reduced. In addition, because the high-probability region covers the significant mode of the PPDF well, the initial samples of the Markov chains can be generated within such regions, which significantly accelerates the convergence of MCMC sampling. While both the global optimization and the calculation of the Hessian matrix require forward model executions, this computational expense is worthwhile as long as more computational cost can be saved by working on the prior regions due to the two advantages. Although global optimization is used here, it may not be necessary in practice, because the aSG-hSC method is expected to perform well as long as the prior regions include the mode. In other words, a rough estimate of the location of each significant mode is sufficient for our method. This can be achieved using local optimization techniques if information about the shape of $p(\boldsymbol{\theta} \mid \mathbf{d})$ is available, which further reduces computational cost.
3.2. Adaptive Sparse-Grid Interpolation
 After obtaining the high-probability region using equation (8) for a significant mode of the PPDF around $\boldsymbol{\theta}^*$ in the parameter space, the next task is to build the surrogate model for $p(\boldsymbol{\theta} \mid \mathbf{d})$ on $\Omega_\delta$ using the aSG-hSC method. The method is only briefly described here; for more details, refer to Griebel (1998), Barthelmann et al. (2000), Bungartz and Griebel (2004), and Klimke and Wohlmuth (2005). Since the methods of building surrogate systems are generally applicable to any function governed by partial differential equations, not limited to the PPDF, a general function $f(\boldsymbol{\theta})$ is used in the description of the method.
3.2.1. One-Dimensional Hierarchical Interpolation
 The basis for constructing the desired sparse-grid approximation in the multidimensional setting is one-dimensional (1-D) hierarchical interpolation. Consider a function $f(\theta)$, $\theta \in [0, 1]$, where the standard domain [0,1] can be rescaled to any bounded domain by translation and dilation. The 1-D hierarchical Lagrange interpolation formula is defined by
$$\mathcal{I}_L f(\theta) = \sum_{i=0}^{L} \Delta_i f(\theta), \qquad (10)$$
where the incremental interpolation operator is given as
$$\Delta_i f(\theta) = \sum_{j=1}^{m_i} c_j^i\, \psi_j^i(\theta). \qquad (11)$$
The nonnegative integer $L$ in equation (10) is called the resolution level of the hierarchical interpolant, and the summation over the resolution level in equation (10) exhibits the hierarchical structure of the interpolant $\mathcal{I}_L f$. For $j = 1, \dots, m_i$, $\psi_j^i(\theta)$ and $c_j^i$ in equation (11) are the basis functions and the interpolation coefficients for $\Delta_i$, respectively. The integer $m_i$ in equation (11) is the number of interpolation points involved in $\Delta_i$, which is defined by
$$m_0 = 1, \quad m_1 = 2, \quad m_i = 2^{i-1} \text{ for } i \ge 2. \qquad (12)$$
A uniform grid, denoted by $\Delta\Theta_i$, can be utilized for the incremental interpolant $\Delta_i$. The abscissas of $\Delta\Theta_i$ are defined by
$$\theta_1^0 = 0.5; \qquad \theta_1^1 = 0,\; \theta_2^1 = 1; \qquad \theta_j^i = \frac{2j - 1}{2^i} \text{ for } i \ge 2,\; j = 1, \dots, m_i. \qquad (13)$$
Then, the hierarchical grid for $\mathcal{I}_L f$ is defined by
$$\Theta_L = \bigcup_{i=0}^{L} \Delta\Theta_i. \qquad (14)$$
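The nesting of the 1-D hierarchical grids can be made explicit with a short sketch, assuming the common midpoint hierarchy (level 0 holds the midpoint, level 1 the two endpoints, and each level i >= 2 the new midpoints of the previous grid); function names are our own.

```python
def incremental_grid(i):
    """Points added at resolution level i of the 1-D hierarchy on [0, 1]."""
    if i == 0:
        return [0.5]
    if i == 1:
        return [0.0, 1.0]
    # 2**(i-1) new midpoints: odd multiples of 2**-i
    return [(2 * j - 1) / 2 ** i for j in range(1, 2 ** (i - 1) + 1)]

def hierarchical_grid(L):
    """Union of incremental grids up to level L (the grids are nested)."""
    pts = []
    for i in range(L + 1):
        pts.extend(incremental_grid(i))
    return sorted(pts)

for i in range(4):
    print(i, incremental_grid(i))
print(len(hierarchical_grid(3)))   # 1 + 2 + 2 + 4 = 9 points
```

Because each level only adds points between the existing ones, the model needs to be executed only at the newly added points when the resolution level is increased.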
 Since the representation of $c_j^i$ depends on the properties of the selected basis function $\psi_j^i$, the basis functions are discussed first. Different from previous studies that utilize linear hierarchical basis functions to build surrogate systems, this study uses high-order hierarchical polynomial basis functions, including the quadratic and cubic hierarchical bases defined by Bungartz and Griebel, in order to improve the accuracy and efficiency of constructing the surrogate system. Expressions for the linear, quadratic, and cubic hierarchical polynomial bases are provided below.
 In the case of linear hierarchical basis, for ,
For , ,
 In the case of quadratic hierarchical basis, for , the basis is the same as the linear hierarchical basis defined in (15). For i = 1, j = 1 and 2, define and set
For , ,
where and .
 In the case of cubic hierarchical basis, for , the basis is the same as the linear hierarchical basis defined by equation (15). For and i = 1, the basis is the same as the quadratic hierarchical basis defined by equations (17) and (18). For , and j is odd,
where and For , and j is even,
 Figure 2 depicts the hierarchical basis functions from level 0 to level 3 for the linear, quadratic, and cubic bases. The definitions of the bases and Figure 2 show that, on each level, the basis functions have mutually disjoint supports and satisfy
$$\psi_j^i(\theta_k^i) = \delta_{jk}, \quad j, k = 1, \dots, m_i. \qquad (22)$$
Based on equations (10), (11), (22), and the interpolatory property of $\mathcal{I}_i f$, i.e., $\mathcal{I}_i f(\theta_j^k) = f(\theta_j^k)$ for $k \le i$ and $j = 1, \dots, m_k$, representations of the coefficients are derived as follows. For $i = 0$,
$$c_1^0 = f(\theta_1^0), \qquad (23)$$
and for $i \ge 1$, $j = 1, \dots, m_i$,
$$c_j^i = f(\theta_j^i) - \mathcal{I}_{i-1} f(\theta_j^i). \qquad (24)$$
The coefficient $c_j^i$ is defined as the hierarchical surplus of the basis function $\psi_j^i$, which is the difference between the value of the interpolated function and the value of the interpolant $\mathcal{I}_{i-1} f$ at $\theta_j^i$. As discussed in Klimke and Wohlmuth and Ma and Zabaras, when the function $f$ is smooth with respect to $\theta$, the magnitude of the surplus approaches zero as the resolution level $i$ increases. Therefore, the surplus can be used as an error indicator for the interpolant in order to guide sparse-grid refinement.
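Equations (23) and (24) can be sketched in code: compute hierarchical surpluses with the linear (hat) basis and stop refining when the largest new surplus, the error indicator, falls below a tolerance. The midpoint hierarchy (level 0: 0.5; level 1: the endpoints; level i >= 2: new midpoints), the tolerance, and all names are assumptions for this sketch, not the authors' code.

```python
import numpy as np

def hat(x, node, level):
    """Linear hierarchical (hat) basis centered at node on the given level."""
    x = np.asarray(x, dtype=float)
    if level == 0:
        return np.ones_like(x)        # level 0: constant basis at the midpoint
    width = 2.0 ** -level             # support half-width of the hat
    return np.maximum(0.0, 1.0 - np.abs(x - node) / width)

def nodes(level):
    if level == 0:
        return [0.5]
    if level == 1:
        return [0.0, 1.0]
    return [(2 * j - 1) / 2 ** level for j in range(1, 2 ** (level - 1) + 1)]

def build_surrogate(f, max_level=6, tol=1e-3):
    """1-D hierarchical interpolant; surplus magnitude drives refinement."""
    terms = []                                    # (node, level, surplus)
    def interp(x):
        return sum(c * hat(x, n, l) for n, l, c in terms)
    for level in range(max_level + 1):
        # equation (24): surplus = f at the new node minus current interpolant
        surpluses = [(n, level, f(n) - interp(n)) for n in nodes(level)]
        terms.extend(surpluses)
        if level > 1 and max(abs(s) for _, _, s in surpluses) < tol:
            break                                 # error indicator small: stop
    return interp

f = lambda x: np.sin(np.pi * x)
s = build_surrogate(f)
print(abs(f(0.3) - s(0.3)) < 0.01)
```

The adaptive variant used in the paper applies the same indicator locally, refining only around nodes whose surpluses remain large instead of adding whole levels.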
 Based on the one-dimensional hierarchical interpolation discussed in section 3.2.1, one can construct an approximation for a multivariate function, , where . Starting from the isotropic sparse-grid interpolation, analogous to the definitions of in equation (10) and in equation (11), define the multidimensional hierarchical interpolation formula as
and the multidimensional incremental interpolation operator is defined by
where is a multi-index of the resolution level of , is a strictly increasing function, belonging to the multi-index set
and is the hierarchical surplus. is the multidimensional hierarchical basis function defined by
where for , is the one-dimensional hierarchical basis function. The multidimensional grid points are defined corresponding to the basis . Equation (26) shows that the multidimensional incremental interpolation operator on level is the tensor product of one-dimensional incremental interpolation operators, which is why the notation can be used to denote the tensor-product operation. In the following discussion, the equivalent notation is used to denote the incremental interpolation operator. The grids for and , denoted by and , respectively, are represented as
Note that involves a total of grid points. In addition, is composed of several incremental interpolants. Thus, the definition of the function in equation (25) determines the number of grid points involved in and also the structure of the resulting grid. The following two definitions are given corresponding to the full tensor-product grids and isotropic sparse grids:
An L-level full tensor-product interpolant needs grid points, where mi is defined in equation (12). This is also the number of model executions needed when building the surrogate system. Using the full tensor-product formulation, the number of grid points grows exponentially with the number of random parameters , which is the curse of dimensionality. By virtue of the second definition of in equation (30), corresponding to the isotropic sparse-grid interpolation, the curse of dimensionality can be alleviated.
 Figure 3 illustrates how the curse of dimensionality is alleviated, using the construction of a two-dimensional ( ) level L = 3 isotropic sparse grid as an example. The definitions of in equation (30) show that an L-level isotropic sparse grid is a subgrid of an L-level full tensor-product grid. The resolution level in one dimension can be , and 3, as shown in the top horizontal lines in Figure 3a. The same is true for the other dimension, as shown in the left vertical lines. There are a total of 16 subgrids in Figure 3a, each of which corresponds to an incremental interpolant in equation (25), where and . Different combinations of i1 and i2 with lead to all 16 subgrids in Figure 3a, the union of which constitutes the level L = 3 full tensor-product grid with the 81 grid points shown in Figure 3c. In comparison, different combinations of i1 and i2 with lead to only the 10 subgrids above the dashed line in Figure 3a, the union of which constitutes the level L = 3 isotropic sparse grid with only the 29 grid points shown in Figure 3b. This reduction is significant even though the maximum number of interpolation points in each dimension is the same for both grids. Generally speaking, an isotropic sparse grid contains approximately points where , whereas a full tensor-product grid contains points [Nobile et al., 2008a]. Although significantly fewer points are used, the accuracy of the sparse-grid interpolation does not appreciably deteriorate compared to that of the full tensor-product interpolation [Barthelmann et al., 2000; Bungartz and Griebel, 2004]. Thus, in the sequel, the definition of in equation (25) is fixed to be , and in equation (25) is referred to as the isotropic sparse-grid interpolant.
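The point counts quoted above (29 sparse versus 81 full tensor-product points for two dimensions and L = 3) can be reproduced with a short sketch, assuming the usual nested doubling rule m0 = 1, mi = 2^i + 1 for the one-dimensional abscissas (consistent with the 9 points per dimension of the full grid in Figure 3c):

```python
import math
from itertools import product

def m(i):
    # number of one-dimensional abscissas on level i: m0 = 1, mi = 2**i + 1
    return 1 if i == 0 else 2**i + 1

def n_new(i):
    # number of NEW one-dimensional points introduced on level i
    return 1 if i == 0 else m(i) - m(i - 1)

def sparse_grid_size(d, L):
    # sum the tensor-product increment sizes over multi-indices with |i| <= L
    return sum(math.prod(n_new(i) for i in idx)
               for idx in product(range(L + 1), repeat=d)
               if sum(idx) <= L)

def full_grid_size(d, L):
    return m(L)**d

sizes = sparse_grid_size(2, 3), full_grid_size(2, 3)
```

In one dimension the two grids coincide; the savings appear, and grow rapidly, as the dimension increases.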
 The coefficients can be computed analogously to the one-dimensional case, following the discussion in Klimke and Wohlmuth . For , i.e., , the coefficients are calculated as
For , and , the coefficients are evaluated via
Next, we explain how to construct an adaptive sparse grid (as opposed to an isotropic sparse grid) using the higher-order hierarchical polynomials defined in equations (15)-(21).
3.2.3. Adaptive Sparse-Grid Interpolation
 As discussed above, if the function is smooth with respect to , the magnitude of the hierarchical surplus will decay to zero as the resolution level L of increases; the smoother the function, the faster the surpluses decay. This feature is the basis for constructing adaptive sparse grids using the surplus as an error indicator. We start from the construction of one-dimensional adaptive grids and then extend it to multidimensional sparse grids. As shown in Figure 4, the one-dimensional isotropic hierarchical grid has a tree-like structure: a grid point on level i has two children, namely and on level i + 1. Special treatment is required when moving from level 1 to level 2, because only one child point is added on level 2 for each of the nodes and . The basic idea of adaptivity is to use the hierarchical surplus as an error indicator of the smoothness of the target function and to refine the grid by adding, on the next level, two new points for each point whose surplus magnitude is larger than the prescribed error tolerance.
 The adaptivity concept is illustrated in Figure 4, where a six-level adaptive grid is used to interpolate the Gaussian kernel function on [0,1] with an error tolerance of 0.01. From level 0 to level 2, because the magnitude of every surplus is larger than 0.01, two points are added for each grid point, except that only one point is added for each grid point on level 1. On level 3, since the surplus is larger than 0.01 at only one point, , two new points are added for this point on level 4. When this procedure continues through levels 5 and 6, it leads to a six-level adaptive grid with only 21 points (black points in Figure 4), whereas the six-level nonadaptive (isotropic) grid has a total of 65 points (black and gray points in Figure 4).
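A minimal sketch of this one-dimensional adaptive refinement is given below, using the linear hierarchical basis and the tree structure of Figure 4 (including the special level-1 to level-2 case). The Gaussian-kernel target and its width are our own assumptions, since the paper's exact kernel is not reproduced here, so the resulting number of grid points need not equal the 21 quoted above:

```python
import math

def basis(x, level, xj):
    # linear hierarchical basis: constant on level 0, hats of support
    # radius 2**-level on levels >= 1 (one-sided at the boundary nodes)
    if level == 0:
        return 1.0
    return max(0.0, 1.0 - 2.0**level * abs(x - xj))

def children(level, xj):
    # tree structure of Figure 4, with the special level-1 -> level-2 case
    if level == 0:
        return [(1, 0.0), (1, 1.0)]
    if level == 1:
        return [(2, 0.25)] if xj == 0.0 else [(2, 0.75)]
    h = 2.0**-(level + 1)
    return [(level + 1, xj - h), (level + 1, xj + h)]

def adapt(f, tol, max_level):
    nodes = {}                      # (level, xj) -> hierarchical surplus
    def interp(x):
        return sum(w * basis(x, l, xj) for (l, xj), w in nodes.items())
    active = [(0, 0.5)]
    while active:
        # surpluses of the whole new batch are computed before insertion
        batch = {p: f(p[1]) - interp(p[1]) for p in active}
        nodes.update(batch)
        # refine only points whose surplus magnitude exceeds the tolerance
        active = [c for (l, xj), w in batch.items()
                  if abs(w) > tol and l < max_level
                  for c in children(l, xj)]
    return nodes, interp

# assumed Gaussian-kernel-like target; the paper's exact kernel differs
f = lambda x: math.exp(-((x - 0.5) / 0.15)**2)
nodes, interp = adapt(f, tol=0.01, max_level=6)
```

Points whose surplus magnitude falls below the tolerance are simply not refined, so the grid stays coarse in the flat tails of the kernel while concentrating near the peak.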
 It is straightforward to extend the adaptivity from the one-dimensional grid to multidimensional adaptive sparse grids. The isotropic level L sparse grid in equation (29) can be rewritten as
where the grid points have the tree-like structure in each dimension. For example, a point has two child points in each direction, so that it has a total of children. For , the two children of , denoted by and , are represented by
where with and . Note that the children of each sparse-grid point on level belong to the sparse-grid point set of level . Adding child points amounts to advancing the sparse-grid interpolation from level to level . In this way, the sparse grid is refined locally without breaking the structure of sparse grids.
 For a prescribed error tolerance α, the adaptive sparse-grid interpolant is defined as
where the multi-index set is defined by modifying the multi-index set in equation (27), i.e.,
Thus, the level L adaptive sparse-grid interpolant in equation (35) only retains the terms of the isotropic sparse-grid interpolant in equation (25) for which the magnitudes of the corresponding surpluses are larger than α. The corresponding adaptive sparse grid can be represented by
which is a subgrid of the level L isotropic sparse grid in equation (29). If the tolerance , the adaptive sparse-grid interpolant is equivalent to the isotropic sparse-grid interpolant in equation (25); if , the algorithm adaptively selects which points are added to the sparse grid. The sparse-grid points then become concentrated in nonsmooth regions, e.g., where oscillations or sharp transitions occur, to guarantee the prescribed accuracy of the interpolation. On the other hand, in regions where is very smooth, e.g., insensitive to certain parameters, this approach saves a significant number of grid points while still achieving the prescribed accuracy.
 In practice, for a specific -dimensional target function , the total number of sparse-grid points and the accuracy of can be controlled by two user-defined constants, L and α, where L defines the maximum allowable resolution of the sparse grid and the error tolerance α guides the mesh refinement. L is usually set according to the maximum affordable computational cost and α is set to the desired accuracy of the interpolation, which allows maximizing the use of the available computational resources. The mesh refinement stops in one of two ways: when the magnitudes of all surpluses on the current level are smaller than α, or when the maximum level L is reached.
3.3. Algorithm for Iterative Construction of the Surrogate PPDF
 Using the procedure of defining high-probability regions and the aSG-hSC method discussed in sections 3.1 and 3.2, respectively, one can iteratively construct the surrogate system for with multiple modes due to the nonlinearity of the reactive transport models. Figure 5 shows the flowchart of the iterative algorithm that sequentially captures all the significant modes. As shown in Figure 5, the algorithm starts from defining the searching region Γ of the Bayesian inference (also used for global optimization) and the objective function of global optimization . As discussed in Section 3.1, since the preprocessing function must be invertible, i.e., exists, we also use as the target function of the sparse-grid approximation due to numerical stability issues studied in Pflüger . For example, when using the formal Gaussian likelihood function or the informal exponential likelihood function in Table 1, the logarithm of the likelihood function is used as the target function for both optimization and the surrogate system.
 The initial surrogate system has no components, i.e., . In the first iteration (k = 1) in Figure 5, global optimization is used to search for the global optimum of the function through the global optimization operator . is the highest peak of , so is the most significant mode of . Subsequently, the inverse of the Hessian matrix of is calculated to determine the prior region around based on equation (8) and the user-defined vector . On , the adaptive sparse-grid interpolant
is constructed by setting in equation (35). Note that to generate sparse grids on an irregularly shaped region, e.g., the prior regions in Figure 1, one needs to generate the sparse-grid abscissas in the unit cube and then map them onto by the transformation in equation (8). is the first component of the surrogate system . After that, is updated to , where is the characteristic function of the prior region
 The following iterations, , aim to find other modes of , with the previous modes excluded, and to build the surrogate system around them. Specifically, in the kth iteration, the k – 1 components of the surrogate system are excluded from the optimization, and the optimization operator is applied only to
where for , is the mth component of the surrogate system defined on the domain , and is the characteristic function of the region , which avoids overlap of different prior regions. The maximum represents the kth highest peak (mode) of . Since the modes become less significant as k increases, the significance ratio
is used to terminate the iteration when is too small to be significant. If the ratio is smaller than the user-defined significance threshold δ, for example, , then the height of the peak of the posterior distribution at is negligible in comparison with the highest peak at . As a result, there is no need to construct a surrogate component for such a negligible mode. Whenever a new mode is found, the corresponding sparse-grid approximation is constructed and added to the surrogate system . The final surrogate system for is
with M components for M significant modes. The total number of model executions for constructing in equation (41) consists of those for global optimization, estimation of the Hessian matrix, and adaptive sparse-grid interpolation.
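The iterative loop of Figure 5 can be sketched in simplified form as follows. The one-dimensional bimodal toy density, the box-shaped exclusion regions, and the coarse grid search standing in for the global optimizer are all illustrative assumptions; in the actual algorithm, each retained mode would additionally receive a Hessian-based high-probability region and a sparse-grid surrogate component:

```python
import math

def post(theta):
    # toy bimodal (unnormalized) posterior standing in for the PPDF
    return (math.exp(-(theta - 1.0)**2 / 0.02)
            + 0.3 * math.exp(-(theta + 1.0)**2 / 0.02))

def argmax_on(f, lo, hi, n=4001):
    # coarse grid search standing in for the global optimization operator
    best_x, best_v = lo, f(lo)
    for k in range(1, n):
        x = lo + (hi - lo) * k / (n - 1)
        v = f(x)
        if v > best_v:
            best_x, best_v = x, v
    return best_x, best_v

def find_modes(f, lo, hi, delta=0.1, radius=0.5, max_modes=10):
    modes = []                       # centers of the regions found so far
    g1 = None                        # height of the most significant mode
    for k in range(max_modes):
        def remainder(x):
            # exclude already-covered regions via characteristic functions
            if any(abs(x - c) <= radius for c in modes):
                return 0.0
            return f(x)
        x_k, g_k = argmax_on(remainder, lo, hi)
        if k == 0:
            g1 = g_k
        elif g_k / g1 < delta:       # significance ratio test of eq. (40)
            break
        modes.append(x_k)
    return modes

modes = find_modes(post, -3.0, 3.0)  # captures the modes near 1 and -1
```

The loop terminates either when the significance ratio falls below the threshold or when a maximum number of modes is reached, mirroring the termination criterion described above.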
4. Numerical Examples of Groundwater Reactive Transport Modeling
 To illustrate the effectiveness and efficiency of the iterative aSG-hSC method in building the surrogate system for the PPDF, it is applied to two synthetic examples of groundwater reactive transport modeling. The first example, adapted from Sun et al. , considers multispecies reactive transport with six random parameters. Since the five reactions involved in this example are linear, this example can be used to evaluate the computational efficiency and accuracy of the aSG-hSC approach in approximating high-dimensional posterior distributions with high-order hierarchical basis. The second example is related to reactive transport of uranium (VI) in a column experiment with four random parameters, which is revised from Kohler et al. . This example is more complicated, since it includes nonlinear reactions of surface complexation. For the same reactive transport model used in this study, Shi et al. [submitted manuscript, 2013] found that the PPDF of the model parameters is non-Gaussian and has multiple modes. This example therefore can be used to evaluate the computational efficiency and effectiveness of the iterative aSG-hSC method for a PPDF with multiple modes. To demonstrate that the aSG-hSC method is not limited to the Gaussian likelihood function, the informal likelihood function of the exponential type (Table 1) is used in the second numerical example. The aSG-hSC method is evaluated by comparing the results of aSG-hSC-based MCMC with those of the DREAM-based MCMC in approximating the PPDFs of model parameters and the PDFs of model predictions. Computational efficiency of aSG-hSC is evaluated from two perspectives: (1) the number of model executions required to obtain an estimate of the PPDF within a prescribed accuracy, and (2) the accuracy of the approximate PPDF for a given number of model executions.
These two criteria are complementary: the first addresses the situation when a large number of model executions is affordable, while the second addresses the situation when only a limited number is affordable. For the second numerical example, nonlinear regression is conducted using UCODE_2005 [Poeter et al., 2008] to estimate the local parameter optimum and quantify parameter uncertainty. Due to high model nonlinearity, the nonlinear regression cannot identify multiple modes on the PPDF of one parameter and cannot accurately quantify parameter uncertainty.
4.1. Case 1: Multispecies Reactive Transport
 This numerical example considers the transport of multiple reactive species coupled by a serial-parallel reaction network in a uniform flow field discussed in Sun et al. . As shown in Figure 6, species A has one child species B, and B has three child species C1, C2, and C3. The governing equations of the simultaneous transport and degradation of the five species involved in the serial-parallel reaction network are as follows:
where CA, CB, , , and are the concentrations of the five species A, B, C1, C2, and C3, respectively, t is the time, x is the spatial location in the domain [0,40], v is the constant flow velocity, D is the dispersion coefficient, kA, kB, , , and are the reaction rates of the species, and yB is the stoichiometric yield factor that describes the production of species B from its parent species A, and likewise for , , .
 Using the parameter values given by Sun et al. , synthetic data are generated by solving equation (42) at time t = 40 and at 10 points of , using the numerical code PHT3D [Prommer and Post, 2010]. A total of 50 concentrations are generated for the five species, and they are corrupted with 3% Gaussian random noise; the corrupted data are treated as measurements. In the DREAM-based MCMC simulation, the six parameters listed in Table 2 are considered as unknown parameters, namely the dispersion coefficient D and the logarithms of the five reaction rates, , , , , and . Their true values are listed in Table 2; the other parameters are fixed at their true values of .
Table 2. True Parameter Values and Searching Region, Γ, of the Parameters in Case 1
 The surrogate system is constructed using the aSG-hSC algorithm discussed in section 3.3. The large searching region Γ of the six parameters is listed in Table 2. The global optimization takes 1034 model executions to find the first maximum
of ( to guarantee that g is positive for numerical convenience). The corresponding is −7.9128. Computing the Hessian matrix using equations (5) and (6) requires 73 model executions. The inverse of the Hessian matrix is
Using equation (8) with leads to the prior region, , for . The diagonal entries of indicate that is significantly smaller than Γ. Therefore, building the surrogate system on the high-probability region can greatly reduce the computational cost compared to building it on the searching domain Γ. Subsequently, the adaptive sparse-grid interpolant in equation (35) is constructed on by setting , , L = 20 and . This is the first component of the surrogate system . The sparse-grid interpolant is constructed using the linear, quadratic, and cubic basis functions shown in Figure 2. The numbers of model executions needed for the three interpolants are 6760, 1909, and 1299, respectively, which are also the numbers of points of the three corresponding adaptive sparse grids.
 After constructing the first component , the second maximum of is obtained by conducting the global optimization on the remainder . The second round of optimization takes 1359 model executions to find
with . If one sets the significance tolerance in Figure 5 to , the significance ratio defined by equation (40)
is negligible. Having only one mode is not surprising, given the linear reactions. Therefore, there is no need to construct the surrogate component for the second mode and the iteration is terminated.
 With the surrogate systems, , constructed using the linear, quadratic, and cubic hierarchical bases, the PPDFs of the model parameters are estimated by conducting MCMC simulations. The DREAM-based MCMC simulation without the surrogate systems is also conducted; it is referred to as the conventional MCMC, and its results are used as the reference to evaluate the accuracy and efficiency of the three basis functions. All the MCMC simulations are conducted using the same searching domain Γ listed in Table 2. The prior distribution of each parameter is assumed to be uniform with the same bounds as the searching domain. Each MCMC simulation draws 60,000 parameter samples using three Markov chains, each of which evolves for 20,000 generations. Convergence of the Markov chains is examined using the Gelman-Rubin R statistic [Gelman et al., 1995], which indicates that the chains converge after 720, 840, 970, and 760 generations for the linear, quadratic, and cubic surrogates and the conventional MCMC, respectively. For simplicity, the first 1000 generations of each chain are discarded in all four simulations, and the remaining samples are used to estimate the PPDF.
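For reference, a minimal univariate version of the Gelman-Rubin R statistic (the common variance-ratio form; the synthetic chains below are illustrative, not the chains of this study) can be sketched as:

```python
import random

def gelman_rubin(chains):
    # univariate potential scale reduction factor R;
    # chains: list of equal-length sample sequences for one parameter
    m, n = len(chains), len(chains[0])
    means = [sum(c) / n for c in chains]
    grand = sum(means) / m
    B = n / (m - 1) * sum((mu - grand)**2 for mu in means)    # between-chain
    W = sum(sum((x - mu)**2 for x in c) / (n - 1)
            for c, mu in zip(chains, means)) / m              # within-chain
    var_hat = (n - 1) / n * W + B / n
    return (var_hat / W) ** 0.5

random.seed(0)
# three synthetic "converged" chains sampling the same Gaussian target
chains = [[random.gauss(0.0, 1.0) for _ in range(2000)] for _ in range(3)]
r = gelman_rubin(chains)
```

Values of R close to 1 indicate that the between-chain variance is small relative to the within-chain variance, i.e., the chains have mixed over the same target.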
 Figure 7 plots the marginal PPDFs of the six parameters. The black vertical lines represent the true values of the six parameters listed in Table 2. The red solid lines are the marginal PPDFs estimated by the conventional MCMC, and the dashed lines represent those estimated by the MCMC simulations based on the surrogate systems. The figure indicates that the MCMC results based on the surrogate systems constructed by our aSG-hSC method are close to those of the conventional MCMC. However, the surrogate-based MCMC needs significantly fewer model executions. In comparison with the 60,000 model executions of the conventional MCMC, the numbers of model executions for the surrogate-based MCMC are 9226, 4375, and 3765 for the linear, quadratic, and cubic surrogate systems, respectively, which consist of those for global optimization, calculation of the inverse of the Hessian matrix, and construction of the surrogate systems. For the surrogate-based MCMC simulations, drawing the 60,000 parameter samples requires no model executions, only negligible computational time for polynomial evaluation using the surrogate systems. The improvement in computational efficiency from using our surrogate systems is even more pronounced when more parameter samples are drawn in the MCMC simulation.
 The accuracy of the surrogate-based MCMC and the conventional MCMC is also compared by running the conventional MCMC with the same computational effort as the surrogate-based MCMC, i.e., using the number of model executions needed to construct the surrogate systems. The marginal PPDFs for each parameter based on the conventional MCMC with 9226, 4375, and 3765 samples are plotted in Figure 8 as dashed lines. Comparing Figures 7 and 8 indicates that, with the same number of model executions, the approximations in Figure 7 using our surrogate systems are more accurate than those in Figure 8 using the conventional MCMC, confirming the efficiency of our surrogate-based MCMC method.
 To compare the computational efficiency of the linear, quadratic, and cubic interpolants, Figure 9 plots their error decay with the number of interpolation points. To attain the same error, the cubic interpolant needs significantly fewer interpolation points than the linear and quadratic interpolants. This indicates that a surrogate system based on a high-order hierarchical basis (i.e., the cubic basis) is more efficient than one with a linear hierarchical basis. It suggests that, when computational resources are limited, a higher-order hierarchical basis is the better choice.
 Predictive performance of the four types of MCMC simulations is evaluated by using the parameter samples obtained above to predict the spatial distribution of the concentration of species C3 with a different velocity of at time and . For each parameter sample drawn in the conventional MCMC, PHT3D is run for predictions; for the samples drawn in the surrogate-based MCMC, the predictions are estimated based on the surrogate systems. The surrogate systems are built via in equation (35) by setting and . Figure 10a shows that the upper and lower bounds of the 95% credible intervals based on the conventional and surrogate-based MCMC are identical. Figure 10b plots the probability density functions of C3 at a fixed location and time obtained using the four types of MCMC simulations; the four density functions are almost identical except at the peak. However, the computational cost of prediction is dramatically different for the conventional and surrogate-based MCMC. While the conventional MCMC needs to run the model 57,000 times (the number of parameter samples), the surrogate-based MCMC only needs to run the model 1853, 1032, and 793 times to build the linear, quadratic, and cubic surrogate systems, respectively.
4.2. Case 2: Reactive Transport of Uranium (VI) in Column Experiment
 The second synthetic study is designed based on the uranium reactive transport modeling of Kohler et al. , who conducted seven column experiments in a well-characterized U(VI)-quartz-fluoride column system and simulated the experiments using seven alternative surface complexation models (C1–C7) with different numbers of functional groups and reactions. The models were calibrated against three column experiments (Experiments 1, 2, and 8) conducted under different experimental conditions, and the calibrated models were then used to predict the remaining four experiments (Experiments 3, 4, 5, and 7). Model C4 of Kohler et al.  is used in this study. As shown in Table 3, the model has two functional groups, called the weak site ( ) and the strong site ( ), respectively. The weak site is associated with one reaction, and the strong site with two reactions. The model has a total of four parameters, three of which are the formation constants of the three reactions, denoted as K1, K2, and K3. The fourth parameter is the fraction of the strong site, denoted as Site (the fraction of the weak site is calculated as 1 minus the fraction of the strong site). The base-10 logarithms of the parameters are listed in Table 3. In this study, following Kohler et al. , the concentration data are generated using the computer code RATEQ (developed by Curtis ) for the chemical conditions of Experiments 1, 2, and 8. The numbers of concentrations for Experiments 1, 2, and 8 are 39, 32, and 49, respectively. The synthetic data are corrupted by adding 3% random noise to the true concentration values.
Table 3. Surface Complexation Reactions, True Parameter Values, and Searching Region Γ of the Parameters in Case 2a
U(VI) Surface Reaction
Total site density used in this model is 1.3 M/L.
(Site) = –1.7104
 The PPDF of the four parameters, , , , and , is estimated using the 49 data of Experiment 8; the data of Experiments 1 and 2 are treated as prior data and are not used directly in the PPDF estimation. To demonstrate that our method is not limited to a Gaussian likelihood function, the informal likelihood function of the exponential type in Table 1 is used with the coefficient . The searching region Γ of the parameters is listed in Table 3. The prior distribution is assumed to be uniform for , , and . For , a bivariate Gaussian distribution is assumed, based on the results of calibrating the model against the data of Experiments 1 and 2. The optimum parameters obtained using UCODE_2005 [Poeter et al., 2008] with the searching regions listed in Table 3 are
for Experiments 1 and 2, respectively. While the optima of , , and are very close, the optima of are significantly different for Experiments 1 and 2, as also found by Shi et al. (submitted manuscript, 2013) for the same model. This suggests that the PPDF of may have at least two modes. This prior information is used for estimating the PPDF of (using the data of Experiment 8) by defining the bivariate Gaussian prior, , as
where the standard deviations, and , are obtained from the results of the local calibrations using UCODE_2005.
 The high-probability region is defined as follows. First, the global optimum of the objective function ( to guarantee that g is positive for numerical convenience) is estimated in the searching domain Γ. After 2309 model executions, the first maximum is found as
corresponding to . Calculation of the Hessian matrix using the formula in equations (5) and (6) takes 33 model executions. The inverse of the Hessian matrix is
With these results, the first high-probability region is defined using equation (8) with .
 Figure 11 illustrates the relation between the above-defined high-probability region and the searching region. To make visualization possible, is fixed at its optimum of −3.4077, and the high-probability region of the other three parameters is plotted as the gray region in Figure 11, which is transformed from the unit cube by rotation and dilation. The figure shows that the volume of the high-probability region is dramatically smaller than that of the searching region Γ given in Table 3. Nevertheless, the high-probability region is sufficiently large to cover all the MCMC samples obtained using DREAM around , which are plotted as the blue dots in Figure 11a. Based on the three-dimensional high-probability region, the adaptive sparse-grid interpolant is constructed using equation (35) by setting , , L = 20, and the tolerance . The interpolant is built using the linear, quadratic, and cubic basis functions, and the numbers of model executions needed for the three interpolants are 1577, 633, and 393, respectively. The sparse grid of the cubic basis function for parameters log , log , and log is shown in Figure 11b. Since the number of model executions needed to build an isotropic sparse grid is 6017, using the adaptive sparse-grid surrogate is more computationally efficient. Among the three basis functions, the cubic hierarchical basis is the most efficient and is thus used for the calculations below. Using the cubic basis function, the number of model executions needed to build the four-dimensional sparse grid, , for all the model parameters increases to 548.
 The second maximum of is obtained by conducting global optimization on . It takes 3131 model executions to find the second maximum of , i.e., , whose corresponding parameters are
It is similar to except for the value of log , which is close to the optimum value obtained using the data of Experiment 2. This is not surprising, because log has little influence on the data of Experiment 8. If one sets the significance tolerance (Figure 5), the significance ratio δ in equation (40) is
indicating that the optimum parameter set is a significant mode on the PPDF. Computing the Hessian matrix with 33 model executions and taking its inverse leads to
The high-probability region and the sparse grid for are developed in the same manner as for , and the number of needed model executions is 609.
 The iteration continues, and the third set of optimum parameters
is obtained after 2984 model executions, and . The significance ratio in equation (40)
is dramatically smaller than the user-specified , indicating that this mode of the PPDF is negligible in comparison with the other two modes, and . The iteration terminates, and the surrogate system using the cubic basis is used for the surrogate-based MCMC.
 Figure 12 plots the marginal posterior distributions of the four parameters and the 2-D contours of their combinations obtained using the DREAM- and surrogate-based MCMC. As in the first numerical experiment, a total of 60,000 parameter samples are drawn for each MCMC simulation using three Markov chains. The Gelman-Rubin R statistic indicates that the Markov chains converge after 600 and 420 samples for the DREAM- and surrogate-based MCMC, respectively. For simplicity, the first 600 samples are discarded in both simulations, and the remaining samples are used to estimate the PPDF. Figure 12 indicates that the MCMC results based on the surrogate systems constructed by our aSG-hSC method are almost identical to those obtained using DREAM. Considering that the numbers of model executions for the DREAM- and surrogate-based MCMC are 60,000 and 9647, respectively, for comparable accuracy, the surrogate-based MCMC is significantly more efficient.
 Predictive performance of the DREAM- and surrogate-based MCMC is evaluated by using the parameter samples obtained above to predict the breakthrough curve of Experiment 4 of Kohler et al. , with 118 measurements. As in Case 1, a cubic surrogate system is built at each predicted point, which costs 411 model executions. Figure 13a plots the upper and lower bounds of the 95% credible intervals for the predicted breakthrough curve obtained from the DREAM- and surrogate-based MCMC; the two sets of credible intervals are visually identical. Figure 13b plots the density functions of the concentration at a pore volume of 3.76 in the predicted breakthrough curve obtained from the two kinds of MCMC simulations; the two distributions are very close. While only 411 model executions are needed to build the surrogate system using the cubic basis function and to obtain the results in Figure 13, the DREAM-based MCMC requires 58,200 model executions.
 However, the number of model executions needed for the surrogate-based MCMC is still relatively large. One may ask whether the same results can be obtained using computationally frugal methods, such as nonlinear regression, considering that Lu et al. [2012b] and Shi et al.  showed that nonlinear regression and Bayesian methods may give similar results for quantifying parametric uncertainty, while nonlinear regression methods require only hundreds of model executions. To answer this question, nonlinear regression is conducted using UCODE_2005, which minimizes the sum of squared weighted residuals (SSWR) to estimate the local optimum of the parameters and calculate the parameter estimation covariance matrix. Since UCODE_2005 can incorporate prior information into the nonlinear regression, the prior density given in equation (48) is used in the UCODE_2005 optimization. The initial parameter values for the local optimization are selected randomly in the searching region, Γ, listed in Table 3. The calibrated parameter values are , which is very close to the second mode, , of the PPDF. If one adjusts the initial parameter values, the first mode, , of the PPDF may be obtained instead. However, it is impossible to obtain the two modes simultaneously, even though the two modes have similar densities (Figure 12). Therefore, the discussion here focuses on the current local optimum, which is close to . Its corresponding covariance matrix is
The variance terms indicate that, except for log , the parametric uncertainty is negligible for the other three parameters, which is incorrect based on Figure 12. In addition, the covariance terms do not accurately reflect the parameter correlations. For example, the covariance matrix indicates that the correlation between and is negative, whereas it is actually positive, as shown in Figure 12. The inconsistency is attributed to the nonlinearity of the model. Beale's measure of nonlinearity for this model, calculated using UCODE_2005, is 197.52, far larger than the threshold value of 0.39 [Hill and Tiedeman, 2007]. Therefore, the covariance matrix estimated above under linearity assumptions cannot accurately quantify parametric uncertainty. While there may be other methods comparable to the surrogate-based MCMC in accuracy and efficiency for parametric uncertainty quantification, identifying such methods and conducting a comprehensive comparison is beyond the scope of this study.
This paper presents a new adaptive sparse-grid high-order stochastic collocation (aSG-hSC) method to improve the computational efficiency of Bayesian inference for quantification of parametric uncertainty. The method is model independent and flexible enough to be used with any MCMC algorithm and likelihood function (formal or informal). This study tackles a challenging problem of groundwater reactive transport modeling: the high nonlinearity of reactive transport models makes it difficult to develop an accurate and efficient surrogate of the models and to capture significant modes of the parameter distributions. These problems are resolved by combining high-order hierarchical polynomial bases with a locally adaptive sparse-grid technique, which greatly reduces the computational cost of constructing the desired surrogate system in comparison with existing sparse-grid methods. To further reduce the computational cost of constructing a surrogate system for a parameter distribution with multiple modes, an iterative aSG-hSC algorithm is developed that uses optimization methods to find the modes sequentially. For each mode, a high-probability region is built, on which a component of the sparse grid is constructed. The high-probability regions are significantly smaller than the search region of the MCMC simulation, which is where the computational savings arise. The iterative aSG-hSC method is demonstrated using two numerical examples of groundwater reactive transport models. In both cases, the aSG-hSC method provides results almost identical to those of DREAM-based MCMC but requires a dramatically smaller number of model executions for estimating parameter distributions and quantifying predictive uncertainty. The first example involves only linear reactions and demonstrates that higher-order hierarchical basis functions are more efficient. The second example involves nonlinear reactions and is thus highly nonlinear.
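The mode-detection step of the iterative algorithm described above can be sketched as repeated local optimization with deduplication of converged optima. The function names, the bimodal toy density, and the distance tolerance below are illustrative assumptions for a minimal sketch, not the authors' implementation:

```python
import numpy as np
from scipy.optimize import minimize

def iterative_mode_search(neg_log_post, bounds, n_starts=30, tol=0.5, seed=0):
    """Sequentially locate distinct posterior modes by local optimization
    from random starting points; a converged optimum is kept as a new mode
    only if it lies farther than `tol` (Euclidean distance) from every mode
    already found. A sketch of the mode-detection idea only."""
    rng = np.random.default_rng(seed)
    lo, hi = np.array(bounds, dtype=float).T
    modes = []
    for _ in range(n_starts):
        x0 = rng.uniform(lo, hi)
        res = minimize(neg_log_post, x0, bounds=list(zip(lo, hi)))
        if res.success and all(np.linalg.norm(res.x - m) > tol for m in modes):
            modes.append(res.x)
    return modes

# Bimodal toy density: negative log of two Gaussian bumps at x = -2 and x = 2.
def neg_log_post(x):
    return -np.log(np.exp(-0.5 * np.sum((x + 2.0) ** 2))
                   + np.exp(-0.5 * np.sum((x - 2.0) ** 2)) + 1e-300)

modes = iterative_mode_search(neg_log_post, bounds=[(-5.0, 5.0)])
```

In the full algorithm each detected mode would then receive its own high-probability region and sparse-grid component; the sketch covers only the sequential mode search.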
Its parameter distributions are multimodal and non-Gaussian, and these features are well captured in the results obtained using the iterative aSG-hSC method. These features, however, cannot be captured by the nonlinear regression method investigated in this study, whose covariance estimate rests on linearity assumptions. The computational efficiency of the aSG-hSC method is critical to the practical application of Bayesian inference to time-consuming groundwater reactive transport modeling. Owing to the nonintrusive nature of the new method, it can be used with many models and sampling methods in hydrology and other fields.
As a surrogate method, the iterative aSG-hSC method has some limitations. First, its computational performance relies on the ability of optimization methods to find the modes. If executing the global optimization solver is computationally expensive, the efficiency of aSG-hSC deteriorates. If the optimization fails to find an optimum parameter set at a given iteration, a significant mode may be missed; in this case, one has to sacrifice computational efficiency and use more sparse-grid points. It is worth noting, however, that this challenge is not specific to our method but common to all numerical algorithms for uncertainty quantification and optimization, and it is expected that it can be resolved with advances in optimization techniques. In addition, since the Hessian matrix is used to determine the high-probability domain for each significant mode, finding the optimal value of the user-defined constant in equation (8) remains empirical at this moment. When the shape of a detected significant mode is extremely complicated, the commonly used value, e.g., , may not be appropriate to cover the high-probability region. In other words, the reduction of the bounds for building sparse grids may not be as significant as that shown in the numerical examples of this study. The major challenge lies in the nonsmoothness of the surface of the parameter distributions caused by the nonlinearity of groundwater reactive transport models; reducing nonlinearity may be a solution to the problems mentioned above.
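To make the role of the user-defined constant concrete, the following sketch shows one common way a high-probability box can be derived from the Hessian at a mode: under a local Gaussian approximation, the inverse Hessian of the negative log posterior acts as a covariance, and each parameter's bounds are set at the mode plus or minus a multiple of the corresponding standard deviation. The constant c = 3.0, the example mode, and the Hessian values are hypothetical choices for illustration, not the exact form of equation (8):

```python
import numpy as np

def high_probability_box(mode, hessian, c=3.0):
    """Axis-aligned high-probability region around a posterior mode.
    Under a local Gaussian approximation, the covariance is the inverse
    Hessian of the negative log posterior, so each parameter's bounds are
    mode_i +/- c * sqrt((H^{-1})_{ii}). The constant c plays the role of
    the user-defined constant; c = 3.0 is only an illustrative choice."""
    stdev = np.sqrt(np.diag(np.linalg.inv(hessian)))
    return mode - c * stdev, mode + c * stdev

# Example: a 2-parameter mode with correlated local curvature.
mode = np.array([0.5, -1.2])
H = np.array([[4.0, 1.0],
              [1.0, 2.0]])   # Hessian of -log posterior at the mode
lo, hi = high_probability_box(mode, H)
```

The sketch also makes the stated limitation visible: if the true high-probability region around a mode is strongly curved or non-elliptical, no single value of c yields a box that is both tight and covering, so the bound reduction relative to the full MCMC search region can be modest.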
G. Zhang was supported by the Advanced Scientific Computing Research (ASCR) program, Department of Energy, through the Householder Fellowship at ORNL. M. Ye was supported by the DOE Early Career Award DE-SC0008272. M. Gunzburger was supported by the US Air Force Office of Scientific Research under grant FA9550-11-1-0149. C. Webster was supported by the US Air Force Office of Scientific Research under grant 1854-V521-12 and was also sponsored by the Director's Strategic Hire Funds through the Laboratory Directed Research and Development (LDRD) Program of Oak Ridge National Laboratory (ORNL). ORNL is operated by UT-Battelle, LLC, for the United States Department of Energy under contract DE-AC05-00OR22725.