A Gibbs sampler for inequality-constrained geostatistical interpolation and inverse modeling



[1] Interpolation and inverse modeling problems are ubiquitous in environmental sciences. In many applications, the parameters being estimated or mapped have physical constraints, such as nonnegativity (e.g. concentration, hydraulic conductivity), solubility limits, censored data (e.g. due to dry wells or detection limits), and other physical boundaries or missing data. Geostatistical interpolation and inverse modeling techniques have often been applied for estimating such parameters, but these methods typically cannot enforce physical constraints. This paper describes a statistically rigorous and computationally efficient Gibbs sampler, a Markov chain Monte Carlo technique, based on an a priori truncated Gaussian distribution model, which allows for multiple and variable physical constraints to be enforced within a geostatistical framework. Sample interpolation and inverse modeling applications confirm that estimates, uncertainty bounds and conditional simulations reflect the specified constraints, leading to conclusions that are more consistent with the underlying conceptual model, and provide a more accurate measure of the posterior uncertainty of the parameters being estimated. In addition, especially in inverse modeling applications, a posteriori confidence bounds are narrower even in areas where constraints are not imposed. The method is applicable in multiple dimensions, for data with or without measurement error, and with any variogram model.

1. Introduction

[2] Interpolation and inverse modeling problems are ubiquitous in environmental sciences. Methods based on geostatistical approaches have been used extensively to address these problems in surface and groundwater hydrology. In many applications, the parameters being estimated have physical constraints. Examples from environmental applications include solubility limits, nonnegativity constraints on hydraulic conductivity or species concentrations, concentration ranges for contaminants below an instrument's detection limit, maximum hydraulic head constraints when sampling wells are not sufficiently deep to locate the water table, and/or other physical boundaries. Enforcing known constraints on parameter values would allow the resulting parameter estimates to be consistent with the physical structure of the system. In addition, enforcing constraints can remove biases introduced when these constraints are ignored, and can decrease the uncertainty of estimates. As described in further detail in section 2, traditional geostatistical approaches are limited in their ability to enforce such constraints.

[3] The objective of this paper is to present a statistically rigorous, geostatistically-based, interpolation and inverse modeling approach applicable to inequality-constrained data and parameters. The approach is to be applicable with any variogram or covariance model, in multiple dimensions, and with multiple, and possibly variable, constraints. The proposed approach is conceptually related to the approach developed by Michalak and Kitanidis [2003, 2005], which uses a linear variogram model, and is applicable to a single constant constraint, for a system where the modeled parameter varies as a function of a single variable (e.g. time). Whereas this existing approach is based on Reflected Brownian Motion [e.g. Karlin and Taylor, 1975], the approach presented here is a generalization that makes use of truncated Gaussian distributions.

2. Background

2.1. Constrained Interpolation and Inverse Modeling

[4] In environmental applications, methods for incorporating inequality constraints into interpolation and inverse problems have been explored for a few problems, most commonly to enforce nonnegativity for estimated parameters. Within the geostatistical literature, these developed methods have relied on (1) constraining kriging weights to be nonnegative, (2) using Lagrange multipliers or related tools to enforce inequality constraints on the estimates, and (3) applying data transformations. These approaches are briefly summarized below.

[5] Szidarovszky et al. [1987] and Deutsch [1996] presented approaches that constrain kriging weights to be nonnegative, which, by default, yields nonnegative estimates given nonnegative observations. In general, however, nonnegative kriging weights are not necessary for producing nonnegative estimates. In addition, these approaches are not applicable for multiple or variable constraints or censored data. Barnes and You [1992] present an alternative to this approach, which does not constrain the kriging weights, but only the estimates themselves. At locations where the estimated value would otherwise violate an inequality constraint, the estimate is set to the inequality bound, and an adjustment is calculated to the estimation variance. Both in this approach and those presented by Szidarovszky et al. [1987] and Deutsch [1996], if this variance is used to define symmetric confidence intervals for the estimated field, then these confidence intervals may not honor the inequality constraints. In addition, the approaches have not been extended for application to inverse problems.

[6] Journel [1986] proposed a soft kriging approach based on indicator kriging which can satisfy multiple constraints but requires the formulation of a set of internally consistent indicator covariance models, which can be difficult to derive in practice [Yoo and Kyriakidis, 2006]. The final uncertainties are approximated using a series of indicator values. The approach has also not been extended to inverse problems.

[7] Lagrange multipliers have been used extensively in estimation problems for equality and inequality constrained parameters. This method amounts to restricting a common Gaussian process by replacing the original objective function f(s) by the Lagrange function

$$L(\mathbf{s}, \boldsymbol{\nu}) = f(\mathbf{s}) + \sum_{i=1}^{k} \nu_i \left[ g_i(\mathbf{s}) - b_i \right]$$

where $\mathbf{s}$ must satisfy the constraints $g_i(\mathbf{s}) = b_i$ or $g_i(\mathbf{s}) \geq b_i$, $k$ is the total number of active constraints, and $\boldsymbol{\nu} = (\nu_1, \nu_2, \ldots, \nu_k)$ are Lagrange multipliers (see Michalak and Kitanidis [2003] for a more thorough discussion). This approach is effective at bounding best estimates, can be applied to multiple and variable constraints, and is valid for both interpolation and inverse problems. However, this approach is not well suited to assessing uncertainty bounds for the estimated parameters. This is because, if this approach is used in generating conditional realizations of the modeled process, the ensemble of realizations yields a finite probability for elements of $\mathbf{s}$ being equal to the constraint value, which is contrary to the definition of continuous random variables. A related approach, based on spline formalism, was presented in Dubrule and Kostov [1986].

[8] Data transformations are another popular method that has been applied to geostatistical and other estimation problems, especially for enforcing parameter nonnegativity. The approach involves using a mapping function to transform the original variable. Such methods have been applied both to geostatistical interpolation and inverse problems (e.g. Kitanidis and Shen [1996], Kitanidis [1997, p.70], Chilès and Delfiner [1999], Saito and Goovaerts [2000], Leuangthong and Deutsch [2004], and Diggle and Ribeiro [2007]). The most common method is the power transformation, which is defined as

$$\tilde{\mathbf{s}} = \frac{\mathbf{s}^{\kappa} - 1}{\kappa}$$

where $\mathbf{s}$ is the vector of values in the original domain, $\tilde{\mathbf{s}}$ is the transformed data vector, and $\kappa$ is a constant selected based on the application. The commonly used logarithmic transformation is included as a special case at the limit of $\kappa$ tending to zero. In general, data transformations lead to highly non-symmetric probability density functions for the estimated parameter values in the untransformed space, make linear inverse problems nonlinear, and present difficulties for measurements of zero (e.g. non-detects). These and other difficulties have been documented in Snodgrass and Kitanidis [1997], Walvoort and de Gruijter [2001], and Michalak and Kitanidis [2003], among others. In addition, traditional data transformations are only valid for enforcing a single upper or lower bound that is constant for all estimation times or locations.
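As a minimal illustration of the points above, the following Python sketch assumes the Box-Cox form of the power transformation, $(s^{\kappa} - 1)/\kappa$, which has the logarithm as its limit for $\kappa \to 0$. The function names are hypothetical. The last lines show why an exact zero (e.g. a non-detect) is problematic under the logarithmic special case.

```python
import numpy as np

def power_transform(s, kappa):
    """Power transform (assumed Box-Cox form); kappa = 0 gives the natural log."""
    s = np.asarray(s, dtype=float)
    if kappa == 0.0:
        return np.log(s)
    return (s ** kappa - 1.0) / kappa

def inverse_power_transform(s_t, kappa):
    """Back-transform to the original domain."""
    s_t = np.asarray(s_t, dtype=float)
    if kappa == 0.0:
        return np.exp(s_t)
    return (kappa * s_t + 1.0) ** (1.0 / kappa)

s = np.array([0.5, 1.0, 4.0])
# For small kappa the transform approaches log(s).
print(power_transform(s, 1e-6) - np.log(s))
# A measurement of exactly zero maps to -inf under the log transform.
with np.errstate(divide="ignore"):
    print(power_transform(np.array([0.0]), 0.0))
```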

[9] In a few instances, numerical sampling approaches have been proposed to address specific problems in spatial interpolation. Militino and Ugarte [1999] used expectation maximization to analyze censored spatial data. The approach involves “whitening” the available data, and using these transformed observations to estimate the censored values. Abrahamsen and Benth [2001] used a Monte Carlo approach together with data augmentation and prior information on the drift parameters describing the trend in the modeled data to apply inequality constraints in a kriging context. Both of these approaches assumed exact measurements, and are further reviewed in De Oliveira [2005]. De Oliveira [2005] developed an MCMC Bayesian model for modeling the spatial distribution of a parameter with censored observations, using a Metropolis-Hastings algorithm [Hastings, 1970]. Fridley and Dixon [2007] recently developed a similar algorithm focusing specifically on estimating observations that are below detection limits, and making use of both a Gibbs sampler and a Metropolis-Hastings step. Some of these sampling approaches assume error-free measurements, while others are designed for specific constraints on data values, or do not fully account for the additional uncertainty associated with censored measurements. In addition, none of these approaches have been extended for application to inverse problems.

[10] Recently, Michalak and Kitanidis [2003, 2005] developed a nonnegativity-enforcing interpolation and inverse modeling approach based on a Gibbs sampling algorithm. This approach provides nonnegative best estimates as well as confidence intervals. The method is described in more detail in section 2.2.

2.2. MCMC Methods in Hydrology

[11] Markov chain Monte Carlo (MCMC) approaches have been applied in hydrology to problems such as hydrologic model parameter optimization [e.g. Vrugt et al., 2003a, 2003b] and parameterization of rainfall-runoff models [e.g. Bates and Campbell, 2001; Campbell et al., 1999; Marshall et al., 2004]. The majority of these approaches have been based on the Metropolis or Metropolis-Hastings algorithms. The Metropolis algorithm involves the sequential generation of correlated conditional realizations, which are accepted or rejected based on their a posteriori probability density relative to that of the previous accepted realization. The Metropolis-Hastings algorithm is an extension of this approach for the case where the transition probability between successive realizations is not symmetric. A useful introduction to these approaches is presented in Chib and Greenberg [1995]. In hydrology, these approaches have primarily been applied to the estimation of a relatively small number of distinct variables representing different components of an examined system. A few other applications, such as parameter optimization for groundwater contaminant fate and transport models [e.g. Balakrishnan et al., 2003] and zonation estimation [Chen et al., 2006], have also been developed.

[12] The applications that are of interest in this paper involve the estimation of a spatially or temporally autocorrelated parameter field that can be described using a geostatistical variogram or covariance model. In this case, the state space, which includes each of the parameters to be estimated, is much larger relative to the applications listed above, with each location in discretized time or space represented by a random variable, yielding hundreds to millions of random variables to be characterized. A few applications of MCMC approaches to spatial processes have been presented in the literature. For example, Chen et al. [2004] presented an application of a Gibbs sampler for estimating the spatial distribution of iron concentrations using GPR tomographic data and borehole lithofacies logs. A Gibbs sampler is an approach that relies on sequentially sampling the marginal distribution of individual random variables. An overview of this general approach can be found in Casella and George [1992].

[13] Michalak and Kitanidis [2002, 2004a] developed a Metropolis-Hastings algorithm for enforcing nonnegativity in geostatistical inverse modeling. Michalak and Kitanidis [2003, 2005] developed a Gibbs sampling approach for enforcing nonnegativity using a prior probability density function based on the method of images applied to reflected Brownian motion [Karlin and Taylor, 1975]. This approach yields a prior pdf that is the sum of a Gaussian distribution and its reflection about the constraint boundary. This method eliminates some approximations necessary in the earlier approach, and the method was applied in an interpolation context to contaminant load estimation in the North Fork of the Humboldt River, and in an inverse modeling context to contaminant source identification at the Dover Air Force Base. The approach uses the method of images to define a marginal probability density function that is defined only for the nonnegative portion of the parameter space. This approach provides a rigorous statistical framework for defining a probability density function that constrains parameter values to be nonnegative at each estimation point, yielding nonnegative best estimates as well as confidence intervals. Given its reliance on the method of images, the approach is applicable for parameters that vary as a function of a single variable (e.g. time). Furthermore, the Brownian motion model implies that the modeled process can be represented using a linear variogram. Fienen et al. [2004] applied the approach to vertical deconvolution of hydraulic conductivities.

2.3. Use of Truncated Normal Distribution for Statistical Estimation

[14] Applications involving constrained estimation, including nonnegativity, other threshold constraints, and censored data, abound in the statistical literature. One of the most common approaches to enforcing inequality constraints has been through the implementation of truncated distributions. Several of the implemented algorithms make use of the Gibbs sampling algorithm, described in section 2.2. Smith and Roberts [1993] provide a review of such applications, with those most relevant to the current work involving constrained parameter models, missing data, and censored data problems. Gelfand et al. [1992] also describe Gibbs sampling for constrained parameter and truncated data problems, specifically in the context of Bayesian analysis. In the context of mining and petroleum engineering, a Gibbs sampler with a truncated Gaussian distribution has been applied for generating individual simulations of sampled constrained processes [Freulon and de Fouquet, 1993; Freulon, 1994], and this approach has also been described in Lantuéjoul [2002] and Chilès and Delfiner [1999]. More recently, this approach has been used to delineate geologic facies [Armstrong et al., 2003]. In these applications, the emphasis is on generating a single representative map of the sampled process, and not on assessing the uncertainty associated with the estimated process, or exploring the impact of constraints in inverse modeling applications.

[15] Overall, although the use of the Gibbs sampler for sampling truncated distributions as a solution to constrained problems is well established in the statistical literature, the applicability of such a method for estimating uncertainty in both geostatistical interpolation and inverse modeling is presented for the first time in this work.

3. Model Development

[16] Bayes' rule states that the posterior pdf of a state vector s given an observation vector z is proportional to the likelihood of the data given the state, times the prior pdf of the state. Symbolically:

$$p(\mathbf{s}|\mathbf{z}) \propto p(\mathbf{z}|\mathbf{s})\, p(\mathbf{s})$$

In this context, prior and posterior are with respect to using the data $\mathbf{z}$, and the state vector $\mathbf{s}$ represents the parameter field under estimation. If we want to enforce constraints of the form $\mathbf{g}(\mathbf{s}) \geq \mathbf{b}$, we modify the probability distribution by multiplying it by Heaviside functions corresponding to the desired constraints, effectively truncating the distribution at the constraint boundary, and rescaling the remaining portion of the probability density function to integrate to unity:

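$$p(\mathbf{s}|\mathbf{z}) \propto p(\mathbf{z}|\mathbf{s}) \, \frac{p'(\mathbf{s}) \prod_{i=1}^{k} H\left[ g_i(\mathbf{s}) - b_i \right]}{\int p'(\mathbf{s}) \prod_{i=1}^{k} H\left[ g_i(\mathbf{s}) - b_i \right] d\mathbf{s}} \tag{4}$$

$$H(x) = \begin{cases} 1, & x \geq 0 \\ 0, & x < 0 \end{cases}$$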

The inclusion of the Heaviside function in the integral in the denominator of the pdf ensures that, at each point in the discretized parameter field $\mathbf{s}$, the pdf still integrates to unity, which amounts to truncating the prior pdf at the inequality constraints. This approach is theoretically applicable with any probability density function, but is used here to truncate a Gaussian prior. Note that this setup can be used for specifying both lower and upper limits on parameter values. In addition, because both $\mathbf{g}$ and $\mathbf{b}$ are vectors, different upper and/or lower bounds can be specified for each component of the vector $\mathbf{s}$, such that the constraints can vary in space or in time. For example, for the simple case of a nonnegative parameter with an upper limit $s_{\max}$ (e.g. a solubility limit), $k = 2$ and

$$g_1(\mathbf{s}) = \mathbf{s}, \quad b_1 = 0; \qquad g_2(\mathbf{s}) = -\mathbf{s}, \quad b_2 = -s_{\max}$$

More complex examples of constraints are described in the applications presented in section 5.
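The truncation and rescaling can be sketched for a single scalar value. The Python snippet below is an illustration only: it assumes a Gaussian unconstrained pdf, writes the two-sided bound in the form $g(s) \geq b$ with $g_1(s) = s$, $b_1 = 0$ and $g_2(s) = -s$, $b_2 = -s_{\max}$, and uses hypothetical names and a hypothetical value for `s_max`.

```python
import math

s_max = 1.5  # hypothetical upper limit (e.g. a solubility limit)

# Constraints written as g(s) >= b, one (g, b) pair per constraint.
constraints = [(lambda s: s, 0.0), (lambda s: -s, -s_max)]

def heaviside(x):
    return 1.0 if x >= 0.0 else 0.0

def indicator(s):
    """Product of Heaviside functions: 1 inside the allowed range, 0 outside."""
    return math.prod(heaviside(g(s) - b) for g, b in constraints)

def normal_pdf(x, mean, var):
    return math.exp(-0.5 * (x - mean) ** 2 / var) / math.sqrt(2.0 * math.pi * var)

def normal_cdf(x, mean, var):
    return 0.5 * (1.0 + math.erf((x - mean) / math.sqrt(2.0 * var)))

def truncated_pdf(s, mean, var):
    """Gaussian pdf truncated to [0, s_max] and rescaled to integrate to one."""
    mass = normal_cdf(s_max, mean, var) - normal_cdf(0.0, mean, var)
    return indicator(s) * normal_pdf(s, mean, var) / mass

print(truncated_pdf(-0.1, 0.2, 1.0), truncated_pdf(0.5, 0.2, 1.0))
```

Inside the allowed range the truncated density is larger than the unconstrained one, because the probability mass removed outside the bounds is redistributed by the rescaling.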

[17] In geostatistical approaches, the prior represents the deviation of the unknown vector from a model of the trend. As such, if we assume a truncated multi-Gaussian distribution, we define the unconstrained portion of the prior probability of the parameter values as

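$$p'(\mathbf{s}) = \frac{1}{\sqrt{(2\pi)^m |\mathbf{Q}|}} \exp\left[ -\frac{1}{2} (\mathbf{s} - \mathbf{X}\boldsymbol{\beta})^T \mathbf{Q}^{-1} (\mathbf{s} - \mathbf{X}\boldsymbol{\beta}) \right] \tag{7}$$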

where $\mathbf{s}$ is an $m \times 1$ state vector obtained from the discretization of the unknown parameter distribution that we wish to estimate, $\mathbf{X}\boldsymbol{\beta}$ is the model of the trend, where $\mathbf{X}$ is an $m \times p$ matrix of drift functions and $\boldsymbol{\beta}$ is a $p \times 1$ vector of unknown drift coefficients, $\mathbf{Q}$ is the $m \times m$ prior covariance matrix based on the geostatistical model of the parameter distribution, and $|\cdot|$ denotes the matrix determinant. In the case of inverse modeling, $\mathbf{Q}$ is the prior covariance matrix of the estimated state vector. In the case of interpolation, $\mathbf{Q}$ represents the covariance among all estimation and measurement locations. Given that the method, as presented in this work, is based on a prior defined as a truncated Gaussian distribution, the method should only be applied for datasets where a multivariate Gaussian distribution would be appropriate, were it not for the presence of constraints.

[18] The likelihood of the observations can be expressed as:

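$$p(\mathbf{z}|\mathbf{s}) \propto \exp\left[ -\frac{1}{2} \left( \mathbf{z} - \mathbf{h}(\mathbf{s}, \boldsymbol{\theta}) \right)^T \mathbf{R}^{-1} \left( \mathbf{z} - \mathbf{h}(\mathbf{s}, \boldsymbol{\theta}) \right) \right]$$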

where $\mathbf{z}$ is an $n \times 1$ vector of observations. When the system is underdetermined, which is the case of most interest for the application of geostatistical approaches, $m > n$. The vector $\boldsymbol{\theta}$ contains other parameters needed by the model function $\mathbf{h}(\mathbf{s}, \boldsymbol{\theta})$. We assume that the measurement error $\boldsymbol{\varepsilon} = \mathbf{z} - \mathbf{h}(\mathbf{s}, \boldsymbol{\theta})$ has zero mean and known covariance matrix $\mathbf{R}$. The measurement errors are typically assumed to be uncorrelated, yielding a diagonal matrix $\mathbf{R}$, with the measurement error variances $\sigma_{R,1}^2$ to $\sigma_{R,n}^2$ on the diagonal. Note that the measurement error encompasses both the actual observation error when data are collected, and any inaccuracies inherent in the physical or conceptual model used to represent the problem. If the dependence of observations is linear in the unknown parameter $\mathbf{s}$, then:

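$$p(\mathbf{z}|\mathbf{s}) \propto \exp\left[ -\frac{1}{2} (\mathbf{z} - \mathbf{H}\mathbf{s})^T \mathbf{R}^{-1} (\mathbf{z} - \mathbf{H}\mathbf{s}) \right] \tag{9}$$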

where $\mathbf{H}$ is an $n \times m$ sensitivity matrix, with $H_{ij} = \partial z_i / \partial s_j$. For interpolation applications, $\mathbf{H}$ simply maps the measurement locations onto the unknown state vector, and is therefore made up of ones and zeros. For inverse modeling applications, $\mathbf{H}$ represents the sensitivity of each observation to each element of the state vector, as determined by a physical model of the system (e.g. a groundwater transport model). The method, as presented here for inverse modeling applications, assumes that the forward model is indeed linear. Although the presented approach could also be implemented within an iterative solution to a non-linear inverse problem, the computational feasibility of such an application will be the topic of future research.
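For the interpolation case, the structure of $\mathbf{H}$ and the linear Gaussian log-likelihood can be sketched as follows. This is an illustrative Python sketch, assuming that each measurement coincides with one point of the estimation grid; the function names and the example values are hypothetical.

```python
import numpy as np

def interpolation_H(meas_idx, m):
    """n x m matrix of ones and zeros mapping grid points onto measurements."""
    n = len(meas_idx)
    H = np.zeros((n, m))
    H[np.arange(n), meas_idx] = 1.0
    return H

def log_likelihood(z, H, s, R):
    """Log of the linear Gaussian likelihood, up to an additive constant."""
    r = z - H @ s
    return -0.5 * r @ np.linalg.solve(R, r)

m = 5
H = interpolation_H([0, 3], m)            # measurements at grid points 0 and 3
s = np.array([1.0, 2.0, 3.0, 4.0, 5.0])   # hypothetical state vector
z = np.array([1.5, 4.0])                  # hypothetical observations
R = np.diag([0.25, 0.25])                 # uncorrelated measurement errors
print(H @ s, log_likelihood(z, H, s, R))
```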

4. Conditional Realizations and Estimation of Uncertainty

[19] One of the advantages of using a stochastic approach to interpolation and inverse problems is that physically significant confidence intervals and conditional realizations can be obtained in addition to a best estimate of the parameter values. In the case of classical linear geostatistical methods, the posterior pdf of the unknown parameters is a multivariate Gaussian distribution, and can therefore easily be sampled.

[20] The constrained posterior pdf in equation (4) is not Gaussian, and therefore does not lend itself to straightforward computation of confidence intervals or generation of conditional realizations. Therefore, a Markov chain Monte Carlo (MCMC) method is used to obtain conditional realizations from the posterior pdf. Ensemble properties of conditional realizations can then be used to infer other statistics of the estimated parameters, such as a measure of central tendency and percentiles, which can be used as a measure of the uncertainty associated with the estimate. In the discussion that follows, the median of the conditional realizations is used as the measure of central tendency, because it represents the estimated value for which there is an equal probability of the true parameter value being above or below the estimate. The mean can clearly also be used as a measure of central tendency, depending on the application.
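Once an ensemble of conditional realizations is available, the median and percentile bounds follow directly from the ensemble. The Python sketch below is illustrative only; the function name and the synthetic ensemble are hypothetical, and each row of `realizations` is assumed to hold one realization of the discretized field.

```python
import numpy as np

def summarize_ensemble(realizations, level=95.0):
    """Median and central percentile bounds at each point of the field."""
    tail = (100.0 - level) / 2.0
    lower, median, upper = np.percentile(
        realizations, [tail, 50.0, 100.0 - tail], axis=0
    )
    return median, lower, upper

# Synthetic ensemble: 101 realizations of a field with 2 points.
values = np.arange(101.0)
realizations = np.column_stack([values, values[::-1]])
median, lower, upper = summarize_ensemble(realizations)
print(median, lower, upper)
```

Because the bounds are percentiles of the sampled values rather than a symmetric interval around the estimate, they cannot extend past a constraint that every realization honors.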

[21] MCMC methods allow for the sampling of probability density functions in multiple dimensions with computational effort that is manageable relative to performing the multi-dimensional integrations that would otherwise be required. The dimensionality of the posterior pdf is equal to the number of points in the discretized parameter field, and can therefore easily be on the order of thousands to millions.

[22] One of the methods falling into the MCMC category is the Gibbs sampling algorithm. In this approach, conditional realizations are generated by sequentially sampling the marginal (i.e. 1-dimensional) probability density function at each point in the discretized parameter field, while holding the values at all other points constant. This marginal pdf is defined using the most recent conditional realization available at each of the other points in the discretized parameter space. It can be shown that, once the chain has converged, the realizations resulting from this process are equally likely realizations from the full multi-dimensional posterior pdf (see, for example, Casella and George [1992]).

4.1. Description of Constrained Geostatistical Gibbs Sampling (CGGS) Algorithm

[23] This section describes the Gibbs sampling algorithm as implemented in the constrained interpolation and inverse modeling method. We define the $l$-th conditional constrained realization as $\mathbf{s}_{cc,l}$. In this context, a conditional realization is one that has been conditioned on the data $\mathbf{z}$. The chain can be initialized with any realization that has a non-zero posterior probability. The Gibbs sampler proceeds as follows (see, for example, Casella and George [1992] for a more detailed discussion):

1. Set initial values $\mathbf{s}_{cc,0} = (s_{uc,0}(x_1), s_{uc,0}(x_2), \ldots, s_{uc,0}(x_m))^T$ and initialize the iteration counter of the chain $l = 1$.

2. Obtain a new conditional constrained realization $\mathbf{s}_{cc,l} = (s_{cc,l}(x_1), s_{cc,l}(x_2), \ldots, s_{cc,l}(x_m))^T$ from $\mathbf{s}_{cc,l-1}$ through successive generation of values from the marginal pdf at each point (note the use of counter values $l$ and $l - 1$):

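$$\begin{aligned}
s_{cc,l}(x_1) &\sim p\left( s_1 \,|\, s_{cc,l-1}(x_2), s_{cc,l-1}(x_3), \ldots, s_{cc,l-1}(x_m), \mathbf{z} \right) \\
s_{cc,l}(x_2) &\sim p\left( s_2 \,|\, s_{cc,l}(x_1), s_{cc,l-1}(x_3), \ldots, s_{cc,l-1}(x_m), \mathbf{z} \right) \\
&\;\;\vdots \\
s_{cc,l}(x_m) &\sim p\left( s_m \,|\, s_{cc,l}(x_1), s_{cc,l}(x_2), \ldots, s_{cc,l}(x_{m-1}), \mathbf{z} \right)
\end{aligned}$$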

3. Change counter l to l + 1 and return to step 2 until convergence is reached.
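The three steps above can be sketched generically in Python. This is not the CGGS implementation itself: the example is a toy bivariate standard normal with correlation `rho`, truncated to nonnegative values, chosen only because its one-dimensional conditionals are known in closed form. All names and values are hypothetical.

```python
import numpy as np

def gibbs_chain(s0, sample_conditional, n_iter, rng):
    """Run n_iter sweeps; each sweep updates every point in turn (step 2)."""
    s = np.array(s0, dtype=float)
    chain = np.empty((n_iter, s.size))
    for l in range(n_iter):
        for i in range(s.size):
            # Points before i already hold values from sweep l,
            # points after i still hold values from sweep l - 1.
            s[i] = sample_conditional(i, s, rng)
        chain[l] = s
    return chain

rho = 0.8  # hypothetical correlation of the toy example

def nonnegative_conditional(i, s, rng):
    """Draw s_i given the other point, rejecting negative values."""
    other = s[1 - i]
    while True:
        x = rng.normal(rho * other, np.sqrt(1.0 - rho ** 2))
        if x >= 0.0:
            return x

rng = np.random.default_rng(0)
chain = gibbs_chain([1.0, 1.0], nonnegative_conditional, 2000, rng)
kept = chain[100:]  # discard an initial ramp-up portion of the chain
print(kept.mean(axis=0))
```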

[24] When convergence is reached, the resulting realization $\mathbf{s}_{cc,l}$ is a conditional constrained realization from the full posterior pdf. Steps 2 and 3 can then be repeated to obtain additional conditional realizations. In the presented applications, convergence is evaluated by tracking the values of the two components (likelihood and prior) of the posterior probability of the realizations. When the running average of both components stabilizes, convergence has been reached. The number of realizations discarded during this ramp-up portion of the Markov chain is listed in section 5 for each of the presented applications. Once the realizations are representative of the probability space of the estimated parameters, the chain is run until the probability space has been appropriately sampled. The starting realization and the length of the Markov chain are also listed in section 5 for each of the presented applications. The reader is referred to the original work by Geman and Geman [1984], and subsequent publications by Casella and George [1992], Gamerman [1997], Gelman et al. [1995], and Robert and Casella [2004] for a more thorough discussion of convergence in the context of the Gibbs sampler.

4.2. Derivation of Marginal Posterior Probability Density Function

[25] In order to apply the Gibbs sampling algorithm as described in the previous section, a marginal pdf from which it is possible to draw samples of $\mathbf{s}$ at a single point $x_i$ is needed. The problem is broken into two components, first presenting the marginal likelihood, then the marginal prior.

4.2.1. Marginal Likelihood of $s_i$

[26] The likelihood of the measurements, defined in equation (9), can be expressed as [Michalak and Kitanidis, 2003]:

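$$p(\mathbf{z}|\mathbf{s}) \propto \prod_{j=1}^{n} \exp\left[ -\frac{\left( z_j - \sum_{k=1}^{m} H_{jk} s_k \right)^2}{2 \sigma_{R,j}^2} \right]$$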

Therefore, the marginal likelihood for a single point in the discretized parameter field s, if values at all other points are held constant, is:

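$$p(\mathbf{z}|s_i) \propto \prod_{j=1}^{n} \exp\left[ -\frac{\left( z_j - \sum_{k \neq i} H_{jk} s_k - H_{ji} s_i \right)^2}{2 \sigma_{R,j}^2} \right]$$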

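Combining the $n$ Gaussian distributions, we obtain

$$p(\mathbf{z}|s_i) \propto \exp\left[ -\frac{(s_i - \mu_L)^2}{2 \tau_L} \right]$$

$$\mu_L = \tau_L \sum_{j=1}^{n} \frac{H_{ji} \left( z_j - \sum_{k \neq i} H_{jk} s_k \right)}{\sigma_{R,j}^2}, \qquad \tau_L = \left[ \sum_{j=1}^{n} \frac{H_{ji}^2}{\sigma_{R,j}^2} \right]^{-1}$$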

[27] In the case of interpolation

$$H_{ji} = \begin{cases} 1, & x_{s,i} = x_{z,j} \\ 0, & \text{otherwise} \end{cases}$$

yielding $\mu_L = z_j$ if the $i$th estimation point $x_{s,i}$ is at the $j$th measurement point $x_{z,j}$, and $p(\mathbf{z}|s_i) \propto c$, where $c$ is a constant, otherwise. For the special case of interpolation with no measurement error, $p(\mathbf{z}|s_i)$ collapses to a point mass at $s_i = z_j$ at a measurement point; in other words, the estimate at a measurement location is fixed at the measurement value.

[28] Note that the equations presented in this section have been written under the common assumption that measurement errors are independent, as described in section 3. The presented approach can also be applied with correlated errors, modifying the presented equations to reflect the resulting off-diagonal terms in R, in a manner analogous to the way in which the off-diagonal terms in Q are treated in the following section.
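Under the independent-error assumption, completing the square in $s_i$ across the $n$ Gaussian terms gives a mean and variance for the marginal likelihood. The Python sketch below illustrates that calculation; the function name is hypothetical, `sigma2_R` is assumed to hold the diagonal of $\mathbf{R}$, and `None` is returned where no observation is sensitive to $s_i$ (a flat likelihood).

```python
import numpy as np

def marginal_likelihood_params(i, s, z, H, sigma2_R):
    """Mean and variance of the Gaussian marginal likelihood of s[i]."""
    h_i = H[:, i]
    precision = np.sum(h_i ** 2 / sigma2_R)
    if precision == 0.0:
        return None  # no observation depends on s[i]: constant likelihood
    # Residual of each observation with the contribution of s[i] removed.
    resid = z - H @ s + h_i * s[i]
    tau_L = 1.0 / precision
    mu_L = tau_L * np.sum(h_i * resid / sigma2_R)
    return mu_L, tau_L

# Interpolation example: measurements at grid points 1 and 3.
H = np.zeros((2, 4))
H[0, 1] = 1.0
H[1, 3] = 1.0
z = np.array([7.0, 9.0])
sigma2_R = np.array([0.2, 0.3])
s = np.array([1.0, 2.0, 3.0, 4.0])
print(marginal_likelihood_params(1, s, z, H, sigma2_R))
print(marginal_likelihood_params(0, s, z, H, sigma2_R))
```

In the interpolation example the mean at a measured point equals the measurement and the variance equals the measurement error variance, while an unmeasured point has a flat likelihood.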

4.2.2. Marginal Prior of $s_i$

[29] The prior probability density function of the discretized parameter field $\mathbf{s}$ was defined in equations (4) and (7). The unconstrained portion of the marginal prior pdf of a single point in the discretized parameter field is therefore $p'(s_i|\mathbf{s})$, where, in this case, $\mathbf{s} = \{s_1, \ldots, s_{i-1}, s_{i+1}, \ldots, s_m\}$. This portion of the marginal prior can be rearranged to be

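$$p'(s_i|\mathbf{s}) \propto \exp\left[ -\frac{1}{2} \sum_{j=1}^{m} \sum_{k=1}^{m} \left( s_j - (\mathbf{X}\boldsymbol{\beta})_j \right) (\mathbf{Q}^{-1})_{j,k} \left( s_k - (\mathbf{X}\boldsymbol{\beta})_k \right) \right]$$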

where $(\mathbf{Q}^{-1})_{j,k}$ refers to the $j,k$-th component of $\mathbf{Q}^{-1}$, not the inverse of the $j,k$-th component of $\mathbf{Q}$. This expression is the probability density function of $s_i$, given known values for all other elements of $\mathbf{s}$. This is equivalent to a standard kriging problem where all elements of $\mathbf{s}$ except for $s_i$ are treated as the measurement locations, and $s_i$ is the point at which an estimate is needed. The mean and variance of the distribution of $s_i$ define a normal distribution:

$$p'(s_i|\mathbf{s}) \propto \exp\left[ -\frac{(s_i - \mu_P)^2}{2 \tau_P} \right]$$

where the mean $\mu_P$ and variance $\tau_P$ are obtained as in the solution of a regular universal kriging system of equations, holding all measurement and other estimation points constant, and estimating at a single point:

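$$\begin{bmatrix} \mathbf{Q}_{j,j} & \mathbf{X}_j \\ \mathbf{X}_j^T & \mathbf{0} \end{bmatrix} \begin{bmatrix} \boldsymbol{\lambda} \\ \boldsymbol{\nu} \end{bmatrix} = \begin{bmatrix} \mathbf{Q}_{j,i} \\ \mathbf{X}_i^T \end{bmatrix} \tag{17}$$

which yields the kriging weights $\boldsymbol{\lambda}$ and the Lagrange multipliers $\boldsymbol{\nu}$. These parameters define the mean and variance of the estimate of $s_i$ as:

$$\mu_P = \boldsymbol{\lambda}^T \mathbf{s}_j, \qquad \tau_P = Q_{i,i} - \boldsymbol{\lambda}^T \mathbf{Q}_{j,i} - \mathbf{X}_i \boldsymbol{\nu}$$

where $\mathbf{s}_j = \mathbf{s}_{k \neq i}$ includes the most recent sampled values at all other estimation points, $\mathbf{Q}_{j,i} = \mathbf{Q}_{k \neq i, i}$ is the covariance between the current estimation point and all other estimation points, $\mathbf{Q}_{j,j}$ is the covariance among those other points, and $\mathbf{X}_j$ and $\mathbf{X}_i$ are the corresponding rows of $\mathbf{X}$. For interpolation, the measurement locations are included as estimation points in $\mathbf{Q}$. Equivalently, the influence of the trend $\mathbf{X}\boldsymbol{\beta}$ can first be removed, and these statistics can then instead be represented as: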

equation image


equation image

[30] Note that obtaining $\mu_P$ and $\tau_P$ at each point requires the solution of $m$ systems of $m$ linear equations, which are then used for each realization. However, the kriging weights and Lagrange multipliers are only a function of the arrangement of the measurements and estimation locations, not of the values of the samples or realizations at these locations (see equation (17)). Therefore, these systems of equations need only be solved a single time, and not for every realization, yielding considerable computational savings.
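The one-time computation of the kriging weights can be sketched as follows. This Python sketch is illustrative only: it assumes a universal kriging system in covariance form, with the covariance among the other points bordered by the drift matrix, and it uses a hypothetical 1-D grid with a constant drift. Function names are hypothetical.

```python
import numpy as np

def precompute_kriging(Q, X):
    """Solve one kriging system per point; depends on Q and X only."""
    m, p = X.shape
    weights = np.zeros((m, m - 1))
    variances = np.zeros(m)
    for i in range(m):
        others = np.delete(np.arange(m), i)
        A = np.zeros((m - 1 + p, m - 1 + p))
        A[:m - 1, :m - 1] = Q[np.ix_(others, others)]
        A[:m - 1, m - 1:] = X[others]
        A[m - 1:, :m - 1] = X[others].T
        rhs = np.concatenate([Q[others, i], X[i]])
        sol = np.linalg.solve(A, rhs)
        lam, nu = sol[:m - 1], sol[m - 1:]
        weights[i] = lam
        variances[i] = Q[i, i] - lam @ Q[others, i] - X[i] @ nu
    return weights, variances

def prior_mean(i, s, weights):
    """Marginal prior mean at point i from the current values elsewhere."""
    return weights[i] @ np.delete(s, i)

x = np.arange(6.0)                                    # hypothetical 1-D grid
Q = np.exp(-np.abs(x[:, None] - x[None, :]) / 3.0)    # exponential covariance
X = np.ones((6, 1))                                   # constant (unknown) mean
weights, variances = precompute_kriging(Q, X)
print(prior_mean(2, np.full(6, 4.0), weights), variances[2])
```

Only `prior_mean` needs to be evaluated inside the Gibbs sweep, since the weights and variances do not change from one realization to the next.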

4.2.3. Marginal Posterior of $s_i$

[31] Combining the marginal likelihood and prior, the full marginal posterior pdf of $s_i$ can be expressed, to within a normalizing constant, as:

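$$p(s_i|\mathbf{s}, \mathbf{z}) \propto \exp\left[ -\frac{(s_i - \mu_L)^2}{2 \tau_L} \right] \exp\left[ -\frac{(s_i - \mu_P)^2}{2 \tau_P} \right] \prod_{j=1}^{k} H\left[ g_j(\mathbf{s}) - b_j \right]$$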

which can be written as:

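$$p(s_i|\mathbf{s}, \mathbf{z}) \propto \exp\left[ -\frac{(s_i - \mu)^2}{2 \tau} \right] \prod_{j=1}^{k} H\left[ g_j(\mathbf{s}) - b_j \right]$$

$$\mu = \frac{\mu_L \tau_P + \mu_P \tau_L}{\tau_L + \tau_P}, \qquad \tau = \frac{\tau_L \tau_P}{\tau_L + \tau_P}$$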

This general formulation can be applied to include constraints in kriging with no measurement error (e.g. ordinary kriging, universal kriging, kriging with a trend, etc.), kriging with measurement error (a.k.a. continuous-part kriging), and linear geostatistical inverse modeling.
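The combination of the two Gaussian components follows the standard product-of-Gaussians rule, in which precisions (inverse variances) add. A minimal Python sketch, with a hypothetical function name, might look as follows; a flat likelihood is passed as `None`, and a zero likelihood variance represents an exact measurement.

```python
def combine_marginals(prior, likelihood=None):
    """Mean and variance of the untruncated marginal posterior of s_i."""
    mu_P, tau_P = prior
    if likelihood is None:
        # No observation informs this point: the posterior equals the prior.
        return mu_P, tau_P
    mu_L, tau_L = likelihood
    total = tau_L + tau_P
    mu = (mu_L * tau_P + mu_P * tau_L) / total
    tau = tau_L * tau_P / total
    return mu, tau

print(combine_marginals((3.0, 1.0), (1.0, 1.0)))
print(combine_marginals((3.0, 1.0), (1.0, 0.0)))
print(combine_marginals((3.0, 1.0)))
```

With equal variances the posterior mean is the average of the two means and the variance is halved; with a zero likelihood variance the value is pinned to the measurement.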

4.2.4. Sampling of Marginal Posterior of $s_i$

[32] We are interested in sampling from a normal distribution that has been truncated at the inequality constraints. As such, we simply draw random samples from the distribution $N(\mu, \tau)$ until we obtain one that is in the allowable range for $s_i$. Once we obtain an admissible sample, this realization of $s_i$ is used as the next conditional constrained realization at point $x_i$, denoted $s_{cc,l}(x_i)$. The computational efficiency of this sampling approach can further be improved by implementing a more efficient algorithm for obtaining a sample from a one-dimensional truncated normal distribution. Robert [1995], for example, presents an efficient method for sampling the portion of a normal distribution lying within the constraint boundaries, based on an acceptance-rejection scheme using an exponential distribution.
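Both sampling strategies can be sketched in Python. The first function is the plain rejection scheme described above. The second is a sketch of a one-sided acceptance-rejection scheme with a translated exponential proposal, in the spirit of the method attributed to Robert [1995]; it is written here for a lower bound only, and the function names, the `max_tries` safeguard, and the example values are assumptions of this sketch.

```python
import numpy as np

def sample_truncated_rejection(mu, tau, lower, upper, rng, max_tries=100_000):
    """Draw from N(mu, tau) until a value falls inside [lower, upper]."""
    sd = np.sqrt(tau)
    for _ in range(max_tries):
        x = rng.normal(mu, sd)
        if lower <= x <= upper:
            return x
    raise RuntimeError("no admissible sample; allowed range may lie far in a tail")

def sample_lower_truncated_exponential(mu, tau, lower, rng):
    """Draw from N(mu, tau) restricted to s >= lower (exponential proposal)."""
    sd = np.sqrt(tau)
    a = (lower - mu) / sd                 # standardized lower bound
    if a <= 0.0:
        # At least half of the mass is admissible: plain rejection is fine.
        return sample_truncated_rejection(mu, tau, lower, np.inf, rng)
    alpha = 0.5 * (a + np.sqrt(a * a + 4.0))   # rate of the proposal
    while True:
        x = a + rng.exponential(1.0 / alpha)   # numpy takes scale = 1 / rate
        if rng.uniform() <= np.exp(-0.5 * (x - alpha) ** 2):
            return mu + sd * x

rng = np.random.default_rng(42)
draws = np.array(
    [sample_lower_truncated_exponential(0.0, 1.0, 3.0, rng) for _ in range(5000)]
)
print(draws.min(), draws.mean())
```

The exponential proposal matters mainly when the allowable range lies far in a tail of $N(\mu, \tau)$, where plain rejection would discard most draws.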

5. Applications

[33] The following section includes three sample applications of the CGGS approach. The first application represents a hypothetical case of interpolating hydraulic head in the presence of censored data and variable topography. The second application involves the estimation of contaminant load in a river with censored data and a nonnegativity constraint. The third application is a hypothetical inverse modeling example involving the estimation of the historical distribution of a contaminant in an aquifer. These sample applications were selected to be both simple and illustrative of a variety of applications for the presented approach, with the recognition that field applications will often have higher dimensionality relative to the presented examples.

5.1. Example 1: Hydraulic Head Mapping

[34] The first application involves the estimation of a hypothetical hydraulic head distribution in an unconfined aquifer. Five wells are present in the aquifer, three of which measure the hydraulic head, and two of which are dry. The two dry wells represent censored data, or inequality constrained data, because we know that the water table is below the bottom of the well, but we do not know its exact location. We wish to estimate the water table depth at all unsampled locations, including the two dry wells. As an additional constraint, we know that the groundwater table cannot be above the land surface, and all estimation locations can also therefore be thought of either as inequality constrained parameters, or as censored data. For simplicity, we assume that the measurements in the wells are exact (i.e. no measurement error). An example where this assumption does not hold is presented in the second application.

[35] Figure 1a presents the available data and estimation constraints. We assume that the unconstrained spatial covariance of the water table distribution can be represented using an exponential covariance function:

$$Q_{ij} = \sigma^2 \exp\left(-\frac{h_{ij}}{l}\right)$$

with parameters σ² = 1.0 m², l = 600 m, where hij is the separation distance between estimation points i and j. In a practical setting, this covariance information would be derived from additional sampling or previous experience. The hydraulic head is estimated at 1 m intervals.
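For concreteness, the covariance matrix for such a transect could be assembled as in the following sketch. It assumes the standard exponential form Q_ij = σ² exp(−h_ij/l). The 200 m transect length and the function name are hypothetical choices for illustration, since the extent of the domain is not stated in the text.

```python
import math

def exponential_covariance(x, variance=1.0, length=600.0):
    """Return Q with Q[i][j] = variance * exp(-|x_i - x_j| / length)."""
    return [[variance * math.exp(-abs(xi - xj) / length) for xj in x] for xi in x]

# Hypothetical transect sampled at 1 m intervals (extent is an assumption).
x = [float(i) for i in range(0, 201)]
Q = exponential_covariance(x, variance=1.0, length=600.0)
```

With l = 600 m, points separated by a few tens of meters remain strongly correlated, which is consistent with the smooth behavior expected of a water table.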

Figure 1.

(a) Hypothetical hydraulic head measurements. (b) Best estimate and 95% confidence intervals for hydraulic head distribution using constrained estimation method. (c) Best estimate and 95% confidence intervals for hydraulic head distribution using unconstrained estimation method. (d) Sample conditional realizations for hydraulic head distribution using constrained and unconstrained estimation methods.

[36] Figure 1c presents the estimated water table distribution using ordinary kriging. Because ordinary kriging cannot directly handle the inequality constraints imposed by the dry wells, these measurement locations either have to be ignored, or the water table depth has to be set to a fixed value. In this case, ignoring the dry wells would yield best estimates at the dry well locations above the bottom of one of these wells, a physically inconsistent result. Therefore, as is often done in practice, the water table in the dry wells is assumed to be at the bottom of the well. This approach underestimates the uncertainty in the groundwater level at and around the dry well locations. In addition, ordinary kriging cannot incorporate the ground level constraint on the water table depth, and the uncertainty bounds of the estimate extend above ground. This is again a physically inconsistent result stemming from the Gaussian assumptions common to estimating uncertainty bounds using the kriging variance.

[37] Figure 1b presents the estimated water table distribution using the CGGS algorithm. No measured water table depth is specified at the dry well locations, but the groundwater level is constrained to be below the bottom of the well. In addition, the realizations are everywhere constrained to be below the ground surface. The Markov chain is initialized with the ordinary kriging best estimate of the hydraulic head distribution, because this is a realistic starting point that also does not violate the constraints for this application. The chain is run for a total of 50,000 realizations, because the large distance between measurements requires a relatively large number of realizations in order to effectively sample the uncertainty space. Because the starting point for the Markov chain is very good in this case, only the first 100 realizations of the chain are discarded in the analysis. Contrary to the ordinary kriging result, none of the estimates or their uncertainty bounds violate the physical constraints imposed by the dry wells or ground surface. In addition, the uncertainty associated with the groundwater level at the dry wells is realistically represented, with wide uncertainty on the water table depth at these locations. Interestingly, for both dry wells, the best estimate of the groundwater depth is significantly below the bottoms of the wells for this example. Finally, the uncertainty intervals on the water table depth are not symmetric close to constraint boundaries, which is realistic for this situation and results from the use of a non-Gaussian a priori pdf.
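The post-processing of the chain described above (discarding the initial realizations, then summarizing the remainder point by point) can be sketched as follows. The use of the ensemble median as the best estimate and the linear-interpolation quantile rule are assumptions of this sketch; the function names are hypothetical.

```python
def quantile(sorted_values, q):
    """Quantile of an already sorted list, using linear interpolation."""
    pos = q * (len(sorted_values) - 1)
    lo = int(pos)
    hi = min(lo + 1, len(sorted_values) - 1)
    frac = pos - lo
    return sorted_values[lo] * (1.0 - frac) + sorted_values[hi] * frac

def summarize_chain(chain, burn_in=100, level=0.95):
    """Return (lower, median, upper) at each estimation point.

    `chain` is a list of realizations, each a list of parameter values
    at the estimation points. The first `burn_in` realizations are
    discarded before the ensemble statistics are computed.
    """
    kept = chain[burn_in:]
    if not kept:
        raise ValueError("burn-in removes the entire chain")
    tail = (1.0 - level) / 2.0
    summary = []
    for i in range(len(kept[0])):
        values = sorted(realization[i] for realization in kept)
        summary.append((quantile(values, tail),
                        quantile(values, 0.5),
                        quantile(values, 1.0 - tail)))
    return summary
```

Because the bounds are taken from ensemble quantiles rather than from a mean plus or minus a multiple of the standard deviation, the resulting intervals can be asymmetric and cannot extend beyond the constraints satisfied by every realization.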

[38] Figure 1d presents a sample conditional realization generated using a multi-Gaussian assumption as is representative of a traditional kriging sampling approach, and a sample conditional realization generated using the proposed constrained algorithm. The two realizations exhibit a similar degree of spatial variability, because they are both based on the same prior covariance model. As was already demonstrated in Figure 1b, however, the realization generated using the CGGS algorithm does not reach above ground level, and is better reflective of the uncertainty in groundwater levels at the locations of the dry wells.

[39] Note that the marginal pdf at each point for each conditional realization is modeled as a truncated Gaussian distribution. However, the overall pdf describing the uncertainty across the ensemble of realizations can take on a variety of forms (Figure 2). Away from constraint boundaries (e.g. x = 100 m), the pdf is close to Gaussian and similar to that which would be obtained using a traditional kriging setup. At locations where constraints have a significant impact on the estimates (e.g. x = 140 m), the final pdf looks like the truncated Gaussian pdf used in the sampling procedure. Near such boundaries (e.g. x = 138 m, x = 20 m), the distribution is skewed. These pdfs reflect the spread in the ensemble of conditional realizations used to characterize the uncertainty associated with parameter values at unsampled locations. Note that the product of a multidimensional truncated-Gaussian prior with a Gaussian likelihood function does not point-wise yield a truncated Gaussian distribution.

Figure 2.

A posteriori hydraulic head probability density function at four sample points.

5.2. Example 2: Contaminant Load Estimation

[40] The Humboldt River basin is located in northeastern Nevada in the United States, and its water resources have a variety of recreational and agricultural uses. The Humboldt River contains arsenic, which results in large part from mining practices in which mineralized rock is crushed and exposed to oxygen and water. The concentration history of dissolved arsenic in the North Fork of the Humboldt River and the total dissolved arsenic load supplied to downstream locations were previously estimated using the nonnegativity-enforcing Gibbs sampling approach of Michalak and Kitanidis [2005].

[41] The CGGS approach not only allows for this lower concentration bound to be enforced and measurement errors to be taken into account, but also provides a statistically rigorous method for accounting for censored data where the measured concentration was below the detection limit. The new approach is compared with results obtained using ordinary kriging with a linear variogram, chosen based on an examination of the experimental variogram of available data and for easy comparison to Example 3 in Michalak and Kitanidis [2005]. The covariance structure of the contaminant concentrations is therefore modeled using a linear generalized covariance function:

$$Q_{ij} = -\theta\, h_{ij}$$

where θ = 10⁻⁸ (μg/l)² day⁻¹, hij is the time lag between the ith and jth estimation times, and the generalized covariance takes the place of the covariance Qij used for stationary parameter distributions.

[42] The concentration data were obtained from the EPA STORET database [EPA, 2003] and are plotted in Figure 3a. All measured and estimated concentrations must be nonnegative. Based on documentation from EPA, the detection limit is 3 μg/l. The measurement error is assumed to be normally distributed with a variance of 0.25 (μg/l)², yielding 95% confidence bounds of ±1 μg/l, corresponding to the reported data precision.
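The correspondence between the assumed error variance and the quoted bounds can be checked directly. Writing σ_R for the measurement error standard deviation (a symbol introduced here for convenience), and using the usual 1.96 multiplier for a two-sided 95% Gaussian interval:

```latex
\sigma_R = \sqrt{0.25\ (\mu\mathrm{g/l})^2} = 0.5\ \mu\mathrm{g/l},
\qquad
1.96\,\sigma_R = 0.98\ \mu\mathrm{g/l} \approx 1\ \mu\mathrm{g/l}.
```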

Figure 3.

(a) Concentration measurements for North Fork of Humboldt River at North Fork Ranch, Elko county, Nevada. (b) Best estimate and 95% confidence intervals for concentration as a function of time using constrained estimation method. (c) Best estimate and 95% confidence intervals for concentration as a function of time using unconstrained estimation method. (d) Sample conditional realizations for concentration history using constrained and unconstrained estimation methods.

[43] We discretize the concentration history into ten-day intervals, augmented by the times at which measurements were actually taken, with time zero starting on the day of the first measurement, 21 April 1999, yielding a total of m = 173 estimation times. Note that although we are interested in the variability of the arsenic concentration in time rather than in space, the approach presented in sections 3 and 4 is directly applicable, simply by substituting temporal coordinates t for spatial coordinates x in the algorithm presented in section 4.1.
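The construction of the estimation times can be sketched as the sorted union of a regular ten-day grid and the measurement days. In the sketch below, the end date of 30 July 2003 is an assumption (taken from the period quoted for the flow record), and the measurement days are hypothetical placeholders rather than the actual sampling dates.

```python
from datetime import date

def estimation_times(start, end, step_days, measurement_days):
    """Sorted union of a regular grid and the measurement days.

    All times are expressed in days since `start`.
    """
    n_days = (end - start).days
    grid = set(range(0, n_days + 1, step_days))
    return sorted(grid | set(measurement_days))

start = date(1999, 4, 21)          # day of the first measurement
end = date(2003, 7, 30)            # assumed end of the estimation period
hypothetical_measurement_days = [0, 33, 95, 250]  # placeholders, not real data
times = estimation_times(start, end, 10, hypothetical_measurement_days)
```

Measurement days that already coincide with a grid time add no new estimation time, so the total count m depends on how many sampling dates fall off the regular grid.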

[44] For this application, the median and 95% confidence intervals of the probability density functions of concentration values at ten-day intervals are determined based on ensemble properties of conditional realizations generated using the method described in section 4. The Markov chain is initialized with the ordinary kriging best estimate of the concentration history, because this is a realistic starting point that also does not violate the constraints for this application. The chain is run for a total of 10,000 realizations. Because the starting point for the Markov chain is very good in this case, only the first 100 realizations of the chain are discarded in the analysis. Results are plotted in Figure 3b. The equivalent plot using kriging with a linear variogram is presented in Figure 3c. As can be seen in these figures, the proposed approach behaves similarly to the kriging interpolation away from constraints, but the best estimate near constraints deviates from the kriging estimates. For the kriging application, the non-detect points are assumed to have no arsenic (0 μg/l). An alternative approach which is sometimes used is to set non-detects at half the detection limit (1.5 μg/l in this case).

[45] As can be seen from Figure 3b, the new methodology is effective at enforcing parameter nonnegativity, and constraining non-detects to below the detection limit without specifying a prior estimate of concentrations at those times. The measurement uncertainty is also reflected in the estimates. Note that the measurement error is modeled through the likelihood term (section 4.2), whereas the non-detects are modeled as an interval constraint in the range of zero to the detection limit. By design, the new methodology behaves similarly to kriging with a linear variogram in high concentration regions.
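The two types of constraints used in this example can be encoded as a pair of bound vectors, one entry per estimation time. The following sketch assumes this representation; the function name and the index-based marking of non-detects are hypothetical.

```python
import math

DETECTION_LIMIT = 3.0  # ug/l, as stated in the text

def constraint_bounds(n_times, non_detect_indices, detection_limit=DETECTION_LIMIT):
    """Lower and upper bounds on concentration at each estimation time.

    Every time is constrained to be nonnegative. Times with a non-detect
    sample are additionally bounded above by the detection limit.
    """
    lower = [0.0] * n_times
    upper = [math.inf] * n_times
    for i in non_detect_indices:
        upper[i] = detection_limit
    return lower, upper

# Hypothetical layout: five estimation times, non-detects at indices 1 and 3.
lower, upper = constraint_bounds(5, [1, 3])
```

Quantified measurements do not appear in these bounds, because they enter through the likelihood term with their measurement error rather than as constraints.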

[46] Traditional geostatistical simulation, on the other hand, leads to conditional realizations and confidence intervals reaching into the negative parameter range, which have no physical significance, and can be misleading. In addition, kriging requires explicit assumptions about the concentration for non-detect samples, and cannot account for the finite uncertainty range for these measurement times.

[47] Figure 3d presents a sample conditional realization generated using a multi-Gaussian assumption as is representative of a traditional kriging setup, and a sample conditional realization generated using the CGGS algorithm. The two realizations exhibit similar degrees of temporal variability, because they are both based on the same covariance model. As was already demonstrated in Figure 3c, however, the realization generated using the proposed algorithm does not violate the nonnegativity constraint, and is better reflective of the uncertainty in concentrations at times of non-detect samples.

[48] The obtained estimates can also be used in conjunction with flowrate information to estimate total contaminant load, as was presented in Michalak and Kitanidis [2005]. River flow data for the equivalent time period, however, were not available. Therefore, flows for 1 January 1971, through 31 December 1981, are averaged to obtain a representative hydrograph for the stream [USGS, 2001]. These daily average flows are used to estimate the flowrate history for the period of 21 April 1999, through 30 July 2003, by assigning to each day a flowrate equivalent to the average flow for that calendar day. These flows are presented in Figure 4.
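The construction of the representative hydrograph can be sketched as follows: daily flows from the historical record are averaged by calendar day, and each day of the target period is then assigned the average for its calendar day. The function names and the (date, flow) record layout are assumptions of this sketch.

```python
from collections import defaultdict
from datetime import date, timedelta

def calendar_day_means(records):
    """Average flow for each (month, day) over all years in `records`.

    `records` is an iterable of (date, flow) pairs.
    """
    totals = defaultdict(lambda: [0.0, 0])
    for day, flow in records:
        entry = totals[(day.month, day.day)]
        entry[0] += flow
        entry[1] += 1
    return {key: total / count for key, (total, count) in totals.items()}

def synthetic_flow_history(means, start, end):
    """Assign to each day in [start, end] the mean flow for its calendar day."""
    n_days = (end - start).days
    days = [start + timedelta(days=k) for k in range(n_days + 1)]
    return [(d, means[(d.month, d.day)]) for d in days]
```

Because the averaging period includes leap years, 29 February has its own average in this scheme, so a leap day in the target period can be filled in the same way as any other day.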

Figure 4.

Averaged flow data at one-day intervals for North Fork of Humboldt River, Elko county, Nevada.

[49] To estimate the total contaminant load, individual conditional realizations are weighted using river flows, yielding an ensemble of contaminant loads that can be used to describe the uncertainty associated with this quantity. The total loads are presented in Figure 5 for the kriging and constrained approaches. As discussed in Michalak and Kitanidis [2005], for this river, high concentration events are associated with high flows. Therefore, because the proposed method has a stronger impact close to constraint boundaries, the impact of the constraints on total load is relatively limited. However, Figure 5 shows that the kriging approach consistently underestimates contaminant loads relative to the constrained approach, which is more consistent with the physical bounds on parameter values. This effect is more pronounced than that discussed in Michalak and Kitanidis [2005], because the current approach is able to provide a better representation of observations with non-detect concentrations. At the end of the four-year period, the new approach estimates a statistically significantly higher contaminant load relative to the kriging approach, at the 0.05 significance level. This effect is important because it implies that methods that do not account for physical constraints on parameter values, especially with regard to nonnegativity and representation of non-detect values, can lead to strong underestimation of contamination.
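The flow weighting of the conditional realizations can be sketched as a running sum of concentration times flow. The sketch assumes that concentrations and flows have already been placed on a common set of time steps of equal duration, and leaves unit conversion to the caller; both function names are hypothetical.

```python
def cumulative_load(concentrations, flows, dt_days=1.0):
    """Running sum of concentration * flow * time step for one realization."""
    total = 0.0
    loads = []
    for c, q in zip(concentrations, flows):
        total += c * q * dt_days
        loads.append(total)
    return loads

def load_ensemble(realizations, flows, dt_days=1.0):
    """Cumulative load history for every conditional realization."""
    return [cumulative_load(r, flows, dt_days) for r in realizations]
```

Applying the same flow history to every realization yields one cumulative load curve per realization, and the spread of these curves at any time describes the uncertainty in the load accumulated up to that time.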

Figure 5.

Cumulative arsenic load best estimate and 95% confidence intervals for constrained and unconstrained estimation methods.

5.3. Example 3: Estimation of Historical Contaminant Distribution

[50] The final example application involves the identification of the historical distribution of a contaminant in a two-dimensional aquifer, and is modeled after the heterogeneous example presented in Michalak and Kitanidis [2004b]. The two-dimensional distribution at time Ta is estimated based on downgradient concentration measurements taken at time Tb = Ta + 2000 days. The affected aquifer is represented as having a deterministically heterogeneous hydraulic conductivity field.

[51] The domain is finite, measuring 1024 m and 512 m in the x1 and x2 directions, respectively. It is discretized into 128 × 64 nodes in the x1 and x2 directions, respectively, resulting in an 8 m × 8 m grid. No-flux boundary conditions are applied at the top and bottom boundaries for both flow and transport. The left-hand side and right-hand side boundaries have prescribed constant heads, resulting in a mean gradient of 3.472 × 10⁻² m/m. Details regarding the aquifer heterogeneity are available in Michalak and Kitanidis [2004b]. The flow solution is obtained using MODFLOW [McDonald and Harbaugh, 1988; Harbaugh and McDonald, 1996].

[52] The actual contaminant distribution at time Ta used in this example is presented in Figure 6a. The plume profile at time Tb is obtained using MT3DMS [Zheng, 1990; Zheng and Wang, 1999]. The boundary conditions used to solve the forward problem are:

equation image

where t is time, xi are the spatial directions (i = 1, 2), x = (x1, x2), C is resident concentration, η is porosity, Dij is the i,jth entry of the dispersion tensor, and vi is fluid velocity in the direction xi. Uncorrelated random error with a standard deviation of 1 × 10⁻³ mg/l is added to the observations to represent measurement error.

Figure 6.

(a) Actual contaminant distribution at time Ta. (b) Actual contaminant distribution at time Tb and measurement locations for inverse modeling application. (c) Recovered contaminant distribution for time Ta using constrained estimation method. (d) Recovered contaminant distribution for time Ta using unconstrained estimation method. Concentration units are mg/l.

[53] The distribution at time Tb is presented in Figure 6b, along with sampling locations. The sampling is conducted on a 32 m × 32 m grid, yielding a total of 105 observation locations. We recover the contaminant distribution in the region Ωa = {x: x1 ∈ (0,256), x2 ∈ (168, 392)}. For the purpose of solving the inverse problem, this area is discretized into 8 m intervals, yielding 896 points at which the concentration at time Ta is to be estimated. This represents a strongly underdetermined problem. The adjoint approach of Michalak and Kitanidis [2004b] is applied to define the sensitivity matrix H needed to solve the inverse problem, which defines the sensitivity of each observation at time Tb to a historical concentration at each location in the domain Ωa. The covariance of the concentration distribution at time Ta is taken from this earlier work, where it was estimated using a Restricted Maximum Likelihood approach. A cubic generalized covariance model was used:

$$Q_{ij} = \theta\, h_{ij}^{3}$$

where hij is the physical separation distance between the ith and jth locations at which the contaminant distribution is to be estimated, and θ = 10⁻⁶ (mg/l)² m⁻³ [Michalak and Kitanidis, 2004b].
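The size of the estimation problem and the entries of the prior covariance can be illustrated with a short sketch. It assumes cell-centered estimation points on the 8 m grid within Ωa and the cubic form Q_ij = θ h_ij³; both the cell-centered placement and the function names are assumptions made for this example.

```python
import math

THETA = 1e-6  # (mg/l)^2 m^-3, as stated in the text

def estimation_points(x1_range=(0, 256), x2_range=(168, 392), spacing=8):
    """Cell-centered points on a regular grid covering the estimation region."""
    half = spacing // 2
    x1_centers = range(x1_range[0] + half, x1_range[1], spacing)
    x2_centers = range(x2_range[0] + half, x2_range[1], spacing)
    return [(a, b) for b in x2_centers for a in x1_centers]

def cubic_generalized_covariance(p, q, theta=THETA):
    """theta * h^3, with h the Euclidean distance between points p and q."""
    h = math.hypot(p[0] - q[0], p[1] - q[1])
    return theta * h ** 3

points = estimation_points()
n_unknowns = len(points)  # 32 columns by 28 rows
```

Under these assumptions the grid has 32 × 28 = 896 points, which matches the number of unknowns quoted in the text and is far larger than the 105 observations.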

[54] Figure 6d presents the recovered historical distribution using a linear geostatistical inverse modeling approach, analogous to the method presented in Michalak and Kitanidis [2004b]. Although the overall distribution at time Ta is recovered reasonably well, the best estimate includes areas with negative concentrations. Even in locations where the best estimate itself is positive, the uncertainty bounds can encompass negative values, as seen in the one-dimensional slice presented in Figure 7b. Figure 6c presents the recovered historical distribution using the proposed CGGS approach. The Markov chain is initialized with the absolute value of the best estimate obtained using linear inverse modeling, because this is a starting point that does not violate the constraints for this application. The chain is run for a total of 1000 realizations. As in the two previous applications, the first 100 realizations of the chain are discarded in the analysis. The applied constraint in this case is nonnegativity within the entire domain. As can be seen from this figure, the best estimate is indeed everywhere nonnegative. In addition, as seen in Figure 7a, the entire probability density function at each point is constrained to be nonnegative, yielding positive uncertainty bounds.

Figure 7.

Cross-section at y = 236 m of recovered contaminant distribution for time Ta. (a) Using constrained estimation method. (b) Using unconstrained estimation method. Concentration units are mg/l.

[55] Furthermore, the accuracy and precision of the estimates are improved by the addition of the constraint. First, the third peak in the historical contaminant distribution, which is absent in the estimate obtained using the linear approach, is correctly inferred in the best estimates of the constrained approach. Second, because of the strong implicit constraint on total contaminant mass offered by the plume measurements, the addition of the nonnegativity constraint decreases the uncertainty throughout the domain. Conceptually, through mass conservation, by eliminating the possibility for negative concentrations, the possibility for some large positive concentrations is eliminated as well. This effect can be seen clearly by comparing Figures 7a and 7b, where the uncertainty bounds for the constrained approach are everywhere narrower relative to the linear approach. This effect is especially pronounced in areas of low concentration. Overall, the new approach successfully enforces physical constraints in an inverse modeling setup, while improving the precision and accuracy of the obtained estimates. Note that the inverse modeling approach used for the solution of the solute inverse problem assumes a known transport model, parameterized in H. The current literature on stochastic methods for solving solute transport inverse problems in groundwater hydrology does not consider uncertainties in transport parameters (see, e.g. review in Michalak and Kitanidis [2004b]), although a few recent works have considered such uncertainty in a deterministic context [e.g. Sun et al., 2006; Sun, 2007]. Relaxing this assumption within a probabilistic framework is the topic of ongoing research. The innovation presented in the current paper, however, focuses specifically on the treatment of the constraints within the solution space.

6. Conclusions

[56] The presented approach provides a statistically rigorous methodology for geostatistical interpolation and inverse modeling, subject to multiple and spatially variable inequality constraints. The approach uses a Gibbs sampler to characterize the marginal probability distribution at each estimation point, using a truncated Gaussian prior probability distribution. As presented in the current work, the method is applicable to cases where a truncated Gaussian distribution is a good representation of the a priori uncertainty on the parameter distribution, and would therefore be less applicable to highly skewed distributions. In addition, for applications to inverse problems, the method assumes a linear and known forward model.

[57] From a methodological perspective, the three presented applications demonstrate the applicability of the proposed approach to a wide range of problems and constraints. Censored data with an upper bound were used in Example 1, whereas interval censored data were used in Example 2. Nonnegativity was enforced in Examples 2 and 3. Example 1 also demonstrated the applicability of the proposed approach to data with a spatially variable inequality constraint. Applications to interpolation were presented in Examples 1 and 2, whereas an application to geostatistical inverse modeling was shown in Example 3. Example 1 presented an application in one spatial dimension, Example 2 presented a temporal problem, and Example 3 represented an application in two spatial dimensions. Finally, the three presented examples also applied three different covariance structures (exponential covariance, linear generalized covariance, cubic generalized covariance).

[58] Scientifically, the presented examples demonstrate the broad need for constrained interpolation and inverse modeling approaches, with sample applications to groundwater flow, surface water quality, and groundwater solute transport. The presented approach can also be applied to a wide range of other data types and problems.

[59] Overall, the geostatistical Gibbs sampler using a truncated Gaussian prior distribution offers an effective, simple, computationally efficient and universally applicable approach for spatial and temporal interpolation and inverse modeling problems in environmental applications where inequality constraints are present. In addition, the method yields estimates with high precision and accuracy relative to classical geostatistical methods.


[60] This material is based upon work supported by the National Science Foundation under grant 0644648 “Development of Geostatistical Data Assimilation Tools for Water Quality Monitoring.” Olaf Cirpka, Jinsong Chen and one anonymous reviewer contributed significantly to the final version of this paper.