Water Resources Research

A method for enforcing parameter nonnegativity in Bayesian inverse problems with an application to contaminant source identification

Abstract

[1] When an inverse problem is solved to estimate an unknown function such as the hydraulic conductivity in an aquifer or the contamination history at a site, one constraint is that the unknown function is known to be everywhere nonnegative. In this work, we develop a statistically rigorous method for enforcing function nonnegativity in Bayesian inverse problems. The proposed method behaves similarly to a Gaussian process with a linear variogram (i.e., unrestricted Brownian motion) for parameter values significantly greater than zero. The method uses the method of images to define a prior probability density function based on reflected Brownian motion that implicitly enforces nonnegativity. This work focuses on problems where the unknown is a function of a single variable (e.g., time). A Markov chain Monte Carlo (MCMC) method, specifically, a highly efficient Gibbs sampler, is implemented to generate conditional realizations of the unknown function. The new method is applied to the estimation of the trichloroethylene (TCE) and perchloroethylene (PCE) contamination history in an aquifer at Dover Air Force Base, Delaware, based on concentration profiles obtained from an underlying aquitard.

1. Introduction

[2] In stochastic approaches to the solution of inverse problems, the unknown parameters are described through probability distributions. As such, estimation uncertainty is recognized and its importance can sometimes be determined. This is in contrast to deterministic approaches, where one seeks a single value for the unknown parameters.

[3] The geostatistical approach to inverse modeling is a stochastic, Bayesian approach. This general methodology has been used for a variety of environmental applications, including, among others, the estimation of hydraulic conductivity fields in groundwater systems based on hydraulic head measurements [Dagan, 1985; Rubin and Dagan, 1987a, 1987b, 1988, 1989; Rubin et al., 1992; Kitanidis, 1996; Zimmerman et al., 1998] and the identification of the contamination history based on the current distribution of a contaminant [Snodgrass and Kitanidis, 1997; Michalak and Kitanidis, 2002, 2003].

[4] In many environmental applications, measurements are sparse and imperfect. This information sparsity can lead to high uncertainty in the estimation of the unknown function. The ability to incorporate any additional available information, such as known constraints on parameter values, would improve the precision with which an unknown function can be identified.

[5] In environmental applications, methods for dealing with nonnegativity have been almost exclusively limited to (a) the application of data transformations to the original variable, yielding a transformed variable that is defined over an infinite domain but that corresponds to the original variable defined only in the nonnegative parameter range; and (b) the use of Lagrange multipliers in constrained optimization based on probability density functions that would otherwise allow the variables to become negative.

[6] Data transformations, a standard method in statistics, have been applied in interpolation and inverse problems in, for example, Kitanidis [1997, p. 70] and Snodgrass and Kitanidis [1997]. The most common of these is the power transformation, which is defined as

$$\tilde{s}_i = \frac{s_i^{\kappa} - 1}{\kappa}$$

where s is the vector of values in the original domain, $\tilde{\mathbf{s}}$ is the transformed data vector, and κ is a constant selected based on the application. The commonly used logarithmic transformation is included as a special case of the power transformation, obtained in the limit as κ tends to zero. For example, it is common in hydrogeologic analysis to use the logarithm of the hydraulic conductivity field [see, e.g., Hoeksema and Kitanidis, 1985]. A few alternate data transformation methods have also been proposed for environmental optimization problems [see, e.g., Kauffmann and Kinzelbach, 1989].
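
As a concrete illustration of how such a transformation is used in practice, the short sketch below implements the power transformation in the Box-Cox-style form shown above, together with its inverse; the function names and the tolerance used to switch to the logarithmic limit are illustrative choices rather than part of the cited works.

```python
import numpy as np

def power_transform(s, kappa):
    """Power transformation of a nonnegative vector s; tends to ln(s) as kappa -> 0."""
    s = np.asarray(s, dtype=float)
    if abs(kappa) < 1e-12:          # logarithmic limit of the power transformation
        return np.log(s)
    return (s**kappa - 1.0) / kappa

def inverse_power_transform(s_tilde, kappa):
    """Map transformed values back to the original (nonnegative) domain."""
    s_tilde = np.asarray(s_tilde, dtype=float)
    if abs(kappa) < 1e-12:
        return np.exp(s_tilde)
    return (kappa * s_tilde + 1.0) ** (1.0 / kappa)

s = np.array([0.1, 1.0, 10.0])
print(power_transform(s, 0.5))      # square-root-type transformation
print(power_transform(s, 1e-13))    # numerically indistinguishable from np.log(s)
```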

[7] In general, however, data transformations render a linear inversion problem nonlinear and can lead to highly nonsymmetric probability density functions for the unknown parameter values in the untransformed space, which is unrealistic in some cases. Such difficulties have been documented for large-variance cases, for example, by Snodgrass and Kitanidis [1997]. Specifically, the confidence intervals tend to be unreasonably narrow for low values of the unknown function and wide for high values.

[8] The other popular method, the use of Lagrange multipliers for obtaining nonnegative best estimates of unknown functions, consists of expanding the original objective function f(s) into the Lagrange function

$$L(\mathbf{s}, \boldsymbol{\lambda}) = f(\mathbf{s}) - \sum_{i=1}^{k} \lambda_i \left[ g_i(\mathbf{s}) - b_i \right]$$

where s must satisfy the constraints gi(s) = bi or gi(s) ≥ bi, k is the total number of active constraints, and λ = (λ1, λ2, …, λk) are Lagrange multipliers [see, e.g., Gill et al., 1986]. The solution method involves setting the derivative of the Lagrange function with respect to sj and λi equal to zero. For inequality constraints, the Lagrange multipliers of the points corresponding to active constraints must be positive. If all that is required is a single estimate of the unknown function, then the use of Lagrange multipliers in conjunction with linear inversion methods can be justified as a means to make this estimate nonnegative. When confidence intervals or conditional realizations are needed, however, the use of this method raises some issues. Specifically, realizations obtained using the method of Lagrange multipliers will not be equiprobable samples from the assumed form of the posterior distribution of the unknown function, if this distribution did not itself implicitly assume nonnegativity.
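
To make the mechanics concrete, consider a minimal worked example (constructed here for illustration, not drawn from the cited works): minimize f(s) = ½(s − a)² for a scalar s subject to the single constraint g1(s) = s ≥ b1 = 0. The Lagrange function is

$$L(s, \lambda_1) = \frac{1}{2}(s - a)^2 - \lambda_1 s .$$

Setting ∂L/∂s = s − a − λ1 = 0 and applying the conditions λ1 ≥ 0 and λ1 s = 0 gives s* = a with λ1 = 0 when a ≥ 0 (constraint inactive), and s* = 0 with λ1 = −a > 0 when a < 0 (constraint active, positive multiplier). The single constrained estimate is nonnegative, but nothing in this construction yields confidence intervals or conditional realizations drawn from a distribution that itself respects nonnegativity, which is the issue raised above.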

[9] In contrast to these methods, the method presented in this paper is based on a prior probability density function that has a nonzero value only in the nonnegative parameter range. This pdf is derived by applying the method of images to obtain reflected Brownian motion: the probability density function of a particle undergoing unrestricted Brownian motion is well known, and reflection at s = 0 can be enforced by using the method of images to modify this original pdf.

2. Objective

[10] The objective of this work is to develop a statistically rigorous method for enforcing nonnegativity in Bayesian inverse problems. The proposed method is designed to behave similarly to a Gaussian process with a linear variogram (or unrestricted Brownian motion) for parameter values significantly greater than zero, which would correspond to geostatistical inverse modeling (or cokriging) with a linear variogram model. This is desirable because the linear variogram model is one of the most commonly used models in geostatistical kriging and inverse modeling, as it yields flat best estimates with confidence intervals that are symmetrical and do not depend on the absolute value of the parameter of interest.

[11] The work presented in this paper extends the stochastic interpolation model developed by A. M. Michalak and P. K. Kitanidis (manuscript in preparation, 2002) to Bayesian inverse modeling, and focuses on inverse problems where the unknown is a function of a single variable (e.g., time). We present a sample application of the new methodology to the estimation of the contamination history in an aquifer at Dover Air Force Base, Delaware, based on information available from cores taken from an underlying aquitard [Mackay et al., 1997; Ball et al., 1997; Liu and Ball, 1999].

3. Model Development

[12] Bayes' rule states that the posterior pdf of a state vector s given an observation vector z is proportional to the likelihood of the data given the state, times the prior pdf of the state. Symbolically:

$$p''(\mathbf{s} \mid \mathbf{z}) \propto p(\mathbf{z} \mid \mathbf{s})\, p'(\mathbf{s})$$

In this context, prior and posterior are with respect to using the data z.

[13] In the geostatistical approach, the prior represents the assumed structure of the unknown function. The nonnegativity enforcing method is based on abandoning the multi-Gaussian prior pdf formulation that is implicit in linear geostatistics. Instead, a new pdf that implicitly enforces parameter nonnegativity is defined. The pdf used in the new methodology was developed based on the concept of reflected Brownian motion, as applied in the method of images. A more detailed derivation is given by Michalak and Kitanidis (manuscript in preparation, 2002).

[14] Brownian motion is a continuous time, continuous state space, simple Markov process. As such, the probability law governing transitions is stationary in time and the probability density function describing a parameter s at time t does not depend on the initial time [Karlin and Taylor, 1975, p. 340]. The partial differential equation satisfied by Brownian motion is

$$\frac{\partial p}{\partial t} = D\, \frac{\partial^2 p}{\partial s^2}$$

where p is the probability density function, and D is a constant coefficient. For the application that we are interested in, the probability density function will describe the distribution of source concentration values s, as a function of time t.

[15] The PDE defined in equation 4, subject to the boundary and initial conditions

$$p(s, t_i) = \delta(s - s_i), \qquad \lim_{s \to \pm\infty} p(s, t) = 0$$

where ti is the initial time, ti+1 is a later time, and δ(s − si) is the Dirac delta function, is satisfied by the probability density function:

$$p(s_{i+1} \mid s_i) = \frac{1}{\sqrt{4 \pi D\,(t_{i+1} - t_i)}} \exp\!\left[ -\frac{(s_{i+1} - s_i)^2}{4 D\,(t_{i+1} - t_i)} \right]$$

where si+1 is located at a point ti+1, the parameter value si at point ti is known, and the vertical bar means “given”; for example, p(si+1 ∣ si) represents the probability density of si+1 given si. This pdf is plotted in Figure 1a. It represents unrestricted Brownian motion, and is the one assumed when using the linear variogram model. A sample unconditional realization obtained using this distribution is presented in Figure 2a.

Figure 1.

Probability density functions for unrestricted and reflected Brownian motion with si = 2, variance D = 1, and separation distance (ti+1 − ti) = 1. (a) Gaussian probability density function (unrestricted Brownian motion). (b) Method of images derived probability density function (reflected Brownian motion).

Figure 2.

Sample unconditional realizations with variance D = 1. (a) Realization generated using linear variogram model. (b) Realization generated using nonnegativity enforcing probability density function.

[16] Although the pdf defined above satisfies the governing equation, we are interested in defining a second pdf that has all of its probability mass in the range s ∈ [0, ∞). In other words, we want to change the boundary conditions to:

$$\left. \frac{\partial p}{\partial s} \right|_{s = 0} = 0, \qquad \lim_{s \to \infty} p(s, t) = 0$$

The solution is obtained by adding an image pdf, with a mean located at −si, and setting the probability of s < 0 equal to zero. This pdf is

$$p(s_{i+1} \mid s_i) = \begin{cases} \dfrac{1}{\sqrt{4 \pi D\,(t_{i+1} - t_i)}} \left\{ \exp\!\left[ -\dfrac{(s_{i+1} - s_i)^2}{4 D\,(t_{i+1} - t_i)} \right] + \exp\!\left[ -\dfrac{(s_{i+1} + s_i)^2}{4 D\,(t_{i+1} - t_i)} \right] \right\}, & s_{i+1} \geq 0 \\[2ex] 0, & s_{i+1} < 0 \end{cases}$$

and is plotted in Figure 1b. This pdf represents reflected Brownian motion about the boundary s = 0, and is the one that is used in the current work. A sample unconditional realization obtained using this distribution is presented in Figure 2b. As was demonstrated by Michalak and Kitanidis (manuscript in preparation, 2002), a sample unconditional constrained realization can be obtained by generating a realization of a simple Brownian motion with variance 2D(ti+1 − ti), and reflecting any negative components of the realization about the s = 0 axis.
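
The construction described in this paragraph translates directly into a few lines of code. The sketch below assumes a uniform time step and uses the transition variance 2D(ti+1 − ti) quoted above; the function name and defaults are illustrative.

```python
import numpy as np

def unconditional_constrained_realization(m, dt, D, s0=0.0, rng=None):
    """Generate an unconditional realization of reflected Brownian motion on [0, inf).

    As described in the text: simulate ordinary Brownian motion with increment
    variance 2*D*dt between adjacent points, then reflect any negative values
    about the s = 0 axis.
    """
    rng = np.random.default_rng() if rng is None else rng
    increments = rng.normal(0.0, np.sqrt(2.0 * D * dt), size=m - 1)
    s = s0 + np.concatenate(([0.0], np.cumsum(increments)))
    return np.abs(s)                 # reflection about s = 0

# Example: a 500-point realization with D = 1 and unit spacing (compare Figure 2b)
s_uc = unconditional_constrained_realization(m=500, dt=1.0, D=1.0)
```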

[17] Returning to the Bayesian inference problem, we can now define the prior probability of the unknown function as

$$p'(\mathbf{s}) = \prod_{i=1}^{m-1} p(s_{i+1} \mid s_i)$$

where m is the total number of unknown parameter values in the discretized state vector, and p(si+1 ∣ si) is as defined in equation 8.

[18] The likelihood of the observations, which is defined in the same manner as in classical linear geostatistical inverse modeling (see Appendix A for a review of this method), can be expressed as

$$p(\mathbf{z} \mid \mathbf{s}) = \frac{1}{(2\pi)^{n/2}\, |\mathbf{R}|^{1/2}} \exp\!\left[ -\frac{1}{2} \big( \mathbf{z} - \mathbf{h}(\mathbf{s}, \mathbf{r}) \big)^{T} \mathbf{R}^{-1} \big( \mathbf{z} - \mathbf{h}(\mathbf{s}, \mathbf{r}) \big) \right]$$

where z is an n × 1 vector of observations, s is an m × 1 “state vector” obtained from the discretization of the unknown function that we wish to estimate, and ∣ ∣ denotes matrix determinant. When the system is underdetermined, which is the case of most interest for the application of stochastic approaches, m > n. The vector r contains other parameters needed by the model function h(s, r). We assume that the measurement error ε has zero mean and known covariance matrix R. The covariance of the measurement errors used is

$$\mathbf{R} = \sigma_R^2\, \mathbf{I}$$

where σR2 is the measurement error variance and I is an n × n identity matrix. Note that, in this case, the measurement error encompasses both the actual observation error when data is collected, and any inaccuracies inherent in the physical model used to represent the problem.

[19] If the observations depend linearly on the unknown parameter vector s, then:

$$\mathbf{h}(\mathbf{s}, \mathbf{r}) = \mathbf{H}\mathbf{s}$$

where H is an n × m sensitivity matrix. In this case, the full posterior pdf is, to within a normalizing constant:

$$p''(\mathbf{s} \mid \mathbf{z}) \propto \exp\!\left[ -\frac{1}{2} (\mathbf{z} - \mathbf{H}\mathbf{s})^{T} \mathbf{R}^{-1} (\mathbf{z} - \mathbf{H}\mathbf{s}) \right] \prod_{i=1}^{m-1} p(s_{i+1} \mid s_i)$$
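
For readers who prefer code to notation, the sketch below evaluates this unnormalized log posterior by combining the Gaussian log likelihood with the log of the reflected-Brownian-motion prior, assuming a uniform time step and the transition variance 2D(ti+1 − ti) introduced above; the names are illustrative and the normalizing constant is irrelevant for sampling.

```python
import numpy as np

def log_prior_reflected_bm(s, dt, D):
    """Log of the prior (equation 9): a product of reflected-Brownian-motion
    transition densities (equation 8), zero outside the nonnegative range."""
    if np.any(s < 0.0):
        return -np.inf                              # zero prior probability
    var = 2.0 * D * dt                              # transition variance between adjacent points
    direct = np.exp(-(s[1:] - s[:-1]) ** 2 / (2.0 * var))
    image = np.exp(-(s[1:] + s[:-1]) ** 2 / (2.0 * var))
    return np.sum(np.log((direct + image) / np.sqrt(2.0 * np.pi * var)))

def log_likelihood(z, H, s, sigma2_R):
    """Gaussian log likelihood (equation 12) with R = sigma2_R * I."""
    resid = z - H @ s
    return -0.5 * (resid @ resid) / sigma2_R - 0.5 * z.size * np.log(2.0 * np.pi * sigma2_R)

def log_posterior(s, z, H, sigma2_R, dt, D):
    """Unnormalized log posterior: log likelihood plus log prior."""
    return log_likelihood(z, H, s, sigma2_R) + log_prior_reflected_bm(s, dt, D)
```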

4. Conditional Realizations and Estimation of Uncertainty

[20] One of the advantages of using a stochastic approach to inverse problems is that physically significant confidence intervals and conditional realizations can be obtained in addition to a best estimate of the unknown function. In the case of linear geostatistical inverse modeling, the posterior pdf of the unknown function is Gaussian, and can therefore easily be sampled.

[21] The prior pdf used in this work is not Gaussian, however, which also results in a non-Gaussian posterior pdf. As such, this posterior pdf does not lend itself to straightforward computation of confidence intervals or generation of conditional realizations. Therefore, a Markov chain Monte Carlo (MCMC) method was developed to sample the pdf to obtain conditional realizations. Ensemble properties of conditional realizations can then be used to infer other statistics of the unknown function, such as a best estimate and confidence intervals.

[22] MCMC methods allow for the sampling of probability density functions in multiple dimensions with computational effort that is manageable relative to performing the multi-dimensional integrations that would otherwise be required. The dimensionality of the posterior pdf is equal to the number of points in the discretized unknown function, and can therefore easily be on the order of hundreds.

[23] One of the methods falling into the MCMC category is the Gibbs sampling algorithm. In this approach, conditional realizations are generated by sequentially sampling the marginal (i.e., 1-dimensional) probability density function at each point in the discretized unknown function, while holding the values at all other points constant. This marginal pdf is defined using the most updated information available at each of the other points in the unknown function. It can be shown that, once the chain has converged, the realizations resulting from this process are equally likely realizations from the full multi-dimensional posterior pdf [see, e.g., Casella and George, 1992]. Examples of past applications of the Gibbs algorithm in a Bayesian context are given by, for example, Carlin and Louis [2000], Gelman et al. [1995], and Gamerman [1997].

4.1. Description of Gibbs Sampling Algorithm

[24] This section contains a description of the Gibbs sampling algorithm, as implemented in the new nonnegativity enforcing inverse modeling method. We define the lth conditional constrained realization as scc,l. In this context, a constrained realization is one that is everywhere nonnegative, and a conditional realization is one that has been conditioned on the data z. The chain can be initialized with any realization that has a nonzero posterior probability. We chose to initialize the chain with an arbitrary unconditional constrained realization suc,0 sampled from the prior distribution, ensuring quick convergence. The Gibbs sampler proceeds as follows (see, e.g., Casella and George [1992] for a more detailed discussion).

  1. Set initial values scc,0 = (suc,0(t1), suc,0(t2), …, suc,0(tm))T and initialize the iteration counter of the chain l = 1.
  2. Obtain a new conditional constrained realization scc,l = (scc,l(t1), scc,l(t2), …, scc,l(tm))T from scc,l−1 through successive generation of values from the marginal pdf at each point (note the use of counter values l and l − 1):
    $$s_{cc,l}(t_1) \sim p\big(s_1 \mid s_{cc,l-1}(t_2), s_{cc,l-1}(t_3), \ldots, s_{cc,l-1}(t_m), \mathbf{z}\big)$$
    $$s_{cc,l}(t_2) \sim p\big(s_2 \mid s_{cc,l}(t_1), s_{cc,l-1}(t_3), \ldots, s_{cc,l-1}(t_m), \mathbf{z}\big)$$
    $$\vdots$$
    $$s_{cc,l}(t_m) \sim p\big(s_m \mid s_{cc,l}(t_1), s_{cc,l}(t_2), \ldots, s_{cc,l}(t_{m-1}), \mathbf{z}\big)$$
  3. Change counter l to l + 1 and return to step 2 until convergence is reached.

[25] When convergence is reached, the resulting realization scc,l is a conditional constrained realization from the full posterior pdf. Steps 2 and 3 can then be repeated to obtain additional conditional realizations. The chain is run until the probability space has been appropriately sampled. Convergence is evaluated by tracking the values of the two components (likelihood and prior) of the posterior probability of the realizations. When the running average of both components stabilizes, convergence has been reached.
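
A structural skeleton of this sampler is sketched below. The routine sample_marginal_posterior, which draws a nonnegative value from the one-dimensional marginal at a single point given the current values at all other points and the data, is left as a placeholder here; its construction is the subject of section 4.2. In practice, the early part of the chain is discarded until the convergence diagnostic described above stabilizes.

```python
import numpy as np

def gibbs_chain(s_init, n_sweeps, sample_marginal_posterior, rng=None):
    """Gibbs sampler skeleton for conditional constrained realizations.

    sample_marginal_posterior(i, s, rng) is assumed to return a nonnegative
    draw from the marginal posterior of s_i given the current values at all
    other points and the data (section 4.2); it is a placeholder here.
    """
    rng = np.random.default_rng() if rng is None else rng
    s = np.array(s_init, dtype=float)     # s_cc,0: an unconditional constrained realization
    chain = []
    for l in range(n_sweeps):
        for i in range(s.size):           # sweep through the discretized unknown function
            s[i] = sample_marginal_posterior(i, s, rng)
        chain.append(s.copy())            # s_cc,l after a complete sweep
    return np.array(chain)
```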

4.2. Derivation of Marginal Posterior Probability Density Function

[26] In order to apply the Gibbs sampling algorithm as described in the previous section, a marginal pdf from which it is possible to draw samples of s at a single point ti is needed. We break this problem into two components, first deriving the marginal likelihood, then the marginal prior.

4.2.1. Marginal Likelihood of si

[27] The likelihood of the measurement, defined in equation 12, can be expressed as

$$p(\mathbf{z} \mid \mathbf{s}) = \frac{1}{(2\pi \sigma_R^2)^{n/2}} \exp\!\left[ -\frac{1}{2 \sigma_R^2} \sum_{j=1}^{n} \left( z_j - \sum_{k=1}^{m} H_{jk}\, s_k \right)^{2} \right]$$

Therefore, the marginal likelihood for a single point in the discretized unknown function s, if values at all other points are held constant, is

equation image
equation image

combining the n Gaussian distributions, we obtain

equation image

where

equation image
equation image
equation image

4.2.2. Marginal Prior of si

[28] The prior probability density function of the discretized unknown function s was defined in equations 8 and 9. Each interior point in the discretized unknown appears in two terms of the full prior distribution. The marginal prior pdf of a single point in the discretized unknown function is therefore,

equation image

where p(si+1 ∣ si), p(si ∣ si−1), and p(si+1 ∣ si−1) are as defined in equation 8, and, in this case, s = {s1, …, si−1, si+1, …, sm}. This marginal prior can be rearranged to be:

equation image
equation image

where

equation image
equation image
equation image
equation image
equation image

[29] Points s1 and sm appear in only one term of the full prior distribution and their marginal priors are therefore simply p′(s2 ∣ s1) and p′(sm ∣ sm−1), respectively, as defined in equation 8.

4.2.3. Marginal Posterior of si

[30] Combining the marginal likelihood and prior, the full marginal posterior pdf of si in the nonnegative range (si ≥ 0) can be expressed, to within a normalizing constant, as

equation image

for all interior points in the discretized unknown function, which can be expressed as the sum of four Gaussian distributions:

equation image

where,

equation image
equation image

4.2.4. Sampling of Marginal Posterior of si

[31] This formulation suggests an efficient method for generating realizations from the marginal distribution of si. We know that the realization will be drawn from the nonnegative portion of one of these four Gaussian distributions. If there were no constraints, the probability of drawing from each Gaussian would be proportional to the value of its Kj. Therefore, a uniformly distributed random number α in the range [0, 1] can be generated to choose a distribution, based on each Gaussian distribution's Kj value normalized by the sum K1 + K2 + K3 + K4. We are still only interested in sampling the nonnegative portion of this chosen Gaussian distribution. Therefore, we draw a number from this distribution, and, if it is nonnegative, we keep it. Otherwise, we draw another random number α, and select one of the Gaussian distributions anew. Once we obtain a nonnegative sample, this realization of si is used as the next conditional constrained realization at point ti, denoted scc,l (ti).

[32] The overall algorithm for sampling the marginal pdf at interior points si therefore proceeds as follows.

  1. Generate a uniformly distributed random number α in the range [0, 1]. If α < K1, μ = μ1; if K1 < α < (K1 + K2), μ = μ2; if (K1 + K2) < α < (K1 + K2 + K3), μ = μ3; otherwise, μ = μ4.
  2. Generate a normally distributed random number γ with mean μ and variance τ (equation 30).
  3. If γ < 0, return to Step 1. Otherwise, scc,l (ti) = γ.

[33] For points s1 and sm, P1 = 1, and μP,1 = s2 and sm−1, respectively. In all other respects, the algorithm proceeds as for interior points.
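
Written out in code, steps 1-3 amount to rejection sampling from a mixture of Gaussians restricted to the nonnegative axis. The sketch below assumes that the weights Kj (normalized to sum to one), the means μj, and the common variance τ of section 4.2.3 have already been computed; the function name and the iteration cap are illustrative.

```python
import numpy as np

def sample_marginal(K, mu, tau, rng=None, max_iter=100000):
    """Draw a nonnegative sample from the marginal posterior of s_i.

    K   : component weights, normalized to sum to one
    mu  : component means
    tau : common component variance (equation 30)
    """
    rng = np.random.default_rng() if rng is None else rng
    K = np.asarray(K, dtype=float)
    cum = np.cumsum(K / K.sum())
    for _ in range(max_iter):
        alpha = rng.uniform()                    # step 1: choose a Gaussian component
        j = np.searchsorted(cum, alpha)
        gamma = rng.normal(mu[j], np.sqrt(tau))  # step 2: draw from that Gaussian
        if gamma >= 0.0:                         # step 3: keep only nonnegative draws
            return gamma
    raise RuntimeError("failed to obtain a nonnegative draw")
```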

5. Structural Parameter Optimization

[34] In order to apply the Gibbs sampler as outlined in the previous section, the values of the structural parameters D and σR2 must be known. This section covers the method employed to optimize these parameters for use with the nonnegativity enforcing method. If these parameters are known a priori, the algorithm outlined in this section is not needed.

[35] We have modified the prior pdf of the unknown function from unrestricted Brownian motion, corresponding to a geostatistical linear variogram model, to reflected Brownian motion, corresponding to the new nonnegativity enforcing methodology. Although these pdf's are significantly different in terms of the range within which the final estimates of the function lie, they describe similar patterns of expected variation of the values of the unknown parameter vector as a function of the separation distance between two points at which the unknown function is to be estimated.

[36] Because we are no longer in the multi-Gaussian setting, however, the standard geostatistical procedure for finding the maximum likelihood estimate of the structural parameters (see Appendix A) is not applicable. Therefore, an iterative Expectation-Maximization (or EM) scheme is implemented to identify the optimal values of the structural parameters, in this case σR2 and D.

[37] The EM approach is a general iterative method for computing the mode of the marginal pdf of a parameter such as D or σR2 from the joint pdf of the parameter and s [Dempster et al., 1977; McLachlan and Krishnan, 1997]. The basic requirement is to have a method for generating conditional realizations with given structural parameters. The method presented in section 4 fulfills this requirement.

[38] In our case, the two parameters that we are optimizing only each appear in one term of the objective function. Therefore, we can optimize σR2 simply by using the likelihood portion of the objective function, and optimize D by using the prior pdf portion. The method proceeds as follows.

  1. Start with an initial guess of D and σR2, denoted D(0) and σR2(0). The optimal values obtained using the standard geostatistical approach with a linear variogram (see Appendix A) are a good choice. Set counter k = 0.
  2. Generate a large number N of conditional constrained realizations of s, using D = D(k) and σR2 = σR2(k), using the Gibbs sampler as outlined in section 4.
  3. Set D(k+1) equal to the value that maximizes:
    $$\frac{1}{N} \sum_{l=1}^{N} \ln p'(\mathbf{s}_{cc,l}, D)$$
    where p′(scc,l, D) is the prior pdf of the conditional realization, as defined in equation 9.
  4. Set σR2(k+1) equal to the value that maximizes:
    $$\frac{1}{N} \sum_{l=1}^{N} \ln p(\mathbf{z}, \sigma_R^2 \mid \mathbf{s}_{cc,l})$$
    where p(z, σR2 ∣ scc,l) is the likelihood of the observations, as defined in equation 12.
  5. Set counter to k = k + 1. Return to step 2.

[39] The procedure continues until the iterations converge on the modes of the marginal pdf's of σR2 and D.
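
A high-level sketch of this iteration is given below. The three routines passed in (the Gibbs-based realization generator and the two log-density evaluations) stand in for the procedures described in sections 3 and 4, and the one-dimensional maximizations of steps 3 and 4 are written as generic bounded searches over the Monte Carlo averages; all names and bounds are illustrative.

```python
import numpy as np
from scipy.optimize import minimize_scalar

def em_structural_parameters(D0, sigma2_R0, generate_realizations,
                             log_prior, log_likelihood, n_iter=10, N=1000):
    """EM-style iteration for the structural parameters D and sigma2_R.

    generate_realizations(D, sigma2_R, N): N conditional constrained realizations
        from the Gibbs sampler of section 4 (placeholder);
    log_prior(s, D), log_likelihood(s, sigma2_R): the two terms of the posterior
        evaluated for a single realization (placeholders).
    """
    D, sigma2_R = D0, sigma2_R0
    for _ in range(n_iter):
        realizations = generate_realizations(D, sigma2_R, N)          # step 2
        # Step 3: maximize the average log prior of the realizations over D.
        D = minimize_scalar(
            lambda d: -np.mean([log_prior(s, d) for s in realizations]),
            bounds=(1e-12, 1e12), method="bounded").x
        # Step 4: maximize the average log likelihood over sigma2_R.
        sigma2_R = minimize_scalar(
            lambda v: -np.mean([log_likelihood(s, v) for s in realizations]),
            bounds=(1e-12, 1e12), method="bounded").x
    return D, sigma2_R
```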

6. Application to Contaminant Source Identification at Dover Air Force Base, Delaware

[40] Interest in techniques aimed at identifying sources of environmental contaminants has been growing over the past several years. The ability to conclusively identify the source of observed contamination helps in the remediation process and can be critical to the identification of responsible parties. In this section, we present an application of the newly developed methodology to the estimation of the contamination history at Dover Air Force Base (DAFB), Delaware [Mackay et al., 1997; Ball et al., 1997; Liu and Ball, 1999].

6.1. Contaminant Source Identification Methods

[41] A large number of methods are currently available for contaminant source identification. Inverse methods are one subset; they analyze the contamination distribution to determine either the prior location of observed contamination or the release history from a known source. A first subset of work in this category focuses on determining the values of a small number of parameters describing the source such as, for example, the location and magnitude of a steady state point source [Gorelick et al., 1983; Kauffmann and Kinzelbach, 1989; Butcher and Gauthier, 1994; Ala and Domenico, 1992; Dimov et al., 1996; Sonnenborg et al., 1996; Sidauruk et al., 1997]. Other work allows for a larger number of variables describing the source, such as additional variables for the times at which the release began and ended [Wagner, 1992; Ball et al., 1997; Mahar and Datta, 1997; Sciortino et al., 2000]. A final subset of work uses a function estimate to characterize the source location or release history. In this case, the source characteristics are not limited to a small, predetermined number of parameters, but are instead free to vary in space and in time.

[42] This last category includes methods that use a deterministic approach and others, such as the method developed in this work, that offer a stochastic approach to the problem. Because there will always be uncertainty in contaminant concentration estimates, release history and release location, it makes sense to treat these quantities as random functions that can be described by their statistical properties. In this framework, estimation uncertainty is recognized and its importance can be determined. Deterministic approaches include Tikhonov regularization [Skaggs and Kabala, 1994, 1998; Liu and Ball, 1999; Neupauer et al., 2000], quasi-reversibility [Skaggs and Kabala, 1995], backward tracking [Bagtzoglou et al., 1991; Bagtzoglou and Dougherty, 1992], Fourier series analysis [Birchwood, 1999], nonregularized nonlinear least squares [Alapati and Kabala, 2000], the progressive genetic algorithm method [Aral et al., 2001], and the Marching-Jury Backward Beam Equation method [Atmadja and Bagtzoglou, 2001]. Stochastic approaches include geostatistical techniques [Snodgrass and Kitanidis, 1997; Michalak and Kitanidis, 2002, 2003], minimum relative entropy methods [Woodbury and Ulrych, 1996; Woodbury et al., 1998; Neupauer et al., 2000], and adjoint-derived source distribution probabilities [Neupauer and Wilson, 1999, 2001].

[43] One of the methods proposed in the past for the identification of the release history from a known contaminant source is the use of the geostatistical approach to inverse modeling. Snodgrass and Kitanidis [1997] estimated the release history for a point source of a conservative solute being transported in a 1-dimensional homogeneous domain, given point concentration measurements at some time after the release. Michalak and Kitanidis [2003] applied the method to the analysis of aquitard cores taken from the Dover Air Force Base in an attempt to estimate the perchloroethylene (PCE) and trichloroethylene (TCE) contamination history in the overlying aquifer. Michalak and Kitanidis [2002] extended the method to 3 dimensions and estimated the release history of 1,3-dioxane from the Gloucester Landfill in Ontario, Canada, based on downgradient concentration measurements.

6.2. Case Study

[44] We apply the method developed in this work to the analysis of aquitard cores taken from the DAFB, in an effort to infer the contamination history in the overlying aquifer. These data sets have previously been examined by Ball et al. [1997], Liu and Ball [1999], and Michalak and Kitanidis [2003]. Ball et al. [1997] assumed that the history was made up of one-step and two-step constant concentrations at the aquifer/aquitard interface and the times of step concentration changes were estimated from the data. Liu and Ball [1999] applied Tikhonov regularization to obtain a function estimate of the concentration history. Michalak and Kitanidis [2003] applied geostatistical inverse modeling with a cubic variogram to the data set, and developed a method for enforcing concentration nonnegativity in a geostatistical framework. Their overall objective function was still multi-Gaussian, however, and the method used for enforcing the constraint required some approximations.

6.3. Site and Data Description

[45] The research site is located at DAFB. At the site, an unconfined sand aquifer is underlain by an aquitard, which consists of two layers of distinctly different characteristics: an upper layer of orange silty clay loam (OSCL) and a bottom layer of dark gray silt loam (DGSL). Tetrachloroethylene (PCE) and trichloroethylene (TCE) are two principal chemical contaminants of the overlying aquifer contaminant plume, and concentration profiles for these chemicals have been obtained in the underlying aquitard at several locations. A detailed description of the site geology and hydrogeology is given by Mackay et al. [1997] and Ball et al. [1997]. A description of the sampling at the site is given by Liu and Ball [1999]. The data sets used for the analysis presented in this work are at locations referred to as PPC11 and PPC13.

[46] The soil core samples were also used to independently determine the sorption properties and porosity of the two aquitard layers [Ball et al., 1997]. The physical parameters as used by Ball et al. [1997] are presented in Table 1. Identical values were used by Michalak and Kitanidis [2003] and in the current work, in order to facilitate a direct comparison between the methods.

Table 1. Summary of Parameters in Two-Layer Aquitard
Physical Definition      Parameter   Units   Layer 1 (OSCL)   Layer 2 (DGSL)
Effective diffusivity    D (PCE)     m2/s    4.2 × 10−10      4.2 × 10−10
Effective diffusivity    D (TCE)     m2/s    4.9 × 10−10      4.9 × 10−10
Retardation factor       R (PCE)     -       2                45
Retardation factor       R (TCE)     -       1.4              20
Porosity                 η           -       0.53             0.56
Bulk density             ρb          kg/L    1.22             1.15

6.4. Physical Model

[47] Solute transport in this two-layer aquitard is mainly controlled by a diffusive process, which is assumed to be mathematically described by the following differential equations:

$$R_1 \frac{\partial c_1^{aq}}{\partial t} = D_1 \frac{\partial^2 c_1^{aq}}{\partial x^2}, \qquad 0 \leq x \leq L$$
$$R_2 \frac{\partial c_2^{aq}}{\partial t} = D_2 \frac{\partial^2 c_2^{aq}}{\partial x^2}, \qquad x > L$$

where c1aq and c2aq are aqueous concentrations, R1 and R2 are retardation factors, and D1 and D2 are effective diffusion coefficients in layer 1 (OSCL) and layer 2 (DGSL), respectively. D1 and D2 should not be confused with D, the variance of the prior probability density function in the inverse model setup (see equation 8). L is the thickness of the first layer (OSCL) and is 0.74 m for location PPC11 and 0.91 m for location PPC13.

6.5. Inverse Model

[48] The system of equations described in the previous section has an analytical solution in 1-d, which will be useful for setting up the sensitivity matrix H. The solution, taken from Liu and Ball [1999], is

$$c(x, T) = \int_{0}^{T} f(x, T - t)\, s(t)\, dt$$

where c is the total concentration (sorbed and aqueous), x is the depth within the aquitard, and T is the measurement time. The source is a function of time and is expressed by s(t). The transfer function f(x, T − t) applies the appropriate weight to the source function:

equation image
equation image

where

equation image
equation image

and the subscripts 1 and 2 refer to parameter values in the upper and lower aquitard layers, respectively.

[49] Let xi, i = 1, …, n be the n points at which the measurements are taken, and let us discretize the time domain into m temporal points tj, j = 1, …, m, with a time step Δt. All measurements are taken at time t = T. In this case, the vector of observations z and that of the unknown function s we wish to estimate are:

$$\mathbf{z} = \big[ c(x_1, T),\, c(x_2, T),\, \ldots,\, c(x_n, T) \big]^{T}, \qquad \mathbf{s} = \big[ s(t_1),\, s(t_2),\, \ldots,\, s(t_m) \big]^{T}$$

and the sensitivity matrix is

$$H_{ij} = f(x_i, T - t_j)\, \Delta t, \qquad i = 1, \ldots, n, \quad j = 1, \ldots, m$$
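
A sketch of assembling the sensitivity matrix from this discretization is given below; transfer_function(x, tau) is a placeholder for the two-layer analytical transfer function f defined above, and the trailing Δt factor comes from discretizing the convolution integral.

```python
import numpy as np

def build_sensitivity_matrix(x_obs, t_grid, T, dt, transfer_function):
    """Assemble H with H[i, j] = f(x_i, T - t_j) * dt, the discretized form of
    c(x, T) = integral over t of f(x, T - t) s(t) dt.

    transfer_function(x, tau) is a placeholder for the analytical two-layer
    transfer function of section 6.5.
    """
    n, m = len(x_obs), len(t_grid)
    H = np.zeros((n, m))
    for i, x in enumerate(x_obs):
        for j, t in enumerate(t_grid):
            H[i, j] = transfer_function(x, T - t) * dt
    return H

# With H in hand, the forward model used in the inversion is linear: z = H @ s + error.
```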

[50] The available data were used with the model presented in this work to estimate the contamination history in the aquifer overlying the sampled aquitard. A total of four data sets were analyzed, corresponding to TCE and PCE data from the two sampling locations PPC11 and PPC13. Results obtained using the method developed in this paper are compared with those obtained using linear geostatistical inverse modeling with a linear variogram. The standard methodology used in applying the linear geostatistical model is summarized in Appendix A.

6.6. Results and Discussion

6.6.1. PCE

[51] The results presented in Figures 3 and 4 were generated by applying the unconstrained methodology to the PCE concentration profiles measured at PPC11 and PPC13, respectively. Figures 3a and 4a show the estimated boundary concentration with 95% confidence intervals, Figures 3b and 4b show sample conditional realizations, and Figures 3c and 4c show the actual concentration profiles in the aquitard cores taken at these locations along with the fitted concentrations resulting from the best estimate of the boundary concentration. Note that the jump in the concentration data shown in the aquitard is associated with the high retardation in the lower DGSL aquitard layer (see Table 1). Figures 5 and 6 show the same information for the case where concentration nonnegativity is enforced.

Figure 3.

Results of source estimation from PCE data at location PPC11. (a) Estimated time variation of boundary concentration at the interface between the aquifer and aquitard. The end time represents the sampling date (27 October 1994). (b) Sample conditional realizations of boundary concentration at the interface between the aquifer and aquitard. (c) Measurement data and fitted concentrations resulting from the estimated boundary conditions.

Figure 4.

Results of source estimation from PCE data at location PPC13. (a) Estimated time variation of boundary concentration at the interface between the aquifer and aquitard. The end time represents the sampling date (6 June 1996). (b) Sample conditional realizations of boundary concentration at the interface between the aquifer and aquitard. (c) Measurement data and fitted concentrations resulting from the estimated boundary conditions.

Figure 5.

Results of source estimation from PCE data at location PPC11 with nonnegativity constraint. (a) Estimated time variation of boundary concentration at the interface between the aquifer and aquitard. The end time represents the sampling date (27 October 1994). (b) Sample conditional realizations of boundary concentration at the interface between the aquifer and aquitard. (c) Measurement data and fitted concentrations resulting from the estimated boundary conditions.

Figure 6.

Results of source estimation from PCE data at location PPC13 with nonnegativity constraint. (a) Estimated time variation of boundary concentration at the interface between the aquifer and aquitard. The end time represents the sampling date (6 June 1996). (b) Sample conditional realizations of boundary concentration at the interface between the aquifer and aquitard. (c) Measurement data and fitted concentrations resulting from the estimated boundary conditions.

[52] For all data sets, the structural parameters D in the prior pdf and the variance of the measurement error σR2 were optimized using the standard linear maximum likelihood approach for the unconstrained case (see Appendix A) and using the EM algorithm for the constrained case (see section 5). The optimal parameters are presented in Table 2. Time was discretized at one-month intervals. The constrained best estimates and confidence intervals were calculated from chains of 50,000 conditional realizations, generated using the Gibbs sampling algorithm as described in section 4.

Table 2. Optimal Structural Parameter Values
Structural Parameter     Solution Method   PCE, PPC11   PCE, PPC13   TCE, PPC11   TCE, PPC13
D [(μg/L)2 month−1]      unconstrained     1060         37.7         2.40 × 106   2.08 × 106
D [(μg/L)2 month−1]      nonnegative       114          57.5         0.59 × 106   1.62 × 106
σR2 [(μg/kg)2]           unconstrained     6.47         4.15         2.17 × 104   1.80 × 104
σR2 [(μg/kg)2]           nonnegative       10.9         4.49         3.89 × 104   4.74 × 104

[53] Our results for location PPC11 are consistent with those presented by Liu and Ball [1999], indicating an increase in PCE concentrations in the aquifer in recent times. However, unlike in these previous results, the distribution is single-peaked when the solution is constrained to be nonnegative. Michalak and Kitanidis [2003] had also found double-peaked distributions for PCE at both locations. Therefore, it appears that the double peak is a product of using a regularization term that minimizes the second derivative of the estimates (this is the case both in Tikhonov regularization and in geostatistical inverse modeling with a cubic variogram). The regularization term used in the current method, as well as in linear geostatistical inverse modeling with a linear variogram, instead tends to minimize concentration differences over short periods. However, as can be seen from the conditional realizations presented in Figures 3b and 5b, a single-peaked best estimate does not necessarily imply that the actual contamination history was single-peaked. Instead, it implies that the data do not point to double peaking being a necessary feature of the contamination history.

[54] The timing of the recent peak in concentration is consistent both between the two data sets (PPC11 and PPC13) and also with results obtained by Liu and Ball [1999] and Michalak and Kitanidis [2003].

[55] As noted by Liu and Ball [1999], the boundary conditions estimated using Tikhonov regularization and those estimated using a two-step approach [Ball et al., 1997] were quite different, and yet reproduced the observed concentration profiles equally well. This point emphasizes the advantages of a stochastic approach, because the uncertainty associated with the estimated boundary conditions can be quantified. The ability to generate confidence intervals and conditional realizations greatly improves the ability to interpret obtained results.

[56] Overall, current results indicate that the diffusive process that led to the contamination of the aquitard, combined with the significant concentration measurement error, results in relatively wide confidence intervals about the estimated contamination history in the overlying aquifer. The introduction of additional information into the system in the form of a nonnegativity constraint, however, greatly reduced the width of the confidence intervals (by factors of approximately 8 and 2 for locations PPC11 and PPC13, respectively). The results obtained by Liu and Ball [1999] fall within the obtained confidence intervals, but the conditional realizations presented in Figures 5 and 6 show the variety of concentration histories that may have led to the observed aquitard concentration profiles.

[57] In addition, the nonnegativity constraint has a regularizing effect on the local variance of the concentration history, as represented by the parameter D. Relative to the unconstrained case, this parameter is more consistent between sampling locations PPC11 and PPC13. Because we expect the contamination history that was experienced at the two locations to be similar in character, such similarity is consistent with our understanding of the problem.

6.6.2. TCE

[58] The results presented in Figures 7 and 8 were generated by applying the unconstrained methodology to the TCE concentration profiles measured at PPC11 and PPC13, respectively. Figures 7a and 8a show the estimated boundary concentration with 95% confidence intervals, Figures 7b and 8b show sample conditional realizations, and Figures 7c and 8c show the actual concentration profiles in the aquitard cores taken at these locations, along with the fitted concentrations resulting from the best estimate of the boundary concentration. Figures 9 and 10 show the same information for the case where concentration nonnegativity is enforced.

Figure 7.

Results of source estimation from TCE data at location PPC11. (a) Estimated time variation of boundary concentration at the interface between the aquifer and aquitard. The end time represents the sampling date (27 October 1994). (b) Sample conditional realizations of boundary concentration at the interface between the aquifer and aquitard. (c) Measurement data and fitted concentrations resulting from the estimated boundary conditions.

Figure 8.

Results of source estimation from TCE data at location PPC13. (a) Estimated time variation of boundary concentration at the interface between the aquifer and aquitard. The end time represents the sampling date (6 June 1996). (b) Sample conditional realizations of boundary concentration at the interface between the aquifer and aquitard. (c) Measurement data and fitted concentrations resulting from the estimated boundary conditions.

Figure 9.

Results of source estimation from TCE data at location PPC11 with nonnegativity constraint. (a) Estimated time variation of boundary concentration at the interface between the aquifer and aquitard. The end time represents the sampling date (27 October 1994). (b) Sample conditional realizations of boundary concentration at the interface between the aquifer and aquitard. (c) Measurement data and fitted concentrations resulting from the estimated boundary conditions.

Figure 10.

Results of source estimation from TCE data at location PPC13 with nonnegativity constraint. (a) Estimated time variation of boundary concentration at the interface between the aquifer and aquitard. The end time represents the sampling date (6 June 1996). (b) Sample conditional realizations of boundary concentration at the interface between the aquifer and aquitard. (c) Measurement data and fitted concentrations resulting from the estimated boundary conditions.

[59] As with the PCE data, the addition of the nonnegativity constraint greatly improves the results, in the sense that the confidence intervals become narrower (by factors of approximately 6 and 4 for locations PPC11 and PPC13, respectively) without the introduction of any additional assumptions. The results obtained by Liu and Ball [1999] fall within these intervals for location PPC13, but this earlier work had indicated a double peak for location PPC11. This second peak had been inconsistent with the results at location PPC13, and does not appear in the current results or in those obtained by Michalak and Kitanidis [2003]. The timing and level of the recent peak in the TCE concentration in the overlying aquifer are generally consistent among the three studies.

7. Summary and Conclusions

[60] The method presented in this paper provides a fully statistically rigorous method for enforcing function nonnegativity in a Bayesian stochastic inverse modeling framework. As such, the method allows for the identification of a best estimate of an unknown function, as well as various other function statistics such as confidence intervals.

[61] The method offers several advantages relative to linear cokriging (also known as linear geostatistical inverse modeling): (1) The best estimates and their corresponding confidence intervals remain in the nonnegative range, because they are defined using a prior pdf that is nonzero only in the nonnegative parameter range. (2) The method allows for the generation of physically reasonable conditional realizations. (3) The method utilizes additional information (i.e., nonnegativity), resulting in narrower confidence intervals relative to cokriging, especially in areas close to the constraint boundary.

[62] Furthermore, unlike nonnegativity enforcing methods that have been applied in the past, the method does not impose any undesirable characteristics on the solution, such as the absolute-value-dependent and highly asymmetrical confidence intervals that result from power transformations.

[63] The presented application demonstrates the applicability to field data of a stochastic inverse modeling technique based on geostatistical principles. Furthermore, the robustness of a geostatistical approach when applied to a nonuniform domain is demonstrated. Results show that the contamination history in the aquifer overlying the sampled aquitard can be estimated with reasonable precision. This precision is greatly improved by the incorporation of the additional information provided by the nonnegativity constraint.

[64] Finally, there are various possible extensions to the method presented in this paper that are beyond the scope of the current work. One such extension is the incorporation of an upper constraint on the possible function values (such as, for example, a solubility limit in the case of concentration) in addition to the nonnegativity constraint. This idea was also discussed by Michalak and Kitanidis (manuscript in preparation, 2002). Such an application would require the consideration of an infinite number of image pdf's in developing the prior probability density function to be used in the solution of the inverse problem, similar to the case where two parallel boundaries are considered in a groundwater flow or transport problem. Only a small, manageable number of these images would have a physically significant effect on the solution, however.

Appendix A: Linear Geostatistical Approach to Inverse Modeling

[65] Where used, linear geostatistical inverse modeling is performed according to the standard procedure briefly described herein. No derivations are provided, and the reader is referred to, for example, Kitanidis and Vomvoris [1983], Hoeksema and Kitanidis [1984], and Kitanidis [1996] for additional details and background.

[66] The objective is to estimate an unknown function. The standard estimation problem may be expressed in the following form:

$$\mathbf{z} = \mathbf{h}(\mathbf{s}, \mathbf{r}) + \boldsymbol{\varepsilon}$$

where z is an n × 1 vector of observations and s is an m × 1 “state vector” obtained from the discretization of the unknown function that we wish to estimate. The vector r contains other parameters needed by the model function h(s, r). The measurement error is represented by the vector ε. Following geostatistical methodology, s and ε are represented as random vectors. For the linear case, the function h(s, r) can be written as

$$\mathbf{h}(\mathbf{s}, \mathbf{r}) = \mathbf{H}\mathbf{s}$$

where H is a known matrix. We assume that ε has zero mean and known covariance matrix R. The covariance of the measurement errors used is

$$\mathbf{R} = \sigma_R^2\, \mathbf{I}$$

where σR2 is the variance of the measurement error, and I is an n × n identity matrix.

[67] We model s, the unknown function, as a random vector with expected value

$$E[\mathbf{s}] = \mathbf{Y}\boldsymbol{\beta}$$

where Y is a known m × p matrix and β are p unknown drift coefficients. For the linear variogram, a constant mean is assumed. Thus

$$\mathbf{Y} = [1 \;\; 1 \;\; \cdots \;\; 1]^{T}$$

and β is the mean of the unknown function, an unknown scalar.

[68] The prior covariance function of s is

$$E\big[ (\mathbf{s} - \mathbf{Y}\boldsymbol{\beta})(\mathbf{s} - \mathbf{Y}\boldsymbol{\beta})^{T} \big] = \mathbf{Q}(\boldsymbol{\theta})$$

where Q (θ) is a known function of unknown parameters θ. In the case of the linear variogram, the generalized covariance matrix Q can be written as

$$Q_{ij} = -D\, |t_i - t_j|$$

where ∣ti − tj∣ is the separation distance between two points at which the function s is to be estimated.

[69] The approach used to obtain the structural parameters, in our case D and σR2, is detailed by Kitanidis [1995]. In short, the parameters are estimated by maximizing the probability of the measurements

$$p(\mathbf{z} \mid \boldsymbol{\theta}) \propto |\boldsymbol{\Sigma}|^{-1/2}\, \big| \mathbf{X}^{T} \boldsymbol{\Sigma}^{-1} \mathbf{X} \big|^{-1/2} \exp\!\left[ -\frac{1}{2}\, \mathbf{z}^{T} \Big( \boldsymbol{\Sigma}^{-1} - \boldsymbol{\Sigma}^{-1} \mathbf{X} \big( \mathbf{X}^{T} \boldsymbol{\Sigma}^{-1} \mathbf{X} \big)^{-1} \mathbf{X}^{T} \boldsymbol{\Sigma}^{-1} \Big) \mathbf{z} \right]$$

where ∣ ∣ denotes matrix determinant, and

$$\boldsymbol{\Sigma} = \mathbf{H}\mathbf{Q}\mathbf{H}^{T} + \mathbf{R}$$
$$\mathbf{X} = \mathbf{H}\mathbf{Y}$$

[70] Once these parameters have been estimated, the posterior probability density of the unknown vector s is defined as

$$p''(\mathbf{s} \mid \mathbf{z}) \propto \exp\!\left[ -\frac{1}{2} (\mathbf{z} - \mathbf{H}\mathbf{s})^{T} \mathbf{R}^{-1} (\mathbf{z} - \mathbf{H}\mathbf{s}) - \frac{1}{2} (\mathbf{s} - \mathbf{Y}\boldsymbol{\beta})^{T} \mathbf{Q}^{-1} (\mathbf{s} - \mathbf{Y}\boldsymbol{\beta}) \right]$$

The corresponding system of equations that allows one to obtain the best estimate and covariance of s is

$$\begin{bmatrix} \mathbf{H}\mathbf{Q}\mathbf{H}^{T} + \mathbf{R} & \mathbf{H}\mathbf{Y} \\ (\mathbf{H}\mathbf{Y})^{T} & \mathbf{0} \end{bmatrix} \begin{bmatrix} \boldsymbol{\Lambda}^{T} \\ \mathbf{M} \end{bmatrix} = \begin{bmatrix} \mathbf{H}\mathbf{Q} \\ \mathbf{Y}^{T} \end{bmatrix}$$

where Λ is an m × n matrix of coefficients and M is a p × m matrix of multipliers. The best estimate of the function is

$$\hat{\mathbf{s}} = \boldsymbol{\Lambda} \mathbf{z}$$

and its posterior covariance is

$$\mathbf{V} = -\mathbf{Y}\mathbf{M} + \mathbf{Q} - \boldsymbol{\Lambda}\mathbf{H}\mathbf{Q}$$
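
For completeness, a minimal dense-linear-algebra sketch of the cokriging solve is given below; it follows the system of equations written above and makes no attempt at the numerical refinements of the cited references.

```python
import numpy as np

def linear_cokriging(z, H, Q, R, Y):
    """Solve the cokriging system and return the best estimate and its covariance."""
    n, m = H.shape
    p = Y.shape[1]
    Sigma = H @ Q @ H.T + R                    # covariance of the observations
    X = H @ Y                                  # drift projected onto the observations
    A = np.block([[Sigma, X], [X.T, np.zeros((p, p))]])
    b = np.vstack([H @ Q, Y.T])
    sol = np.linalg.solve(A, b)                # solve the (n + p) x (n + p) system
    Lambda = sol[:n, :].T                      # m x n matrix of coefficients
    M = sol[n:, :]                             # p x m matrix of multipliers
    s_hat = Lambda @ z                         # best estimate of the unknown function
    V = -Y @ M + Q - Lambda @ H @ Q            # posterior covariance
    return s_hat, V
```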

[71] Using geostatistical methodology, it is also possible to generate realizations of the unknown function that are conditional on all the observations. The procedure for generating conditional realizations is discussed by Gutjahr et al. [1994] and Kitanidis [1995]. First, an unconditional unconstrained realization suu,l is generated, where l serves as a reminder that there is an infinite number of possible realizations. A realization of the error vector εl must also be independently generated with zero mean and covariance R. Then, a conditional unconstrained realization scu,l may be found by minimizing

equation image

with respect to scu, l. Here

equation image

Acknowledgments

[72] This research was partially funded by the Natural and Accelerated Bioremediation Research (NABIR) program, Biological and Environmental Research (BER), U.S. Department of Energy (grant DE-FG03-00ER63046). We would like to thank Dr. Chongxuan Liu for providing us with the DAFB data used in this study, and Prof. Peter Glynn for his valuable input.
