Estimation of historical groundwater contaminant distribution using the adjoint state method applied to geostatistical inverse modeling



[1] As the incidence of groundwater contamination continues to grow, a number of inverse modeling methods have been developed to address forensic groundwater problems. In this work the geostatistical approach to inverse modeling is extended to allow for the recovery of the antecedent distribution of a contaminant at a given point back in time, which is critical to the assessment of historical exposure to contamination. Such problems are typically strongly underdetermined, with a large number of points at which the distribution is to be estimated. To address this challenge, the computational efficiency of the new method is increased through the application of the adjoint state method. In addition, the adjoint problem is presented in a format that allows for the reuse of existing groundwater flow and transport codes as modules in the inverse modeling algorithm. As demonstrated in the presented applications, the geostatistical approach combined with the adjoint state method allows for a historical multidimensional contaminant distribution to be recovered even in heterogeneous media, where a numerical solution is required for the forward problem.

1. Introduction

[2] When determining the effect of historical groundwater contamination, the distribution of a plume at a given point back in time is often required to establish exposure of wells or individuals to the contaminant. For example, in the case of Woodrow Sterling et al. versus Velsicol Chemical Corporation [1986], a class of people who owned property in the vicinity of a chemical waste burial site sought damages for personal injury and damages to their property suffered when water in their home wells became contaminated by hazardous chemicals escaping from Velsicol's site. Velsicol admitted that some of the wells were contaminated with chemicals from its waste burial site, but disputed that all members of the class had been exposed and did not agree with the plaintiffs as to the intensity and duration of exposure. Therefore the case centered on estimating the past distribution and concentration of the chemical plume, in order to determine concentrations in the plaintiffs' wells at given times [Michalak, 2001]. Emerging inverse modeling methods can be applied to solve such problems.

[3] One set of inverse methods is based on geostatistical principles and allows for the estimation of unknown functions based on the dual criterion of reproducing available observations while maintaining an assumed correlation structure. Methods falling under this category have been used for some time for estimating subsurface hydraulic conductivity or transmissivity distributions based on hydraulic head and other data [e.g., Kitanidis and Vomvoris, 1983; Kitanidis, 1995; Zimmerman et al., 1998]. More recently, these types of methods have also been applied to contaminant release history identification in groundwater systems [Snodgrass and Kitanidis, 1997; Michalak and Kitanidis, 2002, 2003, 2004]. In this paper the geostatistical method is extended to the estimation of the antecedent distribution of a contaminant at a given point back in time, making it applicable to cases such as the one described.

[4] In the geostatistical approach to inverse modeling, the solution involves the calculation of a sensitivity matrix relating each point in the discretized unknown function to each observation, which typically requires one forward run for each point in the discretized unknown function. Because inverse problems associated with groundwater systems are typically strongly underdetermined, in the sense that the number of points in the discretized unknown function m is greater than the number of available measurements n, the computational cost of calculating the sensitivity matrix can be prohibitive. This is especially true when the function to be estimated is itself multidimensional. In this work the adjoint state method is used to efficiently populate the full sensitivity matrix by solving a series of adjoint problems instead of the traditional approach of solving a series of forward problems. The combination of the adjoint and geostatistical methodologies makes the identification of a multidimensional contaminant distribution in a heterogeneous domain feasible.

[5] Note that throughout this paper, we will use the term “historical contaminant distribution” to describe the plume at a single, given point in the past. We avoid using the term prior distribution so as to prevent confusion with the terms “prior” and “posterior,” which have a different, very specific definition in the context of Bayesian inverse modeling. Also, although in the presented applications the historical distribution will be recovered for a single point in time, the presented algorithm could directly be applied for a series of times, yielding a time-dependent description of the history of a plume.

2. Background

[6] Section 2.1 discusses methods used for the solution of inverse problems aimed at identifying the history of contamination in groundwater systems. These methods include algorithms for estimating the release history of a known source, identifying the location of sources, and recovering the historical distribution of a contaminant. Section 2.2 presents a brief review of past applications of the adjoint state method to groundwater systems.

2.1. Inverse Modeling Methods for Identifying History of Contamination

[7] Inverse methods are one set of tools that can be used to investigate the history of groundwater contamination. Such methods use modeling and statistical tools to determine the historical distribution of observed contamination, the location of contaminant sources, or the release history from a known source.

[8] A first subset of inverse methods focuses on determining the values of a small number of parameters describing the source of a contaminant such as, for example, the location and magnitude of a steady state point source [Gorelick et al., 1983; Kauffmann and Kinzelbach, 1989; Butcher and Gauthier, 1994; Ala and Domenico, 1992; Dimov et al., 1996; Sonnenborg et al., 1996; Sidauruk et al., 1997]. Other works allow for describing the source using more parameters, such as additional variables for the times at which the release began and ended [Wagner, 1992; Ball et al., 1997; Mahar and Datta, 1997; Sciortino et al., 2000]. A third subset deals with the identification of the location or release time of instantaneous point sources, but offers a probabilistic solution to the problem. These methods include backward tracking [Bagtzoglou et al., 1991; Bagtzoglou and Dougherty, 1992] and adjoint-derived source distribution probabilities [Neupauer and Wilson, 1999, 2001, 2002]. A final subset of work uses a function estimate to characterize the historical contaminant distribution, source location, or release history. In this case, the contaminant distribution or source description is not limited to a small number of fixed parameters but can instead vary in space and/or in time.

[9] This last category includes methods that use a deterministic approach and others that offer a stochastic approach to the problem. The method developed in this work, for example, offers a stochastic function estimate of the historical distribution of a contaminant. In stochastic approaches, parameters are viewed as jointly distributed random fields that can be described by their statistical properties. In this framework, estimation uncertainty is recognized and its importance can sometimes be determined. Deterministic approaches include Tikhonov regularization [Skaggs and Kabala, 1994, 1998; Liu and Ball, 1999; Neupauer et al., 2000], quasi-reversibility [Skaggs and Kabala, 1995; Bagtzoglou and Atmadja, 2003], Fourier series analysis [Birchwood, 1999], nonregularized nonlinear least squares [Alapati and Kabala, 2000], the progressive genetic algorithm method [Aral et al., 2001], and the marching-jury backward beam equation method [Atmadja and Bagtzoglou, 2001; Bagtzoglou and Atmadja, 2003]. Stochastic approaches include geostatistical techniques [Snodgrass and Kitanidis, 1997; Michalak and Kitanidis, 2002, 2003, 2004] and the minimum relative entropy method [Woodbury and Ulrych, 1996; Woodbury et al., 1998; Neupauer et al., 2000].

[10] The methods presented in the previous paragraph have been applied to a range of problems, summarized in Table 1. The majority of applications are related to the estimation of the release history or contamination history in homogeneous one-dimensional domains. A few authors have tackled nonuniform domains with a small number of zones with deterministically varying properties [Liu and Ball, 1999; Atmadja and Bagtzoglou, 2001; Michalak and Kitanidis, 2003, 2004]. A smaller number have dealt with fully heterogeneous one-dimensional [Bagtzoglou and Atmadja, 2003] and multidimensional [Aral et al., 2001] domains. These heterogeneous applications, however, have been limited to deterministic approaches. Also, the number of stochastic applications in multidimensional domains is still very small [Woodbury et al., 1998; Michalak and Kitanidis, 2002] and all these applications have been for homogeneous cases. Furthermore, all the work published thus far has dealt with what were essentially one-dimensional sources or distributions, estimating the past distribution of a contaminant in a one-dimensional medium, or the release history of a point source, a uniform patch source, or an interfacial source (see Table 1). None of the methods has been used to estimate a multidimensional past distribution of a contaminant.

Table 1. Applications of Inverse Modeling Methods That Provide a Function Estimate of the History of Contamination

| Method | Reference | Estimated Function | Source Type | Domain | Dimensions | Site |
|---|---|---|---|---|---|---|
| Deterministic Approaches | | | | | | |
| Tikhonov regularization | Skaggs and Kabala [1994] | release history | point | homogeneous | 1-D | hypothetical |
| | Skaggs and Kabala [1998] | release history | point | homogeneous | 1-D | hypothetical |
| | Liu and Ball [1999] | concentration history | interface | nonuniform | 1-D | DAFB (a) |
| | Neupauer et al. [2000] | release history | point | homogeneous | 1-D | hypothetical |
| Quasi-reversibility | Skaggs and Kabala [1995] | release history | point | homogeneous | 1-D | hypothetical |
| | Bagtzoglou and Atmadja [2003] | historical distribution | point | heterogeneous | 1-D | hypothetical |
| Spectral analysis | Birchwood [1999] | location and time of rectangular pulse | point | homogeneous | 1-D | hypothetical |
| Nonregularized nonlinear least squares | Alapati and Kabala [2000] | release history | point | homogeneous | 1-D | hypothetical |
| Progressive genetic algorithm methods | Aral et al. [2001] | release history and source location | point | heterogeneous | 3-D | hypothetical |
| Marching-jury backward beam equation method | Atmadja and Bagtzoglou [2001] | release history | point | nonuniform | 1-D | hypothetical |
| | Bagtzoglou and Atmadja [2003] | historical distribution | point | heterogeneous | 1-D | hypothetical |
| Stochastic Approaches | | | | | | |
| Minimum relative entropy | Woodbury and Ulrych [1996] | release history | point | homogeneous | 1-D | hypothetical |
| | Woodbury et al. [1998] | release history | patch or point | homogeneous | 3-D | GL (b) and hypothetical |
| | Neupauer et al. [2000] | release history | point | homogeneous | 1-D | hypothetical |
| Geostatistically based methods | Snodgrass and Kitanidis [1997] | release history | point | homogeneous | 1-D | hypothetical |
| | Michalak and Kitanidis [2004] | concentration history | interface | nonuniform | 1-D | DAFB (a) |
| | Michalak and Kitanidis [2002] | release history | patch | homogeneous | 3-D | GL (b) |
| | Michalak and Kitanidis [2003] | concentration history | interface | nonuniform | 1-D | DAFB (a) |
| | present work | historical distribution | any | heterogeneous | 3-D | hypothetical |

(a) Dover Air Force Base, Delaware.
(b) Gloucester Landfill, Ottawa, Ontario, Canada.

[11] The relatively limited application of methods that provide a function estimate of contaminant sources or distributions to more complex problems is likely due to two factors. First, some of the methods derived for specific applications are not easily extendable to multiple dimensions or to heterogeneous media. Several methods are also only applicable to the identification of release histories at a point but not historical distributions or vice versa. The second factor is the computational cost associated with solving more complex problems, as many of the listed methods require a large number of numerical runs for each solution of an inverse problem.

[12] The current work will for the first time allow for the recovery of the historical multidimensional spatial distribution of a contaminant. Furthermore, this will be done in a heterogeneous domain.

2.2. Application of Adjoint State

[13] The most common application of the adjoint state method in environmental applications is in improving the computational efficiency of sensitivity analyses. This sensitivity information can then be used for a variety of other purposes, as will be discussed later. Properly formulated, the adjoint state method allows for the sensitivity to a total of m parameters (or, alternately, m points in a discretized unknown function) to be determined through the solution of n flow or transport problems, where n is the number of available observations. Without the use of the adjoint state method, the number of required problem solutions is of the order of m. Clearly, the relative efficiency of the method increases as m becomes significantly greater than n.

[14] Examples of applications of the adjoint state method to sensitivity analyses in groundwater hydrology abound in the literature. The adjoint state method has been used with the groundwater flow equation for sensitivity analysis for parameter estimation [Sykes et al., 1985; Townley and Wilson, 1985; Wilson and Metcalfe, 1985; Lu et al., 1988; Yeh and Sun, 1990], for inverse modeling to obtain an estimate of the hydraulic conductivity distribution [Vemuri and Karplus, 1969; Neuman, 1980; Neuman et al., 1980; Sun and Yeh, 1990a, 1990b; Sun, 1994; McLaughlin and Townley, 1996; Cirpka and Kitanidis, 2000], for groundwater remediation design [Ahlfeld et al., 1988], and for pumping test design [Yeh and Sun, 1990]. Although most of the work has concentrated on the groundwater flow equation, the adjoint state method has also been applied in conjunction with the advection dispersion equation for solute transport in groundwater. Tracer or solute information was used by Ahlfeld et al. [1988] in the design of groundwater remediation. Cirpka and Kitanidis [2000] used temporal moments of tracer data to aid in inverse modeling to estimate the hydraulic conductivity distribution.

[15] The adjoint state method has also been applied in the solution of forensic groundwater problems in a few instances. Dimov et al. [1996] derived the adjoint state equation for the nonreactive advection-dispersion equation with a source term in one dimension. Their method uses a numerical integration technique to evaluate a linear functional arising from the solution of the adjoint problem. The authors applied the method to the solution of two specific source identification problems: (1) the identification of the set of locations where a point source can be located such that the concentration at a given downstream point at a given time does not exceed a given maximum concentration, and (2) the identification of the location and strength (in terms of contaminant flux) of a constant point source based on two downstream measurements. Neupauer and Wilson [1999] also derived the one-dimensional adjoint state formulation for the advection-dispersion equation with sources and sinks. The authors applied the solution to the derivation of backward-in-time location and travel-time probabilities of a contaminant plume emanating from a single instantaneous point source. Neupauer and Wilson [2001] extended the method to three dimensions and applied it to a synthetic two-dimensional, homogeneous test case. The inverse methods of Dimov et al. [1996] and Neupauer and Wilson [1999, 2001], however, are only directly applicable to the identification of a single point source.

3. Objectives

[16] The work presented in this paper has several objectives. First, this work extends the geostatistically based contaminant source release history identification methods developed by Snodgrass and Kitanidis [1997] and Michalak and Kitanidis [2002, 2003, 2004] to the identification of the historical multidimensional spatial distribution of a contaminant, and demonstrates the new method's applicability in a fully heterogeneous medium. Second, the adjoint formulation of the advection-dispersion equation is applied to improve the computational efficiency of solving underdetermined inverse problems, thereby allowing for more complex problems to be examined. Third, the adjoint formulation is implemented in a manner that allows for encapsulation of groundwater flow and contaminant transport codes. Existing models are thereby reused as modules without requiring modification to the models themselves. Fourth, the developed methods are tested using three hypothetical, two-dimensional case studies involving the recovery of the historical distribution of a groundwater contaminant. The first is an idealized case in a homogeneous domain, with a fine sampling array and small measurement error. The second and third examples involve a homogeneous and a deterministically heterogeneous aquifer, respectively, with more realistic sampling configurations and introduced errors. The third example also represents the first application of a stochastic inverse modeling method in a fully heterogeneous domain. The heterogeneous application uses MODFLOW [McDonald and Harbaugh, 1988; Harbaugh and McDonald, 1996] for the solution of the flow problem and MT3DMS [Zheng, 1990; Zheng and Wang, 1999] for the solution of the transport problem. These codes were used as external modules by the adjoint and inverse methods.

4. Geostatistical Approach to Estimating the Historical Distribution of a Contaminant

[17] Geostatistical inverse modeling follows a Bayesian approach. Bayes' theorem states that the posterior probability density function (pdf) of a state vector s given an observation vector z is proportional to the likelihood of the state given the data, times the prior pdf of the state. Symbolically,

$$p(\mathbf{s} \mid \mathbf{z}) \propto p(\mathbf{z} \mid \mathbf{s})\, p(\mathbf{s}) \tag{1}$$

where the vertical bar means “given.” In this context, “prior” and “posterior” refer to the probability density functions before and after the data z are taken into account. In the geostatistical approach the prior represents the assumed spatial or temporal correlation structure of the unknown function, as described by a covariance function. The likelihood of the data represents the degree to which an estimate of the unknown function s reproduces the available data z.

[18] Overall, the objective is to estimate the unknown function s. The standard estimation problem may be expressed in the form

$$\mathbf{z} = \mathbf{h}(\mathbf{s}, \mathbf{r}) + \boldsymbol{\varepsilon} \tag{2}$$

where z is an n × 1 vector of observations and s is an m × 1 state vector obtained from the discretization of the unknown function. Whereas in past applications of the geostatistical approach to inverse modeling s represented the release history from a known source [Snodgrass and Kitanidis, 1997; Michalak and Kitanidis, 2002, 2003, 2004], in the case examined here s is the spatial distribution of a contaminant at a previous time Ta. The vector z contains the available groundwater concentration measurements. The vector r contains other parameters needed by the model function h(s, r). The measurement error is represented by the vector ɛ. This error encompasses both the actual measurement error associated with collecting the data and any random numerical or conceptual inaccuracies associated with the evaluation of the function h(s, r).

[19] When the function h(s, r) is linear in the unknown s, as will be the case in the applications presented in this work, the function h(s, r) can be written as

$$\mathbf{h}(\mathbf{s}, \mathbf{r}) = \mathbf{H}\mathbf{s} \tag{3}$$

where H is a known n × m matrix, the Jacobian representing the sensitivity of the observations to the function s (i.e., Hi,j = ∂(zi − ɛi)/∂sj). In the case of the identification of the historical distribution of a contaminant, H represents the sensitivity of available observations to the concentration of the contaminant at given spatial locations and a single previous time. The components of H could be obtained numerically by performing one run of a groundwater transport model for each component of s. When s is discretized finely or when it varies in multiple dimensions, the computational cost quickly becomes prohibitive. This is the issue that will be addressed by the implementation of the adjoint state method in the next section.
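The forward-run approach to populating H can be illustrated with a minimal sketch. The toy transport model here (an explicit upwind scheme for one-dimensional advection-dispersion, with the hypothetical names `step_matrix`, `forward_model`, and `jacobian_by_forward_runs`) is an assumption made for illustration and is not the model used in this work. Because the model is linear in s, running it once per unit vector fills one column of H per run, so m runs are needed.

```python
import numpy as np

def step_matrix(m, courant=0.4, diff=0.2):
    """One explicit upwind time step of 1-D advection-dispersion on m cells.

    Hypothetical toy model: flow is toward increasing cell index and the
    concentration outside the domain is taken as zero.
    """
    A = np.zeros((m, m))
    for j in range(m):
        A[j, j] = 1.0 - courant - 2.0 * diff
        if j > 0:
            A[j, j - 1] = courant + diff   # advection + dispersion from upstream
        if j < m - 1:
            A[j, j + 1] = diff             # dispersion from downstream
    return A

def forward_model(s, A, n_steps, obs_idx):
    """Advance the initial plume s by n_steps and sample it at obs_idx."""
    c = s.copy()
    for _ in range(n_steps):
        c = A @ c
    return c[obs_idx]

def jacobian_by_forward_runs(A, n_steps, obs_idx, m):
    """Fill H one column at a time: one forward run per unknown (m runs)."""
    H = np.zeros((len(obs_idx), m))
    for j in range(m):
        unit = np.zeros(m)
        unit[j] = 1.0
        H[:, j] = forward_model(unit, A, n_steps, obs_idx)
    return H

m, n_steps = 50, 30
obs_idx = np.array([20, 30, 40])
A = step_matrix(m)
H = jacobian_by_forward_runs(A, n_steps, obs_idx, m)

# A synthetic historical plume and the noise-free observations it produces.
s_true = np.exp(-0.5 * ((np.arange(m) - 10) / 3.0) ** 2)
z = forward_model(s_true, A, n_steps, obs_idx)
```

In this sketch 50 forward runs are needed for only 3 observations, which is the cost that grows quickly when s is finely discretized or multidimensional.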

[20] Following geostatistical methodology and returning to equation (2), s and ɛ are represented as random vectors. We assume that ɛ has zero mean and known covariance matrix R. The covariance of the measurement errors that will be used is

$$\mathbf{R} = \sigma_R^2 \mathbf{I} \tag{4}$$

where σR2 is the variance of the measurement error and I is an n × n identity matrix. We model s, the unknown function, as a random vector with expected value

$$E[\mathbf{s}] = \mathbf{Y}\boldsymbol{\beta} \tag{5}$$

where Y is a known m × p matrix and β are p unknown drift coefficients that can represent the mean of the process as well as linear and/or nonlinear dependence on auxiliary variables. For example, for a linear drift in two dimensions, p = 3,

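$$\mathbf{Y} = \begin{bmatrix} 1 & x_{1,1} & x_{2,1} \\ \vdots & \vdots & \vdots \\ 1 & x_{1,m} & x_{2,m} \end{bmatrix} \tag{6}$$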

and β represents the mean and trend of the unknown function, such that at a point (x1,i, x2,i) the a priori expected value of the function s is

$$E[s(x_{1,i}, x_{2,i})] = \beta_1 + \beta_2 x_{1,i} + \beta_3 x_{2,i} \tag{7}$$

The prior covariance function of s is

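$$E\left[(\mathbf{s} - \mathbf{Y}\boldsymbol{\beta})(\mathbf{s} - \mathbf{Y}\boldsymbol{\beta})^T\right] = \mathbf{Q}(\theta) \tag{8}$$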

where Q(θ) is a known function of unknown parameters θ. This function represents the correlation between the historical contaminant concentration at various points, which, for most models, decays as the separation distance between the points increases. In the case of a cubic generalized covariance function (GCF) in two spatial dimensions, which is the function that will be used in the presented applications, the covariance matrix can be written as

$$Q_{ij}(h \mid \theta) = \theta h^3, \qquad h = \sqrt{(x_{1,i} - x_{1,j})^2 + (x_{2,i} - x_{2,j})^2} \tag{9}$$

where (x1,i, x2,i) and (x1,j, x2,j) are the x1 and x2 coordinates of the ith and jth locations at which the contaminant distribution is to be estimated and h is the separation distance between these locations.
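The drift matrix and prior covariance described above can be assembled in a few lines. This is a minimal sketch that assumes the cubic GCF takes the common form Q_ij = θh³, with h the separation distance; the function names `linear_drift_matrix` and `cubic_gcf_matrix`, the grid, and the value of θ are hypothetical.

```python
import numpy as np

def linear_drift_matrix(x1, x2):
    """m x 3 drift matrix for a linear trend in two dimensions: [1, x1, x2]."""
    return np.column_stack([np.ones_like(x1), x1, x2])

def cubic_gcf_matrix(x1, x2, theta):
    """m x m matrix with entries theta * h**3 (assumed cubic GCF form),
    where h is the separation distance between estimation points."""
    d1 = x1[:, None] - x1[None, :]
    d2 = x2[:, None] - x2[None, :]
    h = np.sqrt(d1 ** 2 + d2 ** 2)
    return theta * h ** 3

# Estimation points on a small 4 x 3 grid with unit spacing (illustrative only).
g1, g2 = np.meshgrid(np.arange(4.0), np.arange(3.0))
x1, x2 = g1.ravel(), g2.ravel()

Y = linear_drift_matrix(x1, x2)          # shape (12, 3)
Q = cubic_gcf_matrix(x1, x2, theta=0.5)  # shape (12, 12)
```

A generalized covariance matrix of this kind is not positive definite on its own; it is only meaningful in combination with the drift constraints, which is why it enters the estimation together with the drift matrix.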

[21] The method used to obtain the structural parameters, in our case θ and σR2, follows a restricted maximum likelihood approach, as detailed by Kitanidis [1995]. In short, the parameters are estimated by maximizing the probability of the measurements

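$$p(\mathbf{z} \mid \theta) \propto |\boldsymbol{\Sigma}|^{-1/2}\, |\mathbf{Y}^T \mathbf{H}^T \boldsymbol{\Sigma}^{-1} \mathbf{H} \mathbf{Y}|^{-1/2} \exp\left[-\tfrac{1}{2} \mathbf{z}^T \boldsymbol{\Xi} \mathbf{z}\right] \tag{10}$$

where | · | denotes the matrix determinant and

$$\boldsymbol{\Sigma} = \mathbf{H}\mathbf{Q}\mathbf{H}^T + \mathbf{R} \tag{11}$$
$$\boldsymbol{\Xi} = \boldsymbol{\Sigma}^{-1} - \boldsymbol{\Sigma}^{-1}\mathbf{H}\mathbf{Y}\left(\mathbf{Y}^T\mathbf{H}^T\boldsymbol{\Sigma}^{-1}\mathbf{H}\mathbf{Y}\right)^{-1}\mathbf{Y}^T\mathbf{H}^T\boldsymbol{\Sigma}^{-1} \tag{12}$$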
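As a rough illustration of the restricted maximum likelihood step, the sketch below evaluates the negative logarithm of the restricted likelihood in its standard form, ½ ln|Σ| + ½ ln|YᵀHᵀΣ⁻¹HY| + ½ zᵀΞz with Σ = HQHᵀ + R, following the approach of Kitanidis [1995]. The function name `neg_log_restricted_likelihood`, the synthetic data, and the coarse grid search are assumptions made for illustration. An exponential covariance is used as a stand-in for the prior model so that Σ is positive definite in this toy setting.

```python
import numpy as np

def neg_log_restricted_likelihood(Q, sigma2_R, H, Y, z):
    """0.5*ln|Sigma| + 0.5*ln|Y'H'Sigma^-1 HY| + 0.5*z'Xi z (standard REML form)."""
    n = H.shape[0]
    Sigma = H @ Q @ H.T + sigma2_R * np.eye(n)
    HY = H @ Y
    Sinv_z = np.linalg.solve(Sigma, z)
    Sinv_HY = np.linalg.solve(Sigma, HY)
    A = HY.T @ Sinv_HY                        # Y'H'Sigma^-1 HY
    b = HY.T @ Sinv_z
    quad = z @ Sinv_z - b @ np.linalg.solve(A, b)   # equals z' Xi z
    _, logdet_Sigma = np.linalg.slogdet(Sigma)
    _, logdet_A = np.linalg.slogdet(A)
    return 0.5 * (logdet_Sigma + logdet_A + quad)

# Synthetic setup (all values hypothetical).
rng = np.random.default_rng(0)
m, n_obs = 40, 8
x = np.linspace(0.0, 1.0, m)
Q0 = np.exp(-np.abs(x[:, None] - x[None, :]) / 0.3)   # stand-in covariance shape
H = rng.random((n_obs, m))
Y = np.ones((m, 1))                                    # constant unknown mean
s_true = rng.multivariate_normal(np.full(m, 2.0), Q0)
z = H @ s_true + rng.normal(0.0, 0.1, n_obs)

# Coarse grid search over the variance scale theta, with sigma_R^2 held fixed.
thetas = [0.1, 1.0, 10.0]
values = [neg_log_restricted_likelihood(t * Q0, 0.01, H, Y, z) for t in thetas]
theta_best = thetas[int(np.argmin(values))]
```

In practice a gradient-based or Gauss-Newton optimizer over both θ and the measurement error variance would replace the grid search shown here.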

[22] Once these parameters have been estimated, and returning to the Bayesian notation outlined in equation (1), the posterior probability density of the unknown vector s is Gaussian:

equation image

where the first term represents the likelihood and the second term represents the prior probability density function of s. The system of equations that allows us to obtain the best estimate and posterior covariance of s is [e.g., Michalak and Kitanidis, 2003]

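$$\begin{bmatrix} \mathbf{H}\mathbf{Q}\mathbf{H}^T + \mathbf{R} & \mathbf{H}\mathbf{Y} \\ (\mathbf{H}\mathbf{Y})^T & \mathbf{0} \end{bmatrix} \begin{bmatrix} \boldsymbol{\Lambda}^T \\ \mathbf{M} \end{bmatrix} = \begin{bmatrix} \mathbf{H}\mathbf{Q} \\ \mathbf{Y}^T \end{bmatrix} \tag{14}$$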

where Λ is an m × n matrix of coefficients and M is a p × m matrix of multipliers. The best estimate of the function is

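$$\hat{\mathbf{s}} = \boldsymbol{\Lambda}\mathbf{z} \tag{15}$$

and its posterior covariance is

$$\mathbf{V}_{\hat{\mathbf{s}}} = -\mathbf{Y}\mathbf{M} + \mathbf{Q} - \mathbf{Q}\mathbf{H}^T\boldsymbol{\Lambda}^T \tag{16}$$

The diagonal elements of V_ŝ represent the posterior variance of individual elements of ŝ.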

[23] In short, once the form of the prior covariance model has been selected, the values of the required structural parameters as well as the measurement error variance can be optimized using a restricted maximum likelihood approach. The inverse problem can then be solved by formulating a set of n + p algebraic equations to obtain a best estimate for the contaminant distribution, ŝ, as well as an estimate of its posterior covariance, V_ŝ. Conditional realizations, which are equally likely realizations of the historical contaminant distribution s, can also be generated [e.g., Michalak and Kitanidis, 2003].
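The estimation step summarized above can be sketched as a single linear solve. The block system used here is the standard one for geostatistical inversion with an unknown drift: [[HQHᵀ + R, HY], [(HY)ᵀ, 0]] [Λᵀ; M] = [HQ; Yᵀ], followed by ŝ = Λz and V_ŝ = −YM + Q − QHᵀΛᵀ. It is stated as an assumption about the form of the system referred to in the text. The function name `geostatistical_estimate` and all numerical values are hypothetical.

```python
import numpy as np

def geostatistical_estimate(H, Q, R, Y, z):
    """Solve the (n + p) x (n + p) system for Lambda and M, then form the
    best estimate s_hat and its posterior covariance V."""
    n, p = H.shape[0], Y.shape[1]
    HY = H @ Y
    lhs = np.block([[H @ Q @ H.T + R, HY],
                    [HY.T, np.zeros((p, p))]])
    rhs = np.vstack([H @ Q, Y.T])
    sol = np.linalg.solve(lhs, rhs)
    Lambda = sol[:n, :].T          # m x n matrix of coefficients
    M = sol[n:, :]                 # p x m matrix of multipliers
    s_hat = Lambda @ z
    V = -Y @ M + Q - Q @ H.T @ Lambda.T
    return s_hat, V, Lambda, M

# Synthetic check (hypothetical values): data generated from the drift alone.
rng = np.random.default_rng(0)
m, n = 30, 5
x = np.linspace(0.0, 1.0, m)
Q = np.exp(-np.abs(x[:, None] - x[None, :]) / 0.3)  # stand-in covariance
H = rng.random((n, m))
R = 0.01 * np.eye(n)
Y = np.ones((m, 1))
beta = np.array([2.0])
z = H @ (Y @ beta)

s_hat, V, Lambda, M = geostatistical_estimate(H, Q, R, Y, z)
```

The second block row of the system enforces ΛHY = Y, so data produced by the drift alone are mapped back to the drift exactly; this makes a convenient sanity check on any implementation.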

5. Adjoint State Formulation and Implementation

[24] In the solution of inverse problems, the number of observations is often significantly lower than the number of estimate locations (i.e., n ≪ m). In such cases, the application of the adjoint state method can significantly reduce the cost of computing the Jacobian H. Note that adjoint methods have traditionally been used primarily for nonlinear sensitivity analyses to a small number of parameters. We are instead interested in deriving the sensitivity to a spatially variable function s in a linear system.

[25] Note that in the remainder of this paper, x denotes the spatial coordinate, whereas X denotes locations at which measurements are taken. Furthermore, t denotes the temporal coordinate, t = Tb is the time at which measurements are taken, and t = Ta is the time for which the contaminant distribution is to be estimated.

5.1. Multidimensional Advection Dispersion Equation

[26] A generic form of the advection-dispersion equation for solute transport is

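$$\eta \frac{\partial C}{\partial t} = \frac{\partial}{\partial x_i}\left(\eta D_{ij} \frac{\partial C}{\partial x_j}\right) - \frac{\partial}{\partial x_i}\left(\eta v_i C\right) + q_s C_s - q_o C \tag{17}$$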

where repeated index notation is used, t is time, xi are the spatial directions (i = 1, 2, 3), x = (x1, x2, x3), C is resident concentration, η is porosity for porous media and unity for other cases, Dij is the i, jth entry of the dispersion tensor, vi is the fluid velocity in the direction of xi, qs is the source flow rate per unit volume, Cs is the source strength in mass per unit volume, and qo is the sink flow rate per unit volume. The initial conditions are

$$C(\mathbf{x}, T_a) = C_a(\mathbf{x}) \tag{18}$$

where Ta is the time at which the initial condition is specified and Ca is the concentration distribution at that time. The possible boundary conditions are listed in Table 2, where Γ1, Γ2, Γ3 are subsets of the domain boundaries, ni is the outward unit normal vector in the xi direction, qD is a dispersive mass flux per unit volume, and equation image is a specified concentration. The first two terms on the right-hand side of equation (17) represent the divergence of the dispersive and advective mass fluxes, respectively. The advection-dispersion operator is

equation image

In our application, the initial condition in equation (18) represents the time for which the contaminant distribution is to be estimated, regardless of when the contaminant was originally released. As such, the initial condition does not necessarily represent the time at which the contaminant was introduced into the aquifer. We will be defining simulations that will allow us to compute the sensitivity of the available observations z to the unknown distribution of the contaminant in a given area, at a given time in the past Ta.

Table 2. Possible Boundary Conditions for Solute Transport
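| Type | Name | Description | Condition |
|---|---|---|---|
| First | Dirichlet | fixed concentration | C(x, t) = equation image on Γ1 |
| Second | Neumann | fixed dispersive flux | Dij(∂C/∂xj)ni = qD on Γ2 |
| Third | Cauchy | fixed total mass flux | [viC − ηDij(∂C/∂xj)]ni = ηvi equation image ni on Γ3 |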

5.2. Adjoint State Formulation

[27] A sensitivity analysis approach [e.g., Sykes et al., 1985] can be used to derive the adjoint of the advection-dispersion equation presented in equation (17). Neupauer and Wilson [2001] presented such a derivation, and their steps are summarized here, modified as needed for the current application.

[28] A performance measure P that quantifies some state of the system is defined as

$$P = \int_{t} \int_{\Omega} \zeta(s, C)\, d\Omega\, dt \tag{20}$$

where ζ(s, C) is a functional of the state of the system, s is a parameter or set of parameters that we are interested in estimating, C is resident concentration for solute transport, Ω is the spatial domain, and integration is over the entire space-time domain. In the derivation of Neupauer and Wilson [2001], s was the strength of an instantaneous point source. In our case, s is the solute concentration distribution in a region of interest Ωa at some point in the past, Ta. The performance measure P is the predicted concentration C(X,T) at an observation location, and the function ζ(s,C) is defined accordingly.

[29] The marginal sensitivity of this performance measure with respect to a parameter s is obtained by differentiating equation (20):

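$$\frac{dP}{ds} = \int_{t} \int_{\Omega} \left[\frac{\partial \zeta(s, C)}{\partial s} + \frac{\partial \zeta(s, C)}{\partial C}\, \psi\right] d\Omega\, dt \tag{21}$$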

where dP/ds is the marginal sensitivity that we are interested in and ψ is the state sensitivity, ψ = ∂C/∂s. Because the state sensitivity ψ is unknown, adjoint theory is used to eliminate it from equation (21), and the marginal sensitivity is obtained in terms of the adjoint state.

[30] Differentiating the governing equation (17) with respect to a distributed parameter s to obtain the governing equation in terms of the state sensitivity, ψ, and assuming that the boundary conditions, porosity, dispersion tensor, fluid velocity, and source and sink flow rates do not depend on the solute distribution at the time of interest, we obtain

equation image

where ψ has homogeneous boundary conditions and, because we defined s as the distribution of Ca in the region Ωa, the initial condition becomes

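$$\psi(\mathbf{x}, T_a) = \begin{cases} 1, & \mathbf{x} \in \Omega_a \\ 0, & \text{otherwise} \end{cases} \tag{23}$$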

[31] Taking the product of each term in equation (22) with an arbitrary function ψ* (the adjoint state), integrating over time and space, adding this equation to the right-hand side of equation (21) and integrating by parts yields [Neupauer and Wilson, 2001]

equation image

[32] Because ψ* is not defined at this stage, we can prescribe its properties in a manner that is most convenient to our goal of eliminating ψ from equation (24). The second term in the integral can be eliminated by defining an appropriate governing equation for ψ*. The remaining terms that contain ψ are the spatial and temporal divergence terms. Integrating the temporal divergence term over the time domain, applying Gauss's divergence theorem to the spatial divergence terms, and substituting the initial and boundary conditions on ψ, it can be shown that the remaining terms containing ψ vanish if the final condition on ψ* is set to ψ*(x, Tb) = 0, and the boundary conditions on ψ* are homogeneous on Γ1, Γ2, and Γ3 [Neupauer and Wilson, 2001]. Specifying ψ* in this way and defining backward time as τ = Tb − t, the adjoint of the governing operator and its initial and boundary conditions are

equation image
equation image

where we assume steady flow (i.e., ∇ · ηv = qs − qo, and ηvi(∂ψ*/∂xi) − qoψ* = ∂(ηviψ*)/∂xi − qsψ*). In equation (25), L*[ ] is the adjoint operator of equation (19) and ψ* is the adjoint state. Note that the governing equation and its adjoint differ in that the signs of the first-derivative terms are reversed. In addition, the form of the Dirichlet (first-type) boundary condition remains unchanged, whereas the adjoints of Neumann (second-type) boundary conditions are Cauchy (third-type) boundary conditions, and vice versa. For our setup the marginal sensitivity of the performance measure in equation (24) simplifies to

equation image

According to equation (23), given that in our setup s is the concentration distribution in the subdomain Ωa at time Ta, this equation simplifies to

equation image

where the integral of ψ* is over the subdomain Ωa, because ψ(x, Ta) = 0 everywhere else.

5.3. Adjoint State Source Terms

[33] In our setup the performance measure P is C(X, 0), the solute resident concentration at a measurement location defined in three spatial directions X = (X1, X2, X3), and τ = 0. The load term for the adjoint state, ∂ζ/∂C, is defined such that the integral of the performance functional ζ evaluates to the observation value. Assuming a point measurement, ζ is given by

$$\zeta(s, C) = C(\mathbf{x}, \tau)\, \delta(\mathbf{x} - \mathbf{X})\, \delta(\tau) \tag{29}$$

The Dirac delta function in time causes the integral to be evaluated only at τ = 0. Using this ζ, the resulting governing equation for the adjoint state is

equation image

[34] Having defined the governing equation, boundary conditions, and source terms on ψ*, the only remaining task is to derive H from the results of the adjoint runs. Given the form of the performance functional ζ(s, C) in equation (29), it is clear that the direct contribution to the marginal sensitivity as defined in equation (28) (i.e., ∂ζ(s, C)/∂s) is zero. In a discretized domain the individual contaminant regions that we are interested in, Ωa, are simply the grid cells within the area where the historical contaminant distribution is to be estimated. Therefore, for each observation location, one adjoint run is performed using a source term as defined in equation (30), and the marginal sensitivities of this observation to the discretized unknown contaminant distribution s are defined simply by ψ*(x, τ = Tb − Ta), where x are the grid points at which s is to be estimated. An adjoint run with a source term at the location of the ith observation thus defines one full row of the H matrix, Hi,j for j = 1, …, m.
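The row-by-row construction of H can be checked on a small discrete analogue. In the sketch below, the transport model is a hypothetical explicit upwind scheme for one-dimensional advection-dispersion (an assumption for illustration, not the model used in this work), and the discrete counterpart of an adjoint run is taken to be repeated application of the transposed time-step operator to a unit load placed at the observation cell. For an upwind scheme, transposition moves the advective coupling from the upstream to the downstream neighbor, which mirrors the sign reversal of the first-derivative terms.

```python
import numpy as np

def step_matrix(m, courant=0.4, diff=0.2):
    """One explicit upwind time step of 1-D advection-dispersion on m cells
    (hypothetical toy model, zero concentration outside the domain)."""
    A = np.zeros((m, m))
    for j in range(m):
        A[j, j] = 1.0 - courant - 2.0 * diff
        if j > 0:
            A[j, j - 1] = courant + diff
        if j < m - 1:
            A[j, j + 1] = diff
    return A

def adjoint_row(A, n_steps, obs_cell):
    """One backward-in-time run: a unit load at the observation cell is
    propagated with the transposed step operator for n_steps steps."""
    psi_star = np.zeros(A.shape[0])
    psi_star[obs_cell] = 1.0
    for _ in range(n_steps):
        psi_star = A.T @ psi_star
    return psi_star

m, n_steps = 50, 30
obs_idx = [20, 30, 40]
A = step_matrix(m)

# n adjoint runs, one per observation, each giving a full row of H.
H_adj = np.vstack([adjoint_row(A, n_steps, i) for i in obs_idx])

# Reference: rows of the n_steps-fold forward operator at the observation
# cells, i.e. what m forward runs would produce.
H_fwd = np.linalg.matrix_power(A, n_steps)[obs_idx, :]
```

Here three adjoint runs reproduce the sensitivities that would otherwise require fifty forward runs, which is the n versus m saving discussed in the text.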

5.4. Implementation

[35] Many general-purpose codes as well as case- or site-specific models are available for the solution of the advection-dispersion equation. Although implementing inverse methods that provide function estimates of sources or historical distributions has up to this point required the development of custom groundwater flow and/or contaminant transport codes, there are many advantages to reusing existing models, especially if modifications to these models can be avoided. In such cases, these models would essentially be used as external program modules by the inverse modeling code. The use of modules has been shown to improve code maintainability [Glass and Noiseux, 1981; Lientz and Swanson, 1980] and comprehension [Shneiderman and Mayer, 1979] and is compatible with the notions of encapsulation and abstraction advocated by object-oriented design [McConnell, 1993]. Because groundwater flow and transport codes offer a collection of services in a way that allows an external program to interact with them cleanly, they are well suited to being coupled with a separate inverse model. For example, Neupauer and Wilson [2001] described the possibility of using existing groundwater transport codes for performing adjoint simulations.

[36] In this section we present an implementation of this idea for the problem of deriving the historical distribution of a contaminant. The flow field, boundary conditions, and load terms in the transport model need to be set up in a manner that reflects the adjoint model described in sections 5.2 and 5.3. The setup of adjoint transport simulations is described here, with additional implementation details presented by Michalak [2003]. Note that the setup for the flow field and boundary conditions would be similar for various applications of the adjoint state method, and is also described by Neupauer and Wilson [2001] for a different problem. The initial conditions and performed simulations, on the other hand, are specific to the problem being addressed.

5.4.1. Flow Field

[37] The steady state flow field should first be calculated in the same manner as if forward simulations were to be run. The flow field is then reversed because the time parameter τ is defined as reverse time in the derivation of the adjoint state methodology, starting at the time at which observations were taken (τ = T − t). The simplest way to do this is to change the sign on all flow terms in the output file of the flow model.

5.4.2. Boundary Conditions

[38] First-type boundary conditions remain first-type, second-type boundary conditions become third-type, and third-type boundary conditions become second-type. Furthermore, all boundary conditions are homogeneous (i.e., the right-hand side is zero) for the adjoint runs (see equation (26)). Note that if the velocities normal to a boundary are zero, second-type and third-type boundary conditions have equivalent forms and can therefore be simulated even if the transport code only supports one of these boundary types (see also section 6.2).
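The flow-field and boundary-condition rules of sections 5.4.1 and 5.4.2 can be sketched in a few lines. This is a minimal illustration only: the function `adjoint_setup`, the dictionary layout, and the example boundary names and types are hypothetical and do not correspond to the input format of any particular flow or transport code.

```python
import numpy as np

# Adjoint boundary types: first stays first; second and third swap.
ADJOINT_BC_TYPE = {"first": "first", "second": "third", "third": "second"}


def adjoint_setup(qx, qy, boundary_types):
    """Return reversed flow arrays and homogeneous adjoint boundary specs.

    qx, qy         : arrays of flow terms from the forward flow solution
    boundary_types : dict mapping a boundary name to "first"/"second"/"third"
    """
    adj_qx = -np.asarray(qx, dtype=float)  # reverse the flow field
    adj_qy = -np.asarray(qy, dtype=float)
    adj_bc = {
        name: {"type": ADJOINT_BC_TYPE[kind], "value": 0.0}  # right-hand side is zero
        for name, kind in boundary_types.items()
    }
    return adj_qx, adj_qy, adj_bc


# Illustrative (hypothetical) forward setup on a tiny 2 x 3 grid.
qx = np.full((2, 3), 0.1)
qy = np.zeros((2, 3))
forward_bc = {"left": "first", "right": "first", "top": "third", "bottom": "third"}

adj_qx, adj_qy, adj_bc = adjoint_setup(qx, qy, forward_bc)
```

In practice the sign change would be applied to the flow terms written by the flow model before the transport code reads them; the arrays here merely stand in for that file.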

5.4.3. Initial Conditions

[39] Because we are working in discretized space, the Dirac delta function δ(x − X) δ(τ) that was derived as the initial condition in equation (30) becomes a Kronecker delta function in numerical applications. Therefore, if we are interested in estimating the historical contaminant distribution in an aquifer, each adjoint run has an initial concentration of zero everywhere, except in the grid cell containing the observation, where the concentration is set to one.

5.4.4. Simulations

[40] The transport model is run once for each observation. The total duration of the run is equal to the amount of time elapsed between the time at which the contaminant distribution is to be estimated and the time at which the observation was made. Once the simulation has been run for the appropriate time, the concentration is recorded at each point in the discretized area of interest. The concentrations in this zone resulting from each adjoint run represent sensitivities of that observation to a historical concentration at each of the points in the discretized zone. As such, the results of each adjoint run allow for one row of the sensitivity matrix H to be filled.
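The procedure of sections 5.4.3 and 5.4.4 can be illustrated on a toy problem. The sketch below is not the workflow used with an existing transport code: it assumes a hypothetical one-dimensional explicit upwind scheme (`transport_matrix`) and uses the discrete adjoint (the transposed step matrix) in place of a reversed flow field. It shows that n adjoint runs, each started from a Kronecker delta at one observation cell, give the same H as m forward perturbation runs.

```python
import numpy as np


def transport_matrix(n_cells, courant=0.4, diffusion_number=0.2):
    """One explicit upwind advection-dispersion step, c_new = A @ c.

    Flow is toward higher cell indices. Toy scheme, assumed for illustration.
    """
    A = np.zeros((n_cells, n_cells))
    for k in range(n_cells):
        A[k, k] = 1.0 - courant - 2.0 * diffusion_number
        if k > 0:
            A[k, k - 1] = courant + diffusion_number  # upstream neighbor
        if k < n_cells - 1:
            A[k, k + 1] = diffusion_number            # downstream neighbor
    return A


def run(step_matrix, c0, n_steps):
    """Apply the step matrix n_steps times to the initial condition c0."""
    c = c0.copy()
    for _ in range(n_steps):
        c = step_matrix @ c
    return c


def h_by_adjoint(A, obs_cells, est_cells, n_steps):
    """One adjoint run per observation; each run fills one row of H."""
    H = np.zeros((len(obs_cells), len(est_cells)))
    for i, cell in enumerate(obs_cells):
        psi0 = np.zeros(A.shape[0])
        psi0[cell] = 1.0                      # Kronecker delta at the observation
        H[i, :] = run(A.T, psi0, n_steps)[est_cells]
    return H


def h_by_perturbation(A, obs_cells, est_cells, n_steps):
    """One forward run per unknown; each run fills one column of H."""
    H = np.zeros((len(obs_cells), len(est_cells)))
    for j, cell in enumerate(est_cells):
        c0 = np.zeros(A.shape[0])
        c0[cell] = 1.0                        # unit perturbation of one unknown
        H[:, j] = run(A, c0, n_steps)[obs_cells]
    return H


A = transport_matrix(60)
obs_cells = np.array([30, 38, 46])            # n = 3 observations
est_cells = np.arange(5, 25)                  # m = 20 unknowns
H_adj = h_by_adjoint(A, obs_cells, est_cells, n_steps=50)
H_fwd = h_by_perturbation(A, obs_cells, est_cells, n_steps=50)
```

Here the adjoint route needs 3 runs and the perturbation route 20, and both return the same matrix to rounding error.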

6. Application to the Estimation of the Historical Contaminant Distribution in Two-Dimensional Aquifers

[41] Three sample applications are presented. They involve the identification of the historical distribution of a contaminant in a two-dimensional aquifer. Measurements are taken at a time Tb, and the distribution at a prior time Ta is to be estimated. Although the presented examples involve hypothetical cases, they are representative of conditions observed at the field scale. For the first two examples, the aquifer is assumed to be homogeneous, whereas a deterministically heterogeneous hydraulic conductivity field is used in the third. The first example is an idealized case, whereas the second and third examples have more realistic setups. Although the method is directly applicable to three-dimensional systems, two-dimensional systems were selected for these applications for ease of illustration.

6.1. Homogeneous Aquifer

[42] The first two examples involve the identification of the historical contaminant distribution in a homogeneous aquifer at time Ta, based on downgradient concentration measurements taken at a time Tb = (Ta + 2000) days (or approximately 5.5 years later). The aquifer is assumed to be infinite in both directions, with a groundwater seepage velocity of v1 = 0.1 m/d, v2 = 0 m/d. The effective dispersion coefficients are D1 = 0.3 m²/d, D2 = 0.03 m²/d. The actual distribution at time Ta, which would be unknown in a field case, is presented in Figure 1. The measured distribution at time Tb is presented in Figure 2. We intend to recover the contaminant distribution in the region Ωa = {x : x1 ∈ (0, 256), x2 ∈ (168, 392)}, which is also outlined in Figure 1. For the purpose of solving the inverse problem, this area will be discretized into 8-m intervals, yielding 896 points at which the concentration at time Ta is to be estimated.

Figure 1.

Actual contaminant distribution at time Ta and estimation region (in dotted line).

Figure 2.

Actual contaminant distribution at time Tb for the Idealized and Homogeneous Cases and measurement locations for the Homogeneous Case.

[43] The first example, referred to as the Idealized Case, is designed to demonstrate the method's ability to recover the historical contaminant distribution for a case with extensive sampling and small measurement error. For this example, sampling was conducted on a 16-m × 16-m grid in the range {x : x1 ∈ (200, 456), x2 ∈ (152, 408)}, yielding a total of 289 concentration measurements. A vector of normally distributed measurement errors ɛ with mean zero and small variance of 10⁻¹⁰ (mg/L)² was added to the actual concentration values C.

[44] The second example, referred to as the Homogeneous Case, involves a sparser measurement array and higher errors. The sampling was conducted on a 32-m × 32-m grid, yielding 54 concentration measurements. Given the ratio of observations to unknowns (54 : 896), the problem is strongly underdetermined. A vector of normally distributed measurement errors ɛ with mean zero and variance 10⁻⁶ (mg/L)² was added to the actual concentration values C to simulate the effect of the model and measurement error that would always be present in a field setting. This measurement error is equivalent to a standard deviation of 1 ppb.
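The grid sizes and error magnitudes quoted above can be checked with a short sketch. It assumes that the 896 unknowns sit at the centers of 8-m cells and that the Idealized Case sampling grid includes both end points of its range; neither detail is stated explicitly in the text. The helper `add_measurement_error` and its seed are hypothetical.

```python
import numpy as np

# Estimation region: 8-m cells over x1 in (0, 256), x2 in (168, 392); centers assumed.
x1_est = np.arange(4, 256, 8)
x2_est = np.arange(172, 392, 8)
n_unknowns = x1_est.size * x2_est.size          # 32 * 28

# Idealized Case sampling: 16-m grid over x1 in [200, 456], x2 in [152, 408].
x1_obs = np.arange(200, 457, 16)
x2_obs = np.arange(152, 409, 16)
n_obs_idealized = x1_obs.size * x2_obs.size     # 17 * 17


def add_measurement_error(c_true, variance, seed=0):
    """Add zero-mean normal errors with the given variance [(mg/L)^2]."""
    rng = np.random.default_rng(seed)
    return c_true + rng.normal(0.0, np.sqrt(variance), size=np.shape(c_true))


# Homogeneous Case: variance 1e-6 (mg/L)^2 -> standard deviation in ug/L (about ppb).
sigma_ppb = np.sqrt(1e-6) * 1000.0

pseudo_obs = add_measurement_error(np.zeros(n_obs_idealized), 1e-10)
```

With these assumptions the counts come out as 896 unknowns and 289 idealized measurements, and the Homogeneous Case error has a standard deviation of 0.001 mg/L.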

[45] For both examples, the vector of observations, z, and that of the unknown function s we wish to estimate are

equation image

For these examples, we have an analytical solution for the forward problem:

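$$C(X_1, X_2, t) = \iint_{\Omega_a} s(x_1, x_2)\, f(X_1 - x_1, X_2 - x_2, t)\, dx_1\, dx_2 \tag{32}$$

where C(X1, X2, t) is the concentration at location (X1, X2) at a time t after the time at which the historical distribution s(x1, x2) is defined. In this case, t = Tb − Ta = 2000 days. The distribution is a function of location and is expressed by s(x1, x2), and the integration is over the historical plume region Ωa. The transfer function f(X1 − x1, X2 − x2, t) applies the appropriate weight to the historical distribution function:

$$f(X_1 - x_1, X_2 - x_2, t) = \frac{1}{4 \pi t \sqrt{D_1 D_2}} \exp\left[ -\frac{(X_1 - x_1 - v_1 t)^2}{4 D_1 t} - \frac{(X_2 - x_2)^2}{4 D_2 t} \right] \tag{33}$$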

[46] In a standard setup the sensitivity matrix would have elements Hi,j = Δx1 Δx2 f(X1,i − x1,j, X2,i − x2,j, t). These components could be calculated by perturbing each element sj(x1,j, x2,j), where j = 1,…, m, and observing the impact at each observation location (X1,i, X2,i), where i = 1,…, n. To define the solution to the adjoint problem, the flow field is reversed. Because there are no domain boundaries in this case, we do not need to adjust boundary conditions for the solution of the adjoint problem. Therefore the sensitivity matrix H is expressed as

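$$H_{i,j} = \Delta x_1\, \Delta x_2\, f^*(x_{1,j} - X_{1,i},\, x_{2,j} - X_{2,i},\, \tau) \tag{34}$$

where the adjoint transfer function is

$$f^*(x_1 - X_1, x_2 - X_2, \tau) = \frac{1}{4 \pi \tau \sqrt{D_1 D_2}} \exp\left[ -\frac{(x_1 - X_1 + v_1 \tau)^2}{4 D_1 \tau} - \frac{(x_2 - X_2)^2}{4 D_2 \tau} \right] \tag{35}$$

and τ = Tb − Ta = 2000 days. Each row of matrix H can be computed using a single vector operation, yielding a total of n vector operations.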
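For the Idealized Case, the row-by-row evaluation of H can be sketched as follows. The kernel `adjoint_kernel` assumes the standard two-dimensional Gaussian solution of the advection-dispersion equation for uniform flow with the velocity reversed, using the parameter values of section 6.1; the function names, the cell-center placement of the unknowns, and the inclusion of both end points of the sampling range are assumptions made for illustration.

```python
import numpy as np

V1, D1, D2 = 0.1, 0.3, 0.03   # m/d, m^2/d, m^2/d (section 6.1); v2 = 0
TAU = 2000.0                  # days elapsed between Ta and Tb


def adjoint_kernel(d1, d2, tau):
    """Assumed Gaussian kernel with reversed flow; d1 = x1 - X1, d2 = x2 - X2."""
    norm = 4.0 * np.pi * tau * np.sqrt(D1 * D2)
    return np.exp(-(d1 + V1 * tau) ** 2 / (4.0 * D1 * tau)
                  - d2 ** 2 / (4.0 * D2 * tau)) / norm


def sensitivity_matrix(obs_xy, est_xy, dx1, dx2, tau):
    """Fill H row by row: one vectorized kernel evaluation per observation."""
    H = np.empty((len(obs_xy), len(est_xy)))
    for i, (X1, X2) in enumerate(obs_xy):
        H[i, :] = dx1 * dx2 * adjoint_kernel(est_xy[:, 0] - X1,
                                             est_xy[:, 1] - X2, tau)
    return H


def grid(x1, x2):
    """Return an (N, 2) array of all (x1, x2) combinations."""
    g1, g2 = np.meshgrid(x1, x2, indexing="ij")
    return np.column_stack([g1.ravel(), g2.ravel()])


# Unknowns: assumed centers of 8-m cells in the estimation region.
est_xy = grid(np.arange(4.0, 256.0, 8.0), np.arange(172.0, 392.0, 8.0))
# Observations: 16-m sampling grid of the Idealized Case.
obs_xy = grid(np.arange(200.0, 457.0, 16.0), np.arange(152.0, 409.0, 16.0))

H = sensitivity_matrix(obs_xy, est_xy, 8.0, 8.0, TAU)
```

Each row sums to at most one, and falls below one whenever part of the kernel lies outside the estimation region.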

[47] From this point forward, the solution for both examples is obtained using the geostatistical inverse modeling approach described in section 4. The cubic GCF with linear drift was selected for this application because it yields smooth estimates that have continuous first and second derivatives everywhere, which is consistent with the contaminant distribution function used here. For the second example, the structural parameter θ in the cubic GCF model and the measurement error variance σR² were assumed unknown and were optimized using the restricted maximum likelihood approach (see Table 3). The recovered contaminant distribution is presented in Figure 3a for the Idealized Case and Figure 4a for the Homogeneous Case. The standard deviations of the estimates are presented in Figures 3b and 4b and are indicative of the uncertainty of the inversion results.
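Once H is available, the estimation step can be sketched as below. This is a minimal sketch, not the formulation of section 4, which remains authoritative. It assumes the usual kriging-type linear system of the geostatistical approach, a cubic GCF of the form θh³, and a linear drift; the function names and the small one-dimensional test problem are hypothetical. Structural parameters are taken as given, so the restricted maximum likelihood step is not shown.

```python
import numpy as np


def cubic_gcf(coords, theta):
    """Generalized covariance Q_ij = theta * h_ij**3 (assumed cubic form)."""
    diff = coords[:, None, :] - coords[None, :, :]
    h = np.sqrt((diff ** 2).sum(axis=-1))
    return theta * h ** 3


def geostatistical_estimate(H, z, coords, theta, sigma_r2):
    """Best estimate and posterior variances for a linear-drift model."""
    n, m = H.shape
    Q = cubic_gcf(coords, theta)
    X = np.column_stack([np.ones(m), coords])        # drift basis: 1 and coordinates
    p = X.shape[1]
    HX = H @ X
    Psi = H @ Q @ H.T + sigma_r2 * np.eye(n)         # covariance of the observations
    lhs = np.block([[Psi, HX], [HX.T, np.zeros((p, p))]])
    rhs = np.vstack([H @ Q, X.T])
    sol = np.linalg.solve(lhs, rhs)
    Lam_T, M = sol[:n, :], sol[n:, :]                # weights and multipliers
    s_hat = Lam_T.T @ z
    # Diagonal of Q - Q H^T Lam^T - X M.
    v_diag = (np.diag(Q)
              - np.einsum("ij,ji->i", Q @ H.T, Lam_T)
              - np.einsum("ij,ji->i", X, M))
    return s_hat, v_diag


# Hypothetical 1-D check: 30 unknowns, 8 locally averaged observations.
x = np.linspace(0.0, 10.0, 30)
centers = np.linspace(1.0, 9.0, 8)
W = np.exp(-0.5 * ((x[None, :] - centers[:, None]) / 0.3) ** 2)
H_demo = W / W.sum(axis=1, keepdims=True)
s_true = 2.0 + 0.5 * x                               # lies exactly in the drift space
z_demo = H_demo @ s_true                             # error-free pseudodata

s_hat, v_diag = geostatistical_estimate(H_demo, z_demo, x[:, None],
                                        theta=1e-2, sigma_r2=1e-6)
```

Because the true function in this check is linear, the unbiasedness constraint alone guarantees that it is reproduced; a real application would instead use the H and z of the problem at hand.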

Figure 3.

Recovered contaminant distribution for time Ta for the Idealized Case. (a) Best estimate. (b) Estimate standard deviation.

Figure 4.

Recovered contaminant distribution for time Ta for the Homogeneous Case. (a) Best estimate. (b) Estimate standard deviation.

Table 3. Optimal Structural Parameter Values for Homogeneous and Heterogeneous Cases
Structural Parameter      Homogeneous      Heterogeneous
θ [(mg/L)² m⁻³]           4.1 × 10⁻⁷       6.8 × 10⁻⁷
σR² [(mg/L)²]             0.80 × 10⁻⁶      1.06 × 10⁻⁶

6.2. Heterogeneous Aquifer

[48] The third example, referred to as the Heterogeneous Case, again involves the identification of a historical contaminant distribution, but this time in a heterogeneous aquifer. The distribution at time Ta is estimated based on downgradient concentration measurements taken at time Tb = Ta + 2000 days.

[49] The domain used for the third example is presented in Figure 5. The domain is finite, measuring 1024 m and 512 m in the x1 and x2 directions, respectively. It is discretized into 128 × 64 nodes in the x1 and x2 directions, respectively, resulting in an 8-m × 8-m grid. No-flux boundary conditions were applied at the top and bottom boundaries for both flow and transport. The left-hand side and right-hand side boundaries have prescribed constant heads, resulting in a mean gradient of 3.472 × 10⁻² m/m.

Figure 5.

Hydraulic conductivity field used for the Heterogeneous Case.

[50] The domain has a deterministically heterogeneous hydraulic conductivity field with a geometric mean of 0.864 m/d (1.00 × 10⁻⁵ m/s), resulting in a mean velocity in the x1 direction comparable to that used in the homogeneous application. The field was generated using the numerical spectral approach of Dykaar and Kitanidis [1992a, 1992b]. The flow solution was obtained using MODFLOW [McDonald and Harbaugh, 1988; Harbaugh and McDonald, 1996].

[51] The actual contaminant distribution at time Ta used in this example is identical to the one used in the homogeneous applications (see Figure 1). The plume profile at time Tb was obtained using MT3DMS [Zheng, 1990; Zheng and Wang, 1999]. The boundary conditions used to solve the forward problem were

equation image
equation image

The distribution at time Tb is presented in Figure 6, along with sampling locations. The sampling was conducted on a 32-m × 32-m grid, as in the Homogeneous Case. However, because the heterogeneous domain resulted in more spreading of the plume, a total of 105 observation locations were needed. The zone Ωa for which we try to recover the contaminant distribution is identical to that used in the homogeneous applications, once again yielding 896 points at which the distribution is estimated. As in the homogeneous case, the problem is strongly underdetermined.

Figure 6.

Actual contaminant distribution at time Tb and measurement locations for the Heterogeneous Case.

[52] The solution was obtained in a method analogous to the one presented for the homogeneous domain. The solution to the adjoint problems, however, was obtained numerically, using MT3DMS. For the adjoint runs, the MODFLOW-derived flow field was reversed, and the boundary conditions were changed to

equation image
equation image

Note that according to equation (26), the third-type boundary condition in equation (37) should have been changed to a second-type boundary condition. However, in this case, the boundary condition for flow had been set as no-flux for the upper and lower boundaries of the domain (v2 = 0 for x1 ∈ [0, 1024]; x2 ∈ {0, 512}), and the boundary condition in equation (39) is equivalent to the required second-type boundary condition:

equation image

Therefore, in this case, the boundary conditions did not need to be modified for the adjoint runs.

[53] Each adjoint run consisted of setting the initial concentration to zero throughout the domain, except in the grid cell corresponding to one of the observations, where the concentration was set to one. The adjoint simulation was run for 2000 days, at which point the concentration at each point in the discretized historical distribution area Ωa was recorded. This process was repeated 105 times, once for each observation. Without the use of the adjoint methodology, a total of 896 runs would have been required. The results from the adjoint simulations were used to fill in the sensitivity matrix as outlined in equation (34).

[54] The recovered concentration distribution at time Ta is presented in Figure 7a, and its standard deviation is contoured in Figure 7b. Restricted maximum likelihood estimates of parameter values θ and the variance of the measurement error σR2 are presented in Table 3.

Figure 7.

Recovered contaminant distribution for time Ta for the Heterogeneous Case. (a) Best estimate. (b) Estimate standard deviation.

6.3. Discussion

[55] The adjoint state formulation was successfully implemented both in cases where an analytical solution exists for the forward problem and where a numerical solution is required. In cases where a numerical solution is required for the forward problem, the computational savings are considerable, making the implementation of inverse modeling in multidimensional heterogeneous media manageable.

[56] The estimated covariance parameter θ is similar for the Homogeneous and Heterogeneous Cases (4.1 × 10⁻⁷ and 6.8 × 10⁻⁷ (mg/L)² m⁻³, respectively). These parameters are an indication of the inferred correlation structure of the historical distribution presented in Figure 1 but were estimated without the use of this function, relying instead on information contained in the available measurements (see equations (10)–(12)). Because the historical distribution, and therefore its spatial correlation structure, is the same for both applications, the fact that the estimated θ are similar is indicative of the robustness of the approach. Also, because in this pseudodata example we have access to the actual historical distribution, the actual θ can be derived [e.g., Kitanidis, 1997]. This parameter is estimated to be 1.1 × 10⁻⁷ (mg/L)² m⁻³, which is of the same order of magnitude as the values inferred from the measurements. The fact that the actual θ is somewhat lower than those recovered as part of the inversion is likely due to the large fraction of the plume area with zero concentration, which is not fully constrained by the limited measurements used in the inversions, yielding a higher inferred θ.

[57] In addition, the variance of the error ɛ that was artificially added to the measurements to simulate the effect of measurement and model error was assumed unknown and was estimated for both test cases. The estimates obtained for the Homogeneous and Heterogeneous Cases were once again consistent (0.80 × 10⁻⁶ and 1.06 × 10⁻⁶ (mg/L)², respectively) and are also close to the actual variance of 1.00 × 10⁻⁶ (mg/L)² (which would not be known in a field setting).

[58] A good way to verify the effectiveness of the new geostatistical approach combined with the use of the adjoint method is to look at whether the overall method is effective at estimating the historical contaminant distribution and the uncertainty about that estimate. As can be seen in Figure 3a, the distribution is recovered almost perfectly in the Idealized Case, indicating that the method can recover the contaminant distribution, when there is enough information in the measurements to strongly constrain the inversion. For this Idealized Case, the posterior standard deviations presented in Figure 3b are very low (notice the different color scale relative to Figures 4b and 7b), indicating that the method recognizes the fact that the contaminant distribution is very well defined.

[59] Having demonstrated the method's effectiveness for the Idealized Case, we now examine the results for the Homogeneous and Heterogeneous Cases. As can be seen in Figures 4a and 7a, the historical contaminant distribution is recovered reasonably well in these applications. We do not, in fact, expect to be able to recover the contaminant distribution perfectly in these cases, because of the information loss that inevitably results from the mixing process, the added model and measurement error, and the small number of measurements relative to unknowns. In these hypothetical cases, we could have come as close to recovering the exact historical distribution as we wanted by increasing the number of measurement locations n and decreasing the variance of the error vector ɛ added to the measurements, as was demonstrated in the Idealized Case. This was not the goal of the exercise, however. Instead, we are interested in verifying whether the method can accurately gauge to what extent the distribution can be recovered in realistic cases, by providing meaningful confidence intervals (as quantified by the posterior variance of the estimate) in addition to a best estimate. In other words, we want the method to be able to gauge the precision of the best estimate. The posterior standard deviations presented in Figures 4b and 7b give an indication of the precision of the obtained solution, and ideally, the actual contaminant distribution at time Ta should fall within two standard deviations of the best estimate 95% of the time. Conversely, the actual contaminant distribution should fall outside the 95% confidence intervals at 5% of the points in the discretized unknown function. This would be an indication that the method has effectively gauged the degree to which the historical distribution could be recovered. For the Homogeneous and Heterogeneous Cases presented here, the actual historical contaminant distribution lay outside two standard deviations of the best estimate at 6.1% and 8.0% of the grid cells, respectively. Both of these percentages are close to the ideal case where this would occur at 5% of the points, indicating that the method is successful in determining the precision of the obtained best estimate. Note that these percentages would vary somewhat based on the particular realization of the measurement error ɛ that is artificially added in generating the observation pseudodata. For the sample applications presented here, the percentages varied between approximately 5% and 7% for the Homogeneous Case, and between approximately 3% and 10% for the Heterogeneous Case for different realizations.
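The coverage check described in this paragraph amounts to counting the grid cells where the actual value lies more than two posterior standard deviations from the best estimate. A minimal sketch follows; the function name `fraction_outside` and the four-cell example values are hypothetical and are not results from the paper.

```python
import numpy as np


def fraction_outside(s_true, s_hat, sigma, k=2.0):
    """Fraction of points where |s_true - s_hat| exceeds k posterior std devs."""
    s_true = np.asarray(s_true, dtype=float)
    s_hat = np.asarray(s_hat, dtype=float)
    sigma = np.asarray(sigma, dtype=float)
    return float(np.mean(np.abs(s_true - s_hat) > k * sigma))


# Tiny illustrative example: two of the four cells fall outside two std devs.
s_true_demo = np.array([0.0, 0.0, 0.0, 0.0])
s_hat_demo = np.array([0.1, 0.5, 3.0, -2.1])
sigma_demo = np.ones(4)

frac = fraction_outside(s_true_demo, s_hat_demo, sigma_demo)
```

For exactly Gaussian errors, two standard deviations enclose about 95.4% of the probability, so a fraction near 5% indicates well-calibrated posterior variances.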

[60] Finally, it is worthwhile to note that for the presented applications, the historical distribution was recovered based on measurements that were all obtained at the same time. As has been demonstrated in past applications of the geostatistical approach to inverse modeling [e.g., Michalak and Kitanidis, 2002], the method can also be applied if measurements are taken at different times. In that case, the duration of the individual adjoint simulations would be variable, equal to the time elapsed between the time at which the contaminant distribution is to be estimated and the time at which a given measurement was taken.

7. Conclusions

[61] The work presented in this paper extends the geostatistical approach to inverse modeling to the recovery of a historical contaminant distribution, implements an adjoint methodology that improves the efficiency of solving underdetermined inverse problems, allows existing groundwater flow and transport codes to be used as modules of the inverse model, and presents the first application of an inverse modeling method to the identification of a historical, multidimensional contaminant distribution in a heterogeneous medium.

[62] The method was tested using three applications. The Idealized Case demonstrated the method's ability to precisely and accurately recover the historical contaminant distribution in an aquifer when the quantity and quality of available data are sufficient. The Homogeneous and Heterogeneous Cases demonstrated the method's ability to recover a reasonable best estimate of the contaminant distribution and to accurately gauge the precision of that estimate. Although the method was applied here to derive the historical contaminant distribution at a single time, the method is also applicable to obtaining a time-dependent description of the history of a plume. In that case the distribution of the adjoint state ψ* in the adjoint runs would be recorded for a series of times τ, and the inversion would be performed for each of these times.

[63] Finally, although the adjoint state methodology was presented with an application to geostatistical inverse modeling in mind, several of the other inverse modeling methods described in Table 1 could benefit directly from this work. Methods such as Tikhonov regularization, nonregularized nonlinear least squares, and minimum relative entropy all require the calculation of a sensitivity matrix analogous to H. Therefore, although the specific inverse modeling algorithms differ from the geostatistical approach, the adjoint method implemented in this work would allow for similar computational savings in calculating the sensitivity matrix if applied with these methods.


[64] This research was partially funded by the Natural and Accelerated Bioremediation Research (NABIR) program, Biological and Environmental Research (BER), U.S. Department of Energy (grant DE-FG03-00ER63046). Additional funding for Anna M. Michalak was provided by a NOAA Climate and Global Change postdoctoral fellowship, a program administered by the University Corporation for Atmospheric Research (UCAR).