Bayesian inversion with total variation prior for discrete geologic structure identification

Authors


Abstract

[1] The characterization of geologic heterogeneity that affects flow and transport phenomena in the subsurface is essential for cost-effective and reliable decision-making in applications such as groundwater supply and contaminant cleanup. In recent decades, geostatistical inversion approaches have been widely used to tackle subsurface characterization problems and quantify their corresponding uncertainty. Some well-established geostatistical methods use models that assume gradually varying parameters. However, in many cases, the subsurface can be better represented as consisting of a few relatively uniform geologic facies or zones with abrupt changes at their boundaries. We advance a Bayesian inversion approach in which the gradient is represented not through a Gaussian but through a Laplace prior, also known as the total variation prior, for cases in which there is reason to believe that discrete geologic structures with relatively homogeneous properties predominate in the subsurface but their number, locations, and shapes are unknown a priori. Structural parameters (or hyperparameters of the inversion scheme) are determined in a Bayesian framework by maximizing the marginal distribution of these parameters using an expectation-maximization approach; this allows proper weighting of prior versus data information and produces results with realistic uncertainty quantification. We present three applications of the method: estimation of a time-varying extraction rate at a well, linear cross-well seismic tomography, and nonlinear hydraulic tomography. These results are compared with those obtained with the classical geostatistical method, and it is shown that the Bayesian inversion approach with total variation prior can be a useful tool for identifying discrete geologic structures.

1. Introduction

[2] Subsurface characterization for accurate groundwater flow and transport simulation is an important part of hydrogeologic studies such as groundwater resources planning and contamination remediation. Among the subsurface properties of interest for characterization are spatially distributed hydraulic conductivity, porosity, storage coefficients, and dispersion coefficients, as well as aquifer geometry and boundary conditions. To estimate the spatial structure of these subsurface properties, available observations are inverted through the relationship between observations and the subsurface properties of interest, which is formulated and treated as an “inverse problem.” Solutions to the inverse problem have been an active area of study in groundwater engineering [e.g., Yeh, 1986; McLaughlin and Townley, 1996; Carrera et al., 2005; Oliver and Chen, 2011]. A representative groundwater inverse problem is the estimation of hydraulic conductivity from hydrogeologic data such as hydraulic heads and direct soil/rock samples collected in boreholes [Hoeksema and Kitanidis, 1984; Carrera and Neuman, 1986; Rubin and Dagan, 1987]. Because the hydrogeologic data collection process is expensive and time-consuming, near-surface geophysical techniques [Rubin and Hubbard, 2005; Knight and Endres, 2005] such as ground penetrating radar, electrical resistivity tomography, and electromagnetic induction methods have also been introduced to provide additional information on the subsurface geology, thus improving the estimation of unknown aquifer properties. With recent advances in computational capacity to handle large-scale data from various data collection techniques, inversion methods allow us to depict possible “images” of the subsurface for making predictions about subsurface phenomena and optimal decisions in engineering applications.

[3] However, subsurface characterization from hydrogeologic or geophysical measurements is fraught with challenges. Due to the limited number of measurements, which is usually much smaller than the number of spatially distributed parameters of subsurface properties, the weak sensitivity of measurements to unknowns, and the presence of noise in the observations, the typical inverse problem encountered in practice is ill-posed, i.e., there exist infinitely many subsurface images consistent with the measurements. In addition, the solution to an inverse problem must account for epistemic uncertainty introduced not just by sampling error in measurements but also by imperfect representation of the relationship between measurements and subsurface properties. Because the information in the data is limited, inversion techniques utilize prior knowledge reflecting understanding of the subsurface to weigh possible solutions among those consistent with the data. These solutions can be evaluated in a probabilistic way, which is commonly treated in the Bayesian framework by finding the probability density of feasible solutions that satisfy both model fitting and prior information [Kaipio and Somersalo, 2007; Tarantola, 2005; Kitanidis, 2012]. In the Bayesian geostatistical inversion approach [Kitanidis and Vomvoris, 1983; Dagan, 1985; Kitanidis, 1995], for example, prior information is described through the model that subsurface properties are distributed randomly with prescribed mean and spatial covariance functions, which is suitably cast into a probability density function (pdf). Sometimes, one looks for the most probable solution that represents, in some sense, the best guess given data and prior information. In this case, the Bayesian inverse problem is reduced to the deterministic optimization problem of maximizing the posterior distribution.
In the first-order Tikhonov regularization method [Tikhonov and Arsenin, 1977], the inverse problem is formulated as the optimization problem of seeking the “flattest” solution that reproduces the data. In fact, it is shown that the solution of the Tikhonov regularization method is equivalent to the best estimate obtained from the geostatistical inversion method with a certain class of generalized covariance functions [Kitanidis, 1999]. In summary, whether one uses deterministic or stochastic inversion methods, smoothness of the unknown subsurface structure is usually introduced as a desirable feature, which has the side effect of image blurring as a result of smoothing details that cannot be inferred from the incomplete and noisy information that measurements contain.

[4] However, when one strongly suspects that the subsurface consists of a few relatively uniform geologic units with abrupt changes between units [Fienen et al., 2004], this approach seems inappropriate. Especially with only a few available measurements, the data provide very weak sensitivity to the boundaries, so that conventional techniques often perform poorly on the facies detection or zonation problem, defined as the challenging problem of finding regions of relatively uniform properties or, equivalently, identifying the boundaries between such regions. Even if the subsurface is heterogeneous with earth properties varying at all scales, it may still be useful to categorize spatial parameters of interest as belonging to a few important discrete types or facies. For example, when one is interested in groundwater quality, it may be desirable to classify the subsurface into areas wherein groundwater is suitable for potable, agricultural, or industrial use.

[5] The common approach to facies detection in groundwater engineering is the method of zonation, in which the unknown field is parameterized into a few variables that describe the shape and location of zones or facies [Jacquard, 1965; Yeh and Yoon, 1981; Tsai et al., 2003]. During the inversion, the location and number of facies are optimized until the method achieves an acceptable data fit. Each facies in the method of zonation is typically assumed to have a constant property; then, with the identified facies distribution, variability within each facies can be simulated through geostatistical inversion methods. Other researchers [Berre et al., 2007; Dorn and Villegas, 2008; Cardiff and Kitanidis, 2009] identify zonal structures using the level-set method, in which the shapes of facies are evolved during the inversion process to fit the data, so that one does not have to predetermine the shape. These approaches have shown satisfactory results in synthetic and practical applications; however, additional information such as the number and/or shape of zones needs to be defined before solving the inverse problem, and the criteria for updating zonal parameters during the inversion may be ad hoc and depend largely on the choice of zonal structure parameterization.

[6] Another approach to delineating discrete geologic structures is the compactness or minimum support regularization method [Last and Kubik, 1983]. In this method, a subsurface image with the minimum area or volume of anomalies is selected, so that geologic units with sharp contrasts can be recovered. The compactness regularization approach has been applied to gravity and electromagnetic inverse problems [Portniaguine and Zhdanov, 1999] and seismic travel-time tomography [Ajo-Franklin et al., 2007]. However, the compactness regularization term is nonconvex and discontinuous, so that minimization of the objective function is challenging and measurement noise may degrade the solution quality significantly. In addition, auxiliary parameters have to be assigned before or during the inversion.

[7] In a Bayesian framework, a natural solution to the zonation problem can be obtained by introducing appropriate information about the structure of the unknown function through the prior distribution. In other words, a prior pdf that captures the discretely changing structure of unknowns needs to be selected to produce multiple facies structures. For example, if one knows the exact number of facies, their threshold values for truncation and appropriate covariance models, the truncated Gaussian/pluri-Gaussian method [Galli et al., 1994] can be used to generate possible multiple-facies geologic images. Conceptual subsurface maps that professional geologists draw from outcrop survey can also be used to generate subsurface images using multiple-point geostatistics [Strebelle, 2002; Hu and Chugunova, 2008]. These prior images are then conditioned on measurement data to produce posterior realizations of the subsurface property field [Caers, 2003; Liu and Oliver, 2004; Ronayne et al., 2008; Alcolea and Renard, 2010; Jafarpour and Khodabakhshi, 2011].

[8] For the situation in which one has only limited information, such as the degree of smoothness of the unknowns, Kitanidis [2012] discusses the improper (generalized) prior in Bayesian inverse problems and how the choice of prior affects how sharply boundaries may be identified. The resulting priors, for some reasonable choices of information, are identical to those obtained from Markov random field theory [Li, 2009], which states that a field characterized by the joint probability distribution of local properties under the Markov property is equivalent to a random field described by a global measure on it. Therefore, the improper prior can be a useful tool in modeling discrete geologic structure, as will be shown in the following sections.

[9] In this paper, we present a methodology for detecting zonal structure without any demanding assumptions about the location, shape, and number of zones. For a zonal subsurface structure that contains sharp boundaries between units of distinct properties, it may be reasonable to assume there are a few large changes in the geologic distribution over the entire domain and small, perhaps negligible, variability within each zone. It will be shown that this prior information translates naturally into a probability distribution that gives higher weight to solutions with a few sharp changes, through the Laplace distribution of the gradient, also known as the total variation prior. The idea of using the Laplace distribution as a prior has long been recognized as a way to provide a “sparse” estimate whose elements contain many zeros (a zero, in this case, signifying no change between neighboring elements or nodes) and a few nonzeros (contrasts) [Claerbout and Muir, 1973; Tibshirani, 1996].

[10] Furthermore, recently developed compressed sensing theory [Donoho, 2006] explains that optimization with 1-norm regularization [Daubechies et al., 2004], which attempts to find the mode of the posterior pdf with a Laplace prior, can be equivalent to minimization of the number of nonzero elements in the solution, leading to the sparsest solution; in our case, minimization of the number of changes in the unknown field would be the most appropriate formulation for characterizing and classifying subsurface structure with sharp contrasts. Several studies have implemented the 1-norm regularization approach based on compressed sensing theory, such as subsurface reservoir permeability estimation using the discrete cosine transform (DCT) [Jafarpour et al., 2009; Li and Jafarpour, 2010a] and seismic wavefield reconstruction using an extension of wavelets [Lin and Herrmann, 2007], under the assumption that a few nonzero coefficients in the discrete cosine or wavelet transform domain can represent the true solution effectively. Li and Jafarpour [2010b] extend the DCT-based 1-norm regularization approach in a tractable Bayesian framework by replacing the Laplace prior with hierarchical models and account for the uncertainty in the cosine coefficients of the estimated solutions.

[11] The remainder of this paper is organized as follows. In section 'Methodology', a Bayesian inversion approach with total variation prior is presented in detail. The structural parameters, which account for uncertainty in the prior information and in the likelihood function (measurements as well as the forward model), are estimated using an iterative expectation-maximization method. In section 'Applications', three numerical applications are presented. We briefly present a simple 1-D groundwater pumping rate identification example [Kitanidis, 2007] and show how well the uncertainty of the estimate is quantified compared to a classical geostatistical approach. Then we apply our method to a synthetic 2-D seismic borehole tomography problem [Cardiff and Kitanidis, 2009] and compare the result with those obtained by the geostatistical method and a level-set inversion method. Lastly, an example of hydraulic conductivity estimation from sequential pumping tests is presented to illustrate a nonlinear inversion.

2. Methodology

[12] We are interested in estimating unknown subsurface properties given observation data under the following relationship:

y = h(s) + ε    (1)

where y is an n × 1 observation data vector such as hydraulic head, h is a forward operator that maps the parameter space to the observation space, s is a vector of subsurface parameters to be estimated, such as hydraulic conductivity, usually discretized into an m × 1 vector as input for a numerical simulation model, and ε is the error in the observation y that arises during sampling but also accounts for errors in the forward model h. Based on equation (1), the subsequent sections describe how to construct the prior and posterior distributions, determine the structural parameters, and quantify the uncertainty of the estimate s.

2.1. Bayesian Inversion With Total Variation Prior

[13] In a Bayesian framework, all variables with incomplete information, such as s and ε, are regarded as random variables. Under the typical assumption that the observation error is Gaussian with zero mean and a covariance matrix R, the complete solution to the Bayesian inverse problem is the posterior distribution of the parameters s given the data y, which is proportional to the likelihood function p(y|s) multiplied by the prior distribution p(s) through Bayes' formula:

p(s|y) ∝ p(y|s) p(s)    (2)

[14] For convenience, this paper uses p(s) for the prior pdf and p(s|y) for the posterior pdf conditioned on y. When the problem is ill-posed, e.g., the number of data is not enough to fully characterize the subsurface and the system is underdetermined (n < m), the prior p(s) plays a significant role in determining the posterior distribution of the estimate. In classical geostatistical inversion, the prior p(s) is commonly modeled as multivariate Gaussian with a known covariance function that gives higher weight to flat or smooth solutions. Such a geostatistical prior is not suitable for describing abruptly changing subsurface structure. To identify discrete geologic structure, a prior pdf needs to be constructed that allows s to change discretely.

[15] In fact, it might be expected that a zonal structure would have zero or small changes in most locations and some jumps across the boundaries. In this case, the “changes” can be represented quantitatively by the first-order finite-difference approximation of the gradient of s. An additional useful (but not necessary) requirement is that the prior information be expressed as a parametric distribution that provides mathematical tools for analysis. To simplify our analysis, s is discretized on equally spaced finite-difference grids and numbered sequentially along an axis, i.e., s = (s_1, s_2, ..., s_m). Then, the Laplace distribution (see Appendix A) of the (1-D) gradient of a point s_k given an adjacent point s_{k-1} is one possible option for describing the zonal structure:

p(s_k | s_{k-1}) = 1/(2θ) exp( -|s_k - s_{k-1}| / θ )    (3)

where θ is a scale parameter of the Laplace distribution determined by the shape of the distribution and the discretization scheme, and regarded as a structural parameter in the Bayesian analysis. The Laplace distribution has a sharper peak and thicker tails than the Gaussian distribution; thus the finite-difference approximation of the gradient, s_k - s_{k-1}, has a strong tendency to be zero but is also allowed to take an extreme value that describes a jump across a boundary. To construct the prior of the entire field s, it can be assumed that a point s_k depends only on its adjacent points, for example, s_{k-1} and s_{k+1} in the 1-D case, and is conditionally independent of the remaining points (i.e., p(s_k | s_1, ..., s_{k-1}) = p(s_k | s_{k-1})), indicating that they can be in either similar or separate zones. The prior pdf of s is then a product of equation (3) over a d-dimensional domain:

p(s) ∝ Π_{i=1}^{d} (2θ_i)^{-n_i} exp( -||L_i s||_1 / θ_i )    (4)

where n_fd = Σ_i n_i is the total number of first-order finite-difference approximations of s, n_i is the number of first-order finite-difference approximations of s in direction i, θ_i accounts for the anisotropy of s, ||·||_1 is the 1-norm, and L_i is the finite-difference matrix in the i-direction. For example, in a 1-D problem, L_1 is defined as

       [ -1   1   0  ...   0 ]
L_1 =  [  0  -1   1  ...   0 ]    (5)
       [  :            :     ]
       [  0  ...   0  -1   1 ]

which is an (m - 1) × m bidiagonal matrix.
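As a concrete illustration, the short Python sketch below (Python rather than the MATLAB used later in this paper; the scale value theta = 0.1 is a hypothetical choice) assembles the finite-difference matrix of equation (5) and draws an unconditional sample from the TV prior by cumulatively summing Laplace-distributed increments per equation (3):

```python
import numpy as np

def first_difference_matrix(m: int) -> np.ndarray:
    """(m-1) x m bidiagonal matrix of equation (5): (L1 s)_k = s_{k+1} - s_k."""
    L1 = np.zeros((m - 1, m))
    idx = np.arange(m - 1)
    L1[idx, idx] = -1.0
    L1[idx, idx + 1] = 1.0
    return L1

def sample_tv_prior(m: int, theta: float, seed=None) -> np.ndarray:
    """Unconditional draw: increments s_{k+1} - s_k ~ Laplace(0, theta)."""
    rng = np.random.default_rng(seed)
    increments = rng.laplace(loc=0.0, scale=theta, size=m - 1)
    return np.concatenate(([0.0], np.cumsum(increments)))

L1 = first_difference_matrix(5)
s = sample_tv_prior(1000, theta=0.1, seed=0)
# Each row of L1 sums to zero, so a constant field has zero "jumps":
print(np.allclose(L1 @ np.ones(5), 0.0))  # True
```

Because the Laplace distribution is heavy tailed, most increments of the sampled field are near zero while a few are large, which is the qualitative behavior the prior is meant to encode.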

[16] An unknown subsurface parameter set s following this prior is likely to have zero gradient in most places with a few relatively large jumps. We call the Laplace distribution of the gradient the total variation (TV) prior, after the total variation norm [Rudin et al., 1992] used in image processing, which calculates the sum of the moduli of the “jumps” between neighboring pixels of a d-dimensional image:

||s||_TV = Σ_{i=1}^{d} ||L_i s||_1    (6)

[17] The concept of total variation was first introduced in image processing for digital image denoising [Rudin et al., 1992]. The TV norm, unlike the 2-norm, does not favor gradual changes over abrupt changes in s [Kitanidis, 2012], while also removing noise. Because image denoising using the TV norm preserves the salient edges of objects while suppressing noise in the image, it has emerged as one of the most popular techniques in image processing. The TV norm regularization approach has been used in many engineering applications such as MRI image reconstruction [Lustig et al., 2007] and video quality enhancement [Ng et al., 2007].
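The indifference of the TV norm between gradual and abrupt transitions can be checked numerically. In this sketch (our own small example, with hypothetical 8 × 8 images), a single sharp step and a smooth ramp spanning the same range have identical TV norms under equation (6):

```python
import numpy as np

def tv_norm(img: np.ndarray) -> float:
    """Anisotropic TV norm of a 2-D image: sum of |jumps| between
    horizontal and vertical neighbors, per equation (6)."""
    dy = np.abs(np.diff(img, axis=0)).sum()  # vertical differences
    dx = np.abs(np.diff(img, axis=1)).sum()  # horizontal differences
    return float(dx + dy)

flat = np.ones((8, 8))                            # no jumps at all
step = np.ones((8, 8)); step[:, 4:] = 2.0         # one sharp edge
ramp = np.tile(np.linspace(1.0, 2.0, 8), (8, 1))  # gradual change
print(tv_norm(flat))  # 0.0
# The step and the ramp have the SAME TV norm: TV does not penalize
# an abrupt edge more than a gradual transition over the same range.
print(tv_norm(step), tv_norm(ramp))
```

A squared 2-norm of the differences would instead strongly prefer the ramp, which is exactly why Gaussian-gradient priors blur edges while the TV prior can retain them.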

[18] One might notice that the construction of the TV prior shown above is closely connected to a discrete Markov random field (MRF). In fact, the MRF model [Li, 2009] is a general and consistent approach to modeling the local conditional probability of a correlated random field and has been applied to model and simulate the spatial variability of discrete geologic structures [Carle and Fogg; Norberg et al., 2002]. Furthermore, the MRF-Gibbs equivalence or Hammersley-Clifford theorem [Besag, 1974] states that an MRF model characterized by local conditional probabilities is equivalent to a global distribution of the exponential form exp(-V(s)) for some function V(s), showing that the total variation prior (i.e., p(s) ∝ exp(-V(s)) with V(s) proportional to the TV norm) is a sound model for discrete geologic structures in terms of MRF theory. The same formulation can also be derived through the maximum entropy assignment [Kitanidis, 2012] with a measure on s in the form of a generalized prior, and the TV norm is our choice for measuring the structure in a parsimonious way. It is also worth noting that the TV prior is “improper” since its integral is unbounded. However, when combined with the likelihood function, the TV prior produces a posterior distribution that is integrable and can be used in inversion.

[19] The TV prior is combined with the likelihood function to provide the posterior distribution. For simplicity, assume an isotropic TV prior, i.e., θ_i = θ1 in every direction i, and independent, identically distributed observation error, i.e., ε ~ N(0, θ2 I). Then the joint posterior pdf of s and θ = (θ1, θ2) is obtained using Bayes' theorem:

p(s, θ1, θ2 | y) ∝ θ2^{-n/2} exp( -||y - h(s)||_2^2 / (2θ2) ) θ1^{-n_fd} exp( -Σ_{i=1}^{d} ||L_i s||_1 / θ1 ) p(θ1) p(θ2)    (7)

where the prior pdf of the structural parameter set θ = (θ1, θ2) is assigned as p(θ1) ∝ 1/θ1 and p(θ2) ∝ 1/θ2 with θ1, θ2 > 0. This prior on θ is the Jeffreys prior and represents a uniform distribution on log(θ), which can be considered the limiting case of a log-normal distribution of θ as the variance goes to infinity. Since we are interested in an estimate of s that does not depend on the structural parameter set θ, marginalization over the nuisance structure parameters provides the marginal distribution of s:

p(s | y) = ∫∫ p(s | θ1, θ2, y) p(θ1, θ2 | y) dθ1 dθ2    (8)

where

p(θ1, θ2 | y) = ∫ p(s, θ1, θ2 | y) ds    (9)

[20] In most cases, p(θ1, θ2 | y) cannot be expressed analytically and thus is represented through realizations of s using expensive Markov chain Monte Carlo methods. To reduce the computational burden, the posterior distribution of the structure parameters is often approximated as a delta distribution:

p(θ1, θ2 | y) ≈ δ(θ1 - θ̂1) δ(θ2 - θ̂2)    (10)

where (θ̂1, θ̂2) is the mode of equation (9), i.e., (θ̂1, θ̂2) = arg max p(θ1, θ2 | y). Note that this approximation is widely used in many engineering areas such as geostatistics (structural analysis) [Kitanidis, 1997] and machine learning (latent variable determination) [Bishop, 2007]. Then the marginal distribution of s is given by

p(s | y) ≈ p(s | θ̂1, θ̂2, y)    (11)

[21] If we would like to find one “best” estimate that represents the most probable subsurface structure, one useful approach is to find the maximum a posteriori (MAP) estimator, i.e., the mode of equation (11); the probabilistic inverse problem then becomes a minimization problem for the negative logarithm of equation (11):

ŝ = arg min_s ||y - h(s)||_2^2 + λ Σ_{i=1}^{d} ||L_i s||_1    (12)

where λ = 2θ̂2/θ̂1. This variational approach solves a deterministic inverse problem in which the ill-posedness is ameliorated by the regularization term, i.e., the TV norm, derived from the TV prior in the Bayesian approach. If h is a linear operator H, i.e., y = Hs, equation (12) becomes a convex optimization problem and can be solved efficiently using an interior-point method [Nocedal and Wright, 2006]. For nonlinear forward operators, the same optimization approach can be applied by linearizing h successively.

[22] The 1-norm regularization approach in equation (12) is known to produce a sparse solution, where the term sparsity alludes to the concept of a sparse vector that has mostly zero elements and only a few nonzero values. For example, several researchers in geophysics [Claerbout and Muir, 1973; Taylor et al., 1979; Santosa and Symes, 1986] have used 1-norm regularized optimization for reflection seismology signal reconstruction and found that minimizing both the residuals and the 1-norm of the estimate leads to spiky signals with a zero background and a few large nonzero spikes. In this regard, total variation inversion, which minimizes the 1-norm of the gradient, produces geologic structures that avoid small spatial changes in the solution but allow a few large ones.

[23] Furthermore, recently emerging compressed sensing/sparse recovery theory [Candes et al., 2006; Donoho, 2006] justifies the use of the 1-norm as a regularization term for discrete geologic image reconstruction even with very few measurements. Perhaps the most appropriate prior information for facies classification is that the subsurface property has the minimum number of changes among possible images consistent with the observation data. This prior information can be encoded by using the “0”-norm as a regularization term:

Σ_{i=1}^{d} ||L_i s||_0    (13)

where ||·||_0 counts the number of nonzero elements in a vector. Hence, ||L_i s||_0 counts the number of nonzero differences between neighboring elements of s, and equation (13) penalizes only the number of changes in s, regardless of their magnitude. Integrating the prior (13) with the likelihood model, the discontinuities of the subsurface may be estimated accurately by minimizing the number of nonzero changes in s consistent with the measurements. However, the minimization of equation (13) is computationally very demanding and practically impossible for any problem of substantial size. Alternatively, the 0-norm regularized optimization is known to be equivalent to the 1-norm optimization under certain conditions: (1) the inverse problem (1) is linear, i.e., y = Hs, (2) the columns of H are nearly orthogonal [Donoho and Huo, 2001], and (3) the unknown vector is sufficiently sparse, i.e., it has very few nonzero elements. For example, suppose we wish to solve the following optimization problem:

min_x ||x||_0 subject to y = Hx    (14)

where x is the n × 1 unknown quantity we would like to estimate, y is the m × 1 vector of measurements, and the m × n linear forward matrix H maps x to y, with n greater than m. If the true solution x* is “k”-sparse, meaning that x* has only k ≪ n nonzero elements, and the measurements y are collected by H in a “parsimonious” way that compresses the unknown x as globally and incoherently across the measurements as possible, the solution of equation (14) coincides with that of the 1-norm optimization with a small number m of observations, even close to k:

min_x ||x||_1 subject to y = Hx    (15)

[24] In this case, the 1-norm optimization captures the essential structure, i.e., the sparsity, of the unknown x using a few measurements that contain as much information on the structure as possible via the sampling scheme H. Surprisingly, we do not need information on the number of nonzeros (i.e., k), their locations, or their values, all of which are completely unknown a priori. In our case, the true zonal subsurface field is assumed to be represented with a few significant nonzero jumps across the facies boundaries. Thus, we can exploit this sparse first-order derivative structure to reconstruct the zonal subsurface structure. Even though measurements in reality rarely contain full information on the sparse structure of the unknown field, and the relationship may be nonlinear, the 1-norm regularized optimization still performs well in finding a solution with few nonzero changes that fits the data.

2.2. Structure Parameter Estimation Using Expectation-Maximization Algorithm

[25] To derive the posterior distribution of s in equation (11), the structural parameters θ1 and θ2 need to be determined. The choice of the structure parameters is one of the important aspects of the inverse problem, since optimal structure parameters allow proper uncertainty quantification of the estimates [Kitanidis, 2007]. However, direct calculation of equation (9) is usually intractable, since the marginal posterior distribution of θ cannot be computed analytically in most cases. To overcome this difficulty, an expectation-maximization (EM) algorithm is implemented to determine the optimal structure parameter estimates θ̂1 and θ̂2. The EM approach is a general iterative method for computing the mode of the marginal pdf of the parameters of interest, i.e., p(θ1, θ2 | y), from the joint pdf of θ1, θ2, and s, i.e., p(s, θ1, θ2 | y) [McLachlan and Krishnan, 2008]. The EM approach replaces the intractable maximization of the marginal posterior distribution with a sequence of manageable maximization problems. The name of the approach comes from the iterated conditional expectation (E-step) and maximization (M-step), and it can be shown that at each EM step, p(θ1, θ2 | y) increases until its mode is reached. The conditional expectation in each step is computed using conditional realizations given the current parameter estimates. Generation of conditional realizations is explained in the next section. The EM approach proceeds as follows:

[26] 1. Start with an initial guess of θ1 and θ2, denoted θ1(0) and θ2(0). Set the counter k = 0.

[27] 2. Generate a large number N of conditional realizations of s (e.g., the ith realization s(i)) using θ1(k) and θ2(k) and formulate the following equation (E-step):

Q(θ1, θ2 | θ1(k), θ2(k)) = (1/N) Σ_{i=1}^{N} ln p(θ1, θ2, s(i) | y)    (16)

[28] 3. Maximize Q(θ1, θ2 | θ1(k), θ2(k)) with respect to θ1 and θ2 to obtain θ1(k+1) and θ2(k+1) (M-step).

[29] 4. Set the counter to k = k + 1 and return to step 2.

[30] The procedure continues until the iterations converge on the mode of the marginal pdfs of θ1 and θ2.
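A compact sketch of this Monte Carlo EM loop is given below. Under the Gaussian-noise/Laplace-prior model of equation (7), the M-step admits closed-form updates; the derivation (setting the derivatives of the Monte Carlo average of ln p(θ1, θ2, s | y) to zero, with the Jeffreys prior contributing the +1 and +2 terms in the denominators) is our own and should be read as an assumption. The conditional-realization sampler is left abstract:

```python
import numpy as np

def m_step(y, H, L, S):
    """Closed-form M-step for the model of equation (7): given N
    conditional realizations S (N x m), update (theta1, theta2)."""
    n, n_fd = len(y), L.shape[0]
    resid_sq = np.mean([np.sum((y - H @ s) ** 2) for s in S])
    tv = np.mean([np.sum(np.abs(L @ s)) for s in S])
    theta2 = resid_sq / (n + 2)      # noise-variance update
    theta1 = tv / (n_fd + 1)         # Laplace-scale update
    return theta1, theta2

def em(y, H, L, sampler, theta1, theta2, n_em=10, N=100):
    """Monte Carlo EM: `sampler(theta1, theta2, N)` must return N
    conditional realizations of s, e.g., via the parametric bootstrap
    of section 2.3."""
    for _ in range(n_em):
        S = sampler(theta1, theta2, N)        # E-step (Monte Carlo)
        theta1, theta2 = m_step(y, H, L, S)   # M-step
    return theta1, theta2

# Tiny sanity check of the M-step updates on a 2-point problem:
H = np.eye(2); L = np.array([[-1.0, 1.0]])
t1, t2 = m_step(np.array([1.0, 0.0]), H, L, np.array([[0.5, 0.0]]))
print(t1, t2)  # 0.25 0.0625
```

Each EM pass thus reduces to averaging a residual norm and a TV norm over the current realizations, which is why the method stays tractable even when equation (9) has no analytical form.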

2.3. Uncertainty Quantification With Total Variation Prior

[31] Along with the best estimate and the optimal structure parameters, the uncertainty of the estimation, i.e., the possible range of estimates, should be quantified to make reliable predictions of subsurface phenomena. Since measuring or visualizing a posterior pdf is generally difficult in high dimensions, the uncertainty of the estimate can be characterized by an ensemble of subsurface images sampled from the posterior pdf. Typical sampling strategies are Markov chain Monte Carlo (MCMC) or rejection sampling, which require significant computational effort. Here we use parametric bootstrap sampling [Kitanidis, 1995], where N conditional realizations are generated from unconditional realizations of the TV prior and the noise model by solving a perturbed version of the optimization problem (12) N times. This sampling technique is equivalent to the MCMC method when the forward model is linear, and even in nonlinear problems it has been shown that the sampled realizations effectively approximate MCMC samples [Zanini and Kitanidis, 2009; Liu and Oliver, 2003]. Specifically, the ith conditional realization s(i) can be obtained by:

s(i) = arg min_s ||y + ε(i) - h(s)||_2^2 + λ Σ_{j=1}^{d} ||L_j s - u(i)_j||_1    (17)

where ε(i) ~ N(0, θ̂2 I) and u(i)_j ~ La(0, θ̂1), with La(μ, θ) denoting the Laplace distribution with mean μ and scale parameter θ.
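For a linear forward model, this parametric bootstrap can be sketched as follows; the inner MAP solver uses a reweighted least-squares loop as a simple stand-in for the convex solvers used in the paper, and the θ values in the demonstration are hypothetical:

```python
import numpy as np

def tv_map(H, y, lam, u, n_iter=50, eps=1e-6):
    """min ||y - Hs||^2 + lam * ||L s - u||_1 via reweighted least squares."""
    m = H.shape[1]
    L = np.diff(np.eye(m), axis=0)
    s = np.linalg.lstsq(H, y, rcond=None)[0]
    for _ in range(n_iter):
        w = 1.0 / (np.abs(L @ s - u) + eps)
        A = H.T @ H + 0.5 * lam * L.T @ (w[:, None] * L)
        b = H.T @ y + 0.5 * lam * L.T @ (w * u)
        s = np.linalg.solve(A, b)
    return s

def conditional_realizations(H, y, theta1, theta2, N, seed=None):
    """Parametric bootstrap of equation (17): perturb the data with
    Gaussian noise and the prior 'anchor' u with Laplace draws, then
    re-solve the MAP problem for each perturbation."""
    rng = np.random.default_rng(seed)
    n, m = H.shape
    lam = 2.0 * theta2 / theta1
    out = []
    for _ in range(N):   # embarrassingly parallel across realizations
        y_i = y + rng.normal(0.0, np.sqrt(theta2), n)
        u_i = rng.laplace(0.0, theta1, m - 1)
        out.append(tv_map(H, y_i, lam, u_i))
    return np.array(out)

# Demonstration on a noisy piecewise-constant signal observed directly:
s_true = np.concatenate([np.zeros(10), 2.0 * np.ones(10)])
H = np.eye(20)
y = s_true + 0.05 * np.random.default_rng(2).standard_normal(20)
S = conditional_realizations(H, y, theta1=0.1, theta2=0.01, N=10, seed=3)
print(S.shape)  # (10, 20)
```

The spread of the ensemble S across realizations then serves as the uncertainty measure, and each realization can be computed on a separate worker.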

[32] In the following sections, the Bayesian inversion method using the TV prior is applied to synthetic problems. Forward models are coded in MATLAB and simulations are performed on a personal computer equipped with an Intel quad-core CPU and 8 GB RAM. The optimization problem is solved with the CVX interface, a package for solving convex programs [CVX Research Inc., 2012; Grant and Boyd, 2008]. The code to generate conditional realizations is “embarrassingly parallel”.

3. Applications

3.1. Pumping History Estimation: A Simple Illustration

[33] We illustrate our Bayesian inversion approach with total variation prior by solving a simple source identification problem adapted from Kitanidis [2007], which originally describes how to quantify the uncertainty of the estimation effectively in the geostatistical approach. The goal of this example is to estimate the pumping history at an extraction well, s(t), from a few drawdown observations y at a nearby monitoring well and to quantify the estimation uncertainty. In a homogeneous and isotropic aquifer, the drawdown at location (x1, x2) at time t due to an extraction well at location (0, 0) is calculated as

$$
\phi(t) = \frac{1}{4\pi T} \int_{0}^{t} \frac{s(\tau)}{t - \tau} \exp\!\left(-\frac{x_1^2 + x_2^2}{4 D (t - \tau)}\right) d\tau \qquad (18)
$$

where t is time, $x_1$ and $x_2$ are the spatial coordinates of the monitoring well, T is the transmissivity, $s(\tau)$ is the extraction rate, and D is the hydraulic diffusivity. We assume that all parameters except the pumping history s(t) are known, with the monitoring well at x1 = 2 m and x2 = 0 m. One hundred drawdown measurements are collected every 10 min from t = 10 to 1000 min at the monitoring well. Measurements are contaminated with Gaussian error with zero mean and variance $10^{-4}$. The unknown pumping rate s is discretized into 1 min intervals from t = 0 to 1000 min and the integral in equation (18) is approximated using a Riemann sum as

$$
y = H s + \epsilon \qquad (19)
$$

where y is the 100 × 1 vector of drawdown measurements, H is the linear forward operator obtained by discretizing equation (18), s is the 1000 × 1 vector of unknown extraction rates, and ϵ is the measurement error. The true pumping history is displayed in Figure 1. In order to observe how different prior information affects the estimation, the geostatistical approach is also applied to this example. In the geostatistical approach, the linear generalized covariance function (GCF) is used and the structural parameters for the variogram and the measurement error are determined using the cR/Q2 criteria [Kitanidis, 1997]. With the linear GCF, the rate of change in s is controlled by a single hyperparameter while the variance of s is not limited, so this function is suitable for the numerical examples presented in this paper. The structural parameters in the TV inversion are estimated through the EM approach with 100 realizations, as described above.
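The Riemann-sum discretization of equation (18) into the operator H can be sketched as follows. This is an illustrative reconstruction: the function names are ours, and the transmissivity and diffusivity values below are placeholders, not the ones used in the example.

```python
import numpy as np

def theis_kernel(u, r2, T, D):
    """Unit-rate drawdown kernel 1/(4*pi*T*u) * exp(-r^2/(4*D*u))."""
    return np.exp(-r2 / (4.0 * D * u)) / (4.0 * np.pi * T * u)

def build_H(t_obs, t_grid, dt, r2, T, D):
    """Riemann-sum discretization of the convolution in equation (18):
    H[j, k] = kernel(t_obs[j] - t_grid[k]) * dt for t_grid[k] < t_obs[j],
    so that y = H s reproduces the discretized drawdown."""
    H = np.zeros((len(t_obs), len(t_grid)))
    for j, tj in enumerate(t_obs):
        active = t_grid < tj                 # causality: only past pumping
        H[j, active] = theis_kernel(tj - t_grid[active], r2, T, D) * dt
    return H

# Discretization as in the example: s on 1 min intervals over [0, 1000] min,
# observations every 10 min from t = 10 to 1000 min. T and D are placeholder
# values, not those of the paper.
dt = 1.0                                     # min
t_grid = np.arange(0.0, 1000.0, dt) + dt / 2 # midpoints of the 1 min cells
t_obs = np.arange(10.0, 1000.0 + 1, 10.0)    # 100 observation times
r2 = 2.0**2 + 0.0**2                         # monitoring well at (2, 0) m
H = build_H(t_obs, t_grid, dt, r2, T=1.0, D=1.0)
```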

Figure 1.

“True” pumping history used to generate noisy observations.

[34] Figures 2a and 2b show the estimated pumping history using the TV prior and a geostatistical prior, with their confidence intervals. Both methods identify the general trend of the actual pumping rate successfully. However, the geostatistical method results in spurious small-scale oscillations because its prior information about the structure, i.e., the linear variogram model, presupposes gradual changes; the true piecewise constant structure therefore cannot be resolved in detail. On the other hand, the TV prior allows sharp changes while removing oscillations. Since groundwater exploitation in practice is usually planned through several extraction stages of constant pumping rates, the TV prior would be an appropriate choice to represent the temporal structure of the actual extraction rate in general pumping history identification problems.

Figure 2.

Estimated pumping history using (a) the Bayesian total variation inversion method (solid line) and (b) the geostatistical method (solid line); dashed lines represent 95% confidence intervals of the estimate.

3.2. Synthetic Cross Well Tomography Benchmark Problem: Comparison With Existing Methods

[35] In this section, we consider a synthetic cross-borehole seismic tomography benchmark problem presented in Cardiff and Kitanidis [2009]. The objective of this problem is to reconstruct the seismic velocity field image of two "bones" (Figure 3) from integrated seismic travel-time measurements. A simple linear seismic acquisition model, in which the ray paths are straight, is used. The true field consists of a background seismic velocity of 1900 m/s, a small bone with a velocity of 1500 m/s, and a large bone with a velocity of 1780 m/s. Within-facies parameter variability is simulated by adding Gaussian fluctuations with a magnitude of 10% of the maximum contrast between facies, which makes the facies detection challenging. The measurements are the seismic wave travel times along the horizontal, vertical, and diagonal ray paths across the true field. To simulate sampling error, Gaussian noise with a magnitude of 0.1% of the travel time is added to the measurements. Two thousand five hundred unknown values, i.e., a 50 × 50 pixel image, are estimated from 298 measurements. More details can be found in Cardiff and Kitanidis [2009], and the benchmark problem is available at http://code.google.com/p/blipinv/.

Figure 3.

“True” parameter field for a seismic tomography example.

[36] In order to evaluate the performance of the Bayesian total variation inversion, the quasi-linear geostatistical inversion approach [Kitanidis, 1995] and the Bayesian level-set inversion protocol (BLIP) [Cardiff and Kitanidis, 2009] are also applied to the example. For the comparison, the performance of each method is measured in terms of the pixel-wise root-mean-square error (RMSE) with respect to the true field (Figure 3). The facies classification error is also measured by thresholding the images obtained from the total variation and geostatistical approaches. The classification error indicates how closely each method identifies the true zonal structure, while the pixel-wise RMSE shows the overall performance of each method. We choose threshold values of 1640 m/s and 1840 m/s, the means of the velocities of adjacent facies, to identify the three facies. Alternatively, these threshold values can be obtained by plotting the histogram of the solution from the total variation inversion, as will be shown later.
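The two performance metrics can be computed in a few lines; the following is a minimal sketch and the function names are ours:

```python
import numpy as np

def rmse(estimate, truth):
    """Pixel-wise root-mean-square error with respect to the true field."""
    return float(np.sqrt(np.mean((estimate - truth) ** 2)))

def classify(field, thresholds):
    """Assign each pixel a facies index by thresholding: np.digitize returns
    0 below the first threshold, 1 between the two, 2 above the second."""
    return np.digitize(field, thresholds)

def classification_error(estimate, truth, thresholds):
    """Percentage of pixels assigned to the wrong facies."""
    mis = classify(estimate, thresholds) != classify(truth, thresholds)
    return 100.0 * mis.mean()
```

With velocity images in m/s, `classification_error(estimate, truth, (1640.0, 1840.0))` gives the facies classification error reported in Table 1.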

[37] Figure 4 shows the best solution estimated by the geostatistical approach, which uses the linear generalized covariance function with variogram parameters determined using the cR/Q2 criteria [Kitanidis, 1997]. As discussed before, geostatistical methods are particularly suited for smoothly varying parameter fields; thus, while the approach captures the overall large-scale behavior of the underlying field and does a decent job in terms of mean-square-error estimation, relatively poor performance is observed for the zoned structure estimation. The performance of the geostatistical approach is summarized in Table 1.

Figure 4.

Reconstructed subsurface parameters using (a) the geostatistical approach and (b) its facies classification map using the 1640 and 1840 m/s threshold levels.

Table 1. Performance Comparison for the Seismic Tomography

  Solution Method             Pixel RMSE (m/s)^a    Facies Classification Error (%)^b
  Geostatistical inversion        75.69                 22.00
  BLIP                            67.92                  5.64
  Total variation inversion       54.51                  1.88

  ^a Pixel-wise root-mean-square error with respect to the true field (Figure 3).
  ^b Thresholded images for the geostatistical and total variation approaches.

[38] Figure 5 shows the optimal solution obtained by BLIP. Several BLIP parameters, such as the step size of the level-set propagation, the number of level-set functions, and the initial guess, were tuned to achieve satisfactory results. BLIP identifies the boundaries of the facies relatively well. Around 3000 steps are required to achieve the final image. A detailed analysis of the performance of the level-set method is presented in Cardiff and Kitanidis [2009].

Figure 5.

Reconstructed subsurface parameters using BLIP; final solution obtained after about 3000 level set propagation steps.

[39] The MAP estimate obtained by the total variation approach is presented in Figure 6. The result captures the shapes of the two bones relatively well. In particular, the low-velocity zone (the small bone) can be discerned clearly in the obtained image. Although the threshold values used for the classification were chosen from the true parameter field, the 1640 and 1840 threshold levels could also be determined directly from the histogram of the pixel values of the identified image (Figure 7).

Figure 6.

Reconstructed subsurface parameters using (a) the total variation inversion approach and (b) its facies classification map using the 1640 and 1840 m/s threshold levels.

Figure 7.

Histogram of pixels of the identified image using total variation inversion (Figure 6a).

[40] Table 1 shows that the results obtained by the three methods are similar in terms of overall estimation (pixel-wise root-mean-square error), with the total variation inversion performing slightly better than the geostatistical method and BLIP. For facies detection, however, the total variation inversion provides a precise identification of the facies type, with less than 2% classification error. Although the problem is synthetic, the result suggests that the proposed method provides a significant improvement in delineating sharp boundary structures.

4. Hydraulic Tomography Problem: Nonlinear Inversion

[41] As a last example, we present a nonlinear hydraulic tomography problem adapted from Cardiff and Kitanidis [2009]. A synthetic confined aquifer consisting of the three facies shown in Figure 8 is studied, and the system is governed by

$$
\nabla \cdot \big(K \nabla h\big) + q_i\, \delta(x - x_i) = 0 \qquad (20)
$$

where K is the hydraulic conductivity, h is the hydraulic head, qi is the 2-D pumping rate (pumping rate divided by the thickness of the aquifer) for a pumping test at a well location xi, marked with a circle in Figure 8, and δ(x) is the Dirac delta function. A constant-head boundary condition is imposed on the left, right, and upper boundaries, and a no-flux condition on the bottom. The measurements are steady-state head changes due to sequential pumping tests at each well location. For each test, water is extracted from one of the 20 well locations at a rate of 2 L/s and the change in head is measured at the other 19 locations, resulting in 380 drawdown measurements (20 pumping tests with 19 measurements each). Gaussian noise was added in two cases, one with a magnitude of 2.5% and the other of 5% of the drawdown data, to investigate the effect of observation error. The domain is discretized into 100 × 100 cells, resulting in 10,000 unknowns to be estimated. The governing equation (20) is solved by a finite-volume method and the best estimate is computed iteratively through the optimization of the nonlinear posterior function. The sensitivity matrix at each iteration is evaluated using the adjoint state method [Sykes et al., 1985; Townley and Wilson, 1985].
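A finite-volume discretization of equation (20) with these boundary conditions might look as follows. This is only a small dense-matrix sketch on unit square cells, not the authors' MATLAB implementation: the function name, grid size, and solver choice are ours.

```python
import numpy as np

def solve_head(K, q):
    """Finite-volume solve of div(K grad h) + q = 0 on a square grid
    (unit cells, unit thickness). Constant head h = 0 on the left, right,
    and top boundaries; no flux across the bottom; q holds the total
    source/sink rate per cell (negative for extraction). Dense linear
    solve, so intended only for small grids."""
    ny, nx = K.shape
    N = ny * nx
    idx = lambda i, j: i * nx + j           # row-major index, i = row from top
    harm = lambda a, c: 2.0 * a * c / (a + c)
    A = np.zeros((N, N))
    b = -q.ravel().astype(float)            # move the source to the RHS
    for i in range(ny):
        for j in range(nx):
            k = idx(i, j)
            # fluxes across interior faces (harmonic-mean conductivity)
            for di, dj in ((0, 1), (0, -1), (1, 0), (-1, 0)):
                ii, jj = i + di, j + dj
                if 0 <= ii < ny and 0 <= jj < nx:
                    T = harm(K[i, j], K[ii, jj])
                    A[k, k] -= T
                    A[k, idx(ii, jj)] += T
            # Dirichlet h = 0 on left, right, and top: half-cell distance
            if j == 0:
                A[k, k] -= 2.0 * K[i, j]
            if j == nx - 1:
                A[k, k] -= 2.0 * K[i, j]
            if i == 0:
                A[k, k] -= 2.0 * K[i, j]
            # bottom row (i == ny - 1): no-flux face, nothing added
    return np.linalg.solve(A, b).reshape(ny, nx)
```

For a realistic 100 × 100 grid one would assemble the same system with sparse matrices; repeating the solve for each of the 20 pumping tests yields the simulated drawdown data.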

Figure 8.

“True” log hydraulic conductivity field for a hydraulic tomography example consisting of three facies: gravel, sand, and clay. White circles are the pumping well locations.

[42] Figures 9a and 9b display the optimal solutions with 2.5% and 5% measurement error, respectively. A constant (homogeneous) initial guess was used in this study and the solutions converged within 20 iterations. For both cases, the background gravel is well identified and the sand and clay layers are clearly visible, although the estimated hydraulic conductivities of these layers are underestimated due to the weak sensitivity of the measurements to the hydraulic conductivity field. Even in the case with the larger measurement error, the three facies are identified relatively well, though the facies contacts are less accurately imaged. The facies misclassification percentages are 9.16% and 16.92% for the two cases when the gravel-sand and sand-clay boundaries are chosen as the mean values of the true log-hydraulic conductivities of the two adjacent facies (the sand-clay threshold, for example, is −6). The spatial gradients of the true field and the obtained solution are plotted in Figure 10, showing that the obtained solution is sparse in the spatial first-order gradient domain and that the locations of the large coefficients correspond well to the facies boundaries.

Figure 9.

Reconstructed log hydraulic conductivity field with measurement errors of magnitude of (a) 2.5% and (b) 5%.

Figure 10.

Spatial gradient of (a) true and (b) reconstructed log hydraulic conductivity field with measurement errors of magnitude of 2.5%.
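The gradient-domain sparsity visualized in Figure 10 can also be checked numerically. A short sketch, with names of our choosing:

```python
import numpy as np

def gradient_magnitude(field):
    """First-order forward-difference gradient magnitude of a 2-D field,
    of the kind used to visualize sparsity in the gradient domain."""
    gx = np.diff(field, axis=1)[:-1, :]   # trim both to a common shape
    gy = np.diff(field, axis=0)[:, :-1]
    return np.hypot(gx, gy)

def gradient_sparsity(field, tol=1e-8):
    """Fraction of near-zero gradient coefficients: close to 1 for a
    piecewise constant (facies-type) field, near 0 for a rough field."""
    g = gradient_magnitude(field)
    return float(np.mean(g < tol))
```

A piecewise constant field scores near 1, because its gradient is nonzero only along facies boundaries, while a smoothly fluctuating field scores near 0.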

[43] In Figure 11, the uncertainty of the estimated hydraulic conductivity field computed from 100 realizations is presented. A closer look at the standard deviation of the estimates indicates that the estimation uncertainty is high in the vicinity of the facies boundaries, showing that the shapes of the delineated facies are less accurately characterized with the current data set. It is also observed that the estimation uncertainty is generally low near the pumping locations, where we expect higher sensitivity of the measurements to the estimates. The accurate identification of discrete geologic structures and the reasonable uncertainty quantification presented here show that the total variation inversion method is promising for nonlinear hydraulic tomography problems in which prior information indicates discrete geologic structure, even when only a few noisy measurements are available.

Figure 11.

Estimation uncertainty of log hydraulic conductivity field with 2.5% measurement error. Black circles are the well locations.

5. Conclusion

[44] This paper presents a Bayesian inversion method using the total variation prior to identify discrete geologic structures that may not be captured by conventional geostatistical inversion approaches. The construction of the prior is described from several viewpoints, including Markov random field theory, the principle of maximum entropy, and compressed sensing theory. Structural parameters that represent the measurement noise level and the weight of the prior information are determined through the expectation-maximization approach, and the uncertainty of the estimate is quantified through realizations from the marginal posterior distribution. The results from three synthetic examples show that the proposed TV inversion approach can effectively characterize zonal structures consistent with the data and has potential for application to real-world inverse problems such as the identification of high- or low-permeability layer structures. Nonlinear inversion problems with real noisy observations will be discussed in a subsequent paper (J. Lee et al., Total variation inversion for steady state hydraulic tomography in laboratory aquifers, 2014).

Appendix A: Laplace Distribution and Random Number Generation

[45] The standard univariate Laplace distribution is given by

$$
p(s) = \frac{1}{2\theta} \exp\!\left(-\frac{|s - \mu|}{\theta}\right) \qquad (\mathrm{A1})
$$

where μ is the location parameter, equal to the mean of s, and θ is the scale parameter. The expected value of $|s - \mu|$ is θ and the variance of a Laplace random variable is 2θ². To generate random values from the Laplace distribution in equation (A1), one can use either of two approaches. First, the difference between two values drawn from i.i.d. standard exponential distributions is a realization of a Laplace distribution with μ = 0 and θ = 1. Second, one can use the inverse cumulative distribution function: given a uniform random variable u on the interval $(-1/2, 1/2)$, a random variable x that follows the Laplace distribution with parameters μ and θ can be generated as

$$
x = \mu - \theta\, \operatorname{sgn}(u) \ln\!\big(1 - 2|u|\big) \qquad (\mathrm{A2})
$$

where sgn represents the sign function.
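Both generation recipes can be written directly; `laplace_diff_exp` and `laplace_inverse_cdf` are our names for the two methods described above:

```python
import numpy as np

def laplace_diff_exp(mu, theta, size, rng):
    """Method 1: the difference of two i.i.d. standard exponentials is
    Laplace(0, 1); shift and scale to Laplace(mu, theta)."""
    e1 = rng.exponential(1.0, size)
    e2 = rng.exponential(1.0, size)
    return mu + theta * (e1 - e2)

def laplace_inverse_cdf(mu, theta, u):
    """Method 2, equation (A2): u is uniform on (-1/2, 1/2); log1p keeps
    the evaluation accurate for u near 0."""
    return mu - theta * np.sign(u) * np.log1p(-2.0 * np.abs(u))
```

For example, `laplace_inverse_cdf(0.0, 1.0, 0.25)` returns the 0.75 quantile of the standard Laplace distribution, which equals ln 2.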

Acknowledgments

[46] The authors thank three anonymous reviewers for their helpful and insightful comments and suggestions. The research was funded by NSF Award 0934596, “CMG Collaborative Research: Subsurface Imaging and Uncertainty Quantification,” and the Charles H. Leavell Graduate Student Fellowship. We also received support from the National Science Foundation through its ReNUWIt Engineering Research Center (www.renuwit.org; NSF EEC-1028968). We would also like to thank Michael Cardiff and Xiaoyi Liu for the numerical codes used in this work.
