Inverse modeling has been widely used in subsurface problems, where direct measurements of parameters are expensive and sometimes impossible. Subsurface media are heterogeneous in complex ways, so the number of unknowns is usually large. Furthermore, technologies such as hydraulic tomography and electrical resistivity tomography allow the collection of more indirect measurements, and there is a growing appreciation of the value of detailed subsurface characterization in, for example, remediation projects. Hence, we need efficient inverse methods that can assimilate large volumes of measurements to estimate even larger numbers of parameters, i.e., large-scale inverse modeling. In this paper, we present a Bayesian method that employs a sparse formulation, and we apply it to a laboratory hydraulic tomography problem, in which we estimate half a million unknowns that represent the hydraulic conductivity field of the sandbox at a fine scale. The inversion took about 2 h on a single core.
 Large-scale inverse modeling has been drawing attention in groundwater hydrology mainly for two reasons. First, engineering practices such as remediation or deep injection projects demand ever more accurate predictions of variables such as hydraulic head and solute concentration in groundwater systems. Second, advances in measurement technology allow the collection of ever more data, and improvements in computational power hold the promise of a more detailed characterization of the physical properties of subsurface media. As an important means of interpreting indirect measurements, various inverse modeling methods have been developed in the groundwater area since the 1960s, and they have been reviewed and generalized by Yeh , McLaughlin and Townley , Zimmerman et al. , and Carrera et al. . Among these methods, the Bayesian formulation followed by a quasi-linear solver proposed by Kitanidis and Vomvoris , Hoeksema and Kitanidis , Dagan , and Kitanidis  has been adopted in a number of applications. Snodgrass and Kitanidis  used it to identify contaminant sources, Michalak and Kitanidis  applied it to estimate historical groundwater contaminant distributions, Li et al.  estimated transmissivity and storativity distributions from transient hydraulic measurements, and Cardiff et al.  employed the approach in steady state hydraulic tomography. In particular, Nowak and Cirpka  used this method in a high-resolution hydraulic conductivity and dispersivity estimation problem, together with an efficient fast Fourier transform (FFT)-based algorithm for multiplying Toeplitz matrices (complexity O(m log m) for m unknowns). However, solving a Toeplitz system is still relatively computationally intensive, with complexity O(m log² m) [de Hoog, 1987; Ammar and Gragg, 1987]. This cost affects the efficiency of the line search in every iteration, and in our experience the line search is an important step for securing global convergence.
 In the Bayesian formulation, inference is based on the posterior distribution of the parameters (i.e., conditioned on the measurements). The posterior distribution is composed of two parts: the likelihood of the measurements and the prior distribution of the parameters. The first part controls how closely the solution fits the data, and the second part controls the smoothness of the solution in the parameter space; that is, the prior distribution restricts the class of solutions that is considered possible. In this paper we reformulate the objective function so that the class of solutions is restricted through the local variation of the parameters, and we select the solution with the least local variation. This approach is a form of Tikhonov regularization [Tikhonov and Arsenin, 1977]; it is known as Occam's inversion and has been widely used in geophysics [Constable et al., 1987; Gouveia and Scales, 1997]. In our reformulation, we propose a general framework for multidimensional aquifer characterization that accounts for structural anisotropy. The resulting problem takes the form of a classical least squares (LS) problem with high sparsity that can be solved efficiently with a number of mature LS solvers, for example, by direct decomposition of the system or with iterative methods such as LSQR [Paige and Saunders, 1982]. This formulation is more efficient than the classical setup, in which multiplying the covariance matrix by the sensitivity matrix of the observations drives up the computational cost. We then apply the method to a laboratory-scale steady state hydraulic tomography problem and validate the results in several ways.
2. Bayesian Reformulation
 Using Bayes' rule, with normality assumptions, the posterior distribution of the parameter s has the following form:
where s contains the parameters to be estimated, contains the drift coefficients, Q is the prior covariance matrix of the parameters, is the measurement, X is a structure matrix, h is the forward model that maps the parameter space to the output space, and R is the covariance matrix of the measurement error.
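For reference, under the stated normality assumptions (a Gaussian prior for s with mean Xβ and covariance Q, Gaussian measurement error with covariance R, and a flat prior for the drift coefficients), the posterior has the standard form shown here. The symbols β (drift coefficients) and y (measurement vector) are our labels for the quantities named in the definitions above, and the exact notation of equation (1) may differ.

```latex
p(\mathbf{s}, \boldsymbol{\beta} \mid \mathbf{y}) \propto
\exp\left[
 -\frac{1}{2}\big(\mathbf{y} - h(\mathbf{s})\big)^{T}\mathbf{R}^{-1}\big(\mathbf{y} - h(\mathbf{s})\big)
 -\frac{1}{2}\big(\mathbf{s} - \mathbf{X}\boldsymbol{\beta}\big)^{T}\mathbf{Q}^{-1}\big(\mathbf{s} - \mathbf{X}\boldsymbol{\beta}\big)
\right]
```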
 To eliminate the drift coefficients from the objective function, we follow Kitanidis  and integrate equation (1) with respect to the drift coefficients, which yields
where , , and H is the sensitivity matrix evaluated at si.
 The approach above performs well in most applications. However, when the number of unknowns m is large, for example, under a fine discretization, storing the m × m matrix Q and evaluating the m × n product QH^T is formidable without parallelization, because Q is dense: almost all of its elements are nonzero. The storage issue can be avoided by storing QH^T instead of Q, which need not be stored because its entries follow from a simple analytical formula, but this substitution does not reduce the computational burden. Nowak et al.  proposed an FFT-based method to expedite the multiplication QH^T and successfully applied it to a sandbox problem with a fine grid [Nowak and Cirpka, 2006]. Their method exploits the fact that on a regular, equally spaced grid, under a stationarity assumption, the covariance matrix Q is fully determined by its first row. This reduces the cost of each matrix-vector product to O(m log m). In this paper, we reformulate equation (2) so that it no longer involves the covariance matrix Q, which significantly reduces the computational requirement.
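To make the first-row observation concrete, here is a one-dimensional sketch of multiplying a stationary covariance matrix by a vector with the FFT via circulant embedding. It is an illustration of the general technique, not Nowak et al.'s implementation (in 2-D and 3-D the same idea is applied to block-Toeplitz matrices); the grid, the exponential covariance, and its parameters are arbitrary assumptions.

```python
import numpy as np


def toeplitz_matvec_fft(first_row, v):
    """Return T @ v for the symmetric Toeplitz matrix T with the given first row.

    T is embedded in a circulant matrix of size 2m - 2 whose first column is
    [t0, t1, ..., t_{m-1}, t_{m-2}, ..., t1]. A circulant product is a circular
    convolution, which the FFT evaluates in O(m log m) operations.
    """
    m = len(first_row)
    c = np.concatenate([first_row, first_row[-2:0:-1]])
    v_pad = np.concatenate([v, np.zeros(len(c) - m)])   # zero-pad v to circulant size
    w = np.fft.ifft(np.fft.fft(c) * np.fft.fft(v_pad)).real
    return w[:m]                                        # first m entries equal T @ v


# Example: exponential covariance on a regular 1-D grid (stationary, hence Toeplitz).
x = np.arange(200) * 0.5
first_row = np.exp(-np.abs(x - x[0]) / 10.0)
Q_dense = np.exp(-np.abs(x[:, None] - x[None, :]) / 10.0)   # built only for checking
v = np.random.default_rng(0).standard_normal(len(x))
fast = toeplitz_matvec_fft(first_row, v)
```

Only the first row (m numbers) is stored; the dense matrix is formed here solely to check the result.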
 In equation (2), the first term measures the smoothness of the estimate, and a variational term serving the same purpose can replace it [Kitanidis, 1999]:
where ∇s is the gradient of s over the parameter domain. The smoothness criterion can be of any order; in this work, we limit ourselves to first-order smoothness, i.e., flatness. After discretization, equation (5) can be written as
where d is the dimension of the domain and Li is the first-order differentiation matrix with respect to dimension i. Each Li is extremely sparse, with only two nonzero diagonals. For example, for a two-dimensional problem in which the elements of s are numbered first along the x direction and then along the y direction, the x and y differentiation matrices L1 and L2 take the forms
respectively. Thus, we can rewrite the optimization problem as
where  are coefficients that balance data fit against smoothness of the estimate and account for the smoothness anisotropy of s in the parameter domain, and  contains the structural parameters of the covariance matrix of the measurement noise.
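A minimal sketch of the two-diagonal differentiation matrices for the numbering described above (cell (i, j) has index k = i + nx·j, so x varies fastest), using SciPy sparse matrices. Rows for cells on the right (top) edge are zero, so each Li is m × m with exactly two nonzero diagonals, at offsets 1 (x) and nx (y). The objective function assumes that equation (9) has the usual form, a weighted data misfit plus weighted squared norms of Li s with a diagonal noise covariance; the function names, the 1/2 factors, and the weights theta are our assumptions, not the paper's notation.

```python
import numpy as np
import scipy.sparse as sp


def difference_matrices(nx, ny, dx=1.0, dy=1.0):
    """First-order differentiation matrices L1 (x) and L2 (y), each m x m with two diagonals."""
    m = nx * ny
    k = np.arange(m)
    i, j = k % nx, k // nx                     # x index varies fastest
    keep_x = (i < nx - 1).astype(float)        # no forward x-neighbor on the right edge
    keep_y = (j < ny - 1).astype(float)        # no forward y-neighbor on the top row
    L1 = sp.diags([-keep_x, keep_x[:-1]], [0, 1], shape=(m, m)).tocsr() / dx
    L2 = sp.diags([-keep_y, keep_y[: m - nx]], [0, nx], shape=(m, m)).tocsr() / dy
    L1.eliminate_zeros()                       # drop the zeroed edge rows from storage
    L2.eliminate_zeros()
    return L1, L2


def objective(s, y, H, sigma, L_list, theta):
    """Assumed form of (9): 0.5*||(y - H s)/sigma||^2 + 0.5*sum_i theta_i*||L_i s||^2."""
    r = (y - H @ s) / sigma
    rough = sum(t * np.sum((L @ s) ** 2) for t, L in zip(theta, L_list))
    return 0.5 * (r @ r + rough)


nx, ny = 5, 3
L1, L2 = difference_matrices(nx, ny)
s_x = np.tile(np.arange(nx, dtype=float), ny)    # field increasing by 1 per cell in x
H = np.ones((2, nx * ny))                        # hypothetical linear observation operator
theta = [5.0, 1.0]                               # 5:1 anisotropy ratio; absolute scale arbitrary
f_x = objective(s_x, H @ s_x, H, np.ones(2), [L1, L2], theta)
```

For the linear field s_x, only the x differences are nonzero (one per interior x-interface), so the objective reduces to 0.5 · 5 · 12 = 30 on this 5 × 3 grid.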
 If h(s) is linear, i.e., h(s) = Hs, then (9) can be written as
where Rf is an upper triangular matrix and
then the optimization problem becomes a least squares problem, i.e.,
 The least squares problem in (15)–(16) can be solved in various ways. Direct methods involve a QR decomposition or Cholesky factorization of the system matrix. Such methods are accurate but scale poorly with the problem size (m) in both memory and computation time. In large-scale problems (large m), iterative methods are preferred. In this paper we use the LSQR algorithm [Paige and Saunders, 1982] with diagonal preconditioning, chosen for its minimal memory requirement (2(dm + n) + 3m) and workload (3(dm + n) + 5m multiplications per iteration). LSQR is based on the Golub-Kahan bidiagonalization [Golub and Kahan, 1965; Paige, 1974] of the original least squares system.
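A sketch of the stacked least squares system solved with SciPy's `lsqr`, assuming a diagonal noise covariance R = diag(sigma²) so that the whitened data block is H/sigma, and zero right-hand sides for the smoothness blocks. The "diagonal preconditioning" here is column scaling to unit column norms, one common choice that we assume rather than take from the paper. A dense normal-equation solve on a tiny grid is included only as a check.

```python
import numpy as np
import scipy.sparse as sp
from scipy.sparse.linalg import lsqr


def solve_flattest_ls(H, y, sigma, L_list, theta, tol=1e-10, iter_lim=10000):
    """Solve min ||(y - H s)/sigma||^2 + sum_i theta_i ||L_i s||^2 with LSQR."""
    blocks = [sp.csr_matrix(H / sigma[:, None])]       # whitened data block
    rhs = [y / sigma]
    for t, L in zip(theta, L_list):
        blocks.append(sp.csr_matrix(L) * float(np.sqrt(t)))
        rhs.append(np.zeros(L.shape[0]))                 # smoothness blocks target zero
    A = sp.vstack(blocks).tocsc()
    b = np.concatenate(rhs)
    # Diagonal (column-scaling) preconditioner: solve for z with A D z ~ b, then s = D z.
    col_norms = np.sqrt(np.asarray(A.multiply(A).sum(axis=0))).ravel()
    D = sp.diags(1.0 / np.where(col_norms > 0, col_norms, 1.0))
    z = lsqr(A @ D, b, atol=tol, btol=tol, iter_lim=iter_lim)[0]
    return D @ z


rng = np.random.default_rng(1)
nx, ny = 6, 4
Dx = sp.diags([-1.0, 1.0], [0, 1], shape=(nx - 1, nx))
Dy = sp.diags([-1.0, 1.0], [0, 1], shape=(ny - 1, ny))
L1 = sp.kron(sp.identity(ny), Dx)        # x differences (x numbered fastest)
L2 = sp.kron(Dy, sp.identity(nx))        # y differences
H = rng.standard_normal((10, nx * ny))   # hypothetical dense sensitivities
y = rng.standard_normal(10)
sigma = np.full(10, 0.1)
theta = [5.0, 1.0]
s_hat = solve_flattest_ls(H, y, sigma, [L1, L2], theta)

# Dense normal-equation solution for comparison (small problems only).
N = H.T @ (H / sigma[:, None] ** 2) + sum(
    t * (L.T @ L).toarray() for t, L in zip(theta, [L1, L2]))
s_ref = np.linalg.solve(N, H.T @ (y / sigma**2))
```

The differentiation blocks here use Kronecker products; they differ from the m × m two-diagonal form only by zero rows, which do not change the least squares solution.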
 To compensate for the nonlinearity of h(s), the linearized system is solved successively until convergence. In essence, each LSQR solution updates the parameters along the Newton direction with a unit step size. Because of the nonlinearity of the model and because LSQR returns only an approximate solution, a line search is usually necessary and is inserted between two consecutive linearizations. The line search can be time consuming, but it does not need to be exact. In fact, to guarantee convergence, it is sufficient [Nocedal and Wright, 1999; Gill and Murray, 1981] for the step size to satisfy the strong Wolfe conditions:
where f is the objective function in equation (9), g is its gradient vector, pk is the search direction, and and are two user-chosen coefficients that define a feasible region for . The first condition guarantees a sufficient decrease in the objective function, and the second imposes a curvature requirement at the new location. In our cases, we use rather loose values with and . Often, the Newton step automatically satisfies these conditions, and only occasionally does the step size need to be modified in the line search. In the former case, three evaluations of the objective function are needed for verification; otherwise, a few more searches and objective function evaluations are needed to find an appropriate step length.
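A sketch of an inexact line search that returns a step satisfying the strong Wolfe conditions. It follows the bracketing-and-zoom scheme of Nocedal and Wright (their Algorithms 3.5 and 3.6) but uses plain bisection instead of interpolation, and it tries the unit (Newton) step first, as described above. The defaults c1 = 1e-4 and c2 = 0.9 are common textbook choices, not the values used in this paper; the test functions are hypothetical.

```python
import numpy as np


def strong_wolfe_ok(f, grad, x, p, alpha, c1=1e-4, c2=0.9):
    """Check both strong Wolfe conditions for step length alpha along p."""
    g0p = grad(x) @ p
    x_new = x + alpha * p
    sufficient_decrease = f(x_new) <= f(x) + c1 * alpha * g0p
    curvature = abs(grad(x_new) @ p) <= c2 * abs(g0p)
    return bool(sufficient_decrease and curvature)


def line_search(f, grad, x, p, c1=1e-4, c2=0.9, alpha_max=4.0, max_iter=50):
    """Bracketing + bisection zoom; returns a strong Wolfe step, trying alpha = 1 first."""
    def phi(a):
        return f(x + a * p)

    def dphi(a):
        return grad(x + a * p) @ p

    phi0, dphi0 = phi(0.0), dphi(0.0)
    if dphi0 >= 0:
        raise ValueError("p is not a descent direction")

    def zoom(a_lo, a_hi):
        for _ in range(max_iter):
            a = 0.5 * (a_lo + a_hi)                     # bisection in place of interpolation
            pa = phi(a)
            if pa > phi0 + c1 * a * dphi0 or pa >= phi(a_lo):
                a_hi = a                                # too long: shrink from above
            else:
                da = dphi(a)
                if abs(da) <= -c2 * dphi0:
                    return a                            # both conditions hold
                if da * (a_hi - a_lo) >= 0:
                    a_hi = a_lo
                a_lo = a
        return a

    a_prev, a = 0.0, 1.0                                # start from the unit (Newton) step
    for i in range(max_iter):
        pa = phi(a)
        if pa > phi0 + c1 * a * dphi0 or (i > 0 and pa >= phi(a_prev)):
            return zoom(a_prev, a)
        da = dphi(a)
        if abs(da) <= -c2 * dphi0:
            return a
        if da >= 0:
            return zoom(a, a_prev)
        a_prev, a = a, min(2.0 * a, alpha_max)          # expand the step
    return a


# Quadratic: the Newton step is exact, so the unit step is accepted immediately.
A = np.array([[3.0, 1.0], [1.0, 2.0]])
b = np.array([1.0, -1.0])
f_q = lambda x: 0.5 * x @ A @ x - b @ x
g_q = lambda x: A @ x - b
x0 = np.array([2.0, 2.0])
p_newton = np.linalg.solve(A, -g_q(x0))
alpha_q = line_search(f_q, g_q, x0, p_newton)


# Rosenbrock with a steepest-descent direction: the unit step is far too long.
def rosen(x):
    return 100.0 * (x[1] - x[0] ** 2) ** 2 + (1.0 - x[0]) ** 2


def rosen_grad(x):
    return np.array([-400.0 * x[0] * (x[1] - x[0] ** 2) - 2.0 * (1.0 - x[0]),
                     200.0 * (x[1] - x[0] ** 2)])


x1 = np.array([-1.2, 1.0])
p_sd = -rosen_grad(x1)
alpha_r = line_search(rosen, rosen_grad, x1, p_sd)
```

When the unit step already satisfies both conditions, the search costs one function and one gradient evaluation beyond those at the current point, which is why accepting the Newton step is cheap.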
 Beyond theoretical soundness, our experience shows that the line search is important in practice for achieving convergence, as Zanini and Kitanidis  also found for parameter fields with high contrasts. In our numerical experiments, it is not uncommon for the computed Newton step not to be a descent direction at all, which causes convergence problems in many cases, not only those with high-contrast parameter fields. In a geostatistical setup, using the FFT for fast Toeplitz matrix-vector multiplication through circulant embedding does significantly speed up the computation of the Newton direction; however, it helps less in the line search, where a Toeplitz system must be solved for each objective function evaluation.
4. Estimation Uncertainty and Conditional Realizations
 Uncertainty estimation is important for further stochastic analysis of flow and transport in the field [Rubin, 1991; Ezzedine and Rubin, 1997]. Admittedly, the statistical meaning of the reformulated objective function may not be evident. However, as Kitanidis  points out, the flattest solution coincides with the solution of the geostatistical approach under an appropriate generalized covariance function. For a one-dimensional problem, this is the linear distance generalized covariance function; for a two-dimensional problem, which is the case in section 5, it is the logarithmic function; and for a three-dimensional problem, it is the inverse distance function. It is therefore meaningful to investigate the estimation uncertainty of this method.
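The one-dimensional case of this correspondence is easy to check numerically: the flattest profile through exact point data (minimum sum of squared first differences) is piecewise linear between the data and constant beyond them, which is the interpolant one obtains from kriging with a linear generalized covariance. The sketch below, with hypothetical grid size and data, compares it with NumPy's linear interpolation.

```python
import numpy as np


def flattest_fit_1d(m, obs_idx, obs_val):
    """Minimize sum_k (s[k+1] - s[k])^2 subject to s[obs_idx] = obs_val."""
    L = np.diff(np.eye(m), axis=0)          # (m-1) x m first-difference matrix
    fixed = np.zeros(m, dtype=bool)
    fixed[obs_idx] = True
    s = np.zeros(m)
    s[obs_idx] = obs_val
    # Move the known values to the right-hand side and solve for the free ones.
    rhs = -L[:, fixed] @ s[fixed]
    s[~fixed] = np.linalg.lstsq(L[:, ~fixed], rhs, rcond=None)[0]
    return s


m = 11
obs_idx = np.array([2, 5, 10])
obs_val = np.array([1.0, 4.0, -2.0])
s_flat = flattest_fit_1d(m, obs_idx, obs_val)
```

Between two data points the fixed total change is split into equal increments, and outside the data range the increments are zero, which gives linear interpolation with flat extrapolation.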
 There are various ways to quantify the uncertainty of the estimator; for example, one can approximate it with the Cramér-Rao lower bound, i.e., the inverse of the Fisher information matrix. From equation (9), we have the Fisher information matrix
where H is the sensitivity matrix of h(s) evaluated at the best estimate of s. The Cramér-Rao inequality states that
where V is the estimation covariance matrix. However, this method has a serious disadvantage: the Fisher information matrix is not sparse, and storing and inverting it is impractical for large-scale problems.
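A small sketch of why this route does not scale, assuming that equation (9) is a weighted data misfit plus weighted squared norms of Li s with a diagonal noise covariance. With dense sensitivities (heads respond to ln K everywhere), the Gauss-Newton Fisher information H^T R^-1 H + Σ θi Li^T Li is fully dense even though the Li are sparse, and so is its inverse. The sizes, weights, and random H are hypothetical; the last function simply evaluates m² × 8 bytes for a dense double-precision m × m matrix.

```python
import numpy as np


def fisher_information(H, sigma, L_list, theta):
    """Gauss-Newton Fisher information for the assumed objective
    0.5*||(y - H s)/sigma||^2 + 0.5*sum_i theta_i*||L_i s||^2 (dense arrays)."""
    F = H.T @ (H / sigma[:, None] ** 2)      # dense m x m data term
    for t, L in zip(theta, L_list):
        F = F + t * (L.T @ L)                # sparse smoothness terms
    return F


def dense_storage_bytes(m, bytes_per_entry=8):
    """Memory for one dense m x m double-precision matrix."""
    return m * m * bytes_per_entry


rng = np.random.default_rng(3)
m, n = 40, 6
H = rng.standard_normal((n, m))              # hypothetical dense sensitivities
L = np.diff(np.eye(m), axis=0)               # 1-D first-difference matrix
F = fisher_information(H, np.full(n, 0.1), [L], [1.0])
V_lower = np.linalg.inv(F)                   # Cramer-Rao lower bound on the covariance
```

For the 500,000 unknowns of section 5, `dense_storage_bytes(500_000)` is 2 × 10^12 bytes (about 2 TB) for a single such matrix.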
 Another commonly used strategy is the Monte Carlo method [Kitanidis, 1995; Gomez-Hernandez et al., 1997; Nævdal et al., 2005; Yeh et al., 1996] for generating conditional realizations of the parameters. An important advantage of the Monte Carlo method is that it avoids computing the posterior covariance matrix, a key consideration for large-scale problems. The simulation starts from k unconditional realizations of the parameters suj (j = 1, …, k) and k realizations of the measurement noise ej (j = 1, …, k); k conditional realizations of the parameters are then obtained by inverse modeling. Specifically, the conditional realizations scj (j = 1, …, k) are obtained by solving
 Solving (21) is almost the same as solving the problem in equation (9) except that for this problem , where ri = Lisuj, and ri follows the standard normal distribution.
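A small linear-model sketch of this procedure (often called randomized least squares or randomized maximum likelihood): each realization perturbs the data with a noise realization and replaces the zero target of each smoothness block with a standard normal vector, then re-solves the stacked least squares problem. Under the scaling assumed here, that vector plays the role of sqrt(θi)·Li suj; the paper's exact scaling is not reproduced. For a linear model with Gaussian noise, the realizations are exact posterior samples. A dense `lstsq` stands in for LSQR, and the sizes, seed, and θ = 1 are hypothetical.

```python
import numpy as np


def conditional_realizations(H, y, sigma, L_list, theta, k, rng):
    """Draw k conditional realizations for a linear model by randomized least squares."""
    w = 1.0 / sigma
    A = np.vstack([H * w[:, None]] + [np.sqrt(t) * L for t, L in zip(theta, L_list)])
    n_prior = A.shape[0] - len(y)
    out = np.empty((k, H.shape[1]))
    for j in range(k):
        e = sigma * rng.standard_normal(len(y))      # measurement-noise realization e_j
        r = rng.standard_normal(n_prior)             # random smoothness-block target
        b = np.concatenate([(y + e) * w, r])
        out[j] = np.linalg.lstsq(A, b, rcond=None)[0]
    return out


rng = np.random.default_rng(7)
m, n = 12, 4
H = rng.standard_normal((n, m))
y = rng.standard_normal(n)
sigma = np.full(n, 0.1)
L = np.diff(np.eye(m), axis=0)
A = np.vstack([H / sigma[:, None], L])                                # theta = 1
s_map = np.linalg.lstsq(A, np.concatenate([y / sigma, np.zeros(m - 1)]), rcond=None)[0]
post_cov = np.linalg.inv(A.T @ A)                                     # exact in the linear case
samples = conditional_realizations(H, y, sigma, [L], [1.0], 4000, rng)
```

The sample mean approaches the best estimate and the sample covariance approaches the posterior covariance, without ever forming that covariance for large problems.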
5. Application in Hydraulic Tomography
 We apply the method proposed in section 4 to a laboratory-scale hydraulic tomography problem. The experiments were conducted at the University of Iowa by Walter Illman and colleagues, and the same data set has been used by Liu et al. , Illman et al. [2007, 2008], and Xiang et al. . Here we briefly describe the sandbox and the pumping tests; please refer to the previous studies for details of the experiments.
 As shown in Figure 1, the 2-D synthetic aquifer in the sandbox is 161 cm long and 81 cm high. Four commercial grades of sand were used to construct it, and their names are given in Figure 1. The blocks consist of finer sand and the background of coarser sand. The back of the sandbox (Figure 2), which is made of stainless steel, has 48 ports through which horizontal core samples can be extracted and pressure sensors installed. Permeameter tests on core samples show that the hydraulic conductivity varies by roughly 1 order of magnitude. Eight pumping tests were conducted at the circled ports for the hydraulic tomography survey, and hydraulic head responses were recorded at all ports; however, the hydraulic head measurements at the pumping port are not included in the inverse model. Another pumping test, at the port marked with a square, is used to validate the tomogram of hydraulic conductivity. Transient hydraulic head measurements are also available, and Liu et al.  and Xiang et al.  used them in their inverse models; in this application, however, we use only the same steady state measurements as Illman et al. [2007, 2008]. The governing equation is
on the left, right, and top boundaries and
on the bottom boundary, where qi is the 2-D pumping rate (the pumping rate divided by the thickness of the sandbox) for pumping test i at location xi, δ is the Dirac delta function, hi is the constant boundary head for pumping test i, and nz is a unit vector in the z direction pointing downward.
 We discretize the domain into a 1000 × 500 grid, i.e., 500,000 cells and hence 500,000 unknowns to estimate. The forward model is solved by a finite volume method, and the sensitivity matrix is evaluated with the adjoint state method, as shown in Appendix A. We assume that the measurement errors are independent and identically distributed with a standard deviation equal to 10% of the average steady state drawdown, and that the smoothness anisotropy ratio (the ratio between the smoothness coefficients for the x and z directions) is 5, which is roughly the ratio between the average length and height of the fine-sand lenses in the sandbox. The code is written in Matlab (version 2009b) and runs on a Sun v40z server with AMD Opteron dual-core CPUs, 32 GB of RAM, and the CentOS Linux operating system. Without using the server's parallel computing capability, each iteration (for the real data) takes about 18 min on average, of which 37% is spent computing the sensitivity matrix, 54% solving the least squares problem, and 5% on the line search. For the real data, we use 10^-4 as the tolerance on the relative improvement of the objective function, and the iteration converges after six iterations, in about 2 h.
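The LSQR cost formulas quoted in section 2 can be evaluated for this problem. A small sketch, assuming d = 2 and n = 8 × 47 = 376 head measurements (eight pumping tests, 48 ports minus the pumping port; the text does not state n explicitly). The formulas cover LSQR's work vectors and vector operations only; the products with the system matrix and the storage of the dense n × m sensitivity block come on top, and the last line estimates the latter.

```python
def lsqr_cost(m, n, d):
    """Per-iteration LSQR costs quoted in the text for the stacked (dm + n) x m system."""
    storage = 2 * (d * m + n) + 3 * m       # numbers held in LSQR work vectors
    mults = 3 * (d * m + n) + 5 * m         # multiplications per iteration (vector ops)
    return storage, mults


m = 1000 * 500                  # unknowns on the fine grid
n = 8 * 47                      # assumed: 8 pumping tests x 47 non-pumping ports
storage, mults = lsqr_cost(m, n, d=2)
work_mb = storage * 8 / 1e6     # double precision, megabytes
sensitivity_gb = n * m * 8 / 1e9    # dense n x m sensitivity block, gigabytes
```

Under these assumptions, the LSQR work vectors need about 28 MB, whereas the dense sensitivity block needs about 1.5 GB, which still fits comfortably in the 32 GB of the server described above.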
6. Results With Synthetic Data
 We first generate a set of noise-free synthetic data to test the performance of the algorithm under ideal conditions. The resulting tomogram is shown in Figure 3, and the true distribution of ln(K) is shown in Figure 4 for comparison. The mean square error (MSE) of the estimated ln(K) is 0.2990.
 Figure 5 shows, for comparison, the tomogram of ln(K) obtained with the regular Bayesian formulation and an FFT-based matrix multiplication approach, using a logarithmic generalized covariance function. The two methods give similar results: the MSE of ln(K) is 0.3378, slightly larger than, but very close to, that of the tomogram in Figure 3.
7. Results With Real Data
 In Figure 6 we first present a low-resolution (39 × 19) tomogram of the logarithm of hydraulic conductivity in the sandbox. Whereas the two blocks near the bottom of the sandbox were not captured in Illman et al.  (Figure 7), all eight low-conductivity blocks are captured here at the correct locations. We then show in Figure 8 a high-resolution (1000 × 500) tomogram of the logarithm of hydraulic conductivity. In this tomogram, all blocks appear distinct and disconnected, which represents reality better than the lower-resolution tomograms do.
 Figure 9 shows the estimation uncertainty of ln(K) from 100 Monte Carlo realizations; the black dots mark the measurement locations. Illman et al.  report that the uncertainty is generally lower at the measurement locations. However, this need not always be the case, as we show with an example in Appendix B.
 In Figure 10 we validate the tomogram with a pumping test at port 46 that was not used in the inverse model. The same validation strategy has been used by Liu et al.  and Illman et al. [2007, 2008, 2009], and we consider it preferable to other validation methods because prediction is often the most important use of a tomogram. Figure 10 shows that the estimated hydraulic conductivity field reproduces the observed hydraulic heads very well. We note that although 8 of the 47 measurements from this pumping test are reciprocal to 8 measurements used in the inverse model, the remaining 39 are not and are therefore more useful for validation. Moreover, because of measurement noise, the eight reciprocal measurements are not the “same” as those used in the inverse model. For clarity, Figure 10 plots the two groups of measurements with different colors and symbols.
 Figure 11 presents the 1000 × 500 tomogram of the logarithm of hydraulic conductivity in the sandbox obtained with the regular Bayesian formulation and an FFT-based matrix multiplication approach. Again, the two methods give very similar results, which is consistent with the equivalence between the two formulations.
8. Conclusions and Discussion
 In this paper we reformulated the Bayesian posterior distribution of the parameters to be estimated with a sparse representation of the prior distribution. Compared with the one-dimensional formulation used by Gouveia and Scales , the formulation here is multidimensional, and it takes into account the structural anisotropy of the subsurface media, meaning that this formulation respects the fact that the parameter field could be smoother in one direction than in the others. Furthermore, this formulation leads to a highly sparse least squares problem instead of the linear equations system in Gouveia and Scales's  formulation. Also, in Gouveia and Scales's  formulation, the prior mean values of s are assumed known, while in this formulation this assumption is unnecessary.
 For a linear model, the new form is a standard least squares problem that can be solved with various existing LS solvers. For nonlinear models, the LS solver was applied iteratively until convergence, with a line search inserted between consecutive iterations to ensure and speed up convergence. The method gave results similar to those of the regular Bayesian method at lower computational cost. The efficiency is attributable to the high sparsity of the differentiation matrices, which allows the LS problem to be solved with much less effort. Preconditioning of the LS system and an inexact line search satisfying the strong Wolfe conditions also sped up convergence.
 We used only one processor for the application in this paper; however, the method can easily be adapted to multiprocessor high-performance computing environments. For instance, the evaluation of the sensitivity matrix is readily parallelized, and parallel LSQR algorithms are available to speed up the least squares solver. This topic will be covered elsewhere.
Appendix A: Adjoint State Sensitivity Analysis
 When finite difference, finite volume, or finite element methods are applied to solve equation (22), we usually end up solving a linear system of equations such as
where and can be obtained while the matrix A and the vector b are assembled.
 For convenience, let us write h = A^-1 b, and let ei be the ith unit vector (a zero vector whose ith entry is 1); then we have
where such that
 The most expensive part is computing . However, since A is usually symmetric, these quantities can be solved for together with h; that is, we solve
 To save on storage and to make the form more compact, are stored in one matrix for all j.
 When the measurement location is not at the center of the cell, which is almost always the case, the sensitivity matrix is premultiplied by an inverse distance interpolation matrix.
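A minimal sketch of the adjoint computation for a hypothetical 1-D analogue of equation (22): steady flow -(K h')' = q on a uniform finite-volume grid with fixed heads at both ends and harmonic-mean interface conductances. These discretization choices are our assumptions, not necessarily the paper's. Because A is symmetric, the heads and the adjoint states for all measurement cells come from one solve with multiple right-hand sides, mirroring the stacked solve described above; a central finite-difference check follows.

```python
import numpy as np


def assemble(K, dx, h_left, h_right, q):
    """Finite-volume system A h = b for -(K h')' = q with Dirichlet heads at both ends."""
    m = len(K)
    T = 2.0 * K[:-1] * K[1:] / (K[:-1] + K[1:]) / dx**2    # interface transmissibilities
    TL, TR = 2.0 * K[0] / dx**2, 2.0 * K[-1] / dx**2       # half-cell boundary terms
    A = np.zeros((m, m))
    idx = np.arange(m - 1)
    A[idx, idx] += T
    A[idx + 1, idx + 1] += T
    A[idx, idx + 1] -= T
    A[idx + 1, idx] -= T
    A[0, 0] += TL
    A[-1, -1] += TR
    b = q.copy()
    b[0] += TL * h_left
    b[-1] += TR * h_right
    return A, b


def adjoint_sensitivity(K, dx, h_left, h_right, q, obs):
    """Return heads h and S[o, j] = dh[obs[o]] / dK[j] via the adjoint state method."""
    A, b = assemble(K, dx, h_left, h_right, q)
    E = np.zeros((len(K), len(obs)))
    E[obs, np.arange(len(obs))] = 1.0                      # unit vectors e_i
    sol = np.linalg.solve(A, np.column_stack([b, E]))      # A symmetric: one factorization
    h, lam = sol[:, 0], sol[:, 1:]
    Ki, Kj = K[:-1], K[1:]
    dT_dKi = 2.0 * Kj**2 / (Ki + Kj) ** 2 / dx**2          # harmonic mean, left cell
    dT_dKj = 2.0 * Ki**2 / (Ki + Kj) ** 2 / dx**2          # harmonic mean, right cell
    # lam^T (dr/dT) for interface i is (h_i - h_{i+1}) * (lam_i - lam_{i+1}).
    contrib = ((h[:-1] - h[1:])[:, None] * (lam[:-1] - lam[1:])).T
    S = np.zeros((len(obs), len(K)))
    S[:, :-1] -= contrib * dT_dKi
    S[:, 1:] -= contrib * dT_dKj
    S[:, 0] -= lam[0] * (2.0 / dx**2) * (h[0] - h_left)    # boundary conductances
    S[:, -1] -= lam[-1] * (2.0 / dx**2) * (h[-1] - h_right)
    return h, S


rng = np.random.default_rng(11)
m, dx = 20, 0.1
K = np.exp(0.5 * rng.standard_normal(m))
q = np.zeros(m)
q[7] = -0.5                                      # hypothetical pumping in cell 7
obs = [3, 12]
h, S = adjoint_sensitivity(K, dx, 1.0, 0.0, q, obs)

# Central finite differences for comparison (one extra solve pair per parameter).
S_fd = np.zeros_like(S)
for j in range(m):
    dK = 1e-6 * K[j]
    Kp, Km = K.copy(), K.copy()
    Kp[j] += dK
    Km[j] -= dK
    hp = np.linalg.solve(*assemble(Kp, dx, 1.0, 0.0, q))
    hm = np.linalg.solve(*assemble(Km, dx, 1.0, 0.0, q))
    S_fd[:, j] = (hp[obs] - hm[obs]) / (2 * dK)
```

The adjoint route needs one solve per measurement location regardless of the number of unknowns, whereas finite differences need solves proportional to the number of unknowns, which is prohibitive for 500,000 cells.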
Appendix B: Uncertainty Distribution of Cokriging
 In this appendix we show that, with analytically derived autocovariance and cross-covariance matrices for the hydraulic head measurements and the hydraulic conductivity field, cokriging does not yield hydraulic conductivity estimates with lower uncertainty at the locations of head measurements. Most equations in this appendix are also given by Kitanidis .
 Suppose that Y = log (K) is an isotropic stationary two-dimensional field with an exponential covariance function
where is the variance and l is the integral scale. If we consider steady state flow in such a domain without sinks or sources, the mean of the hydraulic head can be written as
where x = (x1, x2)T. Dagan  found that for S1 = S, S2 = 0, the cross covariance between head and Y is
where and . The generalized autocovariance of can be represented by
 We can use these results to perform cokriging for estimating log-transformed hydraulic conductivities with the following linear estimator [Kitanidis, 1997]:
where coefficients and are determined by solving the following linear equations system that is derived from the unbiasedness constraints and the MSE criterion:
 The mean square error of the estimator can be written as
 With the same sandbox dimensions, Figure B1 shows the MSE of the estimator with , S = 1, and l = 20. To generate this graph, we use 48 head measurements at the 48 ports and one Y measurement at the lower left corner. The graph clearly shows that the estimation uncertainty is actually higher around the measurement points and lower between them. Our experiments further show that the uncertainty around the measurement locations decreases only when l is as small as 1, and even then it is still higher at the measurement locations themselves, as shown in Figure B2.
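The full cokriging system with Dagan's cross covariance is too long to reproduce here. As a hedged illustration of the same machinery (an unbiasedness constraint plus MSE minimization) in its simplest form, the sketch below performs ordinary kriging of Y from direct Y data only, with the exponential covariance used above; the data locations, variance, and integral scale are hypothetical. With direct Y data, the MSE vanishes at the data locations by construction, which is exactly the behavior that head data need not reproduce.

```python
import numpy as np


def exp_cov(xa, xb, var, l):
    """Exponential covariance var * exp(-|x - x'| / l) between two point sets."""
    d = np.linalg.norm(xa[:, None, :] - xb[None, :, :], axis=-1)
    return var * np.exp(-d / l)


def ordinary_kriging_mse(x_obs, x0, var, l):
    """MSE of the ordinary kriging estimate of Y at x0 from Y data at x_obs."""
    n = len(x_obs)
    C = exp_cov(x_obs, x_obs, var, l)
    c0 = exp_cov(x_obs, x0[None, :], var, l)[:, 0]
    M = np.zeros((n + 1, n + 1))
    M[:n, :n] = C
    M[:n, n] = 1.0                    # Lagrange multiplier column (unbiasedness)
    M[n, :n] = 1.0                    # weights sum to one
    sol = np.linalg.solve(M, np.append(c0, 1.0))
    lam, nu = sol[:n], sol[n]
    return var - lam @ c0 - nu


x_obs = np.array([[20.0, 20.0], [60.0, 40.0], [120.0, 30.0], [140.0, 70.0]])
var, l = 1.0, 20.0
mse_at_obs = ordinary_kriging_mse(x_obs, x_obs[1], var, l)
mse_between = ordinary_kriging_mse(x_obs, np.array([40.0, 30.0]), var, l)
mse_far = ordinary_kriging_mse(x_obs, np.array([1000.0, 1000.0]), var, l)
```

Far from all data, the MSE exceeds the variance because the unknown mean must also be estimated.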
 This research was funded by the U.S. Department of Defense Strategic Environmental Research and Development Program (SERDP) Environmental Restoration Focus Area managed by Andrea Leeson under project ER-1611, “Practical Cost Optimization of Characterization and Remediation Decisions at DNAPL sites with Consideration of Prediction Uncertainty.” This work was also supported by NSF award 0934596, “CMG Collaborative Research: Subsurface Imaging and Uncertainty Quantification.” Additional funding was provided by the Stanford Center for Computational Earth and Environmental Science. The use of the data in this paper was with the permission of Walter A. Illman, who, with his colleagues, conducted the sandbox experiments at the University of Iowa. We also want to thank Michael Cardiff and Yuan Liu for their help in preparing the manuscript.