The computation of the sensitivity matrix is the most time-consuming part of any parameter estimation algorithm that requires sensitivity coefficients. An efficient wavelet approach to adjoint sensitivity analysis is proposed to reduce the computational cost of obtaining sensitivity coefficients. The method exploits a wavelet reduction of the data space to reduce the size of the linear system encountered in steady state adjoint equations. In this regard, the wavelet transform is used as a data compression tool. Numerical examples applied to spatial data are used to verify the method and demonstrate its effectiveness.
Sensitivity analysis of large-scale systems governed by differential equations remains important in groundwater modeling and parameter estimation [Carter et al., 1974; Sun and Yeh, 1985; Yeh, 1986; Carrera and Neuman, 1986; Yeh and Sun, 1990; Sun and Yeh, 1990a, 1990b]. Applications of such analysis cover a wide spectrum, including optimization, optimal control, model reparameterization, uncertainty analysis, and experimental design. However, the cost of computing sensitivity coefficients often poses a challenge, and this cost may determine which optimization method is used in parameter estimation. When the computational overhead of sensitivity calculation becomes prohibitively high, methods such as conjugate gradient and quasi-Newton, which need only the gradient of the objective function and not the sensitivity matrix, are often used in place of sensitivity-based methods such as Gauss-Newton and Levenberg-Marquardt.
Jacquard and Jain presented a procedure for numerically computing the sensitivity coefficients for history matching and applied the procedure to estimate permeability in a two-dimensional reservoir from pressure data. Subsequently, Carter et al. [1974] presented a derivation of the method to compute the sensitivity coefficients for two-dimensional single-phase flow problems. Chen et al. and Chavent et al. independently proposed the optimal control method to calculate the gradient of the objective function with respect to model parameters for single-phase flow. Wasserman et al. extended the optimal control theory to automatic history matching in a multiphase reservoir but only computed the adjoint variables for the overall pressure equation and used an objective function based on only the pressure mismatch term. Later, Carrera and Neuman [1986] and Sun and Yeh [1990a] used the optimal control theory to solve the parameter identification problem for groundwater flow. A detailed review of the parameter identification procedures in groundwater hydrology was done by Yeh [1986]. Wu et al. later derived the adjoint equations for multiphase flow in a hydrocarbon reservoir, but the computational cost is still very high when the number of data is large.
Considerable effort has been invested in finding cheaper methods of computing the sensitivity matrix without compromising the accuracy of the solution. One method, the forward sensitivity analysis (also known as the gradient simulator method), is very efficient when the model space is small [Yeh, 1986; Tang et al., 1989; Landa, 1997]. This is because the method requires the solution of a linear system with multiple right-hand-side vectors, and the number of right-hand-side vectors is exactly equal to the number of model parameters. Moreover, in the forward sensitivity method, the sensitivities of all grid block variables are computed. This is inefficient because only the sensitivities of variables at measurement locations are required. For high-dimensional problems, this method becomes prohibitively expensive. Model space reduction via reparameterization [Oldenburg et al., 1993; Reynolds et al., 1996; Lu and Horne, 2000; Sahni and Horne, 2006; Sarma et al., 2008] is often coupled with the forward sensitivity method to stabilize the algorithm and speed up the computation of sensitivity coefficients.
 Another method, the adjoint approach [Shah et al., 1978; Anterion et al., 1989; Plessix, 2006; Michalak and Kitanidis, 2004; Li and Petzold, 2004], is commonly used to compute the sensitivity coefficients and the gradient of the objective function. The adjoint method of sensitivity computation is particularly useful when the number of data is relatively small. This method is also based on solving a linear system with multiple right-hand-side vectors. However, the number of right-hand-side vectors in this case is equal to the number of data for which sensitivities are to be calculated. The number is therefore independent of the number of parameters. This approach is preferred to the forward sensitivity method when the number of data to match is significantly smaller than the number of parameters. However, there are several instances in which the data space and the model space are both of high dimensions. In such instances, the cost of computing sensitivity coefficients can be very large.
 In this paper, we propose the application of a linear transformation of the data space to reduce the associated cost of adjoint state sensitivity computation. First, we review wavelet analysis, wavelet reparameterization of the data space, inverse modeling, and the conventional adjoint approach to calculating sensitivities in steady state linear systems. We subsequently derive a wavelet approach to adjoint sensitivity computation. The approach uses the data compression capability of the wavelet transform to reduce the size of the adjoint equations for sensitivity computations. Finally, we verify the approach using numerical examples applied to spatially sampled hydraulic head data. The examples involve finding the maximum likelihood estimates of reservoir parameters by matching a reduced set of wavelets of the measured data.
2. Inverse Problems
 Estimating flow parameters from measured data involves minimizing an error norm. Consequently, gradient-based parameter estimation involves successively determining the minimum of a series of linearized problems. Depending on the optimization algorithm, successive linearizations may require the computation of the sensitivity matrix. Algorithms, such as Gauss-Newton and Levenberg-Marquardt, that require sensitivity coefficients usually have faster convergence than algorithms that do not. However, the computation of the sensitivities can be very costly, making the Gauss-Newton and Levenberg-Marquardt algorithms unrealistic for large-dimension problems.
2.1. Parameter and Model Spaces
The parameter space is the space of all physical parameters that completely characterize a physical system. In this work, we assume that the reservoir is completely characterized by its conductivity distribution. The model space is the space of all parameters used to represent the system in the solution of the inverse problem. The model parameters may be the same as the system parameters, as is the case with a pixel-based approach, or they may be a transformation of the system parameters into a different domain [Oldenburg et al., 1993; Reynolds et al., 1996; Lu and Horne, 2000; Sahni and Horne, 2006; Sarma et al., 2008]. In the case where a transformation is made, the number of model parameters is often smaller than the number of actual system parameters. The choice of model parameters for the system is called parameterization of the system. We denote the unknown system parameters as $\boldsymbol{\alpha}$. The system parameters here are the natural logarithms of the reservoir conductivities.
2.2. Data Space
The data space is the space comprising all physical responses or measurements obtained from a system or a transformation of such measurements. The data space is divided into two categories: the measurement space and the observation space. The measurement space is the space of all measurements made on the physical system, that is, the space of all conceivable responses d from the wells and reservoir. The observation space is the space of the actual data matched during inverse problem solution. In traditional inverse modeling, the observation space is the same as the measurement space. Recently, Awotunde and Horne introduced the concept of an observation space that is a subset of the measurement space. This observation space is obtained by linearly transforming measured data into a wavelet domain. Thus, in this article, the observation space is smaller than the measurement space.
2.3. Transformation of Data Space and Dimensionality Reduction
 Measured data are transformed into a frequency domain that contains only relevant information necessary to solve the inverse problem. The procedure involves decomposing the time series data into wavelet coefficients and selecting only relevant coefficients for use in parameter estimation. Several previous studies [Jansen and Kelkar, 1997; Kikani and He, 1998; Panda et al., 2000; Athichanagorn et al., 2002] have dealt with the decomposition of the data, but their primary focus was either to denoise the data or to compactly store (compress) the data. In such cases, the coefficients are often back transformed to the real-time domain before use. In this work, we fitted important wavelet coefficients of the modeled response to corresponding wavelet coefficients of the measured data. Thus, we work practically in the wavelet domain until a good match is obtained. Because the data space used in the solution of the inverse problem is a subset of the wavelet transform of the time series data space, the dimension of the observation space is smaller than the dimension of the original measurement space. The magnitudes of the wavelet coefficients of measured data are used to determine the coefficients to keep for history matching. That is, we transform the measured data into wavelet coefficients and select the coefficients with the largest magnitudes to form the observation space.
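The selection rule described above can be sketched as follows. This is a minimal illustration, assuming an orthonormal Haar transform and a data length that is a power of 2; `haar_matrix`, `select_largest`, and the smooth synthetic head profile are hypothetical names and data introduced only for this sketch.

```python
import numpy as np

def haar_matrix(n):
    """Orthonormal Haar matrix of size n x n (n must be a power of 2)."""
    if n == 1:
        return np.array([[1.0]])
    h = haar_matrix(n // 2)
    top = np.kron(h, [1.0, 1.0])                     # coarser-scale rows
    bottom = np.kron(np.eye(n // 2), [1.0, -1.0])    # finest-scale details
    return np.vstack([top, bottom]) / np.sqrt(2.0)

def select_largest(d_meas, n_keep):
    """Return indices and rows of W for the n_keep largest |coefficients|."""
    W_full = haar_matrix(len(d_meas))
    c = W_full @ d_meas
    idx = np.sort(np.argsort(-np.abs(c))[:n_keep])
    return idx, W_full[idx, :]

# Hypothetical smooth measured profile at 64 locations.
x = np.linspace(0.0, 1.0, 64)
d_meas = 10.0 - 4.0 * x**2

idx, W_obs = select_largest(d_meas, n_keep=8)
c_obs = W_obs @ d_meas          # observation space: 8 numbers instead of 64
print(idx, c_obs.shape)
```

The same reduced matrix `W_obs` would then be applied to the calculated data at every iteration, so that measured and calculated coefficients correspond one to one.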
2.4. Posterior Probability Distribution
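If we assume that the distributions of the data and the parameters are Gaussian, then the posterior probability distribution of the parameters $\boldsymbol{\alpha}$ given the measured data $\mathbf{d}_{meas}$ may be expressed as
$$p(\boldsymbol{\alpha} \mid \mathbf{d}_{meas}) \propto \exp\left[-\Phi(\boldsymbol{\alpha})\right] \tag{1}$$
where $\Phi(\boldsymbol{\alpha})$ is the objective function defined as
$$\Phi(\boldsymbol{\alpha}) = \tfrac{1}{2}\left(\mathbf{d}_{meas} - \mathbf{d}_{cal}\right)^T \mathbf{C}_d^{-1} \left(\mathbf{d}_{meas} - \mathbf{d}_{cal}\right) + \tfrac{1}{2}\left(\boldsymbol{\alpha} - \boldsymbol{\alpha}_{prior}\right)^T \mathbf{C}_{\alpha}^{-1} \left(\boldsymbol{\alpha} - \boldsymbol{\alpha}_{prior}\right) \tag{2}$$
Here $\mathbf{d}_{cal}$ is the data calculated from the model with parameters $\boldsymbol{\alpha}$, $\mathbf{C}_d$ is the covariance matrix of the measurement errors, and $\boldsymbol{\alpha}_{prior}$ and $\mathbf{C}_{\alpha}$ are the prior mean and covariance of the parameters.
In the absence of any prior information about the model parameters, the second term on the right-hand side of equation (2) may be neglected. The further assumption that measurement errors are independent and have unit variance ($\mathbf{C}_d = \mathbf{I}$) leads to
$$\Phi(\boldsymbol{\alpha}) = \tfrac{1}{2}\left\| \mathbf{d}_{meas} - \mathbf{d}_{cal} \right\|_2^2 \tag{3}$$
Minimizing equation (3) gives the maximum likelihood estimate of $\boldsymbol{\alpha}$. The minimization procedure may then be stated as
$$\min_{\boldsymbol{\alpha}} \; \Phi(\boldsymbol{\alpha}) \tag{4}$$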
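A wavelet can be described as a real-valued function $\psi(x)$ that satisfies the conditions
$$\int_{-\infty}^{\infty} \psi(x)\,dx = 0 \tag{5}$$
$$\int_{-\infty}^{\infty} \psi^2(x)\,dx = 1 \tag{6}$$
Equations (5) and (6) imply that $\psi$ has a zero mean, is nonzero somewhere, and has unit energy. A family of space-frequency atoms is obtained by scaling $\psi$ by $a > 0$ and translating it by $b$, such that
$$\psi_{a,b}(x) = \frac{1}{\sqrt{a}}\,\psi\!\left(\frac{x-b}{a}\right) \tag{7}$$
Thus, the wavelet transform of any spatial series $d(x)$ at location b and scale a can be written as
$$\mathcal{W}d(a,b) = \int_{-\infty}^{\infty} d(x)\,\frac{1}{\sqrt{a}}\,\psi\!\left(\frac{x-b}{a}\right) dx \tag{8}$$
Small values of the scale parameter a represent high-frequency components of the signal, while large values of a represent low-frequency components of the signal. The transform defined in equation (8) is redundant because it maps a function of one variable into a function of two variables. To remove this redundancy, discrete values of a and b are selected using a critical sampling defined by
$$a = 2^{-m}, \qquad b = n\,2^{-m}, \qquad m, n \in \mathbb{Z} \tag{9}$$
This will produce a minimal but complete orthogonal basis. Thus, we may define an orthonormal wavelet as a function $\psi \in L^2(\mathbb{R})$ with the property that the set
$$\left\{ \psi_{m,n}(x) = 2^{m/2}\,\psi\!\left(2^m x - n\right) : m, n \in \mathbb{Z} \right\} \tag{10}$$
defines an orthonormal basis of $L^2(\mathbb{R})$.
The requirement that the set $\{\psi_{m,n}\}$ forms an orthonormal basis of $L^2(\mathbb{R})$ means that any function $d \in L^2(\mathbb{R})$ can be decomposed as
$$d(x) = \sum_{m}\sum_{n} c_{m,n}\,\psi_{m,n}(x) \tag{11}$$
with
$$c_{m,n} = \left\langle d, \psi_{m,n} \right\rangle = \int_{-\infty}^{\infty} d(x)\,\psi_{m,n}(x)\,dx \tag{12}$$
Here $c_{m,n}$ are the wavelet coefficients of d, and equation (12) defines an inner product. The parameters m and n make it possible to analyze a signal's behavior at a dense set of spatial locations and with respect to a large range of scales, thereby providing the possibility to zoom in on the transient behavior of the signal. Because wavelets are well localized in both the space and frequency domains, the decomposition in equation (11) is local, and only the terms corresponding to $\psi_{m,n}$ with $n2^{-m}$ near x make a large contribution at x.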
3.1. Discrete Wavelet Transforms
The discrete wavelet transform (DWT) is applied to discrete data sets and is more commonly used than the continuous transform. The DWT can be regarded as dyadic slices through the continuous wavelet transform. It is thus possible to take subsamples of the wavelet transform. The formulation of the DWT as an orthonormal transform simplifies many applications and facilitates statistical analysis of data. A discrete representation of the wavelet transform in equation (12) can be written in the form of a linear transformation using an orthogonal wavelet matrix W:
$$\mathbf{c} = \mathbf{W}\mathbf{d} \tag{13}$$
W maps the input data from the space domain to the wavelet domain. For a complete set of coefficients, W is an L×L matrix applied to input data of size L. However, when a subset of the wavelet transform is required, W is of dimension Ncoeff×L, with Ncoeff being the size of c. Whenever W is an orthogonal matrix, as is the case in this study, the corresponding transformation is a rotation in $\mathbb{R}^n$ in which the n-tuple data are points in $\mathbb{R}^n$ and the coordinates of the point in the rotated space comprise the discrete wavelet transform of the original coordinates. Although the implementation of equation (13) is conceptually straightforward, it is of limited practical use because storing and manipulating transformation matrices for large input data may not be feasible. As a result, a recursive form of filtering [Mallat, 1989, 1999; Percival and Walden, 2000] is usually employed in the computation of discrete wavelet coefficients. Thus, the classical implementation of equation (13) uses a two-channel filter bank (a low-pass and a high-pass filter) with recursion on the low-pass output.
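The equivalence between the matrix form and the recursive filter-bank form can be checked on a small example. The sketch below assumes the Haar wavelet and a coefficient ordering of one approximation coefficient followed by details from coarsest to finest; `haar_matrix` and `haar_dwt` are hypothetical helper names, not routines from the original work.

```python
import numpy as np

def haar_matrix(n):
    """Explicit orthonormal Haar matrix W (n x n, n a power of 2)."""
    if n == 1:
        return np.array([[1.0]])
    h = haar_matrix(n // 2)
    top = np.kron(h, [1.0, 1.0])
    bottom = np.kron(np.eye(n // 2), [1.0, -1.0])
    return np.vstack([top, bottom]) / np.sqrt(2.0)

def haar_dwt(d):
    """Recursive filter-bank Haar DWT: low-pass/high-pass, recurse on low-pass."""
    d = np.asarray(d, dtype=float)
    details = []
    while d.size > 1:
        avg = (d[0::2] + d[1::2]) / np.sqrt(2.0)   # low-pass output
        det = (d[0::2] - d[1::2]) / np.sqrt(2.0)   # high-pass output
        details.insert(0, det)                     # coarser levels go first
        d = avg
    return np.concatenate([d] + details)

rng = np.random.default_rng(0)
d = rng.normal(size=16)
c_matrix = haar_matrix(16) @ d     # O(L^2) storage and work
c_filter = haar_dwt(d)             # O(L) work, no matrix stored
print(np.max(np.abs(c_matrix - c_filter)))
```

The filter-bank route never forms W, which is why it is preferred for long data records; the explicit matrix is convenient only when a small reduced W is needed, as in the adjoint equations later in the paper.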
3.2. Preservation of Euclidean Norm
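It has been shown [Awotunde and Horne, 2008] that for orthonormal wavelet transforms, the Euclidean distance in the space domain is exactly the same as the Euclidean distance in the wavelet (frequency) domain. That is,
$$\left\| \mathbf{d}_{meas} - \mathbf{d}_{cal} \right\|_2 = \left\| \mathbf{W}\mathbf{d}_{meas} - \mathbf{W}\mathbf{d}_{cal} \right\|_2 \tag{14}$$
where W is the complete L×L orthonormal wavelet matrix. This implies that the minimization procedure implemented in the space domain is exactly the same as that implemented in the wavelet domain. In simple terms, minimizing the error norm in the wavelet domain will follow exactly the same regression path and yield the same model parameters as minimizing the error norm in the space domain. This is supported by the numerical results obtained in this work. Thus, the minimization of wavelets of data may be written as
$$\min_{\boldsymbol{\alpha}} \; \tfrac{1}{2}\left\| \mathbf{W}\mathbf{d}_{meas} - \mathbf{W}\mathbf{d}_{cal} \right\|_2^2 \tag{15}$$
where $\boldsymbol{\alpha}$ denotes the model parameters.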
Equation (15) will not provide any advantage over equation (4) because the regression will follow exactly the same path in both cases. However, we do not implement the algorithm as shown in equation (15). Rather, we minimize the sum of squares of errors between corresponding subsets of wavelet coefficients obtained by transforming the measured data and the calculated data. The objective is to transform the data into a wavelet domain and subsequently reduce the dimension of the transformed data. This reduction has two advantages: elimination of redundancy present in the full data set and reduction in the time needed to compute sensitivity coefficients.
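The distinction between the full and the reduced wavelet misfit can be illustrated numerically. This is a small sketch under the assumption of an orthonormal Haar matrix; the random vectors stand in for measured and calculated data and are not taken from the paper.

```python
import numpy as np

def haar_matrix(n):
    """Orthonormal Haar matrix of size n x n (n a power of 2)."""
    if n == 1:
        return np.array([[1.0]])
    h = haar_matrix(n // 2)
    top = np.kron(h, [1.0, 1.0])
    bottom = np.kron(np.eye(n // 2), [1.0, -1.0])
    return np.vstack([top, bottom]) / np.sqrt(2.0)

rng = np.random.default_rng(1)
d_meas = rng.normal(size=32)      # stand-in for measured data
d_cal = rng.normal(size=32)       # stand-in for calculated data
W_full = haar_matrix(32)

# Full transform: the misfit is identical in both domains.
misfit_space = np.sum((d_meas - d_cal) ** 2)
misfit_wavelet = np.sum((W_full @ d_meas - W_full @ d_cal) ** 2)

# Reduced transform: keep only the 8 largest coefficients of the measured data.
keep = np.argsort(-np.abs(W_full @ d_meas))[:8]
W = W_full[keep, :]
misfit_reduced = np.sum((W @ d_meas - W @ d_cal) ** 2)

print(misfit_space, misfit_wavelet, misfit_reduced)
```

The reduced misfit is a partial sum of the full one, so it can never exceed it; how much of the full misfit it captures depends on how compressible the data are.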
4. Parameter Estimation Approaches
Two approaches were used here for parameter estimation. They differ in the parameterization of the observation space. In this section, we consider the pixel-based (p − k) approach and the wp − k approach.
4.1. Standard Nonlinear Regression: p – k Approach
In pixel modeling, the number of parameters needed to describe the reservoir fully is equal to the number of grid blocks, with each parameter assigned to a unique grid block. Also, all measured data are matched to estimate the distribution of reservoir parameters. Thus, the objective function is the same as given in equation (3). Calculated data are fitted to measured data by varying actual reservoir parameters. One advantage of this approach is that reservoir description is done at the finest scale of heterogeneity. Unfortunately, most reservoir responses, especially in the form of production data measured in the wells, are responses to large-scale reservoir heterogeneities. Thus, characterizing a large reservoir at the finest scale may not provide a meaningful solution to the inverse problem if the data do not have enough resolution to adequately resolve a large number of model parameters. Another problem is the redundancy in information carried by the measured data. A third problem with this approach is the huge computational cost associated with the computation of the sensitivity coefficients. This cost can be avoided by using methods such as the quasi-Newton or conjugate gradient methods, which do not require sensitivity coefficients.
4.2. Wavelet Approach: wp – k Approach
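A wavelet reparameterization of the data space (the wp − k approach) was recently presented by Awotunde and Horne. In this approach, the measured data are transformed into a wavelet domain and thresholded to yield a reduced data space that contains only the most relevant coefficients. The coefficients retained form the observation space, i.e., the space of the coefficients that are matched to obtain an estimate of reservoir parameters. Consequently, the objective function for maximum likelihood estimation is given as
$$\Phi_w(\boldsymbol{\alpha}) = \tfrac{1}{2}\left(\mathbf{c}_{meas} - \mathbf{c}_{cal}\right)^T \left(\mathbf{c}_{meas} - \mathbf{c}_{cal}\right) \tag{16}$$
with
$$\mathbf{c}_{meas} = \mathbf{W}\mathbf{d}_{meas} \tag{17}$$
$$\mathbf{c}_{cal} = \mathbf{W}\mathbf{d}_{cal} \tag{18}$$
Here $\mathbf{d}_{meas}$ and $\mathbf{d}_{cal}$ are the measured and calculated data, respectively, and they are of dimension L×1. W is of dimension Ncoeff×L, while $\mathbf{c}_{meas}$ and $\mathbf{c}_{cal}$ are of dimension Ncoeff×1. The gradient of $\Phi_w$ with respect to the model parameters $\boldsymbol{\alpha}$ is
$$\nabla\Phi_w = -\mathbf{G}_w^T \left(\mathbf{c}_{meas} - \mathbf{c}_{cal}\right) \tag{19}$$
and the Hessian matrix for the Levenberg-Marquardt algorithm is computed as
$$\mathbf{H} = \mathbf{G}_w^T \mathbf{G}_w + \mu\,\mathbf{I} \tag{20}$$
Here $\mu$ is the Levenberg-Marquardt parameter and $\mathbf{G}_w$ is the wavelet sensitivity matrix, defined as
$$\mathbf{G}_w = \frac{\partial \mathbf{c}_{cal}}{\partial \boldsymbol{\alpha}} = \mathbf{W}\,\frac{\partial \mathbf{d}_{cal}}{\partial \boldsymbol{\alpha}} = \mathbf{W}\mathbf{G} \tag{21}$$
where $\mathbf{G}$ is the full sensitivity matrix, with an update to $\boldsymbol{\alpha}$ at the kth iteration given by
$$\boldsymbol{\alpha}^{k+1} = \boldsymbol{\alpha}^{k} + \rho^{k}\,\delta\boldsymbol{\alpha}^{k} \tag{22}$$
where $\delta\boldsymbol{\alpha}^{k}$ is the search direction calculated from
$$\mathbf{H}^{k}\,\delta\boldsymbol{\alpha}^{k} = -\nabla\Phi_w^{k} \tag{23}$$
and $\rho^{k}$ is the step length.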
The algorithm described here is the Levenberg-Marquardt algorithm, which is very efficient because it converges in fewer iterations than many other gradient-based methods. However, it requires the computation of the sensitivity matrix. The sensitivity matrix, although very informative, is expensive to compute when the inverse problem is large and complex. In equation (21), the full sensitivity matrix is first computed and subsequently multiplied by the wavelet matrix W to obtain the reduced wavelet sensitivity matrix. This implementation, as performed by Awotunde and Horne, is inefficient and makes the approach as expensive as the conventional p − k approach. To reduce the computational overhead associated with obtaining the wavelet sensitivity matrix, we devised a means to compute the reduced wavelet sensitivity matrix directly from the adjoint equations.
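One Levenberg-Marquardt step in the wavelet domain can be sketched as below. The sketch assumes the objective is half the sum of squared wavelet residuals, that the wavelet sensitivity matrix is formed as the product of W and the full sensitivity matrix (the less efficient route described above), and that a unit step length is taken; `lm_step`, the linear toy model, and the value of `mu` are hypothetical choices made only for illustration.

```python
import numpy as np

def haar_matrix(n):
    """Orthonormal Haar matrix of size n x n (n a power of 2)."""
    if n == 1:
        return np.array([[1.0]])
    h = haar_matrix(n // 2)
    top = np.kron(h, [1.0, 1.0])
    bottom = np.kron(np.eye(n // 2), [1.0, -1.0])
    return np.vstack([top, bottom]) / np.sqrt(2.0)

def lm_step(G, W, d_meas, d_cal, mu):
    """Search direction from (G_w^T G_w + mu I) step = -grad, with G_w = W G."""
    G_w = W @ G                               # reduced wavelet sensitivity matrix
    r_w = W @ (d_meas - d_cal)                # residual of retained coefficients
    grad = -G_w.T @ r_w                       # gradient of 0.5 * ||r_w||^2
    H = G_w.T @ G_w + mu * np.eye(G.shape[1]) # damped Gauss-Newton Hessian
    return np.linalg.solve(H, -grad), grad

# Toy linear forward model d_cal = G @ alpha (an assumption for this sketch).
rng = np.random.default_rng(3)
L, M, n_coeff = 16, 6, 8
G = rng.normal(size=(L, M))
d_meas = G @ rng.normal(size=M)
W = haar_matrix(L)[:n_coeff, :]               # stand-in for the thresholded W

alpha = np.zeros(M)
phi_before = 0.5 * np.sum((W @ (d_meas - G @ alpha)) ** 2)
step, grad = lm_step(G, W, d_meas, G @ alpha, mu=1e-2)
alpha = alpha + step                          # unit step length
phi_after = 0.5 * np.sum((W @ (d_meas - G @ alpha)) ** 2)
print(phi_before, phi_after)
```

In a nonlinear problem the sensitivity matrix would be recomputed at every iteration, which is where the cost discussed in this section arises.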
5. Adjoint Method
In many instances, the number of observations is significantly lower than the number of model parameters. In such situations, it is appropriate to use an adjoint state method to compute the sensitivity coefficients. Furthermore, when only the gradient of the objective function is required, the adjoint method is the most computationally effective method to compute this gradient. The adjoint method is a special case of linear duality, and we begin this section by reviewing the principle of duality.
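In linear duality, substitution of alternative variables allows us to speed up the computation of the gradient of a function. Suppose that we would like to compute the product
$$\mathbf{g}^T\mathbf{B} \quad \text{such that} \quad \mathbf{A}\mathbf{B} = \mathbf{C} \tag{24}$$
in terms of the unknown matrix B. We assume that the vector g and matrices A and C are known. The direct approach will be to first compute B from equation (24) and then compute $\mathbf{g}^T\mathbf{B}$. This approach turns out to be very expensive when the number of columns in C is large. An alternative is to introduce a vector $\boldsymbol{\lambda}$ and compute
$$\boldsymbol{\lambda}^T\mathbf{C} \quad \text{such that} \quad \mathbf{A}^T\boldsymbol{\lambda} = \mathbf{g} \tag{25}$$
In equation (25), we have to solve the linear equation for a single vector $\boldsymbol{\lambda}$ and then compute $\boldsymbol{\lambda}^T\mathbf{C}$. In fact, $\boldsymbol{\lambda}^T\mathbf{C}$ is exactly equal to $\mathbf{g}^T\mathbf{B}$, because $\boldsymbol{\lambda}^T\mathbf{C} = \boldsymbol{\lambda}^T\mathbf{A}\mathbf{B} = (\mathbf{A}^T\boldsymbol{\lambda})^T\mathbf{B} = \mathbf{g}^T\mathbf{B}$. Hence, we are able to achieve the same objective of calculating $\mathbf{g}^T\mathbf{B}$ by solving a linear system with one right-hand-side vector instead of solving a linear system with multiple right-hand-side vectors. This is the principle on which the adjoint method is based, and equation (25) is known as the adjoint equation.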
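The duality argument is easy to verify numerically. In the sketch below, the direct route solves A B = C for all columns of C and then forms the product with g, while the adjoint route solves a single transposed system; the name `lam` for the auxiliary vector and the random test matrices are assumptions of this sketch.

```python
import numpy as np

rng = np.random.default_rng(4)
n, n_cols = 30, 200

# Hypothetical well-conditioned system matrix, wide right-hand side, and vector g.
A = rng.normal(size=(n, n)) + n * np.eye(n)
C = rng.normal(size=(n, n_cols))
g = rng.normal(size=n)

# Direct route: one solve per column of C (200 right-hand sides).
B = np.linalg.solve(A, C)
direct = g @ B

# Adjoint route: a single solve with the transposed matrix.
lam = np.linalg.solve(A.T, g)
adjoint = lam @ C

print(np.max(np.abs(direct - adjoint)))
```

Both routes give the same row vector of length `n_cols`; the adjoint route replaces 200 right-hand sides with one.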
5.2. Adjoint Method for Steady State Differential Equation
 Consider the steady state two-dimensional Darcy flow in a horizontal porous medium:
with boundary conditions
for a no-flow boundary and
for a constant-head boundary. In equations (27) and (28), r represents x or y, $h$ is the hydraulic head (L), K is the hydraulic conductivity (L T$^{-1}$), $q$ is the discharge rate (T$^{-1}$), $\Omega$ is the flow region, and $\Gamma$ is the reservoir boundary. Because K(x, y) is not a function of $h$, the differential equation in equation (26) is linear, and the discretized form of its solution may be represented by the linear system $\mathbf{A}\mathbf{h} = \mathbf{b}$,
where A is of dimension M×M and $\mathbf{h}$ and b are of dimension M×1. M is the number of grid blocks. Now consider that the model is parameterized by $\boldsymbol{\alpha}$ with
 In equation (33), el is a column vector composed of zeros everywhere except at row l, where there is a 1. Row l in el represents the index of the grid where a measurement of hydraulic head is made. Differentiating with respect to the mth parameter leads to
 Factorizing the common terms and making use of equation (32) leads to
Now consider that there are L locations where measurements are taken. There will be L such adjoint vectors that need to be computed. Since all of them share the same matrix $\mathbf{A}^T$, the vectors $\mathbf{e}_l$ may be grouped into a single right-hand-side matrix to speed up the computations. Equation (37) then may be written as
where the right-hand-side matrix has all the $\mathbf{e}_l$ vectors as its columns. Equation (38) may then be written as
While $\mathbf{h}$ is the vector of calculated hydraulic head values in all the M grid blocks, $\mathbf{d}_{cal}$ is the vector of calculated hydraulic head values at measurement locations and $\partial\mathbf{d}_{cal}/\partial\alpha_m$ is the derivative of calculated head values at measurement locations with respect to the mth parameter $\alpha_m$. Thus, $\partial\mathbf{d}_{cal}/\partial\alpha_m$ is of dimension L × 1, and it represents column m of the sensitivity matrix of dimension L×M. Therefore, we are able to compute the sensitivity coefficients for locations where observations are made instead of the full sensitivity matrix for all grid block head values. Essentially, this implies that the computational effort needed to obtain the sensitivity matrix using an adjoint method depends on the number of measured data.
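A small one-dimensional sketch can make the adjoint computation concrete. It assumes the standard form that such a derivation leads to: with the discretized system A(α)h = b(α), the adjoint vectors solve A^T Λ = E (one column of E per measurement location), and column m of the sensitivity matrix is Λ^T(∂b/∂α_m − (∂A/∂α_m)h). The node-and-interval discretization below, with one log-conductivity per interval and fixed heads at both ends, is a hypothetical choice for illustration and is not the grid used in the paper; all function names are likewise assumptions.

```python
import numpy as np

def assemble(alpha, h_left, h_right, q, dx):
    """Build A(alpha) h = b(alpha); alpha[j] = ln K on interval j (M+1 intervals)."""
    K = np.exp(alpha)
    M = K.size - 1                          # number of interior nodes
    A = np.zeros((M, M))
    for i in range(M):
        A[i, i] = K[i] + K[i + 1]           # node i sits between intervals i, i+1
        if i > 0:
            A[i, i - 1] = -K[i]
        if i < M - 1:
            A[i, i + 1] = -K[i + 1]
    b = q * dx**2 * np.ones(M)
    b[0] += K[0] * h_left                   # constant-head boundaries
    b[-1] += K[-1] * h_right
    return A, b

def rhs_derivatives(alpha, h, h_left, h_right):
    """Column j holds db/dalpha_j - (dA/dalpha_j) h for this discretization."""
    K = np.exp(alpha)
    M = h.size
    h_ext = np.concatenate([[h_left], h, [h_right]])
    R = np.zeros((M, K.size))
    for j in range(K.size):
        flux = K[j] * (h_ext[j] - h_ext[j + 1])
        if j < M:
            R[j, j] = flux                  # node on the right of interval j
        if j > 0:
            R[j - 1, j] = -flux             # node on the left of interval j
    return R

def adjoint_sensitivity(alpha, obs_idx, h_left=10.0, h_right=5.0, q=0.0, dx=1.0):
    """Sensitivity of heads at obs_idx to alpha: one adjoint solve per datum."""
    A, b = assemble(alpha, h_left, h_right, q, dx)
    h = np.linalg.solve(A, b)
    E = np.eye(h.size)[:, obs_idx]          # columns are the unit vectors e_l
    Lam = np.linalg.solve(A.T, E)           # L right-hand sides, independent of M
    return Lam.T @ rhs_derivatives(alpha, h, h_left, h_right)

rng = np.random.default_rng(0)
alpha = rng.normal(size=9)                  # 9 intervals, 8 interior nodes
obs = [1, 4, 6]
G = adjoint_sensitivity(alpha, obs)
print(G.shape)                              # (number of data, number of parameters)
```

The number of right-hand sides in the adjoint solve equals the number of measurement locations, which is the cost that the wavelet reduction targets.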
5.3. Application of Wavelet to Linear Adjoint Model
Solving the linear system in equation (39) is the most computationally intensive part of adjoint sensitivity computation. The goal in this section is to reduce the size of the linear system in equation (39) without any significant loss of accuracy. Now consider that we intend to match some wavelet coefficients of the calculated head data $\mathbf{d}_{cal}$ instead of $\mathbf{d}_{cal}$ itself. These wavelet coefficients are computed by
where W is an orthonormal wavelet matrix of dimension Ncoeff×L and Ncoeff is the number of coefficients retained. If we postmultiply equation (39) by $\mathbf{W}^T$ so that
Premultiplying both sides of equation (47) by W and recognizing that $\mathbf{W}\mathbf{W}^T = \mathbf{I}$ leads to
Thus, equations (43) and (48) are the adjoint equations for calculating the sensitivities of the retained wavelet coefficients to the model parameters. The efficiency of this method lies in the ability to compute a reduced wavelet sensitivity matrix using equation (48). The right-hand-side matrix in equation (43) has fewer columns than the right-hand-side matrix in equation (39), which reduces both the storage and the computational time required to solve the linear system. A less efficient approach is to compute the full sensitivity matrix using equation (40), transform the sensitivity matrix into a wavelet domain, and subsequently reduce the transformed matrix by thresholding.
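The two routes to the reduced wavelet sensitivity matrix can be compared on a synthetic system. The sketch below assumes that the reduced adjoint system has the form A^T Λ_w = E W^T, where E collects the unit vectors for the measurement locations, so that only Ncoeff right-hand sides are solved; the random matrices `A` and `R` are stand-ins for the discretized flow matrix and for the columns ∂b/∂α_m − (∂A/∂α_m)h, and the first rows of the Haar matrix stand in for the thresholded W.

```python
import numpy as np

def haar_matrix(n):
    """Orthonormal Haar matrix of size n x n (n a power of 2)."""
    if n == 1:
        return np.array([[1.0]])
    h = haar_matrix(n // 2)
    top = np.kron(h, [1.0, 1.0])
    bottom = np.kron(np.eye(n // 2), [1.0, -1.0])
    return np.vstack([top, bottom]) / np.sqrt(2.0)

rng = np.random.default_rng(2)
M, L, n_par, n_coeff = 40, 16, 40, 4

A = rng.normal(size=(M, M)) + M * np.eye(M)      # hypothetical system matrix
R = rng.normal(size=(M, n_par))                  # stand-in right-hand-side derivatives
obs_idx = np.sort(rng.choice(M, size=L, replace=False))
E = np.eye(M)[:, obs_idx]                        # M x L, columns are e_l
W = haar_matrix(L)[:n_coeff, :]                  # Ncoeff x L reduced wavelet matrix

# Less efficient route: full adjoint solve (L right-hand sides), then transform.
Lam = np.linalg.solve(A.T, E)
G_w_from_full = W @ (Lam.T @ R)

# Direct route: reduced adjoint solve with only Ncoeff right-hand sides.
Lam_w = np.linalg.solve(A.T, E @ W.T)
G_w_direct = Lam_w.T @ R

print(G_w_direct.shape, np.max(np.abs(G_w_direct - G_w_from_full)))
```

Here the direct route solves 4 systems instead of 16 and yields the same 4×40 matrix, mirroring the comparison reported for Figure 1.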
For completeness, we derive the adjoint method for computing the gradient of the objective function discussed in section 4.2. The line search method introduced in section 4.2 requires the gradient in order to determine a good choice for the step length. At each iteration of the Levenberg-Marquardt algorithm, a step length is determined after the search direction is found. Although the gradient can be computed from the sensitivity matrix, as shown in equation (19), this procedure is very inefficient within the inner iterations used to determine the step length. Within these inner iterations, the Hessian matrix is not required, and computing the sensitivity matrix when the Hessian matrix is not required is wasteful. Therefore, the adjoint method is the most efficient method to compute the gradient under such circumstances. Because the observation space is made of wavelets, there is a need to modify the conventional adjoint equations to compute the gradient. We differentiate equation (16) to obtain
Equations (53) and (54) are the adjoint equations for computing the gradient of the objective function in equation (16). The linear system in equation (53) has only one right-hand-side vector and is therefore very cheap to solve.
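The single-solve gradient can be checked against the gradient assembled from the wavelet sensitivity matrix. This sketch assumes an objective equal to half the sum of squared wavelet residuals and an adjoint system of the form A^T λ = E W^T r_w, where r_w is the vector of wavelet residuals; the synthetic `A`, `R`, and `r_w` are stand-ins, and the symbol names are assumptions of the sketch rather than the paper's notation.

```python
import numpy as np

def haar_matrix(n):
    """Orthonormal Haar matrix of size n x n (n a power of 2)."""
    if n == 1:
        return np.array([[1.0]])
    h = haar_matrix(n // 2)
    top = np.kron(h, [1.0, 1.0])
    bottom = np.kron(np.eye(n // 2), [1.0, -1.0])
    return np.vstack([top, bottom]) / np.sqrt(2.0)

rng = np.random.default_rng(5)
M, L, n_par, n_coeff = 40, 16, 40, 4

A = rng.normal(size=(M, M)) + M * np.eye(M)   # hypothetical system matrix
R = rng.normal(size=(M, n_par))               # stand-in for db/dalpha - (dA/dalpha) h
E = np.eye(M)[:, np.sort(rng.choice(M, size=L, replace=False))]
W = haar_matrix(L)[:n_coeff, :]
r_w = rng.normal(size=n_coeff)                # stand-in for W d_meas - W d_cal

# Adjoint gradient: one right-hand-side vector, no sensitivity matrix formed.
lam = np.linalg.solve(A.T, E @ (W.T @ r_w))
grad_adjoint = -R.T @ lam

# Reference: gradient from the reduced wavelet sensitivity matrix.
G_w = W @ (E.T @ np.linalg.solve(A, R))
grad_from_sensitivity = -G_w.T @ r_w

print(np.max(np.abs(grad_adjoint - grad_from_sensitivity)))
```

The adjoint route costs one linear solve regardless of the number of data or parameters, which is why it suits the inner line-search iterations.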
6. Practical Considerations
 The main idea of this approach is to reduce the cost of computing sensitivities by compressing the measured data into a small number of coefficients. Thus, it is necessary to use a transform that gives a high compression ratio without compromising the overall speed and memory requirements of the inverse problem. In this regard, we used the Haar wavelet transform to decompose the spatial data. The Haar transform was chosen because of its simplicity, low memory requirements, and high speed of computation. The orthogonality of the Haar wavelet transform and the absence of edge effects make the transform a good candidate for directly computing a reduced adjoint sensitivity matrix (equation (48) requires that the wavelet matrix be strictly orthogonal).
Discrete wavelet transforms require that the data have a length of $2^J$, where J is a positive integer. Thus, a practical consideration is how to treat data whose length is not a power of 2. The usual way to deal with this is to pad the data with some constant values, usually zeros. The same set of values used for padding the measured data must be used for padding the calculated data. Because the inverse solution uses the $\ell_2$ norm of the difference between the measured and calculated data, padding both data sets with the same values has no effect on the result of the inverse solution.
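The padding argument can be illustrated with a short sketch. `pad_to_power_of_two` is a hypothetical helper that appends a constant to reach the next power of 2; the random vectors stand in for measured and calculated data.

```python
import numpy as np

def pad_to_power_of_two(d, fill=0.0):
    """Append `fill` values so that the length becomes the next power of 2.

    Assumes a non-empty input.
    """
    d = np.asarray(d, dtype=float)
    n = 1 << (d.size - 1).bit_length()
    return np.concatenate([d, np.full(n - d.size, fill)])

rng = np.random.default_rng(6)
d_meas = rng.normal(size=100)
d_cal = rng.normal(size=100)

p_meas = pad_to_power_of_two(d_meas)     # length 128
p_cal = pad_to_power_of_two(d_cal)       # same fill value as the measured data

norm_original = np.linalg.norm(d_meas - d_cal)
norm_padded = np.linalg.norm(p_meas - p_cal)
print(p_meas.size, norm_original, norm_padded)
```

Because the padded entries are identical in both vectors, they cancel in the difference and the misfit is unchanged.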
 The Haar wavelet transform requires that the data be uniformly sampled in order to prevent changes in sampling frequency showing up as sharp discontinuities in the detail coefficients. However, all the measured data used in this work, as is the case in most practical applications, are sampled at irregular intervals. There are many ways to deal with the wavelet decomposition of irregularly sampled data, the simplest being to treat the data as if they were uniformly spaced [Sardy et al., 1999]. That is the approach used in this work. The approach does not decrease the overall variance of the coefficients. However, it may affect the choice of wavelets retained because the thresholding is based on the magnitudes of the wavelets of measured data.
 Another way to treat irregularly spaced data is to fill the spaces between the data with some constant values. This should be applied to both the measured and calculated data sets, and the same infill values should be used for both data sets. While this method does not increase the overall variance of the error, it makes data compression difficult or totally impossible. Take, for example, the case of 64 irregularly sampled data points on a 128×64 grid system. If all the points between the original 64 data points are filled with zero (or any other value) and the 8192 data points are decomposed into wavelets, then selecting 64 or fewer wavelets from the 8192 coefficients may lead to significant loss of information contained in the data set. In this case, it is better to decompose the original 64 data points than to augment the data to a large number.
Finally, it is important to note that the number of wavelets selected is crucial to obtaining convergence in a reasonable time limit. While too few wavelet coefficients may prevent the model from converging, too many coefficients will increase the time required to compute the sensitivity matrix. There is no general rule to determine the number of wavelets to retain for any particular problem. However, smooth data often require fewer wavelet coefficients than irregular data. What is important in all thresholding is that all approximation coefficients must be retained, and the detail coefficients with larger magnitudes should be preferred to those with smaller magnitudes.
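The thresholding rule in the last sentence can be written as a small selection routine. This is a sketch under the assumption that the approximation coefficients are stored first in the coefficient vector, followed by the detail coefficients; `select_coefficients` and the example vector are hypothetical.

```python
import numpy as np

def select_coefficients(c, n_approx, n_keep):
    """Keep all approximation coefficients, then the largest-magnitude details."""
    c = np.asarray(c, dtype=float)
    if n_keep < n_approx:
        raise ValueError("n_keep must be at least the number of approximation coefficients")
    approx = np.arange(n_approx)
    # Detail indices ordered by decreasing magnitude.
    detail_order = n_approx + np.argsort(-np.abs(c[n_approx:]))
    chosen = np.concatenate([approx, detail_order[: n_keep - n_approx]])
    return np.sort(chosen)

# One approximation coefficient followed by seven detail coefficients.
c = np.array([5.0, 0.1, -3.0, 0.2, 2.5, -0.05, 0.0, 1.0])
kept = select_coefficients(c, n_approx=1, n_keep=4)
print(kept)
```

The approximation coefficient is retained even when its magnitude is small, since it carries the mean level of the data.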
7. Sample Applications of Wavelet Transform to Linear Adjoint Equations
We present three examples to illustrate the usefulness of the approach. All examples used data simulated from the steady state flow of water in reservoirs with Dirichlet boundary conditions. The first example used a one-dimensional reservoir and a one-dimensional Haar wavelet decomposition of the measured data. In the other examples, two-dimensional rectangular reservoirs were used, and the measured data were decomposed using a two-dimensional Haar wavelet transform.
 While the measured data in the first example are fairly smooth, the measured data in the second and third examples are discontinuous, making it difficult to obtain high compression ratios. Thus, the measured data in the second and third examples were arranged in ascending order before decomposition in order to improve their compression ratios. We note that the superscripts on wp are used to denote the number of wavelets retained in the wp − k approach.
 The third example was used to study the performance of the wp − k approach under two different scenarios: (1) the presence of white noise in the data and (2) the use of constant values to augment the data. The augmentation of the data was done to fill the spaces between the irregularly sampled data. In this case, reordering the data was not necessary.
7.1. Example 1: One-Dimensional Reservoir
 In this example, we considered the steady state one-dimensional Darcy flow in a porous medium given by
with boundary conditions
and assumed that the following variables were known:
The reservoir was divided into 100 grid blocks, and the values of hydraulic head measured at 64 locations were given. Using the known data, we estimated the distribution of hydraulic conductivity K (m/s) in the reservoir. In this example, $\mathbf{d}_{meas}$ is the measured data, and $\mathbf{W}\mathbf{d}_{meas}$ and $\mathbf{W}\mathbf{d}_{cal}$ are the wavelets of the measured and calculated data, respectively.
 First, we verify the approach by comparing the reduced wavelet sensitivity matrix computed directly using equation (48) and that obtained by first calculating the full sensitivity matrix using equation (40), taking a wavelet transformation of the full sensitivity matrix and subsequently reducing the transformed matrix. Figure 1 shows the sensitivities (in absolute values) of the largest 32 wavelet coefficients of measured data to model parameters ln K. The reduced wavelet sensitivity matrix computed directly using equations (43) and (48) is shown in Figure 1a, while that computed from the full sensitivity matrix is shown in Figure 1b. The two matrices are equivalent.
The p − k and wp − k approaches were used to estimate the distribution of reservoir conductivity. The p − k approach uses equations (39) and (40) to compute the full sensitivity matrix required in the solution of the inverse problem presented in equation (4). In this case, all 64 measured data were matched. However, in the wp − k approach the measured data were transformed into wavelets, and some large (in absolute terms) coefficients were selected and matched to obtain a description of reservoir conductivity. Two cases were investigated. In the first (wp32 − k), the 32 largest wavelet coefficients of data were matched. In the second case (wp16 − k), only 16 coefficients were matched. Results from all three cases are presented in Figures 2–5. The fact that the wp − k approach used fewer coefficients as observation data implies that the right-hand-side matrix in equation (43) has fewer columns than the right-hand-side matrix in equation (39). Because the adjoint linear system is solved at all iterations, the reduction in computational time can be significant for large-scale problems.
 In Figure 2, we observe that p − k and wp32 − k gave very good matches to the measured data. Because the two approaches gave similar matches, only one plot is displayed (Figure 2) to represent both. Figure 3 shows a slight deterioration of the match to measured data (between x = 0.3 and x = 0.4) when the number of wavelets selected was reduced to 16. This suggests that there is a limit to how far the transformed domain can be compressed.
 In Figure 4, we compare the conductivity estimated by p − k, wp32 − k, and wp16 − k to the true conductivity field. Ko is the starting guess for the algorithm, and Kt is the true conductivity field. We notice that both p − k and wp32 − k exhibited similar performance in their ability to predict the conductivity distribution in the reservoir, and they gave a fairly good match to the true distribution of reservoir conductivity. We also observe that the conductivity predicted by wp16 − k is less accurate. This example suggests that with a good choice of thresholding, we can replace the p − k approach with the less expensive wp − k approach. The level of compression of the observation space depends on the degree of correlation in the measured data. As the thresholded observation (data) space becomes smaller, important information in the data is lost and the ability to estimate the reservoir parameters diminishes.
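The dependence of compressibility on data correlation can be sketched with synthetic signals. The example below is an illustration under assumptions, not a reproduction of the data in this example: it compares a smooth 64-point profile with uncorrelated noise and reports the fraction of signal energy held by the largest Haar coefficients. The helper names `haar_matrix` and `retained_energy` are hypothetical.

```python
import numpy as np

def haar_matrix(n):
    """Orthonormal Haar transform matrix of size n x n (n must be a power of 2)."""
    if n == 1:
        return np.array([[1.0]])
    half = haar_matrix(n // 2)
    top = np.kron(half, [1.0, 1.0])
    bottom = np.kron(np.eye(n // 2), [1.0, -1.0])
    return np.vstack([top, bottom]) / np.sqrt(2.0)

def retained_energy(signal, n_keep):
    """Fraction of signal energy held by the n_keep largest Haar coefficients."""
    coeffs = haar_matrix(signal.size) @ signal
    energy = np.sort(coeffs ** 2)
    return energy[-n_keep:].sum() / energy.sum()

x = np.linspace(0.0, 1.0, 64, endpoint=False)
smooth = np.sin(2.0 * np.pi * x)                    # strongly correlated samples
rough = np.random.default_rng(0).normal(size=64)    # uncorrelated samples

for n_keep in (32, 16):
    print(n_keep,
          retained_energy(smooth, n_keep),
          retained_energy(rough, n_keep))
```

A retained fraction close to 1 means little information is discarded by thresholding; a markedly lower fraction signals that the match is likely to degrade.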
 The decays of residual norm of data exhibited by the approaches are presented in Figure 5. We observe that p − k and wp32 − k exhibited similar convergence performance, while wp16 − k exhibited poor convergence. The average time taken per iteration by each approach is presented in Table 1. For this small model, the reduction in time achieved by reducing the data space is not significant.
Table 1. Model Performance Data: 1-D Reservoir Model
p − k
wp32 − k
wp16 − k
Number of observed data
Time per iteration (s)
7.2. Example 2: 16×16 Grid System
 This example consists of a two-dimensional reservoir with constant head boundaries in the x and y directions. The reservoir is a hot rock used as a hot-water source. There are five producers, three injectors, and eight observation wells (Figure 6) in the reservoir. Used water is returned to the reservoir through the injection wells. We discretized the reservoir into a 16×16 grid system and assumed that the following variables are known:
 The well data for this example are given in Table 2, and the steady state distribution of head values is given in Figure 7. Measurements were taken in all 16 wells. The problem is to estimate the distribution of reservoir conductivity K(m/d) by matching the head values measured in the wells (p − k approach) or matching some wavelets derived from the transformation of the measured data (wp − k approach).
Table 2. Well Data: 16×16 Reservoir Model
 The 16 measured head values were arranged in ascending order, shaped into a 4×4 square matrix, and then decomposed using a two-dimensional wavelet transform. The wp − k approach used 10 wavelet coefficients of the measured data for the history match. Figure 8 shows that the p − k and wp − k approaches gave good matches to the measured data. In Figure 9, estimates of the ln K distribution obtained from the two approaches are displayed along with the initial guess and the true log conductivity distribution. The estimates are not close to the true distribution shown in Figure 9b because the information content of the measured data is not enough to resolve all 256 reservoir parameters. In Figure 10, we observe that the wp − k approach exhibited poor convergence for this example. This can be attributed to the poor compression factor of the measured data, which results from the low data-to-data correlation (Figure 8a). Table 3 shows that only a minimal reduction in time is achieved when the number of observation data is reduced from 16 to 10.
Table 3. Model Performance Data: 16×16 Reservoir Model
p − k
wp10 − k
Number of observed data
Time per iteration (s)
7.3. Example 3: 128×64 Grid System
 The reservoir is a hot-water source discretized into a 128×64 grid system, with each grid having a length of 100 m and a width of 100 m. There are 24 hot-water producers, 16 cold-water injectors, and 24 observation wells (Figure 11). The head values at the reservoir boundaries are fixed at 500 m. Measurements were taken from all 64 wells, and the p − k and wp − k approaches were used in estimating the distribution of reservoir conductivity K(m/d). The well locations are shown in Figure 11, and the steady state distribution of head values is given in Figure 12.
 The 64 measured head values were arranged in ascending order, shaped into an 8×8 square matrix, and then decomposed using a two-dimensional Haar wavelet transform. The wp − k approach used 54 wavelet coefficients (wp54 − k) and 32 wavelet coefficients (wp32 − k) of the measured data for the history match. The quality of the result deteriorated when the number of wavelets retained was reduced from 64 to 54 and from 54 to 32 (Figure 13). This indicates that almost all 64 wavelet coefficients carry vital information about the model. Thus, eliminating some wavelets led to the loss of significant information contained in the data. In Figure 14, estimates of ln K distributions obtained from the approaches are displayed along with the initial guess. We observe similar estimates from the p − k and wp54 − k approaches (Figures 14b and 14c). The estimate from wp32 − k is worse than the other two (Figure 14d). In general, none of the estimates is close to the true ln K distribution shown in Figure 11. This indicates that the measured data do not contain enough information about the reservoir model.
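The data preparation described above (sort, reshape, transform, threshold) can be sketched as follows. The head values are synthetic, and the separable form W X Wᵀ is an assumption, since the text does not state which two-dimensional Haar variant was used. All names are illustrative.

```python
import numpy as np

def haar_matrix(n):
    """Orthonormal Haar transform matrix of size n x n (n must be a power of 2)."""
    if n == 1:
        return np.array([[1.0]])
    half = haar_matrix(n // 2)
    top = np.kron(half, [1.0, 1.0])
    bottom = np.kron(np.eye(n // 2), [1.0, -1.0])
    return np.vstack([top, bottom]) / np.sqrt(2.0)

rng = np.random.default_rng(1)
heads = 500.0 + rng.normal(scale=5.0, size=64)   # hypothetical well heads (m)

# Arrange in ascending order and shape into an 8 x 8 matrix.
order = np.argsort(heads)        # kept so coefficients can be mapped back to wells
grid = heads[order].reshape(8, 8)

# Separable two-dimensional Haar transform (assumed form).
W = haar_matrix(8)
coeffs = W @ grid @ W.T

# Retain the n_keep largest coefficients (in absolute value), zero the rest.
n_keep = 54
flat = coeffs.ravel()
keep = np.argsort(np.abs(flat))[-n_keep:]
truncated = np.zeros_like(flat)
truncated[keep] = flat[keep]

# Inverse transform shows how much of the data survives the truncation.
recovered = W.T @ truncated.reshape(8, 8) @ W
rel_loss = np.linalg.norm(recovered - grid) / np.linalg.norm(grid)
print("relative loss after keeping", n_keep, "coefficients:", rel_loss)
```

Sorting before reshaping makes neighboring entries of the matrix similar in value, which tends to favor compression. The permutation `order` must be applied to the calculated data as well, so that both data sets are transformed consistently.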
 In Figure 15, we observe that the wp − k approach exhibited poor convergence for this example. However, Table 4 shows that with about a 15% reduction in the observation space (64 to 54 data points), the time per iteration was reduced by about 37% (from 609 to 383 s). This significant reduction in time is attributable to the size of the inverse problem. Thus, the reduction in time achieved by using the wavelet approach to adjoint sensitivity computation becomes significant when the size of the inverse problem is large.
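The two percentages quoted above follow directly from the values stated in the text:

```latex
\[
\frac{64 - 54}{64} \approx 0.156 \;(\text{about } 15\%),
\qquad
\frac{609 - 383}{609} \approx 0.371 \;(\text{about } 37\%).
\]
```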
Table 4. Model Performance Data: 128×64 Reservoir Model
p − k
wp54 − k
wp32 − k
Number of observed data
Time per iteration (s)
 By adding some white noise to the data, we studied the effect of noise on the wp − k approach. The noise caused some deterioration in the results obtained from the wp − k approach (Figures 16 and 17). We also studied the effect of augmenting the data to create a regularly spaced sample. For this purpose, we input 500 m as the data value in all locations at which no measurement was made. In this way, all 8192 grid locations (128×64) have data values assigned to them. The same value (500 m) was used to augment the calculated data. A two-dimensional wavelet transform was then applied to both the measured data and the calculated data. In this case, 64 wavelet coefficients were selected out of a total of 8192. Results obtained are presented in Figures 18 and 19. We observe a poor match to the measured data (Figure 18) and a poor estimate of the log reservoir conductivity (Figure 19). This is because by augmenting the measured data (from 64 to 8192), we unnecessarily distribute the information content of the data into a large number of wavelet coefficients. Retaining only the 64 largest (in absolute value) wavelet coefficients (the original data size) means discarding a significant amount of the information contained in the data set. To obtain more information from the data, a larger number of wavelets must be retained. Retaining a number of wavelets larger than the number of the original measured data (in this case 64) runs contrary to the objective of this work.
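The spreading of information caused by padding can be illustrated with synthetic numbers. The sketch below assumes randomly placed wells, hypothetical head values, and a separable two-dimensional Haar transform. It is meant only to show the mechanism, not to reproduce Figures 18 and 19.

```python
import numpy as np

def haar_matrix(n):
    """Orthonormal Haar transform matrix of size n x n (n must be a power of 2)."""
    if n == 1:
        return np.array([[1.0]])
    half = haar_matrix(n // 2)
    top = np.kron(half, [1.0, 1.0])
    bottom = np.kron(np.eye(n // 2), [1.0, -1.0])
    return np.vstack([top, bottom]) / np.sqrt(2.0)

rng = np.random.default_rng(2)
n_rows, n_cols, n_wells, pad = 128, 64, 64, 500.0

# Hypothetical well locations and head values (assumptions).
cells = rng.choice(n_rows * n_cols, size=n_wells, replace=False)
heads = pad + rng.normal(scale=20.0, size=n_wells)

# Augment: every unmeasured grid location receives the padding value.
field = np.full(n_rows * n_cols, pad)
field[cells] = heads
field = field.reshape(n_rows, n_cols)

Wr, Wc = haar_matrix(n_rows), haar_matrix(n_cols)
coeffs = Wr @ field @ Wc.T
n_significant = np.count_nonzero(np.abs(coeffs) > 1e-8)

# Keep only as many coefficients as there were original measurements.
flat = coeffs.ravel()
keep = np.argsort(np.abs(flat))[-n_wells:]
truncated = np.zeros_like(flat)
truncated[keep] = flat[keep]
approx = (Wr.T @ truncated.reshape(n_rows, n_cols) @ Wc).ravel()

# Misfit at the wells, relative to the departure of the data from the padding.
misfit = np.linalg.norm(approx[cells] - heads) / np.linalg.norm(heads - pad)
print("non-negligible coefficients:", n_significant)
print("relative misfit at wells:", misfit)
```

In this toy setting, each isolated measurement contributes to many coefficients across scales, so the number of non-negligible coefficients far exceeds the number of wells.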
 In this paper, a wavelet approach to the adjoint method of computing sensitivities was presented for steady state differential equations. The new approach exploits the ability to match a reduced data space to reduce the size of the adjoint linear system used in the computation of sensitivity coefficients.
 The success of the method depends heavily on the ability to concentrate the information content of the original data into a significantly smaller number of wavelet coefficients. Numerical examples were presented to validate and illustrate the strength of the approach. The results show that smoothly varying data give larger compression ratios than irregular, uncorrelated data and that large compression ratios result in a significant reduction in time. The benefit of the approach is most apparent when the size of the inverse problem is large.
 Overall, the approach presented shows that we can reduce the cost of computing sensitivities if the data space can be reparameterized with a small number of wavelet coefficients.
Appendix A: Relationship Between Regression in Real Space and Regression in Wavelet Space
 Certain properties of signals are preserved when they are transformed from the real space to the orthogonal wavelet domain. In this appendix, we examine the relationship between regression in the real space and regression in the wavelet domain.
Awotunde and Horne  showed in Lemma 1 of their work that the norm is preserved when a one-dimensional Haar wavelet transformation is performed. This proof, however, does not independently suggest that the regression in the wavelet domain is equivalent to regression in the real space. In this appendix, we give the proof of the equivalence of regression in the two domains.
 Given that all observation space wavelets are used in a minimization involving the wp − k approach, regression using wavelets as observed data will follow exactly the same path as regression using the actual physical measurements as observed data.
 The Newton direction for the wp − k approach is given by
 Substituting the definitions of and into equation (A1) and simplifying gives
 Some matrix manipulations yield
with further simplification leading to
and is the search direction in the p − k approach. Equation (A4) is the linear equation for obtaining the search direction in the p − k approach. This proof shows that if we use all wavelet coefficients of data as the observation data in the wp − k approach, we shall obtain exactly the same results that we obtain by using the p − k approach.
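The argument can be summarized in a compact form under simplifying assumptions. The notation below is illustrative and does not reproduce equations (A1) to (A4): W denotes an orthogonal wavelet transform matrix (so that WᵀW = I), G the sensitivity matrix in real space, r the data residual, and δ the search direction. An unweighted Gauss-Newton step without regularization is assumed.

```latex
\begin{align*}
  \mathbf{G}_w &= \mathbf{W}\mathbf{G}, \qquad \mathbf{r}_w = \mathbf{W}\mathbf{r},
    \qquad \mathbf{W}^{T}\mathbf{W} = \mathbf{I}, \\
  \mathbf{G}_w^{T}\mathbf{G}_w\,\boldsymbol{\delta}_w &= -\mathbf{G}_w^{T}\mathbf{r}_w, \\
  \mathbf{G}^{T}\mathbf{W}^{T}\mathbf{W}\mathbf{G}\,\boldsymbol{\delta}_w
    &= -\mathbf{G}^{T}\mathbf{W}^{T}\mathbf{W}\mathbf{r}, \\
  \mathbf{G}^{T}\mathbf{G}\,\boldsymbol{\delta}_w &= -\mathbf{G}^{T}\mathbf{r}.
\end{align*}
```

The last line is the normal equation of the real-space problem, so the two search directions coincide when all wavelet coefficients are retained. If only a subset of rows of W is used, WᵀW is no longer the identity and the equivalence no longer holds in general.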
 We are grateful for the financial support provided by the Stanford School of Earth Sciences, including the Hal Dean, the William Whiteford, and Aramco Research Fellowships. We acknowledge other funding provided by SUPRI-D.