Water Resources Research

A wavelet approach to adjoint state sensitivity computation for steady state differential equations

Abstract

[1] The computation of the sensitivity matrix is the most time-consuming part of any parameter estimation algorithm that requires sensitivity coefficients. An efficient wavelet approach to adjoint sensitivity analysis is proposed to reduce the computational cost of obtaining sensitivity coefficients. The method exploits a wavelet reduction of the data space to reduce the size of the linear system encountered in steady state adjoint equations. In this regard, the wavelet transform is used as a data compression tool. Numerical examples applied to spatial data are used to verify the method and demonstrate its effectiveness.

1. Introduction

[2] Sensitivity analysis of large-scale systems governed by differential equations has continued to be of importance in groundwater modeling and parameter estimation [Carter et al., 1974; Sun and Yeh, 1985; Yeh, 1986; Carrera and Neuman, 1986; Yeh and Sun, 1990; Sun and Yeh, 1990a, 1990b]. Applications of such analysis cover a wide spectrum, including optimization, optimal control, model reparameterization, uncertainty analysis, and experimental design. However, the cost of computing sensitivity coefficients often poses a challenge, and this cost frequently determines the choice of optimization method in parameter estimation. When the computational overhead of sensitivity calculation becomes prohibitively high, methods such as conjugate gradient and quasi-Newton, which require only the gradient of the objective function, are often used in place of methods such as Gauss-Newton and Levenberg-Marquardt, which require the full sensitivity matrix.

[3] Jacquard and Jain [1965] presented a procedure for numerically computing the sensitivity coefficients for history matching and applied the procedure to estimate permeability in a two-dimensional reservoir from pressure data. Subsequently, Carter et al. [1974] presented a derivation of the method to compute the sensitivity coefficients for two-dimensional single-phase flow problems. Chen et al. [1974] and Chavent et al. [1975] independently proposed the optimal control method to calculate the gradient of the objective function with respect to model parameters for single-phase flow. Wasserman et al. [1975] extended the optimal control theory to automatic history matching in a multiphase reservoir but only computed the adjoint variables for the overall pressure equation and used an objective function based on only the pressure mismatch term. Later, Carrera and Neuman [1986] and Sun and Yeh [1990a] used the optimal control theory to solve the parameter identification problem for groundwater flow. A detailed review of the parameter identification procedures in groundwater hydrology was done by Yeh [1986]. Wu et al. [1999] later derived the adjoint equations for multiphase flow in a hydrocarbon reservoir, but the computational cost is still very high when the number of data is large.

[4] Efforts have been invested in finding cheaper methods of computing the sensitivity matrix without compromising the accuracy of the solution. One method, forward sensitivity analysis (also known as the gradient simulator method), is efficient only when the model space is small [Yeh, 1986; Tang et al., 1989; Landa, 1997], because it requires the solution of a linear system with as many right-hand-side vectors as there are model parameters. Moreover, in the forward sensitivity method, the sensitivities of all grid block variables are computed, which is inefficient because only the sensitivities of variables at measurement locations are required. For high-dimensional problems, this method becomes prohibitively expensive. Model space reduction via reparameterization [Oldenburg et al., 1993; Reynolds et al., 1996; Lu and Horne, 2000; Sahni and Horne, 2006; Sarma et al., 2008] is often coupled with the forward sensitivity method to stabilize the algorithm and speed up the computation of sensitivity coefficients.

[5] Another method, the adjoint approach [Shah et al., 1978; Anterion et al., 1989; Plessix, 2006; Michalak and Kitanidis, 2004; Li and Petzold, 2004], is commonly used to compute the sensitivity coefficients and the gradient of the objective function. The adjoint method of sensitivity computation is particularly useful when the number of data is relatively small. This method is also based on solving a linear system with multiple right-hand-side vectors. However, the number of right-hand-side vectors in this case is equal to the number of data for which sensitivities are to be calculated. The number is therefore independent of the number of parameters. This approach is preferred to the forward sensitivity method when the number of data to match is significantly smaller than the number of parameters. However, there are several instances in which the data space and the model space are both of high dimensions. In such instances, the cost of computing sensitivity coefficients can be very large.

[6] In this paper, we propose the application of a linear transformation of the data space to reduce the associated cost of adjoint state sensitivity computation. First, we review wavelet analysis, wavelet reparameterization of the data space, inverse modeling, and the conventional adjoint approach to calculating sensitivities in steady state linear systems. We subsequently derive a wavelet approach to adjoint sensitivity computation. The approach uses the data compression capability of the wavelet transform to reduce the size of the adjoint equations for sensitivity computations. Finally, we verify the approach using numerical examples applied to spatially sampled hydraulic head data. The examples involve finding the maximum likelihood estimates of reservoir parameters by matching a reduced set of wavelets of the measured data.

2. Inverse Problems

[7] Estimating flow parameters from measured data involves minimizing an error norm. Consequently, gradient-based parameter estimation involves successively determining the minimum of a series of linearized problems. Depending on the optimization algorithm, successive linearizations may require the computation of the sensitivity matrix. Algorithms that require sensitivity coefficients, such as Gauss-Newton and Levenberg-Marquardt, usually converge faster than algorithms that do not. However, the computation of the sensitivities can be very costly, making the Gauss-Newton and Levenberg-Marquardt algorithms impractical for large problems.

2.1. Parameter and Model Spaces

[8] The parameter space is the space of all physical parameters that completely characterize a physical system. In this work, we assumed that the reservoir is completely characterized by its conductivity distribution. The model space is the space of all parameters used to represent the system in the solution of the inverse problem. The model parameters may be the same as the system parameters, as is the case in a pixel-based approach, or they may be a transformation of the system parameters into a different domain [Oldenburg et al., 1993; Reynolds et al., 1996; Lu and Horne, 2000; Sahni and Horne, 2006; Sarma et al., 2008]. Where a transformation is made, the number of model parameters is often smaller than the number of actual system parameters. The choice of model parameters for the system is called the parameterization of the system. We denote the unknown system parameters as $\mathbf{m}$. The system parameters here are the natural logarithms of the reservoir conductivities.

2.2. Data Space

[9] The data space is the space comprising all physical responses or measurements obtained from a system, or a transformation of such measurements. The data space is divided into two categories: the measurement space and the observation space. The measurement space, denoted as $\mathbb{D}$, is the space of all conceivable responses $\mathbf{d}$ from the wells and reservoir. The observation space, denoted as $\mathbb{O}$, is the space of the actual data matched during the inverse problem solution. In traditional inverse modeling, the observation space is the same as the measurement space. Recently, Awotunde and Horne [2008] introduced the concept of an observation space that is a subset of the measurement space, obtained by linearly transforming measured data into a wavelet domain. Thus, in this article, $\mathbb{O} \subseteq \mathbb{D}$.

2.3. Transformation of Data Space and Dimensionality Reduction

[10] Measured data are transformed into a wavelet domain that retains only the information needed to solve the inverse problem. The procedure involves decomposing the measured data into wavelet coefficients and selecting only relevant coefficients for use in parameter estimation. Several previous studies [Jansen and Kelkar, 1997; Kikani and He, 1998; Panda et al., 2000; Athichanagorn et al., 2002] have dealt with the decomposition of data, but their primary focus was either to denoise the data or to compactly store (compress) them; in such cases, the coefficients are often back transformed to the original domain before use. In this work, we fitted important wavelet coefficients of the modeled response to the corresponding wavelet coefficients of the measured data. Thus, we work practically in the wavelet domain until a good match is obtained. Because the data space used in the solution of the inverse problem is a subset of the wavelet transform of the measured data, the dimension of the observation space is smaller than the dimension of the original measurement space. The magnitudes of the wavelet coefficients of measured data determine which coefficients are kept for history matching, as illustrated in the sketch below. That is, we transform the measured data into wavelet coefficients and select the coefficients with the largest magnitudes to form the observation space.
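To make the selection step concrete, the following minimal sketch (Python/NumPy; the function name and the coefficient values are ours, purely illustrative) picks the largest-magnitude coefficients to form the observation space. The same index set must be reused for the calculated data at every iteration.

```python
import numpy as np

def select_observation_space(coeffs, n_keep):
    """Indices of the n_keep largest-magnitude wavelet coefficients.

    The same index set must be applied to the wavelet coefficients of
    the calculated data at every iteration of the inversion.
    """
    idx = np.argsort(np.abs(coeffs))[::-1][:n_keep]
    return np.sort(idx)

# Hypothetical wavelet coefficients of measured data
c_meas = np.array([9.1, -0.2, 4.7, 0.05, -3.3, 0.6, 0.01, 1.8])
keep = select_observation_space(c_meas, 4)
print(keep, c_meas[keep])  # indices and values of the 4 dominant coefficients
```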

2.4. Posterior Probability Distribution

[11] If we assume that the distributions of data and parameters are Gaussian, then the posterior probability distribution of the parameters given the data may be expressed as

$p(\mathbf{m} \mid \mathbf{d}_{meas}) \propto \exp\left[-O(\mathbf{m})\right]$    (1)

where $O(\mathbf{m})$ is the objective function defined as

$O(\mathbf{m}) = \frac{1}{2}\left[\mathbf{d}_{cal}(\mathbf{m}) - \mathbf{d}_{meas}\right]^T C_D^{-1}\left[\mathbf{d}_{cal}(\mathbf{m}) - \mathbf{d}_{meas}\right] + \frac{1}{2}\left(\mathbf{m} - \mathbf{m}_{prior}\right)^T C_M^{-1}\left(\mathbf{m} - \mathbf{m}_{prior}\right)$    (2)

with $C_D$ and $C_M$ the data and model covariance matrices and $\mathbf{m}_{prior}$ the prior estimate of the parameters.

[12] In the absence of any prior information about the model parameters, the second term on the right-hand side of equation (2) may be neglected. Further assuming that measurement errors are independent and have unit variance leads to

$O(\mathbf{m}) = \frac{1}{2}\left\|\mathbf{d}_{cal}(\mathbf{m}) - \mathbf{d}_{meas}\right\|_2^2$    (3)

[13] Minimizing equation (3) gives the maximum likelihood estimate of $\mathbf{m}$. The minimization procedure may then be stated as

$\min_{\mathbf{m}} O(\mathbf{m}) = \min_{\mathbf{m}} \frac{1}{2}\left\|\mathbf{d}_{cal}(\mathbf{m}) - \mathbf{d}_{meas}\right\|_2^2$    (4)

3. Wavelet Transform

[14] Wavelet analysis [Daubechies, 1992; Chui, 1992; Donoho and Johnstone, 1994; Mallat, 1989, 1999; Oppenheim and Schafer, 1999] has found applications in diverse fields, such as harmonic analysis, numerical analysis, signal and image processing, fractal and multifractal analysis, model parameter estimation, etc.

[15] A wavelet can be described as a real-valued function $\psi(x)$ that satisfies the conditions

$\int_{-\infty}^{\infty} \psi(x)\,dx = 0$    (5)

$\int_{-\infty}^{\infty} \left|\psi(x)\right|^2 dx = 1$    (6)

[16] Equations (5) and (6) imply that $\psi$ has a zero mean, is nonzero somewhere, and has unit energy. A family of space-frequency atoms is obtained by scaling $\psi$ by $a$ and translating it by $b$, such that

$\psi_{a,b}(x) = \frac{1}{\sqrt{a}}\,\psi\!\left(\frac{x-b}{a}\right), \quad a > 0$    (7)

[17] Thus, the wavelet transform of any spatial series $d(x)$ at location $b$ and scale $a$ can be written as

$\tilde{d}(a,b) = \int_{-\infty}^{\infty} d(x)\,\psi_{a,b}(x)\,dx$    (8)

[18] Small values of the scale parameter $a$ represent high-frequency components of the signal, while large values of $a$ represent low-frequency components of the signal. The transform defined in equation (8) is redundant because it maps a function $d(x)$ of one variable into a function $\tilde{d}(a,b)$ of two variables. To remove this redundancy, discrete values of $a$ and $b$ are selected using the critical sampling

$a = 2^{-m}, \quad b = n\,2^{-m}, \quad m, n \in \mathbb{Z}$    (9)

[19] This will produce a minimal but complete orthogonal basis. Thus, we may define an orthonormal wavelet as a function $\psi$ with the property that

$\psi_{m,n}(x) = 2^{m/2}\,\psi\!\left(2^m x - n\right), \quad m, n \in \mathbb{Z}$    (10)

defines an orthonormal basis $\left\{\psi_{m,n}\right\}$ of $L^2(\mathbb{R})$.

[20] The requirement that the set $\left\{\psi_{m,n}\right\}$ forms an orthonormal basis of $L^2(\mathbb{R})$ means that any function $d \in L^2(\mathbb{R})$ can be decomposed as

$d(x) = \sum_{m}\sum_{n} c_{m,n}\,\psi_{m,n}(x)$    (11)

where

$c_{m,n} = \left\langle d, \psi_{m,n} \right\rangle = \int_{-\infty}^{\infty} d(x)\,\psi_{m,n}(x)\,dx$    (12)

[21] Here $c_{m,n}$ are the wavelet coefficients of $d$, and equation (12) defines an inner product. The parameters $m$ and $n$ make it possible to analyze a signal's behavior at a dense set of spatial locations and with respect to a large range of scales, thereby providing the possibility to zoom in on the transient behavior of the signal. Because wavelets are functions with compact support in both the space and frequency domains, the decomposition in equation (11) is local, and only the terms corresponding to $\psi_{m,n}$ with $n\,2^{-m}$ near $x$ make a large contribution at $x$.

3.1. Discrete Wavelet Transforms

[22] The discrete wavelet transform (DWT) is applied to discrete data sets and is more commonly used than the continuous transform. The DWT can be regarded as dyadic slices through the continuous wavelet transform. It is thus possible to take subsamples of the wavelet transform. The formulation of the DWT as an orthonormal transform simplifies many applications and facilitates statistical analysis of data. A discrete representation of the wavelet transform in equation (12) can be written in the form of a linear transformation using an orthogonal wavelet matrix W:

$\mathbf{c} = W\,\mathbf{d}$    (13)

[23] W maps the input data from the space domain to the wavelet domain. For a complete set of coefficients, W is an L×L matrix applied to an input data vector of size L. However, when a subset of the wavelet transform is required, W is of dimension Ncoeff×L, with Ncoeff being the size of $\mathbf{c}$. Whenever W is an orthogonal matrix, as is the case in this study, the corresponding transformation is a rotation in $\mathbb{R}^L$: the data vector is a point in $\mathbb{R}^L$, and the coordinates of that point in the rotated space comprise the discrete wavelet transform of the original coordinates. Although the implementation of equation (13) is conceptually straightforward, it is of limited practical use because storing and manipulating transformation matrices for large input data may not be feasible. As a result, a recursive form of filtering [Mallat, 1989, 1999; Percival and Walden, 2000] is usually employed in the computation of discrete wavelet coefficients. Thus, the classical implementation of equation (13) uses two filter banks with recursion on the low-pass branch, as in the sketch below.
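The matrix and filter-bank views of equation (13) can be checked against each other with a short sketch (Python/NumPy; the helper names are ours). It builds the explicit orthonormal Haar matrix W column by column from the recursive two-filter implementation and verifies that the two forms agree and that W is orthonormal.

```python
import numpy as np

def haar_dwt(d):
    """Orthonormal Haar DWT via two-filter recursion on the low-pass
    (approximation) branch; len(d) must be a power of 2."""
    approx = np.asarray(d, dtype=float)
    details = []
    while len(approx) > 1:
        even, odd = approx[0::2], approx[1::2]
        details.append((even - odd) / np.sqrt(2.0))
        approx = (even + odd) / np.sqrt(2.0)
    # order: [approximation, coarsest detail, ..., finest detail]
    return np.concatenate([approx] + details[::-1])

def haar_matrix(L):
    """Explicit L x L wavelet matrix W such that c = W d (equation (13))."""
    return np.column_stack([haar_dwt(col) for col in np.eye(L)])

L = 8
W = haar_matrix(L)
d = np.random.rand(L)
assert np.allclose(W @ d, haar_dwt(d))  # matrix form equals filter bank
assert np.allclose(W @ W.T, np.eye(L))  # W is orthonormal
```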

3.2. Preservation of Euclidean Norm

[24] It has been shown [Awotunde and Horne, 2008] that for orthonormal wavelet transforms, the Euclidean distance in the space domain is exactly the same as the Euclidean distance in the frequency domain. That is,

$\left\|W\mathbf{d}_{meas} - W\mathbf{d}_{cal}\right\|_2 = \left\|\mathbf{d}_{meas} - \mathbf{d}_{cal}\right\|_2$    (14)

[25] This implies that the minimization procedure implemented in the space domain is exactly the same as that implemented in the frequency domain. In simple terms, minimizing the error norm in the wavelet domain will follow exactly the same regression path and yield the same model parameters as minimizing the error norm in the space domain. This is supported by the numerical results obtained in this work. Thus, the minimization of wavelets of data may be written as

$\min_{\mathbf{m}} \frac{1}{2}\left\|W\mathbf{d}_{meas} - W\mathbf{d}_{cal}(\mathbf{m})\right\|_2^2$    (15)

[26] Equation (15) will not provide any advantage over equation (4) because the regression will follow exactly the same path in both cases. However, we do not implement the algorithm as shown in equation (15). Rather, we minimize the sum of squares of errors between corresponding subsets of wavelet coefficients obtained by transforming the measured data and the calculated data. The objective is to transform the data into a wavelet domain and subsequently reduce the dimension of the transformed data. This reduction has two advantages: elimination of redundancy present in the full data set and reduction in the time needed to compute sensitivity coefficients.
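A two-point example (Python/NumPy; the data values are arbitrary) illustrates the norm preservation in equation (14):

```python
import numpy as np

# Smallest orthonormal Haar matrix (L = 2): scaled average and difference.
W = np.array([[1.0, 1.0], [1.0, -1.0]]) / np.sqrt(2.0)

d_meas = np.array([3.0, 7.0])
d_cal = np.array([2.5, 7.8])
# Equation (14): the Euclidean distance is identical in both domains.
assert np.isclose(np.linalg.norm(W @ d_meas - W @ d_cal),
                  np.linalg.norm(d_meas - d_cal))
```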

4. Parameter Estimation Approaches

[27] Two different approaches to parameter estimation were used here. The difference between these approaches lies in the choice of parameterization of the observation space. In this section, we consider the pixel-based (pk) approach and the wpk approach.

4.1. Standard Nonlinear Regression: pk Approach

[28] In pixel modeling, the number of parameters needed to describe the reservoir fully is equal to the number of grid blocks, with each parameter assigned to a unique grid block. Also, all measured data are matched to estimate the distribution of reservoir parameters. Thus, the objective function is the same as given in equation (3). Calculated data are fitted to measured data by varying actual reservoir parameters. One advantage of this approach is that reservoir description is done at the finest scale of heterogeneity. Unfortunately, most reservoir responses, especially production data measured in the wells, are responses to large-scale reservoir heterogeneities. Thus, characterizing a large reservoir at the finest scale may not provide a meaningful solution to the inverse problem if the data do not have enough resolution to adequately resolve a large number of model parameters. Another problem is the redundancy in the information carried by the measured data. A third problem with this approach is the huge computational cost associated with the computation of the sensitivity coefficients. This cost can be avoided by using methods such as the quasi-Newton or conjugate gradient methods, which do not require sensitivity coefficients.

4.2. Wavelet Approach: wpk Approach

[29] A wavelet reparameterization of the data space (the wpk approach) was recently presented by Awotunde and Horne [2008]. In this approach, the measured data are transformed into a wavelet domain and thresholded to yield a reduced data space that contains only the most relevant coefficients. The coefficients retained form the observation space, i.e., the space of the coefficients that are matched to obtain an estimate of reservoir parameters. Consequently, the objective function for maximum likelihood estimation is given as

$O(\mathbf{m}) = \frac{1}{2}\left(\mathbf{c}_{meas} - \mathbf{c}_{cal}\right)^T\left(\mathbf{c}_{meas} - \mathbf{c}_{cal}\right)$    (16)

where

$\mathbf{c}_{meas} = W\,\mathbf{d}_{meas}$    (17)

$\mathbf{c}_{cal} = W\,\mathbf{d}_{cal}$    (18)

[30] Here $\mathbf{d}_{meas}$ and $\mathbf{d}_{cal}$ are the measured and calculated data, respectively, and they are of dimension L×1. W is of dimension Ncoeff×L, while $\mathbf{c}_{meas}$ and $\mathbf{c}_{cal}$ are of dimension Ncoeff×1. The gradient of $O(\mathbf{m})$ with respect to $\mathbf{m}$ is

$\nabla_{\mathbf{m}} O(\mathbf{m}) = G_w^T\left(\mathbf{c}_{cal} - \mathbf{c}_{meas}\right)$    (19)

and the Hessian matrix for the Levenberg-Marquardt algorithm is computed as

$H = G_w^T G_w + \lambda I$    (20)

[31] Here $\lambda$ is the Levenberg-Marquardt parameter and $G_w$ is the wavelet sensitivity matrix, defined as

$G_w = W\,\frac{\partial \mathbf{d}_{cal}}{\partial \mathbf{m}} = W G$    (21)

with an update to $\mathbf{m}$ at the $k$th iteration given by

$\mathbf{m}^{k+1} = \mathbf{m}^{k} + \alpha^{k}\,\delta\mathbf{m}^{k}$    (22)

where $\delta\mathbf{m}^{k}$ is the search direction calculated from

$\left(G_w^T G_w + \lambda I\right)\delta\mathbf{m}^{k} = -\nabla_{\mathbf{m}} O\!\left(\mathbf{m}^{k}\right)$    (23)

and $\alpha^{k}$ is the step length.

[32] The algorithm described here is the Levenberg-Marquardt algorithm, which is very efficient because it converges in fewer iterations than many other gradient-based methods. However, it requires the computation of the sensitivity matrix. The sensitivity matrix, although very informative, is expensive to compute when the inverse problem is large and complex. In equation (21), the full sensitivity matrix $G$ is first computed and subsequently multiplied by the wavelet matrix W to obtain the reduced wavelet sensitivity matrix. This implementation, as performed by Awotunde and Horne [2008], is inefficient and makes the approach as expensive as the conventional pk approach. In order to reduce the computational overhead associated with obtaining the wavelet sensitivity matrix, we devised a means to compute the reduced wavelet sensitivity matrix directly from the adjoint equations.
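For reference, one Levenberg-Marquardt step of the wpk approach can be sketched as follows (Python/NumPy; a sketch of equations (19), (20), and (23) in our notation, not the authors' code):

```python
import numpy as np

def lm_search_direction(Gw, c_meas, c_cal, lam):
    """One Levenberg-Marquardt search direction for the wpk objective.

    Gw     : Ncoeff x M reduced wavelet sensitivity matrix (equation (21))
    c_meas : wavelets of measured data, shape (Ncoeff,)
    c_cal  : wavelets of calculated data, shape (Ncoeff,)
    lam    : Levenberg-Marquardt damping parameter
    """
    grad = Gw.T @ (c_cal - c_meas)             # gradient, equation (19)
    H = Gw.T @ Gw + lam * np.eye(Gw.shape[1])  # damped Hessian, equation (20)
    return np.linalg.solve(H, -grad)           # search direction, equation (23)
```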

5. Adjoint Method

[33] In many instances, the number of observations is significantly lower than the number of model parameters. In such situations, it is apt to use an adjoint state method to compute the sensitivity coefficients. Furthermore, when only the gradient of the objective function is required, the adjoint method is the most computationally effective method to compute this gradient. The adjoint method is a special case of linear duality, and we begin this section by reviewing the principle of duality.

5.1. Duality

[34] In linear duality, the substitution of alternative variables allows us to speed up the computation of certain products. Suppose that we would like to compute the product

$\mathbf{p}^T = \mathbf{g}^T B$

such that

$A\,B = C$    (24)

in terms of the unknown matrix B. We assume that the vector $\mathbf{g}$ and the matrices A and C are known. The direct approach will be to first compute B from equation (24) and then compute $\mathbf{g}^T B$. This approach turns out to be very expensive when the number of columns in C is large. An alternative is to introduce a vector $\boldsymbol{\lambda}$ and compute

$\mathbf{p}^T = \boldsymbol{\lambda}^T C$

such that

$A^T \boldsymbol{\lambda} = \mathbf{g}$    (25)

[35] In equation (25), we have to solve the linear system for a single vector $\boldsymbol{\lambda}$ and then compute $\boldsymbol{\lambda}^T C$. In fact, $\boldsymbol{\lambda}^T C$ is exactly equal to $\mathbf{g}^T B$. Hence, we are able to achieve the same objective of calculating $\mathbf{g}^T B$ by solving a linear system with one right-hand-side vector instead of solving a linear system with multiple right-hand-side vectors. This is the principle on which the adjoint method is based, and equation (25) is known as the adjoint equation.
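The saving is easy to demonstrate numerically (Python/NumPy; the sizes and random test matrices are arbitrary): the one-solve adjoint route reproduces the multi-solve direct route exactly.

```python
import numpy as np

rng = np.random.default_rng(0)
M, N = 200, 500                                  # N = number of columns in C
A = rng.standard_normal((M, M)) + M * np.eye(M)  # well-conditioned test matrix
C = rng.standard_normal((M, N))
g = rng.standard_normal(M)

# Direct route: solve A B = C (N right-hand sides), then p = g^T B.
B = np.linalg.solve(A, C)
p_direct = g @ B

# Adjoint route: one solve A^T lam = g (equation (25)), then p = lam^T C.
lam = np.linalg.solve(A.T, g)
p_adjoint = lam @ C

assert np.allclose(p_direct, p_adjoint)
```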

5.2. Adjoint Method for Steady State Differential Equation

[36] Consider the steady state two-dimensional Darcy flow in a horizontal porous medium:

$\frac{\partial}{\partial x}\!\left(K\,\frac{\partial h}{\partial x}\right) + \frac{\partial}{\partial y}\!\left(K\,\frac{\partial h}{\partial y}\right) + q = 0 \quad \text{in } \Omega$    (26)

with boundary conditions

$\frac{\partial h}{\partial r} = 0 \quad \text{on } \Gamma_1$    (27)

for a no-flow boundary and

$h = h_b \quad \text{on } \Gamma_2$    (28)

for a constant-head boundary. In equations (27) and (28), $r$ represents $x$ or $y$, $h$ is the hydraulic head (L), $K$ is the hydraulic conductivity (L T−1), $q$ is the discharge rate (T−1), $\Omega$ is the flow region, and $\Gamma = \Gamma_1 \cup \Gamma_2$ is the reservoir boundary. Because $K(x, y)$ is not a function of $h$, the differential equation in equation (26) is linear, and the discretized form of its solution may be represented by

$A\,\mathbf{h} = \mathbf{b}$    (29)

where A is of dimension M×M and $\mathbf{h}$ and $\mathbf{b}$ are of dimension M×1. M is the number of grid blocks. Now consider that the model is parameterized by $\mathbf{m}$ with

$m_i = \ln K_i, \quad i = 1, \ldots, M$    (30)

[37] Equation (29) then becomes

$A(\mathbf{m})\,\mathbf{h} = \mathbf{b}(\mathbf{m})$    (31)

[38] Equation (31) can be expressed in terms of $\mathbf{h}$ as

$\mathbf{h} = A^{-1}\,\mathbf{b}$    (32)

with each element in $\mathbf{h}_{obs}$ given by

$h_l = \mathbf{e}_l^T\,\mathbf{h} = \mathbf{e}_l^T A^{-1}\,\mathbf{b}$    (33)

[39] In equation (33), $\mathbf{e}_l$ is a column vector composed of zeros everywhere except at row l, where there is a 1. The index l corresponds to the grid block where a measurement of hydraulic head is made. Differentiating $h_l$ with respect to the mth parameter $m_m$ leads to

$\frac{\partial h_l}{\partial m_m} = \mathbf{e}_l^T A^{-1}\,\frac{\partial \mathbf{b}}{\partial m_m} - \mathbf{e}_l^T A^{-1}\,\frac{\partial A}{\partial m_m}\,A^{-1}\,\mathbf{b}$    (34)

[40] Factorizing the common terms and making use of equation (32) leads to

$\frac{\partial h_l}{\partial m_m} = \mathbf{e}_l^T A^{-1}\left(\frac{\partial \mathbf{b}}{\partial m_m} - \frac{\partial A}{\partial m_m}\,\mathbf{h}\right)$    (35)

[41] If we choose an adjoint variable $\boldsymbol{\lambda}_l$ such that

$A^T \boldsymbol{\lambda}_l = \mathbf{e}_l$    (36)

then

$\boldsymbol{\lambda}_l^T = \mathbf{e}_l^T A^{-1}$    (37)

and equation (35) becomes

$\frac{\partial h_l}{\partial m_m} = \boldsymbol{\lambda}_l^T\left(\frac{\partial \mathbf{b}}{\partial m_m} - \frac{\partial A}{\partial m_m}\,\mathbf{h}\right)$    (38)

[42] Now consider that there are L locations where measurements are taken. There will be L such $\boldsymbol{\lambda}_l$ vectors that need to be computed. Since all the $\boldsymbol{\lambda}_l$ are obtained from systems with the same matrix $A^T$, the corresponding vectors $\mathbf{e}_l$ may be grouped into a single right-hand-side matrix E to speed up the computations. Equation (37) then may be written as

$A^T \Lambda = E$    (39)

where E is a matrix whose columns are composed of all the $\mathbf{e}_l$ vectors. Equation (38) may then be written as

$\frac{\partial \mathbf{h}_{obs}}{\partial m_m} = \Lambda^T\left(\frac{\partial \mathbf{b}}{\partial m_m} - \frac{\partial A}{\partial m_m}\,\mathbf{h}\right)$    (40)

[43] While $\mathbf{h}$ is the vector of calculated hydraulic head values in all the M grid blocks, $\mathbf{h}_{obs}$ is the vector of calculated hydraulic head values at measurement locations, and $\partial \mathbf{h}_{obs}/\partial m_m$ is the derivative of calculated head values at measurement locations with respect to $m_m$. Thus, $\partial \mathbf{h}_{obs}/\partial m_m$ is of dimension L×1, and it represents column m of the sensitivity matrix $G$ of dimension L×M. Therefore, we are able to compute the sensitivity coefficients for locations where observations are made instead of the full sensitivity matrix for all grid block head values. Essentially, this implies that the computational effort needed to obtain the sensitivity matrix using an adjoint method depends on the number of measured data. A sketch of this procedure follows.
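The following sketch (Python/NumPy; the function and argument names are ours) assembles the sensitivity matrix from equations (39) and (40), assuming the parameter derivatives of A and b are available:

```python
import numpy as np

def adjoint_sensitivities(A, h, dA_dm, db_dm, obs_idx):
    """Sensitivity matrix d h_obs / d m via the adjoint equations (39)-(40).

    A       : M x M discretized flow matrix
    h       : solved heads with A h = b, shape (M,)
    dA_dm   : list over parameters of dA/dm_m, each M x M
    db_dm   : list over parameters of db/dm_m, each shape (M,)
    obs_idx : indices of the L grid blocks with measurements
    """
    M = A.shape[0]
    E = np.eye(M)[:, obs_idx]             # columns e_l, M x L
    Lam = np.linalg.solve(A.T, E)         # equation (39): A^T Lam = E
    cols = [Lam.T @ (db - dA @ h) for dA, db in zip(dA_dm, db_dm)]
    return np.column_stack(cols)          # L x n_params matrix, equation (40)
```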

5.3. Application of Wavelet to Linear Adjoint Model

[44] Solving the linear system in equation (39) is the most computationally intensive part of adjoint sensitivity computation. The goal in this section is to reduce the size of the linear system in equation (39) without any significant loss of accuracy. Now consider that we intend to match some wavelet coefficients of $\mathbf{h}_{obs}$ instead of $\mathbf{h}_{obs}$ itself. These wavelet coefficients $\mathbf{c}$ are computed by

$\mathbf{c} = W\,\mathbf{h}_{obs}$    (41)

where W is an orthonormal wavelet matrix of dimension Ncoeff×L and Ncoeff is the number of coefficients retained. If we postmultiply equation (39) by $W^T$ so that

$A^T \Lambda\, W^T = E\, W^T$    (42)

we then may write

$A^T \Lambda_w = E_w$    (43)

where

$\Lambda_w = \Lambda\, W^T$    (44)

$E_w = E\, W^T$    (45)

[45] Equation (44) can be expressed as

$\Lambda = \Lambda_w\, W$    (46)

[46] Substituting equation (46) into (40) gives

$\frac{\partial \mathbf{h}_{obs}}{\partial m_m} = W^T \Lambda_w^T\left(\frac{\partial \mathbf{b}}{\partial m_m} - \frac{\partial A}{\partial m_m}\,\mathbf{h}\right)$    (47)

[47] Premultiplying both sides of equation (47) by W and recognizing that $W W^T = I$ lead to

$\frac{\partial \mathbf{c}}{\partial m_m} = \Lambda_w^T\left(\frac{\partial \mathbf{b}}{\partial m_m} - \frac{\partial A}{\partial m_m}\,\mathbf{h}\right)$    (48)

[48] Thus, equations (43) and (48) are the adjoint equations for calculating the sensitivities of $\mathbf{c}$ to the model parameters $\mathbf{m}$. The efficiency of this method lies in the ability to compute a reduced wavelet sensitivity matrix using equation (48). The right-hand-side matrix $E_w$ in equation (43) has fewer columns than the right-hand-side matrix in equation (39), and this reduced number of right-hand-side vectors brings significant savings in storage and in the computational time required to solve the linear system. A less efficient approach is to compute the full sensitivity matrix $G$ using equation (40), transform the sensitivity matrix into a wavelet domain, and subsequently reduce the transformed matrix by thresholding. A sketch of the direct computation follows.
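Under the same assumptions as the sketch in section 5.2, the reduced wavelet sensitivity matrix can be computed directly from the smaller adjoint system (a sketch of equations (43), (45), and (48); the names are ours):

```python
import numpy as np

def reduced_wavelet_sensitivities(A, h, dA_dm, db_dm, obs_idx, W):
    """Reduced wavelet sensitivity matrix dc/dm from equations (43) and (48).

    W : Ncoeff x L matrix of retained orthonormal wavelet rows (W W^T = I);
        the other arguments are as in adjoint_sensitivities above.
    """
    M = A.shape[0]
    E = np.eye(M)[:, obs_idx]             # M x L
    Ew = E @ W.T                          # equation (45): M x Ncoeff
    Lam_w = np.linalg.solve(A.T, Ew)      # equation (43): fewer RHS columns
    cols = [Lam_w.T @ (db - dA @ h) for dA, db in zip(dA_dm, db_dm)]
    return np.column_stack(cols)          # Ncoeff x n_params, equation (48)
```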

[49] For completeness, we derive the adjoint method for computing the gradient of the objective function discussed in section 4.2. The line search method introduced in section 4.2 requires the gradient in order to determine a good choice for the step length $\alpha$. At each iteration of the Levenberg-Marquardt algorithm, a step length is determined after the search direction is found. Although the gradient can be computed from the sensitivity matrix, as shown in equation (19), this procedure is very inefficient within the inner iterations used to determine the step length. Within these inner iterations, the Hessian matrix is not required, and computing the sensitivity matrix when the Hessian matrix is not required is a waste of resources. Therefore, the adjoint method is the most efficient way to compute the gradient under such circumstances. Because the observation space is made of wavelets, the conventional adjoint equations must be modified to compute the gradient. We differentiate equation (16) to obtain

$\frac{\partial O}{\partial m_m} = -\left(\mathbf{c}_{meas} - \mathbf{c}_{cal}\right)^T \frac{\partial \mathbf{c}_{cal}}{\partial m_m}$    (49)

[50] Combining equations (43) and (48) yields

$\frac{\partial c_{cal,i}}{\partial m_m} = \boldsymbol{\lambda}_{w,i}^T\left(\frac{\partial \mathbf{b}}{\partial m_m} - \frac{\partial A}{\partial m_m}\,\mathbf{h}\right) = \mathbf{e}_{w,i}^T A^{-1}\left(\frac{\partial \mathbf{b}}{\partial m_m} - \frac{\partial A}{\partial m_m}\,\mathbf{h}\right)$    (50)

[51] In equation (50), $\boldsymbol{\lambda}_{w,i}$ is the ith column in $\Lambda_w$, $\mathbf{e}_{w,i}$ is the ith column in $E_w$, and i is the index of the wavelet coefficient. Substituting equation (50) into equation (49) gives

$\frac{\partial O}{\partial m_m} = -\mathbf{g}^T A^{-1}\left(\frac{\partial \mathbf{b}}{\partial m_m} - \frac{\partial A}{\partial m_m}\,\mathbf{h}\right)$    (51)

where

$\mathbf{g} = E_w\left(\mathbf{c}_{meas} - \mathbf{c}_{cal}\right)$    (52)

[52] By choosing $\boldsymbol{\mu}$ such that

$A^T \boldsymbol{\mu} = \mathbf{g}$    (53)

we obtain

$\frac{\partial O}{\partial m_m} = -\boldsymbol{\mu}^T\left(\frac{\partial \mathbf{b}}{\partial m_m} - \frac{\partial A}{\partial m_m}\,\mathbf{h}\right)$    (54)

[53] Equations (53) and (54) are the adjoint equations for computing the gradient of the objective function in equation (16). The linear system in equation (53) has only one right-hand-side vector and is therefore very cheap to solve.
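In the same sketched setting as above, the whole gradient then costs a single adjoint solve (a sketch of equations (52)-(54), not the authors' code):

```python
import numpy as np

def wavelet_objective_gradient(A, h, dA_dm, db_dm, obs_idx, W, c_meas, c_cal):
    """Gradient of the wavelet-domain objective via one adjoint solve."""
    M = A.shape[0]
    E = np.eye(M)[:, obs_idx]
    g = (E @ W.T) @ (c_meas - c_cal)   # equation (52): g = E_w (c_meas - c_cal)
    mu = np.linalg.solve(A.T, g)       # equation (53): single RHS vector
    # equation (54): one inner product per parameter
    return np.array([-mu @ (db - dA @ h) for dA, db in zip(dA_dm, db_dm)])
```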

6. Practical Considerations

[54] The main idea of this approach is to reduce the cost of computing sensitivities by compressing the measured data into a small number of coefficients. Thus, it is necessary to use a transform that gives a high compression ratio without compromising the overall speed and memory requirements of the inverse problem. In this regard, we used the Haar wavelet transform to decompose the spatial data. The Haar transform was chosen because of its simplicity, low memory requirements, and high speed of computation. The orthogonality of the Haar wavelet transform and the absence of edge effects make the transform a good candidate for directly computing a reduced adjoint sensitivity matrix (equation (48) requires that the wavelet matrix be strictly orthogonal).

[55] Discrete wavelet transforms require that the data have a length of $2^J$, where J is a positive integer. Thus, a practical consideration is how to treat data whose length is not a power of 2. The usual way to deal with this is to pad the data with some constant values, usually zeros. The same set of values used for padding the measured data must be used for padding the calculated data. Because the inverse solution uses the l2 norm of the difference between the measured and calculated data, padding both data sets with the same values has no effect on the result of the inverse solution. A sketch of such padding follows.
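A minimal padding helper (Python/NumPy; ours, for illustration) makes the point explicit; it must be applied with the same fill value to both the measured and the calculated data:

```python
import numpy as np

def pad_to_power_of_two(d, fill=0.0):
    """Pad data to the next power-of-2 length with a constant value."""
    L = len(d)
    L2 = 1 << max(L - 1, 0).bit_length()  # next power of 2 (>= L)
    return np.concatenate([np.asarray(d, dtype=float), np.full(L2 - L, fill)])

d_meas = pad_to_power_of_two(np.random.rand(100))  # length 100 -> 128
assert len(d_meas) == 128
```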

[56] The Haar wavelet transform requires that the data be uniformly sampled in order to prevent changes in sampling frequency from showing up as sharp discontinuities in the detail coefficients. However, all the measured data used in this work, as is the case in most practical applications, are sampled at irregular intervals. There are many ways to deal with the wavelet decomposition of irregularly sampled data, the simplest being to treat the data as if they were uniformly spaced [Sardy et al., 1999]. That is the approach used in this work. The approach does not decrease the overall variance of the coefficients. However, it may affect the choice of wavelets retained because the thresholding is based on the magnitudes of the wavelets of measured data.

[57] Another way to treat irregularly spaced data is to fill the spaces between the data with some constant values. This should be applied to both the measured and calculated data sets, and the same infill values should be used for both data sets. While this method does not increase the overall variance of the error, it makes data compression difficult or totally impossible. Take, for example, the case of 64 irregularly sampled data points on a 128×64 grid system. If all the points between the original 64 data points are filled with zero (or any other value) and the 8192 data points are decomposed into wavelets, then selecting 64 or fewer wavelets from the 8192 coefficients may lead to significant loss of information contained in the data set. In this case, it is better to decompose the original 64 data points than to augment the data to a large number.

[58] Finally, it is important to note that the number of wavelets selected is crucial to obtaining convergence within a reasonable time. While too few wavelet coefficients may prevent the model from converging, too many coefficients will increase the time required to compute the sensitivity matrix. There is no general rule to determine the number of wavelets to retain for any particular problem. However, smooth data often require fewer wavelet coefficients than irregular data. What is important in all thresholding is that all approximation coefficients must be retained, and the detail coefficients with larger magnitudes should be preferred to those with smaller magnitudes. A sketch of such a thresholding rule follows.
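The following sketch (Python/NumPy; ours) implements such a rule, assuming the approximation coefficients occupy the first n_approx positions of the coefficient vector, as in the Haar ordering used earlier:

```python
import numpy as np

def threshold_indices(coeffs, n_approx, n_keep):
    """Retain all approximation coefficients plus the largest-magnitude
    detail coefficients, n_keep indices total (requires n_keep >= n_approx)."""
    order = n_approx + np.argsort(np.abs(coeffs[n_approx:]))[::-1]
    return np.sort(np.r_[np.arange(n_approx), order[:n_keep - n_approx]])
```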

7. Sample Applications of Wavelet Transform to Linear Adjoint Equations

[59] We present three examples to illustrate the usefulness of the approach. All examples used data simulated from the steady state flow of water in reservoirs with Dirichlet boundary conditions. The first example used a one-dimensional reservoir and a one-dimensional Haar wavelet decomposition of the measured data. In the other examples, two-dimensional rectangular reservoirs were used, and the measured data were decomposed using a two-dimensional Haar wavelet transform.

[60] While the measured data in the first example are fairly smooth, the measured data in the second and third examples are discontinuous, making it difficult to obtain high compression ratios. Thus, the measured data in the second and third examples were arranged in ascending order before decomposition in order to improve their compression ratios. We note that the superscripts on wp are used to denote the number of wavelets retained in the wpk approach.

[61] The third example was used to study the performance of the wpk approach under two different scenarios: (1) the presence of white noise in the data and (2) the use of constant values to augment the data. The augmentation of the data was done to fill the spaces between the irregularly sampled data. In this case, reordering the data was not necessary.

7.1. Example 1: One-Dimensional Reservoir

[62] In this example, we considered the steady state one-dimensional Darcy flow in a porous medium given by

$\frac{d}{dx}\!\left(K\,\frac{dh}{dx}\right) + q = 0, \quad 0 < x < 1$    (55)

with Dirichlet boundary conditions

$h(0) = h_0, \quad h(1) = h_1$    (56)

and assumed that the discharge rate q and the boundary head values $h_0$ and $h_1$ were known.

[63] The reservoir was divided into 100 grid blocks, and the values of hydraulic head measured at the 64 locations were given. Using the known data, we intend to estimate the distribution of hydraulic conductivity K (m/s) in the reservoir. In this example, $\mathbf{d}_{meas}$ is the measured data, and $\mathbf{c}_{meas}$ and $\mathbf{c}_{cal}$ are the wavelets of the measured and calculated data, respectively.

[64] First, we verify the approach by comparing the reduced wavelet sensitivity matrix computed directly using equation (48) and that obtained by first calculating the full sensitivity matrix using equation (40), taking a wavelet transformation of the full sensitivity matrix and subsequently reducing the transformed matrix. Figure 1 shows the sensitivities (in absolute values) of the largest 32 wavelet coefficients of measured data to model parameters ln K. The reduced wavelet sensitivity matrix computed directly using equations (43) and (48) is shown in Figure 1a, while that computed from the full sensitivity matrix is shown in Figure 1b. The two matrices are equivalent.

Figure 1.

Reduced (absolute) wavelet sensitivity matrix for 1-D reservoir model (a) obtained directly from a reduced linear system and (b) obtained by transforming and reducing the full sensitivity matrix.

[65] The pk and wpk approaches were used to estimate the distribution of reservoir conductivity. The pk approach uses equations (39) and (40) to compute the full sensitivity matrix required in the solution of the inverse problem presented in equation (4). In this case, all 64 measured data were matched. However, in the wpk approach the measured data were transformed into wavelets, and some large (in absolute terms) coefficients were selected and matched to obtain a description of reservoir conductivity. Two cases were investigated. In the first (wp32k), the 32 largest wavelet coefficients of data were matched. In the second case (wp16k), only 16 coefficients were matched. Results from all three cases are presented in Figures 2–5. The fact that the wpk approach used fewer coefficients as observation data implies that the number of columns in $E_w$ is less than the number of columns in E. Because the adjoint linear system is solved at all iterations, the reduction in computational time can be significant for large-scale problems.

Figure 2.

Match to measured data produced by pk and wp32k for the 1-D reservoir model.

Figure 3.

Match to measured data produced by wp16k for 1-D reservoir model.

Figure 4.

Estimated conductivity K distribution for 1-D reservoir model.

Figure 5.

Decay of residual norm of data for 1-D reservoir model.

[66] In Figure 2, we observe that pk and wp32k gave very good matches to the measured data. Because the pk and wp32k approaches gave similar matches to the measured data, only one plot is displayed (in Figure 2) to represent both of them. Figure 3 shows a slight deterioration of the match to measured data (between x = 0.3 and x = 0.4) when the number of wavelets selected was reduced to 16. This suggests that there is a limit to how far the transformed data space can be compressed.

[67] In Figure 4, we compare the conductivity estimated by pk, wp32k, and wp16k to the true conductivity field. Ko is the starting guess for the algorithm, and Kt is the true conductivity field. We notice that both pk and wp32k exhibited similar performance in their ability to predict the conductivity distribution in the reservoir, and they gave a fairly good match to the true distribution of reservoir conductivity. We also observe that the conductivity predicted by wp16k is less accurate. This example suggests that with a good choice of thresholding, we can replace the pk approach with the less expensive wpk approach. The level of compression of the observation space depends on the degree of correlation in the measured data. As the thresholded observation (data) space becomes smaller, important information in the data is lost and the ability to estimate the reservoir parameters diminishes.

[68] The decays of residual norm of data exhibited by the approaches are presented in Figure 5. We observe that pk and wp32k exhibited similar convergence performance, while wp16k exhibited poor convergence. The average time taken per iteration by each approach is presented in Table 1. For this small model, the reduction in time achieved by reducing the data space is not significant.

Table 1. Model Performance Data: 1-D Reservoir Model

                            pk        wp32k     wp16k
Number of observed data     64        32        16
Time per iteration (s)      0.0562    0.0415    0.0412

7.2. Example 2: 16×16 Grid System

[69] This example consists of a two-dimensional reservoir with constant head boundaries in the x and y directions. The reservoir is a hot rock used as a hot-water source. There are five producers, three injectors, and eight observation wells (Figure 6) in the reservoir. Used water is returned to the reservoir through the injection wells. We discretized the reservoir into a 16×16 grid system and assumed that the boundary head values and the well discharge rates (Table 2) were known.
Figure 6.

Location of wells in the 16 × 16 reservoir model.

[70] The well data for this example are given in Table 2, and the steady state distribution of head values is given in Figure 7. Measurements were taken in all 16 wells. The problem is to estimate the distribution of reservoir conductivity K(m/d) by matching the head values measured in the wells (pk approach) or matching some wavelets derived from the transformation of the measured data (wpk approach).

Figure 7.

Steady state distribution of hydraulic head values for 16 × 16 reservoir model.

Table 2. Well Data: 16×16 Reservoir Model
Wellx Coordinatey Coordinateq (d−1)
111−3.6
2710
3161−3.35
4104−3.8
5250
6650
71460
8373.85
9884.5
101100
1113110
129120
1312130
14514−2.6
151164.3
161616−3.55

[71] The 16 measured head values were arranged in ascending order, shaped into a 4×4 square matrix, and then decomposed using a two-dimensional wavelet transform. The wpk approach used 10 wavelet coefficients of the measured data for the history match. Figure 8 shows that the pk and wpk approaches gave good matches to the measured data. In Figure 9, estimates of ln K distribution obtained from the two approaches are displayed along with the initial guess and the true log conductivity distribution. The estimates of ln K distribution obtained are not close to the true distribution shown in Figure 9b because the information content of the measured data is not enough to resolve all 256 reservoir parameters. In Figure 10, we observe that the wpk approach exhibited a poor convergence for this example. This can be attributed to the poor compression factor of the measured data. The data have a poor compression factor because the data-to-data correlation is low (Figure 8a). Table 3 shows that a minimal reduction in time is achieved when the number of observation data is reduced from 16 to 10.

Figure 8.

Match to measured data produced by pk and wp10k for 16 × 16 reservoir model (a) with natural ordering and (b) rearranged in ascending order.

Figure 9.

Log conductivity distributions for 16 × 16 reservoir model.

Figure 10.

Decay of residual norm of data for 16 × 16 reservoir model.

Table 3. Model Performance Data: 16×16 Reservoir Model

                            pk        wp10k
Number of observed data     16        10
Time per iteration (s)      0.0901    0.0841

7.3. Example 3: 128×64 Grid System

[72] The reservoir is a hot-water source discretized into a 128×64 grid system, with each grid having a length of 100 m and a width of 100 m. There are 24 hot-water producers, 16 cold-water injectors, and 24 observation wells (Figure 11). The head values at the reservoir boundaries are fixed at 500 m. Measurements were taken from all 64 wells, and the pk and wpk approaches were used in estimating the distribution of reservoir conductivity K(m/d). The well locations are shown in Figure 11, and the steady state distribution of head values is given in Figure 12.

Figure 11.

Location of wells in the 128 × 64 reservoir model.

Figure 12.

Steady state distribution of hydraulic head values for 128 × 64 reservoir model.

[73] The 64 measured head values were arranged in ascending order, shaped into an 8×8 square matrix, and then decomposed using a two-dimensional Haar wavelet transform. The wpk approach used 54 wavelet coefficients (wp54k) and 32 wavelet coefficients (wp32k) of the measured data for the history match. Deterioration in the quality of the result is observed when the number of wavelets retained was reduced from 64 to 54 and from 54 to 32 (Figure 13). This indicates that almost all 64 wavelet coefficients carry vital information about the model; eliminating some wavelets therefore led to losing significant information contained in the data. In Figure 14, estimates of ln K distributions obtained from the two approaches are displayed along with the initial guess. We observe similar estimates from the pk and wp54k approaches (Figures 14b and 14c). The estimate from wp32k is worse than the other two (Figure 14d). In general, none of the estimates is close to the true ln K distribution shown in Figure 11. This indicates that the measured data do not contain enough information about the reservoir model.

Figure 13.

Match to measured data produced by pk and wp54k (128 × 64 reservoir model).

Figure 14.

Log conductivity distributions (128 × 64 reservoir model).

[74] In Figure 15, we observe that the wpk approach exhibited poor convergence for this example. However, Table 4 shows that with about a 15% reduction in the observation space (64 to 54 data points), the time per iteration was reduced by about 37% (from 609 to 383 s). This significant reduction in time is attributable to the size of the inverse problem. Thus, the reduction in time achieved by using the wavelet approach to adjoint sensitivity computation becomes significant when the size of the inverse problem is large.

Figure 15.

Decay of residual norm of data for 128 × 64 reservoir model.

Table 4. Model Performance Data: 128×64 Reservoir Model

                            pk        wp54k     wp32k
Number of observed data     64        54        32
Time per iteration (s)      609.42    382.84    241

[75] By adding some white noise to the data, we studied the effect of noise on the wpk approach. The noise caused some deterioration in the results obtained from the wpk approach (Figures 16 and 17). We also studied the effect of augmenting the data to create a regularly spaced sample. For this purpose, we input 500 m as the data value at all locations at which no measurement was made. In this way, all 8192 grid locations (128×64) have data values assigned to them. The same value (500 m) was used to augment the calculated data. A two-dimensional wavelet transform was then applied to both the measured data and the calculated data. In this case, 64 wavelet coefficients were selected out of a total of 8192. Results obtained are presented in Figures 18 and 19. We observe a poor match to the measured data (Figure 18) and a poor estimate of the log reservoir conductivity (Figure 19). This is because by augmenting the measured data (from 64 to 8192 points), we unnecessarily spread the information content of the data over a large number of wavelet coefficients. Retaining only the 64 largest (in absolute value) wavelet coefficients (the original data size) means discarding a significant amount of information contained in the data set. To recover more information from the data, a larger number of wavelets must be retained. However, retaining a number of wavelets larger than the number of original measured data (in this case 64) runs contrary to the objective of this work.

Figure 16.

Match to measured data produced by pk and wp54k for 128 × 64 reservoir model, data with noise.

Figure 17.

Decay of residual norm of data for 128 × 64 reservoir model, data with noise.

Figure 18.

Match to measured data produced by wpk for 128 × 64 reservoir model, data augmented to 8192.

Figure 19.

Estimate of log conductivity distribution obtained from wpk (128 × 64 reservoir model, data augmented to 8192).

8. Conclusions

[76] In this paper, a wavelet approach to the adjoint method of computing sensitivities was presented for steady state differential equations. The new approach exploits the ability to match a reduced data space to reduce the size of the adjoint linear system used in the computation of sensitivity coefficients.

[77] The success of the method depends heavily on the ability to concentrate the information content of the original data into a significantly smaller number of wavelet coefficients. Numerical examples were presented to validate and illustrate the strength of the approach. Observations from the results show that smoothly varying data give larger compression ratios than irregular, uncorrelated data and that large compression ratios result in significant reductions in time. The effectiveness of the approach is most noticeable when the inverse problem is large.

[78] Overall, the approach presented shows that we can reduce the cost of computing sensitivities if the data space can be reparameterized with a small number of wavelet coefficients.

Appendix A: Relationship Between Regression in Real Space and Regression in Wavelet Space

[79] Certain properties of signals are preserved when they are transformed from the real space to an orthogonal wavelet domain. In this appendix, we examine the relationship between regression in the real space and regression in the wavelet domain.

[80] Awotunde and Horne [2008] showed in Lemma 1 of their work that the norm is preserved when a one-dimensional Haar wavelet transformation is performed. This proof, however, does not independently suggest that the regression in the wavelet domain is equivalent to regression in the real space. In this appendix, we give the proof of the equivalence of regression in the two domains.

A1. Lemma

[81] Given that all observation space wavelets are used in a minimization involving the wpk approach, regression using wavelets as observed data will follow exactly the same path as regression using the actual physical measurements as observed data.

A2. Proof

[82] The Newton direction for the wpk approach is given by

$\left(G_w^T G_w\right)\delta\mathbf{m}_w = -G_w^T\left(\mathbf{c}_{cal} - \mathbf{c}_{meas}\right)$    (A1)

[83] Substituting the definitions of $G_w$ and $\mathbf{c}$ into equation (A1) and simplifying gives

$\left[(W G)^T (W G)\right]\delta\mathbf{m}_w = -(W G)^T\left(W\mathbf{d}_{cal} - W\mathbf{d}_{meas}\right)$    (A2)

[84] Some matrix manipulations yield

$\left(G^T W^T W\, G\right)\delta\mathbf{m}_w = -G^T W^T W\left(\mathbf{d}_{cal} - \mathbf{d}_{meas}\right)$    (A3)

with further simplification, using $W^T W = I$ for the full transform, leading to

$\left(G^T G\right)\delta\mathbf{m}_w = -G^T\left(\mathbf{d}_{cal} - \mathbf{d}_{meas}\right)$    (A4)

where

$G = \frac{\partial \mathbf{d}_{cal}}{\partial \mathbf{m}}$    (A5)

$\delta\mathbf{m}_w = \delta\mathbf{m}$    (A6)

and $\delta\mathbf{m}$ is the search direction in the pk approach. Equation (A4) is the linear equation for obtaining the search direction in the pk approach. This proof shows that if we use all wavelet coefficients of data as the observation data in the wpk approach, we shall obtain exactly the same results that we obtain by using the pk approach.
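The lemma is easy to confirm numerically (Python/NumPy; any square orthonormal matrix stands in for the full wavelet matrix, and G and the residual are random test data):

```python
import numpy as np

rng = np.random.default_rng(1)
L, M = 16, 8
G = rng.standard_normal((L, M))       # full sensitivity matrix (test data)
r = rng.standard_normal(L)            # residual d_cal - d_meas
W, _ = np.linalg.qr(rng.standard_normal((L, L)))  # full orthonormal transform

dm = np.linalg.solve(G.T @ G, -G.T @ r)          # pk direction, equation (A4)
Gw, rw = W @ G, W @ r
dm_w = np.linalg.solve(Gw.T @ Gw, -Gw.T @ rw)    # wpk direction, equation (A1)
assert np.allclose(dm, dm_w)                     # identical search directions
```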

Acknowledgments

[85] We are grateful for the financial support provided by the Stanford School of Earth Sciences, including the Hal Dean, the William Whiteford, and Aramco Research Fellowships. We acknowledge other funding provided by SUPRI-D.
