Dynamic data for parameter estimation in groundwater hydrology are often strongly sensitive to averages of porosity and permeability over extended regions, but only weakly sensitive to the properties of individual gridblocks. Consequently, updates of gridblock parameters from the ensemble Kalman filter (EnKF) will be subject to substantial error due to spurious correlations when the size of the ensemble is small. It has been shown that adaptive localization methods can reduce the magnitude of spurious correlations and regularize updates from the EnKF. In this paper we show that when data are sensitive to an average of parameters over extended regions, adaptive localization methods may eliminate true, but weak, correlations. For this type of data, there appears to be a relationship between the spatial scale of the model variables and the sensitivity of observations to the variables. By introducing a multiscale expansion with adaptive localization, we are able to increase the sensitivity to variables corresponding to large spatial scales and more accurately compute their magnitudes, while eliminating updates to variables corresponding to small spatial scales with weak correlations. Three examples with distinct features are used to compare the new method of combining adaptive localization with wavelet parameterization to the case of no localization and adaptive localization with standard gridblock parameterization. We show that the Kalman gain estimated using the wavelet parameterization with adaptive localization is less biased and has smaller root-mean-square error for the problems in which spatial averaging is important and the Kalman gain has large-scale features.
Typical state variables of saturated groundwater flow and transport problems include pressure and the concentration of various species or injectants. For a given set of boundary conditions and well constraints, the distribution of the state variables at any time is a function of the spatial distribution of the formation properties, such as permeability and porosity. One key feature of the problem is that pressure and concentration are sensitive to formation properties over a relatively large region [Oliver, 1993, 1998]. For example, the concentration of an injected conservative tracer at a particular location is primarily sensitive to permeability along a streamline passing through the observation location, but is sensitive to porosity only along the part of the streamline between the injector and the observation location. In both cases, the sensitivity is highly nonlocal, i.e., the observed concentration is not primarily sensitive to formation properties in a small region. We refer to data that are sensitive to uncertain parameters over an extended region as nonlocal observations. In this study we mainly address the estimation of formation properties, so pressure and concentration data are nonlocal observations. In the case of state estimation problems, e.g., in which the spatial distribution of the tracer concentration is to be estimated, the concentrations measured at distributed locations become local observations.
In the ensemble Kalman filter (EnKF) approach [Evensen, 1994] to data assimilation, an ensemble of simulation models is used to propagate the correlation between the formation properties and predicted observations. When the size of the ensemble is small, the ensemble approximation of the covariance can be badly corrupted by noise. In addition, the number of degrees of freedom in the ensemble is only as large as the ensemble size, which makes the assimilation of large amounts of independent data impossible. For distributed parameter estimation problems, it is often reasonable to assume that data beyond some critical distance have no influence on the update of a model variable, so some type of distance-based localization [Houtekamer and Mitchell, 1998] or tapering function [Houtekamer and Mitchell, 2001; Hamill et al., 2001] is used to regularize the updates from the EnKF. The distance beyond which the influence of data can be neglected is a function of the data sensitivity, the prior covariance, and also the size of the ensemble [see Chen and Oliver, 2010a]. Distance-based localization often works fairly well if the localization functions are appropriately chosen, but it lacks generality and is not applicable to parameters that are not spatially distributed (e.g., specific hydraulic conductivity curves).
Adaptive localization/regularization methods are alternatives to distance-based localization. The localization functions in an adaptive regularization method are estimated from multiple ensembles [Anderson, 2007] or multiple bootstrapped ensembles [Zhang and Oliver, 2010] to reduce the variance of the estimate of the Kalman gain, so the adaptive methods are more general in that no empirical critical distance needs to be defined. Because the observations of pressure and concentration are often sensitive to formation properties over a broad region, the correlation between the observation at one location and the property at another location is generally quite small. If a small ensemble is used to compute the Kalman gain, the weak (but real) correlations will appear to be unreliable in an adaptive localization method, and the updates in those regions may be eliminated or reduced in magnitude. The result will be that the updates to the model may be substantially underestimated. In this situation we propose to apply adaptive localization in a transformed domain in which the standard gridblock parameters are projected onto basis vectors with different scales. The multiscale expansion allows the use of basis vectors that correspond to large spatial averages or to regions of high sensitivity to the data. The Kalman gain is computed for the coefficients of the multiscale expansion, so that the correlations of data to coefficients corresponding to important spatial scales are increased and can be estimated more reliably from a limited-size ensemble. The adaptive localization is applied to the Kalman gain for coefficients of the multiscale expansion, before back transformation to the normal gridblock domain, to reduce the magnitude of updates of less reliable elements.
Since the scales of correlation between the data and the model and state variables are generally not known for typical parameter estimation problems, and the relationships are further complicated by previous data assimilation in sequential methods, we choose a discrete wavelet expansion that includes all scales; the elimination or damping of updates is done only through adaptive screening of the Kalman gain. Note that compared to expansions in which the structure of the basis vector extends through the entire domain (e.g., Karhunen–Loève and discrete cosine transform), the basis vectors of the discrete wavelet expansion are localized in the spatial domain. This localization makes the discrete wavelet expansion more suitable for regularization of observations that correspond to spatial averages.
There is a fairly extensive history of using multiscale methods to regularize spatial parameter estimation. In one of the early examples, Liu compared the behavior of a gradient-based minimization algorithm for solving an inverse problem when the unknown parameters were coefficients of the usual local basis and when the parameters were coefficients of the Haar wavelet basis. A process in which the coefficients were computed at progressively finer scales was determined to be robust and efficient, presumably because the objective function was most sensitive to the parameters at the largest scales and because the relationship of observations to large-scale parameters was more nearly linear. The connection between sensitivity and scale has been noted by other authors, who have used it to develop adaptive multiscale estimation algorithms for parameter estimation driven by the information content of the data [Grimstad and Mannseth, 2000; Grimstad et al., 2003; Berre et al., 2009].
When wavelet expansions are used to represent parameter fields in inverse problems, truncation of small coefficients has been a common method of regularization. It was, for example, applied by Tangborn to an extended (not ensemble) Kalman filter to reduce the dimension of the error covariance matrix. For assimilation of observations from Burgers' equation, accurate results were obtained when the covariance was projected onto a wavelet basis that used only 6% of the coefficients. Pannekoucke et al. showed that diagonalization of the covariance matrix in the wavelet domain could be effective at reducing the magnitude of spurious correlations in the ensemble approximation of the model covariance matrix while still retaining the ability to allow the covariance to vary spatially. In their examples [and the examples of Buehner and Charron, 2007], the length scale of the covariance was spatially variable, but did not include long-range correlations due to assimilation of nonlocal observations. Observations of spatial averages of properties typically result in long-range, weak, negative correlations of local variables and in large-magnitude off-diagonal elements of the covariance matrix of the wavelet coefficients, in which case diagonalization of the wavelet transform of the error covariance is not an appropriate regularization method.
Various multiscale parameterizations have been used with the EnKF previously. Zhou et al. introduced an ensemble multiscale filter that combines the EnKF with a multiscale autoregressive framework in order to improve the efficiency of the EnKF for high-dimensional dynamic systems with large amounts of data. By truncating states of the coarser multiscale tree nodes and diagonalizing the mapping matrix between adjacent scales, the ensemble multiscale filter was able to provide a certain degree of localization in scale. Zhang and Oliver and Chen and Oliver [2010b] parameterized the model variables in terms of a global trend model and local fluctuations to improve the estimate over large regions. Distance-based localization was then applied only to the updates of local heterogeneity. Pajonk et al. showed improved estimation from the EnKF for a linear advection model with local observations by using wavelet analysis with distance-based localization. In Pajonk et al., the Kalman gain is an interpolation among multiple per-level Kalman gains that are computed from ensembles constructed with various degrees of high-frequency components truncated in the wavelet domain.
 The objective of this paper is to show that when the observations are spatially nonlocal, there are advantages to the application of adaptive localization methods for the EnKF in the wavelet domain, to improve the estimation of the Kalman gain and to reduce the bias in reproducing data. In the remainder of the paper, we briefly review the EnKF, the bootstrap-based adaptive localization for screening of Kalman gain and the discrete wavelet expansion. The benefit of using wavelet expansion with adaptive localization is first demonstrated using a simple one-dimensional (1-D) problem for which the observations are spatial averages of properties. The true Kalman gain can be easily computed in this example so that the effect of adaptive localization on the root-mean-square error (RMSE) and bias in the Kalman gain estimate and on the ability to reproduce the observations can be investigated. The other two examples are two-dimensional groundwater flow and transport problems, in which the pressure head at an observation well or the spatial distribution of the concentration of an injected pulse of tracer (a contrast agent) is observed at several times. In all of the examples, results from three methods of computing the Kalman gain from the ensemble of samples are compared: (1) a standard, perturbed-observation, form of the EnKF [Burgers et al., 1998; Houtekamer and Mitchell, 1998] with the usual gridblock-based local parameterization, (2) a screened form of the EnKF which uses bootstrap resampling of the Kalman gain in the gridblock domain to remove spurious estimates [Zhang and Oliver, 2010], and (3) a screened form of the EnKF which uses bootstrap resampling of the Kalman gain in the wavelet domain.
Various methods that are important for the data assimilation algorithm in this paper are described separately in section 2. The basic method is the EnKF, which can be thought of as a Monte Carlo approximation of the Kalman filter. Because the EnKF is a Monte Carlo method, the results can be affected by sampling errors and insufficient degrees of freedom, so it is often necessary to use some type of localization [Chen and Oliver, 2010a]. In this paper we will use the adaptive localization method of Zhang and Oliver [2010] and investigate the effect of using a multiscale (wavelet) basis for the parameterization. In addition, we emphasize that there is no advantage to applying an invertible linear transformation, such as a wavelet transform, to the state variables unless it is accompanied by some type of localization or regularization in the transformed domain.
2.1. Ensemble Kalman Filter
The ensemble Kalman filter [Evensen, 1994] is a sequential data assimilation method for large-scale dynamical systems. Typical applications of the EnKF in hydrology include both state estimation [e.g., Reichle et al., 2002] and combined model and state estimation [e.g., Chen and Zhang, 2006; Liu et al., 2008]. In the EnKF, an ensemble of Ne model realizations is integrated in time using the governing equations of the underlying dynamics. This ensemble of realizations is updated at times when there are observed data to be assimilated. At each data assimilation time, every model realization is updated using

$y_j^a = y_j^f + K_e \left( d_{obs,j} - g(y_j^f) \right),$     (1)
where $y_j^f$ denotes the jth ensemble member before assimilation of observed data and $y_j^a$ denotes the corresponding updated ensemble member. The simulated data are related to the model realization through a (generally nonlinear) function g. If the relationship between data and model variables is linear, we use G to denote the linear operator, $d = G y$. This G is often referred to as the sensitivity matrix of the data to the model variables. The matrix $K_e$ in equation (1) is the Kalman gain, estimated from the ensemble as

$K_e = C_{YD} \left( C_{DD} + C_D \right)^{-1}.$     (2)
Here $C_{YD}$ is the cross covariance between the forecast state ($y^f$) and the simulated data ($d = g(y^f)$), $C_{DD}$ is the covariance of the simulated data, and $C_D$ is the covariance of the observation errors. The observation errors are often assumed to be uncorrelated. We make the same assumption in this study, so $C_D$ is a diagonal matrix with the variances of the observation errors on its diagonal. Because the estimate of the Kalman gain depends on estimates of the covariances between many variables from a small number of samples, it is usually affected strongly by sampling errors. More thorough discussions of the EnKF can be found in Evensen, Aanonsen et al., and Oliver and Chen.
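As a concrete illustration of equation (2), the ensemble estimate of the Kalman gain can be sketched for the special case of a single scalar observation, for which the matrix inversion reduces to a scalar division. This is a minimal pure-Python sketch; the function and variable names are ours, not from the paper, and the synthetic ensemble is purely illustrative.

```python
import random

def ensemble_kalman_gain(Y, D, obs_err_var):
    """Estimate K_e = C_YD (C_DD + C_D)^{-1} from an ensemble, for the
    special case of a single scalar observation (C_DD is then a scalar)."""
    ne = len(Y)        # ensemble size Ne
    ny = len(Y[0])     # number of model variables
    d_bar = sum(D) / ne
    y_bar = [sum(y[i] for y in Y) / ne for i in range(ny)]
    # sample cross covariance C_YD (a vector) and data covariance C_DD (a scalar)
    c_yd = [sum((Y[j][i] - y_bar[i]) * (D[j] - d_bar) for j in range(ne)) / (ne - 1)
            for i in range(ny)]
    c_dd = sum((d - d_bar) ** 2 for d in D) / (ne - 1)
    return [c / (c_dd + obs_err_var) for c in c_yd]

random.seed(0)
ne, ny = 50, 8
Y = [[random.gauss(0.0, 1.0) for _ in range(ny)] for _ in range(ne)]
# a nonlocal observation: the average of the first four model variables
D = [sum(y[:4]) / 4.0 for y in Y]
K = ensemble_kalman_gain(Y, D, obs_err_var=0.05 ** 2)
```

For this ensemble, the gain is large for the four variables entering the observed average and near zero (up to sampling noise) for the rest, which is the pattern of weak spurious correlations discussed below.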
2.2. Adaptive Localization
The objective of localization in the EnKF is to minimize the difference between the true Kalman gain and the estimate of the Kalman gain computed from a small ensemble of samples. In many problems it is reasonable to use distance-based tapering functions to regularize the estimate. For problems in which physical distance is meaningless (e.g., problems in which the parameters apply to the entire model, such as the correlation length in the prior variogram model or parameters describing the specific hydraulic conductivity curves), distance-based localization is not appropriate, and other types of “localization” are required. In adaptive localization, the goal is still to minimize the difference between the true Kalman gain and the estimate of the Kalman gain, but without any assumptions of spatial dependence. Equivalently, if the Kalman gain is estimated in the standard way, then one might seek to compute a screening matrix A such that when it multiplies, elementwise, the standard estimate of the Kalman gain matrix $K_e$, the Frobenius norm of the difference between the product and the true Kalman gain K is minimized, i.e., compute the A that minimizes

$\left\| A \circ K_e - K \right\|_F^2,$     (3)
where “∘” denotes the Schur (elementwise) matrix product, and replace the standard Kalman gain estimate $K_e$ in equation (1) by $A \circ K_e$ for the updates. The screening matrix A has the same dimensions as the Kalman gain matrix. In the following we refer to the elements of the matrix A by $\alpha_{ij}$. Furrer and Bengtsson used a similar approach for computing the optimal taper function for covariance localization when the covariance function is known. Anderson [2007] developed an adaptive method in which an estimate of the variability of the $K_{ij}$ was obtained from several independent ensembles of samples. In this paper we apply the localization approach of Zhang and Oliver [2010], in which a regularized variant of equation (3) that causes shrinkage of the gain toward zero is minimized [Zhang and Oliver, 2010, equation (4)]. For a reasonable choice of the regularization parameter, the solution for $\alpha_{ij}$ is

$\alpha_{ij} = \frac{\bar{K}_{ij}^2}{\bar{K}_{ij}^2 + \hat{\sigma}_{ij}^2},$     (4)
where $\bar{K}_{ij}$ and $\hat{\sigma}_{ij}^2$ are the mean and variance, respectively, of the elements $K_{ij}$ of the Kalman gain estimates. The realizations in the ensemble are resampled with replacement to form new ensembles, each with the same number of realizations as the original ensemble, in order to compute an approximation of the variance of the Kalman gain estimate ($\hat{\sigma}_{ij}^2$). This resampling is often referred to as bootstrap resampling [Efron and Tibshirani, 1993]. The screening factor can also be thought of as an indicator of the reliability of the estimates of the elements of the Kalman gain. When the variance of the estimate is large and the mean is small, the estimate is unreliable and the corresponding entry in the matrix A will be small, so that updates from unreliable estimates are damped. The values of the elements of A range between 0 and 1.
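The bootstrap screening described above can be sketched as follows for a single scalar observation. Note that the shrinkage rule alpha = mean^2 / (mean^2 + variance) used here is an illustrative choice with the limiting behavior described in the text (alpha near 1 for reliable elements, near 0 for unreliable ones), not the exact regularized expression of Zhang and Oliver [2010]; all names are ours.

```python
import random

def kalman_gain(Y, D, obs_err_var):
    """Ensemble gain estimate for a single scalar observation."""
    ne, ny = len(Y), len(Y[0])
    d_bar = sum(D) / ne
    y_bar = [sum(y[i] for y in Y) / ne for i in range(ny)]
    c_dd = sum((d - d_bar) ** 2 for d in D) / (ne - 1)
    c_yd = [sum((Y[j][i] - y_bar[i]) * (D[j] - d_bar) for j in range(ne)) / (ne - 1)
            for i in range(ny)]
    return [c / (c_dd + obs_err_var) for c in c_yd]

def bootstrap_screening(Y, D, obs_err_var, n_boot=50, seed=1):
    """Bootstrap estimate of the screening factors alpha_i for one observation.
    alpha = mean^2 / (mean^2 + var) is an illustrative shrinkage rule, not the
    exact regularized expression of Zhang and Oliver [2010]."""
    rng = random.Random(seed)
    ne, ny = len(Y), len(Y[0])
    gains = []
    for _ in range(n_boot):
        idx = [rng.randrange(ne) for _ in range(ne)]  # resample with replacement
        gains.append(kalman_gain([Y[j] for j in idx], [D[j] for j in idx], obs_err_var))
    alpha = []
    for i in range(ny):
        vals = [g[i] for g in gains]
        mean = sum(vals) / n_boot
        var = sum((v - mean) ** 2 for v in vals) / (n_boot - 1)
        alpha.append(mean * mean / (mean * mean + var))
    return alpha

random.seed(0)
Y = [[random.gauss(0.0, 1.0) for _ in range(8)] for _ in range(50)]
D = [sum(y[:4]) / 4.0 for y in Y]  # observed average of the first four variables
alpha = bootstrap_screening(Y, D, obs_err_var=0.05 ** 2)
```

The screening factors for the four variables that enter the observed average come out close to one, while the factors for the uncorrelated variables are damped toward zero.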
2.3. Discrete Wavelet Expansion With Localization
It is advantageous if some of the basis vectors in a reparameterized model are of a similar scale to the features in the true Kalman gain or in the true covariance between data and model/state variables. If the sensitivity matrix G for a general nonlinear observation operator were known, it would be possible to compute a set of basis vectors that concentrates all of the data sensitivity in a small number of coefficients. For example, one might use a basis composed of the right singular vectors of the dimensionless sensitivity matrix $G C_Y^{1/2}$ [Rodrigues, 2006], where $C_Y$ is the covariance of model variables. In general, however, the sensitivity coefficients are unknown, so improvements must be obtained through the use of a multiscale representation that contains new parameters whose support is similar to the support of the observations. We use a discrete wavelet expansion, which contains basis vectors at multiple scales and locations [Strang and Nguyen, 1996]. Perhaps the simplest wavelet basis is the Haar basis, for which two basis vectors in 1-D are shown in Figures 1(a) and 1(b). Note that the localized support of the wavelet basis vectors in the spatial domain makes the discrete wavelet expansion more suitable for localization than expansions whose basis vectors extend through the entire domain (e.g., Karhunen–Loève and discrete cosine transform). Another basis in the same family of wavelets is the second-order orthonormal wavelet basis with compact support [Daubechies, 1988], sometimes referred to as the four-coefficient Daubechies wavelet [Press et al., 1992, p. 592]. This wavelet is sometimes better suited than the Haar wavelet for expansion of continuous properties. Two basis vectors for the four-coefficient Daubechies wavelet expansion in 1-D are shown in Figures 1(d) and 1(e). In the 2-D groundwater flow and transport examples in this paper, a 2-D wavelet expansion is used.
The 2-D wavelets we use are simply tensor products of 1-D wavelets, which might not be optimal for anisotropic property fields. Examples of 2-D basis vectors for the Haar wavelet and the four-coefficient Daubechies wavelet are shown in Figures 1(c) and 1(f).
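For readers unfamiliar with the Haar expansion, a full-depth 1-D orthonormal Haar transform and its inverse can be sketched in a few lines. A power-of-two signal length is assumed, and the function names are ours; this is a sketch, not a production wavelet library.

```python
import math

def haar_forward(x):
    """Full-depth orthonormal Haar transform; len(x) must be a power of two.
    Output ordering: [scaling coefficient, coarse details, ..., fine details]."""
    x = list(x)
    out = []
    while len(x) > 1:
        avg = [(x[2 * i] + x[2 * i + 1]) / math.sqrt(2.0) for i in range(len(x) // 2)]
        det = [(x[2 * i] - x[2 * i + 1]) / math.sqrt(2.0) for i in range(len(x) // 2)]
        out = det + out  # coarser-level details are prepended, so coarse scales come first
        x = avg
    return x + out

def haar_inverse(c):
    """Inverse of haar_forward."""
    x = [c[0]]
    pos = 1
    while pos < len(c):
        det = c[pos:pos + len(x)]
        x = [v for a, d in zip(x, det)
             for v in ((a + d) / math.sqrt(2.0), (a - d) / math.sqrt(2.0))]
        pos += len(det)
    return x

# a constant signal has a single nonzero (scaling) coefficient
coeffs = haar_forward([1.0, 1.0, 1.0, 1.0])
# the transform is orthonormal, so the round trip is exact
x = [3.0, 1.0, 4.0, 1.0, 5.0, 9.0, 2.0, 6.0]
rec = haar_inverse(haar_forward(x))
```

The compact support of each basis vector is visible in the construction: every detail coefficient depends only on a contiguous block of the signal, which is the localization property exploited later.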
We emphasize, however, that there is no advantage to applying a linear transformation to the state variables unless it is accompanied by some type of regularization/localization in the transformed domain. The simplest form of regularization in the transformed domain is truncation, e.g., truncating small singular values in the Karhunen–Loève decomposition or truncating small coefficients in the discrete cosine or discrete wavelet transform. Following equations (1) and (2), in the standard EnKF each realization j of the ensemble is updated using

$y_j^a = y_j^f + \Delta Y \, \Delta D^T \left( \Delta D \, \Delta D^T + (N_e - 1) C_D \right)^{-1} \left( d_{obs,j} - d_j \right),$
where $\Delta Y$ and $\Delta D$ are the matrices of mean-removed realizations of the state vector and the predicted data vector, respectively. If a linear transformation W is applied to the state vector y and the updates are computed in the transformed domain, the inverse transform of the updates is identical to the result obtained by computing in the usual spatial domain and updating the standard parameters:

$W^{-1} \left( W y_j^f + W K_e \left( d_{obs,j} - d_j \right) \right) = y_j^f + K_e \left( d_{obs,j} - d_j \right) = y_j^a.$
On the other hand, the wavelet transform applied to the Kalman gain often results in a matrix in which there is a greater separation between elements with large and small magnitudes than in the usual spatial domain. When equation (4) is used to compute the screening matrix for the wavelet-domain Kalman gain, the resulting screening matrix is different from the screening matrix for the spatial domain because of the differences in the magnitudes of the elements and in the variability of the estimates.
2.4. Three Approaches to Data Assimilation
We will investigate three methods of estimating the Kalman gain from an ensemble of state variables. We denote the three estimates of the Kalman gain by K1, K2, and K3 in the following description.
2.4.1. No Localization
Estimate the Kalman gain K1 from the ensemble of realizations of model variables (in the usual gridblock-based local parameterization) and predicted data, without localization:

$K_1 = C_{YD} \left( C_{DD} + C_D \right)^{-1}.$
2.4.2. Adaptive Localization
Estimate the Kalman gain from the ensemble of realizations of model variables and predicted data, then apply adaptive localization by elementwise multiplication with the screening matrix A of section 2.2:

$K_2 = A \circ \left[ C_{YD} \left( C_{DD} + C_D \right)^{-1} \right].$
2.4.3. Adaptive Localization With Wavelet Parameterization
In this approach, we first apply the wavelet transform to obtain an ensemble of wavelet coefficients, without truncation of any coefficient. From the ensemble of realizations of the wavelet coefficients and predicted data, we estimate the Kalman gain for updating the wavelet coefficients. We then apply adaptive localization to the estimate in the wavelet domain, and finally apply the inverse transform to obtain a Kalman gain matrix in the usual gridblock domain:

$K_3 = W^{-1} \left( A_w \circ \left[ W \, C_{YD} \left( C_{DD} + C_D \right)^{-1} \right] \right),$
where W is the wavelet transform, $W^{-1}$ is the inverse wavelet transform, and $A_w$ is the matrix of screening coefficients for the Kalman gain in the wavelet domain.
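A minimal sketch of this third approach for a single column of the Kalman gain, using the Haar transform as W (the helper functions and the illustrative screening vector are ours). A screening matrix of all ones recovers the unlocalized gain exactly, consistent with the invertibility argument of section 2.3; damping the fine-scale coefficients preserves the large-scale structure of the gain.

```python
import math

def haar_forward(x):
    """Orthonormal Haar transform; len(x) must be a power of two."""
    x, out = list(x), []
    while len(x) > 1:
        avg = [(x[2 * i] + x[2 * i + 1]) / math.sqrt(2.0) for i in range(len(x) // 2)]
        det = [(x[2 * i] - x[2 * i + 1]) / math.sqrt(2.0) for i in range(len(x) // 2)]
        out = det + out
        x = avg
    return x + out

def haar_inverse(c):
    x, pos = [c[0]], 1
    while pos < len(c):
        det = c[pos:pos + len(x)]
        x = [v for a, d in zip(x, det)
             for v in ((a + d) / math.sqrt(2.0), (a - d) / math.sqrt(2.0))]
        pos += len(det)
    return x

def screen_in_wavelet_domain(k_col, alpha_w):
    """One column of K3 = W^{-1}(A_w o (W K1)) for the Haar transform W."""
    return haar_inverse([a * c for a, c in zip(alpha_w, haar_forward(k_col))])

k_col = [0.9, 0.8, 0.7, 0.6, 0.1, 0.05, 0.02, 0.0]
# A_w of all ones is a pure transform and inverse transform: no screening
unscreened = screen_in_wavelet_domain(k_col, [1.0] * 8)
# zeroing the finest-scale coefficients keeps the large-scale structure of the gain
screened = screen_in_wavelet_domain(k_col, [1.0, 1.0, 1.0, 1.0, 0.0, 0.0, 0.0, 0.0])
```

Because the Haar detail vectors have zero mean, zeroing only detail coefficients leaves the sum (and hence the large-scale average) of the gain column unchanged.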
3. Linear 1-D Investigation
For the linear 1-D test problem, the ensemble members are samples of a multi-Gaussian vector of length 128 with zero mean and a covariance function in the exponential family with a practical correlation range of 10,

$C(s) = \exp\left( -3 |s| / 10 \right),$
where s is the separation distance in 1-D. We observe the averages of the first 16 and the first 64 elements. The observation operator is $G = [g_1 \; g_2]^T$, where $g_1$ is a column vector of length 128 with all zero elements except for the first 16 elements, which are equal to 1/16, and $g_2$ is a column vector of length 128 with all zero elements except for the first 64 elements, which are equal to 1/64. The observed averages are both equal to 1.0 (data to be matched). Errors in the observations are assumed to have a Gaussian distribution with zero mean and standard deviation of 0.05. In the case with wavelet parameterization, we use the Haar wavelet, for which the maximum decomposition depth is seven. Because the number of variables in the example is small, the data relationship is linear, and the errors in the observations are Gaussian, the true cross covariance and Kalman gain for updating the model variables can be computed easily. Figure 2 shows columns of the true cross covariance and Kalman gain for both observations. Although the sensitivity is nonzero only in the intervals 1 ≤ x ≤ 16 and 1 ≤ x ≤ 64 for the two observations, respectively, the nonzero cross covariance extends beyond these intervals because of the spatial correlation of the variables. The features of the Kalman gain are further complicated by the interaction between the two observations through the inversion term in equation (2). Since there are no dynamics in this 1-D example, the two data are assimilated simultaneously, so there is only one analysis step for the EnKF.
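Because the example is linear and Gaussian, the true Kalman gain has the closed form $K = C G^T (G C G^T + C_D)^{-1}$. A pure-Python sketch of this computation for the setup above (variable names are ours; the 2 × 2 inversion is done explicitly):

```python
import math

n = 128
# exponential covariance with practical range 10: C_ij = exp(-3 |i - j| / 10)
C = [[math.exp(-3.0 * abs(i - j) / 10.0) for j in range(n)] for i in range(n)]
# observation operator: averages of the first 16 and the first 64 variables
g1 = [1.0 / 16 if i < 16 else 0.0 for i in range(n)]
g2 = [1.0 / 64 if i < 64 else 0.0 for i in range(n)]
G = [g1, g2]
CD = [[0.05 ** 2, 0.0], [0.0, 0.05 ** 2]]  # uncorrelated observation errors

# C G^T (n x 2)
CGt = [[sum(C[i][k] * G[m][k] for k in range(n)) for m in range(2)] for i in range(n)]
# S = G C G^T + C_D (2 x 2)
S = [[sum(G[a][i] * CGt[i][b] for i in range(n)) + CD[a][b] for b in range(2)]
     for a in range(2)]
det = S[0][0] * S[1][1] - S[0][1] * S[1][0]
Sinv = [[S[1][1] / det, -S[0][1] / det], [-S[1][0] / det, S[0][0] / det]]
# true Kalman gain, n x 2
K_true = [[sum(CGt[i][a] * Sinv[a][b] for a in range(2)) for b in range(2)]
          for i in range(n)]
```

The gain decays away from the observed intervals, reflecting the weak long-range correlations that the cross covariance inherits from the prior.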
The primary test of the accuracy of the methods with different estimates of the Kalman gain will be comparison of the Kalman gain estimate with the true Kalman gain. We compute the root-mean-square error (RMSE) of each element $K_{ij}$ of the Kalman gain estimate from $N_r$ independent ensembles:

$\mathrm{RMSE}_{ij} = \sqrt{\frac{1}{N_r} \sum_{r=1}^{N_r} \left( K_{ij}^{(r)} - K_{ij}^{\mathrm{true}} \right)^2}.$
The experiment is repeated with independent ensembles in order to reduce the effect of variability in the initial ensemble. In this 1-D example, we choose $N_r = 100$, so 100 Kalman gain estimates are computed from 100 independent ensembles of samples using the three methods described in section 2.4.
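The elementwise RMSE bookkeeping can be sketched as follows (the "true" gain and the noisy estimates below are made-up numbers, purely to illustrate the computation):

```python
import math

def rmse_elements(estimates, k_true):
    """RMSE of each element of the Kalman gain over Nr independent estimates."""
    nr = len(estimates)
    return [math.sqrt(sum((est[i] - k_true[i]) ** 2 for est in estimates) / nr)
            for i in range(len(k_true))]

# illustrative: three noisy "estimates" of a known 4-element gain column
k_true = [1.0, 0.5, 0.1, 0.0]
estimates = [[1.1, 0.4, 0.2, 0.1],
             [0.9, 0.6, 0.0, -0.1],
             [1.0, 0.5, 0.1, 0.0]]
rmse = rmse_elements(estimates, k_true)
```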
 The average RMSE of the two columns of the Kalman gain estimate are shown in Figure 3 for several ensemble sizes on a log-log scale. In this case, the EnKF with the usual gridblock basis and without localization (dashed line with open circles) always has the largest errors while the EnKF with the wavelet parameterization and adaptive localization (solid line with red triangles) always has the smallest errors. This result, however, would not be true for every observation operator and for every covariance function, because the various methods do not have the same spatial distribution of errors.
Figure 4 shows the RMSE of all elements of the Kalman gain corresponding to the two data for the three methods with two ensemble sizes, 10 and 100. When the Kalman gain is estimated using the usual gridblock basis, without any localization, the error is fairly uniform and has relatively high magnitude (open circles). When adaptive localization is used (black solid circles), the magnitude of the error drops dramatically in the region x > 64, where the data sensitivity is zero, but increases slightly compared to the standard method in the region x ≤ 64. This is because, while adaptive localization eliminates spurious correlations, it also eliminates real correlations that are weak. Finally, when adaptive localization is used with a wavelet basis (red triangles), the magnitude of the error is generally decreased in both regions relative to the standard method without localization. As shown by the red triangles, the use of the Haar wavelet expansion introduces some spikiness at locations with a sudden transition from nonzero sensitivity (G) to zero sensitivity for the two data. The Kalman gain is continuous, however, due to the prior covariance of the model variables. The smooth transition zones around x = 16 and x = 64 in the Kalman gain (Figure 2(b)) cannot be represented by low-order Haar wavelets, which have only abrupt changes (Figures 1(a) to 1(c)). Note that the restriction to low order comes from the damping of the fine-scale coefficients by the adaptive screening.
As mentioned in the discussion of Figure 4, one problem with any localization method is that it tends to eliminate some weak, but real, correlations along with the spurious correlations. For point measurements of the model variables, this may not be a serious problem because the correlation of the observation to the model variable is high. For an averaged or nonlocal observation, however, the correlations of the observation to all model variables are weak, and they may all be eliminated or reduced in magnitude by the adaptive localization. After updating, if the model is used to compute the predicted data, we find that the predictions from the model do not match the data, even when the relationship is linear. Table 1 shows the average of the mean and standard deviation of the predicted data from the updated ensembles using the three methods with various ensemble sizes from the 100 independent ensemble runs. The true expected value for the predicted data after data assimilation is 0.999 with a standard deviation of 0.05 for the first observation, and 0.981 with a standard deviation of 0.05 for the second observation. The expected value of the conditional observation is less than 1.0 because it is a weighted average of the value of the observation and the value of the observation operator applied to the prior model variables. Although the Kalman gain estimated from the ensemble without localization using the usual gridblock parameterization has a large RMSE, as shown in Figures 3 and 4, its use in the EnKF results in updated models that reproduce the data, while the results from both approaches that use localization are biased. The predicted data from the wavelet parameterization with adaptive localization, however, are less biased than those from the usual gridblock parameterization with adaptive localization.
Table 1. Match to the Two Data (Average Over the Interval 1 to 16 and Average Over the Interval 1 to 64) After Data Assimilation^a
[Table values not recovered; columns give results for Ne = 10, 25, and 100 for each of the three methods, including Wavelet w/Adapt Loc.]
^a The number in front of the “/” is the average mean and the number after the “/” is the average standard deviation. The average is taken over 100 independent ensembles.
Figure 5 shows the mean and standard deviation of the Kalman gain estimates from the three methods from 100 independent ensembles with ensemble sizes of 10 and 100. It illustrates why the two adaptive methods have larger or smaller RMSE and why the predicted data are biased. When no localization is used (left column of Figure 5), there is no bias in the estimate of the Kalman gain, but the variability is large everywhere. When adaptive localization is used with the usual gridblock parameterization (middle column of Figure 5), the variability in the Kalman gain is small and the estimate is unbiased in the region x > 64, but highly biased and highly variable in the region x ≤ 64. This is a result of reducing the magnitude of the Kalman gain in the region of weak but nonzero correlation. For the wavelet expansion with adaptive localization in the wavelet domain (right column of Figure 5), the estimate is still biased in the region x ≤ 64, but less so than when the standard gridblock parameterization was used. In this case, the correlations with a few coefficients of large-scale variation were strong and were not significantly reduced in magnitude by the adaptive screening. In addition, the variability is smaller with the adaptive localization in the wavelet domain.
If the covariance of model variables is nearly diagonal in the wavelet domain, diagonalization of the covariance in the wavelet domain could result in improved estimates of the covariance (denoted here by $\tilde{C}_Y$). The Kalman gain matrix (equation (2)) can also be expressed in terms of the sensitivity G and the covariance of model variables $C_Y$ as $K = C_Y G^T (G C_Y G^T + C_D)^{-1}$. The estimate of the Kalman gain can alternatively be regularized by using $\tilde{C}_Y$ in this expression. Experiments showed, however, that if diagonalization is used as a method of regularization, it is necessary to have an accurate estimate of the observation operator G in order to reduce the RMSE of the Kalman gain estimate. When the observation operator is a complex nonlinear function representing solutions of a system of partial differential equations, as in section 4, the linearization G is often unknown. When a G estimated from the ensemble is used with $\tilde{C}_Y$, the resulting estimate of the Kalman gain is worse in some cases than that from the adaptive localization approaches.
 The large magnitude of the spurious correlations in the Kalman gain estimate when localization is not used results in underestimation of the posterior variance after data assimilation. Figure 6 compares the average estimate of the posterior variance for ensemble of size 10 using the three methods for estimating the Kalman gain. Using adaptive localization with both the usual gridblock basis and Haar wavelet basis gives improved estimates of the posterior variance compared to the case without localization, but the choice of basis did not have much effect on the posterior variance estimate.
Recall that we chose a multiscale parameterization because the parameters associated with large scales, which correspond to the scale of the data sensitivity, may have greater sensitivity to the data and should be estimated more reliably from small ensembles. We would therefore expect the elements of the screening matrix to be closer to 1.0 for the parameters associated with larger scales. Figure 7 compares the mean screening value for all wavelet coefficients computed from 100 independent ensembles of various sizes. Each subplot shows the mean screening value versus the Haar wavelet coefficient index. The lower indices are generally associated with larger scales (e.g., the first coefficient is associated with the mean value over the interval 1–128, while the second is associated with the difference between the mean of the first 64 and the mean of the last 64 variables). The last 64 coefficients in the wavelet expansion are all associated with differences between values of the variables in adjacent gridblocks. When the ensemble size is small (left column of Figure 7), the screening values are small for all small-scale features. The large-scale features (low index) are estimated fairly well at all ensemble sizes, so the screening values for large-scale features are close to 1.0 when the wavelet expansion is used.
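The correspondence between coefficient index and spatial scale described above can be verified with a short, self-contained orthonormal Haar transform (a minimal sketch written for this illustration; the paper's implementation is not specified):

```python
import numpy as np

def haar_forward(x):
    """Full-depth orthonormal Haar transform of a length-2^k vector.

    Coefficients are ordered [approximation, coarsest details, ..., finest
    details], so low indices correspond to large spatial scales.
    """
    x = np.asarray(x, dtype=float)
    details = []
    while x.size > 1:
        a = (x[0::2] + x[1::2]) / np.sqrt(2.0)  # running pairwise averages
        d = (x[0::2] - x[1::2]) / np.sqrt(2.0)  # pairwise differences
        details.append(d)
        x = a
    return np.concatenate([x] + details[::-1])

x = np.arange(128, dtype=float)
c = haar_forward(x)
# c[0] is proportional to the mean over all 128 variables, c[1] to the
# difference between the means of the first and last 64 variables, and the
# last 64 coefficients to differences of adjacent gridblock values.
```

For a constant input, only the first coefficient is nonzero, consistent with the interpretation of the first coefficient as a scaled mean; the transform is orthonormal, so coefficient energy equals the energy of the input.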
4. 2-D Flow and Transport Examples
In section 4 we compare results from the EnKF with three different estimates of the Kalman gain, as described in section 2.4, using two flow and transport examples. In both examples, data are taken at multiple times and are assimilated sequentially, as opposed to the single update in the linear 1-D example. In the first example, the number of data at each assimilation time is small: a single pressure observation at an observation well. Localization is not necessary to increase the degrees of freedom for assimilation of the data in the EnKF, but it is useful for reducing the effect of spurious correlations. In the second example we assume that the spatial distribution of the concentration of an injectant can be measured at several times, so the number of data is large. In this case localization is necessary both for assimilating the data and for reducing the spurious correlations. In both examples, the maximum decomposition depth of the wavelets is ten. For the four-coefficient Daubechies wavelet, periodic boundary conditions are assumed in the expansion. The flow and solute transport equations are solved using MODFLOW2K [Harbaugh et al., 2000] and MT3DMS [Zheng and Wang, 1999].
4.1. Pressure Drawdown Measurement
We consider a slightly compressible single-phase 2-D flow problem with closed boundaries. There is one pumping well extracting fluid at a constant rate and one observation well that monitors pressure drawdown. The pressure drawdown data are sensitive to permeability over a large region [Oliver, 1993, Figure 1]. Figure 8(a) shows the reference log-transformed hydraulic conductivity field with the well locations. The size of the grid is . The reference field is drawn from a multivariate normal distribution with an anisotropic exponential covariance function whose principal direction is along the 45 deg line. The standard deviation of the field is 1.2. Figures 8(b) and 8(c) show the coefficients of the Haar and four-coefficient Daubechies wavelet expansions of the reference field. In both cases, the wavelet expansions give a much sparser representation of the field. The magnitudes of the coefficients associated with larger scales in the wavelet basis are generally much larger than those of the coefficients representing small-scale features.
Figure 9 shows the pressure measured at the observation well as a function of time. Only the pressures observed at times 20 and 30 are used in the EnKF to estimate the field; the two observation times are indicated by the red dots in Figure 9. The standard deviation of the errors in the data is assumed to be 1. Figure 10 compares the ability of each of the methods to generate fields that reproduce the data (reruns using the final estimates of the model variables). The results are based on 20 independent ensembles with 50 realizations each, so each box in Figure 10 represents the statistical measures from the 20 runs. The initial realizations are drawn from the same distribution as the reference field. As in the linear 1-D example in section 3, adaptive localization with the usual gridblock basis shows the largest data mismatch because the adaptive localization has removed the real, but weak, correlations of pressure drawdown with the hydraulic conductivity of individual gridblocks. Adaptive localization with a wavelet basis gives an improved data match; in this example the data match from adaptive localization using either the Haar or the four-coefficient Daubechies wavelet is similar to that obtained without localization.
Based only on the comparison of the data mismatch in Figure 10, the standard EnKF without localization appears to match the data as well as the methods with multiscale expansion and adaptive localization of the Kalman gain. The purpose of localization in this example, however, is to reduce the effects of spurious correlations in the Kalman gain estimate, which result in errors in the model updates. Figure 11(a) shows the Kalman gain relating the pressure observation to the gridblock values, computed from a large ensemble with 1000 realizations at the first data assimilation time. The large ensemble consists of the initial realizations of all 20 ensembles described earlier. The Kalman gain from this large ensemble is treated as the true Kalman gain for computing the RMSE of the Kalman gain estimates obtained from the 20 independent ensembles of 50 realizations using the various methods. The approximation to the true Kalman gain shows negative correlation between the pressure observation and the hydraulic conductivity in the region between the pumping well and the observation well. The feature aligned along the diagonal is due to the anisotropic covariance of the hydraulic conductivity field. Figures 11(b) to 11(e) show, on the same scale, the RMSE of the Kalman gain estimates at the first data time for the cases of no localization, bootstrap localization, and bootstrap localization with the four-coefficient Daubechies and Haar bases. The wavelet parameterization with adaptive localization clearly gives the lowest RMSE in the Kalman gain estimate, in addition to the good match to the data (shown in Figure 10).
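The bootstrap localization referred to above damps entries of the ensemble Kalman gain whose bootstrap variability is large relative to their magnitude. The sketch below illustrates the idea; the damping form 1/(1 + R), with R the relative bootstrap variance, is an illustrative choice and not necessarily the exact screening factor used in the paper, and the single averaging datum is an assumed toy setup:

```python
import numpy as np

def kalman_gain(M, D, C_d):
    """Ensemble estimate of the Kalman gain."""
    n_e = M.shape[1]
    dM = M - M.mean(1, keepdims=True)
    dD = D - D.mean(1, keepdims=True)
    return (dM @ dD.T / (n_e - 1)) @ np.linalg.inv(dD @ dD.T / (n_e - 1) + C_d)

def bootstrap_screened_gain(M, D, C_d, n_boot=50, rng=None):
    """Damp gain entries with large bootstrap variability (sketch)."""
    if rng is None:
        rng = np.random.default_rng(0)
    n_e = M.shape[1]
    K = kalman_gain(M, D, C_d)
    # Recompute the gain on resampled (with replacement) ensembles.
    boots = np.stack([kalman_gain(M[:, idx], D[:, idx], C_d)
                      for idx in (rng.integers(0, n_e, n_e)
                                  for _ in range(n_boot))])
    R = boots.var(axis=0) / np.maximum(K**2, 1e-12)  # relative bootstrap variance
    rho = 1.0 / (1.0 + R)   # screening factor: near 1 where K is stable, near 0 where noisy
    return rho * K, rho

# Assumed toy problem: 128 gridblock variables, one spatial-average datum.
rng = np.random.default_rng(1)
M = rng.standard_normal((128, 10))
D = (np.ones((1, 128)) / 128) @ M
K_screened, rho = bootstrap_screened_gain(M, D, np.array([[0.01]]))
```

By construction the screening factor lies between 0 and 1, so the screened gain never exceeds the raw ensemble estimate in magnitude.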
4.2. Spatial Tracer Concentration Observations
In the previous example, because there is only one observation at each data assimilation time, localization is not necessary to prevent filter divergence when the EnKF is used with a moderate ensemble size. The amount of data, however, is generally quite large in fields with many observation wells or sample locations, and localization of some sort is then typically necessary to limit the collapse of ensemble variability. In section 4.2 we consider a situation in which observations of the spatial concentration of an injectant are used to improve the estimate of the hydraulic conductivity field. We refer to the injectant as a tracer, although we assume it has properties, such as a conductivity contrast, that allow its spatial distribution to be monitored. Figure 12(a) shows the reference log-transformed hydraulic conductivity field (the same as in the previous example) and the locations of the injector and the four pumping wells. All wells are under fixed-rate control. The study area is closed on the upper and lower sides and has fixed-pressure boundaries on the left and right sides. A passive tracer is injected with the injection fluid for the first 10 days. In this example, we assume the injection fluid is the same as the formation fluid, so only one slightly compressible phase is present in the system. Figures 12(b) and 12(c) show the tracer concentration at days 120 and 330 in the reference case. The ensemble size is again 50, and the same initial realizations as in section 4.1 are used. The mean of the initial ensemble and two of the realizations are shown in the top row of Figure 13. The mean of the tracer concentration predicted by the initial ensemble and the concentrations predicted by the two individual realizations are shown in the bottom row of Figure 13.
The observations for data assimilation are the tracer concentrations at each grid cell measured at four different times: days 60, 120, 240, and 330. The standard deviation of the concentration observation error is 0.3. At each data assimilation time, the concentration measurement is included as data only at gridblocks that have nonzero concentration variability in the ensemble forecast. The three methods of computing the Kalman gain are compared in terms of the estimate of the log-conductivity field and the ability of the updated ensemble to reproduce the data. The four-coefficient Daubechies wavelet is used to transform the field for the method using wavelets with adaptive localization.
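The selection of data at gridblocks with nonzero ensemble forecast variability can be written as a simple mask over the forecast ensemble. The sketch below is a minimal illustration (the function name, threshold, and array layout are assumptions, not the paper's implementation):

```python
import numpy as np

def select_active_data(D_forecast, d_obs, tol=1e-8):
    """Keep observations only where the ensemble forecast varies.

    D_forecast : (n_grid, n_e) forecast concentrations across the ensemble
    d_obs      : (n_grid,) observed concentrations
    Returns the retained observations, the matching forecast rows, and the mask.
    """
    active = D_forecast.std(axis=1) > tol   # gridblocks with nonzero variability
    return d_obs[active], D_forecast[active, :], active

# Tiny example: the first gridblock has a constant forecast, so its
# observation carries no ensemble information and is dropped.
D_forecast = np.array([[1.0, 1.0, 1.0],
                       [0.2, 0.5, 0.8]])
d_obs = np.array([0.9, 0.4])
d_act, Df_act, mask = select_active_data(D_forecast, d_obs)
```

Dropping zero-variability gridblocks avoids singular forecast-data covariances in the update; as noted below, without localization the number of such active gridblocks shrinks rapidly as the ensemble collapses.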
Figure 14 shows the mean estimate of the log-conductivity field after data assimilation, on the same scale as the reference field in Figure 12(a). Figure 15 shows cross plots of the tracer concentrations predicted by two of the updated realizations versus the observed concentrations at the four data assimilation times. The number of data used by each method at the four data assimilation times is shown in Table 2. Without localization, the ensemble variability was reduced excessively after the assimilation of concentration data at the first data assimilation time: the number of gridblocks with nonzero variability in the ensemble concentration prediction at the second data assimilation time is only 34, so only the observed concentrations at these 34 gridblocks are used as data at that time. The number of gridblocks with nonzero variability in the ensemble concentration prediction decreases further at later data assimilation times because of the collapse in ensemble variability. The mean of the ensemble estimate from the EnKF without localization appears reasonable (Figure 14(a)), but the updated realizations give a poor match to the data (Figure 15(a)). The results from adaptive localization and from the wavelet expansion with adaptive localization are very similar, and both show improved estimates and predictions compared to the case without localization.
Table 2. Number of Data Used at Each Data Assimilation Time
Data Assimilation Time: First (Day 60) | Second (Day 120) | Third (Day 240) | Fourth (Day 330)
Wavelet w/Adapt Loc
The experiment was again repeated with multiple independent ensembles for all three methods in order to reduce the effect of the variability of the initial ensemble. Figure 16 shows the mismatch of the concentrations at the four data assimilation times predicted from rerunning the updated ensembles, together with predictions from the initial ensembles for comparison. The results are based on 20 independent ensembles with 50 realizations each. As in Figure 15 for a single ensemble, the two adaptive localization methods give similar magnitudes of data mismatch, and both are much better than the case without localization. This seems to contradict the findings of the previous examples, in which adaptive localization in the wavelet domain gave a better data match than in the usual gridblock domain. The reason is that although the cross covariance between the model parameters and the forecast observations has large-scale features associated with the sensitivity of the data, those large-scale features do not appear in the Kalman gain because of the effect of the covariance of the forecast observations. We illustrate the difference between the cross covariance and the Kalman gain when highly correlated data are present (as in this tracer example) using a large ensemble with 1000 realizations. Figure 17(a) shows the standard deviation of the concentration prediction at day 120 from the large ensemble (without conditioning to any data). The observed concentrations at gridblocks with nonzero standard deviation in Figure 17(a) are used as data for the assimilation. Figure 17(b) shows the cross covariance between the gridblock log-conductivity and the concentration observation at one of the gridblocks (indicated by the black circle). The cross covariance shows large-scale features that would potentially make a multiscale parameterization beneficial for adaptive localization. Figure 17(c) shows the Kalman gain for updating the gridblock log-conductivity using the same concentration observation.
After the cross covariance is multiplied by the inverse of the covariance of the predicted data, the Kalman gain no longer shows the large-scale features seen in the cross covariance, because of the correlation among the data. Since the adaptive screening is performed on the Kalman gain rather than on the cross covariance (equation (3)), the use of the multiscale parameterization did not give an obvious improvement in this case.
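The effect of correlated data on the Kalman gain can be seen in a small linear sketch (an assumed setup for illustration, not taken from the paper): with two perfectly redundant averaging observations, the cross covariance retains the large-scale shape of the sensitivity, but the inverse of the data covariance rescales the contribution of each datum.

```python
import numpy as np

n = 64
g = np.ones(n) / n                 # one nonlocal (averaging) observation row
G = np.vstack([g, g])              # two perfectly redundant observations
C_m = np.eye(n)                    # assumed prior covariance of model variables
sig2 = 1e-3
C_d = sig2 * np.eye(2)             # observation-error covariance

C_md = C_m @ G.T                   # cross covariance: both columns keep the
                                   # large-scale (constant) shape of g
C_dd = G @ C_m @ G.T               # forecast-data covariance: nearly rank one
K = C_md @ np.linalg.inv(C_dd + C_d)

gg = g @ g                         # = 1/n
K_single = g / (gg + sig2)         # gain if only one averaging datum were used
# With two redundant data, each datum receives the smaller per-datum
# gain g / (2*gg + sig2): the data correlation, not the cross covariance,
# controls the magnitude of the gain.
```

In this degenerate case the redundancy only rescales the gain; with many partially correlated data, as in the tracer example, the multiplication by the inverse data covariance also mixes the columns, which is why the gain need not inherit the large-scale features of the cross covariance.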
5. Summary and Conclusions
In this paper we examined the behavior of adaptive localization for estimating the Kalman gain from a relatively small ensemble of samples. The purpose of adaptive localization is to identify spurious correlations and reduce their effects on the estimates. We did not evaluate distance-based or sensitivity-based approaches to localization because, although they may work well on some problems, our interest is in problems for which the observations are related to spatial averages of local variables, in which case the nonadaptive methods are not trivial to apply.
It turns out, however, that adaptive methods of localization can also have difficulty assimilating observations that are averages of variables: as the region of averaging increases, the correlations of the local variables with the observations become weaker, and it becomes more likely that true correlations will be removed by the adaptive localization. The results in that case would be biased.
 We address this problem by applying adaptive localization to the coefficients of a multiscale expansion of the Kalman gain. The result is an estimate of the Kalman gain that is less biased and has smaller RMSE when some basis vectors in the multiscale parameterization are of similar scale to the features in the true Kalman gain. Although specific types of wavelets are used in this study, other wavelets or other multiscale expansions might be more suitable depending on the type of observations. In problems with large amounts of correlated data, e.g., time-lapse observations of tracer, although the sensitivity and cross covariance of data with model parameters may contain large-scale features, the Kalman gain does not. In this situation, both adaptive localization methods work nearly equivalently to prevent filter collapse and to generate ensemble members that reproduce the observations.
Acknowledgments
The first author acknowledges financial support from the Research Council of Norway (Petromaks program) and industrial sponsors through the project “Reservoir characterization using EnKF.” The second author acknowledges financial support from the Advanced Energy Consortium through the project “Data analysis and inversion for mobile nanosensors.” We thank the three reviewers for their helpful suggestions. A portion of this work was presented in SPE-141810 at the 2011 SPE Reservoir Simulation Symposium held in The Woodlands, Texas, 21–23 February 2011.