## 1. Introduction

[2] Typical state variables of saturated groundwater flow and transport problems include pressure and the concentrations of various species or injectants. For a given set of boundary conditions and well constraints, the distribution of the state variables at any time is a function of the spatial distribution of the formation properties, such as permeability and porosity. One key feature of the problem is that pressure and concentration are sensitive to formation properties over a relatively large region [*Oliver*, 1993, 1998]. For example, the concentration of an injected conservative tracer at a particular location is primarily sensitive to permeability along a streamline passing through the observation location, but is sensitive to porosity only along the part of the streamline between the injector and the observation location. In both cases, the sensitivity is highly nonlocal, i.e., the observed concentration is not primarily sensitive to formation properties in a small region. We refer to data that are sensitive to uncertain parameters over an extended region as nonlocal observations. In this study we mainly address the estimation of formation properties, so pressure and concentration data are nonlocal observations. In state estimation problems, e.g., those in which the spatial distribution of the tracer concentration is to be estimated, concentrations measured at distributed locations become local observations.

[3] In the ensemble Kalman filter (EnKF) approach [*Evensen*, 1994] to data assimilation, an ensemble of simulation models is used to propagate the correlation between the formation properties and predicted observations. When the size of the ensemble is small, the ensemble approximation of the covariance can be badly corrupted by noise. In addition, the number of degrees of freedom in the ensemble is only as large as the size of the ensemble, which makes the assimilation of large amounts of independent data impossible. For distributed parameter estimation problems, it is often reasonable to assume that data beyond some critical distance have no influence on the update of a model variable, so some type of distance-based localization [*Houtekamer and Mitchell*, 1998] or tapering function [*Houtekamer and Mitchell*, 2001; *Hamill et al.*, 2001] is used to regularize the updates from the EnKF. The distance beyond which the influence of data can be neglected is a function of the data sensitivity, the prior covariance, and also the size of the ensemble [see *Chen and Oliver*, 2010a]. Distance-based localization often works fairly well if the localization functions are appropriately chosen, but it lacks generality and is not applicable to parameters that are not spatially distributed (e.g., specific hydraulic conductivity curves).
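As a minimal sketch of distance-based tapering, the following Python example forms an ensemble estimate of the Kalman gain for a single point observation and multiplies it elementwise by a compactly supported taper. The grid size, the exponential prior covariance, the observation location, the critical length `c`, and the helper names `gaspari_cohn` and `ensemble_gain` are assumptions made for illustration. The fifth-order function of Gaspari and Cohn is used here as one common taper; it is not necessarily the function used in the studies cited above.

```python
import numpy as np

def gaspari_cohn(dist, c):
    """Fifth-order compactly supported taper: 1 at dist = 0, 0 for dist >= 2c."""
    r = np.abs(np.asarray(dist, dtype=float)) / c
    taper = np.zeros_like(r)
    inner = r <= 1.0
    outer = (r > 1.0) & (r < 2.0)
    ri, ro = r[inner], r[outer]
    taper[inner] = (-0.25 * ri**5 + 0.5 * ri**4 + 0.625 * ri**3
                    - (5.0 / 3.0) * ri**2 + 1.0)
    taper[outer] = (ro**5 / 12.0 - 0.5 * ro**4 + 0.625 * ro**3
                    + (5.0 / 3.0) * ro**2 - 5.0 * ro + 4.0 - 2.0 / (3.0 * ro))
    return taper

def ensemble_gain(M, D, R):
    """Ensemble Kalman gain C_md (C_dd + R)^-1 from parameters M and predicted data D."""
    A = M - M.mean(axis=1, keepdims=True)
    B = D - D.mean(axis=1, keepdims=True)
    ne = M.shape[1]
    C_md = A @ B.T / (ne - 1)
    C_dd = B @ B.T / (ne - 1)
    return C_md @ np.linalg.inv(C_dd + R)

rng = np.random.default_rng(0)
n, ne, obs_idx = 100, 20, 50            # hypothetical grid, ensemble size, data location
x = np.arange(n)

# Assumed prior: exponential covariance with a range of 10 gridblocks
C = np.exp(-np.abs(x[:, None] - x[None, :]) / 10.0)
X = np.linalg.cholesky(C) @ rng.standard_normal((n, ne))

H = np.zeros((1, n))
H[0, obs_idx] = 1.0                     # point (local) observation
R = np.array([[0.01]])
D = H @ X

K_true = C @ H.T @ np.linalg.inv(H @ C @ H.T + R)   # reference gain from the known prior
K_ens = ensemble_gain(X, D, R)                      # noisy small-ensemble estimate
rho = gaspari_cohn(x - obs_idx, c=15.0)[:, None]    # taper centered on the observation
K_loc = rho * K_ens                                 # localized gain

far = np.abs(x - obs_idx) >= 30
print("RMS error far from data, raw      :", np.sqrt(np.mean((K_ens - K_true)[far] ** 2)))
print("RMS error far from data, localized:", np.sqrt(np.mean((K_loc - K_true)[far] ** 2)))
```

Because the taper vanishes beyond `2c`, every gain element farther than that from the observation is set to zero. This removes spurious elements, but it would equally remove any real long-range sensitivity, which is the limitation that matters for nonlocal data.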

[4] Adaptive localization/regularization methods are alternatives to distance-based localization. The localization functions in an adaptive regularization method are estimated from multiple ensembles [*Anderson*, 2007] or multiple bootstrapped ensembles [*Zhang and Oliver*, 2010] to reduce the variance of the estimate of the Kalman gain, so the adaptive methods are more general in that no empirical critical distance needs to be defined. Because the observations of pressure and concentration are often sensitive to formation properties over a broad region, the correlation between the observation at one location and the property at another location is generally quite small. If a small ensemble is used to compute the Kalman gain, the weak (but real) correlations will appear to be unreliable in an adaptive localization method, and the updates in those regions may be eliminated or reduced in magnitude. As a result, the updates to the model may be substantially underestimated. In this situation we propose to apply adaptive localization in a transformed domain in which the standard gridblock parameters are projected onto basis vectors with different scales. The multiscale expansion allows the use of basis vectors that correspond to large spatial averages or to regions of high sensitivity to the data. The Kalman gain is computed for the coefficients of the multiscale expansion, so that the correlations of data to coefficients corresponding to important spatial scales are increased and can be more reliably estimated from a limited-size ensemble. The adaptive localization is applied to the Kalman gain for coefficients of the multiscale expansion, before back transformation to the normal gridblock domain, to reduce the magnitude of updates of less reliable elements. Since the scales of correlation between the data and the model and state variables are generally not known for typical parameter estimation problems, and the relationships are further complicated by previous data assimilation in sequential methods, we choose a discrete wavelet expansion that includes all scales, and the elimination or damping of updates is done only through adaptive screening of the Kalman gain. Note that compared to expansions in which the structure of the basis vector extends through the entire domain (e.g., Karhunen–Loève and discrete cosine transform), the basis vectors of the discrete wavelet expansion are localized in the spatial domain. This localization feature makes the discrete wavelet expansion more suitable for regularization of observations that correspond to spatial averages.
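The role of the transform can be illustrated with a small sketch, assuming an orthonormal Haar basis, an uncorrelated prior ensemble, and a single observation equal to the average of the whole field. These are all hypothetical choices, as are the names `haar_matrix`, `ensemble_gain`, and `corr_with_data`. Because the transform is linear and orthonormal, the unscreened gain is the same in either domain, so any benefit must come from the screening step. What changes is how the correlation with the data is distributed among the parameters.

```python
import numpy as np

def haar_matrix(n):
    """Orthonormal Haar analysis matrix for n a power of two (rows are basis vectors)."""
    if n == 1:
        return np.array([[1.0]])
    h = haar_matrix(n // 2)
    coarse = np.kron(h, [1.0, 1.0])                 # coarser-scale vectors, stretched by 2
    detail = np.kron(np.eye(n // 2), [1.0, -1.0])   # finest-scale neighbor differences
    return np.vstack([coarse, detail]) / np.sqrt(2.0)

def ensemble_gain(M, D, R):
    """Ensemble Kalman gain C_md (C_dd + R)^-1 from parameters M and predicted data D."""
    A = M - M.mean(axis=1, keepdims=True)
    B = D - D.mean(axis=1, keepdims=True)
    ne = M.shape[1]
    C_md = A @ B.T / (ne - 1)
    C_dd = B @ B.T / (ne - 1)
    return C_md @ np.linalg.inv(C_dd + R)

def corr_with_data(M, d):
    """Sample correlation of each row of M with the data vector d."""
    A = M - M.mean(axis=1, keepdims=True)
    b = d - d.mean()
    return (A @ b) / (np.linalg.norm(A, axis=1) * np.linalg.norm(b))

rng = np.random.default_rng(1)
n, ne = 64, 30                              # hypothetical grid and ensemble size
X = rng.standard_normal((n, ne))            # assumed uncorrelated gridblock ensemble
H = np.full((1, n), 1.0 / n)                # observation = average of the whole field
R = np.array([[1e-4]])
D = H @ X

W = haar_matrix(n)
alpha = W @ X                               # wavelet coefficients of each member

K_grid = ensemble_gain(X, D, R)             # gain for gridblock values
K_wave = ensemble_gain(alpha, D, R)         # gain for wavelet coefficients
K_back = W.T @ K_wave                       # back transformation to gridblocks

rho_grid = corr_with_data(X, D[0])
rho_wave = corr_with_data(alpha, D[0])
print("unscreened gains agree:", np.allclose(K_grid, K_back))
print("largest |correlation|, gridblock domain:", np.abs(rho_grid).max())
print("correlation of coarsest coefficient    :", rho_wave[0])
```

In this idealized setting the observation is proportional to the coarsest Haar coefficient, so their sample correlation is one for any ensemble, while the correlation with each individual gridblock value is weak. A screening method that judges reliability element by element would therefore retain the update of the coarsest coefficient but might damp the gridblock updates.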

[5] There is a fairly extensive history of using multiscale methods to regularize spatial parameter estimation. In one of the early examples, *Liu* [1993] compared the behavior of a gradient-based minimization algorithm for solving an inverse problem when the unknown parameters were coefficients of the usual local basis and when the parameters were coefficients of the Haar wavelet basis. A process in which the coefficients were computed at progressively finer scales was determined to be robust and efficient, presumably because the objective function was most sensitive to the parameters at the largest scales and because the relationship of observations to large-scale parameters was more nearly linear. The connection between sensitivity and scale has been noted by other authors, who have used it to develop adaptive multiscale algorithms for parameter estimation that are driven by the information content of the data [*Grimstad and Mannseth*, 2000; *Grimstad et al.*, 2003; *Berre et al.*, 2009].

[6] When wavelet expansions are used to represent parameter fields in inverse problems, truncation of small coefficients has been a common method of regularization. It was, for example, applied by *Tangborn* [2004] to an extended (not ensemble) Kalman filter to reduce the dimension of the error covariance matrix. For assimilation of observations from Burgers' equation, accurate results were obtained when the covariance was projected onto a wavelet basis that used only 6% of the coefficients. *Pannekoucke et al.* [2007] showed that diagonalization of the covariance matrix in the wavelet domain could be effective at reducing the amount of spurious correlations in the ensemble approximation of the model covariance matrix while still retaining the ability to allow the covariance to vary spatially. In their examples [and the examples of *Buehner and Charron*, 2007], the length scale of the covariance was spatially variable, but did not include long-range correlations due to assimilation of nonlocal observations. Observations of spatial averages of properties typically result in weak, long-range negative correlations of local variables and in large-magnitude off-diagonal elements of the covariance matrix of the wavelet coefficients, in which case diagonalization of the wavelet transform of the error covariance is not an appropriate regularization method.

[7] Various multiscale parameterizations have been used with the EnKF previously. *Zhou et al.* [2008] introduced an ensemble multiscale filter that combines the EnKF with a multiscale autoregressive framework in order to improve the efficiency of the EnKF for high-dimensional dynamic systems with large amounts of data. By truncating states of the coarser multiscale tree nodes and diagonalizing the mapping matrix between adjacent scales, the ensemble multiscale filter was able to provide a certain degree of localization in scale. *Zhang and Oliver* [2011] and *Chen and Oliver* [2010b] parameterized the model variables in terms of a global trend model and local fluctuations to improve the estimate over large regions. Distance-based localization was then applied only to the updates of local heterogeneity. *Pajonk et al.* [2010] showed improved estimation from the EnKF for a linear advection model with local observations by using wavelet analysis with distance-based localization. In *Pajonk et al.* [2010], the Kalman gain is an interpolation among multiple per-level Kalman gains that are computed from ensembles constructed with various degrees of truncation of the high-frequency components in the wavelet domain.

[8] The objective of this paper is to show that when the observations are spatially nonlocal, there are advantages to applying adaptive localization methods for the EnKF in the wavelet domain, both to improve the estimation of the Kalman gain and to reduce the bias in reproducing data. In the remainder of the paper, we briefly review the EnKF, the bootstrap-based adaptive localization for screening of the Kalman gain, and the discrete wavelet expansion. The benefit of using wavelet expansion with adaptive localization is first demonstrated using a simple one-dimensional (1-D) problem for which the observations are spatial averages of properties. The true Kalman gain can be easily computed in this example so that the effect of adaptive localization on the root-mean-square error (RMSE) and bias in the Kalman gain estimate and on the ability to reproduce the observations can be investigated. The other two examples are two-dimensional groundwater flow and transport problems, in which the pressure head at an observation well or the spatial distribution of the concentration of an injected pulse of tracer (a contrast agent) is observed at several times. In all of the examples, results from three methods of computing the Kalman gain from the ensemble of samples are compared: (1) a standard, perturbed-observation form of the EnKF [*Burgers et al.*, 1998; *Houtekamer and Mitchell*, 1998] with the usual gridblock-based local parameterization, (2) a screened form of the EnKF which uses bootstrap resampling of the Kalman gain in the gridblock domain to remove spurious estimates [*Zhang and Oliver*, 2010], and (3) a screened form of the EnKF which uses bootstrap resampling of the Kalman gain in the wavelet domain.
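The three ways of computing the gain can be sketched in Python for a small 1-D problem with spatial-average observations. This is a minimal illustration under stated assumptions, not the implementation used in the paper: the prior covariance, the averaging windows, the ensemble size, the Haar basis, and all function names are hypothetical. The shrinkage factor `1 / (1 + cv2 * (1 + 1 / sigma2))` with `sigma2 = 0.36` is our recollection of the screening rule of *Zhang and Oliver* [2010] and should be checked against that reference.

```python
import numpy as np

def haar_matrix(n):
    """Orthonormal Haar analysis matrix for n a power of two (rows are basis vectors)."""
    if n == 1:
        return np.array([[1.0]])
    h = haar_matrix(n // 2)
    coarse = np.kron(h, [1.0, 1.0])
    detail = np.kron(np.eye(n // 2), [1.0, -1.0])
    return np.vstack([coarse, detail]) / np.sqrt(2.0)

def ensemble_gain(M, D, R):
    """Ensemble Kalman gain C_md (C_dd + R)^-1 from parameters M and predicted data D."""
    A = M - M.mean(axis=1, keepdims=True)
    B = D - D.mean(axis=1, keepdims=True)
    ne = M.shape[1]
    return (A @ B.T / (ne - 1)) @ np.linalg.inv(B @ B.T / (ne - 1) + R)

def bootstrap_screened_gain(M, D, R, n_boot=50, sigma2=0.36, rng=None):
    """Damp each gain element according to its bootstrap variability (assumed rule)."""
    rng = np.random.default_rng() if rng is None else rng
    ne = M.shape[1]
    K = ensemble_gain(M, D, R)
    sq_dev = np.zeros_like(K)
    for _ in range(n_boot):
        idx = rng.integers(0, ne, size=ne)          # resample members with replacement
        sq_dev += (ensemble_gain(M[:, idx], D[:, idx], R) - K) ** 2
    cv2 = (sq_dev / n_boot) / np.maximum(K**2, 1e-300)   # squared coeff. of variation
    screen = 1.0 / (1.0 + cv2 * (1.0 + 1.0 / sigma2))    # in [0, 1]; small if unreliable
    return screen * K, screen

def rmse(K, K_ref):
    return float(np.sqrt(np.mean((K - K_ref) ** 2)))

rng = np.random.default_rng(2)
n, ne = 64, 30                                       # hypothetical grid and ensemble size
x = np.arange(n)
C = np.exp(-np.abs(x[:, None] - x[None, :]) / 8.0)   # assumed prior covariance
X = np.linalg.cholesky(C) @ rng.standard_normal((n, ne))

H = np.zeros((2, n))
H[0, :32] = 1.0 / 32                                 # average over gridblocks 0..31
H[1, 16:48] = 1.0 / 32                               # average over gridblocks 16..47
R = 1e-3 * np.eye(2)
D = H @ X

K_true = C @ H.T @ np.linalg.inv(H @ C @ H.T + R)    # computable because C is known
W = haar_matrix(n)

K_ens = ensemble_gain(X, D, R)                                   # method (1)
K_grid, s_grid = bootstrap_screened_gain(X, D, R, rng=rng)       # method (2)
K_coef, s_wave = bootstrap_screened_gain(W @ X, D, R, rng=rng)   # method (3), wavelet domain
K_wave = W.T @ K_coef                                            # back to gridblocks

for name, K in [("standard", K_ens), ("gridblock screening", K_grid),
                ("wavelet screening", K_wave)]:
    print(f"RMSE of gain, {name}: {rmse(K, K_true):.4f}")
```

The sketch implies no particular ranking of the three errors, which depends on the seed, the ensemble size, and the prior. Its purpose is only to show where the transform enters: the screening factors are estimated for the wavelet coefficients, and the screened gain is mapped back to gridblock values with the transpose of the orthonormal transform.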