## 1. Introduction

[2] Characterizing the underlying heterogeneous hydraulic conductivity field is a critical step for understanding and modeling solute transport under dynamic flow conditions. The ultimate goal of subsurface aquifer characterization has gradually shifted from finding an optimized set of model parameters to finding equally likely realizations of parameters that can be used to quantify uncertainty in model predictions. Bayesian aquifer characterization [e.g., *Murakami et al*., 2010; *Chen et al*., 2012] has been performed at the Hanford Integrated Field Research Challenge (IFRC), a field experimental site focused on understanding and modeling subsurface uranium transport in the groundwater-surface water interaction zone. However, these full Bayesian approaches require a large number of realizations to derive the full posterior distribution of parameters, which hinders their application in complex systems where computationally intensive forward models are often needed. In this study, we investigate alternative ensemble-based data assimilation techniques, including the Ensemble Kalman filter (EnKF) and its smoother variants. These techniques are more computationally efficient yet still provide satisfactory approximations to the posterior distribution of parameters.

[3] EnKF and its variants have been widely used for assimilating dynamic data in meteorology and oceanography since EnKF was introduced by *Evensen* [1994], with clarifications presented by *Burgers et al*. [1998]. More recently, their application has expanded to petroleum engineering and hydrology, mainly because of their computational efficiency, ease of implementation, and relative robustness against nonlinearities. Readers are referred to *Evensen* [2003] for an extensive review of EnKF, *Aanonsen et al*. [2009] and *Oliver and Chen* [2011] for a comprehensive overview of such techniques applied in petroleum engineering history matching, and *Schoniger et al*. [2012] for a review of applications in hydrology. In their review of recent progress on reservoir history matching, *Oliver and Chen* [2011] also compared EnKF with other inverse modeling techniques.

[4] The computational efficiency of EnKF and its variants is achieved by avoiding calculation of the sensitivity matrix, which is typically required by gradient-based parameter estimation and optimization methods and often demands a large number of forward simulations [*Evensen*, 2003; *Nowak*, 2009]. Furthermore, the computational cost of EnKF is significantly lower than that of alternative Monte Carlo-based methods, such as the sequential self-calibration (SSC) method [*Sahuquillo et al*., 1992; *Gómez-Hernández et al*., 1997; *Capilla et al*., 1997] and the method of anchored distributions (MAD) [*Rubin et al*., 2010; *Chen et al*., 2012]. Comparison studies of SSC versus EnKF in terms of computational cost and quality of the predictions [*Hendricks Franssen and Kinzelbach*, 2009] revealed that EnKF performs as well as SSC at a lower computational cost, running approximately 80 times faster.

[5] Data assimilation in hydrologic and petroleum engineering problems deals primarily with the estimation of static model parameters, such as the hydraulic conductivity field in aquifer characterization, rather than model states as in meteorology and oceanography. Therefore, EnKF has been reformulated as an augmented state vector approach [e.g., *Aanonsen et al*., 2009; *Evensen*, 2009] and as a dual state-parameter approach [e.g., *Moradkhani et al*., 2005], such that the unknown static model parameters are estimated along with the unknown dynamic model states. Although model states are usually nonlinear functions of model parameters, the traditional EnKF update does not enforce consistency between the updated states and the updated model parameters for nonlinear problems. To address this, *Wen and Chen* [2006] proposed a conforming step (i.e., rerunning the forward simulations once the parameters are updated) to ensure consistency in their case of multiphase flow in porous media. More recently, *Nowak* [2009] reformulated the state-space EnKF as a p-space (parameter space) EnKF, which updates only model parameters and not model states. When implementing the p-space EnKF, forward simulations with updated parameters are necessary for evolving the system states in time. Thus, the consistency between the parameters and states is enforced, and nonphysical state variable values are avoided.
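The parameter-only update described above can be illustrated with a minimal sketch. It assumes a generic perturbed-observation ensemble update (in the spirit of *Burgers et al*. [1998]) applied to parameters alone, and it is not necessarily the formulation used by *Nowak* [2009] or in this study. All names (`forward_model`, `run_ensemble`, `pspace_enkf_update`), the toy nonlinear model, and the numerical values are hypothetical and chosen only for illustration.

```python
import numpy as np


def forward_model(log_k):
    """Hypothetical toy stand-in for a flow and transport simulator."""
    # Nonlinear map from 3 parameters to 2 simulated observations.
    return np.array([
        np.exp(0.5 * log_k[0]) + log_k[1],
        log_k[1] * log_k[2] + log_k[0],
    ])


def run_ensemble(params):
    """Run the forward model once per ensemble member (one column each)."""
    n_ens = params.shape[1]
    return np.column_stack([forward_model(params[:, j]) for j in range(n_ens)])


def pspace_enkf_update(params, simulated, observed, obs_var, rng):
    """Update parameters only, using ensemble covariances.

    params:    (n_param, n_ens) prior parameter ensemble
    simulated: (n_obs, n_ens) forward-model output for each member
    observed:  (n_obs,) measured data
    obs_var:   (n_obs,) measurement error variances
    """
    n_ens = params.shape[1]
    p_anom = params - params.mean(axis=1, keepdims=True)
    y_anom = simulated - simulated.mean(axis=1, keepdims=True)
    # Cross-covariance (parameters vs. simulated data) and data covariance,
    # both estimated from the ensemble; no sensitivity matrix is computed.
    c_py = p_anom @ y_anom.T / (n_ens - 1)
    c_yy = y_anom @ y_anom.T / (n_ens - 1)
    # Perturb the observations independently for each member.
    noise = rng.normal(0.0, np.sqrt(obs_var)[:, None], size=simulated.shape)
    innovation = observed[:, None] + noise - simulated
    # Apply the gain C_py (C_yy + R)^-1 without forming an explicit inverse.
    return params + c_py @ np.linalg.solve(c_yy + np.diag(obs_var), innovation)


rng = np.random.default_rng(0)
truth = np.array([1.0, -0.5, 0.8])    # assumed "true" parameters
obs_var = np.array([0.01, 0.01])      # assumed measurement error variances
observed = forward_model(truth)

prior = rng.normal(0.0, 1.0, size=(3, 200))
simulated = run_ensemble(prior)
posterior = pspace_enkf_update(prior, simulated, observed, obs_var, rng)

# States are not updated directly: rerun the forward model with the updated
# parameters so that parameters and simulated states remain consistent.
resimulated = run_ensemble(posterior)

print("mean |misfit| before:", np.abs(simulated - observed[:, None]).mean())
print("mean |misfit| after: ", np.abs(resimulated - observed[:, None]).mean())
```

In a real application, the toy `forward_model` would be replaced by one simulator run per ensemble member, which is where nearly all of the computational cost lies.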

[6] Despite the computational efficiency of EnKF, the need to use highly parameterized and computationally intensive groundwater models for evolving the model states in p-space EnKF still poses significant computational challenges for parameter estimation and uncertainty quantification. Given recent advances in computing power and the availability of massively parallel simulators, we see high-performance computing as a solution to this computational challenge. For example, *Chen et al*. [2012] completed 840,000 forward runs (using approximately 267,000 central processing unit (CPU) hours) with reasonable turnaround time for full Bayesian data assimilation using the massively parallel three-dimensional (3-D) reactive flow and transport code PFLOTRAN [*Hammond and Lichtner*, 2010]. The multirealization simulation capability of PFLOTRAN is especially useful for ensemble-based or realization-based data assimilation methods. PFLOTRAN was consequently used in this study, as a full 3-D simulation of flow and transport processes is necessary due to the extremely dynamic flow conditions present at our study site in the groundwater-surface water mixing zone.

[7] In this study, we employ the p-space EnKF and its ensemble smoother (ES) variants for characterizing the hydraulic conductivity field at the Integrated Field Research Challenge (IFRC) site in the U.S. Department of Energy's Hanford 300 Area (http://ifchanford.pnnl.gov). The objectives of this study are fourfold: (1) to apply the p-space EnKF and ES to condition aquifer characterization on tracer test data and two types of prior hydraulic measurements (constant rate injections and borehole flowmeter surveys) that were assimilated using the MAD approach in a previous study [*Murakami et al*., 2010]; (2) to assess the accuracy and computational efficiency of the p-space EnKF and compare its performance with that of its ES variants; (3) to show the iterative process of implementing ensemble-based data assimilation methods in a real-world application and illustrate implementation details; and (4) to demonstrate the need for high-performance computing to integrate the ensemble-based data assimilation methods with computationally intensive forward simulation models. By using ensemble-based methods for parameter estimation, we adopt the assumption that the dominant uncertainty in modeling flow and transport at the site lies in the heterogeneous hydraulic conductivity field. However, the ensemble-based methods are not restricted to estimating static parameters. They are also sufficiently flexible to update dynamic parameters, such as model forcings, along with static parameters. This extension will be explored in a future study.