Subsurface aquifer characterization often involves high parameter dimensionality and requires tremendous computational resources if employing a full Bayesian approach. Ensemble-based data assimilation techniques, including filtering and smoothing, are computationally efficient alternatives. Despite the increasing use of ensemble-based methods in assimilating flow and transport related data for subsurface aquifer characterization, most applications have been limited to synthetic studies or two-dimensional problems. In this study, we applied ensemble-based techniques adapted for parameter estimation, including the p-space ensemble Kalman filter and ensemble smoother, for assimilating field tracer experimental data obtained from the Integrated Field Research Challenge (IFRC) site at the Hanford 300 Area. The forward problem was simulated using the massively parallel three-dimensional flow and transport code PFLOTRAN to effectively deal with the highly transient flow boundary conditions at the site and to meet the computational demands of ensemble-based methods. This study demonstrates the effectiveness of ensemble-based methods for characterizing a heterogeneous aquifer by assimilating experimental tracer data, with refined prior information obtained from assimilating other types of data available at the site. It is demonstrated that high-performance computing enables the use of increasingly mechanistic nonlinear forward simulations for a complex system within the data assimilation framework with reasonable turnaround time.
 Characterizing the underlying heterogeneous hydraulic conductivity field is a critical step for understanding and modeling solute transport under dynamic flow conditions. The ultimate goal of subsurface aquifer characterization has gradually shifted from finding an optimized set of model parameters to finding equally likely realizations of parameters that can be used to quantify uncertainty in model predictions. Bayesian aquifer characterization [e.g., Murakami et al., 2010; Chen et al., 2012] has been performed at the Hanford IFRC, a field experimental site focused on understanding and modeling subsurface uranium transport in the groundwater-surface water interaction zone. However, these full Bayesian approaches require a large number of realizations to derive the full posterior distribution of parameters, which hinders their application in complex systems where computationally intensive forward models are often needed. In this study, we investigate alternative ensemble-based data assimilation techniques, including the Ensemble Kalman filter (EnKF) and its smoother variants, which are more computationally efficient yet still provide satisfactory approximations to the posterior distribution of parameters.
 The EnKF and its variants have been widely used for assimilating dynamic data in meteorology and oceanography since being introduced by Evensen [1994], with clarifications presented by Burgers et al. [1998]. More recently, their application has expanded to petroleum engineering and hydrology, mainly because of their computational efficiency, ease of implementation, and relative robustness to nonlinearity. Readers are referred to Evensen [2003] for an extensive review of the EnKF, to Aanonsen et al. [2009] and Oliver and Chen [2011] for comprehensive overviews of such techniques applied to history matching in petroleum engineering, and to Schoniger et al. for a review of applications in hydrology. In a review of recent progress on reservoir history matching, Oliver and Chen [2011] compared the EnKF to other inverse modeling techniques.
 The computational efficiency of EnKF and its variants is achieved by avoiding calculation of a sensitivity matrix, which is typically required by gradient-based parameter estimation and optimization methods and often entails a large number of forward simulations [Evensen, 2003; Nowak, 2009]. Furthermore, the computational cost of EnKF is significantly lower than that of alternative Monte Carlo-based methods, such as the sequential self-calibration (SSC) method [Sahuquillo et al., 1992; Gómez-Hernández et al., 1997; Capilla et al., 1997] and the method of anchored distributions (MAD) [Rubin et al., 2010; Chen et al., 2012]. Comparison studies of SSC versus EnKF in terms of computational cost and prediction quality [Hendricks Franssen and Kinzelbach, 2009] revealed that EnKF performs as well as SSC at a substantially lower computational cost (approximately 80 times faster).
 Data assimilation in hydrologic and petroleum engineering problems deals primarily with the estimation of static model parameters, such as the hydraulic conductivity field in aquifer characterization, rather than model states as in meteorology and oceanography. Therefore, EnKF has been reformulated as an augmented state vector approach [e.g., Aanonsen et al., 2009; Evensen, 2009] and as a dual state-parameter approach [e.g., Moradkhani et al., 2005], such that the unknown static model parameters are estimated along with the unknown dynamic model states. While model states are usually nonlinear functions of model parameters, the traditional EnKF update does not enforce consistency between the updated states and model parameters for nonlinear problems. Therefore, Wen and Chen proposed a conforming step (i.e., rerunning the forward simulations once the parameters are updated) to ensure consistency in their case of multiphase flow in porous media. More recently, Nowak [2009] reformulated the state-space EnKF as a p-space (parameter space) EnKF, which updates only model parameters and not model states. When implementing the p-space EnKF, forward simulations with updated parameters are necessary for evolving the system states in time. Thus, consistency between the parameters and states is enforced, and nonphysical state variable values are avoided.
 Despite the computational efficiency of EnKF, the need to use highly parameterized and computationally intensive groundwater models to evolve the model states in the p-space EnKF still poses significant computational challenges for parameter estimation and uncertainty quantification. Given recent advances in computing power and the availability of massively parallel simulators, we see high-performance computing as a solution to this challenge. For example, Chen et al. completed 840,000 forward runs (using approximately 267,000 central processing unit (CPU) hours) with reasonable turnaround time for full Bayesian data assimilation using the massively parallel three-dimensional reactive flow and transport code PFLOTRAN [Hammond and Lichtner, 2010]. The multirealization simulation capability of PFLOTRAN is especially useful for ensemble-based or realization-based data assimilation methods. PFLOTRAN was consequently used in this study, as a full three-dimensional (3-D) simulation of flow and transport processes is necessary given the extremely dynamic flow conditions at our study site in the groundwater-surface water mixing zone.
 In this study, we employ the p-space EnKF and its ensemble smoother (ES) variants for characterizing the hydraulic conductivity field at the Integrated Field Research Challenge (IFRC) site in the U.S. Department of Energy's Hanford 300 Area (http://ifchanford.pnnl.gov). The objectives of this study are fourfold: (1) To apply the p-space EnKF and ES to condition aquifer characterization on tracer test data and two types of prior hydraulic measurements (constant rate injections and borehole flowmeter surveys) that were assimilated using the MAD approach in a previous study [Murakami et al., 2010]; (2) To assess the accuracy and computational efficiency of the p-space EnKF and compare its performance with its ES variants; (3) To show the iterative process of implementing ensemble-based data assimilation methods in a real-world application and illustrate implementation details; and (4) To demonstrate the need for high-performance computing to integrate ensemble-based data assimilation methods with computationally intensive forward simulation models. By using ensemble-based methods for parameter estimation, we adopt the assumption that the dominant uncertainty in modeling flow and transport at the site lies in the heterogeneous hydraulic conductivity field. However, the ensemble-based methods are not restricted to estimating static parameters only; they are also sufficiently flexible to update dynamic parameters, such as model forcings, along with static parameters. This latter issue will be explored in a future study.
2. Site Conditions and Experiment Description
 The Hanford IFRC site is located in southeastern Washington State, approximately 250 m west of the Columbia River. The site lies within the footprint of a former disposal facility for uranium-bearing liquid wastes known as the South Process Pond. Research activities at the site have focused on understanding the long-term persistence of the uranium plume at the site, which appears to be caused by the seasonal release of uranium from the lower vadose zone during water table excursion events [e.g., Zachara, 2010; Murray et al., 2012; Zachara et al., 2012]. The transport behavior at the site is extremely complicated due to the combined effects of dynamic flow conditions as a result of adjacent river stage fluctuations (ranging 2–3 m or more annually and averaging 0.5 m diurnally) and their interaction with highly permeable sediments (known as the Hanford formation) dominated by gravels, cobbles, and boulders with discontinuous low-permeability inclusions. The Hanford formation is underlain by the significantly less permeable Ringold Formation. The lithology of the site is described by Bjornstad et al. and Chen et al.
 The 1600 m2 triangular well field at the IFRC site (Figure 1) was designed to monitor solute transport under a highly variable groundwater flow direction. Most of the wells were initially completed to a depth of 20 m and with screens spanning the entire saturated portion of the Hanford formation (approximately 12–14 m thick, depending on the season). The bottom 10 m of the fully screened wells was later sealed with bentonite to minimize the deleterious impacts of intrawellbore flow on monitoring data (as discussed by Vermeul et al. and Chen et al.). Three multilevel well clusters screened over three different depth intervals were installed to provide depth-discrete monitoring. The shallow wells in each cluster are screened over a 1.53 m (5 ft) interval located at 9.14–10.67 m below ground surface, the intermediate wells are screened over a 0.61 m (2 ft) interval located at 12.86–13.47 m below ground surface, and the deep wells are screened over a 0.61 m (2 ft) interval located at 16.46–17.07 m below ground surface.
 Initial hydrologic characterization was performed at the site on the fully screened wells before the discovery of wellbore flows. These measurements included constant rate injection tests (on 14 wells), borehole flowmeter surveys (on 26 wells), and two nonreactive tracer experiments (November 2008 and March 2009). The tracer experiments were affected by intrawellbore flow, which has made the selection of reliable measurements from these data sets challenging [Chen et al., 2012]. Therefore, these two tracer experiments were not included in the prior information for this analysis.
 Another nonreactive tracer test was performed in March 2011 after the deep portions of the fully screened wells were sealed, and this test had little to no impact from the intrawellbore flows. This tracer experiment was performed with a long and slow injection of chloride-spiked groundwater (Cl−, 210 mg/L) into the upper 2 m of the aquifer. The solution was injected into Well 2–34 for 353 h at a nearly constant rate of 6.47 × 10−4 m3/s. The total injected volume was 822.2 m3. The plume was monitored for several weeks as it migrated out of the monitoring domain. Aqueous samples were collected over time from all wells intersected by the plume, and their Cl− concentration was quantified. The plume trajectory was complex given the variable gradients and groundwater flow directions during the experiment. Two snapshots of the tracer plume are provided in Figure 2 for illustration.
3. Methodology and Implementation
 The implementation of the p-space EnKF, like the original EnKF, includes two main steps, the forecast step and the analysis step. In the forecast step, an ensemble of the parameter field is provided to a process model that solves a system of governing equations to generate predictions of model states. The analysis step updates the ensemble of the parameter field based on the covariance structure of the parameter fields and the predicted model states, and the discrepancy between observed and predicted model states. We will first describe the updating scheme of p-space EnKF, and then the implementation of p-space EnKF and its ES variant to assimilate the tracer test data.
3.1. EnKF Updating Scheme
 We denote the heterogeneous parameter field by Y(x), with x being the spatial coordinate. Each realization of the parameter field is reorganized into a vector y^{i}, with the superscript i denoting the ith realization in the ensemble. The analysis scheme for updating the parameter vector is then

y^{i,a} = y^{i,f} + C_{YD} (C_{DD} + R)^{-1} (d_{obs} + e^{i} - d^{i,f})        (1)
where the superscripts a and f represent analyzed and forecast states, respectively, and variables in bold font denote vectors or matrices. The predicted model states given y^{i,f} are d^{i,f}, and the observed model states (data) are d_{obs}. Assuming the forward model is g(·), then d^{i,f} = g(y^{i,f}). The observed data are assumed to have independent Gaussian measurement errors with zero mean and covariance matrix R, and the ith realization of measurement error is e^{i}. When a suite of spatial and temporal measurements is available, d^{i,f} can be a subset of the state vector monitored at selected locations. C_{YD} is the covariance between the parameter fields and the predicted model states, and C_{DD} is the autocovariance of the predicted model states. The covariances in equation (1) are estimated from the ensemble, and the quality of the covariance estimates depends strongly on the ensemble size. The measurement error term is added to obtain the correct posterior variance and avoid inbreeding problems [Burgers et al., 1998]. In equation (1), C_{YD} (C_{DD} + R)^{-1} is the ensemble-based estimate of the Kalman gain, and (d_{obs} + e^{i} - d^{i,f}) is the mismatch between the observed and simulated states (i.e., the residual); their product forms the perturbed innovation, or correction, term.
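The analysis step of equation (1) can be sketched in a few lines of NumPy. This is an illustrative, minimal implementation, not the study's actual code (the original analyses were done in R): parameters and predictions are stored column-wise per realization, and the function name and interface are our own.

```python
import numpy as np

def enkf_pspace_update(Y, D, d_obs, sigma_obs, rng):
    """One p-space EnKF analysis step, following equation (1).

    Y         : (n_param, n_ens) forecast parameter ensemble y^{i,f}
    D         : (n_data, n_ens) predicted states d^{i,f} = g(y^{i,f})
    d_obs     : (n_data,) observed states
    sigma_obs : (n_data,) measurement-error standard deviations
    """
    n_ens = Y.shape[1]
    # Deviations from the ensemble means
    dY = Y - Y.mean(axis=1, keepdims=True)
    dD = D - D.mean(axis=1, keepdims=True)
    # Ensemble estimates of C_YD and C_DD
    C_YD = dY @ dD.T / (n_ens - 1)
    C_DD = dD @ dD.T / (n_ens - 1)
    R = np.diag(sigma_obs**2)
    # Perturb observations with e^i ~ N(0, R) [Burgers et al., 1998]
    E = rng.normal(scale=sigma_obs[:, None], size=D.shape)
    innovation = d_obs[:, None] + E - D
    # y^{i,a} = y^{i,f} + C_YD (C_DD + R)^{-1} (d_obs + e^i - d^{i,f})
    K = C_YD @ np.linalg.inv(C_DD + R)
    return Y + K @ innovation
```

The same function serves as a batch ES update when the data vector stacks all measurement times at once.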
 Each data assimilation step (forecast and analysis) integrates all the information prior to the most current data (equivalent to the prior information in a Bayesian framework) and information provided by the most current data (equivalent to the likelihood in a Bayesian framework). The forecast and analysis step of EnKF can be repeated such that data are assimilated in a sequential Bayesian manner, which is especially useful for assimilating real-time observations in meteorology and oceanography. In real-time applications of EnKF, the data vector only contains the most current data, and the model states updated from a previous data assimilation step will be used as the initial condition for the next data assimilation step.
3.2. Ensemble Smoother
 In our application of the p-space EnKF, the data vector can include all data measured at different times, because the model parameters being estimated do not change over time and we include a conforming step that reruns the model simulations after the analysis step. When all data are assimilated in one step (i.e., in a batch), the p-space EnKF is technically an ensemble smoother [van Leeuwen and Evensen, 1996], which uses all the available data to update all the states (here, the parameter field after reformulation in p-space). The applicability of the ES for estimating spatially distributed hydraulic conductivity was investigated by Bailey and Bau [2010, 2012] using 2-D synthetic examples.
 The batch update of model parameters using the ES is computationally advantageous in our application because it requires only a single update step, thus avoiding the frequent restarting of conforming simulation runs required by the sequential updating of the p-space EnKF. The ES was used for history matching in reservoir modeling by Skjervheim and Evensen and for estimating 2-D hydraulic conductivity fields by Bailey and Bau [2010, 2012]. A similar technique, the asynchronous EnKF, was introduced by Sakov et al. to assimilate data that were not all collected at the same time using a batch update of the EnKF. Note, however, that the analysis equation of the EnKF and ES is the same as that of the Kalman filter and is optimal only for a Gaussian prior distribution with a linear model.
3.3. Iterative Ensemble Method
 The application of EnKF or ES to a nonlinear, non-Gaussian problem is analogous to using Gauss-Newton iteration to solve nonlinear equations. When the problem is highly nonlinear, iterations with reduced step sizes are necessary to control the degree of nonlinearity. The nonlinearity often increases with the quantity of data assimilated in a single step. While a single batch update using ES may be preferred for computational efficiency, it may not be optimal for accuracy because of the increased nonlinearity relative to the sequential EnKF approach. Iterative methods such as the ensemble randomized maximum likelihood method (EnRML) [Gu and Oliver, 2007; Chen and Oliver, 2012] and the quasi-linear Kalman ensemble generator [Nowak, 2009] can be used to reduce the nonlinearity of each assimilation step. We adopted the EnRML approach in this study for its proven effectiveness in petroleum engineering problems with high parameter/state dimensionality. The analysis scheme of EnRML is given as

y^{i,l+1} = β^{l} y^{i,0} + (1 - β^{l}) y^{i,l} - β^{l} C_{Y}^{0} G_{l}^{T} (G_{l} C_{Y}^{0} G_{l}^{T} + R)^{-1} [g(y^{i,l}) - d_{obs} - e^{i} - G_{l} (y^{i,l} - y^{i,0})]        (2)
 In equation (2), the superscripts i and l are the indices of the ensemble member and the iteration, respectively, and the superscript T denotes matrix transposition. y^{i,0} is the ith prior realization, and C_{Y}^{0} is the prior parameter covariance estimated from the initial ensemble. β^{l} is the step-size parameter, which varies between 0 and 1, with β^{l} = 1 representing a full step and smaller values representing damped steps. G_{l} is a linearization, or sensitivity, of g(·) at y^{i,l}; its ensemble-based approximation can be computed by solving G_{l} Δy^{l} = Δd^{l} using singular value decomposition [Gu and Oliver, 2007; Chen and Oliver, 2012]. The columns of Δd^{l} are the deviations of the predicted model states from their ensemble mean, and the columns of Δy^{l} are the deviations of the model parameters at the lth iteration from their ensemble mean.
 EnRML has sequential and batch implementations with the same analysis scheme as provided in equation (2). The sequential version of EnRML [Gu and Oliver, 2007] is an iterative EnKF, in which iterations are applied on each assimilation step of EnKF. The batch version of EnRML [Chen and Oliver, 2012] is an iterative ES, in which model parameters are updated iteratively with all data being assimilated simultaneously at each iteration. More details on EnRML, including pseudocodes, are provided by Chen and Oliver.
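A single batch-EnRML iteration per equation (2) can be sketched as follows. This is a minimal illustration under simplifying assumptions: the SVD-based sensitivity solve is expressed through NumPy's pseudoinverse, the observation perturbations are assumed to be drawn once and reused across iterations, and the function name and layout are ours, not from Gu and Oliver [2007] or Chen and Oliver [2012].

```python
import numpy as np

def enrml_step(Y0, Yl, Dl, d_pert, sigma_obs, beta):
    """One batch-EnRML iteration (equation (2)), a sketch.

    Y0     : (n_param, n_ens) prior ensemble y^{i,0}
    Yl     : (n_param, n_ens) current iterate y^{i,l}
    Dl     : (n_data, n_ens) predictions g(y^{i,l})
    d_pert : (n_data, n_ens) perturbed observations d_obs + e^i
    beta   : damped step size in (0, 1]
    """
    n_ens = Yl.shape[1]
    dY0 = Y0 - Y0.mean(axis=1, keepdims=True)
    dYl = Yl - Yl.mean(axis=1, keepdims=True)
    dDl = Dl - Dl.mean(axis=1, keepdims=True)
    # Prior parameter covariance C_Y^0
    C_Y0 = dY0 @ dY0.T / (n_ens - 1)
    # Ensemble sensitivity G_l solving G_l dY^l = dD^l (SVD-based pseudoinverse)
    G = dDl @ np.linalg.pinv(dYl)
    R = np.diag(sigma_obs**2)
    S = G @ C_Y0 @ G.T + R
    # Bracketed residual term of equation (2)
    residual = Dl - d_pert - G @ (Yl - Y0)
    correction = C_Y0 @ G.T @ np.linalg.solve(S, residual)
    return beta * Y0 + (1 - beta) * Yl - beta * correction
```

Repeating this step with increasing β (e.g., the 0.3, 0.4, 0.5, 0.6 schedule used later in the paper) yields the iterative ES.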
3.4. Implementation to Hanford IFRC
 We will investigate sequential EnKF versus ES and iterative ES (i.e., batch EnRML) updating schemes applied to the Hanford IFRC experimental site in this study.
3.4.1. Initial Ensemble of Parameters
 The model parameters in this study comprise the permeability of all grid cells located above the Ringold confining layer, totaling 336,442 of the 432,000 (120 × 120 × 30) grid cells in the 120 m × 120 m × 15 m model domain. The cells within the Ringold confining layer are assigned a deterministic low-permeability value because field measurements indicate low variability there. The parameter dimension is significantly higher than in previous studies at this site that applied a full Bayesian approach, MAD, on a reduced parameter space (variogram parameters and selected conditioning points called anchors, on the order of hundreds) [e.g., Murakami et al., 2010; Chen et al., 2012]. The computational effort required by the full Bayesian approach to obtain the joint distribution of cell-by-cell permeability is prohibitive; to keep the effort feasible, the parameter dimension must be reduced substantially by assuming a heterogeneity structure (i.e., a variogram model) and selecting sparse conditioning points, which may compromise the accuracy of the full Bayesian approach. On the other hand, EnKF and its variants have been successfully applied to estimate cell-by-cell permeability and porosity for parameter sets of similar size in petroleum reservoir history matching with ensemble sizes on the order of hundreds [Oliver and Chen, 2011]. We therefore update the hydraulic conductivity of every cell in our model grid except those in the Ringold Formation, avoiding assumptions about the variogram model and interpolation between conditioning points.
 In order to implement the aforementioned ensemble-based data assimilation methods to estimate the 3-D hydraulic conductivity field (the parameters) for the Hanford IFRC site given the newly available tracer concentration data, we generated the prior ensemble of the permeability field based on results from Murakami et al. [2010], who assimilated the constant rate injection measurements and flowmeter surveys using the MAD approach and produced realizations of structural parameters (parameters of the exponential variogram model) and anchors (conditioning points) for the hydraulic conductivity at selected locations. A 3-D permeability field was generated from each realization of structural parameters and anchors using kriging [Rubin, 2003]. More detailed descriptions of how to generate the prior ensemble can be found in Chen et al. A total of 600 realizations of 3-D permeability fields were included in the prior ensemble.
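For illustration only, the sampling of a Gaussian field with an anisotropic exponential covariance (the building block behind such a prior ensemble) can be sketched on a small grid via Cholesky factorization. This toy stand-in omits the MAD-derived structural-parameter realizations and anchor conditioning of the actual prior; all names and parameter values are assumptions.

```python
import numpy as np

def exponential_prior_ensemble(coords, variance, ranges, mean, n_real, rng):
    """Unconditional log-permeability realizations from an exponential
    covariance model (illustrative stand-in for the kriging-based prior).

    coords : (n_cells, 3) cell-center coordinates
    ranges : (3,) anisotropic correlation lengths
    """
    # Separation distances scaled by the anisotropic correlation lengths
    h = coords[:, None, :] / np.asarray(ranges)
    lag = np.linalg.norm(h - h.transpose(1, 0, 2), axis=2)
    # Exponential covariance C(h) = variance * exp(-h)
    C = variance * np.exp(-lag)
    # Small jitter keeps the Cholesky factorization numerically stable
    L = np.linalg.cholesky(C + 1e-10 * np.eye(len(coords)))
    return mean + L @ rng.normal(size=(len(coords), n_real))
```

Dense Cholesky is only viable for toy grids; a 336,442-cell field would require sequential simulation or spectral methods instead.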
3.4.2. Forward Simulation/Forecast Model
 We simulated the flow and transport processes using PFLOTRAN, which was chosen for its unique capability of simultaneously running multiple realizations in a single job on a supercomputer. The governing flow equations in PFLOTRAN are based on the following Richards equation

∂(ϕ s ρ)/∂t + ∇·(ρ q) = S_{l}        (3)
with ρ being water density, ϕ porosity, s water saturation, and S_{l} the source and sink term. The Darcy velocity q is calculated by

q = -(k k_{r}/μ) ∇(p + ρ g z)        (4)
with p being pressure, k being intrinsic permeability, kr being relative permeability, μ being viscosity, g being acceleration of gravity, and z being the elevation above a reference height. For unsaturated flow, the van Genuchten model [van Genuchten, 1980] is used to relate capillary pressure to water saturation, and the Burdine relation [Burdine, 1953] is used for the relative permeability function.
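The two constitutive relations can be written compactly as below. This is a sketch: the parameter values are illustrative, and the Burdine constraint m = 1 − 2/n (valid for n > 2) is assumed; PFLOTRAN's internal implementation details are not reproduced here.

```python
import numpy as np

def van_genuchten_saturation(pc, alpha, n):
    """Effective saturation Se from capillary pressure head pc
    [van Genuchten, 1980], with the Burdine constraint m = 1 - 2/n (n > 2)."""
    m = 1.0 - 2.0 / n
    # Se = [1 + (alpha * pc)^n]^(-m); negative pc treated as saturated
    return (1.0 + (alpha * np.maximum(pc, 0.0)) ** n) ** (-m)

def burdine_relative_permeability(se, n):
    """Burdine [1953] relative permeability as a function of effective
    saturation: kr = Se^2 [1 - (1 - Se^(1/m))^m], with m = 1 - 2/n."""
    m = 1.0 - 2.0 / n
    return se**2 * (1.0 - (1.0 - se ** (1.0 / m)) ** m)
```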
 The transport processes considered for the nonreactive tracer include advection, dispersion, and diffusion, with the following governing equation

∂(ϕ s C)/∂t + ∇·(q C - ϕ s D ∇C) = S_{C}        (5)
where C is the aqueous solute concentration, SC is the source and sink term for the solute, and D is the effective dispersion coefficient that represents combined effects of diffusion and microdispersion. The macrodispersion is the dominant dispersion mechanism at our study site due to the dynamic flow conditions and the highly heterogeneous permeability field. Therefore, the microdispersion and diffusion effects are negligible in this case.
 The PFLOTRAN code employs the finite-volume method to discretize the governing equations and solve the flow and transport equations sequentially, i.e., passing water density, saturation state, and the velocity field from the flow to the transport equations at each time step.
 The size of the model domain is 120 m × 120 m × 15 m, centered on the well field as shown in Figure 3. The base of the model domain lies at 95 m elevation above sea level, which is below the lowest elevation of the Hanford-Ringold contact. The top of the model is at 110 m elevation above sea level, chosen to be above the maximum water elevation that occurred during the simulation period. The grid resolution is 1 m in the horizontal (x-y) plane and 0.5 m in the vertical direction, which is considered sufficient for the expected scales of horizontal and vertical heterogeneity. The maximum time step size in the simulation was limited to 1 h, and PFLOTRAN automatically reduced the time step as needed to achieve convergence.
 Uniform porosities were used for the Hanford and Ringold Formation sediments because there is insufficient information for us to define a spatially variable porosity field, given a limited number of intact core samples from within the IFRC well field and from previous drilling and sampling efforts at the 300 Area [Williams et al., 2008]. Furthermore, it is questionable whether the “intact cores” actually preserve in situ features given the very coarse lithology. Although the heterogeneity of porosity can affect the estimated hydraulic conductivity field, its variability has a secondary effect on transport compared to the variability in hydraulic conductivity (orders of magnitude difference). The error introduced by not explicitly modeling the spatial variability of porosity is incorporated into the uncertainty in the estimated hydraulic conductivity field. The average total porosity value is set to 0.2 in the Hanford formation and 0.43 in the Ringold Formation, as recommended by Williams et al. and adopted by Chen et al.
 The flow boundary condition was extremely dynamic because of river stage fluctuations in the adjacent Columbia River during the tracer experiment. An example of flow gradient magnitudes and directions, obtained by fitting a linear plane to hourly water elevations at a set of three wells (triangulation), is shown in Figure 4 to demonstrate the flow dynamics from April to October 2011. Figure 4 illustrates that the flow changed dynamically in both direction and magnitude and that the estimated gradient direction and magnitude are sensitive to the choice of well set for triangulation. Therefore, we used hourly water elevations measured at additional wells located inside and outside the IFRC well field (shown in Figure 3) to interpolate the water elevations at the four lateral boundaries, using a kriging approach with an exponential variogram model. The kriged boundary conditions produced better simulation results than flow boundary conditions interpolated through triangulation. The kriged hourly transient hydrostatic head boundary conditions were applied on the four lateral planes of the domain boundary to capture the flow dynamics.
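An ordinary-kriging interpolation of this kind can be sketched compactly. The semivariogram parameters, function names, and the range convention (effective range at 95% of the sill) below are illustrative assumptions, not the calibrated site values; in practice this would be applied per hourly snapshot to the boundary-face coordinates.

```python
import numpy as np

def krige_heads(xy_obs, h_obs, xy_new, sill, eff_range, nugget=0.0):
    """Ordinary kriging of water elevations with an exponential semivariogram
    (a sketch of interpolating hourly well heads to lateral boundaries)."""
    def gamma(d):  # exponential semivariogram, effective range at 95% sill
        return nugget + sill * (1.0 - np.exp(-3.0 * d / eff_range))
    n = len(xy_obs)
    d_oo = np.linalg.norm(xy_obs[:, None] - xy_obs[None, :], axis=2)
    # Ordinary-kriging system with a Lagrange multiplier for unbiasedness
    A = np.empty((n + 1, n + 1))
    A[:n, :n] = gamma(d_oo)
    A[:n, n] = 1.0
    A[n, :n] = 1.0
    A[n, n] = 0.0
    d_no = np.linalg.norm(xy_new[:, None] - xy_obs[None, :], axis=2)
    b = np.empty((len(xy_new), n + 1))
    b[:, :n] = gamma(d_no)
    b[:, n] = 1.0
    w = np.linalg.solve(A, b.T)  # weights plus Lagrange multiplier
    return h_obs @ w[:n]
```

With zero nugget, the scheme reproduces the observed heads exactly at the well locations, which is the behavior expected of an exact interpolator.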
 We employed zero-flux flow boundary conditions at the lower boundary as it is constrained by the fine-grained Ringold Formation with a hydraulic conductivity 2 to 3 orders of magnitude lower than that of the Hanford formation. A small recharge rate (55 mm/yr) was applied at the top boundary based on monitoring results at nearby locations [Rockhold et al., 2009].
 The tracer boundary conditions were set to free outflow at the four lateral boundaries, where any inflow was assumed to contain no tracer. The recharge water at the top boundary was assumed to contain no tracer, and a zero-flux tracer boundary condition was applied at the lower boundary.
 The injection of tracer at well 2–34 was simulated as a time series of specified flux of water with specified time-varying tracer concentration, which was obtained by analyzing the samples taken at different stages of the injection phase.
 The initial flow condition was a hydrostatic pressure distribution based on the water table interpolated from the same set of wells that were used to create the transient lateral flow boundary conditions. The initial tracer concentration was set to be zero over the entire model domain.
 We placed observation points at all cells intersected by the well screens and calculated the flux-averaged concentration to represent the average simulated concentration at each well, which constitutes the predicted model states in equation (1).
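The flux-averaged well concentration amounts to a flux-weighted mean over the screen-intersecting cells, sketched below. The fallback to an arithmetic mean for a stagnant screen is our own assumption for completeness, not necessarily the study's handling.

```python
import numpy as np

def flux_averaged_concentration(c_cells, q_cells):
    """Flux-weighted average concentration over the grid cells a well screen
    intersects; weights are the cell Darcy-flux magnitudes (a sketch)."""
    c = np.asarray(c_cells, dtype=float)
    w = np.abs(np.asarray(q_cells, dtype=float))
    if w.sum() == 0.0:
        # Stagnant screen: fall back to a simple arithmetic mean (assumption)
        return float(c.mean())
    return float(np.sum(w * c) / w.sum())
```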
3.4.3. Simulation Execution and Analysis Step
 One complete simulation for a single realization of the permeability field took approximately 30 min of wall-clock time using 64 processors on the Oak Ridge Leadership Computing Facility (OLCF) Jaguar supercomputer at Oak Ridge National Laboratory and on the Hopper supercomputer at the National Energy Research Scientific Computing Center (NERSC). We utilized the multirealization capability of PFLOTRAN to launch the simulations for 600 realizations of the permeability field on 200 processor groups using 12,800 processor cores; each processor group used 64 cores and ran three realizations, one after another. Each forecast step on the parameter ensemble therefore used approximately 19,200 CPU hours. We ran each simulation to completion so that we could evaluate the data match over the entire time series, although this is not necessary for intermediate assimilation steps in which only early-time measurements are assimilated.
 The ensemble of model predictions was compiled from the model outputs of all realizations. It was used to compute the ensemble covariance C_{YD} and the discrepancy between the model predictions and the perturbed observations, both required to obtain the posterior parameter ensemble via equation (1), as well as the covariance matrices and ensemble sensitivity matrix required in equation (2) for the batch EnRML. All data analyses were performed using the freely available statistical software R [R Development Core Team, 2010].
 A flowchart summarizing all the steps of implementing ensemble-based methods for estimating the permeability field is provided in Figure 5.
4. Results and Discussion
 For the p-space EnKF, the spatiotemporal observations of tracer concentration during the experiment can be assimilated in a sequential EnKF manner, i.e., one snapshot of spatially distributed data at a time, or in a single batch step using the ES. Alternatively, the data can be divided into segments (larger than one time step but smaller than the full batch) and assimilated sequentially, which is similar to the EnKF version of the fixed-lag Kalman smoother introduced by Cohn et al. and applied to geophysical systems by Khare et al. Using a smaller lag time in the fixed-lag Kalman smoother is analogous to taking a smaller step in a Gauss-Newton iteration. The fixed-lag Kalman smoother becomes the EnKF when one snapshot of data is assimilated at each step, and it becomes the ES when the entire batch of data is taken as the segment. We call this approach the segmented ES because it is equivalent to sequentially applying the ES to multiple segments of the batch data.
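The segmented ES is a short loop around the batch update, with a conforming forecast run before each analysis step. In this sketch, `forward(Y, seg)` stands in for the PFLOTRAN ensemble run on a data segment and is purely illustrative; the update itself is the same equation (1).

```python
import numpy as np

def segmented_es(Y, forward, d_obs_segments, sigma_segments, rng):
    """Segmented ES: apply the batch ES update sequentially over data
    segments, with a conforming forecast run before each analysis step.

    forward(Y, seg) must return the (n_data_seg, n_ens) predictions
    of the current parameter ensemble Y for segment seg.
    """
    for seg, (d_obs, sigma) in enumerate(zip(d_obs_segments, sigma_segments)):
        D = forward(Y, seg)  # conforming forecast run on updated parameters
        n_ens = Y.shape[1]
        dY = Y - Y.mean(axis=1, keepdims=True)
        dD = D - D.mean(axis=1, keepdims=True)
        C_YD = dY @ dD.T / (n_ens - 1)
        C_DD = dD @ dD.T / (n_ens - 1)
        R = np.diag(sigma**2)
        E = rng.normal(scale=sigma[:, None], size=D.shape)
        # Equation (1) applied to this segment's data
        Y = Y + C_YD @ np.linalg.solve(C_DD + R, d_obs[:, None] + E - D)
    return Y
```

With one snapshot per segment this loop reduces to the sequential EnKF; with a single segment it is the batch ES.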
 In this study, we assimilated tracer concentrations observed in 32 wells up to 200 h after the injection started, totaling 258 data points, with one snapshot of spatial data available every 25 h. We compared the assimilation results obtained using the EnKF, the batch ES, the batch EnRML as an iterative ES, and the segmented ES with the data divided into two 100 h windows. In all cases, we set the standard deviation of each observation to 10% of the observed value. The choice of measurement error depends on the uncertainty level in the measurements. It can be inferred from equation (1) that smaller measurement errors place greater weight on the residual and thus produce larger corrections to the forecast model states in the analysis; larger measurement errors, conversely, lead to smaller corrections in the analysis step. The measurement errors could be slightly inflated to reduce the step size of an analysis step; however, excessive inflation should be avoided because it artificially lowers the information content of the measurement data. Iterative ensemble methods are a better way to control the step size of the analysis step for a nonlinear problem.
 Figure 6 shows the prior and posterior ensembles of tracer breakthrough in a set of representative wells spanning a range of distances from the injection well. All the posterior ensembles of predictions showed significant improvements in matching the data compared to the prior ensemble. Among the different ensemble-based methods, the batch EnRML, two-segment ES, and EnKF produced better matches to the data than the batch ES in most of the selected wells, with the batch EnRML slightly better than the EnKF and two-segment ES in terms of prediction bias and uncertainty range. The posterior ensemble produced by the EnKF after eight assimilation steps appeared to have less variability than its counterparts, implying that sample degeneracy might be expected in subsequent data assimilation steps and that covariance inflation [Anderson and Anderson, 1999] might be necessary to avoid it. On the other hand, the batch ES, two-segment ES, and batch EnRML retained more variability in the posterior ensemble, indicating that they might be less subject to sample degeneracy than the EnKF when assimilating the same amount of data.
Figure 6 also shows that the batch ES led to inferior fits in far-field wells compared to segmented ES, batch EnRML, and EnKF, which demonstrates the benefit of taking smaller step sizes to reduce the nonlinearity of the problem and ultimately improve model predictability. In addition, the different results may be caused by different treatments of temporal correlation in the predicted model states. The predicted model states at different times were assumed to be independent when employing EnKF. In contrast, batch and segmented ES and batch EnRML can take such temporal correlations into account through the calculation of CDD in equations (1) and (2). Four iterations were conducted for the batch EnRML in the results presented in Figure 6, with increasing step sizes (β was chosen to be 0.3, 0.4, 0.5, and 0.6 for the first, second, third, and fourth iterations, respectively). More iterations can be conducted to further improve the model fit until a stricter convergence criterion (in terms of change in the permeability field and model misfit) is met.
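The role of the step-size schedule can be illustrated schematically. The sketch below applies a fraction β of a full correction at each iteration; the correction used here is a simple finite-difference gradient step on a least-squares misfit, standing in for the ensemble-based correction of equation (2) purely to make the loop runnable, and all names are illustrative:

```python
import numpy as np

def damped_iterations(m0, g, d_obs, betas):
    """Schematic damped iteration loop (hypothetical interface).

    m0    : initial parameter vector
    g     : forward model, g(m) -> predicted data vector
    d_obs : observed data vector
    betas : step-size schedule, e.g. [0.3, 0.4, 0.5, 0.6] as in the text
    """
    m = m0.copy()
    for beta in betas:
        residual = d_obs - g(m)
        # placeholder correction: finite-difference Jacobian, then the
        # steepest-descent direction on the misfit 0.5 * ||residual||^2
        eps = 1e-6
        jac = np.array([(g(m + eps * e) - g(m)) / eps
                        for e in np.eye(m.size)]).T
        full_step = jac.T @ residual
        m = m + beta * full_step                 # damped (partial) update
    return m
```

Starting with small β keeps early updates conservative while the linearization is poor, and the growing schedule takes fuller steps as the iterates approach the data.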
To quantify the differences between observed and simulated breakthrough, we computed the root mean square error (RMSE) for each simulated breakthrough curve with respect to the observed curve at each well:

$$\mathrm{RMSE} = \sqrt{\frac{1}{n}\sum_{i=1}^{n}\left(\frac{c_{\mathrm{sim},i}-c_{\mathrm{obs},i}}{c_0}\right)^{2}}$$

where c_sim,i and c_obs,i are the simulated and observed concentrations at the ith time step, respectively, c_0 is the injection concentration, and n is the total number of observations at that well. Thus, there were 600 samples of RMSE for each well under each scenario, from which we generated boxplots that can be used to compare the mean and range of RMSE between the different ensemble-based methods. We first compared the RMSE for tracer concentrations between 0 and 200 h, as shown in Figure 7, to show how well the history was matched at a group of wells; we then show in Figure 8 the boxplots of RMSE for tracer concentrations from 200 to 800 h at the same group of wells as validation.
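The metric is straightforward to compute per well; a minimal sketch of the normalized RMSE defined above (illustrative names):

```python
import numpy as np

def rmse(c_sim, c_obs, c0):
    """Normalized RMSE between simulated and observed breakthrough curves.

    c_sim, c_obs : length-n concentration series at one well
    c0           : injection concentration used for normalization
    """
    c_sim = np.asarray(c_sim, dtype=float)
    c_obs = np.asarray(c_obs, dtype=float)
    return np.sqrt(np.mean(((c_sim - c_obs) / c0) ** 2))
```

Evaluating this for each of the 600 realizations at a well yields the 600 RMSE samples per well and scenario from which the boxplots are drawn.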
Effective parameter estimation is expected to yield better data matches with reduced bias and uncertainty, i.e., more mass of the RMSE distribution close to zero. We observe in Figures 7 and 8 that all the model simulations showed significantly reduced uncertainty after any ensemble-based method was employed for estimating the permeability field, for both history matching and model validation. The average posterior RMSEs were closer to zero than the prior ones in most of the wells for most of the ensemble-based methods. The relative performance of the ensemble-based methods varied from one well to another. In general, EnKF performed best, with batch EnRML and two-segment ES showing comparable performance, and all three outperformed the batch ES. With respect to the RMSE averaged over all the measurements in all the wells presented, the two-segment ES and batch EnRML had RMSE distributions similar to that of the EnKF in both history matching and prediction.
The differences in data reproduction among the various ensemble-based methods were caused by variations in the estimated permeability fields. Therefore, we examined the means and standard deviations of the log-transformed hydraulic conductivity fields derived from the different ensemble-based methods (shown in Figure 9). We observe that the spatial patterns of heterogeneity are considerably different before and after assimilating the tracer data. The posterior estimates show more contrast between the low-permeability and high-permeability regions (seen in the mean fields) and less variability (seen in the standard deviations) of permeability at each grid cell, especially for the EnKF results. More posterior variability of permeability is found within the IFRC well field, where permeability affected the flow and transport processes more strongly than outside the well field. The small variability of permeability outside the IFRC well field may be artificial, owing to the insensitivity of those regions to the assimilated data. The permeability field estimated by EnKF departed the most from the prior, possibly because the predicted model states at different times were assumed to be independent when employing EnKF. The temporal correlations between predicted model states at different times, which are accounted for in batch and segmented ES and batch EnRML, might have constrained the changes in the permeability field from its prior. The large area of high permeability in the posterior estimate from EnKF is subject to further verification.
In terms of computational cost, 4800, 600, 1200, and 2400 forward runs from time 0 to 200 h after the start of injection were required for EnKF, batch ES, segmented ES, and batch EnRML, respectively. The equivalent number of forward runs (assuming each run was performed from time 0 to 200 h after the start of injection) for EnKF was 2700 because it was not necessary to simulate out to 200 h for the first seven assimilation steps. The computational cost of EnKF is thus slightly higher than that of batch EnRML and more than double that of the two-segment ES, with comparable accuracy for all three methods. Compared to the 840,000 forward runs used by Chen et al., the ensemble-based methods are far more computationally efficient. Although we could not compare the performance of the ensemble-based methods against a full Bayesian approach because of limited computational resources, our results demonstrate a satisfactory fit to the history and adequate accuracy in predictions using the ensemble-based methods.
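The equivalent-run count for EnKF follows from simple arithmetic, assuming (as the restart scheme implies) that the p-space EnKF reruns each of the 600 realizations from time 0 to the current assimilation time (25, 50, ..., 200 h) at each of the eight steps:

```python
# Equivalent 200 h forward runs for the p-space EnKF under the restart
# scheme assumed above: eight reruns per realization, each to the current
# 25 h assimilation time.
n_ens = 600
step_hours = [25 * k for k in range(1, 9)]   # end times 25, 50, ..., 200 h
total_hours = n_ens * sum(step_hours)        # 600 * 900 simulated hours
equivalent_runs = total_hours / 200          # in units of full 200 h runs
print(equivalent_runs)                       # 2700.0
```

This matches the 2700 equivalent runs reported, versus 4800 if every rerun had been carried to the full 200 h.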
We investigated the effectiveness of the p-space EnKF, batch and segmented ES, and batch EnRML (an iterative ES), adapted to static parameter estimation, for assimilating data from a field tracer experiment to improve aquifer characterization at the Hanford 300 Area. The ensemble-based methods were chosen for their computational efficiency over a full Bayesian approach. In contrast to the typical applications of EnKF and its variants in meteorology and oceanography, which mostly aim at dynamic state estimation, we are interested in estimating physical property fields that do not vary with time. The system states were updated by rerunning the forward simulation model with the updated model parameters; thus, physical consistency between model parameters and model states was guaranteed.
In the context of parameter estimation for a linear system with Gaussian parameters and errors, ES may be preferred over EnKF because ES does not require the frequent restarts of the forward model that EnKF does, while yielding the same results. However, our system is nonlinear and non-Gaussian, which makes the ES or EnKF solutions only approximations. Taking smaller updating steps by using EnKF, iterative ES, or even iterative EnKF may be necessary to reduce the nonlinearity of the problem and improve accuracy, analogous to taking a Gauss-Newton approach to solve nonlinear equations. On the other hand, a normal-score transform can be applied to deal with the non-Gaussianity [Li et al., 2011; Zhou et al., 2011; Schoniger et al., 2012], which will be investigated in our future studies. In this study, we compared the p-space EnKF, batch ES, segmented ES, and batch EnRML in assimilating the same set of data to investigate the influence of step size on data assimilation in a nonlinear system. It is noted that all these approaches yield the same results as the Kalman filter for a linear system with Gaussian parameters and errors. The various approaches to reducing the step sizes (EnKF, segmented ES, and batch EnRML) all proved effective in reducing the nonlinearity of the problem and thus improving accuracy in our study. Readers are referred to Evensen and van Leeuwen and Crestani et al. for further comparison studies of EnKF and ES applied to nonlinear synthetic systems. In general, using fewer steps/iterations makes the data assimilation process more convenient and computationally affordable. However, the number of steps can only be reduced to a degree that does not sacrifice required accuracy or lead to sample degeneracy. It is recommended that one closely monitor the changes in the parameter field and the sample variability at each step and initiate iterations with reduced step sizes when necessary.
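The normal-score transform cited above maps each variable to a standard Gaussian through its empirical cumulative distribution; a minimal sketch of the forward transform (illustrative names, not the cited authors' implementations):

```python
import numpy as np
from scipy.stats import norm, rankdata

def normal_score(x):
    """Map samples to standard-normal scores via the empirical CDF.

    x : 1-D array of samples of one (possibly non-Gaussian) variable.
    The (rank - 0.5)/n convention keeps probabilities strictly inside
    (0, 1) so the inverse normal CDF stays finite.
    """
    n = len(x)
    p = (rankdata(x) - 0.5) / n
    return norm.ppf(p)
```

The transform is monotone, so rank order is preserved; after a Gaussian update in score space, the back-transform (interpolating the inverse empirical CDF) recovers the original marginal distribution.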
We demonstrated in this study that ensemble-based methods are effective and efficient in assimilating tracer experimental data for characterizing the hydraulic conductivity field of a localized aquifer zone, at only a small fraction of the computational cost required by a full Bayesian approach. The uncertainties in the hydraulic conductivity and predicted tracer breakthrough curves were substantially reduced after the tracer data assimilation, and the matches between the observed and predicted tracer breakthrough curves were improved in terms of RMSE distributions for a large number of wells, in both history matching and prediction. The various ensemble-based methods led to different spatial heterogeneity patterns and, ultimately, to different data matches. We found that sequentially assimilating data by segments (segmented ES) or using an iterative ES (i.e., batch EnRML) may be better alternatives to EnKF and batch ES when balancing accuracy against the frequency of model restarts. When the problem is so highly nonlinear that even the step size of EnKF is excessive, iterations can be initiated within each single step of EnKF using sequential EnRML, as shown by Chen and Oliver. It would also be straightforward to implement the same iterative approach on segmented ES.
While the ensemble-based methods are considered computationally efficient for large problems with high parameter dimensionality, the complexity of the forward simulation model needed to resolve the system states made it impossible to meet the computational demand without high-performance computing. We were able to complete all simulations within a reasonable turnaround time (a couple of days of queuing time on a supercomputer and 1.5 h of simulation time using 12,800 cores for the 600 realizations in each ensemble) only by running the massively parallel flow and transport code PFLOTRAN on the Jaguar and Hopper supercomputers. We expect to see more widespread adoption of high-performance computing by the hydrology community, especially in the general context of stochastic data assimilation for complex systems.
 Research funding originated from the U.S. Department of Energy (DOE), Biological and Environmental Sciences Division (BER) through the Subsurface Biogeochemical Research Program (SBR) to the Hanford Integrated Field Research Challenge (IFRC) and the PNNL SBR SFA. This work was performed under DOE contract DE-AC05-76RL01830. PFLOTRAN was developed under the DOE Scientific Discovery through Advanced Computing (SciDAC-2) program. Supercomputing resources were provided by the DOE Office of Science Innovative and Novel Computational Impact on Theory and Experiment (INCITE) program with allocations on OLCF Jaguar supercomputer at Oak Ridge National Laboratory. We also used the Hopper supercomputer at NERSC, supported by the DOE Office of Science under contract DE-AC02-05CH11231.