## 1. Introduction

### 1.1. Characterization Methods of Subsurface Heterogeneity in Hydraulic Parameters

[2] Subsurface characterization for groundwater investigations relies on determining the distribution of hydraulic parameters such as hydraulic conductivity (*K*) and specific storage (*S _{s}*). These values are then used to build groundwater models of various complexities that provide quantitative estimates of hydraulic heads, groundwater fluxes, and the distribution and concentration of contaminants. Commonly, hydraulic parameters are estimated either by collecting cores and subjecting them to permeameter tests and grain size analysis in a laboratory, or by conducting slug, single-hole, and/or pumping tests in situ. Most of these in situ methods rely on analytical solutions that treat the geological medium as homogeneous. These simplified solutions and the resulting parameter estimates have been used in a variety of real-world applications and academic studies [e.g., *Theis*, 1935; *Hantush*, 1960; *Neuman*, 1972], despite the fact that the subsurface is heterogeneous at multiple scales. In particular, knowledge of the detailed three-dimensional distribution of *K* is critical for predicting contaminant transport, delineating well catchment zones, and quantifying groundwater fluxes, including surface water/groundwater exchange. Even though many studies treat *S _{s}* as if it does not vary significantly, in formations where aquifer compressibility varies significantly from one material type to the next (e.g., sands versus clays), *S _{s}* could vary by several orders of magnitude.

[3] The characterization of subsurface heterogeneity is fraught with difficulties, as numerous samples are required to delineate the variability of hydraulic parameters as well as their spatial correlation and connectivity. Using soil cores to accurately characterize the *K* heterogeneity of a site requires a large number of samples to be tested in the laboratory [e.g., *Sudicky*, 1986; *Sudicky et al.*, 2010]. Alternatively, these samples can be sieved to obtain grain size distributions, which can then be analyzed using various empirical relations to estimate *K*.
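Paragraph [3] does not name a particular empirical relation. As one widely used example, and not necessarily the relation used in this study, Hazen's formula estimates *K* from the effective grain size d10 (the diameter at which 10% of the sample by mass is finer). The minimal sketch below interpolates d10 from a sieve analysis on a logarithmic size axis and applies K ≈ C·d10², assuming C = 1 with *K* in cm/s and d10 in mm; C is an empirical coefficient that varies with sorting, and the sieve data are hypothetical.

```python
import math


def d10_from_sieve(sizes_mm, percent_finer):
    """Interpolate the grain diameter (mm) at which 10% of the sample is finer.

    sizes_mm must be in increasing order, and percent_finer holds the
    cumulative mass percent passing each sieve. Interpolation is linear in
    log10(size), a common convention for grain size curves.
    """
    points = list(zip(sizes_mm, percent_finer))
    for (d_lo, p_lo), (d_hi, p_hi) in zip(points, points[1:]):
        if p_lo <= 10.0 <= p_hi:
            if p_hi == p_lo:
                return d_lo
            frac = (10.0 - p_lo) / (p_hi - p_lo)
            log_d = math.log10(d_lo) + frac * (math.log10(d_hi) - math.log10(d_lo))
            return 10.0 ** log_d
    raise ValueError("10% finer lies outside the range of the sieve data")


def hazen_k_cm_per_s(d10_mm, c=1.0):
    """Hazen estimate K = C * d10**2 (K in cm/s, d10 in mm; C is assumed)."""
    return c * d10_mm ** 2


# Hypothetical sieve analysis of one core sample (not data from this study).
sizes = [0.075, 0.15, 0.30, 0.60, 1.18]   # sieve openings, mm
finer = [2.0, 6.0, 45.0, 85.0, 100.0]     # cumulative percent passing
d10 = d10_from_sieve(sizes, finer)
print(f"d10 = {d10:.3f} mm, K ~ {hazen_k_cm_per_s(d10):.2e} cm/s")
```

Such relations ignore packing and structure, which is one reason laboratory and in situ estimates of *K* often differ.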

[4] Characterizing the heterogeneity in *S _{s}* is seldom done, as the parameter is considered to be less variable than *K* [e.g., *Gelhar*, 1993]. In fact, data in the literature suggest this to be the case in both porous and fractured geologic media [e.g., *Meier et al.*, 1998; *Sanchez-Vila et al.*, 1999; *Illman and Neuman*, 2001; *Vesselinov et al.*, 2001a, 2001b; *Martinez-Landa and Carrera*, 2005; *Illman and Tartakovsky*, 2006; *Liu et al.*, 2007; *Willmann et al.*, 2007; *Illman et al.*, 2009], although the estimated variance is known to depend on the estimation method. In particular, *Sanchez-Vila et al.* [1999] showed that when Jacob's method is applied to hydrographs at different observation wells to infer the transmissivity (*T*) and storage coefficient (*S*) of a synthetic aquifer with spatially random *T* values and a constant *S* value, the "interpreted" *T* values converge to a constant while the *S* values are highly variable in space. On the other hand, *Wu et al.* [2005] showed that the estimated *T* and *S* values for an equivalent homogeneous aquifer depend on the heterogeneity near the pumping and observation wells. The estimates vary with time, as do the principal directions of the effective *T*. Results of field pumping tests by *Straface et al.* [2007] and *Wen et al.* [2010] appear to corroborate those of *Wu et al.* [2005]. All these studies suggest that the traditional interpretation of pumping tests, which treats the medium as homogeneous, could lead to biased estimates of hydraulic parameters. *Yeh and Lee* [2007] advocated new ways to collect and analyze data for characterizing aquifers.
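Jacob's method, mentioned above, is the Cooper-Jacob straight-line approximation of the *Theis* [1935] solution for a homogeneous confined aquifer, valid when u = r²S/(4Tt) is small. The minimal sketch below (not the procedure of *Sanchez-Vila et al.* [1999]) generates synthetic Theis drawdowns for hypothetical values of *T*, *S*, pumping rate, and radial distance, then recovers *T* and *S* from a least-squares fit of late-time drawdown against log10(t). In a truly homogeneous aquifer the two agree closely; the biases discussed above arise when the same fit is applied to drawdowns from a heterogeneous medium.

```python
import math

EULER_GAMMA = 0.5772156649015329


def theis_well_function(u, terms=60):
    """Theis well function W(u) = E1(u) from its convergent power series.

    W(u) = -gamma - ln(u) + sum_{n>=1} (-1)**(n+1) * u**n / (n * n!).
    Restricted here to 0 < u <= 5, where the alternating series is
    numerically well behaved in double precision.
    """
    if not 0.0 < u <= 5.0:
        raise ValueError("this series form is only used for 0 < u <= 5")
    total = -EULER_GAMMA - math.log(u)
    term = -1.0  # becomes (-1)**(n+1) * u**n / n! inside the loop
    for n in range(1, terms + 1):
        term *= -u / n
        total += term / n
    return total


def theis_drawdown(q, t_trans, s_coef, r, t):
    """Drawdown s = Q / (4 pi T) * W(r^2 S / (4 T t)) in a homogeneous confined aquifer."""
    u = r * r * s_coef / (4.0 * t_trans * t)
    return q / (4.0 * math.pi * t_trans) * theis_well_function(u)


def cooper_jacob_fit(times, drawdowns, q, r):
    """Estimate T and S from a straight-line fit s = m * log10(t) + b.

    Valid only for late-time data (small u): T = Q ln(10) / (4 pi m) and
    S = 2.25 T t0 / r^2, where t0 is the time at which the line reaches s = 0.
    """
    x = [math.log10(t) for t in times]
    n = len(x)
    x_mean = sum(x) / n
    s_mean = sum(drawdowns) / n
    sxx = sum((xi - x_mean) ** 2 for xi in x)
    sxy = sum((xi - x_mean) * (si - s_mean) for xi, si in zip(x, drawdowns))
    m = sxy / sxx              # drawdown change per log cycle of time
    b = s_mean - m * x_mean
    t_trans = q * math.log(10.0) / (4.0 * math.pi * m)
    t0 = 10.0 ** (-b / m)      # zero-drawdown intercept time
    s_coef = 2.25 * t_trans * t0 / (r * r)
    return t_trans, s_coef


# Hypothetical test: Q = 1e-3 m^3/s, T = 1e-3 m^2/s, S = 1e-4, r = 10 m.
Q, T_TRUE, S_TRUE, R = 1e-3, 1e-3, 1e-4, 10.0
times = [1000.0 * 10.0 ** (k / 10.0) for k in range(11)]  # 1e3 to 1e4 s, u <= 2.5e-3
obs = [theis_drawdown(Q, T_TRUE, S_TRUE, R, t) for t in times]
t_est, s_est = cooper_jacob_fit(times, obs, Q, R)
print(f"T = {t_est:.4e} m^2/s, S = {s_est:.4e}")
```

The series form of W(u) keeps the example dependency-free; a library routine for the exponential integral E1 would serve equally well.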

### 1.2. Methods for Capturing Spatial Heterogeneity in Hydraulic Parameters

[5] Common approaches to mapping *K* (and, less often, *S _{s}*) heterogeneity use geostatistical or stochastic estimation techniques or more sophisticated interpolation methods. These approaches, which are considered de facto standards, assume that a user-specified covariance function is valid and that hydrogeologic parameters are lognormally distributed and stationary. However, these assumptions are difficult to satisfy in many geologic settings. Because of these assumptions, and when data are sparse, stochastic estimation techniques may yield a smooth image of the spatial heterogeneity that does not represent the true distribution accurately. Although a variety of stochastic simulation techniques exist that can overcome this smoothing [e.g., *Deutsch and Journel*, 1998], they still do not preserve many geological features, because traditional geostatistical methods are based on variograms computed from two-point statistics. To overcome this shortcoming, multiple-point geostatistical methods [e.g., *Guardiano and Srivastava*, 1993; *Caers*, 2001; *Strebelle*, 2002; *de Vries et al.*, 2009] have been developed. They use more complex point configurations whose statistics are retrieved from training images that represent the geological facies distributions obtained from outcrop mapping and/or geophysical imaging.
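To make the "two-point statistics" limitation concrete: an experimental semivariogram summarizes the data only through pairs of values separated by a lag h, γ(h) = [1/(2N(h))] Σ [z(x_i) − z(x_i + h)]². The minimal sketch below computes it for regularly spaced 1-D data; the ln *K* values and the 0.1 m spacing are hypothetical. Because every entry depends only on pairs of points, fields with very different connectivity (e.g., continuous channels versus isolated lenses) can share the same variogram, which is the gap multiple-point methods aim to fill.

```python
def experimental_semivariogram(values, max_lag):
    """Experimental semivariogram of regularly spaced 1-D data.

    gamma(h) = 1 / (2 N(h)) * sum of (z(x_i) - z(x_i + h))**2 over the
    N(h) pairs separated by lag h (measured in grid steps).
    """
    gammas = {}
    for h in range(1, max_lag + 1):
        diffs = [(values[i] - values[i + h]) ** 2 for i in range(len(values) - h)]
        if diffs:
            gammas[h] = sum(diffs) / (2.0 * len(diffs))
    return gammas


# Hypothetical ln K values (K in m/s) sampled every 0.1 m along a borehole.
ln_k = [-9.2, -9.0, -8.7, -8.9, -10.1, -10.4, -10.2, -9.5, -9.1, -8.8]
for lag, gamma in experimental_semivariogram(ln_k, 4).items():
    print(f"lag {lag * 0.1:.1f} m: gamma = {gamma:.3f}")
```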

[6] Other approaches used to model subsurface heterogeneity include the transition probability Markov chain method [*Carle and Fogg*, 1997; *Carle*, 1999; *Weissmann et al.*, 1999] and the indicator kriging approach [*Journel*, 1983; *Journel and Isaaks*, 1984; *Journel and Alabert*, 1990; *Journel and Gomez-Hernandez*, 1993]. Both approaches allow one to construct discontinuous facies models. However, the Markov model accounts better than the indicator approach for spatial cross-correlation, such as juxtapositional relationships, including the fining-upward tendencies of different facies [*De Marsily et al.*, 2005].
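The transition probability approach is built on the probability of moving from one facies to another over a given lag; asymmetric probabilities (t_jk ≠ t_kj) are what encode juxtapositional trends such as fining upward. As a minimal sketch, and a simplification of that framework (one fixed lag along one direction, whereas the full method fits continuous-lag transition models in three dimensions), one-step transition probabilities can be counted from a facies log. The facies names and sequence are hypothetical.

```python
from collections import Counter


def transition_probabilities(facies_sequence):
    """Estimate one-step transition probabilities t_jk = P(next = k | current = j).

    facies_sequence is an ordered list of facies codes along one direction,
    e.g. upward along a borehole at a fixed cell size.
    """
    pair_counts = Counter(zip(facies_sequence, facies_sequence[1:]))
    from_counts = Counter(facies_sequence[:-1])
    states = sorted(set(facies_sequence))
    return {
        j: {k: pair_counts[(j, k)] / from_counts[j] for k in states}
        for j in states
        if from_counts[j] > 0
    }


# Hypothetical vertical log, listed from bottom to top.
log = ["gravel", "gravel", "sand", "sand", "sand", "silt",
       "gravel", "sand", "silt", "silt"]
for j, row in transition_probabilities(log).items():
    print(j, {k: round(p, 2) for k, p in row.items()})
```

In this hypothetical log, gravel passes upward to sand and sand to silt more often than the reverse, which is the kind of asymmetry a symmetric indicator variogram cannot represent.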

[7] Geostatistical and stochastic inverse methods have received increasing attention in recent years. These approaches produce the first and second statistical moments of hydrogeologic variables, representing their most likely estimates and their uncertainty, respectively, conditioned on available observations. Cokriging relies on classical linear predictor theory, which considers the spatial correlation structures of flow processes (such as hydraulic head and velocity) and of the subsurface hydraulic property, as well as the cross-correlation between the flow processes and the hydraulic property. In the past few decades, many researchers [e.g., *Kitanidis and Vomvoris*, 1983; *Hoeksema and Kitanidis*, 1984, 1989; *Rubin and Dagan*, 1987; *Gutjahr and Wilson*, 1989; *Harvey and Gorelick*, 1995; *Yeh et al.*, 1995, 1996] have demonstrated its ability to estimate *K*, head, and velocity, as well as solute concentrations, in heterogeneous aquifers.
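Cokriging extends kriging to several cross-correlated variables. As a minimal, single-variable sketch of the same linear-predictor idea (ordinary kriging, not cokriging), the code below computes a best linear unbiased estimate (the first moment) and its estimation variance (the second moment) at an unsampled location. The exponential covariance model, its parameters, and the data are hypothetical; note that the covariance model is assumed known, which is the assumption questioned in paragraph [5].

```python
import math


def solve_linear(a, b):
    """Solve a x = b by Gaussian elimination with partial pivoting (small systems)."""
    n = len(b)
    aug = [list(row) + [b[i]] for i, row in enumerate(a)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda row: abs(aug[row][col]))
        aug[col], aug[pivot] = aug[pivot], aug[col]
        for row in range(col + 1, n):
            factor = aug[row][col] / aug[col][col]
            for c in range(col, n + 1):
                aug[row][c] -= factor * aug[col][c]
    x = [0.0] * n
    for row in range(n - 1, -1, -1):
        tail = sum(aug[row][c] * x[c] for c in range(row + 1, n))
        x[row] = (aug[row][n] - tail) / aug[row][row]
    return x


def exp_cov(h, sill=1.0, corr_len=1.0):
    """Exponential covariance C(h) = sill * exp(-|h| / corr_len) (assumed model)."""
    return sill * math.exp(-abs(h) / corr_len)


def ordinary_kriging(xs, zs, x0, sill=1.0, corr_len=1.0):
    """Ordinary kriging estimate and variance at x0 from 1-D data (xs, zs).

    Solves [C 1; 1^T 0][w; mu] = [c0; 1], so the weights sum to one
    (unbiasedness) while the estimation variance is minimized.
    """
    n = len(xs)
    a = [[exp_cov(xs[i] - xs[j], sill, corr_len) for j in range(n)] + [1.0]
         for i in range(n)]
    a.append([1.0] * n + [0.0])
    c0 = [exp_cov(xi - x0, sill, corr_len) for xi in xs]
    sol = solve_linear(a, c0 + [1.0])
    weights, mu = sol[:n], sol[n]
    estimate = sum(w * z for w, z in zip(weights, zs))
    variance = sill - sum(w * c for w, c in zip(weights, c0)) - mu
    return estimate, variance


# Hypothetical ln K observations at four positions along a transect (m).
xs, zs = [0.0, 1.0, 2.5, 4.0], [-9.1, -8.7, -10.2, -9.6]
for x0 in (0.5, 1.8, 3.2):
    est, var = ordinary_kriging(xs, zs, x0, sill=0.5, corr_len=1.5)
    print(f"x = {x0:.1f} m: ln K = {est:.2f}, variance = {var:.3f}")
```

Kriging honors the data exactly (zero variance at sampled points) and relaxes toward a local mean away from them, which is the source of the smoothing discussed in paragraph [5].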

[8] Recently, hydraulic tomography has been developed to obtain information on the subsurface heterogeneity of *K* and *S _{s}* through sequential pumping tests. To our knowledge, *Neuman* [1987] was the first to suggest the approach, using geophysical tomography as an analogy. Since then, various inverse methods that use pumping test data simultaneously or sequentially have been developed for hydraulic tomography [e.g., *Gottlieb and Dietrich*, 1995; *Yeh and Liu*, 2000; *Bohling et al.*, 2002; *Brauchler et al.*, 2003; *McDermott et al.*, 2003; *Zhu and Yeh*, 2005, 2006; *Li et al.*, 2005; *Fienen et al.*, 2008; *Castagna and Bellin*, 2009; *Xiang et al.*, 2009; *Liu and Kitanidis*, 2011]. Numerous laboratory [e.g., *Liu et al.*, 2002, 2007; *Illman et al.*, 2007, 2008, 2010; *Yin and Illman*, 2009] and field experiments [e.g., *Bohling et al.*, 2007; *Straface et al.*, 2007; *Illman et al.*, 2009; *Cardiff et al.*, 2009] have been conducted to show the utility of hydraulic tomography, but a rigorous study comparing its results with those of more traditional characterization methods is generally lacking.

[9] In the laboratory, *Illman et al.* [2010] recently assessed the performance of various methods for estimating *K* by predicting the hydraulic responses observed in cross-hole pumping tests in a synthetic heterogeneous aquifer and the total flow rates obtained via flow-through tests. Specifically, they characterized a synthetic heterogeneous sandbox aquifer using various techniques (permeameter analyses of core samples and single-hole, cross-hole, and flow-through testing). They then obtained mean *K* estimates through traditional analysis of the test data, treating the medium as homogeneous. Heterogeneous *K* fields were obtained through kriging and steady state hydraulic tomography. To assess the performance of each characterization approach, *Illman et al.* [2010] conducted forward simulations of 16 independent pumping tests and six steady state flow-through tests using these homogeneous and heterogeneous *K* fields, and compared the simulated results with the observed data. The mean *K* values, and the heterogeneous *K* fields estimated through kriging of small-scale *K* data (core and single-hole tests), produced biased predictions of drawdowns and flow rates under steady state conditions. In contrast, the heterogeneous *K* distribution, or “*K* tomogram,” estimated via steady state hydraulic tomography yielded excellent predictions of drawdowns from pumping tests not used in the construction of the tomogram and very good estimates of total flow rates from the flow-through tests. On the basis of these results, *Illman et al.* [2010] suggested that steady state groundwater model validation is possible if the heterogeneous *K* distribution and forcing functions (boundary conditions and source/sink terms) are sufficiently characterized.
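The text does not state how the mean *K* values were computed from local estimates. As an assumption, the sketch below shows three standard averages of a set of local *K* values: the arithmetic and harmonic means are the effective values for flow parallel and perpendicular to perfect layering, respectively (and bound the effective value for other arrangements), while for a 2-D, statistically isotropic, lognormal field the geometric mean is the classical effective value under uniform mean flow. The *K* values are hypothetical.

```python
import math


def arithmetic_mean(values):
    """Effective K for flow parallel to perfectly stratified layers of equal thickness."""
    return sum(values) / len(values)


def geometric_mean(values):
    """exp(mean ln K); classical effective K for a 2-D isotropic lognormal field."""
    return math.exp(sum(math.log(v) for v in values) / len(values))


def harmonic_mean(values):
    """Effective K for flow perpendicular to perfectly stratified layers of equal thickness."""
    return len(values) / sum(1.0 / v for v in values)


# Hypothetical single-hole K estimates (m/s).
k_local = [2.1e-4, 8.5e-5, 4.0e-4, 1.2e-5, 6.3e-5]
for name, fn in [("arithmetic", arithmetic_mean),
                 ("geometric", geometric_mean),
                 ("harmonic", harmonic_mean)]:
    print(f"{name:>10}: {fn(k_local):.2e} m/s")
```

The spread between the three means grows with the variance of ln *K*, so the choice of average alone can shift a homogeneous model's predictions noticeably in strongly heterogeneous media.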

### 1.3. Goal of This Study

[10] *Illman et al.* [2010] examined only *K* characterization approaches and their performance in predicting independent test data under steady state conditions. The main goal of this study is to extend their work to the transient case. Using the same sandbox aquifer as *Illman et al.* [2010], we jointly assess the performance of various characterization and modeling techniques that treat the aquifer as either homogeneous or heterogeneous by predicting independent transient cross-hole pumping tests not used in the characterization effort. Specifically, we characterize the 2-D heterogeneous aquifer using both single- and cross-hole pumping tests. These data are then used to construct various forward groundwater models with homogeneous and heterogeneous *K* and *S _{s}* estimates. Two homogeneous, or effective parameter, models are constructed: (1) by averaging local-scale *K* and *S _{s}* estimates from single-hole pumping tests and treating the medium as homogeneous and isotropic; and (2) by using MMOC3 [*Yeh et al.*, 1993] coupled with PEST [*Doherty*, 1994] to estimate the horizontal and vertical hydraulic conductivities (*K _{x}*, *K _{z}*), as well as *S _{s}*, by simultaneously matching the transient drawdown data from all ports during a cross-hole pumping test and treating the medium as homogeneous and anisotropic in *K*. Three heterogeneous models are constructed, consisting of spatially variable *K* and *S _{s}* fields obtained via (1) kriging single-hole *K* and *S _{s}* data; (2) accurately capturing the layering and calibrating the *K* and *S _{s}* values for these layers using a parameter estimation program (i.e., a calibrated geological model); and (3) conducting transient hydraulic tomography (THT). The performance of these homogeneous and heterogeneous *K* and *S _{s}* fields is then quantitatively assessed by simulating 16 independent cross-hole pumping tests and comparing the simulated drawdowns to the observed drawdowns. It should be noted that these methods use varying amounts of data, so the comparison may not be fair in a strict sense. However, aside from hydraulic tomography, the approaches examined are commonly used to deal with heterogeneity, and the goal of this comparison is to assess their performance relative to hydraulic tomography, which is designed to incorporate data from multiple pumping tests.
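The introduction does not define how the simulated and observed drawdowns are compared quantitatively. As an assumption, the sketch below computes commonly used measures: the mean absolute error (L1 norm), the mean squared error (L2 norm), and a least-squares line of simulated versus observed drawdown, for which a slope near 1, an intercept near 0, and an R² near 1 indicate an unbiased and precise prediction. The function name, the returned keys, and the drawdown values are hypothetical.

```python
def performance_metrics(observed, simulated):
    """Compare simulated and observed drawdowns at matching ports and times.

    Returns the mean absolute error (L1), mean squared error (L2), and the
    slope, intercept, and R^2 of the least-squares line simulated = a * observed + b.
    """
    n = len(observed)
    if n != len(simulated) or n < 2:
        raise ValueError("need two equal-length series with at least two values")
    l1 = sum(abs(s - o) for o, s in zip(observed, simulated)) / n
    l2 = sum((s - o) ** 2 for o, s in zip(observed, simulated)) / n
    o_mean = sum(observed) / n
    s_mean = sum(simulated) / n
    soo = sum((o - o_mean) ** 2 for o in observed)
    if soo == 0.0:
        raise ValueError("observed values must not all be equal")
    sos = sum((o - o_mean) * (s - s_mean) for o, s in zip(observed, simulated))
    sss = sum((s - s_mean) ** 2 for s in simulated)
    slope = sos / soo
    intercept = s_mean - slope * o_mean
    r2 = sos * sos / (soo * sss) if sss > 0.0 else float("nan")
    return {"L1": l1, "L2": l2, "slope": slope, "intercept": intercept, "R2": r2}


# Hypothetical drawdowns (m) at several ports for one pumping test.
obs_dd = [0.012, 0.035, 0.051, 0.078, 0.104]
sim_dd = [0.010, 0.038, 0.049, 0.083, 0.099]
print(performance_metrics(obs_dd, sim_dd))
```

L1 and L2 measure the size of the mismatch, while the regression slope and intercept reveal systematic over- or underprediction that an error norm alone would hide.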