## 1. Introduction

[2] Groundwater flow and transport simulations require a densely defined hydraulic conductivity (*K*) field to populate the model grid. Because it is not practical to collect 2-D or 3-D data at this resolution, stochastic simulation methods are commonly used to interpolate between measured data values. Stochastic *K* field simulation requires a statistical analysis of the available *K* data, to ensure that the synthesized *K* field resembles the data in terms of its distribution and correlation structure. The two main simulation steps are: (1) generate an uncorrelated noise field and (2) apply an appropriate filter to impose a correlation structure. Since random number generators produce only uncorrelated noise, both steps are necessary. To parameterize the simulation model, the process is reversed: (1) apply an appropriate inverse filter to the raw data to remove the correlation and (2) examine the filtered, uncorrelated data to determine its true underlying distribution. Unless the data is filtered properly to remove correlations, the data histogram can significantly misrepresent the underlying distribution. In this paper, we will see a remarkable example of this simple and well-known fact.
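The forward and inverse steps described above can be illustrated with a minimal one-dimensional sketch. It assumes a fractional difference filter of order *d* as the correlation filter; the value *d* = 0.4, the series length, and the function names are illustrative assumptions, not parameters estimated from any data set.

```python
import numpy as np

def frac_diff_weights(d, n):
    """First n coefficients of the power series of (1 - B)^d, B = backshift."""
    w = np.empty(n)
    w[0] = 1.0
    for j in range(1, n):
        w[j] = w[j - 1] * (j - 1 - d) / j
    return w

def frac_filter(x, d):
    """Apply (1 - B)^d to x. d < 0 imposes correlation, d > 0 removes it."""
    w = frac_diff_weights(d, len(x))
    return np.convolve(x, w)[: len(x)]

def lag1_autocorr(x):
    """Sample autocorrelation at lag 1."""
    x = x - x.mean()
    return float(np.dot(x[:-1], x[1:]) / np.dot(x, x))

rng = np.random.default_rng(0)
d = 0.4                                   # assumed order, for illustration only
noise = rng.standard_normal(2000)         # step 1: uncorrelated noise
correlated = frac_filter(noise, -d)       # step 2: filter imposes correlation
recovered = frac_filter(correlated, d)    # inverse filter removes it again

print(lag1_autocorr(noise), lag1_autocorr(correlated), lag1_autocorr(recovered))
```

In this sketch the inverse filter recovers the original noise up to rounding error, because the truncated power series of (1 - B)^d and (1 - B)^(-d) multiply to the identity on the first n terms. With field data the order *d* is unknown and must be estimated before filtering.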

[3] Hydraulic conductivity data from the Macro Dispersion Experiment (MADE) site, at the Columbus Air Force Base in Mississippi, clearly show a high level of heterogeneity [*Rehfeldt et al*., 1992; *Zinn and Harvey*, 2003; *Llopis-Albert and Capilla*, 2009]. The site was recently revisited to obtain *K* measurements with much higher spatial resolution than previous measurements [*Bohling et al*., 2012; *Liu et al*., 2009]. Vertical columns (profiles) of hydraulic conductivity data were measured at approximately 1.5 cm depth increments, using a new direct-push profiling method that couples the direct-push injection logger (DPIL) and the direct-push permeameter (DPP) [*Butler et al*., 2007; *Liu et al*., 2009, 2012]. This novel high-resolution *K* (HRK) tool was advanced into the subsurface, while water was injected out of a small screened port located a short distance behind the tool tip. The injection rate, and injection-induced back pressure, were recorded every 1.5 cm, and the ratio of these quantities was then transformed into *K* estimates [*Liu et al*., 2009]. The cm-scale spatial resolution of the resulting *K* data is orders of magnitude finer than the data considered in previous studies [*Rehfeldt et al*., 1992; *Meerschaert et al*., 2004]. *Bohling et al*. [2012] analyzed the resulting *K* data, and compared those measurements to previous flowmeter-based *K* estimates collected at lower resolution across the same site.

[4] A parallel data collection effort used ground-penetrating radar (GPR) to image the related sedimentary structures in the aquifer, called facies, by identifying distinct reflection characteristics, such as reflection terminations, dip angles, amplitudes, and continuity. Such GPR facies have been shown to correlate with hydrogeological units [*Van Overmeeren*, 1998; *Heinz and Aigner*, 2003; *Schmelzbach et al*., 2011]. Full-resolution 3-D GPR data using 50 and 100 MHz antennae were obtained with step sizes (and line spacings) of 0.2 and 0.1 m, respectively, using a Sensors and Software pulseEKKO 100 system. Data processing and analysis to extract facies boundaries were detailed in *Dogan et al*. [2011]. The map in Figure 1 outlines the GPR data collection site, and the location of the four HRK profiles that form the basis for our study. The intensively cored area (ICA) cube was the site of a push-pull tracer test described in *Liu et al*. [2010]; see also *Zheng et al*. [2011]. The multilevel sampler (MLS) cube was the site of the MADE-5 tracer test reported in *Bianchi et al*. [2011].

[5] Hydraulic conductivity fields at the MADE site have been the focus of intensive study and modeling for over 20 years. The geostatistical analysis of *Rehfeldt et al*. [1992] documented a high level of heterogeneity, indicated by the variance of 4.5 for ln *K* in their multi-Gaussian model, as well as anisotropy, indicated by horizontal and vertical correlation scales of 12.8 and 1.6 m, respectively. *Silliman and Wright* [1988] and *Rubin and Journel* [1991] argued that a Gaussian model with a single covariance function cannot reproduce the preferential pathways (connected regions with the highest ln *K* values) observed in real aquifers. *Gómez-Hernández and Wen* [1998] continued this argument against the multi-Gaussian model, and cautioned against drawing broad conclusions on the basis of one-dimensional data distributions. *Renard and Allard* [2011] survey several methods for characterizing connectivity, and note that the multi-Gaussian model alone is often insufficient to reproduce the connectivity observed in real aquifers. Significant deviations from a Gaussian profile were noted by *Painter* [1996] and *Meerschaert et al*. [2004], and some alternative non-Gaussian models were proposed. *Zinn and Harvey* [2003] point out that even in a model with Gaussian ln *K* profiles, deviation from the usual multi-Gaussian model can lead to connected features. *Salamon et al*. [2007] discuss the nonmonotone variograms in MADE ln *K* data, and recommend a sequential Gaussian simulation methodology with a nonmonotone covariance structure, to reproduce this “hole effect.” *Llopis-Albert and Capilla* [2009] use a gradual conditioning algorithm to produce non-Gaussian ln *K* fields based on flowmeter, head, and concentration data from MADE-2. This controversy between Gaussian and non-Gaussian ln *K* fields has profound implications for flow and transport modeling. Heavy-tailed ln *K* distributions support novel approaches including the continuous time random walk (CTRW) [*Berkowitz et al*., 2006], fractional advection dispersion equation (ADE) [*Benson et al*., 2013], and some related stochastic hydrology models [*Cushman and Ginn*, 2000; *Neuman and Tartakovsky*, 2009], while Gaussian ln *K* models are more consistent with the traditional ADE, mobile-immobile, and dual-domain models.

[6] The two main findings of this study are that: (1) a fractional difference filter can be useful to reveal the true underlying distribution of highly correlated vertical columns of HRK data and (2) using GPR facies, a multi-Gaussian simulation method with an appropriate operator scaling correlation structure applied to each facies can reproduce the significantly non-Gaussian profiles seen in columns of filtered HRK data. There remains a significant debate in the literature between those who favor Gaussian models, and others who believe that a non-Gaussian approach is needed. In our view, both groups are correct, albeit at different scales. Within a single facies, an appropriate multi-Gaussian model can be effective, and when different facies are combined, a non-Gaussian profile with a sharper peak and a heavier tail will emerge.
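The last point, that pooling Gaussian facies can produce a non-Gaussian profile, can be checked numerically. The following is a minimal sketch under strong simplifying assumptions: each facies is represented by uncorrelated zero-mean Gaussian values, and the three standard deviations are hypothetical, chosen only for illustration and not taken from the MADE data.

```python
import numpy as np

def excess_kurtosis(x):
    """Sample excess kurtosis; approximately 0 for Gaussian data."""
    z = (x - x.mean()) / x.std()
    return float(np.mean(z**4) - 3.0)

rng = np.random.default_rng(1)
facies_sd = [0.3, 1.0, 2.0]   # hypothetical per-facies standard deviations
n_per_facies = 30000

# Each facies on its own is Gaussian.
facies_samples = [sd * rng.standard_normal(n_per_facies) for sd in facies_sd]
within = [excess_kurtosis(s) for s in facies_samples]

# Pooling the facies gives a scale mixture of Gaussians.
pooled = excess_kurtosis(np.concatenate(facies_samples))

print("within-facies excess kurtosis:", within)
print("pooled excess kurtosis:", pooled)
```

Within each facies the excess kurtosis is close to zero, while the pooled sample has clearly positive excess kurtosis, that is, a sharper peak and heavier tails than a Gaussian with the same variance. This is a general property of scale mixtures of Gaussians and does not depend on the particular values assumed here.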