## 1. Introduction

[2] An essential requirement for the accurate prediction of groundwater flow and transport in field situations is that the naturally heterogeneous distribution of soil hydraulic properties be correctly characterized and represented in the model. Numerous methods have been proposed to describe the heterogeneity of sedimentary deposits (e.g., as reviewed by *Koltermann and Gorelick* [1996] and *de Marsily et al.* [2005]), among which spatial statistical methods rely on models of spatial correlation [*Koltermann and Gorelick*, 1996]. Conventional geostatistical techniques based on such models, for example kriging and cokriging, have been successfully applied to estimate the spatial distribution of soil properties using point observations of hydraulic conductivity [*Journel et al.*, 1998], hydraulic head [*Kitanidis and Vomvoris*, 1983; *Hoeksema and Kitanidis*, 1984, 1985; *Carrera and Neuman*, 1986; *Dagan*, 1985; *Kitanidis*, 1996; *Gómez-Hernández et al.*, 1997; *Li et al.*, 2008], or tracer concentration data [*Harvey and Gorelick*, 1995; *Cirpka and Kitanidis*, 2000; *Hendricks Franssen et al.*, 2003].

[3] Another suite of geostatistical methods used in characterization is based on discrete distributions of hydraulic properties rather than on continuous distributions. In that framework, each discrete value of hydraulic conductivity is assigned to a different lithofacies. In a more general context, a lithofacies can be defined as a mappable subdivision of a designated stratigraphic unit, distinguished from adjacent subdivisions on the basis of lithology, texture, or mineral composition. Methods for characterizing lithofacies structures have also received substantial attention [e.g., *Copty and Rubin*, 1995; *Dai et al.*, 2005; *Ye and Khaleel*, 2008; *Segal et al.*, 2008; *Harp et al.*, 2008]. Information on the lithofacies in the subsurface often comes from boreholes. Core samples and other types of borehole logging data provide knowledge about the vertical sequence of lithofacies along the borehole (hereafter referred to as the “lithofacies data”). The three-dimensional distribution of lithofacies within the domain of interest may then be approximated by interpolating the lithofacies data between boreholes [e.g., *Carle and Fogg*, 1996, 1997; *Elfeki and Dekking*, 2007; *Ouellon et al.*, 2008; *Ye and Khaleel*, 2008]. However, a model of lateral spatial variability is usually more uncertain because the lateral continuity of lithofacies can be significantly smaller than the spacing between the boreholes from which lithofacies data are obtained; that is, boreholes are often too sparse. In that case, data from geophysical tomography such as seismic velocity data [*Copty and Rubin*, 1995; *Hyndman and Gorelick*, 1996; *Liu et al.*, 2004] or ground penetrating radar data [*Beres and Haeni*, 1991; *Jol and Smith*, 1991; *van Overmeeren*, 1998] can be incorporated in the development of the geologic model. The use of geophysical data in the framework of lithofacies delineation can compensate for the sparsity of hydrogeologic data obtained from widely spaced boreholes. Moreover, geological knowledge and expert opinion can be applied in a straightforward fashion to eliminate models of the heterogeneous distribution of lithofacies that do not meet certain requirements, for example, in terms of connectivity of lithofacies [*Poeter and McKenna*, 1995].

[4] Models of transition probability based on Markov chains (TPMC) provide a powerful geostatistical approach for estimating the spatial distribution of geologic units using categorical indicator variables [*Carle and Fogg*, 1996, 1997; *Li et al.*, 1999; *Fogg et al.*, 1998; *Ritzi*, 2000; *Elfeki and Dekking*, 2001, 2007; *Lu and Zhang*, 2002; *Park et al.*, 2004; *Ritzi et al.*, 2004; *Zhang et al.*, 2006; *Li*, 2007a, 2007b; *Dai et al.*, 2007; *Zhang and Li*, 2008; *Ye and Khaleel*, 2008]. For example, *Elfeki and Dekking* [2007] utilized the TPMC approach to investigate how conditioning on a number of boreholes helps to reduce uncertainty in the geological model of a two-dimensional field section. *Ye and Khaleel* [2008] used a TPMC approach to characterize soil classes defined on the basis of particle size distribution measurements at a field site and obtained three-dimensional models of the spatial distribution of four lithofacies. *Harp et al.* [2008] adopted a transition probability approach for identifying spatially correlated aquifer structures. In their method, the aquifer structure was updated while the sparse lithofacies distribution in boreholes was honored.
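
As a minimal illustration of the quantity on which TPMC methods are built, the following sketch estimates a matrix of vertical transition probabilities from a single categorical borehole log. The function name `transition_matrix`, the facies codes, and the example log are hypothetical. The estimate is a simple discrete-lag count and is not necessarily the formulation used in the studies cited above.

```python
import numpy as np

def transition_matrix(log, n_facies, lag=1):
    """Estimate lag-`lag` transition probabilities from one vertical facies log."""
    counts = np.zeros((n_facies, n_facies))
    # Count how often facies a is followed by facies b, `lag` samples deeper.
    for a, b in zip(log[:-lag], log[lag:]):
        counts[a, b] += 1
    row_sums = counts.sum(axis=1, keepdims=True)
    # Rows with no observations stay zero instead of triggering a division by zero.
    return np.divide(counts, row_sums, out=np.zeros_like(counts), where=row_sums > 0)

# Hypothetical log: facies codes 0..2 sampled at equal depth intervals.
log = [0, 0, 1, 1, 1, 2, 0, 0, 1, 2, 2, 2]
T = transition_matrix(log, n_facies=3)
print(T)
```

Entry `T[i, j]` is the estimated probability of finding facies `j` one sampling interval below facies `i`. In this sketch, each row sums to one whenever the corresponding facies is observed above at least one other sample.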

[5] The conversion of a spatial distribution of lithofacies into a spatial distribution of hydraulic conductivity, for example, to be used in a groundwater flow model, is usually done through calibration [e.g., *Cooley*, 1983; *Carrera and Neuman*, 1986; *Dai and Samper*, 2006]. “Model calibration” is done by changing model inputs such as system geometry, initial and boundary conditions, or in this case lithofacies hydraulic conductivity, so that the model output matches the corresponding observed values [*Hill and Tiedeman*, 2007]. The model inputs that are changed during the calibration process are referred to as the “model parameters.”
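
The idea of calibration can be sketched with a deliberately simple toy problem. The example below assumes a hypothetical one-dimensional column with a known facies arrangement and a known steady Darcy flux, and it estimates one hydraulic conductivity per facies by least squares on the head misfit. All names and values are illustrative assumptions; practical calibration of a groundwater flow model generally requires nonlinear regression rather than the linear shortcut used here.

```python
import numpy as np

# Hypothetical 1-D column: facies code of each cell, uniform cell length dx (m),
# and a known steady Darcy flux q (m/s). All values are illustrative.
facies = np.array([0, 0, 1, 1, 1, 0, 1, 0])
dx, q = 0.1, 1.0e-5
K_true = np.array([1.0e-3, 1.0e-4])  # conductivities to be recovered (m/s)

def head_drops(K):
    """Forward model: head loss across each cell from Darcy's law, dh = q*dx/K."""
    return q * dx / K[facies]

# Synthetic, error-free "observations": cumulative head loss at four locations.
obs_cells = np.array([1, 3, 5, 7])
observed = np.cumsum(head_drops(K_true))[obs_cells]

# The model is linear in the resistivity r = 1/K, so calibration reduces to
# ordinary least squares on the head misfit.
n_facies = K_true.size
A = np.zeros((obs_cells.size, n_facies))
for row, last in enumerate(obs_cells):
    for f in range(n_facies):
        # Number of cells of facies f between the inlet and the observation point.
        A[row, f] = q * dx * np.count_nonzero(facies[: last + 1] == f)
r_est, *_ = np.linalg.lstsq(A, observed, rcond=None)
K_est = 1.0 / r_est
print(K_est)
```

Because the synthetic observations are error free and the facies arrangement is assumed to be known exactly, the estimated conductivities reproduce the values used to generate the data. With an incorrect facies arrangement, the same procedure would return biased estimates.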

[6] The goal of this study is to develop an improved understanding of how incorporating lithofacies data into the construction of the conceptual model of aquifer heterogeneity helps reduce errors in parameter estimation and uncertainty in the outcome of a groundwater flow model. We anticipated that the spatial resolution of the data required by such a study would need to be very high. Therefore, we adopted an approach based on both laboratory experiments using an intermediate-scale sandbox and numerical simulations.

[7] Advances in measurement methods give laboratory experiments the potential to be used for quantitative studies of fluid flow and solute transport in media with complex heterogeneity [*Silliman et al.*, 1998]. Sandbox approaches have been adopted in many studies [e.g., *Barth et al.*, 2001; *Silliman and Zheng*, 2001; *Ursino et al.*, 2001a, 2001b; *Liu et al.*, 2002, 2007; *Jose et al.*, 2004]. They allow mimicking field-scale processes while keeping the costs low and the experimental conditions under control [*Lenhard et al.*, 1995; *Liu et al.*, 2007]. The latter condition is essential when validating an inverse modeling procedure, since uncertainties linked to model geometry, initial conditions and hydraulic stresses have to be minimized. *Liu et al.* [2002, 2007] used a series of laboratory sandbox experiments to demonstrate the effectiveness of hydraulic tomography, a technique that images the spatial distribution of hydraulic conductivity in the subsurface. *Nowak and Cirpka* [2006] used the sandbox experiments performed by *Jose et al.* [2004] to validate their geostatistical inverse method. The method allowed the estimation of hydraulic conductivity and dispersivity on the basis of point head measurements and concentration data.

[8] The approach we present in this paper involves three major tasks: Task 1 is the design of the test case and generation of experimental data; Task 2 is the generation of lithofacies data with different borehole densities and creation of random realizations of the lithofacies distribution that honor the lithofacies data; and Task 3 is the back-estimation of the distribution of hydraulic conductivity, which comprises the estimation of hydraulic conductivity values and the elimination of realizations that cannot honor the head data. In Task 1, a three-dimensional synthetic heterogeneous aquifer was created in an intermediate-scale laboratory tank using five different test sands (lithofacies). Three pumping tests were performed in order to generate experimental hydraulic head and discharge data. The model of heterogeneity of the synthetic aquifer was also used to simulate the pumping tests numerically and provide error-free synthetic hydraulic head and discharge data. In Task 2, random realizations of the three-dimensional heterogeneous distribution of lithofacies were generated. The hydraulic conductivities of the lithofacies were not specified at this stage and the realizations were conditioned on the knowledge of the sand type along the hypothetical boreholes. As the exact spatial distribution of lithofacies within the synthetic aquifer was known, it was possible to generate any desired quantity of lithofacies data at selected borehole locations. In order to quantify the effect of lithofacies data on groundwater flow model accuracy, we varied the number of boreholes used to generate the lithofacies data. Markov chain models of transition probability were used to analyze the lithofacies data. In this work, we refer to the realizations as the “conditional random lithofacies realizations” or the “lithofacies realizations.” Finally, in Task 3, the hydraulic conductivities of the five lithofacies were estimated by calibrating a groundwater flow model built using the lithofacies realizations and using known initial and boundary conditions. Both experimental and error-free synthetic drawdown data were used for this task. Moreover, a Metropolis-Hastings criterion [*Metropolis et al.*, 1953; *Hastings*, 1970] was used to reject realizations that could not honor the head data. Error and uncertainty in the calibration and predictive simulations were then calculated in a systematic manner, as a function of the quantity of lithofacies data, to investigate how incorporating more lithofacies data in random lithofacies realizations improved model calibration and prediction accuracy. Note that, in our approach, we considered only lithofacies data from hypothetical boreholes for constructing lithofacies realizations and did not use any other type of information that might be obtained from geophysical methods. As noted above, boreholes are often “too sparse” in terms of providing sufficient information to estimate horizontal lithofacies distribution. By focusing only on the lithofacies data from different numbers of hypothetical boreholes, we attempted to develop a quantitative indicator to answer the fundamental and intuitive question: how sparse is too sparse?
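
A generic Metropolis-Hastings acceptance rule of the kind referred to above can be sketched as follows. The exact form of the criterion is not specified in this section, so the sketch rests on explicit assumptions: a Gaussian likelihood built from the sum of squared head residuals, an assumed head measurement error `sigma`, and candidate realizations drawn independently of the current one, in which case the acceptance ratio reduces to a likelihood ratio. All names and misfit values are hypothetical.

```python
import math
import random

def log_likelihood(sse, sigma):
    """Gaussian log-likelihood (up to a constant) from a sum of squared head residuals."""
    return -0.5 * sse / sigma**2

def mh_accept(sse_current, sse_candidate, sigma, rng):
    """Accept the candidate with probability min(1, L_candidate / L_current)."""
    log_ratio = log_likelihood(sse_candidate, sigma) - log_likelihood(sse_current, sigma)
    # A better-fitting candidate is always accepted; a worse one only sometimes.
    return log_ratio >= 0 or rng.random() < math.exp(log_ratio)

rng = random.Random(0)
# Hypothetical calibrated misfits (m^2) of successive lithofacies realizations.
sse_values = [0.040, 0.012, 0.300, 0.010, 0.015]
sigma = 0.05  # assumed head measurement error (m)
current, kept = sse_values[0], [0]
for i, sse in enumerate(sse_values[1:], start=1):
    if mh_accept(current, sse, sigma, rng):
        current = sse
        kept.append(i)
print(kept)
```

Under these assumptions, a realization that fits the head data much worse than the current one, such as the third entry in the list, is rejected with near certainty, whereas a slightly worse realization retains a non-negligible chance of acceptance.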

[9] To the best of the authors' knowledge, this type of study has seldom been conducted experimentally because of the amount of time, effort, and cost required; it has typically been performed only numerically. The combined experimental and modeling approach used here to investigate the value of lithofacies data for improving model predictions is therefore one of the main contributions of this work.

[10] This paper is organized as follows: in section 2 the experimental setup and procedures are described, followed by the theoretical background for transition probability and procedures for generating lithofacies realizations in section 3. Then, the parameter estimation procedure and algorithm for accepting/rejecting realizations are described in section 4. Results are presented in section 5. Finally, conclusions are given in section 6.