Improvement of the prediction accuracy of groundwater flow models has been receiving substantial attention from many researchers through the development of enhanced characterizations of the structure of subsurface lithofacies and of the distribution of hydraulic conductivity. In this study, we investigated how incorporating increasing amounts of lithofacies data into the construction of a conceptual model of aquifer heterogeneity helps to reduce prediction error and uncertainty in groundwater flow models. An approach based on both laboratory experiments and numerical simulations was tested using data from an intermediate-scale synthetic heterogeneous aquifer. The heterogeneous aquifer consisted of five lithofacies, corresponding to five test sands. Three pumping tests were conducted and provided experimental data to perform groundwater flow model calibration and validation. The pumping tests were also simulated numerically in order to provide a series of error-free synthetic hydraulic data sets. On the basis of Markov chains models of transition probabilities, a total of 901 random realizations of the heterogeneous distribution of lithofacies were created using varying amounts of conditioning lithofacies data sampled along randomly placed hypothetical boreholes. For each realization and for two other simplified lithofacies models, parameter estimation was performed to estimate the hydraulic conductivity of the lithofacies using the experimental and synthetic hydraulic data from the three pumping tests. The results generally showed that the use of more lithofacies data in the construction of the lithofacies realizations led to an improvement in groundwater flow model prediction accuracy. When using the error-free synthetic hydraulic data, the calibration-prediction error and uncertainty decreased drastically when the mean borehole spacing was on the order of twice the horizontal correlation length or less. 
When the experimental hydraulic data were used, this drastic improvement in the calibration-prediction error was somewhat obscured and, in some cases, exhibited a local minimum. This local minimum, although beyond practical limits, corresponded to an optimal number of boreholes. Finally, the effect of incorporating more lithofacies data for the construction of lithofacies realizations was found to have a similar impact on the quality of model calibration and on the quality of predictive simulations conducted using the calibrated model.
 Another suite of geostatistical methods used in characterization is based on discrete distributions of hydraulic properties rather than on continuous distributions. In that framework, each discrete value of hydraulic conductivity is assigned to a different lithofacies. In a more general context, a lithofacies can be defined as a mappable subdivision of a designated stratigraphic unit, distinguished from adjacent subdivisions on the basis of lithology, texture, or mineral composition. Methods for characterizing lithofacies structures have also received substantial attention [e.g., Copty and Rubin, 1995; Dai et al., 2005; Ye and Khaleel, 2008; Segal et al., 2008; Harp et al., 2008]. Information on the lithofacies in the subsurface often comes from boreholes. Examination of core samples and other types of borehole logging data provide knowledge about the vertical sequence/distribution of lithofacies along the borehole (hereafter, referred to as the “lithofacies data”). The three-dimensional distribution of lithofacies within the domain of interest may then be approximated by interpolating the lithofacies data between boreholes [e.g., Carle and Fogg, 1996, 1997; Elfeki and Dekking, 2007; Ouellon et al., 2008; Ye and Khaleel, 2008]. However, the development of a model of lateral spatial variability usually tends to be more uncertain owing to the fact that lateral continuity of lithofacies can be significantly smaller than spacing between boreholes from which lithofacies data are obtained; that is, boreholes are often too sparse. In that case, data from geophysical tomography such as seismic velocity data [Copty and Rubin, 1995; Hyndman and Gorelick, 1996; Liu et al., 2004] or ground penetrating radar data [Beres and Haeni, 1991; Jol and Smith, 1991; van Overmeeren, 1998] can be incorporated in the development of the geologic model. 
The use of geophysical data in the framework of lithofacies delineation can compensate for the sparseness of hydrogeologic data obtained from widely spaced boreholes. Moreover, geological knowledge and expert opinion can be applied in a straightforward fashion to eliminate models of the heterogeneous distribution of lithofacies that do not meet certain requirements, for example, in terms of connectivity of lithofacies [Poeter and McKenna, 1995].
 The conversion of a spatial distribution of lithofacies into a spatial distribution of hydraulic conductivity, for example, to be used in a groundwater flow model, is usually done through calibration [e.g., Cooley, 1983; Carrera and Neuman, 1986; Dai and Samper, 2006]. “Model calibration” is done by changing model inputs such as system geometry, initial and boundary conditions, or in this case lithofacies hydraulic conductivity, so that the model output matches the corresponding observed values [Hill and Tiedeman, 2007]. The model inputs that are changed during the calibration process are referred to as the “model parameters.”
 The goal of this study is to develop an improved understanding of how adding lithofacies data into the construction of the conceptual model of aquifer heterogeneity helps reduce errors in parameter estimation and uncertainty in the outcome of a groundwater flow model. We anticipated that the spatial refinement of the data required by such a study would need to be very high. Therefore, we adopted an approach based on both laboratory experiments using an intermediate-scale sandbox and numerical simulations.
 Advances in measurement methods give laboratory experiments the potential to be used for quantitative studies of fluid flow and solute transport in media with complex heterogeneity [Silliman et al., 1998]. Sandbox approaches have been adopted in many studies [e.g., Barth et al., 2001; Silliman and Zheng, 2001; Ursino et al., 2001a, 2001b; Liu et al., 2002, 2007; Jose et al., 2004]. They allow mimicking field-scale processes while keeping the costs low and the experimental conditions under control [Lenhard et al., 1995; Liu et al., 2007]. The latter condition is essential when validating an inverse modeling procedure, since uncertainties linked to model geometry, initial conditions, and hydraulic stresses have to be minimized. Liu et al. [2002, 2007] used a series of laboratory sandbox experiments to demonstrate the effectiveness of hydraulic tomography, a technique that images the spatial distribution of hydraulic conductivity in the subsurface. Nowak and Cirpka used the sandbox experiments performed by Jose et al. [2004] to validate their geostatistical inverse method. The method allowed the estimation of hydraulic conductivity and dispersivity on the basis of point head measurements and concentration data.
 The approach we present in this paper involves three major tasks: Task 1 is the design of the test case and generation of experimental data; Task 2 is the generation of lithofacies data with different borehole densities and creation of random realizations of the lithofacies distribution that honor the lithofacies data; and Task 3 is the estimation of hydraulic conductivity values and elimination of the realizations that do not allow honoring head data to back-estimate the distribution of hydraulic conductivity. In Task 1, a three-dimensional synthetic heterogeneous aquifer was created in an intermediate-scale laboratory tank using five different test sands (lithofacies). Three pumping tests were performed in order to generate experimental hydraulic head and discharge data. The model of heterogeneity of the synthetic aquifer was also used to simulate the pumping tests numerically and provide error-free synthetic hydraulic head and discharge data. In Task 2, random realizations of the three-dimensional heterogeneous distribution of lithofacies were generated. The hydraulic conductivities of the lithofacies were not specified at this stage and the realizations were conditioned on the knowledge of the sand type along the hypothetical boreholes. As the exact spatial distribution of lithofacies within the synthetic aquifer was known, it was possible to generate any desired quantity of lithofacies data at selected borehole locations. In order to quantify the effect of lithofacies data on groundwater flow model accuracy, we varied the number of boreholes used to generate the lithofacies data. Markov chains models of transition probability were used to analyze lithofacies data. 
In this work, we refer to the realizations as the “conditional random lithofacies realizations” or the “lithofacies realizations.” Finally, in Task 3, the hydraulic conductivities of the five lithofacies were estimated by calibrating a groundwater flow model built using the lithofacies realizations and using known initial and boundary conditions. Both experimental and error-free synthetic drawdown data were used for this task. Moreover, a Metropolis-Hastings criterion [Metropolis et al., 1953; Hastings, 1970] was used to reject realizations that did not allow honoring the head data. Error and uncertainty in the calibration and predictive simulations were then calculated in a systematic manner, as a function of the quantity of lithofacies data, to investigate how incorporating more lithofacies data in random lithofacies realizations improved model calibration and prediction accuracy. Note that, in our approach, we considered only lithofacies data from hypothetical boreholes for constructing lithofacies realizations and did not use any other type of information that might be obtained from geophysical methods. As noted above, boreholes are often “too sparse” in terms of providing sufficient information to estimate horizontal lithofacies distribution. By focusing only on the lithofacies data from different numbers of hypothetical boreholes, we attempted to develop a quantitative indicator to answer the fundamental and intuitive question: how sparse is too sparse?
 To the authors' best knowledge, because of the amount of time, effort, and cost required, this type of study has seldom been conducted experimentally and has instead been performed only numerically. Therefore, the combined experimental and modeling approach used to investigate the value of lithofacies data for improving model predictions is one of the main contributions of this work.
 This paper is organized as follows: in section 2 the experimental setup and procedures are described, followed by the theoretical background for transition probability and procedures for generating lithofacies realizations in section 3. Then, the parameter estimation procedure and algorithm for accepting/rejecting realizations are described in section 4. Results are presented in section 5. Finally, conclusions are given in section 6.
2. Experimental Setup and Procedures (Task 1)
 In this section, we provide details on the methods adopted for designing the heterogeneous distribution of hydraulic conductivity and for packing the synthetic aquifer. We also describe instrumentation as well as three pumping tests that were conducted. Finally, as a form of validation of the experimental results from the pumping tests, we present a comparison of the experimental results with forward numerical simulations based on the designed heterogeneity, material properties, and imposed boundary conditions.
2.1. Design of the Synthetic Aquifer
2.1.1. Spatial Statistical Properties of the Synthetic Aquifer
 We consider an aquifer domain with dimensions Lx × Ly × Lz = 208 × 117 × 57 cm, in x (horizontal, parallel to ambient flow), y (horizontal, perpendicular to ambient flow), and z (vertical) directions, respectively. As shown in Figure 1, the domain was subdivided into 41 × 23 cells horizontally and 30 layers vertically, resulting in a total of 28,290 cells. Each cell has dimensions of 5.1 × 5.1 × 1.9 cm in x, y, and z directions, respectively.
 The three-dimensional heterogeneous aquifer consisted of two regions with different characteristic scales. The natural logarithm of the hydraulic conductivity (ln K) in the first region followed a multi-Gaussian model. The field had a geometric mean K of 0.0642 cm s−1 and a variance σ²lnK = 1.2 (Figure 2). An exponential covariance function was used, with correlation lengths of λh = 10.2 cm in x and y directions, and λv = 3.8 cm in the vertical direction (twice the cell size in each direction). The statistical anisotropy ratio λh/λv = 2.7 was selected to allow correlation lengths to be sufficiently smaller than tank dimensions. A computer code based on the turning bands algorithm, similar to the one described by Tompson et al., was used to generate the three-dimensional distribution of ln K.
 A large-scale layer of fine sand was embedded into the stationary field as a second region with dimensions 137 × 117 × 11.4 cm (in direction x, y, and z, respectively), resulting in a nonstationary composite synthetic aquifer. The horizontal scale of the fine layer was more than ten times larger than the horizontal correlation length of the random field, resulting in two different characteristic scales in the composite aquifer.
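The geometric quantities quoted above can be cross-checked with a few lines of arithmetic. The following is a minimal sketch that uses only the dimensions and correlation lengths stated in the text; the variable names are illustrative, and the cell counts of the fine sand layer are obtained by rounding to whole cells, which is an assumption of the sketch.

```python
# Consistency check of the synthetic aquifer geometry (all lengths in cm).
nx, ny, nz = 41, 23, 30             # number of cells in x, y, z
lx, ly, lz = 208.0, 117.0, 57.0     # domain dimensions

n_cells = nx * ny * nz
cell_size = (lx / nx, ly / ny, lz / nz)

lambda_h, lambda_v = 10.2, 3.8      # horizontal and vertical correlation lengths
anisotropy = lambda_h / lambda_v

# Fine sand layer (137 x 117 x 11.4 cm), expressed in whole cells (assumed rounding).
layer_dims = (137.0, 117.0, 11.4)
layer_cells_xyz = [round(d / c) for d, c in zip(layer_dims, cell_size)]
layer_cells = layer_cells_xyz[0] * layer_cells_xyz[1] * layer_cells_xyz[2]

# Ratio of the horizontal extent of the layer to the horizontal correlation length.
layer_to_lambda = layer_dims[0] / lambda_h

print("total cells:", n_cells)
print("cell size (cm):", [round(c, 1) for c in cell_size])
print("anisotropy ratio:", round(anisotropy, 1))
print("fine layer cells:", layer_cells_xyz, layer_cells)
print("layer length / lambda_h:", round(layer_to_lambda, 1))
```

The last ratio exceeds ten, which is what gives the composite aquifer its two distinct characteristic scales.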
2.1.2. Discrete Synthetic Aquifer and Packing Procedure
 Five industrial silica sands of different grain sizes were used in this study. The generated continuous distribution of ln K values was binned into five categories, corresponding to the five sands. The sands are relatively uniform in terms of grain size distribution. Four of the sands are classified by their effective sieve numbers: #110 (sand 1, Ottawa sand, sold as F-110; U.S. Silica Company, Ottawa foundry sand F-110 unground silica, product data, 1 p., Ottawa, Illinois, 1997), #70, #30, and #16 (sands 2, 4, and 5, crushed sand, sold as Granusil 7030, 4060, and 2075, respectively; Unimin Corporation, Granusil mineral fillers, technical data, 2 pp., Emmett, Idaho, 1997). The last sand (sand 3) is a mixture of #30 (Unimin Corporation, technical data, 1997) and #50 (sold as grade #540, 2:1 by weight; Wedron Silica Company, Washed and dried silica sand #540, technical data sheet, 1 p., Wedron, Illinois, 1989). This mixture was prepared so that the saturated hydraulic conductivities of the five sands were evenly distributed on a logarithmic scale over the range of values of interest (Figure 2). Note that the mean K was chosen so that it coincided with the saturated hydraulic conductivity of sand 3. Throughout this paper, these sands are identified by their numbers (sands 1 to 5) as defined in Figure 1. Selected properties of the sands obtained in separate column tests are also provided in Figure 1. Saturated hydraulic conductivities were determined using a constant-head method [American Society for Testing and Materials, 2009].
 Once the three-dimensional continuous distribution of hydraulic conductivity generated in section 2.1.1 was converted into a three-dimensional discrete distribution of five sands, sand 1 was assigned to the 3,726 cells located in the fine sand layer (Figure 1). The fractions of the five sands in the random part of the discrete distribution are shown in Figure 2. The discrete distribution is found to reasonably approximate the initial continuous distribution, and the mean ln K and variance σ²lnK recalculated for the discrete random field are identical to those of the continuous distribution. Hereafter, the “reference model” will refer to the discrete spatial distribution of the five sands as shown in Figure 1 with the hydraulic conductivity values of the five sands reported in Figure 1.
 The discrete random field was wet-packed using a sheet metal grid (uniform grid size of 5.1 cm × 5.1 cm). Each cell had a volume of 49.0 cm3 and was filled with one of the five sands. For each sand, the mass that filled a single cell when thoroughly compacted was determined beforehand. During packing, the water level was always kept slightly higher than the top surface of the layer. The predetermined mass of sand was poured into the designated cell of the metal grid. Once a layer was filled, a hand vibrator was applied to the sheet metal grid to compact the sand. The thickness of the aquifer was checked every five layers, and no significant settlement was observed. We refer to the aquifer that was actually packed in the tank as the “synthetic aquifer.” It needs to be distinguished from the reference model, because the packed synthetic aquifer may not perfectly preserve the characteristics of the designed reference model, as detailed in section 2.3.
2.2. Flow Configuration and Hydraulic Data
 Ambient flow was created by applying a hydraulic head drop of 2 cm along the aquifer length. The water level in the upstream end reservoir (at x = 0 cm) was set at the surface of the aquifer (z = 57 cm) and the level in the downstream end reservoir at x = 208 cm was set at z = 55 cm. The resulting average ambient hydraulic gradient across the aquifer was thus approximately 0.01. The side boundaries along the x axis were no-flow boundaries. Three separate pumping tests (P1, P2, and P3, pumping well locations shown in Figure 3) were conducted under this initial average hydraulic gradient of 0.01. The three pumping wells (2.1 cm O.D.) with a screen length of 5.7 cm (z = 13.3–19.0 cm) were installed before packing by securely placing them at their final locations. For each test, a pumping rate of 100 cm3 min−1 was applied at the well using a diaphragm pump.
 The hydraulic head distribution within the tank was monitored using 92 observation wells located throughout the entire synthetic aquifer. The observation wells (groups A and B, as illustrated in Figure 3) were installed after completion of packing. Observation wells in group A were distributed at 26 horizontal locations and were designed so that hydraulic head could be measured at three different depths (z = 16.2, 39.1, and 50.5 cm from the bottom), whereas observation wells in group B were distributed at 14 horizontal locations and only allowed head measurements at a depth similar to that of the screens of the pumping wells (z = 16.2 cm from the bottom). Each observation well was made of a brass pipe (3.2 mm O.D. and 2.4 mm I.D.) with a small hole (2 mm diameter) at 1 cm from the tip, and the hole was covered by a piece of fine stainless steel mesh. The observation wells were connected to two mechanical multiplexers (Scanivalve Corp., model SSS-48C7/Biny/MK4), each being connected to a precision pressure transducer (Validyne Engineering Co., model P55D, range: 0–8.8 cm of water, resolution: ±0.22 mm of water). Each multiplexer can accommodate up to 48 observation wells, yielding a total of 96 available pressure measuring ports. Four of the ports were used to monitor water levels in the end reservoirs, and the remaining 92 ports were used for the observation wells. The hydraulic head distribution within the tank (in terms of drawdown, hereafter referred to as the “head data”), the discharge rate at the tank outlet (hereafter referred to as the “discharge data”), and the water temperature were recorded by a computer. Throughout this paper, we refer to head and discharge data collectively as “hydraulic data.” The transient change in drawdown was very quick (on the order of minutes), and the recorded transient data were not accurate enough for inverse modeling. Therefore, only steady state experimental drawdowns are reported here.
2.3. Validation of the Synthetic Aquifer Based on Forward Flow Simulations
 Despite the precautions that were taken during the packing procedure, slight differences between the designed reference model and the synthetic aquifer may exist. They can result from deviations of hydraulic conductivities from the values provided in Figure 1, or from mixing of sands at cell interfaces. Errors in the location of the wells and reading errors linked to the pressure measurement system can also cause errors in the measurement of hydraulic data (hereafter referred to as the “hydraulic data errors”).
 These potential imperfections can cause a discrepancy between the reference model and packed synthetic aquifer, and can affect the reliability of the experimental hydraulic data sets. The quality of the data generated using the synthetic aquifer was evaluated using numerical simulations performed with the finite difference code MODFLOW 2000 [Harbaugh et al., 2000]. The flow domain was discretized into 41 × 23 cells horizontally and 30 layers according to the physically constructed synthetic aquifer described in section 2.1. All numerical cells had a size of 5.1 × 5.1 × 1.9 cm. The simulations were conducted in a forward mode; that is, each pumping test was simulated using the lithofacies distribution and material properties given in Figure 1. The boundary conditions and stresses on the system were assumed to be known and identical to those mentioned in section 2.2. Drawdown data measured for pumping tests P1–P3 are plotted against the corresponding simulated values in Figure 4 and generally show a good agreement. The relative error ɛh between the observed and simulated hydraulic heads was calculated as
where Nobs is the number of hydraulic head observation wells, hobs are experimental head data observed in the synthetic aquifer, and hsim are the corresponding values simulated using MODFLOW 2000. The denominator of equation (1) corresponds to the hydraulic head drop of 2 cm applied between both end reservoirs of the synthetic aquifer. The relative error of the discharge was computed as ɛQ = (Qobs − Qsim)/Qsim, Qobs being the experimental discharge observed in the synthetic aquifer and Qsim being the corresponding value simulated using MODFLOW 2000. The relative errors in hydraulic heads were around 2% for each pumping test. The error in total discharge at the tank outlet was somewhat larger: the simulated discharges were larger than measured values by 10.9% for P1, 14.3% for P2 and 15.0% for P3. However, a fine adjustment of the hydraulic conductivity of each sand allowed reconciling the discrepancies observed for the total discharge while maintaining the error in the heads in the same range. The initial column-scale values reported in Figure 1 were always found to be in the confidence interval of the adjusted K values. For the sake of brevity, this procedure is not further described here.
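The two error measures can be sketched as follows. The discharge error follows the definition given in the text. The head error function is only an illustration: it assumes a mean absolute deviation over the observation wells, normalized by the 2 cm head drop, which is one plausible averaging form and not necessarily the exact one used in equation (1). Function names and the sample values are hypothetical.

```python
def relative_head_error(h_obs, h_sim, head_drop=2.0):
    """Head mismatch averaged over wells and normalized by the imposed head drop (cm).

    ASSUMPTION: a mean absolute deviation is used here for illustration only.
    """
    if len(h_obs) != len(h_sim) or not h_obs:
        raise ValueError("h_obs and h_sim must be non-empty and of equal length")
    n_obs = len(h_obs)
    return sum(abs(o - s) for o, s in zip(h_obs, h_sim)) / (n_obs * head_drop)


def relative_discharge_error(q_obs, q_sim):
    """Relative discharge error, (Q_obs - Q_sim) / Q_sim, as defined in the text."""
    return (q_obs - q_sim) / q_sim


# Hypothetical values, for illustration only (cm for heads, cm3/min for discharge).
h_obs = [1.00, 2.00, 0.50]
h_sim = [1.10, 1.90, 0.50]
print(relative_head_error(h_obs, h_sim))
print(relative_discharge_error(90.0, 100.0))
```

With this sign convention, a simulated discharge larger than the measured one yields a negative value of the discharge error.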
 This analysis provides evidence that the synthetic aquifer packed in the tank preserved the designed heterogeneity within a reasonable margin of error. Although the actual hydraulic conductivities of the sands may be slightly different from those obtained in separate column tests (reported in Figure 1), the experimental data sets generated during the pumping tests were found to be accurate enough for inverse modeling.
3. Generation of Three-Dimensional Random Distributions of Lithofacies (Task 2)
 In this section, the general strategy for the generation of random distributions of lithofacies that approximate the reference model designed in Task 1 is presented. The random realizations of the three-dimensional heterogeneous distribution of lithofacies were generated on the basis of the knowledge of the sand type along the selected hypothetical boreholes. The theoretical background for Markov chains models of transition probability and procedures for generating lithofacies realizations are described in the following sections.
3.1. Lithofacies Data
 Five different lithofacies corresponding to the five sands in the synthetic aquifer are considered. We focus on the difference in the hydraulic properties of the lithofacies, assuming that each lithofacies is characterized by a single deterministic value of hydraulic conductivity as adopted by, for example, Fogg et al., Zhang et al., Lee et al., and Ye and Khaleel [2008]. Lithofacies data take the form of vertical distributions of sand numbers (as defined in section 2.1.2) obtained from hypothetical boreholes drilled in the reference model.
 Ten sets of hypothetical boreholes were considered for providing lithofacies data. Each set contains a different number of boreholes (NBH): 943, 472, 236, 105, 59, 38, 27, 20, 15, and 10, respectively. In each set, the boreholes were randomly located and did not necessarily coincide with the pumping or observation wells. A “borehole density index” dL/λh is introduced, where dL = (A/NBH)^0.5 can be considered as the mean borehole spacing, A is the area of the domain, and λh is the horizontal correlation length. The borehole density index is a dimensionless index relating the mean borehole spacing to the horizontal correlation length and will help in discussing the possible generalization of the results to other systems. The values of dL/λh for the ten borehole sets are: 0.5, 0.71, 1.0, 1.5, 2.0, 2.5, 3.0, 3.5, 4.0, and 5.0, respectively.
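The borehole density index can be evaluated directly from its definition. The sketch below assumes that A is the horizontal area of the tank (208 × 117 cm) and uses λh = 10.2 cm; the function name is illustrative.

```python
import math


def borehole_density_index(n_boreholes, area, lambda_h):
    """Return d_L / lambda_h, with d_L = sqrt(area / n_boreholes) the mean borehole spacing."""
    mean_spacing = math.sqrt(area / n_boreholes)
    return mean_spacing / lambda_h


area = 208.0 * 117.0   # horizontal area of the domain (cm^2), assumed to be A
lambda_h = 10.2        # horizontal correlation length (cm)

for n_bh in (943, 472, 236, 105, 59, 38, 27, 20, 15, 10):
    print(n_bh, round(borehole_density_index(n_bh, area, lambda_h), 2))
```

Under these assumptions the computed values fall close to the rounded values listed above, with the largest departures for the sparsest borehole sets.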
 Lithofacies data used in this study are ideal synthetic data. In the field, lithofacies data can be perturbed by numerous factors, including disturbances resulting from the drilling process and erroneous lithofacies identification. Moreover, field data may not be available along the entire length of the borehole owing to core loss. The potential effects of such additional uncertainties are not discussed here.
 The lithofacies distribution can be defined in terms of an indicator variable Ik(x):
$$ I_k(\mathbf{x}) = \begin{cases} 1, & \text{if lithofacies } k \text{ occurs at } \mathbf{x} \\ 0, & \text{otherwise} \end{cases} \qquad (2) $$
where in this case k = 1…5 represents the sand number. We define the transition probability tij(h) from lithofacies i at x to lithofacies j at x + h (h being the lag between two points in direction ϕ) as the probability of finding lithofacies j at a distance h in direction ϕ from a location where lithofacies i is observed:
$$ t_{ij}(h) = \Pr\{ I_j(\mathbf{x} + h) = 1 \mid I_i(\mathbf{x}) = 1 \} \qquad (3) $$
A one-dimensional continuous Markov chains model applied to a categorical variable in a direction ϕ assumes a transition probability matrix T of exponential form [Carle and Fogg, 1997; Carle, 1999]:
$$ \mathbf{T}(h) = \exp(\mathbf{R}\, h) \qquad (4) $$
where R is a square transition rate matrix of the size of the number of categories to be considered for the variable of interest. Diagonal terms of the transition rate matrix are related to the mean length Li of lithofacies i in direction ϕ:
$$ r_{ii} = -\frac{1}{L_i} \qquad (5) $$
This parameterization is of high interest for the characterization of lithofacies transition probabilities in the vertical direction, as the mean length can be directly estimated from lithofacies data obtained in boreholes (see section 3.3). Off-diagonal terms in the transition rate matrix contain information on probabilities of transitions between facies (or juxtapositional tendencies):
$$ r_{ij} = \frac{\pi_{ij}}{L_i}, \quad i \neq j \qquad (6) $$
where πij are embedded transition probabilities, which can also be directly estimated from the lithofacies data (see section 3.3). Other notable properties of the transition rate matrix are [Carle and Fogg, 1997; Carle, 1999]
$$ \sum_{j} r_{ij} = 0 \quad \forall i \qquad (7) $$
$$ \sum_{i} p_i\, r_{ij} = 0 \quad \forall j \qquad (8) $$
where pi is the volume fraction of lithofacies i. Equations (7) and (8) can be used to define a background lithofacies, whose properties are inferred from those of other lithofacies.
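To illustrate how a transition rate matrix translates into transition probabilities at a given lag, the sketch below builds R from mean lengths and embedded transition probabilities and evaluates T(h) = exp(Rh). It assumes the standard parameterization of Carle and Fogg [1997], with diagonal rates equal to −1/Li and off-diagonal rates equal to πij/Li. The three-facies values are hypothetical, and the matrix exponential is a simple scaling-and-squaring Taylor approximation written for self-containment, not the algorithm used in T-PROGS.

```python
def rate_matrix(mean_lengths, embedded):
    """Transition rate matrix with r_ii = -1/L_i and r_ij = pi_ij / L_i (i != j)."""
    n = len(mean_lengths)
    return [[-1.0 / mean_lengths[i] if i == j else embedded[i][j] / mean_lengths[i]
             for j in range(n)] for i in range(n)]


def matmul(a, b):
    """Product of two small dense matrices stored as lists of rows."""
    return [[sum(a[i][k] * b[k][j] for k in range(len(b))) for j in range(len(b[0]))]
            for i in range(len(a))]


def expm(m, squarings=10, terms=12):
    """Approximate exp(m): Taylor series of m / 2**squarings, then repeated squaring."""
    n = len(m)
    scaled = [[v / 2 ** squarings for v in row] for row in m]
    result = [[float(i == j) for j in range(n)] for i in range(n)]
    term = [row[:] for row in result]
    for k in range(1, terms + 1):
        term = [[v / k for v in row] for row in matmul(term, scaled)]
        result = [[result[i][j] + term[i][j] for j in range(n)] for i in range(n)]
    for _ in range(squarings):
        result = matmul(result, result)
    return result


def transition_probabilities(rate, lag):
    """Transition probability matrix T(h) = exp(R h) for a lag h."""
    return expm([[v * lag for v in row] for row in rate])


# Hypothetical three-facies example (mean lengths in cm).
mean_lengths = [3.8, 5.7, 7.6]
embedded = [[0.0, 0.6, 0.4],
            [0.5, 0.0, 0.5],
            [0.3, 0.7, 0.0]]
R = rate_matrix(mean_lengths, embedded)
T = transition_probabilities(R, 5.0)
for row in T:
    print([round(v, 3) for v in row])
```

Because each row of the embedded probabilities sums to one, each row of R sums to zero, and each row of T(h) therefore sums to one for any lag.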
3.3. Parameterization of a TPMC Model From Lithofacies Data
 The volume fractions pi, mean vertical lengths Li,z, and embedded vertical transition probabilities πij,z can be obtained from lithofacies data in a very straightforward fashion. The volume fraction of lithofacies i is obtained by dividing the total length of lithofacies i by the sum of the total lengths of the boreholes accounted for. The mean vertical lengths of each lithofacies were computed by dividing the total length of each lithofacies by its number of embedded occurrences (or the number of isolated inclusions) in the borehole set. Finally, we computed embedded transition probabilities by counting the number of transitions from one lithofacies to another [Carle and Fogg, 1997; Carle, 1999].
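The counting procedure described above can be sketched as follows. The sketch assumes that each borehole log is a bottom-to-top sequence of sand numbers with one entry per cell of uniform thickness, and that embedded transition probabilities are normalized by the number of upward transitions leaving each lithofacies. The function name and the two sample logs are hypothetical.

```python
from collections import Counter
from itertools import groupby


def vertical_statistics(boreholes, cell_thickness):
    """Volume fractions, mean vertical lengths, and embedded upward transition probabilities."""
    total_length = Counter()   # total length of each lithofacies
    occurrences = Counter()    # number of embedded occurrences (runs) of each lithofacies
    transitions = Counter()    # counts of upward transitions between successive runs

    for log in boreholes:
        # Collapse the log into runs of identical lithofacies.
        runs = [(facies, sum(1 for _ in group)) for facies, group in groupby(log)]
        for facies, n_cells in runs:
            total_length[facies] += n_cells * cell_thickness
            occurrences[facies] += 1
        for (lower, _), (upper, _) in zip(runs, runs[1:]):
            transitions[(lower, upper)] += 1

    all_length = sum(total_length.values())
    fractions = {f: total_length[f] / all_length for f in total_length}
    mean_lengths = {f: total_length[f] / occurrences[f] for f in total_length}

    leaving = Counter()
    for (lower, _), count in transitions.items():
        leaving[lower] += count
    embedded = {pair: count / leaving[pair[0]] for pair, count in transitions.items()}
    return fractions, mean_lengths, embedded


# Two hypothetical borehole logs, listed from bottom to top, with 1.9 cm cells.
logs = [[1, 1, 2, 2, 2, 3],
        [2, 1, 1, 3, 3, 3]]
fractions, mean_lengths, embedded = vertical_statistics(logs, 1.9)
print(fractions)
print(mean_lengths)
print(embedded)
```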
 Horizontal Markov chains models for transition probability cannot be obtained as easily, since only vertical boreholes were used in this study. Our approach was to infer the transition probability Markov chain (TPMC) models in the horizontal directions from the vertical models. The horizontal-to-vertical statistical anisotropy ratio was assumed to be known and set to 2.7, the value adopted for the spatially correlated region of the reference model. The horizontal transition probabilities were computed by averaging upward and downward vertical transition probabilities using the methodology proposed by Walker. Finally, multidimensional Markov chains models were obtained through elliptical interpolation between one-dimensional Markov chains models in the principal directions [Carle and Fogg, 1997; Carle, 1999].
3.4. Simulation of Conditional Random Lithofacies Realizations
 On the basis of the TPMC models obtained in section 3.3, the simulation of lithofacies distributions conditional to the lithofacies data sets was performed using T-PROGS. For each of the nine sets with fewer than 943 boreholes, one hundred random lithofacies realizations were generated (Figure 5). A horizontal layer in the reference model consists of 41 × 23 = 943 cells. The single realization generated using 943 boreholes was therefore identical to the reference model shown in Figure 1, as the sand number was provided for all cells from the 943 boreholes. Hereinafter, the realizations generated on the basis of lithofacies data from 943, 472, … boreholes are referred to as the BH943, BH472, … realizations, respectively.
 Visual observation of the lithofacies realizations shown in Figure 5 reveals that as fewer boreholes are used, the correct representation of the fine sand layer is lost. This is a consequence of using a TPMC model to characterize a heterogeneous distribution that has two different characteristic lengths. In the TPMC model that we employed, only one parameter (statistical anisotropy) was used for inferring horizontal lithofacies distributions from those in the vertical direction.
 Finally, in order to provide a point of comparison, we also used two other simplified lithofacies models: a homogeneous model and a two-material layered model. The homogeneous model was used as an equivalent upscaled model with a process-specific effective hydraulic conductivity Khomo, to estimate limit values of model calibration-prediction error and uncertainty (described in section 5.2). The layered model was built by modifying the homogeneous model to account for the correct position and extent of the fine sand layer. Two hydraulic conductivities, Klayer for the fine sand layer and Kback for the background, were considered for the layered model. Hence, the total number of lithofacies realizations tested was 903 (9 borehole sets times 100 realizations, plus one realization generated using 943 boreholes that is identical to the reference model, one homogeneous model, and one layered model; see Figure 5).
4. Estimation of the Hydraulic Conductivity Field (Task 3)
 The estimation of the K field using both lithofacies data and hydraulic observations was performed using a maximum a posteriori (MAP) framework, or Bayesian approach [e.g., Ezzedine et al., 1999; Chen and Rubin, 2003; Ye et al., 2004; Michalak, 2008]. The MAP framework has already been successfully applied in the literature to combine multiple data types in inversion problems. Chen et al. estimated the hydraulic conductivity at the South Oyster Site, Virginia, using Bayesian techniques to integrate multiple types of geophysical measurements. Kowalsky et al. presented a Bayesian method to integrate ground-penetrating radar data with other hydraulic measurements (such as point measurements of water saturation). Linde et al. used a MAP approach to estimate the geometry of the groundwater table at a catchment scale by integrating self-potential and piezometric measurements. Unlike most of these studies, in which the MAP is determined by minimizing a single objective function combining hydrologic and other data, our approach requires estimating the full a posteriori distribution of hydraulic conductivities. In this section, we briefly recall the statistical framework involved in the MAP method and introduce how the likelihood of hydraulic measurements is calculated, and how it is used within a Metropolis-Hastings criterion to accept or reject a given realization. We also describe how the K values of each lithofacies are computed, as well as their associated error and uncertainty. In section 4.5, a summary of the methodology is presented in the form of a computational algorithm.
4.1. Maximum a Posteriori Framework
 We denote the vectors of all log-hydraulic conductivity and facies number Y and F, respectively. Following the definitions introduced in section 3.2, we have
where Kk is the hydraulic conductivity of lithofacies k. The vector of hydraulic measurements is Zh, and the vector containing lithologic data gathered from borehole logs is Zf. The goal of the inversion is to estimate Y and F given the measurements Zh and Zf. In a MAP framework, the best estimates of Y and F correspond to the mode of p(Y,F∣Zh,Zf), the joint probability density function of Y and F conditioned to the Zh and Zf measurements. Using Bayes' theorem of conditional probabilities, we can write
where p(Zh∣Y,F) is the likelihood of hydraulic heads, representing the mismatch between measured and simulated hydraulic observations, p(F∣Zf) is the distribution of lithofacies conditioned to observations, and p(Y∣F) expresses prior knowledge about the attribution of log-hydraulic conductivity values to certain lithofacies types.
 It is reasonable to assume that p(F∣Zf) does not differ dramatically from one conditional realization of the lithofacies distribution to another. Then, if no prior information on the hydraulic conductivity of lithofacies is available, it appears from equation (10) that the conditional probability density function of the log conductivity field p(Y,F∣Zh,Zf) is dominated by the likelihood of the hydraulic measurements p(Zh∣Y,F), and p(Y,F∣Zh,Zf) can be directly estimated from p(Zh∣Y,F).
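 The factorization referred to as equation (10) is not reproduced in this text. One form that is consistent with the three terms defined above is sketched below. It assumes that the hydraulic data depend on the borehole logs only through Y and F, and that Y depends on the borehole logs only through F; the bold vector notation is introduced here for illustration only.

```latex
% Hedged reconstruction of the factorization described in section 4.1
p(\mathbf{Y},\mathbf{F}\mid\mathbf{Z}_h,\mathbf{Z}_f)
  \;\propto\;
  p(\mathbf{Z}_h\mid\mathbf{Y},\mathbf{F})\,
  p(\mathbf{Y}\mid\mathbf{F})\,
  p(\mathbf{F}\mid\mathbf{Z}_f)

% With an uninformative p(Y|F) and a roughly constant p(F|Z_f)
% across conditional realizations, this reduces to
p(\mathbf{Y},\mathbf{F}\mid\mathbf{Z}_h,\mathbf{Z}_f)
  \;\propto\;
  p(\mathbf{Z}_h\mid\mathbf{Y},\mathbf{F})
```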
4.2. Estimation of the Likelihood of Hydraulic Measurements
 For each lithofacies realization drawn from p(F∣Zf) created according to the methodology described in section 3.4, the five hydraulic conductivity values (K1 to K5) of the five lithofacies were estimated using the inversion code UCODE 2005, applied in combination to the numerical model set up using MODFLOW 2000. UCODE 2005 is a general-purpose inversion code developed by the USGS [Poeter et al., 2005]. If no prior information is used, UCODE 2005 minimizes a weighted least squares objective function of the form
with respect to the five hydraulic conductivity values K1 … K5. Observed and simulated heads are hobs and hsim, respectively, while observed and simulated total discharges are Qobs and Qsim. The weights applied to head and total discharge observations are ωh and ωQ, respectively. In this case, we used a weight ωh = 1 for all head observations and a weight ωQ = 0.1. This was done to prevent the objective function from being driven by discharge observations, since the numerical values of discharge data are about 1 order of magnitude larger than the numerical values of head data. A more rigorous approach would have required the weights to be inversely proportional to the variances of measured data [Hill and Tiedeman, 2007], but we found that similar results could be obtained as long as head and discharge data contributed evenly to the objective function. The sensitivities of observations to the five hydraulic conductivities were computed using a forward finite difference approximation.
 The numerical model uses the same numerical grid and boundary conditions as the model presented in section 2.3. Each cell of the numerical model contains a single lithofacies, assigned according to the realization under consideration. Convergence was assumed to be reached when parameters changed by less than 1% between iterations and a maximum of 20 iterations was imposed. Starting values of hydraulic conductivities were those obtained from column tests. Not all realizations were expected to successfully converge during the parameter estimation process, considering the relatively arbitrary nature of the convergence criterion adopted. For lithofacies realizations that did not converge within 20 iterations, we used the K values that yielded the lowest value of the objective function. For the homogeneous model, only one hydraulic conductivity Khomo was estimated, while two hydraulic conductivities Klayer and Kback were estimated for the layered model (see section 3.4). The K values for the simplified models were also estimated by the inversion code UCODE 2005.
 The likelihood p(Zh∣Y,F) is related to the minimum value of the objective function O after application of the inversion code UCODE 2005. In this study, we assume
Hence, the likelihood is higher when the distribution of hydraulic conductivities allows a close match of observed hydraulic data, and vice versa.
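 The weighted least squares objective function described in this section can be illustrated with a minimal sketch. It assumes that each weight multiplies the corresponding squared residual. It also assumes, as an inference from the acceptance ratio used in section 4.5 rather than from a stated formula, that the likelihood is inversely proportional to the minimized objective function. The function names are hypothetical and are not part of UCODE 2005 or MODFLOW 2000.

```python
def weighted_least_squares(h_obs, h_sim, q_obs, q_sim, w_h=1.0, w_q=0.1):
    """Weighted sum of squared residuals over head and discharge data.

    Default weights follow the values quoted in the text (1 for heads,
    0.1 for total discharges). Assumption: weights multiply the squared
    residuals.
    """
    if len(h_obs) != len(h_sim) or len(q_obs) != len(q_sim):
        raise ValueError("observed and simulated vectors must have equal length")
    head_term = sum(w_h * (o - s) ** 2 for o, s in zip(h_obs, h_sim))
    discharge_term = sum(w_q * (o - s) ** 2 for o, s in zip(q_obs, q_sim))
    return head_term + discharge_term


def relative_likelihood(objective_minimum):
    """Unnormalized likelihood of the hydraulic data.

    Assumption (not confirmed by the text): the likelihood is taken as
    inversely proportional to the minimized objective function.
    """
    if objective_minimum <= 0.0:
        raise ValueError("the objective function must be strictly positive")
    return 1.0 / objective_minimum
```

 Under this assumption, only ratios of likelihoods matter in the accept-reject step, so the normalization constant never needs to be computed.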
4.3. Metropolis-Hastings Algorithm to Accept or Reject a Realization
where q(θt, θt+1) is called the proposal distribution, from which θt+1 is randomly sampled. The distribution q is usually centered on θt and has a given shape and width. In this application, since lithofacies realizations generated as described in section 3.4 are independent realizations, q is a uniform distribution and the probability of acceptance reduces to
A priori information on the log-hydraulic conductivity values of lithofacies p(Y∣F) could be easily included in the methodology by adding to equation (14) penalty terms when the estimated hydraulic conductivities are too different from the column-scale values or when the order of hydraulic conductivity values (i.e., K1 < K2 < … < K5) is not verified.
Hence, when the minimum value of the objective function for θt+1 is lower than that corresponding to θt, θt+1 is systematically accepted. Conversely, when min(O(Zh, θt)) < min(O(Zh, θt+1)), θt+1 is accepted with a probability α. Since lithofacies realizations are generated independently of each other, the bulk application of this method leads to the acceptance of several realizations with a very large value of the minimized objective function. Once such a realization becomes the reference realization θt, more realizations with large values of min(O(Zh, θt+1)) are included in the a posteriori distribution, and the distribution becomes artificially wide. In order to further constrain the estimation procedure, it was decided to limit the probability of acceptance to values larger than 75%. While this criterion was chosen somewhat arbitrarily, it helped maintain some convergence in the estimation process by systematically eliminating realizations resulting in a poor match of hydraulic data. A more rigorous examination of the convergence properties of this algorithm is beyond the scope of this study.
4.4. Estimated K Values and Head Distributions and Associated Error and Uncertainty
 By applying a Lilliefors test [Lilliefors, 1967], it was found that the estimated ln K values of each lithofacies were most of the time normally distributed, which allowed us to estimate the K value of each lithofacies as the mean of the a posteriori distributions and to calculate the uncertainty as the standard deviation of the distributions.
 For each hydraulic conductivity field, it is possible to compute the hydraulic head at each observation point. By compiling the simulated hydraulic heads from all conductivity fields (for a given set of boreholes), we obtain 92 probability distributions of hydraulic head, each corresponding to one observation point. Performing a Lilliefors test on the head data also showed that the hydraulic head values at each observation point were mostly normally distributed. The estimated hydraulic head hest at each observation point was computed as the mean of the corresponding distribution, and the “calibration error” was computed using equation (1). Then, the “calibration uncertainty” in simulated heads was computed as the average over the 92 observation points of the coefficient of variation of the a posteriori distributions, the latter being the ratio of the standard deviation to the mean of the distribution:
where hest is the mean of the hydraulic head distribution and σest is its standard deviation (one value for each observation point).
 We refer to “calibration error and uncertainty” when ɛh and σh are computed for hydraulic head data that have been used to estimate hydraulic conductivities in the inverse procedure (e.g., simulated/calibrated head data in P1 using K values estimated using observed data from P1). They do not reflect any predictive capabilities of the calibrated model, but rather describe the goodness of fit of estimated hydraulic heads with observed data. “Prediction error and uncertainty” refer to ɛh and σh being computed for data that have not been used to estimate lithofacies K values in the inverse procedure (e.g., simulated/predicted head data in P2 or P3 using K values estimated using observed data from P1). In this case, the models are used in a forward mode and ɛh and σh describe their predictive capabilities.
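 The uncertainty measure defined in words above (the average over observation points of the coefficient of variation) can be sketched as follows. The function name and the data layout are hypothetical, and the use of the sample standard deviation rather than the population standard deviation is an assumption, since the text does not specify which estimator was used.

```python
from statistics import mean, stdev


def head_uncertainty(heads_by_point):
    """Average coefficient of variation of simulated heads.

    heads_by_point: one list per observation point, each holding the
    heads simulated at that point by all accepted realizations.
    Assumption: sample standard deviation, and a nonzero mean head at
    every observation point.
    """
    coefficients = []
    for heads in heads_by_point:
        h_est = mean(heads)       # estimated head at this point
        sigma_est = stdev(heads)  # spread among realizations
        coefficients.append(sigma_est / h_est)
    return mean(coefficients)
```

 The same function applies to calibration and prediction uncertainty; only the origin of the simulated heads differs (the pumping test used for calibration, or another pumping test).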
4.5. Summary of the Procedure
 The general methodology used to determine the a posteriori distribution of ln K values is summarized as follows.
 1. Collect hydraulic data during pumping tests.
 2. Select a set of boreholes that are randomly located in the reference model.
 3. Collect lithofacies data along the selected boreholes.
 4. Generate 100 lithofacies realizations using a transition probability/Markov chains model.
 5. For each realization, estimate the K values of the five lithofacies by matching the hydraulic observations. This step is achieved using UCODE 2005.
 6. For each realization, accept or reject the realization using an accept-reject Metropolis-Hastings algorithm, by comparing the minimum value of the objective function to that of the last accepted realization. Practically, the first accepted realization was chosen as the first realization to successfully converge within 20 iterations, using the convergence criteria described in section 4.2. The modified Metropolis-Hastings algorithm proceeds as follows: First, draw a random number Z between 0.75 and 1 from a uniform distribution. Second, compute the ratio of the minimum value of the objective function for the last accepted realization to that of the current realization min(O(Zh, θt))/min(O(Zh, θt+1)). Third, compare min(O(Zh, θt))/min(O(Zh, θt+1)) to Z. If min(O(Zh, θt))/min(O(Zh, θt+1)) < Z, reject the realization. Otherwise accept the realization and set θt = θt+1.
 7. Repeat steps 5 and 6 for 100 realizations to compute the a posteriori distributions of lithofacies, K, and hydraulic heads.
 8. Compute calibration error and uncertainty and prediction error and uncertainty.
 9. Repeat steps 2–8 for different densities of borehole data dL/λh in order to assess the value of lithofacies data for improving the quality of a groundwater flow model.
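 The accept-reject rule of step 6 can be sketched as follows. This is a minimal illustration under stated assumptions, not the code used in the study: the function name and the input layout are hypothetical, objective function values are assumed strictly positive, and realizations examined before the first converged one are assumed to be discarded, which the text does not state explicitly.

```python
import random


def accept_reject(results, z_low=0.75, rng=None):
    """Return the indices of the accepted realizations.

    results: list of (objective_minimum, converged) pairs, in the order
    in which the realizations were generated.
    z_low: lower bound of the uniform random threshold Z (0.75 in the text).
    """
    rng = rng or random.Random()
    accepted = []
    last_objective = None
    for index, (objective, converged) in enumerate(results):
        if last_objective is None:
            # The chain starts at the first realization that converged.
            if converged:
                accepted.append(index)
                last_objective = objective
            continue
        z = rng.uniform(z_low, 1.0)
        ratio = last_objective / objective
        if ratio >= z:
            # Accept and make this realization the new reference.
            accepted.append(index)
            last_objective = objective
        # Otherwise reject and keep the previous reference.
    return accepted
```

 With this rule, a realization whose minimized objective function is lower than or equal to that of the reference is always accepted, and one whose ratio falls below 0.75 is always rejected. Only ratios between 0.75 and 1 depend on the random draw.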
5. Parameter Estimation Results
 In this section, the results of the parameter estimation exercises performed using the hydraulic data obtained during the three pumping tests are presented. Two types of hydraulic data were used as observations to constrain the inversion: (1) experimental drawdowns at the 92 observation wells and ambient discharge rates measured during the pumping tests performed in the laboratory aquifer and (2) synthetic drawdowns at the 92 observation wells and ambient discharge rates calculated with MODFLOW 2000, using the reference model and the hydraulic conductivity values measured for individual sands in the separate column tests (shown in Figure 1). The experimental hydraulic data are affected by errors caused by several imperfections, some of them being noted in section 2.3. On the contrary, the synthetic head data simulated numerically using the reference model and the hydraulic conductivity values shown in Figure 1 are free of errors. In this context, when the synthetic hydraulic data are used, the “true” distribution and K values of five lithofacies are known. Comparison of the parameter estimation results based on the above two types of observation data will allow us to (1) assess how the model calibration-prediction accuracy and uncertainty improve when increasing quantities of lithofacies data are incorporated into the construction of three-dimensional lithofacies distributions and (2) separate the effect of experimental errors in hydraulic data and evaluate the sole effect of lithofacies data on the parameter estimation process.
 A total of six parameter estimation exercises were performed. First, three exercises were performed using steady state experimental hydraulic data recorded during the pumping tests conducted in P1, P2, and P3, respectively. Second, the corresponding error-free synthetic hydraulic data were used for three additional parameter estimation exercises. Each of the parameter estimation exercises was based on hydraulic data obtained in one of the pumping tests. Since 901 different lithofacies realizations plus two simplified models (thus, a total of 903 realizations/models) were tested for each set of observations, a total of 5,418 steady state groundwater flow models (903 realizations/models × 3 pumping tests × 2 drawdown data types) were run in inverse mode to estimate K1 through K5 values of the conditional random lithofacies realizations, Klayer and Kback in the layered model, and Khomo for the homogeneous model. In addition, we applied the same procedure to two other synthetic cases with higher and lower levels of heterogeneity. The same realizations shown in Figure 5 were used, but the range of the five K values was shrunk/stretched to lower/higher values of the variance σ²lnK = 0.6 and 2.4, respectively. Details on the methodology and the results are given in section 5.3.
5.1. Estimated K Values Using Experimental and Synthetic Hydraulic Data
Figure 6 shows the distributions of estimated ln K values for three selected sets of boreholes (NBH = 472, 105, and 10) based on the experimental and synthetic hydraulic data, respectively. In general, for BH010 realizations, the estimated K values spread over up to 7 orders of magnitude for K5 and about 3 orders of magnitude for K1. Black downward arrows indicate the K values from the separate column tests. As more lithofacies data were used, the distributions were characterized by smaller widths and more distinct peaks that, most of the time, coincided with the arrows. When the synthetic hydraulic data were used, the peaks were found to be higher with a narrower width. For BH472 realizations, all of the peaks coincided with the “true” K values.
 The means of the distributions of K values of five lithofacies as a function of the borehole density index dL/λh are shown in Figure 7, as well as their corresponding 95% confidence intervals, both of which were computed from distributions similar to those shown in Figure 6. The estimated K values for sands 2 to 5 are reasonably close to those obtained in the column tests (plotted at dL/λh = 0), even for lithofacies realizations generated using a smaller number of boreholes. However, it appears that the 95% confidence intervals can be very large, especially for sands with lower volume fractions, probably as a result of a lower sensitivity of hydraulic observations to these parameters.
5.2. Model Calibration-Prediction Error and Uncertainty
 Model calibration error and uncertainty as defined in equations (1) and (16) are shown in Figure 8 as a function of dL/λh. The results obtained on the basis of the experimental hydraulic data were plotted separately from those obtained using error-free synthetic hydraulic data. When the experimental hydraulic data were used (Figure 8a), in general, the lithofacies realizations generated using more lithofacies data (i.e., smaller dL/λh) resulted in smaller model calibration error. The improvement in the model calibration errors is somewhat distinct for dL/λh < 2.5 in cases P1 and P2 and remained relatively constant for 2.5 < dL/λh < 5. Another noticeable and important behavior in Figure 8a is the upward trend at dL/λh = 0.5–0.71. The calibration error was larger for the BH943 realization than for the BH472 realizations. The experimental hydraulic data set contains errors resulting from the factors described in section 2.3. Since the BH943 realization is identical to the reference model, the distribution of lithofacies is given and the inversion procedure has only 5 degrees of freedom corresponding to the hydraulic conductivities of the five sands. Inversions using the BH472 realizations have a larger number of degrees of freedom, since the type of sand is allowed to vary among realizations where no hypothetical borehole is present. Hence, experimental errors can be partly addressed in the BH472 realizations by rejecting lithofacies distributions corresponding to a poor likelihood of hydraulic data. The calibration uncertainty in Figure 8b shows a more distinct improvement for dL/λh < 2.5, while it remains relatively constant for 2.5 < dL/λh < 5. A small uncertainty reflects a small variability of the calculated heads at the 92 observation wells among different realizations. 
Thus, as more lithofacies data were incorporated, although the discrepancy between observed and simulated heads improved rather gradually, the variability among realizations seemed to decrease more rapidly.
 When using the error-free synthetic hydraulic data, the general trends in calibration error (Figure 8c) and uncertainty (Figure 8d) were similar to those observed using the experimental data. However, overall levels of error and uncertainty were lower than when using the experimental hydraulic data. The error and uncertainty remained almost constant for 2.5 < dL/λh < 5, as also observed in Figures 8a and 8b. A remarkable difference in calibration error when the synthetic hydraulic data were used was the drastic improvement observed for dL/λh < 2.5. Because the BH943 realization at dL/λh = 0.5 was identical to the reference model, the corresponding calibration error was zero, as a result of which the upward trend observed in Figure 8a disappeared in Figure 8c.
 The improvement in calibration error for dL/λh < 2.5 observed with the “error-free” synthetic hydraulic data was obscured when the experimental data were used, largely because of the “hydraulic data errors” caused by the factors described in section 2.3. In practice, the findings obtained with the experimental data are the more realistic ones because observed hydraulic data are expected to contain some degree of error. The slight upward trend in Figure 8a may suggest that there exists a trade-off between accuracy in lithofacies distribution and precision of hydraulic data, which appears as the local minimum in calibration error at around, in this case, dL/λh = 0.71. Such a minimum corresponds to the optimal number of boreholes needed to construct random lithofacies realizations, beyond which adding boreholes does increase the quality of the realizations but does not further reduce calibration error. It has to be emphasized that the optimal number of boreholes suggested by these results is beyond practical limits. Further study is needed to investigate whether the optimum number of boreholes may be more feasible under different conditions.
 Calibration errors for the layered and homogeneous models are also plotted in Figures 8a and 8c. The homogeneous model yielded the largest error among the models and realizations tested, and the error was about twice that of the BH010 realizations (dL/λh = 5). The errors for the layered model were only slightly lower than those of the homogeneous model. The simplified models have only 1 or 2 degrees of freedom. The flexibility of the models significantly increases when the lithofacies type can vary in each of the 28,290 cells, allowing lower calibration errors to be reached.
Figure 9 shows prediction error and uncertainty. Although equations (1) and (16) were used to calculate these, the computation was done differently, as noted in section 4.4. For example, the “prediction” error for P1 was calculated on the basis of the “predictive” simulation of pumping test P1 using the models calibrated on the basis of the hydraulic data from P2 or P3. The “calibration” error for P1, on the other hand, was based on the model calibrated using the data from P1. It is interesting to note that the predictive error and uncertainty show nearly the same trend as the calibration error and uncertainty in Figure 8. Incorporating more lithofacies data for the construction of lithofacies realizations has the same impact on the quality of model calibration and on the quality of predictive simulations conducted using the calibrated model.
 The results presented in Figures 6–9 suggested that (1) incorporating more lithofacies data generally improved the quality of the random realizations; the means of the estimated K values approached those from the column tests and their corresponding 95% confidence intervals became narrower as dL/λh decreased, (2) the effect of adding lithofacies data became significant when the mean borehole spacing was on the order of about twice the horizontal correlation length or less; both calibration-prediction error and uncertainty showed a distinct improvement for dL/λh < 2.5 when synthetic data were used, (3) prediction error and uncertainty showed a similar behavior, indicating that incorporating more lithofacies data into the lithofacies realizations has the same impact on the quality of model calibration and predictive simulations, (4) errors in hydraulic data led to a decrease in parameter estimation accuracy and obscured the improvement in random realization quality; additional analysis showed that adding 2% random error to the synthetic hydraulic data yielded calibration-prediction error and uncertainty similar to those obtained with the experimental hydraulic data, which is consistent with the levels of errors reported in section 2.3, and (5) calibration-prediction error based on the experimental hydraulic data exhibited a local minimum, suggesting that there exists an optimal number of boreholes that maximizes the quality of the groundwater flow model; in this particular study, this optimal number was found to be beyond practical feasibility.
5.3. Sensitivity Study for Different Levels of Heterogeneity (σ²lnK = 0.6 and 2.4)
 The heterogeneity with σ²lnK = 1.2 employed in the above discussion is considered to be moderate. With the goal of generalizing the findings for a range of aquifers with various degrees of heterogeneity, two other hypothetical cases were investigated by changing σ²lnK to 0.6 (relatively homogeneous) and 2.4 (highly heterogeneous). This was achieved by scaling the hydraulic conductivity values K1, K2, K4, and K5 according to the new σ²lnK value, while keeping K3 unchanged. Although the resulting distributions may not be precisely lognormal, the parameter estimation results should yield insight for situations with different levels of heterogeneity. For each heterogeneity level, the 903 lithofacies realizations/models shown in Figure 5 were used. For the parameter estimation, using the “scaled” K values and the reference heterogeneity shown in Figure 1, synthetic steady state drawdown data were generated with MODFLOW 2000 for a pumping test P1 and used as observation data. Parameter estimations were conducted, and improvements in model calibration error and uncertainty with increasing quantities of lithofacies data were investigated.
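 One way to rescale the lithofacies conductivities to a target variance while leaving K3 unchanged is sketched below. The exact scaling used in the study is not given in the text, so this is an assumption: deviations of ln K from ln K3 are stretched or shrunk by the square root of the variance ratio. Because variance is insensitive to a constant shift, this multiplies the volume-weighted variance of ln K by exactly that ratio. The function names, conductivity values, and volume fractions are hypothetical.

```python
import math


def rescale_conductivities(k_values, pivot_index, old_variance, new_variance):
    """Scale ln K deviations about a pivot facies by sqrt(new/old)."""
    scale = math.sqrt(new_variance / old_variance)
    ln_pivot = math.log(k_values[pivot_index])
    return [math.exp(ln_pivot + scale * (math.log(k) - ln_pivot))
            for k in k_values]


def weighted_log_variance(k_values, fractions):
    """Volume-weighted variance of ln K over the lithofacies."""
    total = sum(fractions)
    mean_ln = sum(f * math.log(k) for k, f in zip(k_values, fractions)) / total
    return sum(f * (math.log(k) - mean_ln) ** 2
               for k, f in zip(k_values, fractions)) / total


# Hypothetical conductivities and volume fractions, for illustration only.
k_reference = [1e-3, 1e-2, 1e-1, 1.0, 10.0]
volume_fractions = [0.1, 0.2, 0.4, 0.2, 0.1]
variance_reference = weighted_log_variance(k_reference, volume_fractions)
k_scaled = rescale_conductivities(k_reference, 2, variance_reference,
                                  2.0 * variance_reference)
```

 In this sketch, the third entry of `k_scaled` stays equal to the pivot conductivity, while the weighted variance of ln K doubles.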
Figure 10 shows calibration-prediction error and uncertainty for the cases with σ²lnK = 0.6 and 2.4 as well as the reference case σ²lnK = 1.2 (used in the reference model). Both calibration error and uncertainty decreased for the relatively homogeneous case (σ²lnK = 0.6) and increased for the highly heterogeneous case (σ²lnK = 2.4). Interestingly, the shape of the plots remained the same, showing a significant reduction in calibration error and uncertainty for dL/λh < 2.5 and nearly constant values for 2.5 < dL/λh < 5. For the conditions that were considered in this study, this trend was therefore insensitive to the level of heterogeneity. These results suggest that the threshold value of dL/λh = 2.5 is maintained and that conclusions similar to those given in section 5.2 can be drawn for the range of heterogeneity that we examined.
 Through an approach based on both intermediate-scale laboratory sandbox experiments and numerical simulations, we investigated how adding lithofacies data into the construction of the conceptual model of aquifer heterogeneity helps reduce errors in parameter estimation and uncertainty in the outcome of a groundwater flow model. A heterogeneous aquifer was created numerically (reference model) and in a three-dimensional laboratory tank (synthetic aquifer) using five different test sands. Three pumping tests were performed to generate experimental hydraulic data to be used in the calibration procedure and when validating the groundwater flow model. The pumping tests were also simulated numerically in order to provide a set of error-free synthetic hydraulic data.
 We have shown that the packed synthetic aquifer reasonably followed the design of the reference model. The average discrepancy between experimental and numerically simulated head data was about 2%. The discrepancy between measured and simulated total discharge was somewhat larger owing to slight imperfections in the packing of the laboratory synthetic aquifer.
 A simple Bayesian method was proposed to integrate lithofacies data and hydraulic data for the estimation of the hydraulic conductivity field. The method required (1) the generation of a number of random lithofacies realizations conditioned to lithofacies data obtained in a given number of boreholes, (2) the estimation of the hydraulic conductivities of the lithofacies on the basis of hydraulic data measured in observation wells during a pumping test by calibrating a numerical model of groundwater flow, and (3) the acceptance or the rejection of the resulting hydraulic conductivity fields on the basis of a Metropolis-Hastings criterion applied to the likelihood of the hydraulic head observations. The likelihood of the hydraulic head observations was computed from the weighted average of the squared discrepancies between observed and simulated hydraulic data. The application of the Metropolis-Hastings criterion allowed rejecting lithofacies realizations that honored hydraulic observations to a lesser extent. The result of the method was a distribution of hydraulic conductivity fields, which could be used to compute a distribution of hydraulic head data.
 A dimensionless parameter referred to as the borehole density index dL/λh was introduced to describe the amount of lithofacies data used in the inverse procedure. The borehole density index relates the average distance between boreholes from which lithofacies data were obtained to the horizontal correlation length characterizing the heterogeneous distribution of lithofacies materials. The aforementioned method was applied for a range of dL/λh values. The findings of this study were the following:
 1. As more lithofacies data were used, the distributions of a posteriori ln K were characterized by smaller widths and more distinct peaks that, most of the time, coincided with the column test K values. The confidence intervals of the estimated hydraulic conductivities were found to decrease steadily with decreasing borehole density index dL/λh. The presence of experimental errors in hydraulic data used in the inverse procedure was found to have a limited influence on the a posteriori distributions of ln K.
 2. Using the experimental hydraulic data, the calibration error and uncertainty were found to decrease gradually with decreasing dL/λh. In several cases, the calibration error exhibited a local minimum near dL/λh = 0.71. This indicates that, for BH943 and BH472 realizations, decreasing the density of lithofacies information resulted in a lower accuracy of the distribution of lithofacies but a better match with the experimental hydraulic data. The amount of lithofacies data corresponding to this local minimum therefore represented a trade-off between accurate representation of the lithofacies distribution and accurate modeling of hydraulic data. When the hydraulic data contain some errors as expected in the field, an optimal value of dL/λh that minimizes the calibration error may exist. With the level of errors we used, the optimal value was at around 0.71, which corresponded to a very large number of boreholes.
 3. When the error-free synthetic data were used instead, the calibration error and uncertainty remained relatively constant for 2.5 < dL/λh < 5 but decreased drastically for dL/λh < 2.5. In other words, incorporating lithofacies data into lithofacies model construction started to yield an improvement in the groundwater flow model accuracy when the mean borehole spacing was on the order of twice the horizontal correlation length. This improvement in the groundwater flow model solely results from the increase of the quantity of lithofacies data because the hydraulic data contained no uncertainty.
 4. Prediction error and uncertainty were found to follow the same trends as the calibration error and uncertainty. This implies that incorporating more lithofacies data for the construction of lithofacies realizations has a similar impact on the quality of model calibration and on the quality of predictive simulations conducted using the calibrated model.
 5. The method was also applied to hypothetical aquifers with the same lithofacies distribution but with higher and lower levels of heterogeneity, for the same range of dL/λh values. While the overall levels of error and uncertainty in calibrations and predictions increased/decreased with the variance of the ln K field, we also observed a systematic significant improvement in groundwater flow model accuracy for dL/λh < 2.5.
 Using more lithofacies data yielded a systematic improvement of the accuracy of the groundwater flow model, as compared to a homogeneous model or a two-lithofacies layered model of the heterogeneity. Findings 2 and 3 provide two limiting borehole density indices: dL/λh ≈ 0.71, at which the optimal accuracy in the groundwater flow model was achieved, and dL/λh ≈ 2.5, below which the increase in the lithofacies data quantity started to yield an improvement in the groundwater flow model accuracy. These values can be considered as quantitative indicators to answer the fundamental and intuitive question regarding the borehole spacing that we raised in section 1: how sparse is too sparse?
 The values of λh were estimated to be between 2.7 and 5.1 m at the Borden site, 2.9 and 3.5 m at the Cape Cod site, and 4.8 to 12.8 m at the MADE site (e.g., as summarized by Fernàndez-Garcia et al.). For example, to satisfy dL/λh < 2.5, our results suggest that lithofacies data from boreholes would be needed roughly every 5 to 26 m for field sites with this range of λh values. Such densities of boreholes might not be practically feasible and the borehole density index dL/λh is more likely to be larger than the limiting value of 2.5 identified in this study. Hence, in field studies, while using lithofacies data in a stochastic framework always yields an improvement as compared to simple deterministic models of heterogeneity, lithofacies data could be too sparse to provide significant constraints on the estimation of the heterogeneous distribution of lithofacies in the subsurface. On the basis of the same reference model and synthetic aquifer, we are currently conducting further studies to investigate the effect of the quality of hydraulic data and the improvement in model accuracy when other types of data are incorporated in the inverse procedure, such as concentration data from a series of tracer tests.
 This research was funded by the U.S. Army Research Office award W911NF-04-1-0169. Christophe Frippiat acknowledges funding from Los Alamos National Laboratory and from the National Science Foundation of Belgium (F.R.S.-F.N.R.S. grant 1.1.035.07.F). Russell S. Harmon of the U.S. Army Research Office, Stacy E. Howington and John F. Peters of the U.S. Army Engineer Research and Development Center, Harihar Rajaram of the University of Colorado at Boulder, and Eileen Poeter of the Colorado School of Mines are gratefully acknowledged for their helpful suggestions and assistance. We are also grateful to Olaf Cirpka, Kamini Singha, and two anonymous reviewers, who helped to substantially improve the manuscript.