Two sandbox experiments were conducted to evaluate the performance of a sequential geostatistical inverse approach for hydraulic tomography in characterizing aquifer heterogeneity. One sandbox was packed with layered sands to represent a stratified aquifer, while the other was packed with discontinuous sand bodies of different shapes and sizes to represent a more complex and realistic heterogeneous aquifer. Parallel to the sandbox experiments, numerical experiments were conducted to assess the effects of measurement errors and uncertainties associated with laboratory data, and to diagnose the hydraulic conductivity estimates obtained from sandbox experiments. Results of this study show that our sequential inverse approach works well under realistic conditions, in spite of measurement errors and uncertainties associated with pumping rates, boundary conditions, pressure head measurements, and other parameters required by our model. The tomography was found to be ineffective if abundant head measurements were collected at closely spaced intervals in a highly stratified aquifer. On the other hand, it was found to be beneficial when pressure head measurements were limited and the geological structure was discontinuous.
1. Introduction
 Hydraulic tomography, a sequential aquifer test, has recently been proposed to characterize aquifer heterogeneity [Gottlieb and Dietrich, 1995; Butler and Liu, 1993; Butler et al., 1999; Yeh and Liu, 2000]. Specifically, fully screened wells are divided into many vertical intervals using packers. Water is pumped from an aquifer at one of the intervals to create a steady state flow condition. Hydraulic head responses at other intervals are then monitored, yielding one set of head/discharge data. Then the pumping location is moved to another interval, and the resulting steady state head responses at other locations are collected accordingly, resulting in a second data set. By performing this procedure sequentially, a large number of head/discharge data sets can be obtained. With a proper inverse methodology, these data sets can be used to produce a detailed image of heterogeneity in the aquifer.
 Several researchers [Scarascia and Ponzini, 1972; Sagar et al., 1975; Giudici et al., 1995; Snodgrass and Kitanidis, 1998] have investigated the use of data corresponding to different flow situations to improve the uniqueness of the inverse solution or to reduce uncertainties in the identification of flow model parameters. Until recently, however, very few researchers had investigated the idea of hydraulic tomography. Gottlieb and Dietrich [1995] proposed a hydraulic tomography method and employed a least squares based inverse approach to illustrate its potential to identify the permeability distribution in a hypothetical two-dimensional saturated soil. Butler et al. [1999] applied the hydraulic tomography concept to networks of multilevel sampling wells. They developed new techniques for obtaining drawdown measurements at multilevel sampling ports, which had previously been unobtainable, and suggested that such sampling techniques could facilitate the implementation of hydraulic tomography in the field. Even fewer researchers have attempted to develop a realistic three-dimensional (3-D) inverse model for hydraulic tomography because computational burdens hinder applications of classical inverse algorithms to 3-D hydraulic tomography. Yeh and Liu [2000] developed a sequential geostatistical inverse approach that eases these burdens and allows one to efficiently interpret the abundant data sets produced by hydraulic tomography. In their study, they not only demonstrated the robustness of their inverse approach but also investigated the network design issue for hydraulic tomography and addressed uncertainty in the hydraulic conductivity estimate.
 Hydraulic tomography has been tested using numerical experiments [Gottlieb and Dietrich, 1995; Yeh and Liu, 2000] but not laboratory or field experiments. In numerical experiments, effects of conceptual model errors are absent because synthetic tomography data are generated from the same model used in the inversion. Model inputs are also assumed to be error-free. Conversely, in field experiments, the effects of conceptual model errors are unknown. Model inputs, such as boundary conditions, pumping rates, mean, variance, and correlation scales, are always subject to uncertainty. Further, the pressure head/discharge data inevitably contain unknown measurement errors. Field experiments are thus the most appropriate test for hydraulic tomography.
 Nevertheless, field experiments are so costly that well-controlled sandbox experiments are a reasonable alternative. In this paper, we tested the effectiveness of our sequential inverse approach [Yeh and Liu, 2000] with two sandbox experiments. The first experiment represented a stratified aquifer system, while the other represented a more complex and realistic heterogeneous aquifer. In addition, numerical experiments were conducted to diagnose anomalies in the inverse results from the sandbox experiments, and to explore conditions under which the hydraulic tomography can be effective.
2. Experimental Setup
2.1. Design of the Sandbox
 The sandbox has outside dimensions of 92 cm in length, 4.5 cm in width, and 62 cm in height, and inside dimensions of 80, 3.2, and 50 cm, respectively. Two commercially sieved sands were used to pack the sandbox: a number 30 and number 60 silica sand. For number 30 sand, greater than 60% of the sand is retained on sieve number 30, whose mesh size is 0.6 mm. For number 60 sand, greater than 60% of the sand is retained on sieve number 60, whose mesh size is 0.25 mm. These two sands were selected because of the contrast in their grain sizes and the uniformity of their grain size distributions. The hydraulic conductivity of these two types of sand was determined using the constant head permeameter procedure [Klute and Dirksen, 1986]. The resulting saturated hydraulic conductivity values were 0.165 cm s−1 for the medium sand (number 30) and 0.038 cm s−1 for the fine sand (number 60). These two sands were used to create two different structures of heterogeneity in the sandbox for flow experiments to be discussed in sections 2.2 and 2.3.
 The sandbox was constructed with 1/4 inch acrylic and supported by 1/8 inch angle irons to brace the walls of the sandbox and to control bowing due to the mass of soil and water. A network of 14 monitoring ports consisting of two columns of seven locations each was drilled into one face of the sandbox. A cylindrical filter with a diameter of 0.5 cm and a length of 2.5 cm was placed in each of the ports. These filters protruded into the soil, partially penetrated the sandbox, and were connected to the exterior of the sandbox through tubing. This allowed each location to be monitored by a pressure transducer or used as an extraction port. There are reservoirs on either side of the sandbox, and a constant head is maintained in these reservoirs through a Mariotte device connected through inlets at the bottom of the reservoirs. This Mariotte device consists of a sealed carboy with an atmospheric line set at the level of constant head in the sandbox. Water in the reservoirs enters the packed sand through perforated plates on either side of the box, which allow water to pass through but prevent sand from leaking into the reservoirs (Figure 1).
 During the tomography some of the 14 monitoring ports were selected to be the pumping ports. Water was pumped using a vacuum pump, and the flow field was allowed to reach steady state. The pumping rate was held constant throughout the experiment by use of a rotameter (Matheson Gas Products model 604). An increase in pumping rate causes a float to rise in relation to graduations on the rotameter, and a needle valve on the rotameter allows one to adjust the pumping rate. The pumping rate was carefully selected during the two sandbox experiments. A pumping rate that is too low may cause changes in pressure smaller than the sensitivity of the instruments. A pumping rate that is too large could disrupt the upper boundary condition of the sandbox due to insufficient communication between the reservoirs and the sand unit.
 Pressure heads were monitored using gage pressure transducers during each pumping event. Data from the experiments were obtained using a Campbell Scientific Data Logger connected to two multiplexers, and were processed by the Campbell Scientific PC-208W software. The use of multiplexers allows for virtually simultaneous measurement of the transducers.
2.2. Description of Sandbox 1
 Sandbox 1 was designed to represent a layered aquifer. A uniform and continuous horizontal layer of fine sand was packed in the middle of the sandbox with a thickness of 9.5 cm, and the remainder of the sandbox was packed with the medium sand. The configuration of sandbox 1 is shown in Figure 2.
 During the data collection phase the sandbox was pumped at one of the 14 ports, and the resultant steady state pressure heads were recorded at the other 13 ports of the monitoring network. The flow in the sandbox reached steady state within approximately 10 s. Afterward, the pump was moved to another port, and the corresponding pressure heads were collected, yielding the second data set. By performing this procedure sequentially, several data sets were obtained. For sandbox 1, three data sets were obtained by pumping at three different locations.
2.3. Description of Sandbox 2
 Since the geological structure of an aquifer is typically more complex than that in sandbox 1, sandbox 2 was packed with the two types of sand but with a discontinuous and complex structure. Specifically, four lenses of fine sand were contained within a medium sand matrix, with each lens of the fine sand exhibiting a different shape and size. The configuration of sandbox 2 is shown in Figure 3.
 Our experience with sandbox 1 experiments (to be discussed in section 5.1) led to several modifications of the design of sandbox 2. Two more sampling ports on the upper part of the sandbox and two more at the lower part of the sandbox were added to each column of ports, for a total of 11 evenly spaced ports per column. These additional ports gave us more information about the pressure response of the system. The number of holes in the perforated plates connecting the reservoirs to the main portion of the sandbox was also doubled to ensure sufficient communication between these two regions. Finally, the diameter of the tubes connecting the Mariotte device to the inlets in the reservoirs was increased to improve the reservoirs' response to changes in constant head level. Thus the constant head boundary condition could be maintained at a stationary level with less uncertainty. Since there was some concern over the ability of the rotameter to accurately measure a low flow rate, in sandbox 2 the rotameter was replaced with a more accurate instrument of similar design (Key Instruments FR-4000 Model 4l52).
 During the implementation of hydraulic tomography in sandbox 2, data sets were again collected from the two columns of ports. Five data sets were created by pumping at five selected locations (see Figure 7), and each data set contained pressure head measurements for 21 locations.
3. Method of Analysis
 A sequential geostatistical inverse approach was used to interpret the data obtained from hydraulic tomography in our sandbox experiments. Details of the approach are given by Yeh and Liu [2000]. Only a brief description is provided here.
 Our approach is composed of two steps. First, the successive linear estimator (SLE) is employed for each data set. This estimator starts with the classical cokriging technique using observed conductivity and head values collected in one pumping test during the tomography to create a cokriged, mean-removed log conductivity (f, i.e., perturbation of log conductivity) map. However, cokriging does not take full advantage of the observed head values because it assumes a linear relationship between heads and conductivity while the true relationship is nonlinear. To circumvent this problem, a linear estimator based on the difference between the simulated and observed head values is used successively to improve the estimate.
 During the estimation a simultaneous inclusion of all the head/discharge data collected during the tomography can lead to extremely large and ill-conditioned matrices that are difficult to solve [Hughson and Yeh, 2000]. Thus the second step of our approach is to use the head data sets sequentially. That is, once an estimated f field based on a set of head/discharge data is derived, it is employed along with the head data sets collected from the next pumping operation to obtain the next estimated f field. During this new estimation the conditional effective parameters, the covariances, and the cross-covariances derived from previous estimation are propagated to evaluate the weights of our new estimate [Li and Yeh, 1999]. In essence, our sequential approach uses the estimated hydraulic conductivity field and covariances, conditioned on previous sets of head measurements, as prior information for the next estimation based on a new set of pumping data. It continues until all the data sets are fully utilized. Such a sequential approach allows for the accumulation of high-density head information obtained from hydraulic tomography, while maintaining the covariance matrix at a manageable size that can be solved with minimal numerical difficulty. Vargas-Guzman and Yeh  provided proof of the validity of such a sequential approach for linear systems.
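 For a strictly linear problem with independent measurement errors, conditioning on two data sets one after the other gives the same estimate as conditioning on both at once, which is the property the sequential approach relies on. The sketch below checks this numerically. It is an illustration under stated assumptions, not the authors' implementation: the sensitivity matrices `H1` and `H2` are random stand-ins for linearized head sensitivities from two pumping tests, and the prior covariance and noise variance are hypothetical.

```python
import numpy as np


def condition(mean, cov, H, d, noise_var):
    """One linear (cokriging-type) update of a Gaussian estimate."""
    S = H @ cov @ H.T + noise_var * np.eye(len(d))   # covariance of the data
    W = np.linalg.solve(S, H @ cov)                  # weights, (n_obs, n_param)
    new_mean = mean + W.T @ (d - H @ mean)
    new_cov = cov - W.T @ (H @ cov)
    return new_mean, new_cov


rng = np.random.default_rng(0)
n = 20
idx = np.arange(n)
prior_mean = np.zeros(n)
prior_cov = np.exp(-np.abs(idx[:, None] - idx[None, :]) / 4.0)

# Hypothetical linear sensitivities and data for two "pumping tests".
H1 = rng.normal(size=(4, n))
H2 = rng.normal(size=(4, n))
f_true = rng.multivariate_normal(prior_mean, prior_cov)
noise_var = 1e-4
d1 = H1 @ f_true
d2 = H2 @ f_true

# Sequential: the first posterior becomes the prior for the second data set.
m1, C1 = condition(prior_mean, prior_cov, H1, d1, noise_var)
m_seq, C_seq = condition(m1, C1, H2, d2, noise_var)

# Simultaneous: both data sets stacked into one larger system.
m_all, C_all = condition(prior_mean, prior_cov,
                         np.vstack([H1, H2]), np.concatenate([d1, d2]),
                         noise_var)

print("largest difference in the estimates:", np.max(np.abs(m_seq - m_all)))
```

 The sequential route only ever solves systems the size of one data set, which is why it keeps the matrices manageable. For the nonlinear head/conductivity relationship, the equivalence holds only approximately.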
4. Inputs to the Inverse Model
 To solve the inversion problem, the sandbox was discretized into 1066 elements with dimensions of 1.95 cm × 3.2 cm × 1.95 cm. Both sides and the top boundary were set to be constant head boundary conditions, while the bottom boundary of the sandbox was considered a no-flow boundary.
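 The stated element size and the inside dimensions of the sandbox imply a grid of 41 columns by 26 rows (41 × 26 = 1066). The sketch below converts an element number to the coordinates of the element center. It assumes 1-based numbering that runs along x first and then along z; this convention is inferred from the element numbers and coordinates quoted in section 4.1 and is not stated explicitly. The function and constant names are hypothetical.

```python
NX, NZ = 41, 26        # columns (x) and rows (z) implied by 1066 elements
DX = DZ = 1.95         # element size in x and z, cm
Y_CENTER = 1.6         # half of the 3.2 cm element width, cm


def element_center(number):
    """Center (x, y, z) in cm of a 1-based element number (assumed ordering)."""
    if not 1 <= number <= NX * NZ:
        raise ValueError("element number outside the grid")
    row, col = divmod(number - 1, NX)   # row counts along z, col along x
    x = (col + 0.5) * DX
    z = (row + 0.5) * DZ
    return (round(x, 3), Y_CENTER, round(z, 3))


for number in (341, 603, 767):
    print(number, element_center(number))
```

 Under this assumed ordering, elements 341, 603, and 767 map to the pumping coordinates reported in section 4.1.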
 Inputs to our inverse model include the effective conductivity, the variance and the correlation scales of hydraulic conductivity, pressure head/discharge data sets, and available point measurements of conductivity. The procedures we used to obtain the required input parameters are discussed below.
4.1. Effective Hydraulic Conductivity
 Two approaches were used to obtain the effective hydraulic conductivity of the sandbox, Keff. In the first approach the geometric mean was computed to approximate Keff. Since the conductivity of each sand used in the sandbox was measured and the spatial distribution of the sands was known, the geometric mean follows from a simple calculation:

$$K_g = \exp\left[\frac{1}{n}\sum_{i=1}^{n}\ln K_i\right] \qquad (1)$$

where Kg is the resulting geometric mean conductivity, n represents the total number of elements, and Ki is the hydraulic conductivity of each element. However, this method does not consider the effect of packing or, more importantly, the flow dynamics of the system. A more desirable approach to determine the effective hydraulic conductivity is model calibration. Specifically, the heterogeneous sandbox was treated as an equivalent homogeneous system. By adjusting the conductivity value of this fictitious medium to minimize the discrepancy between the simulated and the observed head values, the effective conductivity was obtained. Two criteria, L1 and L2, were used to evaluate the goodness of fit between the simulated head responses and the observed ones:

$$L_1 = \frac{1}{n}\sum_{i=1}^{n}\left|H_i - \hat{H}_i\right| \qquad (2)$$

$$L_2 = \frac{1}{n}\sum_{i=1}^{n}\left(H_i - \hat{H}_i\right)^2 \qquad (3)$$

where Hi and Ĥi represent the observed and simulated pressure head, respectively, and n is here the number of head values compared.
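 The two ways of obtaining Keff can be sketched as follows. This is an illustration under stated assumptions rather than the procedure actually coded for the study: the number of fine-sand elements, the reference drawdowns, and the candidate range are hypothetical, L1 and L2 are taken as the mean absolute and mean squared head differences, and the calibration relies on the fact that, for linear steady flow in a homogeneous medium at a fixed pumping rate, drawdown scales as 1/K.

```python
import numpy as np

K_MEDIUM = 0.165   # number 30 sand, cm/s (measured value from section 2.1)
K_FINE = 0.038     # number 60 sand, cm/s (measured value from section 2.1)


def geometric_mean_k(k_elements):
    """Geometric mean of element conductivities."""
    k = np.asarray(k_elements, dtype=float)
    return float(np.exp(np.mean(np.log(k))))


def l1_norm(observed, simulated):
    """Mean absolute head difference."""
    return float(np.mean(np.abs(np.asarray(observed) - np.asarray(simulated))))


def l2_norm(observed, simulated):
    """Mean squared head difference."""
    return float(np.mean((np.asarray(observed) - np.asarray(simulated)) ** 2))


def calibrate_keff(h_observed, h_boundary, drawdown_ref, k_ref, candidates):
    """Candidate K of the equivalent homogeneous medium that minimizes L2.

    drawdown_ref holds drawdowns simulated once with conductivity k_ref;
    drawdowns for any other K follow by scaling with k_ref / K.
    """
    scores = [l2_norm(h_observed, h_boundary - drawdown_ref * k_ref / k)
              for k in candidates]
    return float(candidates[int(np.argmin(scores))])


# Approach 1: geometric mean of a hypothetical two-sand field.
k_field = np.full(1066, K_MEDIUM)
k_field[:205] = K_FINE                  # hypothetical count of fine-sand elements
k_g = geometric_mean_k(k_field)

# Approach 2: calibration against synthetic "observed" heads.
drawdown_ref = np.array([0.8, 0.5, 0.3, 0.2])   # hypothetical drawdowns, cm
h_boundary = 50.0                               # hypothetical boundary head, cm
h_obs = h_boundary - drawdown_ref * 0.1 / 0.12  # generated with K = 0.12 cm/s
candidates = np.linspace(0.03, 0.20, 171)
k_eff = calibrate_keff(h_obs, h_boundary, drawdown_ref, 0.1, candidates)

print("geometric mean K:", k_g)
print("calibrated K:", k_eff)
```

 The geometric mean depends only on how much of each sand is present, whereas the calibrated value depends on where the pump and the observation points sit relative to the heterogeneity.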
 The approach described above was applied to the simulated pressure responses at all 1066 elements based on the known f field subject to pumping at a given location. The contour map of the head distribution due to pumping at element number 603 (x = 55.575 cm, y = 1.6 cm, z = 28.275 cm) of the heterogeneous field of sandbox 2, and that of the equivalent homogeneous field, are plotted in Figure 4a. The corresponding scatterplot of the pressure heads is shown in Figure 4b. Applications of this approach to the head distribution induced by pumping show that the effective conductivity value varies with the pumping location. This variation may be attributed to the nonergodic flow condition due to the simple heterogeneous structure in the sandbox. Consequently, the final Keff value was determined by taking the average of these values. Table 1 tabulates the Keff values obtained from pumping at elements number 341 (x = 24.375 cm, y = 1.6 cm, z = 16.575 cm), number 603 (x = 55.575 cm, y = 1.6 cm, z = 28.275 cm), and number 767 (x = 55.575 cm, y = 1.6 cm, z = 36.075 cm). Averaging the three Keff values yields the final Keff of 0.1242 cm s−1.
Table 1. Summary of Keff Values Computed Using Synthetic Data for Sandbox 2
Keff, cm s−1
 We also applied the same procedure to the pressure head data collected in the laboratory sandbox experiments. However, there are fewer points in the laboratory data (21 points) than in the synthetic data (1066 points). Estimated effective conductivity values for each pumping location are given in Table 2. The averaged Keff value is 0.0783 cm s−1. The Keff values estimated from both simulated and observed heads were employed in our analysis for sandbox 2. The effect of this input parameter on our final f estimate is discussed in section 5.
Table 2. Summary of Keff Values Computed Using Laboratory Data for Sandbox 2
Keff, cm s−1
4.2. Other Inputs
 The variance and the correlation scales of the conductivity field (our inverse model assumes an exponential correlation structure) are also required input to our inverse model. Table 3 lists values of these statistical parameters used in our inverse analysis for both sandbox 1 and sandbox 2.
Table 3. Inputs Specification for Sandbox 1 and Sandbox 2
 Estimation of the variance always involves uncertainty. Our previous numerical study [Yeh and Liu, 2000], however, has demonstrated that the variance has negligible effects on the estimated hydraulic conductivity using our inverse model. Therefore an estimate of the variance was obtained based on the known conductivity distribution and measured conductivity of each sand.
 Correlation scales represent the average size of heterogeneity that is critical for analyzing the average behavior of aquifers. Correlation scales of any geological formation are difficult to determine in general. The effects of uncertainty in correlation scales on the estimate based on the tomography are negligible because the tomography produces a large number of head measurements, reflecting detailed site-specific heterogeneity [Yeh and Liu, 2000]. Therefore the correlation scales were approximated based only on the average thickness and length of the heterogeneity.
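 As an illustration of how the variance and correlation scales enter an exponential correlation structure, the sketch below builds a covariance matrix between element centers in the x-z plane. The anisotropic exponential form used here is a common choice and is an assumption about the exact model, and the variance and correlation scales are hypothetical placeholders, not the values of Table 3.

```python
import numpy as np


def exponential_covariance(x, z, variance, lambda_x, lambda_z):
    """Covariance of f between points, anisotropic exponential model."""
    x = np.asarray(x, dtype=float)
    z = np.asarray(z, dtype=float)
    dx = (x[:, None] - x[None, :]) / lambda_x   # lag scaled by horizontal scale
    dz = (z[:, None] - z[None, :]) / lambda_z   # lag scaled by vertical scale
    return variance * np.exp(-np.sqrt(dx ** 2 + dz ** 2))


# Three hypothetical points: origin, one horizontal scale away, one vertical scale away.
x = np.array([0.0, 20.0, 0.0])
z = np.array([0.0, 0.0, 5.0])
C = exponential_covariance(x, z, variance=0.5, lambda_x=20.0, lambda_z=5.0)
print(C)
```

 At a separation of one correlation scale in either direction, the covariance falls to exp(−1), about 37%, of the variance, which is why the scales can be read as an average size of the heterogeneity.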
5. Results and Discussion
5.1. Sandbox 1
 The “true” mean-removed natural log of the conductivity (the “true” f field) in sandbox 1 is depicted in Figure 5a, and the estimated f field based on the head data set produced by pumping at the first location is plotted in Figure 5b. Figure 5c shows the estimated f field using the head data set obtained by pumping at the second location, in addition to the data set used in Figure 5b. The final estimate, based on the head data set from pumping at the third location and those used in Figures 5b and 5c, is illustrated in Figure 5d. Note that the true f field depicted in Figure 5a is our conceptualization of the conductivity distribution based upon the geometry of the layered sands and conductivity measurements of the sand samples. This field may not correspond to the actual conductivity field in the sandbox. Nevertheless, the figures show that as data sets are included sequentially, the estimated f field increasingly resembles the true field in the region where we had pressure head measurements. At the upper and lower portions of the sandbox, where measurements were not available and the model boundaries were nearby, the estimate of f was poor. Several factors could be responsible for the poor estimates: a lack of pressure measurements in these regions, boundary effects, and uncertainties in the input data.
 To investigate possible causes of the poor estimate, numerical experiments were conducted. First, we assumed that the true f field is identical to the one shown in Figure 5a. Forward simulations were then carried out using the same pumping rate, pumping locations, and boundary conditions. Afterward, three error-free head data sets were collected from the simulations at the monitoring locations identical to those in the sandbox experiments. Our inverse model was then employed to estimate the f field using these synthetic data sets. The final estimate is displayed in Figure 6a. Compared to Figure 5d (the final estimated f field using the laboratory data), Figure 6a is a better estimate because the synthetic data sets do not involve any uncertainty inherent in our laboratory data sets. However, at the region where there are no pressure measurements, the f estimate is still poor. This indicates that some point measurements are lacking and may thus be necessary at this region. To substantiate this speculation, simulated head data were collected at two more measurement points of the top and the bottom of the two columns. Consequently, 21 pressure head measurements instead of 13 were sampled and were subsequently used in our inverse model. The resultant f estimate is shown in Figure 6b, which indicates that the increase of pressure measurements significantly improves the final f estimate.
 Since the additional pressure measurements improved the f estimate, we then tested the inversion with the maximum number of pressure measurements along the two columns. In this case, the inversion employed 51 pressure head measurements instead of 21 for each pumping operation. As expected, the abundant point measurements significantly improved our estimate of the heterogeneity in the sandbox. More importantly, we observed that in this case, only one pumping test was necessary to closely reproduce the true f field, and the inclusion of additional data sets from pumping at the other locations did not improve the estimate. Figure 6c illustrates the estimated f field using 51 pressure head measurements corresponding to pumping at element number 603 (x = 55.575 cm, y = 1.6 cm, z = 28.275 cm) only. This result suggests that in a layered aquifer, hydraulic tomography is not necessary if one can collect a large number of closely spaced pressure measurements during one pumping test.
 Criteria similar to (2) and (3) were also employed to quantify the success of our estimated f fields. The values of L1 and L2 associated with these estimated fields for sandbox 1 are listed in Table 4. Smaller values of L1 and L2 indicate better estimates.
Table 4. L1 and L2 Values for Estimates of Sandbox 1
 With the help of synthetic data sets the causes of poor estimates based on the actual laboratory data appeared to be diagnosed. While a lack of pressure head measurements on the upper and lower portion of the sandbox was one reason for the poor estimates, the boundary effects during experiments could not be excluded. Specifically, the design of sandbox 1 was not able to keep the upper boundary condition at the specified level during pumping tests. In order to reduce this boundary effect several improvements were made in the design of sandbox 2. Details are discussed in section 2.3.
5.2. Sandbox 2
 Because sandbox 2 has a more complex heterogeneous structure, data sets from five sequential pumping tests were used for the inversion. Figure 7 illustrates the comparison of the true f field and those produced from the successive inclusion of the five data sets. The conductivity of 0.0783 cm s−1 determined from laboratory data was used as the effective conductivity in this inversion. As shown in Figure 7, the estimated f field progressively resembles the true one and successively reveals more details of heterogeneity.
 To investigate the effects of the uncertainty associated with effective conductivity, another inversion was conducted using 0.1242 cm s−1 as the effective conductivity while keeping other inputs the same. The result is shown in Figure 8. Comparing Figure 7f with Figure 8 (two final f estimates corresponding to the use of two different effective values), we find that the resulting conductivity is different in magnitude but the major heterogeneous patterns are similar.
 An inversion was also conducted using synthetic data sets. Again, the synthetic data sets were sampled from the pressure head fields derived from flow simulations based on the conceptualized true f field under the same conditions as those in the laboratory. Figure 9 displays the final inversion result. Comparing Figure 7f and Figure 8 to Figure 9, we find that all three estimates adequately capture the major features of the heterogeneity of sandbox 2 (the four lenses of fine sand contained in the medium sand matrix). However, Figure 9 appears to show a slightly better result over the entire domain because synthetic data sets do not contain measurement errors or other uncertainties. Note again that the exact true f field in sandbox 2 is unknown except for the general pattern. Considering that measurement errors and some uncertainties in boundary conditions and input parameters are inevitable, the result of our inverse modeling (Figure 7f) is encouraging.
 To evaluate the efficiency of hydraulic tomography under the conditions of sandbox 2, numerical experiments were conducted. These experiments considered measurements taken at five columns with 26 locations each, instead of the previous laboratory configuration of two columns with 11 locations each, and kept other conditions the same. Therefore, for each pumping operation, head responses at 129 locations were collected. This large amount of secondary information dramatically improved the final estimated f field (Figure 10). The values for L1 and L2 associated with these estimated fields for sandbox 2 are listed in Table 5. It is interesting to observe that the 129 head responses generated by a single pumping event in sandbox 2 did not reproduce the true f field as effectively as the 51 pressure head measurements in sandbox 1. The discontinuous and nonuniform nature of the sand structures in sandbox 2 explains the difference. Consequently, multiple head/discharge sets obtained by hydraulic tomography are necessary to produce a more detailed hydraulic conductivity distribution in this case.
Table 5. L1 and L2 Values for Estimates of Sandbox 2
6. Conclusions
 The performance of our sequential inverse approach for hydraulic tomography was evaluated using two sandbox experiments. One sandbox was packed with layered sands to represent a stratified aquifer. The other was packed with discontinuous sand bodies of different shapes and sizes to represent a more complex and realistic heterogeneous aquifer. For both sandbox experiments our inverse model was able to reproduce the major heterogeneous patterns. The results show that our approach works well under realistic conditions, in spite of measurement errors and uncertainties associated with the pressure head/discharge data sets and other input parameters required by our model.
 Results of our analysis indicate that in the cases we investigated, hydraulic tomography does not improve the conductivity estimate significantly if abundant head measurements are available. This is especially true for a stratified aquifer system represented in sandbox 1. Hydraulic tomography can be useful and effective when pressure head measurements are not available at a large number of sample locations and when aquifer heterogeneity exhibits a highly discontinuous and nonuniform nature as presented in sandbox 2.
 Parallel numerical experiments were useful in this study. Not only did they assess the effects of measurement errors and uncertainties associated with laboratory data, but they also helped to diagnose the causes of the poor estimates for sandbox 1. This diagnosis improved our design of sandbox 2.
 Our well-controlled sandbox experiments tested the effectiveness of our inverse method for realistic problems where measurements are inherently imperfect. Our successful laboratory verifications of the inverse approach are a step toward applications of our inverse model to field-scale problems.
 The authors are grateful for useful comments by Mauro Giudici and Michael H. Young. This research is funded in part by a DOE EMSP96 grant through Sandia National Laboratories (contract AV-0655 1) and a DOE EMSP99 grant through University of Wisconsin, A019493, and in part by an EPA grant R-827114-01-0. This work does not necessarily reflect the views of DOE and EPA, and no official endorsement should be inferred.