## 1. Introduction

[2] It is well recognized that the local spatial configuration of hydraulic conductivity (K) in heterogeneous geological environments is required for accurate predictions of contaminant transport [e.g., *Poeter and Gaylord*, 1990; *Scheibe and Yabusaki*, 1998; *Wen and Gómez-Hernández*, 1998; *Zheng and Gorelick*, 2003]. Traditionally, aquifer characterization has been based on the analysis of drill cores and/or the results of tracer and pumping experiments; however, these techniques are often inadequate for reliably characterizing heterogeneous aquifers because of an inherent gap that exists between them in terms of resolution and coverage [*Beckie*, 1996; *Hubbard and Rubin*, 2005]. Environmental geophysical methods have the potential to bridge this gap and improve characterization of subsurface variability. A trade-off associated with using such methods, however, is that they are sensitive to, and therefore give us information regarding, geophysical properties in the subsurface and not directly the hydrological properties of interest. As a result, a number of studies have attempted to link geophysical and hydrological variables through a variety of approaches, including the development of petrophysical relationships at the laboratory scale [e.g., *Archie*, 1942; *Topp et al.*, 1980; *Mavko et al.*, 1998], the numerical upscaling of such relationships to the field scale [e.g., *Moysey and Knight*, 2004; *Moysey et al.*, 2005; *Singha and Gorelick*, 2006; *Singha et al.*, 2007], and the use of statistical techniques such as cokriging of field-estimated collocated geophysical and hydrological properties [e.g., *Doyen*, 1988; *Cassiani et al.*, 1998]. Unfortunately, while relationships between a specific geophysical property and those of interest to hydrologists may exist on a scale-, site-, and/or facies-specific basis, they are often complicated, nonunique, and difficult to establish [*Day-Lewis et al.*, 2005; *Singha et al.*, 2007].

[3] To deal with the inherent difficulties associated with using geophysical methods to quantify hydrological properties as mentioned above, a number of approaches have been presented. One of these involves the use of multiple geophysical survey data, combined with statistical regression analysis of collocated hydrological data and/or integrated petrophysical models, to reduce the estimation uncertainty associated with the use of a single geophysical method alone [e.g., *Ezzedine et al.*, 1999; *Chen et al.*, 2001; *Hubbard et al.*, 2001; *Garambois et al.*, 2002; *Linde et al.*, 2006a]. Another approach involves the use of one or more inverted geophysical data sets to divide or cluster the subsurface into a small number of zones having similar combinations of geophysical properties [e.g., *Beres and Haeni*, 1991; *Hyndman and Harris*, 1996; *Mukerji et al.*, 2001; *Moysey et al.*, 2003; *Tronicke et al.*, 2004]. Such zones are then assumed to represent different lithologies and possess distinct hydrological properties, which can be estimated either through the inversion of hydrological test data [e.g., *Hyndman et al.*, 1994, 2000; *McKenna and Poeter*, 1995; *Hyndman and Gorelick*, 1996; *Linde et al.*, 2006b], or through the analysis of collocated borehole hydrological measurements [e.g., *Paasche et al.*, 2006].

[4] Another promising means of using geophysical methods more effectively for hydrological characterization, and our focus in this paper, involves acquiring time-lapse geophysical data as changes occur in an aquifer as a result of some form of hydrological stress manifested as, for example, changes in soil saturation or the transport of solutes in the subsurface. Although geophysical data collected statically may provide little information regarding the distribution of a particular hydrological property, a set of dynamic data that are sensitive to changes in hydrological state variables, such as water content or salinity, can be much more uniquely tied to this distribution through the underlying hydrological process model [*Binley et al.*, 2002; *Kemna et al.*, 2002; *Binley and Beven*, 2003; *Day-Lewis et al.*, 2003; *Cassiani et al.*, 2004; *Lambot et al.*, 2004; *Cassiani and Binley*, 2005; *Singha and Gorelick*, 2005; *Koestel et al.*, 2008; *Chen et al.*, 2009]. One increasingly common way of utilizing such time-lapse geophysical measurements is through coupled or integrated inversion, where the numerical models for the geophysical and hydrological processes are linked together such that the geophysical data are inverted directly for the hydrological properties of interest. This research has been ongoing for petroleum applications [e.g., *Huang et al.*, 1997; *Kretz et al.*, 2004; *Wen et al.*, 2006] and has more recently become popular in hydrology [e.g., *Kowalsky et al.*, 2004, 2005; *Lambot et al.*, 2006; *Finsterle and Kowalsky*, 2008; *Jadoon et al.*, 2008; *Looms et al.*, 2008; *Lehikoinen et al.*, 2009; *Hinnell et al.*, 2010]. 
Coupled inversion has a significant advantage over separated or uncoupled inversion strategies: it avoids the formation of geophysical images, which are subject to inversion artifacts and depend on the regularization of the geophysical inverse problem, both of which can significantly affect the hydrological estimates obtained [*Day-Lewis et al.*, 2005; *Ferré et al.*, 2009; *Hinnell et al.*, 2010]. However, while the coupled inverse problem has been an important step forward in quantifying hydrological parameters, it has to a large extent been considered only within a deterministic or quasi-deterministic inversion framework, which does not allow for adequate exploration of the often strongly nonlinear and nonunique nature of the coupled system and the corresponding model parameter and prediction uncertainties.

[5] In recent years, a number of papers have appeared in the geophysical literature that treat the complex data integration and inversion problem for spatially distributed subsurface properties in a fully stochastic manner using Bayes' theorem [e.g., *Mosegaard and Tarantola*, 1995; *Bosch*, 1999; *Aines et al.*, 2002; *Eidsvik et al.*, 2002; *Ramirez et al.*, 2005]. Although such stochastic data integration was once thought computationally impractical, the results in these papers demonstrate that it is feasible for real-world problems given modern computational resources and state-of-the-art forward simulation and sampling algorithms. In Bayesian inversion, the solution to the inverse problem is described as a joint posterior probability distribution for all model parameters, which is obtained by updating a prior distribution for these parameters using likelihood functions corresponding to the available sources of data. Samples from the posterior distribution (i.e., multiple feasible configurations of subsurface properties) can then be generated numerically using Markov chain Monte Carlo (MCMC) sampling, because explicit analytical expressions for this distribution are generally unavailable owing to the complexity of the associated forward models. When taken together, these posterior samples represent our uncertainty regarding the subsurface environment, and they can be used to make predictions within a stochastic context. Bayesian MCMC methods are naturally suited to dealing with the important issues of data worth and integration. They are also flexible in that they can incorporate any information that can be posed within a probabilistic framework. Although such methods have been applied in a wide variety of fields for many years, they have seen limited use in the field of hydrogeophysics.
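
As a compact illustration of the updating step described above, the posterior can be written in generic form. The notation here is an assumption introduced for illustration and is not taken from the cited works: $\mathbf{m}$ denotes the vector of model parameters and $\mathbf{d}_1, \ldots, \mathbf{d}_N$ denote $N$ data sets whose errors are assumed to be conditionally independent given $\mathbf{m}$:

$$
p(\mathbf{m} \mid \mathbf{d}_1, \ldots, \mathbf{d}_N) \;\propto\; p(\mathbf{m}) \prod_{i=1}^{N} p(\mathbf{d}_i \mid \mathbf{m}).
$$

One common way to sample from such a distribution, shown only as an example and not necessarily the sampler adopted in this paper, is the Metropolis rule with a symmetric proposal, in which a move from $\mathbf{m}$ to a candidate $\mathbf{m}'$ is accepted with probability

$$
P_{\mathrm{acc}} = \min\!\left(1,\; \frac{p(\mathbf{m}' \mid \mathbf{d}_1, \ldots, \mathbf{d}_N)}{p(\mathbf{m} \mid \mathbf{d}_1, \ldots, \mathbf{d}_N)}\right).
$$

Because only a ratio of posterior densities appears, the normalizing constant of the posterior never has to be evaluated, which is what makes sampling practical when no analytical expression is available.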

[6] In this paper, we investigate the use of a Bayesian MCMC approach for the coupled inversion of saline tracer test concentration measurements and time-lapse electrical resistivity (ER) data for the purpose of estimating the spatial configuration and connectivity of K in the context of predicting solute transport. The approach is applied between two boreholes in a saturated heterogeneous aquifer and tested on two complex K fields having different facies correlation lengths. We focus here on testing the methodology numerically such that fundamental issues associated with the data integration and inversion can be examined in the case where the true subsurface model is known. Our research has conceptual similarities to other recent work on the fully stochastic inversion of dynamic hydrogeophysical data, in that we test randomly generated sets of model parameters with regard to how well they predict measurements, and then accept or reject them accordingly [*Binley and Beven*, 2003; *Cassiani et al.*, 2004; *Cassiani and Binley*, 2005; *Looms et al.*, 2008; *Hinnell et al.*, 2010]. However, an important difference is that we address here the problem of estimating a complex spatial distribution of subsurface properties, whereas most other work has focused on the determination of a small number of average parameters. In the studies cited above, for example, 1-D flow models were considered and continuous, uncorrelated prior parameter distributions could be assumed without overloading computational resources. In contrast, because we consider a substantially larger number of model parameters and perform forward simulations in multiple dimensions, simplification strategies are required to make the stochastic inverse problem computationally tractable. To this end, we explore a facies-based parameterization in our work.
Another key difference between this and previous related efforts is that we account for uncertainty in the relationship between solute concentration and resistivity to address the fact that field-scale petrophysical relationships are difficult to establish. With few exceptions [*Kowalsky et al.*, 2005; *Finsterle and Kowalsky*, 2008], most previous work has considered such petrophysical relationships to be known and precise.
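
The two ideas in this paragraph, a facies-based (binary) parameterization and an uncertain petrophysical relationship, can be illustrated with a deliberately small sketch. It is not the implementation used in this paper. The toy forward model, the names `forward`, `log_likelihood`, `metropolis`, `slope`, and `sigma`, the uniform priors, and all numerical values are hypothetical choices made only so that the accept/reject logic can be run end to end. The petrophysical slope is sampled together with the facies indicators rather than being fixed.

```python
import math
import random

LOW, HIGH = 0.5, 1.5  # assumed uniform prior bounds on the petrophysical slope


def forward(facies, slope):
    """Toy forward model (hypothetical): a facies-dependent concentration
    proxy is scaled by an uncertain petrophysical slope to give the data."""
    conc = [1.0 if f == 1 else 0.2 for f in facies]  # facies 1 carries more tracer
    return [slope * c for c in conc]


def log_likelihood(pred, obs, sigma):
    """Gaussian log-likelihood (up to a constant) with independent errors."""
    return -0.5 * sum(((p - o) / sigma) ** 2 for p, o in zip(pred, obs))


def metropolis(obs, n_iter=20000, sigma=0.2, seed=0):
    """Metropolis sampling of binary facies and the slope under uniform priors."""
    rng = random.Random(seed)
    n = len(obs)
    facies = [rng.randint(0, 1) for _ in range(n)]
    slope = rng.uniform(LOW, HIGH)
    ll = log_likelihood(forward(facies, slope), obs, sigma)
    samples = []
    for _ in range(n_iter):
        new_facies, new_slope = list(facies), slope
        if rng.random() < 0.5:
            # Symmetric move 1: flip the facies indicator of one random cell.
            new_facies[rng.randrange(n)] ^= 1
        else:
            # Symmetric move 2: random-walk perturbation of the slope.
            new_slope = slope + rng.gauss(0.0, 0.05)
        # Candidates outside the prior bounds have zero prior probability
        # and are rejected outright.
        if LOW <= new_slope <= HIGH:
            new_ll = log_likelihood(forward(new_facies, new_slope), obs, sigma)
            # With uniform priors and symmetric proposals, the acceptance
            # probability reduces to the likelihood ratio (capped at 1).
            if rng.random() < math.exp(min(0.0, new_ll - ll)):
                facies, slope, ll = new_facies, new_slope, new_ll
        samples.append((tuple(facies), slope))
    return samples


if __name__ == "__main__":
    true_facies = [0, 0, 1, 1, 1, 0, 0, 1]
    obs = forward(true_facies, slope=1.0)  # noise-free synthetic "measurements"
    kept = metropolis(obs)[2000:]          # discard an assumed burn-in period
    prob_high_k = [sum(s[0][i] for s in kept) / len(kept)
                   for i in range(len(true_facies))]
    mean_slope = sum(s[1] for s in kept) / len(kept)
    print("P(facies = 1) per cell:", [round(p, 2) for p in prob_high_k])
    print("posterior mean slope:", round(mean_slope, 3))
```

The retained samples play the role of the posterior realizations discussed in the text: per-cell frequencies summarize facies uncertainty, and the spread of `slope` reflects the uncertainty that remains in the petrophysical relationship. A field-scale problem would replace the toy `forward` function with coupled flow, transport, and resistivity simulations, and the priors with spatially correlated facies models.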

[7] The paper proceeds as follows. First, we outline briefly some general concepts of the Bayesian MCMC methodology used. Next, we describe our numerical experiment and the details of how we implement this methodology to integrate concentration and time-lapse resistivity data simulated from the two “true” K distributions. Finally, for each of the two cases, we examine the simplified binary K realizations generated from (1) the prior distribution, (2) the posterior distribution obtained by incorporating only the concentration measurements, (3) that obtained by incorporating only the resistivity measurements, and (4) that obtained using both the concentration and resistivity measurements. A key part of our testing is model validation. In this regard, the sets of realizations are evaluated against the corresponding true facies distributions to explore their ability to identify important transport pathways, and then in terms of their ability to predict a different solute injection/extraction experiment in the subsurface region.