## 1. Introduction

[2] Subsurface systems pose some of the most challenging characterization and modeling problems in science, with significant environmental, public health, and energy security implications. The main difficulties in understanding and modeling subsurface phenomena are the inaccessibility and heterogeneity of geologic formations, together with the complex interactions between fluids and rocks over a wide range of temporal and spatial scales. Consequently, significant uncertainty is introduced into predictions of the related flow and transport processes, thereby complicating the development of subsurface hydrological, energy, mineral, and environmental resources.

[3] In parallel with advances in numerical forward modeling [*Peaceman and Rachford*, 1955; *Aziz and Settari*, 1979], significant progress has been made in inverse modeling to integrate diverse and disparate data sets into numerical models of complex groundwater and hydrocarbon reservoirs [e.g., *Hill and Tiedeman*, 2007; *Oliver et al*., 2008]. A particularly important aspect of inverse modeling is quantification of the uncertainty that results mainly from data scarcity and from an inadequate understanding and modeling of the involved physical processes and subsurface heterogeneity [e.g., *Moore and Doherty*, 2005; *Hendricks Franssen et al*., 2009; *Blazkova and Beven*, 2009; *Gotzinger and Bardossy*, 2008; *Solomatine and Shrestha*, 2009; *Thyer et al*., 2009; *Tonkin and Doherty*, 2009; *Zhang et al*., 2008].

[4] The dynamic response of an aquifer to forced disturbances contains valuable information about both local and global trends in aquifer hydraulic properties and their connectivity. Constraining subsurface flow models to dynamic response measurements of head, concentration, or flow rates is more involved than integrating static data. This complexity is primarily attributed to the nonlinearity and computational complexity of the mapping from input model parameters (model space) onto the dynamic aquifer response (data space), together with the integral (spatially averaged) and sparse nature of the available data. Over the last several decades, various deterministic and probabilistic inversion techniques have been developed and applied to solve subsurface flow-model calibration problems [e.g., *Sun*, 1994; *de Marsily et al*., 1999; *Carrera et al*., 2005; *Yeh et al.,* 2007; *Hill and Tiedeman*, 2007; *Oliver et al*., 2008].

[5] Deterministic inverse methods seek a single “best” solution by minimizing a suitable cost function that penalizes the discrepancies between predicted and observed dynamic and static data, as well as departures from direct and/or indirect prior information about the solution. Inference of heterogeneous hydraulic rock properties, such as the spatial distribution of permeability, from flow measurements typically leads to ill-posed nonlinear inverse problems that have multiple solutions yielding different flow and transport predictions [*Yeh*, 1986; *Carrera and Neuman*, 1986a-1986c; *Carrera*, 1987; *McLaughlin and Townley*, 1996; *de Marsily et al*., 1999; *Carrera et al*., 2005; *Hill and Tiedeman*, 2007; *Oliver et al*., 2008]. Probabilistic methods, on the other hand, address the issues of nonuniqueness and uncertainty quantification by characterizing the solution of an inverse problem in terms of probability distributions. Bayesian inversion theory provides an elegant framework for combining prior model parameter distributions with observed model responses [*Tarantola*, 2004]. A practical approach to applying Bayesian inversion to large-scale nonlinear inverse problems is Monte Carlo approximation of the posterior distribution with a finite number of samples. This approach has become particularly popular primarily because of the availability of powerful and inexpensive computational resources, the development of relatively simple ensemble model calibration techniques, and its suitability for systematic uncertainty quantification and risk assessment [*Sahuquillo et al*., 1992; *LaVenue et al*., 1995; *RamaRao et al*., 1995; *Gomez-Hernandez et al*., 1997; *Sambridge and Mosegaard*, 2002; *Lorentzen et al*., 2003; *Nævdal et al*., 2005; *Chen and Zhang*, 2006; *Wen and Chen*, 2006; *Nowak*, 2009].
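To make the Monte Carlo idea concrete, the following toy sketch (not taken from any of the cited works) approximates a Bayesian posterior by importance-weighting samples drawn from the prior with a Gaussian data likelihood; the forward model `forward`, the prior parameters, and the noise level `sigma_d` are all hypothetical choices for illustration.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical scalar forward model mapping a parameter m to a datum.
def forward(m):
    return m ** 2

m_true = 1.5
sigma_d = 0.1                       # assumed observation-noise standard deviation
d_obs = forward(m_true)             # noise-free synthetic observation

# Prior: m ~ N(1.0, 0.5^2); draw a finite sample from it ...
samples = rng.normal(1.0, 0.5, size=20000)

# ... and weight each sample by its Gaussian data likelihood.
misfit = (forward(samples) - d_obs) ** 2
weights = np.exp(-0.5 * misfit / sigma_d ** 2)
weights /= weights.sum()

# Posterior statistics are then weighted averages over the sample;
# here the posterior mean recovers a value close to m_true.
post_mean = float(np.sum(weights * samples))
```

The same weighted-sample logic underlies more elaborate ensemble methods: the weights concentrate on the prior samples that best reproduce the observations.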

[6] Many of the existing inversion techniques are suitable for calibrating flow models with spatially distributed parameters that are amenable to second-order statistical characterization. Although conventional second-order (two-point) geostatistics is widely applied to represent the variability in the spatial distribution of hydraulic properties in groundwater models, the connectivity structures in some geologic formations, such as meandering fluvial channels, are far too complex to model with second-order descriptions. The popularity of variogram-based modeling techniques is rooted more in their mathematical simplicity, computational efficiency, and ease of implementation than in their geological interpretation and realism. Many complex geologic structures, such as those containing discrete geologic objects with sharp discontinuities across facies boundaries, cannot be described with two-point statistical techniques [e.g., *Gomez-Hernandez and Wen*, 1998; *Deutsch and Journel*, 1998; *Carle et al*., 1998; *Western et al*., 2001; *Zinn and Harvey*, 2003; *de Marsily et al*., 2005]. Of particular importance in subsurface flow and transport are the extreme phenomena that induce preferential flow paths (e.g., channels and fractures) or flow barriers (e.g., thin shale layers), which can dominate the behavior of local and global flow regimes. These complex extreme features do not lend themselves to conventional second-order geostatistical descriptions. In addition, stochastic processes with distinctly different higher-order statistics can sometimes be indistinguishable when only their second-order characterization is considered, underscoring the importance of higher-order statistics in describing geologic formations with more complex spatial connectivity [*Strebelle*, 2002; *Caers et al.,* 2002].

[7] Two common approaches for generating multiple realizations of geologic facies that honor a prior statistical representation and various types of measured and interpreted data are pixel-based approaches such as sequential indicator simulation [*Journel*, 1983; *Isaaks*, 1990; *Srivastava*, 1992; *Goovaerts*, 1997; *Chiles and Delfiner*, 1999] and object-based (Boolean) methods, e.g., *marked point process*, that are better able to describe the continuity in geobodies with well-defined shapes [*Haldorsen and Lake*, 1984; *Stoyan et al*., 1987; *Deutsch and Wang*, 1996; *Holden et al*., 1998]. Object-based methods, however, lack the flexibility of grid-based simulation techniques, rendering the data integration aspect particularly cumbersome.

[8] Multiple-point statistics (MPS) [*Guardiano and Srivastava*, 1993; *Strebelle*, 2002; *Caers and Zhang*, 2004] offers a grid-based, pattern-imitating simulation method for modeling complex geological connectivity structures that are not amenable to variogram-based modeling techniques. Instead of using merely point-to-point statistical correlations, MPS accounts for the higher-order statistics captured by multiple-point patterns in a prior training image (TI). Because of its grid-based implementation, conditioning MPS realizations on facies measurements at well locations and soft (e.g., 3-D seismic) data is not difficult [*Strebelle*, 2002; *Journel*, 2002; *Remy et al*., 2009]. However, calibrating the output of MPS simulation against dynamic flow data remains an important research area.

[9] The nonlinear and indirect relation between hydraulic properties and dynamic flow data presents the main difficulty in constraining MPS simulation results to reproduce flow measurements. In recent years, several authors have proposed alternative approaches to address the problem of conditioning non-multi-Gaussian fields to flow data [*Sarma et al*., 2008; *Jafarpour and McLaughlin*, 2008, 2009a; *Capilla and Llopis-Albert*, 2009; *Sun et al*., 2009; *Alcolea and Renard*, 2010; *Zhou et al.,* 2011; *Mohammad-Khaninezhad et al*., 2012a]. *Sarma et al*. [2008] apply a nonlinear parameterization to the permeability field via kernel principal component analysis to preserve the higher-order statistics of the prior model during calibration. *Jafarpour and McLaughlin* [2008, 2009a] apply discrete cosine parameterization with the ensemble Kalman filter (EnKF) to improve facies continuity and reduce dimensionality. *Capilla and Llopis-Albert* [2009] present a gradual deformation-based inverse method for conditioning transmissivity fields to various static and dynamic data types. *Sun et al*. [2009] use an EnKF with grid-based localization and Gaussian mixture-model clustering to update multimodal parameter distributions from dynamic data. They consider block updating and dimension reduction to reduce the computational costs of their proposed schemes and report improved performance over the regular EnKF implementation. *Alcolea and Renard* [2010] use a blocking moving window algorithm for conditioning MPS simulations to hydrogeological data such as connectivity constraints and heads. *Zhou et al*. [2011] report EnKF performance improvement by applying a normal-score transform to the original state vector to ensure univariate Gaussianity prior to the update and to preserve the univariate (non-Gaussian) prior statistics after the update (via a back transformation). *Mohammad-Khaninezhad et al*. [2012b] apply the sparse K-SVD dictionary for reconstruction of geologic models from dynamic flow data and show that the method is robust against prior uncertainty and is able to preserve the geologic continuity in the prior model.
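As a minimal illustration of the normal-score idea (a sketch of the general transform, not the implementation of *Zhou et al*. [2011]), the snippet below maps a bimodal sample to standard-normal scores through its empirical ranks; because the mapping is monotone, it can be inverted on the sample after an update. The rank convention `(r - 0.5)/n` is one common choice, and `statistics.NormalDist` (Python ≥ 3.8) supplies the inverse normal CDF.

```python
import numpy as np
from statistics import NormalDist  # stdlib inverse normal CDF

def normal_score_transform(x):
    """Replace each value by the standard-normal quantile of its empirical rank."""
    n = len(x)
    ranks = np.argsort(np.argsort(x)) + 1      # ranks 1..n (ties ignored for brevity)
    p = (ranks - 0.5) / n                      # plotting positions in (0, 1)
    nd = NormalDist()
    return np.array([nd.inv_cdf(pi) for pi in p])

# A bimodal (non-Gaussian) sample, e.g. two permeability modes of a facies model.
rng = np.random.default_rng(1)
x = np.concatenate([rng.normal(-3.0, 0.3, 500), rng.normal(3.0, 0.3, 500)])
z = normal_score_transform(x)                  # approximately standard normal
```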

[10] An alternative approach is to incorporate the nonlinear dynamic flow data into the MPS simulation algorithm to generate conditional facies realizations. The probability perturbation method of *Caers and Hoffman* [2006] uses a parameterization of the simulation probabilities to condition the MPS facies realizations on flow data. This approach can converge slowly because it lacks a direct feedback mechanism to adapt and improve the predictive performance of subsequent facies realizations. *Mariethoz et al*. [2010] present iterative spatial resampling as a general transition kernel to preserve the prior spatial model during conditional simulation. In a recent paper [*Jafarpour and Khodabakhshi*, 2011], we introduced a probability conditioning method (PCM) for conditioning facies simulation from a given TI on nonlinear dynamic flow measurements. We showed that although the EnKF update does not preserve the categorical (discrete) nature of MPS facies realizations, it can be used to infer probabilistic information about the facies distribution in space (i.e., a facies probability map) from flow data. We then used the obtained facies probability maps to guide pattern-based MPS facies simulation from a *known* TI and to draw conditional facies samples that reproduce the observed dynamic measurements.
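The probability-map idea can be sketched in a few lines (a toy construction, not the actual PCM workflow): threshold each continuously updated field in an ensemble into a facies indicator and average the indicators cell by cell. The grid size, threshold, and "channel" location below are invented for illustration.

```python
import numpy as np

rng = np.random.default_rng(2)

# Ensemble of continuously updated (e.g., EnKF-style) facies fields on a
# small grid; values near 1 suggest channel facies, near 0 background.
n_ens, nx, ny = 50, 10, 10
ensemble = rng.uniform(0.0, 1.0, size=(n_ens, nx, ny))
ensemble[:, 4:6, :] += 0.5          # hypothetical channel-like feature in rows 4-5

# Facies probability map: fraction of ensemble members classified as
# channel after thresholding each continuous field at 0.5.
prob_map = (ensemble > 0.5).mean(axis=0)
```

A map of this kind can then serve as soft conditioning input to the MPS simulation.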

[11] A standing challenge in MPS-based model calibration, however, is the uncertainty in the prior TI. This issue becomes particularly important considering the strict pattern-imitating nature of MPS simulation, which restricts the spatial variability of the resulting facies to the structural connectivity and encoded patterns in the given TI. Specifically, realizations of facies maps from TIs with different structural connectivities can exhibit distinctly different flow and transport predictions, which can be detrimental to development planning.

[12] The main objective of this paper is to develop an adaptive sampling strategy for cases where multiple TIs are used to acknowledge the uncertainty in the geologic continuity model. A key question to address is how to identify and sample from the relevant TIs in a list of candidate prior TIs. We introduce a Bayesian mixture-modeling algorithm for generating conditional facies realizations from multiple uncertain TIs. Data scarcity and low resolution, together with errors in geologic modeling and imperfect assumptions, can leave significant uncertainty in the interpretation of the existing patterns in a prior TI model. Figure 1 shows a satellite view of a section of the Mississippi River near Baton Rouge. The river structure, orientation, and thickness vary in different regions, implying that the distribution of naturally occurring features, such as fluvial systems, can be too complex to represent with a single stationary TI. As depicted in Figure 1, the consistent TI for the fluvial system inside the left box is different from that on the right, even though the two sections of the river are close to each other. Underground fluvial or turbidite systems exhibit a similar complexity. One approach to dealing with the uncertainty in describing the geologic continuity in a TI is to consider several TIs that capture the full range of geologic variability for a given formation. These TIs could be obtained from different plausible geological scenarios, for example, from independent interpretations by different geologists or by stochastic treatment of parameters in a geologic modeling study that is used to identify possible connectivity patterns in the formation.

[13] We combine a Bayesian mixture model with the PCM for adaptive conditional sampling from multiple TIs. This is accomplished by initially generating unconditional facies realizations from multiple TIs with equal weights assigned to each TI. The TI weights are then updated based on their predictive performance (likelihood function). For conditional sampling, we first convert the dynamic flow data into a facies probability map using the PCM presented by *Jafarpour and Khodabakhshi* [2011]. We then incorporate the generated probability map as input into MPS simulation and draw new conditional facies samples from each TI according to the weight it has been assigned based on its likelihood of matching the observed data. This leads to an adaptive facies sampling technique in which fewer (more) realizations are generated from TIs with inconsistent (consistent) geologic continuity.
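The adaptive weighting step described above can be summarized in a short sketch. The likelihood form, noise scale, and misfit values are assumptions made for illustration, not results from the paper: each TI weight is multiplied by the likelihood of the observed data under that TI's realizations, the weights are renormalized, and the next batch of realizations is allocated in proportion to the updated weights.

```python
import numpy as np

rng = np.random.default_rng(3)

# Gaussian-type likelihood of the observed flow data given a squared
# data misfit (sigma is an assumed data-noise scale).
def likelihood(sq_misfit, sigma=1.0):
    return np.exp(-0.5 * sq_misfit / sigma ** 2)

n_ti = 3
weights = np.full(n_ti, 1.0 / n_ti)      # initially equal TI weights

# Illustrative average squared misfits of the realizations from each TI:
# TI 0 is most consistent with the observations, TI 2 the least.
avg_sq_misfit = np.array([0.5, 2.0, 8.0])

# Mixture-style update: posterior weight ∝ prior weight × likelihood.
weights = weights * likelihood(avg_sq_misfit)
weights /= weights.sum()

# Allocate the next batch of conditional realizations in proportion to
# the updated weights, so consistent TIs are sampled more often.
allocation = rng.multinomial(100, weights)
```

Repeating the update as new batches of realizations are evaluated gradually concentrates the sampling on the TIs whose connectivity patterns are consistent with the flow data.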