## 1. Introduction

[2] Our knowledge of the spatial distribution of the physical properties of geologic formations is often uncertain because of ubiquitous heterogeneity and the sparsity of data. Geostatistics has become an invaluable tool for estimating such properties at points in a computational domain where data are not available, as well as for quantifying the corresponding uncertainty. Geostatistical frameworks treat a formation's properties, such as hydraulic conductivity *K*(**x**), as random fields that are characterized by multivariate probability density functions or, equivalently, by their joint ensemble moments. Thus, *K*(**x**) is assumed to vary not only across the real space coordinates **x**, but also in probability space (this variation may be represented by another coordinate ξ, which is usually suppressed to simplify notation). Whereas spatial moments of *K* are obtained by sampling *K*(**x**) in real space (across **x**), its ensemble moments are defined in terms of samples collected in probability space (across ξ). Since in reality only a single *realization* of a geologic site exists, it is necessary to invoke the ergodicity hypothesis in order to substitute the sample spatial statistics, which can be calculated, for the ensemble statistics, which are actually required. Ergodicity cannot be proved and requires a number of modeling assumptions [*Rubin*, 2003, section 2.7, and references therein].
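The distinction between spatial and ensemble moments can be illustrated with a minimal numerical sketch. It assumes a synthetic, statistically stationary one-dimensional field *Y* = ln *K* built as a moving average of Gaussian white noise. The mean `MU`, the window length `WINDOW`, and the helper `realization` are hypothetical choices made for illustration only and are not taken from this letter. The spatial mean is computed across **x** within one realization, while the ensemble mean is computed across realizations (across ξ) at a single fixed location.

```python
import random
import statistics

MU = -2.0      # assumed mean of Y = ln K (hypothetical value)
WINDOW = 10    # moving-average window; sets the correlation length in grid cells


def realization(rng, n):
    """Return one realization of a stationary field Y at n grid points.

    Y is a moving average of Gaussian white noise, scaled to unit variance.
    """
    noise = [rng.gauss(0.0, 1.0) for _ in range(n + WINDOW - 1)]
    scale = WINDOW ** -0.5
    return [MU + scale * sum(noise[j:j + WINDOW]) for j in range(n)]


rng = random.Random(0)

# Spatial mean: average across x within a single realization (one fixed xi).
spatial_mean = statistics.fmean(realization(rng, 2000))

# Ensemble mean: average across realizations (across xi) at one fixed location.
ensemble_values = [realization(rng, 1)[0] for _ in range(2000)]
ensemble_mean = statistics.fmean(ensemble_values)

print(f"spatial mean  = {spatial_mean:.3f}")
print(f"ensemble mean = {ensemble_mean:.3f}")
```

For this synthetic field the two estimates agree closely because the field is stationary by construction and the sampled transect is much longer than the correlation length. At a real site only the first estimate is available, which is why the ergodicity hypothesis has to be invoked.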

[3] Machine learning provides an alternative to the geostatistical framework. It allows one to make predictions in the absence of sufficient data parameterization without treating geologic parameters as random and, hence, without the need for ergodicity assumptions. Intimately connected to the field of pattern recognition, machine learning refers to a family of computational algorithms for data analysis that are designed to tune themselves automatically in response to data. Neural networks [*Bishop*, 1995] are one such class of algorithms that has found its way into hydrologic modeling. While neural networks are versatile and efficient for many important applications, such as the delineation of geologic facies [*Moysey et al.*, 2003], their theory remains to a large extent empirical in this context.

[4] Here we introduce another subset of machine learning techniques, the Support Vector Machine (SVM) and its mathematical underpinning, the Statistical Learning Theory (SLT) of *Vapnik* [1998], which is ideally suited for the problem of facies delineation in geologic formations. While similar to neural networks in its goals, the SVM is firmly grounded in rigorous mathematical analysis, which allows one not only to assess its performance but also to bound the corresponding errors. Like other machine learning techniques, the SVM and SLT enable one to treat the subsurface environment and its parameters as deterministic. Uncertainty associated with insufficient data parameterization is then represented by treating sampling locations as a random subset of all possible measurement locations. Since such a formulation fits hydrologic applications naturally, the use of the SVM in the context of subsurface imaging deserves to be fully explored. This letter is the first step in this direction.
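The idea of a deterministic subsurface probed at randomly chosen locations can be sketched in a few lines. The example below is only an illustration under stated assumptions and is not the formulation developed in this letter. The straight facies boundary in `facies`, the 25 × 25 grid of candidate locations, the sample size of 40, and the regularization constant `lam` are all hypothetical. The classifier is a plain linear soft-margin SVM fitted by batch subgradient descent on the hinge loss, with the bias handled as a regularized constant feature for simplicity.

```python
import random


def facies(x, y):
    """Deterministic 'true' facies label (+1 or -1); the boundary is hypothetical."""
    return 1 if (y - 0.5) > 0.5 * (x - 0.5) else -1


def features(x, y):
    # Centered coordinates plus a constant feature that plays the role of a bias.
    return (x - 0.5, y - 0.5, 1.0)


def train_linear_svm(points, labels, lam=0.01, steps=3000):
    """Linear soft-margin SVM fitted by batch subgradient descent (hinge loss)."""
    w = [0.0, 0.0, 0.0]
    n = len(points)
    for t in range(1, steps + 1):
        eta = 1.0 / (lam * t)                 # decaying step size
        grad = [lam * wk for wk in w]         # gradient of the regularizer
        for p, lab in zip(points, labels):
            if lab * sum(wk * pk for wk, pk in zip(w, p)) < 1.0:
                for k in range(3):            # hinge term is active for this point
                    grad[k] -= lab * p[k] / n
        w = [wk - eta * gk for wk, gk in zip(w, grad)]
    return w


def predict(w, x, y):
    score = sum(wk * pk for wk, pk in zip(w, features(x, y)))
    return 1 if score > 0.0 else -1


# All possible measurement locations: a regular 25 x 25 grid on the unit square.
grid = [((i + 0.5) / 25, (j + 0.5) / 25) for i in range(25) for j in range(25)]

rng = random.Random(1)
errors = []
for _ in range(5):
    # The field is fixed; only the set of sampled locations is random.
    sample = rng.sample(grid, 40)
    pts = [features(x, y) for x, y in sample]
    labs = [facies(x, y) for x, y in sample]
    w = train_linear_svm(pts, labs)
    wrong = sum(predict(w, x, y) != facies(x, y) for x, y in grid)
    errors.append(wrong / len(grid))

print("misclassified fraction per draw:", [round(e, 3) for e in errors])
```

The facies map never changes between draws, so any spread in the misclassified fraction comes solely from which locations happened to be sampled. This is the sense in which uncertainty is attributed to the sampling rather than to a random subsurface.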

[5] We consider an idealized problem of identifying a boundary between two geologic facies from a sparsely sampled parameter *K*. We formulate the problem in Section 2, and provide a brief description of a geostatistical approach to its solution in Section 3. Section 4 introduces Support Vector Machines, which we use in Section 5 to estimate the boundary between the two heterogeneous facies in a simulated problem.