## 1. Introduction

[2] Reduced-order models (ROMs) are increasingly used to approximate complex and computationally expensive numerical models, such as simulations of subsurface flow and transport processes [*Vermeulen et al*., 2004; *Markovinović and Jansen*, 2006; *Razavi et al*., 2012]. These processes often occur in a large three-dimensional heterogeneous domain, may involve multiple fluid phases, and are governed by nonlinear flow and transport equations that are computationally expensive to solve. The computational cost is even more serious in resource management optimization, inverse modeling, and uncertainty quantification, all of which require multiple forward simulations of these processes.

[3] Model reduction was first introduced in the analysis of turbulent flow [*Lumley*, 1967; *Berkooz et al*., 1993]. An earlier detailed review of model reduction with the Karhunen-Loève (KL) expansion can be found in *Newman* [1996]. The idea of model reduction is to constrain the solution of the forward model to a subspace of the actual solution space (e.g., by Galerkin projection). The subspace usually has a much lower dimension, so that, at the cost of a minor loss of accuracy, the computational saving from finding the reduced-order solution instead of the exact solution is substantial. Further, the approximation error is manageable and often negligible for field applications that involve a fair amount of noise in measurement data. To construct the subspace, one first uses the full model to compute a number of state variable (e.g., pressure) distributions, or snapshots, over the space domain for a range of times or parameter values. The snapshots are then orthogonalized to remove redundant information and to form a basis set onto which the reduced-order solution is projected.
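
The snapshot, orthogonalization, and projection steps described above can be illustrated with a minimal sketch. It is a toy example under stated assumptions, not the formulation developed in this paper: the "full model" is a hypothetical one-dimensional linear system (L + kI)u = b with a scalar parameter k, the orthogonalization is done with a singular value decomposition, and all names (`full_solve`, `pod_basis`, `reduced_solve`) are introduced here purely for illustration.

```python
import numpy as np

n = 200
# Hypothetical full model: 1-D diffusion stencil plus a linear sink term k*u.
L = 2.0 * np.eye(n) - np.eye(n, k=1) - np.eye(n, k=-1)
b = np.ones(n)  # uniform source term (assumed)

def full_solve(k):
    """Full model (toy stand-in): solve (L + k I) u = b for the state u."""
    return np.linalg.solve(L + k * np.eye(n), b)

def pod_basis(snapshots, tol=1e-10):
    """Orthogonalize snapshot columns by SVD; drop numerically redundant modes."""
    U, s, _ = np.linalg.svd(snapshots, full_matrices=False)
    return U[:, s > tol * s[0]]

def reduced_solve(k, P):
    """Galerkin projection: solve the small system P^T A P r = P^T b, then lift."""
    A = L + k * np.eye(n)
    r = np.linalg.solve(P.T @ A @ P, P.T @ b)
    return P @ r

# Snapshots: full-model states for a range of parameter values.
train_k = np.logspace(-1, 1, 10)
snapshots = np.column_stack([full_solve(k) for k in train_k])
P = pod_basis(snapshots)

# Evaluate the reduced model at a parameter value that is not a snapshot.
k_test = 3.0
u_full = full_solve(k_test)
u_rom = reduced_solve(k_test, P)
rel_err = np.linalg.norm(u_rom - u_full) / np.linalg.norm(u_full)
print(f"{P.shape[1]} basis vectors, relative error {rel_err:.2e}")
```

In this sketch the reduced system has at most 10 unknowns instead of 200. The savings in a realistic application depend on how cheaply the projected operator can be assembled, which this toy example does not address.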

[4] ROMs have been developed for single-phase and multiphase flow problems. For example, *Cazemier et al*. [1998] use proper orthogonal decomposition (POD) and Galerkin projections in driven cavity flow simulations. *Vermeulen et al*. [2004] use similar strategies for groundwater flow simulations. *Markovinović and Jansen* [2006] use model reduction to construct solution predictors for iterative solution of multiphase flow problems in porous media. *Cardoso and Durlofsky* [2010] construct linearized ROMs for multiphase subsurface flow problems. *Siade et al*. [2010] propose methods for snapshot selection on the time axis for groundwater flow model reduction, where each snapshot is the hydraulic head configuration over the whole space domain at a certain time. More recently, *Razavi et al*. [2012] categorize model reduction as a type of surrogate modeling strategy and provide a review of its use in water resources applications.

[5] ROMs have also been used in parameter estimation and inverse modeling, and most of these uses fall into two categories. The first is to efficiently select candidate samples for Monte Carlo (MC) sampling, especially Markov chain Monte Carlo (MCMC) sampling. The more accurate a ROM is, the higher the probability that a poor candidate will be filtered out by the ROM before it is checked with the full model. For instance, *Lieberman et al*. [2010] propose an MCMC algorithm with model reduction for statistical inverse problems of groundwater flow. In their work, the authors construct a reduced steady-state model with the objective of covering the parameter space by choosing parameter sets via a greedy algorithm that uses the prior information of parameters. *Watzenig et al*. [2011] use ROMs in electrical capacitance tomography to accelerate MCMC sampling. The second is to find the optimal parameters that meet a certain set of criteria, such as data fitting and spatial smoothness requirements, in a calibration-by-optimization approach. In field applications, the approximation error from ROMs is often smaller than the noise in measurement data, so that the parameter estimation is barely affected. For example, *Park et al*. [1999] use model reduction to solve inverse heat transfer problems that involve unknown time-varying heat source functions. *Vermeulen et al*. [2005] use model reduction in inverse modeling of groundwater flow problems with six to nine unknown parameters. *Siade et al*. [2012] use a ROM with quadratic programming to estimate up to 15 parameters in groundwater flow problems. *Pasetto et al*. [2011] use the POD to reduce the computational cost associated with Monte Carlo uncertainty analysis in a problem of groundwater flow driven by randomly distributed recharge. In all of these applications, the number of parameters estimated is very small, and the way in which snapshots are selected in the parameter domain makes the construction of ROMs computationally prohibitive, and hence the approximation inaccurate, when the number of parameters is large.

[6] In this work, we apply model reduction to underdetermined geostatistical inverse problems in which a large number of unknown parameters are estimated using a limited number of measurements. We propose a new way of selecting snapshots for such underdetermined inverse problems, based on the low-dimensional solution space of linear underdetermined geostatistical inverse problems. With this method, the number of snapshots is proportional to the dimension of the solution space, which is often *m* + *p*, where *m* is the number of measurements and *p* is the number of unique mean values in the parameter domain (1 if we assume a homogeneous mean field). Since the solution space is spanned by the cross-covariance of measurements and parameters, we refer to this ROM as the geostatistical reduced-order model (GROM).
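
The *m* + *p* dimension of the solution space can be checked numerically for a small linear problem. The sketch below is an illustration under assumptions, not the method of this paper: it uses the standard linear geostatistical objective with a hypothetical observation operator `H` (point sampling at five cells), an exponential prior covariance `Q`, a measurement-error covariance `R`, and a constant-mean drift matrix `X`, none of which are specified in this section. The best estimate is computed in the full parameter space and then tested for membership in the span of `X` and the cross-covariance `Q @ H.T`.

```python
import numpy as np

rng = np.random.default_rng(0)
n, m, p = 100, 5, 1                        # unknowns, measurements, mean values

x = np.linspace(0.0, 1.0, n)
Q = np.exp(-np.abs(x[:, None] - x[None, :]) / 0.2)  # assumed exponential covariance
X = np.ones((n, p))                        # homogeneous mean field, so p = 1
H = np.eye(n)[[10, 30, 50, 70, 90]]        # hypothetical linear observation operator
R = 1e-2 * np.eye(m)                       # assumed measurement-error covariance
y = rng.standard_normal(m)                 # synthetic measurements

# Minimize (y - H s)^T R^-1 (y - H s) + (s - X beta)^T Q^-1 (s - X beta)
# jointly over s and beta by solving the normal equations in the full space.
Qi, Ri = np.linalg.inv(Q), np.linalg.inv(R)
K = np.block([[H.T @ Ri @ H + Qi, -Qi @ X],
              [-X.T @ Qi, X.T @ Qi @ X]])
rhs = np.concatenate([H.T @ Ri @ y, np.zeros(p)])
s_hat = np.linalg.solve(K, rhs)[:n]

# Candidate low-dimensional basis: mean-field columns plus cross-covariance columns.
basis = np.hstack([X, Q @ H.T])
coeffs, *_ = np.linalg.lstsq(basis, s_hat, rcond=None)
misfit = np.linalg.norm(basis @ coeffs - s_hat) / np.linalg.norm(s_hat)
rank = np.linalg.matrix_rank(basis)
print(f"basis rank = {rank}, relative misfit of projection = {misfit:.1e}")
```

Here the estimate of 100 unknowns is reproduced, up to round-off, by a combination of only six basis vectors. For nonlinear forward models the operator `H` would be replaced by a linearization, so this check holds only for the linear case shown.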

[7] This paper is arranged as follows. In section 2, we first review concepts and discuss methods in model reduction and geostatistical inverse modeling with an example in groundwater flow. We then develop efficient snapshot selection strategies for underdetermined geostatistical inverse problems. In section 3, we use numerical examples in groundwater flow to (1) examine the accuracy of the new snapshot selection strategy and (2) test the efficacy and efficiency of the new GROM in estimating synthetic Gaussian parameter fields with noise-free numerical data. In section 4, we apply the GROM to steady-state hydraulic tomography to estimate non-Gaussian parameter fields with noisy laboratory-collected hydraulic data. We then discuss the results in section 5.