1. Introduction and Literature Review
 Geological carbon sequestration (GCS) has been proposed as a means to reduce greenhouse gas emissions. The Department of Energy's (DOE) National Energy Technology Laboratory estimated that North America could store 900 years' worth of carbon dioxide at current North American emission levels. The most widely available sites are saline aquifers. Supercritical CO2 is injected at a depth of at least 800 m beneath a “caprock” of very low permeability. In the aquifer, CO2 is less dense than the surrounding brine at typical subsurface pressure and temperature conditions. There is a small risk that, as injection proceeds, the CO2 plume may reach a fracture in the caprock or abandoned wells with faulty seals, in which case the CO2 can rise, potentially contaminating freshwater aquifers or leaking into the atmosphere. Industrial-scale carbon sequestration will generate a plume extending over a horizontal area of many square kilometers, making it very difficult and costly to monitor for leaks.
 Figure 1 shows a schematic diagram of the sorts of CO2 plumes that may develop in the storage formation under different permeability conditions. The great variety demonstrates that a good understanding of the permeability distribution is crucial for predicting how and where the CO2 plume moves, so that attention can be focused on weak areas (such as abandoned wells or known fault lines) that the plume reaches. One means of estimating plume movement is through monitoring wells, but since they are expensive to construct, plume estimation needs to be based on data from relatively few sampling points.
 The goal of this paper is to present an effective and computationally efficient method for obtaining estimates of CO2 and pressure fields from very sparse monitoring data and to demonstrate its effectiveness on a realistic GCS case for which the amount of available data would be spatially very limited. The major steps are as follows: (a) use available geological data to the extent possible to identify permeability zones and ranges of parameter values describing likely permeabilities within each zone, (b) formulate one or more models of the site that use a process-based simulation model with a relatively low number of parameters for the zones, (c) estimate the values of the parameters from the time series of pressure data (and possibly CO2 gas saturation) over a short period from only a few monitoring locations using an efficient global optimization method, (d) use the calibrated model to estimate the current and future pressure field and the location of the CO2 plume, and (e) utilize this information to determine the region where risk assessment should be focused and consider whether there is any evidence that the system is not working properly.
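Steps (c) and (d) above can be sketched as a calibrate-then-forecast loop. The sketch below is a minimal illustration, not the actual implementation; all function and argument names (`run_simulator`, `misfit`, `global_minimize`) are hypothetical stand-ins for the expensive forward simulations and the surrogate optimizer described in this section.

```python
import numpy as np

def calibrate_and_forecast(run_simulator, misfit, observed, bounds, global_minimize):
    """Hypothetical sketch of steps (c)-(d): estimate zone parameters from
    monitoring data, then rerun the calibrated model to forecast the
    pressure field and CO2 plume."""
    # (c) global optimization of the data misfit over the parameter bounds
    best_params = global_minimize(
        lambda p: misfit(run_simulator(p), observed), bounds)
    # (d) forward-simulate the calibrated model for current/future fields
    forecast = run_simulator(best_params)
    return best_params, forecast
```

Because each call to the real simulator is expensive (about 2 h in this paper), `global_minimize` must succeed with few calls; that constraint motivates the surrogate optimization method used here.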
 The approach of using a predictive model with relatively few parameters is necessary because the very limited amount of data available in a GCS situation cannot statistically support the estimation of a large number of spatially distributed parameters. In this paper the model is the TOUGH2 simulation code with the ECO2N module, which requires about 2 h/simulation. Therefore, an optimization algorithm for calibration cannot require a large number of simulations. Computational efficiency is achieved in this paper with a surrogate response surface global optimization algorithm, which reduces the number of computationally expensive simulations required to find the parameters at the global minimum of a sum of squared errors (maximum likelihood) function. The sparsity of the monitoring data is addressed by using preexisting geological data and by adopting a zonation strategy that is reasonable given the paucity of spatially distributed monitoring data. The resulting calibrated model is then used to estimate the current and future CO2 plumes for risk assessment. Pressure data are much easier to obtain than CO2 gas saturation data, so we examine the differences in estimation accuracy with and without CO2 gas saturation data. Note that by CO2 gas saturation we mean the amount of free-phase CO2 per unit volume, not dissolved CO2; strictly speaking, the CO2 will be supercritical, but for brevity it is referred to as gas.
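As a concrete illustration of such an objective, one common maximum-likelihood form (an assumption here; the paper's exact formulation appears in section 3) scales each residual by the measurement-error standard deviation of its data type, so the same function serves both pressure-only and pressure-plus-saturation calibrations:

```python
import numpy as np

def weighted_sse(sim_p, obs_p, sigma_p, sim_s=None, obs_s=None, sigma_s=1.0):
    """Weighted sum of squared errors (maximum-likelihood) misfit.
    Pressure residuals always contribute; CO2 gas saturation residuals
    are added only when saturation observations are available.
    The weighting scheme shown is an assumption for illustration."""
    misfit = float(np.sum(((np.asarray(sim_p) - np.asarray(obs_p)) / sigma_p) ** 2))
    if obs_s is not None:
        misfit += float(np.sum(((np.asarray(sim_s) - np.asarray(obs_s)) / sigma_s) ** 2))
    return misfit
```

Dividing by the error standard deviation keeps pressure (in Pa) and saturation (dimensionless) residuals commensurate, so neither data type dominates the fit purely because of its units.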
 There are multiple issues that make this a very difficult problem: (a) the lack of data requires a model with relatively few parameters; (b) calibration of the GCS model is a problem with multiple local minima [Espinet, 2012], which then requires a global optimization method; and (c) the computational cost of the simulation model forces us to look for a method that requires few simulations, which is not possible with popular heuristic global optimization methods like genetic algorithms, and so requires a more efficient global optimization method like one of the surrogate surface optimization (SSO) methods used here.
 To our knowledge, no previous peer-reviewed journal paper has applied this inverse methodology (including global optimization) to a computationally expensive (e.g., at least 1 h/simulation), multimodal GCS monitoring and estimation problem. The global optimization method we use is mathematically proven to converge asymptotically to the global minimum [Regis and Shoemaker, 2007] and has been extended to problems with up to 150 decision variables [Regis and Shoemaker, 2013].
1.1. Literature Review
 Weir et al. and Pruess and Garcia carried out numerical simulations of CO2 injection in homogeneous formations, and Doughty and Pruess, Juanes et al., Flett et al., and others numerically simulated three-dimensional heterogeneous formations for CO2 sequestration. The impact of heterogeneity and anisotropy on CO2 plume development, trapping mechanisms, and storage capacity has been examined by Doughty et al., Doughty, and Green and Ennis-King. The focus of these papers is on the physical processes occurring during CO2 storage and the impact of various characteristics of the geological formation on CO2 sequestration capabilities. These detailed numerical models are known as process-based forward models. However, in each case, the geology is an input and is always treated as known.
 In real cases, however, the geology (e.g., the spatial distribution and permeabilities of various facies) is not known precisely. This is especially true for saline formations that have not previously had any economic value and so have not been thoroughly characterized. This situation makes inverse methods, in which monitoring data are used to estimate hydrogeological properties, appealing.
 Parameter estimation has a long history in petroleum engineering, and the literature has been expanding at an increasing rate over the last 20 years. We acknowledge the work of Oliver and Chen for part of this literature search. An overview of algorithms and applications can be found in Makhlouf et al., and a more recent overview in the book by Oliver et al.
 Since it is often the case that the available data do not support estimating large numbers of parameters, we need a way to reduce the number of parameters. Approaches include using a linear combination of the original parameters or zonation coupled with a sensitivity-based approach (as chosen in this study), similar to the adaptive multiscale estimation approach [Grimstad et al., 2001]. Zonation is the oldest way to parameterize models; the first studies can be found in Jacquard and Pain and Shah et al. Numerous studies can be found, each with a different mathematical approach: Rodrigues, Aanonsen and Eydinov, Cominelli et al., Zandvliet et al., Jafarpour and McLaughlin [2009a], and Bhark et al. A third way to parameterize is based on prior knowledge. Approaches include the use of pilot points [Wen et al., 2006], spline interpolation [Lee et al., 1986], wavelets [Sahni and Horne, 2005], or sparsity information [Jafarpour et al., 2010]. More complex methods and applications can be found in Liu and Oliver, Zhao et al., and Agbalaka and Oliver.
 Once the parameters to be estimated are defined, optimization algorithms can find the parameter set that gives the best fit. It is important to note that the optimization problem of minimizing the sum of squared errors of a nonlinear model almost certainly has multiple local minima (i.e., is “multimodal”). For example, the sum of squared errors of a quadratic model (the simplest example) is a fourth-order polynomial, which will have two modes in one dimension and as many as 2^N in N dimensions. In particular, in our many optimization runs of the carbon sequestration model for different formations and numbers of parameters, we found many local minima, which is corroborated by Finsterle.
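To make the quartic example concrete: fitting the one-parameter quadratic model m(θ) = θ² to a single observation y = 4 gives SSE(θ) = (4 − θ²)², which has two separate local minima, at θ = −2 and θ = +2. The grid scan below (a sketch for illustration, not part of the paper's method) exhibits both:

```python
import numpy as np

# SSE of the quadratic model m(theta) = theta**2 fit to one observation y = 4.
# (4 - theta**2)**2 is a fourth-order polynomial in theta with minima at -2, +2.
def sse(theta):
    return (4.0 - theta ** 2) ** 2

theta = np.linspace(-3.0, 3.0, 601)
vals = sse(theta)
# Interior grid points lower than both neighbors are local minima.
is_min = (vals[1:-1] < vals[:-2]) & (vals[1:-1] < vals[2:])
local_minima = theta[1:-1][is_min]
# A gradient search started at a negative theta converges to -2
# and never reaches the equally good minimum at +2.
```

This is exactly why a local optimizer's answer depends on its starting point, while a global method must somehow visit both basins.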
 In petroleum reservoir engineering, different approaches for optimization in parameter estimation have been suggested. The oldest way to calibrate a model is manually, but manual calibration can take a great deal of time and does not necessarily result in better predictive ability. A fairly recent study can be found in Agarwal et al., which took an entire year to successfully match 25 years of production data. Doughty et al. and Daley et al. manually calibrated numerical models to well logs, pressure measurements, seismic data, and fluid samples, using trial and error to best match field measurements. This only allows a small number of parameters to be estimated. Automatic calibration has become more widely used and can be categorized in several ways, depending on the type of optimization algorithm used.
 Heuristic optimization methods like simulated annealing, genetic algorithms, or fuzzy programming have been used on multiphase flow problems [Kobayashi et al., 2008; Serhat and Demiral, 1998; Romero and Carter, 2001; Schulze-Riegert and Ghedan, 2007; He et al., 2008]. The difficulty with heuristics like simulated annealing and genetic algorithms is that, as indicated earlier, they require a large number of simulations of the multiphase simulation model (typically thousands) to get a good answer even for a problem with only a few (e.g., 10) real-valued parameters.
 Our GCS model takes 2 h/simulation, so running it thousands of times is undesirable, and would be even less desirable for more complex multiphase models. For example, Kobayashi et al. report a computing time of 292 days for a real-world model (7 CPU hours per model run times 1000 iterations of the optimization).
 One method for history matching is the gradual deformation technique, which uses a linear combination of several reservoir geological properties and optimizes the coefficients of this combination in order to improve the history match. Several studies have used it: Hu et al., Caers, and Caers and Hoffman (in which the probability perturbation method is used in conjunction with the gradual deformation method to generate realizations that improve the data match).
 Gradient optimization approaches require that derivatives be calculated. The adjoint system approach is used by Li et al. and Eydinov et al. Streamline-based methods compute approximate sensitivities [Kulkarni and Datta-Gupta, 2000; Datta-Gupta et al., 2001] and have more recently been applied by Stenerud et al. and Oyerinde et al. The main methods for gradient-based optimization are the Levenberg-Marquardt formulation and the conjugate gradient or quasi-Newton approaches. For the Levenberg-Marquardt formulation, studies and algorithm development include Zhang et al., Tonkin and Doherty, Finsterle and Kowalsky, and Finsterle and Zhang. For the conjugate gradient or quasi-Newton approach, Cheng et al. and Oyerinde et al. are examples. Derivative-free local optimization methods also exist, such as optimization by radial basis function interpolation in trust-regions (ORBIT) [Wild et al., 2008], which have the advantage of not requiring derivatives but have the same theoretical convergence properties, under the same assumptions, as derivative-based local optimizers (as proven in Wild and Shoemaker). Gradient-based algorithms are usually local optimization algorithms unless they have a restart option. Some other studies using linear or nonlinear local optimization algorithms are by Hazra and Schulz, Bieker et al., Jansen, Senger et al., and Suwartadi et al. All local optimization methods will stop at the first local minimum found, which depends on the initial starting value of the search. Hence, there is a good chance of not finding the best solution unless the initial starting value is close to the best solution, which is unlikely for our problem.
 The ensemble Kalman filter can deal with a large number of variables and offers uncertainty analysis features. A downside is that it is not a global optimization method, which means that it will not always find the best parameter estimate if there are local minima. References to the ensemble Kalman filter can be found in Zafari and Reynolds and Aanonsen et al. Reservoir history matching studies using the ensemble Kalman filter can be found in Skjervheim et al., Arroyo-Negrete et al., Chen and Oliver, and Liu and Oliver. However, another downside of the ensemble Kalman filter is that it is “not well suited for variables with multimodal distributions unless transformations are possible” [Oliver and Chen, 2011]. This statement is corroborated by Jafarpour and McLaughlin [2009a] when they write, “If the ensemble replicates are derived from training images that do not describe the channel geometry properly, the Kalman filter has difficulty identifying the correct permeability field,” which means that the ensemble Kalman filter could have trouble finding the correct answer unless the initial starting values in the search are close to the (unknown) correct answer.
 The last category of optimization methods is derivative-free global optimization, i.e., global optimization using surrogate response surface methods (e.g., stochastic radial basis function (RBF) from Regis and Shoemaker and efficient global optimization (EGO) from Jones et al.). These methods, in addition to being global optimization methods (and therefore very well suited for reservoir history matching), are also advantageous from a computational standpoint because they are built to reuse information from previous model runs and therefore usually require fewer simulations than traditional methods. In this study, we therefore focus on the stochastic RBF method cited earlier. It is worthwhile to cite Horowitz et al., who used EGO to optimize up to 24 parameters. However, the simulation run times and number of simulations required are not mentioned, making it difficult to use this study as a benchmark. In addition, the results from the local and global optimization are the same, implying that the application is a special case that does not indicate performance on the more general case with multiple local minima. Espinet and Shoemaker compare the efficacy of five different optimization algorithms for calibration of a multiphase GCS problem for three different reservoir configurations. They find that only their simplest example, which is homogeneous, has a single local minimum. The more realistic heterogeneous examples are both multimodal. The surrogate global optimizers, including stochastic RBF, worked best on the multimodal problems.
This earlier paper addresses the narrower issue of computational efficiency of alternate algorithms for calibration of model parameters on a deterministic problem, whereas the present paper differs by considering the entire process of designing and evaluating a modeling and monitoring system (including well location and types of data) in terms of its ability to estimate and forecast plumes, given limited data, data error, and uncertainty in heterogeneous-aquifer permeability patterns (represented by multiple realizations).
 Another approach to dealing with optimization of computationally expensive functions is to use a standard optimization method operating on a “lower fidelity model” designed to mimic the original computationally expensive model. The lower fidelity model is simplified in some way (e.g., by using a coarser grid for the partial differential equations) relative to the original forward model so that it is fast to compute. In some cases, response surfaces are used to create a surrogate (faster, lower fidelity) forward model. However, lower fidelity models can lead to inaccuracies in the simulation predictions. For example, Vasco and Datta-Gupta propose a method for integrating field production history into reservoir characterization and use a faster (lower fidelity) three-dimensional streamline simulator to carry out the inversion. However, given the large size of current geological models, this approach requires too many simulations. Tran et al. propose using a (lower fidelity) coarse-scale inversion to reduce the number of parameters with the sequential self-calibration method [Gomez-Hernandez et al., 1997; Hosseini et al., 2011] and then applying downscaling to capture small-scale heterogeneities. However, neither the computational costs nor the amount of data necessary to carry out the inversion are discussed.
 Our approach with surrogate surface optimization (SSO) differs from the “lower fidelity model” approach because it is designed to save computational effort while still computing the full-fidelity forward model. SSO reduces computation by reducing the number of simulations required to reach an optimal value, which it does by iteratively building an approximation of the objective function (e.g., the goodness of fit for a set of parameters) from the results of all prior simulations; the approximation is updated in each iteration of the optimization. However, it is possible to combine our method with a lower fidelity model to further reduce computational effort.
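The surrogate idea can be sketched as follows. This is a simplified illustration, not the stochastic RBF algorithm of Regis and Shoemaker (which additionally scores candidates by their distance from previously evaluated points and includes restart logic); here the next expensive simulation is simply the candidate perturbation of the current best point that a cubic RBF surrogate predicts to be lowest.

```python
import numpy as np

def rbf_fit(X, y):
    """Cubic RBF interpolant s(x) = sum_i w_i * ||x - x_i||**3 through (X, y).
    A tiny diagonal regularization keeps the solve stable."""
    d = np.linalg.norm(X[:, None, :] - X[None, :, :], axis=-1)
    w = np.linalg.solve(d ** 3 + 1e-8 * np.eye(len(X)), y)
    return lambda x: float(np.linalg.norm(X - x, axis=-1) ** 3 @ w)

def surrogate_minimize(f, bounds, n_init=6, n_iter=20, n_cand=200, rng=None):
    """Minimize expensive f with one true evaluation per iteration;
    the cheap surrogate screens the candidate points."""
    rng = np.random.default_rng(rng)
    lo, hi = np.array(bounds, dtype=float).T
    X = rng.uniform(lo, hi, size=(n_init, len(lo)))      # initial design
    y = np.array([f(x) for x in X])
    for _ in range(n_iter):
        s = rbf_fit(X, y)                                # cheap surrogate of f
        best = X[np.argmin(y)]
        # candidates: random perturbations of the current best point
        cand = np.clip(best + 0.1 * (hi - lo) *
                       rng.standard_normal((n_cand, len(lo))), lo, hi)
        x_new = cand[np.argmin([s(c) for c in cand])]    # surrogate picks one
        X = np.vstack([X, x_new])
        y = np.append(y, f(x_new))                       # one expensive call
    return X[np.argmin(y)], float(y.min())
```

The key property is the budget: with `n_init + n_iter` true evaluations (26 here), the expensive model is called far fewer times than a heuristic method needing thousands, while the surrogate absorbs the cheap exploratory work.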
 This paper focuses on estimating CO2 plume evolution based on limited monitoring data with a process-based model, which is the same numerical code that would be used for forward simulation. This involves using numerical simulations with optimization to solve the inverse problem, i.e., to characterize the unknown geology. The calibrated model is then simulated forward in time with the appropriate parameters to predict the movement of the CO2 plume. We focus on optimization methods that are computationally efficient for process-based simulation models. Model calibrations for geological carbon sequestration models have been carried out by Bickle et al., but the model was analytical (i.e., not a numerical simulation) and specific to the studied formation. Our optimization algorithm makes the estimation process fully automatic and more efficient in terms of both human time and computer time. The automatic calibration process can be repeated many times as new monitoring data become available.
 In section 2, we pose the forward problem formulation, which includes describing the numerical simulator, the geological model, and its implementation in a numerical model. In section 3, we explain the inverse methodology to tackle the problem, which includes the choice of observation data and formulation of the objective function, the choice of parameters and optimization algorithm, and finally, the setup of the calibration problem and method to measure the goodness of the results. In section 4, we present our results. Section 5 provides further discussion of the results and how they may be applied to other sites.