## 1. Introduction

[2] Computer simulation models, which simulate abstract representations of physically based systems using mathematical concepts and language, play a key role in engineering tasks and decision-making processes. Computer simulation models (hereafter simply *simulation models*) are used in many types of problems, including prediction, optimization, operational management, design space exploration, sensitivity analysis, and uncertainty analysis. Other problems, such as model calibration and model parameter sensitivity analysis, deal with simulation models in order to enhance their *fidelity* to the real-world system. Fidelity in the modeling context refers to the degree of realism of a simulation model. Modern simulation models tend to be computationally intensive because they rigorously represent detailed scientific knowledge about real-world systems [*Keating et al.*, 2010; *Mugunthan et al.*, 2005; *Zhang et al.*, 2009]. Many model-based engineering analyses require running these simulation models thousands of times and, as such, demand prohibitively large computational budgets.

[3] *Surrogate modeling*, which is a second level of abstraction, is concerned with developing and utilizing cheaper-to-run “*surrogates*” of the “*original*” simulation models. Throughout this paper, the terms “original functions,” “original simulation models,” and “simulation models” are used interchangeably. A wide variety of surrogate models have been developed to be applied intelligently in lieu of simulation models. There are two broad families under the umbrella of surrogate modeling: response surface modeling and lower-fidelity modeling. Response surface surrogates employ data-driven function approximation techniques to empirically approximate the model response. Response surface surrogates may also be referred to as “*metamodels*” [*Blanning*, 1975; *Kleijnen*, 2009] because a response surface surrogate is a “*model of a model*.” “*Model emulation*” is another term for response surface surrogate modeling [*O'Hagan*, 2006]. The term “*proxy models*” has also been used in the literature to refer to response surface surrogates [*Bieker et al.*, 2007]. Unlike response surface surrogates, *lower-fidelity surrogates* are physically based simulation models, but they are less detailed than the original simulation models, which are typically deemed high-fidelity models; they are simplified simulation models that preserve the main body of processes modeled in the original simulation model [*Forrester et al.*, 2007; *Kennedy and O'Hagan*, 2000].

[4] In surrogate modeling practice (with either response surface surrogates or lower-fidelity physically based surrogates), the goal is to approximate the response(s) of an original simulation model, which is typically computationally intensive, for various values of the explanatory variables of interest. The surface representing the model response with respect to the variables of interest (typically a nonlinear hypersurface) is called the “response surface” or “response landscape” throughout this paper. For the majority of response surface surrogate modeling techniques, a separate response surface must be fit to each model response of interest (or to each function aggregating multiple model responses). Neural networks are one exception, as they are capable of fitting multiple model responses. In contrast, since lower-fidelity surrogates retain some physically based characteristics of the original model, a single lower-fidelity surrogate model is typically capable of approximating multiple model responses of interest.
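
To make the idea of a data-driven response surface concrete, the sketch below fits one interpolating surface to a single model response. It is a minimal illustration, not a method taken from the reviewed literature: it assumes a Gaussian radial basis function (RBF) with a fixed shape parameter `eps`, uses a hypothetical `expensive_model` as a stand-in for a simulation model, and solves the small linear system by hand so that it needs only the Python standard library. All function names are hypothetical. Practical RBF surrogates often add a low-order polynomial tail and tune the kernel, and a real implementation would use a linear-algebra library.

```python
import math


def gaussian_rbf(r, eps):
    """Gaussian radial basis function phi(r) = exp(-(eps * r)**2)."""
    return math.exp(-((eps * r) ** 2))


def distance(p, q):
    """Euclidean distance between two points given as sequences of floats."""
    return math.sqrt(sum((pi - qi) ** 2 for pi, qi in zip(p, q)))


def solve_linear(a, b):
    """Solve a @ x = b by Gaussian elimination with partial pivoting (small systems only)."""
    n = len(b)
    m = [list(row) + [b[i]] for i, row in enumerate(a)]  # augmented matrix
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):  # back substitution
        s = sum(m[r][c] * x[c] for c in range(r + 1, n))
        x[r] = (m[r][n] - s) / m[r][r]
    return x


def fit_rbf_surrogate(sites, responses, eps=2.0):
    """Fit an interpolating Gaussian-RBF response surface to design sites.

    Returns a cheap callable s(x) that reproduces the response at every
    design site (up to round-off).
    """
    kernel_matrix = [[gaussian_rbf(distance(p, q), eps) for q in sites] for p in sites]
    weights = solve_linear(kernel_matrix, responses)

    def surrogate(x):
        return sum(w * gaussian_rbf(distance(x, s), eps) for w, s in zip(weights, sites))

    return surrogate


def expensive_model(x):
    """Hypothetical stand-in for a computationally intensive simulation model."""
    return math.sin(3.0 * x[0]) + x[1] ** 2


sites = [(0.0, 0.0), (1.0, 0.0), (0.0, 1.0), (1.0, 1.0), (0.5, 0.5)]
responses = [expensive_model(s) for s in sites]  # the only "expensive" runs
surrogate = fit_rbf_surrogate(sites, responses)
print(surrogate((0.25, 0.75)), expensive_model((0.25, 0.75)))
```

With only five design sites the surrogate matches the original exactly at those sites, but its accuracy elsewhere depends on the number of sites and on `eps`; a second model response of interest would need its own call to `fit_rbf_surrogate`, as noted above for most response surface techniques.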

[5] The main motivation for developing surrogate modeling strategies is to make better use of the available, typically limited, computational budget. *Simpson et al.* [2008] report that the common theme in six highly cited metamodeling (or design and analysis of computer experiments) review papers is indeed the high cost of computer simulations. Global optimization algorithms based on response surface surrogates, such as EGO [*Jones et al.*, 1998], GMSRBF and MLMSRBF [*Regis and Shoemaker*, 2007a], and Gutmann's method [*Gutmann*, 2001], as well as uncertainty analysis algorithms such as ACUARS [*Mugunthan and Shoemaker*, 2006] and RBF-enabled MCMC [*Bliznyuk et al.*, 2008], have all been developed to circumvent the computational budget limitations associated with computationally intensive simulation models. In this regard, surrogate modeling may be beneficial only when the simulation model is computationally intensive enough to justify the expense of moving to a second level of abstraction (reduced model fidelity), which typically reduces the accuracy of analyses. Although *Jones* [2001] and *Simpson et al.* [2008] both point out that surrogate modeling is more than simply reducing computation time, reviewing other possible motivations for surrogate modeling is beyond the scope of this paper.
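
The budget argument can be illustrated with simple arithmetic. The sketch below compares the wall-clock cost of an analysis run entirely on the original model with one that first spends some original runs on design sites. It assumes a static surrogate (no further original runs during the analysis), a one-off fitting cost, and purely hypothetical timings; it deliberately ignores the loss of accuracy discussed above, which is the other side of the trade-off.

```python
def compare_budgets(n_analysis, t_original, n_design, t_surrogate, t_fit):
    """Return (cost with original model only, cost with a static surrogate).

    Assumes n_design original runs to build the surrogate, a one-off fitting
    cost t_fit, and then all n_analysis evaluations on the surrogate.
    Times are in seconds (per evaluation, except t_fit).
    """
    cost_original_only = n_analysis * t_original
    cost_with_surrogate = n_design * t_original + t_fit + n_analysis * t_surrogate
    return cost_original_only, cost_with_surrogate


# Hypothetical numbers: a 10,000-run analysis, 200 design sites, 30 s to fit.
slow = compare_budgets(10_000, t_original=60.0, n_design=200, t_surrogate=1e-4, t_fit=30.0)
fast = compare_budgets(10_000, t_original=1e-3, n_design=200, t_surrogate=1e-4, t_fit=30.0)
print(slow)  # surrogate route is far cheaper for a 60 s model
print(fast)  # surrogate route costs more for a 1 ms model
```

For the cheap model, the fixed cost of building the surrogate outweighs the savings, which is why surrogate modeling pays off only for computationally intensive models.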

[6] Many iterative water resources modeling analyses potentially stand to benefit from surrogate modeling. The benefits are only potential because any surrogate-enabled modeling analysis provides an approximation to the analysis with the original model, and the error of the analysis result is difficult, if not impossible, to assess without repeating the exact analysis with the original simulation model. For example, there is no guarantee that a model parameter deemed insensitive on the basis of a surrogate modeling analysis is truly insensitive in the original simulation model. An incomplete list of classic or popular iterative modeling analyses in water resources that are candidates for efficiency enhancement with surrogate modeling includes deterministic model parameter optimization (calibration) studies with evolutionary algorithms [e.g., *Duan et al.*, 1992; *Wang*, 1991], uncertainty-based or Bayesian model calibration studies [e.g., *Beven and Freer*, 2001; *Kavetski et al.*, 2006; *Vrugt et al.*, 2009], management or design optimization with evolutionary algorithms [e.g., *McKinney and Lin*, 1994; *Savic and Walters*, 1997], multiobjective optimization algorithms [e.g., *Cieniawski et al.*, 1995; *Reed et al.*, 2003], global sensitivity analysis methods [e.g., *Hornberger and Spear*, 1981; *Saltelli et al.*, 2000], and any traditional Monte Carlo–based reliability or uncertainty analysis [e.g., *Melching et al.*, 1990; *Skaggs and Barry*, 1997].

[7] This paper aims to review, analyze, and classify the research on surrogate modeling, with an emphasis on surrogate modeling efforts arising from the water resources modeling field. *Simpson et al.* [2001] and *Wang and Shan* [2007] also review the literature on response surface surrogates for engineering design optimization problems. *Simpson et al.* [2004] summarize a discussion panel on response surface surrogate modeling held at the 9th AIAA/ISSMO Symposium on Multidisciplinary Analysis and Optimization. *Simpson et al.* [2008] review the literature on response surface modeling and its motivations from a historical perspective and also emphasize the appeal of lower-fidelity physically based surrogate modeling. *Forrester and Keane* [2009] review recent advances in surrogate modeling, including advances in lower-fidelity physically based surrogates, in the field of optimization. Special journal issues on surrogate modeling summarize the first and second International Workshops on Surrogate Modeling and Space Mapping for Engineering Optimization (see *Bandler and Madsen* [2001] and *Bandler et al.* [2008]). Another special issue on surrogate modeling is a recent thematic journal issue on surrogate modeling for the reduction and sensitivity analysis of complex environmental models (see *Ratto et al.* [2012]). In addition, several review papers focus on specific tools/strategies involved in surrogate modeling. *Kleijnen* [2009] reviews kriging and its applications for response surface surrogate modeling. *Jin et al.* [2001] and *Chen et al.* [2006] review and compare multiple function approximation models acting as response surface surrogates. *Jin* [2005] focuses on response surface surrogate modeling when used with evolutionary optimization algorithms.

[8] Surrogate modeling has become increasingly popular over the last decade within the water resources community, which is consistent with the increasing use of metamodels in the scientific literature since 1990 as documented by *Viana and Haftka* [2008]. A mid-2011 research database search for formal surrogate modeling terminology (Thomson Reuters (ISI) Web of Knowledge) in 50 journals related to surface water and groundwater hydrology, hydraulics, environmental science and engineering, and water resources planning and management returned 110 articles on surrogate modeling in water resources. We believe that the actual number of articles on surrogate modeling in water resources is higher, as some articles do not use the formal terminology of surrogate modeling and/or are not published in water resources related journals. Forty-eight of the available surrogate modeling references published until mid-2011 dealing with water or environmental resources problems were selected for detailed review based on their relevance, our judgment of their quality and clarity of reporting, and the significance of surrogate modeling to the contribution of the publication. The phrase “water resources literature” used throughout this paper refers to the 50 water resources journals and these 48 surrogate modeling references.

### 1.1. Goals and Outline of Review

[9] Our primary objective (1) is to provide water resources modelers considering surrogate modeling with a more complete description of the various surrogate modeling techniques found in the water resources literature, along with some guidance on the subjective decisions required when utilizing surrogate models. The depth of review of the topics covered here generally varies with the popularity of each topic in the water resources literature; as such, the discussion largely revolves around optimization applications. Additional, more specific objectives are as follows: (2) describe each of the components involved in surrogate modeling practice as depicted in Figure 1; (3) provide a categorization of the different surrogate-enabled analysis frameworks (i.e., the different ways the components in Figure 1 can interact); (4) relate existing surrogate modeling efforts in the water resources literature to similar efforts in the broader research community; and (5) identify relevant underutilized ideas for consideration in future water resources studies.

[10] Figure 1 presents a diagram showing all the components involved in the surrogate modeling analysis framework and the sections of the paper that are directly related to each component. Conventional frameworks not involving surrogate models, such as the various simulation-optimization frameworks, consist only of the original model and the search or sampling algorithm components, linked directly together. In surrogate-enabled frameworks, however, three new components may also be involved: design of experiments, response surface surrogate, and/or lower-fidelity surrogate. These three components, and the framework through which all the components interact, are of particular interest in this paper. Such frameworks generally begin with a design of experiments to generate a sample with which to train or fit a response surface or lower-fidelity surrogate model; the sampler/search algorithm then repeatedly runs the original computationally expensive model and/or the surrogate and collects their responses. During this metamodel-enabled analysis, the surrogate model can be static or dynamically updated. Any original model evaluation that is utilized to fit the surrogate model is referred to as a design site. Section 2 details the elements associated with response surface surrogates and presents their advances, considerations, and limitations. Section 3 presents the motivation and the different types and frameworks for lower-fidelity surrogate modeling. Section 4 discusses how the performance of a surrogate-enabled framework should be evaluated and benchmarked against other alternatives. The paper ends with a summary and concluding remarks in section 5.
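
The design of experiments step that starts such a framework can be sketched briefly. The example below uses basic Latin hypercube sampling, one common space-filling design, chosen here only for illustration; the function name and bounds are hypothetical, and each returned point would become a design site once the original model has been run at it.

```python
import random


def latin_hypercube(n_samples, bounds, seed=None):
    """Draw n_samples points from a box using basic Latin hypercube sampling.

    Each variable's range is split into n_samples equal-width strata; each
    stratum is sampled exactly once, and strata are paired randomly across
    variables.
    """
    rng = random.Random(seed)
    columns = []
    for low, high in bounds:
        width = (high - low) / n_samples
        # One uniform draw inside each stratum of this variable.
        values = [low + (k + rng.random()) * width for k in range(n_samples)]
        rng.shuffle(values)  # random pairing of strata across variables
        columns.append(values)
    # Transpose the per-variable columns into one tuple per design point.
    return [tuple(col[i] for col in columns) for i in range(n_samples)]


# Ten candidate design sites for two hypothetical explanatory variables.
design = latin_hypercube(10, [(0.0, 1.0), (-5.0, 5.0)], seed=42)
for point in design:
    print(point)
```

Compared with independent uniform draws, stratifying each variable ensures that the full range of every variable is covered, even when the number of affordable original model runs is small.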

### 1.2. Case Study or Problem Characteristics Influencing Surrogate Model Design

[11] The most critical problem characteristics that should influence surrogate model/technique selection are as follows:

[12] 1. Model analysis type to be augmented by the surrogate model: search or sampling. For the remainder of this paper, search analysis refers to optimization (management, calibration, single-objective or multiobjective) or uncertainty-based/Bayesian model calibration procedures, while all other modeling analyses are referred to as sampling analyses.

[13] 2. Computational budget constraints. This refers to how many original model evaluations can be utilized to build the surrogate model and ultimately perform the model analysis of interest. In applications where a surrogate model is to be utilized repeatedly after it is initially constructed (e.g., to optimize real-time operational decisions), the time available for each utilization can be critical.

[14] 3. Dimensionality of the problem. In general, as the number of explanatory variables increases, surrogate modeling becomes less advantageous and even infeasible.

[15] 4. Single-output versus multioutput surrogates. This is a key distinction in the context of environmental simulation modeling, where model outputs typically vary in both time and space. Single-output surrogates are common where the original model output of interest is a function calculated from a large number of model outputs (e.g., a calibration error metric).

[16] 5. Exact emulation versus inexact emulation. In other words, should the surrogate model exactly predict the original model result at all design sites?

[17] 6. Availability of original model developers/experts. Some surrogate modeling techniques (e.g., lower-fidelity modeling) require these experts, and *Gorissen* [2007] notes that such experts can provide valuable insight into the significance of surrogate modeling errors relative to original model errors.
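
Items 3 to 5 above can be made concrete with a few standard calculations. The sketch below uses hypothetical function names and example choices that are not prescribed by this review: a full quadratic polynomial and a three-level full-factorial grid show how sample requirements grow with dimensionality, a sum-of-squared-errors metric shows how many model outputs collapse into the single output a single-output surrogate would emulate, and a least-squares straight line versus piecewise-linear interpolation contrasts inexact and exact emulation at the same design sites.

```python
from math import comb


# Item 3: how the sample needed to fit or fill the space grows with dimension d.
def quadratic_terms(d):
    """Coefficients in a full second-order polynomial in d variables:
    1 intercept + d linear + d squared + d*(d-1)/2 two-way interaction terms."""
    return comb(d + 2, 2)


def full_factorial_runs(d, levels=3):
    """Original-model runs for a full-factorial grid with `levels` values per variable."""
    return levels ** d


# Item 4: collapse a multioutput result (e.g., a simulated time series) into
# one scalar that a single-output surrogate could emulate.
def sum_of_squared_errors(simulated, observed):
    if len(simulated) != len(observed):
        raise ValueError("simulated and observed series must have the same length")
    return sum((s - o) ** 2 for s, o in zip(simulated, observed))


# Item 5: inexact emulation (least-squares line) versus exact emulation
# (piecewise-linear interpolation) of the same 1-D design sites.
def fit_line_least_squares(xs, ys):
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxx = sum((x - mx) ** 2 for x in xs)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    slope = sxy / sxx
    intercept = my - slope * mx
    return lambda x: intercept + slope * x


def fit_piecewise_linear(xs, ys):
    pts = sorted(zip(xs, ys))

    def interpolant(x):
        if x <= pts[0][0]:
            return pts[0][1]  # constant extrapolation below the first site
        for (x0, y0), (x1, y1) in zip(pts, pts[1:]):
            if x <= x1:
                return y0 + (y1 - y0) * (x - x0) / (x1 - x0)
        return pts[-1][1]  # constant extrapolation above the last site

    return interpolant


if __name__ == "__main__":
    for d in (2, 10, 20):
        print(d, quadratic_terms(d), full_factorial_runs(d))
    print(sum_of_squared_errors([1.5, 2.0, 2.5, 4.0], [1.0, 2.0, 3.0, 4.0]))
    xs, ys = [0.0, 1.0, 2.0, 3.0], [0.0, 1.0, 4.0, 9.0]
    inexact, exact = fit_line_least_squares(xs, ys), fit_piecewise_linear(xs, ys)
    print([inexact(x) - y for x, y in zip(xs, ys)])  # nonzero residuals at design sites
    print([exact(x) - y for x, y in zip(xs, ys)])    # zero residuals at design sites
```

The quadratic needs only tens to hundreds of coefficients as d grows, whereas the grid grows exponentially; both illustrate why surrogate modeling becomes harder in high dimensions. The interpolant honors every design site exactly, while the regression line smooths through them.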

[18] Although not an original problem characteristic, the availability of surrogate modeling software and experts also has an impact on surrogate model design. The aforementioned surrogate modeling reviews, and the literature in general, do not precisely map all problem characteristics to specific or appropriate types of surrogate models. We also do not attempt this; instead, we only make periodic observations and judgments as to when certain types of surrogate modeling techniques might be more or less appropriate than others. Properly considering the case-study-specific factors above is the first key to avoiding the implementation of a poor surrogate modeling technique.