## Introduction

Societal decision-making on water issues—both quantity and quality—requires science-based tools such as computer models. Numerical models are vital for informing such decisions because the models can be used to investigate a range of actions using a quantitative and physically based framework, which in turn facilitates reactive as well as proactive action. However, groundwater models can have long runtimes; large, transient groundwater flow, contaminant transport, and coupled groundwater-surface water models can take more than a day to complete a single run. Such long runtimes are an impediment to investigating alternative conceptual models and alternative management scenarios, especially when performing model calibration and sensitivity analysis, where the model is run many times.

At the November 2009 PEST (Parameter Estimation) Conference in Potomac, Maryland, the recent development of “cloud computing” was discussed as a means to bring unprecedented computing power to bear on groundwater problems (Luchette et al. 2009; Schreuder 2009). Cloud computing has been widely covered in the recent popular press (e.g., http://www.newsweek.com/id/140864), and in its simplest form includes Internet-accessible e-mail. However, cloud computing also includes other capabilities, including allowing customers to create multiprocessor configurations, or “supercomputers,” by renting virtual computers over the Internet. Thus, cloud computing allows the modeler to access the number of machines that best suit the modeling problem rather than restricting that number to those machines available locally. In a common parallel computing application, multiple processors are used to reduce runtimes by parallelizing a single model run. For example, Arnett and Greenwade (2000) reported an almost fivefold reduction in runtime when a contaminant transport model run on a single processor was split among 10 parallel processors.
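The tradeoff between added processors and communication overhead can be illustrated with Amdahl's law, which relates speedup to the fraction of a run that can be parallelized. The following sketch is illustrative only; the parallel fraction used below is a hypothetical value chosen to echo the roughly fivefold speedup on 10 processors reported by Arnett and Greenwade (2000), not a figure from that study.

```python
def amdahl_speedup(parallel_fraction: float, n_processors: int) -> float:
    """Ideal speedup predicted by Amdahl's law for a run whose
    parallelizable fraction is `parallel_fraction` when spread
    across `n_processors` (communication overhead ignored)."""
    serial = 1.0 - parallel_fraction
    return 1.0 / (serial + parallel_fraction / n_processors)

# A run that is ~89% parallelizable yields close to a fivefold
# speedup on 10 processors; the serial remainder caps the gain
# no matter how many processors are added.
speedup = amdahl_speedup(0.89, 10)
```

Note that as `n_processors` grows, the speedup approaches `1 / (1 - parallel_fraction)`, which is why adding processors to a model with substantial serial or communication costs eventually yields diminishing returns.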

Many groundwater models are not well suited for parallel computing, however, because communication overhead between processors can offset the gain of adding processors. Therefore, such speedups cannot be universally expected even with additional computing capability provided by cloud computing. However, all models can benefit from another parallel computing application—automated calibration and uncertainty analysis using parameter estimation techniques. During the parameter estimation process, each user-specified model parameter is adjusted and the model outputs are compared with corresponding field measurements. The effect of each parameter change on the model-generated counterparts to field observations is used to develop an updated estimate of the optimal parameters. Hence, a large number of runs must be performed to compute the updated parameter set that improves a model's fit to the data. Fortunately, parameter estimation has properties of an “embarrassingly parallel” problem (Foster 1995), thus making it well suited for parallel (and thus cloud) computing. Three aspects make this the case: (1) the runs are completely independent of each other (no interaction between runs); (2) all the runs can be decided before any run is launched; and (3) the runs are idempotent, that is, doing the same run more than once has no side effects. Given this universal application to groundwater models, indeed to all environmental models, the remainder of the discussion will focus on the application of cloud computing to parameter estimation problems.
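The three properties above can be made concrete with a minimal sketch of a finite-difference sensitivity analysis, the run-intensive core of parameter estimation. The model function and perturbation size here are hypothetical placeholders; in practice each run would invoke a groundwater model executable, and the workers could be cloud instances rather than local threads.

```python
# Sketch of why parameter estimation is "embarrassingly parallel":
# every perturbed-parameter run is independent of the others, all
# runs are known before any is launched, and repeating a run has
# no side effects, so the runs can be farmed out to any number of
# workers.
from concurrent.futures import ThreadPoolExecutor

def run_model(params):
    """Stand-in for one groundwater model run (hypothetical);
    returns a simulated output for one parameter set."""
    return sum(p * p for p in params)

base = [1.0, 2.0, 3.0]

# Decide every run up front: the base run plus one run per
# parameter, each with a single parameter perturbed.
runs = [base] + [
    [p + (0.01 if i == j else 0.0) for j, p in enumerate(base)]
    for i in range(len(base))
]

# Launch all runs in parallel; ordering of results matches `runs`.
with ThreadPoolExecutor() as pool:
    outputs = list(pool.map(run_model, runs))

# Finite-difference sensitivities follow from (perturbed - base) / 0.01.
sensitivities = [(out - outputs[0]) / 0.01 for out in outputs[1:]]
```

Because no run depends on another's result, the wall-clock time for the whole sensitivity analysis approaches that of a single model run when enough workers are available.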

Parameter estimation, like all groundwater modeling, confronts a common problem—the natural world always has more complexity than can be included in any model parameter set. To the extent that processes and characteristics are so simplified that observed system behavior is not completely replicated in a model, so-called “structural noise” (e.g., Doherty and Welter 2010) degrades model outputs that correspond to these observations (as well as predictions that the model is required to make). The more salient information that is omitted, the more *over*-simplified the model and the larger its structural error. These deficiencies in model behavior can be addressed to some degree through use of appropriate complexity, attained by using higher numbers of parameters than have been traditionally included. This increased model flexibility can help reduce structural noise by allowing model parameterization to be more receptive to information contained in calibration data, which in turn can reduce the potential for error in model predictions. Moreover, new complementary methods have taken advantage of insight gained from highly parameterized models to better estimate the potential for prediction error (e.g., Moore and Doherty 2005).

Why not run all models with hundreds or thousands of parameters as a means to maximize model flexibility and keep the structural error associated with omitted detail small? Estimation of parameters for overly complex models can be unstable and nonunique, though these problems can be overcome with a “regularized inversion” approach where large numbers of parameters are constrained using mathematical methods and soft knowledge of the system (Hunt et al. 2007). Carrying many parameters through model calibration, however, still exacts a high computational cost; most parameter estimation methods require at least one model run per parameter during each iteration. Even mathematical enhancements such as the use of “super parameters” (linear combinations of base parameters; Tonkin and Doherty 2005) require an initial sensitivity analysis in which the sensitivity of all model outputs to each parameter must be calculated before defining the more limited number of super parameters. Although such enhanced methods are now routinely used in everyday modeling practice, the upper limit on the number of parameters is often still chosen based on the number of computers available, and not on what is best suited for calibration, or for analyzing the uncertainty of a prediction of interest. Thus, these artificial and arbitrary constraints on model parameter estimation and uncertainty analysis could limit our ability to bring the best science to water resources decision-making. Cloud computing is a powerful new tool that can overcome this restriction (Luchette et al. 2009).
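The idea of super parameters can be sketched as follows. This is an illustration of the general technique, not PEST's exact algorithm: linear combinations of base parameters are taken from a truncated singular value decomposition (SVD) of the sensitivity (Jacobian) matrix, so that subsequent iterations adjust only a few combinations that the calibration data can actually resolve. The matrix dimensions and the random Jacobian below are hypothetical.

```python
import numpy as np

rng = np.random.default_rng(0)
n_obs, n_par = 50, 200  # far more parameters than the data can resolve
J = rng.standard_normal((n_obs, n_par))  # hypothetical sensitivity matrix

# The right singular vectors define orthogonal directions in
# parameter space, ordered by how strongly the observations
# "see" them (largest singular values first).
U, s, Vt = np.linalg.svd(J, full_matrices=False)

k = 10                        # number of super parameters to carry
super_directions = Vt[:k]     # each row: one combination of base parameters

# Estimation now proceeds in the k-dimensional super-parameter
# space; a super-parameter vector y maps back to the full base
# parameter set as a weighted sum of the retained directions.
y = np.zeros(k)
base_params = super_directions.T @ y
```

The computational payoff is that each subsequent iteration needs on the order of `k` model runs rather than `n_par`, at the cost of the initial full sensitivity analysis needed to form `J`.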