## 1. Introduction

[2] Almost all watershed simulation models contain effective physical and/or conceptual model parameters that are difficult or impossible to measure directly. Applying these models therefore requires that model parameters be adjusted so that model predictions closely replicate the observed environmental system response data. This process of conditioning model parameters to historical system response data is called calibration. The traditional approach has been to calibrate the model manually by trial and error. While manual calibration is useful as a learning exercise for modelers, it can be extremely labor intensive and difficult to implement in complex situations where models are calibrated to long time series of measured system response data covering multiple constituents at multiple locations.

[3] Watershed modelers have long recognized that optimization algorithms can be used to automate the calibration process. Automatic calibration is defined here as an optimization-algorithm-based search for a set of watershed model parameter values that minimizes the model prediction errors relative to available measured data for the system being modeled. This study focuses on the automatic calibration of watershed simulation models; the results, however, are also relevant to other environmental simulation models requiring calibration. *Gupta et al.* [1998] and *Singh and Woolhiser* [2002] note that an automatic calibration methodology has several important components: (1) the selection of appropriate calibration data, (2) the definition of the objective function that measures the error between model predictions and the calibration data, and (3) the optimization algorithm used to optimize the selected objective function. This study investigates optimization algorithms for automatic calibration and, in particular, introduces a new and efficient algorithm called the dynamically dimensioned search (DDS).

[4] Early automatic calibration studies utilized local optimization techniques that find locally optimal solutions close to the initial solution [*Ibbitt*, 1970; *Nash and Sutcliffe*, 1970; *Sorooshian and Gupta*, 1983]. Examples include derivative-based (e.g., quasi-Newton) algorithms and derivative-free algorithms such as the Nelder-Mead Simplex method [*Nelder and Mead*, 1965]. The problem with these methods is that they may find only a local optimum and never approach the global optimum. Given the inherent complexity of watershed models, recent studies have utilized more advanced global search methods. *Duan* [2003] provides a good review of optimization algorithms for watershed model calibration; his list of global optimization algorithms applied to watershed model calibration includes adaptive random sampling [*Masri et al.*, 1980], controlled random search [*Price*, 1978], the multistart Simplex method, genetic algorithms [*Wang*, 1991], simulated annealing [*Thyer et al.*, 1999], and the shuffled complex evolution (SCE) algorithm [*Duan et al.*, 1992, 1993]. SCE has been the dominant optimization algorithm in the watershed model automatic calibration literature over the past 10 years, with more than 300 publications referencing the original set of SCE publications [*Duan et al.*, 1992, 1993, 1994]. Therefore our new DDS algorithm is tested extensively against SCE.

[5] The introduction of SCE for automatic calibration of watershed models was a great advancement that has enabled a substantial number of modelers to solve difficult calibration problems. A review of the algorithm performance comparisons in the watershed modeling literature shows that the SCE algorithm was judged to outperform each of the other global optimization algorithms listed above in at least one study (and often multiple studies). However, most of these SCE comparisons involved computationally efficient lumped parameter conceptual watershed models with simulation times often on the order of a few seconds or less. As a result, most previous SCE comparisons utilized very large numbers of total allowable model evaluations per optimization trial. For example, in three studies calibrating 11–13 model parameters, SCE results were generated using 11,000 to 23,000 model evaluations [*Duan et al.*, 1994; *Gan and Biftu*, 1996; *Sorooshian et al.*, 1993]. In more complex model calibration examples, *Tanakamaru and Burges* [1996] used 39,000 to 49,000 model evaluations for SCE in a 16-parameter problem, while *Franchini et al.* [1998] used 250,000 model evaluations in a 37-parameter problem. Consider that the Soil and Water Assessment Tool version 2000 (SWAT2000) distributed watershed model calibration case study utilized here (see section 2.4) requires at least 2 minutes to execute a single, 9-year, daily time step simulation on a Pentium IV 3-GHz processor. One SCE optimization run in this situation would therefore require about 14 days of computation time for 10,000 SWAT model evaluations and about 4.6 months for 100,000 model evaluations. With such extreme computational burdens in mind, this study focuses on evaluating optimization algorithm performance under rather limited computational budgets (1000 to 10,000 model evaluations).
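The runtime figures above follow directly from the assumed 2-minute per-evaluation cost; a minimal back-of-the-envelope sketch (the simulation time is the only input, taken from the SWAT2000 case study, and the function name is ours) reproduces them:

```python
SIM_MINUTES = 2.0  # assumed runtime of one 9-year daily SWAT2000 simulation

def calibration_days(n_evaluations: int, sim_minutes: float = SIM_MINUTES) -> float:
    """Total optimization wall-clock time in days, ignoring algorithm overhead."""
    return n_evaluations * sim_minutes / (60 * 24)

for n in (1_000, 10_000, 100_000):
    print(f"{n:>7} evaluations -> {calibration_days(n):6.1f} days")
# 10,000 evaluations come to roughly 14 days; 100,000 to roughly 4.6 months.
```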

[6] One approach to this SCE efficiency issue is simply to run SCE for as long as the case study specific computational constraints allow (e.g., ∼1000 rather than 100,000 simulations). While this approach will produce results, and perhaps even a seemingly reasonable objective function value, SCE was not developed and tested against other algorithms from this perspective. Instead, SCE was developed so that optimal or near-optimal solutions are returned with high reliability upon algorithm convergence (typically after more than 10,000 model evaluations). The available SCE comparison literature almost exclusively presents algorithm performance comparisons in terms of effectiveness (solution quality) and the computational effort required to find the final best solutions at algorithm termination or convergence, but does not assess algorithm effectiveness prior to termination [see, e.g., *Duan et al.*, 1993; *Gan and Biftu*, 1996; *Franchini et al.*, 1998]. This comparison approach is entirely appropriate given that the hydrologic models being calibrated in these case studies were lumped parameter conceptual models with very short simulation times.

[7] When automatic calibration is applied to spatially distributed models, or more generally to any model that presents a significant computational burden, the comparison of two optimization algorithms must consider how solution quality changes with varying computational effort. This is because distributed modeling computational timescales can vary by many orders of magnitude depending on which model, spatial discretization level, and watershed size are selected for the modeling case study. *Singh and Woolhiser* [2002] report in their review of mathematical modeling of watershed hydrology that many current watershed hydrology models are spatially distributed. In fact, once one adopts a limited-model-evaluation perspective, the idea of achieving global optimality becomes unreasonable in most automatic calibration problems. As a result, the methods for comparing algorithm performance in this paper necessarily differ from those found in the great majority of previous SCE literature. In addition, we believe that improved automatic calibration optimization algorithms can be developed with such a perspective in mind and introduce the new DDS as one such algorithm, focused on identifying good calibration solutions when model evaluations are limited.

[8] The specific goals of this study are (1) to introduce the new DDS algorithm for watershed model calibration and (2) to present DDS and SCE comparative algorithm performance results in ways that are meaningful for modelers subject to a wide range of computational limitations. DDS requires essentially no parameter tuning, and its search strategy is scaled to the user-specified maximum number of objective function evaluations so that good solutions are returned across a range of computational limitations. Numerical results will show that DDS is robust and effective, outperforming the SCE algorithm on real SWAT2000 watershed simulation model calibration formulations of 14, 26, and 30 parameters when limited to 10,000 or fewer total model evaluations. This study limits DDS algorithm comparisons to the SCE algorithm because SCE is so frequently applied to hydrologic or watershed simulation model calibration.
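The DDS algorithm itself is specified in section 2.2. Purely to illustrate what scaling the search to a maximum evaluation budget can mean in practice, the sketch below perturbs a randomly chosen subset of parameters whose expected size shrinks as the budget is consumed; the inclusion-probability formula, the perturbation magnitude `sigma`, and the function names here are illustrative assumptions, not the paper's definition of DDS:

```python
import math
import random

def inclusion_probability(i: int, m: int) -> float:
    # Probability that any given parameter is perturbed at evaluation i
    # of a total budget of m evaluations; decays from 1 toward 0 as i -> m,
    # so the search narrows from global to local as the budget is used.
    return 1.0 - math.log(i) / math.log(m)

def perturb(x: list, i: int, m: int, sigma: float = 0.2) -> list:
    # Perturb a dynamically shrinking random subset of the parameters.
    p = inclusion_probability(i, m)
    dims = [j for j in range(len(x)) if random.random() < p]
    if not dims:
        dims = [random.randrange(len(x))]  # always move at least one parameter
    return [xj + random.gauss(0.0, sigma) if j in dims else xj
            for j, xj in enumerate(x)]
```

The key design idea this sketch conveys is that the budget `m` enters the search rule directly, which is why no separate convergence tuning is needed: early evaluations explore broadly, late evaluations refine a few parameters at a time.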

[9] The remainder of the paper is organized as follows. Section 2.1 highlights the benchmark optimization algorithms utilized in this study, and the DDS algorithm is described in detail in section 2.2. The optimization test problems and SWAT2000 automatic calibration case studies are introduced in sections 2.3 and 2.4, respectively. All algorithm comparison results are provided in section 3, while section 4 summarizes and highlights the significance of the results. Conclusions and future research directions are detailed in section 5.