## 1. Introduction

[2] In classic hydrological model development, the modeler seeks to address and reproduce the numerous dynamics occurring within the catchment. Model components may be built and organized according to the individual characteristics of the catchment, the modeling exercise, or the available data. Often model improvement is attempted by adding extra mechanisms to an existing model structure to describe the hydrological processes occurring in the catchment. However, the widespread use of more complicated models is often hindered by a lack of available data or by practitioners' inherent reluctance to adopt a model that is too complex given predominant data limitations. Hence, most modeling exercises search for a single good model, simple enough to apply to a range of exercises but with sufficient complexity for its predictions beyond the range of calibration data to be deemed “reliable.”

[3] A modeler's ability to include all the processes occurring in a catchment is also limited by the inherent errors and uncertainties in the modeling process. This problem is related to the limited information available in existing data. Recent advances in hydrological modeling have attempted to characterize the uncertainties in the modeling process, to better assess the risk associated with our model outputs. Approaches range from those reporting ensemble predictions to those reporting interval estimates or probability distributions. However, the existence of uncertainty in our calibration data and input data or boundary conditions, and the limits in our understanding of the hydrological system, mean that we are forced to conceptualize or approximate the description of the catchment and the processes occurring.

### 1.1. Utility of Multimodel Formulations

[4] Given all the uncertainties in the modeling process, it is well noted that a single model is unlikely to have a consistent level of accuracy at all times and for all events. As a result, hydrologic modelers may try to improve model predictions by combining the results of several models at once, taking, say, a weighted average of individual models to try to capture the benefits of these models. The advantages of combining predictions from several models are well documented in many disciplines, with extensive literature for economics and the social sciences [*Clemen*, 1989]. In hydrology, established methods of combining models might include a simple or weighted average of models' results, or a nonlinear weighting such as that offered by artificial neural networks [*Georgakakos et al.*, 2004; *Shamseldin et al.*, 1997; *Xiong et al.*, 2001]. For combining multiple models under uncertainty, Bayesian model averaging is an appealing approach [e.g., *Neuman*, 2003]. Multimodel ensembles are more commonly used in hydrological and climate forecasting, and can provide a forecast with greater skill than that from any single model [*Hagedorn et al.*, 2005]. Established approaches to combining multiple models in forecasting for hydrology and climatology range from a simple average of the individual models, through an average weighted according to each model's overall perceived skill [e.g., *Shamseldin et al.*, 1997], to a combination of models using multiple linear regression [*Doblas-Reyes et al.*, 2005].
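As a purely illustrative sketch of the simple and skill-weighted averaging schemes mentioned above, the following Python fragment combines the simulated flows of two hypothetical models. The function names, the toy data, and the choice of inverse mean squared error as the skill measure are assumptions made for this example only; they are not taken from any of the cited studies.

```python
def combine_simple(predictions):
    """Equal-weight average of all models at each time step."""
    n_models = len(predictions)
    return [sum(step) / n_models for step in zip(*predictions)]


def mean_squared_error(simulated, observed):
    """Mean squared error of one model over the calibration period."""
    return sum((s - o) ** 2 for s, o in zip(simulated, observed)) / len(observed)


def skill_weights(predictions, observed):
    """Weights proportional to inverse MSE, normalized to sum to one.

    Inverse MSE is an assumed skill measure; it is undefined for a
    model that reproduces the observations exactly (MSE of zero).
    """
    inverse_mse = [1.0 / mean_squared_error(p, observed) for p in predictions]
    total = sum(inverse_mse)
    return [v / total for v in inverse_mse]


def combine_weighted(predictions, weights):
    """Weighted average of all models at each time step."""
    return [sum(w * q for w, q in zip(weights, step)) for step in zip(*predictions)]


# Toy streamflow series (hypothetical values, arbitrary units).
observed = [1.0, 3.0, 2.0, 1.5]
model_a = [1.2, 2.6, 2.1, 1.4]
model_b = [0.5, 3.5, 2.8, 1.0]

weights = skill_weights([model_a, model_b], observed)
combined = combine_weighted([model_a, model_b], weights)
```

Note that the weights in such a scheme are fixed over the whole record, so the combination cannot favor one model during, say, storm events and another during recessions.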

[5] Despite the usefulness of these approaches, it cannot be denied that the processes occurring in a hydrological system are highly dynamic and constantly changing. Many studies have examined the use of “dynamic” model structures, or assessed the way in which parameters vary in time [see, e.g., *Wagener et al.*, 2003]. This idea could easily be extended to combining model structures. A method of model aggregation that can take into account the usefulness of different model structures or parameterizations under different hydrologic regimes is desirable.

### 1.2. Model Calibration and Validation: Specification of the Objective Function or Likelihood

[6] In identifying different model structures, modelers are also faced with issues in attempting parameter specification given available catchment information. One important area of research has been concerned with using likelihood-based methods for calibration, by making assumptions about the statistical distribution of the data via a probability density function. This is necessary in Bayesian approaches, where parameter and model uncertainty are described probabilistically. These classical statistical approaches are of use in model development and application, as they allow comparison of models that make different likelihood assumptions. This is often not possible with many other criteria used in hydrology. However, the choice of probability distribution to define the likelihood is often not thoroughly explored, and the model errors are often assumed to be normally, independently, and identically distributed. Alternatively, likelihoods have been used that assume heteroscedastic, correlated errors [*Bates and Campbell*, 2001; *Yapo et al.*, 1998], usually by transforming the data and fitting autoregressive models.
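For concreteness, the common assumption of normally, independently, and identically distributed errors corresponds to a log-likelihood of the following form. The notation is introduced here for illustration only and is an assumption of this sketch: $q_t$ denotes the observed flow at time step $t$, $\hat{q}_t(\theta)$ the flow simulated with parameter set $\theta$, $\sigma^2$ the constant error variance, and $n$ the number of observations.

$$
\ln L(\theta, \sigma^2 \mid q) = -\frac{n}{2}\ln\left(2\pi\sigma^2\right) - \frac{1}{2\sigma^2}\sum_{t=1}^{n}\left(q_t - \hat{q}_t(\theta)\right)^2
$$

For fixed $\sigma^2$, maximizing this expression over $\theta$ is equivalent to minimizing the sum of squared errors, which makes explicit the distributional assumption hidden in a least squares calibration.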

[7] Much research has also been based on using different performance criteria to assess different aspects of model fit. It is well recognized that multiobjective approaches allow focus on different important aspects of model predictions [*Gupta et al.*, 1998]. These approaches generally try to provide some trade-off in model fitting by assessing different aspects of the model's predictions (such as minimizing peak flow error as well as overall flow error estimated over the full hydrograph). An issue often ignored is the set of assumptions made about the distribution of the residuals or errors that result from the model fit. It is generally unwise to assume that the error distribution associated with a model calibrated using one objective is the same as that associated with a model calibrated using another.

[8] A possible solution may be to allow the error distribution to vary with time, or to depend on the nature of the runoff generation mechanism that dominates at a given time step. Several studies have noted the utility of calibrating models on different sections of the data [see, e.g., *Boyle et al.*, 2000]. What if we could specify different error models to represent different sections of the hydrograph? This could allow implementation of formalized likelihood approaches, while retaining the focus on different attributes of the hydrograph that the multiobjective approach aims for.

### 1.3. Research Motivation

[9] This traditional approach to model development and calibration shows an important limitation in how we specify hydrological models. How do we model what is inherently a dynamic system, when the majority of our modeling tools are deterministic and static? We generally assume a single static model to represent the dominant observable processes in a catchment. If it is true that the catchment is dynamic, should it not also be true that different assumptions will better simulate the data at different times? These issues can also be extended to our understanding of the model errors. Is it sensible to assume that the structure of the errors in our data does not change over the range of model responses? In this study, we propose a framework within which these issues can be addressed.