## 1. Introduction

[2] Hydrological models are conceptualizations and need to be assessed [*Gupta et al*., 1998] against observations of a variable of prediction interest. By prediction of a variable of interest, we mean the deterministic, simulated output of the model in response to the measured inputs alone (i.e., in the example of section 5, the precipitation and evapotranspiration). Models are estimated on a finite data sample; even when that sample is uncorrupted by measurement errors, its finiteness leads to uncertainty about which of the available candidate models best approximates the underlying processes. A model estimated on a finite sample may differ significantly from a model estimated on a sufficiently large sample due to sampling uncertainty, which in turn leads to uncertainty in predicting future events. However, even at large sample sizes, where prediction uncertainty due to sampling uncertainty vanishes, two entirely different model structures or conceptualizations may yield similar predictions. These issues are closely linked to ill-posedness in hydrological model selection. Issues of uniqueness and stability limit the possibility of well-posed model identification [*Gupta and Sorooshian*, 1983; *Vapnik*, 2002; *Renard et al*., 2010]. While the former results from the specification of the model's predictive equations leading to nonunique global optima, the latter is linked to a model's capacity to recreate data with little or no hydrological information, i.e., to the model's complexity relative to the amount of available data [*Vapnik*, 2002; *Pande et al*., 2009, 2012].

[3] A hydrological model selection problem is ill-posed (in Hadamard's sense) if the optimal solution of the selection problem does not exist, is not unique, or is not stable. By optimal model we mean the estimated model whose predictions of a variable of interest are closest, in some notion of closeness, to the observations. By stability, we here mean parametric stability and distinguish it from its use in dynamical systems: a solution is stable if small perturbations in the parameters of the (solution) model result in small perturbations in its predictions of a variable of interest. We note that the parameters can also represent combinations of various subcomponents of a model, so this definition of stability applies in the broader context of model structures. We posit a regularized hydrological model selection approach that restricts the set of solutions, where the regularization is with respect to the complexity of the problem, and can “correct” the ill-posedness of a model selection problem (in Tikhonov's sense) [*Vapnik*, 1982]. The regularization achieves this correction by restricting the set of solutions to a smaller set on which stability is ensured. Regularization methods are one of several ways to solve ill-posed problems, and hydrological model complexity is one possible basis for stabilization. However, the role of hydrological model complexity, as treated in this paper, is closely tied to prediction uncertainty due to sampling uncertainty. The paper therefore studies the issues of ill-posed hydrological model selection problems and hydrological prediction uncertainty through its treatment of model complexity.

[4] We define the empirical risk *ξ* (also called empirical error, finite sample prediction error, or finite sample model performance) as the mean absolute difference between observed values and model predictions of a variable of interest. For small sample sizes *N*, the empirical risk can differ significantly from its expected value, the expected empirical risk. The selection of a model that performs best in an expected sense on future unseen data depends on the expected empirical risk (i.e., the expected risk in validation) rather than on the empirical risk estimated on a single data realization of finite length. Therefore, we use the expected empirical risk to assess prediction error or model performance. Since an infinite number of realizations would be needed to calculate a mathematical expectation, the expected empirical risk cannot be calculated directly and has to be approximated.
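As a minimal numerical sketch of this point (the synthetic data, noise model, and all names below are ours for illustration, not the paper's), the following compares the empirical risk computed on small samples with its large-sample value:

```python
import numpy as np

rng = np.random.default_rng(0)

def empirical_risk(observed, predicted):
    """Empirical risk xi: mean absolute difference between observed
    values and model predictions of the variable of interest."""
    return np.mean(np.abs(observed - predicted))

# Synthetic setup: a fixed model prediction plus zero-mean observation noise.
prediction = np.full(10000, 2.0)  # constant model output (illustrative)
observed = prediction + rng.normal(0.0, 0.5, size=prediction.size)

# Large-sample empirical risk approximates the expected empirical risk;
# for Gaussian noise with sigma = 0.5, E|e| = sigma * sqrt(2/pi) ~ 0.399.
print(empirical_risk(observed, prediction))

# On small samples (N = 20), the empirical risk scatters around that value,
# illustrating the deviation of xi from its expectation at small N.
small_risks = [empirical_risk(observed[i:i + 20], prediction[i:i + 20])
               for i in range(0, 2000, 20)]
print(min(small_risks), max(small_risks))
```

The spread of the small-sample risks around the large-sample value is the deviation whose convergence with *N* the paper studies.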

[5] We express the expected empirical risk in terms of the empirical risk and an upper bound on the deviation of the empirical risk from its expected value. The size of this deviation depends on the convergence rate of the empirical risk to the expected empirical risk. Model complexity influences this rate. An upper bound for the expected empirical risk can be given by a sum of empirical risk and a function of complexity and sample size [*Pande et al*., 2012].
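In schematic form (our notation, not the paper's exact expressions), such a bound reads:

```latex
% Expected empirical risk bounded by the empirical risk plus a term that
% grows with model complexity h and shrinks with sample size N
% (schematic; the functional form of \lambda is left unspecified here):
\mathbb{E}[\xi] \;\le\; \xi + \lambda(h, N)
```

so that, for a fixed sample size, a more complex model incurs a larger penalty term on top of its empirical risk.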

[6] The related issues of prediction uncertainty that we thus address are associated with the predictability problems of the second or third kind of *Kumar* [2011], since the deviation of the empirical risk from the expected risk can be due to uncertain boundary conditions, inadequate model structure, or changes in the error of the observations against which the output is assessed. Novel techniques for efficient parameter uncertainty estimation, data assimilation, numerical integration, and multimodel ensemble prediction have been introduced to better describe or tame hydrological prediction uncertainty [e.g., *Vrugt et al*., 2009; *Moradkhani et al*., 2005; *Kavetski and Clark*, 2010; *Parrish et al*., 2012]. Bayesian approaches to hydrological model selection, prediction uncertainty, model complexity, and regularization have also been well studied [*Schwarz*, 1978; *Jakeman and Hornberger*, 1993; *Young et al*., 1996; *Cavanaugh and Neath*, 1999; *Ye et al*., 2008; *Gelman et al*., 2008]. The use of a prior distribution as a regularization term in log-likelihood maximization is similar in form to the regularization proposed in this paper [*Gelman et al*., 2008]. *Ye et al*. [2008] compared the AIC, BIC, and KIC measures and showed that the effective complexity measure (and thus the regularization based on it) in KIC, a finite (though asymptotically large) sample version of BIC [*Ye et al*., 2008], depends on the Hessian of the likelihood function at the optimum under certain regularity conditions [*Cavanaugh and Neath*, 1999; *Ye et al*., 2008], whereas in BIC it depends on model parameter dimensionality. The regularity conditions replace the need for full specification, i.e., the assumption that the observations are generated by a member of the model space specified by a likelihood function. These conditions exploit the second-order Taylor series expansion of a log-likelihood function, certain assumptions on the prior, and large sample size arguments to justify the use of KIC for model selection [*Cavanaugh and Neath*, 1999]. The use of KIC for model selection may, however, not be accurate for finite sample sizes. KIC is a good approximation to the posterior model probability (the integral of the likelihood function over the parameter space), with an error of *O*(*N*^{−1}) where *N* is the sample size, when the likelihood function is normally distributed [*Slate*, 1994; *Tierney and Kadane*, 1986] or when the log-likelihood function is highly peaked near its maximum even for small *N* [*Kass and Raftery*, 1995]. Such conditions rarely hold for likelihood functions when *N* is finite, in particular when it is small.

[7] *Jakeman and Hornberger* [1993] and *Young et al*. [1996] used complexity measures related to the information matrix. In particular, the seminal work of *Young et al*. [1996] on model complexity is quite different from the notion of complexity discussed in this paper. They identify a model of lower complexity than another model by identifying the “dominant modes” of the more complex model. The lower order model is identified on the basis of noise-free data simulated from the higher order, more complex model. The identification is based on the YIC measure, which refers to the inverse of the instrumental product matrix and is related to the information matrix. The lower order model explains the output of the more complex model almost exactly and without ambiguity.

[8] The treatment of prediction uncertainty here excludes numerical inadequacies in computing the states of a system under consideration [*Kavetski and Clark*, 2010]. Further, the aim is not to discuss hydrological model structure improvements, since we only analyze the convergence of the empirical risk of a hydrological model to its expected value (for a given hydrological variable of prediction interest) and its dependence on model complexity and the available number of observations. This analysis is in turn conditional on the set of candidate hydrological models, or on a given model structure, and elucidates the relationship between hydrological prediction uncertainty, data finiteness, and model (structure) complexity [*Ye et al*., 2008; *Clement*, 2011; *Pande et al*., 2012].

[9] Here the role of model complexity relative to data availability is recognized in ill-posed hydrological problems (in Hadamard's sense) and in bounding the expected empirical risk. Ill-posedness in hydrological model selection problems arises from the possibility of a many-to-one mapping from a set of hydrological processes to a response variable such as streamflow. This many-to-one mapping can yield solutions to hydrological model selection, which in turn is a selection of hydrological processes, that are either unstable, nonunique, or nonexistent. Solutions to model selection problems are deemed unstable when a small variation in the observed variable of interest, with respect to which the selection of process conceptualizations is undertaken, results in a large variation in the preferred sets of hydrological process conceptualizations.

[10] We emphasize that model complexity plays the role of a stabilizer, restricting the set of solutions of an ill-posed hydrological model selection problem to a subset of the original set that is compact, i.e., bounded and closed. The restriction of the set of solutions to a compactum addresses the issue of nonexistence of a solution. This restriction thus regularizes the model selection problem, correcting the ill-posedness by restricting the set of solutions to a subset on which a solution exists and is unique and stable for any set of observations, such as streamflow or evaporation (or any other hydrological variable of interest), with respect to which the model selection problem is defined. The hydrological model selection problem is then well posed in Tikhonov's sense.

[11] The role of model complexity as a stabilizer has been examined in other types of problems, such as density estimation [*Vapnik*, 1982]. A stabilizer does not have to be a measure of complexity; other choices for stabilization are available. But prediction uncertainty crucially depends on model complexity as defined in this paper. We make minimal assumptions on the data and the underlying distributions, and these assumptions are explicitly stated. Nonetheless, it is the issue of obtaining unstable solutions, which translates into finding widely different process conceptualizations as the number of observations for model selection increases, that unsettles a modeler the most. Selecting widely different process conceptualizations also implies different model complexities, which affects our confidence in model predictions. This is because the uncertainty in model prediction, in the sense of the probability that the deviation of the empirical risk from its expectation exceeds an arbitrary positive number, is bounded from above by a function of model complexity and sample size. Widely different model complexities for similar sample sizes would imply different prediction uncertainties and hence a lack of confidence in model predictions.

[12] We identify an upper bound on the expected empirical risk for any hydrological model as a function of empirical risk, model complexity, and sample size. This upper bound, for any given sample size, serves to distinguish between models. This is akin to regularized hydrological model selection, wherein a model with minimal complexity is selected from among those with lower empirical risk [*Pande et al*., 2009]. Many concentration inequalities (inequalities that bound the deviation of a random variable from its expected value) exist to estimate such bounds [*Boucheron et al*., 2004], but most are applicable to hydrological model estimation only when model predictions are assumed to be independent between any two time steps. Since model predictions are never independent between time steps, we use Markov's inequality, which does not require independence in model predictions.
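A small simulation illustrates why Markov's inequality is convenient here: it bounds the exceedance probability of a nonnegative random variable using only its mean, even when the underlying time series is serially correlated. The AR(1) error model and all parameter values below are illustrative assumptions, not the paper's setup:

```python
import numpy as np

rng = np.random.default_rng(1)

# Markov's inequality for a nonnegative random variable X:
#   P(X >= eps) <= E[X] / eps.
# Applied to X = |xi - E[xi]|, it bounds the deviation of the empirical
# risk from its expectation without assuming independent predictions.

def ar1_errors(n, phi=0.8, sigma=0.5):
    """AR(1) residual series: deliberately correlated between time steps."""
    e = np.empty(n)
    e[0] = rng.normal(0.0, sigma)
    for t in range(1, n):
        e[t] = phi * e[t - 1] + rng.normal(0.0, sigma)
    return e

# Monte Carlo proxy for the distribution of the empirical risk at N = 50.
N = 50
risks = np.array([np.mean(np.abs(ar1_errors(N))) for _ in range(5000)])
dev = np.abs(risks - risks.mean())  # |xi - E[xi]|, approximated by sampling

eps = 0.3
markov_bound = dev.mean() / eps     # E[X] / eps
empirical_prob = np.mean(dev >= eps)
print(empirical_prob, markov_bound)  # exceedance probability <= Markov bound
```

The bound holds despite the strong serial correlation of the errors, which is the property that makes Markov's inequality applicable to dependent hydrological predictions.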

[13] Further, since the treatment of ill-posedness and prediction uncertainty crucially depends on the estimation of model complexity, we examine the computation of model complexity along with its geometric interpretation. We use mean absolute error as a measure of the empirical risk, which can be interpreted as a measure of distance between the observed and the predicted in an *N*-dimensional space, where the sample size *N* is the number of data points of a time series of a hydrological variable of interest such as streamflow or evaporation. Under a mild assumption, we show that the empirical risk depends on the distance of a prediction from its mathematical expectation, whose probability of exceedance is a function of model complexity and sample size. Using the same probability of exceedance, we show that model complexity, within the framework presented, is the expected absolute deviation of model prediction from the expectation of model prediction. This is not an assumption but a consequence of the theory presented in the paper. It geometrically describes model complexity as a summary statistic (an expectation) of the size of the model output space (measured by the mean absolute deviation of predictions from their expected values).
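To make the geometric reading concrete, the following sketch estimates this complexity statistic by Monte Carlo for a hypothetical single-parameter linear-reservoir model; the model, parameter ranges, and forcing are our illustrative assumptions, not the structures used in the paper:

```python
import numpy as np

rng = np.random.default_rng(2)

def simulate(k, rain, s0=10.0):
    """Linear reservoir: storage s gains rainfall, outflow q = k * s."""
    s, q = s0, np.empty(rain.size)
    for t, p in enumerate(rain):
        s += p
        q[t] = k * s
        s -= q[t]
    return q

rain = rng.exponential(2.0, size=100)              # synthetic forcing

# Complexity as the expected absolute deviation of model prediction from
# the expectation of model prediction, estimated over sampled parameters.
ks = rng.uniform(0.1, 0.9, size=500)               # wide parameter range
preds = np.array([simulate(k, rain) for k in ks])  # model output space
expected_pred = preds.mean(axis=0)                 # E[prediction] per step
complexity = np.mean(np.abs(preds - expected_pred))
print(complexity)

# A narrower (more constrained) model class spans a smaller output space
# and therefore has lower complexity under this measure.
ks_narrow = rng.uniform(0.4, 0.6, size=500)
preds_n = np.array([simulate(k, rain) for k in ks_narrow])
complexity_narrow = np.mean(np.abs(preds_n - preds_n.mean(axis=0)))
print(complexity_narrow)
```

The wide-range class produces the larger statistic, matching the interpretation of complexity as the size of the model output space.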

[14] The paper is organized as follows. Section 2 deals with prediction uncertainty, ill-posedness, and the role of model complexity. Section 3 further discusses the notion of model complexity, while section 4 provides a geometric interpretation of model complexity. Section 5 presents two algorithms to implement the theory and applies them to the SAC-SMA and SIXPAR model structures. Section 6 presents a third algorithm to estimate an upper bound on the expected empirical error. It is applied to five other nonlinear rainfall-runoff model structures, using a Guadalupe River basin data set (of daily streamflow, precipitation, and potential evapotranspiration), to determine models with optimal complexity for different sample sizes. It is also used to rank the model structures in terms of their (complexity regularized) suitability for the study area and to compare these rankings with those provided by model selection without complexity regularization and by the BIC criterion. Finally, section 7 concludes.