## 1. Introduction

[2] Hydrological models are important tools for water management, but their representations of real-world hydrological systems are affected by significant uncertainties. Problems such as inadequate conceptualizations of physical processes, uncertainties in available data, and commensurability errors between the spatial and temporal scales of model and data hinder the derivation of model-parameter values directly from basin traits [*Beven*, 2002]. Most model applications require calibration, an adjustment of the model-parameter values so that observed values of at least one flux or state variable are adequately simulated. Historically, calibration techniques have gravitated toward finding the parameter-value combination (parameter vector) that gives the best result according to one or more goodness-of-fit measures. Today, the uncertainties in model calibration that result from the time period used for calibration [*Gupta and Sorooshian*, 1985; *Yapo et al*., 1996; *Wagener et al*., 2003; *Singh and Bárdossy*, 2012], the selected objective function [*Efstratiadis and Koutsoyiannis*, 2010], and the nature and magnitude of data errors [*Bárdossy and Singh*, 2008; *Renard et al*., 2010; *McMillan et al*., 2011; *Westerberg et al*., 2011b] are increasingly recognized and taken into account. The robustness of the calibrated parameter values to these types of uncertainties is therefore an important concern.

[3] The term robust has multiple interpretations. Robustness can be linked to the model structure, which should be as parsimonious as possible [*Klemeš*, 1983; *Dooge*, 1986; *Son and Sivapalan*, 2007], to the data, which should have small errors and an error structure that is either known or can be approximated [*Beven and Westerberg*, 2011], or to the calibration procedure [*Bárdossy and Singh*, 2008; *Peel and Blöschl*, 2011]. For the latter, *Bárdossy and Singh* [2008] (BS08 from now on) list the following criteria for robust parameter vectors: (1) they lead to good model performance, (2) they lead to hydrologically reasonable representations of the corresponding processes, (3) they are not sensitive to small changes in parameter values, and (4) they are transferable in time, leading to good model performance for other time periods and perhaps also other catchments. Model performance can be summarized as the value of the performance measure used for model calibration and evaluation, and therefore depends on the choice of this measure [*Freer et al*., 2003; *Krause et al*., 2005]. Model performance can also be analyzed in depth, in a posterior performance analysis against observational data and their estimated uncertainties, to determine the ability of the simulations to represent the relevant hydrological processes, as done by, e.g., *McMillan et al*. [2010] and *Westerberg et al*. [2011b]. In this paper we used a depth measure to analyze how hydrological robustness of calibrated model-parameter values varied within the geometric structure of the behavioral parameter space.

[4] Data depth is a concept in computational geometry, a discipline engendered by *Shamos* [1978], with links to statistical analysis since its inception. If parameter vectors in the behavioral space are considered samples from a multivariate distribution, then the deepest point (or region of the space) is an estimator of the distribution's median [*Aloupis*, 2006].

[5] The first application of data depth to hydrology is found in the work of *Chebana and Ouarda* [2008], who use the *Mahalanobis* [1936] depth function to find the weights in a weighted linear least-squares regression for regional flood estimation in Canada. Shortly after came BS08, who build on the results of *Bárdossy* [2007], and find that parameter vectors deep within the well-performing parameter space are robust in terms of sensitivity to small changes in parameter values and transferability to other time periods. BS08 also propose a calibration method (robust parameter estimation: ROPE) that iteratively refines the convex region encompassing the well-performing parameter values. BS08 use the half-space depth [*Tukey*, 1975], which is a convex depth measure and one of many depth functions in computational geometry [*Liu et al*., 1999; *Zuo and Serfling*, 2000; *Aloupis*, 2006; *Lee*, 2006].
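To make the notion of half-space depth concrete, the following is a minimal sketch for two-dimensional points. It is not the implementation used by BS08: the function name `halfspace_depth_2d`, the parameter `n_directions`, and the toy point cloud are illustrative assumptions, and sampling a finite set of directions only approximates the minimum over all half-planes.

```python
import math

def halfspace_depth_2d(point, cloud, n_directions=360):
    """Approximate Tukey half-space depth of `point` relative to `cloud`.

    For each sampled direction u, count the cloud points x with
    (x - point) . u >= 0, i.e. the points in the closed half-plane whose
    boundary passes through `point`. The depth is the smallest count found.
    """
    px, py = point
    best = len(cloud)
    for k in range(n_directions):
        theta = 2.0 * math.pi * k / n_directions
        ux, uy = math.cos(theta), math.sin(theta)
        count = sum(
            1 for (x, y) in cloud
            if (x - px) * ux + (y - py) * uy >= 0.0
        )
        best = min(best, count)
    return best

# Toy cloud (assumed for illustration): four corners of a square and its center.
cloud = [(0.0, 0.0), (1.0, 0.0), (0.0, 1.0), (1.0, 1.0), (0.5, 0.5)]
depths = {p: halfspace_depth_2d(p, cloud) for p in cloud}
deepest = max(depths, key=depths.get)  # the central point has the largest depth
```

In this toy example the point at the center of the square is the deepest and the corners are the shallowest, which mirrors the idea that deep parameter vectors lie well inside the cloud of well-performing vectors.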

[6] *Thapa* [2010], *Kiss et al*. [2011], and *Singh* [2010] are examples of ROPE implementations. *Cullmann et al*. [2011] compare ROPE to another calibration method, PEST [*Skahill and Doherty*, 2006], and find that ROPE performs better in the evaluation of small to medium-sized discharge events for a model applied to a Swiss basin. *Bárdossy and Singh* [2011] explore the application of ROPE in 28 British catchments for regionalizing model parameters and identifying catchment characteristics that can be used for regionalization, and find that deep parameter values perform better than shallow ones. Half-space data depth is also used to identify unusual events in precipitation and discharge time series [*Singh and Bárdossy*, 2012] and to compare weather states [*Bliefernicht*, 2010].

[7] *Chebana and Ouarda* [2011a] use the Mahalanobis depth to identify extremes in a multivariate extreme-value distribution and apply their method to flood-frequency analysis with better results than when using canonical-correlation analysis [*Chebana and Ouarda*, 2011b]. They also suggest the use of depth functions to calculate multivariate quantiles [*Chebana and Ouarda*, 2011b], and to describe and visualize multivariate data [*Chebana and Ouarda*, 2011c]. In the latter example they test multiple depth functions to identify outliers and to describe bivariate data that combine flood peaks, durations and volumes for a Canadian basin.
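For reference, the Mahalanobis depth of a point $x$ is commonly written as follows [e.g., *Liu et al*., 1999; *Zuo and Serfling*, 2000]. The symbols $\mu$ (mean vector) and $\Sigma$ (covariance matrix) of the underlying distribution are notation introduced here for illustration and are not taken from the works cited above:

$$
D_M(x) = \left[\, 1 + (x - \mu)^{\top} \Sigma^{-1} (x - \mu) \,\right]^{-1}
$$

The depth equals 1 at $x = \mu$ and decreases toward 0 as the Mahalanobis distance from $\mu$ grows, so its contours are ellipsoids centered on the mean.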

[8] The half-space and the Mahalanobis depths are convex depth measures. Considering that a convex boundary might be ill-suited to describe the shape of a point cloud (Figure 1), we decided to test a nonconvex depth measure based on α shapes [*Edelsbrunner*, 1992]. The two main advantages of such a depth measure are that it does not require the assumption, implicit in most depth functions, that the underlying multivariate distribution is unimodal [*Liu et al*., 1999; *Krasnoshchekov and Polishchuk*, 2013], and that it might give a tighter delimitation of the behavioral parameter space, i.e., less space to be explored. An added advantage is that a depth function based on α shapes has the potential to detect different modes in a multivariate distribution [*Krasnoshchekov and Polishchuk*, 2013].
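The way an α shape can separate modes that a convex boundary would merge can be illustrated with a brute-force sketch in two dimensions. This is not the depth measure developed in this paper: it only identifies α-exposed edges (pairs of points lying on the boundary of a disk of radius α that contains no other point) and counts the connected groups they form. The function names, the convention that α is the disk radius, and the two-cluster toy data are assumptions made for illustration.

```python
import math
from itertools import combinations

def alpha_exposed_edges(points, alpha):
    """Return index pairs (i, j) for which some disk of radius `alpha`
    passes through points[i] and points[j] and has no other point inside."""
    edges = []
    for i, j in combinations(range(len(points)), 2):
        (x1, y1), (x2, y2) = points[i], points[j]
        d = math.hypot(x2 - x1, y2 - y1)
        if d == 0.0 or d > 2.0 * alpha:
            continue  # no disk of radius alpha passes through both points
        mx, my = (x1 + x2) / 2.0, (y1 + y2) / 2.0
        h = math.sqrt(max(alpha * alpha - (d / 2.0) ** 2, 0.0))
        nx, ny = -(y2 - y1) / d, (x2 - x1) / d  # unit normal to the segment
        for sign in (1.0, -1.0):  # the two candidate disk centers
            cx, cy = mx + sign * h * nx, my + sign * h * ny
            empty = all(
                math.hypot(px - cx, py - cy) >= alpha - 1e-9
                for k, (px, py) in enumerate(points) if k not in (i, j)
            )
            if empty:
                edges.append((i, j))
                break
    return edges

def count_components(n_points, edges):
    """Count connected components of the edge graph (simple union-find)."""
    parent = list(range(n_points))

    def find(a):
        while parent[a] != a:
            parent[a] = parent[parent[a]]
            a = parent[a]
        return a

    for i, j in edges:
        parent[find(i)] = find(j)
    return len({find(a) for a in range(n_points)})

# Toy data (assumed): two well-separated unit squares, i.e. two "modes".
points = [(0, 0), (1, 0), (1, 1), (0, 1), (10, 0), (11, 0), (11, 1), (10, 1)]
modes_small_alpha = count_components(len(points), alpha_exposed_edges(points, 1.0))
modes_large_alpha = count_components(len(points), alpha_exposed_edges(points, 100.0))
```

With a small α the two clusters remain separate groups, whereas a very large α links them, approaching the behavior of a convex boundary. The brute-force search over all point pairs is cubic in the number of points and is meant only for small illustrative examples.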

[9] In this paper we used α shapes to investigate how the hydrological robustness of calibrated parameter values varied within the geometric structure of the behavioral parameter space. We used an in-depth posterior performance analysis that accounted for observational uncertainties in the data to analyze how the hydrological robustness varied with depth for model simulations of six Honduran catchments and one UK catchment with different data quality and hydrological characteristics. We also wanted to find out whether the α-shape depth was a useful estimator of parameter depth and under which conditions deep parameters would be hydrologically robust. Our concrete objectives were to:

[10] 1. Investigate how the hydrological robustness of the behavioral parameter vectors changed within the behavioral parameter space.

[11] 2. Use α shapes to detect multimodality in the behavioral parameter space.

[12] 3. Analyze the potential to reduce the computational cost of ROPE.