A Signature‐Based Hydrologic Efficiency Metric for Model Calibration and Evaluation in Gauged and Ungauged Catchments

Rainfall‐runoff models are commonly evaluated against statistical evaluation metrics. However, these metrics do not provide much insight into what is hydrologically wrong if a model fails to simulate observed streamflow well and they are also not applicable for ungauged catchments. Here, we propose a signature‐based hydrologic efficiency (SHE) metric by replacing the statistical components of current efficiency metrics with hydrologic signatures that can be regionalized for model evaluation in ungauged catchments. We test our new efficiency metric across 633 catchments from Great Britain. Strong correlations with Spearman rank and Pearson correlation values around 0.8 are found between our proposed metric and commonly used statistical evaluation metrics (NSE, KGE, and NP) demonstrating that the proposed SHE metric is related to existing metrics as much as these metrics are related to each other. For ungauged catchments, we regionalize the three signatures included in SHE and find that 78% of catchments have an absolute difference of SHE values between gauged and ungauged cases of less than 0.2. This difference varies depending on the quality of the regionalized bias and variance signature values. It means that the SHE metric is applicable for model evaluation in ungauged catchments if its signatures can be regionalized well. When applying the SHE metric in other domains with different hydrologic properties, modellers should carefully consider the components of the proposed metric with signatures best suited to their research domains and their regionalization potential.

Another key issue is that metrics like NSE and KGE are only applicable to gauged catchments because they require historical time series of observed streamflow to estimate residuals. However, previous studies have regionalized hydrologic signatures (e.g., Guo et al., 2021; Hrachowitz et al., 2014; Pool et al., 2021; Yadav et al., 2007), and the statistical hydrology literature is rich with examples where streamflow statistics have been regionalized (e.g., Archfield & Vogel, 2010; Stedinger & Tasker, 1985; Vogel et al., 1999). Therefore, at least some of the components that make up efficiency metrics, that is, bias and variance, have already been estimated in ungauged basins. Indeed, quite a few studies have used (uncertain) regionalized hydrologic signatures as constraints for rainfall-runoff model ensembles (e.g., Bulygina et al., 2009; Westerberg et al., 2011; Zhang et al., 2008). However, there has been no attempt so far to build an efficiency metric for ungauged basins from these components.
In this paper, we propose a signature-based hydrologic efficiency (SHE) metric that builds upon the work that has been done previously with signatures in both gauged and ungauged catchments. The proposed metric consists of three equally weighted components representing bias, variance and correlation, with hydrologic signatures used to represent the bias and variance terms. Integrating hydrologic signatures in an evaluation metric provides an opportunity for hydrologic interpretation of model performance, and being able to regionalize these signatures enables hydrologic efficiency evaluation of models for ungauged catchments. We test our ideas across 633 catchments in Great Britain (GB) using model simulations in a Monte Carlo framework for a 10-year period of daily data.

Data
In this paper, we analyze 633 catchments spread across Great Britain. Great Britain, consisting of England, Wales, and Scotland, is characterized by a temperate climate, moderate topographic variability, and significant geological heterogeneity. Precipitation decreases from north-west to south-east with mean annual values ranging from 550 to 3,500 mm/year (Coxon et al., 2020a, 2020b). Conversely, potential evapotranspiration (PET) increases from north-west (minimum of about 350 mm/year) to south-east (maximum of about 550 mm/year). Most of England is dominated by lowland terrain, whereas Wales and Scotland are dominated by more mountainous regions. Great Britain has a diverse geology including aquifers consisting of permeable Chalk, Magnesian, Jurassic, Devonian/Carboniferous limestone and Permo-Triassic sandstone.
We use daily rainfall, streamflow, and potential evapotranspiration time series for ten years (1 October 1999 to 30 September 2009) and catchment attributes from the CAMELS-GB data set to develop and demonstrate the new metric. CAMELS-GB is a large-sample, open-source, hydro-meteorological data set for Great Britain (Coxon et al., 2020a, 2020b). It includes hydro-meteorological time series (consisting of rainfall, streamflow, potential evapotranspiration, temperature, radiation and humidity for the years 1970-2015), catchment attributes (including topography, climate, hydrology, land cover, soils, hydrogeology and human influences) (see Table S1) and catchment boundaries for 671 catchments across Great Britain. Our study additionally uses BFI-HOST as a measure of catchment responsiveness, obtained from the National River Flow Archive website (https://nrfa.ceh.ac.uk/, see Table S1 for a more detailed description and information regarding its derivation).
Considering hydro-climatic variability (i.e., wet and dry periods), ten years of data are assumed to be sufficient to capture long-term climatic and hydrologic characteristics of our catchments for the purpose of this study. About 96% of the 671 catchments have >90% complete streamflow data in this 10-year period (i.e., 1999-2009). From the 671 CAMELS-GB catchments, we exclude 12 catchments from the analysis where (a) the runoff ratio or variance ratio value (defined below) is higher than 1, suggesting significant and unexplained water balance issues, (b) there is no available BFI-HOST data, or (c) there is insufficient streamflow data for the specified study years. In addition, we exclude 26 catchments where water balance analysis (see Text S2 and Figures S1-S6 in Supporting Information S1) shows that they are losing significant amounts of water, most likely through subsurface processes that are not captured by the hydrologic model used in this study. Hence, 633 GB catchments are used in our analysis.

A Signature-Based Hydrologic Efficiency (SHE) Metric
Commonly used objective functions, such as NSE and KGE, comprise three components: bias, variance, and correlation. The bias and variance terms assess the model's capability to replicate the first and second moments of the observed data distributions, respectively, whereas the correlation term assesses the model's capacity to reproduce the timing and shape of the observed data (Gupta et al., 2009). We build on this previous work (e.g., Gupta et al., 2009; Pool et al., 2018) but focus on using signatures to represent the different efficiency metric components (see Table 1). In this context, we select runoff ratio and variance ratio as the signatures used to reflect the bias and variance terms of the SHE metric, respectively, while defining a different strategy to estimate correlation in ungauged basins. Details regarding the selected signatures and all three terms are given in the following subsections.

Bias Term: Runoff Ratio
The bias term of the SHE metric is represented by the ratio between the simulated runoff ratio and the observed runoff ratio. Runoff ratio (RR) is defined as the ratio of long-term average streamflow to long-term average precipitation. It represents the long-term water balance separation between water being released from the catchment as streamflow and as evapotranspiration (Milly, 1994; Olden & Poff, 2003; Sankarasubramanian et al., 2001). Higher runoff ratios represent catchments where a larger share of precipitation leaves the catchment as streamflow, and vice versa. Furthermore, it can also be interpreted as an indirect measure of long-term evaporation (i.e., Q/P = 1 − Ea/P) assuming a negligible change in storage over time (i.e., dS/dt ∼ 0) (Istanbulluoglu et al., 2012). Various studies have effectively utilized runoff ratio as a metric to constrain or calibrate the water-balance-related parameters of different hydrologic models (e.g., Bock et al., 2016; Greve et al., 2020; Hulsman et al., 2021; Muñoz-Castro et al., 2023; Yadav et al., 2007).
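The definition of the runoff ratio and of the bias term can be illustrated with a minimal sketch. The function names and the short series below are illustrative assumptions (made-up values in mm/day), not data or code from this study.

```python
from statistics import fmean


def runoff_ratio(streamflow, precipitation):
    """Long-term mean streamflow divided by long-term mean precipitation.

    Both series must share the same units (e.g., mm/day) and time period.
    """
    return fmean(streamflow) / fmean(precipitation)


def bias_term(q_sim, q_obs, precipitation):
    """Simulated runoff ratio over observed runoff ratio (1 = no bias)."""
    return runoff_ratio(q_sim, precipitation) / runoff_ratio(q_obs, precipitation)


# Made-up daily values in mm/day, for illustration only.
precip = [0.0, 10.0, 5.0, 0.0, 5.0]
q_obs = [1.0, 3.0, 4.0, 2.0, 2.0]
q_sim = [1.0, 2.0, 3.0, 2.0, 1.0]

rr_obs = runoff_ratio(q_obs, precip)   # 2.4 / 4.0 = 0.6
beta = bias_term(q_sim, q_obs, precip)  # 0.45 / 0.6 = 0.75
```

A bias term below 1, as in this toy case, indicates that the simulation releases too little water as streamflow over the long term.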

Variance (i.e., Amplitude) Term: Variance Ratio
The variance term of the SHE metric is represented by the ratio between the simulated variance ratio and the observed variance ratio. We define variance ratio as the ratio of the standard deviation of streamflow to the standard deviation of precipitation. The signature shows how variable (i.e., flashy) streamflow is with respect to its precipitation driver and is as such an indicator of the damping of precipitation variability through the catchment (a lower value indicating more damping). The same formulation of this metric has previously been used to study the damping of isotopes in a tracer study (Hrachowitz et al., 2021).
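As a minimal sketch of this definition, the variance ratio and the variance term can be computed as follows. Function names and values are illustrative assumptions. The population standard deviation is used here, which is itself an assumption; because both series have the same length, the choice between population and sample standard deviation does not change the ratio.

```python
from statistics import pstdev


def variance_ratio(streamflow, precipitation):
    """Standard deviation of streamflow over standard deviation of precipitation."""
    return pstdev(streamflow) / pstdev(precipitation)


def variance_term(q_sim, q_obs, precipitation):
    """Simulated variance ratio over observed variance ratio (1 = same damping)."""
    return variance_ratio(q_sim, precipitation) / variance_ratio(q_obs, precipitation)


# Made-up daily values in mm/day, for illustration only.
precip = [0.0, 8.0, 0.0, 8.0]
q_obs = [1.0, 3.0, 1.0, 3.0]   # strongly damped response
q_sim = [0.0, 4.0, 0.0, 4.0]   # flashier simulated response

vr_obs = variance_ratio(q_obs, precip)    # 1.0 / 4.0 = 0.25
alpha = variance_term(q_sim, q_obs, precip)  # 0.5 / 0.25 = 2.0
```

Here the simulated hydrograph is twice as variable, relative to precipitation, as the observed one, so the model damps the precipitation signal too little.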

Correlation Term
The correlation component is linked to the model's capacity to accurately replicate the timing and shape of the hydrograph (Gupta et al., 2009; Yilmaz et al., 2008). However, redefining it as a distinct signature and regionalizing it is more challenging compared to the bias and variance components. The closest concept that we considered is that of time of concentration (Beven, 2020). One could define the correlation component as the ratio of the time between peak rainfall and subsequent peak discharge to the time between peak rainfall and baseflow. But this requires that the data is of sufficiently high resolution to capture these dynamics, which was not the case in our case study. Most GB catchments are rather small (i.e., 35% and 70% of them having catchment areas <100 km² and <300 km², respectively) and fast-responding (Giani et al., 2021), hence the time difference between peak rainfall and subsequent peak discharge often falls within the time step of our daily data and thus does not distinguish between catchments. Hourly data would therefore be needed for our study domain (Giani et al., 2021). Further exploration of this signature with different data is beyond the scope of this technical note, but is one interesting area for future research.
For now, we decided to use Spearman rank correlation between observed and simulated streamflow values as the correlation term of SHE as used in the non-parametric form of KGE developed by Pool et al. (2018).
Similar to the KGE and NSE metrics, a SHE value of 1 indicates that the simulations are in perfect agreement with the observations. Similar to the Knoben et al. (2019) estimate for KGE, a SHE value of approximately −0.41 would be obtained when the simulations exhibit the same predictive skill as using the mean of the observations as prediction (see Text S3 in Supporting Information S1). Finally, both runoff and variance ratios could be simplified in the calculations by canceling out precipitation if all terms were estimated for the same time period and with the same precipitation data. However, the interpretation of runoff and variance ratios remains, and canceling out precipitation might not be possible (or sensible) if previously estimated or regionalized ratio values are used.
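The way the three equally weighted terms combine can be sketched as follows. This sketch assumes that SHE follows the KGE-style Euclidean distance from the ideal point (1, 1, 1); that form is an assumption here, chosen because it reproduces the two reference values stated above (1 for a perfect fit, and about −0.41 when the variance and correlation terms are zero with an unbiased mean). The function names are illustrative, and the rank correlation is implemented with the standard library only.

```python
import math


def average_ranks(values):
    """Ranks starting at 1, with tied values sharing their average rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        mean_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = mean_rank
        i = j + 1
    return ranks


def pearson(x, y):
    """Pearson correlation coefficient of two equally long sequences."""
    mx, my = sum(x) / len(x), sum(y) / len(y)
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    var_x = sum((a - mx) ** 2 for a in x)
    var_y = sum((b - my) ** 2 for b in y)
    return cov / math.sqrt(var_x * var_y)


def spearman(x, y):
    """Spearman rank correlation: Pearson correlation of the ranks."""
    return pearson(average_ranks(x), average_ranks(y))


def she(beta, alpha, r):
    """Assumed KGE-style combination of bias, variance and correlation terms."""
    return 1.0 - math.sqrt((beta - 1) ** 2 + (alpha - 1) ** 2 + (r - 1) ** 2)


r_s = spearman([1.0, 2.0, 3.0, 4.0], [10.0, 20.0, 40.0, 30.0])  # 0.8
she_value = she(0.9, 1.1, 0.8)  # about 0.755
```

Because each term enters as a squared deviation from 1, a deficit in any single component can be read off directly, which is what allows the hydrologic interpretation of a poor score.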

Application of SHE Metric in Ungauged Catchments
Applying the SHE metric in ungauged situations requires estimates of all three metric components for ungauged basins. Using signatures, we benefit from the vast number of studies where signature regionalization has already been performed across many regions of the world (e.g., Yadav et al., 2007 for GB; Shu & Ouarda, 2012 for Canada; Zhang et al., 2014 for Australia; Boscarello et al., 2016 for Italy; Visessri & McIntyre, 2016 for Thailand and Addor et al., 2018 for the US) (see Table S2 in Supporting Information S1 for more details). We perform this regionalization step in two different ways. The bias and variance components are regionalized via stepwise regression while the correlation term is regionalized via a geostatistical strategy. We use the simplest and most widely used strategy, stepwise linear regression, to establish relationships of climate/catchment attributes with the runoff ratio and variance ratio signatures (e.g., Almeida et al., 2016). We regionalize bias and variance signatures for 633 GB catchments testing 48 catchment attributes from CAMELS-GB representing topography, climate, hydrology, land cover, soils, hydrogeology and human influences (see Table S1). Regionalized signatures are estimated using the following procedure (see Text S5 in Supporting Information S1 and Table S3): (a) Stepwise regression is implemented for each signature independently. Predictors are selected according to their p-values and the R² value of the resulting regression model. Only the aridity index is selected as the predictor of runoff ratio, whereas the aridity index, BFI-HOST and inland water percentage (inwater_perc) are selected as the predictors of variance ratio. (b) The 633 catchments are randomly divided into five groups. One group is left out each time and the remaining ones are used to fit the regression models for each signature (5-fold cross-validation). (c) After obtaining regression models in step (b) (see Tables S4 and S5 in Supporting Information S1), the signature values are estimated for the catchments in the group omitted during model development.
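Steps (b) and (c) of this procedure can be sketched for the single-predictor case (runoff ratio against aridity index). This is a minimal illustration under stated assumptions: the predictor selection of step (a) is not reproduced, the function names are hypothetical, and the attribute values are synthetic (an exactly linear toy relationship), not CAMELS-GB data.

```python
import random


def fit_line(x, y):
    """Ordinary least squares fit of y = intercept + slope * x."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    slope = sum((a - mx) * (b - my) for a, b in zip(x, y)) / sum(
        (a - mx) ** 2 for a in x
    )
    return slope, my - slope * mx


def cross_validated_predictions(x, y, n_folds=5, seed=0):
    """Predict each catchment from a model fitted without its own fold."""
    indices = list(range(len(x)))
    random.Random(seed).shuffle(indices)          # random group assignment
    folds = [indices[f::n_folds] for f in range(n_folds)]
    predictions = [None] * len(x)
    for held_out in folds:
        held = set(held_out)
        train = [i for i in indices if i not in held]
        slope, intercept = fit_line([x[i] for i in train], [y[i] for i in train])
        for i in held_out:                        # treat these as "ungauged"
            predictions[i] = intercept + slope * x[i]
    return predictions


# Synthetic attributes for 20 hypothetical catchments (illustration only).
aridity = [0.3 + 0.05 * i for i in range(20)]
rr_obs = [1.0 - 0.6 * a for a in aridity]
rr_pred = cross_validated_predictions(aridity, rr_obs)
```

Each predicted value comes from a regression that never saw the catchment it is applied to, which mimics the ungauged case. With several predictors, as for the variance ratio, the line fit would be replaced by a multiple linear regression.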
The correlation term is treated differently, given that we have no simple approach to regionalize it in the same way we regionalize the other two signatures. However, Archfield and Vogel (2010) have demonstrated that it might be feasible to estimate expected correlation for ungauged locations using a geostatistical strategy. They introduced their map correlation method which selects the strongest correlated gauge as the reference gauge for an ungauged catchment, given that the nearest gauge was not always the most correlated one in their study of US catchments. The approach by Archfield and Vogel (2010) follows the basic idea of directly transferring streamflow from gauged to ungauged locations (see wider review of such approaches by He et al., 2011). Drogue and Plasse (2014) tested four different distance-based regionalization methods including the strategy by Archfield and Vogel (2010) for European catchments. They found that using multiple reference catchments rather than one is preferable for assessing daily streamflow hydrographs in a densely gauged study domain. A similar but simpler strategy to directly transfer streamflow is the one by Patil and Stieglitz (2012), who used inverse distance weighted (IDW) interpolation to transfer daily streamflow from multiple neighboring gauged catchments to ungauged catchments in the US. Their approach is formulated as follows:

$$q(x) = \frac{\sum_{k=1}^{N} w_k(x)\, q(x_k)}{\sum_{k=1}^{N} w_k(x)} \quad \text{and} \quad w_k(x) = \frac{1}{d(x, x_k)^{p}}$$

where $q(x)$ is daily streamflow (mm/day) at the ungauged catchment that is located at point $x$ in the region, $q(x_k)$ is the daily streamflow of neighboring reference catchment $k$ located at point $x_k$ in the region and $N$ is the total number of neighboring reference catchments for interpolation. Variable $d$ is the distance between gauges and $w$ is the interpolation weight of the reference catchments. The exponent $p$ is a positive real number, called a power parameter, and it is set to 2 in our study, the same as in Patil and Stieglitz (2012).
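The inverse distance weighting described above can be sketched as follows. This is a minimal illustration, not the implementation used in the study: the function name, the use of planar coordinates with Euclidean distance, the default of three neighbors, and the flow values are all assumptions, and the sketch requires every reference gauge to be at a non-zero distance from the target.

```python
import math


def idw_transfer(target_xy, references, power=2, n_neighbors=3):
    """Estimate daily streamflow at an ungauged point by IDW interpolation.

    references: list of ((x, y), flow_series) for gauged catchments, with
    flow in mm/day and all series covering the same time steps.
    """
    # Keep only the closest reference catchments.
    nearest = sorted(references, key=lambda ref: math.dist(target_xy, ref[0]))
    nearest = nearest[:n_neighbors]
    # Weight of each reference: 1 / distance^power.
    weights = [1.0 / math.dist(target_xy, xy) ** power for xy, _ in nearest]
    total = sum(weights)
    n_steps = len(nearest[0][1])
    return [
        sum(w * series[t] for w, (_, series) in zip(weights, nearest)) / total
        for t in range(n_steps)
    ]


# Hypothetical gauges: planar coordinates and two days of flow (mm/day).
gauges = [
    ((1.0, 0.0), [2.0, 4.0]),
    ((2.0, 0.0), [6.0, 8.0]),
    ((0.0, 4.0), [10.0, 10.0]),
    ((10.0, 10.0), [100.0, 100.0]),   # too far away to be used
]
q_est = idw_transfer((0.0, 0.0), gauges)
```

With a power of 2, the gauge at distance 1 receives 16 times the weight of the gauge at distance 4, so the estimate stays close to the nearest record. The Spearman rank correlation between such a transferred series and a simulated series can then stand in for the correlation term at an ungauged location.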
We adopt this simple approach for estimating streamflow at ungauged locations within our GB data set because it works surprisingly well, though a more elaborate methodology might work more generically across domains with different gauge densities. To identify a suitable number of reference catchments, we assume each catchment in turn to be ungauged, estimate the streamflow time series using IDW interpolation with different numbers of reference catchments (1-5 reference catchments), and calculate the Spearman Rank Correlation (SRC) between transferred and observed streamflow time series. We find that using three reference catchments provides the best SRC estimates for the ungauged catchments in our sample (Figure S7 in Supporting Information S1). We could use a similar streamflow transfer strategy to estimate the bias and variance terms, but found this strategy to perform less well than the direct regionalization of those signatures (see Figure S8 in Supporting Information S1).

Rainfall-Runoff Model Implementation
We use a typical lumped parsimonious model structure widely used in Great Britain. The model structure, implemented in the Rainfall-Runoff Modeling Toolbox (RRMT; Wagener et al., 2001), combines a probability-distributed soil moisture accounting component (i.e., PDM), which represents the variability in soil moisture storage across a humid catchment using a distribution of storage depths (Moore, 2007), and a combination of two linear reservoirs in parallel for routing, one representing fast flow and the other representing slow flow (i.e., 2PAR), with a fixed split between them (see Figure S1 in Supporting Information S1). Effective rainfall is produced as overflow from the PDM stores, which are described by a Pareto distribution based on two parameters: the maximum storage capacity, C_max, and the parameter b, describing the shape of the distribution. The effective rainfall (ER) is split according to parameter a, describing the fraction of flow through the fast reservoir, while each reservoir is defined by a single time constant. We chose PDM because its distribution function provides flexibility in soil moisture accounting to influence the runoff response, and combining it with the 2PAR flow routing module provides different flow pathways for catchments across GB with different levels of baseflow contribution (Kiraz et al., 2023; Wagener et al., 2001).
The model is calibrated for the ten-year study period. To calibrate the model, 10,000 parameter sets are independently sampled from uniform random distributions. A sample size of 10,000 has been shown to be sufficient for models of similar complexity (i.e., number of parameters) to the one used here in applications to GB catchments (e.g., Lane et al., 2019; Wagener et al., 2001). The first 5% of the ten-year study period is used as a warm-up period. The parameter sets producing the best performances according to the SHE metric are used to simulate streamflow.
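The sampling and warm-up steps can be sketched as follows. This is an illustration under stated assumptions: the parameter names and bounds are hypothetical placeholders rather than the ranges used in this study, the model run itself is omitted, and a toy score stands in for the SHE value of each simulation.

```python
import random


def sample_parameter_sets(bounds, n_samples, seed=0):
    """Draw independent parameter sets from uniform distributions."""
    rng = random.Random(seed)
    return [
        {name: rng.uniform(low, high) for name, (low, high) in bounds.items()}
        for _ in range(n_samples)
    ]


def warmup_steps(n_steps, fraction=0.05):
    """Number of initial time steps excluded from the evaluation."""
    return int(n_steps * fraction)


def best_parameter_set(parameter_sets, score):
    """Parameter set with the highest score (e.g., SHE of its simulation)."""
    return max(parameter_sets, key=score)


# Hypothetical bounds for a five-parameter PDM + 2PAR structure.
bounds = {
    "cmax": (1.0, 500.0),       # maximum storage capacity (mm), assumed range
    "b": (0.01, 2.0),           # shape of the storage distribution, assumed
    "a": (0.0, 1.0),            # fraction of flow to the fast reservoir
    "k_fast": (1.0, 10.0),      # fast time constant (days), assumed range
    "k_slow": (10.0, 1000.0),   # slow time constant (days), assumed range
}
parameter_sets = sample_parameter_sets(bounds, n_samples=10_000)

# 1 October 1999 to 30 September 2009 spans 3,653 daily time steps.
n_warmup = warmup_steps(3653)


def toy_score(params):
    """Stand-in for the SHE of a model run; not a real objective function."""
    return -abs(params["a"] - 0.5)


best = best_parameter_set(parameter_sets, toy_score)
```

In an actual calibration, the score function would run the model for each parameter set, discard the warm-up steps, and return the SHE value of the remaining simulation.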

Results
First, we compare the values estimated for our SHE metric in gauged situations with other efficiency metric implementations, that is, KGE (Kling et al., 2012), NSE (Gupta et al., 2009; Nash & Sutcliffe, 1970) and NP (Pool et al., 2018). Figure 1 shows scatter plots of SHE values against KGE, NSE, and NP values, together with Pearson Correlation (PC) and Spearman Rank Correlation (SRC) values. Correlations (PC/SRC) are highest for SHE-NP (0.86/0.81), then SHE-KGE (0.79/0.76) and then SHE-NSE (0.6/0.67). Our formulation is most closely correlated to that of Pool et al. (2018) and Kling et al. (2012) due to the equal weighting of the terms within the efficiency metric (compared to NSE) and the terms used. Component-based comparisons between SHE and previous efficiency metrics (see Text S7 in Supporting Information S1) show that the imperfect correlation in the SHE-KGE and SHE-NP comparisons is mainly due to differences between their variance terms (Figures S9g and S9h in Supporting Information S1). For KGE-SHE, the correlation terms also play a role. Since the bias and correlation terms of SHE and NP are identical by definition (Figures S9b and S9n in Supporting Information S1), SHE-NP has the highest correlation.
Second, we estimate the components of our metric for ungauged locations. The scatter plots in Figures 2a and 2b show that the predicted RR and VR using stepwise linear regression correlate well with observed RR and VR values. We obtain PC and SRC values above 0.9. The maps indicate that predicted RR and VR values have similar patterns, showing decreasing values from the north-west to the south-east of GB. Figure 2c shows correlations for ungauged locations. SRC values between observed and transferred streamflows are above 0.8 for 94% of all catchments (77% above 0.9), even when using the simple inverse distance method with the three closest catchments. All components of our SHE metric can therefore be estimated individually in ungauged catchments within our study domain.
Third, we calculate the differences between SHE values for gauged and ungauged cases to evaluate how well we can estimate the performance of a model in ungauged catchments. Figures 3a-3c show histograms of the differences between SHE values for gauged and ungauged cases (i.e., SHE_g − SHE_u). Cumulative distribution function (CDF) plots of these differences are color-coded by the bias, variance, and correlation component differences (Figure 3). The CDF plots also show that catchments with high positive differences (i.e., >0.3) have the highest positive and the lowest negative values of bias and variance component differences, respectively, suggesting that poor regionalization is a problem there. This could be due to uncertainties in these regionalization estimates, which are discussed in Section 5. Figure 3c shows that correlation component differences are overall small across catchments, except for very few catchments with high positive differences. In summary, the results suggest that when the regionalization of the bias and variance signatures works, we can obtain similar SHE values for both gauged and ungauged cases.

Discussion and Conclusions
In summary, we introduced and tested a new signature-based hydrologic efficiency (SHE) metric based on the idea that a model's fit to signatures will be easier to interpret hydrologically, and more importantly, that we can estimate it in ungauged basins. The SHE metric is correlated to different degrees with existing metrics, and we show how its components, and hence the metric itself, can be estimated in ungauged catchments or could be used to represent the naturalized state of a catchment, that is, unimpacted by human activities (Terrier et al., 2021). For the latter case, the regionalized signatures of the catchment represent an estimate of its natural response given that human activities are not included in the regionalization.
A flexible efficiency metric based on signatures provides significant opportunity for hydrologically relevant diagnostic model calibration and evaluation (Shafii & Tolson, 2015; Yadav et al., 2007; Yilmaz et al., 2008). Here, we replace the statistical components of the KGE (Gupta et al., 2009) with signatures suitable for our study domain, Great Britain. We chose to use runoff ratio and variance ratio as our signatures to represent bias and variance aspects of the hydrograph. However, other signatures could and should be considered for different study domains depending on the hydrologic characteristics of the catchments involved. Hydrologists have investigated many signatures and found different ones valuable to characterize major hydrologic catchment functions or hydrograph aspects depending on the study domain (McMillan, 2020). Different aspects of the flow duration curve have, for example, been used to characterize the variability of flow (e.g., McMillan, 2021; Pool et al., 2018; Sawicz et al., 2011; Westerberg et al., 2011; Yilmaz et al., 2008). If the study domain contains catchments with significant snowfall or if they are located in arid regions, it may be useful to use different signatures, such as the relationship between streamflow and air temperature (Horner et al., 2020) or a zero flow ratio (Zhang et al., 2014).
We therefore do not believe that SHE would be universally applicable in the formulation used here. Rather, we believe that its components should potentially be replaced by regionally appropriate signatures of a catchment's water balance, its damping, and the way it translates precipitation variability into streamflow variability and timing. Different signatures might be best suited to represent these components depending on whether the study domain is, for example, located in a temperate, tropical, dry or cold part of the world. Equally, existing regionalized streamflow indices correlated with these components might provide a baseline from which such a metric can be estimated in both gauged and ungauged catchments. The issue of signature choice is thus also linked to the ability to regionalize signatures. Many regionalization studies exist (see discussions in He et al., 2011; Wagener & Montanari, 2011; Hrachowitz et al., 2013; see examples in Table S2 in Supporting Information S1), though the extent to which these studies provide a regional basis to calculate efficiency metrics in ungauged locations has so far been unexplored. Runoff ratio has been regionalized widely in different parts of the world, including Australia (Zhang et al., 2014), Thailand (Visessri & McIntyre, 2016), and the US (Addor et al., 2018). Therefore, it is a bias signature that can be used in many places around the world while considering different influences of snow, vegetation or other catchment properties (Berghuijs et al., 2020). Of course, opting for Budyko-style equations may be more favorable than employing a linear equation when regionalizing runoff ratio in regions other than Great Britain (Budyko, 1974). Our study showed that assuming a linear relationship between aridity index and runoff ratio was effective for Great Britain, owing to its limited climatic variability. However, in water-limited regions, a curvilinear relationship between aridity index and runoff ratio would be more suitable (Budyko, 1974; Lebecherel et al., 2013). Also, different regions may employ various aspects of the flow duration curve as a variance signature, such as Canada (Shu & Ouarda, 2012), Italy (Boscarello et al., 2016), or as an alternative in the UK (Yadav et al., 2007), where flow duration curve characteristics have been regionalized. Lastly, correlations between streamflow time series at different gauges have been found to vary between regions (Kiang et al., 2013). Kiang et al. (2013) studied streamflow correlations across the US and found that it was difficult to find highly correlated gauges in the flatter and drier parts of the US (independent of gauge density), while it was possible in areas with more topographic variability. They also found that distance was not always the best predictor for high correlation, thus suggesting that there is more to explore regarding what processes control high correlations. An advantage of the opportunity and need for tailoring is that making these choices puts the discussion about suitable objective functions into the realm of hydrology, rather than just statistics.
One issue we did not tackle here is that of uncertainty in signature regionalization (e.g., Kapangaziwiri et al., 2012; Westerberg et al., 2014; Zhang et al., 2008). Uncertainties originate from the underlying measurements of physical catchment properties and of hydro-meteorological variables (e.g., precipitation, streamflow), from processing of the original observations, and from choices made regarding space-time averaging, etc. (McMillan et al., 2022; Westerberg et al., 2016). There is opportunity for integrating uncertainty in a coherent statistical framework covering both gauged and ungauged situations, which should further increase the value of available regionalized information in the context of model calibration and evaluation.
, and P are simulated streamflow, and observed streamflow and precipitation, respectively. μ is the mean and σ is the standard deviation of streamflow. x_{S,I(i)} is the simulated streamflow value, where I(i) is the time step when the ith largest flow occurs within the simulated time series, and x_{O,J(i)} is the observed streamflow value of the target catchment, where J(i) is the time step when the ith largest flow occurs within the observed time series. VR_Pred and RR_Pred are the regionalized variance ratio and runoff ratio for the target catchment derived using stepwise linear regression. Predictors of VR_Pred are aridity index, BFI-HOST and inland water percentage. The only predictor of RR_Pred is aridity index. Variance ratio is the ratio of the standard deviation of streamflow to the standard deviation of precipitation. Runoff ratio is the ratio of the long-term mean of streamflow to the long-term mean of precipitation. r_Pearson = Pearson correlation between simulated and observed streamflow in the target catchment. r_Spearman = Spearman rank correlation between simulated and observed streamflow in the target catchment. r*_Spearman = Spearman rank correlation between the simulated streamflow of a catchment that is assumed to be ungauged and the streamflow values obtained by inverse distance weighting interpolation of the observed streamflow of this catchment's three closest catchments.

Figure 2. (a) Predicted RR map and scatter plot for predicted versus observed RR, (b) predicted VR map and scatter plot for predicted versus observed VR and (c) map illustrating SRC values between observed streamflow of catchments and the streamflow values calculated by inverse distance interpolation of the observed streamflows of their three closest catchments, together with its histogram plot. The predictor of RR is aridity index and the predictors of VR are aridity index, BFI-HOST and inland water percentage.

Figure 1. Scatter plots for (a) KGE versus SHE, (b) NP versus SHE, and (c) NSE versus SHE. The x and y axes are limited to [0, 1]. KGE, NP, and NSE values are calculated using the best simulation values based on SHE metric values.

Figure 3. Cumulative distribution function (CDF) plot and histogram plot of the difference between SHE for gauged and ungauged cases (i.e., SHE_g − SHE_u). The CDF plot is color-coded by (a) bias component difference (Δβ), (b) variance component difference (Δα) and (c) correlation component difference (Δr) between SHE formulations for gauged and ungauged cases summarized in Table 1. The histograms given in (a), (b), and (c) are identical.

Table 1
Bias, Variance and Correlation Components and Formulations of Evaluation Metrics